Web publishing, online research, stats, webmining and search engines.

Friday, April 27, 2007

New domain: orchycore.com

I've now given this blog its very own domain name: orchycore.com and have switched to s9y software, because we've found it easy to include blogs in our other sites.

No more posts here! Off to orchycore.com. Thanks, everyone.

Monday, January 29, 2007

Links from books the ultimate authoritative source?

It just occurred to me as I was doing a vanity search that led to Google Books: wouldn't books be the ultimate authoritative source of links for a search engine?

Books sometimes have full URLs in their footnotes like the example above. They're not links you can click on, but a search engine (like Google) that indexes books can certainly read them.

There are any number of ways of spamming websites that have arbitrarily been deemed authoritative, such as form spam. But nobody, to my knowledge, has thought of negotiating links on actual tree flesh in order to get into Google. It would be a very pure source of quality links.

Sunday, January 07, 2007

Federal govt copyright legislation.

The Australian Internet Industry Association has published a risk analysis for different kinds of entities for the federal government's new copyright legislation. Scary stuff. Here it is for small businesses.

Saturday, December 23, 2006

Pandora and art

I don't remember the last time I spent 4 hours on one site. But Pandora's got me. I'm gonna blog about it again.

I really love how Pandora is telling me why I like the songs I do. Every time I click "Yes, I like it" I'm adding to my list of songs I like and refining the criteria for future songs. With each rating, it gets more articulate about my tastes, far surpassing what I could have come up with myself. I'm learning about my own preference for modal harmonies, slow-moving bass lines and highly synthetic sonorities.

Pandora still keeps most of its smarts to itself, which is understandable if they want to be your radio station. But take communication a little further and you have a fantastic way to teach people about art and culture: telling the story through your own tastes. Imagine a service that highlights elements that your favourite paintings have in common. It would expose you to new works and, when your tastes develop and change, contrast what you liked before with what you like now, explaining why and how. It would congratulate you on your growing sophistication and challenge you with art it knows you'd like if you'd just give it a chance. It would also connect you with similar people going on a very similar artistic journey.

Opening Pandora's Box

I was referred to Pandora today. It plays you a personal radio station: you specify seed songs and artists, and it plays similar ones based on hundreds of attributes discovered in the Music Genome Project.

I just set up a channel based on The Orb. It's currently playing Moog Apella by 16B, all new to me but sounds great. I sent the channel to a fellow Orb fan. He said he really liked the track playing now, but it was different.

Was thinking it would be great if you could share channels on an ongoing basis. With such a wide library of tracks available, you'd think it could resolve disputes on what music to play in office environments. One person would play the music through their speakers (others could too if they wanted the sound closer to them). But everyone could rate the tracks that were coming through and put in a set number of request tracks for others to rate. The anonymity of the rating system would make it fair too: a secret ballot.

They should have a feature "combine channels" where different users select channels they want to hear and submit them. Then the system merges those to create something new, common and live. Would be great for being in the same space as remote workers.

A nice way of sharing for now is just to get visitors to enter their favourite tracks now into the same channel. I went round to this friend's place yesterday and we each took turns to name favourite songs. And yeah, a lot of the time the new songs played were to everyone's liking.

Next time I have a nice lady over, I'm going to ask her what her 3 favourite romantic songs are. First she'll be happy I have the song, then be amazed how close my taste is with other songs. It's this kind of technological edge that will ensure us geeks make a real impact on the gene pool of generations to come.

My tip to Alexadex users. Buy Pandora and hold!

Monday, December 11, 2006

Tis the season to be networking

While I'm blogging, I'd like to thank everyone involved with STIRR. Creative party games made for a good vibe in the room and it was a good focused crowd. Our glorious team, team 3, won with our ShoeWave.com business: a peer-to-peer sock sharing service where you send in an odd sock with a dollar, and receive 2 matching socks back in the mail. Genius :).

Also great fun was Clickaholics. It was a younger crowd this time, not coming on the tail end of a big expensive conference. Free alcohol didn't last long but who cares? People did!

Met fun, interesting and smart people, all of whom will be great to see again.

Baby loves to ChaCha?

Just tried out ChaCha, a new stab at the old humans-search-for-you concept. The conversation was very slow. Each response took 1-3 minutes and overall it was about half an hour. I guess if you wanted to use them effectively you could open up 10 windows and ask 10 questions. I hope they have good protection against bots running thousands of queries at once.

Status: Looking for a guide ...
Status: Connected to guide: ErinL
ErinL: Welcome to ChaCha!
ErinL: hello!
You: Hi!
ErinL: Welcome to ChaCha! Please wait a moment while I search for your results.
You: are you there?
ErinL: yes I'm still searching. i'm just getting the news show.
You: it's an australian show
You: a comedy
You: different the us one
ErinL: I've found it. bare with me.
ErinL: this is the correct show, right?
You: Yep
You: I'll give you a clue
You: http://www.google.com/search?q=%22rob+sitch+played%22+frontline&hl=en&sourceid=gd&rls=GGLD,GGLD:2006-41,GGLD:en
ErinL: here is the wikipedia site. he played Mike Moore.
You: Thanks..Does that url come up now?
ErinL: Are these results sufficient? Is there anything else I can find for you?
You: Yep what's the most popular site published by Rapid Intelligence?
ErinL: ok bare with me as I find that answer as well.
ErinL: Is this what you are looking for?
You: ..
ErinL: hmmm?
You: is somehting supposed tocome up in the guide results now?
You: I don't see a url
ErinL: yes. you should have 3.
ErinL: www.rapint.com
You: nothing showed up..I'm using firefox
You: one more question.. Who is the opposition leader in Australia?
ErinL: oooh. I don't think it's compatible with fire fox. This is the 2nd time I've had this happen.
ErinL: if I give you the www site can you find it from there? I'm not familiar with firefox.
You: yep no prob
ErinL: www.en.wikipedia.org/wiki/Frontline_%Australian_TV_series%29
ErinL: www.rapint.com for the rapid intelligence question
ErinL: I hope this helps you. If not let me know. Is there anything else I can do?
You: Yep Who is the opposition leader in Australia?
ErinL: http://en.wikipedia.org/wiki/List_of_Australian_Opposition_Leaders
ErinL: It's Kevin Rudd.
You: Thank you
You: One more quesiton: where are you based?

[many minutes pass]

ErinL: Thanks for searching ChaCha! Have a nice day. thank you.
ErinL: Please RATE ME. Thanks for using ChaCha.
Status: Session ended.
Status: Looking for a guide ...
Status: Connected to guide: AmandaG
AmandaG: Welcome to ChaCha!
AmandaG: Chacha is based in Indiana.
You: Where are you personally right now? (which city)

[10 minutes pass. Things start getting really slow here]

You: are you there?
You: Are you checking to see if you're allowed to answer this or?
You: Hello?
You: Thanks, I'm done.
Status: Session ended.
Status: Looking for a guide ...
Status: Connected to guide: Steven C
Steven C: Welcome to ChaCha!
Steven C: hi
You: Hi Steven
Steven C: hi
You: where are you?
Steven C: in USA
You: Ok thanks

Impressive that they could actually field queries despite a recent spike in traffic. But I can't say I think it's a great business. Perhaps if you publish the chats, you've got yourself an easy way to generate content for AdSense. Perhaps only the ones that both chatters agree are worth publishing. Or it could work doing verticals. I'm sure there's room on the net for a few Indian mesothelioma experts giving you advice then sending you to affiliated sites.

Sunday, November 05, 2006

Plagiarism on Wikipedia

Wikipedia-Watch.org owner Daniel Brandt has published a report showing widespread plagiarism in Wikipedia.

A more sophisticated methodology could be used here. One idea that has just hit me would be to use archive.org and the Wikipedia history function to discover phrases that appeared on other sites first. This could all be automated.

But such activity should be applauded. Whatever people may speculate about Daniel Brandt's motives, the same thing could have been done by a Wikipedian with the same positive effect: as an open system that assimilates criticism, it will learn and evolve from negative press out there. And as a fast moving encyclopedia, it can react quickly to address the problem.
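
Here's a rough sketch of how the archive.org idea might be automated. It's only an illustration, not Brandt's method: it assumes you've already fetched dated Wikipedia revisions and dated archived copies of the other site as plain text (the fetching is left out), and the function names and the 6-word phrase length are my own inventions.

```python
from datetime import date

def shingles(text, n=6):
    """Split text into the set of overlapping n-word phrases it contains."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def first_seen(snapshots, n=6):
    """Map each phrase to the earliest date it appears.

    snapshots is an iterable of (date, text) pairs, in any order.
    """
    earliest = {}
    for when, text in snapshots:
        for phrase in shingles(text, n):
            if phrase not in earliest or when < earliest[phrase]:
                earliest[phrase] = when
    return earliest

def phrases_seen_elsewhere_first(wiki_revisions, external_snapshots, n=6):
    """Return phrases whose earliest external date predates Wikipedia's."""
    wiki = first_seen(wiki_revisions, n)
    external = first_seen(external_snapshots, n)
    return sorted(
        phrase
        for phrase, wiki_date in wiki.items()
        if phrase in external and external[phrase] < wiki_date
    )

# Made-up data: the external page was archived two years before the revision.
wiki = [(date(2006, 5, 1), "the quick brown fox jumps over the lazy dog")]
other = [(date(2004, 1, 1), "see the quick brown fox jumps over it")]
print(phrases_seen_elsewhere_first(wiki, other, n=4))
```

A hit only says the phrase was archived elsewhere first. Common phrases and properly quoted material would still need a human to look at them.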

Thursday, October 05, 2006

Google inadvertently invents an acronym generator

Hats off to Google Labs for creating the first search engine to allow regular expressions, even if it's just for their new code search engine.

This is something I've been waiting for for ages. Why? Acronym discovery!

This regex "\s+I\w+\s+P\w+\s+O\w+\s+D\w+" means find sequences of words that match I*, P*, O*, D*. We get from this:
  • indexes pairs of digits
  • invalid path or domain
  • important property of DOM

\sL\w+\s+O\w+\s+V\w+\s+E\w+ searches for love and gets:
  • lock on VLDB entry

\sI\w+\s+B\w+\s+M\w+ finds i.b.m. and gets:
  • is being moved
  • image being manipulated

  • signal handler is trashed
  • systems hide it there

  • random string sent
  • related structures stored
  • report some statistics
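
If you want to try the same trick on your own text, here's a minimal Python sketch. It's my own approximation, not anything Google runs: the helper names are made up, it uses \b rather than a leading \s so a match can start at the very beginning of the text, and it ignores case.

```python
import re

def acronym_pattern(acronym):
    """Build a regex matching consecutive words whose initials spell acronym."""
    # Each letter becomes "that letter followed by one or more word characters".
    words = [re.escape(letter) + r"\w+" for letter in acronym]
    return re.compile(r"\b" + r"\s+".join(words), re.IGNORECASE)

def find_expansions(acronym, text):
    """Return every non-overlapping run of words in text that fits acronym."""
    return [match.group(0) for match in acronym_pattern(acronym).finditer(text)]

print(find_expansions("ipod", "error: invalid path or domain given"))
# ['invalid path or domain']
print(find_expansions("ibm", "the file is being moved and image being manipulated"))
# ['is being moved', 'image being manipulated']
```

Because each word needs at least one character after its initial, single-letter words like "a" or "I" never count towards an acronym.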