What things look like on my side of the new digital divide, or hacking Amazon for fun.
Most of you, my regular readers, don't know it, but there's a vast chasm between us. I'm bored with Flickr. You've never even heard of it. This time around, though, there are an awful lot of computer geeks on your side. I discovered this when Arnoud came over to New Zealand recently. I offhandedly mentioned one of the current crop of nifty new web applications, and he hadn't heard of it. Moreover, he didn't know, as you don't know, the technologies that made this web application so nifty, or how they'd evolved from an earlier incarnation, or... Well, basically anything that your average technorati weblogger has been getting excited about for the last three years.
I don't intend to change that with this post. There's far too much history to explain in text, and you're not interested anyway. I guess that's the main thing. You are not interested. You have exactly the same tools at your disposal as everyone living in this whole new internet, and you're just as capable of diving into all this fun. But it's not your cup of tea. Or so you think.
So what this post does is show a single evening in my world (last Friday, to be exact), touching on just a few of these new developments. Perhaps it'll interest you enough to make you go out and explore for yourself, and if not, then at least it should help you gain some understanding of what's out there and what's happening.
Work for the week done, I started my daily reading/browsing cycle. I visit more random websites in a day now than I did back in the 90s, but the difference is that it's organized. No more stumbling around in the dark, typing "books" into AltaVista and clicking through to page 30 to see if there's a website out there I should know about. Nowadays I harness the power of everyone else doing this for me.
There's a website out there called del.icio.us. Del.icio.us is a social bookmarks manager. The concept is: you stumble across an interesting website and you bookmark it. But instead of bookmarking it in your browser, you bookmark it to a website, and give it some quick key phrases (called "tags") to remember it by. Benefit for yourself: you can access your bookmarks from any computer in the world. Benefit for everybody else: they can see your bookmarks. Benefit for the world: synergy. Interested in landscape photography? See what everybody is bookmarking under both the "landscape" and "photography" tags.
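To make the tag idea a little more concrete, here is a minimal Python sketch of the "everybody's bookmarks, filtered by tag" view. The `Bookmark` class, the `bookmarks_tagged` function and the example URLs are hypothetical illustrations, not del.icio.us's actual data model or API; the one assumption is that asking for landscape + photography means "links carrying both tags".

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    """One saved link: the URL, who saved it, and their tags (hypothetical model)."""
    url: str
    user: str
    tags: set[str] = field(default_factory=set)


def bookmarks_tagged(bookmarks: list[Bookmark], *wanted: str) -> list[str]:
    """Return the distinct URLs carrying *all* requested tags, in first-seen order."""
    required = set(wanted)
    seen: dict[str, None] = {}  # dict keeps insertion order and drops duplicates
    for bm in bookmarks:
        if required <= bm.tags:  # subset test: every wanted tag is present
            seen.setdefault(bm.url, None)
    return list(seen)


# Made-up example data: three people's bookmarks.
shared = [
    Bookmark("http://example.org/fjords", "alice", {"landscape", "photography", "norway"}),
    Bookmark("http://example.org/camera-review", "bob", {"photography", "gear"}),
    Bookmark("http://example.org/fjords", "carol", {"landscape", "photography"}),
]
print(bookmarks_tagged(shared, "landscape", "photography"))
# ['http://example.org/fjords']
```

Because many people tag the same link, the shared view is only useful if duplicates collapse into one entry, which is why the sketch de-duplicates URLs rather than listing every individual bookmark.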
*insert pause here while I spend many minutes clicking through sites full of pretty landscape photographs*
Benefit for me: populicio.us, which lists the sites bookmarked most often in just the last 24 hours.
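At heart, a list like that is a count over a sliding time window. A minimal sketch, assuming nothing more than a list of (URL, time saved) records; the function name and data are made up, and the real site's ranking rules may well differ:

```python
from collections import Counter
from datetime import datetime, timedelta


def popular_last_day(saves: list[tuple[str, datetime]], now: datetime,
                     top: int = 10) -> list[tuple[str, int]]:
    """Rank URLs by how often they were bookmarked in the 24 hours before `now`."""
    cutoff = now - timedelta(hours=24)
    # Only saves inside the window count; older ones simply drop off the list.
    counts = Counter(url for url, saved_at in saves if cutoff <= saved_at <= now)
    # most_common sorts by count; ties keep the order in which URLs were first seen.
    return counts.most_common(top)


# Made-up timestamps relative to an arbitrary "now".
now = datetime(2005, 1, 1, 20, 0)
saves = [
    ("http://example.org/a", now - timedelta(hours=1)),
    ("http://example.org/b", now - timedelta(hours=2)),
    ("http://example.org/a", now - timedelta(hours=5)),
    ("http://example.org/c", now - timedelta(hours=30)),  # too old, ignored
]
print(popular_last_day(saves, now))
# [('http://example.org/a', 2), ('http://example.org/b', 1)]
```

The short window is what produces the fast turnaround described below: a link only needs a handful of saves within one day to show up near the bottom of the list.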
Backtracking a bit. One of the larger dilemmas in the weblog world is that there are a few hundred people who are read by everyone, and a few million people who are read by 'no one'. I fall into the latter category. Most everyone in the "Tech-talkers" list to the left falls into the former. When one of these few hundred "A-list" bloggers finds an interesting website / article / nifty bit of shininess and links to it, half of those millions of people will follow the link, agree that it's interesting, and then link to it as well. There are sites out there which track this: Daypop's Top 40 gives a very good overview of what everyone has been linking to in the last week or so. Basically, if you have a weblog and want to be noticed, you have two options. You get noticed... and you get noticed and you get noticed and you get noticed and... Or you don't get noticed. There's hardly any middle ground. People are aware of the problem and are going out of their way to mitigate it, but it's still there.
Now back to populicio.us. Because of its very fast 24-hour turnaround, links appear there which haven't been, and won't be, linked to by everyone. It isn't directly fed by webloggers either (although of course they still feed it indirectly by linking to sites which people then bookmark). I guess that makes it the best of both worlds. You get the 'mainstream' articles (mainstream for this side of the new digital divide), and the start of the long tail. ('Cause like, y'know, I hadn't yet linked to enough long-since-cliché phrases.)
Anyway, yes. I always pay particular attention to the links at the bottom of this list: interesting enough that half a dozen people have bookmarked them, but not so special that everyone knows about them. (I read the more popular articles as well, of course, assuming the title doesn't tell me I won't be interested, but that's just keeping up with the world.) And so last Friday (because that was, after all, the subject here) I came across a link to a Wired article: Judging a Book by Its Contents. How could I not be drawn in?
The article briefly highlighted a couple of more or less interesting new features available at Amazon. The one that caught my attention was a list of SIPs, or Statistically Improbable Phrases, which now appears with most of the books for which Amazon has the text in electronic form. (The more adventurous among you might have seen this already by following the "Ineffable Flame" link currently living in the "interesting reads" box over to the right.)
Basically, if you can "search inside" a book, and if the author uses unique enough phrases, there'll be a line of SIPs just above the cover.
Fun was had figuring out what the constraints were for this feature and comparing authors and such. Most science fiction and fantasy authors didn't seem to have more than a handful of SIPs per book, while, for example, my personal favorite author, David Zindell, hit the maximum of 25 phrases for all his books.
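Amazon doesn't say here exactly how it picks SIPs. The name suggests phrases that turn up far more often in one book than in books generally, so this minimal Python sketch scores every two-word phrase by its rate in the book divided by its (smoothed) rate across a reference collection. That formula, the `min_count` threshold and the helper names are assumptions for illustration only; `top=25` merely mirrors the cap observed above.

```python
import re
from collections import Counter


def bigrams(text: str) -> Counter:
    """Count adjacent two-word phrases, lower-cased, ignoring punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(words, words[1:]))


def improbable_phrases(book: str, corpus: list[str],
                       min_count: int = 2, top: int = 25) -> list[str]:
    """Rank a book's bigrams by (rate in book) / (smoothed rate in the whole corpus)."""
    book_counts = bigrams(book)
    corpus_counts: Counter = Counter()
    for text in corpus:
        corpus_counts.update(bigrams(text))
    book_total = sum(book_counts.values()) or 1
    corpus_total = sum(corpus_counts.values()) or 1
    scores = {}
    for phrase, n in book_counts.items():
        if n < min_count:
            continue  # a phrase used only once says little about the book
        book_rate = n / book_total
        # Add-one smoothing so phrases unseen elsewhere don't divide by zero.
        corpus_rate = (corpus_counts[phrase] + 1) / (corpus_total + 1)
        scores[phrase] = book_rate / corpus_rate
    ranked = sorted(scores, key=scores.get, reverse=True)[:top]
    return [" ".join(pair) for pair in ranked]


book = "the ineffable flame burned. the ineffable flame rose. the night was dark."
corpus = [
    book,
    "the night was dark and the flame was small.",
    "the ineffable truth is that the night was long.",
]
print(improbable_phrases(book, corpus))
# ['ineffable flame', 'the ineffable']
```

Both candidate phrases occur twice in the toy book, but "the ineffable" also shows up in another text, so "ineffable flame" scores higher: it is the more improbable phrase outside this particular book.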
But comparing authors like that gets boring after a bit, so I continued exploring. Next up was the following:
http://www.amazon.com/gp/phrase/all ... ozers%20and%20things:
All the SIPs I'd seen consisted of two words, so obviously the next thing to figure out was whether there were any one- or three-word SIPs. Randomly browsing books didn't give an answer. However, I'd noticed that Amazon simply put the phrase in the URL, so I went hunting for quotes containing a run of consecutive words that promised to be unlikely to appear in many books, while not being completely unique.
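Building such a URL by hand is just a matter of percent-encoding the phrase into the path. A small Python sketch, assuming the path-style pattern quoted in this post (which, per the update at the end, Amazon later changed); `phrase_url` is a hypothetical helper:

```python
from urllib.parse import quote

# Path-style phrase URL as it appeared in this post (no longer current).
OLD_PHRASE_URL = "http://www.amazon.com/gp/phrase/{}"


def phrase_url(phrase: str) -> str:
    """Percent-encode the phrase (spaces become %20) and drop it into the path."""
    # safe="" also escapes "/", so a slash in a phrase can't add a path segment.
    return OLD_PHRASE_URL.format(quote(phrase, safe=""))


print(phrase_url("ineffable flame"))
# http://www.amazon.com/gp/phrase/ineffable%20flame
```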
Now this was interesting though...
http://www.amazon.com/gp/phrase/knock%20my%20house%20down:
You can string subsequent phrases together, just by hacking (changing) the URL.
http://www.amazon.com/gp/phrase/sarcasm%20on%20betelgeuse:
And in that way potentially read an entire book, right off Amazon.
If I'd had a little more time, and a little more Python knowledge, I'd at this point have written a little Python script as a proof of concept, written a weblog post just about that, gotten my 5 seconds of fame on Slashdot (alongside a server that had caught fire), and probably a lawsuit to follow. But alas, I didn't have the time, so writing a script to download free e-books from Amazon is left as an exercise for the reader.
Now of course, it's utterly boneheadedly stupid to do that for real. But it's also really, really nifty. It shows, in a somewhat over the top way, the potential in a new feature like this.
For the moment I've only added a custom keyword to my Mozilla to quickly search books for somewhat unique phrases. That's nice, but not all that spectacular. Yet I expect that the first Greasemonkey script to screen-scrape this information off Amazon and use it within some other site is already being written.
And maybe Amazon will never provide a public API for this feature. But maybe they will. And maybe someday soon, someone with more imagination than I have will take these SIPs and find a killer application for them, turning something nifty and shiny into something absolutely, stunningly brilliant.
One thing I know. When it happens, I'll learn about it pretty swiftly. Because that's what happens in this world I live in. Why not come join me here?
Update: With the introduction of CAPs (Capitalized Phrases), the URLs for SIPs have changed.
http://www.amazon.com/gp/phrase/ineffable%20flame becomes http://www.amazon.com/gp/phrase/?phrase=ineffable%20flame
I guess Amazon doesn't know that cool URIs don't change.
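At least the change from the old URL to the new one is mechanical enough to script. A sketch in Python, assuming the two URL shapes quoted in this update are the whole story; `migrate_phrase_url` is a hypothetical helper, and neither form is guaranteed to work on Amazon today:

```python
from urllib.parse import quote, unquote, urlencode, urlsplit

PHRASE_PATH = "/gp/phrase/"


def migrate_phrase_url(old_url: str) -> str:
    """Rewrite an old path-style SIP URL into the newer ?phrase= query form."""
    parts = urlsplit(old_url)
    if not parts.path.startswith(PHRASE_PATH):
        raise ValueError(f"not a phrase URL: {old_url}")
    phrase = unquote(parts.path[len(PHRASE_PATH):])  # decode %20 back to spaces
    # quote_via=quote keeps %20; urlencode's default (quote_plus) would emit "+".
    query = urlencode({"phrase": phrase}, quote_via=quote)
    return f"{parts.scheme}://{parts.netloc}{PHRASE_PATH}?{query}"


print(migrate_phrase_url("http://www.amazon.com/gp/phrase/ineffable%20flame"))
# http://www.amazon.com/gp/phrase/?phrase=ineffable%20flame
```

Decoding and re-encoding, rather than a plain string replace, means a phrase containing characters such as `&` or `=` still ends up safely escaped inside the query string.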