This here is the weblog of me, Sander van Lambalgen. I'm a sometimes Mozilla contributor, ectophile, all-around computer geek, avid science fiction reader, amateur photographer and professional web developer with a penchant for traveling.

Although you can expect me to write about all these interests, it's this last, the traveling part, that gives rise to most entries in this here weblog, as I write "tripreports" detailing the experiences of my travels around the world.

Tue 10 May 2005, 16:56 GMT

What things look like on my side of the new digital divide, or hacking Amazon for fun.

Most of you, my regular readers, don't know it, but there's a vast chasm between us. I'm bored with Flickr. You've never even heard of it. This time around though, there's an awful lot of computer geeks on your side. I discovered this when Arnoud came over to New Zealand recently. I offhandedly mentioned one of the current crop of nifty new web applications, and he didn't know it. Moreover, he didn't know, as you don't know, the technologies that made this web application so nifty, or how they'd evolved from an earlier incarnation or... Well, basically anything that your average technorati weblogger has been getting excited about for the last three years.
I don't intend to change that with this post. There's far too much history already to explain in text, and you're not interested anyway. I guess that's the main thing. You are not interested. You have exactly the same tools at your disposal as all those who're living in this whole new internet, and are as capable of diving into all this fun as everyone else. But it's not your cup of tea. Or so you think.

So, what this post is, is me showing just a single evening in my world (last Friday, to be exact), touching on just a few of these new developments. Perhaps it'll interest you enough to make you go out and explore for yourself, and if not, then at least it should help you gain some understanding of what's out there and what's happening.

Work for the week done, I started my daily reading/browsing cycle. I visit more random websites in a day now than I did back in the 90s, but the difference is that it's organized. No more randomly stumbling around in the dark, typing "books" into altavista and clicking on page 30 to see if there's a website out there which I should know about. Nowadays I harness the power of everyone else doing this for me.
There's a website out there called del.icio.us. Del.icio.us is a social bookmarks manager. The concept is: you stumble across an interesting website and you bookmark it. But instead of bookmarking it in your browser, you bookmark it to a website, and give it some quick key phrases (called "tags") to remember it by. Benefit for yourself: you can access your bookmarks from any computer in the world. Benefit for everybody else: they can see your bookmarks. Benefit for the world: synergy. Interested in landscape photography? See what everybody is bookmarking on those two subjects.
*insert pause here while I spend many minutes clicking through sites full of pretty landscape photographs*
Benefit for me: populicio.us. The most often bookmarked sites appearing in just the last 24 hours.
Backtracking a bit. One of the larger dilemmas in the weblog world is that there are a few hundred people who are read by everyone, and a few million people who are read by 'no one'. I fall in the latter category. Most everyone in the "Tech-talkers" list to the left falls in the former. When one of these few hundred "A-list" bloggers finds an interesting website / article / nifty bit of shiny-ness, and links to it, half of those millions of people will follow the link, agree that it's interesting, and then link to it as well. There are sites out there which track this. Daypop's Top 40 will give you a very good overview of what everyone has been linking to in the last week or so. Basically, if you have a weblog and want to be noticed, you have two options: you get noticed... and you get noticed and you get noticed and you get noticed and... Or you don't get noticed. There's hardly any middle ground. People are aware of the problem, and are actively trying to go out of their way to mitigate it, but it's still there.
Now back to populicio.us. Because of its very fast 24 hour turn around, there are links appearing on there which haven't and won't be linked to by everyone. It isn't directly fed by webloggers either (although of course they still feed it indirectly by linking to sites which are then bookmarked by people). I guess that makes it the best of both worlds. You get the 'mainstream' articles (mainstream for this side of the new digital divide), and the start of the long tail. (Cause like, y'know, I hadn't yet linked to enough long-since-cliché phrases.)
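For the programmers among you, the idea of "most often bookmarked in the last 24 hours" can be sketched in a few lines of Python. This is only my guess at the general shape of such a ranking, not how populicio.us actually works: the function name, the (user, url, timestamp) tuples and the choice to count each user once per link are all assumptions made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def most_bookmarked(bookmarks, now, window=timedelta(hours=24)):
    """Rank URLs by how many distinct users bookmarked them within `window`.

    `bookmarks` is an iterable of (user, url, timestamp) tuples. This data
    shape is hypothetical and not taken from any real service.
    """
    cutoff = now - window
    seen = set()        # (user, url) pairs already counted
    counts = Counter()  # url -> number of distinct users
    for user, url, when in bookmarks:
        if cutoff <= when <= now and (user, url) not in seen:
            seen.add((user, url))
            counts[url] += 1
    # Highest count first; the "bottom of the list" is the tail end of this.
    return counts.most_common()
```

The short window is what keeps the list fresh: a link drops out as soon as its bookmarks are more than a day old, no matter how popular it once was.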
Anyway, yes. I always pay particular attention to the links at the bottom of this list. Of enough interest that half a dozen people have bookmarked them, but not so special that everyone knows about it. (I read those articles as well of course - assuming the title doesn't tell me that I won't be interested - but that's just keeping up with the world.) And so last Friday - because that was after all the subject here - I came across a link to a Wired article: Judging a Book by Its Contents. How could I not be drawn in?
Basically the article briefly highlighted a couple of more or less interesting new features available at Amazon. The one that drew my attention was a list of SIPs, or Statistically Improbable Phrases, which now appears with most of the books of which Amazon has the text in electronic format. (The more adventurous of you might have seen this already by following the "Ineffable Flame" link that's currently living in the "interesting reads" box over to the right.)
Basically, if you can "search inside" a book, and if the author of the book uses unique enough phrases, there'll be a line of SIPs just above the cover:

Fun was had figuring out what the constraints were for this feature and comparing authors and such. Most science fiction and fantasy authors didn't seem to have more than a handful of SIPs per book, while for example my personal favorite author David Zindell hit the maximum 25 phrases for all his books.
But comparing authors like that gets boring after a bit, so I continued exploring. Next up was the following:
http://www.amazon.com/gp/phrase/all ... ozers%20and%20things:

All the SIPs I'd seen consisted of two words together, so obviously the next thing to figure out was if there were any one or three word SIPs. Randomly browsing books didn't give an answer. However, I'd noticed that Amazon just put the phrase in the url, so I went hunting for some quotes that contained a few words in a row that promised to be unlikely to appear in many books, while not being completely unique.
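Putting a phrase in the url just means percent-encoding it and tacking it onto the path. A minimal sketch in Python, assuming the url pattern is exactly what the examples in this post show; the function and constant names are mine, and this only builds a url, it doesn't fetch anything:

```python
from urllib.parse import quote

# Base path as it appears in the urls in this post.
PHRASE_BASE = "http://www.amazon.com/gp/phrase/"


def phrase_url(phrase):
    """Build a phrase url by percent-encoding the phrase (spaces become %20)."""
    return PHRASE_BASE + quote(phrase, safe="")


print(phrase_url("knock my house down"))
# prints http://www.amazon.com/gp/phrase/knock%20my%20house%20down
```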
Now this was interesting though...
http://www.amazon.com/gp/phrase/knock%20my%20house%20down:

You can string subsequent phrases together, just by hacking (changing) the url.
http://www.amazon.com/gp/phrase/sarcasm%20on%20betelgeuse:

And in that way potentially read an entire book, right off Amazon.
If I'd had a little more time, and a little more Python knowledge, I'd at this point have gone to write a little Python script as proof of concept, written a weblog post just about that, gotten my 5 seconds of fame at Slashdot (alongside a server that had caught fire), and probably a lawsuit to follow up on that. But alas, I didn't have this time, and so the creation of a script to download free e-books from Amazon will be left as an exercise to the reader.
Now of course, it's utterly boneheadedly stupid to do that for real. But it's also really, really nifty. It shows, in a somewhat over the top way, the potential in a new feature like this.
For the moment I've only added a custom keyword to my Mozilla to quickly search books for somewhat unique phrases. That's nice, but not all that spectacular. Yet I expect that the first Greasemonkey script to screenscrape this information off of Amazon and use it within some other site is already in the process of being written.
And maybe Amazon will never provide a public API for this feature. But maybe they will. And maybe someday soon, someone with more imagination than I have will take these SIPs and find some killer application use for them, turning something nifty and shiny into something absolutely stunningly brilliant.
One thing I know. When it happens, I'll learn about it pretty swiftly. Because that's what happens in this world I live in. Why not come join me here?


update: With the introduction of CAPs (Capitalized Phrases), the urls for SIPs have changed.
http://www.amazon.com/gp/phrase/ineffable%20flame becomes http://www.amazon.com/gp/phrase/?phrase=ineffable%20flame
I guess Amazon doesn't know that cool URIs don't change.
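If you have old-style links lying around, rewriting them is mechanical. A small sketch, going purely by the two urls above (I'm assuming nothing else about the format changed, and the function name is made up):

```python
OLD_PREFIX = "http://www.amazon.com/gp/phrase/"


def upgrade_phrase_url(url):
    """Rewrite .../gp/phrase/<phrase> to .../gp/phrase/?phrase=<phrase>.

    Urls that don't match the old pattern, or that already use the
    query-string form, are returned unchanged.
    """
    if not url.startswith(OLD_PREFIX):
        return url
    rest = url[len(OLD_PREFIX):]
    if not rest or rest.startswith("?"):
        return url
    return OLD_PREFIX + "?phrase=" + rest


print(upgrade_phrase_url("http://www.amazon.com/gp/phrase/ineffable%20flame"))
# prints http://www.amazon.com/gp/phrase/?phrase=ineffable%20flame
```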

Comments

rororo commented on Tue 10 May 2005, 18:10 GMT:
Hi,

Nice! Although I usually am not fast at adopting new technology, it's always good to know what's out there. I definitely want to try the presented bookmark facilities. Anyways, thanks for the info :)

Regards,

Ro.
Shadar commented on Sat 14 May 2005, 11:46 GMT:
I think the idea of the divide is valid, but somewhat exaggerated.

Certainly there are extremes at both ends - people jumping on all the latest technologies and ideas, and others remaining oblivious. But that's somewhat simplistic - there are plenty of people in the middle, not early adopters, but not the tail-end either. And people follow different things in different ways.

Looking on the first article you link to, I find myself a mixed case. I use Firefox (Mozilla, actually) and an RSS reader for getting news. I also use a Linux-based desktop running bleeding edge code, and follow a number of project weblogs and mailing lists to keep an eye on what new things are coming.

On the other hand, I don't have a weblog of my own, and have never heard of either Flickr or Doc Searls. I've never paid much attention to some of the fancy services Amazon and Google and others offer. And I'm sure there are lots of other up-and-coming ideas and tech that I've never heard of and perhaps should have.

So where does that leave me in such a divide?
Sander commented on Thu 19 May 2005, 16:18 GMT:
Shadar: that digital divide article definitely painted things very black and white. I don't know if this was intentional or if Seth really sees that as the truth. Also, personally I'm of the opinion that the "blogosphere" is busy sealing itself into a cocoon and furiously building an ivory tower of new ideas on top of previous ideas - without grounding them properly.
Which I think is a damned shame, as the actual tools and technologies are really cool. Yet because they're placed in the context of purely the weblog world, there's an awful lot of geeks out there not learning about them.

Not being able - or even wanting to - stop the rapid development out there (I see them a bit like the Royal Society in its early days as depicted in the Baroque Cycle), this was my little attempt to at least reach out to the circle of geeks around me and hopefully get some interest or awareness going.

There's all these things happening on the web today - but they don't stand alone. They become far more than they are because of all the other people building on them. When you point someone in the blogosphere to populicio.us, they understand not only that it's a useful service, but also that it uses del.icio.us bookmarks, and integrates with the entire hierarchy of cool things you can do with feeds, and has the hierarchy of cool things you can do with tags underlying it (so someone could extend the idea to create a populicio.us of just "photography" links, provided bookmarks tagged as such were popular enough), and...
People outside the blogosphere see populicio.us only as a useful service. Which is cool, because it is, but when I link to such things from my "worth a look" box, I'd hope that more people here will now be aware that it ties in with a lot more, even if they don't know exactly what else.


As for your personal position in Seth's "new digital divide": I'd say using Firefox, and perhaps also the feed reader (I don't know, how "mainstream" are those at the moment?) is 'merely' because you're a geek, and you're otherwise pretty far on the non-weblog side of the 'divide'.

Oh, and for the record: Doc Searls and Flickr. :)
Shadar commented on Mon 23 May 2005, 12:12 GMT:
Oh, no need to provide links to Doc Searls and Flickr - I may not have heard of them before, but I certainly know how to use Google... :)

As for being in the "blogosphere", what does it really mean? As I understand what you and Seth's blog describe, you're talking about a group that while large and constantly growing, is in some ways rather exclusive - a group in which there's a lot of technology progress that isn't really propagating outside that group. Is that a valid understanding?


If so, I don't think it's entirely a valid concern. Firstly, not everyone is interested in everything - sometimes one encounters something new and says "Great", other times it's "So what?". I've had a look at populicio.us and del.icio.us since reading your post, and I put them in the latter category. Interesting idea, but hardly life changing. Similarly, Flickr looks interesting, but I'm not really into photography, so I don't look in depth.

Secondly, there are people like myself who while not participating in that group, still follow it with interest. And if I'm not familiar with such things as Flickr or del.icio.us, it's not because I'm in some way excluded - it's simply that I've never come across them before in any way that would encourage me to look at them. The 'blogosphere' is a big group after all, and even as interconnected as it is, it's not a hivemind where everyone knows what everyone is doing.


Short version I guess, is that I don't consider blogging to be a criterion for a divide. You suggest that by Seth's post, I'm "pretty far on that non-weblog side of the 'divide'". Ok, so I've no interest in having a weblog, and I'm not familiar with a number of websites common to the blogging community. Is that grounds for dividing the world into "Digerati" and "The Left Behind"? No, I don't think so.
Sander commented on Mon 23 May 2005, 14:11 GMT:
*g* The links were mostly for the benefit of any other people who might be willing to click and see what the hell we're blathering on about, but wouldn't bother to go and google themselves. :)

Maybe something I should clarify, btw - the last paragraph of my previous comment was about my interpretation of Seth's point of view. Which is not my point of view. By the way he divides the world, you're one of "the left behind". And I agree with you that his way is not a good way to fundamentally divide the world by.
The way I see it, and it's been pointed out to me that I didn't make this clear in the opening paragraph of my original post either - I guess mostly because it's so fundamental to me - people like you and Arnoud basically are me, as far as geek-interests go. And so when I discovered that Arnoud didn't know these things, that started me thinking.

There is something happening in the weblog-part of the world, though. Something that those geeks who don't read weblogs should take note of, but (initially surprising to me), they haven't - probably because it's been happening entirely fueled by the same weblog world that they're not all that interested in. (And that I think gave rise to Seth's interpretation as well, even if he sees the weblog side as becoming a new and everlasting elite that will soon be near-on impossible to join, while I think it's at risk of evolving itself into a dead end.)
Del.icio.us and flickr are, by themselves, "so what?" apps. They're useful to some (not to me - though I'll happily take advantage of the output via populicio.us), but have already grown boring. But the way they integrate, the way they get developed and their open APIs taken advantage of and integrated into further building blocks isn't "so what?" at all. For the first time on the web, here are everyday users entering metadata en masse. In services that give access to this data and the aggregates of this data for other developers to use, and that almost rely on this very data to actually do anything.
That has some pretty far-reaching implications. I don't know what they are. I just know they're there in potential. (Google vastly improving its search results by giving weight to sites "tagged" with certain keywords? That'd be just a start...)

Actually, here's an absolutely excellent example - a self-contained one page wiki generated entirely by javascript (awesome concept in itself; use it as a personal notebook) that just added tags to create non-hierarchical navigation next to the regular way you would find information.
Now as a programmer I'd say "that's just metadata," and it is. But it's slightly more because it has a far more central role, one where it's directly exposed to the user rather than hiding in the properties somewhere. And that's completely due to the effect of del.icio.us and flickr.
And this self-contained wiki has the possibility to generate an rss feed. I haven't played with that yet, but I'm just imagining the possibilities. Aggregating your personal notebook with your bookmarks and your photographs and your email (gmail labels are a similar kind of "one word" exposed metadata idea). Having one central location from where you can see everything that has to do with "photography" that you ever did before? Finding the most popular "things" out there by everybody on d70+landscape+photography?
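To make that a little more concrete, here's a rough sketch of what "everything tagged photography, wherever it lives" could look like. Everything in it is hypothetical: the (source, title, tags) tuples stand in for whatever your notebook, bookmarks and photo service would really hand you, and none of it reflects an actual API.

```python
from collections import defaultdict


def build_tag_index(items):
    """Map each tag to the set of (source, title) pairs carrying it.

    `items` is an iterable of (source, title, tags) tuples, a made-up
    shape for illustration only.
    """
    index = defaultdict(set)
    for source, title, tags in items:
        for tag in tags:
            index[tag.lower()].add((source, title))
    return index


def find_by_tags(index, *tags):
    """Return the items that carry every one of the given tags."""
    sets = [index.get(tag.lower(), set()) for tag in tags]
    return set.intersection(*sets) if sets else set()
```

A query like d70+landscape+photography then becomes find_by_tags(index, "d70", "landscape", "photography"), and it doesn't matter whether a match started life as a note, a bookmark or a photo.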
Shadar commented on Tue 24 May 2005, 09:30 GMT:

quote:
Del.icio.us and flickr are, by themselves, "so what?" apps. They're useful to some (not to me - though I'll happily take advantage of the output via populicio.us), but have already grown boring. But the way they integrate, the way they get developed and their open APIs taken advantage of and integrated into further building blocks isn't "so what?" at all. For the first time on the web, here are everyday users entering metadata en masse. In services that give access to this data and the aggregates of this data for other developers to use, and that almost rely on this very data to actually do anything.

Yes, that I agree with. I've seen some of the uses people have come up with for Google Maps - saw a link today that had merged it with data on crime activity to show safe and unsafe neighbourhoods in a city, and was considering linking in data on house prices. Wouldn't be surprised to see someone hooking it up to satellite photography in some way.

Not all of that kind of thing is web based though, part of the bloggers' domain. There are a lot of interesting projects going on in the open desktop community (something I pay more attention to than web development) - have you seen Dashboard? A framework for gathering up information relevant to what you're doing - e.g. contacts (and IM status) for people, applicable bookmarked sites, recent documents on the subject, etc. One of the screenshots shows an IRC session with Dashboard showing the status of a referenced bugzilla entry, and the contact entry corresponding to a phone number. I don't think it's ready for general use yet, but the Beagle desktop search tool it uses is starting to find its way into use.


quote:
Actually, here's an absolutely excellent example - a self-contained one page wiki generated entirely by javascript (awesome concept in itself; use it as a personal notebook) that just added tags to create non-hierarchical navigation next to the regular way you would find information.

No offense, but that's one of the most utterly broken websites I've seen without using Flash. It's a wiki with no way of opening links in a separate tab or window, because there aren't actually any links, just javascript animations. It's a wiki that can't be bookmarked for reference, since there's no state to be bookmarked.

Much as I find some of the new rich web interfaces interesting (we're building one at work), this is the worst possible example of the type. An excellent demonstration of a good technology, put to a dreadfully bad use.
Sander commented on Tue 24 May 2005, 14:07 GMT:

quote:
It's a wiki that can't be bookmarked for reference, since there's no state to be bookmarked.
Actually any possible combination of states can be bookmarked - click on the "permaview" link in the top right.

I agree with you that it's broken for all the reasons that you say as a web interface. But for a port of the wiki idea to the offline world, I think most of that can just about be overlooked.

Dashboard indeed looks really cool - thanks for pointing me to that. I think I'd either utterly hate its distractingness or, depending on how much it could outperform my own system for knowing where exactly to grep for any information I might want to retrieve without causing system overhead to go way up (and thus also being able to retrieve required information from pretty much _anything_), love it to death.
The project however looks somewhat dead - at least as far as checkins or weblog updates go...
Shadar commented on Wed 25 May 2005, 10:58 GMT:
Re. offline wikis, I tend to use the program Tomboy for that kind of thing - I find it invaluable for keeping track of things I'm doing both at home and at work. E.g. notes on the current defect I'm fixing, or short-term design documents.

Not as portable as Taggly, granted, but it's not something where I find myself needing portability beyond simply emailing the contents of a note.

Re. Dashboard, I agree there doesn't seem to be too much activity on the site. The mailing list has a bit of activity, though mostly seems to deal with Beagle rather than Dashboard. Without looking too closely, I'd guess that since desktop search is a pretty important part of Dashboard, the developers are focussing efforts on getting Beagle stable and into widespread use.

As for the interface, I agree it could be an annoying distraction, and the screenshots as they stand have quite a lot of screenspace lost to the info panel. I'd probably prefer something more subtle that can easily be conjured up for details, than something taking up a permanent place.
