The danger of bit.ly

by Marty Alchin on July 14, 2008

I’m certainly not the first to mention news of bit.ly, a new URL shortening service, but I’d like to look at it from a different angle. While I applaud most of what bit.ly is trying to do here, especially in terms of data capture and availability, there’s a fundamental flaw in their design: if shortened URLs are permanent (which they clearly should be, if their data collection is to be of any long-term value), users MUST NOT be able to specify them explicitly.

Like many others, I was linked to the ReadWriteWeb article by John Gruber, in a post that also included a link to its stats page, where I found some alarming information.

First, keywords allow a number of duplicate URLs to float around out there, clogging up the available address space. As of this writing, there are 16 distinct keywords pointing to the RWW article, when there really only needs to be a single point of entry. At least bit.ly does the sensible thing and makes sure that each submission of the same URL uses the same hash, regardless of keywords, but having 17 links (the hash plus all the keywords) is a big loss in the long run.

More importantly, because these shortened URLs are permanent redirects, anyone who claims one by way of a “keyword” locks that URL out from any future use. A quick and easy case in point, taken from the stats Gruber pointed to: “bit.ly/rww” will forever point to that specific article, and can never be used to point to readwriteweb.com itself. Sure, it’s not a terribly long URL, but hopefully you can see how this opens the door to all sorts of malicious behavior.
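To make both problems concrete, here’s a minimal Python sketch of how a keyword-enabled shortener might behave. It’s a hypothetical model, not bit.ly’s actual code: the `KeywordShortener` class, the `encode` helper and the base-62 alphabet are all my own assumptions. Each URL gets exactly one generated code, but every keyword adds another permanent entry point, and the first person to claim a keyword owns it forever.

```python
import itertools
import string

# Assumed alphabet for generated codes (hypothetical; not bit.ly's real scheme).
ALPHABET = string.digits + string.ascii_letters  # 62 symbols


def encode(n: int) -> str:
    """Turn a positive sequence number into a short base-62 code."""
    digits = []
    while n:
        n, rem = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)) or ALPHABET[0]


class KeywordShortener:
    """Hypothetical shortener: one generated code per URL, plus optional keywords."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.routes = {}    # short path -> destination; entries are never removed
        self.code_for = {}  # destination -> its single generated code

    def shorten(self, url, keyword=None):
        # Keywords are first come, first served: once claimed, they can't be reassigned.
        if keyword is not None:
            existing = self.routes.get(keyword)
            if existing is not None and existing != url:
                raise ValueError(f"{keyword!r} already points to {existing}")

        # Every submission of the same URL reuses the same generated code.
        code = self.code_for.get(url)
        if code is None:
            code = encode(next(self._counter))
            while code in self.routes:  # skip codes someone already took as a keyword
                code = encode(next(self._counter))
            self.code_for[url] = code
            self.routes[code] = url

        # Each keyword adds one more permanent path to the same destination.
        if keyword is not None:
            self.routes[keyword] = url
        return code
```

Submitting the same (placeholder) article URL with two keywords yields one code but three separate paths, and a later attempt to point `rww` at the site’s home page simply fails:

```python
s = KeywordShortener()
article = "http://example.com/article"  # placeholder URL
s.shorten(article, "rww")
s.shorten(article, "news")
# s.routes now holds three paths for the article: the code, "rww" and "news".
# s.shorten("http://example.com/", "rww") raises ValueError.
```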

To illustrate that point a bit more concretely, I took a (very short) intuitive leap: bit.ly/microsoft. No, I didn’t create that link; I just knew that Microsoft is the perfect kind of target for a service like this, so it’s no surprise that it already redirects somewhere else. I’m not just wildly speculating here; people are already gaming the system.

Now, to be fair, all URL shortening services suffer from this flaw to some extent. When someone posts a short URL, you have to trust that they’re pointing you somewhere useful. You have to look at the context in which it was posted, and factor in how much you trust the person who posted it. But I’d argue that the visibly cryptic nature of a traditionally shortened URL is what triggers this reasonable suspicion in us as users. I just worry that by removing the random hash from the shortened URL, bit.ly is going to encourage people to become more trusting of these links, when there’s absolutely no more certainty about where they actually lead.

I enjoy being unique on the web, so I’ve taken the precaution of pointing bit.ly/gulopine at this site’s root, to make sure it stays intact. Maybe I’m a bit paranoid, since I probably haven’t made enough of a splash to be considered a target, and I’m not urging everybody to go out and “claim their keywords,” but if you have a unique identifier already, you may want to consider taking a few seconds to try to keep it that way.

To my audience at large, I’d recommend that you do use bit.ly, but that you don’t use keywords, unless they’re both:

This should be common sense to most of us, but there are obviously those out there who don’t have any such sense.

In addition, if you want to link to a page that you’re sure must already have a bit.ly link, don’t assume anything. Even if there is an existing shortened URL, the keyword you think will take you there has no guarantee of being accurate. Always go to bit.ly and verify it first. I wouldn’t recommend visiting the link itself, since you never know what may be lurking on the other side. Just hop onto bit.ly, plop in the URL and click the “Info” button if you’re interested in providing a more “friendly” URL than the auto-generated hash. Again, this should be common sense, but sense is less and less common every day.

Oh, and if anyone from betaworks is reading this, my personal recommendation is to remove keyword support immediately. I may be overreacting, but honestly, a redirect to Apple’s home page is a far more benign attack than I was expecting. I love the service in general, and I think keywords are a noble idea, but I also think they’re ultimately doomed to fail.

If I had to recommend an alternative, I’d suggest allowing arbitrary keywords as a prefix or suffix to the hash, so that they can be used for data collection and to make the URL a bit more readable, but aren’t used as part of the redirection at all. The site-provided hash should be the only thing used to determine the destination URL. I could still put “microsoft” at the end of the hash for Apple’s site, but the hash would remain visible, which should trigger the appropriate skepticism in users.
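Here’s a rough Python sketch of what that resolution step could look like. It only handles the suffix form, and it assumes a path format of my own invention: the generated code, optionally followed by a hyphen and a free-form keyword (so `x7Kq-microsoft`). The `resolve` function, the `SHORT_PATH` pattern and the keyword log are all hypothetical; the point is just that the keyword gets recorded for statistics but never influences where the redirect goes.

```python
import re
from typing import Optional

# Assumed path format: "<generated code>" or "<generated code>-<keyword>".
SHORT_PATH = re.compile(r"^(?P<code>[0-9A-Za-z]+)(?:-(?P<keyword>[\w-]+))?$")


def resolve(path: str, routes: dict, keyword_log: list) -> Optional[str]:
    """Find the destination for a short path using only its generated code."""
    match = SHORT_PATH.match(path)
    if match is None:
        return None  # not a well-formed short path

    code = match.group("code")
    keyword = match.group("keyword")
    if keyword:
        # The keyword is kept purely for data collection; it plays no part in routing.
        keyword_log.append((code, keyword))
    return routes.get(code)
```

With a single generated code on file, the decorated and bare forms reach the same page, while the keyword alone resolves to nothing:

```python
routes = {"x7Kq": "http://www.apple.com/"}  # hypothetical generated code
log = []
resolve("x7Kq-microsoft", routes, log)  # Apple's home page; "microsoft" is only logged
resolve("microsoft", routes, log)       # None: a keyword by itself leads nowhere
```

Because the code stays in the URL, anyone reading `x7Kq-microsoft` can still see that it’s an opaque, site-assigned link rather than an official Microsoft address.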