Ah, so that's how it works...
Wednesday, 26 October 2005
A few years ago, while reading Bruce Schneier's Crypto-Gram,
I learned about the service http://tinyurl.com. He used it to
shorten a few long links he included in the email. If you haven't seen
TinyURL before, it takes any really, really, really long link and
turns it into a short one. Go try it, you'll get the idea pretty quickly.
I always wondered exactly how the site worked. Perhaps because I was
reading a cryptography newsletter at the time I discovered it, I
assumed it used some clever hash function, with the hash and the URL
then stored in a database. It turns out I overestimated the cleverness
factor, and a very simple random number generator will do. In a presentation at Phreaknic
(I missed it this year, but I've been the last two years) on "extending
websites", acidus demonstrated an implementation of the same concept that he
calls nanoURL (PHP source code available here).
The basic algorithm goes like this... Pick five random characters from the
set of all letters A-Z and the digits 0-9. Done.
Just save the string of five random characters and the long URL to a
database. Then when someone comes to yoursite.com/w3s4e (or whatever
five-character string you gave them), just look it up in the database
and redirect the browser to the really long URL.
Duh. Don't know why I overthought it so much. And if you really want to get fancy, before you save the string and URL, check that the particular five-character string you generated isn't already in use. If it is, just generate another.
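Here is a rough sketch of that algorithm in Python, just to make it concrete. This is not acidus's nanoURL code (that one is PHP) and not whatever TinyURL really runs. The Shortener class, its method names, and the dictionary standing in for the database are all made up for illustration, and I'm using lowercase letters to match the w3s4e example above.

```python
import random
import string

# 26 letters plus 10 digits gives the 36-character set.
# Lowercase is an assumption here, chosen to match the "w3s4e" example.
ALPHABET = string.ascii_lowercase + string.digits
CODE_LENGTH = 5


class Shortener:
    """Toy shortener; a dict stands in for the database table."""

    def __init__(self, rng=None):
        self.urls = {}                      # short code -> long URL
        self.rng = rng or random.Random()

    def new_code(self):
        # Pick five random characters from the set. Done.
        return "".join(self.rng.choice(ALPHABET) for _ in range(CODE_LENGTH))

    def shorten(self, long_url):
        code = self.new_code()
        # The "fancy" part: if the code is already in use, generate another.
        while code in self.urls:
            code = self.new_code()
        self.urls[code] = long_url
        return code

    def resolve(self, code):
        # A real site would redirect the browser; here we just return the
        # long URL, or None when the code is unknown.
        return self.urls.get(code)
```

Calling shorten() hands back the five-character string, and resolve() is the lookup the web server would do before sending the redirect.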
All well and good, but then my over-analysis engine kicked in again. Maybe
it was upset at losing round one to simplicity.
A-Z and 0-9 gives you
a character set of 36 unique characters. A string of five from that
set gives you 36^5 or 60,466,176 unique combinations. 60 million is
not that much when you consider Google has about 8 billion pages
indexed. In his presentation, acidus mentions TinyURL is up to about
18 million URLs in its database. The TinyURL site says 11 million, so
the exact number isn't clear, but let's assume it's around 15 million.
That means about 25% of the available 60 million combinations are
already in use. So, for about 1 in 4 URLs submitted to TinyURL, the
first random string is already taken and they have to roll the dice a
second time; for 1 in 16 URLs they have to roll a third time, for
1 in 64 a fourth, and so on.
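The arithmetic is easy to check with a few lines of Python. The 15 million figure is the guess from above, not a real count, and the function name is just something I made up. The model assumes every roll is independent and uniform over the whole space.

```python
ALPHABET_SIZE = 36        # A-Z plus 0-9
CODE_LENGTH = 5
USED = 15_000_000         # assumed number of codes already taken

total = ALPHABET_SIZE ** CODE_LENGTH   # 60,466,176 possible codes
p = USED / total                       # chance one random code is taken


def prob_needs_at_least(rolls, p):
    """Chance that a submission needs at least `rolls` rolls,
    i.e. that the first (rolls - 1) random codes were all taken."""
    return p ** (rolls - 1)


# Average number of rolls per new URL (mean of a geometric distribution).
expected_rolls = 1 / (1 - p)

print(f"total codes: {total:,}")
print(f"fraction in use: {p:.3f}")
for rolls in (2, 3, 4):
    print(f"needs at least {rolls} rolls: {prob_needs_at_least(rolls, p):.4f}")
print(f"expected rolls per URL: {expected_rolls:.2f}")
```

With these numbers the average works out to roughly 1.33 rolls per new URL, so the retries are cheap for now. They only start to hurt as the space fills up.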
Of course, if they make the URLs a little less tiny and add a sixth
character, there would be 36^6 or 2,176,782,336 (about 2.2 billion)
combinations. The example on their website, though, goes to
http://tinyurl.com/6 and it works. I tried several two-, three- and
four-character combos and they all worked as well. Maybe the algorithm is a
little more clever than it first appeared.
Or, at least, they go in and add a character once the address space at
the current length gets crowded.
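To be clear, that last bit is pure guesswork on my part, but here is one way the "add a character when it gets crowded" idea could look. The GrowingShortener class and the 25% MAX_FILL threshold are invented for this sketch; I have no idea what TinyURL actually does.

```python
import random
import string

ALPHABET = string.ascii_lowercase + string.digits   # 36 characters
MAX_FILL = 0.25   # hypothetical: grow once 25% of the current length is used


class GrowingShortener:
    """Toy shortener that starts with short codes and lengthens them."""

    def __init__(self, start_length=1, rng=None):
        self.urls = {}             # short code -> long URL
        self.length = start_length
        self.used_at_length = 0    # codes handed out at the current length
        self.rng = rng or random.Random()

    def shorten(self, long_url):
        # Add a character once the current length is too crowded.
        while self.used_at_length >= MAX_FILL * len(ALPHABET) ** self.length:
            self.length += 1
            self.used_at_length = 0
        # Same roll-until-unused loop as the fixed-length version.
        while True:
            code = "".join(
                self.rng.choice(ALPHABET) for _ in range(self.length)
            )
            if code not in self.urls:
                break
        self.urls[code] = long_url
        self.used_at_length += 1
        return code
```

Starting at one character, the first 9 codes are a single character (25% of 36), and after that it moves on to two-character codes. That would at least explain why short codes like /6 exist alongside longer ones.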
Last Updated ( Wednesday, 26 October 2005 )