Blog Ho!
A swashbuckling adventure in open source, innovation, and photography
Thursday, 28 August 2008

Ah, so that's how it works...
Wednesday, 26 October 2005
A few years ago, while reading Bruce Schneier's Crypto-Gram, I learned about the service http://tinyurl.com.  He used it to shorten a few long links he included in the email.  If you haven't seen TinyURL before, it takes any really, really, really long link and turns it into a short one.  Go try it, you'll get the idea pretty quickly.

I always wondered exactly how the site worked.  Perhaps because I was reading a cryptography newsletter when I discovered it, I assumed it ran the URL through some clever hash function and then stored the hash and the URL in a database.  It turns out I overestimated the cleverness factor, and a very simple random number generator will do.  In a presentation at Phreaknic (I missed it this year, but I went the previous two years) on "extending websites", acidus demonstrated an implementation of the same concept that he calls nanoURL (PHP source code available here).

The basic algorithm goes like this...  Pick five random characters from the set including all letters A-Z and the numbers 0-9.  Done. 

Just save the string of five random characters and the long URL to a database.  Then when someone comes to yoursite.com/w3s4e (or whatever five character string you gave them), just look it up in the database and redirect the browser to the really long URL.

Duh.  Don't know why I overthought it so much.  And if you really want to get fancy, before you save the random string and the URL, check that the particular five character string you generated isn't already in use.  If it is, just generate another.
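
The steps above can be sketched in a few lines of Python.  This is only a minimal sketch, not acidus's nanoURL code or whatever TinyURL actually runs: a plain dict stands in for the database, the names random_code, shorten and resolve are made up for illustration, and lowercase letters are used only to match the w3s4e example.

```python
import secrets
import string

# 26 letters + 10 digits = 36 characters (lowercase is an assumption,
# chosen to match the w3s4e example)
ALPHABET = string.ascii_lowercase + string.digits
CODE_LENGTH = 5


def random_code(length=CODE_LENGTH):
    """Pick `length` random characters from the 36-character set."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def shorten(long_url, table):
    """Store long_url under a fresh random code and return the code.

    `table` is a dict standing in for the database. If the code is
    already in use, just generate another.
    """
    while True:
        code = random_code()
        if code not in table:
            table[code] = long_url
            return code


def resolve(code, table):
    """Look up the long URL to redirect to, or None if the code is unknown."""
    return table.get(code)
```

A real site would do the lookup in resolve when a request for yoursite.com/<code> arrives, then send an HTTP redirect to the stored URL.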

All well and good, but then my overanalysis engine kicked in again.  Maybe it was upset at losing round one to simplicity.

A-Z and 0-9 gives you a character set of 36 unique characters.  A string of five from that set gives you 36^5 or 60,466,176 unique combinations.  60 million is not that much when you consider Google has about 8 billion pages indexed.  In his presentation, acidus mentions tinyurl is up to about 18 million URLs in its database.  The TinyURL site says 11 million, so the exact number isn't clear, but let's assume it's around 15 million.  That means about 25% of the available 60 million combinations are already in use.  So, if every roll is equally likely to land on any combination, for about 1 in 4 URLs submitted to TinyURL they have to roll the dice a second time, for 1 in 16 URLs they have to roll a third time, for 1 in 64 a fourth, and so on.
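
To double-check that arithmetic, here is a quick Python sketch.  The 15 million figure is the rough guess from above, not a real measurement, and the names are just for illustration.

```python
ALPHABET_SIZE = 36
CODE_LENGTH = 5
URLS_IN_USE = 15_000_000  # rough guess from the post, not a measured value

# Number of distinct five-character strings: 36^5
total_codes = ALPHABET_SIZE ** CODE_LENGTH

# Fraction of the address space already taken, which is also the chance
# that a single random roll collides with an existing code
fill = URLS_IN_USE / total_codes


def chance_of_needing(rolls):
    """Chance that a submission needs at least `rolls` rolls.

    Needing a second roll means the first one collided (fill), needing a
    third means the first two collided (fill squared), and so on.
    """
    return fill ** (rolls - 1)


# Average number of rolls per submitted URL (geometric distribution)
expected_rolls = 1 / (1 - fill)

print(total_codes)
print(round(fill, 3))
print([round(chance_of_needing(n), 4) for n in (2, 3, 4)])
print(round(expected_rolls, 2))
```

So even with a quarter of the space used up, the average works out to only about one and a third rolls per URL, which is why the simple approach holds up for a while.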

Of course, if they make the URLs a little less tiny and add a sixth character, there would be 36^6 or 2,176,782,336 (about 2.2 billion) combinations.  The example on their website goes to http://tinyurl.com/6 and it works.  I tried several two, three and four character combos and they all worked as well.  Maybe the algorithm is a little more clever than it first appeared.

Or, at least they go in and add a character once the address space at the current length gets crowded.
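
Purely as a guess at what "add a character once it gets crowded" could look like, here is a small Python sketch.  The function name code_length_for and the 25% threshold are hypothetical, it lumps every stored URL together as if they all competed for the current length, and nothing here is based on how TinyURL really works.

```python
ALPHABET_SIZE = 36


def code_length_for(urls_stored, max_fill=0.25):
    """Return the shortest code length whose address space is not yet crowded.

    "Crowded" is an assumption here: the space of 36^length codes counts
    as crowded once urls_stored reaches max_fill of it.
    """
    length = 1
    while urls_stored >= max_fill * ALPHABET_SIZE ** length:
        length += 1
    return length


# With the post's guess of 15 million URLs, five characters just squeaks by;
# a million more and this sketch would switch to six.
print(code_length_for(15_000_000))
print(code_length_for(16_000_000))
```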


Last Updated ( Wednesday, 26 October 2005 )
 