Average length of a URL
Posted by Kelvin on 06 Nov 2009 at 06:48 pm | Tagged as: Lucene / Solr / Elasticsearch / Nutch, crawling, programming
Aug 16 update: I ran a more comprehensive analysis with a more complete dataset. Find out the new figures for the average length of a URL.
I've always been curious what the average length of a URL is, mostly for estimating how much memory it takes to store URLs in RAM.
Well, I took a dump of the DMOZ URLs, then sorted and uniq-ed the list.
I ended up with 4,074,300 unique URLs weighing in at 139,406,406 bytes, which works out to roughly 34 bytes (about 34 characters) per URL.
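For anyone who wants to repeat the arithmetic, here is a minimal Python sketch. It is not the script I used (that was just sort and uniq on the command line); the function names `average_url_length` and `estimate_ram_bytes` are made up for illustration, and it assumes one URL per line, encoded as UTF-8.

```python
def average_url_length(urls):
    """Deduplicate URLs and return (unique_count, total_bytes, average_bytes).

    Assumes `urls` is an iterable of strings, one URL each; blank entries
    and surrounding whitespace (including newlines) are ignored.
    """
    unique = {u.strip() for u in urls if u.strip()}
    total = sum(len(u.encode("utf-8")) for u in unique)
    count = len(unique)
    average = total / count if count else 0.0
    return count, total, average


def estimate_ram_bytes(n_urls, avg_len=34):
    """Rough payload size only: ignores per-string object overhead."""
    return n_urls * avg_len


# Re-check the figures quoted in the post.
unique_urls = 4074300
total_bytes = 139406406
print(round(total_bytes / unique_urls, 1))  # 34.2

# Hypothetical usage: raw bytes needed for one million URLs at 34 bytes each.
print(estimate_ram_bytes(1_000_000))  # 34000000
```

Keep in mind this only counts the raw bytes. Whatever language you store the URLs in will add its own per-string overhead on top, so treat the result as a lower bound.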