Useful links on HTTP Pipelining
Posted by Kelvin on 12 Jul 2005 at 03:34 am | Tagged as: Lucene / Solr / Elasticsearch / Nutch, programming
I'm implementing HTTP pipelining for Nutch. Some useful links:
http://www.oaklandsoftware.com/product_16compare.html
http://www.mozilla.org/projects/netlib/http/pipelining-faq.html
http://www.w3.org/Protocols/HTTP/Performance/Pipeline.html
http://www.jmarshall.com/easy/http/
Update
Some thoughts working on HTTP pipelining:
It's a pity Commons HttpClient doesn't have HTTP request pipelining implemented. I tried my hand at it, but gave up after 2 hours. In some ways, Commons HttpClient seems like a bit of overkill, with all its glorious complexity compared to Nutch's Http and HttpResponse classes. OTOH, there's proper cookie and authentication support, amongst a host of other features. Innovation.ch's HttpClient provides request pipelining and most of Commons HttpClient's features, but it isn't actively developed anymore, and I'm a bit hesitant about using it. Nonetheless, I have to say its code is very well documented!
I spent about 3 hours banging my head against the wall on a bug before I realized that mixing HTTP 1.0 and 1.1 servers is quite unhealthy when attempting request pipelining. I also had to remember that HTTP 1.0 doesn't support Keep-Alive (aka connection persistence) by default. Furthermore, sending the header Connection: Keep-Alive can cause HTTP 1.0 proxies to fail unpredictably. I decided to steer well clear of the mess and simply _not_ pipeline or reuse HTTP 1.0 sockets.
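To make that last decision concrete, here's a minimal sketch of the check in Java. It is not the code I ended up with in Nutch, and it doesn't use any Commons HttpClient API; PipelinePolicy, isHttp11 and mayPipeline are hypothetical names made up for this example. It assumes you already have the response status line and the value of the Connection response header (if any) as strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Nutch or Commons HttpClient. */
final class PipelinePolicy {

    // Matches the version at the start of a status line, e.g. "HTTP/1.1 200 OK".
    private static final Pattern VERSION =
            Pattern.compile("^HTTP/(\\d)\\.(\\d)(?:\\s|$)");

    /** True only when the status line announces HTTP 1.1 or a later 1.x version. */
    static boolean isHttp11(String statusLine) {
        if (statusLine == null) {
            return false; // no response line at all: be conservative
        }
        Matcher m = VERSION.matcher(statusLine);
        if (!m.find()) {
            return false; // malformed status line: treat like HTTP 1.0
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        return major == 1 && minor >= 1;
    }

    /**
     * Decide whether the socket may be reused or pipelined.
     * HTTP 1.0 responses always answer "no", even if the server sent
     * "Connection: Keep-Alive"; HTTP 1.1 responses answer "no" only
     * when the server asked to close the connection.
     */
    static boolean mayPipeline(String statusLine, String connectionHeader) {
        if (!isHttp11(statusLine)) {
            return false;
        }
        if (connectionHeader == null) {
            return true; // HTTP 1.1 connections are persistent by default
        }
        // The Connection header is a comma-separated list of tokens.
        for (String token : connectionHeader.split(",")) {
            if (token.trim().equalsIgnoreCase("close")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(mayPipeline("HTTP/1.1 200 OK", null));         // true
        System.out.println(mayPipeline("HTTP/1.0 200 OK", "Keep-Alive")); // false
        System.out.println(mayPipeline("HTTP/1.1 200 OK", "close"));      // false
    }
}
```

The idea is to make the decision per response rather than per host, since the same host can answer as 1.0 or 1.1 depending on which proxy sits in between. Anything that isn't clearly HTTP 1.1 gets a fresh socket for the next request.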