Supermind Search Consulting Blog 
Solr - Elasticsearch - Big Data

CSS3 Selectors in Java

Posted by Kelvin on 21 Jan 2010 | Tagged as: programming

http://github.com/chrsan/css-selectors/tree has a cool implementation of full CSS3 Selector support. Yes, this is the same CSS selector support as you get in jQuery. Eat your heart out. It comes with an org.w3c.dom implementation out of the box. I augmented it with a dom4j implementation (so I could mix in TagSoup for real-world HTML). It's slow as a dog compared […]

Dom4j + XPath + TagSoup – Namespaces = sweet!

Posted by Kelvin on 20 Jan 2010 | Tagged as: programming

TagSoup does this annoying thing of adding namespaces to the HTML it cleans. This annoyance becomes a major hindrance when formulating XPath queries for TagSoup-cleaned HTML. Instead of using //body/a/@href, we have to write //html:body/html:a/@href. I spent a couple of hours trying to figure out how to disable namespace prefixes in TagSoup. This does not work: […]
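To see why the prefix is needed, here is a minimal sketch that uses only the JDK's own DOM and XPath classes rather than dom4j or TagSoup. The CLEANED string is a hand-written stand-in for TagSoup output, and the class and method names are made up for illustration. It only demonstrates the problem; it is not the fix the post goes on to describe.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespacedXPathDemo {
    // The XHTML namespace, which is what namespaced HTML elements live in.
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical stand-in for cleaned HTML: every element is in XHTML_NS.
    static final String CLEANED =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
        + "<a href=\"http://example.com/\">link</a></body></html>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // keep namespaces, as a cleaned document would
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates an XPath expression against CLEANED and returns its string value.
    static String evaluate(String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the "html" prefix to the XHTML namespace for use in expressions.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "html".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) { return null; }
            public Iterator<String> getPrefixes(String uri) { return null; }
        });
        return xpath.evaluate(expr, parse(CLEANED));
    }

    public static void main(String[] args) throws Exception {
        // Unprefixed names only match elements in no namespace, so nothing is found.
        System.out.println("[" + evaluate("//body/a/@href") + "]");
        // Prefixed names match the namespaced elements.
        System.out.println("[" + evaluate("//html:body/html:a/@href") + "]");
    }
}
```

In XPath 1.0 an unprefixed element name always means "no namespace", which is why the short query comes back empty once the elements carry a namespace.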

Using expressions to assign PHP static variables

Posted by Kelvin on 14 Jan 2010 | Tagged as: programming, PHP

OK. The PHP manual explicitly states you CANNOT use an expression when assigning to a static variable. You can, however, do this:

    class MyClass {
        public static $a = 1;
        public static $b;
        public static function init() {
            self::$b = self::$a + 1;
        }
    }
    MyClass::init();

Nifty, eh?

Handling single query multiple ResultSets in MySQL and JDBC

Posted by Kelvin on 14 Jan 2010 | Tagged as: programming

I've used JDBC with MySQL forever but, funnily enough, never tried issuing multiple statements in a single query, which results in multiple ResultSets. If you ever get the SQLException "ResultSet is from UPDATE. No Data.", then read on, my friend. Here's the lowdown: 1. Add ?allowMultiQueries=true to your JDBC URL, like so: jdbc:mysql://localhost/mydatabase?allowMultiQueries=true Note: if […]
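The excerpt is cut off here, so the following is not the post's own code. It is a rough sketch of the standard JDBC pattern for walking through every result of a multi-statement query: call execute(), then alternate between getResultSet() and getUpdateCount() until getMoreResults() and getUpdateCount() together signal the end. The class name, method name and log format are made up for illustration.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class MultiResultSets {
    // Runs sql (possibly several statements) and describes each result in order.
    static List<String> runAll(Statement stmt, String sql) throws SQLException {
        List<String> log = new ArrayList<>();
        // execute() returns true if the first result is a ResultSet,
        // false if it is an update count or there are no results.
        boolean isResultSet = stmt.execute(sql);
        while (true) {
            if (isResultSet) {
                try (ResultSet rs = stmt.getResultSet()) {
                    int rows = 0;
                    while (rs.next()) {
                        rows++;
                    }
                    log.add("rows=" + rows);
                }
            } else {
                int count = stmt.getUpdateCount();
                if (count == -1) {
                    break; // neither a ResultSet nor an update count: we are done
                }
                log.add("updated=" + count);
            }
            // Advance to the next result; true means it is a ResultSet.
            isResultSet = stmt.getMoreResults();
        }
        return log;
    }
}
```

Checking what kind of result is current before touching it should avoid asking for a ResultSet when the current result is actually an update count, which is the situation the exception message describes.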

Random Drupal Tidbits [Dec 29th]

Posted by Kelvin on 29 Dec 2009 | Tagged as: Drupal Kamikaze

– using the Content Permissions module, you can create "hidden" fields, or admin-only fields. – you can make the body field go away for a custom content type by going to Content Type -> Edit -> Submission form settings, and clearing out the body textfield label. You can also change the label for the title […]

Pro Drupal Development = godsend

Posted by Kelvin on 29 Dec 2009 | Tagged as: Drupal Kamikaze

Having installed a bunch of modules and read a bit of the API, I tried to find out more about developing with Drupal, but hit a bit of a roadblock. Until I stumbled on... Pro Drupal Development by John VanDyk & Matt Westgate. You really want the 2nd edition coz it covers Drupal 6. The book was nothing […]

Drupal Basics

Posted by Kelvin on 29 Dec 2009 | Tagged as: Drupal Kamikaze

Good at coding, but new to Drupal? Follow this series of posts documenting my journey into the world of Drupal. Why Drupal? I compared Drupal with Joomla. Consensus on the net is that Joomla is more slick and polished, but Drupal is way more flexible and performant. Lots of big visible sites running on Drupal. […]

Drupal! Drupal!

Posted by Kelvin on 29 Dec 2009 | Tagged as: Drupal Kamikaze

Embarking on a journey to learn Drupal. Will post more about this in the coming days/weeks/months.

Solr 1.4.. but no fastvectorhighlighter

Posted by Kelvin on 10 Dec 2009 | Tagged as: programming

Solr 1.4 has been released. OK, it's old news. Exactly one month old, actually. However… the release doesn't include Lucene's FastVectorHighlighter. I ended up writing my own simple plumbing code to fill in the gap for now. My own informal testing showed a 40-60% decrease in highlighting times for largish (1MB+ in size) documents. Definitely impressive.

java.net.URL synchronization bottleneck

Posted by Kelvin on 08 Dec 2009 | Tagged as: programming, crawling

This is interesting because I haven't found anything on Google about it. There's a static Hashtable in java.net.URL (urlStreamHandlers) which gets accessed on every constructor call. Well, it turns out that when you're running a crawler with, say, 50 threads, that becomes a major bottleneck. Of the 70 threads I had running, 48 were blocked […]
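One possible way around this kind of contention, which is an assumption on my part and not necessarily what the rest of the post describes, is to do link parsing and resolution with java.net.URI, which is a pure parser and never looks up a stream handler. A minimal sketch, with a made-up class name and example URLs:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class LinkParser {
    // Resolves a possibly relative link against the page it was found on,
    // without ever constructing a java.net.URL.
    static URI resolve(String base, String link) throws URISyntaxException {
        return new URI(base).resolve(new URI(link)).normalize();
    }

    public static void main(String[] args) throws URISyntaxException {
        URI resolved = resolve("http://example.com/a/b.html", "c.html?x=1");
        System.out.println(resolved); // prints http://example.com/a/c.html?x=1
    }
}
```

The trade-off is that URI is stricter than URL about what it accepts (a link containing a raw space throws URISyntaxException, for example), so a crawler would need to clean up or skip malformed links before parsing them.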
