Kelvin Tan - Solr/Elasticsearch Consultant - Friendlier Thunderbird date columns

Posts about programming

Friendlier Thunderbird date columns

Posted by Kelvin on 24 Jan 2010 | Tagged as: programming

In Thunderbird, there's a way to customize the display of the date header in your inbox.

Short answer: Go to http://kb.mozillazine.org/Date_display_format and follow the instructions.

Long answer:

1. Edit > Preferences > Advanced > Config Editor
2. Type 'dateformat' in the search box and hit enter.
3. If the following 3 entries do not appear, you'll have to create them.

mail.ui.display.dateformat.today
mail.ui.display.dateformat.thisweek
mail.ui.display.dateformat.default

4. Right click in the empty space of the results pane. A menu will appear.
New > Integer
Preference name: mail.ui.display.dateformat.today
Preference value: 0
New > Integer
Preference name: mail.ui.display.dateformat.thisweek
Preference value: 4
New > Integer
Preference name: mail.ui.display.dateformat.default
Preference value: 2

5. If you already have these config entries, just change the values to match what I have above.

What this does is:

1. For messages received today, displays only the time (e.g. 10:15am).
2. For messages received this week, displays the day and time (e.g. Friday 10:15am).
3. For all other messages, uses the short date form (e.g. mm/dd/yyyy).

No Comments »

dom4j.org – WTF?

Posted by Kelvin on 22 Jan 2010 | Tagged as: programming

dom4j is one of the better XML parsing Java libraries out there.

It's released under the uber-liberal BSD license, and is the brainchild of James Strachan, also of Jelly and Groovy fame.

Yesterday I was working on some dom4j stuff, and noticed that www.dom4j.org (I'm not going to link to it) is no longer a mirror of the original http://dom4j.sourceforge.net.

Instead, it's been taken over by some SEO assholes in Belgium (www.yxymedia.com) who have made a visual clone of the dom4j look-and-feel, but have changed it to be about "making your own website". The headers now read "DOM4J – Making Your Own Site".

WTF??!!

There's no two ways about it. This is unethical, misleading, and embarrassing.

Guys @ yxymedia, if you read this, please stop.

No Comments »

CSS3 Selectors in Java

Posted by Kelvin on 21 Jan 2010 | Tagged as: programming

http://github.com/chrsan/css-selectors/tree has a cool implementation of full CSS3 Selector support.

Yes, this is the same CSS selector support as you get in jQuery. Eat your heart out.

It comes with an org.w3c.dom implementation out of the box.

I augmented it with a dom4j implementation (so I could mix in TagSoup for real-world HTML).

It's slow as a dog compared with native XPath or regex, but it's still cool nonetheless.

Update: OK, I was wrong about performance. My initial dom4j implementation was slow BECAUSE of xpath actually. When I changed it to use dom4j node traversal methods, performance increased by over 50x. I'm happy with performance now.

2 Comments »

Dom4j + XPath + TagSoup – Namespaces = sweet!

Posted by Kelvin on 20 Jan 2010 | Tagged as: programming

TagSoup does this annoying thing of adding namespaces to the html it cleans.

This annoyance becomes a major hindrance when formulating XPath queries for tagsoup-cleaned html.

Instead of using

//body/a/@href

we have to do

//html:body/html:a/@href
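To make the difference concrete, here is a minimal sketch of the same problem using only the JDK's own DOM and XPath classes rather than dom4j and TagSoup. The class name, the sample markup, and the "html" prefix binding are assumptions made for illustration; the namespace URI is the XHTML one that TagSoup-cleaned documents normally carry.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: not part of dom4j or TagSoup.
public class NamespacedXPathDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Parses the markup with namespace awareness switched on.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates an XPath expression as a string, with "html" bound to the XHTML namespace.
    public static String eval(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "html".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) { return null; }
            public Iterator<String> getPrefixes(String uri) { return null; }
        });
        return xpath.evaluate(expr, doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<body><a href=\"/about\">About</a></body></html>");
        // Unprefixed names only match elements in no namespace, so this finds nothing.
        System.out.println("[" + eval(doc, "//body/a/@href") + "]");
        // The prefixed form matches.
        System.out.println("[" + eval(doc, "//html:body/html:a/@href") + "]");
        // local-name() ignores the namespace, at the cost of a clumsier query.
        System.out.println("[" + eval(doc, "//*[local-name()='a']/@href") + "]");
    }
}
```

The local-name() form is another way to dodge prefixes without touching the document, though the queries get verbose quickly.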

I spent a couple hours trying to figure out how to disable namespace prefixes in TagSoup.

This does not work:

parser.setFeature(org.ccil.cowan.tagsoup.Parser.namespacesFeature, false);

This doesn't work either:

parser.setFeature(org.ccil.cowan.tagsoup.Parser.namespacePrefixesFeature, false);

Finally stumbled on a crude bruteforce solution at http://www.mail-archive.com/dom4j-user%40lists.sourceforge.net/msg02511.html

import java.util.List;

import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.Namespace;
import org.dom4j.Node;
import org.dom4j.QName;

public class NamespaceUtil {

    // Assumption: the original snippet used this flag without declaring it.
    private static boolean removeNamespaces = true;

    /**
     * Removes namespaces from everything below the root element if
     * removeNamespaces is true. The root element keeps its own namespace.
     */
    public static void fixNamespaces(Document doc) {
        Element root = doc.getRootElement();
        if (removeNamespaces && root.getNamespace() != Namespace.NO_NAMESPACE) {
            removeNamespaces(root.content());
        }
    }

    /**
     * Puts the original namespace back on everything below the root element
     * if removeNamespaces is true.
     */
    public static void unfixNamespaces(Document doc, Namespace original) {
        Element root = doc.getRootElement();
        if (removeNamespaces && original != null) {
            setNamespaces(root.content(), original);
        }
    }

    /**
     * Sets the namespace of the element to the given namespace.
     */
    public static void setNamespace(Element elem, Namespace ns) {
        elem.setQName(QName.get(elem.getName(), ns, elem.getQualifiedName()));
    }

    /**
     * Recursively removes the namespace of the element and all its children:
     * sets them to Namespace.NO_NAMESPACE.
     */
    public static void removeNamespaces(Element elem) {
        setNamespaces(elem, Namespace.NO_NAMESPACE);
    }

    /**
     * Recursively removes the namespace of every node in the list and all
     * their children: sets them to Namespace.NO_NAMESPACE.
     */
    public static void removeNamespaces(List<?> l) {
        setNamespaces(l, Namespace.NO_NAMESPACE);
    }

    /**
     * Recursively sets the namespace of the element and all its children.
     */
    public static void setNamespaces(Element elem, Namespace ns) {
        setNamespace(elem, ns);
        setNamespaces(elem.content(), ns);
    }

    /**
     * Recursively sets the namespace of every attribute and element in the
     * list, and of all their children.
     */
    public static void setNamespaces(List<?> l, Namespace ns) {
        for (Object o : l) {
            Node n = (Node) o;
            if (n.getNodeType() == Node.ATTRIBUTE_NODE) {
                ((Attribute) n).setNamespace(ns);
            }
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                setNamespaces((Element) n, ns);
            }
        }
    }
}

Grrrrrrr….. but at least we can say goodbye to prefixes in xpath queries.

7 Comments »

Using expressions to assign PHP static variables

Posted by Kelvin on 14 Jan 2010 | Tagged as: programming, PHP

OK. The PHP manual explicitly states you CANNOT use an expression when initializing a static variable in its declaration.

You can, however, declare it without a value and assign the expression from a static init method:


class MyClass {
  public static $a = 1;
  public static $b;

  public static function init() {
    self::$b = self::$a + 1;
  }
}
MyClass::init();

Nifty eh?

No Comments »

Handling single query multiple ResultSets in MySQL and JDBC

Posted by Kelvin on 14 Jan 2010 | Tagged as: programming

I've used JDBC with MySQL forever, but funnily enough, I had never tried issuing multiple statements in a single query, which results in multiple ResultSets.

If you ever get the SQLException "ResultSet is from UPDATE. No Data.", then read on, my friend.

Here's the lowdown:

1. Add ?allowMultiQueries=true to your JDBC URL, like so


jdbc:mysql://localhost/mydatabase?allowMultiQueries=true

Note: if you skip this step, the MySQL JDBC driver doesn't tell you that you need it. It just complains with the usual syntax blah:


You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version...

2. Make your usual JDBC connection plumbing, then create a Statement object and call execute().


Statement stmt = conn.createStatement();
stmt.execute(sql);

3. Now the fun part. Since you're issuing multiple statements, some may return resultsets, and others may not.

The 3 methods that are going to help us navigate the multiple ResultSets are getUpdateCount(), getMoreResults() and getResultSet(). Here's one loop that binds them all.


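    while (true) {
      // The current result is an update count: skip it and move on.
      if (stmt.getUpdateCount() > -1) {
        stmt.getMoreResults();
        continue;
      }
      // No update count and no ResultSet means there are no more results.
      ResultSet rs = stmt.getResultSet();
      if (rs == null) break;
      while (rs.next()) {
        // do something
      }
      // Advance to the next result, otherwise the loop never terminates.
      stmt.getMoreResults();
    }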
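For comparison, here is a hedged sketch of the same walk written around the boolean that execute() and getMoreResults() return (true means the current result is a ResultSet). The class name, the method name, and the "rows=N" / "updated=N" summary strings are made up for illustration; only the java.sql calls are real.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: summarises every result produced by a multi-statement query.
public class MultiResultReader {

    // Returns one entry per result: "rows=N" for a ResultSet, "updated=N" for an update count.
    public static List<String> runAll(Statement stmt, String sql) throws SQLException {
        List<String> summary = new ArrayList<>();
        boolean isResultSet = stmt.execute(sql);
        while (true) {
            if (isResultSet) {
                int rows = 0;
                try (ResultSet rs = stmt.getResultSet()) {
                    while (rs.next()) {
                        rows++; // do something with the row here
                    }
                }
                summary.add("rows=" + rows);
            } else {
                int count = stmt.getUpdateCount();
                if (count == -1) {
                    break; // neither a ResultSet nor an update count: we are done
                }
                summary.add("updated=" + count);
            }
            isResultSet = stmt.getMoreResults();
        }
        return summary;
    }
}
```

With a connection opened using allowMultiQueries=true, a call would look something like MultiResultReader.runAll(conn.createStatement(), sql).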


				
1 Comment »

Solr 1.4.. but no fastvectorhighlighter

Posted by Kelvin on 10 Dec 2009 | Tagged as: programming

Solr 1.4 has been released. OK, it's old news. Exactly one month old, actually.

However… the release doesn't include Lucene's FastVectorHighlighter.

I ended up writing my own simple plumbing code to fill in the gap for now.

My own informal testing showed a 40-60% decrease in highlighting times for largish (1MB+ in size) documents. Definitely impressive.
				
2 Comments »

java.net.URL synchronization bottleneck

Posted by Kelvin on 08 Dec 2009 | Tagged as: programming, crawling

This is interesting because I haven't found anything on Google about it.

There's a static Hashtable in java.net.URL (urlStreamHandlers) which is accessed on every constructor call. Well, it turns out that when you're running a crawler with, say, 50 threads, that becomes a major bottleneck.

Of the 70 threads I had running, 48 were blocked on the java.net.URL constructor. I was using the URL class for resolving relative URLs to absolute ones.

Since I had previously written a URL parser to parse out the parts of a URL, I went ahead and implemented my own URL resolution function.

Went from

Status: 12.407448 pages/s, 207.06316 kb/s, 2136.143 bytes/page

to

Status: 43.9947 pages/s, 557.29156 kb/s, 1621.4071 bytes/page

after increasing the number of threads to 100 (which would not have made much difference with the java.net.URL implementation).

Cool stuff.
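My own resolver isn't shown here. As a rough illustration of resolving relative links without the URL constructor, this sketch swaps in the JDK's java.net.URI, which only parses syntax and does no stream handler lookup. The class name and example URLs are made up, and I have not benchmarked this against the numbers above.

```java
import java.net.URI;

// Hypothetical helper: resolves relative links without constructing java.net.URL objects.
public class RelativeUrlResolver {

    // Resolves 'relative' against 'base' and removes "." and ".." segments.
    // URI.create throws IllegalArgumentException for strings that are not valid URIs
    // (e.g. unescaped spaces), so real-world crawled links may need cleaning first.
    public static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).normalize().toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/a/b/c.html";
        System.out.println(resolve(base, "../d.html"));         // http://example.com/a/d.html
        System.out.println(resolve(base, "/x.html"));           // http://example.com/x.html
        System.out.println(resolve(base, "e.html"));            // http://example.com/a/b/e.html
        System.out.println(resolve(base, "http://other.org/")); // http://other.org/
    }
}
```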
				
No Comments »

TokyoCabinet Linux Install Script

Posted by Kelvin on 06 Dec 2009 | Tagged as: programming, Ubuntu

Updated on Mar 22 2011 for latest versions


export JAVA_HOME=/usr/lib/jvm/current #changeme!
export MYJAVAHOME=$JAVA_HOME

wget http://1978th.net/tokyocabinet/tokyocabinet-1.4.47.tar.gz
tar -zxvf tokyocabinet-1.4.47.tar.gz
cd tokyocabinet-1.4.47
./configure --enable-off64 --prefix=/usr
make && sudo make install
cd ..

wget http://1978th.net/tokyocabinet/javapkg/tokyocabinet-java-1.24.tar.gz
tar -zxvf tokyocabinet-java-1.24.tar.gz
cd tokyocabinet-java-1.24
./configure --prefix=/usr
make && sudo make install
cd ..


You may need bzip2-devel + zlib (RH/Fedora) or libbz2-dev + zlib1g-dev (Debian/Ubuntu) installed before running configure.

Don't worry about the second bit (the tokyocabinet-java steps) if you don't need the Java bindings.
				
6 Comments »

TokyoCabinet Installation snafu on Fedora

Posted by Kelvin on 06 Dec 2009 | Tagged as: programming

Just installed TokyoCabinet on Fedora. Installation went like a breeze. Except... when running the Java app that uses TC, it complained about an UnsatisfiedLinkError:

libtokyocabinet.so.9: cannot open shared object file: No such file or directory – /usr/lib

Thanks to http://jibbajabba.info/, who in turn credits http://www.machinelake.com/2009/03/22/nerding-out-with-ruby-tokyo-cabinet-hpricot-twitter-sinatra-haml-passenger/

The answer is simple:

ldconfig /usr/lib

or

ldconfig /usr/local/lib

depending on where you installed TC.

No Comments »
			        
			        		                 
		                

Supermind Search Consulting Blog Solr - Elasticsearch - Big Data
