Playing with JDBM (I)

I must admit I like JDBM for desktop applications. It looks extremely strong. And it is very compact. It's runtime weights only 85Kb so it looks perfect for Java Web Start downloads. Since JDBM is based on a B-Tree, it allows only for one key for queries, but this is probably good enough for me. The fact is that JDBM allows for different B-Trees in the same database. It also allows for other indexing techniques, such as hashing, so you can use a JDBM database as a huge hashtable for storing stuff.

In this entry I explore the basics of using JDBM, so that I can reference them in further entries. (In the future I'll aggregate all JDBM blog entries into my Links Room).

Where to get it

JDBM is hosted at JDBM's sourceforge homepage. The latest release seems to be 1.0 (released on August 11, 2005). As far as I can tell little bugs has been reported against it.

The JDBM API seems to be well documented. I'd add some more examples into the javadoc, but well, I must admit I'm a javadoc fanatic ;-)!!

Opening a database

JDBM databases are stored in a single file on disk. JDBM automagically appends the extension ".db" to files, so if you database is called "mydb" then a "mydb.db" file is created. JDBM uses a logging mechanism for transactions, so temporary files with an ".lg" extension may also appear in your filesystem.

JDBM databases are represented as RecordManagers in Java. RecordManagers are created using a RecordManagerFactory. The basic code looks like this:

import jdbm.RecordManager;
import jdbm.RecordManagerFactory;

...

RecordManager mydb = RecordManagerFactory.createRecordManager("mydb", new Properties() );

Note that "mydb" is the name of the database. This could also be an asbolute path to a file, such as "/var/tmp/mydb".

The second argument to RecordManagerFactory is a Properties file that allows you to indicate different arguments to create the RecordManager. There's little documentation on the properties object so I haven't yet experimented with those. More on this (probably) on a future entry.

Creating a B-Tree

Creating a B-Tree indexing mechanism with JDBM is, well, a piece of cake.

B-Trees store information by sorting keys (and keeping keys sorted in a tree structure). So, well, of course, in order to cre ate a B-Tree you need to specify how to sort keys.

The natural way to do this in Java is by using a java.util.Comparator, of course. So if you are going to use a String as the key for your data you'll need a StringComparator. JDBM kindly provides a jdbm.helper.StringComparator for you (there're no StringComparators in the Java API). In fact JDBM does also provide a jdbm.helper.IntegerComparator and a jdbm.helper.LongComparator (and others) just in case you want to use Integers or Longs as the keys in your B-Trees.

JDBM databases may hold different B-Trees within the same database (that's great!). So you probably want to give a name to your B-Tree to differenciate it from others. JDBM stores the B-Trees in specific records (indexed by longs), but you can associate a name to a long (so that you don't have to remember the long yourself).

I think this is getting too confusing, so let me explain it with some code to clarify. This little function:

/**
 * Obtains a BTree used to index objects, or creates it if it does not exist.
 * @param aRecordManager the database.
 * @param aName the name of the BTree.
 * @param aComparator the Comparator object used to sort the elements of the BTree.
 * @return the BTree with that name.
 * @throws IOException if an I/O error happens.
 */
public static BTree loadOrCreateBTree( RecordManager aRecordManager,
  String aName, Comparator aComparator )
  throws IOException
{
  // So you can't remember the recordID of the B-Tree? Well, let's
  // try to remember it from its name...
  long recordID = aRecordManager.getNamedObject( aName );
  BTree tree = null;

  if ( recordID == 0 )
  {
    // Well, the B-Tree has not been previously stored,
    // so let's create one
    tree = BTree.createInstance( aRecordManager, aComparator );
    // store it with the given name
    aRecordManager.setNamedObject( aName, tree.getRecid() );
    // And commit changes
    aRecordManager.commit();
  }
  else
  {
    // Yes, we already created this B-Tree in a previous run,
    // so let's retrieve it from the record manager
    tree = BTree.load( aRecordManager, recordID );
  }
  return tree;
}

I hope this makes things clear enough. Let's see now how to...

Insert things into a JDBM B-Tree

The fact is that you can store any serializable Java object into a B-Tree. No SQL required. No object-relational mapping. No Spring. No Hibernate. No complex XML files. Just simple serialization.

Inserting things into a B-Tree is as simple as:

RecordManager mydb = 
  RecordManagerFactory.createRecordManager( "mydb", new Properties() );
BTree indexByName = 
  loadOrCreateBTree( mydb, "index-by-name", new StringComparator() );

  // Insert "http://www.antonioshome.net" indexed by "Antonio"
  // "true" means to replace if the record already exists
  indexByName.insert( "Antonio", "http://www.antonioshome.net", true );

  // Insert another serializable object indexed by another string
  indexByName.insert( "AnInteger", new Integer(3), true );

That's it. Of course you'll have to handle IOExceptions (instead of SQLExceptions), but that's all.

Well, almost. From time to time you should remember to commit changes to the database, by issuing a commit() operation:

mydb.commit();

You should not cal "commit()" frequently (at least not on each iteration in a loop) because that's probably a performance issue. Just issue "commit()" from time to time. Experiment with it to fit your needs.

Deleting things

Deleting things is also very easy. Issuing a "indexByName.delete( aKey )" is good enough. The JDBM API is crystal clear on this.

Searching keys on B-Trees

Browsing B-Trees is also a piece of cake. JDBM development team has done a great job on this, because they've thought in little details to make your life easier.

You browse B-Trees by using a single Tuple. A Tuple is formed by a a key and a value. Note that you use only a single Tuple, and not lots of them. A single instance is good enough to search all entries. This is very intelligent because if you store several zillion keys in your B-Tree you wouldn't want to instantiate several zillion Tuples, right?

So the very first thing you need to iterate over a B-Tree recordset is a Tuple:

Tuple tuple = new Tuple();

And now you problably want to search the B-Tree. JDBM uses a TupleBrowser (I'd prefer TupleIterator, but well, nothing is perfect! ;-)) to browse entries:

// Search all keys bigger than "Antonio"
TupleBrowser browser = indexByName.browse("Antonio");
while( browser.getNext( tuple ) )
{
  System.out.println("KEY:     " + tuple.getKey() );
  System.out.println("VALUE:   " + tuple.getValue() );
}

You don't need to discard nor close TupleBrowsers, because these usually require just a "long" to keep a pointer to the entry.

Note that you shouldn't change the underlying B-Tree (by inserting or deleting stuff) while iterating on a TupleBrowser, because that will probably make the TupleBrowser useless. This also happens with different objects in the java.util.* Collections API.

So that's it!

So that's it at the moment. I'd appreciate corrections on the examples above, just to make sure everything is correct.

In the next entries I'll try to explore different lightweight storage mechanisms (that do not require all the SQL hassle) and think on how to organize all this in a GUI application.

Please provide feedback on this and have a good weekend (I need some rest!).

Cheers, Antonio

blog comments powered by Disqus