No RDMBS, No SQL: too heavy, too classic, too much a hassle

I'm afraid I'll insist in desktop persistence without RDBMS in this entry (and maybe in a few others). I've found some nice B-Tree based storage mechanisms, that seem to be widely used out there.

Some people have suggested using Derby. There're lots of embeddable, pure Java, RDBMS out there. Derby, PointBase, HSQL to name a few.

But I don't want to embed a RDBMS just to store a list of URLs, a list of categories and their relationships. I think using a RDBMS just for that is too expensive. And, of course, I like doing some research in my spare time and learning about different ways to do things. After all that's the fun part of it.

Some other people have suggested using prevayler. But the problem with prevayler is that it keeps everything in memory and, well, I think that's not very efficient.

Finally some other people have suggested using XML serialization. That's probably a good idea but you cannot query things. That's a problem for me.

What I really need is persistence and query capabilities.

Chet Haase asking it too

It seems Chet Haase has also blogged about lightweight persistence mechanisms. Reading Chet's blog comments I've seen people use XML serialization mechanisms, the JDK Preferences API and some other cool stuff.

But all those mechanisms do not provide efficient querying capabilities. Storing stuff in XML is easy to do (you can serialize Java Beans to XML easily). But the problem is querying that information. And that's something I need to do too. I need an efficient way to store and query information. Let's see what people is using out there. Let's not try to reivent the wheel.

B-Trees

As I said in my previous entry a simple file with records can be efficiently indexed using B-Trees.

As you probably know B-Trees are a data structure that allow for efficient storage and search using a key. If you want to index the main data file with two fields (say name and surname) then you'll need two indexes (two B-Tree structures). B-Trees provide logarithmic costs for inserting, removing and querying data. That basically means that if you have N records you need log(N) disk accesses to retrieve information. That allows you to store lots of information with little penalty.

Using B-Trees is probably the most efficient mechanism to achieve persistence and being able to query data. Let's see who is using B-Trees when.

  • B-Trees are used in NetBeans for storing lots of things. I think they use B-Trees for storing all the code-completions for all classes. That's a lot of information that must be queried very quickly so that there's little delay between the user writing things and the system replying with a list of code-completions.
  • JISP Java Indexed Serialization Package is a little database that uses B-Trees for indexing serialized objects. It's something to investigate too. It's very lightweight and gives good flexibility.
  • .QDBM is an implementation of the Unix DBM, that may use B-Trees too. It has an Java API (I'll have to investigate it too). It seems QDBM has been used by Eric S. Raymond to build a bayesian spam filter.
  • JDBM is a pure Java implementation of the Unix DBM, that uses B-Trees and H-Trees (wow, I didn't know those ones. H-Trees stand for scalable persistent hashtable, wow). What I like about JDBM is that it is extremely small. Binaries, Javadoc and examples fit in a 244Kb. zip file. Another thing I like about JDBM is that is being used by lots of people (including blojsom, ActiveMQ and OpenJMS) so it's probably rock solid.
  • Sesame's OpenRDF version 2.0 (alpha) seems to be able to save stuff in a filesystem too, so no RDBMS is needed. OpenRDF uses also B-Trees, so it may be work a look.

I'll keep on investigating

Some other stuff I want to investigate are multivalued databases. These store information in non-first formal form (I like that! It's efficient for hierarchical data), and are the basis of the pick operating system. Since these are good for storing hierarchical data it seems they fit well for XML documents (that'd be nice in combination with XStream or the JavaBeans XML serialization format).

Maverick is an open source multivalue database, but it seems too heavy for embedding in desktop applications.

So that's it at the moment. I'll blog whenever I find anything new for storing and querying my RSS Feed Reader model. Meanwhile I'll appreciate suggestions and ideas on how to best do it.

Thanks in advance, Antonio

blog comments powered by Disqus