With the new plugin system it's now easy to compare different types of plugins together when running tests. Here I tested the behviour when using an R-Tree as a state storage backend, compared to the History Tree I've been working on so far.

As I mentionned previously, I've added support for using in-memory R-Trees as a backend to the State History, using the Java Spatial Index library. Since this implementation is limited to RAM, it was obvious we wouldn't be able to handle very large traces and state systems, but it should at least give us an idea of the relative performance of the History Tree against well-tested methods.

I compared the performance of the R-Tree to both a complete and a partial History Tree. Both History Trees used 32K blocks and a maximum of 50 children/node. The partial history saved checkpoints every 50K events. Detailed results are:

R-Tree vs. History Tree
Construction
Full queries
Single queries

Just a reminder, a "full query" means restoring the complete saved state to what it was a given time, for all attributes. A "single query" is querying the state of only one attribute at a time, so we stop searching once we find this attribute/timestamp pair. It was relatively simple to implement both search methods with the R-Tree provider.

Also, avid readers may notice that the Partial history provider gained support for single queries. However it is implemented by falling-back on full queries + choosing the right attribute, which explains its poor performance.

 

The results are a bit surprising. I was expecting the R-Tree to fare better than the HT, for the sizes at which it would fit in RAM. We could then at least say the HT isn't "that bad", and has the advantage of going up to much larger sizes.

Turns out the R-Tree is *very* slow to fill up. This is probably due to the constant re-balancing that happen with the high number of insertions that we require (profiling reveals the two worst offenders are RTree.pickSeeds() and RTree.pickNext(), followed by Rectangle.area()). The History Tree does not support re-balancing, by design, so we avoid this costly step. Once built however, the R-Tree is *very* fast to query : ~300 us for full queries and around 10 us for singles.

The R-Tree tests stop at the 100 MB trace, since the next one (500 MB) would give OutOfMemory errors with the default JVM settings. By increasing the available VM memory to a total of 6 GB (which is what I have on my workstation), it was able to keep going, but after about 2 hours it wasn't even done building the tree.

 

I don't think the chosen implementation or R-Trees in general are bad, but since they don't map well to 1-dimensional areas they are probably not the best solution for our state storage problem (they usually offer better performance than other methods at 2+ dimensions). It is however a bit sad that the long building time makes it unusable in practice here, even for small traces, because the query performance is VERY interesting!