Previously we've compared the performance of the State History with a simple, in-memory, R-Tree implementation (for our use case). Now it would be interesting to compare it with something at the other end of the spectrum : a complete, heavy, tested-and-tried database server.
As we've done with the JSI R-Tree, we could the time intervals as 1-dimensional spatial information. A query would consist of passing a point (or a 1-dimension "rectangle", for range queries) and asking "which lines cross this point".
At first I looked at SQLite, since I figured it would be nice not having to configure a complete database server (for me, and for other people who might eventually use this backend, who knows). SQLite does have an R*Tree module which seems capable enough.
Unfortunately, what SQLite was lacking was a recent and active Java connector. Some JDBC plugins exist, but most of them were out of date, and none of them seemed to support the RTree module. And using a JNI wrapper was out of the question, since it would fuzz the performance results too much.
I have then been suggested to look at PostGIS, which is a PostgreSQL addon to "spatially enable" the database. Not only does it look quite capable, but the JDBC connector is actually a part of the main project, and is up-to-date and compatible with Java 6. It's even already packaged in Debian/Ubuntu under "libpostgis-java"! The small downside is having to install and configure an external server, but it is mainly for testing purposes at the moment. Even if it ends up staying in the main tree, all State History inputs and backends are now plugins so people can simply decide to not use it if they don't want it.
Most of this stuff is quite new to me, so I've been reading on it (Postgres, JDBC, SQL in general...) It will be important to carefully define how the tables are built and which statement/queries we use. Thanks to the documentation (which I have to admit is quite good, for once!) and the help of more knowledgeable people, it seems the best way to go will be to store all the interval information in the database itself (so time range, attribute integer and state value), on the same row.
Then we'd have the colums in the table organized as:
- A colum for the "geometry" (in my case, a LineString which represents a time range)
- An integer colum for the attribute
- A colum for the type of state values (similar to the "type" byte in the History Tree nodes)
- An integer column
- A String column
The two last columns are obviously for the state values, and we only use the one we need (or none in the case of a null value). It would also be easily extensible if we want to support more state value types.
Once we are done inserting the entries, we will build an index on the geometry column, since all queries are by timestamps (altough since we are in a database, we could eventually do more advanced stuff, like queries per PID, etc.)
When we are done building, we could run queries using a Point and asking which entries cross it. I'll have to check however if a 0-dimension Point can actually intersect something, or if we'll have to pass a perpendicular line.
The implementation is underway, and should be complete Quite Soon(TM). So far I have to say I really like the API and the way it works (simply pass the database name, username, password, and poof, it goes and does its thing). If it ends up working well I would definitely keep it as a feature, since it would allow for more specific type of queries as mentionned before, although it would probably come at a performance cost.