Hacker News
by tom_b 4203 days ago
I think if you are willing to trade the flexibility of normal relational back-end DBs for a storage of your own making, you can certainly optimize for a specific workload. But is that your case? Joe Celko made a comment in one of his SQL books that implied all data hackers wake up one day and think "I know, I'll just put everything into one big K/V table . . . "

It's interesting that you focus hard on "no SQL. So no SQL parsing. No query plan preparation, . . . ". Have you found SQL parsing and query plan prep to be actual bottlenecks in PostgreSQL? That would surprise me.

Personally, I have found that occasionally doing simple things like memory mapping multi-gigabyte pre-sorted data and using simple binary search from within Java (well, Clojure to be truthful) can be quite performant. But that case involved static data and well-known search patterns. I have also flirted with column-stores for analytical workloads and they are pretty awesome for that.
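For anyone curious what that looks like, here is a minimal sketch in Python rather than Clojure. The record layout (an 8-byte key followed by an 8-byte value, big-endian, sorted by key) and the names `RECORD`, `write_sorted`, `lookup`, and `demo` are assumptions made for illustration, not the format I actually used.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width record: 8-byte key, 8-byte value, big-endian.
RECORD = struct.Struct(">QQ")


def write_sorted(path, pairs):
    """Write (key, value) pairs to disk in ascending key order."""
    with open(path, "wb") as f:
        for key, value in sorted(pairs):
            f.write(RECORD.pack(key, value))


def lookup(buf, key):
    """Binary search over a buffer of sorted fixed-width records."""
    lo, hi = 0, len(buf) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        k, v = RECORD.unpack_from(buf, mid * RECORD.size)
        if k == key:
            return v
        if k < key:
            lo = mid + 1
        else:
            hi = mid
    return None  # key not present


def demo():
    """Build a small file, map it read-only, and look up two keys."""
    path = os.path.join(tempfile.mkdtemp(), "sorted.bin")
    write_sorted(path, [(k, k * 10) for k in range(0, 1000, 2)])
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            return lookup(buf, 42), lookup(buf, 43)
```

The operating system pages the file in on demand, so each lookup touches only about log2(n) records. This only works because the data is static and the access pattern is a plain key lookup.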

There is such an explosion of options available these days in both relational and the NoSQL family. As much as I like hacking, I do try hard to ask myself "why can't one of these solutions do this hard work for me?"

I suspect a 10x improvement will elude you (no such thing as a free lunch), but I geek out hard reading what other people do in these cases, so if you have a business case (or personal side-interest) to tackle your own data store, have at it. Post us what you find sometime. Good luck.


Binary search and the like will be handled by a KV store such as LMDB. The idea of generating bytecode dynamically actually comes from Clojure itself.

I might not have to do all this if my tables and indexes were fixed. They are not. Users keep adding new Models and columns, and what needs indexing keeps changing too. (All this is explained in the link I shared.)
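To make the problem concrete, here is a rough sketch of adding an index at runtime on top of an ordered KV store. `OrderedKV` is a hypothetical in-memory stand-in, not the LMDB API, and the key layout (`row`/`idx` prefixes joined by NUL bytes) is an assumption for illustration only. It also assumes model names, column names, and values never contain a NUL byte.

```python
import bisect


class OrderedKV:
    """In-memory stand-in for an ordered KV store. NOT the LMDB API."""

    def __init__(self):
        self._keys = []   # keys kept in sorted order
        self._vals = {}

    def put(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key):
        return self._vals.get(key)

    def scan_prefix(self, prefix):
        """Yield (key, value) for every key starting with prefix, in order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._vals[self._keys[i]]
            i += 1


def row_key(model, row_id):
    return f"row\x00{model}\x00{row_id}"


def index_key(model, column, value, row_id):
    return f"idx\x00{model}\x00{column}\x00{value}\x00{row_id}"


def add_index(kv, model, column):
    """Backfill an index for a column after rows already exist."""
    # Materialize the scan first, because we insert while iterating.
    for key, row in list(kv.scan_prefix(f"row\x00{model}\x00")):
        row_id = key.rsplit("\x00", 1)[1]
        if column in row:
            kv.put(index_key(model, column, row[column], row_id), row_id)


def find(kv, model, column, value):
    """Equality lookup through the index: one prefix scan plus row fetches."""
    prefix = f"idx\x00{model}\x00{column}\x00{value}\x00"
    return [kv.get(row_key(model, rid)) for _, rid in kv.scan_prefix(prefix)]
```

A real version would also have to update index entries on every write and delete, which is where most of the tedium comes from.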

You can already get a 10x improvement just by switching from SQL to LDAP. SQL parsing is expensive and inefficient, even with pre-compiled queries.

Writing indexers is pretty tedious; I personally would use something that has already done this for me. Like OpenLDAP.

So a directory server can be a replacement for SQL? Does anyone use it for that purpose? I always thought the two served completely different requirements. I will look into the possibility. Thanks for the suggestion.