by bdg 4456 days ago
Someone should create a library where all your data is stored in a BTree. Think of it! Huge data sets, small data sets, everything scales in this system. And even better, they can create a language-agnostic API to a small server that just stores the btrees for them and gives them back on request. We could call it a database, but that would require we use indexes correctly for once in our lives.
3 comments

If you do want a B-tree library in C that's not tied to SQL, I've had good success with lmdb[0]. As a bonus, it's also transactional and writers do not block readers (and vice versa). Crazy fast and easier to use from C than SQLite IMO.

If you want B-trees that compete with LSM trees, stratified B-trees[1] are pretty sweet. There's an in-kernel GPL implementation called Castle[2] on GitHub.

[0] http://symas.com/mdb/

[1] http://www.slideshare.net/acunu/20110620-stratifiedbtree

[2] https://github.com/acunu/castle

And then we can stack every conceivable use case on top of it, and write that language-agnostic API in a way that makes interfacing with most current languages so onerous that people can build hundreds of thousands of lines of code to make it slightly less painful!

Maybe we can make the configuration so convoluted that we can base entire companies and revenue models around maintaining them.

>write that language-agnostic API in a way that makes interfacing with most current languages so onerous that people can build hundreds of thousands of lines of code to make it slightly less painful

Those would be people who couldn't be bothered to learn a really simple declarative language, and prefer ad-hoc OO ORMs and other shit like that...

Oh, SQL itself is fairly easy. For many things, I'm a fan.

However, SQL query construction and result munging is painful.

Consider a UI search screen that has eight potential search parameters and requires a one-to-many join for the results. The query construction, under something like JDBC, can end up with hundreds of lines of tedious code, like (and this is a condensed example!):

    String query = "SELECT a.*, b.* FROM table1 a INNER JOIN table2 b ON a.fk_id = b.id WHERE " // shorthand
    List whereClauses = []
    if (params.region) { whereClauses.add(getRegionWhereClause(params.region)) } // hope this doesn't require another join!
    if (params.country) { whereClauses.add(getCountryWhereClause(params.country)) }
    ...
    if (params.lastParam) { whereClauses.add(getLastParamWhereClause(params.lastParam)) }
    query += whereClauses.join(" AND ")
    ...
Each of the params methods is going to have a few lines of code.

    def getRegionWhereClause(regionCodeList) { "a.region IN" } //Hope you never want to change the table alias here!
http://use-the-index-luke.com/sql/where-clause/obfuscation/s... shows why using an ISNULL/NVL hackaround for static queries is the wrong answer.
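
For comparison, here is a rough sketch of the same dynamic construction with bind placeholders, so that a clause is emitted only for the parameters actually supplied. It uses Python's standard sqlite3 module rather than JDBC, and the column names (id, region, country, name) and the FILTER_COLUMNS mapping are assumptions made up for illustration; only table1, table2 and the fk_id join come from the example above.

```python
import sqlite3

BASE_QUERY = (
    "SELECT a.id, a.region, a.country, b.name "
    "FROM table1 a INNER JOIN table2 b ON a.fk_id = b.id"
)

# Hypothetical mapping from search parameter name to the column it filters.
FILTER_COLUMNS = {"region": "a.region", "country": "a.country"}


def build_query(params):
    """Return (sql, bind_values), emitting a clause only for supplied params."""
    clauses, values = [], []
    for name, column in FILTER_COLUMNS.items():
        wanted = params.get(name)
        if not wanted:
            continue  # parameter absent or empty: no clause at all
        placeholders = ", ".join("?" for _ in wanted)
        clauses.append(f"{column} IN ({placeholders})")
        values.extend(wanted)
    sql = BASE_QUERY
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, values


def demo():
    """Run a built query against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE table1 (
            id INTEGER PRIMARY KEY, region TEXT, country TEXT, fk_id INTEGER
        );
        INSERT INTO table2 VALUES (1, 'x'), (2, 'y');
        INSERT INTO table1 VALUES
            (1, 'EU', 'DE', 1), (2, 'EU', 'FR', 2), (3, 'NA', 'US', 1);
        """
    )
    sql, values = build_query({"region": ["EU"], "country": ["DE", "FR"]})
    rows = conn.execute(sql + " ORDER BY a.id", values).fetchall()
    conn.close()
    return sql, rows
```

The table alias still lives in one place (the mapping), which softens the "hope you never want to change the table alias" problem but does not remove the per-parameter tedium.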

Then you're going to return a list of rows that looks like:

    | a.1 | a.2 | b.1  | b.2  |
    | a.1 | a.2 | b.1' | b.2' |

Where you really want:

    | a.1 | a.2 | [[b.1 | b.2] | [b.1' | b.2']]

So you have to go through and munge it. (Can you use something like CONCAT with GROUP BY as a hack? Sure, but that introduces other problems.)
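
The munging step might look roughly like this in application code. This is only a sketch: nest_rows and its parent_width argument are hypothetical names, and it assumes the leading columns identify the parent row uniquely (in practice you would key on the parent's primary key).

```python
def nest_rows(rows, parent_width):
    """Fold flat join rows into (parent_columns, [child_columns, ...]) pairs."""
    grouped = {}  # dicts keep insertion order, so parents stay in result order
    for row in rows:
        parent = tuple(row[:parent_width])
        child = tuple(row[parent_width:])
        grouped.setdefault(parent, []).append(child)
    return list(grouped.items())


# The two flat rows from the example above: same parent, two children.
flat = [
    ("a.1", "a.2", "b.1", "b.2"),
    ("a.1", "a.2", "b.1'", "b.2'"),
]
nested = nest_rows(flat, parent_width=2)
```

It is short here, but every one-to-many query needs its own variant of it, which is the tedium being described.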

Having a way to pass optional parameters to a WHERE clause for the DB to strip out (while staying performant), and being able to return a one-to-many as an array, would solve many problems, but it doesn't fit the paradigm of SQL.
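
For reference, the nearest thing available today is the static "IS NULL OR" form, which is the pattern the link above is cited against. A minimal sketch using Python's sqlite3 module follows; the schema, the search function and the parameter names are assumptions for illustration. It gives correct results, but the database generally has to plan one query that covers every combination of parameters, which is the performance complaint.

```python
import sqlite3

# One fixed statement; a parameter bound to NULL disables its own filter.
STATIC_QUERY = """
SELECT a.id FROM table1 a
WHERE (:region IS NULL OR a.region = :region)
  AND (:country IS NULL OR a.country = :country)
ORDER BY a.id
"""


def search(conn, region=None, country=None):
    """Return matching ids; parameters left as None are effectively ignored."""
    cur = conn.execute(STATIC_QUERY, {"region": region, "country": country})
    return [row[0] for row in cur.fetchall()]


def make_demo_db():
    """Build a small in-memory table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE table1 (id INTEGER PRIMARY KEY, region TEXT, country TEXT);
        INSERT INTO table1 VALUES
            (1, 'EU', 'DE'), (2, 'EU', 'FR'), (3, 'NA', 'US');
        """
    )
    return conn
```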

SQL is not a simple language and expecting everyone to learn it properly is crazy. Simple queries are easy, sure, but as with all things you quickly outgrow simple things and then you're stuck. Especially if you want good performance.

SQL is arguably at least as easy as any other language you can define around this idea. Usually the complications come from trying to normalize the crap out of your data and expecting a single relational model to work for all use cases.

Oddly, moving this to another domain would not make things any easier in that regard. If you tried to have a single data structure that stores all of your data for all of your use cases, expect pain.

NTFS uses B-trees, I believe, and it matches your requirements.

Mayhap I'm the one missing the joke. That being that this is just describing databases, period.