


	by gwern 2383 days ago
	Isn't one of the benefits of time-series databases more compact storage at minimal overhead?

3 comments

kootenpv 2383 days ago

Definitely! I really optimized for "no development time spent" and was just using pandas to extract HTML tables into CSV files and store them :-). 2 lines of code really. I had no idea I would have it running for so long.

It was really just the example that made me wonder why I have to consider which compression would be best for files with my characteristics - but not saying it was best practice to begin with haha!

proverbialbunny 2383 days ago

Does anyone know what the leading time-series database is today, e.g. for bar / tick data?

(I use PostgreSQL, due to my ignorance.)

kbaker 2383 days ago

If you can model the domain, then you can achieve very good compression.

See this recent timescaledb post (since you mentioned Postgres) which goes over an array of techniques used in column-store databases that general-purpose compressors on a csv file full of data would not be able to match:

https://blog.timescale.com/blog/building-columnar-compressio... discussion: https://news.ycombinator.com/item?id=21412596
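
To illustrate the point above about modeling the domain, here is a minimal sketch in Python. It assumes evenly spaced timestamps (one per minute) and uses delta-of-delta encoding, a technique often applied to timestamp columns; whether this matches the exact methods in the linked post is an assumption. The function names (`delta_of_delta`, `undo_delta_of_delta`, `compare_sizes`) are made up for this example, and zlib stands in for "a general-purpose compressor".

```python
import struct
import zlib


def delta_of_delta(values):
    """Encode ints as [first value, first delta, then changes between deltas]."""
    if not values:
        return []
    out = [values[0]]
    prev, prev_delta = values[0], 0
    for v in values[1:]:
        delta = v - prev
        out.append(delta - prev_delta)
        prev, prev_delta = v, delta
    return out


def undo_delta_of_delta(encoded):
    """Invert delta_of_delta and recover the original ints."""
    if not encoded:
        return []
    values = [encoded[0]]
    delta = 0
    for change in encoded[1:]:
        delta += change
        values.append(values[-1] + delta)
    return values


def compare_sizes(timestamps):
    """Return (bytes for zlib on CSV text, bytes for zlib on the encoded column)."""
    csv_bytes = "\n".join(map(str, timestamps)).encode("ascii")
    encoded = delta_of_delta(timestamps)
    packed = struct.pack(f"<{len(encoded)}q", *encoded)  # signed 64-bit ints
    return len(zlib.compress(csv_bytes, 9)), len(zlib.compress(packed, 9))


if __name__ == "__main__":
    # Hypothetical data: 10,000 one-minute timestamps (Unix seconds).
    ts = [1_572_566_400 + 60 * i for i in range(10_000)]
    csv_size, modeled_size = compare_sizes(ts)
    print(f"zlib on CSV text: {csv_size} bytes")
    print(f"zlib on delta-of-delta column: {modeled_size} bytes")
```

With perfectly regular timestamps, almost every encoded value is zero, so the compressor has very little left to do. Real data is noisier, so the gap would likely be smaller than in this idealized case.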

JamesBarney 2382 days ago

I've been happy with InfluxDB, but I've also noticed that a lot of people switch to a time-series database way before they need to. You can go pretty far with Postgres/Oracle/SQL Server, and that has the advantage that you don't need to manage different databases.

proverbialbunny 2382 days ago

Thanks for the response. I've avoided InfluxDB because it drops data. Bar and tick data needs to be ACID or at least very close to ACID.

InfluxDB is great for server usage stats though.

JamesBarney 2381 days ago

I don't know much about bar or tick data. Mind going into more detail about dropping data?

We didn't really have a need for transactions because we were just recording a bunch of sensor data from oil and gas wells around the world, then running computations on it and displaying and alerting on the results.

proverbialbunny 2381 days ago

Bar and tick data is financial, so it's dealing with numbers. It has to be precise, as in not even 0.0000001 off. Dropped data isn't the end of the world, but it's pretty bad.
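
The precision concern can be shown with a small Python sketch. It assumes prices arrive as decimal strings and compares binary floats with the standard library's `decimal.Decimal`; the `notional` helper and the sample values are hypothetical, chosen only for illustration.

```python
from decimal import Decimal

# Ten hypothetical price increments of 0.1 each.
float_total = sum([0.1] * 10)                          # binary floating point
decimal_total = sum(Decimal("0.1") for _ in range(10))  # exact decimal arithmetic


def notional(price: str, quantity: int) -> Decimal:
    """Price times quantity, computed exactly from a decimal string."""
    return Decimal(price) * quantity


if __name__ == "__main__":
    print("float total equals 1.0:  ", float_total == 1.0)
    print("Decimal total equals 1.0:", decimal_total == Decimal("1.0"))
    print("notional:", notional("101.25", 300))
```

The float total drifts slightly away from 1.0 because 0.1 has no exact binary representation, while the `Decimal` total stays exact. Storing such values in an exact numeric column type, rather than a floating-point one, avoids the same drift on the database side.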

ploika 2383 days ago

I've never used it myself, but kdb+ is meant to be excellent if you need it.

kootenpv 2383 days ago

There's TimescaleDB built on top of PostgreSQL.

kootenpv 2383 days ago

I'm a long time fan of your blog :O

sillysaurusx 2383 days ago

Me too! A few months ago I made detailed notes on how to set up your own: https://github.com/shawwn/wiki

Gwern was kind enough to assist by sending over the exact version numbers of all the Haskell libraries it depends on, and answering some questions about deployment. The version numbers turned out to be crucial to getting everything running.

IMO https://www.gwern.net/ is the ideal combination of style + ease of use (for the writer) + effective ways of organizing knowledge.

The whole thing is hosted out of an S3 bucket, so there's no server to manage and zero downtime. I've wondered if it'd be possible to use GitHub Pages for this purpose, since that would make it completely free. But it only takes a couple of hours of work to get everything up and running. The biggest delay is waiting for Haskell to compile all the libraries.

kootenpv 2383 days ago

Yeah, look at Jekyll in combination with GitHub Pages. You can see my blog for example (https://vks.ai); the code is hosted here: https://github.com/kootenpv/kootenpv.github.io

gwern 2383 days ago

The downside with switching over to Jekyll is that gwern.net is increasingly dependent on Haskell integration with Pandoc, which would be difficult if you were compiling with another language: you'd have to refactor all of the additional rewrite passes (for interwiki scripts, image dimension specification, affiliate links, link annotations, inflation adjustment...) to make it at all possible, and you'd be using a lot of Haskell anyway. So it goes - the Law of Equivalent Exchange.

gwern 2383 days ago

Thanks. I think it's a nice point in design space. Sort of an Art Deco Modernist minimalism, perhaps? But with rich features and excellent performance.
