by kpmcc 669 days ago
Hey! I work at a startup that does industrial automation work, and this looks super helpful. I'm going to take a deeper look later, but right off the bat I wanted to ask why you felt a custom time-series database was warranted when there are options like Timescale or regular old Postgres out there?

Hey! Great question, and one we get a lot. We've come from, and talked to, a lot of companies that do what you described with tools like Timescale and InfluxDB. They're useful tools and support a breadth of applications. We thought that by building a database specifically around the read/write patterns you'd expect from sensor-heavy systems, we could achieve better data throughput and therefore better support real-time applications. For example, we've been able to get 5x the write performance for sensor data on our DB compared to InfluxDB.

In general, building out the core DB first has made it much easier to expand into other useful features, such as writing commands out to hardware at control-loop frequencies or creating smooth real-time visualizations.

The other thing we think is really powerful is having a more integrated tool for acquiring, storing, and processing sensor data and actuating hardware. One common issue we experienced was trying to cobble together several tools that weren't fully compatible, which created a lot of friction in the overall control and acquisition workflow. We want to provide a platform for building a more cohesive but still extensible system, and data storage was a good base to build that on.

Thanks for the reply! That all makes sense, and I can totally relate to the "cobbling together several tools that weren't fully compatible" experience. There's enough complexity with having to support or integrate sensors/actuators with a variety of industrial networking protocols. Anything to simplify the software portion of the system would go a long way. Excited to dig into this a bit more, best of luck with ongoing development!
Thank you! Happy to hear any feedback after you've taken a deeper look.
Did you build on any low-level libraries like RocksDB for data persistence etc.? Or did you fully hand-roll the database? Curious about the tradeoffs there nowadays.
The core time series engine, called cesium (https://github.com/synnaxlabs/synnax/tree/main/cesium), is written completely from scratch. It's kind of designed as a "time-series S3", where we store blobs of samples in a columnar fashion and index by time. We can actually get 5x the write performance of something like InfluxDB in certain cases.
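To make the "time-series S3" idea a bit more concrete, here is a minimal Go sketch of a columnar, time-indexed segment store. It is not cesium's actual code or API: the `Channel`, `Segment`, `Write`, and `Read` names are hypothetical, it keeps everything in memory, and it assumes append-only, time-ordered writes, the pattern typical of sensor data.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// TimeStamp is nanoseconds since the Unix epoch (an assumption for this sketch).
type TimeStamp int64

// Segment is one contiguous blob of samples for a single channel, stored
// column-wise: Times[i] is the timestamp of Values[i].
type Segment struct {
	Start, End TimeStamp // covers [Start, End)
	Times      []TimeStamp
	Values     []float64
}

// Channel is a hypothetical append-only, time-ordered list of segments.
type Channel struct {
	segments []Segment
}

// Write appends one segment; it rejects writes that overlap existing data.
func (c *Channel) Write(times []TimeStamp, values []float64) error {
	if len(times) == 0 || len(times) != len(values) {
		return errors.New("times and values must be non-empty and of equal length")
	}
	for i := 1; i < len(times); i++ {
		if times[i] <= times[i-1] {
			return errors.New("timestamps must be strictly increasing")
		}
	}
	if n := len(c.segments); n > 0 && times[0] < c.segments[n-1].End {
		return errors.New("out-of-order write: segment overlaps existing data")
	}
	c.segments = append(c.segments, Segment{
		Start:  times[0],
		End:    times[len(times)-1] + 1,
		Times:  append([]TimeStamp(nil), times...), // copy so callers can reuse buffers
		Values: append([]float64(nil), values...),
	})
	return nil
}

// Read returns all values with start <= t < end. It binary-searches the
// segment index for the first overlapping segment, then binary-searches the
// timestamps inside each segment it touches.
func (c *Channel) Read(start, end TimeStamp) []float64 {
	var out []float64
	i := sort.Search(len(c.segments), func(i int) bool { return c.segments[i].End > start })
	for ; i < len(c.segments) && c.segments[i].Start < end; i++ {
		s := c.segments[i]
		lo := sort.Search(len(s.Times), func(j int) bool { return s.Times[j] >= start })
		hi := sort.Search(len(s.Times), func(j int) bool { return s.Times[j] >= end })
		out = append(out, s.Values[lo:hi]...)
	}
	return out
}

func main() {
	var ch Channel
	_ = ch.Write([]TimeStamp{0, 10, 20}, []float64{1.0, 1.5, 2.0})
	_ = ch.Write([]TimeStamp{30, 40}, []float64{2.5, 3.0})
	fmt.Println(ch.Read(10, 35)) // [1.5 2 2.5]
}
```

Because writes only append whole blobs and reads are time ranges, both operations reduce to binary searches over sorted timestamps. A real engine would also persist segments to disk, compress them, and handle concurrent readers and writers.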

For other metadata, such as channels, users, and permissions, we rely on CockroachDB's Pebble, a RocksDB-compatible KV store implemented in pure Go.
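The reason an ordered KV store suits this kind of metadata is that a key layout like `<entity>/<id>` turns "list all channels" into a range scan over a key prefix. The Go sketch below shows that idea with a tiny in-memory stand-in; it does not use Pebble's API, and the `orderedKV` type and key layout are hypothetical, not Synnax's actual schema.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// orderedKV is a hypothetical in-memory stand-in for an ordered KV store:
// keys are kept sorted so a prefix lookup is a contiguous range scan.
type orderedKV struct {
	keys []string          // sorted keys
	vals map[string]string // key -> value
}

func newOrderedKV() *orderedKV { return &orderedKV{vals: map[string]string{}} }

// Set inserts or overwrites a key, keeping keys in sorted order.
func (kv *orderedKV) Set(k, v string) {
	if _, ok := kv.vals[k]; !ok {
		i := sort.SearchStrings(kv.keys, k)
		kv.keys = append(kv.keys, "")
		copy(kv.keys[i+1:], kv.keys[i:]) // shift the tail right by one
		kv.keys[i] = k
	}
	kv.vals[k] = v
}

// Scan returns the values of all keys starting with prefix, in key order.
func (kv *orderedKV) Scan(prefix string) []string {
	var out []string
	for i := sort.SearchStrings(kv.keys, prefix); i < len(kv.keys) && strings.HasPrefix(kv.keys[i], prefix); i++ {
		out = append(out, kv.vals[kv.keys[i]])
	}
	return out
}

func main() {
	kv := newOrderedKV()
	// Hypothetical key layout: "<entity>/<id>".
	kv.Set("channel/0002", "pressure_psi")
	kv.Set("user/0001", "alice")
	kv.Set("channel/0001", "temperature_c")
	fmt.Println(kv.Scan("channel/")) // [temperature_c pressure_psi]
}
```

In a real ordered store the same scan would be an iterator bounded by the prefix, so listing one entity type never touches keys for the others.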

One thing that might keep me from using this is how well it integrates with other tools I might want to use for data analysis or historical lookback. For example, I currently use Grafana as a simple, easy way to review sensor data from our R&D tests. Grafana has solid support for Postgres, Timescale, InfluxDB, and a number of other data sources. With a custom database, I'd imagine the availability of tools outside the Synnax ecosystem would be rather limited.
That's a valid concern! We're currently a team of three, so we're still working on building out integrations with other tools, hardware, etc. So far we've prioritized direct integrations with the systems our current users are interested in.

We also want to make it easy for developers to build on Synnax, or to integrate the parts they're most interested in into their existing systems. To that end, we've built Python, TypeScript, and C++ SDKs and device integrations. We're continuing to look into how we can better help developers build or expand their systems with Synnax, so if there are any integrations you think are important, I'd appreciate your take.