We might still introduce them in a future release to optimise a few queries.
Two years ago we started with Clickhouse - we also built the aggregation pipelines around this. For data ingest we have a variety of different solutions, but at present we stream to BigQuery for simplicity rather than anything else.
We will be bringing Clickhouse back soon (we are speaking to them at the moment!) as we try and find a way that would enable those wishing to self-host to have a simple and scalable setup.
Our goal is to support the self-hosted version as we believe that will help the community thrive.
Makes sense!! We found the same thing with redash and bigquery. We basically do the same, and don’t really need to address much at our scale.
Self-hosted will be an important business enabler for you imo. SaaS isn’t so promising with the advent of data as an asset, and with exponential growth of data that requires more protective, diverse, and thoughtful enterprise enablement.
We think self-hosted might be the right direction. We are going to be putting a ton of effort and energy into our self-hosted offering over the next 6-12 months. We want to support all the major databases. At present we only have MongoDB (not ideal for large projects) and BigQuery, although Clickhouse is not too far off now.
We have it all available to run via docker-compose. There is a nice one-liner to spin it up and have a look. There is also a commercial version should you not want to manage the infrastructure yourself.
That being said... we will be releasing multiple cloud setups via Terraform for the self-hosted version. We had a call with Clickhouse Cloud yesterday and that is set for a release in September - we'll aim to align with that and shortly after provide a full IaC setup.
I saw the compose file but I’m wondering if I need multiple cores and a lot of RAM? Can I just host it on my Raspberry Pi or do I need to spin up an x86 server?
I'd suggest a minimum of 2GB RAM. The data collection is done via Java (Micronaut project). While it can cope with thousands of requests per second quite happily on a single instance, it does require a bit of RAM.
Are you optimizing write & read processing from BigQuery? Is there middleware (e.g. materialized views, Pub/Sub, Cloud Functions)? Or don’t you need these?