Show HN: DataKit – Complete data analysis platform now self-hostable
(datakit.page)
4 points
by aminkhorrami
389 days ago
DataKit started as my solution to Excel crashing on large files, but it's grown into a full browser-based data platform that handles CSV/Parquet/XLSX/JSON files up to 20GB+ entirely client-side.
What it does:
- Drag files → instant SQL querying (joins, aggregations, everything)
- Automatic data profiling (quality issues, null values, duplicates)
- Smart visualizations for every column type
- Export transformed/filtered results
Now self-hostable: After requests from teams needing this behind firewalls, the entire platform can run on your infrastructure via pip/Docker/brew/NPM.
Technical details: Built on DuckDB-WASM with heavy performance optimizations. All processing happens in-browser – your data never leaves your environment, whether using the hosted version or self-hosted setup.
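To give a rough idea of the kind of SQL querying and profiling listed above (aggregations, null counts, duplicate rows), here is a minimal sketch. It is not DataKit's code: it uses Python's built-in sqlite3 as a stand-in for DuckDB-WASM so it runs anywhere, and the sample data, the `orders` table name, and the `load_csv` and `profile` helpers are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a dropped CSV file.
SAMPLE_CSV = """order_id,customer,amount
1,alice,10.5
2,bob,
3,alice,4.0
3,alice,4.0
"""


def load_csv(conn, table, text):
    """Load CSV text into a new table; empty fields are stored as NULL."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({marks})',
        [[v if v != "" else None for v in row] for row in data],
    )
    return header


def profile(conn, table, columns):
    """Return row count, duplicate-row count, and per-column NULL counts."""
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    distinct = conn.execute(
        f'SELECT COUNT(*) FROM (SELECT DISTINCT * FROM "{table}")'
    ).fetchone()[0]
    nulls = {
        c: conn.execute(
            f'SELECT COUNT(*) FROM "{table}" WHERE "{c}" IS NULL'
        ).fetchone()[0]
        for c in columns
    }
    return {"rows": total, "duplicate_rows": total - distinct, "nulls": nulls}


# In-memory database: nothing is written to disk or sent anywhere.
conn = sqlite3.connect(":memory:")
columns = load_csv(conn, "orders", SAMPLE_CSV)
report = profile(conn, "orders", columns)

# A simple aggregation over the loaded file.
totals = conn.execute(
    'SELECT customer, SUM(CAST(amount AS REAL)) FROM "orders" '
    "GROUP BY customer ORDER BY customer"
).fetchall()

print(report)
print(totals)
```

The same queries should carry over to DuckDB with little change, though DuckDB can read CSV and Parquet files directly rather than going through a manual load step like the one sketched here.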
Live demo: https://datakit.page
Self-hosting docs: https://docs.datakit.page
Previous discussion: https://www.reddit.com/r/dataengineering/comments/1l1i3ry
Built this because I was tired of the choose-two problem: fast analysis, large files, or keeping data local. Now you can have all three.
Feedback welcome – what data analysis pain points should I tackle next?
(Would be super happy to have a talk on Discord: https://discord.gg/grKvFZHh)