by user5994461 2780 days ago
In my experience, an Elasticsearch index takes roughly three times the size of the raw data on disk, because the data is effectively kept three times:

    First is the actual JSON data in quasi plain text.
    Second is the _source field, which duplicates the original input object (necessary for reindexing/rebuilding).
    Third is the _all field, which duplicates the JSON data as text (only used for some text searches; better to disable it).
Finally, the whole index is duplicated to each replica, and you need at least one replica if you want any redundancy.
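
For what it's worth, here is a rough sketch of what "disable _all, keep one replica" could look like as an index-creation body for the pre-6.0 versions discussed here. The type name `doc` and the helper name are made up for illustration, and this only builds the request body; it does not talk to a cluster.

```python
import json


def build_index_body(type_name="doc", replicas=1):
    """Build a create-index body that disables _all and sets the replica count.

    Assumes the pre-6.0 mapping layout, where mappings are keyed by a type
    name. `type_name` is a hypothetical placeholder.
    """
    return {
        "settings": {
            # One replica gives a second copy of every shard.
            "number_of_replicas": replicas,
        },
        "mappings": {
            type_name: {
                # Skip the catch-all text copy of every document.
                "_all": {"enabled": False},
            },
        },
    }


body = build_index_body()
print(json.dumps(body, indent=2))
```

The printed JSON is what you would send when creating the index; check the mapping docs for your exact version before relying on it.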

Index compression with LZ4 takes 20 to 30% off; it is a new feature of Elasticsearch v5.0 and is on by default.
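
Putting those rules of thumb together, a back-of-the-envelope estimate might look like the sketch below. The function name and defaults are my own, the 25% saving is just the midpoint of the 20 to 30% range above, and it assumes the compression saving applies evenly to all three copies, which is a simplification.

```python
def estimate_disk_gb(raw_gb, copies_per_doc=3, replicas=1, compression_saving=0.25):
    """Rough on-disk size for an index, using the rules of thumb above.

    copies_per_doc: indexed data + _source + _all (use 2 if _all is disabled).
    replicas: replica copies in addition to the primary.
    compression_saving: fraction removed by compression (assumed uniform).
    """
    if not 0 <= compression_saving < 1:
        raise ValueError("compression_saving must be in [0, 1)")
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    # Size of one full copy of the index, then multiplied across primary + replicas.
    return raw_gb * copies_per_doc * (1 - compression_saving) * (1 + replicas)


print(estimate_disk_gb(100))                     # 450.0
print(estimate_disk_gb(100, copies_per_doc=2))   # 300.0
```

So 100 GB of raw JSON with one replica comes out around 450 GB under these assumptions, or around 300 GB with _all disabled. Treat it as an order-of-magnitude guide, not a capacity plan.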