Thanks a lot for your comment! We agree that a dataset as small as 5 GB may sound strange, but it was a conscious decision. Check out our blog post to read more about the benchmark's methodology.
TL;DR: It's not our choice, but it's meaningful, because this 5 GB is a single data segment, which is literally what you will have in Elastic etc. even when you have TBs of data overall. See https://www.elastic.co/docs/deploy-manage/production-guidanc... (a single shard is one Lucene index, which contains multiple data segments)
https://blog.serenedb.com/search-benchmark-game-overview