The author did give some hints on how he built it, e.g. indexing and search are driven by Apache Solr running on a couple of 20-core machines.
For data ingestion, you can probably start from prior art such as this: https://github.com/garysieling/solr-git
There are also some pretty decent books referenced in the Solr docs: https://lucene.apache.org/solr/resources.html
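
To make the ingestion part a bit more concrete, here is a rough sketch of how source files could be pushed into Solr through its JSON update handler. This is not how the author or solr-git does it, just one plausible approach. The helper names, the collection name, and the field names (`path_s`, `content_t`) are my own assumptions; the field names rely on the `*_s` / `*_t` dynamic fields that Solr's default configset usually ships with.

```python
import json
import os
import urllib.request


def collect_docs(root):
    """Walk a source tree and turn each readable UTF-8 file into a Solr doc.

    Hypothetical helper: the field names are assumptions, not a fixed schema.
    """
    docs = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS metadata in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    content = fh.read()
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file: skipped in this sketch
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            docs.append({"id": rel, "path_s": rel, "content_t": content})
    return docs


def build_update_request(solr_base, collection, docs):
    """Build (but do not send) a POST to Solr's JSON update handler."""
    url = f"{solr_base.rstrip('/')}/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With a Solr instance running locally and a collection called, say, `code` (a made-up name), something like `urllib.request.urlopen(build_update_request("http://localhost:8983/solr", "code", collect_docs(".")))` would send the batch. For anything the size of what the author is indexing you would want to batch the documents and commit far less often than once per request.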