by matteuan 2144 days ago
For research, I built an experimental RDF store on top of Parquet and Apache Spark for querying big graphs[1]. It converts the RDF graph into a sort of property graph, with one row per entity and one column for each possible property. The trick is to use a columnar format with the proper encoding (Parquet, in our case) to cope with the large number of columns and the huge number of NULLs. With this representation we can eliminate costly joins for most common queries, and also reduce the size of the joins that remain necessary.
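
To make the layout concrete, here is a rough in-memory sketch in plain Python. It is not PRoST's code: the triples, `build_property_table` and `star_query` are all made up for illustration, and the real system stores the table in Parquet and queries it through Spark. It only shows the idea of pivoting triples into one row per subject, so that a "star" query over several properties of the same entity becomes a single scan instead of a self-join on the subject.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not taken from PRoST.
triples = [
    ("alice", "name", "Alice"),
    ("alice", "age", 30),
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
]


def build_property_table(triples):
    """Pivot triples into one row per subject and one column per predicate."""
    predicates = sorted({p for _, p, _ in triples})
    rows = defaultdict(dict)
    for s, p, o in triples:
        # Keep values in a list so multi-valued properties are not lost.
        rows[s].setdefault(p, []).append(o)
    # Properties a subject lacks become None: these are the NULLs that a
    # columnar format with a suitable encoding can store cheaply.
    table = {
        s: {p: props.get(p) for p in predicates}
        for s, props in rows.items()
    }
    return predicates, table


def star_query(table, wanted):
    """Return subjects that have every wanted property, in one pass, no joins."""
    return {
        s: {p: row[p] for p in wanted}
        for s, row in table.items()
        if all(row[p] is not None for p in wanted)
    }


predicates, table = build_property_table(triples)
result = star_query(table, ["name", "age"])
print(result)
```

In a plain triple table the same query (entities with both a name and an age) would join the table with itself on the subject. Joins are still needed when a query follows an edge from one entity to another, e.g. going from alice to the name of whoever she knows.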

[1] PRoST https://github.com/tf-dbis-uni-freiburg/PRoST