| HN Mirror



	by matteuan 2192 days ago
	For research, I created an experimental RDF store on top of Parquet and Apache Spark for querying big graphs [1]. It converts the RDF graph into a sort of property graph, with one row per entity and one column for each possible property. The trick is to use a columnar format with the proper encoding (in our case Parquet) to cope with the large number of columns and the huge number of NULLs. With this representation we can eliminate costly joins for most common queries, and also reduce the size of the joins that are still necessary. [1] PRoST https://github.com/tf-dbis-uni-freiburg/PRoST
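
	To make the row-per-entity layout concrete, here is a minimal sketch in plain Python. It is not PRoST's code: the toy triples, the function names `build_property_table` and `star_query`, and the choice to store every property value as a list are all assumptions made for illustration. It ignores Parquet and Spark entirely and only shows why a query over several properties of the same entity needs no join.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not taken from PRoST.
TRIPLES = [
    ("alice", "name", "Alice"),
    ("alice", "age", "30"),
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
    ("bob", "email", "bob@example.org"),
]


def build_property_table(triples):
    """Return (columns, rows): one row per subject, one column per predicate."""
    columns = sorted({p for _, p, _ in triples})
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        # Assumption: multi-valued properties are kept as lists.
        by_subject[s].setdefault(p, []).append(o)
    rows = {
        s: {c: props.get(c) for c in columns}  # None marks a missing property
        for s, props in by_subject.items()
    }
    return columns, rows


def star_query(rows, wanted):
    """Select subjects that have all `wanted` properties, with a single scan."""
    return {
        s: {p: row[p] for p in wanted}
        for s, row in rows.items()
        if all(row.get(p) is not None for p in wanted)
    }


columns, table = build_property_table(TRIPLES)
result = star_query(table, ["name", "age"])
print(columns)
print(result)
```

	With a plain three-column triple table, the same "name and age" query would need a self-join on the subject. Here it is a filter over one row at a time, and the `None` cells are the NULLs that a columnar encoding is meant to store cheaply.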