by jamasb 2640 days ago
I've been doing some work on link prediction in knowledge graphs recently, with poor results on real-world data. These methods don't necessarily require a huge amount of data, but they are very sensitive to noise and to the 'density' of the dataset. The benchmark datasets are, in essence, very easy to perform well on. It's a real shame that these methods' tolerance of noise and sparsity is not reported, because both will be present in almost any real-world dataset in far greater quantities than in current benchmarks.

Well, the landscape is still quite fluid (new models are proposed in the literature at every major conference). Processing real-world graphs is obviously more challenging, for a number of reasons (multi-modality, scale, etc.), even though benchmarks are catching up and becoming harder (see FB15k-237 or WN18RR).

As a general rule of thumb, it is important that your graph has enough redundancy in it, i.e. the more relations, the better. Also, bear in mind that these models do not support multi-modality: literals such as numbers, strings, geo coordinates and timestamps are simply treated as entities. In most cases it is probably better to filter literals out before generating the embeddings.
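
For what it's worth, here is a minimal sketch of that filtering step in Python. It assumes your triples are plain (subject, predicate, object) string tuples and that literal objects follow the N-Triples convention of starting with a double quote; `is_literal` and `drop_literal_triples` are hypothetical helper names, and the triples are toy data. If your data marks literals differently, the check would need to change.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def is_literal(term: str) -> bool:
    """Assumption: literals use N-Triples syntax, so they start with a double quote."""
    return term.startswith('"')


def drop_literal_triples(triples: Iterable[Triple]) -> List[Triple]:
    """Keep only triples whose object is an entity, not a literal."""
    return [(s, p, o) for s, p, o in triples if not is_literal(o)]


# Toy data, purely illustrative.
triples = [
    ("<alice>", "<knows>", "<bob>"),
    ("<alice>", "<age>", '"42"^^<http://www.w3.org/2001/XMLSchema#integer>'),
    ("<bob>", "<label>", '"Bob"@en'),
    ("<bob>", "<worksFor>", "<acme>"),
]

entity_triples = drop_literal_triples(triples)
print(entity_triples)
```

Only the two entity-to-entity triples survive here. Dropping literals does throw away information, so this is a trade-off rather than a free win.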