
by ArnoVW 1365 days ago

You subsample. One package I used generated N random walks for each node. The random walks are written out as 'sentences', where the node IDs are the words.

That results in a huge text file, which you then embed as if it were normal text. The result is a normal 'word embedding' where the words are in reality the node IDs. Works like a charm. Highly scalable.

https://github.com/dwslab/jRDF2Vec
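
A minimal sketch of the walk-to-sentence step described above, in plain Python. It is not jRDF2Vec's implementation: the toy graph, the function names `random_walks` and `write_corpus`, and the parameters `walks_per_node` and `walk_length` are all assumptions made for illustration, and the graph is a simple adjacency dict rather than RDF.

```python
import random

def random_walks(graph, walks_per_node, walk_length, seed=0):
    """Generate random walks; each walk is a list of node IDs (one 'sentence')."""
    rng = random.Random(seed)
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph[walk[-1]]
                if not neighbours:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

def write_corpus(walks, path):
    """Write one walk per line, node IDs separated by spaces."""
    with open(path, "w", encoding="utf-8") as f:
        for walk in walks:
            f.write(" ".join(walk) + "\n")

# Hypothetical toy graph: node ID -> list of neighbour IDs.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": [],
}

walks = random_walks(graph, walks_per_node=3, walk_length=4)
sentences = [" ".join(walk) for walk in walks]
```

The file produced by `write_corpus` can then be fed to any word2vec-style trainer, treating each line as a sentence and each node ID as a word. The output size is bounded by nodes × walks per node × walk length, whatever the number of edges.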

1 comments

sandGorgon 1364 days ago

Really? So you keep subsampling as the data becomes larger and larger,

instead of... well... throwing more hardware at it, which seems to be getting easier and easier these days.

P.S. Not trolling. I'm genuinely wondering if there is a better way to split the problem heuristically.


ArnoVW 1364 days ago

All I'm saying is that you don't take into account all paths for each node, just (for example) 100 random walks starting at each node. And that results in an embedding that is 'good enough'.

Of course it is better to throw more hardware at the issue. But at a certain point the added value of being more precise or adding more hardware becomes moot, because you gain 0.1%.

That is what I meant by 'it scales'. You can solve 'reasonably complex issues' with 'reasonably cheap hardware'.
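
To put rough numbers on "all paths" versus "100 random walks per node", here is a small counting sketch. The graph (50 nodes, each linked to the next 5 in a ring) and the walk depth of 8 edges are made-up assumptions; only the figure of 100 walks per node comes from the comment above.

```python
def count_walks(graph, depth):
    """Count every distinct walk with exactly `depth` edges, over all start nodes."""
    # counts[node] = number of walks with k edges that start at `node`
    counts = {node: 1 for node in graph}  # k = 0: just the node itself
    for _ in range(depth):
        counts = {
            node: sum(counts[nb] for nb in graph[node])
            for node in graph
        }
    return sum(counts.values())

# Hypothetical graph: 50 nodes in a ring, each pointing to the next 5.
num_nodes = 50
graph = {i: [(i + k) % num_nodes for k in range(1, 6)] for i in range(num_nodes)}

depth = 8             # assumed walk length, in edges
walks_per_node = 100  # the figure used in the comment above

exhaustive = count_walks(graph, depth)   # grows as out_degree ** depth
sampled = num_nodes * walks_per_node     # grows linearly with node count

print(f"all walks: {exhaustive:,}  sampled walks: {sampled:,}")
```

The exhaustive count multiplies by the out-degree with every extra step, while the sampled count depends only on the number of nodes. That is the sense in which the approach scales.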
