by jayalammar 1148 days ago
There's a lot you can do with the vectors themselves without needing to embed any more text (e.g., clustering, exploration, or visualization after dimensionality reduction). Here's a previous embeddings exploration of top HN posts: https://txt.cohere.com/combing-for-insight-in-10-000-hacker-... A lot of that code can be used here as well.

If you want to query for a search term, you can use a trial API key, which is free to use for prototyping. The embedding model itself is not open source, though. [co-author of the post here]
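
As a rough illustration of working with the vectors alone (no new text embedded), here is a minimal sketch of nearest-neighbor exploration by cosine similarity. The three toy vectors and the names `vectors`, `cosine`, and `nearest` are made up for the example; the real dataset's embeddings would be loaded in their place.

```python
import math

# Toy stand-ins for precomputed paragraph embeddings (assumption: the real
# vectors would be loaded from the released dataset instead).
vectors = {
    "para_a": [0.9, 0.1, 0.0],
    "para_b": [0.8, 0.2, 0.1],
    "para_c": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(key, vectors, k=1):
    """Return the k keys whose vectors are most similar to vectors[key]."""
    query = vectors[key]
    scored = [
        (cosine(query, vec), other)
        for other, vec in vectors.items()
        if other != key
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


print(nearest("para_a", vectors))
```

The query here is an existing row, so nothing needs to go through the embedding model. Searching for an arbitrary term would still require embedding that term first.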


If that's the intent, IMO the released dataset should have more metadata (e.g. paragraph heading, article taxonomy).
How would you add that data? As new columns you mean? Or add the paragraph headings to the text of the paragraphs before embedding them?
New columns.

For the headings, I mean the Wikipedia section headings (which don't always correspond to a single paragraph, my mistake).

In both cases the data can be used to classify/visualize, like the Show HNs in your linked post.
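
To make the two options from this exchange concrete, here is a small sketch. The column names `section_heading` and `taxonomy`, the sample rows, and the helper names are all hypothetical; nothing here reflects the actual dataset schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Row:
    text: str
    embedding: list
    section_heading: str  # hypothetical extra column
    taxonomy: str         # hypothetical extra column


# Made-up rows standing in for dataset paragraphs.
rows = [
    Row("Born in 1879 in Ulm.", [0.1, 0.9], "Early life", "People"),
    Row("Published four papers in 1905.", [0.2, 0.8], "Career", "People"),
    Row("Flows through nine countries.", [0.9, 0.1], "Course", "Geography"),
]


def group_by(rows, column):
    """Option 1: metadata as new columns, usable as labels for
    classification or as colors in a visualization."""
    groups = defaultdict(list)
    for row in rows:
        groups[getattr(row, column)].append(row.text)
    return dict(groups)


def with_heading(row):
    """Option 2: prepend the heading to the paragraph text before it is
    embedded, so the heading influences the vector itself."""
    return f"{row.section_heading}\n{row.text}"


print(group_by(rows, "taxonomy"))
print(with_heading(rows[0]))
```

With new columns, the existing embeddings stay untouched and the metadata sits alongside them. Prepending headings would mean re-embedding every paragraph.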