by btbuildem 1210 days ago
> “AskHN” is a GPT-3 bot I trained on a corpus of over 6.5 million Hacker News comments to represent the collective wisdom of the HN community in a single bot.

First sentence of the first paragraph on OP's page

EDIT: it's a bit misleading; further down they describe what looks like a semantic-search approach


Scroll a bit further down and you will see

> 7. Put top matching content into a prompt and ask GPT-3 to summarize

> 8. Return summary along with direct links to comments back to Discord user

Ah got it. Perhaps they should edit the intro then, it's misleading.
I agree, that language could be much improved. This is not a GPT-like LLM whose training corpus is HN comments, which I found to be an extremely interesting idea. Instead, it looks like it finds relevant HN threads and tells GPT-3 (the existing model) to summarize them.

To be clear, I think this is still very cool, just misleading.
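
For anyone who wants the distinction spelled out, here is a rough sketch of the retrieve-then-summarize flow as I read it from steps 7 and 8. Everything here is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, the comments and IDs are made up, and the actual call to GPT-3 is left out. It only shows how the top matches end up in a prompt.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a real embedding model: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def top_matches(question, comments, k=2):
    """Rank stored comments by similarity to the question, keep the top k."""
    q = embed(question)
    ranked = sorted(
        comments, key=lambda c: cosine(q, embed(c["text"])), reverse=True
    )
    return ranked[:k]

def build_prompt(question, matches):
    """Put the retrieved comments into a prompt for an existing model."""
    context = "\n".join(f"[{m['id']}] {m['text']}" for m in matches)
    return (
        "Summarize what these comments say about the question.\n\n"
        f"Question: {question}\n\nComments:\n{context}\n\nSummary:"
    )

# Hypothetical mini-corpus; the real one would be millions of HN comments.
comments = [
    {"id": 1, "text": "rust is great for systems programming"},
    {"id": 2, "text": "sourdough bread needs a long fermentation"},
    {"id": 3, "text": "learning rust takes time but the compiler helps"},
]
question = "is rust worth learning"
matches = top_matches(question, comments, k=2)
prompt = build_prompt(question, matches)
print(prompt)
```

The model never sees the whole corpus, only whatever the search step hands it, which is why "trained on" is the wrong phrase.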

Soon we will see language style transfer vectors, akin to the image style transfer at the peak of the ML craze 5-10 years ago -- so you will be able to take an HN snark vector and apply it to regular text. You heard it here first ;)
Joking aside, that does seem like it would be very useful. Kind of reminds me of the analogies that were common in initial semantic vector research. The whole “king - man + woman = queen” thing. Presumably that sort of vector arithmetic is still valid on these new LLM embeddings? Although it still would only be finding the closest vector embedding in your dataset, it wouldn’t be generating text guided by the target embedding vector. I wonder if that would be possible somehow?
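
To make the analogy trick concrete, here is a minimal sketch with hand-made 2-D vectors (the numbers and the word list are invented for illustration, not taken from any real embedding model). It does the arithmetic and then looks up the nearest stored vector, which is exactly the limitation mentioned above: it can only return something already in the dataset.

```python
import math

# Hypothetical toy embeddings: (royalty, gender). Real embeddings have
# hundreds or thousands of dimensions and are learned, not hand-written.
vectors = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "prince": (0.8, 1.0),
    "apple": (0.1, 0.0),
}

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0

def analogy(a, b, c):
    """Return the stored word closest to vec(a) - vec(b) + vec(c)."""
    target = tuple(
        x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])
    )
    # Exclude the query words themselves, as is usual for this test.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

result = analogy("king", "man", "woman")
print(result)  # queen
```

Generating new text from the target vector would need a decoder that maps embeddings back to words, which this lookup does not attempt.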
Hmm. If you're willing to be stuck in time at 2016, there's https://zenodo.org/record/45901

Build a model off of that?