I wanted to share a little side project of mine that I created while tinkering around with GPT-3.
The project uses the Algolia HN Search API [1] to retrieve the "Who is hiring?" posts from HN and then parses them with the help of GPT-3 / GPT-3.5 (I do not have API access to GPT-4, yet, but it already works quite well even with the older models). It then puts the job postings into a structured list that is hopefully easier to skim than the original postings. There are some additional features like sorting jobs by semantic similarity (based on the text embeddings from OpenAI). Filtering, sorting and saving favorites is implemented client-side, so your data and preferences remain local to your browser.
Originally, this wasn't even meant to be a public product, but if people find it useful (and HN is fine with it), I'll try to keep it running. I've also written a short article about how the parsing works behind the scenes [2]. It's quite amazing how easy many of the classic NLP tasks have become with the newer LLMs.
For example, for the March one it is ID 34983767 (from the algolia search or a "there's only so many of them, here's a list that I'll add to each month").
Thanks for mentioning the Firebase-based API. I knew it existed, but somehow I went with the Algolia API by default. I use their HN search quite a bit, so that's probably why I stuck with them. (no affiliation)
I tried a similar thing today parsing unstructured text (client excel documents) and turn them into JSON. I ran into the problem that the output format changed and sometimes the JSON wants parsable.
Thanks for your prompt. There are some pointers how to improve mine
You're welcome! For the chat model, it definitely helps to let it know that you want valid, parsable JSON (and nothing else). Otherwise it tends to get chatty. ;-) Depending on your use case, you might even ask it to fix the JSON if it's not parsable.
The core ideas for extracting the information with GPT are already available in the blog post linked above. Those are exactly the prompts I'm using. The rest is just a pretty simple Nuxt web application. So I'm not sure if open sourcing my mediocre frontend code would be of any value. Is there anything in particular you would be interested in?
First a request, would you be able to add filtering by location other than the #remote? Say I wanted to see only jobs in US, there's no way of doing that. That would also mean that "Santa Monica, CA" should also show up in the US filter so that could get tricky. Same thing I see for Europe where "Munich, DE" should also show up in a filter for "Europe".
and 2nd, the first three icon buttons at the beginning of each row are not accessible.
- You are using <div><img></div>, but since they are clickable items that perform actions on the page they should be <button type="button">s OR the less recommended way is to use aria attributes + tabindex + role="button" (but honestly you don't need to really do that because buttons come with it built in, just use buttons and css). If you go the non-button route: https://developer.mozilla.org/en-US/docs/Web/Accessibility/A...
- your icons need some kind of accessibility text because they are not obvious to me what they do. The only one that is clear (to me) is the star, the other two do something that I did not expect. I thought I was sorting but then it popped up above the table, confusing
a. add screen reader friendly text for them using the visually-hidden css class in the link below
b. add a `title` attribute for everyone else.
Thank you very much! Filtering by location (and role) is on my todo list, but it is trickier than it seems at first. And I totally agree that the buttons are confusing. Actually, the "sort" button does sort the jobs. It sorts by semantic similarity to the job you selected (using the GPT text embedding). As for the buttons (and probably other parts of the site) not being accessible: I apologize. This shouldn't be an afterthought.
Thank you very much! I will definitely try out your suggestions! However, at least with GPT-3.5 and the amount of data I have to deal with in this case, my main concern is the quality of the extractions. With about 500 posts per month, the cost is manageable. But for larger datasets, saving tokens is definitely important.
Thanks for the suggestion! It's on my todo list. For now, you at least can sort jobs by similarity to a selected job. It's the middle icon to the left of each entry (maybe not the most intuitive way how to do it, though).
This is awesome! Well done and thanks for sharing. Just subscribed :)
I'm actually working on something similar, but specific to inbound job opportunities (Email and LinkedIn). The goal is to use GPT to parse unstructured, unstandardized jobs into a structured, standardized job format that makes it easy for candidates to search and review once they start their job search.
It's in it's early stages, but you can check it out here and let me know what you think: https://sharedrecruiting.co/
I'd love to chat about more about this if you up for it! You can reach me at team at sharedrecruiting.co
Thanks! Haha, you're right. There's already a kind of rule for that in the original prompt I'm using, but the "competitive" thing somehow still slips through. Will fix it in the next version.
Seems the advantage is that OP didn't need to write any code to extract information from the unstructured data (e.g. job title, company name, remote/not, salary, location, etc.). It seems you can feed GPT all of this data and ask it to return these fields.
Exactly. I am sure you can get similar results with some "traditional" NLP skills, but the good (bad?) thing is that they are not required when using one of the newer LLMs.
The project uses the Algolia HN Search API [1] to retrieve the "Who is hiring?" posts from HN and then parses them with the help of GPT-3 / GPT-3.5 (I do not have API access to GPT-4, yet, but it already works quite well even with the older models). It then puts the job postings into a structured list that is hopefully easier to skim than the original postings. There are some additional features like sorting jobs by semantic similarity (based on the text embeddings from OpenAI). Filtering, sorting and saving favorites is implemented client-side, so your data and preferences remain local to your browser.
Originally, this wasn't even meant to be a public product, but if people find it useful (and HN is fine with it), I'll try to keep it running. I've also written a short article about how the parsing works behind the scenes [2]. It's quite amazing how easy many of the classic NLP tasks have become with the newer LLMs.
Happy to answer any questions about the project!
[1] https://hn.algolia.com
[2] https://marcotm.com/articles/information-extraction-with-lar...