Show HN: A structured list of jobs from “Who is hiring?”, parsed with GPT | HN Mirror


	Show HN: A structured list of jobs from “Who is hiring?”, parsed with GPT (hacker-jobs.com)
	68 points by marcotm 1182 days ago

10 comments

marcotm 1182 days ago

I wanted to share a little side project of mine that I created while tinkering around with GPT-3.

The project uses the Algolia HN Search API [1] to retrieve the "Who is hiring?" posts from HN and then parses them with the help of GPT-3 / GPT-3.5 (I don't have API access to GPT-4 yet, but it already works quite well even with the older models). It then puts the job postings into a structured list that is hopefully easier to skim than the original postings. There are some additional features, like sorting jobs by semantic similarity (based on OpenAI's text embeddings). Filtering, sorting, and saving favorites are implemented client-side, so your data and preferences remain local to your browser.
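
The first step of that pipeline, finding the threads, could look roughly like the sketch below. The exact query the project uses isn't shown here; this assumes the threads are the ones posted by the `whoishiring` account and narrows the search with the Algolia API's `author_<username>` tag. `find_hiring_threads` and its injectable `get_json` parameter are hypothetical names, chosen so the HTTP call can be swapped out.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable

# Algolia's public HN Search API, newest results first.
ALGOLIA_SEARCH = "https://hn.algolia.com/api/v1/search_by_date"


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def who_is_hiring_url(page: int = 0) -> str:
    """Search URL for stories by the 'whoishiring' account (assumed thread source)."""
    params = {"tags": "story,author_whoishiring", "page": page}
    return f"{ALGOLIA_SEARCH}?{urllib.parse.urlencode(params)}"


def find_hiring_threads(get_json: Callable[[str], dict] = http_get_json) -> list[tuple[str, str]]:
    """Return (objectID, title) pairs for 'Who is hiring?' threads, newest first.

    The same account also posts 'Who wants to be hired?' and freelancer
    threads, so only titles mentioning 'Who is hiring?' are kept.
    """
    data = get_json(who_is_hiring_url())
    return [
        (hit["objectID"], hit["title"])
        for hit in data.get("hits", [])
        if "who is hiring?" in hit.get("title", "").lower()
    ]
```

Usage: `for object_id, title in find_hiring_threads(): print(object_id, title)`. The returned `objectID` is the HN item ID of the thread, which the next step can use to fetch the individual job posts.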

Originally, this wasn't even meant to be a public product, but if people find it useful (and HN is fine with it), I'll try to keep it running. I've also written a short article about how the parsing works behind the scenes [2]. It's quite amazing how easy many of the classic NLP tasks have become with the newer LLMs.

Happy to answer any questions about the project!

[1] https://hn.algolia.com

[2] https://marcotm.com/articles/information-extraction-with-lar...

shagie 1181 days ago

You can make the intermediate step a bit more structured too via https://github.com/HackerNews/API

For example, the March thread is ID 34983767 (found via the Algolia search, or from a "there are only so many of them, here's a list I'll add to each month" approach).

You can then get a list of all the top level comments at https://hacker-news.firebaseio.com/v0/item/34983767.json?pri...

And then pull up each comment at https://hacker-news.firebaseio.com/v0/item/35255027.json?pri... so you don't have to parse any of its child comments or the HTML of the page.

(Late edit: re-reading the blog post while not trying to pay half attention to a meeting... that is what you are doing.)
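
A minimal sketch of the approach described above, using only the standard library: fetch the thread item, walk its `kids` list (which holds only the IDs of direct replies), and fetch each top-level comment. The function names are hypothetical, and skipping `deleted`/`dead` comments is an assumption about what you'd want.

```python
import json
import urllib.request
from typing import Callable, Iterator

FIREBASE_ITEM = "https://hacker-news.firebaseio.com/v0/item/{}.json"


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body (the API returns null for unknown IDs)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def top_level_comments(story_id: int, get_json: Callable[[str], dict] = http_get_json) -> Iterator[dict]:
    """Yield the top-level comments (the job posts) of a thread.

    Only the IDs in the story's "kids" field are fetched, so nested replies
    are never downloaded. Deleted or dead comments are skipped.
    """
    story = get_json(FIREBASE_ITEM.format(story_id))
    for kid_id in story.get("kids", []):
        comment = get_json(FIREBASE_ITEM.format(kid_id))
        if comment and not comment.get("deleted") and not comment.get("dead"):
            yield {"id": comment["id"], "by": comment.get("by"), "html": comment.get("text", "")}
```

This is one request per post, so a thread with about 500 posts means about 500 requests. The `text` field is still HTML, so it would typically need unescaping or tag stripping before it goes into a prompt.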

marcotm 1181 days ago

Thanks for mentioning the Firebase-based API. I knew it existed, but somehow I went with the Algolia API by default. I use their HN search quite a bit, so that's probably why I stuck with them. (no affiliation)

whinvik 1182 days ago

This is really nice. I have one nitpicky comment about the blog: the font is jarring for me to read.

ta1243 1181 days ago

It's like I've stepped into an episode of futurama!

number6 1181 days ago

I tried something similar today: parsing unstructured text (client Excel documents) and turning it into JSON. I ran into the problem that the output format changed, and sometimes the JSON wasn't parsable.

Thanks for your prompt. It gives me some pointers on how to improve mine.

marcotm 1181 days ago

You're welcome! For the chat model, it definitely helps to let it know that you want valid, parsable JSON (and nothing else). Otherwise it tends to get chatty. ;-) Depending on your use case, you might even ask it to fix the JSON if it's not parsable.
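
A minimal sketch of that parse-then-repair loop. `ask_model` is a hypothetical callable standing in for whichever chat completion API you use: it takes a prompt string and returns the model's reply. The fence stripping handles the common case where the model wraps its answer in a Markdown code block despite being told not to.

```python
import json
from typing import Callable

FENCE = "`" * 3  # a Markdown code fence


def strip_code_fence(text: str) -> str:
    """Remove a surrounding Markdown code fence (with optional language tag)."""
    text = text.strip()
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        if text.rstrip().endswith(FENCE):
            text = text.rstrip()[: -len(FENCE)]
    return text.strip()


def parse_json_with_repair(reply: str, ask_model: Callable[[str], str], max_repairs: int = 1):
    """Parse a model reply as JSON; on failure, ask the model to fix it."""
    candidate = reply
    for attempt in range(max_repairs + 1):
        try:
            return json.loads(strip_code_fence(candidate))
        except json.JSONDecodeError as err:
            if attempt == max_repairs:
                raise  # give up after the allowed number of repair rounds
            candidate = ask_model(
                f"The following is not valid JSON ({err.msg} at position {err.pos}). "
                "Return only the corrected, parsable JSON and nothing else:\n\n" + candidate
            )
```

Capping the repair rounds keeps a stubborn reply from looping forever; passing the parser's error message back gives the model a concrete hint about what to fix.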

number6 1181 days ago

I had the problem that it changed the layout of the JSON file: {"data": [...]} or {"products":[...]}.

In your first example, you told GPT what data structure you expected. I added this to my prompt, and now it produces the JSON data consistently.
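
One way to apply the same idea, sketched under assumptions: put the exact target structure into the prompt, then reject replies whose top-level layout differs instead of accepting `{"data": [...]}` one day and `{"products": [...]}` the next. The `products` shape and the field names inside it are hypothetical.

```python
import json

# Hypothetical target structure, spelled out verbatim in the prompt.
EXPECTED_SHAPE = '{"products": [{"name": string, "price": number}]}'

PROMPT_TEMPLATE = (
    "Extract the products from the text below. "
    "Respond with valid JSON only, using exactly this structure:\n"
    f"{EXPECTED_SHAPE}\n\nText:\n"
)


def build_prompt(document_text: str) -> str:
    return PROMPT_TEMPLATE + document_text


def extract_products(reply: str) -> list:
    """Parse the reply and reject any top-level layout other than {"products": [...]}."""
    data = json.loads(reply)
    if not isinstance(data, dict) or set(data) != {"products"}:
        found = sorted(data) if isinstance(data, dict) else type(data).__name__
        raise ValueError(f"unexpected top-level structure: {found}")
    if not isinstance(data["products"], list):
        raise ValueError('"products" must be a list')
    return data["products"]
```

Failing loudly on an unexpected layout makes drift visible immediately, and the `ValueError` can feed into a retry like the repair loop discussed above.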

avinassh 1181 days ago

Any plans for making this open source?

marcotm 1181 days ago

The core ideas for extracting the information with GPT are already available in the blog post linked above. Those are exactly the prompts I'm using. The rest is just a pretty simple Nuxt web application. So I'm not sure if open sourcing my mediocre frontend code would be of any value. Is there anything in particular you would be interested in?

flanbiscuit 1181 days ago

This is cool! I am definitely going to use this.

A couple of small things.

First, a request: could you add filtering by location other than #remote? Say I wanted to see only jobs in the US; there's currently no way to do that. That would also mean "Santa Monica, CA" should show up under a US filter, so it could get tricky. The same goes for Europe, where "Munich, DE" should show up under a "Europe" filter.
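
A rough sketch of why this gets tricky: the region has to be inferred from a location suffix, and suffixes are ambiguous ("CA" is both California and Canada's country code; "DE" is both Delaware and Germany's). The lookup tables below are deliberately tiny, hypothetical placeholders, and `locations` is an assumed field on each parsed job.

```python
# Hypothetical, deliberately incomplete lookup tables. Suffixes are ambiguous:
# "CA" is also Canada's country code and "DE" is also Delaware, so a real
# version needs more context than the suffix alone.
US_STATES = {"CA", "NY", "WA", "TX", "MA"}
EUROPE_COUNTRIES = {"DE", "FR", "NL", "UK", "ES"}


def regions_for(location: str) -> set[str]:
    """Map a location like 'Santa Monica, CA' or 'Munich, DE' to broader regions."""
    regions = set()
    suffix = location.rsplit(",", 1)[-1].strip().upper()
    if suffix in US_STATES or suffix in {"US", "USA"}:
        regions.add("US")
    if suffix in EUROPE_COUNTRIES:
        regions.add("Europe")
    if "REMOTE" in location.upper():
        regions.add("Remote")
    return regions


def filter_by_region(jobs: list[dict], region: str) -> list[dict]:
    """Keep jobs where any listed location falls inside the region."""
    return [
        job for job in jobs
        if any(region in regions_for(loc) for loc in job.get("locations", []))
    ]
```

Locations like "REMOTE (US)" or bare city names fall through this suffix check entirely, which is one argument for asking the model to emit a country or region field directly during extraction.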

Second, the first three icon buttons at the beginning of each row are not accessible.

- You are using <div><img></div>, but since these are clickable items that perform actions on the page, they should be <button type="button"> elements. The less recommended alternative is ARIA attributes + tabindex + role="button" (but honestly you don't need to do that, because buttons have it built in; just use buttons and CSS). If you go the non-button route: https://developer.mozilla.org/en-US/docs/Web/Accessibility/A...

- Your icons need some kind of accessible text, because it isn't obvious to me what they do. The only one that is clear (to me) is the star; the other two did something I didn't expect. I thought I was sorting, but then it popped up above the table, which was confusing.

  a. Add screen-reader-friendly text for them using the visually-hidden CSS class from the link below.
  b. Add a `title` attribute for everyone else.

https://www.a11yproject.com/posts/how-to-hide-content/
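
Taken together, those suggestions amount to markup like the following. It's rendered from Python here only to keep the examples in one language; in the site's Nuxt templates the same elements would be written directly. The `data-action` attribute is a hypothetical hook for the page's script, and the `visually-hidden` class is assumed to be defined as described in the a11yproject article.

```python
from html import escape


def icon_button(icon_src: str, label: str, action: str) -> str:
    """Render an icon as a real <button> with a visually hidden label and a tooltip.

    - type="button" gives keyboard focus and Enter/Space activation for free.
    - alt="" marks the image as decorative, since the hidden span carries the label.
    - title shows the same label as a tooltip for mouse users.
    """
    label_attr = escape(label, quote=True)
    return (
        f'<button type="button" title="{label_attr}" data-action="{escape(action, quote=True)}">'
        f'<img src="{escape(icon_src, quote=True)}" alt="">'
        f'<span class="visually-hidden">{escape(label)}</span>'
        "</button>"
    )
```

For example, `icon_button("/icons/sort.svg", "Sort by similarity to this job", "sort-similar")` yields a button that screen readers announce by its label, where the icon path and label text are illustrative.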

Awesome work!

marcotm 1181 days ago

Thank you very much! Filtering by location (and role) is on my todo list, but it is trickier than it seems at first. And I totally agree that the buttons are confusing. Actually, the "sort" button does sort the jobs. It sorts by semantic similarity to the job you selected (using the GPT text embedding). As for the buttons (and probably other parts of the site) not being accessible: I apologize. This shouldn't be an afterthought.
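
Sorting by semantic similarity to a selected job boils down to ranking every job's embedding vector by cosine similarity to the selected job's vector. A minimal sketch, assuming each job record already carries its vector under a hypothetical `embedding` key (the site does this client-side, so its real implementation would be in JavaScript/TypeScript rather than Python):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = a.b / (|a| |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sort_by_similarity(jobs: list[dict], selected_id: str) -> list[dict]:
    """Order jobs by similarity to the selected job, most similar first."""
    target = next(job["embedding"] for job in jobs if job["id"] == selected_id)
    return sorted(jobs, key=lambda job: cosine_similarity(job["embedding"], target), reverse=True)
```

The selected job always ranks first, since it is perfectly similar to itself. OpenAI documents its embeddings as normalized to length 1, so a plain dot product would give the same order; the full formula keeps the sketch correct for arbitrary vectors.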

flanbiscuit 1181 days ago

No worries! Your MVP looks great! I'm just starting a job search so this came at the right time. Thank you!

rwhyan 1181 days ago

Looks great!

I've been playing around with GPT information extraction, and I think your prompt can be simplified to save on token costs:

Instead of:

`The company name (field name: "companyName", field type: string)`

I use a prompt that looks like:

`... The JSON should consist of the following information, using the format <field name: field type>: The company name <companyName: string>`

I've also played around using JSON structure in the prompt, such as:

`Return a JSON object with following model, with the format <field type: instructions to extract> { "companyName": <string: The company name>, ... }`

In my experience, the attribute name alone is often enough, and GPT can infer how to extract the information (e.g. `{ "companyName": string, ... }`).
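
To compare the three styles side by side, here is a sketch that renders the same hypothetical field list in each format. Character count is only a rough proxy for token count, but it shows where the savings come from: the verbose style repeats `field name:` and `field type:` for every field.

```python
# Hypothetical field spec: (field name, field type, description).
FIELDS = [
    ("companyName", "string", "The company name"),
    ("remote", "boolean", "Whether the job can be done remotely"),
    ("salary", "string | null", "The stated salary range, or null if none is given"),
]


def verbose_spec(fields) -> str:
    """Original style: Description (field name: "x", field type: t)"""
    return "\n".join(f'{desc} (field name: "{name}", field type: {typ})' for name, typ, desc in fields)


def compact_spec(fields) -> str:
    """Suggested style: Description <x: t>"""
    return "\n".join(f"{desc} <{name}: {typ}>" for name, typ, desc in fields)


def skeleton_spec(fields) -> str:
    """JSON-skeleton style: { "x": <t: Description>, ... }"""
    body = ", ".join(f'"{name}": <{typ}: {desc}>' for name, typ, desc in fields)
    return "{ " + body + " }"


for build in (verbose_spec, compact_spec, skeleton_spec):
    print(build.__name__, len(build(FIELDS)))
```

Whichever format is used, it's worth checking extraction quality on a sample before switching, since shorter prompts only pay off if the output stays as accurate.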

marcotm 1181 days ago

Thank you very much! I will definitely try out your suggestions! However, at least with GPT-3.5 and the amount of data I have to deal with in this case, my main concern is the quality of the extractions. With about 500 posts per month, the cost is manageable. But for larger datasets, saving tokens is definitely important.

ordx 1182 days ago

It would be great to add location and role filters.

marcotm 1182 days ago

Thanks for the suggestion! It's on my todo list. For now, you can at least sort jobs by similarity to a selected job. It's the middle icon to the left of each entry (maybe not the most intuitive way to do it, though).

navane 1181 days ago

wow, that is both very neat and hard to find

devstein 1181 days ago

This is awesome! Well done and thanks for sharing. Just subscribed :)

I'm actually working on something similar, but specific to inbound job opportunities (Email and LinkedIn). The goal is to use GPT to parse unstructured, unstandardized jobs into a structured, standardized job format that makes it easy for candidates to search and review once they start their job search.

It's in its early stages, but you can check it out here and let me know what you think: https://sharedrecruiting.co/

I'd love to chat more about this if you're up for it! You can reach me at team at sharedrecruiting.co

jkmcf 1181 days ago

Pretty cool. While looking around, I noticed that

  https://news.ycombinator.com/item?id=34984365

was flagged as part-time, possibly because they mention "PART REMOTE".
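
One cheap guard against this kind of misread is to double-check a model-set flag against the raw post text. A sketch, assuming a hypothetical boolean `partTime` field in the extracted JSON: the flag is kept only if the post literally mentions part-time work.

```python
import re

# Matches "part-time", "part time", and "parttime", but not "part remote".
PART_TIME = re.compile(r"\bpart[-\s]?time\b", re.IGNORECASE)


def check_part_time_flag(post_text: str, extracted: dict) -> dict:
    """Clear a model-set partTime flag when the post never mentions part-time work."""
    if extracted.get("partTime") and not PART_TIME.search(post_text):
        return {**extracted, "partTime": False}
    return extracted
```

This trades recall for precision: a post that says "20 hours/week" without the words "part-time" would also lose its flag.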

chrisan 1182 days ago

Very cool!

It would be nice to remove "competitive salary" from the "salary stated" filter. Are we going to assume everyone else offers a noncompetitive salary?

Maybe require an actual number for "salary stated" to count.
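
A sketch of that rule, with one extra assumption: it requires a currency symbol or code next to the number, because a bare digit check would accept "Competitive salary + 401k". The pattern and the `salary_stated` name are illustrative, not what the site uses.

```python
import re
from typing import Optional

# A currency symbol or code next to a number, e.g. "$120k", "90,000 EUR", "€60-80k".
SALARY_NUMBER = re.compile(
    r"([$€£]\s?\d[\d,.]*\s?[kK]?)|(\d[\d,.]*\s?[kK]?\s?(USD|EUR|GBP)\b)"
)


def salary_stated(salary_text: Optional[str]) -> bool:
    """Count a salary as 'stated' only if it contains an actual amount."""
    return bool(salary_text) and SALARY_NUMBER.search(salary_text) is not None
```

Running this on the extracted salary field after parsing enforces the rule deterministically, even when phrases like "competitive" slip past the prompt's instructions.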

marcotm 1181 days ago

Thanks! Haha, you're right. There's already a kind of rule for that in the original prompt I'm using, but the "competitive" thing somehow still slips through. Will fix it in the next version.

candleknight 1181 days ago

Looks awesome! A filter for internships would be really helpful

moneywoes 1181 days ago

What's the advantage of GPT parsing here? The "has comp" filter?

cachecrab 1181 days ago

Seems the advantage is that OP didn't need to write any code to extract information from the unstructured data (e.g. job title, company name, remote/not, salary, location, etc.). It seems you can feed GPT all of this data and ask it to return these fields.

marcotm 1181 days ago

Exactly. I am sure you can get similar results with some "traditional" NLP skills, but the good (bad?) thing is that they are not required when using one of the newer LLMs.

scrollaway 1181 days ago

... did you click the link? Are you really asking what the advantage is compared to the Who is Hiring thread as-is?

sfc32 1181 days ago

Nice work. A filter against job title would be helpful.