Show HN: I Scraped 2,200 Software Engineering Jobs from Career Pages Using LLMs (grepjob.com)
8 points by kylem866 483 days ago
Hi everyone,

I built GrepJob because I got frustrated with the user experience of LinkedIn/Indeed while looking for a new SWE job.

Specifically: 1) Not being able to trust the posted date of any job. The date shown on LinkedIn is often the date the recruiter reposted the job. 2) Being shown too many irrelevant jobs. For example, I get shown senior/staff-level roles when I search for "Software Engineer II".

GrepJob solves 1) by populating the date_posted time directly from each company's ATS, and 2) by extracting seniority, specialty (frontend, backend, etc.), and tech stack from each job with LLMs.

Please let me know if you have any feedback, thanks!

3 comments

You should connect with the person building https://hiring.cafe. They are scraping something like 1.6M jobs using ChatGPT, so there might be an opportunity for collaboration or knowledge transfer.

https://news.ycombinator.com/item?id=42806956

Worst case, proven pattern to emulate. Wishing you success!

Thanks! I've actually already sent a message to the hiring cafe creator but didn't hear back. Might be worth another shot.
I really enjoy the simple, elegant design and look of this site. Well done!

I did notice that the mid-level jobs are returning mainly senior roles though.

Thanks! Yeah, I have noticed accuracy problems with the seniority too. I'm using 4o-mini + structured output to extract the seniority. Currently the seniority output is defined as an array to handle edge cases where a job could technically be either mid-level or senior. In practice, though, the LLM is overeager at assigning multiple seniorities. It frequently gives a mid-level seniority to jobs which literally have 'Senior' in the title. I'll work on it!
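
One possible guard for the 'Senior'-in-the-title case is a cheap post-processing step that lets explicit title keywords override the model's array. This is only a sketch under assumptions: the level labels, the keyword patterns, and the reconcile_seniority helper are all hypothetical, and nothing here is confirmed to be how GrepJob works.

```python
import re

# Hypothetical label set; the labels GrepJob actually uses are not stated in the thread.
LEVELS = ["junior", "mid", "senior", "staff", "principal"]

# Title keywords that pin a level. Illustrative only, not exhaustive.
TITLE_PATTERNS = {
    "junior": r"\b(junior|jr\.?|entry[- ]level)\b",
    "senior": r"\b(senior|sr\.?)\b",
    "staff": r"\bstaff\b",
    "principal": r"\bprincipal\b",
}


def reconcile_seniority(title, llm_levels):
    """Prefer levels named explicitly in the title; otherwise keep the LLM output."""
    lowered = title.lower()
    from_title = [
        level
        for level in LEVELS
        if level in TITLE_PATTERNS and re.search(TITLE_PATTERNS[level], lowered)
    ]
    if from_title:
        # The title is explicit, so drop whatever extra levels the model added.
        return from_title
    # No keyword in the title: keep the model's levels, in a stable order.
    return [level for level in LEVELS if level in llm_levels]


if __name__ == "__main__":
    print(reconcile_seniority("Senior Software Engineer", ["mid", "senior"]))
    print(reconcile_seniority("Software Engineer II", ["mid"]))
```

Titles without a keyword, such as "Software Engineer II", still depend entirely on the model, so this only removes the most obvious mislabels.
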
Cool stuff! I wish there were a fuzzy search / filter bar to make it easier to search for more specific things.

I'm also curious, what are you using to structure the outputs?

Thank you! What more specific things would you like to search for?

I'm using 4o-mini + structured output mode.

Mostly just in the UI, like a free-form fuzzy search, so I could look for more specific things rather than using the drop-down select.
Right, that can definitely be done. I was just wondering what specific things you're hoping to find with a fuzzy search so I can make sure it's implemented well.
Oh, things like languages, tech stack, more specifics around the role, etc.
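
As a rough illustration of the kind of fuzzy matching being asked for, here is a minimal sketch using Python's standard-library difflib. The tag vocabulary and the fuzzy_tags helper are hypothetical and not taken from the site; a real search bar would more likely run in the browser or against a search index.

```python
from difflib import get_close_matches

# Hypothetical tag vocabulary; the real set of tech-stack tags is not given in the thread.
KNOWN_TAGS = [
    "python", "typescript", "javascript", "react",
    "postgresql", "kubernetes", "golang", "rust",
]


def fuzzy_tags(query, tags=KNOWN_TAGS, limit=3, cutoff=0.6):
    """Return up to `limit` known tags that approximately match a free-form query."""
    normalized = query.strip().lower()
    # get_close_matches ranks candidates by similarity ratio, best match first.
    return get_close_matches(normalized, tags, n=limit, cutoff=cutoff)


if __name__ == "__main__":
    print(fuzzy_tags("pyhton"))
    print(fuzzy_tags("kubernets"))
```

The cutoff trades recall against noise: lowering it tolerates worse typos but surfaces more unrelated tags.
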
Update: just added support for tech stack filtering. Let me know what you think!
Got it, thanks!