| I've been working on similar functionality for jsonresume -> https://github.com/jsonresume/jsonresume.org/blob/master/app... What the author could have done, and what I should have done too (but didn't), is define a set of allowed values (enums) for each field. That should stop the model from coming up with variations, e.g. node vs. nodejs. In zod (or similar tooling) it would look like this:
remote: z.enum(['none', 'hybrid', 'full']),
framework: z.enum(['nodejs', 'rails']),
But this just shifts the problem further down: now you need a good standard set of possible values, which I have yet to find, though I'm sure one is out there. On top of that, I am working on publishing a JobDescription.schema.json so that the next time the models train, they internalize a predefined schema, which should make it a lot easier to get consistent values from job descriptions. Also, something I tend to forget a lot in these LLM days: there are plenty of good NER (Named Entity Recognition) tools out there that you should run first, before building robust prompts. |
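To make the "node vs. nodejs" point concrete, here is a rough sketch of collapsing free-text values onto a closed set before they ever reach an enum check. It is plain TypeScript with no zod dependency, and everything in it (the alias table, `normalizeKey`, `canonicalFramework`) is a made-up illustration, not something taken from the jsonresume repo or from any standard vocabulary.

```typescript
// Closed set of canonical values, mirroring z.enum(['nodejs', 'rails']).
const FRAMEWORKS = ["nodejs", "rails"] as const;
type Framework = (typeof FRAMEWORKS)[number];

// Hypothetical alias table. Keys are spellings after normalizeKey() has run.
const FRAMEWORK_ALIASES = new Map<string, Framework>([
  ["node", "nodejs"],
  ["nodejs", "nodejs"],
  ["rails", "rails"],
  ["rubyonrails", "rails"],
  ["ror", "rails"],
]);

// Lowercase and drop everything except letters and digits,
// so "Node.js" and "node js" both become "nodejs".
function normalizeKey(raw: string): string {
  return raw.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Returns the canonical value, or null when the spelling is unknown.
// Unknown values are surfaced rather than guessed at.
function canonicalFramework(raw: string): Framework | null {
  return FRAMEWORK_ALIASES.get(normalizeKey(raw)) ?? null;
}

console.log(canonicalFramework("Node.js"));
console.log(canonicalFramework("Ruby on Rails"));
console.log(canonicalFramework("Django"));
```

Returning null for unknown spellings (instead of picking the nearest match) keeps the gaps in the alias table visible, which is the same "you still need a good standard set of values" problem, just made explicit.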