by throwup238 545 days ago
> One thing that didn't work great in the prompt above was excluding the location of places where the authors worked. They sometimes got included anyway.

Have you tried adding the institutions as an explicit property in the JSON response and just ignoring the second list?

I’ve had much better luck having LLMs explicitly choose a different label when working with similar types of entities than asking the LLM to exclude them via prompting. This way you can also spot ambiguity if the LLM adds a location to both arrays.
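To make that concrete, here is a rough sketch of the post-processing side, assuming the prompt asks for two arrays. The field names `study_locations` and `author_affiliation_locations` and the helper `split_locations` are made up for illustration; they are not from the original prompt.

```python
import json


def split_locations(raw_response: str):
    """Parse a JSON response that labels both kinds of location explicitly.

    Field names are hypothetical. Returns the study locations plus any
    entries the model also listed as affiliation locations.
    """
    data = json.loads(raw_response)
    study = data.get("study_locations", [])
    affiliations = data.get("author_affiliation_locations", [])

    def norm(name: str) -> str:
        # Compare case-insensitively and ignore surrounding whitespace.
        return name.strip().casefold()

    affiliation_keys = {norm(name) for name in affiliations}
    # A location present in both arrays is ambiguous and worth reviewing.
    ambiguous = [name for name in study if norm(name) in affiliation_keys]
    return study, ambiguous
```

The affiliation array is only used for the overlap check and is otherwise thrown away. Whether an ambiguous entry gets dropped or kept is a separate decision for the caller.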

1 comment

I have not done that, but I like the strategy, not just for this use case but as a general idea: replace exclusion with finer-grained categorisation. One thing I did do is use a regex to preprocess the papers and remove the bibliographies, which were a really big source of noise. Titles of referenced papers would often mention a location that was not directly relevant to the paper itself.
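A minimal sketch of that kind of preprocessing, not the exact regex used: it assumes the bibliography starts at a heading such as "References" or "Bibliography" on a line of its own, and cuts everything from the last such heading onward. The name `strip_bibliography` and the heading list are assumptions.

```python
import re

# Assumed heading pattern: an optional section number, then a common
# bibliography title alone on its line. Real papers vary a lot.
_BIB_HEADING = re.compile(
    r"^[ \t]*(?:\d+\.?[ \t]+)?(?:references|bibliography|works cited)[ \t]*:?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_bibliography(text: str) -> str:
    """Drop everything from the last bibliography heading to the end."""
    matches = list(_BIB_HEADING.finditer(text))
    if not matches:
        # No recognisable heading: leave the text untouched.
        return text
    # Use the last match so an earlier line that happens to read
    # "References" (e.g. in a table of contents) does not truncate the body.
    return text[: matches[-1].start()].rstrip()
```

Note that this also drops any appendix that follows the reference list, which may or may not be acceptable depending on the papers.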

The Atlas is also trying to answer the question "Can we build inaccurate and incomplete systems with LLMs that are still useful?".