by benl_c
545 days ago
The backend is still a mess of code, so no. It's not too hard to do, though. The prompt I used to extract locations is "The text provided are enviornmental science papers. They often (but not always) will include references to locations where this science is relevant, for example a study might be of soil around a small town, in this case the town would be the relevant location, extract all locations that are relevant or the subject of the science done, do not extract any locations that are related to the location of or institutions, organisations, or laboratories. So for example exclude the location of government departments and CSIRO laboratories. If there are no relevant locations, please return an empty array. Each location should be extracted in a form suitable for calling the Nominatim geocode API in Python via geopy. Also, extract out a short context string that describes the context in which this location is referenced. Please provide the output in JSON format." Then I passed the results through both Nominatim and the Google geocoder. Google worked better. One thing that didn't work well in the prompt above was excluding the locations of places where the authors worked; they sometimes got included anyway.
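For anyone who wants to try the geocoding step, here is a rough sketch rather than the actual backend code. It assumes the model returns a JSON array of objects with `location` and `context` keys (the prompt doesn't pin down the exact shape), and it tries each geocoder in order until one returns coordinates. The function name and the try-in-order fallback are my own choices for illustration.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple

Coords = Tuple[float, float]
# A geocoder here is any callable that maps a query string to (lat, lon) or None.
Geocoder = Callable[[str], Optional[Coords]]


def geocode_locations(llm_json: str, geocoders: List[Geocoder]) -> List[Dict]:
    """Attach coordinates to each extracted location.

    Assumes llm_json is a JSON array like
    [{"location": "...", "context": "..."}], which is a guess at the shape.
    """
    results = []
    for item in json.loads(llm_json):
        coords = None
        # Try each geocoder in turn and keep the first hit.
        for geocode in geocoders:
            coords = geocode(item["location"])
            if coords is not None:
                break
        results.append({**item, "coords": coords})
    return results
```

With geopy installed, one of the callables could wrap `Nominatim(user_agent="my-app").geocode(query)` and return `(loc.latitude, loc.longitude)` when the result is not `None`. The public Nominatim service is rate limited, so a real run would need throttling.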
|
Have you tried adding the institutions as an explicit property in the JSON response and just ignoring the second list?
I’ve had much better luck having LLMs explicitly assign a different label to similar types of entities than asking the LLM to exclude them via prompting. This way you can also spot ambiguity if the LLM adds a location to both arrays.
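A minimal sketch of what I mean, assuming the response is a JSON object with two arrays of strings. The key names `study_locations` and `institution_locations` are made up for illustration; use whatever your prompt asks for.

```python
import json
from typing import List, Tuple


def _norm(name: str) -> str:
    # Compare names loosely: ignore case and surrounding whitespace.
    return name.strip().casefold()


def split_locations(llm_json: str) -> Tuple[List[str], List[str]]:
    """Return (kept, ambiguous) study locations.

    kept: study locations that do not also appear as institution locations.
    ambiguous: study locations the model placed in both arrays.
    """
    data = json.loads(llm_json)
    study = data.get("study_locations", [])
    institutions = {_norm(x) for x in data.get("institution_locations", [])}
    kept = [s for s in study if _norm(s) not in institutions]
    ambiguous = [s for s in study if _norm(s) in institutions]
    return kept, ambiguous
```

You ignore the institution array downstream, and anything in `ambiguous` is a candidate for manual review rather than silent inclusion.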