An example question in that eval set is "How many publications were published between 2019 and 2021?" That's something GPT can work out how to answer from the schema alone, without any extra context (I assume the schema has a column called publications). An example question that I'd get in my previous role at an ecommerce fraud detection company could be something like "What's the chargeback rate on the ATO segment?" Neither "chargeback rate" nor "ATO segment" is defined in the database schema. Not only did they have different definitions depending on the context (e.g. which customer), the definitions also changed over time within the same context.
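To make that concrete, here is a rough sketch of the extra lookup a model would need before it could even start writing SQL. Everything in it is made up for illustration: the customer names, the dates, the SQL fragments, and the `resolve` helper are assumptions, not how any real system I worked on was built.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetricDefinition:
    term: str             # business term as people say it, e.g. "chargeback rate"
    customer: str         # context in which this definition applies
    effective_from: date  # first day this definition is in force
    sql_fragment: str     # hypothetical SQL expression for the metric


# Hypothetical definitions: same term, different meaning per customer and over time.
DEFINITIONS = [
    MetricDefinition("chargeback rate", "acme", date(2020, 1, 1),
                     "SUM(is_chargeback) / COUNT(*)"),
    MetricDefinition("chargeback rate", "acme", date(2022, 1, 1),
                     "SUM(chargeback_amount) / SUM(order_amount)"),
    MetricDefinition("chargeback rate", "globex", date(2020, 1, 1),
                     "SUM(is_chargeback) / SUM(is_approved)"),
]


def resolve(term: str, customer: str, as_of: date) -> MetricDefinition:
    """Return the definition of `term` in force for `customer` on `as_of`."""
    candidates = [
        d for d in DEFINITIONS
        if d.term == term and d.customer == customer and d.effective_from <= as_of
    ]
    if not candidates:
        raise KeyError(f"no definition of {term!r} for {customer!r} as of {as_of}")
    # The most recently effective definition wins.
    return max(candidates, key=lambda d: d.effective_from)


if __name__ == "__main__":
    for customer, day in [("acme", date(2021, 6, 1)),
                          ("acme", date(2023, 6, 1)),
                          ("globex", date(2023, 6, 1))]:
        print(customer, day, resolve("chargeback rate", customer, day).sql_fragment)
```

None of this information lives in the table schema, which is the point: the same question maps to three different queries depending on who is asking and when.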