
by tmostak 1021 days ago

It wasn't clear to me which evaluation method is being used. The chart in the blog says Execution Accuracy, but the numbers appear to correspond to "Exact Set Match" (comparing on SQL) rather than "Execution With Values" (comparing on result set values). For example, DIN-SQL + GPT-4 achieves an 85.3% "Execution With Values" score. Is that what is being used here?

See the following for more info:

https://yale-lily.github.io/spider
https://github.com/taoyds/spider/tree/master/evaluation_exam...
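
To make the distinction between comparing on SQL and comparing on result set values concrete, here is a minimal sketch. It is not the Spider evaluation script: `naive_sql_match` is a hypothetical, crude stand-in that only compares normalized query strings (the real Exact Set Match metric compares parsed SQL components), and the `singer` table and both queries are made up for illustration.

```python
import sqlite3


def naive_sql_match(pred: str, gold: str) -> bool:
    """Hypothetical stand-in for a SQL-level comparison:
    case- and whitespace-insensitive string equality."""
    def normalize(sql: str) -> str:
        return " ".join(sql.lower().split())
    return normalize(pred) == normalize(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    """Compare on result set values, ignoring row order."""
    pred_rows = sorted(conn.execute(pred).fetchall())
    gold_rows = sorted(conn.execute(gold).fetchall())
    return pred_rows == gold_rows


# Toy in-memory database (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?)",
    [("Ann", 30), ("Bob", 45), ("Cy", 52)],
)

gold = "SELECT name FROM singer WHERE age > 40"
pred = "SELECT name FROM singer WHERE NOT age <= 40"

sql_ok = naive_sql_match(pred, gold)         # queries differ as text
exec_ok = execution_match(conn, pred, gold)  # but return the same rows
print(sql_ok, exec_ok)
```

The two queries are written differently but return the same rows, so a SQL-level comparison rejects the prediction while an execution-level comparison accepts it. That is why the two metrics can give different scores for the same system.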

1 comment

MrezaPourreza 1021 days ago

Hello, thank you very much for your meticulous comment. The 85.3% accuracy reported in our paper (I'm one of the authors of the DIN-SQL paper) pertains to the test set. However, in the blog post, we are reporting the performance on the development set, which stands at 74.2%.
