by tmostak
1021 days ago
It wasn't clear to me which evaluation method is being used. The chart in the blog says Execution Accuracy, but the numbers appear to correspond to "Exact Set Match" (comparing the SQL itself) rather than "Execution With Values" (comparing result-set values). For example, DIN-SQL + GPT-4 achieves an 85.3% "Execution With Values" score. Is that what is being used here? See the following for more info: https://yale-lily.github.io/spider
https://github.com/taoyds/spider/tree/master/evaluation_exam...
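
To make the distinction concrete, here is a rough sketch of why the two styles of metric can disagree. It is not the Spider evaluation script: the normalized string comparison is only a crude stand-in for Exact Set Match (which, as I understand it, compares parsed query components rather than raw strings), and the `singer` table, its rows, and the helper names are all made up for illustration.

```python
import sqlite3


def normalized(sql: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences are ignored.
    return " ".join(sql.lower().split())


def string_match(pred: str, gold: str) -> bool:
    # Crude stand-in for a SQL-level comparison (NOT Spider's Exact Set Match).
    return normalized(pred) == normalized(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    # Compare the rows each query returns, ignoring row order.
    pred_rows = sorted(conn.execute(pred).fetchall())
    gold_rows = sorted(conn.execute(gold).fetchall())
    return pred_rows == gold_rows


# Hypothetical toy database, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bob', 45), (3, 'Cy', 52);
    """
)

gold = "SELECT name FROM singer WHERE age > 40"
pred = "SELECT name FROM singer WHERE NOT age <= 40"

sql_equal = string_match(pred, gold)            # the SQL text differs
rows_equal = execution_match(conn, pred, gold)  # the result sets agree
print(sql_equal, rows_equal)
```

The predicted query is written differently from the gold query but returns the same rows, so a SQL-level comparison counts it as wrong while a result-set comparison counts it as right. That is why it matters which metric a reported number refers to.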