by energy123
387 days ago
One thing I'd like to see is an apples-to-apples benchmark against e.g. aider's edit formats, on the same set of tasks. There is a published benchmark on your site, but it isn't apples-to-apples: it only establishes the relative superiority of the fine-tuned model within this patching framework. It's not a comparison across patching frameworks.