HN Mirror

by paradite 815 days ago

For anyone who didn't bother looking deeper: the SWE-bench benchmark contains only Python projects, so it is not representative of all programming languages and frameworks.

I'm now working on a more general SWE task eval framework, written in JS, that supports arbitrary languages and frameworks (JS/TS, SQL, and Python for starters), for my own prompt engineering product.

Hit me up if you are interested.

1 comment

barfbagginus 815 days ago

I'm assuming the dataset is proprietary; otherwise, please share the repo.
