by simonw
313 days ago
My main goal with that benchmark is to see whether a model can produce HTML and JavaScript code that runs without errors for a moderately complex challenge. It's not a comprehensive benchmark - there are many ways to run it that would be much more informative and robust. It's great as a quick single-sentence prompt to get a feel for whether the model can produce working JavaScript or not.