by NiloCK
1 hour ago
Very interested in this! Can you share more about the modelling method (e.g., three.js?), the task list, and outputs here? I think there's probably some good juice to squeeze in terms of spatial awareness by doing a benchmark something like:
- give a 3D modelling task
- render and snapshot from a variety of angles
- feed to a third-party vision model for a "what is this" type query
- grade on end-to-end accuracy
Bonus points for asking the vision model something like "how beautiful is this, 1-10".
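A rough sketch of that loop, in case it helps make the idea concrete. Everything here is an assumption for illustration: `render` and `identify` are hypothetical callables standing in for whatever renderer and vision model you plug in, and the ring of cameras at a fixed elevation is just one simple way to pick "a variety of angles".

```python
import math
from typing import Any, Callable, List, Tuple

Vec3 = Tuple[float, float, float]

def camera_positions(n_views: int, radius: float = 3.0,
                     elevation_deg: float = 30.0) -> List[Vec3]:
    """Evenly spaced camera positions on a ring around the origin."""
    elev = math.radians(elevation_deg)
    positions = []
    for i in range(n_views):
        az = 2 * math.pi * i / n_views  # azimuth of the i-th view
        positions.append((
            radius * math.cos(elev) * math.cos(az),
            radius * math.cos(elev) * math.sin(az),
            radius * math.sin(elev),
        ))
    return positions

def grade_model(expected_label: str,
                render: Callable[[Vec3], Any],    # hypothetical: camera -> image
                identify: Callable[[Any], str],   # hypothetical: image -> answer
                n_views: int = 6) -> float:
    """Fraction of views in which the answer mentions the expected label."""
    hits = 0
    for cam in camera_positions(n_views):
        answer = identify(render(cam))
        if expected_label.lower() in answer.lower():
            hits += 1
    return hits / n_views
```

Substring matching on the answer is the crudest possible grader; it is only there so the end-to-end score has a definition.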
I was benchmarking using a soon-to-be-released new version of my AI CAD modeling software[0]. It's basically an agent with access to tools that can execute build123d scripts, get sculpted models, use Blender to combine sculpts + parametric models, inspect the model (visually and with code), search datasheets, ...
I tried what you recommend a while ago (asking an AI to evaluate renders from different angles), and the AI evaluations were extremely bad: barely any correlation with my own scores. Things have gotten better, but I don't trust it enough yet.
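For anyone who wants to run the same sanity check on their own grader, here is a minimal sketch of one way to measure agreement between human and AI scores: Spearman rank correlation in plain Python. This is not necessarily how I measured it; the function names are illustrative, and the scores in the usage note are made-up placeholders, not my results.

```python
from statistics import mean
from typing import List, Sequence

def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def spearman(human: Sequence[float], ai: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(human), ranks(ai))
```

Usage would look like `spearman([7, 3, 9, 5], [6, 4, 8, 2])` with one entry per scenario. A value near 0 is the "barely any correlation" case.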
Here is how I score adherence (and how the AI did as well, though I also tried methods where it would just give back a boolean "pass" or not):
Here is the scenario list (prompts are much more detailed):
[0]: https://grandpacad.com