I asked it to create a 3D model of "AMF-O97L45-DB". It pulled the datasheet and generated a 3D model. Left is reality, right is what was generated: https://imgur.com/a/oNaz51q
That's why a hybrid approach is needed. The agent shouldn't be making up dimensions based on an image. It should use OCR to extract the size table from the datasheet, feed it into a parametric table, and only then map it onto the base enclosure template.
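
Roughly what I have in mind, as a minimal sketch in Python. It assumes the OCR step has already turned the size table into plain text. Every name here (`parse_size_table`, `EnclosureParams`, the `L`/`W`/`H` labels) is hypothetical, and the numbers are placeholders, not values from the real datasheet:

```python
import re
from dataclasses import dataclass

# Hypothetical OCR output of a datasheet size table.
# The values are placeholders, NOT taken from the real datasheet.
OCR_TEXT = """
Dim   mm
L     97.0
W     45,0
H     30.5
"""

# One row = a letter label followed by a number (dot or comma decimal).
ROW = re.compile(r"^\s*([A-Za-z]+)\s+(\d+(?:[.,]\d+)?)\s*$")


def parse_size_table(text):
    """Turn OCR'd table text into a {label: value_mm} dict."""
    table = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # header and blank lines do not match and are skipped
            table[m.group(1).upper()] = float(m.group(2).replace(",", "."))
    return table


@dataclass(frozen=True)
class EnclosureParams:
    """Parameters of an assumed base enclosure template."""
    length_mm: float
    width_mm: float
    height_mm: float


# Assumed mapping from datasheet labels to template parameters.
REQUIRED = {"L": "length_mm", "W": "width_mm", "H": "height_mm"}


def to_enclosure_params(table):
    """Map the parsed table onto the template; fail instead of guessing."""
    missing = [key for key in REQUIRED if key not in table]
    if missing:
        raise ValueError(f"size table is missing dimensions: {missing}")
    return EnclosureParams(**{field: table[key] for key, field in REQUIRED.items()})


params = to_enclosure_params(parse_size_table(OCR_TEXT))
print(params)
```

The point is the failure mode: if a dimension is missing from the table, this raises an error instead of letting the agent invent a number.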