Take a look at modelrift.com. It is built around annotating the built models with basic pen and arrow tools, and it works fairly well (the 'smarter' model is significantly better).
Who are your users? Are you working with professionals who use similar commercial products, or with hobbyists? I have a hard time imagining that seasoned industrial designers prefer text over sketches…
I suspect that your VLM might be doing a bad job of transcribing sketches into CAD models, and that you have misinterpreted the adoption data as a preference for text-based interaction.