by echoangle 633 days ago
Anyone who has ever done CAD knows that a picture is worth a thousand words. Describing a 3D object you want in words is much more effort to get right than drawing a simple sketch. Wouldn’t it be better to have image input for an application like this?
4 comments

That's exactly what I was thinking. I can't describe what I want because I don't know it yet. It comes to life as I design.

I was an engineer in a former life but still do a fair amount of printing. But when I design parts there aren't even ways I __could__ know what I want beforehand. As I build I realize I made wrong assumptions, but also not enough, that there are better ways to do things, that I can solve other problems, that I didn't think about how things would interact together, that I could modify things to be better for the manufacturing process (this is such a big one in 3D printing, and so many online files get this VERY wrong. But it is a hard skill to learn), and so many other things. In part this is because I've had more time to think, but there's more to it, when you see the thing "coming alive". This is much the same way I code, though I guess that's not common.

I'm not sure if they'll address these problems, but I think anyone working in this space should make sure they also spend a lot of time in CAD themselves. It isn't clear to me that any of the authors do (looking at their websites, the main author mentions interest in AI-CAD but only has work from this year. Get this man a 3D printer). It's quite possible that they do, but they look like they've been computer scientists their whole career and that is probably not enough to understand the intricacies of the problems they're trying to solve. There's a classic problem in CS where people get that you can learn a lot of things quickly, but miss that nuance and mastery take time, and that you should talk to experts. The first part is useful because it gives you the language to talk to experts, not because it makes you a replacement for one.

> In part this is because I've had more time to think, but there's more to it, when you see the thing "coming alive".

Yep. I have two printed prototypes of different approaches to a mechanism on my desk that only exist because of months of staring at CAD in the evenings, learning new things, doing research.

They are not radical (they may be slightly novel in places; I have never seen 3D printed mechanisms like them).

I don't know if I could describe them in words at all, but if I could, it would only be because I worked through them in CAD in the first place.

For anything other than a trivial object I just can't see how you'd even come up with the words without having worked through the design -- what, on paper in 2D in pencil? After doing the maths? That's CAD in reverse.

Yeah, words and images (especially sketches) are fuzzy. I think we tend to think they are more precise than they are because we are so good at communicating, but often this is only after having a relationship with the other person. It is easy to ignore the frustration and frequency of miscommunication and blame it on other things, like your manager being dumb, when in fact both might be true.

There's definitely things I think I could describe in words, but without a doubt could be communicated faster by sketching. There are more complicated things where I think it would just be faster to CAD up the damn thing. It's like math (or code). The language(s) are precise and annoying because of that precision, but they're still the easiest way to do the things we want to do, which is why we use them. Natural language's flexibility is great for abstraction and big ideas but not so great when it comes to precision. Things get very wordy very fast when you get into the details. And I'm sure everyone knows the value of arguing with your friend or coworker over those tiny things, even if it doesn't seem important. If you don't, you probably need to work on teams more often or make more friends lol

> And I'm sure everyone knows the value of arguing with your friend or coworker over those tiny things, even if it doesn't seem important.

Not to mention that in this case, this disagreement over the meaning of ultra-fine detail will be happening with an LLM, which does not really understand the words.

I'm an ML researcher and I seriously do not understand how people are avoiding the stupid loops. Like where I tell the LLM all the conditions, what works and what doesn't work, and then it tells me to do the thing that I just said doesn't work (while at the beginning of the response it even acknowledges this!). So then I say "x doesn't work, here's the output" and then it says "sorry for the confusion, you're right. Instead let's <insert bunch of useless words> then do x" where x is the same thing...

I can't be the only one, right? I feel like I'm being gaslit lol

Yeah but when all you have is an LLM hammer, everything looks like a text2text nail.
LLMs are fundamentally one-dimensional, which works fine when you're generating next tokens for text, because that's a 1D problem.

I do wonder how much progress we could make on a problem like this with a 3D transformer architecture.

I’m not sure I follow this. Isn’t an LLM’s dimensionality measured by how many parameters the model supports? I.e. tens of billions in some cases? If I understand it correctly, then, the model is already evaluating things in lots of dimensions and reducing it down to 1, as you say, in the case of text, and to 2 in image generation, so 3 should be pretty straightforward.
I think they're referring to the dimensionality of the input / output space, not the intermediate internal representation.
The neat thing is, you can rasterize 1D space into 2D, 3D and so on. Trick as old as analog TV signal processing.
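For anyone who hasn't seen the trick, here is a minimal sketch of raster-scan indexing in Python, assuming row-major order (the last axis varies fastest). The names `unravel` and `ravel` are made up for illustration; nothing here comes from the paper or from any model's code.

```python
def unravel(index, shape):
    """Map a flat 1D index to grid coordinates (row-major raster scan)."""
    total = 1
    for size in shape:
        total *= size
    if not 0 <= index < total:
        raise IndexError(f"index {index} out of range for shape {shape}")
    coords = []
    # Peel off the fastest-varying (last) axis first.
    for size in reversed(shape):
        index, c = divmod(index, size)
        coords.append(c)
    return tuple(reversed(coords))


def ravel(coords, shape):
    """Inverse of unravel: grid coordinates back to the flat 1D index."""
    index = 0
    for c, size in zip(coords, shape):
        index = index * size + c
    return index


# The same flat position read as a 2D pixel and as a 3D voxel.
pixel = unravel(5, (2, 3))
voxel = unravel(12, (2, 3, 4))
```

So a 1D stream of 24 values can be read back as a 2x3x4 grid and vice versa. Whether a model learns anything useful from that ordering is a separate question.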
If I am understanding you right... I don't think this gets you anywhere useful.

Even if you could do what you're suggesting with an LLM (I have my doubts) this result would be a mesh or 3D pixel grid or something, yes?

This is terrible for interoperability and it's the opposite of what mainstream CAD packages do.

What you really want is to draw a sketch + describe constraints in symbols
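To make "sketch + constraints in symbols" concrete, here is a rough Python sketch. Everything in it (the point names, the constraint tuples, `residual`, `unsatisfied`) is hypothetical and not any CAD package's API. It only checks whether a drawn sketch satisfies its constraints; a real constraint solver would go further and move the points until they do.

```python
import math

# A rough sketch: three named points, as someone might draw them.
points = {"A": (0.0, 0.0), "B": (40.0, 0.0), "C": (40.0, 25.0)}

# The intent, stated symbolically instead of in prose.
constraints = [
    ("horizontal", "A", "B"),
    ("vertical", "B", "C"),
    ("distance", "A", "B", 40.0),
    ("distance", "B", "C", 25.0),
]


def residual(constraint, pts):
    """Signed error of one constraint; zero means it is satisfied."""
    kind = constraint[0]
    (x1, y1), (x2, y2) = pts[constraint[1]], pts[constraint[2]]
    if kind == "horizontal":
        return y2 - y1
    if kind == "vertical":
        return x2 - x1
    if kind == "distance":
        return math.hypot(x2 - x1, y2 - y1) - constraint[3]
    raise ValueError(f"unknown constraint kind: {kind}")


def unsatisfied(cons, pts, tol=1e-9):
    """Return the constraints the current sketch violates."""
    return [c for c in cons if abs(residual(c, pts)) > tol]


ok = unsatisfied(constraints, points)

# Nudge C sideways: the sketch still looks about right, the symbols disagree.
nudged = dict(points, C=(41.0, 25.0))
broken = unsatisfied(constraints, nudged)
```

The nudged sketch breaks both the vertical constraint and the B-C length, which is the kind of precision that is painful to pin down in plain words.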
Yeah, a way to turn a flat sketch into a 3D model would be a better way to do this.
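The simplest version of sketch-to-3D is a straight extrusion. A minimal Python sketch, assuming a closed, non-self-intersecting outline and an extrusion along z; `polygon_area` and `extrude` are illustrative names, not taken from any CAD tool. Note the feature itself is still just (outline, height), and the vertices are derived from it.

```python
def polygon_area(outline):
    """Area of a simple closed 2D polygon via the shoelace formula."""
    n = len(outline)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = outline[i]
        x2, y2 = outline[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def extrude(outline, height):
    """Turn a flat outline into prism vertices plus side faces.

    Returns (vertices, sides): bottom-ring vertices come first, then the
    top ring; each side is a quad of four vertex indices.
    """
    n = len(outline)
    bottom = [(x, y, 0.0) for x, y in outline]
    top = [(x, y, float(height)) for x, y in outline]
    sides = [(i, (i + 1) % n, (i + 1) % n + n, i + n) for i in range(n)]
    return bottom + top, sides


# A 40 x 25 rectangle pulled up by 10 units.
rect = [(0.0, 0.0), (40.0, 0.0), (40.0, 25.0), (0.0, 25.0)]
vertices, sides = extrude(rect, 10.0)
volume = polygon_area(rect) * 10.0
```

Changing the height or one sketch dimension regenerates the whole solid, which is the part a raw mesh output doesn't give you.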