by sreekanth850 86 days ago
We’re taking a different path: building a parsing engine that converts CAD (DWG/DXF) into fully structured JSON with preserved semantics (no ML in the critical path). We also have a separate GIS parser that extracts vector data (features, layers, geometries) independently. I'd like to know how you handle consistency and reproducibility across runs when using models, and how you make it affordable, especially at scale, because as far as I know CAD and GIS need precision and accuracy.
3 comments

Interesting. Yeah, parsing DWG/DXF natively makes sense when the source file is clean and well-structured. The precision argument is valid in controlled environments.

The challenge we kept running into is that construction drawings in the wild aren’t always that clean. Unresolved xrefs, exploded dynamic blocks, version incompatibilities, SHX font substitutions — by the time a PDF hits a GC’s desk it’s often the only reliable artifact left. The CAD source may not even be available.

That’s why we see vision as the more pragmatic path, not because it’s more precise than structured CAD parsing, but because PDFs are the actual lingua franca of construction. Every firm, every trade, every discipline hands off PDFs. So we made a bet on meeting the document where it actually lives.

On consistency and reproducibility — that’s a real challenge with vision models. Our approach is to keep detection scope narrow and validate confidence scores on every output rather than trying to generalize broadly. Happy to go deeper on that if useful.
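A minimal sketch of what "narrow scope plus confidence validation" could look like in practice. Everything here is an assumption for illustration: the class allow-list, the 0.85 threshold, and the `Detection` and `gate` names are hypothetical and not taken from the product being discussed.

```python
from dataclasses import dataclass

# Hypothetical values: a deliberately small set of classes and a fixed
# confidence floor. Neither comes from the thread.
ALLOWED_CLASSES = {"door", "window"}
MIN_CONFIDENCE = 0.85


@dataclass(frozen=True)
class Detection:
    label: str
    confidence: float


def gate(detections):
    """Split detections into accepted ones and ones flagged for human review."""
    accepted, review = [], []
    for d in detections:
        # Accept only in-scope labels that clear the confidence floor.
        if d.label in ALLOWED_CLASSES and d.confidence >= MIN_CONFIDENCE:
            accepted.append(d)
        else:
            review.append(d)
    return accepted, review
```

Gating like this does not make the model itself deterministic; it only limits which outputs are trusted without review.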

As part of our product development, we fought with PDF a lot. We even have a generic PDF parser with a triple pipeline (one for single-column, another for multi-column, and a third for complex table-based layouts), yet we still don't get 100% accuracy, so I would say it's a bit risky to bet on PDF. PDF is arguably the most complex format ever made, and it was never made for data extraction. You are right that vision models are the only way, but hallucination is real.
Dumbcad line barf will not help you with that at all.

There already is a format that is plain text and preserves the semantics: IFC. That's what it was made for.

We’re not just dumping primitives; we extract full CAD context including entities, layers, blocks, colors, and topology. That metadata allows the structure to be reconstructed deterministically. IFC is great when available, but in most real-world pipelines DWG is still the source of truth, often degraded. Our focus is making that usable without relying on probabilistic vision layers. People depend on PDF for CAD files because of its portability and to avoid software dependencies and licensing. We aim to solve that: any machine or pipeline that needs CAD or GIS data for analytics, search, or reasoning can operate on our structured output without requiring a native CAD or ESRI license.
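To make "deterministic" concrete, here is a rough sketch of serializing already-parsed entities so that the same drawing always yields byte-identical JSON. The record shape (`handle`, `type`, `layer`, and so on) and the `to_structured_json` name are assumptions for illustration only, not the actual schema or API of the parser described above; it also assumes entity handles are unique.

```python
import json
from collections import defaultdict


def to_structured_json(entities):
    """Group hypothetical entity records by layer and emit canonical JSON."""
    by_layer = defaultdict(list)
    for e in entities:
        by_layer[e["layer"]].append(e)

    doc = {
        "layers": [
            # Sort layers by name and entities by handle so the output does
            # not depend on the order the parser happened to emit them in.
            {"name": name, "entities": sorted(items, key=lambda e: e["handle"])}
            for name, items in sorted(by_layer.items())
        ]
    }
    # sort_keys and fixed separators remove the remaining formatting variance.
    return json.dumps(doc, sort_keys=True, separators=(",", ":"))


entities = [
    {"handle": "2A", "type": "LINE", "layer": "WALLS", "start": [0, 0], "end": [5, 0]},
    {"handle": "1F", "type": "INSERT", "layer": "DOORS", "block": "DOOR_900", "at": [2, 0]},
]
# Same content in a different order produces the same string.
assert to_structured_json(entities) == to_structured_json(entities[::-1])
```

Canonical output like this can be hashed or diffed across runs, which is one way to check reproducibility.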
Is this a service / product you plan to offer outwardly? I'd be interested in learning more. Use case: estimation.
Happy to say that yes. We are in the final round of polishing and will most likely open in a couple of weeks. We are mainly targeting use cases where you can add CAD files into a RAG, analytics, or search pipeline without losing the source of truth or geometry. I will definitely post here when we are ready, so keep an eye out. It will be free during beta, so you can play with it as much as you want.
I think given the scale of projects worked on / being estimated (8 to 9 figure projects) -- it's unlikely we'd be willing to test on our real CAD files. Any chance for enterprise hosted solutions that aren't on the cloud?
That is the catch: yes, on-premise, air-gapped deployment will be provided. These are purely built-in parsers with no cloud dependency. We will also have a managed dedicated environment if you don't want to manage the complex infra yourself. Even in the hosted model, we have a provision to use your own S3 storage (only used for the async parsing pipeline, and the file is wiped after 3 hours), so the file never touches our disk. We considered this from a customer's point of view while designing the system. Early in my career I worked in the GIS industry, so I know the privacy and data security that CAD and GIS files require.