

by the_real_sparky 1264 days ago

Yep, my only idea so far has been to describe the figures generically in text. Doing so through recognition at any level of detail will be extremely tough, as the drawings often differ by variations that would be difficult for a model to understand. It may not matter that much though, as the notes and headings around each figure usually provide a lot of context. So maybe you can get 75% of the way there by identifying each “block” and keeping the text in that area together, so that it can be fed into the embeddings (and thus later the LLM) as a single unit of related information.
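A rough sketch of the “block” idea, not a tested pipeline: it assumes some OCR or PDF extraction step has already produced text snippets with bounding boxes, and it simply merges snippets that sit close together on the page into one chunk. The `TextItem` type, the `group_into_blocks` function, and the `max_gap` threshold are all hypothetical names chosen for illustration, and the sample strings are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextItem:
    """One text snippet with its bounding box in page coordinates (hypothetical type)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def gap(a: TextItem, b: TextItem) -> float:
    """Distance between two boxes along the worse axis; 0 if they overlap or touch."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0.0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0.0)
    return max(dx, dy)


def group_into_blocks(items: list[TextItem], max_gap: float = 20.0) -> list[str]:
    """Merge snippets whose boxes lie within max_gap of each other (transitively)
    and return one text chunk per block, ordered top-to-bottom, left-to-right."""
    parent = list(range(len(items)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison is O(n^2), which is fine for a single page.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if gap(items[i], items[j]) <= max_gap:
                parent[find(i)] = find(j)

    blocks: dict[int, list[TextItem]] = {}
    for i, item in enumerate(items):
        blocks.setdefault(find(i), []).append(item)

    ordered = []
    for members in blocks.values():
        members.sort(key=lambda it: (it.y0, it.x0))
        # After sorting, members[0] has the smallest y0, so it anchors the block order.
        ordered.append((members[0].y0, members[0].x0,
                        "\n".join(it.text for it in members)))
    ordered.sort()
    return [chunk for _, _, chunk in ordered]


if __name__ == "__main__":
    page = [
        TextItem("Figure 3: Pump assembly", 50, 100, 250, 115),
        TextItem("Note: see torque table", 50, 120, 250, 135),
        TextItem("Figure 4: Valve body", 50, 400, 250, 415),
        TextItem("See detail B", 260, 405, 340, 420),
    ]
    for chunk in group_into_blocks(page):
        print(chunk)
        print("---")
```

Each returned chunk is what would be embedded as a single unit. The threshold would need tuning per document layout, and pages where figures are packed tightly together will probably still need manual review.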

It’s frustrating though, as there are often hundreds to thousands of pages of this stuff, with diagrams and drawings placed together on the pages in no particular order. Documentation like this was designed to be dense for printing and to be consumed by a human who is familiar with it from regular use. I’m a bit concerned that the only solution may be paying a technical expert to sit down and convert it all to blocks of text. It would be an expensive endeavor, and even after it’s complete the converted text would have to be continually updated to keep up with changes to the source (which happen often).

If that’s the only solution then I may still go for it, as I think the value to the business of having all knowledge instantly searchable and then automatically summarized will be considerable.