by iamflimflam1 1263 days ago
I think you are right: this will be the key differentiator for anyone building a service like this. As with most machine learning and data science projects, I guess the real work is on the data engineering side of things.

One thing that all these models will lack is the ability to include diagrams (on both the input and output side). Working out a clever way to do that would be very cool.

At the moment there are some difficulties with the GPT interface, the trickiest being the limit on the length of the input prompt. I'm not sure how much fine-tuning helps with this.
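To make the prompt-length problem concrete, here is a minimal sketch of one common workaround: splitting a long document into chunks that each fit under a budget. It assumes word count is an acceptable stand-in for tokens (real tokenizers count differently), and the function name `split_into_chunks` and the default `max_words=500` are hypothetical, not anything from the GPT API.

```python
def split_into_chunks(text, max_words=500):
    """Split text on blank lines into chunks of at most max_words words.

    Word count is only a rough proxy for tokens (an assumption of this sketch).
    """
    chunks, current, count = [], [], 0

    def flush():
        nonlocal current, count
        if current:
            chunks.append("\n\n".join(current))
            current, count = [], 0

    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if not words:
            continue
        # A single paragraph larger than the budget is cut into hard slices.
        while len(words) > max_words:
            flush()
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk if this paragraph would overflow the current one.
        if count + len(words) > max_words:
            flush()
        if words:
            current.append(" ".join(words))
            count += len(words)

    flush()
    return chunks
```

Each chunk can then be sent (or embedded) separately. Splitting on paragraph boundaries first keeps related sentences together where possible.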

My assumption, though, is that OpenAI will improve this, so there isn't much room to differentiate here.

3 comments

Yep, my only idea so far has been to describe the figures generically in text. Doing so through automated recognition at any level of detail will be extremely tough, as the drawings often differ by variations that would be difficult for a model to understand. It may not matter that much though, as the notes and headings around each figure usually provide a lot of context. So maybe you can get 75% of the way there by identifying the “block” and keeping the textual information in that area together, so that it can be fed into the embeddings (and thus later the LLM) as a single unit of related information.
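A minimal sketch of the “block” idea, assuming some earlier extraction step has already produced text fragments and figure positions with page numbers and coordinates. The `Fragment` and `Figure` types, the `group_fragments_by_figure` name, and the `max_distance` threshold are all hypothetical; the real layout analysis is the hard part and is not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Fragment:
    page: int
    x: float
    y: float
    text: str


@dataclass
class Figure:
    figure_id: str
    page: int
    x: float
    y: float


def group_fragments_by_figure(fragments, figures, max_distance=200.0):
    """Attach each text fragment to the nearest figure on the same page.

    Returns a dict mapping a block key to one string of related text,
    ready to be embedded as a single unit.
    """
    blocks = defaultdict(list)
    for frag in fragments:
        best, best_dist = None, max_distance
        for fig in figures:
            if fig.page != frag.page:
                continue
            dist = ((fig.x - frag.x) ** 2 + (fig.y - frag.y) ** 2) ** 0.5
            if dist <= best_dist:
                best, best_dist = fig, dist
        # Fragments too far from any figure go into a per-page leftover block.
        key = best.figure_id if best else f"page-{frag.page}-unassigned"
        blocks[key].append(frag)

    result = {}
    for key, frags in blocks.items():
        # Reading order: top to bottom, then left to right.
        frags.sort(key=lambda f: (f.y, f.x))
        result[key] = "\n".join(f.text for f in frags)
    return result
```

Nearest-figure distance is only one possible heuristic; on dense pages a bounding-box or column-aware rule would probably do better.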

It’s frustrating though, as there are often hundreds to thousands of pages of this stuff, with diagrams and drawings scattered across the pages. Documentation like this was designed to be dense for printing and to be read by a human who is familiar with it from regular use. I’m a bit concerned that the only solution may be paying a technical expert to sit down and convert it all to blocks of text. It would be an expensive endeavor, and even once it’s complete the text would have to be continually updated as the documents change (which happens often).

If that’s the only solution, then I may still go for it, as I think the value to the business of having all knowledge instantly searchable and then automatically summarized will be considerable.

You can ask ChatGPT to create SVGs and at some point in the past you could even trick it into embedding them as base64 images. Not sure if it still works since ChatGPT is unreachable for me currently.

More details:

https://www.reddit.com/r/ChatGPT/comments/zsnscy/i_asked_cha...

Adding diagrams as inputs is probably as easy as feeding in additional CLIP embeddings during training. The trick here will be getting enough training data. Perhaps there are enough StackOverflow questions with images in the question. For output, you could also fine-tune a diffusion model on that data.

I’ve actually talked with ChatGPT and asked it both to output Mermaid diagrams of the architecture we discussed (the context was Kubernetes clusters, namespaces and Pods) and to read diagrams and convert them correctly to kubectl commands that build what the diagram shows.
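For anyone who hasn't seen it, a diagram of that kind is just text, which is why a language model can read and write it. Below is a rough sketch that renders namespaces and Pods as a Mermaid flowchart. The `to_mermaid` function and the input shape (a dict of namespace name to Pod names) are assumptions for illustration, not what ChatGPT produces.

```python
import re


def to_mermaid(cluster):
    """Render {namespace: [pod, ...]} as Mermaid flowchart text."""

    def node_id(*parts):
        # Replace characters such as hyphens so the ids stay simple.
        return re.sub(r"[^A-Za-z0-9_]", "_", "_".join(parts))

    lines = ["graph TD"]
    for namespace, pods in cluster.items():
        lines.append(f"    subgraph {node_id('ns', namespace)} [{namespace}]")
        for pod in pods:
            lines.append(f'        {node_id(namespace, pod)}["{pod}"]')
        lines.append("    end")
    return "\n".join(lines)


print(to_mermaid({"default": ["web", "api"], "kube-system": ["coredns"]}))
```

Going the other way (diagram text to kubectl commands) is the same mapping in reverse, which may be why it works reasonably well on small examples.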