| HN Mirror

Enhancing the comments on the existing data model seems to be the most common approach for sure. I'm implementing this as a data architecture at several clients and I've found creating a whole new logical structure designed for the LLM is really effective. Not being bound by the original data model lets you solve several problems related to the "n-hops" question, avoiding needing the comments, and the semantics of how data engineers define columns. Some more details here [1], but obviously you can implement this totally yourself by hand.

[1] (https://github.com/eloquentanalytics/pyeloquent/blob/main/RE...)