| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by t-writescode 727 days ago

I'm not this person; but, I've been working on LLMs pretty aggressively for the last 6ish months and I have some ideas of how this __could__ be done.

You could plainly ask the LLM something like this as the query goes on:

"Please provide 3 categories that this product could exist under, with increasing specificity in the following format:

  {
     "broad category": "a broad category that would encompass this product, as well as others, for example 'televisions' for a 50" OLED LG with Roku integration",
     "category": "a narrower category that describes this product more aggressively, for example 'Smart Televisions'",
     "narrow category": "an even narrower category that describes this product and its direct competitors, for example OLED Smart televisions"
  }

A next question you'll have pretty quick is, "Well, what if sometimes it returns 'Smart televisions' and other times it returns 'Smart TVs', won't that result in multiple of the same category?" And that's a good and valid question, so you then have another query that takes the categories that have been provided to you and asks for synonyms, alternative spellings, etc, such as:

"Given a product categorization of a specific level of specificity, please provide a list of words and phrases that mean the same thing".

In OpenAI's backend - and many of them, I think, you can have the api run the query multiple times and get back multiple answers. enumerate over those answers, build the graph, and you can have all that data in an easy to read and follow format!

It might not be perfect, but it should be pretty good!

1 comments

_lvbh 726 days ago

> Well, what if sometimes it returns 'Smart televisions' and other times it returns 'Smart TVs', won't that result in multiple of the same category

Text similarity works well in this case. You can just use cosine similarity and merge ones that are very close or ask GPT to compare for those on the edge

link