Hacker News
by ocolegro 844 days ago
Yes, this is on the shortlist.

Do you have any preferred frameworks?

1 comment

I haven't found any frameworks that offer it. The best explanation I've seen of an implementation that takes a stream of unformatted text and maps over it to detect when the topic changes is in this video: https://youtu.be/8OJC21T2SL4?t=1932

They compute embeddings over a window of three sentences, then compute the distance between those embeddings and split the text into "topics" at the largest deltas. It is computationally expensive, since every window has to be embedded.
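
A minimal sketch of that idea as I understand it, not the video's actual code. Assumptions on my part: the window is centred on each sentence, the distance is cosine distance between consecutive windows, and `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model so the example runs with only the standard library. `split_sentences`, `semantic_chunks`, and `n_breaks` are illustrative names too.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def toy_embed(text):
    # Hypothetical stand-in for a real embedding model: word counts.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_distance(a, b):
    # 1 - cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def semantic_chunks(text, window=3, n_breaks=1, embed=toy_embed):
    sentences = split_sentences(text)
    if len(sentences) < 2:
        return [" ".join(sentences)] if sentences else []

    # Embed each sentence together with its neighbours (window centred on i).
    half = window // 2
    vectors = [
        embed(" ".join(sentences[max(0, i - half): i + half + 1]))
        for i in range(len(sentences))
    ]

    # deltas[i] is the distance across the gap between sentence i and i + 1.
    deltas = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]

    # Cut at the n_breaks largest deltas, then restore document order.
    largest = sorted(range(len(deltas)), key=lambda i: deltas[i], reverse=True)
    cut_after = sorted(largest[:n_breaks])

    chunks, start = [], 0
    for i in cut_after:
        chunks.append(" ".join(sentences[start: i + 1]))
        start = i + 1
    chunks.append(" ".join(sentences[start:]))
    return chunks


text = "Cats purr. Cats nap. Cats play. Stocks rise. Stocks fall. Stocks trade."
print(semantic_chunks(text))
# ['Cats purr. Cats nap. Cats play.', 'Stocks rise. Stocks fall. Stocks trade.']
```

With a real model you would pass your own function as `embed` and swap in a distance that works on its vectors. That embedding call per window is where the cost comes from. Picking a fixed number of breaks is my simplification; a threshold on the deltas would work as well.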

I just noticed this has been added to LangChain: https://python.langchain.com/docs/modules/data_connection/do...
Check out https://preprocess.co