by wbarber 1116 days ago
I'm trying to write a function that splits a long document into shorter segments by topic. It's one step in a data processing pipeline: the segments then get embedded for vector search.
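
To make the goal concrete, here is a rough sketch of that segmentation step. It is a toy, hand-rolled Viterbi decoder in plain Python, not pomegranate's API and not the code from the linked question. The topic names, keyword sets, `stay_prob` value, and the function names `emission_logprobs` and `segment_by_topic` are all made-up assumptions for illustration. Each sentence is treated as one observation, each topic as one hidden state, and consecutive sentences that decode to the same state are merged into a segment.

```python
import math

# Hypothetical topics and keyword sets, chosen only for illustration.
TOPIC_KEYWORDS = {
    "cooking": {"recipe", "oven", "flour", "bake"},
    "finance": {"stock", "market", "bond", "interest"},
}


def emission_logprobs(sentence, topics, topic_keywords):
    """Log-score of a sentence under each topic, from smoothed keyword counts."""
    words = set(sentence.lower().replace(".", " ").replace(",", " ").split())
    hits = [len(words & topic_keywords[t]) for t in topics]
    total = sum(hits) + len(topics)  # add-one smoothing
    return [math.log((h + 1) / total) for h in hits]


def segment_by_topic(sentences, topic_keywords=TOPIC_KEYWORDS, stay_prob=0.8):
    """Return a list of (topic, text) segments using Viterbi decoding."""
    if not sentences:
        return []
    topics = list(topic_keywords)
    n = len(topics)
    log_stay = math.log(stay_prob)
    # Remaining probability mass is shared equally among the other topics.
    log_switch = math.log((1 - stay_prob) / (n - 1)) if n > 1 else float("-inf")

    emissions = [emission_logprobs(s, topics, topic_keywords) for s in sentences]

    # Forward pass: best log-score of any path ending in each state.
    score = [math.log(1 / n) + e for e in emissions[0]]
    back = []
    for em in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n):
            trans = [log_stay if i == j else log_switch for i in range(n)]
            best_i = max(range(n), key=lambda i: score[i] + trans[i])
            new_score.append(score[best_i] + trans[best_i] + em[j])
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Backward pass: follow the pointers from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()

    # Merge consecutive sentences that share a decoded topic.
    segments = []
    for sentence, s in zip(sentences, path):
        if segments and segments[-1][0] == topics[s]:
            segments[-1][1].append(sentence)
        else:
            segments.append((topics[s], [sentence]))
    return [(topic, " ".join(parts)) for topic, parts in segments]


if __name__ == "__main__":
    doc = [
        "Preheat the oven and measure the flour.",
        "Bake the recipe for an hour.",
        "The stock market fell today.",
        "Bond yields and interest rates rose.",
    ]
    for topic, text in segment_by_topic(doc):
        print(topic, "->", text)
```

A real pipeline would presumably replace the keyword counts with learned emission distributions over sentence features, which is the part a library such as pomegranate is meant to handle.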

I'm attempting to use v1.0 of the pomegranate Python library, as I get the impression it will be more performant than some of the other common options I looked at. Below is my code. I'm a self-taught developer just trying to solve a niche problem that interests me. I haven't used any of these libraries or built a hidden Markov model before, so be gentle, and many thanks for the help.

You can see my current attempt and the error I'm getting at the link I provided, as well as here on Stack Overflow if you want some internet karma for your kindness: https://stackoverflow.com/questions/76409619/hidden-markov-m...