by cpursley
509 days ago
I've been thinking a lot about how to accomplish various RAG things in Elixir (for LLM applications). PDF is one of the missing pieces, so I'm glad to see work here. The really tricky part is not just parsing out the text (you can just call the pdftotext Unix command-line utility for that), but accurately pulling out things like complex tables in a way that can be chunked and post-processed usefully. I'd love to see something like Unstructured or Marker, but written in Rust (i.e., fast) so that Elixir could call out to it through a NIF. And maybe some kind of hybrid system that uses open LLM models with vision capabilities.

Ref:
- https://github.com/Unstructured-IO/unstructured
- https://github.com/VikParuchuri/marker
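To make the "easy part" concrete, here is a minimal sketch in Python (used purely for illustration, not Elixir) of shelling out to pdftotext and then chunking the result. It assumes pdftotext from poppler-utils is on the PATH. `pdf_to_text` and `chunk_paragraphs` are hypothetical helper names, and the greedy paragraph packing is just one naive chunking strategy. It does nothing for tables, which is exactly the hard part.

```python
import shutil
import subprocess


def pdf_to_text(pdf_path):
    """Extract plain text by shelling out to pdftotext (assumed installed via poppler-utils)."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    # "-layout" tries to preserve the physical page layout;
    # "-" as the output file sends the text to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def chunk_paragraphs(text, max_chars=1000):
    """Hypothetical helper: greedily pack blank-line-separated paragraphs into chunks.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = current + "\n\n" + para if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Usage would be something like `chunk_paragraphs(pdf_to_text("report.pdf"))`, where `report.pdf` is a placeholder path. Tables come out of this as whitespace-aligned text at best, which is why layout-aware tools like the ones linked above exist.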
|
https://github.com/yobix-ai/extractous