by themanmaran
499 days ago
Hey, this is something we know a lot about. I'd say Qwen 2.5 32B would be the best here. We've found GPT-4o and Claude 3.5 benchmark at around 85% accuracy on document extraction, with Qwen 72B at around 70%. Smaller models go down from there. But it really depends on the complexity of the documents and how much information you're looking to pull out. Is it something easy like document_title, or hard like array_of_all_citations?
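To make the easy-field versus hard-field difference concrete, here is a rough sketch of how per-field extraction accuracy could be scored. The scoring rule is an assumption for illustration (exact match for scalar fields, recall for array fields), not necessarily how the numbers above were measured, and the sample record is made up.

```python
def score_field(predicted, expected):
    """Return a score in [0, 1] for one extracted field (assumed scoring rule)."""
    if isinstance(expected, list):
        # Array fields: fraction of expected items the model recovered.
        found = set(predicted) if isinstance(predicted, list) else set()
        if not expected:
            return 1.0 if not found else 0.0
        return sum(item in found for item in expected) / len(expected)
    if predicted is None:
        return 0.0
    # Scalar fields: exact match after trimming whitespace and ignoring case.
    return float(str(predicted).strip().lower() == str(expected).strip().lower())


def score_document(predicted, expected):
    """Average the per-field scores over every field in the expected record."""
    scores = [score_field(predicted.get(name), value) for name, value in expected.items()]
    return sum(scores) / len(scores)


# Made-up ground truth and model output, for illustration only.
expected = {
    "document_title": "Quarterly Report",
    "array_of_all_citations": ["ref A", "ref B", "ref C", "ref D"],
}
predicted = {
    "document_title": "quarterly report ",
    "array_of_all_citations": ["ref A", "ref D"],
}

title_score = score_field(predicted["document_title"], expected["document_title"])
citations_score = score_field(
    predicted["array_of_all_citations"], expected["array_of_all_citations"]
)
overall = score_document(predicted, expected)
```

A single title is either right or wrong, while an array field loses points for every missed item, which is one reason the same model can look much worse on citation-style fields.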
I tried GPT-4o. It's good, but it'll cost a lot if I want to process all the documents.