by nilirl
54 days ago
One thing I've struggled with before is building a collection of data models from a collection of PDF forms. I wanted to abstract away the PDF form by building my own HTML form on top of a data model that could later be used to fill the PDF programmatically. Since I had hundreds of PDFs, I wanted an OCR + LLM pipeline to build a data model for each PDF. Unfortunately, OCR + LLM works ~90% of the time, but sometimes fields are missed or mislabeled in the data model. Does this sometimes get it wrong during programmatic filling? How do you deal with that?
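
To make the "missed or mislabeled" failure concrete, here is a rough sketch of one way such errors could be flagged before filling. It assumes the form's real field names can be listed separately from the OCR + LLM output. `diff_fields` and the sample field names are hypothetical, purely for illustration, and not part of any actual pipeline.

```python
def diff_fields(model_fields, pdf_fields):
    """Compare field names in a generated data model with the form's real field names."""
    model, pdf = set(model_fields), set(pdf_fields)
    return {
        # Present in the PDF form but absent from the generated model.
        "missed": sorted(pdf - model),
        # Present in the model but not in the PDF form (possibly mislabeled).
        "unknown": sorted(model - pdf),
    }


# Hypothetical example: the pipeline mislabeled one field and missed another.
report = diff_fields(
    model_fields=["first_name", "last_name", "dob"],
    pdf_fields=["first_name", "last_name", "date_of_birth", "ssn"],
)
print(report)
```

A check like this only catches name mismatches; a field that is present under the right name but mapped to the wrong meaning would still slip through.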