by jonathanstray
2555 days ago
I didn’t try separating out tables because in many cases the total field isn’t actually “inside” the table, and the other fields I want certainly are not. pdfplumber seems mostly OK at extracting tokens, though it sometimes combines tokens that should be separate. I suspect a few percent of the error actually comes from problems earlier in the data pipeline rather than from the model proper.
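For anyone who wants to check the merged-token issue on their own files, here is a rough sketch, not the pipeline actually used here. `extract_words` and its `x_tolerance` parameter are real pdfplumber API; the `extract_tokens` and `looks_merged` helpers and the regex heuristic are hypothetical, made up for illustration.

```python
import re


def extract_tokens(pdf_path, x_tolerance=3):
    """Return one dict per word pdfplumber finds in the PDF.

    Lowering x_tolerance makes pdfplumber split on smaller horizontal
    gaps, which may separate tokens that were glued together.
    """
    import pdfplumber  # imported lazily so looks_merged works without it

    tokens = []
    with pdfplumber.open(pdf_path) as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            for word in page.extract_words(x_tolerance=x_tolerance):
                tokens.append({
                    "page": page_number,
                    "text": word["text"],
                    "x0": word["x0"],
                    "top": word["top"],
                })
    return tokens


# Hypothetical heuristic: patterns that often indicate two tokens fused
# into one (lowercase followed by uppercase, a letter directly before a
# dollar sign, or a digit directly followed by a run of letters).
_SUSPECT = re.compile(r"[a-z][A-Z]|[A-Za-z]\$|\d[A-Za-z]{3,}")


def looks_merged(text):
    """Guess whether a token is really two tokens stuck together."""
    return bool(_SUSPECT.search(text))
```

Running `extract_tokens` at the default tolerance and again at a lower one (say `x_tolerance=1`), then counting how many tokens `looks_merged` flags in each run, would give a crude estimate of how much error comes from extraction rather than from the model.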