by existencebox
1282 days ago
Question for ya, if you don't mind: I had to do some PDF scraping a while back as part of a side project collecting alternative social/economic data sources. Even within a single site, there were often errors at the fringes, especially when things like layout or styling changed. My concern about giving bad data to users (or needing to constantly check data quality and adjust custom parameters for each target site) held me back from ever feeling confident enough to turn it into a paid product. I don't mean for you to give up your secret sauce here, but I'm wondering whether you ran into this same issue, and what your approach was from a business/customer-expectations perspective?
|
I also have a "pretty good" fallback algorithm if the statement cannot be classified.