|
|
|
|
|
by huijzer
1180 days ago
|
|
Let me know if you got it working. I'm looking for such a thing too! Maybe stripping the styling and Javascript from webpages would work? Did you do the OCR as part of the complete model or did you make it a separate step? Machine learning is usually much better in one step. |
|
For more context: I have this setup as an api that I feed url + typescript definitions to, and have chatgpt output information from the website in the specified typescript definition.
For example, I can use {product_price: float, product_name: str} + a url as the input, and fairly accurately get product price info across ALL product websites. It's kind of amazing that it's able to do this much just based upon the typescript variable names + raw OCR output.