by throwaw33333434
947 days ago
Does anyone have a way to improve PDF data extraction? I want to convert a table in a PDF to a CSV. So far the best performance has come from converting the page to a string and passing it to the API:

  import fitz  # PyMuPDF
  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  pdf_document = fitz.open("foo.pdf")
  page_number = 1
  page = pdf_document.load_page(page_number - 1)  # load_page is 0-indexed
  text = page.get_text("text")

  response = client.chat.completions.create(
      model="gpt-3.5-turbo",
      messages=[
          {
              "role": "system",
              "content": f""" ..... {text} .... """,
          },
      ],
  )

If I try regular ChatGPT it takes 3 minutes to convert the table (I have to press continue). Is there a way to force the API to create the whole CSV? Some sort of retry?
|
It's a bit of a pain to get started with, but if you have an AWS account you can find a UI for using it buried deep within the AWS web console.