by mmh0000
641 days ago
This sounds like a fun and interesting challenge! I am tempted to try it on my own. I’m surprised an LLM actually works for that purpose. It has been my experience with GPT reading PDFs that it’ll get the first few entries from a PDF correct, then just start making up numbers. I’ve tried a few times having GPT-4 analyze a credit card statement, and it adds random purchases and leaves out others. And that’s with a “clean” PDF. I wouldn’t trust an LLM at all on an obfuscated PDF, at least not without thorough double checking.
Absolutely! It's a fucking criminal in that regard. But that's why everything is done with hard python code and the results are tested multiple times. As an assistant, gpt can be fabulous, but the user must run the necessary scripts on their own and be ever ready for a knife in the back at any moment.
Edit: below is an example of what it generated after a lot of debugging and hassle:
import csv
import re  # needed for re.findall below
from datetime import datetime

def clean_and_structure_data(text):
    """Cleans and structures the extracted text data."""
    # Regular expression pattern to match the lottery data
    pattern = r'(\d{2}/\d{2}/\d{2})\s+(E|M)\s+(\d{1})\s-\s(\d{1})\s-\s(\d{1})\s-\s(\d{1})(?:\s+FB\s+(\d))?'
    matches = re.findall(pattern, text)

def save_to_csv(data, output_path):
    """Saves the structured data to a CSV file."""
    # Sort data by date in descending order
    sorted_data = sorted(data, key=lambda x: datetime.strptime(x['Date'], '%m/%d/%Y'), reverse=True)

def main():
    # Path to the text file
    txt_path = 'PICK4.txt'  # Ensure this path points to your actual text file
    output_csv_path = 'output.csv'  # Ensure this path is where you want the CSV file saved

if __name__ == "__main__":
    main()
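For anyone who wants to see the pieces connected, here is a rough sketch of how the excerpt above might be filled in. It is a guess at the shape, not the actual script: the function names `parse_rows`, `check_rows`, and `write_csv`, every column name except `Date`, and the sample lines are all made up for illustration. One thing worth noting is that the pattern captures two-digit years while the sort parses `'%m/%d/%Y'`, so this sketch assumes the dates get normalized to four-digit years somewhere in between.

```python
import csv
import re
from datetime import datetime

# Same pattern as the excerpt: date, E/M marker, four digits, optional FB digit.
PATTERN = re.compile(
    r'(\d{2}/\d{2}/\d{2})\s+(E|M)\s+(\d{1})\s-\s(\d{1})\s-\s(\d{1})\s-\s(\d{1})'
    r'(?:\s+FB\s+(\d))?'
)

# Only 'Date' appears in the original; the other column names are assumptions.
FIELDS = ['Date', 'Draw', 'N1', 'N2', 'N3', 'N4', 'FB']


def parse_rows(text):
    """Turn every regex match into a dict keyed by FIELDS."""
    rows = []
    for match in PATTERN.finditer(text):
        date, draw, *digits, fb = match.groups()
        # Normalize the two-digit year so '%m/%d/%Y' parsing works when sorting.
        full_date = datetime.strptime(date, '%m/%d/%y').strftime('%m/%d/%Y')
        rows.append(dict(zip(FIELDS, [full_date, draw, *digits, fb or ''])))
    return rows


def check_rows(text, rows):
    """Cheap sanity check: each date-looking token should have produced one row."""
    date_count = len(re.findall(r'\d{2}/\d{2}/\d{2}', text))
    return date_count == len(rows)


def write_csv(rows, output_path):
    """Write rows to CSV, newest date first."""
    ordered = sorted(
        rows,
        key=lambda r: datetime.strptime(r['Date'], '%m/%d/%Y'),
        reverse=True,
    )
    with open(output_path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(ordered)


# Invented sample lines, only to show the expected input shape.
sample = "01/02/24 E 1 - 2 - 3 - 4 FB 5\n12/31/23 M 9 - 8 - 7 - 6\n"
rows = parse_rows(sample)
print(rows[0]['Date'], check_rows(sample, rows))
```

The `check_rows` step is the part that matters for the trust problem: if the count of date-like tokens in the raw text does not match the number of parsed rows, something was dropped or mangled, and that shows up without relying on the model's word for it.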