| Claude 100k 1.3 blew me away. I gave it the task of extracting a specific column of information from a table inside a PDF, identifying the column by its header text alone. The text was extracted with tesseract, with no extra layers on top. (For those who haven't tried extracting tables with OCR: it's a non-trivial problem, and the output is a mess.) With more than 40k tokens in context, it extracted the data with 100% accuracy. Changing the prompt to target a different column from the same table worked perfectly as well. When I changed a character in the table in the OCR context to test whether it was somehow hallucinating, it accurately extracted the new data too. One of those "jaw to the floor" moments for me. I did the same task in GPT-4 (limiting the context to 8k tokens), and it worked, but it was ~4x more expensive, and I couldn't feed it the whole document. |
2023 office software already uses 1000x more resources than 1990s office software did. I bet we are ready to do that again.