The big one is that LLMs get lazy on repetitive tasks: they'll skip rows or consolidate entries instead of grinding through every last one. So you need verify-and-re-extract loops rather than single-pass processing. Break the work into sub-agent chunks with explicit correctness criteria defined upfront (e.g., "line items must sum to the stated total"), so the system can check its own output and re-extract whatever fails without a human in the loop. At scale (28M+ fields), this approach actually outperformed expert human labelers!
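Here's a rough sketch of what a verify-and-re-extract loop could look like, using the "line items must sum to the stated total" check as the correctness criterion. Everything named here (`Invoice`, `sums_to_total`, `extract_with_verification`, `fake_extract`, the `items|total` document format) is made up for illustration, and `fake_extract` is just a stand-in for a real LLM call that deliberately drops a row on its first attempt.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Invoice:
    line_items: List[float]
    stated_total: float


def sums_to_total(invoice: Invoice, tolerance: float = 0.01) -> bool:
    """Correctness criterion: line items must sum to the stated total."""
    return abs(sum(invoice.line_items) - invoice.stated_total) <= tolerance


def extract_with_verification(
    document: str,
    extract: Callable[[str, int], Invoice],
    verify: Callable[[Invoice], bool] = sums_to_total,
    max_attempts: int = 3,
) -> Optional[Invoice]:
    """Re-run extraction until the result passes verification or attempts run out."""
    for attempt in range(max_attempts):
        result = extract(document, attempt)
        if verify(result):
            return result
    # Nothing passed the check; the caller decides what to do with this chunk.
    return None


def fake_extract(document: str, attempt: int) -> Invoice:
    """Hypothetical stand-in for an LLM call (assumed 'items|total' input format)."""
    items_part, total_part = document.split("|")
    rows = [float(x) for x in items_part.split(",")]
    if attempt == 0:
        rows = rows[:-1]  # simulate the model skipping the last row
    return Invoice(line_items=rows, stated_total=float(total_part))


if __name__ == "__main__":
    invoice = extract_with_verification("10.0,20.0,5.5|35.5", fake_extract)
    print(invoice)
```

The point of the design is that the check is cheap and mechanical, so a skipped row gets caught by arithmetic rather than by someone eyeballing the output. A chunk that never passes comes back as `None` instead of silently going through.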