|
The "spreadsheet" example video is kind of funny: guy talks about how it normally takes him 4 to 8 hours to put together complicated, data-heavy reports. Now he fires off an agent request, goes to walk his dog, and comes back to a downloadable spreadsheet of dense data, which he pulls up and says "I think it got 98% of the information correct... I just needed to copy / paste a few things. If it can do 90 - 95% of the time consuming work, that will save you a ton of time" It feels like either finding that 2% that's off (or dealing with 2% error) will be the time consuming part in a lot of cases. I mean, this is nothing new with LLMs, but as these use cases encourage users to input more complex tasks, that are more integrated with our personal data (and at times money, as hinted at by all the "do task X and buy me Y" examples), "almost right" seems like it has the potential to cause a lot of headaches. Especially when the 2% error is subtle and buried in step 3 of 46 of some complex agentic flow. |
This is where the AI hype bites people.
A great use of AI in this situation would be to automate the collection and checking of data. Search all of the data sources and aggregate links to them in an easy place. Use AI to search the data sources again and compare against the spreadsheet, flagging any numbers that appear to disagree.
Yet the AI hype train takes this all the way to the extreme conclusion of having AI do all the work for them. The quip about 98% correct should be a red flag for anyone familiar with spreadsheets, because it’s rarely simple to identify which 2% is actually correct or incorrect without reviewing everything.
This same problem extends to code. People who use AI as a force multiplier to do the thing for them and review each step as they go, while also disengaging and working manually when it’s more appropriate have much better results. The people who YOLO it with prompting cycles until the code passes tests and then submit a PR are causing problems almost as fast as they’re developing new features in non-trivial codebases.