by noduerme
1014 days ago
Most of my dev work is in logistics for a very staid industry, one that doesn't even make this list and would have a hard time finding a use for LLMs, because most of the day-to-day jobs involve manual labor. That said, I've been asked "how can AI help us?" quite a bit (the unsaid ending to that sentence being "...help us lay off workers"). Just as a thought experiment, I've considered the pros and cons of automating certain language-heavy aspects of the business. My considered determination is that the penalty from a single fuckup by an LLM in any important scenario would dramatically outweigh all the potential savings. This is probably why few businesses in sectors outside those that already provide white-label customer service are seriously considering implementing these things. There is no financial or technical barrier to doing so; in fact, there's every incentive to try it. Businesses like the one I'm in are just waiting to see exactly how the first movers get dismembered.
After that it's just a gradual creep into LLM ops and madness. Speaking from the other side of that descent into madness:
As obvious as it may be, production LLM tools work on your data. You can't simply use an external benchmark to verify whether your tool works for your use case. You will always have to build your own evaluation processes.
I'd say there are two types of tests you will end up running:
1) Statistical tests, AKA good old ML.
2) Semantic tests. Here be dragons.
Semantic tests break down further based on HOW you are using the LLM (e.g. categorization vs. summarization).
The issue with semantic testing is the amount of human effort. It's more akin to setting exams and grading the answers. Also, your student may be tripping at random.
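The "student tripping at random" part can at least be measured before anyone grades anything. A minimal sketch, assuming your LLM call is wrapped in a function that takes input text and returns a label (`classify`, the sample input and the labels below are all hypothetical): run each input several times and record what fraction of runs agree with the most common answer. Inputs scoring well below 1.0 are the ones worth a human look.

```python
import itertools
from collections import Counter
from typing import Callable, Dict, Sequence


def consistency_report(
    classify: Callable[[str], str],
    inputs: Sequence[str],
    runs: int = 5,
) -> Dict[str, float]:
    """Call `classify` `runs` times per input and return, for each input,
    the fraction of runs that agree with the most common answer."""
    report: Dict[str, float] = {}
    for text in inputs:
        answers = Counter(classify(text) for _ in range(runs))
        _, top_count = answers.most_common(1)[0]
        report[text] = top_count / runs
    return report


# Usage with a stand-in classifier that cycles through canned answers so the
# example is deterministic; a real (hypothetical) LLM call would go here.
canned = itertools.cycle(["damaged", "damaged", "late", "damaged", "damaged"])
print(consistency_report(lambda text: next(canned), ["pallet arrived crushed"], runs=5))
# {'pallet arrived crushed': 0.8}
```

Agreement with the majority answer is only one way to score this; it says nothing about whether the majority answer is correct, which still needs labeled data.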
Categorization: you can simplify it down to almost an ML workflow. Summarization? That takes real effort to verify.
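To illustrate how categorization collapses into a plain ML workflow: once a human has labeled a sample, the LLM's output is just a prediction you can score. A minimal stdlib sketch, assuming you have parallel lists of human ("gold") labels and LLM-assigned labels; the label names in the usage are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def categorization_metrics(gold: Sequence[str], predicted: Sequence[str]) -> Dict:
    """Score LLM-assigned labels against human-labeled gold data:
    overall accuracy plus per-label precision and recall."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the LLM used label p where it shouldn't have
            fn[g] += 1  # the true label g was missed
    per_label = {}
    for label in sorted(set(gold) | set(predicted)):
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        per_label[label] = {
            "precision": tp[label] / prec_den if prec_den else 0.0,
            "recall": tp[label] / rec_den if rec_den else 0.0,
        }
    accuracy = sum(tp.values()) / len(gold) if gold else 0.0
    return {"accuracy": accuracy, "per_label": per_label}


gold = ["damaged", "late", "late", "billing"]
pred = ["damaged", "late", "billing", "billing"]
metrics = categorization_metrics(gold, pred)
print(metrics["accuracy"])            # 0.75
print(metrics["per_label"]["late"])   # {'precision': 1.0, 'recall': 0.5}
```

If scikit-learn is already in your stack, `sklearn.metrics.classification_report` gives similar per-label numbers. The hand-rolled version just shows there's nothing LLM-specific about this step once the gold labels exist.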