by kizer
1168 days ago
I think companies are moving too quickly with AI, and with LLMs in particular. I think the data LLMs are trained on should be fully known - not just sanitized, and certainly not just the "whole web". GPT-4 is unwieldy... it's incredibly powerful but still unpredictable, and it has learned who knows how many "bad patterns", so to speak, that we'll never find out about, since it's basically a giant black box. The ChatGPT version is the least harmful in my opinion; more sinister are the problems that propagate when GPT is used under the hood as a component in other services (such as Bing search).
|
It would indeed be much better if we knew exactly what the training data was for any given model. But models will still hallucinate things that aren't directly in that data yet could somehow be inferred from it, so that alone won't solve the problem.