The SSD would wear out in days while the laptop generates two responses a day. This is like saying you could power your home with AA batteries, yes technically you could but in practice entirely infeasible.
There is no wear on the SSDs, because the weights are just read, they are not written during inference.
For model training, the requirements are very different, and the training of a big LLM cannot be done with home equipment. On the other hand, inference can be done on almost any PC, even for LLMs with thousands of billions of parameters, just very slowly.
The only problem is that the inference becomes limited by the SSD reading throughput. Most of the cheap new personal computers available today can read simultaneously only 2 SSDs (if there are more they share a reading path), which are typically 1 PCIe 5.0 SSD and 1 PCIe 4.0 SSD. This has an upper throughput limit of 24 Gbyte/s, with 15 to 20 GB/s achievable in practice.
Then the speed in token/s is limited by the amount of weights that must be read per inference cycle. The ratio between output tokens and the amount of weights that must be read can be improved by various methods, like batching multiple tasks or using speculative decoding.
Faster SSD access improves performance more than RAM does, at least until all of the model is being cached in RAM. So older and cheaper HEDT platforms with lots of PCIe lanes to attach storage to are best for this approach.
For model training, the requirements are very different, and the training of a big LLM cannot be done with home equipment. On the other hand, inference can be done on almost any PC, even for LLMs with thousands of billions of parameters, just very slowly.
The only problem is that the inference becomes limited by the SSD reading throughput. Most of the cheap new personal computers available today can read simultaneously only 2 SSDs (if there are more they share a reading path), which are typically 1 PCIe 5.0 SSD and 1 PCIe 4.0 SSD. This has an upper throughput limit of 24 Gbyte/s, with 15 to 20 GB/s achievable in practice.
Then the speed in token/s is limited by the amount of weights that must be read per inference cycle. The ratio between output tokens and the amount of weights that must be read can be improved by various methods, like batching multiple tasks or using speculative decoding.