That analogy is heavily stretched with dynamically typed languages. You could say that it's a box that can hold anything, but then the value of the analogy falls apart.
As you point out, all it takes to describe dynamically typed languages is describing a variable as a box that can contain an object. It can contain a fish now and a truck later. What is the problem? Assignments, l-values and such work just fine as far as I can tell.
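To make that concrete, here is a minimal Python sketch of the "fish now, truck later" case. The `Fish` and `Truck` classes and the variable name `box` are made up for illustration, and Python is just one example of a dynamically typed language.

```python
# Hypothetical classes, invented only to illustrate the point.
class Fish:
    pass


class Truck:
    pass


# The same variable holds a Fish first...
box = Fish()
first = type(box).__name__

# ...and a Truck later. No declaration or cast is needed.
box = Truck()
second = type(box).__name__

print(first, second)
```

The variable itself has no declared type here; only the object it currently holds does, and that is the part the "box that can hold anything" wording is trying to capture.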
The analogy works because it's not far removed from what is actually happening in memory. And while laypeople don't know what computer memory is, they have a good intuition about boxes.