It's not only about the smallest unit but about the unit size at every level of the hierarchy. Imagine the packed structure lived on an SSD and was mapped into linear memory addresses. At the smallest scale the cache line is certainly what matters, but there are also benefits to keeping data that is likely to be accessed together contiguous at other scales, e.g. the OS memory page or the SSD I/O unit.
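To make the "every scale" point concrete, here's a rough sketch that counts how many units a set of accesses touches at each granularity. The sizes (64-byte cache line, 4 KiB page, 16 KiB SSD I/O unit) and the field offsets are assumptions picked for illustration, not measurements from any particular system, and `distinct_units` is a made-up helper.

```python
def distinct_units(ranges, unit):
    """Count the distinct `unit`-sized blocks covered by (offset, length) byte ranges."""
    touched = set()
    for offset, length in ranges:
        if length <= 0:
            continue
        first = offset // unit
        last = (offset + length - 1) // unit
        touched.update(range(first, last + 1))
    return len(touched)

# Assumed granularities in bytes; real values depend on the hardware and OS.
GRANULARITIES = {"cache line": 64, "OS page": 4096, "SSD I/O unit": 16384}

# Three 16-byte hot fields, either packed together or spread across the mapping.
packed = [(0, 16), (16, 16), (32, 16)]
scattered = [(0, 16), (5000, 16), (70000, 16)]

for name, size in GRANULARITIES.items():
    print(name, distinct_units(packed, size), distinct_units(scattered, size))
```

With these made-up offsets the packed layout touches one unit at every scale, while the scattered one touches three cache lines, three pages, and two SSD I/O units.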
I'm fairly certain that DDR4 uses 64-byte bursts (64-bit data bus x burst length of 8 per command == 64 bytes per DDR4 operation). I'd expect all modern systems with DDR4 controllers to have cache lines of 64 bytes or more.
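The burst arithmetic above, spelled out as a tiny sketch. `burst_bytes` is just an illustrative helper, and the inputs are the DDR4 numbers quoted in this comment (64-bit bus, burst length 8).

```python
def burst_bytes(bus_width_bits: int, burst_length: int) -> int:
    """Bytes moved by one burst: bus width in bytes times the number of beats."""
    return (bus_width_bits // 8) * burst_length

# DDR4 as described above: 64-bit data bus, 8 beats per burst.
ddr4_burst = burst_bytes(bus_width_bits=64, burst_length=8)
print(ddr4_burst)  # 64
```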
LPDDR4 is a totally different protocol, however. Maybe 32 bytes is optimal on cell phones... I don't know much about that.