That's just a consequence of us sticking to 8-bit bytes (and derivative word sizes), no? Octal would have made a lot more sense if it was, say, 12-bit.
“Us sticking to 8 bit bytes” is a consequence of having preferred BCD to octal in the past, so the causation is reversed (“12-bit words would have made a lot more sense if it was, say, octal.”) [Edited: actually 12 bit words would make sense in either case, as it's three BCD digits or four octal digits]
The Intel 4004 used four bits to manipulate a single BCD digit. The 8086 had BCD instructions. There were many reasons for preferring BCD when designing computer architectures, though my favourite which was already becoming less relevant at 8086-time was that it meant a full column on a punchcard wouldn’t be “all holes” and reduced the likelihood of the cards tearing.
I'm thinking of earlier times, before microprocessors in general. 6-bit bytes were a thing for a while - fairly logical, too, given that it was just enough bits to encode the entirety of ITA2 without needing any control codes to switch between character banks.
The Intel 4004 used four bits to manipulate a single BCD digit. The 8086 had BCD instructions. There were many reasons for preferring BCD when designing computer architectures, though my favourite which was already becoming less relevant at 8086-time was that it meant a full column on a punchcard wouldn’t be “all holes” and reduced the likelihood of the cards tearing.