I would attempt a nibble-based compact instruction representation to reduce external memory bandwidth. Fixed-width instructions kinda suck now that memory is such a bottleneck.
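To make the nibble idea concrete, here is a minimal sketch of one possible encoding. The scheme is an assumption, not something specified above: each 4-bit nibble carries 3 payload bits, and its top bit says whether another nibble follows. The names `encode_nibbles` and `decode_nibbles` are hypothetical.

```python
def encode_nibbles(value):
    """Encode a non-negative int as a list of nibbles, least significant group first.

    Assumed layout per nibble: bit 3 = "more nibbles follow", bits 2..0 = payload.
    """
    if value < 0:
        raise ValueError("only non-negative values are supported")
    nibbles = []
    while True:
        payload = value & 0b111
        value >>= 3
        if value:
            nibbles.append(0b1000 | payload)  # continuation bit set
        else:
            nibbles.append(payload)           # last nibble of this field
            return nibbles


def decode_nibbles(nibbles, pos=0):
    """Decode one field starting at nibbles[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        nibble = nibbles[pos]
        pos += 1
        value |= (nibble & 0b111) << shift
        shift += 3
        if not (nibble & 0b1000):
            return value, pos
```

With this layout, fields with values 0 to 7 cost one nibble and 8 to 63 cost two, so small opcodes, register numbers, and immediates take far less than a fixed 32-bit word. The price is that the decoder has to walk the stream serially to find field boundaries.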
I've long wished for a middle ground between FPGAs and CPUs - namely a CPU with user-changeable instructions.
Have a CPU that is CISC on the outside (but internally a microcoded TTA), with a large chunk of the microcode user-writable. You would have push-inst and pop-inst: push-inst loads new microcode for an instruction into the microcode store and copies the old microcode onto the stack, and pop-inst does the opposite. It keeps the advantages of fixed-width instructions while, depending on how the microcode is encoded, potentially giving significant memory savings.
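A rough software model of the push-inst/pop-inst idea, as a sketch only. Everything here is an assumption for illustration: the `MicrocodeStore` class, the choice to record the opcode alongside the saved microcode so that pop-inst needs no operand, and representing microcode as an opaque tuple.

```python
class MicrocodeStore:
    """Toy model of a user-writable microcode store with a save/restore stack."""

    def __init__(self, num_opcodes):
        # One writable microcode entry per opcode; None means "undefined".
        self.slots = [None] * num_opcodes
        # Stands in for the memory stack that receives displaced microcode.
        self.stack = []

    def push_inst(self, opcode, microcode):
        """Install new microcode for opcode, saving the old entry on the stack."""
        self.stack.append((opcode, self.slots[opcode]))
        self.slots[opcode] = microcode

    def pop_inst(self):
        """Undo the most recent push_inst by restoring the saved entry."""
        opcode, old_microcode = self.stack.pop()
        self.slots[opcode] = old_microcode
```

Usage would look like a function prologue and epilogue: a routine pushes the custom instructions it wants, runs its hot loop, then pops them so the caller's instruction set is unchanged.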
The ARC processor line from Synopsys does this commercially, I believe. RISC-V seems to be trying to support this sort of thing; there is reserved opcode space for implementation-specific extensions.
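For reference, the RISC-V base opcode map labels four major opcodes (bits 6:0 of a 32-bit instruction) as custom-0 through custom-3. A small sketch that checks whether an instruction word falls in that space; the helper name `custom_slot` is hypothetical.

```python
# Major opcodes (bits 6:0) that the RISC-V base opcode map marks as custom.
CUSTOM_OPCODES = {
    0b0001011: "custom-0",
    0b0101011: "custom-1",
    0b1011011: "custom-2",  # listed as custom-2/rv128 in the opcode map
    0b1111011: "custom-3",  # listed as custom-3/rv128 in the opcode map
}


def custom_slot(instruction_word):
    """Return the custom slot name for a 32-bit instruction word, or None."""
    return CUSTOM_OPCODES.get(instruction_word & 0x7F)
```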