You can make a new instruction (or repurpose an existing one) that accesses physical memory bypassing the page walk, which would be faster. You can also make instructions that bypasses some checks (like privilege checks) and squeeze some tiny performance. Note this would introduce security issues though, so you could only use it on trusted software.
It's interesting to think about the sorts of things we could do if we had low level control over our hardware. Unfortunately things seem consistently headed in the opposite direction.
RISC-V wouldn’t help here at all. There’s nothing about RISC-V that prevents a CPU manufacturer putting in custom instructions and not documenting them.
It's not especially likely you'll be able to make things faster unless you're really strapped for fetch bandwidth. The reasoning is that most useful uops will already have instructions which directly decode to them (ie you're not going to get a faster load with ucode than if you just used a normal load instruction). In fact, given that ucode branches are (typically?) only statically predicted [1], you may well see worse performance if you need to use ucode branches with non-statically predictable directions.
CPU vendors typically gain performance when adding new instructions because they add new fancy uops. For example, x86 has AES instructions which lead to uops which (I imagine) exercise some hardware AES block. Vendors are not simply implementing AES in pure ucode as this wouldn't really gain any performance advantage over doing AES directly in software.