Working with raw bytes is one option that's being researched. That way the model isn't tied to any one modality: sound, images, text, video, etc. Anything goes in, anything comes out. It's hard to say whether it's feasible with current compute, though, because without a tokenizer to compress the input, the sequences the model has to process get much longer.
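To make the trade-off concrete, here is a minimal sketch of what "bytes in, bytes out" looks like on the input side. The helper names `to_byte_ids` and `from_byte_ids` are made up for illustration, and splitting on whitespace is only a crude stand-in for a real tokenizer, so treat the length comparison as a rough intuition rather than a measurement.

```python
# Sketch only: every input, whatever the modality, becomes a sequence of
# integers in range(256), so the "vocabulary" is fixed at 256 symbols.

def to_byte_ids(data: bytes) -> list[int]:
    """Map raw bytes to model input ids (hypothetical helper)."""
    return list(data)

def from_byte_ids(ids: list[int]) -> bytes:
    """Map model output ids back to raw bytes (hypothetical helper)."""
    return bytes(ids)

# Text goes through UTF-8, but nothing here is text-specific.
text = "Anything goes in, anything comes out."
text_ids = to_byte_ids(text.encode("utf-8"))

# Arbitrary binary data (e.g. a slice of an image or audio file) works the same way.
blob = bytes([0, 255, 16, 128])
blob_ids = to_byte_ids(blob)

# Crude stand-in for a tokenizer: one unit per whitespace-separated word.
word_count = len(text.split())

print("byte sequence length:", len(text_ids))
print("word-level sequence length:", word_count)
print("round trip ok:", from_byte_ids(text_ids).decode("utf-8") == text)
```

The byte sequence is several times longer than the word-level one for the same sentence, and that extra length is where the compute question comes from.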