At that point, the distinction between threads and processes gets really muddy even when you're not using wasm. clone(2) is just a superset of fork(2) that lets you decide whether to share memory or not (among other resources).
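To make the clone(2) point concrete, here is a minimal sketch for Linux with glibc. The names `run_clone`, `child_fn` and `counter` are made up for illustration and are not from any real codebase. The same call behaves like fork(2) without `CLONE_VM` (the child writes to its own copy) and like a thread with it (the child's write is visible to the parent).

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)

/* Hypothetical shared variable: the child writes it, the parent reads it back. */
static volatile int counter = 0;

/* Entry point for the child created by clone(2). */
static int child_fn(void *arg) {
    (void)arg;
    counter = 42;
    return 0;
}

/*
 * Run child_fn in a child created with clone(2) and the given extra flags,
 * wait for it to exit, and return the value of counter as the parent sees it.
 * Returns -1 if allocation, clone or waitpid fails.
 */
static int run_clone(int extra_flags) {
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    counter = 0;

    /* The stack grows downward on common architectures, so pass its top.
       SIGCHLD as the exit signal lets a plain waitpid() reap the child. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE, extra_flags | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    if (waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }

    int seen = counter;
    free(stack);
    return seen;
}
```

Calling `run_clone(0)` should give 0, since the child gets its own copy of the address space just as with fork(2), while `run_clone(CLONE_VM)` should give 42, since both sides share one address space. The only difference between "process" and "thread" here is a single flag.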
I still think one could use a stackless clone(2) to get rid of the TLB flush when using threaded open(2). The child would have to get by without touching the stack, however, and that's rather ugly to program.
Technically, you are right, but if multiple processes use the same shared memory, is there a real, meaningful difference? You can do efficient, parallel computation in a browser; does it really matter how it is executed under the hood?