Show your code, or show yourself the door. There are so many native Mac and iOS apps out there right now perfectly capable of rendering Markdown and streaming text. You just gotta wonder what this guy’s excuse is.
OP says "you want to select a whole Markdown document built from SwiftUI primitives", but who wants that? What sort of product thinking tells us we want that? That sounds like a document editor, which has been hard to build for decades and sounds out of scope for an LLM chat UI. Everyone has landed on only supporting selection within each contiguous block, with a copy button for the entire message.
LLMs are often used to generate Markdown because they're quite good at it and unlike HTML it's very forgiving.
Rendering text into things like chat bubbles or even just generic output panes as it comes in is a massive pain. Every new word requires redoing layout, detecting LTR versus RTL flows and overrides, figuring out word breaks and line breaks, possibly combined with resizing the containing UI element (which involves measuring the render space, which is often implemented by rendering to a dummy canvas and finding out the limits).
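To put a rough number on that, here is a toy sketch of the naive approach, where the whole message is laid out again on every incoming token. Python's `textwrap` is only a stand-in for a real layout engine (no bidi, shaping, or measurement), and `stream_naive` and the token list are made up for illustration.

```python
import textwrap

def stream_naive(tokens, width=40):
    """Re-wrap the entire message every time a token arrives.

    Returns the final wrapped lines and the total number of characters
    the wrapper was handed across all updates (a crude proxy for layout work).
    """
    text = ""
    chars_processed = 0
    lines = []
    for token in tokens:
        text += token
        # Naive strategy: throw away the old layout and redo all of it.
        lines = textwrap.wrap(text, width=width)
        chars_processed += len(text)
    return lines, chars_processed

# Hypothetical stream: 200 tokens of five characters each.
tokens = ["word "] * 200
lines, work = stream_naive(tokens)
print(len("".join(tokens)), "chars in message,", work, "chars laid out")
```

For this 1,000-character message the loop hands 100,500 characters to the wrapper, so the work grows quadratically with message length. A real renderer would presumably cache finished lines and only redo the tail, which is part of why this is harder than it looks.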
Document editors have it relatively easy because humans type at a relatively low speed and pasting is a single operation (although pasting large amounts of text does hit the render performance of the UI). They also often provide relatively limited features on phones.
If you want to render something like ChatGPT with similar features in native UI, you're going to need to find a fully-fledged document component or build one yourself. And, as it turns out, we have document components that work quite well: web engines.
If you embed a webview rendering just HTML and CSS, you get better performance, features, and accessibility than any home-grown renderer will provide. And with every major OS coming with a browser built in, it won't even bloat your app.
HTML is famously forgiving as well - that's the whole reason XHTML failed, because one typo in the latter will make your entire web page fail to render with an error. Markdown is probably a little more forgiving, which mattered more with previous-gen LLMs with small context windows. Any near-frontier model should have no problem generating valid HTML.
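A small illustration of the difference, using only the Python standard library. Caveat: `html.parser` is a lenient tokenizer, not a browser's full HTML parser, so this only approximates what a web engine does; `TextCollector`, `parse_as_html`, and `parse_as_xml` are names made up for the sketch.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Deliberately malformed: the <b> tag is never closed.
BROKEN = "<p>Hello <b>world</p>"

class TextCollector(HTMLParser):
    """Collects text content and ignores tag-balance problems."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def parse_as_html(markup):
    # The lenient path: unbalanced tags do not stop the parse.
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)

def parse_as_xml(markup):
    # The strict path: XML well-formedness rules reject the document.
    try:
        ET.fromstring(markup)
        return "ok"
    except ET.ParseError as err:
        return f"error: {err}"

html_result = parse_as_html(BROKEN)
xml_result = parse_as_xml(BROKEN)
print(html_result)
print(xml_result)
```

The HTML path still yields the text "Hello world", while the XML path reports a parse error for the unclosed tag, which is the all-or-nothing behaviour XHTML pages were subject to.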
Also a lot of LLMs are trained specifically to expect Markdown in their instructions, OpenAI's models in particular (Anthropic expects more pseudo-HTML/XML, but that is different from real HTML/XML).