Judging from your other comments, I'm sure you have some experience in this space and your own views on the different options available.
VNC makes sense for a lot of reasons. Lots of established servers and clients. Even a solid, free and performant web client--hello Apache Guacamole.
The differential rectangle streaming method that VNC uses, where it sends only the smallest bounding rectangle that's actually changing, greatly reduces bandwidth use. VNC is also application agnostic: as far as I know, it just abstracts the viewport streaming and interaction wire protocol away from whatever is being displayed on the viewport. VNC has also been proven in a number of production deployments. I'm not an expert, but you could probably also use VNC with Xvfb or other lightweight virtualization components that don't require a full desktop setup. I like VNC and I think it's cool.
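To make the differential rectangle idea concrete, here's a rough sketch of finding the smallest bounding rectangle that changed between two frames and cropping just that region for sending. It is only an illustration of the general technique, not VNC's actual wire encoding; frames are assumed to be plain lists of pixel rows, and the names `dirty_rect` and `crop` are made up for this example.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]           # rows of pixel values (assumed representation)
Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def dirty_rect(prev: Frame, curr: Frame) -> Optional[Rect]:
    """Return the smallest rectangle covering every changed pixel, or None."""
    min_x: Optional[int] = None
    min_y: Optional[int] = None
    max_x = max_y = -1
    for y, (old_row, new_row) in enumerate(zip(prev, curr)):
        for x, (old, new) in enumerate(zip(old_row, new_row)):
            if old != new:
                min_x = x if min_x is None else min(min_x, x)
                if min_y is None:
                    min_y = y  # rows are scanned top-down, so the first hit is the top edge
                max_x = max(max_x, x)
                max_y = y
    if min_x is None or min_y is None:
        return None  # nothing changed, nothing to send
    return (min_x, min_y, max_x - min_x + 1, max_y - min_y + 1)


def crop(frame: Frame, rect: Rect) -> Frame:
    """Cut out only the pixels inside rect; this is what would go on the wire."""
    x, y, w, h = rect
    return [row[x:x + w] for row in frame[y:y + h]]


# Hypothetical 4x4 frames where two pixels change.
previous = [[0] * 4 for _ in range(4)]
current = [row[:] for row in previous]
current[1][1] = 9
current[2][2] = 9

rect = dirty_rect(previous, current)
patch = crop(current, rect) if rect else []
```

Here only a 2x2 patch plus its position needs to be sent instead of all 16 pixels, which is where the bandwidth saving comes from.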
I don't think you're asking why I didn't use VNC for this particular Show HN project; I think you're speaking to the larger context of streaming applications. Even so, I'll make a point about that here: the point of this little show is to do it in the simplest way possible, using old HTML technology and no client-side JavaScript, plus an off-the-shelf, server-side instrumented browser in the form of Puppeteer. You could read that as a theme: do a remote browser project in the simplest way possible. If you consider a thematic series of such projects, another possible member of that series could be the simplest remote browser setup achieved using VNC. That could be cool... I hadn't thought about that, or about that thematic series, before considering your question in this context.
But this show was more about exploring what's possible by picking off-the-shelf Puppeteer and off-the-shelf ancient web technologies, and writing the minimal code necessary to provide a sort of MVP remote browser. There's also a hint of a retro tech aesthetic in using really old-school tech like MJPEG and the button image.
But aside from the context of this particular show, I'll try to address your question in a more general sense, for application virtualization or browser virtualization.
When I started my remote isolated browser project I looked at using VNC but, if I recall correctly, two things dissuaded me from pursuing it further at the time.
The first was that I sensed I would want and need more control over the actual protocol, both for streaming frames to me and for sending interactions to the server. I sensed that I would want to reproduce a browser experience as faithfully as possible, with sound and multiple tabs and file upload and download and... And also that I would want the flexibility to style, design, alter and add to the functionality and the chrome of--i.e. the user interface of--the browser however the hell I wanted. Now, maybe you could set up VNC with headless Chromium and use it with Xvfb, so you could still just stream the viewport and then provide your own browser chrome for the tabs, the omnibox and so on. But I don't know, and I was sure I would want a lot of control, which I would have if I could fully control what actions were possible in the protocol. So I wanted a way to fully access the capabilities of the Chrome remote debugging protocol, which I was confident could give me a good basis for virtualizing a browser that was running remotely.
The second thing that dissuaded me from pursuing VNC further at that time was that VNC, or at least the clients and servers that I looked into, did not support things like file upload or download or audio transmission. I considered those essential to have at some point, or at least to have the option of having, so I would not pursue nor proceed with something that did not provide those capabilities. So that was pretty much a blocker for me to use VNC. I think there are some VNC clients which support audio for professional or enterprise users for a fee--which I just discovered now while doing a bit of research to answer your question--so maybe that's fortuitous, it could be a good business use case for me!
But fleshing that out a little more, it's also necessary to capture application-domain-specific semantic actions from the user rather than just clicks and pointer movements. What I mean by that is maybe we want to capture clicking on a select element, or activating a tab, or responding to a modal dialog such as a basic auth prompt or something like that. Now, as far as I know VNC runs on pixel positions rather than more application-domain-specific semantic instructions, and because of that it's not always going to be reproducible: if you want to replay a series of actions, say for automation, on a different device or even on the same website at a different point in time, those pixel positions may have changed. If you're capturing semantic interactions rather than just pixel positions, that's going to help you replay things for more robust and reliable automations.
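The replay argument above can be sketched as follows. This is a toy model under stated assumptions, not how the project or VNC records input: a "layout" is assumed to be a dict from a selector string to a bounding box, and `PixelClick`, `SemanticClick`, `hit_test` and the two replay functions are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)
Layout = Dict[str, Box]          # selector -> where that element currently is


@dataclass(frozen=True)
class PixelClick:
    x: int
    y: int


@dataclass(frozen=True)
class SemanticClick:
    selector: str  # identifies the target element rather than a position


def hit_test(layout: Layout, x: int, y: int) -> Optional[str]:
    """Return the selector of the element under (x, y), if any."""
    for selector, (bx, by, bw, bh) in layout.items():
        if bx <= x < bx + bw and by <= y < by + bh:
            return selector
    return None


def replay_pixel(action: PixelClick, layout: Layout) -> Optional[str]:
    # A pixel click lands on whatever happens to be at that position now.
    return hit_test(layout, action.x, action.y)


def replay_semantic(action: SemanticClick, layout: Layout) -> Optional[str]:
    # A semantic click finds its target wherever it has moved to.
    return action.selector if action.selector in layout else None


# The page when the actions were recorded.
recorded = {"#login": (10, 10, 80, 20)}
# The same page later: a banner has pushed the login button down.
later = {"#banner": (0, 0, 200, 50), "#login": (10, 60, 80, 20)}

pixel = PixelClick(x=20, y=15)
semantic = SemanticClick(selector="#login")
```

Replaying `pixel` against `later` hits the banner instead of the login button, while `semantic` still resolves to `#login`, which is the robustness being described.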
And I've never made a secret of the fact that this remote isolated browser was initially created as a delivery layer for a collaborative web scraping application that's meant to run on any device. That is still the goal, and the standard against which features added to it should be considered, along with replicating a regular browser experience as much as possible, which only increases the utility of any automations you can do because they're more faithful to what can actually happen in a regular browser.
So hopefully all that goes some way to answering your question and providing others some interesting information.
The third thing was not so much a blocker as an overarching meta consideration: I knew I wanted to dive into a project and learn a lot from it. Just providing the simplest possible VNC layer would have taken away the opportunity to really get into streaming, protocol design and the other things I've learned through working on this. I would have replaced those with learning what I considered a lot more annoying: how to glue the existing legacy VNC stack together with the operating system or browser capabilities I wanted. Those domain-specific, legacy integration learnings just didn't feel as interesting to me as the more general, and I think more powerful, learning of how to create a protocol that really provides the capabilities of browser virtualization.
But the final thing I want to say about this, right now at least, is that VNC's differential rectangle streaming is an inspiration for me at the moment, because I'm thinking about using the layer tree API provided in Chrome remote devtools to further reduce bandwidth and improve the streaming capability.
Obviously that comes with some bookkeeping and difficulties, like keeping track of the matrix transformation on each layer, the actual scroll offset and the relative position of different layers, so basically doing the compositing on the client side. But all of those things should be pretty simple, and it should greatly improve streaming performance, which is already pretty good, thank you very much.
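As a rough idea of what that bookkeeping could look like, here's a minimal sketch that resolves where a layer should be drawn on the client by walking up its parents and accounting for their offsets and scroll positions. It handles translation and scroll only and leaves out the matrix transforms entirely; the `Layer` fields and `absolute_rect` are assumptions for illustration, not the actual devtools layer tree data structures.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Layer:
    layer_id: str
    parent_id: Optional[str]  # None for the root layer
    offset_x: float           # position relative to the parent layer
    offset_y: float
    width: float
    height: float
    scroll_x: float = 0.0     # how far this layer's content is scrolled
    scroll_y: float = 0.0


def absolute_rect(layers: Dict[str, Layer], layer_id: str) -> Tuple[float, float, float, float]:
    """Return (x, y, width, height) of a layer in viewport coordinates."""
    layer = layers[layer_id]
    x, y = layer.offset_x, layer.offset_y
    parent_id = layer.parent_id
    while parent_id is not None:
        parent = layers[parent_id]
        # A child moves with its parent and shifts opposite to the parent's scroll.
        x += parent.offset_x - parent.scroll_x
        y += parent.offset_y - parent.scroll_y
        parent_id = parent.parent_id
    return (x, y, layer.width, layer.height)


# Hypothetical tree: root -> scroller (scrolled down 250px) -> tile.
layers = {
    "root": Layer("root", None, 0, 0, 800, 600),
    "scroller": Layer("scroller", "root", 0, 100, 800, 500, scroll_y=250),
    "tile": Layer("tile", "scroller", 20, 300, 100, 50),
}

tile_rect = absolute_rect(layers, "tile")
```

With something like this, the client could keep each layer's pixels cached and only move them when a scroll offset changes, rather than receiving a new image.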
The thing that I really wanted to get right for streaming performance was flexibility while maintaining responsiveness. For me, high image quality was definitely not as important as maintaining responsiveness. So when you interact with the browser it gives you feedback and shows you what's happening, even if the image quality is lower given your bandwidth conditions at that time. Achieving that tight, low-latency responsiveness is the most important thing.
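One simple way to express "responsiveness first, quality second" is to pick the best image quality whose frame can still be delivered within a latency budget, and to fall back to the lowest quality when nothing fits. This is only a sketch of that trade-off under assumed numbers; `pick_quality`, the 100 ms budget and the frame size table are all hypothetical and not taken from the project.

```python
from typing import Dict


def pick_quality(
    bandwidth_bytes_per_s: float,
    frame_sizes: Dict[int, int],
    budget_s: float = 0.1,
) -> int:
    """Pick the highest quality level whose frame fits in the latency budget.

    frame_sizes maps a quality level to an estimated encoded frame size in bytes.
    """
    if bandwidth_bytes_per_s <= 0:
        return min(frame_sizes)  # no usable estimate: stay as small as possible
    affordable = [
        quality
        for quality, size in frame_sizes.items()
        if size / bandwidth_bytes_per_s <= budget_s
    ]
    # If even the smallest frame blows the budget, still prefer the smallest one.
    return max(affordable) if affordable else min(frame_sizes)


# Assumed estimates for three JPEG quality levels.
sizes = {20: 15_000, 50: 40_000, 80: 90_000}

slow = pick_quality(100_000, sizes)
medium = pick_quality(500_000, sizes)
fast = pick_quality(2_000_000, sizes)
```

The image gets blurrier on a slow link, but each frame still arrives quickly enough that clicks and scrolls feel acknowledged.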
What I'm currently working on, and have nearly completed, is fleshing out the high end: when network conditions permit, so when there's enough bandwidth that responsiveness is not an issue, we're going to switch over to a much higher frame rate type of streaming for much more fluid viewport streaming.
And then the longer-term project that I just hinted at is to use the layer tree API to see how much more that approach could reduce latency and improve streaming performance by basically doing what VNC does, because as far as I know the layers are basically the minimal changing rectangles. Funnily enough, that would be an approach inspired by VNC, although very much unlike VNC in how it handles the types of interactions I'm going to send, and so on.
Judging by the length of this answer, and how I did with voice typing, next time I should probably just make a podcast or a YouTube video and send a link to it in the HN comment, so people who are interested can refer to that.