Hacker News
by jagracey 2726 days ago
Great work, yz-yu. I hope you've learned a lot; I've personally found the session replay space to be incredibly rewarding.

However, as a session replay industry competitor and a former security researcher for most industry players, I caution anyone thinking of using a side project like this on production applications to proceed slowly and with care.

Security and privacy are extremely hard to get right here. The tricky thing about session replay analytics is that attackers have a huge attack surface, and a compromise means gaining a treasure trove of all user data. Replay is, by its nature, a form of XSS. Modern security features help (like CSPs and the iframe sandbox attribute), but browser changes can cause issues.

Some of the challenges:
- CSPs can often be bypassed using Google API libraries, <Object/>, <SVG>.
- Blacklisting <SCRIPT/> tags can often be bypassed with an XML namespace.
- CSS-based data or password exfiltration.
- Clickjacking, "data:" URLs, etc.
- Could you imagine a web request proxy server deploying Service Workers?
- postMessage() from further nested frames.

Substantial work goes into sandboxing replay environments and limiting PII. Defense in depth is particularly important here. Enterprise-level research, auditing, monitoring, and care should be taken seriously.
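
To illustrate the point above about <SCRIPT/> blacklists and XML namespaces, here is a minimal TypeScript sketch. It assumes a replayer that filters serialized tag names as plain strings; `localName`, `isBlockedTag`, and `naiveIsBlockedTag` are hypothetical helpers written for this example and are not part of rrweb or any other library. It only shows why an exact string match misses a namespaced tag, and it is not a complete sanitizer.

```typescript
// Hypothetical helpers for illustration only; not an API of any real library.

// Local names that the replayer refuses to recreate.
const BLOCKED_LOCAL_NAMES: string[] = ["script"];

// Strip letter case and any XML namespace prefix ("svg:script" -> "script").
function localName(tagName: string): string {
  const normalized = tagName.trim().toLowerCase();
  const colon = normalized.lastIndexOf(":");
  return colon === -1 ? normalized : normalized.slice(colon + 1);
}

// Namespace-aware check: compares the local name, not the raw tag string.
function isBlockedTag(tagName: string): boolean {
  return BLOCKED_LOCAL_NAMES.indexOf(localName(tagName)) !== -1;
}

// The kind of naive check the comment warns about: exact match only.
function naiveIsBlockedTag(tagName: string): boolean {
  return tagName === "SCRIPT";
}
```

With these definitions, `naiveIsBlockedTag("svg:script")` returns false while `isBlockedTag("svg:script")` returns true. In a real browser the DOM already exposes `Element.localName` and `Element.namespaceURI`, and an allowlist of known-safe elements is generally a stronger design than any blacklist.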

4 comments

I definitely learned a lot and enjoyed the process. Thanks for your really important suggestions.

Quote from my introduction blog post:

===

Today we already have some commercial session replay products like Logrocket, Fullstory, etc.

If you are just looking for a ready-to-use tool and would like to pay for its service, I would recommend you to use the above products, because they have well-tested backend services that can store the data for you and perform some higher order features.

===

So I don't think rrweb is a competitor of these commercial products.

Actually, I would like to see rrweb grow into a base for many commercial products in the future, which means it would handle most of the privacy and security issues, so that other developers can build many fancy projects based on it without spending time on the hard part again and again.

(Sorry your account was rate-limited! New accounts are subject to that restriction but it's definitely not intended for cases like this. I've marked your account legit so it won't happen again.)
My current job has live, in-house QA testers.

One idea I've been toying with is a tool that records their movements through our site when testing, aggregates them, and then shows the hot and cold spots of our site that they hit on their full-site run-throughs.

I haven't really dug into your code yet, but it sounds like this might be a good base for that, or am I way off base in thinking that?
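
The aggregation step in this idea could look roughly like the TypeScript sketch below. It assumes the recorder has already produced a flat list of click coordinates in page pixels; the `ClickPoint` shape and the `buildHeatmap` function are hypothetical names for this example and do not reflect rrweb's actual event format.

```typescript
// Hypothetical input shape: one recorded interaction in page coordinates.
interface ClickPoint {
  x: number;
  y: number;
}

// Bin points into a grid of square cells and count hits per cell.
// Cells with high counts are "hot"; cells left at 0 are "cold".
function buildHeatmap(
  points: ClickPoint[],
  width: number,
  height: number,
  cellSize: number
): number[][] {
  const cols = Math.ceil(width / cellSize);
  const rows = Math.ceil(height / cellSize);

  // Start with an all-zero grid of rows x cols.
  const grid: number[][] = [];
  for (let r = 0; r < rows; r++) {
    const row: number[] = [];
    for (let c = 0; c < cols; c++) {
      row.push(0);
    }
    grid.push(row);
  }

  for (let i = 0; i < points.length; i++) {
    const p = points[i];
    // Ignore points that fall outside the page bounds.
    if (p.x < 0 || p.y < 0 || p.x >= width || p.y >= height) {
      continue;
    }
    const col = Math.floor(p.x / cellSize);
    const row = Math.floor(p.y / cellSize);
    grid[row][col] += 1;
  }
  return grid;
}
```

Merging the points from every tester's session into one array before calling `buildHeatmap` gives the combined view; a smaller `cellSize` trades smoother aggregation for finer detail.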

You are absolutely correct with regard to privacy and security for any public-facing application. However, I still think this is fantastic for very specific production use cases. For instance, I am working on a very large but entirely internal web application. The application includes trade secrets that we would never want exposed to a third party, but it is not the kind of application that would ever contain personal information. So this is pretty much perfect for our production use case, since the recorded information never has to leave our control and the only users of our application would be employees. Any concern about a bad actor trying to harm our system or steal information is handled at an entirely different level.
To add to the key point about privacy, this research from Princeton is really illuminating and scary: https://freedom-to-tinker.com/2017/11/15/no-boundaries-exfil...
I might be alone on this one, but I feel the Freedom-to-Tinker report was unfair to the analytics providers. I know the folks in the industry work really hard towards privacy and security. They go out of their way to make it clear that not everything is automatically censored, and provide easy tools to limit data and visualize what is and isn't recorded. Holding PII and other sensitive data truly is a liability; nobody wants it.

Companies like Walgreens should be entirely to blame.

I really do appreciate how the author(s) of that report uncovered how those services were used in practice.

[I'm not with any party listed in the report]
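
As a rough picture of what "tools to limit data" can mean on the recording side, here is a minimal TypeScript sketch of masking input values before they are stored. The `InputSnapshot` shape, `maskValue`, and `maskInputSnapshot` are hypothetical names for this example, not the API of any provider mentioned in the thread.

```typescript
// Hypothetical shape of a captured form field; real recorders differ.
interface InputSnapshot {
  type: string;  // e.g. "text", "password", "email"
  value: string; // what the user typed
}

// Replace every non-whitespace character with "*", keeping the length.
function maskValue(value: string): string {
  return value.replace(/[^\s]/g, "*");
}

// Password fields are always masked; other fields only when maskAll is set.
function maskInputSnapshot(
  snapshot: InputSnapshot,
  maskAll: boolean
): InputSnapshot {
  const mustMask = maskAll || snapshot.type.toLowerCase() === "password";
  return {
    type: snapshot.type,
    value: mustMask ? maskValue(snapshot.value) : snapshot.value,
  };
}
```

Masking on the client, before the data leaves the browser, means the sensitive value never reaches the replay backend at all. Defaulting `maskAll` to true and opting individual fields out is the more conservative choice.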

I would love to read more about these kinds of security issues. Maybe you have a blog about this?

Or some great resources maybe?

I know of the obvious ones like OWASP; but that only scratches the surface.

I got one about bypassing GitHub's authentication using Unicode on the company blog: https://blog.getwisdom.io/hacking-github/

I've wanted to write a deep dive on JS defense for a while now. I've learned lots of cool stuff I'd love to share, maybe in the next few weeks.

Hm, that's odd - ublock blocks access to the site :-/
> I've wanted to write a deep dive on JS defense for a while now. I've learned lots of cool stuff I'd love to share, maybe in the next few weeks.

Please do! :)