| This looks cool and could be a much needed step towards fixing the web. Some questions: [Tech] 1. How deep does the modification go? If I request a tweek to the YouTube homepage, do I need to re-specify or reload the tweek to have it persist across the entire site (deeply nested pages, iframes, etc.) 2. What is your test and eval setup? How confident are you that the model is performing the requested change without being overly aggressive and eliminating important content? 3. What is your upkeep strategy? How will you ensure that your system continues to WAI after site owners update their content in potentially adversarial ways? In my experience LLMs do a fairly poor job at website understanding when the original author is intentionally trying to mess with the model, or has overly complex CSS and JS. 4. Can I prompt changes that I want to see globally applied across all sites (or a category of sites)? For example, I may want a persistent toolbar for quick actions across all pages -- essentially becoming a generic extension builder. [Privacy] 5. Where and how are results being cached? For example, if I apply tweeks to a banking website, what content is being scraped and sent to an LLM? When I reload a site, is content being pulled purely from a local cache on my machine? [Business] 6. Is this (or will it be) open source? IMO a large component of empowering the user against enshittification is open source. As compute commoditizes it will likely be open source that is the best hope for protection against the overlords. 7. What is your revenue model? If your product essentially wrestles control from site owners and reduces their optionality for revenue, your arbitrage is likely to be equal or less than the sum of site owners' loss (a potentially massive amount to be sure). It's unclear to me how you'd capture this value though, if open source. 8. Interested in the cost and latency. If this essentially requires an LLM call for every website I visit, this will start to add up. Also curious if this means that my cost will scale with the efficiency of the sites I visit (i.e. do my costs scale with the size of the site's content). Very cool. Cheers |
If you're familiar with Greasemonkey, we work similar to the @match metadata. A given script could have a specific domain like (https://www.youtube.com/watch?v=cD5Ei8bMmUk) or all videos (https://www.youtube.com/watch*) or all of youtube (https://www.youtube.com/*) or all domains (https:///). During generation, we try to infer your intent based on your request (and you can also manually override with a dropdown.
> 2. What is your test and eval setup? How confident are you that the model is performing the requested change without being overly aggressive and eliminating important content?
Oh boy, don't get me started. We have not found a way to automate eval yet. We can automate "is there an error?", "does it target the right selectors", etc. But the request are open ended so there are 1M "correct" answers. We have a growing set of "tough" requests and when we are shipping a major change, we sit down, generate them all, and click through and manually check pass/fail. We built tooling around this so it is actually pretty quick but definitely thinking about better automation.
This is also where more users comes in. Hopefully you complain to us if it doesn't work and we get a better sense of what to improve!
> 3. What is your upkeep strategy? How will you ensure that your system continues to WAI after site owners update their content in potentially adversarial ways? In my experience LLMs do a fairly poor job at website understanding when the original author is intentionally trying to mess with the model, or has overly complex CSS and JS.
Great question. The good news is that there are things like aria labels that are pretty consistent. If the model picks the right selectors, it can be pretty robust to change. Beyond that, hopefully it is as easy as one update request ("this script doesn't work anymore, please update the selectors"). Though we can't really expect each user to do that, so we are thinking of an update system where e.g. if you install/copy script A, and then the original script A is updated, you can pull that new update. The final stage of this is an intelligent system where the script can heals itself (every so often, it assess the site, sees if selectors have changed and fixes itself) -> that is more long-term.
> 4. Can I prompt changes that I want to see globally applied across all sites (or a category of sites)? For example, I may want a persistent toolbar for quick actions across all pages -- essentially becoming a generic extension builder. Yes, if domain is https:/// it applies to all sites so you can think of this as a meta-extension builder. E.g. I have a timer script that applies across reddit, linkedin, twitter, etc. and keeps me focused.
> 5. Where and how are results being cached? For example, if I apply tweeks to a banking website, what content is being scraped and sent to an LLM? When I reload a site, is content being pulled purely from a local cache on my machine?
There is a distinction. When you generate a tweek, the page is captured and sent to an LLM. There is no way around this. You can't generate a modification for a site you cannot see.
The result of a generation is a static script that applies to the page across reloads (unless you disable it). When you apply a tweek, everything is local, there is no dynamic server communication.
Hopefully that is all helpful! I need to get to other replies, but I will try to return to finish up your business questions (those are the most boring anyway)
-- Edit: I'm back! --
> 6. Is this (or will it be) open source? IMO a large component of empowering the user against enshittification is open source. As compute commoditizes it will likely be open source that is the best hope for protection against the overlords.
It is very important to me that people trust us. I can say that we don't do X, Y, Z with your data and that using our product is safe, but trust is not freely given (nor should it be). We have a privacy policy, we have SOC II, and in theory, you could even download the extension and dig into the code yourself.
Open-source is one way to build trust. However, I also recognize that many of these "overlords" you speak of are happy to abuse their power. Who's to say that we don't open our code, only to have e.g. OpenAI fork it for their own browser? Of course, we could put restrictive licenses, but lawsuits haven't been particularly protective of copyright lately. I am interested in open-sourcing parts of our code (and there certainly is hunger for it in this post), but I am cognizant that there is a lot that goes into that decision.
> 7. What is your revenue model? If your product essentially wrestles control from site owners and reduces their optionality for revenue, your arbitrage is likely to be equal or less than the sum of site owners' loss (a potentially massive amount to be sure). It's unclear to me how you'd capture this value though, if open source.
The honest answer is TBD. I would push back on your claim that we wrestle control from site owners and reduce their optionality for revenue. While there likely will be users who say "hide this ad" (costing the site revenue) there are also users who say "move this sidbebar from left to right" or "I use {x} button all the time but it is hidden three menus in, place it prominently for easy access". I'd argue the latter cases are not negative for the site owners, they could be positive sum. Maybe we even see a trend that 80% of users make this UX modification on Z site. We could go to Z site and say, "Hey, you could probably make your users happy if you made this change". Maybe they'd even pay us for that insight?
Again, the honest answer is that I'm not certain about the business model. I am a lover of positive sum games. And in the moment, I am building something that I enjoy using and hopefully also provides value to others.
> 8. Interested in the cost and latency. If this essentially requires an LLM call for every website I visit, this will start to add up. Also curious if this means that my cost will scale with the efficiency of the sites I visit (i.e. do my costs scale with the size of the site's content).
As I noted above, this does not require an LLM call for every website you visit. You are correct that that would bankrupt us very quickly! An LLM is only involved when you actively start a generation/update request. There is still a cost and it does scale with the complexity of the site/request, but it is infinitely more feasible than running on every site.
In the future, we may extend functionality so that the core script that is generated can itself dynamically call LLMs on new page loads. That would enable you to do things like "filter political content from my feed" which requires test time LLM compute to dynamically categorize on each load (can't be hard-coded in a once-generated static script). That would likely have to be done locally (e.g. Google actually packages Gemini nano into the browser) for both cost and latency reasons. We're not there yet, and there is a lot you can do with the extension today, but there are definitely opportunities to build really cool stuff, way beyond Greasemonkey.
Wow, you really put me to work with this comment. Appreciate all the great questions!