This is very awesome! The implementation is different to what I expected (or to what I would have done).
They seem to run a browser on the server and let the user interact with it to choose DOM elements to monitor for changes. I would have just taken a screenshot (perhaps with http://urlbox.io/ or perhaps with CutyCapt) and allowed the user to draw boxes over the screenshot. Then repeatedly screenshot the site and alert the user whenever the contents of a box change.
The Sieve method has the advantage that you are able to tell the user what the new text is. The screenshot method is significantly simpler.
EDIT:
So, in startup terms, the screenshot method could be the MVP :). Provided, of course, that it is considered "viable" not to know the text. If I were a user I would consider it viable - it is still a vast improvement over having to check the site manually.
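To make the screenshot idea concrete, here is a rough sketch of the comparison step only. It assumes a screenshot has already been decoded into rows of (r, g, b) tuples; the capture itself (urlbox, CutyCapt, or anything else) is left out, and the helper names are made up for illustration.

```python
import hashlib

# Assumption: a "screenshot" is a list of rows, each row a list of (r, g, b) tuples.
# A real capture tool would produce an image file that has to be decoded first.

def region_digest(pixels, box):
    """Hash the pixels inside box = (left, top, right, bottom); right and bottom are exclusive."""
    left, top, right, bottom = box
    h = hashlib.sha256()
    for row in pixels[top:bottom]:
        for r, g, b in row[left:right]:
            h.update(bytes((r, g, b)))
    return h.hexdigest()

def region_changed(old_pixels, new_pixels, box):
    """True if any pixel inside the box differs between the two screenshots."""
    return region_digest(old_pixels, box) != region_digest(new_pixels, box)

def blank(width, height, color=(255, 255, 255)):
    """Build a solid-color stand-in screenshot for the demo."""
    return [[color for _ in range(width)] for _ in range(height)]

before = blank(8, 6)
after = blank(8, 6)
watched = (4, 2, 8, 6)          # the box the user drew

after[1][1] = (0, 0, 0)         # a change outside the watched box
print(region_changed(before, after, watched))   # False

after[3][5] = (0, 0, 0)         # a change inside the watched box
print(region_changed(before, after, watched))   # True
```

Exact hashing flags any single-pixel difference, so in practice a tolerance threshold would probably be needed for rendering noise.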
Getting started would certainly be much simpler when using a screenshot or the page HTML for comparison. Sieve could do that too! More work needs to be done to make startup faster.
On the other hand, having filtered text provides multiple advantages. The filtering becomes accurate. It can be used to compute rules to take conditional actions. Notification delivery via email and SMS makes more sense as well.
Running a browser on the server enables another cool function: the user could record a macro and run it when the text satisfies a precondition.
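As an illustration of what a rule over filtered text could look like, here is a small sketch. The rule (notify when a price drops to a threshold) and the function names are hypothetical, not something Sieve is known to implement.

```python
import re

def extract_price(text):
    """Return the first number found in the text as a float, or None if there is none."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None

def should_notify(old_text, new_text, max_price):
    """Hypothetical rule: notify only if the text changed and the new price is at or below max_price."""
    if old_text == new_text:
        return False
    price = extract_price(new_text)
    return price is not None and price <= max_price

print(should_notify("Price: $129.99", "Price: $99.50", 100))    # True
print(should_notify("Price: $129.99", "Price: $119.00", 100))   # False
```

This only works because the monitored text is already narrowed down to the one element of interest. The regex is deliberately naive and does not handle thousands separators.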
I've been working on http://imnosy.com for a while, and thought about that approach, but ran into difficulties when slight variations to the screen were made. Right now, we're using a diff engine. Sieve looks pretty rad though. Will check that out. Fun space :)
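For readers wondering what a text diff buys you over pixel comparison, here is a minimal sketch using Python's standard difflib. It is only an illustration of the general idea and makes no claim about how imnosy's diff engine actually works.

```python
import difflib

def changed_lines(old_text, new_text):
    """Return only the added ('+ ') and removed ('- ') lines between two text snapshots."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]

old_snapshot = "Senior engineer\nJunior designer"
new_snapshot = "Senior engineer\nProduct manager"

for line in changed_lines(old_snapshot, new_snapshot):
    print(line)
```

Unlike a pixel comparison, this can tell the user which lines appeared or disappeared, and unchanged lines are dropped from the report.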
This is what the selection boxes are useful for - you can make it ignore changes to irrelevant areas of the page (though obviously that doesn't apply so well to imnosy).
I'd been mulling over the same concept for quite a while, as a sort of an intelligent update to IE5 for Mac's Subscription manager[1]. It was a very useful tool in my toolbox, and I mourned losing it as that browser decayed.
The main issue with the Subscriptions was that they were global, and would not inform you what changed, just that there were changes. With the increased dynamicness of the web since good ol' Y2K (especially with ads), this model is much less feasible, whereas a DOM-based model is more robust, and allows further automatic data processing.
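A quick sketch of why a DOM-based model copes better with ads: watch only the text of one element and ignore the rest of the page. This uses Python's standard html.parser, assumes reasonably well-formed markup (every non-void tag is closed), and the class and function names are invented for the example.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

class ElementText(HTMLParser):
    """Collect the text inside the first element whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # greater than 0 while inside the target element
        self.done = False    # set once the target element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

def watched_text(html, target_id):
    """Return the whitespace-normalized text of the element with the given id."""
    parser = ElementText(target_id)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())

page_v1 = '<div><div id="ad">Buy now 1</div><p id="status">In <b>stock</b></p></div>'
page_v2 = '<div><div id="ad">Buy now 2</div><p id="status">In <b>stock</b></p></div>'

# The ad changed between the two versions, but the watched element did not.
print(watched_text(page_v1, "status") == watched_text(page_v2, "status"))   # True
```

A whole-page comparison would report a change here because of the ad, while the element-level comparison stays quiet.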
I never got past small prototypes, so I look forward to Sieve's release since it is basically someone doing my work for me! :)
If what you are doing is essentially a diff for news, then you might be onto something very interesting.
The way news is consumed is currently tiered by the temporal interval it covers: breaking news, daily news, weekly, monthly, annual summaries. A diff for news can help process the different tiers from a single reader, without the "breaking news" tier taking over.
At its core, the essential work is to detect changes important to the user. Adding summarization techniques to further weed out the noise would be a very important improvement.
Maybe I'll give it a try. I use http://www.changedetection.com/ for some pages I like to monitor (typically job sites), but it only works if you can clearly address the page you want to monitor.
Thanks for your kind words. The stack is a mix of both. The browser runs on the server and sends updates to the client via WebSocket, where they are painted. On the client side, part of the noVNC JS library is used to capture input, which is then sent to the browser running on the server.