	by glhaynes 5520 days ago
	With the hyper-AJAXed world, client-side innovation becomes impossible. What?

2 comments

cypherpunks 5519 days ago

15 years ago, I could reasonably write a search engine. Myself. 1 person. In a few weeks (modulo bandwidth and server farm). I would write a program that grabs a web page and reads out keywords. Today, if I grab a web page, quite often that web page has nothing except JavaScript code. That code grabs the actual content from the server, lays it out, and animates it. To write a web search engine, I need to write a complete JavaScript library.

At the time, we were talking about developing all sorts of agents. Things that would shop for you. Things that would find parts for you. Things that would remember what web sites you visited, and let you search them. Things that would track where in a long set of pages you were (blog, comic, etc.), and let you keep reading from there. It happened for a while, and then it died when the web became too damn hard. Writing anything that can reasonably see and parse web pages now takes many, many web years. There are only four or five organizations with that kind of resources (WebKit, Mozilla, Opera, IE, and internally, Google). There are countless things we just didn't even imagine.

It's like the DMCA. You notice all the innovations that happen, but you miss all the innovations it made impossible.
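
A minimal sketch, in Python, of the kind of static-HTML indexer described in this comment: take a page's HTML, drop the markup, and count the words. The two sample pages, the `loadContent` call inside the second one, and all function names are made up for illustration. Given a page whose content only arrives through script, it finds nothing to index.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def keywords(html, top=5):
    """Return up to `top` most frequent words (3+ letters) in the visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]{3,}", " ".join(parser.chunks).lower())
    return [word for word, _ in Counter(words).most_common(top)]


# Hypothetical pages: one with static content, one that is only a script stub.
STATIC_PAGE = (
    "<html><body><h1>Bicycle parts</h1>"
    "<p>Bicycle chains and bicycle tires.</p></body></html>"
)
JS_ONLY_PAGE = (
    "<html><body><div id='app'></div>"
    "<script>loadContent('/api/page/1');</script></body></html>"
)

if __name__ == "__main__":
    print(keywords(STATIC_PAGE))   # "bicycle" ranks first
    print(keywords(JS_ONLY_PAGE))  # → []
```

To index the second page, the content would have to be rendered first, which is the step the replies below argue about.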

Groxx 5519 days ago

>15 years ago, I could reasonably write a search engine.

No, 15 years ago you could reasonably write a search engine for 15 years ago. It would suck by today's standards.

You want to handle Javascript? Easy! There are plenty of tools to choose from now. Run a browser as your crawler, visit the sites, and read the generated source instead of the static source. Shove that into your 15-years-ago search engine, and there's no difference.

>Things that would track where in a long set of pages you were

You mean bookmarks? Add a scroll %, assuming they're not nice enough to use anchor tags / IDs meaningfully, and you're golden.

>Writing anything that can reasonably see and parse web pages...

has become a community effort, instead of a bunch of isolated silos where people reinvented the wheel out of necessity.

The resources required aren't so large just because the web is so much more complex; they're large because browsers are so much faster, and you won't survive if you can't compete. How long did we languish with crappy JavaScript engines? How much would you need to know to actively compete in that section alone now? It's easy to make a slow-but-functional browser, and if you looked around you'd see some people doing just that. Making a fast-and-resilient one is as hard as making a fast-and-resilient anything, especially where human input (i.e., HTML) is expected to be consumed.

cypherpunks 5519 days ago

> You mean bookmarks? Add a scroll %, assuming they're not nice enough to use anchor tags / IDs meaningfully, and you're golden.

Bookmarks in books work okay. You move them. Bookmarks in browsers don't. You have to remove the old one, add the new one, and the overall process is too cumbersome to be useful for the application I mentioned.
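
A rough sketch, in Python, of the "bookmark that moves" behavior being asked for here, combined with the scroll-percentage idea from the parent comment. It assumes one series per host name, which is a simplification chosen for the example; the class names and the `comic.example` URLs are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Bookmark:
    url: str
    scroll_fraction: float  # 0.0 = top of the page, 1.0 = bottom


class MovingBookmarks:
    """Keeps exactly one bookmark per series and moves it on every visit."""

    def __init__(self):
        self._by_series = {}

    @staticmethod
    def series_key(url):
        # Assumption for this sketch: every page on a host is the same series.
        return urlsplit(url).netloc

    def visit(self, url, scroll_fraction=0.0):
        """Record the current position, replacing the series' old bookmark."""
        clamped = min(max(scroll_fraction, 0.0), 1.0)
        mark = Bookmark(url, clamped)
        self._by_series[self.series_key(url)] = mark
        return mark

    def resume(self, any_url_in_series):
        """Return the saved position for the series, or None if never visited."""
        return self._by_series.get(self.series_key(any_url_in_series))

    def __len__(self):
        return len(self._by_series)


if __name__ == "__main__":
    marks = MovingBookmarks()
    marks.visit("http://comic.example/page/1")
    marks.visit("http://comic.example/page/2", scroll_fraction=0.4)
    print(marks.resume("http://comic.example/"))
```

There is no remove-then-add step: visiting a later page overwrites the earlier position, so the store never holds more than one entry per series.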

JoshTriplett 5519 days ago

We actually built a site to solve that problem. If you have a series of pages (blog, comic, book, etc) and want to mark your place in them with a bookmark that moves as you read, try Serialist (https://serialist.net/).

Groxx 5519 days ago

There's an "edit" option as well.

As to the auto-updating bookmarks, would it resolve the issue if I made an extension to do that for you? I can see the use, honestly, and I like it. (seriously, I'm offering, and I'd probably use it myself. It'd be an interesting project. Even if it doesn't resolve the issue - we might just fundamentally disagree here, I'm OK with that.)

But why should that be part of the browser, when modern browsers allow you to do damn near anything by simply leveraging it? Why should we rely on browser makers to tell us what's possible, when we can do it ourselves, because of the changes in the past 15 years?

cypherpunks 5519 days ago

I'd love to see that extension. If you write it, I will use it. I use Chrome too, so it should work here.

As to what should and shouldn't be part of the browser -- the way to figure that out is experimentation and competition. When you make technologies and standards simple and easy, people will make independent implementations and try things. The vast majority will be dumb, but some (often unanticipated ones) will turn out to be useful, clever, or brilliant. That's how the technology improves.

When you make standards big and cumbersome, progress stops.

IChrisI 5519 days ago

If you want to move a bookmark to a different place on a blog / content site, it is probably because you want to read new entries. RSS does this fairly well.

If you want to read through a site's archives, what I do is keep it open in a tab. It is restored when I reopen my browser, saved if I reboot, etc. It's not as handy as a bookmark, but it comes close.

pak 5519 days ago

With all the headless Webkit tools coming out nowadays (and all the free and fast JS engines like V8), writing a spider that runs a JS engine and clicks on all kinds of non-<a> elements is not beyond the reach of somebody innovative and motivated enough to create new kinds of spidering robots.

You won't need to write a complete JavaScript library. Look at all the testing suites that automate browser instances, Selenium being the most well-known.

rimantas 5519 days ago

15 years ago the thing we call "web application" hardly existed. If a web page "has nothing except JavaScript" (e.g. Gmail), it is probably a web app, and indexing it makes little sense anyway. If someone misuses JS on a content site, that's another story. And your comment about innovation makes no sense at all. Capabilities of modern browsers (Canvas, geolocation, local storage, offline apps, etc.) offer more opportunities for innovation than the "old web" could even imagine.

cypherpunks 5519 days ago

I am aware of the current opportunities.

I think you (and most people here) underestimate what the "old web" could imagine, though. We had all sorts of ideas for agents that would go out and grab and analyze data for us in all sorts of clever and interesting ways. Search engines got built, as did one or two other things, and then the web just got too complex.

Hell, even I had a simple app that went out and grabbed all my favorite comics and showed them to me, nicely formatted, and without ads.

keks 5519 days ago

You mean ad filtered RSS/Atom? I assume such a program would be much faster to write these days: have a set of newsfeeds, map() them with a filter function and merge the results.

While the web gets more complex, the tools at hand get better. Much better.
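
A small sketch of that map/filter/merge approach in Python, using only the standard library. The two inline RSS documents, the `example` hosts, and the "sponsored" title rule used to spot ads are all invented for the example; a real program would fetch the feeds over HTTP and would need a better ad test.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime
from itertools import chain

# Hypothetical feeds, inlined so the example runs without network access.
FEED_A = """<rss version="2.0"><channel><title>Comic A</title>
<item><title>Strip 12</title><link>http://a.example/12</link>
<pubDate>Mon, 06 Sep 2010 10:00:00 GMT</pubDate></item>
<item><title>Sponsored: buy stuff</title><link>http://a.example/ad</link>
<pubDate>Tue, 07 Sep 2010 10:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Comic B</title>
<item><title>Page 40</title><link>http://b.example/40</link>
<pubDate>Wed, 08 Sep 2010 09:00:00 GMT</pubDate></item>
<item><title>Page 39</title><link>http://b.example/39</link>
<pubDate>Sun, 05 Sep 2010 09:00:00 GMT</pubDate></item>
</channel></rss>"""


def parse_feed(xml_text):
    """Turn one RSS 2.0 document into a list of entry dicts."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        }
        for item in root.iter("item")
    ]


def is_ad(entry):
    # Assumed rule for this sketch: ads announce themselves in the title.
    return entry["title"].lower().startswith("sponsored")


def merged_entries(feeds):
    """map() each feed to entries, filter out ads, merge newest first."""
    per_feed = map(parse_feed, feeds)
    cleaned = (filter(lambda e: not is_ad(e), entries) for entries in per_feed)
    return sorted(
        chain.from_iterable(cleaned),
        key=lambda e: e["published"],
        reverse=True,
    )


if __name__ == "__main__":
    for entry in merged_entries([FEED_A, FEED_B]):
        print(entry["published"].date(), entry["title"], entry["link"])
```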

DCoder 5519 days ago

> If a web page "has nothing except JavaScript" (e.g. Gmail), it is probably a web app, and indexing it makes little sense anyway.

What about JS frameworks like JavascriptMVC or Sammy [1]? Google even created a spec [2] for crawling such sites.

[1] http://sammyjs.org/

[2] http://code.google.com/web/ajaxcrawling/docs/getting-started...

prodigal_erik 5519 days ago

Gmail's HTML view works fine in Links. That team has been showing competence and diligence that's increasingly rare, and I wish people wouldn't tar them with the same brush as the clowns who write js-only crap.

nitrogen 5519 days ago

> At the time, we were talking about developing all sorts of agents. Things that would shop for you. Things that would find parts for you. Things that would remember what web sites you visited, and let you search them. Things that would track where in a long set of pages you were (blog, comic, etc.), and let you keep reading from there.

The drive toward semantic markup in HTML5 is supposed to help the web get back to those original ideals. Over time, we'll increasingly expect web developers to conform to a subset of possible HTML arrangements, much like book publishers conform to a subset of the possible random arrangements and orientations of letters on a page (odd poetry excepted).

robbles 5519 days ago

Your premise that the web is somehow less effective because you can't scrape data from pages easily doesn't make much sense to me.

Have you taken a look recently at the plethora of web APIs for just about every purpose? The modern way of collecting machine-friendly data from a server is through APIs and semantic content (RDFa, microformats, etc.).

Not through HTML / CSS / Javascript formatted pages which are made primarily for human consumption.

cypherpunks 5519 days ago

Which only works for pre-intended uses, or standards that have substantial market share.

Again, read the literature on agents from the nineties, and see how diverse a technology tree was killed....

asomiv 5519 days ago

Most people would gladly make it harder for a single person to write a search engine if, in return, it makes it easier for them to make good web pages and web apps.

FlemishBeeCycle 5519 days ago

I would blame poor/lazy devs inappropriately using JS rather than the evolution of the browser for this. For the average web page, it's unnecessary 90% of the time to require JavaScript for any core functionality (not so much with web applications). I have a hard time understanding why people do this, as it's often much easier to test and develop when you're layering on JS unobtrusively.

mnutt 5519 days ago

Agreed that it's nearly impossible to generally parse web pages now, though if you're screen scraping it's still pretty easy (if not easier than before) to pull out data. Before you had to parse the DOM; now you can often get structured data via JSON APIs. It's more brittle, though.

trafficlight 5519 days ago

You're essentially saying we should halt all progress in web design because it will make some programming things harder for you as an individual.

That's not what the web is and that is not the web environment that I want.

codehalo 5519 days ago

Did you post this using the lynx browser?

mnutt 5519 days ago

I think he's saying that it makes scraping harder.

But today JS frameworks like jQuery give us the means to do anything we want javascript-related, in any browser that half-supports javascript. By deprecating IE7 they're just saying they're going to drop all of the extra hacks they had to use to keep IE7 working.

A lot of what newer browsers give us is just better rendering. You can replace a mess of tables and nested divs with things like border-radius, which means less client-side html to wade through.

cypherpunks 5519 days ago

Not just scraping. Any sort of non-human parsing of web pages. Look at the 1990s literature on web agents for lots of applications.

icebraining 5519 days ago

Are you kidding? The semantic web and using more metadata is making it easier than ever. Nowadays, in many cases, not only do you have the content, but it is also tagged with microformats or RDFa.

Try looking at Freebase or DBpedia and tell me where you had such a huge amount of easily parsable, semantic content in the 90s.

prodigal_erik 5519 days ago

More and more content is being taken entirely off the open web and siloed behind a server that talks an unstable proprietary protocol, with exactly one blob of javascript in existence that knows how to tunnel requests over HTTP to access shreds of that content and cram them into an utterly non-semantic DOM. We are hurtling backwards into the client-server hell the web had saved us from.

icebraining 5519 days ago

Yeah, I don't see that. I see more and more accessible APIs[1], and pages having more and more of an incentive to be semantic because search engines now read that data (hRecipe, for example).

Service architectures have also been moving from stuff like SOAP to REST, which is definitely more open and accessible.

And even Ajax-laden webpages are still just a Firebug Network tab away since they all run over HTTP, and then you have a nicely structured data format instead of having to deal with messy HTML pages.

[1] http://www.programmableweb.com/apis

prodigal_erik 5519 days ago

A JSON (or SOAP) backend is only usable by third parties if its API is kept stable. There are far too many devs who redesign their backend request and response formats at the drop of a hat because they think their js client is the only one that matters (a self-fulfilling prophecy) and they can replace it simultaneously. And their responses tend to look like "here's some more markup to stuff into an arbitrary location in the DOM we're using today", not semantically structured (e.g., Rails now has this built into JavaScriptGenerator). A given site can be reverse-engineered, but anything built on that is going to be fragile and short-lived, much more so than when the typical visual rendering desired for a page determined its structure.

dreamdu5t 5519 days ago

Well yeah, the web as a collection of interlinked information is dead. It's now television 2.0 with most innovation going to marketing.