by Sesse__ 1281 days ago
It's not clear at all what that strict syntax would look like that would disambiguate with less lookahead. The fundamental problem is that colon is used both for declarations and in (pseudo-)selectors, not that people are somehow inventing corner-case CSS that nobody actually writes and we can just outlaw in a spec change.
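To make the ambiguity concrete, here is a minimal Python sketch. The tokenizer is a toy (identifiers plus single characters), not the one the CSS Syntax spec defines, and the helper names `kinds` and `shared_kind_prefix` are made up for illustration. It only shows that a declaration and a nested rule can open with the same kinds of tokens.

```python
import re

# Toy tokenizer: an identifier, or any single non-space character.
# A real CSS tokenizer handles strings, numbers, comments, escapes, etc.
IDENT = re.compile(r"[A-Za-z_-][A-Za-z0-9_-]*")
TOKEN = re.compile(r"[A-Za-z_-][A-Za-z0-9_-]*|\S")

def kinds(src):
    """Map each token to its kind: 'ident' for identifiers, else the character itself."""
    return ["ident" if IDENT.fullmatch(t) else t for t in TOKEN.findall(src)]

def shared_kind_prefix(a, b):
    """Return the leading token kinds that two inputs have in common."""
    prefix = []
    for x, y in zip(kinds(a), kinds(b)):
        if x != y:
            break
        prefix.append(x)
    return prefix

declaration = "color:red;"              # property, colon, value
nested_rule = "a:hover { color:red; }"  # element, colon, pseudo-class

print(shared_kind_prefix(declaration, nested_rule))
```

Both inputs start with ident, colon, ident. The first token that differs is the `;` versus the `{`, which is exactly the suffix a parser would have to wait for.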

The only change I can really think of would be requiring space after the colon for declarations (i.e. “color:red” is disallowed, it must be “color: red”), but that's much more than a strict mode, that's something that invalidates millions and millions of perfectly valid web pages and introduces a much larger whitespace sensitivity than today.


The difference isn't the separator, it is the suffix. A strict mode would require that properties be signed off with a semicolon on the same line (or just a newline). Alternatively you could go the other way and require that selectors be signed off with a comma or a bracket on the same line. No strict mode, no nesting or other newfangled wizardry.

This allows for graceful degradation.

My point about corner cases is that there is very, very limited use of pseudo selectors, relatively speaking. Let alone pseudo selection where the selector is based on an element and not a class, or ID, or something else easily differentiable from a property. Which is to say, they are the corner case.

Once you start looking at the suffix to disambiguate what the first token means, you're already in the more-than-one-token lookahead land, which is what we're trying to avoid in the first place.

CSS property declarations already need to be signed off with a semicolon on the same line. If not, the entire declaration is ignored (this is specified in the CSS standard, and if you don't implement it correctly, you will break real web pages).

I'm sorry but I thought the challenge as described was one of "infinite lookahead"? Similarly, the csswg proffer "graceful degradation" as the reason why a declaration isn't workable. But this solution clearly doesn't require infinite lookahead. It also degrades gracefully.

In fact lookahead isn't needed at all, except in (exceptionally) rare cases. Is the problem that the parsers are incapable of using any smarts beyond what is already provided?

Am I missing something?

Aside: Good point on the semicolon! I think in the previous discussion someone was making the point that parsers are exceptionally flexible/forgiving re. weird and wonderful line break and spacing combos. I wasn't sure about the status of semicolon usage. Idea of strict would just be to put an end to that.

Edit: and hey, apologies for labouring the point on this. But I am genuinely interested. I feel like these conversations just always end up in "you wouldn't understand" territory.

> But this solution clearly doesn't require infinite lookahead.

It clearly does? There can be an infinite number of tokens before you see the semicolon and know what you're parsing. The page contains examples of this, or you can dig into those bug threads.

> In fact lookahead isn't needed at all, except in (exceptionally) rare cases.

“You don't need to support lookahead, except sometimes” really means “you need to support lookahead”. And that changes how your parser and tokenizer have to work (in particular, you need to be capable of saving a potentially unbounded number of tokens in case you need to rewind). You don't get around that by saying it's rare.
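A rough Python sketch of what "saving tokens in case you need to rewind" could look like. Everything here is an assumption for illustration: the toy tokenizer, the `BufferedTokens` class and the `classify` function are hypothetical, and real parsers are structured differently. It also ignores cases where `{` can legitimately appear inside a declaration value.

```python
import re

# Toy tokenizer: an identifier, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_-][A-Za-z0-9_-]*|\S")

class BufferedTokens:
    """Token stream that keeps every token it has read so the parser can rewind."""

    def __init__(self, src):
        self._it = (m.group() for m in TOKEN.finditer(src))
        self._buf = []  # tokens read from the source so far
        self._pos = 0   # index of the next token to hand out

    def next(self):
        if self._pos == len(self._buf):
            tok = next(self._it, None)
            if tok is None:
                return None  # end of input
            self._buf.append(tok)
        tok = self._buf[self._pos]
        self._pos += 1
        return tok

    def mark(self):
        return self._pos

    def rewind(self, mark):
        self._pos = mark

def classify(stream):
    """Scan ahead until '{', ';' or '}' settles the question, then rewind.

    Returns (kind, number_of_tokens_that_had_to_be_buffered).
    """
    start = stream.mark()
    kind = "declaration"  # ';', '}' or end of input all end a declaration
    while (tok := stream.next()) is not None:
        if tok == "{":
            kind = "rule"
            break
        if tok in (";", "}"):
            break
    buffered = stream.mark() - start
    stream.rewind(start)
    return kind, buffered

print(classify(BufferedTokens("color:red;")))
print(classify(BufferedTokens("a:hover b:focus c:active { color:red; }")))
```

The first call buffers 4 tokens, the second 10, and a longer selector would buffer more. Nothing in the grammar caps that number, which is the sense in which the lookahead is unbounded even if long cases are rare in practice.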

Yes, you need to support lookahead. But not unbounded. There are so many efficient ways to solve this problem. But computer says no?

Forget the idea that the tokeniser could place markers on semicolons / curly brackets to bound any future lookaheads. Why couldn't you just look one token ahead and analyse the potential pseudo-class/property? Pseudo-classes are clearly defined. There are about 50 of them. AFAIK none of them clash with property values. Keep it this way. If it's not a valid pseudo-class, it's either an invalid selector or a property. Then you're just analysing the equivalent of a property value anyway.
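Sketched in Python, the one-token check described here might look like the following. The `guess` function and the short `PSEUDO_CLASSES` set are hypothetical and cover only a handful of names, so treat this as an illustration of the idea rather than a workable parser.

```python
import re

# Toy tokenizer: an identifier, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_-][A-Za-z0-9_-]*|\S")

# Small illustrative subset, not the full list of pseudo-classes.
PSEUDO_CLASSES = {"hover", "focus", "active", "visited", "checked", "default"}

def guess(src):
    """Guess 'selector' or 'declaration' from the single token after the first colon."""
    toks = TOKEN.findall(src)
    if len(toks) >= 3 and toks[1] == ":":
        return "selector" if toks[2] in PSEUDO_CLASSES else "declaration"
    return "unknown"

print(guess("color:red;"))          # red is not a pseudo-class name
print(guess("a:hover { }"))         # hover is a pseudo-class name
print(guess("cursor:default;"))     # default is both a pseudo-class and a cursor value
```

The first two come out as expected (declaration, selector). The third is classified as a selector even though `cursor:default;` is an ordinary declaration, because `default` is both a pseudo-class name (`:default`) and a keyword value of `cursor`. So the "no clashes" assumption does not hold for every name, and the check would need more context than one token.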

---

Of course the above logic is flawed. I'm really just trying to tease out some useful information other than "can't be done".

I think the main challenge with this discussion is that the limitations of the parsers are not clear, at least outside circles directly working with them. Not only that, the explanations of why certain cases won't work are provided without the proper context needed to understand.

From what I understand tokenisation is dumb, it basically just spits out words. Without nesting it is straightforward for a parser to iterate over these tokens one by one, distinguishing between selectors and properties, based on prior context. Of course parsers could look ahead, but by design they don't, because efficiency.

The arguments for breaking the de facto nesting syntax (i.e. scss) seem to lie in the fact that the rules of the past must lie within the rules of the future, because graceful degradation.

Option C - the most popular option, and also the most true to scss - while pragmatic in its approach, is still shoehorning new into old.

I'm sure most agree that unbounded lookahead is probably not workable. But to make the argument that parsers can never look ahead, can never improve, seems ideological, if not ridiculous.