

by MrJohz 612 days ago

I think this is a valid approach to semantic wrapping, but I don't think it is the only one, and specifically I think it has significant flaws: (1) We've lost grepability unless I write rather complex regexes to handle the possible places where hard line breaks may have been added. (2) We've lost diffability in the sense that if I correct a typo in the word, that correction can cascade through the word and cause multiple lines to show up as changed in the diff when semantically only one part of one word has changed.
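As a rough illustration of point (1), here is a minimal sketch of the kind of regex that becomes necessary once hard breaks can appear between words or, with a hyphen, inside a word. The helper name `wrapped_pattern` is hypothetical, and the sketch assumes an intra-word break is always written as a hyphen, a newline, and optional indentation.

```python
import re

def wrapped_pattern(phrase: str) -> re.Pattern:
    """Build a regex that matches `phrase` even when hard line breaks
    were inserted between its words or (hyphenated) inside a word."""
    # Assumed shape of an intra-word break: "-", newline, optional indent.
    word_break = r"(?:-\n[ \t]*)?"
    words = []
    for word in phrase.split():
        # Allow an optional break between any two characters of a word.
        words.append(word_break.join(re.escape(ch) for ch in word))
    # Between words, accept any run of whitespace, including newlines.
    return re.compile(r"\s+".join(words))

text = "That sounds\n  supercalifragilistic-\n  expialidocious."
pattern = wrapped_pattern("sounds supercalifragilisticexpialidocious")
print(bool(pattern.search(text)))
```

A plain search for the unbroken phrase does not find it in `text`, which is the lost grepability described above.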

Instead, I would prefer a soft semantic wrap: if a single semantic unit (be that a word, a clause, or whatever else) extends beyond, say, 80 characters, we keep it on the same line and let the editor/file viewer handle wrapping. This means that we maintain grepability over words and semantically-connected phrases, and we maintain diffability by avoiding the hard-wrap cascade. To me, this is a much more useful version of semantic wrapping, because it only wraps when there is a semantic clause, and not on any arbitrary semantic break.
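To make the hard-wrap cascade concrete, here is a small sketch using Python's standard `difflib` and `textwrap`. The function name `removed_line_count` is made up for this example, and it assumes the paragraph is re-wrapped to the same width after the fix; `width=None` stands in for soft wrapping, where the paragraph stays on one physical line.

```python
import difflib
import textwrap

def removed_line_count(before: str, after: str, width=None) -> int:
    """Count how many original lines a line-based diff reports as changed.

    width=None models soft wrapping (one physical line per paragraph);
    an integer hard-wraps both versions to that width first.
    """
    def to_lines(paragraph: str) -> list:
        if width is None:
            return [paragraph]
        return textwrap.wrap(paragraph, width=width)

    diff = difflib.ndiff(to_lines(before), to_lines(after))
    # ndiff prefixes every original line that is not kept verbatim with "- ".
    return sum(1 for line in diff if line.startswith("- "))

before = "We keep the whole clause on one line and let the editor handle wraping for us."
after = before.replace("wraping", "wrapping")
print(removed_line_count(before, after, width=30))  # hard-wrapped
print(removed_line_count(before, after))            # soft-wrapped
```

With soft wrapping the count is always one for a single-paragraph edit; with hard wrapping it depends on how far the re-flow propagates before the line breaks line up again.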

My goal here isn't to convince you that this version is better than your version of semantic wrapping, only that wrapping based on semantics is an orthogonal concept to hard and soft wrapping, and that even if we choose to take a semantic wrapping approach, we still need to decide what to do with particularly long lines.

(Although I will add to this: I had a colleague who was a deep fan of semantic wrapping, and I just never really got it. I used it for a couple of years, but I've never run into issues with simply soft-wrapping everything. When inserting new clauses or changing text in the middle of a line, every diff tool that I've used has been able to accurately identify which portion of a given paragraph has changed and highlight it. Meanwhile, as a writer and reader, I need to put more effort into reading prose that is written in an odd, stylised format that is very different from the intended paragraph structure. I can see the argument that I've accepted semantic line breaks in code or configuration files, so I should be able to handle it in markdown, but I just find it harder to read and more irritating to write. But assuming someone does want to use semantic line breaks, I still believe that that's an orthogonal choice to deciding between hard and soft wrapping.)


a1369209993 611 days ago

> Instead, I would prefer a soft semantic wrap

So would I, but...

> if a single semantic unit (be that a word, a clause, or whatever else) extends beyond, say, 80 characters, we keep it on the same line and let the editor/file viewer handle wrapping.

...the editor can't do that because it doesn't understand the semantics.

> that wrapping based on semantics is an orthogonal concept to hard and soft wrapping

Yes, that's why I've been saying "hard and/or soft [but in either case nonsemantic] wrapping".

> > > With semantic wrapping you put each sentence (or similar) on a new line [...] But if that sentence runs over e.g. 80 characters, [then...]

... You don't need to fall back on non-semantic wrapping, you can just keep breaking it up into smaller and smaller semantically-meaningful pieces.

(You have to do that 'hard'-ly because the editor doesn't understand the semantics, but that's not "decid[ing] whether you're going to hard wrap or soft wrap", it's being forced to hard wrap as an implementation detail because that's what results in correct wrapping.)

It might not be worth the effort to do that, but you're never forced not to (given not-pathologically-short line length limits like 20 characters).


MrJohz 611 days ago

Hmm, I think we have different definitions of a semantic line wrap. To me, semantic line breaks mean that line breaks are used to separate clauses and sentences, such that at least every sentence is on its own line, and every line break represents a semantic clause or sentence gap.

To you, I get the impression that semantic wraps are about ensuring that every wrap/line break happens at a semantically valid place, where semantically valid could be a semantically valid clause, but also a semantically valid intra-word line break.

In that sense, I can see how your strategy would produce the same effects as hard wrapping, albeit with different choices about where to put the wraps. But I think then, like I said, you end up running into the same difficulties that you do with conventional hard wrapping, at least in pathological cases.


a1369209993 611 days ago

> such that at least every sentence is on its own line

Yes, with the obvious possible exception of trivial/degenerate cases like "i++; j--;" in C or "This is a cat. That is a dog." in English.

> and every line break represents a semantic clause or sentence gap.

Specifically, it represents a maximally coarse semantic gap, drilling as shallowly down into subclauses as possible/practical.
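A minimal sketch of that "coarsest break first" idea, assuming just two hypothetical break levels (sentence ends, then clause punctuation) as a crude stand-in for real semantic analysis. The names `LEVELS` and `semantic_wrap` are invented for this example; a piece that is still too long after the last level is left as-is rather than broken non-semantically.

```python
import re

# Assumed break levels, coarsest first. Real prose needs better rules.
LEVELS = [
    r"(?<=[.!?])\s+",  # after sentence-ending punctuation
    r"(?<=[;:,])\s+",  # after clause punctuation
]

def semantic_wrap(text: str, width: int = 80, level: int = 0) -> list:
    """Split at the coarsest boundary, recursing to a finer level only
    for the pieces that are still longer than `width`."""
    text = text.strip()
    if len(text) <= width or level >= len(LEVELS):
        # Short enough, or no finer level left: keep the piece whole.
        return [text] if text else []
    lines = []
    for piece in re.split(LEVELS[level], text):
        lines.extend(semantic_wrap(piece, width, level + 1))
    return lines

sample = ("With semantic wrapping you put each sentence on a new line. "
          "But if that sentence runs over the limit, you keep breaking it up, "
          "one clause at a time.")
for line in semantic_wrap(sample, width=40):
    print(line)
```

Text that already fits within `width` is never split, which matches the "This is a cat. That is a dog." exception.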

> wrap/line break [can happen at ...] also a semantically valid intra-word line break.

Preferably only if that word would already be alone on its overly-long line. Eg:

  # bad, breaks subordinate clause before superordinate
  That sounds supercalifragilistic-
    expialidocious.
  
  # semantically valid, but ugly (a pathological case)
  That sounds
    supercalifragilisticexpialidocious.
  
  # vertically larger, but probably fine
  # (unless you're feeling incunabulum-y[0])
  That sounds
    supercalifragilistic-
    expialidocious.

> you end up running into the same difficulties that you do with conventional hard wrapping, at least in pathological cases.

I've yet to see any evidence that really pathological cases exist. (As opposed to "I'm lazy and can't be arsed" cases, which I'm fairly explicitly not disputing.)

0: http://code.jsoftware.com/wiki/Essays/Incunabulum


a1369209993 611 days ago

> given not-pathologically-short line length limits like 20 characters

Poor phrasing; 20 characters was meant as an example of a limit that is pathologically short.
