by NWoodsman 1282 days ago
Change my view: for any data storage medium, the smallest unit of data must also be the innermost child element of any markup language. Given the immense overhead of storing markup at that granularity, processing markup must therefore be a perpetual exercise in recursion.

I.e.

      Poem->Verse->Line-> <char>

      Book->Page->Chapter->Paragraph->Sentence->Word-> <char>

      HTML->Body->Div->P-> <char>

Therefore, any given letter (here a <char> type) can keep back references to its parents. The <char> object could hold a hashset of {Line, Word, P} parent references, one for each of three domains, but what it really needs is a Dictionary of key/value pairs, the key being the domain name and the value being the parent name. That would be:

Domain: Poetry, Value: Line

Domain: Book Object Model, Value: Word

Domain: HTML, Value: P Element

We could then ask any letter arbitrarily "What is your Font Style in your HTML context?" and it would be able to walk up to the parent P, which obtains its style from CSS, and return that correctly. Or "What is your Poem's name in your Poetry context?" and it could recurse up to the Poem element to find its Title.
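
A rough sketch of that idea in Python, in case it helps the discussion. The names here (`Node`, `Char`, `lookup`, `ask`) and the sample property values are made up for illustration and are not from any real library; the only thing taken from the post is the per-domain dictionary of parents and the walk up the parent chain.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A markup element in one domain (hypothetical class for illustration)."""
    kind: str
    props: dict[str, str] = field(default_factory=dict)
    parent: Optional[Node] = None

    def lookup(self, key: str) -> Optional[str]:
        # Walk up the parent chain until some ancestor defines the property.
        node: Optional[Node] = self
        while node is not None:
            if key in node.props:
                return node.props[key]
            node = node.parent
        return None


@dataclass
class Char:
    """A single character holding one parent reference per domain."""
    value: str
    parents: dict[str, Node] = field(default_factory=dict)

    def ask(self, domain: str, key: str) -> Optional[str]:
        # Pick the parent for the requested domain, then recurse upward.
        parent = self.parents.get(domain)
        return parent.lookup(key) if parent is not None else None


# Poetry domain: Poem -> Verse -> Line (the title is a placeholder value)
poem = Node("Poem", {"title": "Untitled Example"})
verse = Node("Verse", parent=poem)
line = Node("Line", parent=verse)

# HTML domain: Body -> Div -> P, with a placeholder style set on the Div
body = Node("Body")
div = Node("Div", {"font-style": "italic"}, parent=body)
p = Node("P", parent=div)

c = Char("I", parents={"Poetry": line, "HTML": p})
print(c.ask("HTML", "font-style"))  # italic
print(c.ask("Poetry", "title"))     # Untitled Example
print(c.ask("Book", "title"))       # None (no parent registered for that domain)
```

Note this assumes each element has exactly one parent within a domain, which is the assumption the question hinges on.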

2 comments

Are you claiming the parents will always be unique? Because as the article says, you can easily have this, where going to the right is a parent relationship:

                  -> Line -> Verse -> Poem
    char -> Word
                  -> Clause -> Sentence -> Poem

You can try adding a further constraint that any given property must have only one path, so you can then recurse over the tree and find the one match, but as your model gets richer you will find that breaks.

And it's that last clause that is the killer for pretty much anything: "As your model gets richer you will find that breaks."

Plus the UI experience for that is awful. "I want to add this property to this Line but you're telling me it's a duplicate for some particular character? What the hell does that mean? I'm not adding a property to the character!" etc. etc.
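
To make the two-path problem concrete, here is a small sketch that encodes the diagram above as a parent graph and lists every route from a char up to Poem. The `PARENTS` table and `paths_to` helper are just illustrative names, and the edges are only the ones drawn in the diagram.

```python
# Parent edges from the diagram; each node maps to its list of parents.
PARENTS = {
    "char": ["Word"],
    "Word": ["Line", "Clause"],
    "Line": ["Verse"],
    "Verse": ["Poem"],
    "Clause": ["Sentence"],
    "Sentence": ["Poem"],
    "Poem": [],
}


def paths_to(start: str, target: str) -> list[list[str]]:
    """Return every parent path from start up to target (depth-first)."""
    if start == target:
        return [[start]]
    found = []
    for parent in PARENTS[start]:
        for tail in paths_to(parent, target):
            found.append([start] + tail)
    return found


routes = paths_to("char", "Poem")
for route in routes:
    print(" -> ".join(route))
# char -> Word -> Line -> Verse -> Poem
# char -> Word -> Clause -> Sentence -> Poem
print(len(routes))  # 2
```

With two routes, a property set on Line and a property of the same name set on Clause both reach the same char, so "recurse up and find the one match" no longer has a single answer.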

If I'm understanding you correctly, in that model a Paragraph should have a parent Page (and there should be a clear answer to the question "what page is this paragraph on?"). Is that correct? If so, that doesn't match how most paginated texts are formatted, where paragraphs frequently start on one page but finish on another.
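
A quick sketch of why a single parent Page doesn't work, assuming pages and paragraphs are both just ranges of character offsets. The page boundaries and the `page_of` / `pages_of` helpers are invented for the example.

```python
from bisect import bisect_right

# Hypothetical layout: character offsets at which pages 1, 2 and 3 start.
PAGE_STARTS = [0, 1800, 3600]


def page_of(offset: int) -> int:
    """1-based page number containing the character at this offset."""
    return bisect_right(PAGE_STARTS, offset)


def pages_of(start: int, end: int) -> list[int]:
    """All pages touched by the half-open character range [start, end)."""
    return list(range(page_of(start), page_of(end - 1) + 1))


print(pages_of(200, 900))    # [1]     paragraph fits on one page
print(pages_of(1500, 2100))  # [1, 2]  paragraph straddles a page break
```

Each character still has exactly one page, but the paragraph as a whole maps to a list of pages, so Page cannot simply sit above Paragraph in one chain.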