by alok-g 4788 days ago
>> The grandparent mentions sentences of 100 words (I take it that he means tokens).

Correct. That said, I have seen valid (readily human-readable) sentences as long as 140 tokens, so the word count can also reach or exceed 100 more often than is commonly assumed.

>> I'd guess that such sentences usually contain one or more dependent clauses, that can be parsed separately if necessary.

Absolutely. Often there is more than one independent clause, plus several dependent clauses. But I don't know how to identify these clauses and parse them separately. Could you shed some light? For example, are there simpler grammars that can identify these clauses without doing a full parse?

1 comment

I haven't tried such a thing (yet), but there is some work in that area. Advaith Siddharthan's thesis may be a good start:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.8...
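
To make the "identify clauses without a full parse" idea a bit more concrete, here is a rough sketch in Python. It is not taken from the thesis above and is not how any particular system does it; it only shows the crudest possible heuristic: cut a sentence at a comma or semicolon when the next word looks like a clause marker. The marker lists and the function name split_clauses are made up for illustration, and real text would defeat this quickly (commas in lists, "that" as a determiner, and so on).

```python
import re

# Hypothetical, deliberately tiny marker lists (an assumption for this sketch);
# a serious attempt would need many more entries plus part-of-speech information.
SUBORDINATORS = {"because", "although", "while", "when", "if",
                 "which", "since", "unless", "whereas"}
COORDINATORS = {"and", "but", "or", "so", "yet"}
MARKERS = SUBORDINATORS | COORDINATORS


def split_clauses(sentence):
    """Split a sentence into rough clause-like chunks.

    A boundary is placed at a comma or semicolon that is immediately
    followed by a marker word. No parsing is done, so the chunks are
    only candidates that a real parser could then handle one at a time.
    """
    # Words and single punctuation marks become separate tokens.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    chunks, current = [], []
    for i, tok in enumerate(tokens):
        next_is_marker = (i + 1 < len(tokens)
                          and tokens[i + 1].lower() in MARKERS)
        if tok in {",", ";"} and next_is_marker:
            # Close the current chunk and drop the boundary punctuation.
            if current:
                chunks.append(current)
                current = []
            continue
        current.append(tok)
    if current:
        chunks.append(current)
    return [" ".join(chunk) for chunk in chunks]


if __name__ == "__main__":
    example = ("The parser was slow, because the sentence was long, "
               "and nobody had pruned the chart.")
    for chunk in split_clauses(example):
        print(chunk)
```

Each chunk could then be handed to a full parser separately, which is where the saving on very long sentences would come from. Whether the chunks are actually well-formed clauses is exactly the hard part that this sketch ignores.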