by microtonal 5435 days ago
Our (natural language) parser/generator is for the largest part written in Prolog (the rest is C/C++/Tcl):

http://www.let.rug.nl/vannoord/alp/Alpino/

Prolog matches very well with the grammar formalism that we use (attribute-value grammars), since attribute-value structures with arbitrary depth and re-entrancy can easily be represented as Prolog terms and larger analyses can be constructed through unification.

E.g. the grammar rules are plain Prolog rules where one argument is a list that represents the right-hand side of a grammar rule, and another argument is the left-hand side. The rule goals define relations between the RHS and LHS by unifying parts of the RHS structures with the LHS structure. For instance, the rule for np -> det, n is this:

  grammar_rule(np_det_n, NP, [ Det, N ] ) :-
      np_det_n_struct(NP,Det,N).
Here np_det_n_struct unifies paths between NP <-> Det and NP <-> N, and unifies some paths with atoms. Completing a grammar rule is simply a matter of unifying members of the RHS list.
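
To make the path unification concrete, here is a minimal sketch. The feature layout (np/2, det/1, n/2 and the agr/2 term) is made up for illustration and is not the layout the real grammar uses; only grammar_rule/3 and np_det_n_struct/3 come from the rule above.

```prolog
% Hypothetical feature layout, for illustration only:
%   np(Agr, Head)   det(Agr)   n(Agr, Head)
% The shared variable Agr is the re-entrancy: a single value that is
% reachable through a path in NP, a path in Det and a path in N.
np_det_n_struct(np(Agr, Head), det(Agr), n(Agr, Head)).

grammar_rule(np_det_n, NP, [ Det, N ] ) :-
    np_det_n_struct(NP, Det, N).

% Unifying the RHS members with concrete daughters builds the mother:
%
% ?- grammar_rule(Id, NP, [ det(agr(sg, third)), n(Agr, house) ]).
%    should bind Id = np_det_n, NP = np(agr(sg, third), house),
%    and Agr = agr(sg, third).
%
% Daughters that disagree simply fail to unify:
%
% ?- grammar_rule(_, _, [ det(agr(pl, third)), n(agr(sg, third), house) ]).
%    should fail.
```

Agreement is never checked by a separate goal in this sketch. It falls out of the fact that the determiner and the noun share one variable.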

This looks like the purpose for which DCGs are designed (http://en.wikipedia.org/wiki/Definite_clause_grammar). It's hard to tell from your example, but would the syntax not fit your applications? (Also, why

    grammar_rule(np_det_n, NP, [ Det, N ] )
instead of

    np_det_n(NP, [ Det, N ] )
?)
This looks like the purpose for which DCGs are designed

DCGs are not really practical when implementing different parsing/generation strategies, where you usually want to be able to access categories easily (i.e. not as Prolog goals). Also, once you represent RHS constituents as a list rather than goals, you do not need DCGs' difference lists to maintain adjacency.
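
As a rough illustration of why rules-as-data help, here is a naive shift-reduce parser that reads the RHS lists directly instead of calling categories as goals. The toy grammar, the lex/2 lexicon and the tree/2 term are assumptions made up for this sketch; it is not the strategy our parser actually uses.

```prolog
:- use_module(library(lists)).   % reverse/2

% Toy grammar and lexicon (hypothetical categories).
grammar_rule(np_det_n, np(Agr), [ det(Agr), n(Agr) ]).
grammar_rule(vp_v,     vp(Agr), [ v(Agr) ]).
grammar_rule(s_np_vp,  s,       [ np(Agr), vp(Agr) ]).

lex(the,   det(_)).
lex(dog,   n(sg)).
lex(dogs,  n(pl)).
lex(barks, v(sg)).
lex(bark,  v(pl)).

% parse(+Words, ?Cat, -Tree)
parse(Words, Cat, Tree) :-
    sr(Words, [], Cat-Tree).

% sr(+Words, +Stack, -Item): Stack holds Cat-Tree pairs, newest first.
sr([], [Item], Item).
sr(Words, Stack0, Item) :-              % reduce: the rule is just data
    grammar_rule(Id, LHS, RHS),
    reverse(RHS, RevRHS),
    match(RevRHS, Stack0, Stack1, RevTrees),
    reverse(RevTrees, Trees),
    sr(Words, [LHS-tree(Id, Trees)|Stack1], Item).
sr([W|Ws], Stack, Item) :-              % shift the next word
    lex(W, Cat),
    sr(Ws, [Cat-W|Stack], Item).

% match(+RevRHS, +Stack0, -Stack, -RevTrees): pop the daughters off the stack.
match([], Stack, Stack, []).
match([C|Cs], [C-T|Stack0], Stack, [T|Ts]) :-
    match(Cs, Stack0, Stack, Ts).

% ?- parse([the, dog, barks], s, T).
%    should give T = tree(s_np_vp, [tree(np_det_n, [the, dog]),
%                                   tree(vp_v, [barks])]).
% ?- parse([the, dogs, barks], s, T).
%    should fail on number agreement.
```

Swapping in a different strategy (left-corner, head-driven, chart-based) only means writing a different interpreter over the same grammar_rule/3 facts. Adjacency is handled by the stack here, so no difference lists are threaded through the rules. Note that this naive version would loop on a grammar with unary cycles or empty right-hand sides.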

Also, why ... instead of ...

Since the rule identifiers are never used as goals. They are just for printing pretty parse trees, and to see which rules fired for constructing an attribute-value structure (e.g. rule counts are used as features in the disambiguation component). Rules match on category (remember, there is more than one way to construct an np), which is a type in the type hierarchy for feature structures. An av-structure with type np will be structured as a Prolog term with the functor np.

Btw., note that the representation above is only the representation that the grammar writer will use. For parsing/generation, transformed terms will be used. E.g. during generation, grammar rules use the syntactic head as the first term argument, and chart edges use the next unprocessed RHS slot as the first term argument, both to make use of first argument indexing.

http://www.sics.se/sicstus/docs/latest3/html/sicstus.html/In...
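
A simplified sketch of that kind of transformation: it indexes on the leftmost RHS category rather than on the syntactic head or on a chart edge's next slot, and lc_rule/4 and compile_rules/0 are hypothetical names made up for this example.

```prolog
% Grammar-writer form (toy rules, hypothetical categories).
grammar_rule(np_det_n, np(Agr), [ det(Agr), n(Agr) ]).
grammar_rule(np_pn,    np(Agr), [ pn(Agr) ]).
grammar_rule(vp_v_np,  vp(Agr), [ v(Agr), np(_) ]).

:- dynamic lc_rule/4.

% compile_rules: store every rule again with its first RHS category as
% the first argument, so a call with that argument bound can be indexed
% on the category's functor instead of trying every rule.
compile_rules :-
    retractall(lc_rule(_, _, _, _)),
    grammar_rule(Id, LHS, [ First | Rest ]),
    assertz(lc_rule(First, Id, LHS, Rest)),
    fail.                                % failure-driven loop over all rules
compile_rules.

% ?- compile_rules, lc_rule(det(sg), Id, LHS, Rest).
%    should bind Id = np_det_n, LHS = np(sg), Rest = [n(sg)].
```

assertz/1 copies the whole clause at once, so the variable sharing between the first category, the LHS and the remaining RHS is preserved in the stored fact. Whether the lookup then actually avoids choice points depends on the Prolog system's indexing.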