Hacker News new | ask | show | jobs
by gchpaco 3504 days ago
A formal language is basically just a mathematical set of acceptable strings made up of symbols in a (typically finite) alphabet; thus the smallest is probably the empty set, or if you insist on an accepting path, a set containing only the empty string.

The interesting part comes up when you want to match an infinite set of strings with an algorithm. Probably the simplest such algorithm would be "regardless of input, accept" which isn't exactly useful but would suffice.

It's dissatisfying to have an algorithm that puts weird, arbitrary restrictions on composing languages; for example it's usually the case that if you take a language L1 and a language L2 that are both acceptable to the algorithm, the concatenation (for every string in L1 and every string in L2, the concatenation of their strings is in the new language) language is usually acceptable to the algorithm.

Some commonly used languages do not have all of these properties; in particular the intersection of two context-free languages may not be a CFL, nor the complement. Deterministic context free languages are even more restrictive. You seem to need to give some things up as you move up the language hierarchy in power; in particular string homomorphism and intersection seem to be mutually exclusive in languages more powerful than regular expressions.

If you're interested, the theory of abstract families of languages has been studied, although I do not know a lot about it myself.