By "compositional datatypes", I mean being able to define datatypes and functions on them in a modular fashion. I gave a very simple example with enums. For a real world use case, imagine a compiler pipeline, where we have an AST that we are desugaring over multiple steps. Ideally we want a succession of ASTs that remove the form being eliminated, so the AST type can guarantee it's eliminated. This is very easy with structural variants, there's no need to actually define and name each separate AST, which differs only slightly.
Or imagine we are typing a SQL join that is a merge of two existing record types. Such "compositional data types" can be achieved with elaborate encodings and type-level computations in a language like Haskell (see e.g. Data types `a la carte), but it's difficult and somewhat ugly. There is an opportunity here for improvement.
You can do this with the tagless-final encoding, which has a similar flavor to Data types à la carte's concept of open data types. As an example, define
class ExpSYM a where
add :: a Int → a Int → a Int
lit :: Int → a Int
class MulSYM a where
mul :: a Int → a Int → a Int
Now suppose we extend with booleans and so on. We can freely combine addition and multiplication. As we add new data variants, old code need not be recompiled. We get the openness of OOP's class extensions with the flexibility of ADTs.
ex1 :: (ExpSYM a, MulSYM a) => a Int
ex1 = add (lit 3) (mul (lit 4) (lit 5))
Not enough space here but you can also pattern match over final terms and interpret them in different ways (e.g. CBN vs CBV interpreters).
Then as you go through your compiler pass you can see through the types that "capabilities" get reduced more and more until you reach some base language (sounds like free monads doesn't it?).
Tangentially if I were to design an FP language I would use tagless final for everything, and deforest as much as I can (if possible) to reduce overhead. How this affects other aspects (e.g. automatically translating pattern matching) I have not looked into, but it's interesting nonetheless.
Yes, but like any encoding, it has its problems. One potential issue is that you will need to formulate your transformations as strict catamorphisms (folds), e.g. you cannot match two levels deep. This has reasoning advantages, but makes it harder for the average user. Type classes can also have issues with ambiguities, if many type variables are involved. Lastly, you also cannot match all remaining terms, like these examples:
getDates
:: Exp
-> Set Date
getDates = cata alg where
alg :: ExpF (Set Date) -> Set Date
alg (EDate i) = S.singleton i
alg e = fold e
substitute
:: Map VarId (ExpF Exp)
-> Exp
-> Exp
substitute env = cata alg where
alg :: ExpF Exp -> Exp
alg (EVar i)
| Just e <-M.lookup i env = Fix e
alg e = Fix e
Ah ha, but you see, you can perform transformations on tagless-final terms that are not strict catamorphisms. Oleg demonstrates this in (page 14)[0] with examples of reassociating binary expressions and double negation, so matching two levels deep is possible.
As for matching remaining terms, I don't know the answer to that, would be interesting to investigate.
Yeah, having many typeclasses involved could be problematic. What sort of issues are you thinking of? One possibility is that so many constraints are involved no concrete type can instantiate a final term.
> Ah ha, but you see, you can perform transformations on tagless-final terms that are not strict catamorphisms ... so matching two levels deep is possible.
No this is not correct. Oleg provides a solution to double negation by creating a catamorphism that folds to a function. It's still a catamorphism. And folding to a function is not something the average Java programmer is likely to understand easily.
Right, so it's not that it's not possible but rather it's somewhat awkward to express non-compositional folds in the final approach. I'm not currently aware of a mechanical way to do the translation.