| The "magic" of Prolog is built upon two interesting concepts : Unification ( https://en.wikipedia.org/wiki/Unification_(computer_science)... ) and Backtracking ( https://en.wikipedia.org/wiki/Backtracking ). Often bad teachers only present the declarative aspect of the language. By virtue of being declarative, it allows to express inverse problems in a dangerously simple fashion, but doesn't provide any clue for a solution. And you are then using a declarative language to provide clues to guide the bad engine toward a solution. Making the whole code an awful mashup of declarative and imperative. Rules : - N integer, a integer > 1, b integer > 1 - N := a * b Goal : N = 2744977 You can embed such a simple problem easily but solving it is another thing. The real surge of Prolog and other declarative constraint programming type of language will be when the solving engines will be better. Unification is limited to the first order logic, high-order logic unification is undecidable in the general case. So we probably will have to rely on heuristics. By rewriting prolog goal solving as a game, you can use deep learning algorithms like alphago (Montecarlo tree search). This engine internally adds intermediate logical rules to your simply defined problem, based on similar problems it has encountered in its training set. And then solve them like LLM, by picking the heuristically picking the right rule from intuition. The continuous equivalent in a sort of unification is Rao-Blackwellisation (done automagically by deep-learning from its training experience) which allows to pick the right associations efficiently kind of the same way that a "most general unification algorithm" allows to pick the right variable to unify the terms. |
I don't know how to reconcile this statement about deep learning with my understanding of Rao-Blackwell. Can you explain:
- what is the value being estimated?
- what is the sufficient statistic?
- what is the crude estimator? what is the improved estimator?
Roughly, I think sufficient statistics don't really do anything useful in deep learning. If they did, they would give a recipe for embarassingly parallel training that would be assured to reach exactly the same value a fully sequential training. And from an information geometry perspective, because sufficient statistics are geodesics, the exploratory (hand-waving) and slow nature of SGD could be skipped.