Hacker News new | ask | show | jobs
by CatsAreCool 2150 days ago
Hello, this is the creator of MathLingua.

MathLingua is still in development, and the documentation still needs work.

However, I wanted to get feedback earlier rather than later to help guide development and the writing of the docs.

Note that MathLingua is the language that powers https://mathlore.org.

5 comments

could you explain the usecase a bit? i skimmed the docs and don't quite get what writing definitions and theorems in this format gets me, apart from an automated "go to definition". what can the tooling do, or what functionality are you planning to implement?

btw "The Language of Mathematics" sounds a bit over the top ;) how about "A Language for Mathematics"?

Thanks for the feedback. I'll update the docs to make this more clear.

The goal is to allow the building of a system so someone could ask, for example, "what are the known theorems that describe when a function uniformly continuous?" or "under what conditions is a group solvable?".

To enable this, it is necessary to build a system that catalogs precise statements of math theorems, definitions, and other statements (https://mathlore.org is the start of this system that uses MathLingua but currently only uses keywords and some structure for searching).

This is similar to https://mathworld.wolfram.com or the math pages in Wikipedia except the statements are not text or LaTeX, that is hard for a computer to understand the meaning of. Instead the MathLingua language is used to identify math statements by meaning.

Right now the tooling can render math statements, find duplicate definitions, and can find missing definitions. Further https://mathlore.org uses MathLingua and has some basic searching.

Further, experimental work is also underway so that the MathLingua tooling can expand a math statement by expanding the definition of items used in that statement.

Yeah, I thought about using something like "A Language for Mathematics" but I thought "The Language of Mathematics" might draw more attention :) But it is only the title of my post on hacker news, not the title of the MathLingua website.

thanks!
Cool work. You should make it very loud and clear that your toolchain does not verify proofs, since otherwise folks may think that your computer-assisted proof layout system does just that; it's rare to see tooling in this subfield which does not verify proofs.
Thanks for the feedback. I will definitely do that.
it's cool but why not just Latex? Genuine question.

I am interested in learning more abt your rationale for creating MathLingua? What drove u? What existing faults do you see in existing system that made you do it.

Thanks for the question. I thought about using LaTeX, however, it doesn't explain what a math statement means. Instead, it describes that it looks like. With MathLingua, the meaning is made precise so that a computer can understand the statement. (I'll make this more clear in the docs).

TLDR: Having a computer understand what a statement written in LaTeX means is a very difficult Machine Learning (Natural Language Processing) problem (if not impossible since math statements often have lots of hidden context).

For more details:

In LaTeX, what does "if $f$ is continuous on $[a, b]$, $f \geq 0$ there, and $\int_a^b f = 0$ then $f = 0$?

It looks clear at first, but what type of thing is $f$. From context of reading analysis books, it could be clear that $f$ must be a function because "continuous" describes functions (but can also describe other things).

Next, what does $\int_a^b f = 0$ mean? It is the integral of $f$ from $a$ to $b$, but is it the Riemann integral, the Lebesgue integral, the Darboux integral? Does the difference matter?

What exactly does $f = 0$ mean. Does it mean $f$ as a function is the same as $0$ as a function, or does it mean $f(x) = 0$ for all $x$? Are those two the same or different? Does the distinction matter?

All of these questions, show up after a computer has correctly parsed the sentence and identified the structure. However, just that step itself is a very hard natural language processing problem.

In fact, if someone comfortable in reading math was given a random theorem from a random paper and was told to explain what it means, they would likely have problems and would need to look at the context of the paper (or reference other papers) to understand the syntax. This just gives a feel for the difficultly to build a machine learning model.

That is why I decided to build a custom language. In MathLingua, for example, the theorem could be described as:

  Theorem:
  . for: f(x), a, b
    where:
    . 'a, b is \real'
    . 'a \leq b'
    . 'f is \continuous.function:on{\closed.interval{a, b}}'
    . 'f is \nonnegative.function:on{\closed.interval{a, b}}'
    . '\riemann.integral_a^b[x]{f(x)} = 0'
    then:
    . 'f is \identically.zero.function:on{\closed.interval{a, b}}'
It is more verbose, but gives a more precise meaning of what each item means. Then definitions can be made to describe what each referenced math definition means with a description of how its associated notation is written to make the rendering look similar to how it is written on paper.

For example, one could have the definition:

  [\identically.zero.function:on{A}]
  Defines: f(x)
  assuming: 'A is \set'
  means:
  . for: a
    where: 'a \in A'
    then: 'f(a) = 0'
  Metadata:
  . written: "f? = 0 \textrm{ on } A?"
This makes the definition more clear and the statement `g is \identically.zero.function:on{X}`, for example, is rendered as the familiar $g = 0 \textrm{ on } X$.

Other definitions can be written too, to make everything clear (or left for later. MathLingua doesn't require you to define the entire math universe before defining a more complex concept).

My response was a little long, but that is because there are a lot of subtleties that show up when trying to get LaTeX to work for describe the meaning of math statements. I hope this helps.

There is a conference related to that topic starting today actually (CICM 2020). Registration might still be available. The subcategory that concerns things like math lingua is called mathematical knowledge management (MKM).
Cool. Thanks for letting me know.
I spotted a typo on the linked page: "vistual"

The dot indentation syntax seems very strange. Why not just indent? Is the intention to reduce nested indentation a little?

Thanks for letting me know about the typo.

The dot space is used to mark arguments since some arguments span multiple lines but are connected. For example, in

  Theorem:
  . for: x
    where: y
    then: z
  . 'some statement'
the 'for:where:then' group together is one argument, while the 'some statement' item is the second argument. The dot is needed to distinguish from

  Theorem:
    for: x
    where: y
    then: z
    'some statement'
where it is hard to tell where one argument stops and another starts.
What about the following, though?

  Theorem:
    for: x
      where: y
      then: z
    'some statement'