Hacker News
by jlos 3692 days ago
This would be very interesting when applied to Biblical Studies. Any serious academic discussion of biblical texts will involve a syntactical breakdown of the text being discussed. Most of the time the syntax is clear, but it's still quite common for a phrase to have several possible syntactical arrangements that are not immediately clear. These ambiguities are also challenging because the languages are dead (at least as used in the biblical texts). So the type of ambiguity in "Alice drove down the street in her car" can lead to some significant scholarly disagreement.

I could see Parsey McParseface helping to identify patterns in literature contemporaneous to the biblical texts. Certain idiomatic uses of syntax, which would have been obvious to the original readers, could be identified much more quickly.
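To make the "Alice drove down the street in her car" ambiguity concrete, here is a minimal sketch that enumerates every parse of that sentence under a toy grammar. The grammar, the lexicon, and the `parses` helper are all assumptions made up for illustration; this is not how Parsey McParseface works internally, just a brute-force way to show that "in her car" can attach either to the verb phrase (Alice is in the car) or to "the street" (the street is in the car).

```python
# Toy grammar in binary form; every rule and category name is an assumption.
BINARY_RULES = {
    "S": [("NP", "VP")],
    "VP": [("V", "PP"), ("VP", "PP")],   # PP can modify the verb phrase...
    "NP": [("Det", "N"), ("NP", "PP")],  # ...or the noun phrase
    "PP": [("P", "NP")],
}
LEXICON = {
    "Alice": {"NP"}, "drove": {"V"}, "down": {"P"}, "in": {"P"},
    "the": {"Det"}, "her": {"Det"}, "street": {"N"}, "car": {"N"},
}

def parses(symbol, words):
    """Return every bracketed tree that derives `words` from `symbol`."""
    words = tuple(words)
    trees = []
    # A single word is covered directly by the lexicon.
    if len(words) == 1 and symbol in LEXICON.get(words[0], ()):
        trees.append(f"({symbol} {words[0]})")
    # Otherwise try every binary rule at every split point.
    for left, right in BINARY_RULES.get(symbol, []):
        for split in range(1, len(words)):
            for left_tree in parses(left, words[:split]):
                for right_tree in parses(right, words[split:]):
                    trees.append(f"({symbol} {left_tree} {right_tree})")
    return trees

sentence = "Alice drove down the street in her car".split()
for tree in parses("S", sentence):
    print(tree)
```

Under this toy grammar the sentence has exactly two trees, one per attachment. A statistical parser has to pick one of them, and for a dead language there are no native speakers to say which reading is the natural one.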

3 comments

I was going to say... my main interest in this project is precisely for Biblical studies... I could talk about analyzing the Bible for hours, but let's just say there's way more depth than many even realize. The Aleph Tav in relation to the Book of Revelation is one such example; many translations omit it, but the Aleph Tav Study Bible explores it in depth. There could be many discoveries made with these kinds of projects that are missed by just about anyone only reading a translation.

There are a ton of Jewish Idioms in the Bible that many don't understand at all, including "No man knows the day or the hour" which is a traditional Jewish Wedding Idiom. Lots and lots of things could be explored with enough data and resources.

I'd think that the advantage of machine translation is on corpora that are not known up front (i.e. user-supplied text) or corpora that are exceptionally large.

If you have a small (ish), well-known text, I don't think you will get much insight from machine translation. Certainly there are plenty of uses for computer text analysis/mining in biblical studies, but I doubt translation is one of them. And for obscure idioms or hapax legomena, machine translation definitely can't help you because by definition there are no other sources to rely on.
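For anyone unfamiliar with the term: a hapax legomenon is a word that occurs only once in a corpus. Finding them is the easy part, as this minimal sketch suggests. It assumes naive lowercase, regex-based tokenization of a toy English sentence, and the `hapax_legomena` helper is a made-up name; real work would run on a lemmatized Hebrew or Greek corpus.

```python
import re
from collections import Counter

def hapax_legomena(text):
    """Return the words that occur exactly once in `text`, sorted."""
    tokens = re.findall(r"\w+", text.lower())  # naive tokenization (assumption)
    counts = Counter(tokens)
    return sorted(word for word, n in counts.items() if n == 1)

sample = "in the beginning was the word and the word was with god"
print(hapax_legomena(sample))
```

Listing such words is trivial; the point above is that nothing else in the corpus tells you what they mean.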

With a sufficient level of precision, there's room for machine analysis to "reveal" things we are ignoring out of custom. A lot of text analysis done by people is full of biases and deferral to authorities.

E.g. I remember from school getting into an argument with a teacher over the interpretation of a poem. "His" interpretation, which was really the interpretation of some authority who'd written a book, was blatantly contradicted by the text if you assumed that the author hadn't suddenly forgotten all his basic grammar, despite all the evidence everywhere else that he was always very precise in this respect.

Of course, in some of these kinds of instances, it will be incredibly hard to overcome the retort that any "revelation" is just a bug.

In a more general sense, people are typically exceedingly bad at parsing text, judging by how often online debates devolve into bickering caused largely by misunderstanding the other party's argument. Often to the extent of even ending up arguing against people who you agree with. Having tools that help clarify the parsing for people might be interesting in that respect too.

Well, I wouldn't look for idioms, but it would be interesting to throw information such as "Strong's Concordance" into the mix. I've yet to really think of an application for this library, but it would be fun to play around with it nonetheless. I would be analyzing the Hebrew / Greek / Syriac scripts, looking for omitted or missing verses, etc. It would make for interesting study, if nothing else.
You might be interested in Andrew Bannister's research on computer analysis of the Quran. He wrote a book on it [1], and there's also this paper which gives a high-level overview [2].

[1] http://www.amazon.com/Oral-Formulaic-Study-Quran-Andrew-Bann...

[2] http://www.academia.edu/9490706/Retelling_the_Tale_A_Compute...

> Any serious academic discussion of biblical texts will involve syntactical breakdown of the text being discussed.

I once interned for a company that's been doing this for years. They have all kinds of features tracing individual words through various different languages, etc.

https://www.logos.com/

Actually, it's not very appropriate for studying the biblical text. In Biblical Studies you would prefer not to have any errors at all, and since you work with a limited corpus you can afford to annotate by hand. People have in fact done this, and I collaborated with a group that has been working on this for decades.
For actual syntactical breakdown of the Bible, I agree. Biblical Scholars, and even competent pastors, can syntactically analyze the Bible sufficiently well.

I would think the technology could be helpful in a fairly narrow way: identifying syntactical constructions outside the Bible to help explain ambiguous syntactical constructions within it. For example, texts in Ugaritic, another ancient Semitic language similar to Hebrew, are often studied to aid in understanding portions of the Old Testament. Scholars have been doing this without computers for some time and have begun to do this type of analysis with software. I would imagine more sophisticated software would yield at least some new insights.