Hacker News
by vardhanw 1998 days ago
It was also interesting to read his previous article, "Is biology too complex to understand" [1], where he surmises that after the initial "physicsization" of the discovery process in biology, which gave us DNA, RNA, amino acids (codons), the ribosome, proteins and related discoveries, there is a lot of complexity left undiscovered, and it may not have simplified/abstract solutions or theorems à la physics. This is exemplified by the reported 100+ articles per hour in PubMed, which no human (or group of humans) can be expected to absorb to any significant extent, let alone use to extend the boundaries of our understanding in a fundamental sense. The hope is that machines (ML) may help us get a better sense of the discovery landscape and provide insights we may not be able to divine ourselves.

Which leads to my question: what is the current level of development in ML theory & practice for "understanding" a particular set of research articles and creating a "knowledge database" that can then be used to ask questions about them, relate the constituent articles to each other, etc.? I know of some basic research in NLP such as topic modeling, question answering, summarization and information extraction, and perhaps some sort of causal reasoning can be applied. But is there enough progress here to start meeting the goals he wishes for, i.e. to be able to advance science by machine processing of research articles as an aid for further insights and research?

[1] https://berthub.eu/articles/posts/biologists-physics-envy/

1 comment

How many git repos are published per hour to GitHub? Practically speaking, it probably doesn't matter if you want to build cool things. The same goes for the volume of results from PubMed.

For a significant amount of synthetic biology, you don't really need NLP or anything. You literally just need to get good at parsing a bunch of XML and text databases, sprinkled with a little bit of data dumps + ML, to be better than the vast majority of engineering done in the field right now.

(GenBank + UniProt + Rhea -> SQL database you can do some intense things with)
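
To make the "parse XML into a SQL database" idea concrete, here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption for illustration: the XML layout, the element and attribute names, the accessions (X0001, X0002) and the reaction IDs (R1, R2) are made up and much simpler than the real UniProt or Rhea formats, which you would need to map onto tables like these yourself.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a protein database dump.
# Real UniProt XML is namespaced and far richer than this.
SAMPLE_XML = """
<entries>
  <entry accession="X0001" organism="Escherichia coli">
    <name>Example kinase</name>
    <reaction id="R1"/>
    <reaction id="R2"/>
  </entry>
  <entry accession="X0002" organism="Bacillus subtilis">
    <name>Example synthase</name>
    <reaction id="R2"/>
  </entry>
</entries>
"""


def load_entries(conn, xml_text):
    """Create two tables and fill them from the XML text."""
    conn.execute(
        "CREATE TABLE protein ("
        "accession TEXT PRIMARY KEY, name TEXT, organism TEXT)"
    )
    conn.execute(
        "CREATE TABLE protein_reaction (accession TEXT, reaction_id TEXT)"
    )
    root = ET.fromstring(xml_text)
    for entry in root.iter("entry"):
        accession = entry.get("accession")
        conn.execute(
            "INSERT INTO protein VALUES (?, ?, ?)",
            (accession, entry.findtext("name"), entry.get("organism")),
        )
        # One row per (protein, reaction) pair so the link can be joined on.
        for reaction in entry.iter("reaction"):
            conn.execute(
                "INSERT INTO protein_reaction VALUES (?, ?)",
                (accession, reaction.get("id")),
            )
    conn.commit()


def organisms_for_reaction(conn, reaction_id):
    """Return the organisms that have a protein linked to the reaction."""
    rows = conn.execute(
        "SELECT p.organism FROM protein AS p "
        "JOIN protein_reaction AS pr ON pr.accession = p.accession "
        "WHERE pr.reaction_id = ? ORDER BY p.organism",
        (reaction_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
load_entries(conn, SAMPLE_XML)
print(organisms_for_reaction(conn, "R2"))
# prints ['Bacillus subtilis', 'Escherichia coli']
```

Once the data sits in tables like this, questions such as "which organisms carry an enzyme for this reaction" become a single join, which is roughly the kind of query the comment is pointing at.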