Hacker News
by hobofan 2387 days ago
> fully reproducible methods in chemistry experiments

Top-tier open source libraries for cheminformatics (or other natural science -informatics flavours) would already be a welcome start.


What do you think is missing from the current offerings (OpenBabel, RDKit, maybe others I am missing)?

Context: I do research in computational chemistry and write an open source library for it that could be used for cheminformatics too. I don't really know what is needed there, though, since I have never touched cheminformatics.

I've dabbled a bit with OpenBabel and RDKit, but I found their interfaces especially for simple things (traversing atoms/bonds in a molecule) quite unwieldy. I suspect that a big part of this could "just" be missing documentation and tutorials for getting started.

Maybe I'm just not deep enough into it, but my impression so far is that, especially for application-level software (as opposed to specialized research), OEChem and similar closed source libraries are the most widely used, with nothing quite comparable available as open source.

Context: I'm a software engineer who is also currently a biochemistry undergraduate.

> I found their interfaces especially for simple things (traversing atoms/bonds in a molecule) quite unwieldy.

It's much the same for me, and it is part of why I started my own project (http://chemfiles.org). I have the impression that for cheminformatics you want to see molecules as graphs. Is this true, or is a list of bonds enough for usual purposes?

I have heard of OEChem but never used it. I'll try to find some documentation to have a look.
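
To make the graph-versus-bond-list question concrete, here is a minimal sketch in plain Python. It does not use the API of chemfiles, OpenBabel, RDKit, or OEChem; the function names `build_adjacency` and `connected_component` are hypothetical, and atoms are assumed to be plain integer indices. It shows that a flat bond list is enough to store a molecule, while traversal questions (neighbors, connected fragments) are easier once the list is turned into an adjacency map.

```python
from collections import defaultdict


def build_adjacency(bonds):
    """Turn a flat list of (i, j) bonds into a map: atom index -> sorted neighbors."""
    adjacency = defaultdict(list)
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    return {atom: sorted(neighbors) for atom, neighbors in adjacency.items()}


def connected_component(adjacency, start):
    """Collect every atom reachable from `start` with an iterative depth-first search."""
    seen = {start}
    stack = [start]
    while stack:
        atom = stack.pop()
        # Atoms without bonds are absent from the map, hence the default.
        for neighbor in adjacency.get(atom, []):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


# Illustrative data: ethanol heavy atoms (C, C, O) plus an unbonded water oxygen.
elements = ["C", "C", "O", "O"]
bonds = [(0, 1), (1, 2)]

adjacency = build_adjacency(bonds)
ethanol = connected_component(adjacency, 0)
water = connected_component(adjacency, 3)
```

With only the bond list, finding the neighbors of one atom means scanning every bond. The adjacency map makes that a single lookup, which is the usual argument for treating molecules as graphs.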

> I have the impression that for cheminformatics you want to see molecules as graphs

Yeah, that was my thinking.

I've also seen your work on lumol, so you seem to be one of the few people in the field working with Rust! I recently started writing a SMILES parser in Rust[0] as a first step towards an in-memory graph representation of molecules. I have a first draft locally, though it's very rough and changes a lot: I have to adjust it weekly because I'm basically learning the required theory at the same time :D

[0]: https://github.com/hobofan/smiles-parser
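
As a rough illustration of how a SMILES string maps onto an in-memory graph, here is a small sketch. It is written in Python for brevity rather than Rust, it is not the code from the repository linked above, and the function name `parse_smiles` is hypothetical. It assumes a deliberately tiny subset of SMILES: uppercase organic-subset atoms, the bond symbols `-`, `=`, `#`, parenthesized branches, and single-digit ring closures. Aromatic lowercase atoms, bracket atoms, charges, and stereochemistry are not handled.

```python
def parse_smiles(smiles):
    """Parse a small SMILES subset into (atoms, bonds).

    atoms is a list of element symbols; bonds is a list of
    (atom_index, atom_index, bond_order) tuples.
    """
    bond_orders = {"-": 1, "=": 2, "#": 3}
    two_letter = ("Cl", "Br")
    one_letter = set("BCNOPSFI")

    atoms = []
    bonds = []
    branch_stack = []     # atoms to return to when a branch closes
    open_rings = {}       # ring digit -> (atom index, bond order or None)
    previous = None       # atom that the next atom will bond to
    pending_order = None  # bond order set by an explicit bond symbol

    i = 0
    while i < len(smiles):
        ch = smiles[i]
        # Check two-letter symbols first so "Cl" is not read as carbon.
        if smiles[i:i + 2] in two_letter:
            symbol = smiles[i:i + 2]
        elif ch in one_letter:
            symbol = ch
        else:
            symbol = None

        if symbol is not None:
            atoms.append(symbol)
            current = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, current, pending_order or 1))
            previous = current
            pending_order = None
            i += len(symbol)
            continue

        if ch in bond_orders:
            pending_order = bond_orders[ch]
        elif ch == "(":
            branch_stack.append(previous)
        elif ch == ")":
            if not branch_stack:
                raise ValueError(f"unmatched ')' at position {i}")
            previous = branch_stack.pop()
        elif ch.isdigit():
            if previous is None:
                raise ValueError("ring closure before any atom")
            if ch in open_rings:
                # Second occurrence of the digit closes the ring.
                other, order = open_rings.pop(ch)
                bonds.append((other, previous, pending_order or order or 1))
            else:
                open_rings[ch] = (previous, pending_order)
            pending_order = None
        else:
            raise ValueError(f"unsupported character {ch!r} at position {i}")
        i += 1

    if branch_stack or open_rings:
        raise ValueError("unclosed branch or ring")
    return atoms, bonds


acetic_acid = parse_smiles("CC(=O)O")
cyclohexane = parse_smiles("C1CCCCC1")
```

The branch stack and the ring-closure table are the two pieces of state that turn a linear string into a graph. A real parser following the full SMILES grammar needs considerably more than this.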

Published chemical syntheses are described in great detail. Physical chemistry and spectroscopy papers likewise describe apparatus, collection, and analysis, often down to the nuts and bolts. I don't see how to open source work requiring a femtosecond mid-infrared laser or a prep requiring a synthesis lab with all the reagents, labware, and safety equipment. Buried in the open source PR is the unshakeable underlying belief that science begins when the data are in the can and ready for analysis.

> I don't see how to open source work requiring a femtosecond mid-infrared laser or a prep requiring a synthesis lab with all the reagents, labware, and safety equipment.

You can put text documents describing the process on GitHub, in the same way as you can with code and data. If you have a setup with a femtosecond mid-infrared laser, or a prep requiring a synthesis lab with all the reagents, labware, and safety equipment, you can open source the parts list, the build instructions, and the lab book. It'd probably be very valuable to do that, so please do!

Here are the freely available supplemental data for a paper in the Journal of the American Chemical Society blending organic synthesis, computation, and spectral characterization: 122 pages of exquisite detail from a multi-lab collaboration. There is a lot more like it out there.

Note: I am not in any way affiliated with this research or the labs involved. This came out of a quick search.

https://pubs.acs.org/doi/suppl/10.1021/jacs.6b13031/suppl_fi...

Doesn’t that prove my point? I know people post their artefacts. I often review them. Not sure what you’re trying to say?

Reproducing that paper will be very difficult even though all the information is out there. There is a world of science outside of data processing.

I think most people who have worked in science long enough realize that publications are not even minimum-viable: they often omit absolutely necessary information. Sometimes this is intentional, but most of the time, it's just assumed that the reproducer is working in a world-class lab and gets advice/help to implement state-of-the-art work.

> There is a world of science outside of data processing.

Yes, but that's documented in lab books and procedure documents. Or at least it should be! If it isn't, how are they able to explain their own research? And those can be open sourced.