Sure. I'm switching this conversation to email, though, using the Gmail account in your profile. The short version is that I trained it on the RDKit-generated SMILES strings from ChEMBL-20. Three of the strings look like this:
I wish you wouldn't do that. That defeats the entire point of a website such as this. Just because you don't think that this is interesting to random people doesn't mean that random people don't think this is interesting.
The email I sent was 178K long. It included 1000 real-world examples (to give an idea of the character distribution) and the model .h file that shoco generated from the entire data set.
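For anyone wondering what "an idea of the character distribution" means in practice, here is a minimal sketch that tallies how often each character appears across a list of SMILES strings. It assumes the examples are already loaded as Python strings. The function name `char_distribution` and the three sample molecules (ethanol, benzene, acetic acid) are hypothetical illustrations, not taken from the emailed data set. This is not shoco's training procedure, which builds its own model from the input.

```python
from collections import Counter

def char_distribution(smiles_list):
    """Return (character, fraction) pairs, most frequent first.

    Hypothetical helper: counts every character across all strings,
    then normalizes by the total number of characters seen.
    """
    counts = Counter()
    for smi in smiles_list:
        counts.update(smi)  # a string is iterated character by character
    total = sum(counts.values())
    return [(ch, n / total) for ch, n in counts.most_common()]

# Illustrative SMILES only; not from the ChEMBL-20 examples in the email.
examples = ["CCO", "c1ccccc1", "CC(=O)O"]
for ch, frac in char_distribution(examples):
    print(f"{ch!r}: {frac:.3f}")
```

On a real data set, a skewed distribution (a few characters like `C`, `c`, `(` and `)` dominating) is what makes a small-string compressor such as shoco worth trying.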
Assuming that bmh100 is both interested in working on this and doesn't have the domain knowledge, I gave a synopsis of the SMILES notation, its use as a molecule identifier, a way to reproduce my data set, and a couple of possible alternatives for getting something similar. (Each method requiring less domain knowledge and more CS experience.)
Since this is a big chunk to chew on, and it's the weekend, I figure it will take a few days to digest and respond. Since HN doesn't have notifications, how long should I actively check this thread for replies?
By sending email, I also invite a response after a couple of months, should that be the case. (Yesterday I got a follow-up on a topic that was four years old.) So no, supporting these long-term research exchanges is not one of the main goals of HN.
You'll note that I also answered what bmh100 asked for here. If you find it interesting, then feel free to ask interesting questions.
If it's not confidential (and I am assuming it isn't), why not just link it in a gist or something? That way other people can also take a crack at it.
Among other things, "a synopsis of the SMILES notation, its use as a molecule identifier, a way to reproduce my data set, and a couple of possible alternatives for getting something similar" is something I would be interested in. And, considering the upvotes I got for my grandparent comment, something that other people would be interested in as well.
I do not like "a gist or something" and have used such services only a handful of times. I dislike how they decontextualize the conversation and how they require trust in an additional resource. E.g., when I come across a gist during a web search, it's hard to figure out the point.
By comparison, an email provides the full context, and is easier to integrate into a workflow. For example, I can drag an attachment directly into my editor. A gist requires additional steps.
Regarding hnnotify.com, I enjoy the ability to let go of most HN threads after a couple of days. This thread is one of a handful of exceptions. Can I really subscribe to one-and-only-one thread? I don't see that it's worthwhile to set up a third-party account and activate the service for a rare event. In any case, if it takes a month for bmh100 to evaluate the code, then the HN thread will be closed, so there's only a narrow window in which this service is useful.
I do not share your optimism in the random contributions of others. To start, it's not like I haven't talked about this before. See https://bitbucket.org/dalke/smilez and http://www.dalkescientific.com/writings/diary/archive/2007/0... (under "Compressing SMILES") for two examples. Have I gotten any feedback about them? No. So why put more effort into hoping for a one-in-a-million event, which is what you suggest, instead of optimizing the chance of getting a followup from someone who specifically expressed interest? Experience says that I should optimize for the latter.
What is your interest in the SMILES notation that can't be resolved through https://en.wikipedia.org/wiki/Simplified_molecular-input_lin... ? I would be glad to tell you more. I have worked with different aspects of SMILES for over 15 years and co-authored the OpenSMILES specification. I have also written many blog posts about different aspects of how to work with SMILES. And gotten few followups.
What skill set do you have, so that I might tailor a response? Are you comfortable installing from source, do you prefer one of the GNU/Linux packaging systems, or Mac/Homebrew? Or are you happiest extracting data from a database dump? My 'synopsis .. of possible alternatives' was more an offer to follow up on any of those options, and was in itself incomplete. It works because email carries the implied promise that I will respond to further questions.
If you don't have a specific interest, and are more generally wanting to be informed, then perhaps you can understand why I would prefer to use other mechanisms, like my blog posts, which are more likely to get the kinds of responses I'm looking for than spending time tuning an off-topic HN comment.