by dalke
3968 days ago
Sure. I'm switching this conversation to email though, using the gmail account in your profile. Short version is, I trained it on the RDKit-generated SMILES strings from ChEMBL-20. Three of the strings look like this:
  CC(C)=CCC/C(C)=C/C=C/C(=O)N1CCCC1
  CC(=O)NC(C(=O)N1CCSCC1)[C@H]1CC(C(=O)O)C[C@@H]1N=C(N)N
  O=C(CC(c1ccc(F)cc1)(c1ccc(F)cc1)c1ccc(F)cc1)N1C[C@H](O)C[C@H]1C(=O)N1CCC[C@@H]1C(=O)NC[C@@H]1CCCNC1
On the raw data set (one record per line), wc reports:
  1455763 1455763 82882385
while "| gzip -c | wc -c" reports 18773892.
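For anyone who wants to check numbers like these without the shell, here is a rough Python sketch. The helper name wc_and_gzip is made up for illustration, and the sample is just the three strings above, not the real data set. Note that Python's gzip.compress defaults to compression level 9 while the gzip command defaults to level 6, so the compressed sizes will not match the command line exactly.

```python
import gzip


def wc_and_gzip(data: bytes) -> tuple[int, int, int, int]:
    """Return (lines, words, bytes, gzip_bytes) for an in-memory byte string.

    Hypothetical helper: the first three values mimic `wc`, the last one
    mimics `gzip -c | wc -c` (at Python's default compression level).
    """
    lines = data.count(b"\n")          # wc counts newline characters
    words = len(data.split())          # whitespace-separated tokens
    raw = len(data)                    # total byte count
    packed = len(gzip.compress(data))  # size after gzip compression
    return lines, words, raw, packed


# Tiny stand-in for the data set: the three SMILES strings quoted above.
sample = b"\n".join([
    b"CC(C)=CCC/C(C)=C/C=C/C(=O)N1CCCC1",
    b"CC(=O)NC(C(=O)N1CCSCC1)[C@H]1CC(C(=O)O)C[C@@H]1N=C(N)N",
    b"O=C(CC(c1ccc(F)cc1)(c1ccc(F)cc1)c1ccc(F)cc1)N1C[C@H](O)C[C@H]1"
    b"C(=O)N1CCC[C@@H]1C(=O)NC[C@@H]1CCCNC1",
]) + b"\n"
print(wc_and_gzip(sample))

# Ratios derived from the figures quoted in the comment.
records, raw_bytes, gz_bytes = 1455763, 82882385, 18773892
ratio = raw_bytes / gz_bytes            # how many times smaller gzip makes it
bits_per_byte = 8 * gz_bytes / raw_bytes
avg_record = raw_bytes / records        # mean bytes per line, newline included
print(f"{ratio:.2f}x, {bits_per_byte:.2f} bits/byte, {avg_record:.1f} bytes/record")
```

The line and word counts coincide because a SMILES string contains no whitespace, so each line is exactly one "word".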
|
I wish you wouldn't do that. That defeats the entire point of a website such as this. Just because you don't think that this is interesting to random people doesn't mean that random people don't think this is interesting.