	by maziyar 81 days ago
	full article: https://huggingface.co/blog/OpenMed/training-mrna-models-25-...

2 comments

pfisherman 78 days ago

Nice work! Here is an article you may find helpful if you have not already come across it [0]. You may also want to consider benchmarking against some non-ML methods [1].

0. https://pubmed.ncbi.nlm.nih.gov/35318324/

1. https://www.nature.com/articles/s41586-023-06127-z

xyz100 78 days ago

What makes this dataset or problem worth solving compared to other health datasets? Would the results on this task be broadly useful to health?

CyberDildonics 78 days ago

What other "datasets" are you talking about? How do you "solve a dataset"?

xyz100 78 days ago

You solve a dataset when you learn what there is to learn about the phenomenon of interest. The limit of such a phenomenon is “cure all disease”, and clearly this is not solving that.

CyberDildonics 77 days ago

What are you talking about? "the phenomenon of interest"? There is nothing you wrote in either comment that makes sense.

What is a "dataset" that has been "solved" and what did the program do that 'solved' it?

xyz100 77 days ago

MNIST (the number classification task) has been “solved” a billion times and it is hard to imagine any subsequent advances there as scores using a variety of methods have hit the saturation point of accuracy. Any further improvements are likely overfitting to noise. Therefore, we know that it is easy to detect handwritten numbers. However, we may not know how to detect other things as well, like reading an MRI. Those datasets/tasks are clearly different and require different techniques. Training an LLM is likewise different.

CyberDildonics 77 days ago

> has been “solved” a billion times

If it was really solved, wouldn't it just need to happen once?

You think classifying handwritten images of 10 digits is the same as this, which took someone 55 hours of GPU time to work through?

I have no idea what point you're trying to make and I can't tell if you do either. You were talking about "solving" other "health datasets" but you can't even come up with one or what that means.
