HN Mirror


by gillesjacobs 716 days ago

This is entirely unsurprising and in line with the finding that even small specialized models do better at information extraction and text classification. So it's no wonder fine-tuned large LMs do well too.

Personally, my PhD covered fine-grained, ACE-like event and sentiment extraction, and "small" specialized fine-tuned transformers like BERT and RoBERTa-large outperformed prompted LLMs. I would love to see small-model scores from some SOTA pipelines included.

This is great work anyway even if it replicates known results!

renegade-otter 716 days ago

The caveat here is that if you don't know how to create good specialized models, you are just wasting everyone's time and money:

https://www.threads.net/@ethan_mollick/post/C46AfItO8RS?hl=e...

gillesjacobs 716 days ago

Exactly, BloombergGPT performed worse on financial sentiment analysis than much smaller fine-tuned BERT-based models.

For many extractive tasks, BloombergGPT was quite disappointing. A 5-10% performance hit with much larger inference cost compared to smaller models is not desirable.

But for Bloomberg, the research investment was a risk worth taking: a do-it-all generative model can significantly reduce maintenance complexity and deployment overhead.

It didn't directly pay off for many extractive tasks, but I bet they're iterating. Bloomberg has the data moat and the business needs in their core products to make it worthwhile.

pandatigox 716 days ago

Your thesis sounds interesting! Do you have a link to it by any chance?

gillesjacobs 716 days ago

rovr beat me to it below. Here are more links: https://jacobsgill.es/phdobtained (fun fact: because my thesis contains published papers, I am in breach of a few journals' copyrights by uploading my own thesis PDF, but fuck 'em).

I evaluated LLM approaches on my own time but did not publish the results (I left research after obtaining my PhD).

SpaceManNabs 716 days ago

> because my thesis contains published papers, ..., but f 'em

Excluding the part in the middle because I don't wanna repost potential issues for you. I just wanted to say that this is terrible. People often talk about the siloed nature of industry research without considering that academia supports the draconian publishing system. I understand IP protection, but IP protection doesn't have to mean no access. This is a huge issue in the bio world (biostats, genetics, etc.).

uolmir 716 days ago

I don't know your circumstances, but you often retain the right to distribute a "postprint", i.e., the final text as published but without the journal formatting. A dissertation should fit that definition.

gillesjacobs 716 days ago

This is indeed often the case; however, my university reviews each thesis and decided mine can only switch to open access in 2026 (five years after the defense).

I think this is the default policy here for theses, based on the publication agreements.

In any case, I am not too worried.

pandatigox 716 days ago

Thank you for the link! And congratulations on obtaining your PhD.

I have skimmed through it and it's truly amazing how good annotation of the dataset can lead to impressive results.

I apologise in advance if the question seems ignorant: the blog post talked about fine-tuning models online. Given that BERT models can run comfortably even on iPhone hardware, were you able to fine-tune your models locally, or did you have to do it online too? If the latter, are there any products you recommend?

gillesjacobs 716 days ago

Thanks! The fine-tunes were done in 2019-21 on a 4xV100 server with hyperparameter search, so thousands of individual fine-tuned models were trained in the end. I used Weights & Biases for experiment dashboards during the hyperparameter search, but the hardware was our own GPU server (no cloud service used).
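To see how a hyperparameter search gets to "thousands" of fine-tunes so quickly, here is a minimal grid-search sketch in plain Python. The search space, the values in it and the number of tasks are all hypothetical, chosen only for illustration; they are not the settings from the thesis. Each config would be handed to whatever training routine you use.

```python
from itertools import product

# Hypothetical search space, for illustration only (not the thesis settings).
GRID = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "batch_size": [8, 16, 32],
    "epochs": [2, 3, 4],
    "seed": [1, 2, 3, 4, 5],
}

def grid_configs(grid):
    """Yield every combination of hyperparameter values as a dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid_configs(GRID))

# Hypothetical number of task/dataset combinations being searched over.
N_TASKS = 10
total_runs = len(configs) * N_TASKS

print(len(configs))  # 4 * 3 * 3 * 5 combinations per task
print(total_runs)
```

Even this modest grid gives 180 runs per task and 1,800 across ten tasks, which is why a shared dashboard for the runs becomes essential.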

I doubt you can fine-tune BERT-large on a phone. A quantized, inference-optimised pipeline can be leaps and bounds more efficient, and it is not comparable to the full-model Hugging Face training pipelines I ran at the time. For non-adapter-based training, you ideally need GPUs.
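For a rough sense of why adapter-based training is so much lighter than full fine-tuning, here is a back-of-the-envelope count for LoRA, one adapter-style method, swapped in here as an example. BERT-large's 24 layers and hidden size of 1024 are real; the rank of 8 and adapting only the query and value projections are assumptions for illustration. The comparison covers only those adapted matrices, not all of BERT-large's roughly 340M parameters.

```python
# BERT-large shape: 24 encoder layers, hidden size 1024.
NUM_LAYERS = 24
HIDDEN = 1024

RANK = 8               # assumption: a small LoRA rank
ADAPTED_PER_LAYER = 2  # assumption: adapt only the query and value projections

def dense_params(d_in, d_out):
    """Parameters in a dense layer: weight matrix plus bias."""
    return d_in * d_out + d_out

def lora_params(d_in, d_out, rank):
    """Trainable parameters of a LoRA update B @ A,
    with A of shape (rank, d_in) and B of shape (d_out, rank)."""
    return rank * (d_in + d_out)

# Trainable parameters if those projections are fully fine-tuned...
full = NUM_LAYERS * ADAPTED_PER_LAYER * dense_params(HIDDEN, HIDDEN)
# ...versus training only low-rank adapters on the same projections.
lora = NUM_LAYERS * ADAPTED_PER_LAYER * lora_params(HIDDEN, HIDDEN, RANK)

print(f"full fine-tuning of those matrices: {full:,} trainable params")
print(f"LoRA rank {RANK}: {lora:,} trainable params ({lora / full:.2%})")
```

This gives 50,380,800 versus 786,432 trainable parameters, about 1.6%. The forward and backward passes still run through the whole frozen model, though, so adapters shrink optimizer state and gradients but do not make training free.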

Mockapapella 716 days ago

This is really cool, thanks for posting it! I'll have to skim through it at some point, since a lot of my work is on classification models and mirrors the results you've seen.

rovr138 716 days ago

Check https://www.researchgate.net/publication/356873749_Extractin...

wuschel 716 days ago

Seconded! Any URI to your PhD?

rovr138 716 days ago

Check https://www.researchgate.net/publication/356873749_Extractin...