| HN Mirror


by PaulHoule 777 days ago

Some of the SBERT models now are based on T5 and newer architectures so there's not. The FlagEmbedding model that the author uses

https://huggingface.co/BAAI/bge-base-en-v1.5

is described as an "LLM" by the people who created it. It can be used in the SBERT framework.

I tried quite a few models for my RSS feed recommender (applied after taking the embedding) and SVM came out ahead of everything else. Maybe with parameter tuning XGBoost would do better but it was not a winner for me.
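A minimal sketch of the "classifier applied after taking the embedding" setup, for illustration only. It assumes scikit-learn and NumPy, and it uses synthetic vectors in place of real text embeddings so it runs without downloading a model. The dimensions, the RBF kernel, `C=1.0`, and the helper name `train_svm_on_embeddings` are all assumptions, not the commenter's actual configuration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def make_fake_embeddings(n=400, dim=32, seed=0):
    """Synthetic stand-in for text embeddings: two classes shifted along one direction."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)          # 1 = "liked", 0 = "not liked"
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    signs = labels * 2 - 1                        # map {0, 1} to {-1, +1}
    embeddings = rng.normal(size=(n, dim)) + 1.5 * np.outer(signs, direction)
    return embeddings, labels


def train_svm_on_embeddings(embeddings, labels, seed=0):
    """Fit an SVM on precomputed embeddings and report held-out ROC AUC."""
    X_train, X_test, y_train, y_test = train_test_split(
        embeddings, labels, test_size=0.25, random_state=seed, stratify=labels
    )
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    clf.fit(X_train, y_train)
    # decision_function gives a ranking score, which is all AUC needs
    scores = clf.decision_function(X_test)
    return clf, roc_auc_score(y_test, scores)


embeddings, labels = make_fake_embeddings()
clf, auc = train_svm_on_embeddings(embeddings, labels)
print(f"held-out AUC: {auc:.3f}")
```

With real data, the `embeddings` array would come from whatever embedding model is in use, one row per article.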

If you look at the literature

https://arxiv.org/abs/2405.00704

you find that the fashionable LLMs are not world-beating at many tasks and actually you can do very well at sentiment analysis applying the LSTM to unpooled BERT output.


Karrot_Kream 776 days ago

> Some of the SBERT models now are based on T5 and newer architectures so there's not. The FlagEmbedding model that the author uses

Oh thanks! Right, I had heard about T5-based embeddings but didn't realize they were basically LLMs.

> I tried quite a few models for my RSS feed recommender (applied after taking the embedding) and SVM came out ahead of everything else. Maybe with parameter tuning XGBoost would do better but it was not a winner for me.

XGBoost worked the best for me but maybe I should retry with other techniques.

> you find that the fashionable LLMs are not world-beating at many tasks and actually you can do very well at sentiment analysis applying the LSTM to unpooled BERT output.

Definitely. Use the right tool for the right job. LLMs are probably massive overkill here. My non-LLM based embeddings work just fine for my own recommender so shrug.

PaulHoule 776 days ago

Are you applying an embedding to titles on HN, comment full-text or something else?

When it comes to titles I have a model that gets an AUC around 0.62 predicting whether an article will get >10 votes, and a much better one (AUC 0.72 or so) that predicts whether an article that got >10 votes will get a comment/vote ratio >0.5, which is roughly the median. Both of these are bag-of-words and didn't improve when using an embedding. If I go back to that problem I'm expecting to try some kind of stacking (e.g. there are enough New York Times articles submitted to HN that I can train a model just for NYT articles.)
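For readers unfamiliar with the metric, here is a small sketch of how the AUC figures and the second-stage label above could be computed. It is pure Python and assumes nothing about the actual models. The function names `roc_auc` and `second_stage_label` are hypothetical; the thresholds (more than 10 votes, comment/vote ratio above 0.5) are the ones given in the comment.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This is the O(n^2) pairwise definition, fine for small samples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def second_stage_label(votes, comments, min_votes=10, ratio_threshold=0.5):
    """Label for the second model: None if the story never cleared the vote bar,
    otherwise 1 when comments/votes exceeds the threshold, else 0."""
    if votes <= min_votes:
        return None
    return int(comments / votes > ratio_threshold)


# Toy example: one positive is ranked below one negative, so 3 of 4 pairs are ordered correctly.
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))  # 0.75
print(second_stage_label(votes=20, comments=15))     # 1
```

An AUC of 0.5 corresponds to random ranking, so 0.62 is a weak but real signal and 0.72 a noticeably stronger one.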

Also, I have heard the sentiment that "BERT is not an LLM" a lot from commenters on HN, but every expert source I've seen seems to treat BERT as an LLM. It is in this category on Wikipedia, for instance

https://en.wikipedia.org/wiki/Category:Large_language_models

and

https://www.google.com/search?client=firefox-b-1-e&q=is+bert...

gives an affirmative answer in 8 cases out of 10, one of which denies it is a language model at all on a technicality that has since been overthrown.