by sgt101 2490 days ago
If anyone wants to use these tools in practice, I urge you to have a good look at this paper: https://www.aclweb.org/anthology/P19-1439/

My takeaway: pretraining achieves excellent results in papers, but robust application is hard. There is still quite a way to go down this road before these tools suit fault-intolerant users and applications.

1 comment

I'm not an expert, but after playing with some pre-trained transformers I think they are mostly good at the exact thing they're trained for. E.g., GPT-2 is great for text generation, but if you try to use it for, say, translation, it will tend to add imagined details that are not in the source text. Similarly, BERT is great at sequence- and token-level classification but quite bad at text generation.