Hacker News
by k8si 1496 days ago
Language models as a solution to what problems?

Yes, you can easily use AutoModel.from_pretrained('bert-base-uncased') to convert some text into a vector of floats. What then?

What are the properties of downstream (aka actually useful) datasets that might make few-shot transfer difficult or easy? How much data do your users need to provide to get a useful classifier/tagger/etc. for their problem domain?

Why do seemingly minor perturbations like typos or concatenating a few numbers result in major differences in representations, and how do you detect/test/mitigate this to ensure model behavior doesn't result in weird downstream system behavior?
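To make the "detect/test" part concrete, here is a rough sketch of a perturbation check: embed the original text and a perturbed copy, then flag any pair whose cosine similarity falls below a threshold. The `embed` function is a hypothetical stand-in (a bag of character trigrams) so the snippet runs on its own; in practice you would swap in your real encoder. The 0.8 threshold is also an assumption, not a recommended value.

```python
import math
from collections import Counter

def embed(text, n=3):
    """Hypothetical stand-in encoder: counts of character n-grams.
    Replace with a call to your actual model."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def swap_adjacent(text, i):
    """Simulate a typo by swapping the characters at positions i and i+1."""
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def flag_unstable(text, perturbations, threshold=0.8):
    """Return (perturbed_text, similarity) pairs that drift below the threshold."""
    base = embed(text)
    flagged = []
    for perturbed in perturbations:
        sim = cosine(base, embed(perturbed))
        if sim < threshold:
            flagged.append((perturbed, round(sim, 3)))
    return flagged
```

Running `flag_unstable(query, [swap_adjacent(query, i) for i in range(len(query) - 1)])` over a sample of real user inputs gives a crude list of the cases where a single typo moves the representation more than you would like.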

How do you train a dialog system to map 'I'm good, thanks' to 'no'? How do you train a sentiment classifier to learn from contextual/pragmatic cues rather than purely lexical ones (example: 'I hate to say it but this product solves all my problems.' - positive or negative sentiment?)

How bad is the user experience of your Arabic-speaking customers compared to that of your English-speaking customers, and what can you do to measure this and fix it?
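One simple way to start measuring this, as a sketch: slice your evaluation set by language and compare the same metric per slice. The record layout `(group, gold, predicted)` and the function names below are assumptions for illustration, and accuracy is only one of many metrics you might care about.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, gold_label, predicted_label) tuples.
    Returns a dict mapping each group to its accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, gold, predicted in records:
        total[group] += 1
        correct[group] += int(gold == predicted)
    return {group: correct[group] / total[group] for group in total}

def gap(scores, reference, other):
    """Difference in score between a reference group and another group."""
    return scores[reference] - scores[other]

# Made-up toy records, only to show the call shape (not real results).
toy_records = [
    ("en", "pos", "pos"), ("en", "neg", "neg"), ("en", "pos", "pos"), ("en", "neg", "pos"),
    ("ar", "pos", "pos"), ("ar", "neg", "pos"), ("ar", "pos", "neg"), ("ar", "neg", "neg"),
]
scores = accuracy_by_group(toy_records)
print(scores, gap(scores, "en", "ar"))
```

The harder part is getting comparable labeled data for each language in the first place, which this sketch takes for granted.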

My linguistics background really helps me think through a lot of these 'applied' NLP problems. Knowing how to make matmuls fast on GPUs and knowing exactly how multihead self-attention works is definitely useful too, but that's only one piece of building systems with NLP components.

1 comment

> My linguistics background really helps me think through a lot of these 'applied' NLP problems.

There are many benchmarks where LMs absolutely outperform mechanical linguistics solutions.

Do you have success stories in the opposite direction, where a linguistics-based solution significantly outperforms an LM?

There's no competition between linguistics and ML/NLP; as fields, they have completely different goals.

I meant that my linguistics background helps me understand and solve problems: studying linguistic fieldwork has helped me design crowd-labeling jobs, knowing about morphology helps me understand why BPE tokenizers work so well (and when they might not), knowing about syntax and dominant word order makes me think that multilingual BERT should probably do something more intelligent with positional embeddings, and methods from psycholinguistics are useful for understanding entropy/surprisal with respect to LM next-word probabilities. Those are just a few examples, but the list could go on.
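For anyone unfamiliar with the entropy/surprisal point, here is a minimal sketch of the two quantities as they are usually defined: surprisal of a word is -log2 of its probability, and entropy is the expected surprisal over the whole next-word distribution. The distribution below is made up for illustration; in practice the probabilities would come from an LM's softmax output.

```python
import math

def surprisal(p):
    """Surprisal in bits of an outcome with probability p."""
    return -math.log2(p)

def entropy(dist):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)

# Hypothetical next-word distribution after some context (made-up numbers).
next_word = {"coffee": 0.5, "tea": 0.25, "water": 0.125, "gravel": 0.125}

for word, p in next_word.items():
    print(f"{word}: {surprisal(p)} bits")
print("entropy:", entropy(next_word), "bits")
```

High surprisal on the word that actually occurred means the model found it unexpected; high entropy means the model was uncertain before seeing it. Psycholinguistics uses the same quantities to model human reading times.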