There is no such thing as "best possible model, full stop". Models are always context dependent, have implicit or explicit assumptions about what is signal and what is noise, have different performance characteristics in training or execution. Choosing the "best" model for your task is a form of hyperparameter optimization in itself.
Plenty of places use DL models, even if it's just a component of their stack. I would guess that that gradient-boosted trees are more common in applications, though.
Still mostly NLP and image stuff. Most actual data in the wild is tabular - which GBTs are usually some combination of better and easier. In some circumstances, NN can still work well in tabular problems with the right feature engineering or model stacking.
They are also more attractive for streaming data. Tree-based models can't learn incrementally. They have to be retrained from scratch each time.
ML is very good at figuring out stuff like every day at 22:00 this asset goes up if this another asset is not at a daily maximum and the volatility of the market is low.
You might call this overfitting/noise/.... but if you do it carefully it's profitable.
Real-time parsing of incoming news events and live scanning of internet news sites - coupled with sentiment analysis. Latency is an interesting challenge in that space.
Multiple parts of the iPhone stack run DL models locally on your phone. They even added hardware acceleration to the camera because most of the picture quality upgrades is software rather than hardware.