I hadn't read it before! That's a fascinating result, actually. They emphasize interpretability in the paper, but I find it more interesting that you can do so well with only local information.
My first thought is that it makes sense that averaging together a bunch of local predictions would work well on the ImageNet task, since the different classes tend to have obviously different local textures, and class-relevant information makes up a large part of the image. I would be very curious to see if the technique is as competitive for other tasks.
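To make the "averaging local predictions" idea concrete, here is a minimal NumPy sketch. It is not the paper's model: a single linear scorer stands in for the deep network that scores each patch, and the function names, patch size, and stride are all made up for illustration. The point is only that each patch is scored on its own and the image-level logits are the plain mean, so the spatial arrangement of the patches cannot matter.

```python
import numpy as np

def extract_patches(image, patch, stride):
    """Return every patch x patch crop of an (H, W, C) image, flattened to a row."""
    h, w, _ = image.shape
    crops = [
        image[r:r + patch, c:c + patch].reshape(-1)
        for r in range(0, h - patch + 1, stride)
        for c in range(0, w - patch + 1, stride)
    ]
    return np.stack(crops)

def bag_of_local_predictions(image, weights, bias, patch=8, stride=4):
    """Score each patch independently, then average the per-patch class logits."""
    patches = extract_patches(image, patch, stride)  # (num_patches, patch*patch*C)
    local_logits = patches @ weights + bias          # (num_patches, num_classes)
    return local_logits.mean(axis=0), local_logits

# Toy inputs (assumptions): a random 32x32 RGB "image" and random scorer weights.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
weights = rng.normal(size=(8 * 8 * 3, 10))
bias = np.zeros(10)

image_logits, local_logits = bag_of_local_predictions(image, weights, bias)
```

Keeping `local_logits` around is what gives the interpretability angle: each row says how much one patch voted for each class.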
Yeah, it seems like it would be useful for debugging to replace some part of the architecture with a simple linear sum and see if it does just about as well?
I come from deep reinforcement learning. When considering simulated environments (such as those used by AlphaZero or AlphaStar), can feature engineering dramatically reduce the CPU requirement or improve sample efficiency?
Or are low-level features the "easiest" part for the network to learn?
Edit 1: I understand, of course, the academic purity of working from raw data.
Edit 2: To clarify, simulated means lots of samples and on-policy learning, but it is also very CPU-intensive.
I think if you have a small to medium sized dataset of images or text, deep feature extraction would be the first thing I'd try.
I'm not sure what the most interesting problems with that property are. Maybe making specialized classifiers for people based on personal labeling? I've always wanted e.g. a Twitter filter that excludes specifically the tweets that I don't want to read from my stream.
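As a rough sketch of what "feature extraction first" looks like on a small dataset: freeze the feature extractor and train only a linear head on its outputs. Everything below is an assumption for illustration. `frozen_features` is a hypothetical stand-in (a fixed random projection with a ReLU) for a real pretrained network's penultimate layer, and the data and labels are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_features(x, projection):
    """Hypothetical stand-in for a pretrained network; its weights are never updated."""
    return np.maximum(x @ projection, 0.0)

def log_loss(features, labels, w, b):
    """Mean binary cross-entropy of a logistic head on fixed features."""
    probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
    probs = np.clip(probs, 1e-12, 1 - 1e-12)
    return -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

def train_linear_head(features, labels, lr=0.1, steps=500):
    """Fit only the logistic-regression head by plain gradient descent."""
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        error = probs - labels
        w -= lr * features.T @ error / len(labels)
        b -= lr * error.mean()
    return w, b

# Small synthetic dataset (assumption): 200 examples with 64 raw dimensions.
x = rng.normal(size=(200, 64))
projection = rng.normal(size=(64, 16))  # the "pretrained" weights, kept fixed
features = frozen_features(x, projection)
features = (features - features.mean(axis=0)) / features.std(axis=0)

# Synthetic labels that depend linearly on the extracted features.
labels = (features @ rng.normal(size=16) > 0).astype(float)

initial_loss = log_loss(features, labels, np.zeros(16), 0.0)
w, b = train_linear_head(features, labels)
final_loss = log_loss(features, labels, w, b)
```

With a real pretrained model you would swap `frozen_features` for a forward pass through that model and keep the rest the same. Because only the small head is trained, this stays cheap and is hard to overfit badly on a few hundred examples.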
One problem that intrigues me is Chinese-to-English machine translation, specifically for a subset of Chinese martial arts novels (especially given there are plenty of human-translated versions to work with).
So Google, Bing, etc. have their own pre-trained models for translation.
How would I access those in order to develop my own refinement with the domain-specific dataset I put together?
I don't think you could get access to the actual models that are being used to run e.g. Google Translate, but if you just want a big pretrained model as a starting point, their research departments release things pretty frequently.
For example, https://github.com/google-research/bert (the multilingual model) might be a pretty good starting point for a translator. It will probably still be a lot of work to get it hooked up to a decoder and trained, though.
There's probably a better pretrained model out there specifically for translation, but I'm not sure where you'd find it.
IMHO, (deep) feature engineering is important in these cases:
o The lower the level of representation, the more important it is to raise the level of abstraction, either by learning new features or by defining them manually.
o With (fine-grained) raster data, (automated) feature engineering is especially important. This is why it matters in audio analysis (1D raster) and video analysis (2D raster).
I don't work with time series data much myself. I would imagine you can get at least some transfer learning, since there are patterns that show up across different domains. It looks like there's been a little bit of work done on this (https://arxiv.org/pdf/1811.01533.pdf).
According to them, transfer learning can improve a time series model if you pick the right dataset to transfer from, but they don't seem to be getting the same unbelievably strong transfer results that you'd see on images and text.
Considering the rate of change in this field, what would be beneficial to learn for people who don't actually get to use machine learning in their day-to-day job? I'd love to dive in and learn more about machine learning, but I don't want to waste time learning something that will be totally irrelevant in a couple of years.
https://openreview.net/forum?id=SkfMWhAqYQ