Hacker News
Path-breaking Papers About Image Classification (blog.paralleldots.com)
69 points by parths 3223 days ago
9 comments

Some really cool information, but this concluding bit annoyed me:

> By Moore’s law, we will reach computing power of human brain by 2025 and all of the humanity by 2050.

Their graph does show exponential growth, but the data points cut off at the year 2000. Not surprising, given that Moore's law has reached its end in the last decade. ML improvements now depend upon better algorithms to make them more parallel, and the economies of scale which make more parallel computation units available. I don't think we're anywhere near that exponential graph, however, and we'll keep getting further from it.

Perhaps quantum computing will become a widespread reality and blow the field open, but I'm not holding my breath that it will happen in the next few decades.

> Their graph does show exponential growth, but the data points cut off at the year 2000. Not surprising, given that Moore's law has reached its end in the last decade.

I think the graph was originally produced for Ray Kurzweil's 1999 book "The Age of Spiritual Machines".

Which predicted we'd meet a bunch of milestones of general AI by 2009.

Why is this post about a specific machine-learning task even referencing Kurzweil, anyway?

Moore’s law is still alive and well; you just have to move over to parallel architectures like GPUs.

Considering machine learning is all on GPUs and TPUs now, I think this is still a fair assessment.

Not really: GPUs have started to hit the same process size as CPUs and haven't been showing a lot of growth in that area. The best improvement per Wikipedia's chart is a small foray into 14 nm and 12 nm, and those haven't doubled the transistor count per square mm.

What we are seeing is an increase in die sizes; more parallel cores. Parallel cores still require parallel algorithms, so I stand by my earlier statement.

The latest NVidia GPUs are ~1.3-1.4 times faster than the previous generation at the same or lower power consumption.
Moore's law is about transistor count, not performance. The performance of those parallel architectures is always subject to Amdahl's law.

This is partly why we see slowing scaling in performance and why you cannot just throw cores at the problem.
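To put rough numbers on the Amdahl scaling point above, here is a minimal sketch. The function name `amdahl_speedup` and the 95% parallel fraction are illustrative assumptions, not figures from this thread.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Theoretical speedup under Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


if __name__ == "__main__":
    # Assumed workload: 95% of the runtime parallelizes perfectly.
    for cores in (1, 8, 64, 1024):
        print(cores, round(amdahl_speedup(0.95, cores), 2))
```

With a 5% serial portion the speedup can never exceed 1 / 0.05 = 20x, however many cores you add, which is the "cannot just throw cores at the problem" effect.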

You're conflating things here; Moore's law is about how many transistors you can pack onto a chip, not performance. Moore's law is unsustainable due to laws of physics. Transistors are reaching size scales where quantum effects like tunneling dominate.

Think of a transistor as having two regions, separated by a channel. When the transistor is on, charge carriers flow through the channel between the regions. When off, charge carriers do not flow. But when we move to smaller and smaller length scales, the channel is so small that charge carriers will tunnel through and reach the other region. How will you distinguish on/off behavior now?

Today's transistors are still generally laid out on a plane. If we could find ways to build ICs stacked thousands or millions of layers thick, we could really start using the third dimension.

("3d" transistors built on their side don't really count)

For CPUs/GPUs it probably wouldn't help that much, since you have to get rid of the heat produced by switching a transistor.

And for storage the main issue isn't how big it is, but how cheap it is to manufacture.

Maybe this is silly, but could you use redundancy and statistics to help?
Specious extrapolations aside, the plot itself is deceptive: exponential growth on a log-linear plot should be a straight line, not a curve accelerating upward super-exponentially.
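A quick way to check the straight-line claim is to look at the step-to-step slope of the log of a series. This is only a sketch; both series below are made up for illustration and are not data from the article's graph.

```python
import math


def log_steps(values):
    """Return successive differences of log10(values).

    For evenly spaced x values, these are the slopes between
    neighbouring points on a log-linear plot.
    """
    logs = [math.log10(v) for v in values]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]


# Hypothetical series: one doubles every step, the other grows faster than that.
exponential = [2.0 ** t for t in range(8)]
super_exponential = [2.0 ** (t * t) for t in range(8)]

if __name__ == "__main__":
    print([round(s, 3) for s in log_steps(exponential)])        # constant slope
    print([round(s, 3) for s in log_steps(super_exponential)])  # growing slope
```

A constant slope means a straight line on the log axis. A slope that keeps growing is the upward-curving shape being criticized here.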
GPUs still improve a lot year over year, so I think Moore's law still holds true for at least a few more years.
I believe you're comparing apples and oranges.

GPUs and multicore CPUs do not perform the same work as the single core CPUs that dominated before ~2003, when Moore's Law first slowed. By expanding the measure of speedup to include GPU/multicore, arguments like yours require not only a change in hardware but in benchmark code as well.

Longstanding general app benchmarks like SPEC emphasized everyday tasks that rarely benefit from parallelism, appropriately revealing the general ineffectiveness of adding GPUs or multicore to everyday apps. The only fair way to assess the impact of GPU/multicore is to continue using the same benchmark code as when mono-core CPUs reigned. When you do that, the value of adding GPU/multicore essentially disappears and the speedup of Moore's Law duly fades (again ~2003).

Thus until users begin to run deep learning code on their computer's GPU, AI-code won't speed up exponentially, nor continue with future GPU advancement. The hardware basis driving Kurzweil's Singularity has truly run out of steam.

The caption for the top graph appears a bit out of whack.

It states "exponential decline in top 5 error rate", but the decline looks more like diminishing returns to me, especially if you push the 2017 data point out to where it should be (they've omitted 2016).

It's nice that the error rate is low, but the caption appears to oversell it.

This graph reminds me of a very closely related one I saw in a talk a few years ago [1]. It was showing decline in voice recognition error rates over time, with a highlighted band for "human performance".

The speaker, Roger Moore (the academic, not the actor, and not the Moore with the law), pointed out that this line, while encouraging, hid two important points.

1) For linear improvement, exponentially more training data was needed. 2) No insight into how living beings solve the same task.

These aren't necessarily fatal flaws, but they're worth remembering.

[1] https://www.youtube.com/watch?v=iYbVsvxd3bE

A more accurate idea of what a computer sees is that ML models figure out which parts of the signal to throw away and which to pay attention to. This is why you can slightly perturb an image so that humans still see a picture of two hot dogs while an ML model can be confused into seeing two different things (a hot dog and an eggplant).
Sometimes I wonder why the top-5 image classification task is so difficult. If you give me 5 chances to look at an image and correctly classify it from ~1000 ImageNet classes, I can surely do better than a 5-10% error rate.

Also, now that the top-5 error rate has been brought down considerably, what is the next benchmark for the research community to beat? A new dataset, or top-1 error rate on ImageNet?

A large majority of human errors come from fine-grained categories (such as telling apart two similar cat species) and class unawareness. I would recommend this article by Andrej Karpathy, where he talks about what he learned from competing against GoogLeNet: http://karpathy.github.io/2014/09/02/what-i-learned-from-com...
That would be a relatively low-grade error. Specifically, errors have to be valued and not just counted.
Does anyone have insight as to why they're still doing top 5? It seems to me like the error rates have dropped low enough that they could move on to top 3 or even single guess challenges. Is there data that shows how these same models perform in such tasks? Though I suppose, if I was motivated, all the needed tools are available to find out for myself.
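For anyone who does want to try it, top-k error is straightforward to compute from a model's per-class scores. A minimal sketch in plain Python follows; the function name `top_k_error` and the toy scores are hypothetical, not output from any of the models discussed.

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is NOT among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy data: 3 examples, 4 classes (made-up scores).
toy_scores = [
    [0.10, 0.60, 0.20, 0.10],  # true class 1: correct on the first guess
    [0.50, 0.30, 0.15, 0.05],  # true class 1: correct on the second guess
    [0.40, 0.30, 0.20, 0.10],  # true class 3: only correct on the fourth guess
]
toy_labels = [1, 1, 3]

if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(k, top_k_error(toy_scores, toy_labels, k))
```

Running the same function with k = 1, 3 and 5 over a model's validation predictions would give the comparison asked about above.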
Is DenseNet the one which won the best paper award at CVPR this year?

And which framework would you recommend to code these in?

Yes! Facebook's DenseNet won the best paper award at CVPR this year. I would recommend the PyTorch framework to code these in, as it extends the NumPy/SciPy ecosystem and is simpler to use.
I prefer utility over hype. One has to see how the community evolves around PyTorch.
The Squeeze-and-Excitation network by momenta.ai has been a watershed moment for Chinese AI prowess, and I'll watch out for such Chinese startups to dominate the AI landscape for a while. What I wonder is why Google hasn't participated in the last couple of ImageNet challenges.
ImageNet as a competition has been losing its importance since 2016. No idea from that year onward has been as widely effective and inspiring as ResNet. I feel people have just over-engineered their network structures to claim the state of the art by a marginal gain.

Google has since introduced Neural Architecture Search, which can automatically design networks and which I think is way ahead of the rest of the competitors here.

Google has its own huge internal datasets for image classification. You can find one mentioned in Chollet's Xception paper. That may be why they are not really interested in working on ImageNet.
If anyone is interested here are the official ILSVRC2017 results:

http://image-net.org/challenges/LSVRC/2017/results

It would be great if you could share links to pretrained weights for the networks mentioned here in a Python framework.