RetinaFace: Single-stage Dense Face Localisation, implemented in TensorFlow 2.0


	RetinaFace: Single-stage Dense Face Localisation, implemented in TensorFlow 2.0 (github.com)
	62 points by stan_btd 2129 days ago

5 comments

AndrewThrowaway 2129 days ago

Model | Easy | Medium | Hard
Mxnet | 96.5 | 95.6 | 90.4
Ours  | 95.6 | 94.6 | 88.5

My professors would be so mad if I submitted data like this. I can already hear "What are the units? Seconds? So yours is better by one second? Error margin? Percent? So yours is worse?"

I get that this is very specific information for a specific audience. People who stumble on this repo should know what it is.

However we can all be better at presenting our data.


egocodedinsol 2129 days ago

It says mAP right above the table. If you know what mAP is then you know the units.

Professors will rail against imprecision to undergrads and young graduate students but they use it all the time in real life.

I can already hear my professors saying "stop commenting on style, what do you have to say about substance?"


stan_btd 2129 days ago

As stated in the repo, it's "mAP result values"

https://medium.com/@jonathan_hui/map-mean-average-precision-...


AndrewThrowaway 2129 days ago

Don't get me wrong. I totally get it. But my professor is saying:

"So this is precision? 0 to 1? 1 being totally accurate? And you got 96.5? I assume it is percentage?"


nl 2128 days ago

It says "mAP result values on the WIDERFACE validation dataset:"

If your professor is working on object detection they know what mAP is - all the major datasets use it as their standard evaluation criterion.


AndrewThrowaway 2128 days ago

So is it mAP of 0.956 or 95.6? Why not 956?


nl 2128 days ago

Not to be rude, but if you can't work that out you shouldn't be working on this. I'm sure the author would take a pull request, but it really sounds like you are nitpicking.

Figures like precision and recall are often expressed either as 66% or 0.66. Confusion really isn't that big a problem.
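To make the fraction-versus-percentage point concrete, here is a minimal sketch of how an average precision (AP) value can be computed from ranked detections. It assumes a generic interpolated precision-recall area, not the exact WIDERFACE evaluation protocol, and the function name, inputs, and toy data are hypothetical. The result is a number between 0 and 1, and multiplying by 100 gives the percentage form used in tables like the one quoted above.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the interpolated precision-recall curve (generic sketch).

    scores: confidence of each detection
    is_true_positive: whether each detection matched a ground-truth face
    num_ground_truth: total number of ground-truth faces
    """
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)

    # Interpolate: precision at a recall level is the best precision
    # achieved at that recall or any higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Sum precision weighted by each step in recall.
    ap = 0.0
    prev_recall = 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Toy data (made up for illustration): four detections, four real faces.
ap = average_precision(
    scores=[0.9, 0.8, 0.7, 0.6],
    is_true_positive=[True, False, True, True],
    num_ground_truth=4,
)
print(f"AP = {ap:.3f} (fraction) = {100 * ap:.1f} (percent)")
```

Whether a table reports 0.956 or 95.6 is only a choice of scale. The "m" in mAP is a mean of such AP values, typically taken over classes or evaluation subsets.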


symisc_devel 2129 days ago

The algorithm has already shipped in release 1.9.72 of the PixLab REST APIs: https://blog.pixlab.io/2020/08/pixlab-api-1972-released

Note, however, that RetinaFace does not achieve real-time performance on the CPU, especially on IoT devices and in web browsers (WebAssembly). That's why we opted for a standard cascade approach for our WebAssembly port: https://sod.pixlab.io/articles/porting-c-face-detector-webas...


codetrotter 2129 days ago

I couldn’t find any license text in the repo.

Would you consider using an open source license such as the ISC license?

https://choosealicense.com/licenses/isc/


stan_btd 2129 days ago

Of course, done!


codetrotter 2129 days ago

Great, thank you :)


cheez 2129 days ago

Not an expert: why is it called state of the art if its accuracy is worse than mxnet, whatever that is?


stan_btd 2129 days ago

The algorithm is state of the art. There are several implementations of the algorithm: the original uses the mxnet framework, and my implementation with the tensorflow framework has slightly lower accuracy.


cheez 2129 days ago

Thanks! What makes it state of the art and why is lower accuracy acceptable?


nl 2129 days ago

> What makes it state of the art

Generally it means something roughly like "the best known approach for this specific problem". Often it means "the best known approach for this specific dataset" (eg "SOTA on ImageNet").

> why is lower accuracy acceptable?

Lower accuracy is worse, but these numbers look close enough that it's probably acceptable for most people.

There are plenty of environments where TF is preferable to MXNet (eg, you have TPUs/want to use TFLite on mobile/want to slice the model weights up and use it for your own custom TF model).

It's probably lower accuracy because it wasn't trained as long. Those extra couple of points could take days (or more) of training.


cheez 2129 days ago

Awesome, thanks for the explanation!


stan_btd 2129 days ago

The paper that presents this algorithm has the best known accuracy on the widerface dataset, which is why it is called state of the art. The authors of the paper published an implementation of this algorithm based on mxnet, but a lot of people and companies use tensorflow instead of mxnet in their work, so just using the mxnet implementation is not an option. That's why I converted it to TF, with a slight decrease in accuracy on widerface. What counts as "acceptable" then depends on your judgement, but the widerface dataset is extremely challenging, with many pictures having hundreds of small faces. In some of these pictures my implementation will miss a few faces, or find them but with lower probability. Overall, for the vast majority of face detection applications, the two implementations will yield extremely similar results. I haven't yet found a picture with a few dozen faces where the TF implementation performed worse than the original one!


cheez 2129 days ago

Got it. I understand now. Don't be offended at my use of "acceptable", it was used in ignorance.


stan_btd 2129 days ago

No offense taken at all!


marstall 2129 days ago

I feel much safer now.
