I see from the MTCNN code that this repo (like all others I've seen) is still bouncing tensors between GPU and CPU while passing between the P/R/ONets.
So many ML repos make this mistake in pre/post-processing and end up bottlenecked on CPU.
Anyone know of an MTCNN that's been ported to run more or less fully on GPU? (Or even that does batching instead of an image-by-image approach?)
I'm not aware of any implementation with these features, but both are on the roadmap for the linked repo, and both should be achievable. Batch processing in particular will be a straightforward change and should give quite a speed-up, although it will require the input images to have the same dimensions.
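To illustrate the same-dimensions constraint, here is a minimal sketch of one way to work around it: group the inputs by size so that each group can be stacked into its own batch. This is not code from the linked repo; `group_by_shape` and the example sizes are hypothetical, and in a PyTorch workflow each group would then be combined with something like `torch.stack` before being passed to the network.

```python
from collections import defaultdict


def group_by_shape(shapes):
    """Group image indices by (height, width).

    Hypothetical helper: every index list in the result refers to images
    that share one size, so those images could be stacked into one batch.
    """
    groups = defaultdict(list)
    for idx, (height, width) in enumerate(shapes):
        groups[(height, width)].append(idx)
    return dict(groups)


# Made-up (height, width) sizes for five input images.
shapes = [(480, 640), (720, 1280), (480, 640), (720, 1280), (480, 640)]
batches = group_by_shape(shapes)

for size, indices in batches.items():
    print(size, indices)
```

The alternative is to resize or pad every image to a common size up front, which gives a single batch at the cost of some extra preprocessing.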
That's a good repo. It uses MXNet, right? The aim of the repo in this topic was mainly to provide a clean implementation that slots easily into an existing PyTorch workflow.
Thanks for sharing. I'm working on face recognition with homomorphic encryption, so that recognition doesn't compromise user privacy. The bold goal is the first privacy-preserving video camera.
If you find this interesting, I would love to chat about it.