Did you benchmark Go vs C/C++ for parallel image processing? I'm curious to see whether there is actually a cost to switching to a higher-level language like Go.
I haven't done any formal benchmarks yet. At the moment the library is not fully optimized; only the low-hanging fruit has been addressed (optimization and bug fixing are the next stage).
But I will definitely post some benchmarking results soon, as they are crucial for the optimization stage.
By the way, any suggestions/contributions to the project are more than welcome!
All of the blur/ functions could see big improvements with some simple changes:
* Use separable kernels whenever possible (this reduces an O(K^2) evaluation per pixel to 2K evaluations)
* Some filters (like your box filter) can be computed efficiently as a linear combination of IIR filters (i.e. implicitly computing a pair of running sums and subtracting one from the other as you go).
* The convolve/ package could use some cache blocking, and the conditionals in its inner loop should definitely be removed.
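To make the first two suggestions concrete, here is a rough sketch of a box blur done as two 1D passes, each driven by a running sum. It is not taken from the library: the names `boxBlur1D` and `boxBlur2D`, the row-major `[]float64` image layout, and the clamped border handling are all assumptions made for illustration.

```go
package main

import "fmt"

// boxBlur1D (hypothetical helper) averages each sample with its neighbours
// within radius r, clamping indices at the borders. The running sum makes the
// cost per sample independent of r.
func boxBlur1D(src []float64, r int) []float64 {
	n := len(src)
	dst := make([]float64, n)
	if n == 0 {
		return dst
	}
	clamp := func(i int) int {
		if i < 0 {
			return 0
		}
		if i >= n {
			return n - 1
		}
		return i
	}
	width := float64(2*r + 1)

	// Seed the window sum for output index 0.
	sum := 0.0
	for k := -r; k <= r; k++ {
		sum += src[clamp(k)]
	}
	for i := 0; i < n; i++ {
		dst[i] = sum / width
		// Slide the window: add the entering sample, drop the leaving one.
		sum += src[clamp(i+r+1)] - src[clamp(i-r)]
	}
	return dst
}

// boxBlur2D (hypothetical helper) exploits separability: a horizontal pass
// followed by a vertical pass over a row-major image of size w*h.
func boxBlur2D(img []float64, w, h, r int) []float64 {
	tmp := make([]float64, w*h)
	for y := 0; y < h; y++ {
		copy(tmp[y*w:(y+1)*w], boxBlur1D(img[y*w:(y+1)*w], r))
	}
	out := make([]float64, w*h)
	col := make([]float64, h)
	for x := 0; x < w; x++ {
		for y := 0; y < h; y++ {
			col[y] = tmp[y*w+x]
		}
		for y, v := range boxBlur1D(col, r) {
			out[y*w+x] = v
		}
	}
	return out
}

func main() {
	fmt.Println(boxBlur1D([]float64{0, 0, 9, 0, 0}, 1))
	fmt.Println(boxBlur2D([]float64{0, 0, 0, 0, 9, 0, 0, 0, 0}, 3, 3, 1))
}
```

One caveat with this approach: a floating-point running sum can accumulate rounding error over long rows, so an integer accumulator may be the safer choice for 8-bit pixel data.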
Unfortunately, these changes will likely make your code a bit more difficult to read.
This is one of the reasons Halide has become so popular for image processing. If you're interested in high-performance image processing without sacrificing maintainability, Halide is definitely worth looking into!
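As for the inner-loop conditionals, one common way to remove them is to split each row into border regions, which keep the bounds checks, and an interior region, which needs none. The sketch below is only an illustration under assumed conventions: `convolveRow` is a made-up name, the kernel is assumed to have odd length, borders are clamped, and the kernel is applied without flipping (which makes no difference for symmetric kernels).

```go
package main

import "fmt"

// convolveRow (hypothetical helper) applies an odd-length kernel to src with
// clamped borders. Only the border samples pay for per-tap bounds checks.
func convolveRow(src, kernel []float64) []float64 {
	n, r := len(src), len(kernel)/2
	dst := make([]float64, n)

	// edge computes one output sample, clamping every tap index.
	edge := func(i int) {
		var acc float64
		for k, kv := range kernel {
			j := i + k - r
			if j < 0 {
				j = 0
			} else if j >= n {
				j = n - 1
			}
			acc += src[j] * kv
		}
		dst[i] = acc
	}

	// The interior [lo, hi) is where the whole window lies inside src.
	lo, hi := r, n-r
	if lo > hi {
		lo, hi = n, n // row shorter than the kernel: treat everything as border
	}

	for i := 0; i < lo; i++ {
		edge(i)
	}
	for i := lo; i < hi; i++ {
		// No conditionals here: the window is known to be in range.
		window := src[i-r : i+r+1]
		var acc float64
		for k, kv := range kernel {
			acc += window[k] * kv
		}
		dst[i] = acc
	}
	for i := hi; i < n; i++ {
		edge(i)
	}
	return dst
}

func main() {
	fmt.Println(convolveRow([]float64{1, 2, 3, 4, 5}, []float64{1, 0, -1}))
}
```

This is also a small taste of the readability cost: the loop body is now written twice, once for the borders and once for the interior.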
Thank you for the suggestions! The blur and convolution packages are definitely among the first things to be optimised; lots of other features would benefit from a much-needed faster convolve function.
Halide looks very interesting; I found the "Halide Talk" video on their website to be a great primer on their methods.