My work might make me biased but I think there’s a lot of room to make more general purpose code differentiable.
I work on a ton of stuff that would very much benefit from being differentiable, but also very much can’t fit at all into the “stack a ton of linalg-like layers”.
It’s a huge engineering effort even to think about how I might start taking derivatives. It’s possible, but there’s so much overhead to doing anything with it.
Julia has metaprogramming as a fundamental principle of the language. This makes for a very concise and powerful system that, together with a well-written AD framework like Zygote, makes every expression differentiable, meaning that effectively the entire language is differentiable.
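Zygote does source-to-source reverse-mode AD on Julia code; as a rough, dependency-free illustration of the same principle in Python, forward-mode dual numbers can differentiate ordinary code with branches and loops, not just stacked linalg layers (a sketch of the idea, not how Zygote itself works):

```python
# Minimal forward-mode AD with dual numbers: a sketch of how an AD
# system can differentiate ordinary code (branches, loops), not just
# stacked linear-algebra layers. (Zygote itself is reverse-mode and
# source-to-source; this only illustrates the principle.)
class Dual:
    def __init__(self, val, dot=0.0):
        self.val = val      # primal value
        self.dot = dot      # derivative w.r.t. the chosen input

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val,
                    self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

    def __lt__(self, o):
        return self.val < (o.val if isinstance(o, Dual) else o)

def derivative(f, x):
    # Seed the input's derivative with 1.0 and read off the output's.
    return f(Dual(x, 1.0)).dot

# Plain Python with a loop and a branch -- still differentiable:
def g(x):
    acc = x
    for _ in range(3):
        acc = acc * x      # builds x**4
    if x < 0:
        acc = acc * -1.0
    return acc

print(derivative(g, 2.0))  # d/dx x**4 at x=2 -> 32.0
```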
And that isn't even the coolest thing about working in Julia: just wait til you see what people can squeeze out of macros.
So why hasn't it taken over the ML world already? Or has it? Or are there too many ML "researchers" who haven't bothered to improve their own tooling and are trapped in Anaconda?
The Julia community is small and has no large commercial backers. Projects such as TF/PyTorch require community support and a lot of investment which Julia just doesn't have. In fact, Julia isn't even trying at the moment to "compete" with TF/PyTorch [1, 2].
I've worked at 2 companies that would have liked to use Julia but it wasn't (and still isn't) product ready for anything involving high reliability or robustness.
Pytorch can be used in a very general purpose way. It's essentially numpy + automatic differentiation + GPU support. All the 'linalg-like layers' are entirely optional. If you write y = A*x+b in pytorch, that works, and is differentiable.
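In actual PyTorch that would be tensors with requires_grad=True followed by y.backward(); as a dependency-free sketch of the mechanics behind it, here is a tiny scalar reverse-mode tape for y = a*x + b (illustrative only, not PyTorch's real implementation):

```python
# Dependency-free sketch of what autodiff does for y = a*x + b:
# record the computation on a tape, then backpropagate. (PyTorch's
# real API is torch.Tensor with requires_grad=True and y.backward();
# this tiny scalar version only illustrates the mechanics.)
class Var:
    def __init__(self, val):
        self.val, self.grad, self._back = val, 0.0, []

    def __mul__(self, o):
        out = Var(self.val * o.val)
        out._back = [(self, o.val), (o, self.val)]  # local derivatives
        return out

    def __add__(self, o):
        out = Var(self.val + o.val)
        out._back = [(self, 1.0), (o, 1.0)]
        return out

    def backward(self, seed=1.0):
        self.grad += seed
        for var, local in self._back:
            var.backward(seed * local)  # chain rule, outputs-to-inputs

a, x, b = Var(3.0), Var(2.0), Var(1.0)
y = a * x + b          # y.val == 7.0
y.backward()
print(a.grad, x.grad, b.grad)  # 2.0 3.0 1.0
```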
Non-smooth functions (e.g. abs(x)) can be handled with bundle methods, but how would one make inherently discontinuous (non-convex) functions differentiable? (e.g. if x then 1 else 5)
Discrete problems are inherently non-differentiable. There are approaches like complementarity methods and switching functions (e.g. tanh), but they usually end up with numerical issues.
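To make the numerical issue concrete, here is a sketch of the switching-function trick applied to the earlier example, reading `if x then 1 else 5` as a threshold on a real-valued x (the sigmoid surrogate and the steepness parameter k are illustrative choices, not a standard recipe):

```python
import math

# Replace the discontinuous
#   f(x) = 1 if x > 0 else 5
# with a smooth sigmoid surrogate of steepness k. The surrogate is
# differentiable everywhere, but as k grows the gradient collapses
# to ~0 away from the switch point -- the numerical issue in practice.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def smooth_f(x, k):
    return 5.0 - 4.0 * sigmoid(k * x)   # ~5 for x<<0, ~1 for x>>0

def grad_f(x, k):
    # d/dx [5 - 4*sigmoid(k*x)] = -4*k*sigmoid'(k*x)
    s = sigmoid(k * x)
    return -4.0 * k * s * (1.0 - s)

print(smooth_f(-2.0, 10.0))   # close to 5
print(smooth_f(+2.0, 10.0))   # close to 1
print(grad_f(1.0, 10.0))      # tiny: the gradient has nearly vanished
print(grad_f(1.0, 100.0))     # effectively zero at higher steepness
```

A sharper switch (larger k) better approximates the original function, but it also flattens the gradient almost everywhere, so a gradient-based optimizer gets no signal away from the jump.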
This already happens to an extent in existing ML pipelines. The ReLU activation function is discontinuous in its derivative, yet it is one of the most widely used functions in neural networks. Its derivative looks like this:
if (i < 0) return 0;
else return 1;
Now ReLU is continuous itself (as well as being monotonic) so it still cooperates relatively well with gradient descent algorithms. I think this is where the problem lies - not with differentiability itself, but with gradient descent not working due to the highly non-convex search space that such general programming constructs will produce.
ReLU is not discontinuous; it is nonsmooth but continuous, so its derivative exists everywhere except at the hinge point (x = 0).
Inherently discontinuous functions OTOH are disconnected and nonconvex. Gradient descent can still be used, but you first have to add a step that partitions the discrete space, as in branch and bound. That involves solving the continuous relaxation to obtain a bound. This does not require differentiability (the original function is not differentiable), but the price to pay is that the search is combinatorial (NP-hard).
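As a concrete (toy) instance of the partition-and-bound idea described above, here is a small branch-and-bound for a 0/1 knapsack, where the continuous (fractional) relaxation supplies the pruning bound; the problem instance is just an illustration:

```python
# Toy branch-and-bound for 0/1 knapsack: the discrete space is
# partitioned by branching on each item, and the continuous
# (fractional) relaxation gives an upper bound used to prune.
# No derivatives needed, but the worst case is exponential.
def knapsack_bnb(values, weights, capacity):
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i] / weights[i],
                   reverse=True)  # best value density first

    def relax_bound(idx, cap, acc):
        # Continuous relaxation: allow a fractional piece of one item.
        for i in order[idx:]:
            if weights[i] <= cap:
                cap -= weights[i]
                acc += values[i]
            else:
                return acc + values[i] * cap / weights[i]
        return acc

    best = 0.0
    def branch(idx, cap, acc):
        nonlocal best
        if idx == n:
            best = max(best, acc)
            return
        if relax_bound(idx, cap, acc) <= best:
            return  # prune: relaxation proves no improvement here
        i = order[idx]
        if weights[i] <= cap:                     # branch: take item i
            branch(idx + 1, cap - weights[i], acc + values[i])
        branch(idx + 1, cap, acc)                 # branch: skip item i

    branch(0, capacity, 0.0)
    return best

print(knapsack_bnb([60, 100, 120], [10, 20, 30], 50))  # 220.0
```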
The OP was talking about general differentiability but inherently discontinuous functions form a large and important class of functions (from software programming) that are not differentiable.
> Now ReLU is continuous itself (as well as being monotonic) so it still cooperates relatively well with gradient descent algorithms.
A function with a discontinuous derivative cannot cooperate with gradient descent algorithms. That's why you have the famous problem of "dead neurons".
Imagine an alternative ReLU with a narrow curved section smoothing out the discontinuity. Now it has a continuous derivative, but that gradient is still zero for values < 0. This flat region is the cause of dead neurons: backprop multiplies the propagated error by the gradient to update the weight, and if the gradient is 0, the result of the multiplication is 0, so the neuron's weights never get adjusted.
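The dead-neuron mechanism can be sketched in a few lines: a single ReLU "neuron" whose pre-activation is negative for every input gets a zero gradient through the flat region, so every update is zero (the weights, inputs, and unit upstream error here are arbitrary illustrative choices):

```python
# Sketch of the dead-neuron effect: a single ReLU "neuron" whose
# pre-activation w*x + b is negative for every training input. The
# local gradient is 0 on that side, so every weight update is 0 and
# the neuron never recovers.
def relu(z):
    return z if z > 0 else 0.0

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

w, b = -2.0, -1.0          # weights that keep the neuron in the flat region
inputs = [0.5, 1.0, 2.0]   # all positive, so w*x + b is always < 0
lr = 0.1

for x in inputs:
    z = w * x + b
    upstream = 1.0                      # pretend dL/d(relu(z)) = 1
    dw = upstream * relu_grad(z) * x    # chain rule through ReLU
    db = upstream * relu_grad(z)
    w -= lr * dw                        # update is 0 -- neuron is dead
    b -= lr * db

print(w, b)  # unchanged: -2.0 -1.0
```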
And then people go to empirical data and apply the Great Smoothing: by throwing ML/DL methods at the data (including results produced by discontinuous behavior), continuity is often implicitly assumed.