oddity's comments | Hacker News

It's a fun observation, but the causality in this feels off. There are tons of esoteric programming languages that are difficult to write in and no one talks about because they have no other value. BF is not very useful for writing things in, but it has some value in being easy to implement. Would anyone sane genuinely encourage others to write something nontrivial in BF? What person looks at BF and says "ah, yes, this is a language I might like to write my next spreadsheet app and networking library in, let me play around in it for a bit to see if it might work", much less follow up with "oh no, I got distracted reimplementing it instead"? On the other hand, if I'm implementing an interpreter for fun, I'm going to pick something with a nice spec and my ability to actually use it is irrelevant. BF is great for implementing a spec exactly such that all existing tests and benchmarks just work.

I think the more likely scenario in a counterfactual world where Shen and Forth are difficult to implement is not that we have more things written in Shen and Forth, but that we have strictly fewer reasons to talk about Shen and Forth.

I am not convinced that the group that builds implementations and the group that builds the kinds of things that beget ecosystems for anyone other than compiler developers actually have much overlap. I do, however, think there is a strong overlap between the group that builds implementations and terminal-stage new-language addicts. An ecosystem for other things will maybe happen later if there's some other value to the language.

Where I think the article does hit on something is slightly less direct: users are more tolerant of a messy specification than implementors. C, C++, Python, Java and all the other languages that people actually use are ugly as sin at a semantic level, and yet we get by with copious handwaving. There are many problems, of course, with some of the handwaving users do to reason about these languages (hello, many warts of C), and generally, a spec a user can keep in their head is a big productivity win, but if you want a language that people can use, you can afford to make your language a bit more complicated.

TL;DR: idk, make your pretty language with a beautiful spec if you want. Most languages are unused anyway. But there probably are diminishing usability returns for maximal spec beauty. If you care about that kind of thing, that is.


I think, maybe, the stronger argument is that the area of Generative Programming (which covers just about anything that generates another program) itself is too broad to properly address in a semester course. This course is particularly focused on metaprogramming, where ML has less relevance. This is to be expected, given the instructor.

It's fairly common practice for the title of the course to be far more broad than the topic. See: "Theory of Computing" courses which spend all their time on complexity classes and never mention automata, or graph theory courses which don't ever mention monadic second order logic or spectral graph theory. Decisions have to be made about what to keep and what to cut, at some point.

I spend a lot of time reading papers at the intersection of ML & PL, so I'm a bit sad, personally, but I don't think it's fair to say the course is out of date. Rather, this is just a sign of the known world getting bigger.


Yeah, I would hard disagree. It's a disservice to students to teach this course as is.

Students need to get up to date on a topic so that they can do research if they want. Teaching a course that's 10-20 years out of date is a serious problem. They won't know what questions people ask now, they won't know any of the current players, etc.

This isn't a generic ToC course. This is supposed to be a survey of research. Nowhere does it make it clear that this is vastly outdated research.


> I spend a lot of time reading papers at the intersection of ML & PL, so I'm a bit sad, personally, but I don't think it's fair to say the course is out of date. Rather, this is just a sign of the known world getting bigger.

Any recommendations to get started?


I was fortunate to learn this from pointless internet flamewars early in my life. Understand why you're in an argument and what you hope to accomplish by being in an argument. On the internet, it is usually very clear that you will accomplish nothing except maybe introduce receptive onlookers to a new idea, so the choice to minimize engagement is easy.

I think applying this has generally made me more successful outside of the Internet, too, by being more conscious about how I approach conflict. Unfortunately, the advice also holds in the less pseudonymous world where preconceptions and reputation have more weight, but there the calculus is a big mess: arguments can carry only downside risk, yet you don't always have the option to disengage.


> On the internet, it is usually very clear that you will accomplish nothing except maybe introduce receptive onlookers to a new idea, so the choice to minimize engagement is easy.

I'm going to disagree here. I've learned quite a lot from internet arguments. Admittedly, the majority of it was from attempting to argue positions I don't hold.

You're taking it as a given that the only goal of an argument is to convince someone, and if you can't do that, then there is no point. But there can be value in making an argument specifically in bettering your own understanding of a point.

With all that said, I will admit that some random news site's comments probably aren't the best place to do that.


You're not wrong. You can think of internet arguments as a method of developing a thought and collecting counterarguments in a real environment to refine it. The choice of forum affects the quality of the counterargument though, and developing a thought can be done with a crowd more likely to give constructive feedback. If some random website works for you, that's fine.

My statement is more that if you're interested in external impact, exposing someone to a new idea should be treated as the most likely outcome, so assess your effort spent accordingly. I've gotten a lot of value out of reading the different sides of other people's arguments, but I know I personally wouldn't have wanted to spend time being part of it.


I can't remember the exact origins of it, but I think it was a tweet that was essentially:

No one will remember who won an argument on the internet, they'll just remember that everyone involved is the kind of person who gets in arguments on the internet.

I'm not even sure that's really limited to just the internet either - outside of very niche cases, you don't win any argument, you just become known as someone who likes to argue, and thus you need to decide whether this is a venue where you want to become known as someone who argues about things. There are venues where that may be good, wholesome, and useful, but this axiom has definitely stopped me from contributing to a lot of arguments.


> I've gotten a lot of value out of reading the different sides of other people's arguments, but I know I personally wouldn't have wanted to spend time being part of it.

The problem with discussions on the internet is that there's a lot of ignorance and ideology out there and a lack of interest in applying a modicum of thought, research and critical thinking. People know what they know because they know what they know, and that's the end of it. This can be incredibly frustrating to those of us who make an attempt to live in some kind of an objective reality, where arguments from reason and facts are considered and respected.

On the internet, the earth can be flat and there is no amount of math and science you can deploy that will convince a certain audience otherwise.

The only exception to this rule is narrowly focused professional discussion groups. HN fits this description ONLY when the topic at hand tends to be in the realm of specific engineering disciplines. In a wide range of other areas, including technical areas, discussions on HN can be just as dumb and pointless as almost anywhere else.

I've been using online forums of various kinds for four decades. You could find good, bad and ugly on USENET and the same is true today in various forms. The only thing that has changed is that the size of the audience and the reach of the nonsense have expanded.

Relevant:

https://www.cnbc.com/2023/01/19/yale-researchers-how-highly-...


How is this any different from arguing in person? You state 'on the internet' as if it were some kind of exception that makes "People know what they know because they know what they know" a fact that exists only there.

What you neglect totally are these:

1. this can be fact of arguments anywhere, and the key to having a decent one is for all members to act in good faith

2. the internet affords the ability to find sources quickly and quote them, unlike non-internet arguments (especially before smart phones) where it comes down to 'who can say things most convincingly or appeal to already existing biases' (which of course still happens, but at least the opposition has a chance)

3. on the internet you are not arguing with one person, you are holding a public debate which is forever etched into history for anyone to read


Simple. Most people do not behave as they do online when in person. Or, put a different way, the online physical gap enables behaviors and interactions not often seen in person.

And, yes, in real life you better know what you are talking about. You can’t google yourself out of it, particularly in professional settings.


> there can be value in making an argument specifically in bettering your own understanding of a point

If you want someone to educate you, the polite thing to do is ask. This kind of "stealth learning" where you say things you don't believe, expecting to learn via corrections, is inconsiderate and inefficient. I know people who have lost practically all of their friends because they couldn't bring themselves to acknowledge anyone else's expertise in either the subject itself or how to explain it. Don't be that guy.

ETA: Even if what you want is a spirited debate, mutual consent and respect matter. A voluntary debate can be a lot of fun for everyone, contrarian provocation out of the blue much less so.


There are different forms of argumentation and they have a different tone that maybe doesn't come through so well on the internet without context. Most of all, I dislike the debate-club mentality where arguments are more like a spectator sport than a deliberation, but there are less combative forms of argument that are really more like questions with extra scaffolding. I think all people are weakly informed and weakly opinionated about a different subset of most things and it's not always clear when you are in an environment with people who aren't also weakly informed and weakly opinionated. Arguments of the form "I believe this" and "here are my (maybe not so great) reasons" can be a reasonable starting point for laying out your prior knowledge and biases. Making it clear that it is only a weakly held belief is an art form, though, and I can understand why someone might omit that step in an (assumed to be) hostile environment. It's one of the reasons why I think the choice of forum matters. It can be easier to be more open about what you don't know if you aren't subconsciously worried about being attacked for it.

As you said, (I'm paraphrasing) there are often better ways to go about things.


I think that really depends on what you're trying to learn. If you want to learn about someone's position, it may be better to ask them than to play Devil's Advocate. If you want to learn about the devil's position, the other person won't be able to help you.

As you mentioned, time, place, and consent are important. You see this a lot in forums dedicated to debate. What consent means in general forums like this one is a little bit more ambiguous. Some people are looking to debate and some people are looking to just talk.


Or, if you just want to get practice (internet) arguing, it's always a good idea.

OFC, the other people need to be interested in arguing. Random debate is a valuable skill, and I don't know that I 100% agree that "consent" is important. There's a difference between harassing someone and letting them know you don't agree (we call that discourse); if they're not interested in continuing the conversation, a lot of people will just stop talking. Otherwise, you don't get to say things and then have some shield against anyone disagreeing with you; putting it in the public forum is itself the only consent needed.

The only major danger is when someone doesn't reply at all. The OP's article ignores the fact that argument is often not just about being right, but about winning the perception. I think it is probably better (with someone who is not being disingenuous or harassing) to state if you don't have time or interest to reply, rather than letting the other argument hang -

- the last, simplest thing said usually sticks in onlookers' brains, not the 5 paragraphs of well (or not well) thought out response.


> I don't know that I 100% agree that "consent" is important.

That's why I said it is ambiguous what consent means. Engaging is always discretionary as nobody is forced to respond. That said, I think there is value in people being more clear about their intent, so that people have more information when deciding to engage.

This leads to higher quality debate and discussion than when two people have different notions of what the topic and purpose of the conversation is.


> If you want someone to educate you, the polite thing to do is ask. This kind of "stealth learning" where you say things you don't believe, expecting to learn via corrections, is inconsiderate and inefficient.

Unfortunately, it is also unreasonably effective. I mean, learning by being wrong is how learning works. It's how science works.

Often the fastest way to find the right answer to a problem is to loudly and confidently proclaim the wrong answer.

As the ancient advice goes: want to know how to do something in Linux? Tell a bunch of Linux users Linux can't do that.

Sometimes I argue things I hope aren't true, but just don't have a good argument against in the hopes someone better equipped will come along and convince me I'm wrong.

I think as long as you're civil and willing to admit defeat then it isn't really a problem.


I'm okay with people trying out different viewpoints—I do think it's a good way to explore new ideas—but I'd prefer the post be prefaced with "to play devil's advocate" or something to that effect.


How do you know you are wrong until someone convinces you?


I’m guessing the frequency with which people are both wrong and convinced of it by a random internet person is rather low.


GP seems to say this already.

The thrust of their point is that the time spent engaging isn’t worth it for the person debating, not whether it does or doesn’t impact the perspectives of those who view the debate.


> The thrust of their point is that the time spent engaging isn’t worth it for the person debating

Yes, and this is my main point of disagreement with them. My point is that there is plenty of value if you want there to be.


> Understand why you're in an argument and what you hope to accomplish by being in an argument.

There's a spectrum here, but I've noticed that a lot of people like to debate things online because they're genuinely trying to understand the problem set, or want to play around with ideas. They're not looking to "argue" necessarily. However, on the other end, some people do interpret any disagreement as an argument and will then turn the discussion into one that they will try to "win".

It can be frustrating if you're coming at it from a "discovery" perspective and the other person wants to fight, or you're coming at it from a fight perspective and the other one wants to discover. This happens in real life as well, but it's a little bit easier to communicate what your intent is based on your tone and expression.


Like you I learned this early on. But I somehow manage to forget the lesson every time I engage on a significantly new platform. I had to re-learn it again on Twitter and on Hacker News.

It’s weird because it makes no sense rationally, but there you go.


Parallelization and GPUs were the hot story five to ten years ago, and require(d) a pretty substantial shift in the software stack for less-general gains. You're still hoping the cost-per-transistor goes down. I think recent 400W+ GPUs have shown that we're coming close to the end of this particular S-curve. The big question is whether any of the tricks we have left are broad enough and strong enough to address the economic problem.


I want to believe that there's some world where developers start putting actual craftsmanship into their software (again?), but I think this will only be true for SaaS or related environments where the developer is also the one spending money on the compute. Everyone else will probably outsource the optimization effort to a mixed hardware-software vendor that amortizes the cost (like hyperscale cloud providers) via middleware, instead of just amortizing the cost via hardware.

People (including on HN!) like to say that cycles are cheap but developer time isn't. There's a reason for this that will be invariant of the actual costs: it's the user that pays for the hardware and the developer that pays for development time. End-users are willing to spend money on hardware, but not software, and this has remained true even as companies like Apple and NVIDIA have blurred the boundary. In the current environment of $0 software, how do you (as a developer) fund more efficient software? I think the likely answer will be that developers will happily lock themselves to whatever vendor offers to solve this problem for them. We've seen this in ML already.


Autodiff does not work with for loops or if statements. The current solutions effectively pick a few promising traces through the program and then assume that nothing else exists. To handle it more elegantly (for things like preserving equational reasoning or avoiding exponential blowup) you need to address it at the level of language semantics.


> Autodiff does not work with for loops or if statements.

Is that necessarily true? Here is an incomplete automatic differentiation implementation that handles if statements just fine in a function definition. Unless you mean something else.

    type Dual = {Real: float; Epsilon: float} with
        static member (~-) (x: Dual) = {Real = -x.Real; Epsilon = -x.Epsilon}

        static member (+) (x: Dual, y: Dual) = { Real = x.Real + y.Real
                                                 Epsilon = x.Epsilon + y.Epsilon }
        static member (+) (x: Dual, c: float) = {x with Real = x.Real + c}
        static member (+) (c: float, y: Dual) = {y with Real = c + y.Real}

        static member (-) (x: Dual, y: Dual) = x + (-y)
        static member (-) (x: Dual, c: float) = x + (-c)
        static member (-) (c: float, y: Dual) = c + (-y)

        static member (*) (x: Dual, y: Dual) = { Real = x.Real * y.Real
                                                 Epsilon = x.Real * y.Epsilon + x.Epsilon * y.Real }
        static member (*) (c: float, y: Dual) = {Real = c; Epsilon = 0.0} * y
        static member (*) (x: Dual, c: float) = x * {Real = c; Epsilon = 0.0}

    let dcos (x: Dual) = {Real = cos x.Real; Epsilon = -(sin x.Real) * x.Epsilon}
    let dsin (x: Dual) = {Real = sin x.Real; Epsilon = (cos x.Real) * x.Epsilon}

    let differentiate (f: Dual -> Dual) a =
        let x = f {Real = a; Epsilon = 1.0}
        x.Epsilon

    let testFunction (x: Dual) = if x.Real < 0.0 then dcos x else dsin (x*x - 3.0*x)
Using that gives:

    > differentiate testFunction (-1.0)
      0.8414709848078965

    > differentiate testFunction 1.0
      0.4161468365471424

    > differentiate testFunction 0.0
      -3.0
Now, of course, one needs to be careful interpreting the result at a = 0.0. That's because the testFunction is not differentiable at that point due to a jump discontinuity there, but we still get a value back. But as far as I know, this is simply an issue with automatic differentiation in that it only correctly tells you what the derivative is if it exists at the given point.


This "discretizes then differentiates" to borrow terminology from [1] which is one of the more accessible presentations and papers. The program might evaluate correctly, but equational reasoning (like you might want for any kind of automated optimizations) is broken. In a toy example like this where you're doing everything manually then you probably don't care, but for larger systems, it gets tiring to do the mental equivalent of assembly programming.

[1] https://people.csail.mit.edu/sbangaru/projects/teg-2021/


This isn't a toy example, though. It's the start of a library. Once you've developed the dual numbers and differentiate function and defined the dual number versions of all elementary functions, then you have a full (forward-mode) automatic differentiation library that can just be used. You wouldn't have to do anything manually. You'd just define your functions using this library instead of the built-in functions, since you can use the dual number functions either to differentiate or simply to evaluate (setting the dual part to 0).
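
For example, a quick sketch of the evaluation side using the definitions above (the helper name is just mine):

    // evaluate f at a point by seeding the dual part with 0.0 instead of 1.0
    let evaluate (f: Dual -> Dual) a =
        let x = f {Real = a; Epsilon = 0.0}
        x.Real

    let y = evaluate testFunction 1.0   // sin(1.0*1.0 - 3.0*1.0) = sin(-2.0)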

> This "discretizes then differentiates"

Not sure what you mean. It defines dual numbers, then defines elementary functions on dual numbers (I only did two as an example). From there, you get differentiation for free (i.e., automatically). The only thing that was done manually was defining the testFunction. Everything else would be part of a library that you'd consume.

I'm not sure what you mean by "equational reasoning is broken".

Thank you for the link to the paper. Seems interesting, and I'll read through it more. Although, it is discussing differentiating integrals, which is where their language "discretize-then-differentiate" comes from. From this paper, I sort of get a sense of why differentiable programming might make sense as a concept, but I've only ever seen the term introduced with automatic differentiation, which is what I was balking at (given the content of the original post here). I'll keep reading this paper, but I think what you've mentioned before hasn't convinced me. Thanks for the discussion.


> In a toy example like this where you're doing everything manually then you probably don't care, but for larger systems, it gets tiring to do the mental equivalent of assembly programming.

But by this argument (which sounds plausible to me) you have defeated your previous claim that differential programming is really a new paradigm, as it seems you have adopted what bmitc wrote earlier, that differential programming is not a new paradigm but "seems like automatic differentiation just implemented properly".


There's no contradiction: autodiff is a method of implementing differentiable programming. In this example, it is implemented as a type that handles a trace of a program, but everything else is left to the programmer. This is a problem because most of the code I would want to write is not a single trace!

Analogously, I could write a program in C that does message sends and organizes code in a design pattern called "objects" and "classes". Incredibly painful, but workable sometimes. Some people even call it "object oriented C" and go on to create a library to handle it like [1]. Is object orientation not a paradigm because I've implemented a core piece as a library?

No, because that misses the intangible part of what makes a paradigm a paradigm: I structured my code this way, for a reason. In OOP, that reason is the compartmentalization of concerns. The underlying OOP mechanism gives me a way to reason about composition and substitution of components to minimize how much I have to reason about when writing code. Similarly, in differentiable programming, the differentiability of all things gives me a way to reason about the smooth substitution of things because it more easily lets me reason about how the machine writes code.

[1] https://en.wikipedia.org/wiki/GObject


Seems we're arguing about definitions. Currently differentiable programming seems to be this vaguely defined term (I don't get what you mean by smooth substitution), with autodiff being its only (proper) instantiation.

You say autodiff is actually not representative of differentiable programming. But if there aren't any other good examples that illustrate differentiable programming, how is differentiable programming (currently) more than autodiff?...


@bmitc: Reading your replies (some of which seem to have been written at the same time I wrote mine), it seems we are on the same page; I'm also a mathematician and I also have some qualms with how people invent new names for automatic differentiation :) I had a look at your bio and couldn't find any email address. Would you perhaps be interested in having a longer, scientific discussion about AD?


Julia's AD is compatible with control flow. They have their own issues, but Zygote + ChainRules actually work pretty well


Object oriented programming, for example, doesn't let me have a variable hold half of one object and half of another or let the language derive the code that gave me that object at runtime, but object oriented + differentiable programming does. It's no less of a paradigm than logic, quantum, or probabilistic programming. If you want to, you can view differentiable programming as extending logic programming with a product and chain rule (+ some additional constraints) that allows (smooth, if you want it) interpolation between data and code.

That said, most discussion of differentiable programming is at the level of syntax sugar for reverse mode differentiation, so I can't blame you for that conclusion.


You don't need OOP plus another paradigm to do automatic differentiation though. I've implemented automatic differentiation, albeit "simple" versions and only forward-mode at this point, but there's really nothing special about the implementations.

Logic programming, on the other hand and for example, needs something much more substantial to be implemented as a library in an existing language, such as backtracking, unification, or the full-on Warren Abstract Machine.

If someone has a clear example of differential programming that is different than just using automatic differentiation as a technique or library, then that might help.

> doesn't let me have a variable hold half of one object and half of another or let the language derive the code that gave me that object at runtime

I'm not sure what you mean here. Could you elaborate?


I don't need OOP to do a hash table lookup and then an indirect function call with the receiver as the first argument either but that ignores that there's more to a paradigm than the algorithm I use for facilitating it. You can embed unification of expression trees as a library in C++. People implement backtracking all the time in almost every language. Talking about differentiable programming as if it's just autodiff is missing the point of what a programming paradigm is.

There's a mechanism, yes, but that's just a means to an end of efficiently enabling a different way of approaching programming. In the case of differentiable programming, that's continuous code and continuous data enabling program search that doesn't have to use purely discrete methods (like logic programming). If that sounds like autodiff and backprop, then yes, that's because that's a good way to implement it. Tensorflow and PyTorch are DSLs embedded in Python and C++, both usable and used for more than just implementing neural networks, but most people aren't happy calling a library a language until it has a parser and a file extension.

> I'm not sure what you mean here. Could you elaborate?

Most programming languages assume that a variable can only contain one value, or a composite of values. Differentiable programming lets code be smoothly transformed from one to the other while being meaningful at all points in between. In an object oriented case, this would be like having a variable contain an object that behaves like some known object A or object B selectively, depending on which choice maximizes the success of the program at any given moment.
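
As a rough sketch of that idea in terms of the dual-number code upthread (the function and the mixing scheme are just mine for illustration): a hard branch can be relaxed into a weighted blend whose weight is itself differentiable, so gradient descent can push it toward "all A" or "all B".

    // a "soft" choice between two behaviors A and B: w = 1.0 means all A,
    // w = 0.0 means all B, and anything in between is still a meaningful program
    let soft (w: Dual) (behaviorA: Dual) (behaviorB: Dual) =
        w * behaviorA + (1.0 - w) * behaviorB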


Seems a bit of conflation of notions going on here: Just because logic/quantum/probabilistic all end with "programming", it doesn't mean that they all are what is called a "programming paradigm" in the typical sense [0]. Where does a paradigm end and a library implementation begin?

For example, varying how a program arrives at the answer induces imperative/OOP vs. declarative paradigms [0]. On the other hand, quantum programming assumes a radically different type of "CPU" (i.e. instruction set) on which your program runs - which in turn obviously changes everything. But this can safely be implemented within existing paradigms, e.g. ProjectQ [1] is implemented in an OOP language: Python. Thus, I would not call this a new programming paradigm (unfortunately [0] does that, but I think that is bad form).

> you can view differentiable programming as extending logic programming [...] that allows interpolation between data and code.

Reference? I have browsed one of the standard references [2] on automatic differentiation and googled a bit and could not find something that supports your statement. Even more so, it seems that even defining semantics for differential programming is barely in its starting stages [3].

> differentiable programming is at the level of syntax sugar for reverse mode differentiation, so I can't blame you for that conclusion.

By the same argument you could say that probabilistic programming is just syntax sugar for painless specification of statistical models.

Care to provide a reference where differential programming is presented as something more than "syntax sugar"?

[0] https://en.wikipedia.org/wiki/Programming_paradigm#Further_p...

[1] https://projectq.ch/

[2] Griewank A., Walther A., Evaluating Derivatives, SIAM 2008

[3] https://arxiv.org/pdf/1911.04523.pdf


The existence of a different kind of CPU isn't a meaningful distinction at the level of discussing paradigms. The semantics are different, so the abstract machine is different. The fact that I need a different set of atoms in my desktop to use it doesn't change the programming language part of the discussion.

The main paper to read is [1], which introduces a syntactic notion of differentiation in the lambda calculus, connecting substitution and nondeterministic choice to differentiation in the calculus-of-infinitesimals sense, and also introduces a meaningful notion of Taylor expansion of arbitrary programs. This paper is mostly of academic interest, though. The resulting expansion is wildly uncomputable, meaning that more modern, practical papers like [2] cite it wistfully as a dream of what could be achieved. How to computably handle most of the constructs we care about in a general programming sense is very active, open research. At the time the paper was introduced, it was more influential on (and influenced by) work on probabilistic and quantum programming through their related models of linear logic [3]. There are only a few slight axiom differences that separate differential, logic, probabilistic, and quantum programming though, so if you're willing to accept one as a "paradigm", then you should accept the others.

[1] https://www.sciencedirect.com/science/article/pii/S030439750...

[2] https://arxiv.org/abs/1911.04523

[3] https://ncatlab.org/nlab/show/differential%20category


It seems you haven't read my references? As your [2] is my [3] from above!

> The existence of a different kind of CPU isn't a meaningful distinction at the level of discussing paradigms.

Well, that was my point above: You can't really lump quantum programming together with probabilistic programming, as they are paradigms on different "levels".

> practical papers like [2] cite it wistfully as a dream of what could be achieved

Are you sure about that? I skimmed [1] as I hadn't read it, and it seems to describe a rather restricted set of functions ("types are interpreted as vector spaces and terms as functions defined by power series on these spaces"), as there are many differentiable functions that cannot be defined as power series.

Moreover, in [2] it is only claimed: "Ehrhard and Regnier {i.e. your reference [1]} do not give an operational semantics but they do give rules for symbolic differentiation and it should not be too difficult to use them to give an operational semantics. However their language with its convenient vector space semantics only supports total functions. It therefore cannot be extended to include recursive function definitions or conditionals (even with total predicates, as continuous functions from R^n to the booleans are constant)." So I would not say they cite [1] as a wistful dream ...

> There are only a few slight axiom differences that separate differential, logic, probabilistic, and quantum programming though.

Give me the axioms and their differences and I believe you. :) (Honestly, I'm not even sure if the discussion on which axiomatization captures the existing developments has been settled; it seems to me you have some kind of category theoretic approach in mind where you just change the category and get a new paradigm - I'd be happy to accept this as well, if there is a clear reference, though I'm doubtful one exists ...)


I don't usually respond to old comments, so I don't know if you'll read this, but I hope I can encourage you to think more broadly about what "differentiable programming" means.

Different fields have a different perspective on the same set of tools because those tools have different pathological cases in different areas. Context really matters.

> Well, that was my point above: You can't really lump quantum programming together with probabilistic programming, as they are paradigms on different "levels"

This distinction is not useful. If I write in a functional or logic programming language, it gets translated into imperative commands for an underlying architecture that is some mix of dataflow, event driven, automata-based, concurrent, etc... that is then further built on top of some physical atoms where an engineer worried about quantum effects. If I write in a quantum programming language, it will probably go through the same process for at least another 5 years. You might argue that quantum is somehow more dictated by the underlying physical model the way that people argue that imperative programming is closer to the physical world than functional programming. But the "level" doesn't change the usefulness of viewing all of these as "paradigms" worthy of study and analysis on their own terms with their own tools. At the level of studying a programming language, the "level" is a useful thing to be aware of for implementations and motivation but usually not for a theory.

> Even more so, it seems that even defining semantics for differential programming is barely in its starting stages

This is also not a useful distinction. OOP was also, infamously, a point of contention between the academic and outside worlds because it was developed and incredibly prevalent without a rigorous theory abstracting it beyond procedural programming. It became a "paradigm" despite that because there was a set of (informal) tools for reasoning about it on its own terms [1].

Likewise, differentiable programming has largely developed to formalize what makes programs written for machine learning frameworks different from programs written in the imperative/object oriented/functional language they are built on. Autodiff has mostly developed in practical usage, so the use cases are front-running the theory. There's increasingly hardware tailored to the execution model and software developers attempting to program it. There are approaches to problems like discontinuities that people have found solutions for without a rigorous theory justifying their use. There's a structure to why and how people are writing code for these applications as well as an operational theory for how to reason about it, but there's very little compositional, equational theory for these choices.

To most people in machine learning, "differentiable programming" is just autodiff with pretty syntax because the term sprung from attempting to put what they are already trying to accomplish with that implementation on more solid theoretical footing as a computable model of a more general logic. That, hopefully, lets us more efficiently explore what a better domain-specific theory might be and if there are better execution models or logical frameworks. Autodiff itself is increasingly used as an umbrella term for many other methods of differentiation with different edge cases, so this is already happening in practice.

To reduce "differentiable programming" to just its implementation ignores that aspect. It would be equivalent to equating machine learning to matrices. Not unreasonable computationally and not a terrible place to start for a theory, but deeply unsatisfying as a mature domain-specific theory.

The main paper I linked [2] is not about autodiff at all. It's an attempt to establish a connection between differentiation in an analysis sense to models of (not otherwise obviously differentiable) program evaluation. The (unrealized) promise is that the centuries of understanding we have for the calculus of infinitesimals can be applied to the less-mature study of lambda calculus and nondeterministic computation. Papers like [3] cite it because it addresses (discrete) structures that analysis is less interested in and potentially provides a way to connect computation, calculus, and whatever it is that we're doing with machine learning.

> Are you sure about that? I skimmed [1] as I hadn't read it, and it seems to describe a rather restricted set of functions ("types are interpreted as vector spaces and terms as functions defined by power series on these spaces"), as there are many differentiable functions that cannot be defined as power series.

Because it's a PL theory paper, it's not concerned with whether all differentiable functions can be represented, but whether all computable functions can be differentiated. And PL theorists are generally more comfortable than most to accept that most functions cannot be computed and choose a more restrictive model that enables more reasoning power. The category [4] is really the better place to start since it lets us also consider models that aren't vector spaces and [2] is best thought of as a prototype that left many gaps in the theory (for example the requirement on coefficients for convergence is wrong, though I can't remember which paper by Lionel Vaux proved this). It can be thought of as computable in finite instances, but it's unsound even when typed due to the zero term and result of sums.

The quote you cite from [3] is easily misunderstood without that context. As a practice-focused paper, it cares very much about computability. Conditionals and loops are possible in [2] since it allows Church numerals and fixed point combinators, but it introduces a nondeterministic sum which is exponential in the number of evaluation steps and may diverge (doubly so since it's the untyped lambda calculus...) and is difficult to operationalize. That's what I meant by "wildly uncomputable". So, to them, [2] offers a useful mental framework for higher order features, but is not practical. The theory isn't there yet.

The connections between logic, quantum, probabilistic, and differentiable programming can be understood by how the model treats the exponential modality (!) which converts the otherwise linear term to an analytic one. Differentiation decomposes this to give a sum of linear terms. Differential lambda calculus doesn't put any (more) structure on the sum. Probabilistic programming gives the added structure of a probabilistic sum where coefficients are weights. Quantum programming can be modeled via a Fock space [5] for (!) ([5] predates [2] so is not directly discussed as a differential category here). However, it's unclear what the right model for differentiable programming should be if we want something practical for the resulting derivative (much less antiderivative). Daniel Murfet et al [6] have some related work more directly in the context of machine learning.

[1] https://www.cs.cmu.edu/~aldrich/papers/objects-essay.pdf

[2] https://www.sciencedirect.com/science/article/pii/S030439750...

[3] https://arxiv.org/abs/1911.04523

[4] https://ncatlab.org/nlab/show/differential%20category

[5] https://www.researchgate.net/publication/2351750_Fock_Space_... (sadly a researchgate link)

[6] http://therisingsea.org


> don't know if you'll read this

I did read it ;) Because I'm very much interested in this entire topic.

> I hope I can encourage you to think more broadly about what "differentiable programming" means

I'm trying to, but I find it hard. My stance was that differentiable programming seemed like a theory for which only a single example (namely autodiff) existed, as you also said ("autodiff has mostly developed in practical usage, so the use cases are front-running the theory"). But this entire comment of yours really clarified some things for me.

> The main paper I linked [2] is not about autodiff at all.

> The quote you cite from [3] is easily misunderstood without that context

You finally convinced me to have a detailed look at this. Thank you for providing the context.

> Conditionals and loops are possible in [2] since it allows church numerals and fixed point combinators but it introduces a nondeterministic sum [...] and is difficult to operationalize. That's what I meant by "wildly uncomputable".

I think I may have misunderstood some of your previous comments (and perhaps vice versa), as it now dawns on me that you use a vocabulary that comes (I guess?) from PL theory and is very different from the one I'm used to, as a mathematician versed in analysis. I'll re-read them.

> Daniel Murfet et al [6] have some related work more directly in the context of machine learning.

I'm actually aware of Daniel Murfet but haven't read his work from recent years. Did you have a specific paper from him in mind?


P.S. (In case you return once more to read comments.)

It seems like these are problems that could benefit from a PL-theory <-> mathematical analysis cross-collaboration.

I would have sent you a private message via email, but couldn't find any info on your HN profile; my email is there.


At the time the comment was made the link was https://harvard-iacs.github.io/2019-CS109A/pages/materials.h... where neural networks were mentioned.

See https://news.ycombinator.com/item?id=32295656


The average consumer is more willing to pay for hardware than software, even if all the value is created by the software.

It's the reason why Apple no longer charges for OS updates. Nvidia has essentially the same business model.

It's very hard to sell software to $averageconsumer unless that software is "free"


Apple's ethos of giving more to creators came from their focus on the niche markets that would spend good money on good products. By the late 90s, creators were their primary users, so Apple's survival meant catering to them, even if that meant playing nicely with Microsoft and Adobe. Once they got a whiff of the mass market, it took them until 2019-ish to realize they had lost something.

Apple's modern identity is a lifestyle brand cosplaying as a luxury brand. Creator ("Pro"), to modern Apple, means YouTube, TikTok, podcasts, etc... All the people who might use their very expensive, but not unreasonably so, hardware to visibly flaunt their taste over the slightly-less-wealthy Android/Windows plebs. Think high-res-cameras-in-the-iPhone-with-no-high-speed-data-port kind of "Pro". So, they equate Pro with a set of apps and not an ecosystem for enabling those apps, whereas before, they needed that ecosystem for their survival (the Carbon era).

