Hacker News | CrazyStat's comments

ROT13 is cheap enough that you can afford to apply it many more times. I use one million iterations to store passwords securely.
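
A minimal sketch of my own (using Python's built-in rot13 codec) of why that scheme is so secure: ROT13 is its own inverse, so an even number of applications, one million included, hands the plaintext right back.

    import codecs

    password = "hunter2"
    stored = password
    for _ in range(1_000_000):              # "key stretching"
        stored = codecs.encode(stored, "rot13")

    # ROT13 is an involution, so an even number of applications is a no-op.
    assert stored == password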

640k oughtta be enough for anybody.

One of my favorite bits of my PhD dissertation was factoring an intractable 3-dimensional integral

\iiint f(x, y, z) dx dy dz = \int [\int g(x, y) dx]*[\int h(y, z) dz] dy

which greatly accelerated numerical integration (O(n^2) rather than O(n^3)).

My advisor was not particularly impressed and objectively I could have skipped it and let the simulations take a bit longer (quite a bit longer--this integration was done millions of times for different function parameters in an inner loop). But it was clever and all mine and I was proud of it.
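
For concreteness, here is a rough numerical sketch of that kind of factorization (the g, h, and grids below are my own toy choices, not anything from the dissertation): when f(x, y, z) = g(x, y) * h(y, z), the two inner one-dimensional integrals can be precomputed for each y, collapsing the O(n^3) triple sum into a few O(n^2) passes plus one O(n) outer sum.

    import numpy as np

    # Hypothetical separable integrand: f(x, y, z) = g(x, y) * h(y, z)
    g = lambda x, y: np.exp(-(x - y) ** 2)
    h = lambda y, z: 1.0 / (1.0 + (y + z) ** 2)

    n = 200
    x = np.linspace(-3.0, 3.0, n); dx = x[1] - x[0]
    y = np.linspace(-3.0, 3.0, n); dy = y[1] - y[0]
    z = np.linspace(-3.0, 3.0, n); dz = z[1] - z[0]

    # Naive O(n^3): evaluate f on the full 3-D grid and sum everything.
    F = g(x[:, None, None], y[None, :, None]) * h(y[None, :, None], z[None, None, :])
    naive = F.sum() * dx * dy * dz

    # Factored O(n^2): 1-D integrals over x and over z for each fixed y,
    # then a single outer integral over y.
    inner_x = g(x[:, None], y[None, :]).sum(axis=0) * dx   # shape (n,), indexed by y
    inner_z = h(y[:, None], z[None, :]).sum(axis=1) * dz   # shape (n,), indexed by y
    factored = (inner_x * inner_z).sum() * dy

    print(naive, factored)   # agree up to discretization error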


The premise of the singularity concept was always superhuman intelligence, so it’s not so much a parallel as a renaming of the same thing.

> In Vinge’s analysis, at some point not too far away, innovations in computer power would enable us to design computers more intelligent than we are, and these smarter computers could design computers yet smarter than themselves, and so on, the loop of computers-making-newer-computers accelerating very quickly towards unimaginable levels of intelligence.


Would never work in reality; you can't optimize algorithms beyond their computational complexity limits.

You can't multiply matrix x matrix (or vector x matrix) faster than O(N^2).

You can't iterate through an array faster than O(N).

Search & sort are sub- or near-linear, yes - but any realistic numerical simulations are O(N^3) or worse. Computational chemistry algorithms can be as hard as O(N^7).

And that's all in P class, not even NP.


https://en.wikipedia.org/wiki/Computational_complexity_of_ma...

The n in this article is the size of each dimension of the matrix — N=n^2. Lowest known is O(N^1.186...). Most practical is O(N^1.403...). Naive is already O(N^1.5) which, you see, is less than O(N^2).
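
A quick sanity check of the exponent bookkeeping, as a sketch of my own (the "best known" exponent is approximate):

    # For an n x n matrix there are N = n**2 entries, so a cost of n**k is N**(k/2).
    bounds = {"naive": 3.0, "Strassen": 2.807, "best known (approx.)": 2.372}
    for name, k in bounds.items():
        print(f"{name:>22}: O(n^{k:.3f}) = O(N^{k / 2:.3f})")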


Well, but still superlinear.

We don't need to optimize algorithms beyond their computational complexity limits to improve hardware.

Hardware is bound by even harder limits (transistor gate thickness, the speed of light, Amdahl's law, Landauer's limit, and so on).

But that doesn't disprove the hypothesis that in principle you can have an effective self-improvement loop (my guess is that it would quickly turn into extremely limited gains that do not justify the expenditure).

Any such "self-improvement loop" would have a natural ceiling, though. From both algorithmic complexity and hardware limits of underlying compute substrate.

P.S. I am not arguing against, but rather agreeing with you.


The natural ceiling is the amount of compute per unit of energy. At the point you can no longer improve energy efficiency, you can still add more energy to operate more compute capacity.

Which hints that truly superintelligent AIs will consume vast amounts of energy to operate and matter to build.

At some point they’ll hit the speed of light as a limit on how quickly they can propagate their internal state to themselves - as the brain grows larger, the mind slows down or breaks apart into smaller units that can work faster before rejoining the bigger entity and propagating its new state.

Must feel really strange.


Realistically, the physical limits to computation are the speed of light and energy dissipation.

You're measuring speed, not intelligence. It's a different metric.

It is exactly the same metric. Intelligence is not magic, be it organic or LLM-based. You still need to go through the training set data to make any useful extrapolations about unknown inputs.

I think you mean a Poisson process rather than a Poisson distribution. The Poisson distribution is a discrete distribution on the non-negative integers. The Poisson process’s defining characteristic is that the number of points in any interval follows the Poisson distribution.

There have been a large variety of point processes explored in the literature, including some with repulsion properties that give this type of “universality” property. Perhaps unsurprisingly, one way to do this is to create your point process by taking the eigenvalues of a random matrix, which falls within the class of determinantal point processes [1]. Gibbs point processes are another important class.

[1] https://en.wikipedia.org/wiki/Determinantal_point_process
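
A minimal simulation sketch of that defining property (my own, with an arbitrary rate and window): build a homogeneous Poisson process on [0, T] from exponential inter-arrival times and check that the count in a fixed sub-interval behaves like a Poisson random variable (mean equal to variance).

    import numpy as np

    rng = np.random.default_rng(0)
    rate, T = 2.0, 50.0          # intensity and observation window (arbitrary)
    a, b = 10.0, 15.0            # sub-interval whose point count we track

    counts = []
    for _ in range(20_000):
        # Homogeneous Poisson process on [0, T]: cumulative sums of gaps with mean 1/rate.
        gaps = rng.exponential(1.0 / rate, size=int(3 * rate * T))
        times = np.cumsum(gaps)
        times = times[times <= T]
        counts.append(np.sum((times >= a) & (times < b)))

    counts = np.asarray(counts)
    # Count in [a, b) should be Poisson(rate * (b - a)): mean and variance both ~10.
    print(counts.mean(), counts.var())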


For anyone confused, the first part is an approximate transliteration into Cyrillic of the English sentence “Much like how you can/could spell English in Cyrillic.”


The legal premise of training LLMs on everything ever written is that it’s fair use. If it is fair use (which is currently being disputed in court) then the license you put on your code doesn’t matter; it can be used under fair use.

If the courts decide it’s not fair use then OpenAI et al. are going to have some issues.


Presumably the author is working on the basis that it is not fair use and wants to license accordingly.


Quite possibly. If they care a great deal about not contributing to training LLMs then they should still be aware of the fair use issue, because if the courts rule that it is fair use then there’s no putting the genie back in the bottle. Any code that they publish, under any license whatsoever, would then be fair game for training and almost certainly would be used.


A few points because I actually think Lindley’s paradox is really important and underappreciated.

(1) You can get the same effect with a prior distribution concentrated around a point instead of a point prior. The null hypothesis prior being a point prior is not what causes Lindley’s paradox.

(2) Point priors aren’t intrinsically nonsensical. I suspect that you might accept a point prior for an ESP effect, for example (maybe not—I know one prominent statistician who believes ESP is real).

(3) The prior probability assigned to each of the two models also doesn’t really matter, Lindley’s paradox arises from the marginal likelihoods (which depend on the priors for parameters within each model but not the prior probability of each model).


Are you seriously saying that, because a point distribution may well make sense if the point in question is zero (or 1), other points are plausible also? Srsly?

The nonsense isn't just that they're assuming a point probability, it's that, conditional on that point probability not being true, there's only a 2% chance that theta is .5 ± .01. Whereas the actual a priori probability is more like 99.99%.


Srsly? Srsly.

> The nonsense isn't just that they're assuming a point probability, it's that, conditional on that point probability not being true, there's only a 2% chance that theta is .5 ± .01. Whereas the actual a priori probability is more like 99.99%.

The birth sex ratio in humans is about 51.5% male and 48.5% female, well outside of your 99.99% interval. That’s embarrassing.

You are extremely overconfident in the ratio because you have a lot of prior information (but not enough, clearly, to justify your extreme overconfidence). In many problems you don’t have that much prior information. Vague priors are often reasonable.


In a perfect world everybody would be putting careful thought into their desired (acceptable) type I and type II error rates as part of the experimental design process before they ever collected any data.

Given rampant incentive misalignments (the goal in academic research is often to publish something as much as—or more than—to discover truth), having fixed significance levels as standards across whole fields may be superior in practice.


The real problem is that you very often don't have any idea about what your data are going to look like before you collect them; type 1/2 errors depend a lot on how big the sources of variance in your data are. Even a really simple case -- e.g. do students randomly assigned to AM vs PM sessions of a class score better on exams? -- has a lot of unknown parameters: variance of exam scores, variance in baseline student ability, variance of rate of change in score across the semester, can you approximate scores as gaussian or do you need beta, ordinal, or some other model, etc.
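
To make that concrete, here is a rough sketch (the numbers are invented) of how much the required sample size for the AM-vs-PM example swings with the assumed score standard deviation, using the standard two-sample normal approximation:

    from scipy import stats

    # Detect a 3-point exam-score difference at alpha = 0.05 with 80% power,
    # under different guesses for the (unknown) score standard deviation.
    alpha, power, delta = 0.05, 0.80, 3.0
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)

    for sigma in (5.0, 10.0, 15.0):
        n_per_group = 2 * (sigma * (z_a + z_b) / delta) ** 2
        print(f"sigma = {sigma:>4}: ~{n_per_group:.0f} students per section")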

Usually you have to go collect data first, then analyze it, then (in an ideal world where science is well-incentivized) replicate your own analysis in a second wave of data collection doing everything exactly the same. Psychology has actually gotten to a point where this is mostly how it works; many other fields have not.


Huh.

This is an interesting post but the author’s usage of Lindley’s paradox seems to be unrelated to the Lindley’s paradox I’m familiar with:

> If we raise the power even further, we get to “Lindley’s paradox”, the fact that p-values in this bin can be less likely than they are under the null.

Lindley’s paradox as I know it (and as described by Wikipedia [1]) is about the potential for arbitrarily large disagreements between frequentist and Bayesian analyses of the same data. In particular, you can have an arbitrarily small p-value (p < epsilon) from the frequentist analysis while at the same time having arbitrarily large posterior probabilities for the null hypothesis model (P(M_0|X) > 1-epsilon) from the Bayesian analysis of the same data, without any particularly funky priors or anything like that.

I don’t see any relationship to the phenomenon given the name of Lindley’s paradox in the blog post.

[1] https://en.wikipedia.org/wiki/Lindley%27s_paradox
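
For reference, a minimal numerical sketch of the Wikipedia-style paradox (the counts below are invented, in the spirit of Lindley's birth-ratio example): the frequentist test rejects H0: theta = 0.5 with a small p-value, while the Bayesian posterior, using a uniform prior on theta under the alternative and 50/50 prior odds between the models, ends up strongly favoring H0.

    import numpy as np
    from scipy import stats

    n, x = 98_451, 49_581        # hypothetical trials and successes
    theta0 = 0.5

    # Frequentist: two-sided p-value for H0: theta = 0.5 (normal approximation).
    z = (x - n * theta0) / np.sqrt(n * theta0 * (1 - theta0))
    p_value = 2 * stats.norm.sf(abs(z))

    # Bayesian: Bayes factor for H0 (point mass at 0.5) vs H1 (theta ~ Uniform(0, 1)).
    log_m0 = stats.binom.logpmf(x, n, theta0)   # marginal likelihood under H0
    log_m1 = -np.log(n + 1)                     # integral of the binomial over a uniform prior
    bf01 = np.exp(log_m0 - log_m1)
    post_h0 = bf01 / (1 + bf01)                 # P(H0 | data) with 50/50 prior odds

    print(p_value)   # small: the frequentist analysis rejects H0
    print(post_h0)   # large: the Bayesian analysis favors H0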


The Secretary Problem tells us that once you’ve lived 1/e (~37%, 30ish years) of your life[1], the next time you see something that’s stupider than everything you’ve seen before there’s a 1/e chance that it’s the stupidest thing you’ll ever see.

[1] Strictly speaking it would be 1/e of your stupidity sightings, which may not be 1/e of your life. If you intend to retire early and become a hermit you may want to stop the exploration phase earlier.
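
A quick simulation sketch of the underlying 1/e rule (my own, with an arbitrary number of candidates): skip the first n/e sightings, then stop at the first one stupider than everything seen so far; the thing you stop at is the stupidest overall about 37% of the time.

    import math
    import random

    def one_lifetime(n: int) -> bool:
        """1/e rule: watch the first n/e stupid things, then stop at the first
        one stupider than all of them. True if that one is the stupidest overall."""
        stupidity = random.sample(range(n), n)       # higher = stupider
        cutoff = round(n / math.e)
        benchmark = max(stupidity[:cutoff])
        for s in stupidity[cutoff:]:
            if s > benchmark:
                return s == n - 1                    # stopped at the global maximum?
        return False                                 # never stopped; the rule came up empty

    trials = 100_000
    wins = sum(one_lifetime(100) for _ in range(trials))
    print(wins / trials)   # ~0.37, i.e. about 1/e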

