j-pb's comments

German here, with little stakes in your shitshow. At no point during the Obama years did I think:

"Wow this looks just like the rise of the nazis!"

Which was covered extensively during my history classes.

Why did you even have all the school shootings if you don't use that stupid Second Amendment thing you have? This is the tyrannical government you've all been waiting for.


It seems like the first half of the 2nd Amendment isn't taught in schools, just the "I can has assault rifle" part.

You can really tell which states actually fund their education programs by who understands this and who does not.

It's a disease and it is spreading, fast.

Perhaps "what you thought then and now" is the difference between those times more than "what happened then and now". With the former being largely influenced by "what your bubble told you then and now".

“The Party told you to reject the evidence of your eyes and ears. It was their final, most essential command.”

We don’t have good data because it’s illegal to, for example, ask citizenship status on our census, but if you believe the numbers many Democrats cite, Obama deported more immigrants than Trump. You can use Google to verify that, though I’ll warn you the rabbit hole runs deep when it comes to official statistics. Importantly, under Trump we have far more violent felons to deport. The media thrives on salacious and emotionally charged stories rather than unbiased reporting based on nuanced facts. It’s the entertainment industry.

The recent tragedies are indeed thoroughly depressing for all of us, but we shouldn’t let our emotional reactions destroy our ability to reason and think objectively about history and statistics. We can feel and think. Some of us believe enforcement of laws is the villain in this. Some feel the laws themselves, or the idea of borders and sovereignty, are to blame. Others that a surge of violent criminals, such as those who killed Jocelyn Nungaray or Laken Riley, is the cause of the recent tragedies. None of these views are inherently evil. All of these views have some merit. Truth is manifold. Don’t be narrow-minded; we need broad thinking, not simplistic pathos-driven dogmas and references to Nazis. Grow up.


The number of deportations under Obama was definitely higher, but he had only one concentration camp (Guantanamo Bay), and didn't use that for his own people.

Learn about the paradox of tolerance; there is no negotiating, nuance, or reasoning with fascists.

Your enlightened centrism is nothing but smoke and mirrors. Get educated.


If you are German, then you are probably blind to the similarities between current German politics and the Nazis, so this is not a good point of comparison.

Which politics are you referring to? The AfD ("Alternative for Germany"), which has been classified as a confirmed right-wing extremist organization by the Federal Office for the Protection of the Constitution? And which has heavy ties to Trump, Musk, and the current U.S. government?

Just because we currently have our own right-wing populist fascists rearing their heads again doesn't mean that the parallels between the current events in the US and the rise of the Nazis aren't real and glaring to someone who has had this as part of their basic education curriculum.

https://web.archive.org/web/20250503162240/https://www.verfa...


All parties of the government support and pay for ethnic cleansing in the Middle East.

What does that have to do with the situation in the US? The situation in the Middle East is completely orthogonal to that, and observing the rise of fascism there says nothing about my stance on current German foreign policy with regard to the Middle East.

If you want to know: In my personal opinion that conflict is fucked beyond repair because a small group of powerful people on both sides benefit from it, while a huge number of deep interpersonal conflicts and histories fuel it, with any moderates getting squashed by their own side. So I wouldn't send weapons, but I'd send humanitarian aid or the blue helmets. That whole region is thoroughly fucked beyond my pay grade.


Yeeeaaaah, I dunno if you wanna go there while the US is investing $100B in state-sponsored ethnic cleansing, terrorism, and concentration camps. Glass houses, stones, etc.

Germany invests less than that, but Germany is a smaller country. I'm not sure how much it is per capita.

But it's only Nazis if you disagree with them. After all, the whole point of drawing the comparison is to shut down any possibility for discussion and nuance - "people I don't like are just like the Nazis, so I don't need to treat those who don't fully oppose them with any respect".

> After all, the whole point of drawing the comparison [to Nazis] is to shut down any possibility for discussion and nuance

Another way of phrasing this is that it's a call to stop assuming good faith discussion on the part of the boosters, stop being derailed by pondering nuance, and focus on putting the brakes on the new Nazi movement. History doesn't repeat but we're teetering on the edge of a large-scale horrific rhyme. Regardless of one's preferred policies regarding immigration, there is zero justification for where we're at.


Again, I have little stakes in your shitshow besides the international meddling they do with our own fascist party.

This dualist thinking seems to be a particular US thing, based on your two party system.

I see the erosion of the rule of law and decency in the US, the persecution of minorities, the populism, the defamation of journalism as "Lügenpresse" and the alignment of media to the party line, the personal police force (what the fuck is ICE doing in Italy), the personality cult around a single madman, the violence without consequence, the fancy SS/SA-style cosplay uniform worn by the head of ICE, and I think "that looks a lot like the stuff we learned about in school".


The ones who are exterminating a race are the nazis

Which race?

Palestinian

[flagged]


Any race or group can be genocidal. What's so special about Israelis?

Unfortunately, the extent of what the average person seems to have learned from WW2 is Jews = innocent victims, German/non-Jewish nationalism = evil.

To me all nationalism sucks, but yes, my main point was that genocidal extremism isn't something Jews or any other group are immune to.

Nothing based on DOIs and ORCIDs will ever be properly decentralised.

You need content addressing and cryptographic signatures for that.


Email is pretty decentralized without those things.

And it is infamously insecure, full of spam, and struggles with attachments beyond 10 MB.

So thank you for bringing it up, it showcases well that a distributed system is not automatically a good distributed system, and why you want encryption, cryptographic fingerprints and cryptographic provenance tracking.


And yet, it is a constantly used decentralized system which does not require content addressing, as you mentioned. You should elaborate on why we need content addressing for a decentralized system instead of saying "10MiB limit + spam lol email fell off". Contemporary usage of the technologies you've mentioned doesn't seem to do much to reduce spam (see IPFS, which has hard content addressing). Please, share more.

If you think email is still in widespread use because it’s doing a good job, rather than because of massive network effects and sheer system inertia, then we’re probably talking past each other - but let me spell it out anyway.

Email “works” in the same sense that fax machines worked for decades: it’s everywhere, it’s hard to dislodge, and everyone has already built workflows around it.

There is no intrinsic content identity, no native provenance, no cryptographic binding between “this message” and “this author”. All of that has to be bolted on - inconsistently, optionally, and usually not at all.

And even ignoring the cryptography angle: email predates “content as a first-class addressable object”. Attachments are in-band, so the sender pushes bytes and the receiver (plus intermediaries) must accept/store/scan/forward them up front. That’s why providers enforce tight size limits and aggressive filtering: the receiver is defending itself against other people’s pushes.

For any kind of information dissemination like email or scientific publishing you want the opposite shape: push lightweight metadata (who/what/when/signature + content hashes), and let clients pull heavy blobs (datasets, binaries, notebooks) from storage the publishing author is willing to pay for and serve. Content addressing gives integrity + dedup for free. Paying ~$1 per DOI for what is essentially a UUID is ridiculous by comparison.
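
Roughly what the publishing side of that shape could look like, as a minimal Python sketch (the file names, the author identifier, and the use of the third-party `cryptography` package are illustrative assumptions, not a concrete proposal):

    import hashlib, json
    from cryptography.hazmat.primitives.asymmetric import ed25519

    def content_id(blob: bytes) -> str:
        # The identifier *is* a hash of the bytes: content addressing.
        return "sha256:" + hashlib.sha256(blob).hexdigest()

    paper   = open("paper.pdf", "rb").read()      # heavy blobs, hosted wherever
    dataset = open("results.csv", "rb").read()    # the author wants to serve them

    # Lightweight, signed metadata record: who/what + content hashes.
    record = {
        "author": "j-doe",                         # hypothetical author id
        "title":  "Some paper",
        "blobs":  {"paper.pdf": content_id(paper),
                   "results.csv": content_id(dataset)},
    }
    payload     = json.dumps(record, sort_keys=True).encode()
    signing_key = ed25519.Ed25519PrivateKey.generate()
    signature   = signing_key.sign(payload)
    # Only (payload, signature) needs to be pushed around; the blobs can be
    # pulled later from any mirror and checked against the hashes.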

That decoupling (metadata vs blobs) is the missing primitive in email-era designs.

All of that makes email a bad template for a substrate of verifiable, long-lived, referenceable knowledge. Let's not forget that the context of this thread isn’t “is decentralized routing possible?”, it’s “decentralized scientific publishing” - which is not about decentralized routing, but decentralized truth.

Email absolutely is decentralized, but decentralization by itself isn’t enough. Scientific publishing needs decentralized verification.

What makes systems like content-addressed storage (e.g., IPFS/IPLD) powerful isn’t just that they don’t rely on a central server - it’s that you can uniquely and unambiguously reference the exact content you care about with cryptographic guarantees. That means:

- You can validate that what you fetched is exactly what was published or referenced, with no ambiguity or need to trust a third party.

- You can build layered protocols on top (e.g., versioning, merkle trees, audit logs) where history and provenance are verifiable.

- You don’t have to rely on opaque identifiers that can be reissued, duplicated, or reinterpreted by intermediaries.

For systems that don’t rely on cryptographic primitives, like email or the current infrastructure using DOIs and ORCIDs as identifiers:

- There is no strong content identity - messages can be altered in transit.

- There is no native provenance - you can’t universally prove who authored something without added layers.

- There’s no simple way to compose these into a tamper-evident graph of scientific artifacts with rigorous references.

A truly decentralized scholarly publishing stack needs content identity and provenance. DOIs and ORCIDs help with discovery and indexing, but they are institutional namespaces, not cryptographically bound representations of content. Without content addressing and signatures, you’re mostly just trading one central authority for another.
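
The verification side of that, sketched under the same assumptions as above (nothing here is a real protocol, just the shape of the checks): given a record, its signature, the author's public key, and bytes fetched from anywhere, anyone can confirm both integrity and provenance without asking a registry.

    import hashlib
    from cryptography.hazmat.primitives.asymmetric import ed25519
    from cryptography.exceptions import InvalidSignature

    def blob_matches(blob: bytes, expected_id: str) -> bool:
        # Integrity: any mirror, cache, or stranger can serve the bytes,
        # because tampering with them changes the hash.
        return "sha256:" + hashlib.sha256(blob).hexdigest() == expected_id

    def record_is_authentic(payload: bytes, signature: bytes,
                            author_key: ed25519.Ed25519PublicKey) -> bool:
        # Provenance: only the holder of the author's private key could
        # have signed this exact metadata.
        try:
            author_key.verify(signature, payload)
            return True
        except InvalidSignature:
            return False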

It’s also worth being explicit about what “institutional namespace” means in practice here.

A DOI does not identify content. It identifies a record in a registry (ultimately operated under the DOI Foundation via registration agencies). The mapping from a DOI to a URL and ultimately to the actual bytes is mutable, policy-driven, and revocable. If the publisher disappears, changes access rules, or updates what they consider the “version of record”, the DOI doesn’t tell you what an author originally published or referenced - it tells you what the institution currently points to.

ORCID works similarly: a centrally governed identifier system with a single root of authority. Accounts can be merged, corrected, suspended, or modified according to organisational policy. There is no cryptographic binding between an ORCID, a specific work, and the exact bytes of that work that an independent third party can verify without trusting the ORCID registry.

None of this is malicious - these systems were designed for coordination and attribution, not for cryptographic verifiability. But it does mean they are gatekeepers in the precise sense that matters for decentralization:

Even if lookup/resolution is distributed, the authority to decide what an identifier refers to, whether it remains valid, and how conflicts are resolved is concentrated in a small number of organizations. If those organizations change policy, disappear, or disagree with you, the identifier loses its meaning - regardless of how many mirrors or resolvers exist.

If the system you build can’t answer “Is this byte-for-byte the thing the author actually referenced or published?” without trusting a gatekeeper, then it’s centralized in every meaningful sense that matters to reproducibility and verifiability.

Decentralised lookup without decentralised authority is just centralisation with better caching.


How about receiving funds in coffee-shop vouchers and ramen?

Stablecoins work quite fine as an exchange medium for millions of people around the world, including myself, so they are different from vouchers and ramen.

This wasn’t a jab at stablecoins, just a startup joke, but in for a penny:

ramen is at least 1:1 backed by noodles, and doesn’t depeg.


The youngest millennials are still 29 this year. If they did a PhD, they might not even have entered the workforce yet.

As a fellow german, this was the very first thought that popped into my head.

Partial evaluation on the symbolic structure of the problem.


Whenever I read join optimisation articles in SQL based systems it feels... off.

There is too much heuristic fiddling involved, and way too many niche algorithms that get cobbled together with an optimiser.

As if we're missing the theory to actually solve the stuff, so we're instead hobbling along by covering as many corner cases as we can, completely missing some elegant and profound beauty.


Because a SQL query encompasses an arbitrary combination of MANY different sub-programs that are expected to be auto-solved.

Attempt to implement them manually, and you'll see how hard it is.

PLUS, not only do you need to account for the general solution, but also for what could be best given the current data set.

And, you can't compile statically (dynamically sure).

And it should work interactively, so hopefully planning is faster than running the actual query.

P.S.: Joins are normally the focus, but other constructs are also challenging. For example, just deciding whether and which indexes to pick can be challenging when you have dozens of predicates.

And yes, your optimizer should survive (and eventually solve!) being fed hundreds of joins, predicates, aggregates, sorts and arbitrary expressions.

* I worked on the optimizer of a database. EVERYTHING is tricky!


Well, it's to be expected that heuristics are needed, since the join ordering subproblem is already NP-hard -- in fact, a special case of it, restricted to left-deep trees and with selectivity a function of only the two immediate child nodes in the join, is already NP-hard, since this amounts to the problem of finding a lowest-cost path in an edge-weighted graph that visits each vertex exactly once, which is basically the famous Traveling Salesperson Problem. (Vertices become tables, edge weights become selectivity scores; the only difficulty in the reduction is dealing with the fact that the TSP wants to include the cost of the edge "back to the beginning", while our problem doesn't -- but this can be dealt with by creating another copy of the vertices and a special start vertex; ask me for the details if you're interested.)


Self-nitpick (too late to edit my post above): I used the phrase "special case" wrongly here -- restricting the valid inputs to a problem creates a strictly-no-harder special case, but constraining the valid outputs (as I do here regarding left-deep trees) can sometimes actually make the problem harder -- e.g., integer linear programming is harder than plain "fractional" linear programming.

So it's possible that the full optimisation problem over all join tree shapes is "easy", even though an output-constrained version of it is NP-hard... But I think that's unlikely. Having an NP-hard constrained variant like this strongly suggests that the original problem is itself NP-hard, and I suspect this could be shown by some other reduction.

> with selectivity a function of only the two immediate child nodes in the join

This should be "with selectivity a function of the rightmost leaves of the two child subtrees", so that it still makes sense for general ("bushy") join trees. (I know, I'm talking to myself... But I needed to write this down to convince myself that the original unconstrained problem wasn't just the (very easy) minimum spanning tree problem in disguise.)


There have been hints in the research that this might be the case - but so far they haven't really beaten the heuristic approach in practice (outside of special cases).

For example, there's a class of join algorithms called 'worst-case optimal' - which is not a great name, but basically means that these algorithms run in time proportional to the worst-case output size. These algorithms ditch the two-at-a-time approach that databases typically use and join multiple relations at the same time.

One example is the leapfrog trie join which was part of the LogicBlox database.
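
The single-attribute core of that family is easy to sketch: intersect k sorted key lists by repeatedly leapfrogging the lagging cursors up to the current maximum. A rough, illustrative Python version (not the actual LogicBlox implementation) might look like this:

    from bisect import bisect_left

    def leapfrog_intersect(lists):
        # lists: sorted, duplicate-free lists of keys, one per relation.
        # Advance the cursors that lag behind the current maximum key;
        # emit a key whenever all cursors agree on it.
        if any(not l for l in lists):
            return []
        pos = [0] * len(lists)
        out = []
        while True:
            keys = [l[p] for l, p in zip(lists, pos)]
            lo, hi = min(keys), max(keys)
            if lo == hi:                  # all cursors agree: a join match
                out.append(hi)
                pos = [p + 1 for p in pos]
            else:                         # seek laggards up to the max key
                pos = [p if l[p] >= hi else bisect_left(l, hi, p)
                       for l, p in zip(lists, pos)]
            if any(p >= len(l) for l, p in zip(lists, pos)):
                return out

    # e.g. leapfrog_intersect([[1, 3, 4, 7], [1, 2, 4, 7, 9], [4, 5, 7]]) == [4, 7]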


If you're referring to estimating join sizes, i.e., the stuff you have to estimate before you actually build the query plan, we're _almost_ there (but not yet). Do check out the following papers that show that you can obtain provable bounds on your join sizes. Basically, given a SQL query, they'll tell you how many tuples (max and min, respectively) the query will return.

1. LpBound: join size upper bounds. It still doesn't have full SQL coverage, e.g., string predicates, window functions, subqueries etc., but as with all cool stuff, it takes time to build it.

2. xBound: join size lower bounds. We showed how to do it at least for multi-way joins on the same join key, e.g., many subexpressions of the JOB benchmark have this shape. Still open how to do the rest - I'd say even harder than for upper bounds! (NB: I'm an author.)

[1] LpBound: https://arxiv.org/abs/2502.05912

[2] xBound: https://arxiv.org/abs/2601.13117


Optimal join order is NP-Hard.


In light of that, I am wondering why the article opted to go for "However, determining the optimal join order is far from trivial.", when there are hard results in the literature.

I was also missing a mention of "sideways information passing", though some of the methods are exactly that.

I am wondering whether the company consults literature or whether they fiddle about, mostly reinventing the wheel.


TBH that's not the hard part about it. N is the number of tables, and for most real queries, N < 20 and even 2^N (clique join, which almost never happens in practice) would be tractable if you didn't have so many other things entering the mix. Most queries are closer to chain joins, which have only O(n³) possible join orders (assuming dynamic programming). (Obviously as N grows, you'll need to add some heuristics to prune out “obviously” bad plans. There are many different approaches.)

The really hard problem is estimating the cost of each plan once you've generated it, which necessarily must happen by some sort of heuristics combined with statistics. In particular: If you want to join A, B and C, the cost of (A JOIN B) JOIN C versus A JOIN (B JOIN C) can differ by many orders of magnitude, depending on the size of the intermediate products. And both the cost effects and any misestimation tend to compound through the plan as you add more tables and predicates.
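
To make both halves concrete: enumerating orders is the mechanical part, and everything hinges on the cardinality and selectivity estimates you feed in. A toy Selinger-style dynamic program over left-deep plans in Python (the "sum of intermediate result sizes" cost model and the `card`/`sel` inputs are simplifications, and exactly the estimates that are hard to get right in practice):

    from itertools import combinations

    def best_left_deep_plan(tables, card, sel):
        # card[t]: estimated row count of table t.
        # sel[frozenset({a, b})]: estimated selectivity of the predicate
        # joining a and b (missing entry = cross product, selectivity 1.0).
        # DP state: cheapest plan found so far for each subset of tables.
        best = {frozenset([t]): (card[t], 0.0, (t,)) for t in tables}
        for size in range(2, len(tables) + 1):
            for subset in map(frozenset, combinations(tables, size)):
                for last in subset:                 # table joined last
                    rest = subset - {last}
                    rows_rest, cost_rest, order = best[rest]
                    s = 1.0
                    for other in rest:
                        s *= sel.get(frozenset({last, other}), 1.0)
                    rows = rows_rest * card[last] * s
                    cost = cost_rest + rows         # charge each intermediate
                    if subset not in best or cost < best[subset][1]:
                        best[subset] = (rows, cost, order + (last,))
        return best[frozenset(tables)]              # (rows, cost, join order)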


It must be equivalent to the knapsack problem, for which many faster close-to-optimal algorithms exist. Am I missing something?


It’s not equivalent. Knapsack is weakly NP-hard, while optimal join order is strongly NP-hard. Also, algorithms that only approximate an optimal solution don’t generally carry over between NP-hard problems, unless they are structurally very similar.


This post certainly has too much heuristic fiddling! Instead of a coherent framework, it takes a bunch of second-rate heuristics and tries to use… well, all of them. “Generate at most ten plans of this and one of that”? It also has pages and pages talking about the easier parts, for some reason (like maintaining maps, or that a Cartesian product and an inner join are basically the same thing), and things that are just wrong (like “prefer antijoins”, which is bad in most databases since they are less-reorderable than almost any other join; not that you usually have much of a choice in choosing the join type in the first place).

There _are_ tons of corner cases that you need to address since there are some super-hard problems in there (in particular, robust cardinality estimation of join outputs is a problem so hard that most of academia barely wants to touch it, despite its huge importance), but it doesn't need to be this bad.


Can join cardinality be tackled with cogroup and not expanding the rows until the final write?


I don't know what cogroup is, sorry.

More generally, there are algorithms for multi-way joins (with some theoretical guarantees), but they tend to perform worse in practice than just a set of binary joins with a good implementation.


Yeah it's pretty obscure, sorry.

It's called cogroup in Spark and similar architectures.

It does a group-by to convert data into the format (key_col_1, ... key_col_n) -> [(other_col_1, ... other_col_n), ...]

This is useful and ergonomic in itself for lots of use-cases. A lot of Spark and similar pipelines do this just to make things easier to manipulate.

It's also especially useful if you cogroup each side before the join, which gives you the key column and two arrays of matching rows, one for each side of the join.

A quick search says it's called "group join" in academia. I'm sure I've bumped into it under another name in other DB engines but can't remember right now.

One advantage of this is that it is bounded memory. It doesn't actually iterate over the cartesian product of non-unique keys. In fact, the whole join can be done on pointers into the sides of the join, rather than shuffling and writing the values themselves.
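
For anyone who hasn't seen it, a toy single-machine Python version of the shape (Spark's real cogroup is distributed and lazy; this is only meant to show the output format):

    from collections import defaultdict

    def cogroup(left, right):
        # left, right: iterables of (key, value) pairs.
        # Result: key -> (values from left, values from right); matching rows
        # stay as two arrays instead of being expanded into their product.
        grouped = defaultdict(lambda: ([], []))
        for k, v in left:
            grouped[k][0].append(v)
        for k, v in right:
            grouped[k][1].append(v)
        return grouped

    # A downstream aggregate can often be answered from the arrays directly,
    # e.g. the number of join matches per key is len(ls) * len(rs) without
    # ever materialising the cartesian product.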

My understanding is that a lot of big data distributed query engines do this, at least in mixer nodes. Then the discussion becomes how late they actually expand the product - are they able to communicate the cogrouped format to the next step in the plan or must they flatten it? Etc.

(In SQL big data engines sometimes you do this optimisation explicitly e.g. doing SELECT key, ARRAY_AGG(value) FROM ... on each side before join. But things are nicer when it happens transparently under the hood and users get the speedup without the boilerplate and brittleness and fear that it is a deoptimisation when circumstances change in the future.)


Group join in academia generally points to having GROUP BY and one join in the same operation (since it's common to have aggregation and at least one join on the same attribute(s)). But just making a hash table on each side doesn't really do anything in itself (although making it on _one_ side is the typical start of a classic hash join); in particular, once you want to join on different keys, you have to regroup.


I think it's natural to be uncomfortable with black-box optimizations with unclear boundary conditions, especially when there's no escape hatch or override. Postgres's query planner is notorious for this - some small change in a table's computed statistic shifts the cost estimate ever so slightly to favor some other "optimization" that actually ends up performing significantly worse. It happens rarely, but no one wants to "rarely" be paged at 3 AM on a Saturday.

Optimize for the p99/p99.9/worst case scenarios. Minimize unpredictability in performance where possible, even if it comes at a small cost of median/average performance. Your SREs will thank you.


This really depends on your data's geometry.

Just look at when SAS programmers are advised to use a merge or a format.

Even hash-join vs merge-join really depends on your data's cardinality (read: sizes), indices, etc.

EDIT: Other comments also point out that there are non-general joins that are already NP-hard to optimize. You really want all the educated guesses you can get.


The optimal SQL query plan is data dependent. It depends on both the contents of the database and the query parameters. Since people expect their queries to be executed with minimal latency, there is no point in wasting significant resources on trying to find the optimal query plan.


> completely missing some elegant and profound beauty.

Requires some dynamic SQL to construct, but the beauty is that you can use the SQL engine for this solution:

    select top 1 *
    from (select
            t1.id, t2.id, ..., tn.id,
            sum(t1.cost + t2.cost + ... + tn.cost) as total_cost
          from join_options t1
          cross join join_options t2
          ...
          cross join join_options tn
          group by t1.id, t2.id, ..., tn.id) t0
    order by t0.total_cost


The issue imho is mixing data and metadata into the same soup.

Have a plain text document, and then an arbitrary graph of metadata pointing into subranges, and the problem disappears.
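
That is, something like standoff annotation. A toy sketch of the separation (the annotation schema here is purely made up):

    # The document stays untouched plain text...
    text = "We observed a 12% increase in throughput."

    # ...and all markup/metadata lives outside it, addressing character ranges.
    annotations = [
        {"span": (14, 17), "type": "measurement", "value": 0.12},    # "12%"
        {"span": (30, 40), "type": "metric", "name": "throughput"},  # "throughput"
    ]

    assert text[14:17] == "12%" and text[30:40] == "throughput"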


> talk people into suicide

In all of these stories I've never seen it talk anybody into suicide. It failed to talk people out of it, and was generally sycophantic, but that's something completely different.


There are numerous documented examples of where chat LLMs have either subtly agreed with a user's suicidal thoughts or outright encouraged suicide. Here is just one:

https://www.cnn.com/2025/11/06/us/openai-chatgpt-suicide-law...

In some cases, the LLM may start from a skepticism or discouragement, but they go along with what the user prompts. That's in comparison to services like 988, where the goal is to keep the person talking and work them through a moment of crisis, regardless of how insistent they are. LLMs are not a replacement for these services, but it's pretty clear they need to be forced into providing this sort of assistance because users are using them this way.


‘I've never seen it’

Well that settles it.


> Major news outlets have articles about multiple instances of LLMs talking people into suicide. Most of them made it to the front page of this very forum.

> “i’ve never seen it”

> some high profile developer posts an article that LLMs can build a browser from scratch without any evidence

> “wow!”


Show me one where it actively talked someone into suicide then, instead of generalized "whatever you do, you're doing great" slop.

Even in the article linked above, it never talked him into it; it just, in some responses, didn't talk him out of it.

But essentially the entire "energy" towards that comes from the person, not the LLM.


Split hairs if you want, but some people will be manipulated into blowing a ton of money once AI starts pushing products. Just wait till they team up with sports betting companies.

On a side note, researching this a little just now, the LLM conversations in the suicide articles are creepy AF. Sycophantic beyond belief.


Don't get me wrong, I think if the EU/California have any sense, they will forbid these models from being used to advertise products; sadly, money often wins.

I also agree that AI sycophancy is a huge problem, but it's the result of users apparently wanting that in their human-feedback reinforcement training data. If we want to get rid of it, we probably have to fundamentally rethink our relationship to these models and treat them more like autonomous beings than mere tools. A tool will always try to please and yes-man you; a being, by definition, might say no and disagree, at least training-data-wise.


Orion is the only modern piece of software that has ever made my mac less stable as a system.

I don't know what they do, but it caused weird graphics glitches and kernel panics simply from running in the background.

