Writing down specs for technical projects is a transformational skill.
I've had projects that seemed tedious or obvious in my head, only to discover hidden complexity when I tried to put their supposed triviality into written words. It really is a sort of meditation on the problem.
In the most important AI-assisted project I've shipped so far, I wrote the spec entirely myself first. But feeding it through an LLM feedback loop felt just as transformational: it didn't only help me produce an easier-to-parse document, it helped me understand both the problem and my own solution from multiple angles and let me address gaps early on.
So both this and litellm went straight to PyPI without going to GitHub first.
Is there any way to set up PyPI to only publish packages that come from a certain pattern of tag that exists in GH? Would such a measure help at all here?
Yes: if you use a Trusted Publisher with PyPI, you can constrain it to an environment. Then, on GitHub, you can configure that environment with a tag or branch protection rule that only allows the environment to be activated if the ref matches. You can also configure required approvers on the environment, to prevent anyone except your account (and potentially other maintainers you’d like) from activating the environment.
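As a sketch, the GitHub side of this might look something like the job below. The environment name `pypi` and the tag pattern are assumptions for illustration; the tag/branch restriction and required approvers are configured on the environment in the repository settings, not in the workflow file itself.

```yaml
# Publish job pinned to a protected environment.
# On GitHub: Settings → Environments → "pypi" → restrict deployment
# branches/tags to a pattern like v*, and add required reviewers.
# On PyPI: register a Trusted Publisher naming this repo, this
# workflow file, and the "pypi" environment.
jobs:
  publish:
    runs-on: ubuntu-latest
    environment: pypi       # only activates if the ref passes the environment rule
    permissions:
      id-token: write       # OIDC token exchange for PyPI Trusted Publishing
    steps:
      - uses: actions/checkout@v4
      - run: python -m pip install build && python -m build
      - uses: pypa/gh-action-pypi-publish@release/v1
```

With this shape, a token never exists as a long-lived secret at all; PyPI mints a short-lived credential from the OIDC claims, and the environment rules gate when those claims can even be produced.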
If they have compromised the token, wouldn't that mean the developer is compromised, and couldn't such access be used to just put "curl whatever" into the build and publish that payload on PyPI?
On Debian, all builds happen without internet access. So whatever ends up in the .deb file is either contained in the dependencies or in the orig tarball.
Is anything similar done for builds that create artifacts for pypi, so that a certain correspondence between binary file and sources exists? Or is there unrestricted internet access so that what actually ends up on pypi can come from anywhere and vetting the sources is of little help?
That’s a nice property of centralized package management systems; I don’t think anything exactly like that exists for PyPI. The closest thing would be a cryptographic attestation.
(If I wanted to taxonomize these things, I'd say that the Debian model is effectively a pinky promise that the source artifacts correspond to the built product, except that it's a better pinky promise because it's one-to-many instead of many-to-many like language package managers generally are. You can then formalize that pinky promise with keys and signatures, but at the end of the day you're still essentially binding a promise.)
Depends on what you mean by “this.” If you mean build provenance, yes, if you mean transmuting PyPI into the kind of trust topology that Debian (for example) has, no.
(I think PEP 740 largely succeeds at providing build provenance; having downstream tooling actually do useful things with that provenance is harder for mostly engineering coordination reasons.)
Don't keep the token in your hands at all. Use OIDC ideally, or else make sure to set it up carefully as a repository secret. Ensure the workflow runs with minimal (read-only where possible) permissions in a minimal-dependency environment.
The issue with OIDC is that it does not work with nested workflows, because GitHub does not propagate the claims.
They stacked the deck. Where v2 was still rule inference + spatial reasoning, a bit like juiced-up Raven's Progressive Matrices, v3 adds a whole new multi-turn explore/exploit agentic dimension to it.
Given how hard even pure v2 was for modern LLMs, I'm not surprised to see v3 crush them. But I doubt that will last.
It's not really "automatic import", as described. The exploit is directly contained in the .pth file; Python allows arbitrary code to run from there, with some restrictions that are meant to enforce a bit of sanity for well-meaning users and which don't meaningfully mitigate the security risk.
> Lines starting with import (followed by space or tab) are executed.... The primary intended purpose of executable lines is to make the corresponding module(s) importable (load 3rd-party import hooks, adjust PATH etc).
So what malware can do is put something in a .pth file like
`import sys;exec("evil stringified payload")`
and all restrictions are trivially bypassed. It used to not even require whitespace after `import`, so you could even instead do something like
`import_=exec("evil stringified payload")`
In the described attack, the imports are actually used; the standard library `subprocess` is leveraged to exec the payload in a separate Python process. Which, since it uses the same Python environment, is also a fork bomb (well, not in the traditional sense; it doesn't grow exponentially, but it will still cause a problem).
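To make the mechanism concrete, here's a minimal, harmless sketch. `site.addsitedir()` runs the same `.pth` processing that happens for site-packages at startup, so a `.pth` line starting with `import` gets executed; the payload here (setting an env var) is a stand-in for whatever the malware would actually run.

```python
import os
import site
import tempfile

# Write a .pth file whose single "import" line chains into exec()
# to run an arbitrary (here: benign) stringified payload.
d = tempfile.mkdtemp()
with open(os.path.join(d, "demo.pth"), "w") as f:
    f.write("import os; exec(\"os.environ['PTH_DEMO'] = 'ran'\")\n")

# Process the directory the way site-packages is processed at startup;
# this executes the import line in the .pth file.
site.addsitedir(d)
print(os.environ["PTH_DEMO"])  # → ran
```

The point is that nothing has to be imported by the victim's code: merely having the `.pth` file in a directory that `site` processes is enough.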
> Subsequent to this solve, we finished developing our general scaffold for testing models on FrontierMath: Open Problems. In this scaffold, several other models were able to solve the problem as well: Opus 4.6 (max), Gemini 3.1 Pro, and GPT-5.4 (xhigh).
Interesting. What's that “scaffold”? A sort of unit test framework for proofs?
I think in this context, scaffolds are generally the harness that surrounds the actual model. For example, any tools, ways to lay out tasks, or auto-critiquing methods.
I think there's quite a bit of variance in model performance depending on the scaffold so comparisons are always a bit murky.
> I didn’t learn anything. I felt like I was flinging slop over the wall to an open-source maintainer.
Well, I’m sorry you feel that way; impostor syndrome is tough enough to deal with even without AI.
You seem to be driven by understanding, and you have a great tool to learn from here if you make an effort over time to grasp the “slop” you’re throwing over the wall. Be curious, ask “why” several times, and explore guilt-free when you’re in the right mindset.
I’m glad you got something useful out of it this time. Also, not everything you do with AI has to be useful or a final “deliverable”; it can also be a great toy and a window into more understanding.
Let’s recognize those bait posts as what they are, which for sure is not journalism.