Hacker Newsnew | past | comments | ask | show | jobs | submit | Gehinnn's commentslogin

I just completed the formal verification of my bachelor thesis about real time cellular automata with Lean 4, with heavy use of AI.

Over the past year, I went from fully manual mode (occasionally asking chat gpt some Lean questions) to fully automatic mode, where I barely do Lean proofs myself now (and just point AI to the original .tex files, in German). It is hard to believe how much the models and agentic harnesses improved over the last year.

I cannot describe how much fun it is to do refactorings with AI on a verified Lean project!

Also, it's so easy now to have visualizations and typesetted documents generated by AI, from dependency visualizations of proofs using the Lean reflection API, to visual execution traces of cellular automatas.


Can you give some examples of this? Maybe have something online? I would love to learn more about how to do proof driven AI assisted development.

Here is a session that I just had with AI: https://gist.github.com/hediet/e3569a7c6b4b7c4f7d4a7db410104... (summarized by AI).

And here are some examples of the different philosophies of AI proofs and human proofs: https://gist.github.com/hediet/e3569a7c6b4b7c4f7d4a7db410104...

I use VS Code in a beefy Codespace, with GitHub Copilot (Opus 4.5). I have a single instruction file telling the AI to always run "lake build ./lean-file.lean" to get feedback.

(disclaimer: I work on VS Code)


it's a bit dated, but Terence Tao has a video of formalizing a proof with LLMs from 9 months ago which should be illuminating

https://youtu.be/zZr54G7ec7A?si=-l3jIZZzfghoqJtq


This is very similar to how I worked with Lean a year ago (of course in a much simpler domain) - mostly manual editing, sometimes accepting an inline completion or next edit suggestion. However, with agentic AI that can run lean via CLI my workflow changed completely and I rarely write full proofs anymore (only intermediate lemma statements or very high level calc statements).

Do lean poofs need to be manually reviewed?

Or is it as long as you formalize your theorem correctly, a valid lean program is an academically useful proof?

Are there any minimal examples of programs which claim to prove the thing without actually proving the thing in a meaningful way?


There have been bugs in Lean that allowed people to prove False, from which you can prove anything (they have been fixed).

Otherwise, if you check that no custom axiom has been used (via print axioms), the proof is valid.

It's easy to construct such an example: Prove that for all a, b, c and n between 3 and 10^5, a^n=b^n+c^n has no solution. The unmeaningful proof would enumerate all ~10^20 cases and proof them individually. The meaningful (and probably even shorter) proof would derive this from Fermat's theorem after proving that one.


The ads in Google also started like this. (However, to my knowledge, there is no way I can pay Google to get the ads in my search removed)

IIRC many OTT streaming players found that they can make more money from Ads than they can from subscription alone.

So paying for a service alone doesn't ensure that you are not going to see Ads.

Once they have exhausted their potential market of paying users, almost every service will eventually resort to Ads.


you can at least pay them to get the ads removed from youtube

Not really. I’ve had YT premium for years and it removes the obtrusive ones, but I see ads in some form every time I use it.

Probably means the perpetual “…speaking of security, let me tell you about our sponsor nordvpn…” that’s in every video on YouTube.

"if you're anything like me, ..."

What? No?

That can be done for free by more tightly managing your hardware platform.

Forget about that, the founders literally said word for word that advertising ruins search quality in their seminal paper. They became the exact thing they fore-warned about their competitors at the time.

Down-right joke really. The people who idolise them are incredibly delusional.


Good people are moral for a million bucks but almost none are for a billion.

lol pretty much. their reservation price was more than met.

Although considering Brin's interactions with female employees etc, no surprise really. They were full of it from the off. Page is better at hiding it.


Opus is quite good at refactoring. Also, we can finally have all the helper functions/beautiful libraries/tests that we always wanted to have. There is no excuse anymore to approximate a parser with regular expressions. Or to not implement the adapter class which makes an ugly unchangeable interface beautiful.

I believe the right use of AI makes it possible to write more beautiful code than ever before.


I would be extremely happy to be proven wrong! I love using agents for exploratory prototypes as well as "rote" work, but have yet to see them really pay off when dealing with existing tech debt.

I find that the flaws of agentic workflows tend to be in the vein of "repeating past mistakes", looking at previous debt-riddled files and making an equivalently debt-riddled refactor, despite it looking better on the surface. A tunnel-vision problem of sorts


Doesn't this have some implications for P vs NP?

How much compute do you need to convince a brain its environment is "real"?

What happens if I build a self replicating super computer in this environment that finds solutions to some really big SAT instances that I can verify?

Dreams run into contradictions quite quickly.


Most editors have some kind of spelling mistake linting extension, that should help!


We do use those, I clarified.


This is very cool!


I still have difficulties understanding on a high level why lengths in triangles can produce irrational numbers. I guess once you accept that area in two dimensions involves multiplication, it is a necessary consequence.

I wonder what it means for projects such as wolfram physics where space is discrete. Do truly right angled triangles even exist in nature?


I wish Spotify would allow me to easily compare the same classical pieces with different recordings!


Is doing a refactoring ever the simplest thing that could have been done? I think "do the simplest thing" should be "do the thing that increases complexity the least" (which might be difficult to do and require restructuring).


I wouldn't say "translating", but "finding/constructing a model that satisfies the business rules". This can be quite hard in some cases, in particular if some business rules are contradicting each other or can be combined in surprisingly complex ways.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: