I think a fair comparison would be against a whitepaper? Clearly this is an exploratory venture and not production-grade software.
You managed to clone the repo and run your test by yourself; whatever the outcome, this is a plus over the standard model for scientific research.
Also, a breath of fresh air among all the other Show HN threads that use hundreds of adjectives to describe the "behavior" of a fully vibed system. I think this is a good model for presenting engineering projects.
> You managed to clone the repo and run your test by yourself; whatever the outcome, this is a plus over the standard model for scientific research.
That's so true, which is kinda funny since one of the cornerstones of scientific thinking is reproducibility.