As a former competitive MtG player this is really exciting to me.
That said, I reviewed a few of the Legacy games (the format I'm most familiar with and also the hardest by far), and the level of play was so low that I don't think any of the results are valid. It's very possible that for Legacy they would need some assistance playing Blue decks, but they don't seem to grasp even the most basic of concepts: who's the beatdown?
IMO the most important part of current competitive Magic is mulligans, and that's something an LLM should be extremely good at, but none of the games I'm seeing had either player starting with fewer than 7 cards... in my experience about 75% of games in Legacy have at least one player mulligan their opener.
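For a sense of what that 75% figure implies per player, here is a back-of-the-envelope sketch. It assumes both players mulligan independently and at the same rate, which is a simplification and not something measured here; `per_player_mulligan_rate` is just an illustrative helper name.

```python
def per_player_mulligan_rate(p_any: float, players: int = 2) -> float:
    """Per-player mulligan rate implied by P(at least one player mulligans).

    Assumes each player mulligans independently with the same probability p,
    so P(at least one) = 1 - (1 - p) ** players.
    """
    return 1 - (1 - p_any) ** (1 / players)


# 75% of games with at least one mulligan, two players per game
rate = per_player_mulligan_rate(0.75)
print(f"Implied per-player mulligan rate: {rate:.0%}")
```

Under those assumptions, seeing zero mulligans across many games is very unlikely to be chance, which points at the harness or the models rather than at luck.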
Yeah, the intention here is not to answer "which deck is best" - the standard of play is nowhere near high enough for that. It's meant as more of a non-saturated benchmark for different LLM models, so you can say things like "Grok plays as well as a 7-year-old, whereas Opus is a true frontier model and plays as well as a 9-year-old". I'm optimistic that with continued improvements to the harness and new model releases we can get to at least "official Pro Tour stream commentator" skill levels within the next few years.
Hmm well, from my perspective, none of them are even really playing the game, they are just taking random actions. Any human, even a small child, would be much better.
And re: ages, it's worth noting that the youngest player to make Day 2 of a Grand Prix is 8 years old, and the youngest Pro Tour winner was 15 years old. I don't think it's realistic to get an LLM anywhere close to either of those players in skill level, though it's absolutely possible with a specialized model.
> , so you can say things like "Grok plays as well as a 7-year-old, whereas Opus is a true frontier model and plays as well as a 9-year-old".
no, no, no.. please think. Human child psychology is not the same as an LLM engine rating. It is both inaccurate and destructive to actual understanding to say that common phrase. Asking politely - consider not saying that about LLM game ratings.
The model does not seem to get told what the cards actually do, it's just trying to guess.
Probably less of a problem for Legacy, but it isn't even told what format it's playing. So I expect it's easy to get the bots to do far better than this by giving them more information.
Let me analyze this opening hand:
*Hand analysis:*
- 3 lands (Riverpyre Verge, Steam Vents, Willowrush Verge) - dual lands which is good
- Monument to Endurance ({3}) - a 3-drop
- Accumulate Wisdom ({1}{U}) - 2-drop, likely a blue advantage engine
- Boomerang Basics ({U}) - 1-drop blue spell
- Gran-Gran ({U}) - 1-drop blue creature (1/2)
This is actually a reasonable hand. I have:
- 3 lands giving me mana access (with color fixing)
- A 1-drop (Gran-Gran or Boomerang Basics to choose from)
- A 2-drop (Accumulate Wisdom)
- A 3-drop (Monument to Endurance)
The curve is solid. I have some interactive blue play with Boomerang Basics and early board presence with Gran-Gran. The mana base supports blue well with all three lands. This is keep-able — while not spectacular, it's a functional opening hand.
Oh, that's a good bug report - historically it was just hallucinating card effects so I made the harness throw the Oracle text for all visible cards into the context, but I bet I forgot to do that for the mulligan decision specifically (it's a weird one). Thanks!
I picked a random commander game and the first message 1/4 players left was "Well I should be getting a new hand soon because I have asked for a mulligan". It's definitely in there; whether they are using it correctly is another question.
This is actually really interesting to me, but the way to determine if you should mulligan is whether the 7 cards you are looking at are better than the average 6 cards from your deck. Given that games in most higher power formats end in the first 2-3 turns, the number of cards generally isn’t as important as their quality. So it’s really just math to determine what an “average” hand looks like.
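As a rough sketch of the "it's really just math" part: the land-count side of a keep/mulligan decision is a hypergeometric calculation. The 60-card deck, 20 lands, and the 2-to-4-land "keepable" window below are illustrative assumptions, not numbers from this thread, and the sketch ignores card quality entirely.

```python
from math import comb


def land_count_distribution(deck_size: int, lands: int, hand_size: int) -> dict[int, float]:
    """Return {k: P(exactly k lands)} for a random hand (hypergeometric)."""
    total = comb(deck_size, hand_size)
    return {
        k: comb(lands, k) * comb(deck_size - lands, hand_size - k) / total
        for k in range(0, min(lands, hand_size) + 1)
    }


def keepable_probability(deck_size: int, lands: int, hand_size: int,
                         min_lands: int, max_lands: int) -> float:
    """Probability that a random hand has a land count inside the chosen window."""
    dist = land_count_distribution(deck_size, lands, hand_size)
    return sum(p for k, p in dist.items() if min_lands <= k <= max_lands)


# Hypothetical deck: 60 cards, 20 lands; "keepable" means 2 to 4 lands.
for hand_size in (7, 6):
    p = keepable_probability(60, 20, hand_size, min_lands=2, max_lands=4)
    print(f"{hand_size}-card hand, 2-4 lands: {p:.1%}")
```

A real comparison of "this 7 versus the average 6" would also need some way to score spells, which is the hard part and is not attempted here.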
Uh, LLMs are notoriously bad at basic arithmetic. I think you might be thinking about another kind of AI.
Plus I don't really believe LLMs can reliably tell which hand is better. If you remove the drawing part and simply present two hands to an LLM and ask it which one is stronger, I expect it to do much much much worse than an experienced player. There isn't much reason to expect otherwise (but I'm willing to be proven wrong if such benchmarks exist).
I can't see any movement, at any distance. How likely is it something weird with my vision vs. something weird with my monitor/computer? I'm on a 360hz monitor at 2k.
I think this article discounts the reasons behind frontend decisions... priorities are absolutely fast execution time and ease of hiring. There is very, very little reason to care about optimizing frontend performance for a vast majority of apps. Users just don't care. It doesn't make the company more money.
If a framework is easy to use and everyone knows it, it's simply the best choice for 90%+ of teams.
The UX for me went downhill the last 5-7 years.
I don’t know if it’s react but something changed. Pages load slow or even don’t, strange display errors, slow reaction times etc.
Too few run output analysis on their bundles or even track bundle sizes. There's a lot of kitchen sink repos, not to mention any number of other bottlenecks between the front end and back end. Worse across split teams for larger apps.
> There is very, very little reason to care about optimizing frontend performance for a vast majority of apps. Users just don't care. It doesn't make the company more money.
There’s plenty of users who care, but when the competition is also all slow and heavy they don’t get any choice in the matter.
Not directly, but when you have devs who only know how to build with the framework and don’t have a grip on what’s going on under the hood or how it all interacts in the browser environment (increasingly common), performance is sure to take a hit.
It's not React's fault that people either don't know what they're doing, or don't care enough to make their software performant. This is not a new phenomenon, bad/rushed software has always existed.
This happens regardless of which framework is used or even if no framework is used. Plenty of web developers do not understand how the browser or JS work at a deep level.
Yeah, it's pretty close to the "Imagine how great the world would be if everyone used Lisp/Haskell/WhateverLang instead of Java/JS everywhere!" take you sometimes see. As if the common developer wouldn't just write in all those languages like they're Java/JS, and keep clear of the advanced macros/type systems/whatever.
Even languages or environments that try to "steer the developer into the correct direction" have only really managed it when the new direction is something they already might've chosen to write. Otherwise, you just end up with many square pegs filed down to fit in round holes.
This isn't true at all if you're working on maintaining a web app. When ease of hiring and getting tasks done quickly have become the priority it's because the business has let too much work pile up. It has very little to do with the money unless it's a small startup.
Frontend skills are misunderstood by most of HN because it's a hard role that directly involves business and product wants. There's a ton of hiring (and firing) because it's not easy to find the right people who can communicate about the work clearly with non-devs, navigate the office politics, know what to push back on or when to ask questions, and still write good code.
I agree that maintaining web apps is an entirely different set of skills, though in my experience (mostly small and mid size companies) PMs come in with massive projects and huge changes constantly and management has to say yes to a few. I try my best to shield my devs as much as possible from the politics but usually my teams are still ending up with huge 4-5 sprint frontend projects. It's extremely hard to find devs who can create simple technical designs when there is absolutely any frontend complexity (especially things like wizards, why are wizards so hard for people...). My standard these days for a "good hire" is anyone who can handle these sorts of projects without a huge amount of help.
I felt this way for a long time and used a heavy mouse for daily use then switched to my lightweight mouse for gaming. I changed my mind very fast when I started developing carpal tunnel symptoms from the heavy mouse - using your wrist to move a heavy object in the same pattern for 6+ hours a day is just not healthy for you, and when your job involves doing that, it's 100% worth it to just use the lightweight mouse for everything. I actually found that my wrist felt better with the 55g mouse than with a trackball or a touchpad.
An alternative approach that has worked for me is to get out of the habit of using little muscles as much as I can. I don't use my wrist much with the mouse, I move my whole arm. I try to apply that to everything I do and I've managed to avoid repetitive motion injuries. Last time I experienced anything like carpal tunnel was in college (decades ago).
In my mid 20s I started developing RSI. I went through a bunch of different devices designed to supposedly be "ergonomic" and while the problem would go away temporarily, it would eventually come back.
I eventually found that it's not any one device that cures RSI. It's much better to switch up between different devices on a regular basis.
So, now I have two keyboards I switch between about weekly. It's fairly easy because I just switch between working at the office vs home and have two different designs of devices in the two locations. Occasionally I use the laptop keyboard and track pad directly. I still program a lot and haven't had another RSI flare-up in about 7 years.
you can ask or hire any personal ergonomic service/company to evaluate what needs to get better at your workstation and they will say that devices are the last item on the list. posture, adequate chair/table height and breaks are much more important than 50 grams less on your mouse
The OSHA Computer Workstation Posture Checklist is gospel and if anyone reading this is struggling with RSI and hasn't followed it to the letter, please do so immediately. It is frankly disgusting how quickly a few seemingly small bad habits can grow into a monster nerve issue
Love the idea and technology - I’d much prefer if the output of this was an MPC order since that’s how almost everyone is making proxies these days. Getting my entire cube printed was only about $100 and they are indistinguishable from real cards.
The joy of proxies is you can do whatever you want! My take was to not make these look as real as possible, and to make them something you can print on demand.
Imagine a proxy only tournament of any card game, where you have to submit your decklist ahead of time and it is waiting for you when you arrive, ready to play and keep.
You are basically charging money for things that frameworks like Django already have built-in, and have already had built-in for decades. There are so many good free options for this sort of thing. If a company already pays developers, there's absolutely no way they won't just build things themselves.
All of your selling points are things that business people think are difficult but any developer worth their salt will tell you is no problem.
Edit: to add a bit more info here... think of it this way: I am a frontend developer. I can build a specific dashboard in about 2 months, using things I am already familiar with and will look great. Instead, I can use your product and reduce the amount of time it will take to 1 month, but lock us into paying a monthly subscription for the rest of time. How could I possibly convince somebody to buy that?
Well, if it costs $10,000 all-in to pay a developer for 1 month, and Creo costs $30/month, then it would take 27 years to break even on having a developer spend the extra month building it from scratch.
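The break-even arithmetic is easy to check. A minimal sketch using only the two figures quoted above ($10,000 for the extra developer month, $30/month for the subscription); it ignores maintenance cost, price changes, and discounting.

```python
def break_even_years(one_time_cost: float, monthly_fee: float) -> float:
    """Years of subscription fees needed to equal a one-time build cost."""
    months = one_time_cost / monthly_fee
    return months / 12


years = break_even_years(10_000, 30)
print(f"Break-even after about {years:.1f} years")
```

That comes out a little under 28 years, in line with the rough 27 quoted above.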
Impossible to find? You don’t have salary info on that listing and a quick google shows that your salaries are something like half what I’m making as a remote RN engineer. Very unlikely that it’s because of RN.
When I googled your company the only results listed total comp under $100k for software engineers. 120k is slightly better than that. Equity is worth zero at private companies unless there is a system in place to sell it, so don't try to claim that is worth anything to most people.
I make around $200k + equity as a Staff Engineer, fully remote, though as my company is also private, equity === 0. I would say I am moderately underpaid in the current market but I'm happy with that tradeoff since my company is fully remote, great work life balance, and I work with great people.
That’s a US salary though, right? So you would expect it to be higher. Granted some companies might pay the same salary regardless of country but in my experience it’s usually adjusted
Not related to your actual point, but I researched corporate boards a few years back and found many instances of people on boards of two direct competitors. In fact, I found a couple of companies with CEOs who also served on the board of a competitor. This was going through maybe a few hundred large corporations, so I’d have to assume it’s incredibly common.
I haven't studied this in depth but I recall that there's a book out there that sort of blasts the management consulting industry. How a lot of corporates will hire the same consulting company such as McKinsey to learn what their competitors are planning/how they are performing. IIRC
This is absolutely incorrect. Consulting companies, at least the major ones, will never let any consultant work for more than one competitor throughout their career. They also strictly forbid sharing any info learnt about a company with anyone else, let alone with competitors.
Eric Schmidt sat on Apple's board while running Google, but left in 2009 when they became competitors in the smartphone business. Before that, Apple & Google weren't competitors.
This is notoriously playing out in the ETF space. Large institutions own vast quantities of stock in competing companies, and competition decreases as a result. See "Rise of Institutional Investors Raises Questions of Collusion".
Yep, this is where I'm at too. 13" is just not enough space for me to get anything done productively. On my 13" MBP with my standard font size, my code editor can't show a full line of code without me having to scroll (VSCode with mostly default settings, font size 12).