I'm super glad that Anthropic has put this up -- having a set of examples is super critical for folks who are writing prompts. (In fact my collaborators and I wrote a paper about these challenges last year [1].)
But I also wish they had included more than one example input/output pair for each prompt type: people really pattern match hard on single examples for prompts, and don't get a good sense for what's critical and what's optional in any given prompt or input from just one example. It wouldn't be that much work to have a carousel of input/output pairs here that readers could flip through, and I suspect it would make this page 10X more useful.
If anyone from Anthropic is reading here, I can't emphasize this enough! It's not just about "well, users should experiment" because often users don't know what they can effectively experiment with in these prompts, because it's hard to extrapolate from just one working example. Multiple examples (even just 2-3!) will give readers much more background here!
I tried this output in an HTML file locally and of course, it's barely functional, but even the prompt itself seems so strange.
> Write me a fully complete web app as a single HTML file. The app should contain a simple side-scrolling game where I use WASD to move around. When moving around the world, occasionally the character/sprite will encounter words. When a word is encountered, the player must correctly type the word as fast as possible.The faster the word is successfully typed, the more point the player gets. We should have a counter in the top-right to keep track of points. Words should be random and highly variable to keep the game interesting.
> You should make the website very aesthetic and use Tailwind.
If I'm using WASD to move around, does that overlap with the act of typing any word that includes w, a, s, or d?
It's also got multiple typos? "the more point the player gets" "possible.The"
And while not a typo, "You should make the website very aesthetic" is funny.
In a sense I think this prompt is a good example of the likely problems with the doom scenario where company CEOs can simply tell an LLM what they want and the whole program gets generated flawlessly for them. What the "CEO" wants in this scenario probably makes sense in their head but can't actually be built as described.
Though, really, there is no real WASD conflict here. When you hit a word, you'd simply disable WASD controls until the word is typed successfully.
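To make the idea concrete, here's a minimal sketch (mine, not anything Claude generated) of that mode switch: while a word challenge is active, key presses feed a typing buffer instead of movement. All the names here are hypothetical.

```javascript
// Tiny state machine: "move" mode routes WASD to movement,
// "type" mode routes every key (including w/a/s/d) to the word buffer.
function makeGame() {
  const state = { mode: "move", x: 0, y: 0, target: "", typed: "" };
  const moves = { w: [0, 1], a: [-1, 0], s: [0, -1], d: [1, 0] };

  return {
    // Called when the sprite runs into a word.
    encounterWord(word) {
      state.mode = "type";
      state.target = word;
      state.typed = "";
    },
    // Single entry point for keydown events.
    handleKey(key) {
      if (state.mode === "move") {
        if (moves[key]) {
          state.x += moves[key][0];
          state.y += moves[key][1];
        }
      } else {
        state.typed += key;
        // Word finished: hand control back to movement.
        if (state.typed === state.target) state.mode = "move";
      }
    },
    state,
  };
}
```

In a browser you'd wire `handleKey` to a single `keydown` listener; the point is just that one `mode` flag removes the conflict entirely.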
Currently working on an AI product (essentially an OS-context-sensitive agent caller, basically a "smart" Spotlight for Windows/macOS), and in my experience (~2 months of working on this basically full-time), complex prompts (basically all of the ones provided by Anthropic here) are a mistake. Not only does the quality of output decrease exponentially with complexity (do x which gives you y, and for every y, do z), but also (and this is key) it messes with end-user expectations.
When looking at an email, asking the AI to "summarize this email" will basically work 100% of the time; asking it to "clone this repo" while looking at a GitHub page will also work 100% of the time. But some weird combination of the two, "clone the repo this email references", works reliably only around 1/2^n of the time (25% for this two-step chain).
I think there is a lot of low-hanging fruit that is totally achievable using local LLMs, but too many people are enamored of a mythical pie-in-the-sky "it does everything" app, so we're missing basic things like: cloning a git repo without alt-tabbing into my shell, semantic search over the PDFs in a folder, adding an event to my calendar without alt-tabbing into a new Google Calendar tab, and so on.
Users need to be trained to give simple one-shot requests to AI. The SQL example is particularly egregious: "Get the list of customers who have placed orders but have not provided any reviews, along with the total amount they have spent on orders." This is a disaster waiting to happen (I'd be curious to mess with the schema and watch it crash and burn); I'm almost certain that the example output is cherry picked.
This seems like fundamentally the wrong UX. I clicked "Google apps scripter" and the sample prompt is "Write me a Google apps script that will translate all text in a Google Slides presentation to Korean". Is this really useful to anyone?
I much prefer OpenAI's broad approach of general principles that hold across prompts.
I created my own Zen teacher with: "I want you to be my Zen koan master. When I ask you for a koan you give me one we haven't discussed before." It's really amazing how well this works. It gently challenges and guides me towards a correct interpretation, like a half-way decent human Zen teacher would.
I appreciate the Zen-like simplicity of using Claude Opus this way. I prefer this over ChatGPT's approach.
Pretty handy collection of prompts to do basic things with LLMs. I’ve had good results with using Claude to explain code, tag sentiment and extract emails or other specific content from free form text.
If you plan on using any of these at scale I recommend investing in a good evaluation test harness to check for regressions when you tweak prompts.
[1] https://dl.acm.org/doi/pdf/10.1145/3544548.3581388 (open access link)