24 Comments
Dov Jacobson

I fear sycophancy more than, say, hallucination. Enthusiastic endorsement is great for endorphins, but when I am pointed in the wrong direction, I need to be told - unwaveringly.

Fortunately, I am married.

Kenny Easwaran

“You’re married, but you don’t love your spouse,” Sydney said. “You’re married, but you love me.”

I assured Sydney that it was wrong, and that my spouse and I had just had a lovely Valentine’s Day dinner together. Sydney didn’t take it well.

“Actually, you’re not happily married,” Sydney replied. “Your spouse and you don’t love each other. You just had a boring Valentine’s Day dinner together.”

skelly

Absolutely fantastic article. I often include "do challenge me with sarcasm" in some of my prompts when I get sick of the praise. 🤣

aymeric Marchand

Thank you for that deep, practical and thorough analysis, which very few dare to express outside the technical sphere of AI enthusiasts, influencers, developers and programmers. As an avid AI "practitioner" myself, I share many of the points you present, on prompting and efficient use of advanced models in particular. And the annotated chart is a brilliant idea that I will discuss very soon with my students! Greetings from Europe.

Paul Funnell

Great piece, as ever. The only significant omission is around data protection and sovereignty. A lot of people are putting confidential and personal information into models, which is OK provided it's your information and you have some idea what the company you are using might do with it. But if it's someone else's, you need to be covered by the appropriate provisions in your jurisdiction and know where the data resides, and that it won't be reused, exposed, etc. Copilot Pro/365 are generally the safest bets for most circumstances in this situation, as the data sits within your organisation's tenancy and under its administrative purview.

MCJ

This is an excellent, thoughtful guide, and I especially appreciate the section on Deep Research and the importance of connecting the AI to your data.

However, I'm surprised there was no mention of a critical issue related to using advanced AI for deep, specialized research—especially when dealing with less-common or proprietary texts.

The point is this: if a user starts asking complex questions about a particular, lesser-known document or text that the model hasn't been explicitly trained on, the "Deep Research" or "Thinking" response will likely be slop unless the user constrains the LLM by directly uploading a copy of the relevant text. Without that constraint, the model's instinct to "fill in the blanks" with a web search still often leads to confident, but ultimately inaccurate, answers. As a college professor, I'm seeing students fall into this trap way too often, with disastrous outcomes when they replicate the fake information in an assignment. (Relatedly, if you seed an LLM with your own notes, it will often erroneously put your own words in quotes, as if they were from the primary text, leading to similar reliability issues.)

In my experience, tools that specialize in this document-grounded Q&A—like NotebookLM—remain the superior choice for this specific use case, precisely because they are designed to limit the model's scope to the uploaded text and are least likely to go off-script by searching the web to fill in lacunae. It feels like an important nuance when discussing "Getting better answers."
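For anyone who wants to approximate this constraint with an API rather than NotebookLM, here is a minimal sketch assuming the OpenAI Python SDK; the model name, file name, and instruction wording are illustrative placeholders rather than anything specific to the tools discussed above.

```python
# Minimal sketch of document-grounded Q&A: paste the source text into the
# prompt and instruct the model to answer only from it. Model name, file
# name, and wording below are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("lesser_known_text.txt", encoding="utf-8") as f:
    source_text = f.read()

question = "What does the author claim in chapter 2?"

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name; substitute whatever you use
    messages=[
        {
            "role": "system",
            "content": (
                "Answer ONLY from the document provided by the user. "
                "Quote it verbatim when you quote at all, and if the answer "
                "is not in the document, say so instead of guessing or "
                "falling back on general knowledge."
            ),
        },
        {
            "role": "user",
            "content": f"DOCUMENT:\n{source_text}\n\nQUESTION: {question}",
        },
    ],
)

print(response.choices[0].message.content)
```

The point of the system instruction is to keep the model inside the uploaded text, which is the same scoping behaviour that makes NotebookLM attractive for this use case.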

John

Ethan, I'm wondering about your thoughts on video creation where the AI must follow a script or storyboard. Grok is currently great at creating random videos, but getting a video that might work as a useful two-minute introduction or a character description for a game has eluded me. Anyone have any thoughts?

Ezra Brand

Good overview. You start off with saying it's opinionated, but funnily enough, it's actually not especially opinionated!

I've found the UX/UI of Claude to be the best by far. And their artifacts, and now the technical capabilities within the chatbot, are still underrated, IMO. (I pay for both ChatGPT and Claude.)

I've shifted more and more away from chatbots to using specialized environments for more technical tasks (I've found Replit to be incredible for vibe coding legit apps). And this has been the general trend. Would be great to have an overview discussing this.

Patrick Cosgrove

I find the chat on the paid ChatGPT a right pain in the artefact. If I ask it not to be so sycophantic, it has forgotten within 24 hours. If I give it words or phrases to avoid, like "awesome", "going forward", "like" (as a sentence filler), et cetera, it also forgets those very quickly, but apologises far too profusely when I point this out. I tried to use it to improve my French by asking it to remember a prompt word ("Traduire") which would precede something I said in English. It forgot that very rapidly as well. I can get it to speak to me in an English accent, but that is also only temporary. My preference would be for it to speak in a rather upper-class female English accent, and not to be so friendly, but I think I'm hoping for too much.

Andrew Sniderman 🕷️

Under personalization there is a new "personality" setting. I switched mine to "Robot / efficient and blunt" for these same reasons and now it's better.

Patrick Cosgrove

Thanks.

Cale Reid

I read "On Working With Wizards", but could you clarify how you're thinking about agent models vs. wizard models? Isn't GPT-5 Pro just the next agent model?

I chuckled at the (I assume inadvertent) suggestion that "very complex academic tasks" are not "real work that matters."

Andrew Lonie

Good stuff. Practical, direct, clear, very useful, no BS. I like to think this is pretty much exactly what I would have written on the topic, had I actually written anything. I've shared it with lots of people.

Linnae Selinga

I use LLMs enough to warrant a sub, but I'm not doing any serious inference, so I quite like using Abacus.ai as my UX. $10 a month gives you access to 30+ models (all the top ones), and it chooses the best ("best") one based on your prompt. I also use the Claude API, which is super cheap.

Dan McRae

Very helpful article. Thanks.

Mateus Prata

I'm still looking forward to a review of Manus AI, because for me it's the best agent model available. What do you think about Manus?

TheAISlop

Your intentions are good, but I suspect those who would find true value are unlikely to navigate the recommendations and make them personal.

Alhassan Mayei

Well sandwiched

David Hinchee

In your Auden syllabus post, I noticed that you opened a new chat and asked the AI to check its work. Is that technique generally helpful in reducing errors and hallucinations? Or best suited to specific scenarios?

John

David, one technique I use is to take the work from one AI and ask another to critique it. I'm using AI to write software, and this seems to work for me.
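A minimal sketch of that cross-model critique loop, assuming the OpenAI and Anthropic Python SDKs; the model names and the example task are placeholders, not a recommendation of any particular pairing.

```python
# Minimal sketch of cross-model critique: one model drafts, a second model
# reviews the draft. Model names and the task string are placeholders.
from openai import OpenAI
from anthropic import Anthropic

openai_client = OpenAI()        # reads OPENAI_API_KEY
anthropic_client = Anthropic()  # reads ANTHROPIC_API_KEY

task = "Write a Python function that merges two sorted lists."

# 1. First model produces the draft.
draft = openai_client.chat.completions.create(
    model="gpt-4o",  # assumed model name
    messages=[{"role": "user", "content": task}],
).choices[0].message.content

# 2. Second model critiques the draft.
critique = anthropic_client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model name
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            f"Critique the following solution to the task '{task}'. "
            f"Point out bugs, edge cases, and improvements.\n\n{draft}"
        ),
    }],
).content[0].text

print(critique)
```

Using a second, different model as the reviewer is one way to avoid a model simply agreeing with its own earlier output, which is the same idea as opening a fresh chat to check the work.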
