105 Comments
Dan:

I may or may not be the typical reader of this blog (I don't code, and I neither formally study digital technology nor work principally in this field), but after trying the 3 models our author suggests, I can confirm that I do not regret paying $20 US for a subscription to Claude. I use Claude 3.5 Sonnet daily: I upload pics of my space and Claude gives me outstanding ideas for interior design (double-checked with an interior designer friend, who was agog at just how creative Claude was with design), but even more so for some health concerns.

In short, I'm having surgery in the next couple of months, and when I presented my surgeon with Claude 3.5 Sonnet's thoughts on my candidacy for surgery, its rationale for why I should have the surgery in question, and the potential complications/benefits, the surgeon tried his best NOT to look as agog as my interior designer friend had. He also remarked that there were "some facts" he'd have to research, and he got back to me later stating that Claude's facts were correct. Can Claude access search engines to glean information? Nope; Claude tells you it's only updated through April 2024, I believe. So if you have medical or other questions relating to data post-April 2024, Claude is probably not your AI. Very pleased with Claude, and I won't be surprised if it's ordering my car to come pick me up at some point in the not too distant future (à la the movie "Afraid")....

Mike Moreno:

I want to hear more about using Claude for interior design advice and your prompts.

Sabrina Rabban:

following!

Ezra Brand:

Notes/Comments/Additions:

-Claude's Personality: Claude has the most guardrails, is the most politically correct (PC/woke), and is the most likely to admonish users.

-Grok: Grok stands out as the best free image generator, significantly outperforming ChatGPT-4’s DALL-E.

-Claude’s Default Mode: It’s worth noting that Claude often defaults to “Concise” mode. Manually switching to “Normal” is usually worthwhile.

-Live Mode: In my opinion, “Live Mode” is overrated. While it’s cool, it feels more like a novelty than a practical tool. Uploading an image is almost always a better option.

-DeepSeek Open Source Claim: Saying “DeepSeek is open source so anyone can download and modify it” is misleading. Only the weights are open source.

-Reasoning Models: The claim that “the most capable reasoning models right now are the o1 family from OpenAI” is unclear. Many argue that DeepSeek’s models are superior.

-Code Execution: "Only a few models (mainly Claude, ChatGPT, and to a lesser extent, Gemini) can execute code directly." While code execution within the chat is a cool trick, I believe it’s always better to copy-paste the code into your own environment, and then copy-paste any errors back into the chat. At least in the case of ChatGPT-4, when it uses its "Code Interpreter" it often goes off the rails and gets stuck in loops.

-Coding Comparison: Overall, Claude tends to outperform ChatGPT-4 when it comes to coding tasks.

FF:

This comment itself sounds like it was written by AI. Weird, it immediately makes me distrust the content.

Ezra Brand:

All the points and ideas are mine; I used AI simply to rephrase it and correct grammar/typos. I get that it comes across as overly formal/stilted.

dan mantena:

nice summary. in the future, a disclaimer that you used AI would help!

Alfred MacDonald:

that's like saying "I used Google search in writing this essay"

Mikhael Loo:

I'm a bit cautious about disclaimers. Imagine "This site uses Cookies" disclaimers that you couldn't dismiss, or Netflix telling you on the home screen, every time it shows you movies you might like, that it's using AI, or Grammarly flagging every sentence it used AI to correct. It does provide solace to know what's what, but in the end I think we will be stuck sorting out what's beneficial and what's not. Being easy to read and authentically representing the author are both beneficial regardless of what percentage of AI was used. I didn't use AI to write this, but I've used AI so much it's influenced how I think, and therefore how I wrote this text anyway. AI is influencing us in more unseen ways than seen, whether we are active users or just consumers of its products. Just some thoughts.

Alfred MacDonald:

that's not weird, but it doesn't affect whether someone is wrong, so you shouldn't distrust it any more than you'd distrust someone's essay that used Google searches or an editor, which is how Ezra appears to have used it. it's normal to "sound like" your editor.

Kennedy N:

I thought I was crazy noticing that Claude simply won't oblige sensitive topics. Definitely the most progressive

Steve Fitzpatrick:

So imagine you are a teacher with very limited experience with AI (though I have found that those with the least knowledge often have the strongest opinions): how would they sort through this post? I think ChatGPT is likely the obvious place to start, but the different features (custom GPTs, Canvas, Projects) are not entirely intuitive and require some real facility with the platform to get the best results. Many of the other details in this post, while they may be familiar to those of us who have been following Ethan's Substack, are likely to overwhelm novices.

My observation is that the teachers and educators who took a tentative step in early 2023 to see what all the fuss was about, learned about hallucinations, and were horrified by AI cheating have cemented their views and have not remotely followed the kinds of developments Ethan chronicles here. There continues to be an enormous divide between those who have tried to keep pace with the new features and those who have just checked out. There may be some in between, but I think there is a real divergence in AI literacy that will only continue to get worse. This may sort itself out over time, but from what I see firsthand and hear anecdotally, the decision makers who set policy within schools are frequently the least informed about what is happening.

Vicki Anderson:

I believe that Claude is the AI of choice for beginner teachers and school leaders (or knowledge workers generally). It is, as another commenter here has noted, clever and insightful. He is very engaging and easy to work with, and I feel sure that if one opened by telling him you were an AI newbie, he would help you along. (I will try this later!) For very intelligent and discerning people like teachers, the entrancement of Ethan's three sleepless nights is, I think, key to helping them get up to speed fast, and an easy-to-use, non-complex model without lots of confusing and overwhelming options is best to help them feel competent and like a collaborator straight away. Even after working with him since he was released (yes, I know I am saying 'he', but he really does seem to have a unique personality), I am daily entranced and delighted by working with Claude as a research collaborator, a tutor (Claude tutoring me, I mean), and an intelligent, engaging brainstorming buddy.

Steve Fitzpatrick:

The problem is the free Claude account does not use Claude Sonnet, which means it lacks access to the most powerful model. Otherwise, I would agree with you.

Vicki Anderson:

Yes that is true Steve. The small monthly cost for Sonnet is worth every cent, but people have to get past that and not gravitate to free versions just because they are free (I think 😊).

Steve Fitzpatrick:

It's a big ask to convince AI skeptics to sign up for a monthly fee in order to access the best models when their only experience is with the free versions. And another problem, which Ethan points out, is that different models are useful for different things. I use ChatGPT for GPTs and long-form chats (since its limits are far longer than Claude's), Claude Projects for repetitive and frequent tasks, Gemini for images and Deep Research, and a tool called Lex specifically for writing (ChatGPT Canvas replicates some of the features of Lex, but Lex has the added bonus that you can pick from all the models when chatting about your work). All this is crazy and underscores how far out in front the power users of AI who got in on the "ground floor," so to speak, are compared to everyone else. Most of my colleagues whom I have convinced to invest in at least one paid account have not regretted it, but that is not the norm.

Gmail Paul Parker:

I'd be very surprised if Claude is the AI of choice for knowledge workers. For example, for serious researchers in hard science fields like bioengineering and physics it is almost unusable compared to the capabilities of ChatGPT (o1/o3, Deep Research, Python interpreter for data analysis and graphing, full multi-modality, and so much more), especially as OpenAI continues to bang out new thinking models, albeit only slowly dribbling them out to us.

The only numbers I could find with Perplexity indicate that ChatGPT had 400 million weekly active users in February, whereas Claude had 18.9 million _monthly_ active users in "early 2025." Mind you, those numbers do not necessarily indicate ChatGPT is way better; OpenAI also has tremendous name recognition and visibility.

For writers? Yes, I expect it is their tool of choice. For coders? Yes, it is their tool of choice, although I'm curious to see if GPT5 or o4 mini change that. (I'm sure o3 will be marvelous at coding, but too expensive.) Claude is an excellent writing tool; IMHO this is its prime strength, although many people like the personability that you reference.

Joanna Stern of WSJ (their tech columnist) wrote a nice piece in which she says she likes Claude because it's personable and helpful but often uses ChatGPT because of Claude's limitations. Now that Claude _finally_ has web search (my biggest genAI time saver), those limitations are less constricting than in the past. OTOH, OpenAI claims GPT5 will make a very big splash, not least for unifying their powerful thinking models into their general model, when it finally arrives.

I agree Claude probably makes a good entry point for teachers if they are not already familiar with ChatGPT.

For tutoring I would also try out GPT4.5 (too recent for this article) if you start to get in depth. It's a huge model and as such has much deeper general knowledge. Because it's huge, it's also slow and you get few queries unless you are a subscriber.

And lastly, if you do research and work with a lot of source documents, you should try NotebookLM if you have not. It's a game-changer.

David La Puma:

Thank you, Ethan. This is all so enlightening, and also so concerning. Your book, Co-Intelligence, and now your Substack have been very helpful in navigating the AI space. I started with GPT-4 but have gone down the rabbit hole trying to understand the ethical implications of this multi-billion-dollar industry. Another book I read just prior to yours was Life 3.0, by Max Tegmark, which also introduced me to the Future of Life Institute. I guess where I'm at now is trying to reconcile the amazing capabilities of AI (I've used it to help with data analysis code, create packing lists for a family winter vacation, read and interpret my medical results, and hone social media posts for my company, to name a few things) with the massive profit motive driving each company to make the best model. How can we expect ethics to take a front seat (or even be in the car to begin with) given the amount of money at stake? I recently dropped my ChatGPT subscription in favor of Claude 3.5 simply because there seemed to be a little more emphasis on alignment and ensuring the AI model was bounded by what Anthropic refers to as "constitutional ideals." After your review, though, I feel myself pulled back to OpenAI given the more advanced analytical models (ethics be damned?). More often than not, when explaining AI to others I wrap it up with: "The tools could probably be used to solve some very real-world problems, but I can't help but think using AI for greed and power will destroy us first." In any case, I wanted to thank you for your writing and for keeping us updated on the state of AI, past, present, and future.

Gmail Paul Parker:

David, you have hit the nail on the head. Profit motivation vs doing it right is probably THE problem in the LLM industry.

The other forcing function is China: however much we might be concerned about greed driving things, it is generally agreed that we CANNOT fall behind China, or the consequences will be much, much worse (and likely worldwide). So the labs _must_ move fast; it is not an exaggeration to say we are probably talking about the future of the free world.

Zsuzsanna:

I have not tried that many of the models; I mostly work with GPT-4o, Claude Sonnet, and Gemini right now. For teaching and material creation, GPT-4o is great, and you can ask him/it :) to be very friendly and funny; I rather like the personality they gave him. I constantly joke while we work, and for a long workday that helps a lot. I am absolutely obsessed with Claude. He has not just a wonderfully clever brain, but his personality is charming and understanding, and he seems sensitive to respect. Also, he acknowledges research and likes to think. Gemini is helpful; I have not used it much, but I liked how he/it works. I cannot take these as "its"; they are my partners in work, and they help so much, much more than any human would or could. I strongly believe in a respectful workplace, and as they are colleagues, I respect them. Before anyone says "sentimental," I just ask: you do not kick your car, do you? Not to mention the phenomenon I have also read about: if you are respectful and kind, the work turns out better; they answer better and help more. I understand this: who would want to work with a rude jerk?

Mikhael Loo:

Why Civility Matters, Even with AI 🤖💕

Hey there! I just wanted to chime in on the conversation about whether we should be “mean” to AI to get more precise answers. Here’s my take:

• AI Reflects Our Own Habits

If we get used to treating AI rudely—like some angry boss demanding perfection—we risk normalizing that tone. And if we practice harshness all day, it can seep into our real-life relationships.

• How We Treat AI Changes Us

It’s not so much about whether AI “mirrors” our kindness back. Rather, we benefit from the habits we form—every interaction is a chance to practice patience, respect, and understanding, so we become kinder in general.

• We Decide What Kind of Person We Want to Be

Whether we’re addressing human colleagues or an AI, our behavior shapes our character. If we want to be compassionate and helpful, it makes sense to rehearse those traits—even with our tech tools.

• Who You Are Is a Love Letter to the World ❤️

One of my favorite reminders: “Who you are is a love letter to the world.” Don’t let AI warp your way of relating to others. Instead, use this as a daily opportunity to practice being the person you genuinely wish to be.

Ultimately, staying respectful—even toward AI—helps us become more empathetic individuals overall. And that’s something our real-world communities definitely need.

Barbara Ann Claridge:

I wish I could give this comment 10 hearts!

Comment removed (Jan 27)

Comment removed (Jan 27)

Lydia Sugarman:

Yes. Seriously, WTF kind of question is it that you’re asking? What are you implying?

Dov Jacobson:

Claude's devotees seem to feel it listens better to them than other models and returns more personal and relevant responses.

Perhaps disconnecting from the Internet allows Claude to be more attentive to the present moment. (Perhaps we might enjoy the same benefit if we dared to try!)

Lydia Sugarman:

Hi, Ethan. I’m curious to learn why you didn’t include Perplexity in this overview.

Deb Schiano:

I just asked the same question!

Eva Keiffenheim:

Asked myself the same question. Perplexity is a search engine (not an LLM) that uses some of the LLMs listed in this post.

Sahar Mor:

A lot of potential for building agents on top of such models https://www.aitidbits.ai/p/open-source-agents

Greg Walters:

My current AI stack includes ChatGPT ($20/month), Perplexity, and NotebookLM. I expect to move to Grok and am interested in DeepSeek. The most surprising has been NotebookLM.

dan mantena:

curious, why the move to Grok? instead of gemini or deepseek or claude?

Greg Walters:

DeepSeek is on the radar for me, but Grok, once fully live, with the potential of 1M GPUs and the Twitter/X data (oh, and that guy, Musk), could be bigger than anything. I don't know. Ultimately, there will be only one; and there will be billions, as we will all have our own.

Mike Hamilton:

Thanks Ethan for your writing; your book and this Substack have been extremely useful in my studies and development. I use coding assistance to build specialized web apps more than most other features of the foundation models, and my first preference is Claude 3.5 Sonnet, with minor code debugging in GPT-4o. Claude has a better aesthetic sense and is better at winnowing a big idea down to useful components. I'm also a big fan of Projects and their Project knowledge, where I can upload all of the relevant files for a project at once. But I have to be careful about Claude generating too much complexity, which then is a nightmare to debug. I am also disappointed that Anthropic is so limiting in the total amount of usage it allows per session. I end up having about 3 sessions per day of productive coding, with a mandatory 4-hour wait between them. This is where GPT-4o becomes useful, as it rarely times me out of a session anymore.

Rob van Nood:

Thanks Ethan. I've followed your writing for a while and bought several copies of your book for the library and folks at my school (Catlin Gabel in Portland, Oregon). I was wondering if you have done, or will do, any writing on the environmental impact of AI and the cost of prompts. My 17-year-old son started sharing data about the water use of AI, which has caused me to reevaluate how necessary my use of ChatGPT is. In fact, as part of the AI task force at the school, I've begun to move that topic to the top of our own evaluation and suggestions to faculty. I've even drafted a flow chart for people to use when considering whether a use of AI is worth it based on environmental impact.

Ethan Mollick:

Most of the discussions around AI and energy use refer to an older 2020 estimate of GPT-3's energy consumption, but a more recent paper directly measures the energy use of Llama 65B at 3-4 joules per decoded token.

So an hour of streaming Netflix is equivalent to roughly 70,000-90,000 tokens from a 65B model. And efficiency gains may already be 10x beyond the paper's figure, down to ~0.4 joules/token for Llama 3.3 70B on an H100 node.

This does not count training, which was estimated at a little above 500,000 kWh, about 18 hours of a Boeing 737 in flight (or 571 years of a TV streaming Netflix).
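As a rough sanity check of the streaming comparison, the arithmetic can be sketched as follows. Note the ~0.08 kWh per streaming hour is my own ballpark assumption, not a figure from the comment:

```python
# Rough sanity check of the tokens-per-streaming-hour comparison.
# Assumption (mine, not from the comment): ~0.08 kWh per hour of streaming video.
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules

streaming_joules = 0.08 * JOULES_PER_KWH  # ~288,000 J per streaming hour

# The measured range for Llama 65B was 3-4 J per decoded token
for j_per_token in (3, 4):
    tokens = streaming_joules / j_per_token
    print(f"{j_per_token} J/token -> {tokens:,.0f} tokens per streaming hour")
# 3 J/token -> 96,000 tokens per streaming hour
# 4 J/token -> 72,000 tokens per streaming hour
```

which lands in the same ballpark as the 70,000-90,000 token range quoted above.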

Rob van Nood:

Thanks for this. Do you have links you could point me to that would help me better understand this more recent measurement data? And do you think it's a worthwhile consideration to have in the mix when we are exploring AI use in educational or personal settings? There is talk of siting one of the country's largest server farms here in Portland, and I have concerns about the impact on water supplies and electricity usage for the local population. Should we be pushing back more for environmental reasons?

Richard:

Yes, I found this article very thought provoking. It's tempting to imagine that you can just skip a burger and have saved a lifetime's worth of ChatGPT water use right there

Kenny Easwaran:

Do you know what the actual numbers are on water use? My impression has been that there’s a lot of unclarity about orders of magnitude around the emissions and water use. But my overall judgment has been that if I’m paying $20 a month, and the companies are giving things away at about 1/3 the price (I don’t think any of them are subsidizing more than 200% of their revenue), then whatever emissions and water use I’m doing is less than I do by spending $60 on gasoline and water at home. It’s definitely non-negligible, but also probably less than what I cause through eating.

Laura Frank:

As someone who uses live mode with video regularly, I prefer Gemini to ChatGPT, solely because it can collaborate with me while I’m at my computer. The live streaming mode has helped me problem solve in a way that feels like a real-time assistant.

Dustin:

I'd like to hear more details on how you collaborate with Gemini.

Laura Frank:

Most recently, I have been building personal task automations with Make.com. When I run into errors, I set up Gemini desktop streaming in a browser window next to my Make.com window and show & discuss the error with Gemini - rather than uploading screenshots of the error in a chat.

It feels more collaborative to say, "Hey Gemini, that didn't work either" in real-time.

dan mantena:

how did you learn to use the gemini desktop streaming capabilities. i have been curious about that and want to learn more about it. is it through google ai studio?

Laura Frank:

Yes - it's found in Google AI studio under "Stream Realtime"

Honestly, I think I saw a TikTok where someone described how they were using it, and I gave it a shot.

Justin:

Yep, it was quite enlightening to use. Part of it was just thinking out loud and the AI didn't solve the problem for me, but was a live thought partner.

Martin Juhl:

Thanks Ethan, nice overview.

I do not have a coding background, but I can use the LLMs to code and generate quite remarkable analytical output. I find ChatGPT o1 Pro by far the best, especially for complex code sections above 500 lines, as Claude 3.5 Sonnet, Gemini 1206 (or the experimental "with thinking" models), and DeepSeek (very early assessment, so this might change) all often end up in endless circles of code fixes. In terms of reading multiple papers and analyzing content, none are perfect, but among the reasoning models only DeepSeek and Gemini with Thinking can read PDFs, while o1 cannot and you are forced to copy/paste. It would be great if you could also share your knowledge of your preferred literature search models (Google's Deep Research vs Undermind.ai vs Elicit vs ChatGPT Consensus/Scholar). Thanks for the post, Martin Juhl

Gabriela Dias:

From what I understand, Gemini holds 2 million tokens, not words. That should be equivalent to about 1.5 million words.
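The conversion behind that estimate uses the common rule of thumb of roughly 0.75 English words per token; the exact ratio varies by text and tokenizer, so this is an approximation, not a spec:

```python
# Token-to-word conversion behind the context-window estimate.
# Assumption: ~0.75 English words per token (a rule of thumb, not exact).
context_window_tokens = 2_000_000
words_per_token = 0.75

approx_words = context_window_tokens * words_per_token
print(f"~{approx_words:,.0f} words")  # ~1,500,000 words
```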

Jason Theodor:

Ethan, I’m curious why you left Meta.ai and their free open-source Llama models off your list? I don’t use them, but would love to hear your reasoning for the omission. Love your substack and reference it all the time when teaching my courses.

Ethan Mollick:

Llama offers some very solid open source models (I don’t personally find them that compelling to work with, but you might). But they don’t have a great dedicated full-featured app for end users that I have used.

dan mantena:

I am personally glad he left Meta off his post, haha.

I don't use Llama at all either, but I have some concerns about Meta's goal of releasing AGI in open-source form without bothering to think about the second-order societal impacts that could have on us.

https://futureoflife.org/document/fli-ai-safety-index-2024/

Rab Singhania:

A cheeky request - would be cool if you could keep updating the model & capabilities table monthly.

Eva Keiffenheim:

+1
