Which AI should I use? Superpowers and the…

Ethan Mollick

Mar 18, 2024

And then there were three...

22 Comments

Sahar Mor

Mar 18, 2024

It's still hard to choose the right commercial model for API access, taking latency, cost, and overall performance into account. I wrote a quick guide covering LLM leaderboards to help AI developers and practitioners choose the most suitable model for the task at hand

https://www.aitidbits.ai/p/leaderboards-for-choosing-best-model

As for agents, I just came across this new benchmark from Carnegie Mellon and Microsoft that evaluates LLMs as agents https://arxiv.org/abs/2310.01557

Ezra Brand

Mar 18, 2024

Great piece! I agree re Claude 3, it's the most human, least fluff. I wrote a comparative review yesterday that's especially focused on translation and humanities: https://www.ezrabrand.com/p/claude-3-vs-chatgpt4-a-comparative

Reply (2)

Voxi Heinrich Amavilah

Mar 18, 2024

Hahaha, agreed. In concluding any “argument” where they (chatbots) all have “Overall …” Claude 2 already sounded not only human, but highly educated human. Lol

Margie Meacham @ATD #AIinTD

May 20, 2024

I agree, Ezra. Claude is my go-to assistant for text. I work in education, so it is perfect for most of my needs.

dan mantena

Mar 18, 2024

I was hoping to see you comment on the ethical principles driving the three models...

OpenAI and Google seem obsessed with getting to AGI, while Anthropic seems less obsessed and more interested in creating AI systems that benefit humans by being honest, harmless, and helpful, through its constitutional ethics framework.

Will any of these labs solve the control problem for superintelligent AI before deploying their products?

In my opinion, to evaluate AI models on a holistic level, we need to move past just the productivity lens...

Sarah Ennett

Mar 18, 2024

Annoyingly, I can't sign up for Anthropic's Claude (the phone number range in my country isn't recognised for validation) or Google Gemini Advanced (our country isn't recognised as able to purchase anything), so I can only play with ChatGPT-4 currently. I hugely appreciate your commentary comparing them all, and will keep pressing these organisations to make their services more widely available!

Bonnie B

Mar 20, 2024

What are your thoughts on Perplexity?

Jim Dunnigan

Mar 20, 2024

Loved your analysis! LLMs are starting to feel like streaming services: $20 a month. Maybe I need to swap out Disney Plus for Claude?

Didier Varlot

Mar 21, 2024 (edited)

Thank you for that article. It clarifies lots of things for me.

The capacity of AI is already astonishing, but it is nothing compared with what it will be very soon.

We should remember that people were also afraid of cars when the first ones were commercialized.

Cars changed our way of living, and AI will do the same. There is no need to be afraid, only the need to do our best to understand how AI will be useful for us.

Gerard Fox

Mar 18, 2024 (edited)

Claude is not available in the EU, presumably due to the recent raft of EU Tech Laws. But Anthropic tout themselves as the most ethical of AI organisations which suggests issues.

Anyone know what’s going on here?

Pranita Pramodrao Deshpande

Mar 23, 2024

Do you think using AI for us is a good motivation?

Sudhir Gajre

Mar 20, 2024

Perplexity.ai is also a very good search plus AI tool.

Andrew Smith

Mar 18, 2024

I'm getting a handle on Gemini, and I've had about a year with GPT4 now, so feeling pretty good at the limitations within each model (and a strong feeling that everything is converging, so if one group figures out a solution, the others hop on almost immediately). Still, each has its limitations and quirks.

It's hard to imagine that we're ready for another paradigm up, but it seems like that's gonna happen. Let's be ready for it.

Deb Schiano

Mar 18, 2024 (edited)

Just want to thank you for this easy-to-understand explanation for non-technically educated peeps!

Reply (1)

Deb Schiano

Mar 18, 2024

Adding… how do we know the difference between the various levels of each LLM? Also, reading Ezra Brand’s linked piece, I see that GPT-4 can access the internet. This seems important to discuss…?

D R

Mar 18, 2024

A great piece as usual, but I do have a small nit — Gemini Advanced is the name of the chatbot, not the model. The model is Gemini 1.0 Ultra. 1.0 Ultra:GPT-4 :: Gemini Advanced:ChatGPT Plus. It’s important to note because eventually Gemini Advanced will be powered by Gemini 1.5 Ultra and Gemini 2.0 Ultra. And, as your preview access suggests, the improvement from Gemini 1.0 Pro to 1.5 Pro is significant so I can only imagine how good the Ultra version will be

And a question — will the book have screenshots like your blogs do?

Voxi Heinrich Amavilah

Mar 18, 2024 (edited)

Excellent piece, thank you!! If your assessment ends up being correct, we have a new regularity: collectively LLM technologies progress along the usual S-curve, but ChatGPT-X has been on a kinda inverted-Z. Very interesting and different frontier from the usual Pareto frontier. What does it mean more broadly: that inefficiency in the beginning is not bad? :)

Michael Schrage

Mar 24, 2024

thanks for this overview and sorry to get to it late... while almost completely agreeing with the sensibility and ethos expressed, this struck me: "Even if LLMs don’t get smarter (though I suspect they will, and soon) new capabilities and modes of interacting with AIs, like agents and massive context windows, will help LLMs do dramatic new feats." first, i am taken aback by the word 'smarter' - what do you 'mean' by that? as i look at the underlying math and evolving parameters/hyperparameters, how could LLMs - and their 'ecosystems' - NOT 'get smarter'...? indeed, is there not some sort of measurable/positive correlation between context window 'massification' and the - yes - 'emergence' of 'smarter' and more contextually provocative responses? indeed, wouldn't 'new feats' indicate 'novel smarts' rather than just repurposed existing smarts? one reason i write this is that the more one 'plays' with these LLMs - and their agentic interlocutors/mediators - the more one has to wonder [sorry!!] where the 'simulation' of 'reason' ends and 'reason' itself begins... or even what query/prompt/interaction sequences render those distinctions moot... at risk of sounding like a jerk, even these existing models push the boundaries and dimensions of the 'explainability of explainability' and the 'interpretability of interpretability'... nextgen models seem sure to have even greater impacts on ontological, epistemological and teleological insights...

Andrew DePristo

May 31, 2024

I posted "Three LLMs achieve low accuracy in summaries of 12 stories in 'Lesser Known Monsters of the 21st Century' by Kim Fu" on LinkedIn.

Bottom Line on Top: I found that 11 of 36 summaries by 3 different LLMs were nonsensical and hallucinated, while an additional 5 of 36 summaries were factually incorrect. Major errors in 16 of 36 summaries (44%) are a cause for concern, as summarizing a short story should play to the strengths of LLMs.

The link below shows the verbatim summaries of each story provided by three LLMs, chosen because I had easy access to these LLMs: https://www.dropbox.com/scl/fi/aaue6o5ufhxtzjgm6nlht/Three-LLMs-achieve-low-accuracy-in-summaries-of-12-stories-in-Lesser-Known-Monsters-of-the-21st-Century.pdf?rlkey=q3alvre23cf983xxe1fecl2o3&dl=0

One Useful Thing
