21 Comments

It's still hard to choose the right commercial model for API access, taking latency, cost, and overall performance into account. I wrote a quick guide to LLM leaderboards to help AI developers and practitioners choose the most suitable model for the task at hand:

https://www.aitidbits.ai/p/leaderboards-for-choosing-best-model

As for agents, I just came across this new benchmark from Carnegie Mellon and Microsoft that evaluates LLMs as agents: https://arxiv.org/abs/2310.01557

Great piece! I agree re: Claude 3. It's the most human, with the least fluff. I wrote a comparative review yesterday that's especially focused on translation and humanities: https://www.ezrabrand.com/p/claude-3-vs-chatgpt4-a-comparative

Hahaha, agreed. When it comes to concluding any “argument”, where the chatbots all fall back on “Overall …”, Claude 2 already sounded not only human, but like a highly educated human. Lol

I agree, Ezra. Claude is my go-to assistant for text. I work in education, so it is perfect for most of my needs.

I was hoping to see you comment on the ethical principles driving the three models...

OpenAI and Google seem obsessed with getting to AGI, while Anthropic is less obsessed and more interested in creating AI systems that benefit humans by being honest, harmless, and helpful, guided by its constitutional ethics framework.

Will any of these labs solve the control problem for superintelligent AI before deploying their products?

In my opinion, to evaluate AI models on a holistic level, we need to move past just the productivity lens...

Annoyingly, I can't sign up for Anthropic Claude (my country's phone number range isn't recognised for validation) or Google Gemini Advanced (my country isn't recognised as eligible to purchase anything), so I can only play with ChatGPT-4 currently. So I hugely appreciate your commentary comparing them all, and I will keep pressing these organisations to make their services more widely available!

What are your thoughts on Perplexity?

Loved your analysis! LLMs are starting to feel like streaming services: $20 a month. Maybe I need to swap out Disney Plus for Claude?

Thank you for that article. It clarifies lots of things for me.

The capacity of AI is already astonishing, but it is nothing compared with what it will be very soon.

We should remember that people were also afraid of cars when the first ones were commercialized.

Cars changed our way of living, and AI will do the same. There is no need to be afraid, only the need to do our best to understand how AI will be useful for us.

Mar 18

Claude is not available in the EU, presumably due to the recent raft of EU tech laws. But Anthropic touts itself as the most ethical of AI organisations, which suggests there are issues.

Anyone know what’s going on here?

Do you think using AI for us is a good motivation?

Perplexity.ai is also a very good search plus AI tool.

I'm getting a handle on Gemini, and I've had about a year with GPT-4 now, so I feel pretty good about the limitations within each model (and have a strong feeling that everything is converging, so if one group figures out a solution, the others hop on almost immediately). Still, each has its limitations and quirks.

It's hard to imagine that we're ready for another paradigm shift, but it seems like that's gonna happen. Let's be ready for it.

Mar 18

Just want to thank you for this easy-to-understand explanation for non-technically-educated peeps!

Adding: how do you tell the difference between the various levels of each LLM? Also, reading Ezra Brand’s linked piece, I see that GPT-4 can access the internet. This seems important to discuss…?

A great piece as usual, but I do have a small nit: Gemini Advanced is the name of the chatbot, not the model. The model is Gemini 1.0 Ultra. 1.0 Ultra:GPT-4 :: Gemini Advanced:ChatGPT Plus. It’s important to note because eventually Gemini Advanced will be powered by Gemini 1.5 Ultra and Gemini 2.0 Ultra. And, as your preview access suggests, the improvement from Gemini 1.0 Pro to 1.5 Pro is significant, so I can only imagine how good the Ultra version will be.

And a question — will the book have screenshots like your blogs do?

Excellent piece, thank you!! If your assessment ends up being correct, we have a new regularity: collectively, LLM technologies progress along the usual S-curve, but ChatGPT-X has been on a kinda inverted Z. That is a very interesting frontier, different from the usual Pareto frontier. What does it mean more broadly: that inefficiency in the beginning is not bad? :)

thanks for this overview and sorry to get to it late... while almost completely agreeing with the sensibility and ethos expressed, this struck me: "Even if LLMs don’t get smarter (though I suspect they will, and soon) new capabilities and modes of interacting with AIs, like agents and massive context windows, will help LLMs do dramatic new feats." first, i am taken aback by the word 'smarter': what do you 'mean' by that? as i look at the underlying math and evolving parameters/hyperparameters, how could LLMs, and their 'ecosystems', NOT 'get smarter'...? indeed, is there not some sort of measurable, positive correlation between context window 'massification' and the (yes) 'emergence' of 'smarter' and more contextually provocative responses? and wouldn't 'new feats' indicate 'novel smarts' rather than just repurposed existing smarts?

one reason i write this is that the more i/one 'plays' with these LLMs, and their agentic interlocutors/mediators, the more one has to wonder [sorry!!] where the 'simulation' of 'reason' ends and 'reason' itself begins... or even what query/prompt/interaction sequences render those distinctions moot... at the risk of sounding like a jerk, even these existing models push the boundaries and dimensions of the 'explainability of explainability' and the 'interpretability of interpretability'... next-gen models seem sure to have even greater impacts on ontological, epistemological and teleological insights...

Of course AI companies suck at naming things; they named their first creation “je pète” (French for “I fart”).