Discussion about this post

Sahar Mor

It's still hard to choose the right commercial model for API access when weighing latency, cost, and overall performance. I wrote a quick guide to LLM leaderboards to help AI developers and practitioners choose the most suitable model for the task at hand:

https://www.aitidbits.ai/p/leaderboards-for-choosing-best-model

As for agents, I just came across this new benchmark from Carnegie Mellon and Microsoft that evaluates LLMs as agents: https://arxiv.org/abs/2310.01557

Ezra Brand

Great piece! I agree re Claude 3, it's the most human, least fluff. I wrote a comparative review yesterday that's especially focused on translation and humanities: https://www.ezrabrand.com/p/claude-3-vs-chatgpt4-a-comparative
