63 Comments
The Bull and The Bot's avatar

Great breakdown! One thing I always mention when people say they’re wary of using AI assistants because of hallucinations: the mindset needs to shift. These aren’t just Q&A robots. They can actually be your critical thinking partners.

The real value isn’t in asking “what’s the answer?” It’s in using these models to stress-test your thinking. They can:

1. Expand your ideas

2. Validate or poke holes in them

3. Surface POVs you may have completely overlooked

Yes, they’re great for answering simple questions, but they can hallucinate while doing so. The key is in how you engage with them.

Give o3 a thesis, for example a stock idea and your reasons for liking it, and give it a persona, like a skeptical hedge fund portfolio manager. Ask it for 10 reasons that support your case and 10 that challenge it. You’ll get new angles, risks you hadn’t considered, and potential counterarguments to prepare for. Now the conversation isn’t about being right or wrong. It’s about being more rigorous.
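As a rough illustration of the persona approach described above, here is a minimal Python sketch that assembles such a prompt as plain text. The function name `build_stress_test_prompt`, its parameters, and the placeholder thesis are hypothetical and not from the comment; the sketch only builds a string to paste into whichever assistant you use and does not call any model.

```python
def build_stress_test_prompt(thesis: str, persona: str, n_points: int = 10) -> str:
    """Assemble a persona-based prompt asking for arguments on both sides.

    Hypothetical helper: it only formats text and calls no model API.
    """
    return (
        f"Act as {persona}.\n"
        f"Here is my thesis: {thesis}\n"
        f"List {n_points} reasons that support it and "
        f"{n_points} reasons that challenge it.\n"
        "Flag any risks or counterarguments I have not considered."
    )


# Placeholder thesis for illustration; substitute your own idea and reasoning.
prompt = build_stress_test_prompt(
    thesis="I like this stock because of its pricing power and low debt.",
    persona="a skeptical hedge fund portfolio manager",
)
print(prompt)
```

Keeping the persona and the number of points as parameters makes it easy to rerun the same thesis against a different critic.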

Bottom line: don’t use LLMs only as search bars. Start using them like strategic thought partners. Pick their brains so that they share information that can sharpen your thoughts and help YOU make more informed decisions.

Federico's avatar

This is incredibly timely and useful. I get asked this all the time—and even some people who are paying for the good models (say, ChatGPT Plus) are not aware that they can switch to more powerful models, so they're missing out. A quick "please share your screen and tell me what you want to do" is often an hour very well spent for greater effectiveness in using AI.

I agree with all your points, but I have found Claude far less useful for writing than the other models. I did not see the leap toward Claude 4 (Opus or Sonnet) that I expected, not in writing and reasoning. In fact, not long ago I asked both Claude 4 and Gemini 2.5 Pro to quantify about three pages of data (quantitative and qualitative). The conclusions were so different that I gave each the answer the other had given. Claude apologized profusely and got it wrong again upon reanalysis. I also find that Gemini is writing better than the rest of the models. If someone wants to pay for a model, right now I would not recommend paying for Claude.

One more thing—what I just mentioned is something that I recommend to people who are willing to pay for at least two models. Make them converse! Give one model the answer the other gave you. This is generally a very fruitful exercise.
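One possible way to phrase the hand-off between two models is sketched below. The helper `build_cross_check_prompt` and its wording are assumptions made for illustration; it only formats the text you would paste into the second model and makes no API calls.

```python
def build_cross_check_prompt(
    question: str,
    other_answer: str,
    other_name: str = "another model",
) -> str:
    """Format a prompt that shows one model the answer a different model gave.

    Hypothetical helper: the wording is an illustrative assumption.
    """
    return (
        f"Original question:\n{question}\n\n"
        f"Here is how {other_name} answered:\n{other_answer}\n\n"
        "Compare this with your own answer. Point out where you disagree, "
        "say which answer is better supported, and explain why."
    )


# Example: paste the first model's reply in as `other_answer`.
print(
    build_cross_check_prompt(
        question="Summarize the key findings in the attached three pages of data.",
        other_answer="(paste the first model's answer here)",
        other_name="Gemini",
    )
)
```

Running the exchange in both directions, and not only once, tends to show whether the models converge or keep disagreeing.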

Stevie Marlis's avatar

Truly, your "one more thing" suggestion is brilliant. Thank you. As soon as I read it, it seemed obvious. But I never thought of it.

Jason Scharf's avatar

I use all of these for various tasks, mostly in line with the way you describe. Two additional thoughts:

1) I just started using Grok Deep Search in Tasks. It has been an amazing tool for keeping up on news (in my case, news and trends in a very specific niche: Austin Bio & Health).

2) I have found memory in ChatGPT to be a superpower, as many of my threads are linked in various ways. However, I can't get it to stop using em-dashes, no matter how many times I tell it to remember or put it in custom instructions.

Jonathan Porter's avatar

I've also tried to get it to stop using em dashes to no avail. I also find the memory really useful, and the Projects too for keeping hold of context, but the memory does get full up.

Paul Funnell's avatar

Yeah you can't override default behaviours on things like em dashes and obsequiousness with modified profile instructions or intro prompts, sadly, in spite of the number of clickbait 'prompt master' posts on LinkedIn saying you can.

Jason Scharf's avatar

It does override for a bit, but then drifts back. It will be interesting to see how we better train them to personal preference and style. It definitely retains knowledge, just not instructions.

Josh Rowe's avatar

Great piece, Ethan. You’ve nailed the core shift in the landscape: it's no longer about the "best model" but the "best overall system." This framing is a huge help for anyone feeling overwhelmed.

That said, I'm going to challenge the quick dismissal of Copilot. While I agree its raw model performance isn't always at the bleeding edge of a new GPT-4o or Claude Opus release, you're under-valuing its power as a system.

For me, the deep integration into Windows and Office is proving to be a game-changer. The friction of alt-tabbing to a browser, copying, and pasting is a bigger productivity killer than we admit. Having a very capable AI right there in Word, Outlook, or on the desktop is an advantage that's hard to quantify but easy to feel.

For the majority of knowledge workers, the convenience of a well-integrated AI will always outweigh a slightly superior AI that resides in a separate tab.

I'm curious what others think. Are you finding this trade-off plays out the same way in your daily work, or am I over-valuing the convenience of integration?

Mohammed Jama's avatar

Important to note the difference between Copilot Chat vs M365 Copilot. Not sure which one was being referred to exactly in the article, but the latter can definitely be a game changer in terms of its deep workplace integration across Outlook, Office, Teams etc.

Copilot Chat is basically ChatGPT with a Microsoft jacket on. The addition of agents is also interesting and something to keep an eye on.

Eleanor Brown's avatar

AND I've just realised that in the licensed version, the Analyst agent is based on o3-mini and the Researcher agent is based on o3 deep research. Both use chain of thought (as you would expect). This is a game changer, as I'm only allowed to use MS Copilot at work.

Josh Rowe's avatar

Excellent point. I’m using the paid version including agents.

Mohammed Jama's avatar

Nice. On a personal or enterprise level? Keen to hear some thoughts on use cases. I’ve started testing some of the free agents

Gortin Shyver's avatar

Check out BoltAI if you’re on a Mac. It offers inline AI with any model from any company.

Russell Jon Ivanhoe's avatar

I am developing an app on 4o. I provide the ideas, the logic, and the use cases. 4o provides the coding. This includes creating the patent. What are the reasons I might switch to o3? How would I do that without losing the code in the model?

Hugo's avatar

If you're developing an app I would switch to Claude. There's a night and day difference for development.

Get ChatGPT to write a detailed prompt of everything you've done and all key information for the project.

Then give Claude the entire code base plus the prompt. Claude's coding ability and UI are just so much better than ChatGPT's. When I hit the limit on Claude and try to switch to ChatGPT, I end up waiting for my limit to reset rather than use ChatGPT. It's that bad for coding, honestly.

Paul Baier's avatar

Ethan, great list. As someone who has 8 AI models on phone and browser (ChatGPT, Claude, Gemini, Copilot, Perplexity, Grok, DeepSeek, and Mistral), I frequently test different models. My experience is very similar to yours on model strengths.

I continue to find many people most interested in "AI tool optimization" rather than really learning one of the big 3 (OpenAI, Claude or Gemini)

OpenAI ChatGPT is so far ahead of Microsoft Copilot for secure employee chatbots that it's not even a meaningful comparison.

On prompt tips, one that we teach is to ask AI to create a prompt and then use the AI-created prompt, e.g. "create a prompt that does competitive analysis of Ford motor company. include impact of tariffs"

AI creates a much better one than we humans can:

Prompt: Competitive Analysis of Ford Motor Company (Including Tariff Impacts)

Objective:

Conduct a detailed and strategic competitive analysis of Ford Motor Company, focusing on its market position, strengths, weaknesses, opportunities, and threats—particularly in the context of global tariffs and trade policies.

Instructions:

Analyze Ford Motor Company with the following structured categories:

1. Company Overview

Provide a brief summary of Ford’s business model, major vehicle segments (e.g., trucks, SUVs, EVs), and geographic markets.

Highlight key financials: revenue, market share, profit trends over the past 3–5 years.

Detail recent strategic initiatives (e.g., EV transition, mobility services, partnerships).

2. Competitive Positioning

Identify and compare Ford’s top 3–5 competitors (e.g., GM, Toyota, Tesla, Stellantis).

Evaluate Ford’s market share relative to each competitor in major regions: North America, Europe, China.

Analyze Ford’s product innovation pipeline vs. competitors (especially in electric vehicles and autonomous tech).

Discuss Ford’s brand perception and customer loyalty.

3. SWOT Analysis

Strengths: e.g., F-Series dominance, U.S. manufacturing scale, legacy brand.

Weaknesses: e.g., union labor costs, global supply chain complexity.

Opportunities: e.g., EV leadership, software monetization, international expansion.

Threats: e.g., EV price wars, regulatory pressure, rising raw material costs.

4. Tariff and Trade Policy Impact

Evaluate how existing and proposed tariffs on steel, aluminum, and auto parts have impacted Ford’s cost structure.

Analyze Ford’s reliance on imports/exports (vehicles, components) and how trade tensions (e.g., with China, EU, Mexico) influence operations.

Compare tariff impacts on Ford vs. key competitors (especially those with more globalized production).

Assess Ford’s strategic responses: e.g., reshoring manufacturing, lobbying efforts, pricing strategies.

5. Forward-Looking Outlook

How well-positioned is Ford for the next 3–5 years in a landscape shaped by protectionism, decarbonization, and digital disruption?

Recommend strategies Ford should prioritize to mitigate risk and enhance competitive advantage.

Output Format:

A well-structured report or presentation with visual aids (charts, graphs, tables) where appropriate. Cite all sources and differentiate between data-driven conclusions and analyst opinions.
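The two-step tip in this comment (ask for a prompt, then use the prompt) can be sketched as a small Python helper. The name `build_meta_prompt` and its parameters are hypothetical; the helper only produces the first-step request as text, and you would paste the model's reply back in as the second step.

```python
from typing import Sequence


def build_meta_prompt(task: str, must_include: Sequence[str] = ()) -> str:
    """Build a request that asks the model to write a prompt for `task`.

    Hypothetical helper: it formats text only and calls no model API.
    """
    request = f"Create a prompt that does {task}."
    for item in must_include:
        request += f" Include {item}."
    return request


# Step 1: send this request to the model of your choice.
print(
    build_meta_prompt(
        "competitive analysis of Ford Motor Company",
        ["impact of tariffs"],
    )
)
# Step 2: copy the prompt the model returns and submit it as a new message.
```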

Maurice Blessing's avatar

Hi Paul, what is your use case for Mistral? I have the paid version, but I find it lacking on most tasks compared to other models. I pay for it, though, because I am curious how a European model stacks up.

Paul Baier's avatar

Hi Maurice, I have not yet found a compelling use case where Mistral is superior. I suspect there are some, and I have not done super exhaustive testing. In the research for our Corporate Buyers Guide to Enterprise Intelligence Applications, we find that European companies, and those that need behind-the-firewall deployments, understandably value Mistral.

Maurice Blessing's avatar

Thanks. I’ve read that it’s better at non-English languages, but I still have to try that out extensively. With Dutch, this doesn’t appear to be the case unfortunately.

Gortin Shyver's avatar

Check out LLM front ends like TypingMind and BoltAI, which let you use any model from any company on a pay-as-you-go basis. The monthly subscription may be cheaper for heavy users, but for most people the PAYG pricing is the better option.

Karl Wirth's avatar

Ethan, great writeup as always. One big criterion that is missing from your analysis is collaboration with your team and AI. All of the above assumes an individual in single-player mode. I think that is because this is what Anthropic, ChatGPT, Grok, and Google assume. Their Teams products are not really for teamwork.

I think team collaboration with AI is so important that we started a company, Stravu, to try to enable a new way of working together with AI. We are in Beta and I'd love feedback from you (and anyone else on the thread) on how you want to work with your team and AI. Please check it out at https://www.stravu.com and sign up for the Beta.

Harvey Freedenberg's avatar

I enjoyed your book, Co-Intelligence. I use Claude to proofread, edit, and comment on my freelance book reviews at the point when I think I’m ready to file them with my editor, and I have found it a useful tool.

Thomas's avatar

I'm curious if you could talk a bit about agents and image generation. I see people talking about how they get AIs to write entire programs for them and I don't know how they're doing that.

Sally's avatar

This is super helpful, thank you! Regarding your suggestion to "click show thinking," is that a button or setting in ChatGPT, or are you prompting it to show thinking?

Patrick Barone's avatar

This is a terrific summary. I’m going to use some of these ideas next time I speak to lawyers’ organizations. Lawyers are badly in need of this kind of clear, candid guidance. We’re stumbling through an AI revolution without a map, and articles like this offer much-needed direction. I’ll be expanding on these ideas soon for the legal community over at AI on Trial https://patrickbarone.substack.com.

Will S Johnston's avatar

Given that the frontier models are providing somewhat similar overall functionality, the big miss for me is having one that can provide persistent memory and context based on a profile I provide. Also, it would have to have two-way speech. I have not seen a model/provider offering these in combination. I don't want to 'roll my own', but I will if needed. I suspect that the winner is going to have this combination. Am I missing something, and does this already exist?

unwordedly's avatar

I've been avoiding paying for any of these models because Google AI Studio gives access to their frontier models for free. Is there something I'm missing by not subscribing to their paid tier?

Jim Samuel's avatar

A very good guide. One thing that I have found helpful is to instruct AI to ask you questions if it needs more information or needs to clarify anything in your prompt.
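A minimal sketch of this tip, assuming you assemble prompts in Python before pasting them into an assistant: the constant `CLARIFY_SUFFIX` and the helper `with_clarifying_questions` are hypothetical names, and the exact wording of the instruction is only one possible phrasing.

```python
# Hypothetical wording; adjust to taste.
CLARIFY_SUFFIX = (
    "Before answering, ask me questions if you need more information "
    "or need to clarify anything in my prompt."
)


def with_clarifying_questions(prompt: str) -> str:
    """Append a standing instruction inviting the model to ask questions first."""
    return f"{prompt.rstrip()}\n\n{CLARIFY_SUFFIX}"


print(with_clarifying_questions("Draft a project update for my team."))
```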

Philip Ashton's avatar

I love the “get it to give you 10/30/100 options” idea. Does anyone have tips to get it to iterate better on a favourite idea, other than “give me 10 more ideas like X”?
