29 Comments
The Bull and The Bot

Great breakdown! One thing I always mention when people say they’re wary of using AI assistants because of hallucinations: the mindset needs to shift. These aren’t just Q&A robots. They can actually be your critical thinking partners.

The real value isn’t in asking “what’s the answer?” It’s in using these models to stress-test your thinking. They can:

1. Expand your ideas

2. Validate or poke holes in them

3. Surface POVs you may have completely overlooked

Yes, they’re great for answering simple questions, but in doing so they can also hallucinate. The key is in how you engage with them.

Give o3 a thesis, for example a stock idea and your reasons for liking it. Give it a persona, like a skeptical hedge fund portfolio manager. Ask it for 10 reasons that support your case and 10 that challenge it. You’ll get new angles, risks you hadn’t considered, and potential counterarguments to prepare for. The conversation is no longer about being right or wrong; it’s about being more rigorous.
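
A minimal sketch of that workflow, assuming the OpenAI Python SDK (the model name, thesis, and persona below are placeholders, not something from this thread):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder thesis and persona; swap in your own.
thesis = "I like Ford at current prices because of F-Series margins and EV optionality."
persona = "Act as a skeptical hedge fund portfolio manager."

response = client.chat.completions.create(
    model="o3",  # placeholder; any capable reasoning model should work
    messages=[{
        "role": "user",
        "content": (
            f"{persona}\n\nHere is my investment thesis:\n{thesis}\n\n"
            "Give me 10 reasons that support this case and 10 that challenge it, "
            "then list the risks and counterarguments I should prepare for."
        ),
    }],
)
print(response.choices[0].message.content)
```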

Bottom line: don’t use LLMs only as search bars. Start using them as strategic thought partners. Pick their brains so that they surface information that sharpens your thinking and helps YOU make more informed decisions.

Jason Scharf

I use all of these for various tasks, mostly aligned with the way you describe. Two additional thoughts:

1) I just started using Grok Deep Search in Tasks. It has been an amazing tool for keeping up on news (in my case, news and trends in a very specific niche: Austin Bio & Health).

2) I have found memory in ChatGPT to be a superpower, as many of the threads I have are linked in various ways. However, I can't get it to stop using em-dashes no matter how many times I tell it to remember that or put it in custom instructions.

Jonathan Porter

I've also tried to get it to stop using em dashes, to no avail. I also find the memory really useful, and Projects too for keeping hold of context, but the memory does fill up.

Paul Funnell

Yeah, you can't override default behaviours on things like em dashes and obsequiousness with modified profile instructions or intro prompts, sadly, in spite of the number of clickbait 'prompt master' posts on LinkedIn saying you can.

Jason Scharf

It does override for a bit, but then drifts back. It will be interesting to see how we better train them to personal preference and style. It definitely retains knowledge, just not instructions.

Federico

This is incredibly timely and useful. I get asked this all the time—and even some people who are paying for the good models (say, ChatGPT Plus) are not aware that they can switch to more powerful models, so they're missing out. A quick "please share your screen and tell me what you want to do" is often an hour very well spent for greater effectiveness in using AI.

I agree with all your points, but I have found Claude far less useful for writing than the other models. I did not see the leap with Claude 4 (Opus or Sonnet) that I expected, neither in writing nor in reasoning. In fact, not long ago I asked both Claude 4 and Gemini 2.5 Pro to quantify about three pages of data (quantitative and qualitative). The conclusions were so different that I gave each the answer the other had given. Claude apologized profusely and got it wrong again upon reanalysis. I also find that Gemini writes better than the rest of the models. If someone wants to pay for a model, right now I would not recommend paying for Claude.

One more thing—what I just mentioned is something that I recommend to people who are willing to pay for at least two models. Make them converse! Give one model the answer the other gave you. This is generally a very fruitful exercise.
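
A rough sketch of that "make them converse" loop, assuming the OpenAI and Anthropic Python SDKs (the model names and the task are placeholders):

```python
from openai import OpenAI
import anthropic

openai_client = OpenAI()               # assumes OPENAI_API_KEY is set
claude_client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

task = "Analyze and quantify the following three pages of data: ..."  # placeholder task

# Step 1: get the first model's answer.
first_answer = openai_client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": task}],
).choices[0].message.content

# Step 2: hand that answer to the second model and ask where it agrees or disagrees.
second_opinion = claude_client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model name
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            f"{task}\n\nAnother model answered:\n{first_answer}\n\n"
            "Where do you agree or disagree, and why?"
        ),
    }],
)
print(second_opinion.content[0].text)
```

You could keep looping, feeding each reply back to the other model until the disagreements are resolved.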

Stevie Marlis

Truly, your "one more thing" suggestion is brilliant. Thank you. As soon as I read it, it seemed obvious. But I never thought of it.

Josh Rowe

Great piece, Ethan. You’ve nailed the core shift in the landscape: it's no longer about the "best model" but the "best overall system." This framing is a huge help for anyone feeling overwhelmed.

That said, I'm going to challenge the quick dismissal of Copilot. While I agree its raw model performance isn't always at the bleeding edge of a new GPT-4o or Claude Opus release, you're undervaluing its power as a system.

For me, the deep integration into Windows and Office is proving to be a game-changer. The friction of alt-tabbing to a browser, copying, and pasting is a bigger productivity killer than we admit. Having a very capable AI right there in Word, Outlook, or on the desktop is an advantage that's hard to quantify but easy to feel.

For the majority of knowledge workers, the convenience of a well-integrated AI will always outweigh a slightly superior AI that resides in a separate tab.

I'm curious what others think. Are you finding this trade-off plays out the same way in your daily work, or am I over-valuing the convenience of integration?

Mohammed Jama

Important to note the difference between Copilot Chat and M365 Copilot. Not sure which one was being referred to exactly in the article, but the latter can definitely be a game changer in terms of its deep workplace integration across Outlook, Office, Teams, etc.

Copilot Chat is basically ChatGPT with a Microsoft jacket on. The addition of agents is also interesting and something to keep an eye on.

Josh Rowe

Excellent point. I’m using the paid version including agents.

Mohammed Jama

Nice. On a personal or enterprise level? Keen to hear some thoughts on use cases. I’ve started testing some of the free agents.

Gortin Shyver

Check out BoltAI if you’re on a Mac; it offers inline AI with any model from any company.

Harvey Freedenberg

I enjoyed your book, Co-Intelligence. I use Claude to proofread, edit, and comment on my freelance book reviews at the point when I think I’m ready to file them with my editor, and I have found it a useful tool.

Paul Baier

Ethan, great list. As someone who has 8 AI models on my phone and browser (ChatGPT, Claude, Gemini, Copilot, Perplexity, Grok, DeepSeek, and Mistral), I frequently test different models. My experience is very similar to yours on model strengths.

I continue to find many people most interested in "AI tool optimization" rather than really learning one of the big 3 (OpenAI, Claude, or Gemini).

OpenAI's ChatGPT is so far ahead of Microsoft Copilot for secure employee chatbots that it's not even a meaningful comparison.

On prompt tips, one that we teach is to ask the AI to create a prompt and then use the AI-created prompt, e.g., "create a prompt that does competitive analysis of Ford Motor Company. Include impact of tariffs."

The AI creates a much better prompt than we humans can.
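
A minimal two-step sketch of that meta-prompting tip, assuming the OpenAI Python SDK (the model name is a placeholder); the prompt reproduced below is an example of what the first step can produce:

```python
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set
MODEL = "gpt-4o"    # placeholder; use whichever model you prefer

# Step 1: ask the AI to write the prompt.
meta = client.chat.completions.create(
    model=MODEL,
    messages=[{
        "role": "user",
        "content": "Create a prompt that does competitive analysis of Ford Motor Company. "
                   "Include impact of tariffs.",
    }],
)
generated_prompt = meta.choices[0].message.content

# Step 2: run the AI-created prompt.
analysis = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": generated_prompt}],
)
print(analysis.choices[0].message.content)
```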

Prompt: Competitive Analysis of Ford Motor Company (Including Tariff Impacts)

Objective: Conduct a detailed and strategic competitive analysis of Ford Motor Company, focusing on its market position, strengths, weaknesses, opportunities, and threats, particularly in the context of global tariffs and trade policies.

Instructions: Analyze Ford Motor Company with the following structured categories:

1. Company Overview
- Provide a brief summary of Ford’s business model, major vehicle segments (e.g., trucks, SUVs, EVs), and geographic markets.
- Highlight key financials: revenue, market share, profit trends over the past 3–5 years.
- Detail recent strategic initiatives (e.g., EV transition, mobility services, partnerships).

2. Competitive Positioning
- Identify and compare Ford’s top 3–5 competitors (e.g., GM, Toyota, Tesla, Stellantis).
- Evaluate Ford’s market share relative to each competitor in major regions: North America, Europe, China.
- Analyze Ford’s product innovation pipeline vs. competitors (especially in electric vehicles and autonomous tech).
- Discuss Ford’s brand perception and customer loyalty.

3. SWOT Analysis
- Strengths: e.g., F-Series dominance, U.S. manufacturing scale, legacy brand.
- Weaknesses: e.g., union labor costs, global supply chain complexity.
- Opportunities: e.g., EV leadership, software monetization, international expansion.
- Threats: e.g., EV price wars, regulatory pressure, rising raw material costs.

4. Tariff and Trade Policy Impact
- Evaluate how existing and proposed tariffs on steel, aluminum, and auto parts have impacted Ford’s cost structure.
- Analyze Ford’s reliance on imports/exports (vehicles, components) and how trade tensions (e.g., with China, the EU, Mexico) influence operations.
- Compare tariff impacts on Ford vs. key competitors (especially those with more globalized production).
- Assess Ford’s strategic responses: e.g., reshoring manufacturing, lobbying efforts, pricing strategies.

5. Forward-Looking Outlook
- How well-positioned is Ford for the next 3–5 years in a landscape shaped by protectionism, decarbonization, and digital disruption?
- Recommend strategies Ford should prioritize to mitigate risk and enhance competitive advantage.

Output Format: A well-structured report or presentation with visual aids (charts, graphs, tables) where appropriate. Cite all sources and differentiate between data-driven conclusions and analyst opinions.

Gortin Shyver

Check out LLM front ends like TypingMind and BoltAI, which let you use any model from any company on a pay-as-you-go basis. The monthly subscription may be cheaper for heavy users, but for most people the PAYG pricing is the cheaper option.

Mohammed Jama

Very timely guide. I think a lot of organisations are grappling with choosing a tool. This helps clear things up massively. Thanks for sharing.

Daria décrypte l’IA

Very insightful! The key takeaway for me is that using AI effectively isn't just about getting answers, but about asking better questions and using AI to challenge and refine our thinking.

Roi Ezra

One of the clearest and most useful AI guides I’ve seen, thank you for writing this down :)

What I’d gently add is this:

Prompt quality doesn’t just come from structure. It comes from timing.

A prompt that shows up before reflection can sound smart, but it can disconnect us from the signal we actually needed to follow.

The real upgrade isn’t just knowing what AI can do.

It’s knowing when not to ask yet.

Prompting works best when it follows alignment, not when it replaces it.

Andrew Chapman

"asking it to explain its logic will not get you anywhere" – another great post but this statement isn't always true. I recently asked o3 to calculate the highest point within 15 miles of where I live. It was wrong, which I knew immediately. But when I asked it to explain why it was wrong, it correctly diagnosed that it had simply done web searches when actual elevation data was necessary, then found suitable data to calculate it correctly. I'll take it as a win that in this case a human with domain knowledge was much quicker and better at the task, but of course you could also argue my prompting should have been more precise! Anyway, these constant experiments are fascinating. (And of course your overall message is exactly this: experiment constantly!)

Alex Tolley

This is a very useful guide to current practices. Can you say anything about specialist models in the sciences (e.g., astronomy, biology, ...) and maths (especially how to build equations from information, solve equations, and perform differentiation, including partial differentiation)?

Are there specialty AIs that can produce garden designs and plant species/varieties based on information about a yard, its climate zone, and soil type? AIs that can design patio decks and other structures, especially to handle issues of facing direction and the need to reduce summer sun reaching into the house? Can style be provided by input images from websites?

All these suggestions are based on some of my interests, as well as projects that I think I can do but would like an AI to take a first cut at, to speed up the work, allow for iterative design, and then produce material requirements for the finished design.

Alex Tolley

"That doesn’t mean that there is no art to prompting. If you are building a prompt for other people to use, it can take real skill to build something that works repeatedly. "

This reminds me of Asimov's robot stories, especially the Elijah Baley mystery novels. As an Earthman, Baley had very little interaction with robots. OTOH, the Spacers had a lot of experience using robots as they were surrounded by them. Baley mentions that he had only poor control of robots, while a Spacer had far better control by using the correct phrases, and could even override his commands.

Karl Wirth

Ethan, great writeup as always. One big criterion that is missing from your analysis is collaboration between your team and AI. All of the above assumes an individual in single-player mode. I think that is because this is what Anthropic, ChatGPT, Grok, and Google assume. Their Teams products are not really for teamwork.

I think team collaboration with AI is so important that we started a company, Stravu, to try to enable a new way of working together with AI. We are in Beta, and I'd love your (and anyone else's on this thread) feedback on how you want to work with your team and AI. Please check it out at https://www.stravu.com and sign up for the Beta.

Tango Libertad

We've got gadgets and gizmos a-plenty

We've whozits and whatzits galore...

giacomo catanzaro

Regarding some of the limitations of AIs, I recently had work accepted on this: https://arxiv.org/pdf/2506.10077

In it, we detail how natural language's properties fundamentally make it essentially impossible for LLMs to interpret semantic statements of moderate complexity. As the number of elements in an expression grows, the number of potential interpretations grows combinatorially, and so the likelihood of the single correct one being chosen vanishes. Humans are embedded in context-rich environments, with memories and more that help us constrain potential interpretations more efficiently, and it will be really cool to start moving towards AI systems that can better approximate this part of our nature.
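
A toy back-of-the-envelope version of that argument (my illustration, not the paper's formalism): if each of the n elements in an expression admits roughly k > 1 plausible readings and nothing constrains the choice, the chance of landing on the single intended interpretation collapses:

```latex
\[
  \#\{\text{interpretations}\} \approx k^{\,n},
  \qquad
  P(\text{correct reading}) \approx \frac{1}{k^{\,n}} \xrightarrow{\; n \to \infty \;} 0 .
\]
```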
