79 Comments
Michael G Wagner:

This is 100% my experience. The last couple of months have been the most productive in my career, and it isn't even close. My concern with this observation is that we are potentially entering a productivity divide between those who utilize AI properly and those who do not. And this is not just a question of AI literacy; it is primarily a question of attitude toward the benefits of AI. If you reject AI as a matter of principle, you will no longer be able to compete, even if you are the smartest person in the room.

Paul Carpentier:

Couldn’t have said it any better. Same experience, even at 70 years of age 😬. 100% aligned.

Michael Frumer:

With 15 additional years on you (having started in the game with punched cards and wooden disks), and not having done any coding of significance in about 20 years, I just developed a nifty electronic bulletin board app with Copilot. The old adage I have heard, "a good programmer can write Fortran in any language," really comes to life with an AI at my side.

An Ordinary Guy:

That last line hit hard.

What if the real divide isn’t just about who uses AI — but who lets it think for them?

We risk building a world where even the “smartest in the room” forget how to navigate without a co-pilot.

Not because they’re lazy, but because the tools make it so easy to skip the friction — the struggle where understanding is born.

I’ve been writing a lot about that tension: productivity vs. cognitive erosion.

Your take is sharp. Grateful for it.

Dale Simpson:

> My concern with this observation is that we are potentially entering a productivity divide between those who utilize AI properly and those who do not.

I agree, and I would take this notion a step further to insist that this is where we are right now.

Michael G Wagner:

Thanks for the comment. 👍 Yes, we are.

Craig Holley:

What I find significant is that this study was done before the new "deep research" methodologies were available. I am "teaming" with a combination of Gemini, OpenAI, and Claude using deep research and doing mashups of the results. And now I am integrating this with my human team members and teaching them the technologies. The results are spectacular. We are using this technology to design, architect, project plan, and build, and it is hard to describe the results. As new features appear, we fold them into our processes. GPT-4o results that used to stun me are now pedestrian in comparison.

Henry Coutinho-Mason:

I completely agree. I think the non-obvious implication of DR and other agentic AIs will be to increase the ability and benefit of human-to-human collaboration, by doing the bulk of the actual work.

I wrote more on this here: https://open.substack.com/pub/thefuturenormal/p/chatgpts-deep-research-and-thinking

Craig Holley:

Henry, wow, thanks for the redirect to your article. Lots of food for thought. A few things I would comment on:

1) I rarely use the research results from the first prompt; it often takes a bit of human-in-the-loop feedback. This really makes your point that the prompter's expertise matters, and I would add that fine-tuning is not only acceptable but to be expected - you are literally using a custom RAG to build better research results.

2) Very serious about the mashups being useful. I find the three different models give a lot of expansion to answers and modes of "alien thinking," as Ethan puts it in his book.

3) Cross-feeding results between models for analysis and creation can be revelatory. Just saying: try it, you might like it.

4) I'm getting a lot of mileage out of asking for a PhD proposal on a subject; done carefully, you actually get a ton of design ideas for research proposals.

I could go on, but I'm not trying to write a blog post - just pointing out how rapidly the process is changing, morphing, and expanding.

Uwe PLEBAN:

With respect to prompting the "OpenAI deep research" (Odr) agent, I have started to use a meta-prompting approach. I give o3-mini a brief prompt about the research topic, together with some "standard instructions" to include at the end of the prompt for Odr (how to structure the report, tabular summaries, slide outlines in the appendix, etc.), and ask it to write a detailed prompt. Here is an example:

"I would like to engage you as a specialist in the creation of detailed prompts for the 'OpenAI deep research' (Odr) agent. The agent will use your prompt to conduct extensive research and produce a high-quality report. The best way to prompt the Odr agent is to include a significant amount of detail specifying the topic and the requirements for the research task.

Here is the research topic: <<Brief description of the topic - one sentence usually suffices>>. At the end of your prompt, include the following text verbatim: ... detailed instructions about the report structure, format, etc. ..."

The generated prompt for Odr is usually at least a page long. I am actually considering changing the "significant amount of detail" portion to "reasonable amount of detail," with the goal of giving Odr more leeway when pursuing the research. I have observed that it sticks very closely to the prompt details, which may be too constraining.
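A minimal sketch of this meta-prompting pattern in Python, assuming the openai SDK; the model name, standard instructions, and example topic below are illustrative placeholders:

```python
# Meta-prompting sketch: have a smaller model expand a one-sentence topic
# into a detailed prompt for a deep-research agent.
# Assumes the `openai` Python SDK and a configured API key.
from openai import OpenAI

client = OpenAI()

# Placeholder "standard instructions" to be appended verbatim to the
# generated prompt (report structure, tabular summaries, slide outline, etc.).
STANDARD_INSTRUCTIONS = (
    "At the end of your prompt, include the following text verbatim: "
    "'Structure the report with an executive summary, tabular summaries, "
    "and a slide outline in the appendix.'"
)

def write_research_prompt(topic: str) -> str:
    """Expand a brief topic description into a detailed deep-research prompt."""
    response = client.chat.completions.create(
        model="o3-mini",
        messages=[{
            "role": "user",
            "content": (
                "I would like to engage you as a specialist in the creation of "
                "detailed prompts for a deep-research agent. The agent will use "
                "your prompt to conduct extensive research and produce a "
                "high-quality report. Include a significant amount of detail "
                "specifying the topic and the requirements.\n\n"
                f"Research topic: {topic}\n\n{STANDARD_INSTRUCTIONS}"
            ),
        }],
    )
    return response.choices[0].message.content

# The returned page-long prompt is then pasted into the deep-research tool.
print(write_research_prompt("Human-AI teamwork in corporate R&D"))
```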

Anyway, Odr is in another category. I am starting to use it for work (I'm a retired consultant specializing in Generative AI) and for private investigations. Can't imagine not having it. It's also good for writing code.

Guy Alvarez:

Craig, I find your mashup concept very interesting. What is the process that you are using for this mashup? Are you just copying and pasting from one tool to the other, or are you using some sort of automation or workflow or tool to help you get the results that you want? I would be very interested in how you are doing this.

Craig Holley:

Guy,

Several options. One is to combine the three outputs and ask for an analysis of all three, using all references and use cases. Since I always ask for references and use cases, and the models really do give me different ones, combining them actually enriches the report (I am doing technical reports on potential architecture solutions). The initial reports are in a standard format (executive summary, several layers of detail, summary and conclusions, recommendations, and references, with use cases spread through the details), and OpenAI or Claude does a nice job of the mashup. If all three are similar, I can do manual curation, but that takes a bit more effort. I haven't automated it yet, but guess what - Claude or OpenAI will write the code for me. It is an excellent idea.

Craig
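A sketch of what that automation might look like, assuming the openai Python SDK and reports already exported as plain-text files; the file names, model, and synthesis prompt are illustrative guesses, not an actual setup:

```python
# Mashup automation sketch: feed three deep-research reports to one model
# and ask it to synthesize a single enriched report.
# Assumes the `openai` Python SDK; file names and model are placeholders.
from openai import OpenAI

client = OpenAI()

def mash_up(reports: dict[str, str]) -> str:
    """Merge several model-written reports, keeping all references and use cases."""
    combined = "\n\n".join(
        f"=== Report from {source} ===\n{text}"
        for source, text in reports.items()
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                "Analyze the reports below and produce a single merged report "
                "in the same standard format (executive summary, layered "
                "details, summary and conclusions, recommendations). Preserve "
                "every reference and use case from every source.\n\n" + combined
            ),
        }],
    )
    return response.choices[0].message.content

# Reports previously exported from Gemini, OpenAI, and Claude as plain text.
reports = {
    "Gemini": open("gemini_report.txt").read(),
    "OpenAI": open("openai_report.txt").read(),
    "Claude": open("claude_report.txt").read(),
}
print(mash_up(reports))
```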

Guy Alvarez:

Thanks!

An Ordinary Guy:

Yes Craig. So true. The way you describe it — folding new features into team workflows like it’s second nature — feels like watching the future gel in real time.

But I can’t help wondering:

What happens when the “pedestrian” becomes the new default?

When the bar keeps rising so fast that yesterday’s breakthroughs feel like slow motion?

There’s power in these mashups. But also a risk: speed can outpace reflection.

I’m curious how you and your team stay anchored while riding that wave.

Thanks for sharing your lines so openly.

Yunfeng:

Hi Craig, fantastic to read about your excellent results with teamwork using multiple AIs. Do you use a tool for that at the moment? I would love to know more about how you use and integrate them in this process.

Terence Condren:

I've seen my productivity increase in two ways. On a basic level, I'm using AI for legal document review and summaries. It was pretty bad at that when I was using 4o, but with o1, it's very useful and saves me a ton of time. On a more advanced level, I'm using o1, o3-mini-high, and Deep Research to dig into complex tax and legal questions, which is a game changer. Yes, I have to check the cites for hallucinations, but I'm still saving a ton of time. Plus, I can add PDFs of vetted third-party tax research that I know and trust to enhance the validity of those AI tools' output. It's all about figuring out what should go into the mixing bowl.

Brad:

Weighing in as an English teacher with students in grades 7 and 11: this article makes me think about the dopamine hit an AI user might receive from the sense of success, and perhaps even control, over difficult material. So, is the difficulty that students have in moderating their use partially energized by the feelings of positivity demonstrated above?

Utsav Bhatt:

This is great. I have been working as a solopreneur with ChatGPT and Claude as my co-workers for the last 18 months. I work in the innovation and strategy consulting space and am now using AI to disrupt the traditional consulting business model. It would be an honor to take part in this study. I have a lot of insights to share.

Stef:

This is a fantastic study, and Can Confirm. My work as a cyborg has been more creative, higher quality, and way more fun. “More efficient” and “faster,” while also true, don’t begin to capture it. I’m able to play in spaces that were completely unavailable to me before. To have actual evidence backing this up is very exciting, and helps me evangelize to people who make decisions about our organization’s AI resources.

Carlo Torniai:

I've always been convinced that Generative AI's best usage is as a 'sparring partner' for doing your job. In my experience, this has helped me greatly in my daily work, and it has been the pattern with which my team and I have trained employees and introduced Generative AI in corporate settings. Glad to see some quantitative research corroborating this approach.

David Martin:

This fascinating field experiment at P&G supports the importance of Artificial Intelligence Quotient (AIQ), a term coined by MIT and Sun Yat-sen University researchers.

The MIT-Sun Yat-sen studies showed that the best human-AI performers were not necessarily the best chess players or the most AI literate and the P&G experiment shows similar results between experts and non-specialists.

AIQ refers to a person's ability to effectively use AI across a diverse range of tasks, identified as a stable, measurable factor using 18 years of global data from chess and renju games. Subsequent studies then broadened the scope to general tasks using language models like ChatGPT and Gemini to further establish and validate AIQ.

The P&G research shows that individuals and especially teams collaborating with AI achieved better results, demonstrating the real-world value of AIQ. These findings from both the original AIQ research and "The Cybernetic Teammate" emphasize the need to measure and develop AIQ for individuals, teams and organizations to fully benefit from AI.

Measuring and developing your AIQ will make you a better player with your cybernetic teammate.

Yaakov Saxon:

A quibble with your interpretation of the teams being “significantly” better than individuals, demonstrating the “value of teamwork”:

* Firstly, 0.24 standard deviations is just not very much!

* But much more importantly, that improvement is strictly worse than just having each team member work solo and then taking the better of the two submissions! (I don't have the math background to calculate it myself, but ChatGPT 4o and Claude 3.7 both tell me that this strategy should lift the expected score by about 0.564 standard deviations, more than double the improvement; see the check below.) So the teamwork is actually showing an anti-synergy!
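The 0.564 figure does check out under a simple idealization (an assumption here, not something the study establishes): if the two solo scores are independent standard normal draws, the better of the two has density 2φ(z)Φ(z), and its mean is

```latex
% Expected value of the better of two independent N(0,1) scores.
\mathbb{E}[\max(Z_1, Z_2)]
  = \int_{-\infty}^{\infty} z \cdot 2\,\varphi(z)\,\Phi(z)\,dz
  = \frac{1}{\sqrt{\pi}} \approx 0.564
```

i.e. about 0.564 standard deviations above the solo mean, versus the 0.24 reported for teams.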

John Weisenfeld:

Claude helped me write this email, which I included when forwarding this article to my teammates.

Dear Planning Team,

I've been reflecting on some groundbreaking research that's causing me to reconsider our entire approach to project-based learning.

What if we've been doing this backward all along? What if, instead of starting with standards and trying to build engaging projects around them, we started with student passion and used AI as a teammate to help connect that passion to standards?

Recent research from Procter & Gamble, conducted by researchers at Harvard's Digital Data Design Institute, reveals something transformative: when AI joins a team, it dramatically changes what's possible. In their study, they found that individuals with AI performed as well as traditional teams, while teams with AI achieved exceptional results beyond what either could do alone.

But here's what's keeping me up at night: this means that any teacher, working with any student, and collaborating with AI, can effectively help that student build a project that addresses ANY educational standards, in ANY subject area, for ANY grade level.

Think about that for a moment.

We no longer need to force students into predetermined projects to meet standards. Instead, we can let students follow their curiosity and passion, then use AI as a teammate to help connect their work back to whatever standards are required. The standards can finally serve the student's learning journey, not the other way around.

This completely inverts our traditional approach. Our guiding star becomes student interest - not curriculum maps or pacing guides.

I believe this approach will lead to deeper learning, more authentic engagement, and better outcomes. Students will be doing work that matters to them, while still meeting (and likely exceeding) the standards we're accountable for.

I'm eager to discuss how we might pilot this approach with our students. Perhaps we could start with a small group and document the process and outcomes?

Looking forward to your thoughts,

Weisenfeld

Dov Jacobson:

This is great stuff: huge scale, clean design, solid numbers and now fascinating comments here below. Did you find a way to control for the 'novelty effect' in your emotional metrics? How many of your subjects were using AI effectively for the first time and enjoying the initial thrill we have all experienced in these last few years?

Waqas Sheikh:

Thanks for pointing out the 'novelty effect' - I was struggling to consider an effective way to control for that. Perhaps we should ask our AI teammates to suggest a way? :)

Prof David Clutterbuck:

An interesting question arising from this fascinating research is whether a combination of AI and PERILL diagnosis can produce even greater impact. PERILL, developed from research that included work with the highest-performing teams globally at Facebook/Meta, is the only instrument to address team dynamics from a complex, adaptive-systems perspective. Data from over 500 teams show that, when a team understands its own systems and how it interacts with surrounding systems, it is better able to identify and address causes of high or low performance. Putting an AI into the data analysis should identify connections between the six performance drivers that aren't initially obvious. Prof David Clutterbuck, david@clutterbuck-cmi.com

Robert Sullivan:

Hi David, great to see you contribute. Any ideas on the role of AI as a coach or mentor? It could instil organisational or team culture to smooth induction, etc., but is that a good use of AI? Robert

Karl Wirth:

Great study. Thank you for it. Ethan writes: "It is also possible that our results represent a lower bound .... chatbots are not really built for teamwork." I think this is correct. I believe that if you could truly collaborate with your teammates while working together with AI, the impact would be even greater. We are building this over at www.stravu.com and would welcome early testers and feedback from the folks on this post and comment thread.

Hal:

Everyone I talk to wants to know: did the control groups have access to the internet and web search engines? This isn't clear.

Ash Stuart:

Valid question, but for the main use cases Ethan mentions, such as critical thinking and even problem solving, there isn't necessarily a need for LLMs to have web search access.

Hal:

I didn’t mean LLMs with web search access. I meant, did the groups without AI still have access to web search? I imagine that people who are looking for inspiration or to solve problems may still want to search the web. In other words, is it really AI that made a difference or just any tool-mediated access to external ideas that made a difference?

Zach:

This alludes to AI not replacing humans but making them ultimately more productive and better, which is what I have always thought would happen just based on my own narrow experience. People only seem to see one side of the equation: AI keeps getting amazingly better. The other side is that teams can get amazingly better as a result of using AI.

Ash Stuart:

Yes, good point. There isn't enough discourse about how much better users can become at using Generative AI.

Fabio Annovazzi:

Does this mean that there is a 2x increase in productivity tied simply to the use of ChatGPT? When will this increase - still not detected at the macro level - finally show up in the GDP numbers?
