Thanks for this. It's rather unsettling to hear you question the future of "co-intelligence." Still, I admire you for not twisting data to fit your thesis. Cheers, Clay
Another insightful post from Ethan Mollick today. I appreciate what he wrote in his final paragraph. Heads up ...
---
As these systems level up and inch towards true autonomous agents, we're going to need to figure out how to stay in the loop - both to catch errors and to keep our fingers on the pulse of the problems we're trying to crack. o1-preview is pulling back the curtain on AI capabilities we might not have seen coming, even with its current limitations. This leaves us with a crucial question: How do we evolve our collaboration with AI as it evolves? That is a problem that o1-preview can not yet solve.
---
Still waiting for Advanced Voice Chat mode lol
For another example, I just asked o1 to solve a sudoku and it seems to have done it no problem (a few minutes of generating). Awesome.
Solve this sudoku puzzle
```
[5 3 0 | 0 7 0 | 0 0 0]
[6 0 0 | 1 9 5 | 0 0 0]
[0 9 8 | 0 0 0 | 0 6 0]
-------+-------+-------
[8 0 0 | 0 6 0 | 0 0 3]
[4 0 0 | 8 0 3 | 0 0 1]
[7 0 0 | 0 2 0 | 0 0 6]
-------+-------+-------
[0 6 0 | 0 0 0 | 2 8 0]
[0 0 0 | 4 1 9 | 0 0 5]
[0 0 0 | 0 8 0 | 0 7 9]
```
...
To solve the given Sudoku puzzle, we'll fill in the grid step by step, following Sudoku rules: each row, column, and 3x3 box must contain all digits from 1 to 9 exactly once.
**Initial Puzzle:**
```
[5 3 _ | _ 7 _ | _ _ _]
[6 _ _ | 1 9 5 | _ _ _]
[_ 9 8 | _ _ _ | _ 6 _]
-------+-------+-------
[8 _ _ | _ 6 _ | _ _ 3]
[4 _ _ | 8 _ 3 | _ _ 1]
[7 _ _ | _ 2 _ | _ _ 6]
-------+-------+-------
[_ 6 _ | _ _ _ | 2 8 _]
[_ _ _ | 4 1 9 | _ _ 5]
[_ _ _ | _ 8 _ | _ 7 9]
```
**Solution Steps:**
1. **Fill in Missing Numbers in Row 1:**
- The missing numbers are **1, 2, 4, 6, 8, 9**.
- By checking the columns and boxes, place the numbers where they don't conflict with existing numbers.
2. **Fill in Missing Numbers in Row 2:**
- Missing numbers: **2, 3, 4, 7, 8**.
- Again, check columns and boxes to place the numbers.
3. **Continue This Process for All Rows and Columns:**
- Use logic to eliminate impossibilities.
- For example, if a number can only go in one cell in a row or column, place it there.
4. **Solve the Puzzle Completely:**
**Final Solved Puzzle:**
```
[5 3 4 | 6 7 8 | 9 1 2]
[6 7 2 | 1 9 5 | 3 4 8]
[1 9 8 | 3 4 2 | 5 6 7]
-------+-------+-------
[8 5 9 | 7 6 1 | 4 2 3]
[4 2 6 | 8 5 3 | 7 9 1]
[7 1 3 | 9 2 4 | 8 5 6]
-------+-------+-------
[9 6 1 | 5 3 7 | 2 8 4]
[2 8 7 | 4 1 9 | 6 3 5]
[3 4 5 | 2 8 6 | 1 7 9]
```
**Explanation:**
- Each number from 1 to 9 appears exactly once in each row, column, and 3x3 box.
- The numbers have been placed following Sudoku rules without any conflicts.
**Answer:**
The completed Sudoku grid is:
```
[5 3 4 | 6 7 8 | 9 1 2]
[6 7 2 | 1 9 5 | 3 4 8]
[1 9 8 | 3 4 2 | 5 6 7]
-------+-------+-------
[8 5 9 | 7 6 1 | 4 2 3]
[4 2 6 | 8 5 3 | 7 9 1]
[7 1 3 | 9 2 4 | 8 5 6]
-------+-------+-------
[9 6 1 | 5 3 7 | 2 8 4]
[2 8 7 | 4 1 9 | 6 3 5]
[3 4 5 | 2 8 6 | 1 7 9]
```
You don’t need o1 to do this; I have the free version of ChatGPT, and it has no problems getting it right.
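For what it's worth, the step-by-step elimination narrated in the model's answer above can also be reproduced by a few lines of conventional code. Here is a minimal backtracking solver, sketched in Python as a generic illustration (this is not o1's actual procedure, just the textbook algorithm applied to the same grid):

```python
# Minimal backtracking Sudoku solver: place a candidate digit in the
# first empty cell, recurse, and undo the placement on failure.

def valid(grid, r, c, d):
    """Check whether digit d can go at (r, c) without a conflict."""
    if any(grid[r][j] == d for j in range(9)):   # row conflict
        return False
    if any(grid[i][c] == d for i in range(9)):   # column conflict
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)          # top-left of 3x3 box
    return all(grid[br + i][bc + j] != d
               for i in range(3) for j in range(3))

def solve(grid):
    """Fill zeros in place; return True once a full solution is found."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in range(1, 10):
                    if valid(grid, r, c, d):
                        grid[r][c] = d
                        if solve(grid):
                            return True
                        grid[r][c] = 0           # backtrack
                return False                     # dead end: no digit fits
    return True                                  # no empty cells remain

puzzle = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]
solve(puzzle)
```

Running `solve(puzzle)` mutates the grid in place, and for this puzzle it reproduces the same completed grid shown in the comment above.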
There was a tinge of sadness in my using this new model for the first time today. The “co-intelligence” mode of use with ChatGPT 4 had me treating the model as a Watson to my Sherlock Holmes, so to speak. There was an implicit hierarchy with the human in the commanding, more intelligent position. Now, with this new reasoning model, I felt intimations of that ending, and of myself becoming superfluous. We’re not there yet: this new model uses the simplest (because least resource-intensive) of the base ChatGPT models available, and so it suffers the limitations of building reasoning off a poor LLM. As they apply this sort of approach to, say, a ChatGPT 5, well, then things might be different.
I think someone once said that this AI is the worst AI you'll ever use.
That someone is Sam Altman!
… and Ethan Mollick:
"The AI you are using is the worst and least capable AI you will ever use." (https://www.oneusefulthing.org/p/on-boarding-your-ai-intern). It's in his book too, I think.
Things WILL be different ...
Just curious: crossword puzzles, sudoku, chess, etc. are all ways we humans have found to refresh and stimulate our brains. If these are taken over by AI, what can AI offer us as better alternatives to refresh and stimulate our brains and thinking?
I'm struggling to think this through. Not surprisingly for a Mollick reader, I turned to GPT4o and Claude for help in understanding the context for his provocative and surprising final question: "How do we evolve our collaboration with AI as it evolves?"
GPT4o and I came to this: It seems that the nuance lies in whether Mollick sees these new AI developments as a mere refinement or a fundamental challenge to his own co-intelligence model.
The pat answer might be that co-intelligence (i.e. non-iterative, human in the loop) models are to be preferred for problems with high stakes, or a lot of human dynamics, and for those problems where the user wants to improve the quality of their own thinking and to participate in the chain of thought.
In my work I support university faculty, and my initial instinct would be that educational simulators, for example, are a fantastic idea, but are so freighted with human dynamics and psychology, and social issues that I'd lean towards a co-intelligence model especially for specifying roles, and personalities of the role play agents.
It would be nice to move along in smaller 'chunks' and/or interrupt to join in, and then have Strawberry race ahead for technical parts like specifying agents, databases, and the like. So perhaps it's not a binary between co-intelligence and letting the iterative reins go, but a constant back and forth.
To critique my own proposal above, there is wisdom in a "holy shit" reaction to what Strawberry did, not only for the leap it may represent, but because wisdom would ask us not to consider simply "It's all down to how you use the tool" or the opposite, "It depends on the affordances of the tool", but rather an understanding of what the effect of this "Just let it rip and see what it can do" approach will have in a culture devoted to the twin devils of convenience and speed. 😈
The whole point of this tech is to remove us from the loop. I don't think they are close to achieving that but that's the general idea for the end of capitalism that Sam wrote about in his unhinged manifesto. It would be naive to ignore the motivation behind all of this.
In the article Ethan alludes to a growing specialization of AI tools.
Arguably:
1. Perplexity AI excels at research
2. OpenAI o1-preview is great at logical reasoning
3. Claude is best at producing polished writing
I am curious if anyone else is starting to feel that their optimal workflow often involves 2 or 3 of these AI tools, and multiple steps to optimize their productivity?
For example, when writing a technical blog post, I have been gravitating to this workflow:
1. Use Perplexity AI to gather research. This helps collate factual, referenced data and in theory reduces hallucinations.
2. Review & then save the webpages or book references in PDF format in a Claude Project.
3. Extract the key insights from the PDFs into paragraph bullet points in Claude.
4. Use o1-preview to arrange the bullet points to optimize the logical flow and structure of the article, with the explicit "goal" of clarity, logical flow of thoughts, clear arguments, and conclusions. This step is still new, and I am still trying to decide if it improves the output.
5. Utilize Claude 3.5 to take the bullet points and produce the draft article.
6. Conduct a human review of the final product (with some additional back-and-forth refinements with Claude).
I feel the workflow leverages the strengths of each AI tool while maintaining human oversight and input throughout the process. Still trying to decide if I am overcooking the process though :)
Curious what others are doing, in light of the growing specialisation of these AIs.
Very interesting workflow! Can you write a full blog post on this?!
I mainly use Perplexity AI and mostly rely on Claude 3 Opus and Claude 3.5 Sonnet. From an ethical perspective, I will not use OpenAI products due to their lack of interest in AI safety, both short-term and long-term.
Is the Claude Artifacts/Projects feature useful for you? I prefer to do everything within the Perplexity AI app, since it lets me store my prompts in nice threads/collections/pages.
Really enjoyed the overview. Ethan, it seems, though, that the "model" isn't actually better; OpenAI has simply attempted to maximize its current capabilities through strategy. Would love to hear your thoughts on that concept.
I've started playing with the o1 preview and so far I'm impressed.
I almost wish there was a dial or slider that allowed you to ask the model to think even longer or for a shorter amount of time.
Yes! I think I came to a similar conclusion - it's not a binary choice between these two modes; ideally, we'd switch back and forth fluidly. See my comment "I'm struggling to think..." in the comments here.
I'll be interested to hear more use cases. As a teacher, is this going to make it easier for kids to use for homework? I'd like to see some examples of things Strawberry does that are more effective than simply using 4o, where the teacher has more control, but I am eager to try it out. Also, can you create GPTs for Strawberry, or is that obsolete? I have lots of questions ...
I wonder if it could have value as a tool to show "How to think" by getting students to notice the chain of thoughts - not to focus so much on the answer exclusively, but to pay attention also to the process of thinking/reasoning. I can imagine a skilled teacher doing this with homework problems in class after the answers have been discussed. There could be real value there.
Agreed. I did not read through the OpenAI explanation initially, and it sounds like it's more geared towards math and STEM classes, so as a humanities teacher I may have to think a little harder about how to utilize it. But following the chain of thinking would certainly be an interesting exercise.
Imagine a Humanities Discussion Simulator where the discussion partner, role-played by an AI, has been coached on using every type of fallacious reasoning known, or at least a subset of common ones like ad hominem, survivorship bias, straw man, false dichotomy, etc.; you could seed it with your most-hated fallacies.
Then pairs of students "play" the simulation and try to notice fallacies and challenge the AI during the conversation.
The reverse could be great fun and very illustrative too: have pairs of students debate the AI but intentionally try to use each of your top 5 fallacies at least twice in the discussion/debate.
The real gold would be to have the new Strawberry show its reasoning and thinking process while dealing with the incoming fallacious arguments.
Amazing!
A terrible day for generative AI “haters,” as out of ten posts on LinkedIn, nine are about the limitations of LLM capabilities.
It is called OpenAI o1, not GPT o1
They changed the name on me so many times I was just trying to keep up. I'll edit.
It's now showing up as "ChatGPT o1-preview" on ChatGPT's list of models.
Great article, thanks for sharing.
My take on “Strawberry” and an answer to “Has technology reached the level of intelligence to be trained on wisdom models vs. LLMs?”
https://medium.com/@ev.popov/silicon-sages-or-silicon-monsters-74ffb8f957e8
Looking forward to feedback.
Ethan, these tools are fundamentally labor-displacing, not labor-augmenting, so I don't think there is a comforting answer to your last question. As we get closer to AGI, I don't see how the possibility of collaboration with AI gets better.