Today, OpenAI released a new AI model, GPT-4o, with some interesting capabilities. It also maintains the OpenAI tradition of terrible names for AI models (the “o” means “omni” - more on that shortly). Previously, new AI models from the major AI labs have focused on how smart the model is. GPT-4o appears to be a step up from GPT-4 and is the smartest model I have used. However, it does not represent a major leap over the previous version of GPT-4, the way that GPT-4 was a 10x improvement over the free GPT-3.5. That leap, presumably, has to wait until GPT-5, which is apparently still scheduled for some future release.
But what it does do is quite interesting.
Democratizing Access
Likely the biggest impact of GPT-4o is not technical, but a business decision: soon everyone, whether they are paying or not, will get access to GPT-4o. I think this is a big deal. When I talk with groups and ask people to raise their hands if they use ChatGPT, almost every hand goes up. When I ask if they use GPT-4, at most 5% of hands remain up. GPT-4 is so, so much better than free ChatGPT-3.5, it is like having a PhD student work with you instead of a high school sophomore. But that $20 a month barrier kept many people from understanding how impressive AI can be, and from gaining any benefit from it. That is no longer true.
While Microsoft Copilot has allowed free GPT-4 use in limited ways, the full features of GPT-4 have always been locked behind a paywall. And GPT-4o adds some new tricks to the older model, including the ability to work really well with non-English languages. But I am especially interested in everyone getting access to GPT-4’s GPTs and Code Interpreter. GPTs allow anyone to build and share little agent-like programs (I wrote a guide to building them before), and they have turned out to be surprisingly useful tools for automating complex creative tasks, especially as GPT-4o is remarkably fast.
GPTs can serve many purposes. Take, for example, some GPTs we made. There is Framework Finder, a GPT that suggests and customizes frameworks to solve your problems, and Innovator (used over 10,000 times even before GPT-4o), which walks GPT-4o through an innovation process and gives you a document with creative ideas. You can even use GPTs to create GPTs, like Tutor Blueprint, which will help you create a prompt that will act as a tutor on a subject of your choice.
Along with GPTs, the other useful feature coming to everyone is Code Interpreter, which allows the AI to run the code it writes. There are lots of unexpected uses for this ability. One important one is that it is an incredibly powerful tool for understanding data, since it lets the AI explore complex datasets in a way that tends to be remarkably low in hallucinations and errors. Especially if you have some training in statistics or analysis, it allows you to get a lot of insight quickly. If you want to try it out, here is Data Analysis Buddy, which, given a dataset, will help you explore it in sophisticated ways. (The best way to see what Code Interpreter does is to try it. Some datasets to try out: all the dialogue from Shakespeare, or lists of superheroes and their powers.)
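Under the hood, Code Interpreter simply writes and runs ordinary Python in a sandbox, looks at the output, and decides what to do next. As a rough illustration (a sketch I wrote, not actual OpenAI output), here is the kind of exploratory code it might run against an uploaded Shakespeare dialogue file; the filename and column names are hypothetical stand-ins for whatever dataset you give it.

```python
# Hypothetical example of the exploratory analysis Code Interpreter
# typically writes and executes on an uploaded dataset. The file name
# and column names below are illustrative, not from a real upload.
import pandas as pd

# Load the uploaded dataset (in Code Interpreter's sandbox, your file
# appears on disk after you attach it to the chat).
df = pd.read_csv("shakespeare_dialogue.csv")

# Get a quick feel for the data: size, column types, and a few rows.
print(df.shape)
print(df.dtypes)
print(df.head())

# Summarize: which characters speak the most lines in each play?
lines_per_character = (
    df.groupby(["play", "character"])
      .size()
      .sort_values(ascending=False)
      .head(10)
)
print(lines_per_character)
```

Because the model sees the real output of each step and can adjust its next step accordingly, it is checking itself against the data rather than guessing, which is a big part of why this mode tends to produce fewer hallucinations.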
Some implications of all this:
Education: GPT-4 is a powerful tutor and teaching tool. Many educational uses were held back because of equity of access issues - students often had trouble paying for GPT-4. With universal free access, the educational value of AI skyrockets (and that doesn’t count voice and vision, which I will discuss shortly). On the other hand, the Homework Apocalypse will reach its final stages. GPT-4 can do almost all the homework on Earth. And it writes much better than GPT-3.5, with a lot more style and a lot less noticeably “AI” tone. Cheating will become ubiquitous, as will universal high-end tutoring, creating an interesting time for education.
Work: I have increasingly been speaking to companies that have been experimenting with giving employees widespread access to GPT-4 and letting employees build GPTs to solve their own problems. One such company, Moderna, reported that 25% of active users had built one, and users were averaging 120 conversations a week. The spread of this sort of use has been limited by the need for companies to buy GPT-4 access for their employees, and their willingness to embrace the tool. Now, employees can start building on their own, and sharing with each other, without outside permission. I think we are going to see entire departments of companies get filled with secret cyborgs, building and sharing GPTs that automate work… and not telling their employers. Figuring out how to get employees to share what they are developing (and managing security and risks) will be a challenge for many organizations.
Global entrepreneurship: GPT-4o will be available around the world. This is exciting, because many innovative ideas never see the light of day when innovators have trouble figuring out how to get them to market. AI acts as an excellent co-founder, filling in some of the gaps that every founder has in their skillset. Everyone can now write in perfect English, do basic coding, get help with problems, and more. We already know that getting advice from GPT-4 increased the profitability of high-performing small business entrepreneurs in Kenya by 15%. Free access to this powerful tool may have profound implications.
The magic I haven’t tried yet
I have played with GPT-4o, but I haven’t yet been given access to its biggest tricks. GPT-4o is natively multimodal, which means it can “see” and “hear” and “speak” in an integrated way with almost no delay. The “omni” in the model’s name refers to the way it blends all of these modes together. It can see what you are doing, react to it, respond to interruptions, use realistic voice tones, create images with precise control, and more - all seamlessly. Basically, GPT-4o is a chatbot that can interact naturally with the world around it. All of that seems kind of abstract, so I would strongly urge you to watch a demo video: this one of two AIs interacting, or this one of the AI being sarcastic, or this one of Sal Khan (of Khan Academy) using AI as a tutor with vision. If you watch these, you can see how big a change is coming, and why people forming close relationships with AIs seems inevitable. Much more on this in the future, when I can try the multimodal capabilities out myself.
There are tons of other tricks that a fully multimodal model like this can do. It can create 3D images, tell apart different speakers in transcripts, and render coherent words in generated images and stylized fonts. Again, all I have to go by here are the OpenAI demos, so I will reserve judgement until I can play with the system myself, but I suspect there will be lots of surprising use cases made available as a result of these new capabilities.
Ubiquity
With GPT-4o, OpenAI again cements its lead (at least for tonight) in the AI space, but the release is also a clear sign of an important shift I have been writing about for a while. All of these features we are starting to see appear — lower prices, higher speeds, multimodal capability, voice, large context windows, agentic behavior — are about making AI more present and more naturally connected to human systems and processes. If an AI that seems to reason like a human being can see and interact and plan like a human being, then it can have influence in the human world. This is where AI labs are leading us: to a near future of AI as coworker, friend, and ubiquitous presence. I don’t think anyone, including OpenAI, has a full sense of all of the implications of this shift, and what it will mean for all of us.
As a reminder, I don’t take money from OpenAI or any AI lab. I also have not yet been given early access to the multimodal GPT-4o features, so this is based on my observations of the system so far. And, because much of what I am discussing has just been announced, things could change before these products launch widely.