I wanted to give some quick thoughts on the Apple AI (sorry, “Apple Intelligence”) release. I haven’t used it myself, and we don’t know everything about their approach, but I think the release highlights something important happening in AI right now: experimentation with four kinds of models - AI models, models of use, business models, and mental models of the future. What is worth paying attention to is how all the AI giants are trying many different approaches to see what works.
I am going to broadly stereotype some of these views - no company is a monolith, and all the AI organizations are doing many different things - but, in broad strokes, an interesting picture is emerging.
AI Models
As I wrote in the last post, the power of the foundation model you use is a big deal, because the largest frontier models are, out-of-the-box, better at most things than smaller models, even smaller specialized models.
Remember BloombergGPT, which was a specially trained finance LLM, drawing on all of Bloomberg's data? It made a bunch of firms decide to train their own models to reap the benefits of their special information and data. You may not have seen that GPT-4 (the old, pre-turbo version with a small context window), without specialized finance training or special tools, beat BloombergGPT on almost all finance tasks. This demonstrates a pattern: the most advanced generalist AI models often outperform specialized models, even in the specific domains those specialized models were designed for. That means that if you want a model that can do a lot - reason over massive amounts of text, help you generate ideas, write in a non-robotic way - you want to use one of the three frontier models: GPT-4o, Gemini 1.5, or Claude 3 Opus.
But these models are expensive to train, and slow and costly to run, which leaves room for much smaller models that aren't as good as the frontier models but can run cheaply and easily - even on a PC or phone. This isn't new. Back in December, I was able to run Mistral 7B, a model slightly less advanced than the original ChatGPT, directly on my phone without an internet connection. I also ran Mixtral, a model from the same company that slightly outperforms the original ChatGPT, on my gaming computer.
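To give a sense of how simple this has become, here is a minimal sketch of querying a small open-weight model locally, assuming you have the open-source Ollama runtime installed and have already pulled the Mistral 7B weights; the model name and prompt are just illustrative.

```python
# A minimal sketch of running a small open-weight model entirely on local
# hardware via the open-source Ollama runtime. Assumes `ollama pull mistral`
# has been run beforehand; the model name and prompt are illustrative.
import ollama

response = ollama.chat(
    model="mistral",  # a ~7B-parameter model that fits on consumer hardware
    messages=[
        {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}
    ],
)
print(response["message"]["content"])
```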
All of the tech companies have been releasing these sorts of small, fairly powerful models, with the idea that they can handle simple questions on the hardware of your devices, and then call a larger model “in the cloud” when they need help. You aren’t getting anywhere near the smarts of a frontier model, but if you want to do straightforward things (make a Siri that works or “make my photos more vivid”), these models are often more than enough. Many of the companies betting on frontier models, like Google, have also released faster and cheaper models to fill this niche, and are deploying them to phones as well.
Apple does not have a frontier model; Google and Microsoft/OpenAI have a large lead in that space. But they have created a bunch of small models that run on the AI-focused chips in Apple products, and they have built a medium-sized model that the iPhone can call in the cloud when it needs help. The model that runs on your phone is pretty close in abilities to the version of Mistral I was using above (but much faster and optimized to reduce errors), and the version that runs in the cloud is better than the original ChatGPT, but not that much better. These smaller, weaker models give Apple a lot of control over AI use on their systems and offload a lot of work to the phone or computer. But they still don't have a frontier model, so they are working with OpenAI to send GPT-4 the questions that are too hard for Apple's models to answer. Companies are clearly still experimenting with which models, or sets of models, to offer.
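To make the division of labor concrete, here is a hedged sketch of what this kind of tiered setup can look like. The routing rule, model names, and APIs (a small model run locally through Ollama, a frontier model reached through the OpenAI API) are my own illustrative assumptions about the general pattern, not a description of how Apple actually built it.

```python
# A hedged sketch of the tiered pattern described above: a small local model
# handles simple requests, and a frontier model in the cloud is called only
# when needed. The routing rule, model names, and APIs (Ollama locally, the
# OpenAI API in the cloud) are illustrative assumptions, not Apple's design.
import ollama
from openai import OpenAI

cloud = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def answer(prompt: str) -> str:
    # Toy routing heuristic: let the small local model try first and have it
    # flag requests it cannot handle. Real systems use more careful routers.
    local = ollama.chat(
        model="mistral",
        messages=[
            {
                "role": "system",
                "content": "If the request needs deep reasoning or knowledge "
                "you lack, reply with only the word ESCALATE.",
            },
            {"role": "user", "content": prompt},
        ],
    )
    text = local["message"]["content"]
    if "ESCALATE" not in text:
        return text  # handled entirely on-device

    # Fall back to the larger, more capable cloud model for the hard cases
    reply = cloud.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content


print(answer("Rewrite this reminder so it sounds friendlier: buy milk tonight."))
```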
Models of Use
Large Language Models are Swiss army knives of the mind - they can help with a wide range of intellectual tasks, though they do some badly (the toothpick in the Swiss army knife), and some not at all. Knowing what they are good or bad at is a process of learning by doing and acquiring expertise. That requires both expertise with the models themselves (the rule-of-thumb in my book is 10 hours of use to learn what the models do), and also expertise with the work you are trying to get the AI to do. Within your area of expertise, experimentation with AI is easy - since you know when it messes up - but outside of that, it can be challenging because AI is weird.
The makers of frontier models do not have strong views about how their systems should be used, and so they are not optimized for any one task. Working with advanced models is more like working with a human being, a smart one that makes mistakes and has weird moods sometimes. Frontier models are more likely to do extraordinary things but are also more frustrating and often unnerving to use. Contrast this with Apple's narrow focus on making AI get stuff done for you.
For example, I can ask Gemini 1.5 to look at a bunch of PDFs of comics, read my emails to learn about my sense of humor, and suggest the comics that might appeal to me. Pretty amazing stuff. By contrast, I can ask the AI-powered Siri to send this photo to my friend Sarah after making the colors pop.
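To show what the frontier-model side of that contrast looks like in practice, here is a rough sketch of feeding a pile of PDFs to Gemini 1.5 through Google's Python SDK; the file names and the description of my sense of humor are made up, and the email-reading step is left out.

```python
# A rough sketch of asking Gemini 1.5 to reason over a stack of PDFs using
# Google's google-generativeai Python SDK. File names, the prompt, and the
# taste description are made up; the email-reading step is omitted.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumes you have a Gemini API key

# Upload the comics via the File API (PDFs are supported)
comics = [genai.upload_file(path=f"comic_{i}.pdf") for i in range(1, 4)]

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    comics
    + [
        "Based on these comics, which ones would appeal to someone with a "
        "dry, absurdist sense of humor, and why?"
    ]
)
print(response.text)
```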
For many people, the second use case is actually the more natural, intuitive, and useful one. A machine that can do anything much of the time, but also sometimes does something entirely different, is harder to understand than a narrow AI that just does what you want (with the caveat that we don't know how well the Apple system works). Google is also going to be releasing smaller AI models that are local to phones. And Microsoft is taking a similar approach to Apple, with a business twist: they have implemented Copilots in their key office apps. These do a really good job of providing easily understood, "it just works" (mostly) integration of AI into everyday work. But both the Apple and app-specific Copilot models are constrained, which limits their upside, as well as their downside.
The potential gains from AI, the productivity boosts and innovation, along with the weird risks, come from the larger, less constrained models. And the benefits come from figuring out how to apply AI to your own use cases, even though that takes work. Frontier models thus have a very different approach to use cases than more constrained models. Take a look at this demo, from OpenAI, where GPT-4o (rather flirtatiously?) helps someone work through an interview, and compare it to this demo of Apple's AI-powered Siri, helping with appointments. Radically different philosophies at work.
Business Models
The best access to an advanced model costs you $20 a month, at least that is what OpenAI, Google, Anthropic, and Microsoft have decided. And, of course, all of these companies sell API access, charged by usage, to businesses and individuals directly. Yet, increasingly, some advanced AI access is free, including access to Copilot and to GPT-4o in ChatGPT. Apple sounds like they will start with free service as well, but may decide to charge in the future. The truth is that everyone is exploring this space, and how they make money and cover costs is still unclear (though there is a lot of money out there: OpenAI is one of the fastest-growing tech companies in history, with revenues reaching $2B). To a large extent, the future of AI will be shaped by the degree to which AI companies figure out sustainable business models, so expect to see more experimentation.
What every one of these companies needs to succeed, however, is trust. There are a lot of reasons why people don't trust AI companies, from their unclear use of training data to their plans for an AI-dominated future to their often-opaque management. But what most people mean by trust is the question of privacy ("will AI use what I give it as training data?"), and that question has long been answered. All of the AI companies offer options where they agree not to use your data for training, and the legal implications of breaching these agreements would be dire. But Apple goes many steps further, putting extra work into making sure it could never learn about your data, even if it wanted to. Only the local AI on your phone accesses personal data, and anything handed to the cloud AI is encrypted, processed anonymously, and instantly erased in ways that would be very hard for anyone to intercept. To the extent that data is given to OpenAI, it is also anonymous and requires explicit permission. Between the limited use cases and the privacy focus, this is a very "ethical" use of AI (though we still know little about Apple's training data). We will see if that is enough to get the public to trust AI more.
Models of the Future
There is a specter haunting all AI development, the specter of AGI - Artificial General Intelligence, the hypothetical machine better than humans at every intellectual task. This is the explicit goal of OpenAI and Anthropic, and it is something they hope to achieve in the near term. For people who genuinely believe they are building AGI soon, almost nothing else is important. The AI models along the way to AGI are mere stepping stones, not anything you want to build a business around, because they will be replaced by better models soon. OpenAI's systems may feel unpolished because the company believes that future models will significantly advance AI capabilities. As a result, they may not be investing heavily in refining systems that will likely be outdated as new models are released. I do not know if AGI is achievable, but I know that the mere idea of AGI being possible soon bends everything around it, resulting in wide differences in approach and philosophy in AI implementations.
While Apple is building narrow AI systems that can accurately answer questions about your personal data (“tell me when my mother is landing”), OpenAI wants to build autonomous agents that would complete complex tasks for you (“You know those emails about the new business I want to start, could you figure out what I should do to register it so that it is best for my taxes and do that.”). The first is, as Apple demonstrated, science fact, while the second is science fiction, at least for now. Every major AI company argues the technology will evolve further and has teased mysterious future additions to their systems. In contrast, what we are seeing from Apple is a clear and practical vision of how AI can help most users, without a lot of effort, today. In doing so, they are hiding much of the power, and quirks, of LLMs from their users. Having companies take many approaches to AI is likely to lead to faster adoption in the long term. And, as companies experiment, we will learn more about which sets of models are correct.