When life gives you llms, make llmonade.
What a perfect summary of where we are: "the mere idea of AGI being possible soon bends everything around it." The question is how long that will continue when AGI is always 2-10 years away.
Self-driving cars, human cloning, and MOOCs were hyped, but they never had the initial success and huge investments of LLMs. I don't think there is a useful historical precedent for AGI.
The two key sentences of the article are these: "Apple does not have a frontier model, Google and Microsoft/OpenAI have a large lead in that space. But they have created a bunch of small models that run on the AI-focused chips in Apple products."
Thanks for a clearly-conceptualised overview of where we are with AI, Ethan.
Thanks for these thoughts, and especially the contrasting links to the conversations with GPT-4o vs. Apple Intelligence.
Which voice is creepier?
Apple's plodding, old-school text-to-speech?
or 4o's chillingly nuanced imitation of humanity??
I'm curious what Ethan and others make of the recent piece by Leopold Aschenbrenner - https://situational-awareness.ai/
So if Apple is doing most of this locally on chips in the phone, this will be a drag on battery life, right??
To some degree, but the model can't be very large if it's going to run fast enough for this purpose, and it's also only used in short bursts. Having it call external tools to search local data and pass things between different apps is less intensive than writing out a super long response.
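Roughly the pattern I have in mind, as a toy sketch in Python. Every function name here is made up for illustration, not Apple's actual APIs; the point is only the shape: the small model classifies the request in a short burst, then hands off to cheap local tools.

```python
# Toy sketch: a small on-device model acting as a dispatcher.
# All names are hypothetical; the expensive step becomes a local
# lookup rather than long-form token generation.

def tiny_model_classify(request):
    """Stand-in for a small on-device model. A real one would emit a
    short structured intent after generating only a handful of tokens."""
    if "photo" in request:
        return {"intent": "search_photos", "query": request}
    return {"intent": "answer_briefly", "query": request}

def search_local_photos(query):
    """Stand-in for an on-device index lookup (cheap, no generation)."""
    return ["IMG_0042.jpg"]  # pretend match from a local database

def handle(request):
    action = tiny_model_classify(request)
    if action["intent"] == "search_photos":
        # The heavy lifting is a database query, not token generation.
        return search_local_photos(action["query"])
    return "A short answer, generated in one brief burst."

print(handle("find the photo from the beach trip"))
```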
This is why Apple has invested in developing their neural engines commercially over the past 7 years: to thrive on our most-used mobile devices and to prepare for near-instant on-device computation. The better their machine learning becomes, the less intensive it is on the device and its battery. I'm not an engineer, but I believe that is their intent, which is why WWDC put such an emphasis on App Intents: apps call on 'Apple Intelligence' only when required, even while running in the background.
Great stuff.
One pushback:
>"This demonstrates a pattern: the most advanced generalist AI models often outperform specialized models, even in the specific domains those specialized models were designed for"
This may be true about certain things, such as the example you gave, finance. But for the topic I focus on, classical Jewish rabbinic literature (specifically, the Talmud), GPT-4's knowledge is clearly fairly shallow. So I'd assume that a specialized model would be able to beat GPT-4 in that case, and that most such relatively niche cases would be like this, as opposed to finance, which is relatively mainstream.
Unrelated: I've been reading and hearing more and more about uncensored models, and I'd love to hear your thoughts on those. I don't believe you've discussed that aspect in depth.
See my other comment; my assumption would be slightly different. The specialized model would beat ChatGPT on the kinds of deep data that aren't covered on the general internet, while ChatGPT would beat your model on something so obvious that the classical literature doesn't cover it. That's my hypothesis, in any case.
Yep, agreed! Good way to frame it
Hi! Good point. Have you tried RAG-enabled tools for this, though? For instance, Perplexity using Claude 3 Opus as its model. Specialization and fine-tuning are not needed in this case; RAG sometimes (not always) delivers impressive answers about topics that were clearly outside of the base model's training.
(You can also just ask Claude 3 Opus directly instead of ChatGPT; it's a formidable model. With Claude 3 Opus's context window, it can probably fit both versions of the Talmud and lots of commentaries too. Do keep in mind these models' tendency to make stuff up and be terrible at counting things precisely.)
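(And for anyone curious what RAG looks like under the hood, here's a minimal, dependency-free sketch. A real pipeline would use vector embeddings and a proper index; this toy version scores chunks by keyword overlap, and the corpus strings are placeholders:)

```python
# Minimal sketch of retrieval-augmented generation (RAG):
# retrieve the most relevant chunks, then prepend them to the prompt
# so the model answers from the sources rather than from memory.

def retrieve(question, chunks, k=2):
    q_words = set(question.lower().split())
    # Toy relevance score: keyword overlap. Real systems use embeddings.
    return sorted(chunks, key=lambda c: -len(q_words & set(c.lower().split())))[:k]

def build_prompt(question, chunks):
    context = "\n---\n".join(retrieve(question, chunks))
    return ("Answer using ONLY the sources below, and cite them.\n\n"
            f"Sources:\n{context}\n\nQuestion: {question}")

# Placeholder chunks standing in for pieces of an uploaded translation.
corpus = [
    "Chunk 1 of the uploaded translation ...",
    "Chunk 2 of the uploaded translation ...",
    "Chunk 3, unrelated material ...",
]
print(build_prompt("What does the text say about meat?", corpus))
```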
Funny you mention that; I was just today testing Google's NotebookLM for exactly this. I split the ~3 million words of the Talmud translation into parts and uploaded them as sources. I wonder if Claude's context window is large enough to fit that much text.
I didn't realize it was. so. much. text. Claude won't fit that much in its context window, but it may have more Talmud built into it than we realize. I tried a simple prompt ("Please make a list of five Talmud verses that talk about eating meat. Give me exact references for each"), and it produced something I don't have enough expertise to tell whether it's hallucinated or real:
Here are five Talmud verses that discuss eating meat, along with their exact references:
1. "R. Judah said in Samuel's name: One who purchases meat from the market is like one who slaughters on Yom Kippur who does not offer..." (Babylonian Talmud, Chullin 94a)
2. "R. Huna said: One who does not have the wherewithal for two meals on Shabbat should sell his garment to purchase food for Shabbat including meat and wine." (Babylonian Talmud, Shabbat 118a)
3. "The Sages taught: You shall be holy, means one should sanctify oneself in what is permissible to them. Abstain from eating too much meat and drinking too much wine." (Babylonian Talmud, Chullin 84a)
4. "R. Yochanan said in the name of R. Meir: In the future, a person will have to give an account for everything their eyes saw which they did not eat..." (Babylonian Talmud, Kiddushin 40b)
5. "All agree in the case of Yom Tov that we require an abundance for you, meaning that one must have meat and wine because it is written: 'You shall rejoice in your Festival.'" (Babylonian Talmud, Pesachim 109a)
PS: This being Ethan's Substack, page 34 of Ethan's book on AI gives a great and simple explanation of why AI tends to misquote stuff, even if the text was part of its training data ("The AI stores only the weights from its pretraining, not the underlying text [continues]").
PPS: Ethan, I loved the four rules/principles from the book, but I've been giving up on principle 3 lately. I've noticed Gemini refuses to comply when cast as certain personas, so I've been aiming for the task and casting myself as the persona I want it to assume (e.g., "I've been asked to use expert knowledge to write three clear, high-quality reading comprehension test items based on the best practices in standardized testing, to use for the [X] test. Your task is to help me with this"). It seems to work! Also, Claude's prompt library seems to dispatch the persona quickly and move right on to the task.
Apple's release of "Apple Intelligence" showcases the trend of exploring diverse AI models and use cases, from powerful generalist models to specialised smaller ones. For HR, this means recognising how these advancements can streamline tasks like talent acquisition and employee engagement: smaller, device-based models for routine tasks, and powerful cloud-based models for complex decision-making. Great examples, thank you for sharing, Ethan.
I'm personally very happy that Apple has debuted their approach to the AI space. Of the big three companies, they tend to be the great implementer: Apple delivers the best commercially viable product and leaves the wholly unnecessary spin on the cutting room floor.
And the LLM product category is absolutely riddled with spin. My personal suspicion is that people need to sell the moon because they don’t know what their products are good for themselves. I suppose I’m jaded now; jaded by completely fraudulent marketing by the leaders in the space.
Time will tell if Apple's tech is overpromising, but this seems like a demonstration of restraint and poise on their part. All their tech does is make on-device processing smarter. It's not smarter than college grads. It's not curing cancer. It's not an event horizon that we can't see past.
It’s just a better iphone.
Which is all it needs to be.
I laughed at "a Siri that works". Just yesterday, I tweeted:
I look forward to the day when LLM-based AI embedded in phones can do the following:
• much better text prediction / backwards revision of typos
• more granular notifications of text messages and emails (i.e. distinguish b/w urgent/non-urgent messages based on context)
• better text-to-voice, including better iteration between device assistant and user when composing and revising texts/emails
Hopefully we're not far off from some of this!
Apple is so good at going from 1 to N, you know?
Going from zero to 1 hasn't really ever been their thing, and that's all right.
Very true; I think Tim Cook said something very similar in an interview about Apple's company philosophy.
Yeah. Even their most earth-shattering products have simply (simply!) taken other ideas and presented them much better. Turns out, that's an incredibly important skill set!
Excellent article, but so much of what is written is very obscure to me. Words like trust and reliability are important. I have used GPT for knitting questions or simple resume questions and have received several possibilities that I could use. What happens when we don't question the results of AI?
“Can we trust this tool?” Whether it’s a flint arrowhead, bow, wheel, bridge, passage of scripture, law, steam engine, airframe, rocket: It’s fascinating to see these efforts being explored.
Humans probe a tool like AI the way an octopus's hundreds of suckers and eight arms explore an environment. Trust, confidence, probity, proof, and security arise only through trying.
There will be glory and grief along the way. We ever walk the paths of Eve, Adam, Tower-of-Babelians, Jacob, Thales, Archimedes before us ;-)
What will Artificial General Intelligence look like?
Would be interesting to hear more about how the specially trained AIs are underperforming. We've noticed this with ScopusAI and a few other specialized databases (they get simple questions wrong, we think because simple questions aren't in their training data). People are investing a ton in these systems assuming they will perform better.