If AI is becoming SUPER-human, then maybe we should be focusing on becoming super-HUMAN
For what it's worth, I strongly suspect AI experts are, shall we say, a bit naive in how they think about human ability and put far too much stock in all those benchmarks originally designed to gauge human ability. Those tests were designed to differentiate between humans in a way that's easy to measure. And that's not necessarily a way to probe human ability deeply. Rodney Brooks, in "The Seven Deadly Sins of Predicting the Future of AI," has some interesting remarks on performance and competence that are germane: https://rodneybrooks.com/the-seven-deadly-sins-of-predicting-the-future-of-ai/
I've written an article in which I express skepticism about the ability of AI "experts" to gauge human ability: Aye Aye, Cap’n! Investing in AI is like buying shares in a whaling voyage captained by a man who knows all about ships and little about whales, https://3quarksdaily.com/3quarksdaily/2023/12/aye-aye-capn-investing-in-ai-is-like-buying-shares-in-a-whaling-voyage-captained-by-a-man-who-knows-all-about-ships-and-little-about-whales.html
More recently, I've taken a look at analogical reasoning, which Geoffrey Hinton seems to think will confer some advantage on AIs because they know so much more than we do. And, yes, there's an obvious and important way in which they DO know so much more than individual humans. But identifying and explicating intellectually fruitful analogies is something else. That's what I explore here: Intelligence, A.I. and analogy: Jaws & Girard, kumquats & MiGs, double-entry bookkeeping & supply and demand, https://new-savanna.blogspot.com/2024/05/intelligence-ai-and-analogy-jaws-girard.html
Passing the bar doesn't make a human a good lawyer, any more than passing the bar would make an AI a good lawyer. So what? This is pretty banal.
Humans are pretty crap at empathy and compassion, so it's not surprising that an AI is kinder and more thoughtful than the average clinician, and communicates more clearly. An AI won't ever be a registered clinician for the purposes of delivering services or writing prescriptions, so it doesn't matter that an AI can't do the simple maths for prescribing.
It's very early days. No one has even started looking at AI for the vast array of risk areas that humans have no capacity to assess properly - think policing, child welfare, the economy, international relations, famine, wars, climate, and so on. An AI can handle millions of data points; humans can't.
We've barely scratched the surface of what current AIs can do, so what if the tools don't get smarter? Humans won't, either.
What I find odd is that the charts seem to show a deceleration - a slowing down - of the rate of improvement once a model passes the human level. I don't see acceleration of improvement at high levels. Rather, at human level and beyond, it takes exponentially more resources to gain mildly better scores. Am I misunderstanding the charts?
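A toy way to picture that reading (not derived from the actual chart data; the parameters below are invented): if benchmark score saturates roughly logistically in log-compute, each additional order of magnitude of compute buys a smaller gain as the ceiling approaches.

```python
# Toy model only: score as a logistic function of log10(compute),
# saturating at 100. Parameters are invented for illustration.
import math

def toy_score(compute_flops: float, midpoint: float = 22.0, slope: float = 1.5) -> float:
    """Hypothetical benchmark score (0-100) given training compute in FLOPs."""
    x = math.log10(compute_flops)
    return 100.0 / (1.0 + math.exp(-slope * (x - midpoint)))

# Each 10x increase in compute yields a smaller score gain near the ceiling.
prev = None
for exponent in range(21, 27):
    score = toy_score(10.0 ** exponent)
    gain = "" if prev is None else f" (+{score - prev:.1f})"
    print(f"1e{exponent} FLOPs -> {score:5.1f}{gain}")
    prev = score
```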
Regardless of how well or poorly AI systems perform relative to humans in business environments, this headlong rush to build more and more powerful AI systems seems ill-advised. Two years ago, AI development was not really on anyone's radar (except those in the field of AI research and development). Today, every company on the planet claims to have AI-aided systems ready for sale. Setting aside the question of whether any of these systems can do what is promised, their adoption (perhaps too-early adoption) is likely to cause a disruption in society that we aren't taking any steps to prepare for or compensate for.

The truth is, except for certain specialized applications such as medical and drug-development research, there isn't any compelling reason why we absolutely need these systems right now. What we need right now is some serious consideration of how our society should compensate for the disruptions from broad adoption of AI. Since the institutions developing AI have publicly decided to abandon any concern for the ethical issues associated with AI development, in what has become an AI arms race to see which institution will come out on top financially, some institution outside the AI community must take responsibility. In the US, government regulation seems unlikely given the current political climate and the lack of available expertise.

Even if the prediction from AI experts that 60% of all jobs will be affected (replaced?) by AI systems within the next few years is hyperbole, everyone should take seriously the question of how our society will function if a great number of jobs are simply eliminated with the advent of capable AI systems. I say, "curb your enthusiasm" and be careful what you wish for.
This is why you should look into PauseAI if you are concerned. Let's get to a better place for humanity!
You’ve demonstrated, convincingly, that most professions are complete frauds easily replaced by a dumb machine. Separated from their idiosyncratic jargon, medicine and the law are conceptually simple and quite easily navigated with the properly prepared mind...not professionally prepared, properly prepared.
Common law literally cannot be conceptually simple.
Also a pretty silly thing to say about, say, the immune system.
It's a nice note. We haven't got a good definition of AGI.
But it doesn't matter. A machine that is capable at a task (forget intelligent, because that gets people muddled up) but costs a fraction as much and runs more efficiently simply leads to significant impacts on the workforce.
The second-order effect of intelligent and capable systems wanting something - like displacing humans - is more the fanciful fiction and anxiety of a threatened human race.
Thanks for the pointers to open-weights models and to the leaderboard. We had a similar bake-off system for search engines, back when human feedback was the most useful metric for results quality. Now that results quality, "relevance", is pretty much equivalent across the big engines, the competition has shifted to presentation, monetizability, privacy, and marketing. Sounds like these AI models might mature in that way too.
Given the commercial nature of AI development, I guess no single engine/model will be allowed to prevail; the direction of co-intelligence, enhancing human expertise, sounds most favored.
As you are a seasoned practitioner in the Higher Education domain, I would be very interested in your thoughts on how that field is and could be affected by the various forms of AI. It seems to be a domain often absent from your published work. Otherwise, I very much enjoy the newsletter, and the book was also excellent. Cheers
My post two weeks ago & 70% of my papers are on education!
Thank you for the reminder. I somehow missed the link to your paper. I will read it.
The problem with evaluating LLMs against many of these benchmarks is that the questions could have been part of the training data set. This is why I believe a better benchmark for AI is whether it can create new knowledge or predict future events - things that could not conceivably have been in the training data. It would also show AI's ability to achieve a higher level of intelligence.
https://forwardai.substack.com/p/is-ai-truly-intelligent
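To make the contamination worry above concrete, here is a minimal sketch of a crude check: flag benchmark questions whose long word n-grams also appear verbatim in the training corpus. The file names are hypothetical placeholders, and real decontamination pipelines are considerably more sophisticated.

```python
# Crude contamination check: flag benchmark questions whose word 13-grams
# (a common heuristic length) also appear verbatim in the training corpus.
# Both input files are hypothetical placeholders.

def ngrams(text, n=13):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def flag_contaminated(questions, corpus_text, n=13):
    corpus_grams = ngrams(corpus_text, n)
    return [q for q in questions if ngrams(q, n) & corpus_grams]

if __name__ == "__main__":
    corpus = open("training_corpus.txt").read()                      # hypothetical dump
    benchmark = open("benchmark_questions.txt").read().splitlines()  # one question per line
    for q in flag_contaminated(benchmark, corpus):
        print("possible leak:", q[:80])
```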
On competence "across academic and professional disciplines," I've been interested in ChatGPT's ability at interpreting films. Here's a piece where I manage to prompt it to a high-school level interpretation of a film: Conversing with ChatGPT about Jaws, Mimetic Desire, and Sacrifice, https://3quarksdaily.com/3quarksdaily/2022/12/conversing-with-chatgpt-about-jaws-mimetic-desire-and-sacrifice.html I hazard to guess at what will be required for an AI to reach professional level, and if we're talking about film interpretation rather than interpreting novels and poems, well, that will require an AI that can actually watch and understand what's happening in a film. I have no idea what that will require.
More recently I've considered the question of using an AI to determine the formal structure of literary texts: https://new-savanna.blogspot.com/2024/05/what-do-i-personally-want-from-ai-as.html It's not rocket science, but it's tricky because it requires a kind of "free-floating" analytic awareness. I'm not sure how well an AI can approximate that with lots of compute.
I wonder what achieving AGI in medicine would look like. Today's discussions are centered on physician/patient conversations, diagnosis, etc., but physician and nurse jobs obviously involve significant physical interaction with patients (for example palpation, surgery, drawing blood, infusions, etc.). Does achieving AGI in medicine mean performing all of this too? It would seem so, because AGI is "a machine that beats humans at every possible task." For example, does medical AGI include operating rooms staffed entirely by robots?
This is a nice note. And it highlights one of the glaring problems: we don't really have a good definition of intelligence depth or breadth for AGI. Take, for example, a single task: you might say that humans do it to level 5 on average, level 7 for experts, level 9 for savants. That's depth. A superintelligence might also do it to level 7, but combining it with the human 7 actually raises it to level 8 (but then we would need to test it with other AIs as well).
Equally, a human may do many things at level 5-6, so would an AGI need to cover the same breadth of activities? No, not at all. And I think the point is that once you've solved individual expertise with AI, scaling and combining those systems would easily yield systems that can do many things well, simply by API-ing into the relevant expert system.
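A minimal sketch of that "API-ing into the relevant expert system" idea, assuming made-up endpoint URLs and a naive keyword router standing in for whatever real classifier would pick the specialist:

```python
# Sketch: route a task description to a narrow specialist model over HTTP.
# The endpoints and the keyword router are placeholders, not real services.
import json
from urllib import request

EXPERTS = {  # hypothetical endpoints, one per specialist system
    "law":      "https://experts.example.com/law",
    "medicine": "https://experts.example.com/medicine",
    "finance":  "https://experts.example.com/finance",
}

KEYWORDS = {
    "law":      ["contract", "statute", "liability"],
    "medicine": ["diagnosis", "dose", "symptom"],
    "finance":  ["ledger", "audit", "cash flow"],
}

def pick_expert(task: str) -> str:
    """Naive keyword routing; a real system would use a classifier model."""
    lowered = task.lower()
    for domain, words in KEYWORDS.items():
        if any(w in lowered for w in words):
            return domain
    return "law"  # arbitrary default for the sketch

def dispatch(task: str) -> str:
    url = EXPERTS[pick_expert(task)]
    body = json.dumps({"task": task}).encode()
    req = request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return resp.read().decode()

# Example: dispatch("Review this contract clause for unusual indemnity terms.")
```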
It is unreasonable to assume that current large language models can become artificial general intelligence. While LLMs have shown impressive capabilities in language tasks, they fundamentally lack the reasoning and general intelligence required for AGI.
LLMs are pattern-matching statistical models trained on vast amounts of data but do not have true understanding or reasoning abilities. They excel at tasks like text generation and question answering by recognizing patterns in their training data. Still, they cannot generalize to novel situations or reason from first principles as humans can.
LLMs lack the robust common-sense reasoning, abstraction abilities, and general intelligence required for AGI. They struggle with tasks that require causal reasoning, knowledge transfer to new domains, or understanding of the physical world beyond pattern matching on text.
While future LLMs may become more capable by training on more data and using more compute, scaling up the current approach alone is unlikely to lead to AGI. We will need fundamental breakthroughs in reasoning, abstraction, and grounding in the real world.
LLMs' impressive performance comes from their ability to leverage vast amounts of data, but this does not equate to the flexible intelligence and understanding required for AGI. Their capabilities are narrow and specialized, even if they can handle many language-related tasks.
In summary, while LLMs are powerful tools, the consensus among AI experts is that they are not on the path to realizing AGI with their current approaches and limitations.
Significant conceptual advances are required to develop systems with the general reasoning and understanding abilities that characterize AGI.
I see this claim all the time: that generative AI cannot build a world model. And certainly with both LLMs and diffusion models, we get lots of amusing mistakes, even from the frontier models. But is there any robust science that actually demonstrates that the full human corpus (text, video, etc.) is insufficient to generate a world model that is sufficiently accurate as to be considered human-level? Most humans, after all, do not understand why we cannot put our cup of coffee in the table rather than on it. They just know it. GenAI could be said to know the world in a similar way. The arguments that posit it as an impossibility (even from luminaries like Yann LeCun) seem very "hand-wavy" to me, but perhaps I have missed the more reasoned discussions.
Here’s another perspective https://time.com/collection/time100-voices/6980134/ai-llm-not-sentient/
Thank you for this. I will say, however, that it is exactly what I mean by "hand-wavy."
Edited to add: As a counter to the argument that a sense-based world model is obviously superior to a model based on text and video, I would point out that a sense-based model is prone to mistakes as well. Our susceptibility to optical illusions is an obvious example.
Also: The hunger example in the article is particularly silly. For one thing, if you define hunger as a sense-based property, then of course it is exclusive to sense-based organisms. But if you generalize the word hunger to mean a state in which acquiring food is prioritized, then there would be many ways to determine that state.
There's another tricky-yet-crucial detail to the term "AGI", which is what counts as a "task". Sometimes this is envisioned as being anything a human could do, so that cheap AGI would imply the total obsolescence of human labor, but sometimes it's narrowed to just intellectual or cognitive tasks, the sort of thing a human would do with pencil-and-paper or perhaps sitting at a computer. The latter, narrower definition does not require the ability to control robotics.
The meaning of "task" narrows even further for LLMs, since they cannot even attempt any task that isn't of the form "text in, text out". Though LLMs are now being extended to be multimodal, they are much weaker with non-text modalities than they are with text, in my experience.
It's interesting to think about the implications of having AGI that's text-only. It wouldn't obsolete human labor, and moreover it'd be able to do only some tiny fraction of jobs. But it would also have a huge impact in domains such as mathematics, programming, and writing (including writing blog posts such as this one).
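To make the "text in, text out" framing above concrete, here is a sketch of the entire interface such a system would expose; any task a text-only AGI takes on has to squeeze through a signature like this. The class names and the canned reply are placeholders, not any real API.

```python
# The whole contract of a text-only system: a string goes in, a string comes out.
# Any task that cannot be serialized to text on both ends is out of scope.
from typing import Protocol

class TextModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class PlaceholderModel:
    """Stand-in for a real LLM client; returns a canned reply."""
    def complete(self, prompt: str) -> str:
        return f"[model output for: {prompt[:40]}...]"

def do_task(model: TextModel, task_description: str) -> str:
    # Proving a theorem, drafting a brief, writing a blog post: all fit,
    # provided both the task and its result survive the round trip to text.
    return model.complete(task_description)

print(do_task(PlaceholderModel(), "Summarize the argument for a narrow, text-only AGI."))
```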
I'm curious as to why we keep measuring AI against human intelligence. Isn't it plausible that an alien intelligence could be superior in some ways to human intelligence? As an earlier comment here noted, humans have done some pretty stupid things. Perhaps an alien level of intelligence would have capabilities beyond our current comprehension and conceptual abilities and help us avoid at least some stupidity.