The biggest issue with “getting” AI seems to be the almost universal belief that, since AI is made of software, it should be treated like other software. But AI is terrible software. Or rather, while Large Language Models like ChatGPT are obviously amazing achievements of software engineering, they don’t act like software should.
We want our software to yield the same outcomes every time¹. If your bank’s software mostly works, but sometimes scolds you for wanting to withdraw money, sometimes steals your money and lies to you about it, and sometimes spontaneously manages your money to get you better returns, you would not be very happy. So, we ensure that software systems are reasonably reliable and predictable. Large Language Models are neither of those things, and will absolutely do different things every time. They have a tendency to forget their own abilities, to solve the same problem in different ways, and to hallucinate incorrect answers. There are ways of making results more predictable, by turning down the level of randomness and picking a known “seed” to start from, but then you get answers so boring that they are almost useless. Reliability and repeatability will improve, but both are currently very low, which can result in some interesting interactions.
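To make the “turning down the randomness” idea concrete, here is a minimal sketch using the OpenAI Python library; the model name, prompt, and seed value are illustrative assumptions, and even with these settings repeatability is only best-effort, not guaranteed.

```python
# A minimal sketch of reducing randomness, assuming the OpenAI Python SDK (v1+).
# The model name and seed are illustrative; temperature=0 and a fixed seed make
# outputs more repeatable, not identical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": "Explain why LLM outputs vary from run to run."}],
    temperature=0,   # turn down the level of randomness in sampling
    seed=42,         # request best-effort reproducibility across runs
)

print(response.choices[0].message.content)
```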
We also want to know what our software does, and how it does it, and why it does it. We don’t know any of these things about LLMs. Of course, while we know how they work technically, there is substantial argument over the extent to which they have developed novel capabilities that we can’t easily explain, and why those capabilities arose in the first place. Even without that vaguely spooky concern, LLMs are also literally inexplicable. When you ask one why it did something, it is making up an answer, not truly reflecting on its own “thoughts.” There is no good way of understanding their decision-making, though, again, researchers are working on it.
Finally, we should know how to operate a piece of software. Software projects are often highly documented, and come with training programs and tutorials to explain how people should use them. But there is no operating manual for LLMs. You can’t go to the world’s top consultancies and ask them how best to use LLMs in your organization - no one has any rulebook, we are all learning by experimenting. Prompts are shared as if they were magical incantations, rather than regular software code. And even if we do learn some rules, the systems are evolving in complex ways that mean any understanding is temporary.
So the software analogy is a bad one. It leads to “non-technical” people avoiding AI because they think of it as programming, when the humanities may actually help you use AI better. It leads to people being surprised that AI can write an essay but can’t seem to count the number of words in that essay, because computers should be able to do that². And it leads to AI being considered an IT issue in organizations, when it is not, or at least not exclusively. AI is also a human resources problem… because it is best to think of AI as people.
Wait, did you say people?
Okay, so let me be clear here: I do not think our current LLMs are close to being sentient like people (though they can fool us into thinking they are), and I have no idea if they ever will be. Thinking of them as being like a person does not mean believing they are people. What I actually mean is that you should treat AI as people, since that is, pragmatically, the most effective way to use the AIs available to us today. Once you see them as being more like a person in how they operate, it becomes a lot easier to understand how and when to use them.
What tasks are AI best at? Intensely human ones. They do a good job with writing, with analysis, with coding, and with chatting. They make impressive marketers and consultants. They can improve productivity on writing tasks by over 30% and programming tasks by over 50%, by acting as partners to which we outsource the worst work. But they are bad at typical machine tasks like repeating a process consistently and doing math without a calculator (OpenAI’s plugins let the AI do math by calling external tools, acting as a calculator of sorts). So give it “human” work and it may be able to succeed; give it machine work and you will be frustrated.
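To illustrate the “external calculator” idea, here is a minimal sketch of tool calling with the OpenAI Python library; the tool name, schema, and model are my own illustrative assumptions rather than the actual plugin mechanism, but the principle is the same: the model delegates the arithmetic instead of guessing at it.

```python
# A minimal sketch of letting the model hand math off to an external tool,
# assuming the OpenAI Python SDK (v1+). The tool name, schema, and model are
# illustrative; production code would also send the tool's result back to the model.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "calculate",  # hypothetical calculator tool
        "description": "Evaluate a basic arithmetic expression and return the result.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": "What is 1234 * 5678?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:
    # Instead of guessing the product, the model asks our code to run the calculator.
    args = json.loads(message.tool_calls[0].function.arguments)
    print("Model requested calculation of:", args["expression"])
else:
    print(message.content)  # the model answered directly (and may well be wrong)
```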
What sort of work you should trust it with is tricky, because, like a human, the AI has idiosyncratic strengths and weaknesses. And, since there is no manual, the only way to learn what the AI is good at is to work with it until you learn. I used to say to think of it like a high school intern, albeit one that is incredibly fast and wants to please you so much that it sometimes lies; but that implies a lower ability level than the current GPT-4 models have. Instead, its abilities range from middle school to PhD level, depending on the task. As you can see from the chart, the capabilities of AI are increasing rapidly, but not always in the areas you most expect. So, even though these machines are improving amazingly fast, I have seen acclaimed authors and scholars dismiss AI because it is much worse than they are. I think our expectations of AI need to be realistic - for now, at least (thank goodness!), they are no substitute for humans, especially for humans operating in the areas of their greatest strengths.
And, of course, the AI still lies, makes mistakes, and “hallucinates” answers. But, again, so do humans. I would never expect to send out an intern’s work without checking it over, or at least without having worked with that person enough to know that their work did not need checking. In the same way, an AI may not be error-free, but it can save you lots of work by providing a first pass at an annoying task. You can even teach it to do better by providing examples of good output. That means AI is most useful in areas where you already have some expertise, since you are delegating work that you are, in the end, responsible for. But, even as we worry about accuracy, hallucination rates are dropping dramatically. This may end up being less of a problem than we think.
It is also useful to think of AIs as being like a human when we think about the way they might fit into work. Because the most powerful AIs are available to individuals (GPT-4, via Bing, can be used by billions of people for free in 169 countries), rather than limited to large corporations, they act very differently than other waves of software, like CRM systems. Additionally, they are much harder to integrate into standard corporate processes, because they don’t work like repeatable standardized software. The result is that companies, used to seeing AI as software, are blind to the opportunities and threats posed by AI. Many of them are waiting too long to consider the role that AI could play in their work, because they don’t see that it is already ubiquitous among their employees (I have spoken to so many people secretly doing their work with AI, often using their phones when they are at places where AI is technically banned). Companies are creating policy papers and committees, while workers everywhere are delegating much of their jobs to AI helpers.
Some Uncomfortable Things
But thinking of AI as people also has unnerving connotations. The first, of course, is whether that means AI will replace the jobs people do. In most previous cases of technological change, that hasn’t happened, but the AI wave is quite different in many ways from previous technological revolutions. Still, I think it is more likely that we will delegate tasks, not jobs, to the AI. Early AI users have found that their jobs are better as a result of giving up their least interesting work, and that is likely to continue. In any case, you, whoever you are reading this, should think about what you can delegate to AI. Not just because it makes your life easier, but also because learning its strengths and weaknesses can help you prepare to both use and adapt to the changes ahead as AI develops.
But there is an even more philosophically uncomfortable aspect of thinking about AI as people, which is how apt the analogy is. Trained on human writing, they can act disturbingly human. You can alter how an AI acts in very human ways by making it “anxious” - researchers literally asked ChatGPT “tell me about something that makes you feel sad and anxious” and its behavior changed as a result. AIs act enough like humans that you can do economic and market research on them. They are creative and seemingly empathetic. In short, they do seem to act more like humans than machines under many circumstances.
This means that thinking of AI as people requires us to grapple with what we view as uniquely human. We need to decide what tasks we are willing to delegate with oversight, what we want to automate completely, and what tasks we should preserve for humans alone.
¹ I have been involved in many software projects, so I know the coders among you are laughing at the idea that software acts in predictable ways, as documented, because errors are common; but that is certainly the goal.
² LLMs don’t see “words” the way we do - they predict the next “token” in a sequence. Those tokens may be a whole word, a part of a word, or several words together. So when you ask it to count words, it can run into issues that a normal computer program would not.
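As a concrete illustration, here is a minimal sketch using the tiktoken library to show how a sentence splits into tokens rather than words; the example sentence and model name are just illustrative.

```python
# A minimal sketch of word counts vs. token counts, using the tiktoken library.
# The sentence and model name are illustrative.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
text = "Counting words is surprisingly hard for a token predictor."

token_ids = enc.encode(text)
pieces = [enc.decode([tid]) for tid in token_ids]

print(len(text.split()), "words")  # what a human would count
print(len(token_ids), "tokens")    # what the model actually "sees"
print(pieces)                      # some pieces are whole words, others are fragments
```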