As a business school professor, I am keenly aware of the research showing that business school professors are among the top 25 jobs (out of 1,016) whose tasks overlap most with AI. But overlap doesn’t necessarily mean replacement; it means disruption and change. I have written extensively about how a big part of my job as a professor - my role as an educator - is changing with AI, but I haven’t written as much about how the other big part of my job, academic research, is being transformed.¹ I think that change will be every bit as profound, and it may even be necessary.
Even before ChatGPT, something alarming was happening in academia. Though academics were publishing ever more work, the pace of innovation appeared to be slowing rapidly. In fact, one paper found that research was losing steam in every field, from agriculture to cancer research: more researchers are required to advance the state of the art, and the speed of innovation appears to be dropping by 50% every 13 years. The reasons for this are not entirely clear, and are likely complex, but they point to a crisis that was already underway, one that AI had no role in creating. In fact, it is possible that AI may help address this problem, but not before creating problems of its own.
I think AI is about to bring on many more crises in scientific research… well, not crises - singularities. I don’t mean The Singularity, the hypothetical moment when humans build a machine smarter than themselves and life changes forever, but rather a narrower version. A narrow singularity is a future point in human affairs where AI has so altered a field or industry that we cannot fully imagine what the world on the other side of that singularity looks like. I think academic research is facing at least four of these narrow singularities. Each has the potential to so alter the nature of academic research that it could either restart the slowing engine of innovation or create a crisis that derails it further. The early signs are already here; we just need to decide what we will do on the other side.
Singularity #1: How we write and publish
In many fields, academic research is grindingly slow. I have had papers that took nearly a decade from when I first started working on them until they were published in a journal. Top-quality journals are built for this pace, and so are very ill-prepared for the flood of articles that AI is unleashing. That is because many researchers are using AI to write articles, speeding up a key part of the research process and confusing the signals reviewers look for when evaluating work.
Some of this AI help is incredibly clumsy and unethical, like the flood of papers with obvious LLM-written sections or horrifying AI-created images. But AI writing can actually be extremely helpful when used correctly. After all, many scientists excel in their field of interest, but they may not be excellent writers or communicators. Yet GPT-4 class models are actually quite good at scientific writing, producing introductions that were, in at least one small study, judged the equal of human-written ones. If AI can take on some of the time-consuming work of writing, it could let scientists focus on what they do best and speed up the research process. Of course, we have no way of knowing whether researchers are properly checking the AI’s writing, so, with a flood of new articles, peer review becomes ever more important…
…and also more automated. Already, at a major AI conference, up to 17% of the text of all peer reviews was written by AIs. While this might be bad news for science, it is good news for the people who wrote their papers using AI and got an AI peer reviewer, because AI peer reviewers prefer AI-written papers.
And yet, we may not want to dismiss the idea of AI helping with peer review. Recent experiments suggest AI peer reviews tend to be surprisingly good, with 82.4% of scientists finding AI peer reviews more useful than at least some of the human reviews they received on a paper, and other work suggests AI is reasonably good at spotting errors, though not yet as good as humans. Regardless of how good AI gets, the scientific publishing system was not made for AI writers producing papers for AI reviewers to deliver AI opinions that are later summarized by AI. The system is going to break.
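To make that concrete, here is a minimal sketch (my own illustration, not a sanctioned workflow) of what AI-assisted error-spotting might look like before a paper ever reaches a human reviewer; the file name, model choice, and prompt are all assumptions:

```python
# Sketch: asking an LLM to flag potential problems in a methods section as a
# pre-screen for human peer review. Assumes the `openai` package is installed
# and OPENAI_API_KEY is set; file name, model, and prompt are illustrative.
from openai import OpenAI

client = OpenAI()

methods_text = open("methods_section.txt").read()  # hypothetical file

prompt = (
    "You are assisting a peer reviewer. List any statistical errors, "
    "unsupported claims, or mismatches between the control and treatment "
    "conditions in the following methods section. Quote the sentence you "
    "are referring to for each issue.\n\n" + methods_text
)

response = client.chat.completions.create(
    model="gpt-4o",  # any capable model; this choice is an assumption
    messages=[{"role": "user", "content": prompt}],
)

# The output is a checklist for the human reviewer, not a verdict.
print(response.choices[0].message.content)
```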
And that is even before AI starts helping people create papers, rather than just assisting with the writing. To demonstrate the potential, I put together a little GPT that will explore any dataset, generating hypotheses and testing them in increasingly sophisticated ways. If you try it (free datasets are here if you want to experiment), you will find that AI can do some impressive data work, but it can also be used as a tool for automating bad research, tirelessly p-hacking a dataset until spurious results emerge. We are going to need to think about how to address these issues of integrity at many levels.
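To make that risk concrete, here is a deliberately crude sketch (synthetic data, plain NumPy and SciPy; it is not how the GPT above works) of how tireless automated testing can conjure "significant" findings out of pure noise:

```python
# Illustration of automated p-hacking: test enough arbitrary subgroup splits
# of pure noise and some comparison will cross p < 0.05 by chance alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
outcome = rng.normal(size=n)                   # the "result" is pure noise
moderators = rng.integers(0, 2, size=(20, n))  # 20 arbitrary binary splits

spurious = []
for i, split in enumerate(moderators):
    res = stats.ttest_ind(outcome[split == 0], outcome[split == 1])
    if res.pvalue < 0.05:
        spurious.append((i, round(float(res.pvalue), 4)))

print("'Significant' findings in random data:", spurious)
# With 20 tests at alpha = 0.05, about one spurious hit is expected, which is
# why automated hypothesis mills need pre-registration or corrections for
# multiple comparisons.
```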
So, how do we come out the other side of this singularity? We need to reconsider the very nature of scientific publishing and answer some hard questions:
What do scientific publishing and peer review look like in the future?
How do we deal with the flood of AI content?
How do we model positive AI use that increases the pace of research while discouraging bad uses?
Singularity #2: How we research
LLMs are also transforming how research is actually done. This is probably the most discussed potential singularity, so I will not try to cover it exhaustively, but the implications are quite large. And to be clear, so are the risks: we know AI is biased in ways that we don’t fully understand (though it is often less biased than humans doing the same work), and it is obviously prone to hallucination and errors. Much more research is needed to get a sense of the reliability of different AIs under different circumstances. With this caveat in mind, the evidence suggests that AI can, indeed, help with research in many ways. AI is quite useful for text analysis, for example. Plus, working with it is more like working with a human research assistant than a programming language, meaning that more researchers can benefit because they don’t need to learn specialized skills to work with AI. This expands the set of research techniques available for many academics.
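As a rough illustration of what that looks like in practice, here is a minimal sketch of using an LLM to code open-ended survey responses with plain-English instructions; the model name, categories, and example responses are my own assumptions, not a recommended protocol:

```python
# Sketch: an LLM as a "research assistant" that codes open-ended survey
# responses into categories, instructed in plain English rather than with
# specialized NLP tooling. Assumes the `openai` package and an API key.
from openai import OpenAI

client = OpenAI()

responses = [
    "I left because the commute was exhausting.",
    "My manager never gave feedback.",
    "A competitor offered better pay.",
]

labels = []
for text in responses:
    reply = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": (
                "Classify this exit-interview response as one of: "
                "COMPENSATION, MANAGEMENT, WORK-LIFE, OTHER. "
                "Answer with the label only.\n\n" + text
            ),
        }],
    )
    labels.append(reply.choices[0].message.content.strip())

print(list(zip(responses, labels)))
```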
But LLMs can also do things that human research assistants would struggle with. For example, large context windows, which let the AI hold hundreds of thousands of words in memory at once, enable some remarkable feats of analysis. I gave Gemini Pro the 20 papers and books that made up my academic work prior to 2022, more than 1,000 pages of PDFs. It was able to extract direct quotes and find themes across all of them with only minor errors. Summarizing an entire career’s worth of writing this way would normally take many, many hours of work.
But AI creates even weirder possibilities for social science research, in that it can, under some circumstances, simulate human beings with high levels of accuracy. Simulated AI subjects act like human ones, allowing researchers to replicate famous experiments like the Milgram obedience studies or the results of personality tests across 50 countries. And to add an extra dimension of weirdness, individual AI agents, assigned personalities and goals, can interact with and learn from each other in simulated environments. For example, simulated doctors in a simulated hospital with simulated patients learned how to better diagnose diseases. We may be able to discover a lot from these kinds of counterfactual simulations, though, of course, AI agents aren’t humans and AI simulations need to be interpreted carefully.
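For a sense of the mechanics (though not of the actual protocols used in the studies above), here is a toy sketch of persona-based simulated subjects: the same question posed to a model under two assigned personalities, with the answers collected as if they were participant responses. The personas, question, and model are illustrative assumptions:

```python
# Toy sketch of simulated survey subjects: one question, several assigned
# personas, answers collected like participant responses. Everything here is
# illustrative, not a replication of the cited studies.
from openai import OpenAI

client = OpenAI()

personas = {
    "participant_A": "You are a 35-year-old risk-averse accountant.",
    "participant_B": "You are a 22-year-old thrill-seeking entrepreneur.",
}
question = (
    "Would you accept a coin flip where you lose $100 on tails and win $150 "
    "on heads? Answer yes or no, then give one sentence of reasoning."
)

for name, persona in personas.items():
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": persona},
            {"role": "user", "content": question},
        ],
    )
    print(name, "->", reply.choices[0].message.content)
```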
Perhaps the most interesting (and disruptive) way to use LLMs in science is to have AI systems autonomously try to discover new things. Specialized LLMs have shown the ability to generate new mathematical knowledge with minimal human guidance. And some early work suggests that LLMs can generate new hypotheses in the social sciences, develop a plan to test those hypotheses, and then actually conduct the tests using simulations. This would create a new form of fully automated social science conducted entirely by machines. But even this is less ambitious than a set of experiments that gave GPT-4 access to a chemical database and the ability to write software to control lab equipment, allowing the AI to plan and conduct actual chemistry experiments on its own. The scientists were thrilled by the possibilities, but also fully aware of the dangers.
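The basic loop behind that kind of automated social science can be sketched in a few lines, heavily simplified: in the cited work the system goes on to execute the tests itself, while this illustration stops at a plan for a human to vet, and the dataset description, prompts, and model are all my own assumptions:

```python
# Skeleton of a hypothesis-generation loop: the model proposes a hypothesis
# about a described dataset, then drafts an analysis plan. Real autonomous
# systems would go on to run the tests; here the plan is only printed.
from openai import OpenAI

client = OpenAI()

dataset_description = (
    "Survey of 2,000 remote workers: hours worked, video meetings per week, "
    "commute time before going remote, and self-reported burnout."
)

def ask(prompt: str) -> str:
    reply = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

hypothesis = ask(
    "Propose one testable hypothesis about this dataset, in one sentence:\n"
    + dataset_description
)
plan = ask(
    "Write a brief analysis plan (variables, statistical test, and what "
    "result would support or refute it) for this hypothesis:\n" + hypothesis
)

print("HYPOTHESIS:", hypothesis)
print("PLAN:", plan)
```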
In the near future, AIs may actually conduct science, changing the nature of research in ways we can’t predict. To guide this rapid change, we need to answer a few questions:
What AI methods are okay to use? Which ones risk bad science, bias, or dangerous outcomes?
What should autonomous agents be allowed to research? How can they be monitored and stopped if needed?
Singularity #3: What our research means
There is often a big gap between the research world and the public. Papers that are important in an academic field may seem meaningless to those outside of it, let alone to the wider world. And yet, having been in academia for two decades, I believe that a tremendous amount of academic research has value in the outside world, value that even many academics don’t recognize. AI can help create that bridge between academia and the real world. For example, here is a little GPT that tells you why an academic paper might matter to you. I gave it one of my papers, and it did an excellent job explaining the implications (and summarizing the key results).
Just as interesting is the promise that AI might help researchers explain their work to each other, finding opportunities for multidisciplinary cooperation and helping handle the flood of research that Singularities 1 and 2 have unleashed. We know that AI can conduct massive literature reviews, find unexpected connections between bodies of work, and locate errors and gaps that need filling. An AI that can connect researchers to ongoing research and discussions could be a valuable tool for restarting the engine of innovation. But we need to reconsider the boundaries between fields, and between academia and the public, in order to find a better world on the other side of this singularity.
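One simple mechanism for surfacing those cross-field connections is embedding similarity; here is a hedged sketch that compares toy abstracts from different fields, where the model name, abstracts, and field labels are all illustrative assumptions:

```python
# Sketch: embed abstracts from different fields and report the most similar
# cross-field pair, a crude stand-in for AI-assisted connection-finding.
from openai import OpenAI
import numpy as np

client = OpenAI()

abstracts = {
    "economics": "We model information cascades in financial markets...",
    "ecology": "We study how alarm-call cascades spread through bird flocks...",
    "computer science": "We analyze rumor propagation on social networks...",
}

resp = client.embeddings.create(
    model="text-embedding-3-small",  # illustrative model choice
    input=list(abstracts.values()),
)
vectors = {field: np.array(item.embedding)
           for field, item in zip(abstracts, resp.data)}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

fields = list(abstracts)
pairs = [(a, b) for i, a in enumerate(fields) for b in fields[i + 1:]]
best = max(pairs, key=lambda p: cosine(vectors[p[0]], vectors[p[1]]))
print("Most closely related abstracts:", best)
```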
Singularity #4: What we research
Today, we still do not understand a lot about the implications of LLMs, or why they are so good at simulating human thought without actually thinking themselves. More practically, we also don’t know which tasks LLMs do well or badly at. Even the researchers who create them are not aware of their full set of capabilities, and there is considerable debate over how much LLMs do “original thinking” versus parroting back what they learned in training. The one thing early research does show, however, is that LLMs are going to be a big deal in the real world, outperforming humans at an increasing number of real jobs.
From the moment ChatGPT came out, many of my colleagues and I were overwhelmed by the implications of LLMs and rapidly pivoted our research focus to AI. But there aren’t enough of us yet. If AI is, indeed, a General Purpose Technology, one of those rare innovations that will impact much of our culture, economy, and society, then we need a crash effort to understand its implications, shape its development, mitigate its risks, and help everyone gain its benefits. Because the impacts are multidisciplinary, we need researchers from many fields to join in. This is an exciting time, but if academics don’t seize the moment, others will. We have a unique opportunity to rise to the challenge of our singularities, and, if we do, the world will be better for it.
¹ There is no more dangerous audience to write for than fellow academics, so I want to caveat all of this with the disclaimer that there is wide variance across academic fields, institutions, jobs, and national contexts. Not everything I write about will apply to your field; maybe nothing will!
It was humbling when I first used ChatGPT to review a peer review I had written. The article under review presented the efficacy of a health device based on a randomized controlled trial sponsored by the device's manufacturer. Of course I was alert to bias, and discovered a few minor instances. But the LLM mildly mentioned a discrepancy between the control and treatment conditions that had been worded so slyly as to evade human detection. Pulling on the thread the LLM exposed uncovered a deceptive practice that invalidated not only their conclusions but also (after they protested "This is the way it is always done!") a large body of previous sponsored research.
[ If you are curious: the researcher required subjects to "follow the manufacturer's instructions" for each device. In practice, treatment-group subjects were told to comply with the ideal duration, frequency, and manner of use specified in the printed instructions, while control-group subjects were given a simpler competing device with no enclosed instructions, and thus had no performance requirements at all for participating in the research. ]
In my personal experience, this aspect is key: "[M]ore researchers can benefit because they don’t need to learn specialized skills to work with AI. This expands the set of research techniques available for many academics."
For the first time in my life, over the past year, I've been able to do serious text analysis on relatively large texts, with Python. (Specifically, on the Talmud, which is ~1.8 million words.)
The barrier to entry for doing meaningful coding is now far lower.
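A minimal version of that kind of large-corpus analysis, in plain Python with a placeholder file path, might look like this:

```python
# Sketch: word-frequency analysis over a large text file, the kind of task
# that once required specialized training. The file path is a placeholder.
import re
from collections import Counter

with open("corpus.txt", encoding="utf-8") as f:
    text = f.read()

words = re.findall(r"\w+", text.lower())
print(f"{len(words):,} words in the corpus")
print("Most common:", Counter(words).most_common(10))
```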