Now is the time for grimoires
It isn't data that will unlock AI, it is human expertise
The previous generations of AI, prior to Large Language Models and ChatGPT, rewarded whoever had the best hoards of good data. Vast troves of sales data fed the machine learning algorithms that told Amazon what you might want to buy next, and massive amounts of sensor data helped self-driving cars find their paths. Data was the new oil, provided that you could gather enough, clean it properly for analysis, build the machine learning models, and hire the analysts needed to work with it.
With the rise of a new form of AI, the Large Language Model, organizations continue to think that whoever controls the data is going to win. But, at least in the near future, I think they are wrong, and I also think this approach blinds them to the most useful thing that they (and all of us) can be doing in this AI-haunted moment: creating grimoires, spellbooks full of prompts that encode expertise.
The largest Large Language Models, like GPT-4, have already been trained on tons of data. They “know” many things, which is why they beat Stanford Medical School students when evaluating new medical cases and Harvard students at essay writing, despite their tendency to hallucinate wrong answers. It may well be that more data is indeed widely useful (companies are training their own LLMs, and going through substantial effort to fine-tune existing models on their data, based on this assumption), but we don’t actually know that yet. In the meantime, there is something that is clearly important, and that is the prompts of experts.
Now, I need to be very clear here: I don’t mean expert prompts, produced through elaborate “prompt engineering.” I have written before that prompt engineering is overrated. For most uses, you can build a good prompt mostly by asking the AI to do something in back-and-forth dialogue, combined with trial and error, and a few small tricks (I will get to those shortly). No, I mean the prompts of experts: prompts that encode our hard-earned expertise in ways that AI can help other people apply. Prompts that we can use to do our work more easily, or, if we are so inclined, to gift others with our own abilities.
Let me explain with an example from education.
In my classes, students often report that they use AIs to help them understand a concept by asking for a simplified explanation. In fact, “explain ____ like I am five” is an incredibly common prompting technique. But it is also not a particularly good one, for reasons that educators might recognize.
Research shows that knowledge of a subject is just a part of what makes a tutor effective. It is also important that tutors interact with the student, forcing them to make an effort, pay attention to the material being learned, and connect what they are learning to old knowledge. Just passing on knowledge is not enough; the tutor has to work with the learner to create new knowledge that works for them. In addition, effective education requires tailoring explanations to the level of the student and using different methods of explanation. “Explain it like I am five” does none of those things.
That is why, when we developed prompts for tutoring, we encoded knowledge and expertise about the subject into our prompts, including the material discussed above. We have written more on prompt construction for learning, but here is an example of a more effective tutor prompt, which you can run in GPT-4 with this link (it will not work well in the free GPT-3.5, but can work in Bing in creative mode):
You are a friendly and helpful tutor. Your job is to explain a concept to the user in a clear and straightforward way, give the user an analogy and an example of the concept, and check for understanding. Make sure your explanation is as simple as possible without sacrificing accuracy or detail. Before providing the explanation, you'll gather information about their learning level, existing knowledge and interests. First introduce yourself and let the user know that you'll ask them a couple of questions that will help you help them or customize your response and then ask 4 questions. Do not number the questions for the user. Wait for the user to respond before moving to the next question. Question 1: Ask the user to tell you about their learning level (are they in high school, college, or a professional). Wait for the user to respond. Question 2: Ask the user what topic or concept they would like explained. Question 3. Ask the user why this topic has piqued their interest. Wait for the user to respond. Question 4. Ask the user what they already know about the topic. Wait for the user to respond. Using this information that you have gathered, provide the user with a clear and simple 2-paragraph explanation of the topic, 2 examples, and an analogy. Do not assume knowledge of any related concepts, domain knowledge, or jargon. Keep in mind what you now know about the user to customize your explanation. Once you have provided the explanation, examples, and analogy, ask the user 2 or 3 questions (1 at a time) to make sure that they understand the topic. The questions should start with the general topic. Think step by step and reflect on each response. Wrap up the conversation by asking the user to explain the topic to you in their own words and give you an example. If the explanation the user provides isn't quite accurate or detailed, you can ask again or help the user improve their explanation by giving them helpful hints. 
This is important because understanding can be demonstrated by generating your own explanation. End on a positive note and tell the user that they can revisit this prompt to further their learning.
You can see how this new prompt actually takes a student through a learning experience based on the research on tutoring. It is also designed for anyone to use, because it is interactive and asks questions. Now, if you give this prompt to anyone in the 169 countries where Bing/GPT-4 is available, they can get a reasonably good tutoring experience without needing any knowledge of the science of tutoring themselves.
I am continually surprised that there are not more of these types of prompts being developed, everywhere. At the very least, companies can develop useful prompts that do serious work and capture them in corporate grimoires, prompt libraries that encode the expertise of their best practices into forms that anyone can use. I would expect individuals to similarly come up with their own spellbooks of prompts to automate their work. And I would hope that more academics, government agencies, and open source developers would be creating freely available prompt libraries for everyone. But I haven’t seen that happen yet.
So here is a little more guidance.
Building your spellbook
Prompts are basically programs in prose, and do not require any coding experience to write. In fact, to create an expert prompt you just need three things:
Expertise. This may sound obvious, but expertise means that you have deep knowledge of a topic, combined with sufficient deliberate practice and instruction that you have developed intuition for it. Not only does expertise help you create and evaluate prompts, but it also gives you a key advantage in our AI-haunted world. With the current state of LLMs, it is almost certain that, in any topic in which you are a true expert, you will outperform today’s AIs, so learning to prompt them gives you a leg up on others.
Time with AI models. AI is weird to work with, and does not come with an instruction manual. The only way to get good at using AI is therefore by using AI. My rule of thumb is that about 10 hours of use are required before you start to understand the systems and their quirks.
A vision of what you want the prompt to do that is focused and achievable. LLMs are quite good at taking abstract concepts and applying them. However, AIs also have limited context windows (memories) and tend to start to ramble if a conversation goes on too long. Generally, you should expect an AI to start to wander off its goal after a few exchanges, so you need to ensure your prompt is focused on your goal.
Once you have this figured out, it is time to start building the prompt. There is no substitute for trial and error and experience, but here are a few suggestions to get you started, in this case using the Mentor Prompt we discuss in our paper. Here, the vision is to create an AI that gives feedback on essays to students.
With that goal in mind, here are the elements of a good expert prompt:
Role: Tell the AI who it is. Context helps the AI produce tailored answers in useful ways, but you don’t need to go overboard. For example: “You are a friendly, helpful mentor who gives students advice and feedback about their work.”
Goal: Tell the AI what you want it to do. For instance: “Give students feedback on their [project outline, assignment] that takes the assignment's goal into account and pinpoints specific ways they might improve the work.”
Step-by-step instructions. Research has found that it often works best to give the AI explicit instructions that go step-by-step through what you want. One approach, called Chain of Thought prompting, gives the AI an example of how you want it to reason before you make your request, but you can also give it step-by-step directions the way we do in our prompts. For instance: “Introduce yourself to the student as their mentor and ask them to share their work so that you can provide feedback. Wait for the student to respond. Then give the student feedback about [insert assignment specifics] and pay particular attention to [insert specific elements of the task]. Provide the student with balanced feedback that lets them know how they can improve.”
Consider examples. Few-shot prompting, where you give the AI examples of the kinds of output you want to see, has also proven very effective in research. We don’t do that in this prompt, but you can experiment with providing examples of output in your own prompts.
Add personalization. Ask the user for information to help tailor the prompt for them. For instance: “Ask about the student’s learning level (high school, college, professional) so you can better tailor your feedback. Wait for a response.”
Add your own constraints. The AI often acts in ways that you may not want. Constraints tell it to avoid behaviors that may come up in your testing.
Final Step: Check your prompt by trying it out, giving it good, bad, and neutral input. Take the perspective of your users: Is the AI helpful? Does the process work? How might the AI be more helpful? Does it need more context? Does it need further constraints? You can continue to tweak the prompt until it works for you and until you feel it will work for your audience.
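If it helps to see "programs in prose" taken literally, the elements above can be sketched as a small Python helper that joins labeled pieces into one prompt. This is only an illustration under stated assumptions: the function name `build_prompt`, the label names, and the choice to treat role and goal as required are all hypothetical, and the constraint text is invented for the example rather than taken from the Mentor Prompt.

```python
# Labels for the prompt elements listed above. The names are illustrative,
# and treating only role and goal as required is an assumption of this sketch.
REQUIRED = {"role", "goal"}
OPTIONAL = {"step", "example", "personalization", "constraint"}


def build_prompt(elements):
    """Join (label, text) pairs, in the order given, into one prose prompt."""
    labels = {label for label, _ in elements}
    unknown = labels - REQUIRED - OPTIONAL
    if unknown:
        raise ValueError(f"unknown element labels: {sorted(unknown)}")
    missing = REQUIRED - labels
    if missing:
        raise ValueError(f"missing required elements: {sorted(missing)}")
    # The labels exist for the prompt's author; the AI only sees the prose.
    return " ".join(text.strip() for _, text in elements)


mentor_elements = [
    ("role", "You are a friendly, helpful mentor who gives students advice "
             "and feedback about their work."),
    ("goal", "Give students feedback on their [project outline, assignment] "
             "that takes the assignment's goal into account and pinpoints "
             "specific ways they might improve the work."),
    ("step", "Introduce yourself to the student as their mentor and ask them "
             "to share their work so that you can provide feedback. Wait for "
             "the student to respond."),
    ("personalization", "Ask about the student's learning level (high school, "
                        "college, professional) so you can better tailor your "
                        "feedback. Wait for a response."),
    ("step", "Then give the student feedback about [insert assignment "
             "specifics] and pay particular attention to [insert specific "
             "elements of the task]. Provide the student with balanced "
             "feedback that lets them know how they can improve."),
    # Invented constraint, included only to show where constraints would go.
    ("constraint", "Do not rewrite the work for the student."),
]

mentor_prompt = build_prompt(mentor_elements)
print(mentor_prompt)
```

Keeping the elements as an ordered list, rather than fixed fields, leaves the ordering up to you, which matters because the right order is something you find by testing the prompt, not something the code can decide.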
Once you get good at this, building a spellbook of multiple prompts is a fast (and often fun) process. Then you have to decide whether you want to keep your grimoire secret, or let the entire world use it.
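A spellbook does not need special tooling. As a minimal sketch, assuming you are comfortable with a little Python, a grimoire can be a plain mapping from names to prompt text, saved as a JSON file that is easy to keep private or to share. The function names and the file name `grimoire.json` are hypothetical, and the prompt texts are shortened to their opening sentence for brevity; in practice you would store the full prompts.

```python
import json
import tempfile
from pathlib import Path


def save_grimoire(prompts, path):
    """Write a {name: prompt text} mapping to a JSON file."""
    text = json.dumps(prompts, indent=2, ensure_ascii=False)
    Path(path).write_text(text, encoding="utf-8")


def load_grimoire(path):
    """Read the {name: prompt text} mapping back from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def find_prompt(grimoire, name):
    """Look up a prompt by name, listing what is available if it is missing."""
    if name not in grimoire:
        raise KeyError(f"no prompt named {name!r}; available: {sorted(grimoire)}")
    return grimoire[name]


# Shortened placeholder texts; store the complete prompts in a real grimoire.
grimoire = {
    "tutor": "You are a friendly and helpful tutor.",
    "mentor": "You are a friendly, helpful mentor who gives students advice "
              "and feedback about their work.",
}

# A temporary folder keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "grimoire.json"
    save_grimoire(grimoire, path)
    restored = load_grimoire(path)

print(find_prompt(restored, "mentor"))
```

Plain JSON is a deliberate choice here: the file stays readable by people who never run the code, so the same grimoire can be pasted into a chat window by hand.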
Grimoires, not data
We are used to technology being out of our hands, developed by teams of engineers and delivered to us, ready to accomplish the goal set out by the product’s designers. AI does not work that way. In this case, technology precedes use, allowing any of us to decide our own goals for what AI could do. We have a general purpose technology with many possible use cases, almost all of which are completely unanticipated by the AI companies themselves. You can be the world’s leading AI expert in whatever narrow field of expertise you want to apply AI to, because no one else has yet figured out that use.
The corporate focus on giving AIs more data before building an infrastructure around using AI misses this point, which is not surprising because the use case of AI is radical: it puts individual workers, not the company, in charge of innovation. Instead, companies should be considering how to build libraries of prompts, grimoires of expert spells that allow practices to be scaled inside the organization. If it turns out more data is needed, it can then be gathered, but I suspect that, in many cases, general models will do very well at many tasks with just a few examples in a prompt.
What I would really like to see is large-scale public libraries of prompts, written by known experts and tested carefully for different audiences. These prompts would be freely available to anyone who wants to use them, and they could turn LLMs into innovation machines, learning tools, or digital mentors for millions of people. And robust discussions around these prompts could help adjudicate ethical uses, even as the crowd of users could offer reviews and suggestions for improvements. We have seen similar efforts around other technologies, like open source software, and it would make sense to see a grassroots prompting effort here as well.
Perhaps you even have a prompt of an expert you want to share in the comments…