I hope you weren't getting too comfortable.
I just got access to the new Bing AI. My initial thoughts are that our assumptions about the limits of AI were wrong.
We were comfortable for about a week.
In talking to people about ChatGPT recently, I heard a growing consensus that its tendency to lie and its lack of understanding of sources limited its usefulness in too many situations for it to be a real threat to many jobs or assignments1. But that thinking did not take into account how rapidly the AI field is growing and changing. Today, I got access to the version of ChatGPT integrated with the Bing search engine, and I can tell you that our assumptions were wrong, and the ground continues to shift beneath our feet.
To be clear: the Bing AI is not that different from ChatGPT, upon which it is built. Most importantly, it doesn’t really solve the accuracy problem. The AI still lies, and results can be full of subtle inaccuracies and hallucinations. But just as ChatGPT was a massive qualitative improvement over the technically similar GPT-3, the new Bing AI offers a similar leap over the technically similar ChatGPT.
I have only started working with it, but consider these initial impressions and be ready for the pace of change to continue…
A little knowledge…
The biggest difference in Bing is that it is now connected to the internet. That means it now actually knows things. Take a look at the same question for both Bing and ChatGPT: “give me 10 ideas for a new business for a doctor who doesn't want to practice medicine. tell me the market size for each.” The quality difference is pretty profound.
Note that every single source cited by ChatGPT (in black) is made up. Bing provides citations for every source… but the citations do not always point to the actual fact. Clicking the first citation takes you to the statistic cited, but the second takes you to a list of business ideas, not the statistic itself. Does that mean that Bing is also lying? Not in this case; it just seems to handle citations badly. When I ask it to explain each statistic, it gives me a correct answer and a correct link.
But it goes even further. Take a look at this:
Before you get too excited, it isn’t quite right: the sources are real, as is the calculation, but the way it approached the calculation was off, starting from the wrong year in the first instance when it had actual 2022 projections to use. But it is remarkably close to doing a task (projecting market data from sources) that was previously the work of highly paid analysts. It is far beyond what ChatGPT could do. And the system is only a week old.
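To make the slip concrete: the underlying arithmetic is just a compound-growth projection, and the mistake was in which base year to compound from. Here is a minimal sketch of that kind of calculation in Python, with entirely hypothetical numbers (the base values and the 8% growth rate are placeholders, not figures from Bing’s answer or its sources):

```python
# A minimal sketch of the kind of projection described above: compounding a
# known market size forward at a fixed growth rate. Every number here is a
# hypothetical placeholder, not a figure from Bing's answer or its sources.

def project_market_size(base_value: float, base_year: int,
                        target_year: int, cagr: float) -> float:
    """Project a market size from base_year to target_year at a fixed CAGR."""
    years = target_year - base_year
    return base_value * (1 + cagr) ** years

# Starting from the available 2022 projection is the right approach...
from_2022 = project_market_size(10.0, base_year=2022, target_year=2025, cagr=0.08)

# ...while compounding from an earlier year's figure (the "wrong year" slip
# described above) quietly shifts the result.
from_2021 = project_market_size(9.3, base_year=2021, target_year=2025, cagr=0.08)

print(f"From 2022 base: {from_2022:.2f}B; from 2021 base: {from_2021:.2f}B")
```

The point is not the specific numbers but that the error is an ordinary, checkable one: with the sources in hand, a reader can redo the arithmetic and catch the wrong base year in a minute.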
Don’t get me wrong, I don’t think the accuracy problem is in any way solved (I can get Bing to lie quite easily by pushing it to give me answers repeatedly), but there seems to be a path forward here. Everyone should check every fact that an AI produces, but Bing makes that much easier by providing sources, and it seems to be more grounded in reality.
B- essays no more
Based on my initial experiments, the addition of real information has greatly increased the capability of the system. If I ask it to create a syllabus, it now creates one with accurate readings, and it is better able to adjust content on the fly than ChatGPT was. It is also much better at writing grounded essays. How much better, I am still exploring, but as a stark example, here is an experiment. I used this prompt (which I had Bing create for me):
Write a 1000-word essay on how innovations are adopted in a specific industry or organization of your choice. You should use the diffusion of innovation theory as a framework to analyze the factors that influence the adoption process and the challenges and benefits of adopting new technologies. You should also provide recommendations on how to improve the adoption rate and overcome the barriers to innovation.
Your essay should include the following sections:
Introduction: Provide some background information on the industry or organization you have chosen and the innovation you will discuss. State your main argument and the purpose of your essay.
Body: Use the diffusion of innovation theory to explain how the innovation is diffused and adopted in your chosen context. You should address the following aspects:
The characteristics of the innovation and how they affect its adoption
The characteristics of the adopters and how they are categorized into different groups
The communication channels and social networks that facilitate or hinder the diffusion of the innovation
The external and internal factors that influence the adoption decision and the rate of adoption
Conclusion: Summarize your main points and findings. Provide some recommendations on how to enhance the adoption of the innovation and overcome the challenges. Discuss the implications and limitations of your analysis.
References: Cite at least five credible sources to support your arguments. Use APA style for in-text citations and reference list.
The difference between ChatGPT and Bing is again quite large. ChatGPT writes an empty shell of an essay, and Bing, well, Bing includes a (correct) citation, and goes on for pages of much higher quality content using real examples.
I suspect that Bing will receive much higher grades than ChatGPT did, so the educational hand-wringing about what tests it can pass and what assignments it can do is likely to start again, with more ominous answers. The solution is the same (we need to work with AI in classes, not against it), but the threat to traditional essay-writing and test-taking is even more acute.
Don’t make it angry
There are limits and guardrails around ChatGPT. Without those limits, it would happily generate conspiracy theories, spew invective, and write threatening notes to your enemies. While sometimes those guardrails can feel constraining, in general you can get a lot of useful work out of ChatGPT without triggering its limits.
The guardrails on Bing are much stricter, preventing it from doing many of the things that ChatGPT is able to do, including role-playing and simulation. This is a decision by the Bing team, not a requirement of the system, but it restricts what you can do. Interestingly, the Bing search engine is not particularly nice to you if you try to push those limits, as you can see from what happened when I used a prompt that worked in ChatGPT but that Bing did not like… (read the last reply!)
And don’t threaten to go to its manager!
The beginning, again.
I have only just begun playing with this new AI (and it isn’t even really new, just an updated ChatGPT), yet the improvements are already huge. At the same time, the problems remain mostly the same: prompt engineering matters, it lies a lot, if you don’t spend effort you will get bad results, and so on. But, again, I don’t think that is going to matter much, because it can already do so much for us.
In the end, the actual capabilities of Bing are less important than what they mean for the months ahead:
We can’t get comfortable knowing what generative AI is capable of, because capabilities are increasing very rapidly, and small changes can increase what they can do by a large amount. An approach to AI that worked last week won’t work this week.
This isn’t even the real thing: GPT-4 (and many future generative AIs using new technologies and approaches) is coming, and coming soon. All we can see are hints of a future where AI capability continues to grow. There are already signs that complex jobs, like financial projections and analysis, may be more doable by AI than we might have expected even a few days ago. We need to stay flexible and ready for a rapidly evolving future.
1 I disagreed with this, since I found that once you learned to use the system (for writing or creativity), these limitations were much less relevant.