As someone who used to deliver innovation and entrepreneurship programs in higher education institutions, I became increasingly disillusioned with the apparent lack of imagination and creativity in the startup ideas proposed by students. In a way, this possibly reflects the outcomes of our sclerotic education system (cf. Sir Ken Robinson's "Do Schools Kill Creativity?"). I totally "get" the potential of LLMs as a tool for associative thinking, cobbling together and connecting apparently unrelated concepts. AI can serve to create the conditions for the flourishing of imagination if used wisely. Exciting times for innovation ... maybe.
I'd love to try to tease apart how LLMs make these connections. Where are they in the latent layers? What, if any, reasoning is used at all?
On the use of Kant, for example, Bing doesn't seem to use his essay "Perpetual Peace." I actually read that in Berlin as a student of International Relations. Kant would have approached MAD through his ideas on nation-states, not ethics.
Bing can seem to make a cogent argument here, but it doesn't have, or use, the context needed to judge which of a thinker's ideas applies best, because it lacks categories. (Bing is not a Kantian!)
It's using a kind of free association based on word relationships more than conceptual categories. In fact, I believe its facility with concepts comes more out of linguistic tropes than logical distinctions.
I don't know what your prompting was, but I'm assuming it finds Kant via ethics, and within ethics, linguistic relationships proximate to terms, phrases, and statements found in texts on MAD? (We could ask Bing about MAD and Kant's views on the nation-state; I think we'd likely get a very different argument.)
What's interesting here is the manner in which logical and conceptual reasoning appear as effects of language. Bing's reasoning is still hallucinatory, imaginative, inventive, but I don't think it's conceptual. It will appear to be intelligent when it's not. It'll appear to be educated when it's not. It'll test our intelligence, insofar as it forces us to measure and judge whether its reasoning is simply fanciful or indeed insightful.
I'd be curious to know what students of philosophy/humanities are learning about it. Have you seen anything? I haven't run into anything similar to what you're doing here.
To expand on this (and I might see if ChatGPT can uncover it):
The ethical dimension of mutually assured destruction is framed not as a matter of personal ethics but of nation-state constraints, because the agency and the actor belong to the state, not an individual. The philosophical considerations are thus international institutions, international norms and constraints, and the theories often make use of game theory (e.g. the prisoner's dilemma), etc.
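To make the game-theory reference concrete, here's a minimal sketch (Python, with payoff numbers I've invented purely for illustration) of the prisoner's-dilemma structure as applied to two nuclear states:

```python
# Two states each choose RESTRAIN or STRIKE. Payoffs are illustrative
# numbers invented for this sketch (higher is better for that state).
from itertools import product

RESTRAIN, STRIKE = "restrain", "strike"

# payoff[(a, b)] = (payoff to state A, payoff to state B)
payoff = {
    (RESTRAIN, RESTRAIN): (3, 3),   # stable deterrence
    (RESTRAIN, STRIKE):   (0, 4),   # A devastated, B "wins"
    (STRIKE,   RESTRAIN): (4, 0),   # mirror image
    (STRIKE,   STRIKE):   (1, 1),   # mutual destruction
}

def best_response(opponent_move, player):
    """Move that maximizes a player's payoff, holding the opponent fixed."""
    moves = (RESTRAIN, STRIKE)
    if player == 0:
        return max(moves, key=lambda m: payoff[(m, opponent_move)][0])
    return max(moves, key=lambda m: payoff[(opponent_move, m)][1])

# A profile is a Nash equilibrium when each move is a best response
# to the other.
for a, b in product((RESTRAIN, STRIKE), repeat=2):
    if best_response(b, 0) == a and best_response(a, 1) == b:
        print(f"Equilibrium: A={a}, B={b}, payoffs={payoff[(a, b)]}")
```

With these numbers the only equilibrium is (strike, strike), even though mutual restraint is better for both; the usual argument is that MAD escapes the dilemma by changing the payoffs, making retaliation so certain that striking first is catastrophic for the striker.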
If we ask our LLMs to explain the ethical dimensions of MAD using international relations, do they improve? If we refer to Realpolitik or Neorealism, do they improve? As things stand today, I think prompts need to be engineered with a fair amount of insight into the context and framing of these kinds of questions in order to extract the best-reasoned responses.
I'm very keen to see how GPT-4 does with this. And when you can prompt with an order of magnitude more than 2,048 tokens, I think we'll be able to expect better-framed and better-contextualized answers.
Bing says:
"Kant would probably be opposed to mutually assured destruction as a means of achieving perpetual peace. He would consider it as a violation of his principle of universal hospitality, which requires states to treat each other as ends rather than means. He would also view it as incompatible with his idea of cosmopolitan law, which aims at establishing a world federation of republics that respect human rights and international law34. Kant would favor cooperation over confrontation, and reason over force.
However, Kant might also acknowledge the limitations of his own vision in light of realpolitik and neoliberalism. He might recognize that states are not always rational actors who follow moral laws, but rather pursue their own interests and power in an anarchic international system. He might also admit that his ideal federation of republics might be challenged by the realities of economic inequality, social diversity, and cultural conflict. He might concede that his perpetual peace project requires not only political reform but also ethical education and cultural transformation. "
That's actually pretty insightful. My own take, though I haven't dug into it, is that Kant's interest in pursuing peace through international institutions and the rule of law/norms would enable him to see that mutually assured destruction does work, by imposing a higher moral law on the actions of nation-states; deterrence employs a logic of non-use and no-first-use, and so MAD rests on a kind of paradox: deploying nuclear weapons whilst agreeing never to use them, precisely because using them is morally unthinkable.
That's the logical argumentation (or one logical argument).
But more broadly, a student of international relations would approach this as a moral question, not an ethical one; would frame it within the context of nation-states, not individuals; and would recognize the competing theories of nation-states as rational actors, of an international system with its balance of power, of the game-theoretic quandaries of state action, of the limited rule of law in international institutions, and of the vulnerability of international agreements (given that there is no superseding legal authority, only formal or informal non-binding agreements), and so on.
I don't think AI can "think" or "reason" like this yet.
We would have to employ a kind of questioneering method to lead it down a nested series of frameworks so that it considers the question within the right conceptual framework. All it really has is gradient descent. We'd need to start the ball rolling down the right hilltop in the right location of the right region of the right territory....
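To make the hilltop metaphor concrete, here's a toy sketch of gradient descent on a one-dimensional function with two valleys; which minimum the ball rolls into depends entirely on where you start it. (The analogy is loose: gradient descent is how a model is trained, not how it answers a prompt, but the basin-of-attraction picture is the same.)

```python
# Toy gradient descent on f(x) = x**4 - 3*x**2 + x, which has two local
# minima, near x = -1.30 (deep) and x = +1.13 (shallow). The starting
# point alone determines which basin the "ball" rolls into.

def f(x):
    return x**4 - 3 * x**2 + x

def grad(x):
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

for start in (-2.0, -0.5, 0.5, 2.0):
    end = descend(start)
    print(f"start {start:+.1f} -> settles at x = {end:+.3f}, f(x) = {f(end):+.3f}")
# Starts to the left of the central hill (near x = 0.17) roll into the
# deep valley; starts to the right roll into the shallow one.
```

Prompting, on this picture, is the art of choosing the starting point.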
I see a discipline emerging in this, because prompt engineering is more specifically instructional, and depends on the context being the appropriate context for the use case. Interesting!
I found this posting inspiring and helpful. I have been using ChatGPT since the first release in November but posts like this help uncover fresh insights to explore further. Very supportive.
That LLMs "are basically word prediction engines" is true, but reductive. There's a lot more going on that people haven't been investigating. I've been looking into how ChatGPT tells stories. Here's the abstract of an article I've posted online:
I examine a set of stories that are organized on three levels: 1) the entire story trajectory, 2) segments within the trajectory, and 3) sentences within individual segments. I conjecture that the probability distribution from which ChatGPT draws next tokens follows a hierarchy nested according to those three levels, and that this hierarchy is encoded in the weights of ChatGPT's parameters. I arrived at this conjecture to account for the results of experiments in which ChatGPT is given a prompt containing a story along with instructions to create a new story based on that story but changing a key character: the protagonist or the antagonist. That one change then ripples through the rest of the story. The pattern of differences between the old story and the new one indicates how ChatGPT maintains story coherence. The nature and extent of the differences depend roughly on the degree of difference between the key character and the one substituted for it. I conclude with a methodological coda: ChatGPT's behavior must be described and analyzed on three levels: 1) the experiments exhibit surface-level behavior; 2) the conjecture is about a middle level that contains the nested hierarchy of probability distributions; 3) the transformer virtual machine is the bottom level.
https://www.academia.edu/97862447/ChatGPT_tells_stories_and_a_note_about_reverse_engineering_A_Working_Paper
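For anyone who wants the "word prediction engine" mechanism in concrete form, here is a minimal sketch of temperature sampling from a next-token distribution. The vocabulary and logit values are invented for illustration only; a real model scores tens of thousands of tokens, and the conjecture above concerns how such distributions might be nested across story levels.

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Sample one token from a softmax over raw scores (logits).

    Lower temperature sharpens the distribution (safer, more repetitive
    choices); higher temperature flattens it (more adventurous ones).
    """
    scaled = [score / temperature for score in logits.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return random.choices(list(logits), weights=weights, k=1)[0]

# Invented toy scores for the context "The knight drew his ...":
logits = {"sword": 4.0, "shield": 2.5, "map": 1.0, "breath": 0.5}

random.seed(0)
for t in (0.2, 1.0, 2.0):
    picks = [sample_next_token(logits, temperature=t) for _ in range(8)]
    print(f"T={t}: {picks}")
# At low temperature the knight almost always draws his sword; at high
# temperature the story wanders. Story-level coherence would have to be
# imposed on top of this token-level mechanism.
```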
I'm struck by what an interesting place we're in with consumer-facing AI right now. Its capabilities are being discovered through user creativity.
Titanic, emojis? lol Ethan, I liked your creativity.
Hi,
You write, regarding its answers on four technologies to save the Roman empire: "I am sure there are mistakes, but this is pretty impressive."
I think it's actually *riddled* with mistakes and misconceptions. If you go to the Marginal Revolution blog, you'll see some excellent comments about the mistakes it made.
Here is what I wrote on the Marginal Revolution blog. The first sentence is a quote from a previous commenter:
"It’s interesting to me how much the GPT answers read like a bright, but credulous and untutored student in their senior year of high school trying to summarize a lot of information in an “authoritative” voice but lacking the context or critical thinking skills to demonstrate real expertise."
Yes, it's fascinating to me, as a retired mechanical/environmental engineer, to read the GPT answers. I asked ChatGPT questions about electrostatic precipitators, and its answers showed that it fundamentally didn't grasp the physical situation.
It would be fascinating (to me!) to have an engineer ask the GPT further questions on each of its four technology scenarios, to see if it could "recognize" its mistakes, as ChatGPT was able to "recognize" its mistakes in its responses to me on electrostatic precipitator questions.
For example, I'm pretty sure that a primary reason steam engines were first developed in Britain (rather than Italy) is that Britain had lots of coal. If wood is used to fuel the hypothetical piston engines the GPT wants Rome to build to "save the Empire", the deforestation would be massive:
https://en.wikipedia.org/wiki/Deforestation_during_the_Roman_period
Further, as has already been pointed out, a key issue with any piston engine is the incredibly small gap between the piston and the cylinder; sealing that gap makes a huge difference in a steam engine's efficiency. Per wonderful Wikipedia:
https://en.wikipedia.org/wiki/Watt_steam_engine
"Watt worked on the design over a period of several years, introducing the condenser, and introducing improvements to practically every part of the design. Notably, Watt performed a lengthy series of trials on ways to seal the piston in the cylinder, which considerably reduced leakage during the power stroke, preventing power loss. All of these changes produced a more reliable design which used half as much coal to produce the same amount of power.[1]"
So it would be extremely interesting (to me!!!) to ask the GPT what fuel the Romans would burn to power these hypothetical piston engines, and also whether the Romans had the metallurgical and manufacturing technologies needed to actually build them.
P.S. I could go on and on. ;-) I'm not an expert, but I think the idea of adding vinegar to water to kill microbial contamination is probably ridiculous. Full strength vinegar doesn't even work very well as a disinfectant:
https://www.healthline.com/health/is-vinegar-a-disinfectant#disinfectant-properties
I too was surprised at how human-like the AI was. But I was pretty bummed out when it didn’t know what the square root of 64 was. When I corrected the AI and said the answer was 8, it said, “If you knew the answer, you should not have asked.” 🤔
"But I was pretty bummed out when it didn’t know what the square root of 64 was."
There was an interesting piece on "60 Minutes" in which a Microsoft person asked Bing to comment on who Lesley Stahl is, and Lesley Stahl was "pretty bummed out" to read that she had worked for "NBC News" rather than "CBS News."
The Microsoft person explained something to the effect of, "NBC News, CBS News...the AI doesn't really know there's a difference."
:-)
Great stuff, nice work!
This is a very useful post, and I've learned a lot from it.
I am puzzled by this statement: "the stories were great." The stories do not seem great at all to me. That excerpt is "great"? It seems bare adventure, emotionally and morally meaningless.
Interesting read. I like your exploration into this technology. I saw this the other day, might be a fun topic for you to explore:
How to use the 'JAILBREAK' version of ChatGPT: Simple trick lets you access an unfiltered alter-ego of the AI chatbot
https://www.dailymail.co.uk/sciencetech/article-11816953/How-use-access-unfiltered-alter-ego-AI-chatbot-ChatGPT.html
ChatGPT Shortcut: https://newzone.top/chatgpt/
A site that collects prompts for ChatGPT.
Are we using different Bings? Every attempt I've made so far has either been lame or a dead end.
Hi,
I want to emphasize that this post and your others are fascinating, and bring out important aspects and benefits of using Bing AI. However, I think it's equally important and interesting to understand that answers to technical questions that appear to be "brilliant" can actually be complete BS to someone who understands the underlying physics/chemistry/biology involved. Here's one of several comments I made at the "Marginal Revolution" blog regarding Bing AI's suggestions to save the Roman Empire:
The first sentence in quotation marks is a quote from a previous commenter on Marginal Revolution:
"I understand that some commenters may have reservations about LLMs, but I encourage everyone to focus on the topic at hand - how to use LLMs effectively."
In this case, Ethan Mollick asked Bing a question. He was impressed by the answer, but only because Ethan Mollick doesn't know enough to see the massive number of inaccuracies and misconceptions in the answer.
As Gary Marcus might say, Ethan Mollick got pure BS as an answer, but counted it as gold or silver, because on this matter Ethan Mollick can't tell BS from gold or silver.