On giving AI eyes and ears

Ethan Mollick

Jun 23, 2023

AI can listen and see, with bigger implications than we might realize.

20 Comments

Dov Jacobson

Jun 23, 2023

Yet another eye-opening and complacency-shattering set of experiments. Bravo, Ethan.

But buried in your generally optimistic essay on new capabilities is a disturbingly dark example of the oncoming misery we can expect even from the "old" school chatbot tech. It is hard to read the letter composed by your AI assistant to your cable company. It was bullying, laced with arrogant menace conjured out of nothing at all.

Okay haha. We all hate the cable company, right?

But we love humans. And my sympathies are with the fictional, poorly paid worker whose emotional vulnerabilities are being attacked so cheaply by an inhuman algorithm. (P.S.: Conceding that the reader is probably an offshore gig worker does not dehumanize her.)

And this was just a note to a random reader written by a fabulously ignorant algorithm. What happens when AI is assigned to compose abusive nastygrams based on actual personal information of any depth?

JamesD

Jun 26, 2023

Please, there's no need to worry. The cable company's staff won't be adversely affected - after all, don't they have an AI capable of summarizing customer letters in a respectful tone?

Comment deleted

Jun 23, 2023

Comment deleted

Dov Jacobson

Jun 23, 2023

Doug - thanks for responding, but ain't that the "Guns Don't Kill People" argument?

Andrew Smith

Jun 23, 2023

I've heard the argument made that LLMs and such can't be the final step that leads to AGI, since a "body" is needed in order to "experience the world." I'm not really in a position to agree or disagree, but I wonder if the multimodal stuff, particularly the ability for one application to receive inputs from a variety of sources, could be a stepping stone in this direction.

Adam

Jul 8, 2023 (edited)

I always found the "body" assertion a bit silly. Why is "able to move molecules with arbitrary appendages" now included in the definition of intelligence? Why does text input not count as a sensory organ, and why does text output not count as a "body" interacting with a "world"?

I think it's a weird definition that one comes up with *after* deciding that machine learning techniques couldn't possibly be creating something intelligent.

Andrew Smith

Jul 8, 2023

I feel this way too, especially about "inputs." Another way to think about this is that we're essentially their inputs; we tell them about the world (pretraining) and then allow them to explore the world (by interacting with us). I'm firmly in the same camp as you, although I remain open to reasons why this view might not be 100% accurate.

It is SO fun to think about this with folks who are open to the conversation, without an axe to grind!

Dana Polojärvi

Jun 23, 2023 (edited)

Thanks for another interesting post, Ethan. I appreciate your willingness to work with these bots and show us their results.

For some reason many people are surprised by these results. But they seem very basic to me. When scientists and engineers come together to scrape the creative algorithms from millions of people's labor, it's no surprise that they can use those algorithms to generate results quickly and that the companies they formed to profit off this activity can release bots that can radically underbid a competitive human by making the service free for a brief time. It's a classic economy of scale right out of the Walmart or dollar store playbook.

I was most struck by the language the bot used to communicate. It's so filled with anthropomorphic misrepresentation. Is this a deliberate effort to mislead human users into imagining there's a person behind the bot? It said it was proud of something. That's not possible, and I wonder if it is a feature or a glitch that the AI is being trained to give the impression that it can feel.

Jacob Shapiro

Jun 23, 2023

Thank you for all the work you're doing on this, Ethan. I'm working to create a resource repository for faculty on how they can use AI in their classes, and this blog continues to be a goldmine of ideas and inspirations.

Michael Horwitz

Jun 23, 2023

As I was reading this I was thinking about the possibilities for developing proposals and then thought about grading papers with its input. The final idea of listening to a presentation and giving feedback was very interesting and I’m going to try it this semester to see what I get. Now I’m thinking that I’ll deliver a presentation and get feedback? What’s next?

Brent A. Anders

Aug 9, 2023

Excellent work as usual, Ethan, thank you. The ongoing developments of AI are something that all in academia need to be aware of. I too saw how GPT-4 introduced its multimodal capabilities and then didn't really make them available through ChatGPT Plus. In multiple interviews, Sam Altman seemed to indicate that he didn't think the world was ready, so they were slowing down the release of new AI capabilities. Competition from other AIs will of course ensure that new capabilities will continue to be released.

Your great article is yet another important example of how all people (students, faculty, and everyone else) need to develop AI Literacy (Awareness, Knowledge, Capability, and Critical Thinking) as well as a mindset of life-long learning in order to best understand and use these AI systems as we adapt education to best address this AI-infused world.

Felt

Jun 23, 2023

The image capabilities in Bing seem to work like this: when you upload an image, a separate model analyses it and pulls out details, and the relevant features are then fed to Bing. (I infer this from the "Analysing the image" message and from how it hallucinates.) When asked for further detail, however, Bing hallucinates based on the earlier description of the image, since I believe it hasn't actually seen the image itself.

The Bing sidebar also got a recent update that lets it actually search the page for information. (Previously it would just get a static page summary, which was all it could work from, and the summary didn't even work for a lot of webpages.) It seems to work by scraping the page for information relevant to the user's request, summarising that section, and feeding the result to Bing. However, when you ask Bing for further information and it doesn't re-search the page, it hallucinates the answer based on the information it had already gotten. The page search also seems a bit flawed and isn't great if you are looking for nuanced information in a large document or webpage.

Hamish Robertson

Jun 23, 2023

I was just at lunch sharing the future idea that all our meetings, and possibly all conversations will have live audio, and the AI can respond to any statement such as "I'll send you that podcast/article/document" and send the relevant thing from one person to another before the meeting is even over. It can also listen out for relevant action points and email them to the right person. If we all have our own personal AIs, then anything received can be put into our own personal "Newsfeeds" that prioritise our preferred consumption channels and patterns. I look forward to this world!

Corey Hayes

Jun 23, 2023

Terrific read! Thank you.

Mark Heyer

Jun 29, 2023

Another excellent posting! I'm working on a piece about visual art and AI for my pub The HeyerScope, and will link to this story. My target audience is intelligent but uninformed, and your writing is perfect for those who last week couldn't spell AI. Looking forward to more...

Wesley Verhoeve

Jun 26, 2023

Dang, this is eye-opening!

Matt

Jun 24, 2023

I'm curious what happens if you ask it to read the text in the third shoe image.

Tam

Jun 23, 2023

Where does one find the ChatGPT app? The app store is showing me a ton of apps when I search for "ChatGPT" but is there, like, one definitive one?

Jos

Jun 23, 2023

https://openai.com/blog/introducing-the-chatgpt-app-for-ios

Michael Spencer

Jun 23, 2023

What are we going to do when robots that can code and be professors can replace us? Are we still going to praise the Singularity and sit at the altar of the Machine Economy? We're almost ready to put the ghost in the machine.

Ken Kahn

Jun 23, 2023

ChatGPT:

The word "image" seems to be used incorrectly in this context. The sentence should probably use the word "imagine" instead, which would make sense given the context. Therefore, the corrected sentence would read:

"It is very easy to imagine the near future, where AI assistants are actually useful, and operate on your behalf, anticipating your needs, customizing answers to you, and more. A big contrast to trying to use Siri or Alexa today!"

One Useful Thing
