49 Comments
Ethan Mollick:

I didn't mention it in the post but, given discussions on social media, it is worth noting that the environmental impact of an individual image is negligible, while in aggregate it obviously compounds.

The MIT Technology Review found that generating an image took 2,282 joules, equivalent to running a microwave for 5 seconds or a laptop for 14 seconds. https://www.technologyreview.com/2025/05/20/1116327/ai-energy-usage-climate-footprint-big-tech/
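As a back-of-the-envelope check of that comparison (the implied appliance wattages below are my own arithmetic from the quoted numbers, not figures from the article):

```python
# Sanity check of the MIT Technology Review comparison:
# 2,282 J per image, spread over 5 s (microwave) or 14 s (laptop),
# implies the following appliance power draws.
IMAGE_ENERGY_J = 2282  # energy per generated image, from the article

microwave_watts = IMAGE_ENERGY_J / 5    # power = energy / time
laptop_watts = IMAGE_ENERGY_J / 14

print(f"Implied microwave draw: {microwave_watts:.0f} W")  # ~456 W
print(f"Implied laptop draw: {laptop_watts:.0f} W")        # ~163 W
```

Both implied wattages are plausible for the respective devices, so the comparison is internally consistent.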

Saurabh Mukhekar:

Model optimization for local models is an emerging focus in SOTA research

Peter Gaffney:

What's more frightening -- the threat of AI domination or the prospect of the Otters musical?

Marcel van Driel:

Yes

Laurie:

This is a fantastic roundup, thank you!

Harris Madden:

Curious to me that all the attempts read your language for meaning and none for grammar, or literally. You didn't say "otter using wifi on a plane"; you said "otter on a plane using wifi." Why didn't any of the images show a plane using wifi, with an otter on it? That's the literal reading. It's not important, just curious.

Gregory S. McNeal:

This post was otterly fantastic

Andy Futuro:

As impressive as all this is technically, I can't help but notice how dull the content is. My favorite time working with the AI was when it could barely produce a face. At least then the mistakes were usually interesting. Now it can accurately recreate the same recycled scenes and aesthetics we've seen a million times before.

It doesn't feel like the future; it feels like the past.

Ethan Mollick:

To be fair, I was just using the same basic prompt for each. You can get much wilder results from AI in many different ways.

David:

The tkin part of the essay threw out my theory that it was either just reconstructing the training data it was fed, or creating an output shaped by the "alignment" process of making sure it didn't go too far off the rails or eclipse human preference. But I agree that in general I am much more interested, in an artistic sense, in seeing what it would create "prima genesis"; perhaps that's where the locally weighted models might be more fun to play with. I'm also hoping we see a countermovement grow, similar to the Impressionists' attempt to counter the disruptive power of camera and film to painting a century ago.

Steve:

So, now when I hear computer boffins warning about DOS, they're talking about "Deranged Otter Syndrome"?

Dov Jacobson:

Very useful metaphor: an LLM is like a novelist starting from the beginning; diffusion is like a sculptor chipping away everywhere.
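The metaphor maps onto the two generation loops. A toy sketch of the contrast, purely for illustration (neither function is a real model; the "denoise" step here just nudges every position toward a stand-in target):

```python
import random

random.seed(0)

# Autoregressive ("novelist"): commit to the output strictly left to right,
# one token at a time, each choice conditioned on what is already written.
def autoregressive(next_token, length):
    out = []
    for _ in range(length):
        out.append(next_token(out))
    return out

# Diffusion-style ("sculptor"): start from pure noise and refine the WHOLE
# canvas a little on every step, so all positions converge together.
def diffusion(target, steps=10):
    canvas = [random.random() for _ in target]          # pure noise
    for _ in range(steps):
        canvas = [c + 0.5 * (t - c) for c, t in zip(canvas, target)]
    return canvas

words = autoregressive(lambda prefix: len(prefix), 5)   # stand-in "language model"
image = diffusion([1.0, 0.0, 1.0])                      # stand-in "image"
print(words)                          # [0, 1, 2, 3, 4] -- built left to right
print([round(x, 3) for x in image])   # every position near its target at once
```

The key structural difference: the novelist's early words are final, while the sculptor's canvas stays revisable everywhere until the last step.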

Kevin Hacker:

How do we guard against fakes?

Maxim's maxims:

That's what I am worried about. Copyright issues are one thing, but very realistic deep fakes are another; they don't even have to be deep anymore...

Eugine Nier:

Humanity managed to survive until about a century ago without the ability to have "unfakable" images from far away.

Eric Solomon:

Amazing, Ethan. Simply amazing.

Marcel van Driel:

This post was much better than the otter posts

Brent Brotine:

Mind blown! Can you share some of the prompts you used, such as for the talk show? Were they really detailed, or, as you suggested, as simple as "like the musical Cats but for otters"?

Vicki Taylor:

A lot more here than just "One Useful Thing!" I'm gobsmacked! Thank you!

Deven Spear:

Ethan, this is a great example of what continues to be elusive for the vast majority of the public that is not immersed in this transformative disruption on a weekly basis. It's hard to articulate and even more challenging to embrace the speed of change, but it is, in fact, undeniable.

Alex Tolley:

One can only imagine what might be achieved within a few years. It reminds me of seeing graphics software like Photoshop on minicomputers in the early 1990s, then being able to do the same thing on a PC about five years later.

While deep fakes and similar productions are a cause for concern, as are the copyright issues, the ability to do extraordinary work in different domains using AI tools on a local computer is something once found only in science fiction.

If local, open models are just a few months or years behind the state of the art, why race to hyperscale toward AGI when a simpler, smaller model will achieve very similar output soon after? This implies a very transient benefit, and cheap, "good enough" models should be more likely to displace the high-end ones, as Clayton Christensen suggested in The Innovator's Dilemma. It conjures up a cyberpunk world of distributed production of a vast range of goods and services using AI tools.

Grvl:

Back in 2020, my SO and I did a "predictions for 2030" time capsule. One of our predictions was that the average person would be able to generate AI art. It's always amusing to see how far off our prediction timing was.
