I didn't mention it in the post but given discussions on social media, it is worth noting that the environmental impact of an individual image is negligible, while in aggregate it obviously compounds.
The MIT Technology Review found that generating an image took 2,282 joules, which is equivalent to running a microwave for 5 seconds or a laptop for 14 seconds. https://www.technologyreview.com/2025/05/20/1116327/ai-energy-usage-climate-footprint-big-tech/
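As a back-of-envelope check on those equivalences: the wattages below are assumptions I've picked so the numbers line up (roughly a 450 W microwave and a 160 W laptop), not figures from the article.

```python
# Sanity-check the MIT Technology Review equivalences.
# Wattages are assumed for illustration, not taken from the article.
IMAGE_ENERGY_J = 2282  # joules per generated image (per the article)

MICROWAVE_W = 456  # assumed microwave power draw, watts
LAPTOP_W = 163     # assumed laptop power draw, watts

# seconds = energy (J) / power (W), since 1 W = 1 J/s
microwave_seconds = IMAGE_ENERGY_J / MICROWAVE_W
laptop_seconds = IMAGE_ENERGY_J / LAPTOP_W

print(f"microwave: {microwave_seconds:.0f} s, laptop: {laptop_seconds:.0f} s")
```

At those assumed wattages the division lands on about 5 seconds of microwave time and 14 seconds of laptop time, matching the article's framing.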
Model optimization for local models is an emerging focus in SOTA research
What's more frightening -- the threat of AI domination or the prospect of the Otters musical?
Yes
This is a fantastic roundup, thank you!
Curious to me that all the attempts read your language for meaning and none for its literal grammar. You didn't say "otter using wifi on a plane," you said "otter on a plane using wifi." Why didn't any of the images show a plane using wifi -- with an otter on it? That's the literal read. It's not important, just curious.
This post was otterly fantastic
As impressive as all this is technically, I can't help but notice how dull the content is. My favorite time working with the AI was when it could barely produce a face. At least then the mistakes were usually interesting. Now it can accurately recreate the same recycled scenes and aesthetics we've seen a million times before.
It doesn't feel like the future; it feels like the past.
To be fair, I was just using the same basic prompt for each. You can get much wilder results from AI in many different ways.
the thinking part of the essay threw out my theory that it was either just reconstructing the training data it was fed, or creating an output based on the "alignment" process of trying to make sure it didn't go too far off the rails or eclipse human preference. but agree that in general I am much more interested, in an artistic sense, in seeing what it would create "prima genesis," perhaps that's where the local weighted models might be more fun to play with. i'm also hoping we see a countermovement grow similar to the Impressionists' attempts to counter the disruptive power of the camera and film to painting a century ago.
So, now when I hear computer boffins warning about DOS, they're talking about "Deranged Otter Syndrome"?
Very useful metaphor: LLM like a novelist starting from the beginning, Diffusion like a sculptor chipping away everywhere.
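That metaphor can be made concrete with a toy sketch: the functions and logic below are purely illustrative inventions, not any real model's API, but they show the structural difference -- the "novelist" commits one token at a time and never revisits it, while the "sculptor" updates the entire canvas a little on every pass.

```python
import random

random.seed(0)

# "Novelist" (LLM-style): autoregressive -- append one token at a time,
# left to right; earlier choices are never revised.
def autoregressive(length):
    tokens = []
    for _ in range(length):
        tokens.append(random.choice("ab"))  # next token, given the prefix so far
    return "".join(tokens)

# "Sculptor" (diffusion-style): start from pure noise and repeatedly
# refine the WHOLE canvas at once, a little each step.
def diffusion(length, steps=3):
    canvas = [random.random() for _ in range(length)]  # pure noise
    for _ in range(steps):
        canvas = [0.5 * x for x in canvas]  # every position nudged simultaneously
    return canvas
```

The point of the toy is only the loop structure: one method grows output sequentially, the other iterates over the complete output in place.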
How do we guard against fakes?
that's what I am worried about. Copyright issues are one thing, but very realistic deep fakes - they don't even have to be deep anymore...
Humanity managed to survive until about a century ago without the ability to have "unfakable" images from far away.
Amazing, Ethan. Simply amazing.
This post was much better than the otter posts
Mind blown! Can you share some of the prompts you used -- such as the talk show? Really detailed, or as you suggested just as simple as “like the musical Cats but for otters”.
A lot more here than just "One Useful Thing!" I'm gobsmacked! Thank you!
Ethan, this is a great example of what continues to be elusive for the vast majority of the public that is not immersed in the transformative disruption on a weekly basis. It's hard to articulate and even more challenging to embrace the speed of change, but it is, in fact, undeniable.
One can only imagine what might be achieved within a few years. It reminds me of seeing graphics imaging like Photoshop on minicomputers in the early 1990s, yet being able to do the same thing on a PC about 5 years later.
While deep fakes and other similar productions are a cause for concern, as are the copyright issues, the ability to do some extraordinary work in different domains using AI tools on a local computer is something once read only in science fiction.
If local, open models are just a few months or years behind the state of the art, why is there a need to race to hyperscale to AGI if a simpler, smaller model will achieve very similar output in a short time? It seems to me this implies a very transient benefit, and cheap, "good enough" models should be more likely to displace the high-end ones, as Clayton Christensen suggested in The Innovator's Dilemma. It conjures up a cyberpunk world of distributed production of a vast range of goods and services using AI tools.
Back in 2020, my SO and I did a "predictions for 2030" time capsule. One of our predictions was that the average person would be able to generate AI art. It's always amusing seeing how far off we were on our prediction timing.