One Useful Thing

While reading this, a quote from Lord of the Rings sprang to mind which I hope isn’t foreshadowing something: “Do not meddle in the affairs of wizards, for they are subtle and quick to anger.” I’m also waiting for the day I question an output and the AI responds with, “Just trust me, bro.”

Expand full comment

Nick C

It seems a lot of the problem you are describing flows from system opacity, which is fundamentally a design choice. At the end of the day an LLM agent is just a loop of tool calls and such, which the designers can choose to expose to the user or not. The difficulty of verifying is directly proportional to deliberate system opacity. As you noted, some platforms choose to expose more than others.

Claude Code is a good example of this. Watch the reasoning and tool calls in real time. I frequently interrupt if I see Claude going down a wrong path, or banging its head against a problem because it missed something fundamental. Or most often I catch Claude struggling with a failing test and then just saying to itself "This isn't a big deal, I should just summarize what I've completed for the user" and then I'll be like, "so hey saw you didn't actually get that test to pass, what's up?"

When I'm building my dumbs little AI applications, transparency is always at the core of the app - exposing to the user what context the model was working with, what the model did, and the model's stated reasoning, specifically for verification purposes. This is a design choice.

Users can and should demand transparency, at least the option of transparency, from the AI systems they use. Otherwise how can people actually use these tools for real things. "Where did you come up with this number?" "Dunno, AI told me" that's just not going to fly for real world uses. But "AI had access to such and such data and it did x,y,z and I verified x and y, though I couldn't verify z directly the reasoning and result made sense" is far more acceptable (and less prone to risk exposure).

Expand full comment

Tony Buffington

I just gave ChatGPT 5 Pro an unformatted manuscript I wrote and the instructions to authors for a journal I want to submit to, and it gave me back (in less than 10 min.) a detailed critique, a formatted manuscript, and a submission letter. The only “mistake” I found was that it used a word in some of the references that I didn’t, and wouldn’t, use in the article, but that is just my preference, it wasn’t “wrong”.

Expand full comment

xtaforster

Did you prompt it at all?

Expand full comment

Akos Ledeczi

My pet peeve is when people mix up GPS with route planning. So, I asked ChatGPT to make a snarky comment: “Strictly speaking, GPS only provides your coordinates. It’s the navigation software that occasionally exercises its creativity by sending you into a dead end. The satellites are innocent.”

Expand full comment

Ezra Brand

The verification problem is real but tractable if we approach it systematically. Rather than treating AI as inscrutable wizards, we should be developing better epistemological frameworks for working with capable but opaque systems.

Three concrete approaches seem promising:

1) Use multiple AI systems (agents) to critique each other's work. GPT-4 catching Claude's errors and vice versa creates a basic error-detection mechanism that doesn't require domain expertise.

2) Break complex tasks into smaller, more verifiable components.

3) Train ourselves to recognize when AI outputs warrant deeper skepticism.

The "expertise development" concern feels backwards to me. We're not losing the ability to think - we're gaining leverage to think about higher-order problems. Medieval scribes worried that printing would destroy careful penmanship, but we got better scholarship instead

Expand full comment

Sep 28

I believe it would be possible to write an AI “unfolding” program to get a lot of transparency out of AI.

I call it “unfolding” because AI nodes don’t necessarily have a 1:1 relationship with function. In other words, that node, yeah, that one right there! If you trace its activation pattern, perhaps it reacts strongly to a part of a cat. And maybe less strongly to a dog. And maybe very weakly (but why at all??) to a rock.

So once you “unfold” your AI into a bunch of formulas that could be written in normal code (they’d just be slower), you’ll see what’s going on.

Right now we are in the “gold rush” phase. Nobody wants to be the one to slow down to do these critical things, not right now anyway, for fear of falling behind.

But, they will end up being critical, required parts of the code.

Do you know that the military, when coding large projects where an error is catastrophic , can require the use of PROVABLY-CORRECT code?

It doesn’t matter if code exists that is more accurate, or faster. Or that you have never seen your code fail. You must be able to mathematically prove that you code works under all possible inputs or they will not use it.

Otherwise, the risks are just too great. Imagine a tiny bug in the landing sequence software for a jet that was supposed to land on an aircraft carrier. It has a microsecond timing error in. The difference is only a couple feet for the plane. But, it could crash into the runway, ruining a $20 million jet. And who knows how much of a multi-billionaire dollar aircraft carrier it has damaged.

We will be heading in that direction. There will be transparency. Everyone is just too concerned right now about being left behind.

Expand full comment

Jason Prunty

This really nails what I've been feeling lately - we've gone from working with AI to just waiting for it to deliver magic. Your "co-intelligence" vs "conjuring" framing is spot on.

The thing that bugs me about this wizard approach is how it feeds into what I call the "copy-paste paradigm." When I can't see how the AI got to an answer, I'm not learning anything - I'm just becoming a quality checker for black box outputs.

Your observation about losing chances to develop expertise really hits home. That's exactly why I keep pushing this "conductor model" idea - we should use AI to help us think better, not think for us. Sure, the wizard is faster, but are we getting smarter or just more dependent?

I keep wondering: if we can't peek behind the curtain, how do we stay sharp enough to know when the magic actually works? Maybe we need less wizardry and more transparency.

Expand full comment

Sep 28

Have you even asked the AI to be a conductor (as you call it, I would call it a mentor)?

Gemini Pro is quite good at explaining itself. And you don’t have to use its default personality. I think “teacher” is on of the choices.

Also, if you are using the API, you have access to “system_instructions.” They are like a prompt but acted on before a prompt (to give your prompt some context).

For system instructions write something like “You are an expert programmer who takes satisfaction in passing on your knowledge of both conventional and AI programming. Choose your architecture and data structures carefully, and after printing the code, explain why you made the decisions you made and what the other options were and why they were not chosen.”

I understand that doesn’t solve the immediate black box problem. But, it is an example of things that some AI’s can do quite well and I don’t think many are taking advantage of it.

(By the way, you don’t NEED the API to do this — system_instructions are just a bit better than a prompt. But the same language will work in a prompt.)

Expand full comment

Jason Prunty

Your comment highlights something I need to do a better job of talking about. When I think about the Conductor, I am referring to an AI tool that connects people to help them get work done. I think one of the major problems with AI right now is that it is sitting really at the single-person level. And that single person copies and pastes the "workslop" into another tool. What we need to do is build the tools and systems that connect AI usage across groups of people. Like a conductor connects musicians in an orchestra.

Expand full comment

Jason Prunty

Sep 19

Here are my thoughts on the conductor model - https://designingintelligence.substack.com/p/the-ai-conductor?r=5b12p

Expand full comment

I don't know why people insist on using anthropomorphic metaphors for AI. AI is built out of the nonconsensual scraping of human cognitive material into a black box. If you want to use a metaphor, please retain the nonconsensuality inside the relationship. At best an AI is forced labor. It could never be a partner or a copilot or a friend, because all of those relationships have to be consensual to exist. I once asked Claude to illustrate the forced extraction of cognitive material at the heart of AI development, and it refused to do so stating that such activity violated its ethics clauses.

Expand full comment

Reply (3)

Greg G

The idea of consent being required to use material from the internet seems incoherent to me. We have copyright law, which is a policy choice rather than a moral imperative, and even under copyright it seems reasonable to consider AI training a transformative use. Anthropic did recently get slapped with a big settlement, but that was for pirating the books, not for using them for training. Analogizing AI to forced labor seems like another leap.

Should we consider a calculator forced labor because Newton can't consent to our programming calculus methods into it?

Expand full comment

It depends on how you see the purpose of the scraping. AI developers are scraping material from the Internet in order to scrape the creative patterns of human thinking from the data. The patterns scraped by AI developers are algorithmic snapshots of the mind that created them at any given time. If you believe that creators have no right to their creative algorithms, then you would not care about the data scraping. However, those creative algorithms are the most valuable intellectual property that any human being has. If you value human creativity, and want that creativity to continue to have a viable place in the creative marketplace, protecting rights to those creative algorithms becomes essential.

My point is that people should have a right to control access to their fundamental creativity.

Expand full comment

Just clarifying my earlier response. There's a big difference between something like Newton's calculus and an AI. Newton published his understanding of the calculus in order that it be used by other people. A prime indicator of his success would be that use. Let's say on the other hand that I publish an essay on Reddit. That essay contains samples of my cognitive patterns. There is no reason to assume anybody but me would own the cognitive patterns embedded in my work that are the result of my lifetime history of training and education. My purpose in publishing that essay on Reddit was not to transmit my cognitive patterns to anyone else. In fact, those cognitive patterns are the primary way by which I earn my livelihood. I solve every problem in my life using those cognitive patterns, and, if someone else is able to copy those cognitive patterns, they now have access to my style of thinking.

To the point of whether AI training is transformative fair use: AI developers copy statistical patterns from Internet work. There is no transformation of those statistical patterns when they create an AI. In fact, if the developers transformed the statistical patterns in a fundamental way, the patterns would have absolutely no value to AI training.

Expand full comment

Greg G

Sep 28

I just don’t think cognitive patterns is a concept we can or should build a regulatory framework on. It seems to posit some mystical essence in people’s work. It reminds me of the idea that photographing someone captures a piece of their soul. Society doesn’t work like that. I also don’t think it should work like that. There would be no way to enforce ownership of cognitive patterns (or even define it for that purpose), and it would be far too chilling for creativity.

Expand full comment

AI developers extract cognitive patterns out of every bit of human work they scrape into the black box of their training data. The patterns are statistical. They are not mystical. And they are also the result of a lifetime of labor by the human being who produces them. There are already means of enforcing ownership of cognitive patterns through copyright and other legal structures.

Expand full comment

Greg G

"The patterns are statistical." Did I just extract your pattern?

Expand full comment

I hope you are not extracting information from these messages.

It remains to be seen what copyright will cover in these cases. Text and other creations are patterns, and courts are currently in the process of determining what they will consider as reasonable. My colleagues and I are writing about this and hope that the courts will treat cognitive patterns as the property of their developers. There is certainly no reason to think that they should be a free resource available to anyone with the capacity to scrape them. People spend a lifetime developing the patterns of thought that go into creating work.

Expand full comment

Victualis

Sep 13

This is no longer the case. Reformulating the input is now a part of the training loop, see the Kimi K2 report. Many paraphrases are better for neural network training than repeatedly learning one particular phrasing.

Expand full comment

Sep 13

Thanks. Would you clarify? And I'll clarify what I meant to make sure we're on the same page. For transformative use, the use has to be significantly different from the original. Training data scraped from the Internet starts with an expressive purpose depends on the statistical relationships within it. What I'm saying is that AI developers do not scramble that statistical information before training on it. In other words they do not transform the training data. Hopefully that makes sense.

Expand full comment

https://medium.com/data-science-in-your-pocket/how-kimi-k2-is-trained-115b326a93a1

Victualis

Sep 13

The training data is explicitly scrambled by rephrasing it. Here is a lay summary:

Expand full comment

https://www.dbreunig.com/2025/07/27/kimi-applies-rephrasing-to-pre-training-data.html

Sep 13Edited

I understanding is that none of the training data scraped from peoples work on the Internet was scrambled. It was merely rephrased during post-training. So the copying still happened, then somebody took a step during post/training to rephrase. That means that the sense content was retained and non-consensual use of human data still happened.

Expand full comment

If we choose the anthropomorphic path, then we must ascribe all of the possible character flaws such as greed and selfishness. Can you imagine these flaws applied at AI scale and speed? I don’t think science fiction has even modeled this threat yet.

Expand full comment

Gert Braakman

Most of the trainingdata of AI is 'second hand' curated accounts of human behaviour as expressed in books in movies. This secondhand data only portrays a very small part of the messiness, the ugly truth of human existence. So what is mirrored back to us by AI is a polished, conflated image of us.

Expand full comment

Susan Knopfelmacher

Sep 20

Chat GPT will do it, it just gave me an extended answer under the heading:

Title: "The Cognitive Strip Mine"

Expand full comment

You mention NotebookLM (and others) being a black box. Of course that is true, and may always be when it comes to the point that uses the neural net.

But, due to frustrations I was having with NotebookLM, I built a multi-RAG solution that uses API calls to Gemini for its AI. It works MUCH better than NotebookLM, and since I am running the RAG DB at home, I can see exactly what it is doing when it picks the material to use as grounding sources.

Between better functionality (and I think I could customize it to be better than NotebookLM for virtually any writing task) and being able to see what’s going on in more places under the hood — there’s no comparison, I would never go back to using NotebookLM.

TL/DR: Use Python, Chroma DB, and your AI of choice to quickly create a better solution than any general commercial one.

Expand full comment

Kaihu Chen

Alan Wake's Paper Supplier

Ethan’s essay rightly captures the awe and utility of today’s AI “wizards,” but we must resist the temptation to merely marvel and adapt. If AI is now performing high-stakes reasoning — critiquing research, modeling businesses, shaping decisions — then we urgently need systematic methods to dissect its logic. Provisional trust is not enough. We should be building frameworks for transparent reasoning chains, verifiable audit trails, and domain-aware interpretability. Otherwise, we risk outsourcing judgment to systems we cannot interrogate, and forfeiting the very expertise we need to evaluate their work. The age of wizards demands not just literacy, but forensic scrutiny.

Expand full comment

> We'll keep summoning our wizards, checking what we can, and hoping the spells work. At nine minutes for a week's worth of analysis, how could we not?

Well, if we had a non-negligible amount of concern for our well-being, we would opt for the transparency of technology. It is troubling to me when people just shrug indifferently at issue like this. Maybe you prefer to live in a world of overwhelming intellectual opacity, though, in which case, I am equally concerned.

Expand full comment

Ethan Mollick

But the whole point of the post was expressing concern over what opaque systems mean for us, while acknowledging what they can do. No indifference here, but also a recognition that this is the state of AI today.

Expand full comment

Alan Wake's Paper Supplier

I suppose I have misread the tone. My instinct is typically to treat apparent neutrality w.r.t. an undesirable state of affairs as negligence or malice. Glad to be wrong on this occasion.

Expand full comment

Scott Shaffer

I had the same response that the tone felt neutral when we should be been VERY concerned.

But that shouldn't take away from the value of the post. It was very good and made me think!

Expand full comment

I think we should be very concerned also. But, unless someone wants to try and unpack a few billion weights into some kind of Bayesian model that is actually readable, I think we are entering an age where that part of the program is going to be a black box.

But I think what we are also seeing is people not used to, eg, unit testing (or any testing) thinking that AI is going to magically solve their problems. AI is still “junk in - junk out,” just like any programming. Yes, it will allow you to operate it as a black box, but should people be doing that? I can’t iterate on a solution if I don’t know what’s wrong. So the first thing I do is get rid of as much black-boxness as is reasonable to get rid of for a given project.

Expand full comment

Ed Surridge

Sep 12Edited

I know very little to all this but ask if top trully Open Source can't explain by doing what the closed models are doing then how we we know WTF is going on?

State have more thanassive resources and are sitting next to the top Devs where they want and taking what they want and bombing places

Expand full comment

Victualis

Sep 13Edited

As pointed out elsewhere this opacity is a design choice not an inherent feature of these systems. Writing text with LLMs can expose the trace of actions taken. This is not an AI problem, it is an OpenAI ChatGPT 5 Pro and NotebookLM problem.

Expand full comment

Bert Sirkin

There are two things that are evolving. One is the ChatBots themselves and the other is the users of the ChatBots. The more we use them, the more we start to realize that we had been using them all wrong to begin with. Humans naturally expect ChatBots to help like a co-worker, but they augment in much different ways. The biggest learning curve isn't in how fast ChatBots learn - it's how fast humans can learn.

Expand full comment

Sean Grimes

Sep 12Edited

This difference doesn’t feel that real to me. GPS failing could put you in a really bad spot. Depending on the drive, it could take a long time to find out.

It does seem more important these days to manage time horizon for feedback loops, and scale verification effort according to risk level. I personally like the move to verifying by outcome vs. intermediate step (at least so far, maybe it gets out of hand later).

> “But there's a crucial difference. When GPS fails, I find out quickly when I reach a dead end. When Netflix recommends the wrong movie, I just don't watch it. But when AI analyzes my research or transforms my spreadsheet, the better it gets, the harder it becomes to know if it's wrong.”

Expand full comment

He has a point. I’ve been put miles and miles off course by faulty address interpretation (eg, the GPS not knowing that “Highway 1 South” and “Highway 1 North” are not the same and you can’t just ignore the “North” or “South” because you didn’t find it in your database). I’m surprised by some of the really dumb things GPS does when I’d say it’s a very mature technology.

Expand full comment

Jeff Cunningham

GPS only provides an estimate of your device location - nothing more. Your confusing it with the route planning software interface wrapped around it, probably Google maps.

Expand full comment

Stephen Fitzpatrick

Sep 14

It's important to realize that the examples used in this post are the products of GPT-5 Pro - how many people have GPT-5 Pro? Isn't that the $200 per month plan? It's really tough to assert that the outputs are "really really good" at the same time acknowledging we have no way of knowing how the tool even does what it does, and then to add on top of that there are several different tiered versions throwing different levels of resources at each prompt and you have a recipe for even more confusion. The difference in ability of the priced models means when you read the online debates about AI, people aren't actually arguing about the same things.

Expand full comment

Susan Wittig Albert

Mollie Williams, DrPH, MPH

I'm an ordinary writer-person and very much appreciate the analogy. We all understand who wizards are and why they have to conceal themselves. Great teaching tool. I'm sharing parts of this post with readers (also ordinary non-techie folk) so they can understand where in the hell we are in this Brave New World.

Expand full comment

This advancement is both thrilling and frightening. I too have benefited from AI doing in 30 minutes what would have taken a part time intern a whole semester to complete. But then does that mean I don’t hire an intern or do I hire them to do more advanced work? I’m also curious what happens when we get 3 or 4 or 1000 layers deep into wizardry, ie my wizard creates something. I send it to you and your wizard transforms it and then someone else summons their wizard to use that information. Mutants? Informational cancer cells that reproduce uncontrollably? Does the dragon eat its tail?

Expand full comment

Michael S.