Discussion about this post

Peter Dickison

While reading this, a quote from Lord of the Rings sprang to mind which I hope isn’t foreshadowing something: “Do not meddle in the affairs of wizards, for they are subtle and quick to anger.” I’m also waiting for the day I question an output and the AI responds with, “Just trust me, bro.”

Nick C

It seems a lot of the problem you are describing flows from system opacity, which is fundamentally a design choice. At the end of the day an LLM agent is just a loop of tool calls and such, which the designers can choose to expose to the user or not. The difficulty of verifying is directly proportional to deliberate system opacity. As you noted, some platforms choose to expose more than others.

Claude Code is a good example of this: you can watch the reasoning and tool calls in real time. I frequently interrupt if I see Claude going down a wrong path, or banging its head against a problem because it missed something fundamental. Most often, I catch Claude struggling with a failing test and then just saying to itself "This isn't a big deal, I should just summarize what I've completed for the user" and then I'll be like, "so hey, saw you didn't actually get that test to pass, what's up?"

When I'm building my dumb little AI applications, transparency is always at the core of the app - exposing to the user what context the model was working with, what the model did, and the model's stated reasoning, specifically for verification purposes. This is a design choice.

Users can and should demand transparency, or at least the option of transparency, from the AI systems they use. Otherwise, how can people actually use these tools for real things? "Where did you come up with this number?" "Dunno, AI told me." That's just not going to fly for real-world uses. But "AI had access to such and such data and it did x, y, z, and I verified x and y; I couldn't verify z directly, but the reasoning and result made sense" is far more acceptable (and less prone to risk exposure).
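To make the "loop of tool calls" point concrete, here is a minimal sketch in Python of an agent loop that records each step for the user to inspect. Everything in it is hypothetical and for illustration only: `Step`, `Trace`, `run_agent`, `stub_model`, and the `sum_column` tool are made-up names, and the "model" is a hard-coded stub rather than a real LLM call. It is not how Claude Code or any specific platform works.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One iteration of the agent loop, kept so the user can inspect it."""
    context: str                     # what the model was given
    reasoning: str                   # the model's stated reasoning
    tool: str                        # which tool it called
    args: dict                       # arguments passed to the tool
    result: str                      # what the tool returned
    verified: Optional[bool] = None  # None means the user has not checked it


@dataclass
class Trace:
    steps: List[Step] = field(default_factory=list)

    def unverified(self) -> List[Step]:
        """Steps the user has not confirmed."""
        return [s for s in self.steps if s.verified is not True]

    def render(self) -> str:
        """Plain-text view of what the agent saw, did, and said."""
        lines = []
        for i, s in enumerate(self.steps, start=1):
            status = {True: "verified", False: "rejected", None: "unverified"}[s.verified]
            lines.append(f"[{i}] {s.tool}({s.args}) -> {s.result} ({status})")
            lines.append(f"    reasoning: {s.reasoning}")
        return "\n".join(lines)


def run_agent(model: Callable[[str], dict],
              tools: Dict[str, Callable[..., str]],
              task: str,
              max_steps: int = 5) -> Tuple[str, Trace]:
    """Call the model in a loop, run the tool it picks, and log every step."""
    trace = Trace()
    context = task
    for _ in range(max_steps):
        decision = model(context)
        if decision["tool"] is None:  # the model chose to answer
            return decision["reasoning"], trace
        result = tools[decision["tool"]](**decision["args"])
        trace.steps.append(Step(context, decision["reasoning"],
                                decision["tool"], decision["args"], result))
        context = f"{context}\n{decision['tool']} -> {result}"
    return "stopped: step limit reached", trace


def stub_model(context: str) -> dict:
    """Hypothetical stand-in for an LLM: one tool call, then an answer."""
    if "sum_column ->" not in context:
        return {"reasoning": "I need the total before answering.",
                "tool": "sum_column", "args": {"values": [3, 4, 5]}}
    return {"reasoning": "The total is 12.", "tool": None, "args": {}}


tools = {"sum_column": lambda values: str(sum(values))}
answer, trace = run_agent(stub_model, tools, "What is the total?")
print(answer)
print(trace.render())
```

The only design decision that matters here is that `run_agent` returns the trace alongside the answer instead of discarding it. A user can then mark individual steps as verified and see which ones they are still taking on trust, which is the "I verified x and y, though I couldn't verify z" situation.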
