Great post, but two parts of your advice feel a bit stuck in 2025.
#1. Delegation Documentation can be written WITH the AI, not alone. E.g., Claude Code has a tool called AskUserQuestionTool that will interview you with multiple-choice questions for as long as you want, building your Delegation Documentation for you.
#2. "Evaluate and review" should also be initially done by the AI, since effort is now free. You can prompt BOTH the task and its evaluation, and Claude Code can take on BOTH roles, going back and forth for you until the work product passes its OWN internal tests that it created as part of the task. When compute is free, AI iteration is free.
(Sample prompts of #1 and #2 are in the next comment.👇)
Asking humans to do these 2 steps outside of AI is just not necessary or efficient.
It's Sutton's Bitter Lesson. Don't be too clever; use more compute.
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
Yes, this is a mind shift. Imagine growing up in a 3rd world country where clean water and electricity are scarce, and then you move to the US and have unlimited clean water and power. It will take you a while to stop conserving both.
The same is true for Knowledge Work Effort (KWE). We all grew up in a world where KWE was scarce, and now it's plentiful, and we're still out here trying to conserve. It takes some unlearning!
Consider teaching your MBA students: Stop trying to conserve effort.
#1 - Sample prompt:
"Interview me in detail using the AskUserQuestionTool about literally anything: technical implementation, UI & UX, concerns, tradeoffs, etc. but make sure the questions are not obvious be very in-depth and continue interviewing me continually until it's complete, then write the spec to the file"
#2 - Sample prompt (h/t to @maxwellfinn on x.com):
"Before you return a version to me have 10 of the world's greatest advertorial experts on subjects like design, copywriting, psychology and CRO review the page, provide detailed feedback and rank it on a scale from 0-100. Each expert should include specific areas for improvement that are reflective of their ranking. If the average ranking isn't over a 90/100 then go back and improve it using the experts' specific recommendations and feedback until it is over a 90."
This also works for images. Want 5 matching slides? Have Claude go back and forth with Gemini Nano Banana (set it up as a Skill first) until Claude is happy with the final product.
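If it helps to see the shape of prompt #2, the loop boils down to something like this minimal sketch in Python (ask_model() is a hypothetical stand-in for whatever LLM call you use; this is not Claude Code's actual API):

```python
# Minimal sketch of the "task + self-evaluation" loop from prompt #2.
# ask_model() is a hypothetical placeholder, not a real API.

def ask_model(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM of choice")

def generate_until_good(task: str, threshold: int = 90, max_rounds: int = 5) -> str:
    draft = ask_model(task)
    for _ in range(max_rounds):
        review = ask_model(
            "Act as a panel of 10 experts in design, copywriting, psychology, and CRO. "
            "Score the work 0-100 each and give specific improvements. "
            "Reply with 'AVERAGE: <n>' on the first line, then the feedback.\n\n"
            f"Work to review:\n{draft}"
        )
        average = int(review.splitlines()[0].split(":")[1].strip())
        if average > threshold:
            break  # the work passed its own internal test
        draft = ask_model(
            "Revise the work using this expert feedback.\n\n"
            f"Feedback:\n{review}\n\nWork:\n{draft}"
        )
    return draft
```

The point of the sketch is just that generation and evaluation are the same cheap operation, so you can afford to run the pair until it converges.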
This post and Drew's reply blow my mind. I will try it.
Creating a skill is only possible through Claude Code, right? Or is there another alternative to create that workflow for matching slides?
Correct, ask Claude Code to walk you through making a Gemini Nano Banana skill
Useful, thanks!
It amuses me no end that "a bit stuck in 2025" at this point means - and I think this simultaneously points out the rapid ongoing revolution in...*gestures vaguely at everything all around us*... - **27 whole days ago** :D
AI can be good at both ideation and definition - the risk to counter as the Manager Of AI is that your AI is Eager To Please to some degree at every opportunity - and as has been true in AI/ML since time immemorial, using an adversarial approach (whether in one conversation, with liberal use of "red team that for (insert weakness, like truth or realism)", or with two LLMs working adversarially with each other) strengthens the whole.
Who are you? This is not meant to be a negative comment, I’m just blown away by this insight hidden in the comments section of an Ethan Mollick post.
Haha, thanks Mike! Just an engaged hobbyist. I'm a 40-something non-coder management consultant with an addiction to vibe-coding (Claude Code) and a Twitter feed full of folks discussing AI, just finding and collecting bits and pieces of insights like a bowerbird hoarding shiny items.
Want to chat sometime? Contact me here: https://forms.gle/Vz8C372NAugMWA9v8
Contacted you as well Drew - would love to chat with you. Where my mind goes is to a new era of knowledge work (and its implications for learning):
Pre-2023: Value = Human Skill x Human Time
2025: Value = (Human Prompting + AI Speed) - Human Evaluation Time
2026: Value = Human Intent x (AI Computation x AI Recursion)
Just emailed you!
I like your framework.
I like the explanation that AI is "middle to middle," not "end to end," but the middle part is definitely expanding outward in both directions, which I think is part of what you're trying to say.
I'm not a massive doommonger over resource use, but that is an interesting analogy (unlimited water and electricity) you've picked to suggest that compute is 'free'.
Replace "free" with "abundant, not scarce," and I think the point stands
Of course water/power/compute is not actually $0.
But if you grow up thinking of the thing as scarce, and then you live in a world where it's abundant, it takes deliberate practice to adjust your heuristics.
My point is that water is a limited resource, and so is electricity if your usage goes up faster than you build new generation capacity. These are both concerns related to the rapid increase in compute demand.
If demand exceeds supply then the price goes up - hence not 'free', whether you mean abundant or cheap.
The market can only fix that by increasing supply to the extent that the required natural resources are available (or substitutes can be found) and the externalities can be treated as someone else's problem.
We're talking past each other. I've always been talking about the level of the task, not the society level like you are.
When we do tasks in the world, we treat certain resources as essentially free -- like tap water, power -- and others as scarce -- like effort.
If a resource suddenly goes from "scarce" to "abundant," it breaks our heuristics.
Yes, sure, I was only commenting on the irony of your choice of analogy.
"Stuck in 2025"? Today is only 1-27-2026. Time flies. ChatGPT 5.2 was a Bingo release. Now the hot button is Clawdbot. Today progress is not 10x. It's more like 100x. Or even 1000x. The biggest progress killer? Lack of imagination - given the constraints of the past, which would be early 2025. Thanks for your comment.
wow, what a way to backhand someone by saying "stuck in 2025"
tongue-in-cheek of course ... it's a firehose of new capability every week, and we gotta work together to stay afloat
You assume that managers know what they want.
We think we know.
But I find that the acts of delegation and evaluation are precious opportunities to challenge and refine my initial plans. Whether with human or AI agents, explaining my idea often inspires rethinking it, and evaluating results always does.
Your model treats iterative delegation as mere inefficiency, overlooking its value as strategic refinement.
What a great and honest response.
Dov, you make a great point. The twin burdens of 1) setting goals and 2) providing context are easier said than done, even with fancy AI tools.
Ethan, this is the first “agentic AI” post I’ve read that treats the real bottleneck with respect: review time. Everyone loves the generation step. Nobody budgets for the “is this true, safe, and aligned with intent” step, which is where careers go to die quietly.
Also, the reframe that delegation is the new prompting is exactly right. PRDs, shot lists, five-paragraph orders, definition of done. Those were always translation devices from intent to execution. Now we just have cheaper labor and a higher risk of confidently wrong output. Management did not get softer. It got more literal.
Totally agree: generation is the easy part now; verification is the real tax. If a team doesn't budget for review (sources, edge cases, safety, does this match intent?), that's where the quiet failures happen: wrong numbers in a deck, a risky email sent, a confident bug shipped.
The winners will treat agentic AI like a production line: clear PRD + definition of done + checklists/tests + audit trail, then humans review only what truly needs judgment. The future feels less like "prompting" and more like review architecture: the orgs with the best evaluation + approval loops will scale fastest and break least.
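To make "humans review only what truly needs judgment" concrete, here's a minimal sketch of that routing idea (the checks are illustrative assumptions, not a real framework):

```python
# Minimal sketch of a review gate: cheap automated checks run first,
# and a human only sees work that passes them AND needs judgment.
from typing import Callable

CHECKS: dict[str, Callable[[str], bool]] = {
    "has_sources": lambda text: "http" in text,          # crude citation check
    "no_placeholders": lambda text: "TODO" not in text,  # unfinished sections
    "within_length": lambda text: len(text) < 5000,      # scope control
}

def route_for_review(draft: str, needs_judgment: bool) -> str:
    failed = [name for name, check in CHECKS.items() if not check(draft)]
    if failed:
        return f"back to agent: failed {failed}"  # machine-fixable, costs no human time
    if needs_judgment:
        return "human review queue"               # only judgment calls reach people
    return "approved"                             # met the definition of done
```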
I've actually been focused on this in my Substack. I have a series now called Architects of Automation, and on Monday I'll release a new series on online scams.
Love that direction, Architects of Automation is exactly the mindset shift we need. Building systems and exposing online scams hits both sides of the equation: scale the upside, reduce the downside.
Honestly, the next big edge isn’t just automation, it’s automation with fraud awareness baked in. Looking forward to Monday.
Sometimes it's where careers go to die loudly!
All too true
Here in undergraduate world, I'm marking and reviewing assignments. Only a scattering of hallucinated references - either cheap AI tools are doing this less now, or the students are paying more, or they've got wiser to the problem - but so, so many that have a bit of an AI vibe to them. Mostly, I think students are prompting the AI tool with questions and paraphrasing the responses, plus some use of AI to create an outline structure.
The basic summarising of reading and theory seems, on average, to be a lot better than a few years ago. The synthesis, analysis, and evaluation is often shallow. Marks similar. On the other hand, different cohorts and maybe a lot of this is just writing support like Grammarly getting a lot better, IDK ¯\_(ツ)_/¯.
I'm bothered that our knowledge of how much learning activity is being delegated seems largely based on anecdotal evidence (like mine), we don't seem to know how much it affects expertise development, and - relevant to this, very interesting, post - are we going to end up with a generation that no longer has the kind of subject expertise you're saying is crucial for effective management of agentic AIs?
Perhaps one lamentable outcome is that it will be more difficult to develop your own personal voice as an author. I guess the baseline for writing quality will be higher in some regards, as you pointed out, but it feels like there are some tradeoffs to be made.
In the end, writing is thinking, and at least so far it has been difficult to outsource thinking entirely. I try to guide my students towards thinking about structure and narrative for this reason, although they have to master the mechanics (citing etc.) as well.
The world evolves, and the tools evolve and the approach should evolve as well. If (and without context, it's a big IF) you're in a position to implement it - make turning in *the prompts and conversation with the AI* a part of handing in the assignment, too - since that's where the actual work and thinking about the topic has migrated and it shows how the learner approaches and thinks about the topic. It's "Show your work", evolved.
Are they using AI to learn (by engaging with the AI and asking questions or chasing down nuance, using the AI to write a study guide and flash cards, etc.) - or are they using AI to avoid learning (Write me a paper about...X).
I have almost complete freedom to implement that, if I wanted to. Here are some thoughts on doing that, in no particular order and without much filtering:
Marking that sort of thing strikes me as something like doubling the workload based on word count submitted.
To what extent would length of AI interaction become a proxy for quality of AI interaction, in the same way that, pre-LLM support, quality of written English tended to become a proxy for quality of thinking?
How possible is it to just get AI to produce the AI interaction?
It's only a question of reaching agreement, same as with any marking, but how do we weight the quality of AI interaction v the quality of the finished product?
What if "Write me a paper about...X" produces an assignment at least as good as "using AI to learn"?
What if "using AI to learn" involves less learning than not using AI?
What if we end up with graduates who are no longer able to take disorganised thoughts, synthesise, analyse, and turn them into organised thoughts, because everything they're working with has been turned into digestible AI output and organised for them?
In order, also without much filtering. =)
1. To my line of thinking, the assignment scope changes completely - the technology is here, and it's not ever going away. So perhaps it's less about identifying and parroting back to you (in their own words, filtered through whatever they Googled in 2014) the themes of the Iliad and the Odyssey, and more about engaging with the new technology in a challenging and meaningful manner. Don't write me a paper about the themes of the Odyssey. Instead: instruct the AI to pick a "hot take" from the Odyssey and have the AI defend that perspective first, while the learner tells the AI that this hot take is complete bullshit because (reasons). Then have the learner switch roles with the AI: the AI picks apart the original position while the Learner tries to defend it - even though they (may) believe that the position is complete bullshit.
The transcript is the deliverable.
The test (and test artifact) is a short series of questions *about the exchange*. e.g.
1 - What specific claims did the AI make in its initial defense of the “hot take,” and which of those claims did you directly challenge with textual evidence from the Odyssey rather than opinion or modern analogy?
2 - When roles were reversed, which weaknesses in the original position became most apparent to you, and how did defending it force you to reinterpret or recontextualize passages you initially thought disproved it?
3 - Across both roles, what changed in your understanding of the Odyssey’s themes, characters, or moral framework that would not have emerged from writing a traditional theme paper, and why did the adversarial exchange with the AI surface that change?
You're not expected to read everything qualitatively - you're necking down to just the prompts, the role-switch points, and the places where the Learner intervenes, corrects, or reframes the discussion for the AI. They are now intended to be *leading* the discussion. You're not grading verbosity (which is a thing in the traditional essay, anyway). You're grading the moves the Learner makes instead. A two-page transcript with sharp adversarial moves and role reversals is clearly better than 10 pages of "Tell me more".
AI generating AI interaction is not impossible, but it's certainly going to collapse under the mildest questioning. If a Learner cannot explain why they challenged a claim (or why they supported a bad hot take when it was their turn to do so in the debate), it's going to be pretty obvious. This isn't really fundamentally different from, say, contract cheating.
How do you weight? The quality of the interaction with the AI *is* the finished product. For a century we taught how to write an essay by hand as a skill-building assignment. One that was a proxy for learning how to *think*.
But now outstanding essays can be produced in seconds by AI, so we need a new proxy and need to use AI as a tool that now teaches how to think, instead. Every Instructor has the ability to engage in the Socratic Method - but you don't have to be in a group, and you definitely don't need to wear a toga. You can have the Learners engage with any and every Socrates on any and every Topic and you can evaluate (from their transcripts of those conversations) how they probed concepts, evaluated structure, and exposed contradictions through visible argument.
Your concern over loss of synthesis is solid, but it's not specific to AI. The process of having the Learner engage Socratically with the material - and especially the role reversal - makes synthesis of the knowledge unavoidable, since the Learner needs to actively take and defend a position they may personally reject, and defend it using the primary learning text, not "whatever the AI spat out before." This is doubly effective because the AI is going to be a World Class Expert on every scrap ever written about the Assignment, and is going to tear the Learner a new one as the Learner earnestly tries to defend a poorly defensible "hot take" position that they do not actually agree with in the first place.
As I look back on my own learning journey - there were only ever two skills being taught, and everything was a proxy for those two (and one silly) skills.
1. How to investigate something with the intent of understanding it well enough to make it personally useful in some capacity. (Math, Science, Civics (law and process), Facts)
2. How to evaluate and select a position on a topic and articulate that position in a cogent and persuasive manner. (Literature, History, Civics (politics), etc.)
3. Getting out of breath. (Physical Education, Band.)
Okay, that makes more sense to me.
I don't think the assignments we're setting are quite as exposed to a high quality AI response as writing about Homer's themes. We moved away from anything like a straight lit review (where an AI will beat most of us if you can work round the response length problem) a while back. However, I'm not naive enough to think students are not able to / choosing to generate structural outlines and/or firing a lot of questions at an AI and paraphrasing some of the output to make their work quite a lot better than it otherwise would be.
I still think your suggestion lends itself to having a couple of windows open and getting the AIs to take both roles but I agree with you that if we can combine what you're suggesting with an in-person assessment of basic knowledge and understanding, that's potentially a pretty strong defence.
Will consider further.
well, there's the tie-in back to the original post topic! "I still think your suggestion lends itself to having a couple of windows open and getting the AIs to take both roles..."
IMO, that's exactly the sort of modern skill you want to develop in a Learner - in a world where *managing AI* and distinguishing good output from bad is a valuable skill to have going forward.
I feel like your role as the instructor is leaning more into evaluating the quality of the *process* more so than the quality of the actual *output*.
I tried it. Without any in-depth knowledge of the text - I read The Odyssey once, years ago - I can produce what appears to be a sophisticated and detailed argument around the "hot take". It reminds me of watching a documentary series: as you watch, you feel like learning is pouring into your brain, but afterwards what you're left with is vibes and an amalgamation of memorable incidents that you probably can't even put into chronological order.
Since I'm not an expert on Ancient Greek literature, I also tried it with a topic where I am an expert. I could quibble a bit with one or two points, but the 'process' was easily 1st class UG level, or Distinction for MSc.
If a student was genuinely trying to argue with the AI, based on serious reading, and probably over several days as they went away to hone their questions and rebuttals, they would learn a lot, I think. If they just ask an AI to do that work, they will learn only a little. As the marker, though, I'd end up giving high marks to those asking the AI to do it, while those not asking the AI would get the usual spread of marks.
At the risk of sounding miserable, all the startups are indeed further down the road than they'd have been before, but the ideas themselves all seem generic and the big barriers to the ideas aren't speed but marketing cost.
There's a million "tailor-made advice" sites for parents, or "sell your ticket" sites; the issue is that media is very, very expensive.
The goal of AI should be to dream bigger, have bolder ideas, not to execute faster.
That's a lot of comments and not one mentioning The Enchanted Lighthouse! Do you happen to know if the game definitively works? I think the well is broken… I can't lower anything. Or find a way into the garden shed. And I've been trying. Relentlessly.
Yes, it works end to end! Here is how to win, according to the AI.
***SPOILERS***
Walkthrough: The Enchanted Lighthouse
Phase 1: Gather Basic Items
south → Cottage
take journal → Read for hints
examine rocking chair → Find cottage key
take key
east → Kitchen
open drawer → Find knife
take knife
open pantry → Find vial
take vial
take moonbane
west → Cottage
Phase 2: Garden & Shore Items
west → Garden
use key on shed → Unlocks shed
take oil
take bowl
examine bench → Find locket
take locket
scratch angel → Get resin (need knife)
south → Rocky Shore
take rope
use vial on pools → Collect glowing shells
combine oil with shells → Creates CELESTIAL OIL ✓
Phase 3: Get the Brass Key (Cave)
north → Garden
west → Cliffs
examine bench → Find love letter
take letter
east → Garden
Wait for "Tide: Low" message (every 10 moves), then:
south → Shore
east → Cave (only at low tide!)
talk captain
take sealed letter
open sealed letter → Get BRASS KEY ✓
west → Shore
Phase 4: Repair the Lens
north → Garden
east → Cottage
north → Lighthouse Base
up → Lighthouse Top
take fragment
take fragment
take fragment → All 3 lens pieces
take cloth
combine fragments with resin → Assembled lens
polish lens → LENS REPAIRED ✓
Phase 5: Sacred Spark Ritual
down → Lighthouse Base
south → Cottage
west → Garden
lower bowl → Ritual at well (need moonbane, bowl, locket, rope)
→ Creates GLOWING LOCKET (Sacred Spark) ✓
Phase 6: Victory!
east → Cottage
north → Lighthouse Base
up → Lighthouse Top
open mechanism → Use brass key
light flame → VICTORY!
🤦🏼♀️ I told it to look at the chair, not "rocking chair"… Interesting pinch point. Fun game, thanks for sharing!
I could have asked it to expand the parser a few more times, but I really wanted the AI to run with this on its own.
Parser games do as parser games do. Hugo's House of Horrors destroyed me as a kid! You've now given me ideas I didn't need. We shall see what comes of it!
Many of the software engineers I work with have trouble getting value from the agentic tools. But all of the engineering managers I know have found them to be very useful.
It’s very similar to working with junior software engineers - they are likely to do things a different way than you would do it, and if you give them a vague instruction, there’s a good chance you won’t get what you want. But you can generally move much faster with a team of ten junior engineers than you can when working by yourself.
After 17+ years coaching managers, the biggest struggle I see is that new managers can't evaluate work they didn't do themselves. With AI? That gets way worse—AI produces plausible-looking garbage at scale.
The frameworks you mentioned (PRDs, design docs) aren't just good prompts. They're what managers need anyway. AI just made them non-negotiable.
This is a helpful article. I think you've presented a plausible hypothesis here about how AI might spin out in conventional business contexts.
Something we already know is that AI is not automating core decision making, which is effectively a question of human ethics. The more you automate, the more morality and ethics you are necessarily handing over. There will be an obvious incentive to hand over more and more responsibility to try to 'get more done.' That is just one concern here.
A more prosaic question: will the economy - the broad, minimum-wage economy - actually benefit from being able to create software this easily? It seems like the commercial software market is often pretty saturated. Most games on Steam don't sell, etc. I guess one possibility is that this will unleash a wave of new entrepreneurship, but unless you're selling software that will be bought by a giant megacorporation, I'm not sure how you cash in on it. And aren't the megacorporations just automating all their own needs internally?
What I appreciate here is that you frame AI not as a shortcut, but as a decision process. Delegation isn’t about avoiding effort — it’s about choosing where human judgment adds the most value. The diagrams make that trade-off visible in a very grounded way. Thought-provoking work.
This resonates deeply. I've been running a solo SaaS operation for the past year using Claude Code as my entire engineering team — backend, frontend, infra, deployment. The delegation framework you describe is exactly what I've landed on through trial and error.
The missing variable in your equation that I'd add: domain expertise as a multiplier on Probability of Success. When I delegate infrastructure work to the AI (my specialty, 20 years in SRE), I catch errors in seconds and my prompts are surgical. When I delegate marketing copy, the evaluation time balloons because I'm not an expert there.
The real unlock isn't just "management skills" — it's that deep domain expertise now has 10x more leverage than it did before AI. A solo founder who actually understands their stack can now ship what used to require a team of 5-10 people. The bottleneck shifted from execution to taste and judgment.
100% - domain expertise is the multiplier people keep missing. When you're delegating inside your specialty (like infra/SRE), you can spot a bad assumption in seconds and steer the AI like a scalpel; outside your lane (marketing, brand, pricing), the output might look polished, but the cost shifts to evaluation.
That’s the real future signal: AI makes execution cheap, so advantage moves to taste + judgment + test loops (checklists, real user feedback, measurable definition of done). The best solo builders won’t just use AI, they’ll run a tight review system that lets one expert scale like a whole team.
Please consider "Aigents" as a good term for AI agents: the human manager/evaluator and all the various aigents.
Knowing what one wants, and being able to articulate reasons for evaluation, is indeed tough & valuable.
The last Superpower slide is great, yet also inspires improvement. The Human manager will talk, in English, to an aigent coordinator which (not who) communicates with all the other aigents, each of which is ripe for specialization in its area yet available to serve all the other coordinator aigents - including updating each aigent to work better as better ways of working are established.
Naturally not all aigents are listed, but the key auditing aigent will have a significant impact on the functioning of all aigents. Not yet much talked about is the likely requirement to log all requests to all aigents, from humans & aigents both, in daily growing text files.
“Aigents” is a great label, it captures the reality that humans won’t use one AI, they’ll manage a swarm: an orchestrator + specialist aigents + a human evaluator. And you’re spot on about the auditing aigent + logging. That flight recorder (who asked what, what data was used, what changed, what was approved) is what turns agent work from cool to trustable, especially in finance/health/ops. The next step is making those logs structured + replayable (with privacy + key-security baked in), so orgs can scale aigents without scaling risk.
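For what "structured + replayable" could look like in practice, here's a minimal sketch of a JSON Lines flight recorder (the field names are my own assumptions, not any standard schema):

```python
# Minimal sketch of an append-only agent audit log in JSON Lines.
# Field names are illustrative, not a standard.
import json, time, uuid

def log_aigent_request(path: str, requester: str, aigent: str, request: str,
                       data_used: list[str], result_summary: str,
                       approved_by: str | None = None) -> None:
    entry = {
        "id": str(uuid.uuid4()),           # stable id so entries can be replayed/referenced
        "ts": time.time(),                 # when it happened
        "requester": requester,            # human or aigent that asked
        "aigent": aigent,                  # which aigent did the work
        "request": request,                # what was asked
        "data_used": data_used,            # inputs consulted
        "result_summary": result_summary,  # what changed
        "approved_by": approved_by,        # who signed off, if anyone
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")  # append-only, one record per line
```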
Agree with the broad premise, but of the generated business ideas, how many are truly defensible? If MBAs can create almost-working software businesses in 4 days, so can the next person; anything good can be copied in less than a month.
Also, the act of delegation in management helps us refine what we actually want. Personally, I find AI can be somewhat helpful in this, but it's massively limited by sycophancy, lack of context, and a bias to always give an answer (even when it shouldn't).
Strong take - and I'd add: management is only a superpower if it includes the authority to not deploy. Coordination scales capability; governance preserves legitimacy. Without refusal, orchestration just accelerates risk.
What I keep thinking about is what happens when you move this from one person to a team. The leaders I work with who are getting the most out of AI are not the ones who delegate to it most efficiently alone.
They are the ones who bring the AI's output into the room, show the team what the AI produced, invite the pushback, and use the friction between the AI's answer and the team's judgment as the actual thinking process.
The management superpower may not just be knowing when to delegate. It may be knowing when to make the delegation visible, so the AI's draft becomes the starting point for collective sense-making rather than the final output. That is a different skill, and I'm not sure we've named it yet.
Fascinating experiment. I wrote a LinkedIn response expanding your framework for enterprise contexts — where organizational resistance, not build speed, is the real bottleneck. Would love your thoughts:
https://www.linkedin.com/feed/update/urn:li:activity:7430694201031524352/