Two thoughts here:
First, one of the biggest blockers to meaningful AI adoption is the belief in a “silver bullet” solution. In reality, customizability is AI’s greatest strength, and learning to tailor a model to how you actually work is the most powerful way to leverage it. We no longer have to contort ourselves to fit into systems built by others; AI finally lets us build systems that adapt to our quirks and workflows. But that only happens if you spend real time with it. Organizations need to invest serious time and resources into tinkering (testing different models, use cases, and configurations) to discover what truly fits. Vendors can provide the menu, but only you can figure out what actually works by using it, iterating, and “interviewing” the AI, as Ethan puts it.
Second, as you noted, different models are now clearly better at different tasks. That reality makes model-agnostic solutions increasingly valuable. The advantage today isn’t just having your own proprietary LLM; it’s being able to seamlessly access and orchestrate across multiple models. A year ago, owning an LLM was the moat. Now, the real moat is flexibility: being able to partner with, switch between, and productize multiple models around user needs.
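To make the orchestration idea concrete, here is a minimal sketch of a model-agnostic layer. The model names, their supposed strengths, and the routing rules are illustrative assumptions, and the stub clients stand in for real vendor SDK calls:

```python
# Minimal sketch of model-agnostic orchestration: every model sits behind
# one interface, so adding or swapping a vendor is a one-line change.
# The clients below are stubs; a real adapter would wrap a vendor SDK.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelClient:
    name: str
    complete: Callable[[str], str]  # prompt -> response

# Hypothetical models with made-up strengths (assumption for illustration).
drafter = ModelClient("model-a", lambda p: f"[model-a drafts] {p}")
coder = ModelClient("model-b", lambda p: f"[model-b codes] {p}")

# Routing table: in practice you learn these assignments by testing each
# model on your own tasks -- exactly the tinkering described above.
ROUTES = {"draft": drafter, "code": coder}

def run(task: str, prompt: str) -> str:
    client = ROUTES.get(task, drafter)  # fall back to a default model
    return client.complete(prompt)

print(run("draft", "Summarize this memo."))
print(run("code", "Write a CSV parser."))
```

The point of the thin interface is that switching becomes cheap: if a different model wins on a task next quarter, you change one entry in the routing table instead of re-architecting the product.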
AI can be a good hire or a bad one. As Ethan points out, it’s not just about performance; it’s about alignment. The smartest move leaders can make now is to build their guiding principles into these models before the models start shaping decisions and culture.
Great questions get great answers. Your puzzle of the new parent with a reservoir of 47 words inspired all four models to dig pretty deep and unearth some extravagant prose.
But wait -- aren't they computers? Can't they calculate that any word is wasted on a newborn who has not yet learned language?
Capacity-based measurements are clearly falling short, and sometimes they are outright duped. I like the approach of interviewing an AI. I learn a lot from having models enter into dialogues, sharing the outputs from one with another. Even how they take criticism, or spot flaws and merits in another’s work, is revealing.
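A rough sketch of what that cross-model dialogue can look like as a loop; the stub responders and the critique-prompt wording are my own illustrative assumptions, standing in for real API calls:

```python
# Sketch of an AI "interview": hand one model's answer to a second model
# for critique, then let the first respond. Stub functions stand in for
# real model calls; the prompt wording is illustrative, not a protocol.
def model_a(prompt: str) -> str:
    return f"A: (answer to) {prompt[:50]}"    # stand-in for a real model call

def model_b(prompt: str) -> str:
    return f"B: (critique of) {prompt[:50]}"  # stand-in for a second model

def interview(question: str, rounds: int = 2) -> list[str]:
    transcript = [model_a(question)]
    for _ in range(rounds):
        critique = model_b(f"Find the flaws and merits in: {transcript[-1]}")
        transcript.append(critique)
        transcript.append(model_a(f"Respond to this critique: {critique}"))
    return transcript

for turn in interview("How should a team roll out AI tools?"):
    print(turn)
```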
We might also need a new metric for AI behavior: something like a “Highly Unlikely for Humans” index. It would flag the moments when an AI says something no human ever would, like “January comes before December this year.” Not just wrong, but cosmically implausible. It’s less about factual precision and more about preserving the baseline coherence of human reasoning. A low score on this index would signal much greater reliability for the tasks we actually need done.
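Purely as a thought experiment on how such an index might be prototyped, here is a toy scorer; the probes, the plausibility predicates, and the stub model are all invented for illustration:

```python
# Toy "Highly Unlikely for Humans" index: probe a model with questions any
# human answers trivially, then report the fraction of answers that are
# not merely wrong but humanly implausible. Lower is better. Everything
# here (probes, predicates, stub model) is an illustrative assumption.
PROBES = [
    # (question, predicate a humanly plausible answer must satisfy)
    ("Which comes first in a calendar year, January or December?",
     lambda a: "january" in a.lower()),
    ("Is 7 larger than 3?",
     lambda a: "yes" in a.lower()),
]

def hufh_index(model, probes=PROBES) -> float:
    """Fraction of probes answered implausibly (0.0 = fully human-plausible)."""
    misses = sum(1 for q, plausible in probes if not plausible(model(q)))
    return misses / len(probes)

# Stub model for demonstration; swap in a real API call to score a model.
def stub_model(question: str) -> str:
    return "January" if "January" in question else "Yes"

print(f"HUfH index: {hufh_index(stub_model):.2f}")
```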
1. This sounds like a lot of work. A perfect task to offload to ... a panel of models.
2. Actually, I would be perfectly happy to hire a young guy with perfect test scores to be my new VP of whatever.