The AI expertise conundrum

13/7/2025 ☼ AI, meaning-making, expertise, innovation

tl;dr: Current LLMs can’t truly innovate or create new knowledge on their own, but they can help humans do that innovation work more quickly. LLMs work best as eager research assistants: good at mapping known landscapes, bad at deciding what matters. So, paradoxically, they’re most useful to people with enough domain expertise to ask good questions and spot flaws — but they leave novices vulnerable to plausible but biased or simply incorrect outputs. If your organisation is deploying AI as a creativity engine or innovation driver, maybe reconsider. The smarter approach: Design AI use around its real affordances (information synthesis, not autonomous creation) and build AI workflows that keep human meaning-making at the centre.

Two years after ChatGPT launched, I often see a collective … wishful belief? delusion? … about the current crop of AI systems and particularly about large language models (LLMs). The belief is: These AI systems are creativity engines, breakthrough generators, digital oracles that can synthesize new knowledge from the ether, machines that will soon be able to do everything that humans can.

From my own experience (caveat: using only commercially available LLMs from Anthropic, OpenAI, and Google) it’s clear that they are not any of those things.

The AI systems we have now are tremendously sophisticated and powerful, but they’re still information synthesis machines operating on the output artifacts of many generations of humans making inherently subjective decisions about what things are valuable — outputs of thousands of years of human meaning-making. They don’t make meaning themselves, but they can help sophisticated humans do meaning-making. They’re fantastically cool tools, but they’re not more than tools. Understanding that LLMs are meaning-making tools — not totipotent entities with their own agency — makes many design principles for using LLMs fall into place.

(I use the term “meaning-making” here in a specific way, to refer to any decision we make about the subjective value of a thing. Meaning-making includes any moral decision, judgment call, and aesthetic preference that results in, for instance, an output artifact like the receipt from a buying decision, a written decision by a judge, or an official statement supporting an action. For more on meaning-making, take a look at this article I wrote a few years ago.)

The challenge we face now isn’t principally that we’re prompting badly or using the wrong models (though that is also often the case). It’s that we’ve misunderstood what these tools are good at, and in doing so, we’re stuck in a conundrum about who can actually use them effectively.

What LLMs do

LLMs are probability machines. They can tell you what words (and groups of words) are most likely to follow other words (and groups of words). This lets LLMs analyse inputs and produce outputs at the level of phrases and sentences representing concepts and arguments.

The probabilities here come out of how LLMs are trained: They develop these probabilities by processing datasets comprising artifacts of human meaning-making work — books, reports, papers, blogposts, lyrics, etc — to see the frequency of association between words and phrases.
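
To make that concrete, here is a deliberately tiny sketch of the kind of statistical association an LLM learns at vastly greater scale. The three-sentence “corpus” is invented for illustration; real models train on trillions of tokens and learn far richer patterns than word-pair counts, but the underlying move is the same: predicting what plausibly comes next, based on what humans have already written.

```python
from collections import Counter, defaultdict

# A toy stand-in for the "artifacts of human meaning-making" an LLM is
# trained on. (Illustrative only: real models learn far richer
# statistics than word-pair counts.)
corpus = [
    "photosynthesis converts light into chemical energy",
    "photosynthesis converts light energy into sugars",
    "plants use photosynthesis to turn light into energy",
]

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        follows[current_word][next_word] += 1

# "Which word is most likely to follow 'light'?" is the same kind of
# probability judgement described above, at a toy scale.
total = sum(follows["light"].values())
for word, count in follows["light"].most_common():
    print(f"P({word!r} | 'light') = {count / total:.2f}")
```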

For a concept that is described using broadly the same words in the training dataset, the LLM can reliably analyse the concept and produce outputs consistent with how it was represented in that dataset. Ask an LLM to explain “photosynthesis” and you can expect to get something close to the current scientific consensus — because there is broad agreement about what photosynthesis is, and it is usually described in similar ways. (In other words, the “right” patterns of words and phrases occur with high frequency in relation to “photosynthesis” in the training dataset.)

But the same training process makes an LLM unreliable about concepts that don’t have broad agreement in the training dataset. Ask an LLM trained on economic theory what causes economic recessions and the response may lean toward a monetarist explanation, maybe a Keynesian one, maybe an Austrian one, or maybe one that implicitly accepts a range of competing explanations as equally valid. The response will average across many years of economic debate. Unless you already know that there is no consensus view of the causes of recessions, the LLM’s output reads plausibly as settled knowledge — even when it is not.
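
The same toy setup shows why. In the sketch below (the mini-corpus and the labels for the schools of thought are invented and heavily simplified), the learned distribution faithfully mixes the competing explanations, but any single generated answer collapses that mixture into one confident-sounding sentence, with no signal that experts disagree.

```python
from collections import Counter
import random

# An invented training set containing competing framings of an
# unsettled question (counts and labels simplified for illustration).
training_claims = (
    ["recessions are caused by overly tight monetary policy"] * 5    # monetarist
    + ["recessions are caused by collapsing aggregate demand"] * 4   # Keynesian
    + ["recessions are caused by credit-fuelled malinvestment"] * 3  # Austrian
)

prefix = "recessions are caused by "
continuations = Counter(claim.removeprefix(prefix) for claim in training_claims)
total = sum(continuations.values())

# The learned "view" is a mixture of the competing schools...
for answer, count in continuations.most_common():
    print(f"{count / total:.0%}  {answer}")

# ...but a single generated answer hides that mixture behind one
# fluent, confident-sounding sentence.
random.seed(0)
picked = random.choices(list(continuations), weights=list(continuations.values()))[0]
print("Generated answer:", prefix + picked)
```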

The more controversial or unsettled a topic is, the more you need to know about it to use LLMs effectively to understand the topic and produce content about it.

Knowledge Catch-22

This implies that you need substantial domain expertise to prompt an LLM to get a response that “accurately” reflects the texture of the underlying domain.

Novices who most need help navigating a complex topic they’re unfamiliar with can’t recognise when they’re getting incomplete or biased views. Someone asking about climate change who is not steeped in the current debate might not realise they’re getting only one framing of a complex scientific discussion. A person new to artificial intelligence might not understand that questions about machine consciousness touch on contested philosophical territory.

Meanwhile, experts can craft sophisticated prompts and spot incomplete answers, but they already possess much of the knowledge the LLM would provide. LLMs work best for people with intermediate expertise — those who know enough to ask good questions and spot problems, but not enough to make the tools redundant. It’s a narrow sweet spot.

The innovation trap

The confusion deepens when people try to use LLMs to produce innovation. The technology industry has encouraged this, promoting AI as a creative partner that generates breakthrough insights. But LLMs can’t create true innovation on their own. They need a human somewhere in the loop. LLMs can recombine existing information to produce outputs, but the human user will end up being the arbiter of whether the novel recombination is interesting and useful (or not).

Where LLMs have been proving valuable for innovation work is in mapping existing landscapes. Instead of asking them to innovate, use them to understand what already exists so you can identify where innovation might be found. Ask an LLM to show you the current state of research in your field, or to compare your ideas to existing knowledge. Both approaches avoid asking the system to be creative and instead leverage its strength as an information retrieval and synthesis system.
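
As a sketch of what that looks like in practice, here is a landscape-mapping prompt wired up with the Anthropic Python SDK (one of the commercial providers mentioned above); any provider’s SDK works the same way. The model name and the prompt wording are placeholders to adapt, not recommendations.

```python
# pip install anthropic; expects ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-3-5-sonnet-latest"  # placeholder: use whichever model you have access to

def map_landscape(topic: str) -> str:
    """Ask the model to synthesise existing knowledge, not to invent new ideas."""
    prompt = (
        f"Summarise the current state of research on {topic}. "
        "List the main schools of thought, where they agree, where they disagree, "
        "and note explicitly where the evidence is thin or contested."
    )
    response = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

# The human still does the meaning-making: deciding which gaps in the
# resulting map are worth pursuing.
# print(map_landscape("the causes of economic recessions"))
```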

Maybe a good way to think of LLMs is as research assistants who have read a huge amount within a domain and can synthesize that information. They can tell you how people currently think about a topic (by presenting the words/phrases most used to describe the topic), what research has been done, how different perspectives compare. But like any research assistant, they cannot do the creative work of identifying whether what’s missing from the available knowledge is important or not. They are also not good, right now, at explaining what is missing from the training dataset from which their responses are ultimately drawn.

Expertise arms race

This creates a troubling long-term conundrum. As LLMs become more sophisticated, the gap between effective and ineffective use widens rather than narrows. Those with domain expertise will become increasingly adept at extracting value from LLMs, while those without such expertise may find themselves more dependent on systems and system-outputs they cannot properly evaluate.

The democratisation of knowledge that AI advocates promise seems instead to reinforce existing hierarchies of expertise. Access to information becomes less important than the ability to assess and contextualize that information.

A framework for clear thinking

The most productive approach, in my view, is to be clear about what you’re asking these systems to do. If you’re trying to understand the state of established knowledge, LLMs can be powerful synthesis tools — but you need enough domain knowledge to ask the right questions and evaluate the answers. If you’re trying to do something innovative, use LLMs to map the existing landscape rather than generate new ideas.

These uses reflect the affordances of these tools: They excel at information synthesis and are definitionally incapable of genuine creativity without a human to do the meaning-making work. The sooner we align our expectations with this reality, the more useful these tools will become.

Uncharted territory

These observations exist within a broader landscape of AI criticism, but they reveal gaps in how we’ve been analysing these systems. The technology industry has debated whether LLMs truly “democratise” access to AI, with many arguing that technical barriers remain significant. Academic researchers have explored various evaluation paradoxes and examined training data sustainability. The research assistant analogy appears frequently in discussions of AI tools, but typically as an unqualified positive rather than a framework that highlights expertise barriers.

What’s missing is systematic examination of the expertise paradox at the heart of practical LLM usage. While researchers have noted training data limitations and consensus mechanisms for improving reliability, they’ve focused on technical solutions rather than fundamental user knowledge requirements.

This analysis suggests that rather than democratizing expertise, these tools may actually reinforce existing knowledge hierarchies. The observation that LLMs work best for users with intermediate expertise points to a narrow and perhaps shrinking sweet spot for genuine utility.

The framework for thinking strategically about when and how to use LLMs — distinguishing between established and novel tasks, evaluating training data relevance, recognising the consensus detection problem — represents a more nuanced approach than prevailing narratives of either techno-optimism or techno-pessimism. The most important questions about artificial intelligence may not be about the technology itself, but about the knowledge structures and intellectual hierarchies that shape how we can meaningfully engage with it.


For the last few years, I’ve been wrestling with the practical challenges of meaning-making in our increasingly AI-saturated world, developing frameworks for how humans can work effectively alongside these powerful tools while preserving the meaning-making work that is the irreplaceably human part of the reasoning we do. I’ve published this as a short series of essays on meaning-making as a valuable but overlooked lens for understanding and using AI tools.

I’ve also been working on turning discomfort into something productive. idk is the first of these tools for productive discomfort.

And I’ve spent the last 15 years investigating how organisations can succeed in uncertain times. The Uncertainty Mindset is my book about how to design organisations that thrive in uncertainty and can clearly distinguish it from risk.