(Good) categorisation beats empiricism

9/8/2025 ☼ categories ☼ empiricism ☼ strategy ☼ uncertainty

tl;dr: Categories aren’t academic conceits—they’re tools that enable more effective practical action. If you care about making AI work in real-world settings, or navigating business uncertainty, or more reliably drinking wine that you enjoy, then learning to think more rigorously about what types of things you’re dealing with can be a competitive advantage. This post shows how categories clarify decision-making, allow better interpretation of potentially misleading data, and make experimentation smarter and faster. Good categorisation pays off in practice.

A friend recently told me that my category-focused approach to understanding meaning and uncertainty doesn’t land for him. He said he’s more interested in mechanisms. For him, categories are inert—they bump up against his thinking and slide right off.

I find this both mystifying and revealing. I just can’t viscerally process that view. For me, categories are what make mechanisms visible. If you don’t have the right category, you’re much less likely to see the right mechanism—or even know where to look.

This is more than a philosophical preference. It’s a deeply practical epistemic position: Categories, when chosen well, are a form of leverage. They reduce the messiness of the world into distinctions that are useful for understanding and acting.

Why good categorisation matters in practice

Consider a frequentist analysis. You’re trying to understand the effect of an action (independent variable) on an outcome (dependent variable). If the dataset contains different types of actors—but you treat them all as one undifferentiated group—you may conclude that the action has no effect. In reality, it might have a strong positive effect for one subgroup and no or negative effects for others. You’ve missed the mechanism, not because the data failed you, but because your categories did.

A concrete example: Imagine trying to improve worker performance using bonuses. You have six distinct types of workers. One type is highly responsive—say, bonuses improve their performance 60% of the time. For the other five types, bonuses do little or even backfire. If you analyze everyone together without realising that these six types exist, let alone accounting for them separately, the average effect may appear negligible or misleading.
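To make the dilution concrete, here is a minimal arithmetic sketch in Python. Only the 60% figure for the responsive type comes from the example above. The 20% baseline, the rates for the other five types, and the assumption of equal group sizes are all hypothetical values chosen for illustration.

```python
# Assumed chance that a worker's performance improves WITHOUT a bonus.
# Hypothetical: the example above does not give a baseline.
BASELINE = 0.20

# Assumed chance of improvement WITH a bonus, per worker type.
# Only type A's 0.60 echoes the example; the rest are invented.
with_bonus = {"A": 0.60, "B": 0.20, "C": 0.20, "D": 0.15, "E": 0.10, "F": 0.05}

# Lift from the bonus within each type.
lift_by_type = {t: rate - BASELINE for t, rate in with_bonus.items()}

# With equal numbers of each type, the pooled lift is a plain average.
pooled_lift = sum(lift_by_type.values()) / len(lift_by_type)

for worker_type, lift in lift_by_type.items():
    print(f"type {worker_type}: {lift:+.0%}")
print(f"pooled: {pooled_lift:+.1%}")
```

Under these assumptions, type A gains 40 percentage points while the pooled lift comes out at under 2 points, which would be easy to dismiss as noise.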

That’s not because bonuses don’t work. It’s because the analysis failed to recognise that the effect differs between categories of workers. This is why we use stratification and interaction terms in regression models: their function is to capture how effects differ across categories in the data. Control variables do a related job, adjusting for category differences that would otherwise distort the estimate. This is why accurate categorisation is not academic hair-splitting. It’s how you detect mechanisms that actually exist.
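The difference between a pooled and a stratified analysis can be sketched on simulated records. This is an illustration under stated assumptions, not a full regression: a simple difference in improvement rates stands in for stratified or interaction-term models, and every rate except type A’s 0.60 is hypothetical.

```python
import random
from collections import defaultdict

random.seed(42)  # fixed seed so reruns produce the same records

# Assumed chance of improved performance: (without bonus, with bonus).
# Hypothetical values; only type A's 0.60 echoes the worker example.
RATES = {
    "A": (0.20, 0.60),
    "B": (0.20, 0.20),
    "C": (0.20, 0.20),
    "D": (0.20, 0.15),
    "E": (0.20, 0.10),
    "F": (0.20, 0.05),
}

def simulate(n_per_cell=2000):
    """Return records of (worker_type, got_bonus, improved)."""
    records = []
    for worker_type, (p_without, p_with) in RATES.items():
        for got_bonus, p in ((False, p_without), (True, p_with)):
            for _ in range(n_per_cell):
                records.append((worker_type, got_bonus, random.random() < p))
    return records

def rate(flags):
    """Share of True values in a list of booleans."""
    return sum(flags) / len(flags)

def bonus_effect(records):
    """Improvement rate with a bonus minus improvement rate without one."""
    treated = [improved for _, got_bonus, improved in records if got_bonus]
    control = [improved for _, got_bonus, improved in records if not got_bonus]
    return rate(treated) - rate(control)

records = simulate()

# Pooled analysis: ignore worker type entirely.
pooled = bonus_effect(records)

# Stratified analysis: estimate the effect separately within each type.
by_type = defaultdict(list)
for record in records:
    by_type[record[0]].append(record)
stratified = {t: bonus_effect(rs) for t, rs in sorted(by_type.items())}

print(f"pooled: {pooled:+.3f}")
for worker_type, effect in stratified.items():
    print(f"type {worker_type}: {effect:+.3f}")
```

The pooled estimate sits close to zero, while the per-type estimates recover a large positive effect for type A and negative effects for the types where bonuses backfire. The catch is that the stratified analysis is only possible if the worker types have been identified in the first place.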

Good categories are “true” names for groups of things

In biology, taxonomy (the way we categorise living things) used to be based on how things looked. That made sense, because visually inspectable form was all we had access to. But something amazing happened with the development of molecular phylogenetics (the use of molecular data such as genetic information to study the evolutionary relationships between organisms). Researchers found that genetic differences often correlated strongly with differences in energy metabolism, diet, and habitat. These weren’t just arbitrary genetic markers. They denoted real, consequential differences in how different types of organisms lived, and the resulting categorisations were both accurate and transformatively useful in understanding those organisms.

I worked for a while with researchers in the International Barcode of Life project. They sequence a conserved section of mitochondrial DNA: a ~650 base pair region of the cytochrome c oxidase I (COI) gene. Even when trained entomologists couldn’t visually distinguish between two nearly identical caterpillars, this DNA barcode could. More importantly, the genetic difference often mapped to meaningful ecological distinctions. One example is genetically distinct but visually indistinguishable species of caterpillar, with each species feeding exclusively on the leaves of one species of tree and dying if fed leaves from other tree species.¹

The COI sequence was acting like a “true name” for these caterpillar species: A label that isn’t superficial, but actually reveals something important about what the thing is, in a way that is relevant for the actions we take on it. When you know what category a thing is in, you have a better sense of what it will do, how it will respond, and how to interact with it. More importantly, you also have a strong theory of how to act on all other things in the same category. This is why categories are good to have.

But not all categorisations are good, even if they’re accurate.

Not good: Accurate categories that aren’t useful

Some categorisations are accurate in that they map to divisions in the world that do exist. However, a useful categorisation must do two things: it must be accurate, and it must map onto mechanisms that differ by category, so that it supports different actions for different categories. This means that it’s possible to have accurate categories that aren’t useful.

When purchasing fruit and vegetables to cook with, it usually makes no sense to choose them based on whether they’re a funny shape. Apples grown well taste the same whether or not they’re misshapen. Nonetheless, consumers seem to put funny-looking produce into a different category from normal-looking produce … and disproportionately reject or discard it, assuming from its looks that it will taste worse.

Bad, bad, bad: Accurate categories that are worse than useless

Accurate categories can be worse than merely unuseful: they can be actively counterproductive.

Back in the realm of fruit, a perfect-looking, fully red tomato is often categorised as “ripe, tastes more like tomato” compared to a tomato that’s still green around the stem. The visual distinction driving the categorisation is accurate (some tomatoes get fully red when fully ripe, others remain green around the stem when fully ripe), but acting on it is counterproductive if your intention is to choose a tomato to eat that is very delicious. The SlGLK2 gene is associated with better tomato aroma and flavour and, causally, with the visual appearance of green colouration even in ripe tomatoes. Unfortunately, nearly all mass-market tomato varieties have had SlGLK2 bred out of them in pursuit of complete redness. Right now, selecting a fully ripened green-shouldered tomato over a fully red one will pretty reliably get you more tomato flavour. Choosing tomatoes for flavour using an accurate visual categorisation based on full redness is counterproductive.

Implicit bias is a pervasive example of this type of bad categorisation in business. For instance, it is not inaccurate to see that employees in an organisation are of different heights. But using a height-based categorisation as a way to infer greater employability, better leadership qualities, or greater cognitive and non-cognitive ability is at minimum poorly supported by evidence. Hiring or promoting people based on accurate categorisations (about height or other characteristics like skin colour or country of origin) which are not systematically connected to actual performance isn’t just neutral in effect. It often leads to worse outcomes for organisations, such as productivity loss or high employee dissatisfaction and turnover.

Why not just focus on functional mechanisms?

There’s an alternative view, of course: Forget categories, just observe mechanisms. Empiricism. Try stuff, see what works and reject any theory that doesn’t manifest as a functioning mechanism. To be fair, this brute-force empiricist method often does work—eventually. But it’s rarely efficient and it is sometimes ineffective.

This inefficiency and inefficacy become a real liability when the terrain to be explored is large, rugged, and difficult to understand, as in complex, rapidly changing, uncertain (not just risky) environments. In those situations, thinking rigorously about categories makes it easier to generate good hypotheses: Ideas about what might work, for whom, and why. Having good (accurate and useful) categories helps you throw things at the wall that are more likely to stick, because you’ve thought more carefully about what kind of wall it is, what you want to throw, and where you want to throw it.

Aim for good (accurate and useful) categories

Empiricism works as long as you’re willing to tolerate inefficiency and inefficacy, especially in an increasingly uncertain (not just risky) world.

If you’re not willing to tolerate that, good categorisation is your answer. But not all categories are good. Categorisation must be both accurate and useful to be good. (There are many, many categorisations which are accurate but unuseful or even counterproductive.) Here’s a table to summarise:

Effect on action    Accurate categorisation            Inaccurate categorisation
Useful for action   GOOD; we should all aim for this.  BAD
Neutral for action  WHY BOTHER                         MEH
Counterproductive   BAD                                BAD

If you have good (accurate and useful) categories, you have a way to figure out which kinds of things behave similarly in terms of actions you want to take. Why wouldn’t you use good categories as the foundation for decision-making and strategy?

I would, and I do.

For the last few years, I’ve been wrestling with the practical challenges of meaning-making in our increasingly AI-saturated world, developing frameworks for how humans can work effectively alongside these powerful tools while preserving the meaning-making work that is the irreplaceably human part of the reasoning we do. I’ve published this as a short series of essays on meaning-making as a valuable but overlooked lens for understanding and using AI tools.

I’ve also been working on turning discomfort into something productive. idk is the first of these tools for productive discomfort.

And I’ve spent the last 15 years investigating how organisations can succeed in uncertain times. The Uncertainty Mindset is my book about how to design organisations that thrive in uncertainty and can clearly distinguish it from risk.

¹ COI barcoding works better in some taxonomic groups than others. It’s very effective in many insects and vertebrates, but less so in taxa where hybridization is common or where mitochondrial DNA doesn’t track neatly with ecological function. Still, the idea holds: When a category aligns with meaningful differences in the world, it gives you leverage to understand and act more effectively.