Designing AI tools that support critical thinking

20/8/2025 ☼ not-knowing · experience · risk · uncertainty

Current AI interfaces lull us into thinking we’re talking to something that can make meaningful judgments about what’s valuable. We’re not — we’re using tools that are tremendously powerful but nonetheless can’t do “meaningmaking” work (the work of deciding what matters, what’s worth pursuing).

I developed a pen-and-paper prototype designed to isolate the core mechanisms of thinking critically while using AI tools, and tested it with first-year undergraduates. Participants used a structured worksheet to simulate a different kind of AI tool user experience for writing a strongly reasoned argument. The main difference in the UX is that it pushes them to do iterative meaningmaking work themselves, while articulating what non-meaningmaking work AI tools could help them with. The results of this experiment were compelling and encouraging: students went from vague proposals to sharp arguments in two hours.

These results suggest that it’s possible to design AI interfaces that clearly separate what humans must do from what machines can help with, laying the groundwork for an AI-powered critical thinking tool. I’m now looking for educational institutions to pilot such a tool.

This research is supported by the Future of Life Foundation’s Programme on AI for Human Reasoning, in which I’m a fellow.

The problem with mainstream AI UX today

Most people use generative AI through an empty prompt box that looks like a chat interface — think ChatGPT or Claude. You type something (anything) into the box, press Enter, and whatever’s on the other side replies with increasingly high-quality words, images, or video that could’ve been created by a human. But a machine made that content.

This unparameterised chat experience creates a profound misunderstanding about what’s actually happening when we interact with AI systems. The interface suggests you’re communicating with a meaningmaking entity on the other side of the box. Meaningmaking is making inherently subjective decisions about what’s valuable: what’s desirable or undesirable, what’s right or wrong. The machines behind the prompt box are remarkable tools, but they’re not meaningmaking entities.

When we type into the prompt box, we’re not writing to another human with true agency. We’re communicating with a tool, hoping to use it for a job we must define and choose success metrics for. Our problem with AI tools is that they’re so sophisticated it’s too easy to forget they’re only tools. Add to this the understandable hype from AI companies about AGI’s proximity and the yearning talk from big corporations about agentic AI’s potential.

AI systems as tools, not more

We need to foreground tool-use logic when thinking with and using AI tools. This becomes obvious when we consider tools that can’t be mistaken for meaningmaking entities — like a kitchen knife.

I don’t ask my knife what I should eat for dinner or how I should evaluate whether it was delicious. It’s just a tool. But a tool’s properties matter for how it’s used. My knife’s qualities (sharpness, shape, weight, length) and my skill in using it affect what I can cut, how I can cut it, and therefore what I can cook. With a long, very sharp knife and years of training, I could cut fish precisely enough to make passable sushi. With a blunt paring knife, I’d be a fool to try.

The tool (the knife) plays an important role I can’t fulfill in the cooking process. I keep the decisions about subjective value for myself: What do I want to eat and how do I decide on its quality when I eat it? These are inherently subjective questions driven by personal preferences, morality, and social conditioning. Why would I ask my knife to answer those questions? I clearly separate what I must do as a human tool user from what the tool can do better than me. This is tool-use logic.

We must foreground this logic with AI tools because they’re probably among the most sophisticated and polyvalent tools ever built. But we don’t do this tool-oriented critical thinking consistently, nor are we taught to do it systematically. That’s why we see people asking LLMs to provide therapy or a US Supreme Court justice complimenting an LLM’s legal analysis.

This fundamental confusion leads to poor outcomes and missed opportunities for effective human-AI collaboration. Critical thinking about tool use requires two things. First, users must understand clearly what they’re trying to produce with the tool — this is fundamentally an exercise in critical thinking about goals, values, and desired outcomes, all inherently subjective. Second, the tool must display its affordances clearly, showing what it can and cannot do so users can make informed decisions about employing those capabilities. The empty chat box experience fails catastrophically at the second requirement. It doesn’t highlight what the LLM or other machine system can and cannot do.

Better AI UX that supports critical thinking

This interface design problem intersects with a deeper conceptual challenge around meaningmaking. Meaningmaking includes all judgment calls, aesthetic preferences, and moral decisions that require determining what’s worth pursuing or believing. Here’s the critical insight: only humans can do meaningmaking work, while machines (including AI systems) cannot do meaningmaking work at all (yet).

When AI systems produce content like long passages of text that appears to contain meaningful judgments of value, users face what I call AI’s seductive mirage — the illusion that the system’s engaging in all the same kinds of reasoning and decision-making that humans can do. It takes constant reminders and active attention to realize that AI systems’ current human-like output ultimately relies on meaningmaking work done by humans. This makes it crucial for humans to consciously decide what AI-generated content actually means.

How do we design better UX for AI tools, where better means foregrounding tool-use logic? This requires rethinking both elements of good tool use.

For critical thinking about goals and values, we need structured approaches that help users clarify what they’re actually trying to accomplish and why it matters — the meaningmaking work only humans can do. For tool affordances, we need interfaces that make explicit what AI systems do well (information gathering, pattern recognition, rule-following) and what they can’t do (make subjective judgments about value, worth, and meaning).

My research has focused on addressing both requirements simultaneously by changing how we think about the prompt entry field from an empty box to a structured, iterative scaffold for successive prompts that elicits rich context about subjective value over multiple interactions.

The approach I’ve been testing helps users engage in critical thinking about their goals and definitions of success while making the AI system’s capabilities and limitations explicit at each step. This would enable AI systems and their human users to work interdependently, each doing the fundamentally different things they do well, rather than having users fall into the trap of treating AI systems as human-equivalent conversational partners instead of sophisticated tools with many genuine strengths.
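To make the idea of a structured, iterative prompt scaffold a little more concrete, here is a minimal Python sketch. Everything in it (the ScaffoldStep structure, the field names, the sample questions) is a hypothetical illustration of the design direction, not an existing tool or API; the point is simply that each step is explicitly labelled as meaningmaking (human-only) or not, and that later prompts are assembled from the user’s own earlier answers.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: each scaffold step is explicitly labelled as meaningmaking
# (human-only) or non-meaningmaking (tool-assistable), and later prompts are
# assembled from the user's own earlier answers rather than from model output.
@dataclass
class ScaffoldStep:
    number: int                    # non-linear numbering, matching the worksheet boxes
    question: str                  # the critical thinking question posed at this step
    meaningmaking: bool            # True = only the human user may answer this step
    builds_on: list[int] = field(default_factory=list)  # earlier steps whose answers feed in

scaffold = [
    ScaffoldStep(3, "What claim are you actually making?", meaningmaking=True),
    ScaffoldStep(1, "Why does this claim matter, and to whom?", meaningmaking=True),
    ScaffoldStep(5, "What evidence already exists for or against it?", meaningmaking=False, builds_on=[3]),
    ScaffoldStep(2, "Which audiences must find it convincing, and what do they care about?",
                 meaningmaking=True, builds_on=[3, 1]),
]

def next_prompt(step: ScaffoldStep, answers: dict[int, str]) -> str:
    """Assemble the next prompt from the user's own earlier answers."""
    context = "\n".join(f"[{n}] {answers[n]}" for n in step.builds_on if n in answers)
    return f"{step.question}\n\nYour earlier answers:\n{context}" if context else step.question
```

The key property is that the structure, rather than the model, carries the division of labour: an interface built this way can refuse to hand a meaningmaking step to a machine at all.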

The research so far

This may sound vague and abstract, but what I’ve been testing is a concrete pen-and-paper prototype that fits on just one sheet of paper. It’s intentionally low-tech so it’s fast to test and designed specifically to isolate the functional mechanism of iterative elicitation of meaningmaking prompt context. I’ll describe it in more detail below.

Experiment setting and baseline

I began this research as an intervention taking the form of a critical thinking and writing workshop for first-year university undergraduates facing a high-stakes personal decision with academic consequences: designing a customised personal major, writing a proposed plan of study for that major, and providing a compelling justification of why that major makes sense to pursue.

The workshop was intended to help them think critically about their custom major to write a rigorous proposal their college could evaluate and approve as a course of study. They’d already been thinking about the proposal for several months.

Before the workshop, I asked each student to submit their current proposal, condensing whatever they’d prepared into a concise handful of paragraphs. On review, these draft proposals nearly universally lacked focus, failed to justify why a custom major was necessary, and showed unclear value propositions for that specific custom major. In truth, most proposals lacked any value propositions at all. On the surface, most students seemed to have spent months without engaging in compelling critical thinking about what they should do for a custom major and why it would be valuable.

Critical thinking in this context

Critical thinking needs careful definition in this context. It involves explaining to yourself, then to others, the logic behind why you’ve decided to do a particular thing that’s only arguably a good thing to do. Even better, critical thinking involves explanations and justifications that are compelling because you’ve taken into account the interests and priorities of the audiences you want to convince. In this case, the particular thing was the custom major: the decision not to join an established major like economics or computer science and instead create something new, say, a major on sustainable local food business models in Southeast Asia.

The personal implications for workshop participants were significant. If students didn’t write a rigorous, carefully thought-through explanation of their proposed custom major, they might not get to do it at all. Worse, they might get approval for a poorly conceived major and realize partway through or after completion that they shouldn’t have done it. The potential bad outcomes include doing something that turns out not to be useful, doing something they don’t actually enjoy, or doing something similar to existing programs while expending extra effort on customization.

Experimental intervention

The experimental intervention was a 2-hour, pen-and-paper workshop built around a single worksheet. The worksheet contained structured, non-linearly numbered empty boxes representing critical thinking steps in developing an argument, together with a series of numbered prompts (corresponding to the numbered boxes on the worksheet) representing key critical thinking questions that could be posed and answered in developing a strongly reasoned argument. The prompts were presented to workshop participants as slides.

The intervention worksheet and some accompanying prompts.

For these purposes, I define an argument as “a non-factual claim (i.e., a statement that isn’t objectively true) that’s supported by logical reasoning and appropriate evidence.”

This user experience was designed to simulate a multi-stage process of structured elicitation of various aspects of strongly reasoned arguments. This design explicitly addresses both requirements for good tool use. The structured prompts helped students think critically about what they were actually trying to accomplish with their custom major proposals — the meaningmaking work of determining value, worth, and personal fit. Simultaneously, the framework made clear what kinds of thinking work the students needed to do themselves versus what kinds of information gathering and analysis could potentially be supported by tools like LLMs.

The main functional innovation here seems to be that the process iteratively builds on preceding prompts and their resulting outputs — but on the user’s side. The sequencing and content of the prompts were specifically designed to lead each participant to do reasoning on their own previous prompt responses — I think of this as a sort of individual reasoning scaffold. This approach made the affordances of this pen-and-paper system explicit. The system could provide structure, systematic prompting, and information organization, but all subjective judgments about value, worth, and meaning remained clearly with the human user. Unlike traditional AI interfaces that obscure what the system can and can’t do, this framework makes the division of labour between human user and responsive system clear at every step.

Each numbered box represented a specific type of thinking work, and the non-linear numbering reinforced that critical thinking isn’t a linear process. Participants could see what kind of cognitive work they needed to do in each box, what they could expect to produce as output, and how that output would feed into subsequent reasoning steps. This transparency about the process itself helped students understand not just what to think about, but how to think about it systematically.
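As a rough sketch of how a session like this traverses the worksheet, here is a short Python loop. The box numbers and questions are invented for illustration (they are not the actual worksheet contents); what it shows is that boxes are visited in a designed, non-linear order and each new prompt is built from the participant’s own earlier answers, never from anything a model has generated.

```python
# Illustrative only: the box numbers and questions are invented, not the actual worksheet.
steps = [
    {"box": 4, "question": "State your proposed custom major in one sentence.", "uses": []},
    {"box": 2, "question": "Who needs to be convinced, and what do they value?", "uses": [4]},
    {"box": 7, "question": "Which existing majors come closest, and why are they inadequate?", "uses": [4, 2]},
    {"box": 1, "question": "Restate your proposal so it answers the gaps surfaced above.", "uses": [4, 2, 7]},
]

answers: dict[int, str] = {}

for step in steps:
    print(f"\nBox {step['box']}: {step['question']}")
    # The only context shown is the participant's own earlier answers.
    for box in step["uses"]:
        if box in answers:
            print(f"  Your answer to box {box}: {answers[box]}")
    answers[step["box"]] = input("> ")  # the participant, not a model, fills the box
```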

Results

The results were surprising and remarkable. Students emerged from the workshop with proposals that were much more sharply focused in identifying particular areas of study to combine into a custom major, much clearer about the value to be found in pursuing that custom major, and clear for the first time about how and why this value arose only from pursuing the custom major rather than any existing major. They could identify adjacent or nearby majors and explain why these were inadequate for their purposes, and they could now articulate which audiences, such as potential employers, would be interested in someone who’d completed their proposed custom major.

The qualitative difference between their proposals before and after completing this pen-and-paper exercise was striking. In the post-workshop debrief, students reported what seemed to be transformative changes in their reasoning capacity: “I finally understand what it means to think critically and force myself to ask questions to refine my thinking,” and “The process is so concrete and step by step that it feels so manageable, yet by the end of the process you’re so far ahead of where you started.”

Externalising the critical thinking process, combined with the iterative use of their own outputs as foundations for subsequent prompts, seemed to be the key mechanisms responsible for these results.

Implications for AI tool design

My working theory coming out of this experiment is: To learn how to think critically with AI tools, we need to first make explicit the process by which critical thinking happens, then identify the pieces of this process that are about making decisions about subjective value (i.e., the parts of the process which are about meaningmaking).

To be clear, I’m not claiming that the particular process I tested is the only critical thinking process there is for generating strongly reasoned arguments. Instead, my claim is that making explicit the meaningmaking parts of reasoning about an argument is crucial for generating strongly reasoned arguments.

The specific argument in this pen-and-paper experiment concerned the participant’s proposed custom major, but the design intent for this intervention is that it should work for any kind of argument being developed.

Having made this distinction explicit, it now becomes possible to conceive of and write the scope for an LLM-powered critical thinking tool that can provide some (potentially all) of the non-meaningmaking support to complement the human user’s meaningmaking work in doing critical thinking tasks.
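As a sketch of what such a scope could look like in code: the task names and the call_llm placeholder below are assumptions for illustration, not an existing tool or API. The idea is simply that the model is only ever invoked for explicitly non-meaningmaking tasks, and anything involving a judgment of value is refused and returned to the human user.

```python
from typing import Callable

# Hypothetical scope for an LLM-powered critical thinking tool: the model only
# does non-meaningmaking work on material the user has already chosen and written.
NON_MEANINGMAKING_TASKS = {
    "summarise_evidence",          # condense sources the user has selected
    "list_adjacent_options",       # e.g. nearby existing majors to compare against
    "check_internal_consistency",  # flag contradictions between the user's own answers
}

def support(task: str, user_context: str, call_llm: Callable[[str], str]) -> str:
    """Run a single, explicitly scoped support task. Meaningmaking tasks are refused."""
    if task not in NON_MEANINGMAKING_TASKS:
        raise ValueError(f"'{task}' involves a judgment of value; it stays with the human user.")
    return call_llm(f"Task: {task}\nContext supplied by the user:\n{user_context}")
```

In use, call_llm would be whatever model client an institution already has; the scope file, not the model, decides which work is eligible to be delegated.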

Broad applications (beyond education)

The approach I’ve described above may also address a fundamental gap across domains where meaningful reasoning in the form of argumentation about non-factual claims is essential.

Corporate strategy teams need rigorous evaluation of strategic directions and major decisions where human judgment about value and priorities is essential. C-suite leadership requires systematic assessment of leadership approaches and organizational direction that depend on subjective decisions about what matters most. Startup founders must critically analyze product-market fit and business model decisions that depend on subjective assessments of market value and customer needs.

Government policy teams face complex policy decisions that require meaningmaking about societal values and trade-offs. Nonfiction writers need to clarify and strengthen arguments for book and article proposals that take positions on subjective matters. More broadly, anyone engaged in persuasive writing contexts needs to take defensible positions on subjective matters where human judgment about value, worth, and meaning is central.

The opportunity here emerges from the widespread — in fact, nearly universal — failure to treat AI systems as tools requiring clear understanding of their affordances and constraints, and in particular their complete inability to do meaningmaking work.

Organizations currently lack structured frameworks for high-stakes reasoning that make explicit what humans must do (meaningmaking) versus what AI systems can do well. Existing AI tools provide information synthesis but obscure the boundary between human meaningmaking work and machine capabilities, leading to poor outcomes when users unknowingly delegate subjective judgments to systems that are constitutionally unable to make them.

There’s a growing need for decision-making frameworks that make this division of labor explicit. This will allow product and UX design that leverages AI tools for non-meaningmaking work at which machines excel (like data gathering, pattern recognition, and following fully specified rules with no edge cases) while ensuring all meaningmaking work remains clearly in human hands (like handling exceptions, making aesthetic and moral decisions, and deciding about tradeoffs). The critical design challenge is building AI applications that function transparently as sophisticated tools rather than creating the seductive mirage of human-ness. With this design approach, we can avoid what I call the AI expertise conundrum, in which the design of AI tools makes them most useful to people with enough domain expertise to ask good questions and spot flaws, while leaving novices vulnerable to plausible but biased or simply incorrect outputs.

Testers needed

Pilot testers: I’m now actively seeking educational institutions for pilot testing the next iteration of this experiment, particularly undergraduate programmes with students facing complex critical reasoning tasks as part of coursework. High school programs working with seniors on college applications, career planning, or complex project work would also be valuable partners. The value proposition for pilot institutions is clear: dramatically improved student reasoning outcomes, structured support for high-stakes decisions, and measurable improvements in critical thinking skills.

In my view, this research contributes to the broader challenge of building AI systems that are aligned with human values by ensuring that humans recognise subjective value decisions and retain control over them. By making meaningmaking work explicit and systematic, we can develop AI tools that enhance rather than diminish human agency in reasoning and decision-making.

If you want to help, let me know.


For the last few years, I’ve been wrestling with the practical challenges of meaning-making in our increasingly AI-saturated world, developing frameworks for how humans can work effectively alongside these powerful tools while preserving the meaning-making work that is the irreplaceably human part of the reasoning we do. I’ve published this as a short series of essays on meaning-making as a valuable but overlooked lens for understanding and using AI tools.

I’ve also been working on turning discomfort into something productive. idk is the first of these tools for productive discomfort.

And I’ve spent the last 15 years investigating how organisations can succeed in uncertain times. The Uncertainty Mindset is my book about how to design organisations that thrive in uncertainty and can clearly distinguish it from risk.