"Loading wisdom..." — The simulation
Iron Triangle AI Simulator

Constraint visualization

Simulation Week 0
Scope
100%
baseline
Cost
100%
baseline
Time
100%
baseline
Tech debt
0
clean
Team health
100
strong
Quality
100%
Debt
0%
Jevons
+0%
Morale
100%
Experience
0%
Task-level +0%
Perceived +0%
Actual +0%
Original
AI frontier
Actual
Mgmt expects

Amdahl's Law curve

Goodhart's Law — Dashboard vs Reality

Factory floor

Guide

What this tool does

This is an interactive simulation of the iron triangle — the project management constraint that says you can't simultaneously maximize scope, speed, and quality without increasing cost. The question it explores: does AI break this constraint, or just move it?

The triangle visualization in the center shows four overlapping shapes. The green triangle is your baseline capacity — what the team can deliver without AI. The blue triangle is the theoretical AI-boosted frontier. The red triangle is actual achievable capacity after hidden costs. The amber dashed triangle is what management expects. When amber exceeds red, quality absorbs the gap.

This is a simulation, not a calculator. Set your sliders, then watch what happens over simulated weeks. Debt accumulates. Morale erodes. Jevons scope creeps. The insight is in the trajectory, not the snapshot — configurations that look fine at Week 2 can be catastrophic by Week 26.

The controls

AI paradigm belief — How transformative do you think AI actually is? This isn't a preference — it changes the underlying math. Skeptic: high overhead, steep diminishing returns. True believer: near-zero iteration cost, declining review burden. Set this first — it shapes everything else.

Demand elasticity — Jevons Paradox. When AI makes cognitive output cheaper, does the organization bank the savings or consume them as new demand? Inelastic: bounded tasks, efficiency becomes slack. Super-elastic: scope auto-expands to eat every gain. This is the slider most people don't expect. Try the "jevons demo" preset to see it in isolation.

AI productivity boost — How heavily the team leans on AI-generated output. Higher values mean more raw throughput but also more hidden overhead and review burden.

Management scope push — Top-down demand increase. "We have AI now, so we should deliver more." This is additive on top of Jevons auto-expansion.

Review & validation — Time spent checking AI output. The critical dial. Low review ships faster but accumulates technical debt silently. High review catches hallucinations but slows throughput.

Time budget — Timeline compression or extension. Negative values squeeze deadlines. Positive values add slack for review and iteration.
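
Taken together, the controls reduce to a small parameter object. A hypothetical shape, sketched below; the field names are invented here for illustration and are not the simulator's actual API:

```typescript
// Hypothetical control-panel shape. Field names are invented for
// illustration; they are not the simulator's actual API.
interface SimulatorControls {
  paradigmBelief: number;    // 0 = skeptic (high overhead) .. 1 = true believer
  demandElasticity: number;  // 0 = inelastic (gains banked), >1 = super-elastic (Jevons)
  aiBoost: number;           // AI productivity boost, e.g. 0.6 for "+60%"
  scopePush: number;         // management scope push, as a fraction of baseline
  reviewShare: number;       // fraction of capacity spent validating AI output
  timeBudget: number;        // timeline delta: negative squeezes, positive adds slack
}
```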

The simulation engine

Three things compound over simulated time without you touching anything:

Technical debt accumulates when AI output ships with thin review. It drags down the effective AI boost — at 50% debt, the AI slider is lying to you by ~15%. Debt doesn't self-resolve; it requires either high review (paydown) or scope reduction (time to remediate).

Team morale erodes under sustained low quality, scope overload, time pressure, and high debt. Below 50%, attrition accelerates. Below 30%, institutional knowledge loss becomes the binding constraint. Morale recovers slowly — much slower than it drops.

Jevons scope auto-expands based on AI efficiency and demand elasticity. This is organic, bottom-up scope creep — the organization discovering new work that AI makes feasible. It's additive on top of whatever management is demanding.
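
In code, one simulated week looks roughly like the sketch below. The coefficients are illustrative assumptions chosen to match the behaviors described above, not the simulator's actual tuning constants:

```typescript
// One simulated week. All rates are illustrative assumptions,
// not the simulator's real tuning.
interface WeekState {
  debt: number;   // 0..1 accumulated technical debt
  morale: number; // 0..1 team morale
  scope: number;  // total demanded scope, baseline = 1.0
}

function tick(s: WeekState, aiBoost: number, review: number, elasticity: number): WeekState {
  // Debt: accrues when AI output ships with thin review; high review pays it down.
  const debtDelta = aiBoost * Math.max(0, 0.3 - review) * 0.1 - review * s.debt * 0.05;
  const debt = Math.min(1, Math.max(0, s.debt + debtDelta));

  // Debt drags the effective boost (~15% drag at 50% debt, as above).
  const effectiveBoost = aiBoost * (1 - 0.3 * debt);

  // Morale: erodes under debt and scope overload; recovers far more slowly.
  const pressure = debt * 0.04 + Math.max(0, s.scope - 1) * 0.03;
  const morale = Math.min(1, Math.max(0, s.morale - pressure + (pressure < 0.01 ? 0.005 : 0)));

  // Jevons: scope auto-expands with realized efficiency, scaled by elasticity.
  const scope = s.scope * (1 + elasticity * effectiveBoost * 0.01);

  return { debt, morale, scope };
}
```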

Try these scenarios

The optimistic adoption: Click "sweet spot." Low scope push, adequate review, moderate AI. Watch the simulation run for 6 months. Quality holds. Debt stays low. Morale is stable. This is what sustainable AI adoption looks like — and it looks boring.

The realistic failure: Click "death march." High scope, high AI, no review, compressed timeline, high elasticity. Watch debt spiral, morale collapse, and quality crater within 3 months. This is the configuration that ships great quarterly numbers and implodes six months later.

Jevons in isolation: Click "jevons demo." No management scope push at all. Watch scope expand on its own — purely from the organization discovering new uses for cheaper cognitive output.

The recovery: Run "death march" for 3 months, then pause. Drop scope to 20%, raise review to 40%, extend timeline to +15%. Resume. Watch how long it takes to recover — and notice that recovery is slower than the descent.

The incident test: Set up high AI with low review. Click "Simulate incident" repeatedly. Each failure adds debt and destroys morale. The question isn't whether an incident happens — it's whether the configuration can absorb one.

Bull case

The case that AI is genuinely different

AI is not a faster typewriter. Every prior tool accelerated existing human work. AI performs cognitive work that previously required additional humans. When a tool can draft, reason, and iterate on its own output, you haven't sped up the line — you've added workers who don't need salaries and don't get tired.

Marginal cost of iteration approaches zero. Five drafts cost the same as one. Ten test variants cost the same as one. This changes the scope/quality tradeoff fundamentally — more iterations used to mean more time and cost. Now iteration is effectively free.

The "boring middle" is eliminated, not just accelerated. Boilerplate, scaffolding, docs, test generation, data transforms — these consumed enormous time without adding differentiated value. Freeing humans from commodity cognitive work may break old constraints because you're eliminating categories of work, not optimizing them.

The review burden is temporary. Current models hallucinate. But model capability improves on a steep curve. Budgeting for 2024-level review costs in a 2026 plan is like budgeting for horse-feeding after buying a car.

Historical discontinuities are real. Spreadsheets didn't just make accounting faster — they made entirely new analyses possible. The assembly line didn't speed up car production — it made cars affordable. These were genuine paradigm shifts where old constraint models stopped applying.

The macro data is turning. BLS reported 4.9% nonfarm productivity growth in Q3 2025 alongside declining unit labor costs — a combination not seen since 2019. Software engineering job demand, after an initial AI dip, sharply accelerated as AI made software viable for previously prohibitive use cases. The aggregate signal is real, even if micro attribution is disputed.

Skeptic

Why the constraints always reassert

Every generation has made this claim. Mainframes would eliminate middle management. PCs would create the paperless office. Agile would make estimation unnecessary. The cloud would make infrastructure free. The constraints always reassert — they just move.

Zero-cost iteration is an illusion. Generating five drafts is cheap. Evaluating five drafts to pick the best one and verify it's correct is not. You've traded production cost for evaluation cost.

The "boring middle" was load-bearing. Documentation and tests existed because writing them forced understanding. Skip the process, and you get teams shipping AI-generated code they don't fully understand, with AI-generated tests that don't cover the actual failure modes. That's technical debt with a new name.

"The models will get better" is a bet, not a plan. You can't manage a project on the assumption your tools improve mid-project. And even if models improve, the tasks you give them get harder — that's how organizations work.

Spreadsheets prove the triangle. They made new analyses possible — then organizations demanded more analyses, faster, from fewer people. Forty years later, finance teams are still overworked. The frontier moved. The triangle held.

The data agrees — so far. Dubach's 2026 synthesis of six studies: teams merged 98% more PRs but review time increased 91%, with DORA delivery metrics unchanged. Convergence on ~10% organizational gains vs. 40-60% vendor claims. METR's RCT: developers perceived 24% speedup while experiencing 19% slowdown. The new Solow Paradox — you can see AI everywhere except in the productivity statistics.

Mechanics

Why "fast now" means "slow later"

The debt counter in the visualization tracks what happens when AI-generated output ships without adequate review. It compounds over time: each tick with low review and high AI output adds to the debt balance. As debt grows, the AI boost becomes less effective — because teams are spending increasing time fixing, understanding, and refactoring old AI-generated work instead of building new features.

The J-curve is embedded in the debt model. Early on, low-review AI adoption looks fantastic — high output, manageable quality. But debt accumulates silently. By the time it's visible (test failures, architectural problems, incidents), the team is already underwater. Try the "death march" preset and watch the debt counter climb. Then imagine explaining to leadership why velocity dropped six months after the AI tools "worked great."
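
Here is the J-curve as a back-of-the-envelope loop. The constants are hypothetical, chosen only to show the shape: a 50% boost shipping with thin review, where accumulated debt consumes a growing share of capacity:

```typescript
// J-curve sketch with made-up constants: a 50% boost shipping with thin
// review, where accumulated debt eats progressively more of the capacity.
let debt = 0;      // accumulated debt, 0..1
let withAI = 0;    // cumulative output, AI-assisted, low review
let baseline = 0;  // cumulative output, no AI

for (let week = 1; week <= 52; week++) {
  debt = Math.min(1, debt + 0.03); // thin review: debt accrues every week
  withAI += 1.5 - debt;            // rework consumes a growing share of the boost
  baseline += 1.0;
  if (week % 13 === 0) {
    console.log(`week ${week}: AI ${withAI.toFixed(1)} vs baseline ${baseline.toFixed(0)}`);
  }
}
// Weeks 1-30: AI-assisted looks clearly ahead (16.8 vs 13 at week 13).
// Around week 33 the cumulative lines cross, and the gap widens every week after.
```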

The optimist counter: Better models produce less debt-prone output. If you set paradigm belief high, the debt accumulation rate drops — because in the optimistic model, AI output is intrinsically more correct and requires less remediation. This is a testable claim. Measure it in your org and see which model fits.

The data is in. The Productivity-Quality Paradox (IJSET 2026): AI accelerates MVP development by 40-60% but produces 4x code duplication and 2x code churn vs. 2021 baselines. Veracode found 45% of AI code introduces OWASP Top 10 vulnerabilities. CodeRabbit: 2.74x more security vulnerabilities in AI-generated vs. human-written code. The incident probability in this simulation is calibrated against these numbers.

Computing theory

Amdahl's Law and the serial bottleneck

In 1967, Gene Amdahl presented a deceptively simple argument at the AFIPS Spring Joint Computer Conference that has haunted computer architects — and now AI adopters — ever since. His claim: the speedup of any system is fundamentally limited by the fraction of work that cannot be parallelized. No matter how fast you make the parallel part, the serial part sets an absolute ceiling on total improvement.

The math is elegant. If a fraction p of your work can be accelerated by a factor s, the total speedup is: 1 / ((1 − p) + p/s). As s approaches infinity — as the accelerated part becomes arbitrarily fast — the maximum possible speedup converges to 1 / (1 − p). If only 40% of your work is accelerable, the theoretical maximum speedup is 1.67x. Not 10x. Not 100x. 1.67x. The serial fraction is an asymptotic wall.
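
A few lines of code make the ceiling concrete. The function is just the formula above, and the cases reproduce the numbers used throughout this section:

```typescript
// Amdahl's Law: overall speedup when a fraction p of the work
// is accelerated by a factor s.
function amdahl(p: number, s: number): number {
  return 1 / ((1 - p) + p / s);
}

console.log(amdahl(0.4, Infinity).toFixed(2)); // 1.67 — the ceiling when 40% is accelerable
console.log(amdahl(0.4, 10).toFixed(2));       // 1.56 — 40% of the work made 10x faster
console.log(amdahl(0.5, 1.6).toFixed(2));      // 1.23 — a "60% boost" to half the work
console.log(amdahl(0.3, 1.6).toFixed(2));      // 1.13 — the same boost to 30% of the work
```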

Amdahl's insight was controversial precisely because it was correct. The parallel computing community spent decades trying to argue around it — and the arguments always reduced to "we'll make more of the work parallelizable," which is Amdahl's point: the serial fraction is the constraint that matters, not the speedup of the parallel fraction. Throwing more processors (or more AI) at the problem hits diminishing returns governed by the serial bottleneck.

The AI translation

Replace "parallel" with "AI-accelerable" and the law maps directly onto AI-assisted knowledge work. A software project consists of many task types. Some are highly amenable to AI acceleration: code drafting, boilerplate generation, test scaffolding, documentation, data transformation, pattern-based refactoring. These are the parallel fraction — AI can make them 3x, 5x, even 10x faster.

But the project also contains work that is fundamentally serial — work that requires human judgment, context, and coordination that AI cannot meaningfully accelerate today: architectural decisions, stakeholder alignment, requirement disambiguation, integration testing, security review, production incident response, mentoring junior engineers, navigating organizational politics, and the judgment calls that determine whether the technically correct solution is the right one for this team at this moment.

If generation is 40% of your workflow and AI makes it 10x faster, Amdahl says your total workflow speedup is 1.56x, not 10x. The 60% serial fraction — the judgment, integration, and coordination work — didn't get faster. It's now the binding constraint. And here's the insidious part: as the accelerable part speeds up, the serial part doesn't just persist — it becomes more visible, more pressured, and often slower because it's now processing a higher volume of AI-generated output that requires human evaluation.

Why "AI gives us 60% more throughput" becomes 20-30%

This is the phenomenon every engineering manager has experienced but couldn't name. The AI vendor says "60% productivity boost." The team reports "maybe 25% faster, on a good day." The manager suspects the team is sandbagging or not using the tools properly. They're not. Amdahl predicted exactly this gap.

The vendor measured the speedup of the accelerable fraction. The team experienced the speedup of the whole workflow. These are fundamentally different numbers, and Amdahl's Law defines the precise relationship between them. A 60% boost to 50% of the work is a 23% boost to the project. A 60% boost to 30% of the work is roughly a 13% boost to the project. The serial fraction determines where the actual number lands.

The AI-applicable work slider in this simulation models exactly this. Slide it left (mostly serial) and watch the gap between the blue triangle (theoretical frontier — what the vendor promised) and the red triangle (actual capacity — what the team delivers). That gap is Amdahl's Law. No amount of AI investment closes it. The only way to close it is to make more of the work AI-accelerable — which is a process redesign challenge, not a tooling challenge.

The organizational implications

Amdahl tells you where to invest. Once you've adopted AI for the accelerable fraction, further AI investment hits steep diminishing returns. The bottleneck has moved to the serial fraction. Improving the serial fraction — better architecture practices, clearer requirements, faster decision-making, reduced stakeholder coordination overhead — now yields more throughput improvement than any AI tool investment. But organizations keep buying AI tools because that's the visible lever, even though Amdahl says the invisible lever (serial work reduction) is the one that matters.

Amdahl also tells you why Jevons is so dangerous. The combination is toxic: Jevons expands total scope (more work), while Amdahl limits the speedup of that work (serial bottleneck persists). The team does more work, but each unit of work still hits the same serial constraint. Volume increases. Per-unit time doesn't proportionally decrease. The result: more total hours, not fewer, despite "productivity tools." This is measurable. Track it in your org.
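
The arithmetic of that toxicity, with illustrative numbers:

```typescript
// Jevons x Amdahl, illustrative numbers: scope grows faster than
// per-unit time shrinks, so total hours rise despite the tooling.
const scopeGrowth = 1.4;                            // Jevons + scope push: 40% more work
const perUnitSpeedup = 1 / ((1 - 0.5) + 0.5 / 1.6); // Amdahl: ~1.23x per unit of work
console.log((scopeGrowth / perUnitSpeedup).toFixed(2)); // 1.14 — 14% more total hours, not fewer
```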

The paradigm belief slider interacts with Amdahl. A skeptic believes the AI-applicable fraction is low (30-40%) — most work requires human judgment. A true believer thinks the fraction is high (80%+) — AI handles nearly everything. The truth for your organization is an empirical question, not a philosophical one. Measure what fraction of your team's time is actually spent on AI-accelerable tasks. That number, plugged into Amdahl's formula with your observed AI speedup, predicts your actual project-level acceleration. If the prediction matches reality, you've found your Amdahl fraction. If it doesn't, you're measuring the wrong thing.
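
One way to run that measurement in reverse, sketched below: invert the formula to recover the implied Amdahl fraction from your observed task-level and project-level speedups. The function name is invented here:

```typescript
// Inverting Amdahl's Law. From S = 1 / ((1 - p) + p/s),
// solve for p:  p = (1 - 1/S) / (1 - 1/s).
function impliedFraction(taskSpeedup: number, projectSpeedup: number): number {
  return (1 - 1 / projectSpeedup) / (1 - 1 / taskSpeedup);
}

// Example: tasks feel 1.6x faster, but the project ships only 1.15x faster.
console.log(impliedFraction(1.6, 1.15).toFixed(2)); // 0.35 — only ~35% of the work was accelerable
```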

Economics

Jevons Paradox: when efficiency increases consumption

In 1865, Jevons observed that Watt's more efficient steam engines didn't reduce coal consumption — they made coal viable for entirely new industries, and total consumption exploded. The average UK resident in 2000 consumed 6,000 times more artificial light than in 1800 for the same reason: cheaper light didn't reduce demand, it created it. AI is doing the same thing with cognitive output. Cheaper code, docs, and analysis doesn't mean less work. It means more.

The academic consensus: partial rebound (<100%) is the most common micro-level outcome — efficiency gains yield real benefit, just less than the engineering estimate. Full Jevons (consumption exceeds pre-efficiency baseline) is real but conditional. The exceptions tell you when the paradox fires and when it doesn't.

Where Jevons does NOT hold

Saturated markets. There's a ceiling on how many hours you light your house. You don't take three showers because hot water got cheaper. When demand is naturally bounded, efficiency banks as savings. AI parallel: bounded tasks (one weekly report, one deployment pipeline) don't expand infinitely. Jevons doesn't fire on tasks with natural saturation.

High-friction resources. Transportation rebound is typically 10–30% — cars got more efficient, people drove somewhat more, but driving has real friction (time, traffic, physical presence). AI parallel: tasks requiring physical presence, real-time judgment, or regulatory compliance don't expand just because adjacent tasks got cheaper.

Non-cost-constrained resources. Jevons requires under-consumption because of cost. If the bottleneck is regulation, attention, or decision-making capacity, making production cheaper doesn't increase demand. Nobody writes more contracts just because legal research got faster if the bottleneck was client relationships. AI parallel: attention and decision-making capacity are often the binding constraint in knowledge work, not production speed.

Where Jevons holds HARD

Super-elastic demand with low saturation. A 2026 paper formalized a "Structural Jevons Paradox" for AI: as the unit price of intelligence falls, firms endogenously redesign architectures to consume dramatically more compute. When there's no ceiling on useful consumption, cheaper = explosive growth. This is the lighting curve — 6,000x in two centuries.

Efficiency that opens new use cases. Jevons's coal observation wasn't just "more burning" — efficient engines made coal viable for industries that couldn't have used it before. Software engineering jobs in 2025 followed the same pattern: after an initial AI dip, demand sharply accelerated because AI made software viable where development cost was previously prohibitive.

Displaced, not reduced consumption. Nations showing efficiency-driven energy reduction have often outsourced energy-intensive production elsewhere. The efficiency didn't reduce consumption — it moved it somewhere unmeasured.

The critical question for your organization

Is demand for cognitive output more like heating (bounded — you only need so much) or lighting (unbounded — cheaper means you use it everywhere you never could before)? The answer is both, depending on the task. Status reports are heating. Test coverage is lighting. Your leadership is probably treating all cognitive output as super-elastic when some tasks are naturally bounded.

The academic confirmation. Zhang & Zhang (2026) formalized this as the "Structural Jevons Paradox": firms don't just consume more of the cheaper resource — they redesign their architectures to consume dramatically more. Falling API prices induce deeper reasoning loops, larger context windows, and multi-agent workflows that multiply token consumption per task. Reimers & Waldfogel (2026) found the same in book publishing: releases tripled post-LLM, average quality declined, but the top 1,000 monthly releases got better. Jevons + quality bifurcation in one dataset.

The pragmatic takeaway: ask "will demand for this task expand to fill the efficiency gain?" for each category of work. Where yes — plan for it, the gains increase scope not reduce effort. Where no — bank the savings. The demand elasticity slider in this simulation models exactly this distinction. The mistake is applying one answer across the board.
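
A toy version of that question, with made-up numbers: constant-elasticity demand responding to a cost drop, plus a saturation cap for heating-like tasks:

```typescript
// Toy rebound model (all numbers illustrative): demand responds to a cost
// drop with constant elasticity, optionally capped by natural saturation.
function newDemand(base: number, costRatio: number, elasticity: number, cap?: number): number {
  const unbounded = base * Math.pow(1 / costRatio, elasticity); // cheaper => more demand
  return cap === undefined ? unbounded : Math.min(unbounded, cap);
}

// Cost drops to 40% of baseline (costRatio = 0.4):
console.log(newDemand(100, 0.4, 0.3).toFixed(0));      // 132 — partial rebound: most of the gain is banked
console.log(newDemand(100, 0.4, 1.5).toFixed(0));      // 395 — super-elastic: Jevons fires, scope explodes
console.log(newDemand(100, 0.4, 1.5, 120).toFixed(0)); // 120 — same elasticity, but saturation caps it (heating)
```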

Measurement

Goodhart's Law — when metrics become targets

In 1975, Charles Goodhart observed: "Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." Simplified by Marilyn Strathern: "When a measure becomes a target, it ceases to be a good measure." This is the foundational insight for understanding why AI adoption metrics look great while outcomes deteriorate.

The mechanism is simple. Organizations choose metrics to track AI productivity: PRs merged, velocity points, code output, deploy frequency, cost per feature. These metrics are proxies — they correlate with actual productivity under normal conditions. But when they become targets ("increase PRs by 40%"), teams optimize for the metric rather than the underlying reality. AI makes this optimization trivially easy: generate more PRs, inflate velocity estimates, produce more code. The metrics improve. The product doesn't.

The dashboard in this simulation demonstrates this directly. The left panel ("What Leadership Sees") shows metrics that are designed to look good: they use the perceived boost (not actual), they lag reality by 3-4 weeks, and they have flattering floors that prevent them from ever showing decline. This isn't a caricature — it's how organizational metrics actually behave. PRs merged is a count, not a quality measure. Velocity points are self-assigned estimates. Deploy frequency says nothing about what was deployed.

The right panel ("What's Actually Happening") shows the metrics that matter: output quality, accumulated technical debt, team morale, and rework rate. These metrics are harder to measure, slower to respond, and politically inconvenient to report. They are also the ones that predict whether the product succeeds and whether the team survives.

The disconnect score at the bottom is the key number. When it exceeds 40%, the organization is flying blind — the instruments show one thing, the plane is doing another. This is the state where leadership makes confident decisions based on data that doesn't reflect reality. "The AI initiative is going great" (based on dashboard metrics) while the team is in a death spiral (based on actual outcomes).
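
A sketch of how the two panels diverge: the dashboard series is lagged, floored, and built on the perceived boost, while the reality series is none of those. The 4-week lag and 0.9 floor are illustrative assumptions:

```typescript
// Dashboard vs. reality, sketched. The dashboard metric is built from
// perceived boost, lags by LAG weeks, and has a flattering floor.
const LAG = 4;     // illustrative reporting lag, in weeks
const FLOOR = 0.9; // illustrative "never looks bad" floor

function dashboardMetric(perceivedByWeek: number[], week: number): number {
  const lagged = perceivedByWeek[Math.max(0, week - LAG)];
  return Math.max(FLOOR, lagged);
}

function disconnect(dashboard: number, actual: number): number {
  // Relative gap between what the instruments show and what's happening.
  return Math.abs(dashboard - actual) / dashboard;
}

// Example: perceived boost holds at 1.4x while actual throughput has decayed.
const perceived = Array(30).fill(1.4);
const actualThroughput = 0.75;                // after debt drag, vs. baseline 1.0
const shown = dashboardMetric(perceived, 20); // 1.4 — the dashboard looks great
console.log(disconnect(shown, actualThroughput).toFixed(2)); // 0.46 — past the 40% flying-blind line
```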

Why AI makes Goodhart worse. Pre-AI, optimizing for proxy metrics required human effort — writing more code to inflate KLOC counts was expensive. AI makes metric optimization nearly free. You can generate 10x more PRs with no additional effort. You can inflate code output by letting AI write verbose solutions. You can increase deploy frequency by shipping smaller, less-reviewed changes. Every proxy metric becomes trivially gameable, and the gap between proxy and reality widens faster than the organization can detect it.

The NBER finding validates this: 374 S&P 500 companies mentioned AI positively in earnings calls while productivity statistics remained flat. CFO-perceived gains substantially exceed measured revenue-based gains. The narrative metric (what executives say in public) has decoupled from the outcome metric (what actually improved). This is Goodhart at the macro level.

The defense against Goodhart is measuring outcomes, not outputs. Don't measure PRs merged — measure customer-reported defects. Don't measure velocity points — measure time-to-value for features that customers actually use. Don't measure code output — measure code that's still in production six months later without modification. These metrics are harder to game because they measure the thing you actually care about, not a proxy for it. The AI-adopted organization that tracks outcomes instead of outputs is the one that will know whether its AI investment is working.

Research

Empirical evidence: what the data actually shows

The perception gap

METR (July 2025, updated Feb 2026): In a randomized controlled trial, 16 experienced open-source developers using AI tools on tasks from their own repositories took 19% longer to complete work. Before the study, they predicted AI would make them 24% faster. After experiencing the slowdown, they still believed AI had sped them up by 20% — a 39-percentage-point perception gap. The February 2026 update with a larger cohort showed -4% (CI: -15% to +9%), concluding "AI likely provides productivity benefits in early 2026." The perception gap persists. This simulation shows both numbers: "perceived" and "actual" in the stats panel.

Faros AI (June 2025): Telemetry from 10,000+ developers found over 75% use AI coding assistants, but organizations report a disconnect: developers say they're working faster, companies see no measurable improvement in delivery velocity or business outcomes. This is Amdahl's Law made empirical — individual task speedup doesn't translate to system-level throughput.

The organizational reality: ~10%

Dubach synthesis (March 2026): Six independent studies consolidated: teams with high AI adoption merged 98% more pull requests but saw review time increase 91%, with DORA delivery metrics unchanged across 10,000+ developers. At 92.6% monthly adoption and 27% of production code AI-generated, six independent research efforts converge on roughly 10% organizational productivity gains. That 10% is far below the 40-60% vendor marketing claims — and it's the Amdahl's Law prediction: task-level boost filtered through the serial bottleneck equals modest system-level improvement.

The tech debt evidence

"The Productivity-Quality Paradox" (IJSET, Jan 2026): AI accelerates MVP development by 40-60% and improves automated test case accuracy to nearly 98%, but has triggered a sustainability crisis: 4x increase in code duplication (violating DRY principles) and doubling of code churn compared to 2021 baselines. Fast now, slow later — the debt engine in this simulation calibrated against real metrics.

Veracode (2025): Testing 100+ LLMs across 80 coding tasks found 45% of AI-generated code introduced OWASP Top 10 vulnerabilities. CodeRabbit (2025): AI-generated code contains 2.74x more security vulnerabilities than human-written code. These numbers calibrate the "Simulate incident" button in this tool.

The Jevons evidence

Zhang & Zhang (Jan 2026), "The Economics of Digital Intelligence Capital": Formalized the "Structural Jevons Paradox" for AI: as the unit price of intelligence falls, firms don't run the same workloads cheaper — they endogenously redesign their architectures to consume dramatically more compute. Falling API prices induce deeper reasoning loops, larger context windows, and multi-agent workflows that multiply token consumption per task. It's not just "people want more" — the architecture of work itself restructures to consume the efficiency gains. This is exactly the demand elasticity mechanic in this simulation.

Luccioni et al. (Jan 2025, ACM FAccT 2025): Examined Jevons Paradox applied to AI, arguing efficiency gains may spur increased consumption. Concrete example: an AI-driven logistics system reduces delivery times and fuel per vehicle, yet simultaneously encourages more frequent online orders, elevating total miles driven.

Reimers & Waldfogel (2026): After LLMs entered book publishing, new releases tripled but average quality declined. The top 1,000 monthly releases showed higher quality than pre-LLM, while new entrants flooding the market drove quality decline. Jevons + Goodhart + the triangle in one dataset: efficiency created more output, average quality dropped, but the best work got better.

The macro picture

NBER (Feb 2026): Survey of nearly 6,000 executives found the vast majority see little AI impact on operations. Yet 374 S&P 500 companies mentioned AI in earnings calls as "entirely positive" — but those claims aren't reflected in productivity gains. This is Goodhart's Law: the narrative metric (earnings call mentions) has decoupled from the outcome metric (actual productivity). CFO-perceived gains substantially exceed measured revenue-based gains.

The bull case data (BLS): Nonfarm business productivity increased 4.9% in Q3 2025, with Q2 revised upward to 4.1%, while unit labor costs declined for two consecutive quarters — a pattern not seen since 2019. Jason Furman, previously skeptical, now agrees AI may be contributing to aggregate productivity gains. The macro signal is real, even if the micro attribution is disputed.

Software engineering jobs (2025): After an initial dip during the AI efficiency wave, developer demand sharply accelerated, ending the year growing faster than overall postings. AI didn't eliminate developer demand — it made software viable for use cases where development cost was previously prohibitive. Jevons Paradox at the labor market level.

Pragmatic

The honest answer is nobody knows yet

Both positions are internally consistent. The skeptic points to every prior technology cycle. The optimist points to the qualitative difference between automating physical work and automating cognitive work.

What's not honest is pretending the question is settled. An organization that mandates "use AI everywhere to increase output" without acknowledging the uncertainty is making a bet and calling it a strategy. The triangle is a useful forcing function: it makes you name which constraint you're relaxing and which you're holding.

The pragmatic position: act like the skeptic, hope for the optimist. Budget for real overhead. Measure actual vs. theoretical gains. Build review processes. And if the models get so good that review becomes unnecessary — great, you'll have slack in the budget. That's a better failure mode than the reverse.