Research Report · Tempo Labs · 2025

Where software teams spend their time — and why it's structurally wrong

Squad man-hours vs. step importance analysis · 16-person reference squad · 5 meta-framework steps · 7 roles modeled

We modeled how a typical product squad actually allocates its time across the five stages of the product development meta-framework — Discover, Define, Design, Execute, and Measure — and compared this allocation against an independent assessment of how much each stage contributes to product success. The result reveals a severe, systemic misalignment: the stages that determine whether a product succeeds receive a fraction of the time invested in them, while execution — the most automatable stage — consumes the majority of squad effort. This misalignment is not a management failure. It is a structural artifact of an era in which human hands were required to write code. That era is ending.

The meta-framework and why it matters

Across every industry that builds products, the same five-stage process repeats under different names. We call this the product development meta-framework:

🔍 Discover · Understand the problem
🎯 Define · Scope the right solution
✏️ Design · Architecture and UX
⚙️ Execute · Build and ship
📊 Measure · Learn and iterate

This framework is not specific to software. Doctors follow it (SOAP notes), lawyers follow it (IRAC), researchers follow it (IMRaD). What makes software product development interesting is that one stage — Execute — has historically required a disproportionate amount of skilled human labor, distorting the entire process around it.

The question this report investigates is simple: does the time allocation across these five stages reflect their actual importance to product success? The answer, as we will show, is no — and the divergence is large enough to constitute a structural failure of how the industry organizes itself.


Data model: how we built the man-hours estimate

Our model is built from the ground up using a reference squad composition. Rather than using per-person hours alone, we weight each role's contribution by its headcount — because a squad with 8 software engineers and 1 PM has a very different time distribution than the raw per-person numbers suggest.

Reference squad composition

The default squad modeled is a typical growth-stage product team: 1 PM, 2 UX Designers, 8 Software Engineers, 1 Data Scientist, 2 QA Engineers, 1 DevOps Engineer, and 1 Analyst — 16 people in total.

Role | Headcount | Discover | Define | Design | Execute | Measure | Total (hrs/person)
PM | 1 | 80 | 60 | 30 | 40 | 30 | 240
UX Designer | 2 | 70 | 40 | 120 | 20 | 15 | 265
SWE | 8 | 10 | 25 | 40 | 200 | 15 | 290
Data Scientist | 1 | 20 | 20 | 35 | 80 | 50 | 205
QA | 2 | 5 | 15 | 15 | 80 | 20 | 135
DevOps | 1 | 5 | 10 | 20 | 60 | 30 | 125
Analyst | 1 | 30 | 35 | 15 | 10 | 25 | 115

All step columns are hours per person for the cycle.
Key methodological note: The hours above reflect a single representative product cycle — approximately 12 weeks. They represent productive working hours only, before coordination overhead is applied. The SWE's 200 hours in Execute reflect roughly 55–60% of available time once meetings, reviews, and admin are excluded from the productive bucket.

Headcount-weighted totals

Man-hours per step are computed as the sum of (hours-per-person × headcount) across all roles. This is what makes the model accurate: a single SWE spending 200 hours in Execute becomes 1,600 man-hours when you account for 8 engineers. Compare that to the PM's 40 hours there, and the imbalance becomes stark.
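The aggregation above can be sketched in a few lines. This is a minimal illustration, not the report's actual model code; the role names and per-person hours come from the reference-squad table, and the function name is ours.

```python
# Reference squad: role -> (headcount, hrs/person per step), productive time only.
squad = {
    "PM":             (1, {"Discover": 80, "Define": 60, "Design": 30,  "Execute": 40,  "Measure": 30}),
    "UX Designer":    (2, {"Discover": 70, "Define": 40, "Design": 120, "Execute": 20,  "Measure": 15}),
    "SWE":            (8, {"Discover": 10, "Define": 25, "Design": 40,  "Execute": 200, "Measure": 15}),
    "Data Scientist": (1, {"Discover": 20, "Define": 20, "Design": 35,  "Execute": 80,  "Measure": 50}),
    "QA":             (2, {"Discover": 5,  "Define": 15, "Design": 15,  "Execute": 80,  "Measure": 20}),
    "DevOps":         (1, {"Discover": 5,  "Define": 10, "Design": 20,  "Execute": 60,  "Measure": 30}),
    "Analyst":        (1, {"Discover": 30, "Define": 35, "Design": 15,  "Execute": 10,  "Measure": 25}),
}

def man_hours_by_step(squad):
    """Sum (hrs/person x headcount) across all roles for each step."""
    totals = {}
    for headcount, hours in squad.values():
        for step, h in hours.items():
            totals[step] = totals.get(step, 0) + h * headcount
    return totals

totals = man_hours_by_step(squad)
print(totals["Execute"])      # 1990 -- of which the 8 SWEs contribute 1,600
print(sum(totals.values()))   # 3805 productive man-hours for the cycle
```

The headcount weighting is what produces the Execute dominance discussed below: a step that looks moderate per person becomes the largest bucket once eight engineers are counted.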

Productive man-hours by role and step
Absolute hours — before normalization — showing the raw scale difference between roles

The Execute step, driven almost entirely by SWE hours, dwarfs every other step in raw man-hour terms. This is the organizing fact around which all other analysis flows.


The importance vector: a paradigm-free lens

The most common mistake in analyzing time allocation is to derive "importance" from the existing time distribution — in effect, using the output to validate the input. We explicitly rejected this approach.

"How much does getting each step right — or wrong — determine whether the product succeeds? This is a question about outcomes, not headcount."

Our importance vector is a single set of weights assigned to each step based on one criterion: how much does the quality of work in this step determine whether a product achieves product-market fit or fails? This is deliberately independent of how many people are involved, how long it takes, or how it has traditionally been resourced.

Importance weight by step (the weights sum to 100%, so each weight is also the step's normalized share), with rationale:

Discover: 32%
Getting the problem wrong is the single largest cause of product failure. No excellent execution recovers from building something nobody needs. Asymmetric downside: failure here almost guarantees failure overall.

Define: 24%
Fuzzy definition creates compounding misalignment downstream. Scope drift and wrong success metrics are more expensive to fix at every subsequent step.

Design: 17%
Architecture and UX decisions create technical and product debt that's expensive to unwind. Bad calls here are recoverable but costly.

Execute: 11%
In an AI-assisted world, execution is increasingly commoditized. It is table stakes — necessary but no longer the primary differential lever for success.

Measure: 16%
Chronically underweighted in practice. Critical to compounding — products that can't measure outcomes can't iterate toward PMF. Its importance is structural and long-term.
Paradigm note: This importance vector is deliberately not the historical one. In the pre-AI paradigm, Execute received high importance weights because it was genuinely hard — finding and retaining engineers was a bottleneck that constrained everything else. We have removed that constraint from the model and asked: in a world where code can be generated, what actually determines product success? The answer shifts the weight dramatically upstream.

The alignment formula

With both dimensions normalized to sum to 100%, we need a formula that captures how well time allocation tracks importance — not just the size of the gap, but how disproportionate it is.

Why absolute gap is insufficient

An early version of this model used absolute percentage-point differences to measure alignment. This fails in a headcount-weighted model because the gaps become very large — Execute might show a 39pp gap — and a linear scaling factor can't represent both large and small gaps accurately across the same chart.

The ratio-based alignment score

The correct measure is the ratio between importance and time, not the difference. Perfect alignment means time is proportional to importance. Any deviation in either direction — too much time or too little — should be penalized symmetrically.

Step alignment formula

stepAlignment(s) = 100 × min( imp[s] / time[s], time[s] / imp[s] )

Where imp[s] is the normalized importance share of step s, and time[s] is the normalized time share of step s.

This formula maps to 100% when imp equals time exactly, and approaches 0% as the ratio diverges in either direction. A step receiving 4.5× more time than its importance warrants scores approximately 22%.

overallAlignment = Σ [ stepAlignment(s) × (imp[s] / 100) ]

The overall score is importance-weighted — steps with high importance that are badly misaligned drag the score down more than low-importance steps that are badly misaligned.
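Both formulas translate directly into code. The sketch below is illustrative (the function names are ours); shares are the normalized percentages defined above.

```python
def step_alignment(imp_share: float, time_share: float) -> float:
    """100 when time exactly tracks importance; decays toward 0
    as the ratio diverges in either direction."""
    ratio = imp_share / time_share
    return 100 * min(ratio, 1 / ratio)

def overall_alignment(imp: dict, time: dict) -> float:
    """Importance-weighted sum of per-step alignment scores."""
    return sum(step_alignment(imp[s], time[s]) * imp[s] / 100 for s in imp)

# The Execute example from the report: 11% of outcomes vs ~50.5% of time.
print(round(step_alignment(11, 50.5), 1))  # -> 21.8
```

Note the symmetry: a step with half the time its importance warrants scores the same as one with double, which is exactly the penalty behavior the ratio-based design calls for.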

Interpretation guide:
80–100%: Well aligned — time roughly tracks importance.
55–79%: Moderate misalignment — worth addressing.
Below 55%: Severe misalignment — the step is either starved or gorged relative to its outcome contribution.

Step-by-step findings

The chart below shows the normalized comparison: each bar represents the share of total squad man-hours going to a step, broken down by role. The dashed outline shows the importance target. The line shows the ratio-based alignment score for each step.

Normalized man-hours vs importance · productive time only · 16-person squad
Both dimensions sum to 100%. Solid bars = productive hours by role. Dashed outline = step importance. Line = alignment score (right axis).
Overall alignment (incl. overhead): 50.3% · vs 37.7% productive-only — the gap is the coordination tax
Execute time/importance ratio: 4.6× · 50.5% of time, 11% of outcomes
Best step alignment (Design): 95.5% · the only step where time tracks importance accurately

Discover — severely underinvested

Discover · Underinvested
Importance: 32% (highest of any step)
Time (productive): ~9% of total man-hours
Time/importance ratio: 0.28× (3.6× underallocated)
Step alignment: 28% (severely misaligned)

Discover receives the smallest share of productive hours despite driving the largest share of product outcomes. The primary contributors to this step are PM (80h), UX Designer (70h per person × 2 = 140h), and Analyst (30h). SWEs contribute only 10h each — typically initial feasibility spikes. Total productive man-hours: approximately 365h, or 9.6% of the squad's productive budget.

The reason for this underinvestment is structural: discovery work is harder to schedule, harder to measure, and produces no visible artifact by the end of a sprint. In an industry organized around shipping, time flows toward things that produce commits.

Define — moderately underinvested

Define · Underinvested
Importance: 24% (second highest of any step)
Time (productive): ~14% of total man-hours
Time/importance ratio: 0.58× (1.7× underallocated)
Step alignment: 58% (moderate misalignment)

Define is where discoveries are turned into scoped problem statements, requirements, architecture decisions, and success metrics. Its underinvestment is somewhat less severe than Discover's, partly because PRD writing and architecture planning have more visible deliverables that sprint planning can accommodate.

However, the quality of Define work is frequently undermined by the fact that Discover was underfunded. Teams define solutions to poorly understood problems, building in compounding misalignment from the start.

Design — the only well-aligned step

Design · Well aligned
Importance: 17% (third highest of any step)
Time (productive): ~17.8% of total man-hours
Time/importance ratio: 1.05× (near-perfect)
Step alignment: 95.5% (best in the model)

Design is the only step where time and importance are well matched. UX Designers (120h per person × 2 = 240h) dominate this step, and their work — wireframes, prototypes, system architecture — has enough visible artifact weight to receive appropriate scheduling. SWEs contribute 40h each here (320h total), primarily in technical design and architecture review.

Execute — severely overinvested

Execute · Overinvested
Importance: 11% (lowest of any step)
Time (productive): ~50.5% of total man-hours
Time/importance ratio: 4.6× (most overallocated step)
Step alignment: 21.8% (worst alignment score)

Execute is the organizing center of the pre-AI product team. With 8 SWEs each spending 200 productive hours here (1,600h), plus 160h from QA, 80h from Data Scientists, 60h from DevOps, and roughly 90h combined from PM, UX, and Analyst, the step consumes approximately 1,990 productive man-hours — more than every other step of the framework combined.

This is not a mistake. It reflects the genuine historical reality that writing, testing, and deploying code required human beings. The misalignment this creates — 4.6× more time than the step's outcome contribution warrants — is a structural artifact of that constraint, not of poor management.

Measure — chronically starved

Measure · Underinvested
Importance: 16% (nearly equal to Design's 17%)
Time (productive): ~8.5% of total man-hours
Time/importance ratio: 0.53× (1.9× underallocated)
Step alignment: 53.1% (moderate misalignment)

Measure receives similar importance to Design (16%) but half the time allocation. This reflects a near-universal pattern: teams ship features and immediately pivot to planning the next sprint, never closing the loop between what was built and whether it worked. The result is a product development process that cannot compound — each cycle starts from approximately the same knowledge base as the last.


The coordination overhead layer

On top of productive hours, every role in a multi-person squad incurs coordination overhead: time spent on standups, sprint ceremonies, PR reviews, stakeholder readouts, alignment meetings, and knowledge re-transfer. This time carries zero importance weight — it exists entirely because multiple humans need to synchronize state that a single integrated system would hold natively.

Core argument Coordination overhead is pure organizational friction. In a sufficiently capable single-agent system, this cost collapses to zero — there is no one to sync with. The magnitude of this overhead is therefore a proxy for how far a team is from the single-system ideal, and how much time could theoretically be recovered by AI-native tooling.

Overhead by role and step

We modeled coordination overhead as a separate layer on top of productive hours, estimated as follows:

Role | Discover | Define | Design | Execute | Measure | Source of overhead in Execute
PM | 20h | 35h | 25h | 40h | 20h | Stakeholder readouts, sprint planning, escalations
UX Designer | 15h | 20h | 30h | 20h | 10h | Design review cycles, handoff back-and-forth
SWE | 5h | 15h | 15h | 60h | 8h | Standups (~5h), ceremonies (~24h), PR reviews (~36h)
Data Scientist | 10h | 15h | 10h | 25h | 20h | Analysis review loops, stakeholder explainers
QA | 3h | 10h | 8h | 25h | 10h | Bug triage, retesting coordination, sprint ceremonies
DevOps | 3h | 8h | 8h | 20h | 12h | Deployment coordination, incident response syncs
Analyst | 12h | 15h | 8h | 10h | 15h | Report review cycles, stakeholder alignment

Hours are per person; step totals weight each row by headcount, as in the productive-hours model.
Coordination overhead: 27.7% of all squad hours (1,457 hrs on a 16-person squad)
SWE coordination in Execute alone: 480h (8 engineers × 60h) · standups + ceremonies + PR reviews
Define step overhead share: 36.6% · the highest overhead share of any step

The Define step having the highest overhead share (36.6%) is counterintuitive but logical: it is the step that requires the most cross-functional alignment. PMs, designers, engineers, and analysts all need to agree on scope, requirements, and architecture — and that agreement costs time. The actual discovery and definition work is often less time-consuming than the alignment ceremonies around it.
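The overhead layer can be totaled the same way as productive hours. A minimal sketch, with per-person figures taken from the overhead table and the 3,805-hour productive total derived from the earlier squad table; variable names are ours.

```python
# Headcount-weighted coordination overhead across the squad.
headcount = {"PM": 1, "UX Designer": 2, "SWE": 8, "Data Scientist": 1,
             "QA": 2, "DevOps": 1, "Analyst": 1}

overhead_per_person = {  # hrs/person: Discover, Define, Design, Execute, Measure
    "PM":             [20, 35, 25, 40, 20],
    "UX Designer":    [15, 20, 30, 20, 10],
    "SWE":            [5, 15, 15, 60, 8],
    "Data Scientist": [10, 15, 10, 25, 20],
    "QA":             [3, 10, 8, 25, 10],
    "DevOps":         [3, 8, 8, 20, 12],
    "Analyst":        [12, 15, 8, 10, 15],
}

total_overhead = sum(sum(hours) * headcount[role]
                     for role, hours in overhead_per_person.items())

PRODUCTIVE_TOTAL = 3805  # headcount-weighted productive hours from the earlier table
overhead_share = 100 * total_overhead / (PRODUCTIVE_TOTAL + total_overhead)
print(total_overhead, round(overhead_share, 1))  # -> 1457 27.7
```

This reproduces the headline figures above: 1,457 overhead hours, 27.7% of total squad time.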

Total man-hours including coordination overhead · by step and role
Solid bars = productive time. Faded bars = coordination overhead. Together they show how total squad time is consumed.

Implications for the AI era

The misalignment described in this report is not an accident, and it is not the result of poor product management. It is the rational outcome of an era in which writing code was genuinely the scarce, expensive, bottleneck activity — and therefore the activity around which everything else was organized.

That era is ending. AI-assisted coding tools have begun commoditizing the Execute step. As this commoditization deepens, the structural logic that justified the current time distribution dissolves. Two futures become possible.

Failure path

AI replaces hands. Same broken process, faster.

Engineers are replaced by AI code generation. The same process — underinvesting in discovery, rushing to execution — runs faster. Wrong things are built more efficiently than ever. Companies burn AI credits shipping features nobody wanted.

Success path

Freed time flows upstream to where outcomes are determined.

Teams redirect reclaimed execution hours into Discover and Measure — the two most underinvested, highest-importance steps. Product teams shrink but improve. The limiting factor shifts from hands to thinking.

What changes by step

Discover
Pre-AI constraint: Underfunded because there's no sprint-visible artifact. Treated as a soft activity.
Post-AI opportunity: AI synthesis of research, automated interview analysis, pattern detection — all increasing the ROI of discovery time dramatically.

Define
Pre-AI constraint: Alignment overhead dominates. 36% of Define time is ceremonies, not actual definition work.
Post-AI opportunity: A shared AI context layer eliminates re-sync overhead. Decisions stay attached to their rationale permanently.

Execute
Pre-AI constraint: Requires expensive human engineering labor. Justifies its 50% time share by necessity.
Post-AI opportunity: AI-generated code collapses the time cost. Execute becomes the output of a well-run upstream process, not the center of gravity.

Measure
Pre-AI constraint: Teams ship and immediately pivot to the next sprint. No loop closure, no compounding.
Post-AI opportunity: AI-powered analytics closes the loop automatically. Every cycle informs the next. The product development process finally compounds.
"The companies that use AI to do faster execution will ship wrong things faster. The companies that use AI to do better discovery will build the right things, faster. Only one of those is a durable advantage."

What a corrected allocation looks like

If time allocation were to converge toward the importance vector, what would a corrected squad look like? We model two scenarios: a minimal correction (close the gap by 50%) and an AI-native allocation (time tracks importance proportionally).
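The two scenarios are simple transformations of the current share vector. A sketch under the report's approximate step shares (the dictionary values are the percentages quoted in the findings above; names are ours):

```python
# Current time shares (%) vs the importance vector, both summing to ~100.
current    = {"Discover": 9.5, "Define": 14.0, "Design": 17.8,
              "Execute": 50.5, "Measure": 8.5}
importance = {"Discover": 32.0, "Define": 24.0, "Design": 17.0,
              "Execute": 11.0, "Measure": 16.0}

# Scenario 1 -- minimal correction: close half the gap between time and importance.
half_correction = {s: (current[s] + importance[s]) / 2 for s in current}

# Scenario 2 -- AI-native allocation: time tracks importance proportionally.
ai_native = dict(importance)

print(half_correction["Execute"])  # -> 30.75, i.e. Execute drops from 50.5% to ~31%
```

Under the minimal correction, Execute's share falls by roughly 20 points — the reallocation opportunity quantified below.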

Current vs target time allocation · normalized shares
Orange = current allocation. Teal = importance-aligned target. The gap between them is the reallocation opportunity.
What alignment improvement looks like in practice: Moving the overall alignment score from 50% to 80% does not require adding headcount. It requires redirecting approximately 20–25% of total squad hours from Execute toward Discover and Measure. On a 16-person squad over a 12-week cycle, that is roughly 800 hours — the equivalent of one full-time senior engineer — moving from shipping code to understanding users and measuring outcomes. In an AI-assisted world, that engineer's coding output can be replaced by AI; their discovery and measurement work cannot.

The three-sentence strategic conclusion

The current industry time allocation reflects the constraints of an era that is ending. The steps that matter most for product success — Discover (32%), Define (24%), and Measure (16%) — together receive only 32% of squad time, while the most automatable step receives 50%. The first generation of product teams to redirect that freed execution time upstream will build systematically better products than their peers — not because they work harder, but because they work in the right places.

The core finding

56% of outcomes are determined before a line of code is written

Discover + Define together drive 56% of product success. They receive 23% of squad time. This is the fundamental misalignment the industry must correct.

The opportunity

~800 hours per cycle available for reallocation on a 16-person squad

As Execute is automated, these hours can flow to Discover and Measure — the two steps most underinvested relative to their importance.

The coordination multiplier

27.7% of total time is pure coordination waste

This overhead exists only because multiple humans must synchronize state. A system that holds full product context collapses this tax structurally.

The risk

AI adoption without process change accelerates failure

Teams that automate Execute without restructuring their Discover and Measure investment will simply build wrong things faster — at lower cost, until they run out of money.