Data interpretation for PMs
The data never lies. But the person presenting it to you almost always does — including you.
You have the dashboard. You have the metrics. You chose them carefully using a proper framework. None of that matters if you read them wrong.
Data interpretation is where most PMs fail silently. Not because they lack access to data, but because they draw confident conclusions from ambiguous numbers. They see a line going up and call it growth. They see two metrics moving together and call it causation. They present a number to leadership and nobody in the room knows it is misleading — including the PM.
This page is about the traps. The specific, predictable ways that data misleads smart people. And the questions you need to ask before trusting any number on any screen.
Correlation is not causation (and you know this, but you still get it wrong)
Every PM has heard this phrase. Almost none of them apply it consistently.
Here is why: when you are under pressure to show results, your brain desperately wants the data to tell a simple story. You shipped feature X. Metric Y went up. Therefore X caused Y. Your stakeholders want this story too. Nobody in the room has the incentive to question it.
But “X happened, then Y happened” is not proof that X caused Y.
Quarterly business review. The payments team at a fintech startup in Pune is presenting results.
Payments PM: “After we launched UPI Lite, transaction volume increased 34% month-over-month.”
CFO: “Great. So UPI Lite drove the increase?”
Payments PM: “The numbers line up perfectly with our launch date.”
Data Analyst: “Actually, RBI mandated UPI interoperability that same week. Every payments app in India saw a volume spike.”
The PM had attributed an industry-wide regulatory tailwind to their feature launch. The CFO had almost approved a Rs 2 crore budget increase based on this misattribution.
The feature may have contributed. But the PM presented correlation as if it were proof.
The fix is not to stop claiming causation. The fix is to earn the right to claim it:
Ask the counterfactual. What would have happened if you had not shipped the feature? If you cannot answer this — if you have no control group, no holdback, no baseline — then you do not have causation. You have a coincidence with a compelling narrative.
Check for confounders. What else changed during the same period? A new marketing campaign? A competitor’s outage? A seasonal pattern? A regulatory change? List every external factor that could explain the movement, and rule them out before attributing it to your work.
Look at the timing. If metric Y started moving before feature X launched, X did not cause Y. This sounds obvious. In practice, people rarely check. They look at the post-launch number and stop there.
Run an experiment. A/B tests are the gold standard for causation. If you cannot run a controlled experiment (small user base, ethical constraints, infrastructure limitations), at least use a quasi-experimental method: compare the affected cohort to a similar unaffected cohort, or use a time-series analysis that accounts for trends.
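When a controlled experiment is off the table, difference-in-differences is one common quasi-experimental fallback: compare the change in the treated cohort to the change in a similar untreated cohort, so that shared trends cancel out. A minimal sketch with hypothetical transaction volumes echoing the UPI Lite example (the cohort numbers are invented for illustration):

```python
# Difference-in-differences sketch (hypothetical numbers).
# Treated cohort got the feature; the control cohort did not.
# Subtracting the control's change removes market-wide trends.

def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Estimated effect = treated cohort's change minus control cohort's change."""
    return (treated_after - treated_before) - (control_after - control_before)

effect = diff_in_diff(
    treated_before=100_000, treated_after=134_000,   # +34% after launch
    control_before=100_000, control_after=128_000,   # +28% with no feature
)
print(effect)  # prints 6000: the naive +34,000 lift shrinks once the shared trend is subtracted
```

The key assumption (and the one to challenge) is that the two cohorts would have trended the same way without the feature.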
Simpson’s paradox: when the aggregate lies
This is the trap that catches even experienced data people. Simpson’s paradox occurs when a trend that appears in several groups reverses when the groups are combined.
Here is a real-world example adapted from an Indian e-commerce context:
You are a PM at an online marketplace. You compare conversion rates across two seller categories — electronics and fashion — for mobile and desktop users.
| Category | Mobile conversion | Desktop conversion |
|---|---|---|
| Electronics | 4.2% | 3.8% |
| Fashion | 6.1% | 5.5% |
Mobile converts better in both categories. Obviously mobile is the better-performing platform, right?
But now look at the aggregate:
| Platform | Overall conversion |
|---|---|
| Mobile | 4.6% |
| Desktop | 5.0% |
Desktop converts better overall. How?
Because 80% of mobile traffic goes to electronics (which has lower conversion rates overall), while 70% of desktop traffic goes to fashion (which has higher conversion rates). The mix of traffic is different across platforms. When you aggregate, the category mix drowns out the within-category performance.
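The reversal is pure arithmetic. A short sketch that recomputes the aggregates from the within-category rates and the traffic mixes stated above (aggregates round to roughly 4.6% and 5.0%):

```python
# Reproducing the Simpson's paradox above: mobile wins in every
# category, but the traffic mix flips the aggregate comparison.

def aggregate_rate(rates, mix):
    """Overall conversion = weighted average of per-category rates by traffic share."""
    return sum(rates[cat] * mix[cat] for cat in rates)

mobile = aggregate_rate(
    rates={"electronics": 0.042, "fashion": 0.061},
    mix={"electronics": 0.80, "fashion": 0.20},     # mobile skews to electronics
)
desktop = aggregate_rate(
    rates={"electronics": 0.038, "fashion": 0.055},
    mix={"electronics": 0.30, "fashion": 0.70},     # desktop skews to fashion
)
print(f"mobile {mobile:.2%}, desktop {desktop:.2%}")  # mobile 4.58%, desktop 4.99%
```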
The lesson: whenever you see an aggregate number, ask: “What sub-groups does this combine, and could the mix between sub-groups be skewing the result?” This is especially important in India where user behavior varies dramatically across tiers, languages, and platforms. A national average conversion rate is almost meaningless for a product that serves both Mumbai and Meerut.
Survivorship bias: the data you never see
Your analytics dashboard shows you data about users who are still around. It tells you nothing about users who left.
This is survivorship bias, and it warps every product decision you make if you are not careful.
When you analyze feature usage, you are analyzing feature usage among people who stayed. The users who found the feature confusing, or irrelevant, or broken — they churned. They are not in your data anymore. So your usage metrics are systematically optimistic. They describe the experience of your happiest users, not your average users.
Where this hits Indian PMs hard:
- Tier-2/3 drop-off. You look at onboarding completion rates and see 78%. Looks good. But 40% of users from smaller towns dropped off before even reaching the onboarding screen because the app took 15 seconds to load on their device. They never entered your funnel. Your 78% describes the survivors.
- Language self-selection. Your English-language product has high engagement. You conclude the product works well. But every Hindi-speaking user who could not navigate the English UI left in the first session. You are measuring a self-selected audience that already had no language barrier.
- Payment method bias. Your data shows UPI users have 3x the lifetime value of COD users. You conclude UPI users are more valuable and shift marketing spend toward them. But perhaps the COD users — who are often first-time online shoppers — had a worse post-purchase experience (delayed refunds, complicated returns) that drove them away. The payment method did not cause lower LTV; the operational experience did.
The fix: Always ask “who is NOT in this data?” Before making any decision based on user analytics, think about the users who left before generating the data you are looking at.
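The onboarding example can be made concrete. A sketch with illustrative numbers: the dashboard rate divides by survivors who reached the funnel, while the honest rate divides by everyone who installed.

```python
# Survivorship bias in a funnel metric (illustrative numbers).
installs = 10_000
never_loaded = 4_000                         # left before the funnel; invisible to it
reached_onboarding = installs - never_loaded
completed = 4_680                            # 78% of those who reached onboarding

survivor_rate = completed / reached_onboarding  # what the dashboard shows
true_rate = completed / installs                # includes the users who never made it
print(f"dashboard: {survivor_rate:.0%}, reality: {true_rate:.0%}")  # dashboard: 78%, reality: 47%
```

Same product, same users; the only difference is whether the denominator includes the people who left before generating data.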
Base rate neglect: the denominator problem
A fraud detection system catches 95% of fraudulent transactions. Sounds great. Should you trust it?
That depends entirely on the base rate. If 1 in 10,000 transactions is fraudulent, and your system flags 5% of all transactions as suspicious, then for every 10,000 transactions:
- 1 is actually fraudulent. Your system catches it (95% recall).
- 500 are flagged as suspicious. 499 of them are legitimate.
Your “95% accurate” system is wrong 499 out of 500 times it raises an alert. Its precision is just 0.2%; 99.8% of its alerts are false alarms.
This is not a statistics trick. This is exactly what happens when PMs evaluate ML features, fraud systems, recommendation engines, or any system that deals with rare events. The base rate — how common the thing you are looking for actually is — determines whether your “accurate” system is useful or useless.
In practice: When someone tells you a model or system has “95% accuracy,” your first question should be: “What is the base rate of the event we are predicting?” If the event is rare, high accuracy means almost nothing.
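The arithmetic behind the fraud example is worth internalizing. A small sketch using the base rate, recall, and flag rate stated above:

```python
# Precision collapses when the base rate is tiny, even with high recall.

def alert_precision(base_rate, recall, flag_rate):
    """Fraction of flagged transactions that are actually fraudulent."""
    true_positives = base_rate * recall   # share of all transactions caught as fraud
    return true_positives / flag_rate     # divided by share of all transactions flagged

p = alert_precision(base_rate=1 / 10_000, recall=0.95, flag_rate=0.05)
print(f"{p:.2%} of alerts are real fraud")  # 0.19% of alerts are real fraud
```

Raise the base rate to 1 in 100 and the same system becomes genuinely useful; nothing about the model changed, only the denominator.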
The seven questions to ask before trusting any number
Before you put a data point on a slide, walk through these:
1. What is the source? Is this from your instrumentation, a third-party tool, a manual export, or someone’s spreadsheet? Each has different reliability. Event data from your own instrumentation is the most trustworthy. Screenshots of a third-party dashboard are the least.
2. What is the time period? A metric that looks great over 90 days might look terrible over 7 days, or vice versa. Always know the window, and ask whether a different window would tell a different story.
3. What is the denominator? “500 new signups” means nothing without knowing the traffic that produced them. Always pair absolute numbers with the base they come from.
4. Who is included and excluded? Does this number include test accounts, internal users, bot traffic? Is it filtered to a specific geography, platform, or user segment? Every filter is a choice, and every choice can change the conclusion.
5. What else changed? Before attributing movement to your work, list every other thing that changed during the same period. Marketing campaigns, seasonal patterns, competitor actions, platform changes, bugs that were fixed or introduced.
6. Could the aggregate be hiding sub-group differences? Break the number down by the most important segments — platform, geography, user tier, cohort, payment method. If the trend reverses in any segment, you have a Simpson’s paradox situation.
7. What would make this number wrong? This is the hardest question. Actively look for reasons the data might be misleading. Instrumentation bugs, sampling errors, survivorship bias, selection effects. If you cannot think of any way the number could be wrong, you are not trying hard enough.
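The denominator question is the easiest to operationalize. An illustrative sketch of why the same absolute number tells opposite stories (the visitor counts are hypothetical):

```python
# "500 new signups" is a rate in disguise; the denominator decides the story.

def signup_rate(signups, visitors):
    """Conversion rate from visitors to signups."""
    return signups / visitors

print(f"{signup_rate(500, 5_000):.1%}")    # 10.0% -- a strong funnel
print(f"{signup_rate(500, 250_000):.1%}")  # 0.2% -- likely a broken one
```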
Now put this into practice. Pick the last data-driven decision your team made (or use this example: “We decided to sunset the web app and go mobile-only because 83% of our active users are on mobile”).
Walk through all seven questions above. Write down your answers. Pay special attention to:
- Question 4: Who was excluded from the “active users” count? Did web-only users churn because the web app was neglected, creating a self-fulfilling prophecy?
- Question 6: Break down the 83% by user segment. Is mobile dominance true for enterprise users? For power users? For users who pay?
- Question 7: What if the mobile percentage reflects your marketing channel mix (you only run mobile app install campaigns) rather than user preference?
The goal is not to reverse the decision. The goal is to know what you do not know before committing resources.
Reading charts without being misled
Dashboards are visual arguments. The person who built the chart made choices — axis scales, time windows, color coding, which series to include — and each choice shapes what story the chart tells.
Truncated y-axis. A bar chart showing revenue going from Rs 10 crore to Rs 10.5 crore looks like a 5% bump if the y-axis starts at zero. Start the y-axis at Rs 9.5 crore and it looks like a dramatic surge. Same data, different visual story. Always check where the axis starts.
Cherry-picked time windows. “Revenue is up 40% since March” sounds impressive. But if you zoom out, March was an anomalous dip. Revenue is actually flat year-over-year. The time window was chosen to make the number look good. Always ask: what does the full picture look like?
Cumulative charts. Cumulative graphs only go up. They cannot show you slowdowns, plateaus, or declines. If someone shows you a cumulative user chart, ask for the daily or weekly acquisition numbers instead. That is where the real trends live.
Averages hiding distributions. “Average session duration is 4 minutes” could mean most users spend 4 minutes, or it could mean 90% of users bounce in 10 seconds and 10% spend 40 minutes. Averages destroy information. Ask for medians and distributions whenever possible.
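The last point is easy to demonstrate. A sketch with illustrative session durations showing how far the mean and median can diverge on the exact bimodal pattern described above:

```python
# A mean of ~4 minutes can hide a product most users bounce from.
from statistics import mean, median

# Illustrative distribution: 90% of sessions bounce in ~10s,
# 10% are power users staying ~40 minutes (2400s).
sessions = [10] * 90 + [2400] * 10

print(f"mean:   {mean(sessions):.0f}s")    # mean:   249s (about 4 minutes)
print(f"median: {median(sessions):.0f}s")  # median: 10s (the typical session)
```

The mean describes nobody in this dataset; the median describes 90% of users.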
Test yourself
You are a PM at a food delivery startup in Hyderabad. Your CEO walks into the Monday standup and says: “Our average delivery time dropped from 38 minutes to 31 minutes last month. The ops team did a fantastic job. Let us publish this in our next investor update.” You have access to the raw data. What do you do?
You pull up the delivery time data. You notice the average dropped, but you have not looked at the distribution or segmentation yet. How do you proceed?
A checklist for your next data review
Before your next metrics review, tape this to your monitor:
- Am I looking at a correlation or a proven cause?
- Have I checked for confounders?
- Could the aggregate be hiding sub-group reversals?
- Who is missing from this data?
- What is the base rate of the event I am measuring?
- Did the measurement methodology change?
- What would make me wrong?
The goal is not to distrust all data. The goal is to trust data for the right reasons — because you interrogated it, not because it told you what you wanted to hear.
Ather Energy's PM sees that user engagement with the vehicle's mobile app dropped 23% in week 3 after a major firmware update. The engineering team says no functionality changed. Customer support shows a 40% increase in 'app not syncing' tickets that week.
The call: Do you roll back the firmware update, hotfix the app, or investigate before deciding?
Where to go next
- Design tests that prove causation: Experimentation — A/B testing, holdback groups, and what to do when you cannot run a controlled experiment
- Choose the right metrics first: Metrics That Matter — how to pick metrics before you start interpreting them
- The judgment layer: Data-Informed Decision Making — when to trust the numbers, when to override them, and the traps that catch smart PMs
- When metrics drop: Diagnosing Metric Drops — the SLICE framework for systematic diagnosis