Statistics for Business · Business Statistics · Reay's Note

Statistics Study Notes
Correlation · Regression · Hypothesis Testing

From relationships between variables, to predictive modeling, to verifying whether results are credible — a visual study note connecting core statistical concepts with business decision-making.

Executive Summary

The Core of Statistical Analysis: From Data Insight to Business Action

Statistical analysis is not just about calculating numbers. It is a decision path from observing relationships, to quantifying effects, to verifying credibility. Correlation analysis tells us whether variables move together; regression analysis quantifies that relationship into a predictive model; hypothesis testing checks whether the observed effect is unlikely to be random noise. Used together, these tools turn data into business insights worth acting on.

3 · Core statistical topics
r · Correlation coefficient, from -1 to +1
R² · Regression model explanatory power
0.05 · Common significance level α
💡

Learning and Application Tips

  • Correlation ≠ Causation: Correlation only shows that variables move together. It does not prove that one causes the other; additional research is needed to establish causality.
  • Look at the plot before the numbers: Always inspect residual plots before interpreting R²; relying only on p-values or R² can lead to a misleading model interpretation.
  • Statistical significance ≠ Business importance: Even when the p-value is significant, the effect size must still be large enough to justify business resources.
The Analytical Workflow

A Four-Step Statistical Analysis Workflow

From business question to decision, each step answers one key question.

Step 1 · Correlation Analysis · Is there a relationship?

Use correlation analysis to see whether two variables move together and decide whether further modeling is worthwhile.

Step 2 · Regression Analysis · How large is the effect?

Build a simple linear regression model to quantify the relationship as a predictive equation.

Step 3 · Hypothesis Testing · Is the result credible?

Use hypothesis testing and the p-value to check whether the observed effect is statistically significant.

Step 4 · Substantive Assessment · Is it worth acting on?

Apply a substantive hypothesis lens to judge whether the statistical result is worth turning into a business action.
01 | Correlation Analysis

Correlation analysis measures the strength and direction of the linear relationship between two quantitative variables. It is often the starting point of analysis.

Core Concepts
Definition

Correlation Coefficient r

  • Measures the strength and direction of the linear relationship between two quantitative variables
  • Range: -1 to +1
  • r = +1 indicates perfect positive correlation; r = -1 indicates perfect negative correlation; r = 0 indicates no linear relationship
Three Types

Direction and Strength

  • Positive correlation: variables tend to move in the same direction (r > 0; strong when close to +1)
  • Negative correlation: variables tend to move in opposite directions (r < 0; strong when close to -1)
  • Zero correlation: no clear linear relationship (r close to 0)
Variable Roles

Independent vs. Dependent Variable

  • Independent variable X: the possible driver, usually placed on the x-axis
  • Dependent variable Y: the outcome being predicted, usually placed on the y-axis
  • Causality requires additional research; correlation is only the starting point
⚠️
Core warning: correlation does not imply causation. Two variables moving together does not mean one causes the other to change. A third factor may exist, or the relationship may simply be coincidental.
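As a minimal sketch, the coefficient r can be computed straight from its definition: the sum of cross-deviations divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the sample numbers are hypothetical illustrations, not data from this note (Python 3.10+ also ships `statistics.correlation`, which computes the same quantity).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r for two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-deviations carry the direction; squared deviations scale r into [-1, +1]
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical monthly figures (NT$10,000), for illustration only
ad_spend = [10, 12, 15, 18, 20, 25]
sales = [55, 60, 75, 85, 96, 115]
print(f"r = {pearson_r(ad_spend, sales):.3f}")
```

A value near +1 here only says the two series move together; it says nothing about which one drives the other.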
Business Applications
Marketing

Ad Spend and Sales

  • Analyze the linear relationship between advertising spend and sales revenue
  • r close to +1 suggests a strong positive relationship and supports further investigation of ad effectiveness
Scenario: An e-commerce company finds that Google Ads spend and weekly order volume have r = 0.87, then decides to increase budget allocation.
Finance

Stocks and Market Index

  • Analyze how strongly an individual stock moves with the overall market index (related to the beta concept: a stock's beta equals r × σ_stock / σ_market)
  • Combining negatively correlated assets can reduce overall portfolio risk
Scenario: A fund manager finds that two stocks have r = -0.73 and pairs them to reduce volatility risk.
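The diversification point can be made concrete with the standard two-asset portfolio variance formula, σp² = w₁²σ₁² + w₂²σ₂² + 2w₁w₂ρσ₁σ₂, where ρ is the correlation between the two assets' returns. The sketch reuses ρ = -0.73 from the scenario; the 50/50 weights and the 20% and 25% volatilities are hypothetical assumptions.

```python
import math

def portfolio_volatility(w1, sigma1, sigma2, rho):
    """Standard deviation of a two-asset portfolio, with the second weight w2 = 1 - w1."""
    w2 = 1 - w1
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2 * w1 * w2 * rho * sigma1 * sigma2  # correlation term: a negative rho lowers variance
    )
    return math.sqrt(variance)

# Hypothetical 50/50 split between assets with 20% and 25% volatility
for rho in (1.0, 0.73, -0.73):
    vol = portfolio_volatility(0.5, 0.20, 0.25, rho)
    print(f"rho = {rho:+.2f} -> portfolio volatility = {vol:.3f}")
```

With ρ = +1 the volatility is simply the weighted average (22.5%); a negative correlation pulls it well below either asset's own volatility.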
Retail

Store Factors and Sales

  • Analyze correlations between foot traffic, number of competitors, and sales
  • Identify store improvements that deserve investment priority
Scenario: A supermarket chain finds that foot traffic and revenue have r = 0.91 and prioritizes expansion in high-traffic areas.
02 | Simple Linear Regression

Regression analysis quantifies the relationship between variables as an equation, allowing known data to be used to predict unknown values.

Core Concepts
Regression Equation

Probabilistic Model

Yᵢ = α + βXᵢ + εᵢ
  • α: population y-intercept (a true but unobserved parameter, estimated from the sample)
  • β: population slope (the expected change in Y for a one-unit increase in X)
  • εᵢ: error term (random variation in Yᵢ not explained by X)
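Because α and β are unknown population parameters, they have to be estimated from sample data. A minimal sketch using ordinary least squares, the usual estimation method for this model: the slope estimate is b = Sxy / Sxx and the intercept estimate is a = ȳ − b·x̄. The names `fit_slr` and `predict` and the weight/cost data are hypothetical.

```python
def fit_slr(x, y):
    """Ordinary least squares estimates (a, b) for the line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)  # must be > 0, so x cannot be constant
    b = sxy / sxx            # estimate of the slope beta
    a = mean_y - b * mean_x  # estimate of alpha; the line passes through (mean_x, mean_y)
    return a, b

def predict(a, b, x_new):
    """Point prediction Y-hat = a + b * x_new from the fitted line."""
    return a + b * x_new

# Hypothetical data: shipment weight (kg) vs. transport cost (NT$)
weight = [1, 2, 4, 5, 8]
cost = [120, 150, 205, 240, 330]
a, b = fit_slr(weight, cost)
print(f"cost-hat = {a:.1f} + {b:.1f} * weight; 6 kg -> NT${predict(a, b, 6):.0f}")
```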
Model Evaluation

R-squared (R²)

R² = r² (in simple linear regression); range: 0 to 1
  • Measures the proportion of variation in the dependent variable explained by the independent variable
  • R² < 0.5 is often treated as limited explanatory power, though what counts as adequate depends on the field
  • R² close to 1 indicates strong model fit
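The identity R² = r² can be checked numerically. The sketch computes R² from its definition, 1 − SSE/SST (unexplained over total variation), keeps the residuals that a residual plot would display, and compares the result with the squared correlation. The function name and data are hypothetical.

```python
def r2_two_ways(x, y):
    """Return (R² as 1 - SSE/SST, r squared, residuals) for a least squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sst = sum((yi - mean_y) ** 2 for yi in y)  # total variation in y
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)       # variation the line leaves unexplained
    r = sxy / (sxx * sst) ** 0.5
    return 1 - sse / sst, r ** 2, residuals

# Hypothetical data
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
r2, r_sq, resid = r2_two_ways(x, y)
print(f"R² = {r2:.4f}, r² = {r_sq:.4f}")
print("residuals:", [round(e, 2) for e in resid])  # plot these against x to look for patterns
```

The equality holds for simple linear regression with an intercept; with several predictors, R² is instead the squared correlation between Y and the fitted values.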
Interpretation Rule

Best-Practice Combination

  • Check residual plots before interpreting R²
  • Low p-value + high R² → a significant relationship that also explains most of the variation, the strongest case for the model
  • In complex behavioral domains, a low R² may still have research value
📈
Reading residual plots: An ideal residual plot should be randomly scattered with no visible pattern. Curves, funnel shapes, or systematic bias suggest possible model misspecification, so R² should not be trusted on its own.
Business Applications
Business Forecasting

Sales Forecasting

  • Use advertising spend X to predict next-quarter sales Y
  • The β coefficient estimates how much sales revenue changes for each additional dollar of ad spend
Scenario: A brand builds an SLR model showing that each additional NT$10,000 in ad spend predicts roughly NT$42,000 in additional sales.
Real Estate

Asset Valuation

  • Use floor area to predict market selling price (adding a location score as a second X would turn this into multiple regression)
  • A high R² indicates that floor area explains much of the price variation
Scenario: A real estate firm uses floor area as X to fit a regression line and estimate reasonable selling prices, with R² = 0.82.
Logistics

Transportation Cost Forecasting

  • Use shipment weight X to predict transportation cost Y
  • Use R² to assess whether weight is a major driver of transportation cost
Scenario: A logistics provider builds a regression equation so its automated quoting system can estimate prices from weight in real time.
03 | Hypothesis Testing

Hypothesis testing uses sample data to decide whether there is enough evidence to reject the null hypothesis and whether the observed effect is unlikely to be random chance.

Hypothesis Types and p-Value
Three Types

Hypothesis Classification

  • Research hypothesis: the researcher’s initial expectation about the result
  • Statistical hypothesis: expressed mathematically as H₀ and Hₐ
  • Substantive hypothesis: whether the result has practical value for business decisions
H₀ vs. Hₐ

Null and Alternative Hypotheses

  • H₀ (null hypothesis): no change from the status quo
  • Hₐ (alternative hypothesis): a new effect or the researcher’s proposed claim
  • Testing says “fail to reject H₀,” not “accept H₀”
p-Value Decision

Decision Rule

  • p-value ≤ α → reject H₀ (commonly α = 0.05)
  • p-value > α → fail to reject H₀
  • Mnemonic: P low, null go
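The decision rule translates directly into code. This sketch assumes the test statistic is a z statistic (standard normal under H₀), so Python's `statistics.NormalDist` is enough to turn it into a p-value; a t statistic would need a t distribution instead (for example `scipy.stats.t`). The helper names are hypothetical.

```python
from statistics import NormalDist

def p_value_from_z(z, tail="two"):
    """p-value for a z statistic whose null distribution is standard normal."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "left":
        return std_normal.cdf(z)
    raise ValueError("tail must be 'two', 'right', or 'left'")

def decide(p_value, alpha=0.05):
    """P low, null go: reject H0 when p <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

for z in (1.2, 2.5):
    p = p_value_from_z(z)
    print(f"z = {z}: p = {p:.4f} -> {decide(p)}")
```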
Two Types of Error
Type I Error

Type I Error (False Positive)

  • Incorrectly rejecting a true H₀
  • Probability of occurrence when H₀ is true = α (the significance level)
  • Analogy: convicting an innocent person
Type II Error

Type II Error (False Negative)

  • Failing to reject a false H₀
  • Probability of occurrence when H₀ is false = β; statistical power = 1 − β (this β is unrelated to the regression slope β)
  • Analogy: letting a guilty person go free
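For a right-tailed one-sample z-test the two error rates can be computed rather than only named: α is fixed by the rejection threshold, while β = Φ(z₁₋α − δ√n/σ), where δ is the true shift of the mean away from the H₀ value and σ is the known standard deviation. The sketch assumes exactly that setting; the function name and the numbers (δ = 2, σ = 10) are hypothetical.

```python
import math
from statistics import NormalDist

def type_ii_error(delta, sigma, n, alpha=0.05):
    """beta for a right-tailed z-test of H0: mu = mu0 when the true mean is mu0 + delta (delta >= 0)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # reject H0 when z > z_crit, so P(Type I | H0 true) = alpha
    shift = delta * math.sqrt(n) / sigma    # how far the true mean moves the expected z statistic
    return std_normal.cdf(z_crit - shift)

for n in (25, 100, 400):
    beta = type_ii_error(delta=2, sigma=10, n=n)
    print(f"n = {n:3d}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Larger samples shrink β (raise power) without changing α, which is why the sample size is usually planned before running the test.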
HTAB Hypothesis Testing Workflow
H · Hypothesis: Set H₀ and Hₐ and define the claim being tested
T · Test: Choose the method, set α, collect the sample, and analyze
A · Action: Use the p-value to decide whether to reject or fail to reject H₀
B · Business Meaning: Translate statistical conclusions into business decisions
Business Applications
Marketing Experiment

A/B Test

  • H₀: no difference in click-through rate between the new and old homepage
  • p-value < 0.05 → reject H₀; the new version has a significant effect
Scenario: An e-commerce platform tests a new checkout flow, obtains p = 0.03, and decides to roll out the new design.
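One common way to test the A/B hypothesis on click-through rates is a two-proportion z-test that pools the two groups under H₀. The sketch assumes samples large enough for the normal approximation; the click counts are hypothetical, and this note does not say which test the platform actually used.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Two-tailed pooled z-test of H0: the two click-through rates are equal."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)  # common rate assumed under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical test: old homepage 200 clicks / 5,000 visits, new homepage 260 / 5,000
z, p = two_proportion_z_test(clicks_a=200, n_a=5000, clicks_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```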
Quality Control

Process Quality Inspection

  • H₀: the product’s average weight meets specification (μ = 500g)
  • Use a two-tailed test; deviations in either direction must be addressed
Scenario: A food manufacturer samples 500g packages and obtains p = 0.01 → reject H₀; the machine must be recalibrated.
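For the two-tailed quality check, here is a sketch that assumes the filling machine's standard deviation σ is known from process history (a z-test); if σ has to be estimated from a small sample, a t-test is the usual choice. The sample mean of 502.1 g, σ = 5 g, and n = 36 are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-tailed z-test of H0: mu = mu0, assuming the process standard deviation sigma is known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # deviations in either direction count
    return z, p_value

z, p = one_sample_z_test(sample_mean=502.1, mu0=500, sigma=5, n=36)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Because the test is two-tailed, an underweight sample mean of 497.9 g would give the same p-value as 502.1 g.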
Market Strategy

Market Share Test

  • H₀: market share ≤ 18% (the campaign has no effective lift)
  • Use a right-tailed (one-tailed) test, focusing only on whether market share has increased
Scenario: After a marketing campaign, a medical device company obtains p = 0.02 → reject H₀; market share increased significantly.
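The market share claim can be sketched as a right-tailed one-proportion z-test evaluated at the boundary value p₀ = 0.18 of H₀. The survey counts (220 of 1,000 buyers) are hypothetical, and the normal approximation assumes n·p₀ and n·(1 − p₀) are both comfortably large.

```python
import math
from statistics import NormalDist

def one_proportion_z_test_right(successes, n, p0):
    """Right-tailed z-test of H0: p <= p0 against Ha: p > p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0, at its boundary p0
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)  # only large positive z counts as evidence for Ha
    return z, p_value

z, p = one_proportion_z_test_right(successes=220, n=1000, p0=0.18)
print(f"z = {z:.2f}, p = {p:.4f}")
```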
🎯
Statistical significance ≠ Business importance. Even if the p-value is significant, the effect must still be large enough and valuable enough to justify resources. This is the core question of the substantive hypothesis.
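A quick numeric sketch of why significance alone is not enough: with a very large sample, even a trivial difference produces a tiny p-value. The figures (an NT$1 lift in average order value, σ = NT$100, one million orders) are hypothetical, and the effect size shown is the standardized difference d = difference / σ (Cohen's d).

```python
import math

def z_test_with_effect_size(mean_diff, sigma, n):
    """Two-tailed z-test of H0: no difference, plus the standardized effect size d."""
    z = mean_diff / (sigma / math.sqrt(n))
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|)) without underflow
    d = mean_diff / sigma                       # effect size in standard deviations, independent of n
    return p_value, d

p, d = z_test_with_effect_size(mean_diff=1.0, sigma=100.0, n=1_000_000)
print(f"p = {p:.2e}, d = {d:.3f}")
```

The p-value is astronomically small, yet d = 0.01 means the lift is one hundredth of a standard deviation; whether NT$1 per order justifies the cost is a business question, not a statistical one.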
04
Integrated Case Study

Integrated Case: Connecting the Three Main Steps

Using an e-commerce advertising budget decision, this case walks through the full decision path: correlation analysis, regression analysis, and hypothesis testing.

🛒

Business Context: E-Commerce Advertising Budget Decision

An e-commerce marketing team wants to know whether increasing the Google Ads budget can truly raise sales, and wants to build a model to forecast an effective spending level.

1 | Correlation Analysis: Are ad spend and sales related?
Correlation coefficient r: 0.89 · Correlation strength: strong positive · Sample size: 24 months
  • r = 0.89 → ad spend and sales have a strong positive correlation and move in the same direction
  • The scatter plot shows no obvious nonlinear trend, so a linear model is appropriate
Conclusion: Ad spend and sales show a strong linear relationship (r = 0.89), so it is worth building a regression model to quantify the relationship.
2 | Regression Analysis: How much do sales increase for each additional NT$10,000 of ad spend?
Ŷ = 12.5 + 4.2 × X (X and Ŷ in units of NT$10,000)
Intercept α: 12.5 · Slope β: 4.2 · R²: 0.79
  • Intercept 12.5: predicted monthly sales of about NT$125,000 at zero ad spend, suggesting a baseline level of organic sales (an extrapolation if no month in the data had zero ad spend)
  • Slope 4.2: each additional NT$10,000 in ad spend predicts roughly NT$42,000 in additional sales
  • R² = 0.79: ad spend explains 79% of the variation in sales, indicating good fit
  • The residual plot is randomly scattered with no systematic bias, supporting model credibility
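Forecasting application: If next month’s ad budget is NT$200,000 (X = 20) → predicted sales = 12.5 + 4.2 × 20 = 96.5, i.e., NT$965,000.
Conclusion: Each NT$10,000 of ad spend is associated with approximately NT$42,000 in incremental sales, a return on ad spend (ROAS) of about 4.2x. This is a revenue ratio, so profit-based ROI would be lower once product and other costs are deducted.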
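The forecast and the 4.2x figure can be reproduced from the case equation. This sketch converts between NT$ and the model's NT$10,000 units and separates the marginal ratio (the slope) from the average ratio of total predicted sales to budget, which also includes the baseline intercept. The function name is hypothetical.

```python
INTERCEPT = 12.5  # estimated intercept from the case, in NT$10,000
SLOPE = 4.2       # estimated slope from the case: NT$10,000 of sales per NT$10,000 of ad spend
UNIT = 10_000     # NT$ per model unit

def predicted_sales(ad_budget_ntd):
    """Point forecast of monthly sales (NT$) for a given ad budget (NT$)."""
    x = ad_budget_ntd / UNIT
    return (INTERCEPT + SLOPE * x) * UNIT

budget = 200_000
forecast = predicted_sales(budget)
marginal_roas = (predicted_sales(budget + UNIT) - forecast) / UNIT  # extra sales per extra NT$
average_ratio = forecast / budget                                  # includes organic baseline sales
print(f"forecast = NT${forecast:,.0f}, marginal ROAS = {marginal_roas:.1f}, sales/budget = {average_ratio:.3f}")
```

Neither ratio is ROI in the accounting sense: both compare revenue to ad spend. Forecasts are also most trustworthy for budgets inside the range observed over the 24 months.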
3 | Hypothesis Testing (HTAB): Does this effect truly exist statistically?
H · Hypothesis: H₀: β = 0 (ads have no effect); Hₐ: β ≠ 0 (ads have an effect)
T · Test: two-tailed t-test, α = 0.05
A · Action: p = 0.0003 < 0.05 → reject H₀
B · Business Meaning: Ad spend has a statistically significant relationship with sales, which supports continued investment
p-value: 0.0003 · Significance level α: 0.05 · Test result: Reject H₀
  • p = 0.0003 is far below α = 0.05 → strongly reject H₀
  • The positive effect of ad spend on sales is statistically significant and unlikely to be random chance
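Conclusion: β is significantly different from zero (p = 0.0003), providing strong statistical evidence that ad spend and sales are related; on its own, this observational result does not prove that advertising causes the extra sales.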
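For simple linear regression, the t statistic for H₀: β = 0 can be written in terms of r and the sample size: t = r√(n − 2) / √(1 − r²), with n − 2 degrees of freedom. Plugging in the case values (r = 0.89, n = 24 months) gives a consistency check on the rejection decision. The 2.074 cutoff is the standard two-tailed 5% critical value for 22 degrees of freedom from a t table; the function name is hypothetical.

```python
import math

def slope_t_from_r(r, n):
    """t statistic for H0: beta = 0 in simple linear regression (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

T_CRIT_22 = 2.074  # two-tailed critical value, alpha = 0.05, df = 22 (t table)

t_stat = slope_t_from_r(r=0.89, n=24)
decision = "reject H0" if abs(t_stat) > T_CRIT_22 else "fail to reject H0"
print(f"t = {t_stat:.2f} with df = 22 -> {decision}")
```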
D | Substantive Hypothesis Assessment: Is this statistical result worth taking business action on?
Step 1 · Correlation Analysis (Is there a relationship?): r = 0.89, strong positive correlation
Step 2 · Regression Analysis (How large is the effect?): each NT$10,000 of ad spend is associated with about NT$42,000 in sales
Step 3 · Hypothesis Testing (Is the result credible?): p = 0.0003, statistically significant
🚀
Business decision: All three statistical results support increasing the advertising budget. With a return on ad spend of about 4.2x and strong statistical significance, the marketing team has good grounds to increase Google Ads spending and should keep updating the model as new data arrive.
⚠️
Caution: Statistical significance does not automatically mean the business should act. The final budget decision should still consider ad cost, market saturation, and the opportunity cost of other marketing channels.

Three Rules for Analysis

Put these principles into every data analysis and business decision checklist.

01 | Correlation ≠ Causation: Correlation analysis only shows that variables move together. Causal claims require experimental design or more rigorous research.
02 | Look at the Plot Before the Numbers: Inspect residual plots and data distributions before interpreting R² and p-values, so a single number does not mislead the model assessment.
03 | Significant ≠ Important: Statistical significance is only a threshold. Use a substantive hypothesis to evaluate effect size, cost-effectiveness, and strategic value.