The Core of Statistical Analysis: From Data Insight to Business Action
Statistical analysis is not just about calculating numbers. It is a decision path from observing relationships, to quantifying effects, to verifying credibility. Correlation analysis tells us whether variables move together; regression analysis quantifies that relationship into a predictive model; hypothesis testing checks whether the observed effect is unlikely to be random noise. Used together, these tools turn data into business insights worth acting on.
Learning and Application Tips
- Correlation ≠ Causation: Correlation only shows that variables move together. It does not prove that one causes the other; additional research is needed to establish causality.
- Look at the plot before the numbers: Always inspect residual plots before interpreting R²; relying only on p-values or R² can lead to a misleading model interpretation.
- Statistical significance ≠ Business importance: Even when the p-value is significant, the effect size must still be large enough to justify business resources.
A Four-Step Statistical Analysis Workflow
From business question to decision, each step answers one key question.
- Correlation Analysis · Is there a relationship? Use correlation analysis to see whether two variables move together and decide whether further modeling is worthwhile.
- Regression Analysis · How large is the effect? Build a simple linear regression model to quantify the relationship as a predictive equation.
- Hypothesis Testing · Is the result credible? Use hypothesis testing and the p-value to check whether the observed effect is statistically significant.
- Substantive Assessment · Is it worth acting on? Apply a substantive hypothesis lens to judge whether the statistical result is worth turning into a business action.
Correlation Analysis
Correlation analysis measures the strength and direction of the linear relationship between two quantitative variables. It is often the starting point of analysis.
Correlation Coefficient r
- Measures the strength of the linear relationship between two quantitative variables
- Range: -1 to +1
- r = +1 indicates perfect positive correlation; r = -1 indicates perfect negative correlation; r = 0 indicates no linear relationship
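As a minimal sketch of how r can be computed from raw data, the function below applies the standard Pearson formula using only the Python standard library. The function name `pearson_r` and the sample data are hypothetical and chosen for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, not taken from the article
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # y rises in step with x
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # y falls in step with x
r_curve = pearson_r(x, [4, 1, 0, 1, 4])  # y = (x - 3)^2, a curved relationship
print(r_up, r_down, r_curve)
```

The third case is a reminder that r = 0 only rules out a linear relationship: y depends entirely on x there, but the dependence is curved, so r does not detect it.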
Direction and Strength
- Positive correlation: variables move in the same direction (r close to +1)
- Negative correlation: variables move in opposite directions (r close to -1)
- Zero correlation: no clear linear relationship (r close to 0)
Independent vs. Dependent Variable
- Independent variable X: the possible driver, usually placed on the x-axis
- Dependent variable Y: the outcome being predicted, usually placed on the y-axis
- Causality requires additional research; correlation is only the starting point
Ad Spend and Sales
- Analyze the linear relationship between advertising spend and sales revenue
- r close to +1 suggests a strong positive relationship and supports further investigation of ad effectiveness
Stocks and Market Index
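- Analyze how strongly an individual stock moves with the overall market index (closely related to the beta concept; beta itself is the regression slope of stock returns on market returns, not the correlation)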
- Combining negatively correlated assets can reduce overall portfolio risk
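The risk-reduction claim can be checked with the standard two-asset portfolio variance formula, σp² = w₁²σ₁² + w₂²σ₂² + 2w₁w₂ρσ₁σ₂. The sketch below is illustrative only: the function name, the 50/50 weights, and the 20% volatilities are assumptions, not figures from the article.

```python
import math

def portfolio_volatility(w1, sigma1, sigma2, rho):
    """Standard deviation of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2 * w1 * w2 * rho * sigma1 * sigma2
    )
    # Guard against tiny negative values caused by floating-point rounding
    return math.sqrt(max(variance, 0.0))

# Hypothetical assets: equal weights, each with 20% volatility
vols = {}
for rho in (1.0, 0.0, -0.5, -1.0):
    vols[rho] = portfolio_volatility(0.5, 0.20, 0.20, rho)
    print(f"rho = {rho:+.1f} -> portfolio volatility = {vols[rho]:.3f}")
```

Under these assumptions, portfolio volatility falls steadily as the correlation between the two assets moves from +1 toward -1.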
Store Factors and Sales
- Analyze correlations between foot traffic, number of competitors, and sales
- Identify store improvements that deserve investment priority
Simple Linear Regression
Regression analysis quantifies the relationship between variables as an equation, allowing known data to be used to predict unknown values.
Probabilistic Model
Yᵢ = α + βXᵢ + εᵢ
- α: y-intercept (true but unobserved)
- β: slope (the effect of each unit of X on Y)
- εᵢ: error term (unexplained random factors)
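Because α and β are unobserved, they are estimated from sample data. A minimal sketch of the ordinary least squares estimates (b = Sxy / Sxx, a = ȳ − b·x̄) follows; the function name `fit_simple_ols` and the data are hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Return (a, b): least-squares estimates of the intercept alpha and slope beta."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("X must vary to estimate a slope")
    b = sxy / sxx            # estimated slope
    a = mean_y - b * mean_x  # estimated intercept
    return a, b

# Hypothetical data: ad spend (X) and sales (Y) in the same monetary unit
ad_spend = [1, 2, 3, 4, 5]
sales = [17, 21, 24, 30, 33]
a, b = fit_simple_ols(ad_spend, sales)
print(f"Y_hat = {a:.2f} + {b:.2f} * X")
```

The fitted line always passes through the point (x̄, ȳ), which is why the intercept is computed from the two means once the slope is known.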
R-squared (R²)
R² = r² (for simple linear regression)
Range: 0 to 1
- Measures how much variation in the dependent variable is explained by the independent variable
- R² < 0.5 is often treated as limited explanatory power
- R² close to 1 indicates strong model fit
Best-Practice Combination
- Check residual plots before interpreting R²
- Low p-value + high R² → strongest explanatory model
- In complex behavioral domains, a low R² may still have research value
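To make the link between R², r, and residuals concrete, the sketch below fits a line, computes R² as 1 − SSE/SST, and returns the residuals that a residual plot would display. The function name and data are hypothetical; in practice a plotting library would be used to inspect the residuals visually.

```python
import math

def r_squared_and_residuals(xs, ys):
    """Fit a simple OLS line and return (r_squared, r, residuals)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)  # total sum of squares (SST)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual = observed value minus fitted value
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)      # unexplained variation
    r_squared = 1 - sse / syy
    r = sxy / math.sqrt(sxx * syy)
    return r_squared, r, residuals

# Hypothetical data, not taken from the article
r2, r, residuals = r_squared_and_residuals([1, 2, 3, 4, 5], [17, 21, 24, 30, 33])
print(f"R^2 = {r2:.4f}, r^2 = {r ** 2:.4f}")
print("residuals:", [round(e, 2) for e in residuals])
```

With an intercept in the model the residuals sum to zero, so what matters in a residual plot is whether they show a pattern (a curve or a funnel shape) rather than whether they balance out.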
Sales Forecasting
- Use advertising spend X to predict next-quarter sales Y
- The β coefficient estimates how much sales revenue changes for each additional dollar of ad spend
Asset Valuation
- Use floor area and location score to predict market selling price (with two predictors, this extends simple regression to multiple regression)
- A high R² indicates these variables explain much of the price variation
Transportation Cost Forecasting
- Use shipment weight X to predict transportation cost Y
- Use R² to assess whether weight is a major driver of transportation cost
Hypothesis Testing
Hypothesis testing uses sample data to decide whether there is enough evidence to reject the null hypothesis, that is, whether the observed effect is unlikely to be due to random chance alone.
Hypothesis Classification
- Research hypothesis: the researcher’s initial expectation about the result
- Statistical hypothesis: expressed mathematically as H₀ and Hₐ
- Substantive hypothesis: whether the result has practical value for business decisions
Null and Alternative Hypotheses
- H₀ (null hypothesis): no change from the status quo
- Hₐ (alternative hypothesis): a new effect or the researcher’s proposed claim
- Testing says “fail to reject H₀,” not “accept H₀”
Decision Rule
- p-value ≤ α → reject H₀ (commonly α = 0.05)
- p-value > α → fail to reject H₀
- Mnemonic: P low, null go
Type I Error (False Positive)
- Incorrectly rejecting a true H₀
- Probability of occurrence = α (significance level), given that H₀ is true
- Analogy: convicting an innocent person
Type II Error (False Negative)
- Failing to reject a false H₀
- Probability of occurrence = β, given that H₀ is false (this β is the Type II error rate, unrelated to the regression slope β); statistical power = 1 − β
- Analogy: letting a guilty person go free
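Both error rates can be illustrated by simulation. The sketch below repeatedly draws samples and applies a two-tailed z-test with the decision rule p ≤ α. Everything here is an assumption made for illustration: a normal population with known σ = 1, sample size 30, a true mean of 0.5 for the "H₀ false" scenario, and a fixed random seed.

```python
import random
from statistics import NormalDist

def two_tailed_z_p_value(sample, mu0, sigma):
    """p-value of a two-tailed z-test for the mean, assuming sigma is known."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=30,
                   trials=5000, alpha=0.05, seed=42):
    """Share of simulated samples in which H0: mean = mu0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if two_tailed_z_p_value(sample, mu0, sigma) <= alpha:
            rejections += 1
    return rejections / trials

# H0 is true: every rejection is a Type I error, so the rate should be near alpha
type_1_rate = rejection_rate(true_mean=0.0)
# H0 is false: the rejection rate is the power, and 1 - power is the Type II rate
power = rejection_rate(true_mean=0.5)
type_2_rate = 1 - power
print(f"Type I rate ~ {type_1_rate:.3f}, power ~ {power:.3f}, "
      f"Type II rate ~ {type_2_rate:.3f}")
```

Lowering α reduces Type I errors but, for a fixed sample size, raises the Type II rate; a larger sample is the usual way to improve both.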
A/B Test
- H₀: no difference in click-through rate between the new and old homepage
- p-value < 0.05 → reject H₀; the difference in click-through rate between the two versions is statistically significant (check the size of the lift before acting on it)
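One common way to run this kind of A/B comparison is a two-proportion z-test with a pooled standard error. The sketch below is one possible approach rather than a prescribed method; the function name and the click and view counts are hypothetical.

```python
from statistics import NormalDist

def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-tailed z-test for H0: both versions have the same click-through rate."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Under H0 the two groups share one rate, so pool them for the standard error
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = (pooled * (1 - pooled) * (1 / views_a + 1 / views_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_b - p_a

# Hypothetical counts: old homepage (A) versus new homepage (B)
z, p_value, lift = two_proportion_z_test(
    clicks_a=200, views_a=5000, clicks_b=260, views_b=5000
)
alpha = 0.05
print(f"z = {z:.2f}, p-value = {p_value:.4f}, absolute lift = {lift:.3f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

The absolute lift is reported alongside the p-value because significance alone does not say whether the improvement is large enough to matter for the business.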
Process Quality Inspection
- H₀: the product’s average weight meets specification (μ = 500 g)
- Use a two-tailed test; deviations in either direction must be addressed
Market Share Test
- H₀: market share ≤ 18% (the campaign has no effective lift); Hₐ: market share > 18%
- Use a one-tailed (right-tailed) test, focusing only on whether market share has increased
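A right-tailed test of this kind can be sketched with a one-sample proportion z-test, using the boundary value 18% from H₀ for the standard error. The survey sizes and counts below are hypothetical, and the normal approximation is an assumption that is reasonable only for large samples.

```python
from statistics import NormalDist

def right_tailed_share_test(successes, n, p0=0.18):
    """Right-tailed z-test for H0: share <= p0 against Ha: share > p0."""
    p_hat = successes / n
    se = (p0 * (1 - p0) / n) ** 0.5  # standard error at the H0 boundary value
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)  # only the upper tail counts as evidence
    return z, p_value

# Hypothetical post-campaign survey: 205 of 1,000 respondents chose the brand
z_up, p_up = right_tailed_share_test(205, 1000)
# Hypothetical counter-case: the observed share fell to 16%
z_down, p_down = right_tailed_share_test(160, 1000)
print(f"share 20.5%: z = {z_up:.2f}, p-value = {p_up:.4f}")
print(f"share 16.0%: z = {z_down:.2f}, p-value = {p_down:.4f}")
```

In a right-tailed test a drop in share can never lead to rejecting H₀, however large it is, which is why the second case produces a p-value close to 1.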
Integrated Case: Connecting the Three Main Steps
Using an e-commerce advertising budget decision, this case walks through the full decision path: correlation analysis, regression analysis, and hypothesis testing.
- r = 0.89 → ad spend and sales have a strong positive correlation and move in the same direction
- The scatter plot shows no obvious nonlinear trend, so a linear model is appropriate
- Intercept 12.5: even without advertising, the brand still has a baseline level of organic sales
- Slope 4.2: each additional NT$10,000 in ad spend predicts roughly NT$42,000 in additional sales
- R² = 0.79: ad spend explains 79% of the variation in sales, indicating good fit
- The residual plot is randomly scattered with no systematic bias, supporting model credibility
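The intercept and slope above imply the fitted line Ŷ = 12.5 + 4.2X. The short sketch below uses it for prediction and checks that the reported R² matches r². It assumes both X and Ŷ are measured in units of NT$10,000, which is consistent with the slope interpretation above but is not stated explicitly for the intercept; the ad budget of 10 units is hypothetical.

```python
INTERCEPT = 12.5  # from the case; unit assumed to be NT$10,000
SLOPE = 4.2       # from the case; sales change per extra NT$10,000 of ad spend

def predict_sales(ad_spend):
    """Predicted sales (NT$10,000) for a given ad spend (NT$10,000)."""
    return INTERCEPT + SLOPE * ad_spend

baseline = predict_sales(0)                        # organic sales with no ads
at_budget = predict_sales(10)                      # hypothetical budget: 10 units
marginal = predict_sales(11) - predict_sales(10)   # effect of one more unit

r = 0.89
r_squared = round(r ** 2, 2)  # for simple linear regression, R^2 = r^2

print(f"baseline = {baseline}, at budget 10 = {at_budget}")
print(f"marginal effect = {marginal:.1f}, R^2 = {r_squared}")
```

Predictions are most trustworthy inside the range of ad spend observed in the data; extrapolating well beyond it assumes the linear pattern continues.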
H₀: β = 0 (ads have no effect)
Hₐ: β ≠ 0 (ads have an effect)
Test: two-tailed t-test on the slope
Decision rule: p-value ≤ α = 0.05 → reject H₀
- p = 0.0003 is far below α = 0.05 → strongly reject H₀
- The positive effect of ad spend on sales is statistically significant and unlikely to be random chance
Three Rules for Analysis
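Put these three principles into every data analysis and business decision checklist: correlation is not causation, look at the plot before the numbers, and statistical significance is not business importance.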