# Formulas

95% Confidence Interval

$$sample\;statistic \pm 2 (standard\;error)$$

Between Groups (Numerator) Degrees of Freedom

$$df_{between}=k-1$$

$$k$$ = number of groups

Binomial Random Variable Probability

$$P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$$

$$n$$ = number of trials
$$k$$ = number of successes
$$p$$ = probability event of interest occurs on any one trial
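
As a quick numerical check of the binomial formula, the following is a minimal Python sketch. The function name `binomial_pmf` is chosen here for illustration and does not come from the formula sheet; it assumes independent trials with a constant success probability.

```python
from math import comb


def binomial_pmf(n: int, k: int, p: float) -> float:
    """Return P(X = k) for n independent trials with success probability p."""
    # comb(n, k) is the binomial coefficient "n choose k"
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Example: probability of exactly 3 successes in 10 trials when p = 0.5
prob = binomial_pmf(10, 3, 0.5)
```

Summing `binomial_pmf(n, k, p)` over every `k` from 0 to `n` should give 1, which is a handy sanity check.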

Chi-Square ($$\chi^2$$) Test Statistic

$$\chi^2=\Sigma \frac{(Observed-Expected)^2}{Expected}$$
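
The sum can be sketched in Python as follows. The helper name `chi_square_statistic` is hypothetical, and the sketch assumes the observed and expected counts are supplied as two equally long lists of cell values.

```python
def chi_square_statistic(observed, expected):
    """Sum (Observed - Expected)^2 / Expected over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Example: three categories, each with an expected count of 20
chi_sq = chi_square_statistic([10, 20, 30], [20, 20, 20])
```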

Complement of A

$$P(A^{C})=1-P(A)$$

Conditional Probability of A Given B

$$P(A\mid B)=\frac{P(A \: \cap\: B)}{P(B)}$$

Conditional Probability of B Given A

$$P(B\mid A)=\frac{P(A \: \cap\: B)}{P(A)}$$

Confidence Interval for a Population Mean

$$\overline{x} \pm t^{*} \frac{s}{\sqrt{n}}$$
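
A minimal Python sketch of this interval is shown below. It assumes the multiplier $$t^{*}$$ has already been looked up (for example, from a $$t$$ table with $$df = n-1$$) and is passed in as `t_star`; the function name is illustrative only.

```python
from math import sqrt
from statistics import mean, stdev


def mean_confidence_interval(data, t_star):
    """Return (lower, upper) for x-bar +/- t* (s / sqrt(n))."""
    n = len(data)
    x_bar = mean(data)
    # stdev() is the sample standard deviation (n - 1 in the denominator)
    margin = t_star * stdev(data) / sqrt(n)
    return x_bar - margin, x_bar + margin


# Example call; t_star = 2.0 is a placeholder, not a looked-up multiplier
lower, upper = mean_confidence_interval([2, 4, 4, 4, 5, 5, 7, 9], t_star=2.0)
```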

Confidence Interval for the Difference Between Two Paired Means

$$\overline{x}_d \pm t^* \left(\frac{s_d}{\sqrt{n}}\right)$$

$$t^*$$ is the multiplier with $$df = n-1$$

Confidence Interval for the Difference Between Two Proportions

$$(\widehat{p}_1-\widehat{p}_2) \pm z^\ast {\sqrt{\frac{\widehat{p}_1 (1-\widehat{p}_1)}{n_1}+\frac{\widehat{p}_2 (1-\widehat{p}_2)}{n_2}}}$$

Confidence Interval for Two Independent Means

$$(\bar{x}_1-\bar{x}_2) \pm t^\ast{ \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}$$

Confidence Interval of $$p$$: Normal Approximation Method

$$\widehat{p} \pm z^{*} \left ( \sqrt{\frac{\hat{p} (1-\hat{p})}{n}} \right)$$
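
The normal approximation interval can be sketched in Python as below. The function name is hypothetical, and the default `z_star=1.96` assumes a 95% confidence level; other levels need a different multiplier.

```python
from math import sqrt


def proportion_confidence_interval(p_hat, n, z_star=1.96):
    """Return (lower, upper) for p-hat +/- z* sqrt(p-hat (1 - p-hat) / n)."""
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Example: sample proportion of 0.5 from a sample of 100
lower, upper = proportion_confidence_interval(0.5, 100)
```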

Confidence Interval of $$\beta_1$$

$$b_1 \pm t^\ast (SE_{b_1})$$

$$b_1$$ = sample slope
$$t^\ast$$ = value from the $$t$$ distribution with $$df=n-2$$
$$SE_{b_1}$$ = standard error of $$b_1$$

Degrees of Freedom: Chi-Square Test of Independence

$$df=(number\;of\;rows-1)(number\;of\;columns-1)$$

Estimated Degrees of Freedom

$$df=smallest\;n-1$$

$$smallest\;n$$ = the smaller of the two sample sizes, $$n_1$$ and $$n_2$$

Expected Cell Value

$$E=\frac{row\;total \; \times \; column\;total}{n}$$
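
Applied to every cell of a two-way table, the formula might look like the sketch below. It assumes the observed counts are stored as a list of rows; `expected_counts` is an illustrative name.

```python
def expected_counts(table):
    """Return expected cell values: (row total * column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]  # zip(*table) yields columns
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Example: a 2 x 2 table of observed counts
expected = expected_counts([[10, 20], [30, 40]])
```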

Expected Count

$$Expected\;count=n (p_i)$$

$$n$$ is the total sample size
$$p_i$$ is the hypothesized proportion of the "ith" group

Finding Sample Size for Estimating a Population Proportion

$$n=\left ( \frac{z^*}{M} \right )^2 \tilde{p}(1-\tilde{p})$$

$$M$$ is the margin of error
$$\tilde p$$ is an estimated value of the proportion
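
A short Python sketch of this calculation follows. Rounding up to the next whole number is an assumption made here (a common convention so that the margin of error is not exceeded), as are the defaults `p_tilde=0.5` and `z_star=1.96` (95% confidence).

```python
from math import ceil


def sample_size_for_proportion(margin, p_tilde=0.5, z_star=1.96):
    """Return n = (z* / M)^2 * p~ (1 - p~), rounded up to a whole number."""
    return ceil((z_star / margin) ** 2 * p_tilde * (1 - p_tilde))


# Example: margin of error of 0.05 with no prior estimate of p
n_needed = sample_size_for_proportion(0.05)
```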

Finding the Sample Size for Estimating a Population Mean

$$n=\frac{z^{2}\widetilde{\sigma}^{2}}{M^{2}}=\left ( \frac{z\widetilde{\sigma}}{M} \right )^2$$

$$z$$ = z multiplier for given confidence level
$$\widetilde{\sigma}$$ = estimated population standard deviation
$$M$$ = margin of error
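
The same idea for a mean can be sketched as below. As before, rounding up and the default `z=1.96` (95% confidence) are assumptions of this sketch, and the function name is illustrative.

```python
from math import ceil


def sample_size_for_mean(margin, sigma_tilde, z=1.96):
    """Return n = (z * sigma~ / M)^2, rounded up to a whole number."""
    return ceil((z * sigma_tilde / margin) ** 2)


# Example: estimated standard deviation of 15 and margin of error of 3
n_needed = sample_size_for_mean(margin=3, sigma_tilde=15)
```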

Five Number Summary

(Minimum, $$Q_1$$, Median, $$Q_3$$, Maximum)

General Form of 95% Confidence Interval

$$sample\ statistic\pm2\ (standard\ error)$$

General Form of a Test Statistic

$$test\;statistic=\frac{sample\;statistic-null\;parameter}{standard\;error}$$

Interquartile Range

$$IQR = Q_3 - Q_1$$
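
The five number summary and the IQR can be computed together, as in the sketch below. Note that quartile conventions differ between textbooks and software; this sketch assumes the "inclusive" method of Python's `statistics.quantiles`, so results may not match hand calculations that use another rule.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # n=4 splits the data into quartiles and returns the three cut points
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


def interquartile_range(data):
    """Return IQR = Q3 - Q1."""
    _, q1, _, q3, _ = five_number_summary(data)
    return q3 - q1


summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
```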

Intersection

$$P(A\cap B) =P(A)\times P(B\mid A)$$

Mean of a Binomial Random Variable

$$\mu=np$$
Also known as $$E(X)$$

Observed Sample Mean Difference

$$\overline{x}_d=\frac{\Sigma{x}_d}{n}$$

$$x_d$$ = observed difference

Odds

$$odds = \frac {number \;with \;the\; outcome}{number \;without \;the \;outcome}$$

OR

$$odds=\frac{risk}{1-risk}$$
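
The two forms give the same answer, which the following sketch illustrates. The function names and the example counts (20 of 100 with the outcome) are made up for illustration.

```python
def odds_from_counts(with_outcome, without_outcome):
    """Odds as number with the outcome divided by number without it."""
    return with_outcome / without_outcome


def odds_from_risk(risk):
    """Odds as risk / (1 - risk)."""
    return risk / (1 - risk)


# 20 of 100 have the outcome, so risk = 20 / 100 = 0.2
odds_a = odds_from_counts(20, 80)
odds_b = odds_from_risk(20 / 100)
```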

Pearson's r: Conceptual Formula

$$r=\frac{\sum{z_x z_y}}{n-1}$$
where $$z_x=\frac{x - \overline{x}}{s_x}$$ and $$z_y=\frac{y - \overline{y}}{s_y}$$
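
The conceptual formula translates fairly directly into code. The sketch below assumes paired lists `xs` and `ys` of equal length; `pearson_r` is an illustrative name.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Return r as the sum of z_x * z_y divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    z_products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)


# Example: a perfectly linear increasing relationship
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```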

Pooled Estimate of $$p$$

$$\widehat{p}=\frac{\widehat{p}_1n_1+\widehat{p}_2n_2}{n_1+n_2}$$

Population Mean

$$\mu=\frac{\Sigma x}{N}$$

Power

$$Power = 1-\beta$$

$$\beta$$ = probability of committing a Type II Error.

Probability of Event A

$$P(A)=\frac{Number\;in\;group\;A}{Total\;number}$$

Proportion

$$Proportion=\frac{Number\;in\;the\;category}{Total\;number}$$

Range

$$Range = Maximum - Minimum$$

Relative Risk

$$Relative\ Risk=\frac{Risk\ in\ Group\ 1}{Risk\ in\ Group\ 2}$$

Residual

$$e_i =y_i -\widehat{y}_i$$

$$y_i$$ = actual value of y for the ith observation
$$\widehat{y}_i$$ = predicted value of y for the ith observation

Risk

$$Risk= \frac{number \;with \;the\; outcome}{total\;number\;of\;outcomes}$$

Sample Standard Deviation

$$s=\sqrt{\frac{\sum (x-\overline{x})^{2}}{n-1}}$$

Sample Variance

$$s^{2}=\frac{\sum (x-\overline{x})^{2}}{n-1}$$
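
Both the sample variance and the sample standard deviation can be written out step by step, as in the sketch below. The function names are illustrative; Python's built-in `statistics.variance` and `statistics.stdev` use the same $$n-1$$ denominator.

```python
from math import sqrt


def sample_variance(data):
    """Return the sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


def sample_standard_deviation(data):
    """Return the square root of the sample variance."""
    return sqrt(sample_variance(data))


s_squared = sample_variance([2, 4, 4, 4, 5, 5, 7, 9])
```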

Simple Linear Regression Line: Population

$$\widehat{y}=\alpha+\beta x$$

Simple Linear Regression Line: Sample

$$\widehat{y}=a+bx$$

$$\widehat{y}$$ = predicted value of $$y$$ for a given value of $$x$$
$$a$$ = $$y$$-intercept
$$b$$ = slope

Slope

$$b_1 =r \frac{s_y}{s_x}$$

$$r$$ = Pearson’s correlation coefficient between $$x$$ and $$y$$
$$s_y$$ = standard deviation of $$y$$
$$s_x$$ = standard deviation of $$x$$

Standard Deviation of a Binomial Random Variable

$$\sigma=\sqrt {np(1-p)}$$

Standard Deviation of the Differences

$$s_d=\sqrt{\frac{\sum (x_d-\overline{x}_d)^{2}}{n-1}}$$

Standard Error

$$\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$$

Sum of Squared Residuals

Also known as Sum of Squared Errors (SSE)
$$SSE=\sum (y-\widehat{y})^2$$

Sum of Squares

$$SS={\sum (x-\overline{x})^{2}}$$

Test Statistic

$$test\; statistic = \frac{sample \; statistic - null\;parameter}{standard \;error}$$

Test Statistic for Dependent Means

$$t=\frac{\bar{x}_d-\mu_0}{\frac{s_d}{\sqrt{n}}}$$

$$\overline{x}_d$$ = observed sample mean difference
$$\mu_0$$ = mean difference specified in the null hypothesis
$$s_d$$ = standard deviation of the differences
$$n$$ = sample size (i.e., number of unique individuals)
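
A minimal sketch of the paired calculation is given below. It assumes each difference is taken as `after - before` (the direction is a choice, not something the formula fixes) and that `mu_0` defaults to 0; the function name is illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after, mu_0=0.0):
    """Return t = (x-bar_d - mu_0) / (s_d / sqrt(n)) for paired data."""
    diffs = [a - b for a, b in zip(after, before)]  # one difference per individual
    n = len(diffs)
    return (mean(diffs) - mu_0) / (stdev(diffs) / sqrt(n))


# Example: five individuals measured twice
t = paired_t_statistic([10, 10, 10, 10, 10], [11, 12, 13, 14, 15])
```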

Test Statistic for Two Independent Proportions

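$$z=\frac{\widehat{p}_1-\widehat{p}_2}{SE_0}$$

$$SE_0=\sqrt{\widehat{p}(1-\widehat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$$ = standard error under the null hypothesis, where $$\widehat{p}$$ is the pooled estimate of $$p$$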

Test statistic: One Group Proportion

$$z=\frac{\widehat{p}- p_0 }{\sqrt{\frac{p_0 (1- p_0)}{n}}}$$

$$\widehat{p}$$ = sample proportion
$$p_{0}$$ = hypothesized population proportion
$$n$$ = sample size
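
In code, the one group proportion test statistic might be sketched as follows. The function name is hypothetical; note that the standard error uses the hypothesized proportion `p_0`, not the sample proportion.

```python
from math import sqrt


def one_proportion_z(p_hat, p_0, n):
    """Return z = (p-hat - p_0) / sqrt(p_0 (1 - p_0) / n)."""
    standard_error = sqrt(p_0 * (1 - p_0) / n)
    return (p_hat - p_0) / standard_error


# Example: 60 successes in 100 trials, tested against p_0 = 0.5
z = one_proportion_z(0.6, 0.5, 100)
```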

Union

$$P(A\cup B) = P(A)+P(B)-P(A\cap B)$$

Within Groups (Denominator, Error) Degrees of Freedom

$$df_{within}=n-k$$

$$n$$ = total sample size with all groups combined

$$k$$ = number of groups

y-intercept

$$b_0=\overline {y} - b_1 \overline {x}$$

$$\overline {y}$$ = mean of $$y$$
$$\overline {x}$$ = mean of $$x$$
$$b_1$$ = slope
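
Combining the slope and y-intercept formulas gives the sample regression line. The sketch below computes $$r$$ from z-scores and then applies $$b_1 = r \frac{s_y}{s_x}$$; the function name and the example data are illustrative only.

```python
from statistics import mean, stdev


def least_squares_line(xs, ys):
    """Return (b_0, b_1) using b_1 = r * s_y / s_x and b_0 = y-bar - b_1 * x-bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Pearson's r as the sum of z-score products divided by n - 1
    r = sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)
    b_1 = r * s_y / s_x
    b_0 = y_bar - b_1 * x_bar
    return b_0, b_1


intercept, slope = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```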

z Test Statistic: One Group Mean

$$z=\frac{\overline{x}-\mu_0}{\frac{\sigma}{\sqrt{n}}}$$

$$\overline{x}$$ = sample mean
$$\mu_{0}$$ = hypothesized population mean
$$\sigma$$ = population standard deviation
$$n$$ = sample size

z-score

$$z=\frac{x - \overline{x}}{s}$$

$$x$$ = original data value
$$\overline{x}$$ = mean of the original distribution
$$s$$ = standard deviation of the original distribution