1. Creating/Describing Distributions
a. Creating a histogram—by hand and on calculator
b. Creating a boxplot—*by hand* and on calculator
i. Determining outliers using fences: a value is an outlier if it is below Q1 − 1.5·IQR or above Q3 + 1.5·IQR
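A minimal Python sketch of the five-number summary and the 1.5·IQR fence rule. It assumes quartiles are found as the medians of the lower and upper halves, leaving out the overall median when n is odd. This is the hand method usually taught for boxplots and, to our understanding, what TI-84 1-Var Stats uses. Other software may use a different quartile rule and give slightly different fences. The function names and data are made up for illustration.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; when n is odd the overall median is left out of both halves.
    Assumes at least 2 values.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def fence_outliers(data):
    """Return (lower fence, upper fence, outliers) using the 1.5 * IQR rule."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return low_fence, high_fence, outliers


scores = [2, 4, 5, 5, 6, 7, 8, 9, 10, 25]  # made-up data
print(five_number_summary(scores))
print(fence_outliers(scores))
```

For this data Q1 = 5 and Q3 = 9, so IQR = 4 and the fences are −1 and 15. Only 25 is flagged. On a boxplot, the whisker would stop at 10 and 25 would be drawn as a separate point.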
c. Creating/Reading/Describing…
i. Dotplot
ii. Stem and Leaf plot
iii. Cumulative Frequency Histogram
d. Describing a distribution
i. Shape: skewed vs. symmetric
ii. Center: mean vs. median
iii. Spread: standard deviation vs. IQR (and can also use range)
iv. Note gaps
v. Note outliers
e. Adding a constant to a data set: affects center, but not spread
f. Multiplying a data set by a constant: affects both center and spread
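A small Python sketch, using made-up data, that computes the measures of center and spread from item d and shows the effect of items e and f. The helper names `iqr` and `summarize` are hypothetical. Quartiles use the same halves method as a hand-drawn boxplot.

```python
from statistics import mean, median, stdev


def iqr(data):
    """IQR = Q3 - Q1, with quartiles as medians of the lower/upper halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(upper) - median(lower)


def summarize(data):
    """Center (mean, median) and spread (SD, IQR, range) of a data set."""
    return {
        "mean": mean(data),
        "median": median(data),
        "sd": stdev(data),  # sample standard deviation (divides by n - 1)
        "iqr": iqr(data),
        "range": max(data) - min(data),
    }


data = [2, 4, 4, 5, 7, 8]            # made-up data
shifted = [x + 10 for x in data]     # add a constant
scaled = [3 * x for x in data]       # multiply by a constant

for label, values in [("original", data), ("plus 10", shifted), ("times 3", scaled)]:
    print(label, summarize(values))
```

Adding 10 moves the mean (5 to 15) and median (4.5 to 14.5) up by 10. It leaves the SD, IQR (3), and range (6) unchanged. Multiplying by 3 triples every one of these measures.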
g. The Normal Model
i. What is a z-score?
ii. Calculating z-scores
iii. Using z-scores to find probability
1. Normalcdf(lower bound, upper bound, mean, standard deviation)
2. Using z-table (not necessary if you can use normalcdf)
3. 68/95/99.7 rule
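A Python sketch of z-scores and normal probabilities. The function `normalcdf` here is our own stand-in that copies the calculator's argument order (lower, upper, mean, SD). It is built from the standard-library error function, not from any calculator API. `math.inf` plays the role of the huge bound (such as 1E99) typed on the calculator. The heights example is made up.

```python
from math import erf, inf, sqrt


def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for X ~ N(mean, sd), computed from the error function."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))


def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """P(lower < X < upper); argument order copies the calculator's normalcdf."""
    return normal_cdf(upper, mean, sd) - normal_cdf(lower, mean, sd)


# Made-up example: heights ~ N(64, 2.5); how unusual is 68 inches?
print(z_score(68, 64, 2.5))               # z = 1.6
print(normalcdf(68, inf, 64, 2.5))        # P(height > 68), about 0.055

# The 68/95/99.7 rule is just normalcdf over +/- 1, 2, 3 SDs
for k in (1, 2, 3):
    print(k, round(normalcdf(-k, k), 4))
```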
2. Linear Regression
a. Interpret slope
b. Interpret y-intercept
c. Reading computer output—identify slope, y-int, standard deviation of x, standard deviation of residuals
d. Interpreting the Coefficient of Determination (R^2)
e. Describing a scatterplot:
i. Shape, direction, strength (r)
f. Examining/creating a residual plot
i. Overestimate: residual is (-); Underestimate: residual is (+)
g. Finding LSRL with calculator
i. STAT → CALC → 8:LinReg(a+bx) L1, L2, Y1
1. Used to find r, R^2
2. Also need to do this before you can look at a residual plot
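What LinReg(a+bx) computes can be sketched in Python with the formulas b = r·s_y/s_x and a = ȳ − b·x̄. The residuals show the sign rule from item f. The names `lsrl` and `residuals` and the data are made up for illustration. This is a sketch of the formulas, not of the calculator's internals.

```python
from statistics import mean, stdev


def lsrl(x, y):
    """Least-squares regression line y-hat = a + b*x; returns (a, b, r)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    # r is the sum of products of z-scores divided by n - 1
    r = sum((xi - x_bar) / s_x * (yi - y_bar) / s_y for xi, yi in zip(x, y)) / (n - 1)
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x-bar, y-bar)
    return a, b, r


def residuals(x, y, a, b):
    """residual = observed y - predicted y."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


x = [1, 2, 3, 4, 5]   # made-up data
y = [2, 4, 5, 4, 5]
a, b, r = lsrl(x, y)
print(f"y-hat = {a:.2f} + {b:.2f}x, r = {r:.4f}, R^2 = {r * r:.4f}")
for xi, e in zip(x, residuals(x, y, a, b)):
    verdict = "underestimate" if e > 0 else "overestimate" if e < 0 else "exact"
    print(xi, round(e, 2), verdict)
```

For this data the line is ŷ = 2.2 + 0.6x with R² = 0.6. The point (3, 5) has residual +1, so the line underestimates it.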
h. Outliers, Influence, and Leverage
i. Lurking Variables
3. Sample Surveys
a. Understanding randomness
i. Describing randomization processes—using random number generator, cards, names from a hat, etc.
ii. Using random number tables
b. Sampling Methods
i. SRS
ii. Stratified
iii. Cluster
iv. Convenience
v. Systematic
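The random sampling methods above can be sketched with Python's `random` module. The population, the grade strata, and the homeroom clusters are all hypothetical. The seed only makes the draws repeatable. The convenience sample is included as a contrast, since it uses no randomness at all.

```python
import random

rng = random.Random(2024)  # seeded so the "random" draws are reproducible

# Hypothetical population: 100 students, 25 in each of 4 grades
population = [f"S{i:03d}" for i in range(100)]
grades = {g: population[25 * j:25 * (j + 1)] for j, g in enumerate([9, 10, 11, 12])}
# Hypothetical clusters: 10 homerooms of 10 students each
homerooms = {room: population[10 * room:10 * (room + 1)] for room in range(10)}


def srs(pop, n):
    """Simple random sample: every group of n units is equally likely."""
    return rng.sample(pop, n)


def stratified(strata, n_each):
    """SRS of n_each from every stratum, then combine."""
    return [unit for group in strata.values() for unit in rng.sample(group, n_each)]


def cluster(clusters, n_clusters):
    """Randomly choose whole clusters and take everyone in them."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]


def systematic(pop, k):
    """Random start among the first k units, then every k-th unit after it."""
    start = rng.randrange(k)
    return pop[start::k]


convenience = population[:10]  # NOT random: just whoever is easiest to reach
print(srs(population, 10))
print(stratified(grades, 5))
print(cluster(homerooms, 2))
print(systematic(population, 10))
```

Stratified sampling samples within every group. Cluster sampling picks whole groups and skips the rest.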
c. Bias: over- or underrepresenting a specific group in the population
i. Response Bias
ii. Nonresponse Bias—people have the choice, and some do not respond, leaving out part of the population
iii. Voluntary Response Bias
iv. Undercoverage—your design misses part of the population
4. Experimental Design
a. Writing experimental designs/procedures
i. Response Variable
ii. Factors, levels, treatments
iii. Control, Randomization, Replication—and don’t forget to comment on comparison!
b. Blocking: create homogeneous groups to allow for better comparison
c. Confounding Variables
d. Single vs. Double Blind
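Random assignment of treatments can be sketched in Python as "shuffle, then deal like cards." This covers both a completely randomized design and a randomized block design, where the randomization is done separately inside each block. The subjects, treatment names, and age blocks are hypothetical.

```python
import random


def completely_randomized(subjects, treatments, rng):
    """Shuffle the subjects, then deal them out to the treatments like cards.

    Group sizes are equal when len(subjects) is a multiple of len(treatments).
    """
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    k = len(treatments)
    return {t: shuffled[i::k] for i, t in enumerate(treatments)}


def randomized_block(blocks, treatments, rng):
    """Randomize separately within each (homogeneous) block."""
    return {name: completely_randomized(members, treatments, rng)
            for name, members in blocks.items()}


rng = random.Random(7)  # seeded for reproducibility
volunteers = [f"V{i:02d}" for i in range(20)]     # hypothetical subjects
treatments = ["new drug", "placebo"]              # hypothetical treatments
blocks = {"under 40": volunteers[:10], "40 and over": volunteers[10:]}

print(completely_randomized(volunteers, treatments, rng))
print(randomized_block(blocks, treatments, rng))
```

Within each block, half the subjects get each treatment. The treatments are then compared inside blocks, which removes the variation due to the blocking variable.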
5. Observational Studies
a. Retrospective vs. Prospective
b. Matching (same as blocking, but for observational studies)
6. Probability
a. Venn Diagrams
b. Conditional Probability
i. Tree Diagrams
c. Independence: A and B are independent if P(B|A) = P(B) (equivalently, P(A and B) = P(A)·P(B))
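A Python sketch of conditional probability and an independence check from a two-way table of counts. The table is hypothetical. `Fraction` keeps the arithmetic exact, so the equality test is not thrown off by rounding.

```python
from fractions import Fraction

# Hypothetical two-way table (counts): rows A / not A, columns B / not B
table = {
    ("A", "B"): 30, ("A", "not B"): 20,
    ("not A", "B"): 30, ("not A", "not B"): 20,
}
total = sum(table.values())


def prob(row=None, col=None):
    """P(row and col) from the table; None means 'either'."""
    hits = sum(n for (r, c), n in table.items()
               if (row is None or r == row) and (col is None or c == col))
    return Fraction(hits, total)


p_a, p_b, p_a_and_b = prob(row="A"), prob(col="B"), prob("A", "B")
p_b_given_a = p_a_and_b / p_a            # P(B | A) = P(A and B) / P(A)
p_a_or_b = p_a + p_b - p_a_and_b         # P(A or B) = P(A) + P(B) - P(A and B)
print(p_b_given_a, p_b, p_b_given_a == p_b)   # independent iff P(B | A) == P(B)
print(p_a_or_b)
```

Here P(B|A) = 3/5 = P(B), so A and B are independent for this table, and P(A or B) = 4/5.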
d. Expected Value and Variance
i. Remember, we cannot add standard deviations, but for INDEPENDENT random variables we can ALWAYS ADD VARIANCES (even for a difference: Var(X − Y) = Var(X) + Var(Y))
ii. Using a Normal model after finding the new E(X) and variance (take the square root of the variance to get the new standard deviation)
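A Python sketch of E(X) and Var(X) for a discrete random variable, followed by combining two independent Normal variables. The game and commute numbers are made up. Independence is assumed, because that is what lets the variances add. A sum or difference of independent Normal variables is itself Normal, which is why a Normal model applies to the total.

```python
from math import erf, sqrt


def expected_value(dist):
    """E(X) = sum of x * P(x); dist maps each value x to P(X = x)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = sum of (x - mu)^2 * P(x)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def normal_cdf(x, mean, sd):
    """P(X <= x) for X ~ N(mean, sd)."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))


# Made-up game: win $10 with probability 0.2, otherwise win nothing
game = {10: 0.2, 0: 0.8}
mu, var = expected_value(game), variance(game)
print(mu, var, sqrt(var))        # mean, variance, SD of one play

# Made-up commute: X ~ N(20, 4) and Y ~ N(10, 3) minutes, assumed independent
mean_total = 20 + 10             # means add
var_total = 4 ** 2 + 3 ** 2      # variances add (adding SDs would wrongly give 7)
sd_total = sqrt(var_total)       # back to an SD: sqrt(25) = 5
var_diff = 4 ** 2 + 3 ** 2       # X - Y: variances STILL add
p_late = 1 - normal_cdf(40, mean_total, sd_total)   # P(total > 40), z = 2
print(mean_total, sd_total, sqrt(var_diff), round(p_late, 4))
```

One play of the game has mean $2 and variance 16. The total commute is N(30, 5), so a trip longer than 40 minutes is about 2.3% likely.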
e. “And, Or, Not, Given”
f. Binomial Probability Distribution
i. binompdf(n, p, k) → P(X = k); used when given a specific number of trials and one specific number of successes
ii. binomcdf(n, p, k) → Cumulative, P(X ≤ k); used when given a specific number of trials and a range of numbers of successes (for “at least k,” use 1 − binomcdf(n, p, k − 1))
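The two calculator commands can be sketched in Python from the binomial formula P(X = k) = C(n, k)·p^k·(1 − p)^(n − k). The Python functions below borrow the calculator names for readability, but they are our own helpers. They assume the usual binomial conditions: a fixed number of trials, independent trials, and the same p on each trial. The free-throw numbers are made up.

```python
from math import comb


def binompdf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomcdf(n, p, k):
    """P(X <= k): add up binompdf for 0, 1, ..., k successes."""
    return sum(binompdf(n, p, i) for i in range(k + 1))


# Made-up example: a 70% free-throw shooter takes 10 shots
print(binompdf(10, 0.7, 7))         # P(exactly 7 made)
print(binomcdf(10, 0.7, 7))         # P(at most 7 made)
print(1 - binomcdf(10, 0.7, 7))     # P(at least 8 made), complement trick
```

The probabilities come out to about 0.267, 0.617, and 0.383.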
g. Geometric Probability Distribution
i. Used to find the probability that the “first” success happens on a given trial (Hint: if you simply use the ideas of “and, or” you won’t really need a geometric distribution)
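The hint above can be checked with a short Python sketch. "First success on trial k" is just "k − 1 failures AND then a success." "First success by trial k" is "NOT all of the first k trials fail." The helper names are hypothetical, loosely modeled on the calculator's geometpdf/geometcdf. The 20% prize rate is made up.

```python
def geom_pdf(p, k):
    """P(first success on trial k): k - 1 failures AND then a success."""
    return (1 - p) ** (k - 1) * p


def geom_cdf(p, k):
    """P(first success on or before trial k) = 1 - P(first k trials all fail)."""
    return 1 - (1 - p) ** k


# Made-up example: each cereal box has a 20% chance of containing a prize
print(geom_pdf(0.2, 3))                              # first prize in box 3
print(geom_cdf(0.2, 3))                              # a prize within 3 boxes
print(sum(geom_pdf(0.2, k) for k in range(1, 4)))    # same as the line above
```

These give 0.8 × 0.8 × 0.2 = 0.128 for the first prize in box 3, and 1 − 0.8³ = 0.488 for a prize within three boxes.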
h. Mutually Exclusive/Disjoint vs. Independent
7. Statistical Inference
a. See statistical inference chart