Cumulative AP Exam Study Guide (Statistics)
Statistics – the science of collecting, analyzing, and drawing conclusions from data.
Descriptive – methods of organizing and summarizing data.
Inferential – making generalizations from a sample to the population.
Population – the entire collection of individuals or objects.
Sample – a subset of the population selected for study.
Variable – any characteristic whose value changes from one individual to another.
Data – observations on one or more variables.

Variables
Categorical (Qualitative) – basic characteristics
Numerical (Quantitative) – measurements or observations of numerical data
Discrete – listable sets (counts)
Continuous – any value over an interval of values (measurements)
Univariate – one variable
Bivariate – two variables
Multivariate – many variables

Distributions
Symmetrical – both sides are fairly the same shape and size. "Bell curve"
Uniform – every class has an equal frequency (number). "A rectangle"
Skewed – one side (tail) is longer than the other side. The skewness is in the direction that the tail points (left or right).
Bimodal – two (or more) classes have large frequencies and are separated by another class between them. "Double hump camel"

How to describe numerical graphs – S.O.C.S.
Shape – overall type (symmetrical, skewed right or left, uniform, or bimodal)
Outliers – unusual features: outliers, gaps, clusters, etc.
Center – middle of the data (mean, median, and mode)
Spread – variability (range, standard deviation, and IQR)
*Everything must be in context to the data and situation of the graph.
*When comparing two distributions, you MUST use comparative language!

Parameter – value of a population (typically unknown)
Statistic – a value calculated from a sample, used to estimate a population parameter.

Measures of Center
Median – the middle point of the data (50th percentile) when the data is in numerical order. If there are two middle values (an even number of observations), average them together.
Mean – $\mu$ is for a population (parameter) and $\bar{x}$ is for a sample (statistic).
Mode – the value that occurs the most in the data. There can be more than one mode, or no mode at all if all data points occur once.

Variability – allows statisticians to distinguish between usual and unusual occurrences.

Measures of Spread (variability)
Range – a single value: Max − Min
IQR – interquartile range: Q3 − Q1
Standard deviation – $\sigma$ for population (parameter) and $s$ for sample (statistic). Measures the typical or average deviation of observations from the mean. For the sample standard deviation, the sum of squared deviations is divided by df = n − 1 before taking the square root.
*The sum of the deviations from the mean is always zero!
Variance – standard deviation squared

Resistant – not affected by outliers.
Resistant: Median, IQR
Non-Resistant: Mean, Range, Variance, Standard Deviation, Correlation Coefficient ($r$), Least Squares Regression Line (LSRL), Coefficient of Determination ($r^2$)

Comparison of mean & median based on graph type
Symmetrical – the mean and the median are approximately the same value.
Skewed Right – the mean is larger than the median.
Skewed Left – the mean is smaller than the median.
*The mean is pulled in the direction of the skew (toward the tail), away from the median.

Trimmed Mean – use a % to take observations away from the top and bottom of the ordered data. This possibly eliminates outliers.

Linear Transformations of random variables
$\mu_{a+bx} = a + b\mu_x$ – The mean is changed by both addition (subtraction) and multiplication (division).
$\sigma_{a+bx} = |b|\,\sigma_x$ – The standard deviation is changed by multiplication (division) ONLY.

Combination of two (or more) random variables
$\mu_{x \pm y} = \mu_x \pm \mu_y$ – Just add or subtract the two (or more) means.
$\sigma_{x \pm y} = \sqrt{\sigma_x^2 + \sigma_y^2}$ – Always add the variances. X & Y MUST be independent.

Z-Score – a standardized score. It tells you how many standard deviations from the mean an observation is. Standardizing a normal distribution gives the standard normal curve of z-scores, with $\mu = 0$ and $\sigma = 1$.
$z = \dfrac{x - \mu}{\sigma}$

Normal Curve – a bell-shaped and symmetrical curve. As $\sigma$ increases the curve flattens and widens. As $\sigma$ decreases the curve becomes narrower and taller.
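The z-score, linear transformation, and combination rules above can be checked with a few lines of Python. This is a minimal sketch, not part of the original guide: the function names and the numbers passed to them are made up for illustration.

```python
import math

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

def linear_transform(mu_x, sigma_x, a, b):
    """Mean and standard deviation of a + bX."""
    # Adding a shifts the mean only; multiplying by b scales both,
    # and the standard deviation uses |b| so it is never negative.
    return a + b * mu_x, abs(b) * sigma_x

def combine_independent(mu_x, sigma_x, mu_y, sigma_y, sign=1):
    """Mean and standard deviation of X + Y (sign=1) or X - Y (sign=-1).

    Assumes X and Y are independent; variances add in both cases.
    """
    mean = mu_x + sign * mu_y
    sd = math.sqrt(sigma_x ** 2 + sigma_y ** 2)
    return mean, sd

# Hypothetical values chosen only to keep the arithmetic easy to follow.
print(z_score(85, 70, 10))                         # observation 85, mean 70, sd 10
print(linear_transform(70, 10, a=5, b=-2))         # mean and sd of 5 - 2X
print(combine_independent(70, 3, 65, 4, sign=-1))  # mean and sd of X - Y
```

Note that the difference X − Y still has a standard deviation of 5 here, not 3 − 4: the variances 9 and 16 are added before taking the square root.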
Empirical Rule (68-95-99.7) – measures 1σ, 2σ, and 3σ on normal curves from a center of μ.
About 68% of the population is within 1σ of μ.
About 95% of the population is within 2σ of μ.
About 99.7% of the population is within 3σ of μ.

Boxplots – are for medium or large numerical data sets. They do not show the original observations. Always use modified boxplots, where the fences are 1.5 IQRs from the ends of the box (Q1 − 1.5·IQR and Q3 + 1.5·IQR). Points outside the fences are considered outliers. Whiskers extend to the smallest and largest observations within the fences.
5-Number Summary – Minimum, Q1 (1st Quartile, 25th Percentile), Median, Q3 (3rd Quartile, 75th Percentile), Maximum

Probability Rules
Sample Space – the collection of all possible outcomes.
Event – any subset of outcomes from the sample space.
Complement – all outcomes not in the event.
Union – A or B ($A \cup B$): all the outcomes in either circle (A, B, or both).
Intersection – A and B ($A \cap B$): the outcomes in the overlap of A and B.
Mutually Exclusive (Disjoint) – A and B have no intersection. They cannot happen at the same time.
Independent – knowing that one event has occurred does not change the probability of the other.
Experimental Probability – the number of successes in an experiment divided by the total number of trials in the experiment.
Law of Large Numbers – as an experiment is repeated, the experimental probability gets closer and closer to the true (theoretical) probability. The difference between the two probabilities will approach 0.

Rules
(1) All probabilities satisfy $0 \le P \le 1$.
(2) The probability of the sample space is 1.
(3) Complement: $P(A^c) = 1 - P(A)$, since $P(A) + P(A^c) = 1$.
(4) Addition: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
(5) Multiplication: $P(A \text{ and } B) = P(A) \cdot P(B)$ if A and B are independent.
(6) $P(\text{at least 1}) = 1 - P(\text{none})$
(7) Conditional Probability – takes into account a certain condition: $P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)} = \dfrac{P(\text{both})}{P(\text{given})}$

Correlation Coefficient (r) – a quantitative assessment of the strength and direction of a linear relationship.
(Use ρ (rho) for the population parameter.)
Values – [−1, 1]: 0 – no correlation, (0, ±0.5) – weak, [±0.5, ±0.8) – moderate, [±0.8, ±1] – strong

Least Squares Regression Line (LSRL) – a line of mathematical best fit. It minimizes the sum of the squared deviations (residuals) from the line. Used with bivariate data.
$\hat{y} = a + bx$
x is the independent (explanatory) variable and y is the dependent (response) variable.
Residuals (error) – the vertical difference of a point from the LSRL. All residuals sum to 0.
Residual $= y - \hat{y}$
Residual Plot – a scatterplot of (x (or $\hat{y}$), residual). No pattern indicates that a linear model is appropriate.
Coefficient of Determination ($r^2$) – gives the proportion of variation in y (response) that is explained by the linear relationship between x and y. Never use the adjusted $r^2$.

Interpretations: must be in context!
Slope (b) – For each one-unit increase in x, the predicted y will increase/decrease by the slope amount.
Correlation coefficient (r) – There is a (strength), (direction), linear association between x and y.
Coefficient of determination ($r^2$) – Approximately $r^2$% of the variation in y can be explained by the LSRL of x and y.

Extrapolation – the LSRL should not be used to predict values outside the range of the original data.
Influential Points – points that, if removed, significantly change the LSRL.
Outliers – points with large residuals.
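The LSRL, residual, and $r^2$ definitions above can be worked through on a small data set. The sketch below is illustrative only: the five (x, y) points are invented, and the slope is computed with the standard least-squares formula $b = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$, which the guide itself does not state.

```python
import math

def lsrl(xs, ys):
    """Return intercept a, slope b, and correlation r for y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                   # slope
    a = y_bar - b * x_bar           # line passes through (x-bar, y-bar)
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return a, b, r

# Hypothetical data, not taken from the guide.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b, r = lsrl(xs, ys)
predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # y minus y-hat

print("y-hat = %.2f + %.2f x" % (a, b))
print("r = %.4f, r^2 = %.4f" % (r, r ** 2))
print("sum of residuals = %.10f" % sum(residuals))
```

For these points the residuals cancel out to zero (up to floating-point rounding), and $r^2$ equals the fraction of the variation in y that the line accounts for.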