The wonderful world of math! If you are confident with the material below, feel free to skip this post. I'm not great at math, so I find quick reviews helpful for priming my brain to be in math-mode.
1. Review of Basic Math
1.1 Real Limits
The precision of measurement depends on the unit of measurement, which is typically selected by the researcher.
Real limits are the numbers that establish the upper and lower bounds within which the true value is contained. To determine the real limits, divide the unit of measurement in half, then add it to AND subtract it from $X$. For example, a weight recorded as 150 to the nearest pound has real limits of 149.5 and 150.5.
$\text{Real limits of } X = X \pm \frac{1}{2} (\text{unit of measurement})$
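As a quick illustration of the formula above, here is a minimal sketch in Python. The function name `real_limits` and the example measurements are my own choices for illustration, not something from a particular library.

```python
def real_limits(x, unit):
    """Return the (lower, upper) real limits of a value x
    that was recorded to the nearest `unit`."""
    half_unit = unit / 2
    return x - half_unit, x + half_unit


# A weight recorded to the nearest 1 lb
print(real_limits(150, 1))   # (149.5, 150.5)

# A salary recorded to the nearest 10 dollars
print(real_limits(520, 10))  # (515.0, 525.0)
```

The smaller the unit of measurement, the narrower the band between the two limits.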
1.2 Order of Operations
Operations must be completed in PEDMAS order:
- Parentheses;
- Exponents, square roots;
- Division, multiplication (same rank, worked left to right); and
- Addition, subtraction (same rank, worked left to right).
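To see the order in action, the sketch below works through one made-up expression a step at a time and compares it with letting Python evaluate the whole thing. Python's arithmetic operators follow the same precedence, so the two results should agree. The expression itself is just an example I picked.

```python
# Evaluate 3 + 4 * 2**2 / (1 + 1) one PEDMAS step at a time.
inside_parentheses = 1 + 1                # Parentheses: 2
power = 2 ** 2                            # Exponents: 4
product = 4 * power                       # Multiplication (leftmost of * and /): 16
quotient = product / inside_parentheses   # Division: 8.0
step_by_step = 3 + quotient               # Addition: 11.0

all_at_once = 3 + 4 * 2**2 / (1 + 1)
print(step_by_step, all_at_once)          # 11.0 11.0

# Division and multiplication share a rank, so work left to right:
left_to_right = 8 / 4 * 2                 # (8 / 4) * 2 = 4.0, not 8 / (4 * 2) = 1.0
print(left_to_right)                      # 4.0
```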
1.3 Summary of Notation
Here's a quick summary of common notation:
- $x$ or $y$: variables used to represent a set of observations or data.
- $N$ = number of scores or observations in a population data set.
- $n$ = number of people in each group.
- $X_i$ = score on $X$ for person $i$.
- $Y_i$ = score on $Y$ for person $i$.
- $\sum$ (sum): add together all the values represented by a variable.
- E.g. $\sum X$ means "add all $X$ values together."
- $\sum_{i=\text{here is where you start}}^{\text{here is where you end}} \text{here are the values you calculate}$
Here are some rules for working with summations, where $N$ is the sample size and $C$ is a constant.
- When you are calculating the sum of a constant, multiply the sample size $N$ by the constant.
$\displaystyle \sum_{i=1}^N C = NC$
- If every $X$ value is multiplied by a constant, you can pull the constant out of the summation and multiply the sum by it instead.
$\displaystyle \sum_{i=1}^N CX_i = C \sum_{i=1}^N X_i$
- If you are calculating the sum of $X + Y$, you can sum $X$ and $Y$ separately and add the results together.
$\displaystyle \sum_{i=1}^N (X_i + Y_i) = \sum_{i=1}^N X_i + \sum_{i=1}^N Y_i$
- However, this only works with additions and subtractions, not multiplications! In general, the sum of the products is not the product of the sums:
$\displaystyle \sum_{i=1}^N X_i Y_i \neq \sum_{i=1}^N X_i \sum_{i=1}^N Y_i$
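The summation rules above are easy to check numerically. This is a small sketch with made-up values for `X`, `Y`, and `C`; any numbers would do.

```python
X = [2, 4, 6]   # made-up scores
Y = [1, 3, 5]   # made-up scores
C = 10          # a constant
N = len(X)

# Rule 1: summing a constant N times gives N * C
print(sum(C for _ in range(N)), N * C)                   # 30 30

# Rule 2: a constant multiplier can be pulled out of the sum
print(sum(C * x for x in X), C * sum(X))                 # 120 120

# Rule 3: the sum of (X + Y) equals sum(X) + sum(Y)
print(sum(x + y for x, y in zip(X, Y)), sum(X) + sum(Y)) # 21 21

# Not a rule: the sum of products is NOT the product of sums
sum_of_products = sum(x * y for x, y in zip(X, Y))       # 2 + 12 + 30
product_of_sums = sum(X) * sum(Y)                        # 12 * 9
print(sum_of_products, product_of_sums)                  # 44 108
```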
1.4 Rounding
Round at the last step only (i.e., only round your result). Round calculations to TWO more digits than the unit of measurement. Here are two rules for rounding:
- If the first digit being dropped is less than 5, drop the remainder. For example, $8.33333 \approx 8.33$
- If the first digit being dropped is greater than or equal to 5, increase the last kept digit by 1. For example, $5.66666 \approx 5.67$
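One thing to watch for if you do this in code: Python's built-in `round` sends exact halves to the nearest even digit (so `round(2.5)` is `2`), which is not the rule described above. Here is a minimal sketch that uses the standard `decimal` module with `ROUND_HALF_UP` instead. The helper name `round_half_up` is my own, and I pass values in as strings so the digits are taken exactly as written.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, places=2):
    """Round `value` (given as a string or number) to `places` decimals,
    always rounding a trailing 5 upward."""
    quantum = Decimal(f"1e-{places}")   # e.g. 0.01 for two decimal places
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


print(round_half_up("8.33333"))  # 8.33
print(round_half_up("5.66666"))  # 5.67
print(round_half_up("2.675"))    # 2.68
print(round(2.5))                # 2  (built-in round: half goes to even)
```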
2. Exploratory Data Analyses
Using exploratory data analyses, we can determine the number that best represents our data, the variability (spread) of the data set, how individual values compare to the entire set, whether a systematic relationship exists between the variables being studied, and the shape of the distributions.
- Skew: the relative symmetry of a distribution of scores.
- Kurtosis: the degree to which values cluster versus being distributed across or throughout the range of values.
We can use stem and leaf graphs to display the original values in our data set. The stem (left of the line) represents the interval, whereas the leaves (right of the line) represent the last digit of each value within the interval; the frequency of each score is represented by the repeated values.
Stem | Leaf |
---|---|
9 | 8 5 5 3 2 1 |
8 | 9 6 5 5 3 1 |
7 | 7 6 6 5 4 1 |
6 | 9 8 8 6 |
5 | 8 |
4 | 6 |
3 | 8 |
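If you want to build a stem and leaf display from raw scores, here is a minimal sketch. The `scores` list is simply the values read back off the table above, and the function assumes two-digit whole numbers (the tens digit is the stem, the ones digit is the leaf).

```python
from collections import defaultdict

# Scores read back from the stem and leaf table above
scores = [98, 95, 95, 93, 92, 91, 89, 86, 85, 85, 83, 81,
          77, 76, 76, 75, 74, 71, 69, 68, 68, 66, 58, 46, 38]


def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem) and list the ones digits (leaves)."""
    groups = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    # Highest stem first, and leaves from high to low within each stem
    return {stem: sorted(leaves, reverse=True)
            for stem, leaves in sorted(groups.items(), reverse=True)}


for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```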
We can use frequency distributions to arrange scores in order of magnitude. The distribution shows the number of times each score occurs (i.e., the frequency of the scores). It is composed of two elements: a class and a frequency.
- Class: a category that groups values that are similar to each other.
- Frequency: the number of times a score, $X$, occurs in a data set, written $f(X)$.
A frequency distribution table is created by listing all the $X$ scores from the highest to lowest and translating the frequencies into a table.
Score ($X$) | Each occurrence of the score | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
11 | 11 | 11 | 11 | 11 | 11 | 11 | ||||
10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | ||
8 | 8 | 8 | 8 | 8 | 8 | 8 | ||||
7 | 7 | 7 | 7 | 7 | 7 | |||||
6 | 6 | 6 | 6 |
$\uparrow$ turns into $\downarrow$
Score ($X$) | $f(X)$ |
---|---|
11 | 6 |
10 | 10 |
9 | 8 |
8 | 6 |
7 | 5 |
6 | 3 |
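Counting frequencies by hand gets tedious quickly. As a sketch, Python's standard `collections.Counter` does the tallying for us. The `scores` list below is rebuilt from the frequencies in the table above.

```python
from collections import Counter

# Raw scores rebuilt from the frequency table above
scores = [11] * 6 + [10] * 10 + [9] * 8 + [8] * 6 + [7] * 5 + [6] * 3

frequency = Counter(scores)                              # maps each score X to f(X)
frequency_table = sorted(frequency.items(), reverse=True)  # highest score first

print("X  | f(X)")
for score, count in frequency_table:
    print(f"{score:<2} | {count}")

print("N =", sum(frequency.values()))   # N = 38
```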
Grouped frequency distributions are used when the scores are spread out over a wide range, especially when each individual score occurs only a few times. For example, salary ranges are normally displayed in grouped frequency distribution tables.
The cumulative frequency is the total frequency (or total number of scores) at or below that score. For grouped data, it is the total frequency of scores below the upper true limit of the class interval. The cumulative frequency distribution is where the scores are arranged in order of magnitude and the distribution shows the total frequency at or below each score.
Exam Score ($X$) | $f(X)$ | Cumulative $f(X)$ |
---|---|---|
30-39 | 4 | 4 |
40-49 | 6 | 10 |
50-59 | 10 | 20 |
60-69 | 160 | 180 |
70-79 | 10 | 190 |
80-89 | 6 | 196 |
90-100 | 4 | 200 |
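The cumulative column is just a running total of the frequency column. Here is a small sketch using `itertools.accumulate` from the standard library, with the class intervals and frequencies copied from the table above.

```python
from itertools import accumulate

classes = ["30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90-100"]
frequencies = [4, 6, 10, 160, 10, 6, 4]

# Running total of f(X), starting from the lowest class
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label:<6} | {f:>3} | {cf:>3}")

print(cumulative)  # [4, 10, 20, 180, 190, 196, 200]
```

The last cumulative value should always equal $N$, which makes a handy sanity check.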
Relative Frequency (a.k.a. proportion) is where we divide the frequency of a score or class by $N$ (the total number of scores).
$\displaystyle \text{Relative Frequency} = \frac{f(X)}{N}$
The relative frequency distribution is where the scores are arranged in order of magnitude and the distribution shows the relative frequency for each score.
The cumulative relative frequency is the relative frequency of scores at or below that point. The cumulative relative frequency distribution is where the scores are arranged in order of magnitude and the distribution shows the relative frequency at or below each score.
Exam Score ($X$) | $f(X)$ | Cumulative $f(X)$ | Relative $f(X)$ | Cumulative Relative $f(X)$ |
---|---|---|---|---|
30-39 | 4 | 4 | 0.02 | 0.02 |
40-49 | 6 | 10 | 0.03 | 0.05 |
50-59 | 10 | 20 | 0.05 | 0.1 |
60-69 | 160 | 180 | 0.8 | 0.9 |
70-79 | 10 | 190 | 0.05 | 0.95 |
80-89 | 6 | 196 | 0.03 | 0.98 |
90-100 | 4 | 200 | 0.02 | 1 |
Percent is where you multiply the relative frequency by 100.
$\displaystyle \text{Percent} = \frac{f(X)}{N} \times 100$
The percent distribution displays the scores arranged in order of magnitude and the distribution shows the percent for each score.
The cumulative percentage is the percentage of scores at or below that point, or the percentage of scores below the upper true limit of the class interval. The cumulative percent distribution is where the scores are arranged in order of magnitude and the distribution shows the total percent at or below each score.
Exam Score $(X)$ | $f(X)$ | Cumulative $f(X)$ | Relative $f(X)$ | Cumulative Relative $f(X)$ | % $(X)$ | Cumulative % $(X)$ |
---|---|---|---|---|---|---|
30-39 | 4 | 4 | 0.02 | 0.02 | 2 | 2 |
40-49 | 6 | 10 | 0.03 | 0.05 | 3 | 5 |
50-59 | 10 | 20 | 0.05 | 0.1 | 5 | 10 |
60-69 | 160 | 180 | 0.8 | 0.9 | 80 | 90 |
70-79 | 10 | 190 | 0.05 | 0.95 | 5 | 95 |
80-89 | 6 | 196 | 0.03 | 0.98 | 3 | 98 |
90-100 | 4 | 200 | 0.02 | 1 | 2 | 100 |
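All of the columns in this table can be derived from the frequency column alone. The sketch below reproduces them in Python. The variable names are my own, and the frequencies are copied from the table.

```python
from itertools import accumulate

frequencies = [4, 6, 10, 160, 10, 6, 4]    # f(X), lowest class first
N = sum(frequencies)                        # total number of scores
cumulative = list(accumulate(frequencies))  # cumulative f(X)

relative = [f / N for f in frequencies]         # f(X) / N
cum_relative = [cf / N for cf in cumulative]    # cumulative f(X) / N
percent = [100 * f / N for f in frequencies]    # f(X) / N * 100
cum_percent = [100 * cf / N for cf in cumulative]

print(relative)      # [0.02, 0.03, 0.05, 0.8, 0.05, 0.03, 0.02]
print(cum_relative)  # [0.02, 0.05, 0.1, 0.9, 0.95, 0.98, 1.0]
print(percent)       # [2.0, 3.0, 5.0, 80.0, 5.0, 3.0, 2.0]
print(cum_percent)   # [2.0, 5.0, 10.0, 90.0, 95.0, 98.0, 100.0]
```

I divide the cumulative counts by $N$ rather than adding up the relative frequencies, because adding many small decimals can let tiny floating-point errors creep in.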
Converting frequencies to proportions or percentages allows us to compare two or more groups when the size of the groups differs.
3. Graphs
There are five fundamental characteristics of a graph:
- Two axes are drawn at a right angle.
- The horizontal axis is the x-axis (abscissa) and the vertical axis is the y-axis (ordinate).
- The independent variable (IV) is plotted along the x-axis and the dependent variable (DV) is plotted along the y-axis.
- The variables must be clearly labelled on both the x- and y-axes.
- The graph should contain all of the information needed to understand the data and nothing more.
Bar graphs have spaces between the bars because they represent discrete, qualitative (nominal) data. The categories are plotted along the x-axis.
In contrast, histograms have no spaces between their bars. They are used to represent ordinal, interval, or ratio data. Quantitative values are plotted along the x-axis.
- For continuous variables, a class interval is a category of numbers with specified limits.
- Each number on the x-axis represents the midpoint of the class interval.
- The height of each bar corresponds to the frequency for the category.
- For interval and ratio data, the width of the bar extends to the real limits for the category. E.g., for a value of 3 that falls midway between 2 and 4, the real limits would be 2.5 to 3.5.
Frequency polygons and histograms are similar. If you add a dot at the top of each bar, above the category's midpoint, and connect the dots with lines, you get a polygon. Make sure to join both ends of the polygon line to the x-axis at zero (to indicate frequency = 0).
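To make the link between the two graphs concrete, here is a sketch that computes the numbers you would plot, without drawing anything. The frequencies are borrowed from the score table earlier in this post. Closing the polygon one unit beyond the lowest and highest midpoints is my assumption about where to join it to the x-axis.

```python
# Frequencies borrowed from the frequency table earlier in the post
frequency = {6: 3, 7: 5, 8: 6, 9: 8, 10: 10, 11: 6}
unit = 1  # scores are whole numbers

midpoints = sorted(frequency)

# Histogram: each bar runs between the real limits of its midpoint
bars = [(x - unit / 2, x + unit / 2, frequency[x]) for x in midpoints]

# Polygon: one dot per midpoint at the bar's height,
# closed at frequency 0 one unit beyond each end (assumed convention)
polygon = ([(midpoints[0] - unit, 0)]
           + [(x, frequency[x]) for x in midpoints]
           + [(midpoints[-1] + unit, 0)])

print(bars[0])      # (5.5, 6.5, 3)
print(polygon[0])   # (5, 0)
print(polygon[-1])  # (12, 0)
```

Because each bar ends exactly where the next one starts, the histogram has no gaps, and the polygon dots sit in the middle of the top of each bar.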