# Study Notes #2

## Summation

\displaystyle\sum_{i=1}^n

We use the Sigma symbol to represent a summation. Instead of writing out every term:

x_1 + x_2 + x_3 + x_4 + x_5 + x_6

We can write:

\displaystyle\sum_{i=1}^6x_i
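
As a quick illustration, the sum above maps directly onto a loop or onto Python's built-in `sum`. The values in `x` are made-up sample data, not taken from these notes.

```python
# Hypothetical sample values standing in for x_1 ... x_6.
x = [2, 4, 6, 8, 10, 12]

# Sigma notation written out as an explicit loop: i runs from 1 to n.
total = 0
for i in range(1, len(x) + 1):
    total += x[i - 1]  # x_i is stored at index i - 1 (Python counts from 0)

# The built-in sum() computes the same quantity.
assert total == sum(x)
print(total)  # 42
```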

Five Number Summary – five values that describe the spread of a dataset and are used to calculate the range and interquartile range.

1. Minimum – The smallest value in the dataset.
2. Q1 – The value such that 25% of the data fall below.
3. Q2 – The value such that 50% of the data fall below.
4. Q3 – The value such that 75% of the data fall below.
5. Maximum – The largest value in the dataset.

Range – calculated as the difference between the maximum and the minimum.

range = maximum - minimum

IQR (Interquartile Range) – calculated as the difference between Q3 and Q1

IQR = Q_3 - Q_1

Steps to compute:

1. Arrange the data set from lowest to highest number.
2. Get the lowest number as the minimum.
3. Get the highest number as the maximum.
4. Get the median of the entire data set as Q2. (With an even number of values, the median is the mean of the two middle values.)
5. Get the median of the lower half of the data set as Q1. (With an odd number of values, don’t include Q2 itself in the lower half.)
6. Get the median of the upper half of the data set as Q3. (With an odd number of values, don’t include Q2 itself in the upper half.)
7. Get the difference of the maximum and minimum as the range.
8. Get the difference of the Q3 and Q1 as the IQR.
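
The steps above can be sketched in Python roughly as follows. The function names `five_number_summary` and `spread` are illustrative choices, and the sketch assumes the median-of-halves rule described in these notes (other quartile conventions exist) and a dataset with at least two values.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum) using the median-of-halves steps."""
    values = sorted(data)          # step 1: arrange from lowest to highest
    n = len(values)
    half = n // 2
    lower = values[:half]          # values below the middle position
    # When n is odd, skip the middle value (Q2 itself) in the upper half.
    upper = values[half + 1:] if n % 2 else values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]

def spread(data):
    """Return (range, IQR) computed from the five number summary."""
    minimum, q1, _, q3, maximum = five_number_summary(data)
    return maximum - minimum, q3 - q1

# First example dataset from these notes.
data = [1, 5, 10, 3, 8, 12, 4, 1, 2, 8]
print(five_number_summary(data))  # (1, 2, 4.5, 8, 12)
print(spread(data))               # (11, 6)
```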

Examples

1, 5, 10, 3, 8, 12, 4, 1, 2, 8

1. 1, 1, 2, 3, 4, 5, 8, 8, 10, 12
2. Minimum = 1
3. Maximum = 12
4. Q2 = (4+5)/2 = 9/2 = 4.5
5. Q1 = 2
6. Q3 = 8
7. Range = 12-1 = 11
8. IQR = 8-2 = 6

5, 10, 3, 8, 12, 4, 1, 2, 8

1. 1, 2, 3, 4, 5, 8, 8, 10, 12
2. Minimum = 1
3. Maximum = 12
4. Q2 = 5
5. Q1 = (2+3)/2 = 5/2 = 2.5
6. Q3 = (8+10)/2 = 18/2 = 9
7. Range = 12-1 = 11
8. IQR = 9-2.5 = 6.5
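
Library functions do not all use the median-of-halves rule, so their quartiles can differ slightly from a hand calculation. A minimal sketch with `statistics.quantiles` from Python's standard library, which interpolates between data points; the variable names are illustrative.

```python
from statistics import quantiles

first = [1, 5, 10, 3, 8, 12, 4, 1, 2, 8]
second = [5, 10, 3, 8, 12, 4, 1, 2, 8]

# n=4 asks for the three cut points Q1, Q2, Q3.
for data in (first, second):
    for method in ("exclusive", "inclusive"):
        q1, q2, q3 = quantiles(data, n=4, method=method)
        print(len(data), method, q1, q2, q3, "IQR =", q3 - q1)
```

For the second dataset, the default `exclusive` method should agree with the hand-computed quartiles (2.5, 5, 9). For the first dataset it interpolates and gives a different Q1 and Q3, so it is worth checking which convention a tool uses before comparing results.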

Box Plot – useful for quickly comparing the spread of two data sets across key metrics such as the quartiles, maximum, and minimum.

1. The beginning of the line to the left of the box and the end of the line to the right of the box represent the minimum and maximum values in a dataset.
2. The visual distance between these markings is an indication of the range of the values.
3. The box itself represents the IQR. The box begins at the Q1 value, ends at the Q3 value, and Q2, or the median, is represented by a line within the box.

## Standard Deviation and Variance

Standard Deviation – on average, how much each point varies from the mean of the points.

Variance – average squared difference of each observation from the mean.

The standard deviation is the square root of the variance:

\sqrt {\frac 1 n \displaystyle\sum_{i=1}^n (x_i - \bar{x})^2}

#### How to Calculate Standard Deviation

Dataset: 10, 14, 10, 6
1. Calculate the mean.
\bar{x} = (\sum_{i=1}^4 x_i)/n \\ = (10+14+10+6)/4 \\ = 40/4 \\ = 10
1. Calculate the distance of each observation from the mean and square the value.
(x_i - \bar{x})^2 \\ (10-10)^2 = 0^2 = 0 \\ (14-10)^2 = 4^2 = 16 \\ (10-10)^2 = 0^2 = 0 \\ (6-10)^2 = (-4)^2 = 16
1. Calculate the variance, the average squared difference of each observation from the mean.
\frac 1 n \displaystyle\sum_{i=1}^n (x_i - \bar{x})^2 \\ = (0+16+0+16)/4 \\ = 32/4 \\ = 8
The variance is 8. 
1. Calculate the standard deviation, the square root of the variance.
\sqrt{8} \approx 2.83
The standard deviation is 2.83.
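
The four steps above can be checked with a short Python sketch. The variable names are illustrative, and the variance divides by n, as in the formula in these notes.

```python
import math

data = [10, 14, 10, 6]
n = len(data)

# Step 1: mean.
mean = sum(data) / n

# Step 2: squared distance of each observation from the mean.
squared_diffs = [(x - mean) ** 2 for x in data]

# Step 3: variance = average of the squared distances (dividing by n).
variance = sum(squared_diffs) / n

# Step 4: standard deviation = square root of the variance.
std_dev = math.sqrt(variance)

print(mean, squared_diffs, variance, round(std_dev, 2))
# 10.0 [0.0, 16.0, 0.0, 16.0] 8.0 2.83
```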

#### Sample Problem

Dataset: 1, 5, 10, 3, 8, 12, 4
1. Mean – 6.14
\bar{x} = (\sum_{i=1}^7 x_i)/7 \\ = (1+5+10+3+8+12+4)/7 \\ = 43/7 \\ \approx 6.14
1. Variance
(x_i-\bar{x})^2 \\ (1-6.14)^2=(-5.14)^2=26.42 \\ (5-6.14)^2=(-1.14)^2=1.30 \\ (10-6.14)^2=3.86^2=14.90 \\ (3-6.14)^2=(-3.14)^2=9.86 \\ (8-6.14)^2=1.86^2=3.46 \\ (12-6.14)^2=5.86^2=34.34 \\ (4-6.14)^2=(-2.14)^2=4.58 \\ (26.42+1.30+14.90+9.86+3.46+34.34+4.58)/7 \\ = 94.86/7 \\ \approx 13.55
1. Standard Deviation
\sqrt{13.55} \approx 3.68
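
As a cross-check, the same results can be reproduced with Python's standard `statistics` module. This is a minimal sketch; `pvariance` and `pstdev` are used because they divide by n, which matches the formula in these notes.

```python
from statistics import mean, pvariance, pstdev

data = [1, 5, 10, 3, 8, 12, 4]

# pvariance and pstdev divide by n, matching the formula in these notes.
# (variance and stdev, without the "p", divide by n - 1 instead.)
print(round(mean(data), 2))       # 6.14
print(round(pvariance(data), 2))  # 13.55
print(round(pstdev(data), 2))     # 3.68
```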