Part III. Correlation and Regression
Ch 8. Correlation
- The Scatter Diagram
- The relationship between two variables can be represented by a scatter diagram.
- positive association - above-average values of one variable tend to go with above-average values of the other.
- strong association - knowing one helps a lot in predicting the other.
- weak association - information about one variable does not help much in guessing the other.
- relationship between two variables
- independent
- dependent
- The independent variable is thought of as influencing the dependent variable. The roles depend on the point of view, though: seen from a different angle, the independent variable becomes the dependent one and the dependent variable becomes the independent one.
- The Correlation Coefficient
- How can a scatter diagram be summarized numerically?
- The average of the x-values, the SD of the x-values
- The average of the y-values, the SD of the y-values
- The correlation coefficient r
- Perfect correlation - the correlation coefficient is 1 or -1; in this case all the points in the scatter diagram lie on a single line.
- The correlation coefficient is always between -1 and 1.
- The number by itself does not have a direct, literal meaning.
- A positive value means the line in the scatter diagram slopes upward.
- A negative value means the line in the scatter diagram slopes downward.
- If the SD of the x-values or of the y-values is 0, the correlation coefficient cannot be computed.
- The SD Line
- Computing The Correlation Coefficient
- Convert each variable to standard units.
- The average of the products of the paired standard-unit values is the correlation coefficient.
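The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `standard_units` and `correlation` and the five data points are made up for the example, and the SD is taken with divisor n (`statistics.pstdev`).

```python
from statistics import mean, pstdev


def standard_units(values):
    """Convert values to standard units: (value - average) / SD."""
    avg = mean(values)
    sd = pstdev(values)  # assumption: SD computed with divisor n
    if sd == 0:
        raise ValueError("SD is 0, so standard units are undefined")
    return [(v - avg) / sd for v in values]


def correlation(xs, ys):
    """Average of the products of the paired standard-unit values."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    zx = standard_units(xs)
    zy = standard_units(ys)
    return mean([a * b for a, b in zip(zx, zy)])


# Illustrative data (hypothetical, not taken from the notes)
xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]
r = correlation(xs, ys)
print(round(r, 2))
```

For this data the averages are 4 and 7 and the SDs are 2 and 4, so the standard-unit products are 0.75, -0.25, 0, -0.75 and 2.25, whose average is 0.40. A zero SD raises an error, which matches the note that r cannot be computed in that case.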
Ch 9. More about Correlation
- Features of The Correlation Coefficient
- The correlation coefficient is a pure number, without units. It is not affected by
- interchanging the two variables,
- adding the same number to all the values of one variable,
- multiplying all the values of one variable by the same positive number.
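The invariance properties listed above can be checked numerically. The sketch below is a rough illustration, not a proof: the helper `correlation` and the data are hypothetical, and the SD uses divisor n. The last line goes one step beyond the list and shows why the multiplier has to be positive: a negative multiplier flips the sign of r.

```python
from statistics import mean, pstdev


def correlation(xs, ys):
    """Average of the products of the values in standard units."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)  # assumption: SD with divisor n
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])


# Illustrative data (hypothetical)
xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]

r = correlation(xs, ys)
r_swapped = correlation(ys, xs)                     # interchange the variables
r_shifted = correlation([x + 10 for x in xs], ys)   # add the same number to all x
r_scaled = correlation([x * 3 for x in xs], ys)     # multiply all x by a positive number
r_negated = correlation([x * -3 for x in xs], ys)   # negative multiplier: sign flips

print(r, r_swapped, r_shifted, r_scaled, r_negated)
```

The first four values agree up to floating-point rounding, and the last one has the same size but the opposite sign.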
- Change SDs
- If two scatter diagrams have the same correlation coefficient, the one with the smaller SDs looks more tightly clustered.
- Some Exceptional Cases
- The correlation coefficient can be misleading in the presence of outliers or non-linear association. Whenever possible, look at the scatter diagram to check for the problems.
- The correlation coefficient r measures linear association, not association in general.
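Both exceptional cases can be reproduced with small made-up data sets. In the sketch below (helper name and data are hypothetical; SD with divisor n), the first data set has a perfect but non-linear relationship and still gives r close to 0, and the second shows a single outlier turning an r of 0 into an r close to 1.

```python
from statistics import mean, pstdev


def correlation(xs, ys):
    """Average of the products of the values in standard units."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)  # assumption: SD with divisor n
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])


# Non-linear association: y is completely determined by x (y = x squared),
# yet there is no linear trend, so r comes out as essentially 0.
xs_curve = [-3, -2, -1, 0, 1, 2, 3]
ys_curve = [x * x for x in xs_curve]
r_curve = correlation(xs_curve, ys_curve)

# Outlier: four points with no association, then one far-away point added.
xs_cloud = [1, 2, 1, 2]
ys_cloud = [1, 1, 2, 2]
r_without_outlier = correlation(xs_cloud, ys_cloud)
r_with_outlier = correlation(xs_cloud + [10], ys_cloud + [10])

print(r_curve, r_without_outlier, r_with_outlier)
```

Neither result is visible from r alone, which is why the scatter diagram is worth checking.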
- Ecological Correlations
- Ecological correlations are based on rates or averages. They are often used in political science and sociology, and they tend to overstate the strength of an association, so treat them with caution.
- Association Is Not Causation
- Correlation measures association. But association is not the same as causation. It may only show that both variables are simultaneously influenced by some third variable.
Ch 10. Regression
- Introduction
- Associated with each increase of one SD in x there is an increase of only r SDs in y, on the average.
- The regression line for y on x passes through the point ( average x, average y ) and has slope = r * SDy / SDx.
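The slope and the point of averages are enough to write down the regression line for y on x. The following is a minimal sketch, assuming SDs with divisor n; the names `regression_line` and `predict` and the data are hypothetical.

```python
from statistics import mean, pstdev


def regression_line(xs, ys):
    """Return (slope, intercept) of the regression line for y on x."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)  # assumption: SD with divisor n
    r = mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])
    slope = r * sy / sx
    # The line passes through the point of averages (mx, my).
    intercept = my - slope * mx
    return slope, intercept


def predict(x, slope, intercept):
    """Regression estimate of y at a given x."""
    return slope * x + intercept


# Illustrative data (hypothetical): averages 4 and 7, SDs 2 and 4, r = 0.4
xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]
slope, intercept = regression_line(xs, ys)

at_average = predict(4, slope, intercept)       # x at its average
one_sd_above = predict(4 + 2, slope, intercept)  # x one SD above its average
print(slope, intercept, at_average, one_sd_above)
```

With this data the slope is 0.4 * 4 / 2 = 0.8. Moving x up by one SD (from 4 to 6) raises the estimate from 7 to 8.6, an increase of 1.6, which is r times the SD of y.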
- The Graph of Averages
- The regression line is a smoothed version of the graph of averages. If the graph of averages follows a straight line, that line is the regression line.
- Regression lines should not be used when there is strong non-linear association between the variables.
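As a rough illustration of the graph of averages, the sketch below groups the y-values by x-value, averages each group, and compares those averages with the regression line. The data are constructed (hypothetically) so that the averages fall exactly on a straight line; `graph_of_averages` and `regression_line` are illustrative names, and SDs use divisor n.

```python
from collections import defaultdict
from statistics import mean, pstdev


def graph_of_averages(xs, ys):
    """Map each distinct x-value to the average of the y-values paired with it."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return {x: mean(group) for x, group in sorted(groups.items())}


def regression_line(xs, ys):
    """Return (slope, intercept) of the regression line for y on x."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)  # assumption: SD with divisor n
    r = mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])
    slope = r * sy / sx
    return slope, my - slope * mx


# Hypothetical data: the y-averages at x = 1, 2, 3 are 2, 4, 6 (a straight line)
xs = [1, 1, 2, 2, 3, 3]
ys = [1, 3, 3, 5, 5, 7]

averages = graph_of_averages(xs, ys)
slope, intercept = regression_line(xs, ys)
for x, avg_y in averages.items():
    # Compare the average of y in each vertical strip with the line's value.
    print(x, avg_y, slope * x + intercept)
```

Here the regression line agrees with every strip average up to rounding. With real data the averages usually wobble around the line, and the line acts as the smoothed version.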
- The Regression Method for Individuals
- The regression line can be used to make predictions for individuals. But if you have to extrapolate far from the data, or to a different group of subjects, be careful.
- The Regression Fallacy
- There Are Two Regression Lines
Ch 11. The R.M.S. Error for Regression
- Introduction
- Computing the R.M.S. Error
- Plotting the Residuals
- Looking at Vertical Strips
- Using the Normal Curve Inside a Vertical Strip
Ch 12. Regression Line
- Slope and Intercept
- The Method of Least Squares
- Does the Regression Make Sense?