
Part III. Correlation and Regression

Ch 8. Correlation

  1. The Scatter Diagram
    • The relationship between two variables can be represented by a scatter diagram.
    • positive association - above-average values of one variable tend to go with above-average values of the other.
    • strong association - knowing one variable helps a lot in predicting the other.
    • weak association - information about one variable does not help much in guessing the other.
    • relationship between two variables
      • independent
      • dependent
      • The independent variable is thought of as influencing the dependent variable. Seen from a different point of view, however, the roles can swap: the independent variable becomes the dependent one, and the dependent variable becomes the independent one.
  2. The Correlation Coefficient
    • How can a scatter diagram be summarized numerically?
      • The average of the x-values, the SD of the x-values
      • The average of the y-values, the SD of the y-values
      • The correlation coefficient r
    • Perfect correlation - the case where the correlation coefficient is 1 (or -1); all the points of the scatter diagram then lie on a single line.
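    • The correlation is always between -1 and 1.
      • What the number itself really means is not obvious.
      • A positive value means that the line in the scatter diagram has a positive slope.
      • A negative value means that the line in the scatter diagram has a negative slope.
    • If the SD of the x-values or of the y-values is 0, the correlation coefficient cannot be computed.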
  3. The SD Line
  4. Computing The Correlation Coefficient
    • Convert each variable to standard units.
    • The average of the products of the converted values is the correlation coefficient.
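The two steps above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function names `standard_units` and `correlation` are hypothetical, the data is a small made-up example, and the SD is taken with divisor n (`statistics.pstdev`), which is an assumption about the convention these notes follow.

```python
from statistics import fmean, pstdev

def standard_units(values):
    """Convert values to standard units: (value - average) / SD."""
    avg = fmean(values)
    sd = pstdev(values)  # SD with divisor n (assumed convention)
    if sd == 0:
        # With an SD of 0 the conversion divides by zero, so r is undefined.
        raise ValueError("SD is 0, so standard units are undefined")
    return [(v - avg) / sd for v in values]

def correlation(xs, ys):
    """Average of the products of x and y, each in standard units."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    zx = standard_units(xs)
    zy = standard_units(ys)
    return fmean([a * b for a, b in zip(zx, zy)])

# Made-up illustrative data.
xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]
r = correlation(xs, ys)
print(round(r, 2))  # 0.4
```

Here the x-values have average 4 and SD 2, the y-values have average 7 and SD 4, and the products in standard units sum to 2.0, so r = 2.0 / 5 = 0.4.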

Ch 9. More about Correlation

  1. Features of The Correlation Coefficient
    • The correlation coefficient is a pure number, without units. It is not affected by
      • interchanging the two variables,
      • adding the same number to all the values of one variable,
      • multiplying all the values of one variable by the same positive number.
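These three properties can be checked numerically. The sketch below uses a hypothetical helper `correlation` (average of products in standard units, SD with divisor n) and made-up data; the last line shows why the multiplier has to be positive.

```python
from statistics import fmean, pstdev

def correlation(xs, ys):
    """Average of the products of x and y in standard units."""
    ax, ay = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return fmean([((x - ax) / sx) * ((y - ay) / sy) for x, y in zip(xs, ys)])

# Made-up illustrative data.
xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]

r_original = correlation(xs, ys)
r_swapped = correlation(ys, xs)                    # interchange the variables
r_shifted = correlation([x + 10 for x in xs], ys)  # add the same number to x
r_scaled = correlation([3 * x for x in xs], ys)    # multiply x by a positive number
r_flipped = correlation([-x for x in xs], ys)      # negative multiplier: sign flips
```

The first four values agree up to floating-point rounding, while `r_flipped` has the same size as `r_original` with the opposite sign.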
  2. Changing SDs
    • If two scatter diagrams have the same correlation coefficient, the one with the smaller SDs looks more tightly clustered.
  3. Some Exceptional Cases
    • The correlation coefficient can be misleading in the presence of outliers or non-linear association. Whenever possible, look at the scatter diagram to check for the problems.
    • The correlation coefficient r measures linear association, not association in general.
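A small made-up example of non-linear association: below, y is completely determined by x (y = x squared), yet r comes out as 0 because the relationship is not linear. The helper `correlation` is a hypothetical name for the standard-units computation, using the SD with divisor n.

```python
from statistics import fmean, pstdev

def correlation(xs, ys):
    """Average of the products of x and y in standard units."""
    ax, ay = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return fmean([((x - ax) / sx) * ((y - ay) / sy) for x, y in zip(xs, ys)])

# y depends on x exactly, but along a curve rather than a line.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # [4, 1, 0, 1, 4]

r = correlation(xs, ys)   # 0 up to floating-point rounding
```

The positive products on the right half of the curve cancel the negative ones on the left half, which is why a scatter diagram shows what r alone hides.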
  4. Ecological Correlations
    • Ecological correlations are based on rates or averages. They are often used in political science and sociology, and they tend to overstate the strength of an association, so treat them with caution.
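The overstatement can be seen in a toy example. The data below is entirely made up: three hypothetical groups of two individuals each, where the group averages fall exactly on a line but the individuals are scattered around it. The helper `correlation` is a hypothetical name for the standard-units computation (SD with divisor n).

```python
from statistics import fmean, pstdev

def correlation(xs, ys):
    """Average of the products of x and y in standard units."""
    ax, ay = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return fmean([((x - ax) / sx) * ((y - ay) / sy) for x, y in zip(xs, ys)])

# Made-up groups of (x, y) pairs for individuals.
groups = [
    [(0, 2), (2, 0)],
    [(2, 4), (4, 2)],
    [(4, 6), (6, 4)],
]

# Correlation computed from all individuals.
ind_x = [x for g in groups for x, _ in g]
ind_y = [y for g in groups for _, y in g]
r_individuals = correlation(ind_x, ind_y)

# Ecological correlation: computed from the group averages only.
avg_x = [fmean([x for x, _ in g]) for g in groups]
avg_y = [fmean([y for _, y in g]) for g in groups]
r_groups = correlation(avg_x, avg_y)
```

In this example `r_individuals` is 10/22 (about 0.45) while `r_groups` is 1 up to rounding, because averaging removes the spread within each group.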
  5. Association Is Not Causation
    • Correlation measures association. But association is not the same as causation. It may only show that both variables are simultaneously influenced by some third variable.

Ch 10. Regression

  1. Introduction
    • Associated with each increase of one SD in x there is an increase of only r * SDs in y, on the average.
    • The regression line for y on x has slope = r * SDy / SDx and passes through the point (average x, average y).
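The slope and the point of averages are enough to write down the line. The sketch below is one way to do it under stated assumptions: `regression_line` and `predict` are hypothetical names, the data is made up, and the SD uses divisor n.

```python
from statistics import fmean, pstdev

def regression_line(xs, ys):
    """Return (slope, intercept) of the regression line for y on x."""
    ax, ay = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    r = fmean([((x - ax) / sx) * ((y - ay) / sy) for x, y in zip(xs, ys)])
    slope = r * sy / sx
    # The line passes through (average x, average y).
    intercept = ay - slope * ax
    return slope, intercept

def predict(x, slope, intercept):
    """Estimate the average y for a given x from the regression line."""
    return intercept + slope * x

# Made-up illustrative data: average x = 4, SD x = 2, average y = 7, SD y = 4.
xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]
slope, intercept = regression_line(xs, ys)

at_average = predict(4, slope, intercept)      # average y
one_sd_above = predict(6, slope, intercept)    # x is one SD above average
```

With r = 0.4 here, moving x up by one SD (from 4 to 6) raises the estimate by only r * SDy = 1.6, from 7 to about 8.6, rather than by a full SD of y.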
  2. The Graph of Averages
    • The regression line is a smoothed version of the graph of averages. If the graph of averages follows a straight line, that line is the regression line.
    • Regression lines should not be used when there is strong non-linear association between the variables.
  3. The Regression Method for Individuals
    • The regression line can be used to make predictions for individuals. But if you have to extrapolate far from the data, or to a different group of subjects, be careful.
  4. The Regression Fallacy
  5. There Are Two Regression Lines

Ch 11. The R.M.S. Error for Regression

  1. Introduction
  2. Computing the R.M.S. Error
  3. Plotting the Residuals
  4. Looking at Vertical Strips
  5. Using the Normal Curve Inside a Vertical Strip

Ch 12. Regression Line

  1. Slope and Intercept
  2. The Method of Least Squares
  3. Does the Regression Make Sense?

StatisticsStudy/Part3 (last edited 2014-04-08 13:24:40 by 61)
