BiologistsGuideToAnalysisOfDnaMicroarrayData/Chap4

Chap.4 Visualization by reduction of dimensionality

Contents

6000 genes of 15 patients -> 6000*15 matrix
- The dimensionality needs to be reduced (visualize -> analysis)
Methods that analyze the data by reducing its dimensionality
- PrincipalComponentAnalysis
- Correspondence Analysis
- Singular Value Decomposition
- Multidimensional Scaling
- Cluster Analysis

PrincipalComponentAnalysis

6000*15
- gene 1 : (patient 1, patient 2, patient 3, patient 4, patient 5, ... , patient 15)
- gene 2 : (patient 1, patient 2, patient 3, patient 4, patient 5, ... , patient 15)
- ...
- gene 6000 : (patient 1, patient 2, patient 3, patient 4, patient 5, ... , patient 15)
==> The genes can be represented as 6000 points in a 15-dimensional space. PCA finds the two axes that best describe these points.
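Here, Principal Component 1 (hereafter PC1) is defined as the axis that passes through the centroid of the points in the 15-dimensional space and shows the maximum variation of the points corresponding to the genes, i.e., the axis for which $$ \sum d^2 $$ is minimal, where d is the perpendicular distance from a point to the axis.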
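The equivalence between "maximum variation" and "minimal $$ \sum d^2 $$" can be sketched as follows. The symbols $$ x_i $$ (the centered point of gene i) and $$ u $$ (a unit vector along the candidate axis) are notation introduced here for illustration and do not appear in the original notes. By the Pythagorean theorem, for each gene:

$$ \|x_i\|^2 = (x_i \cdot u)^2 + d_i^2 $$

Summing over all 6000 genes gives:

$$ \sum_i d_i^2 = \sum_i \|x_i\|^2 - \sum_i (x_i \cdot u)^2 $$

The first term on the right does not depend on the choice of $$ u $$, so minimizing $$ \sum_i d_i^2 $$ is the same as maximizing $$ \sum_i (x_i \cdot u)^2 $$, which is proportional to the variance of the points along the axis.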
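PC2 is then chosen among the lines through the centroid that are orthogonal (perpendicular) to PC1, i.e., that lie in the hyperplane perpendicular to PC1: it is the axis that best describes the 6000 genes (maximal remaining variation).
These two axes, PC1 and PC2, are used as the x and y coordinates, and the 6000 genes are projected onto them.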
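The projection described above can be sketched with NumPy as follows. This is a minimal illustration, not the procedure used in the book: the data are random numbers standing in for a 6000*15 expression matrix, the function name `pca_2d` is hypothetical, and the principal axes are obtained from a singular value decomposition of the centered matrix, which is one common way to compute PCA.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X (genes) onto the first two principal axes.

    X has shape (n_genes, n_patients). Returns (scores, pc_axes, explained).
    """
    # Center the data so the axes pass through the centroid of the points.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are orthonormal axes in patient space, ordered by
    # decreasing variance of the points along them.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    pc_axes = Vt[:2]                      # PC1 and PC2, shape (2, n_patients)
    scores = Xc @ pc_axes.T               # gene coordinates, shape (n_genes, 2)
    explained = S**2 / np.sum(S**2)       # fraction of variance per component
    return scores, pc_axes, explained

# Synthetic stand-in data (assumption): 6000 genes x 15 patients.
rng = np.random.default_rng(0)
X = rng.normal(size=(6000, 15))

scores, pc_axes, explained = pca_2d(X)
x_coords, y_coords = scores[:, 0], scores[:, 1]   # PC1 as x, PC2 as y
```

Plotting `x_coords` against `y_coords` gives the two-dimensional view of the genes; `explained[:2]` indicates how much of the total variation that view retains.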
These coordinates reveal the overall trend in the data; from this point on, interpretation relies on biological knowledge.
- To find which of the 15 patient axes contribute most to each PC, the 15 axes are projected onto the plane spanned by the two PCs.
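Projecting the patient axes onto the PC plane can be sketched as below. The unit vector along each patient axis is projected onto PC1 and PC2, so a patient whose projection has a large absolute PC1 coordinate contributes strongly to PC1. The data are synthetic, the function name `patient_axis_projections` is hypothetical, and the first patient column is deliberately inflated so that the effect is visible.

```python
import numpy as np

def patient_axis_projections(X):
    """Project the unit vector of each patient axis onto the PC1/PC2 plane.

    X has shape (n_genes, n_patients). Returns an (n_patients, 2) array
    whose row j holds the (PC1, PC2) coordinates of patient axis j.
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pc_axes = Vt[:2]                         # shape (2, n_patients)
    unit_axes = np.eye(X.shape[1])           # one unit vector per patient axis
    return unit_axes @ pc_axes.T

# Synthetic stand-in data (assumption): 6000 genes x 15 patients.
rng = np.random.default_rng(1)
X = rng.normal(size=(6000, 15))
X[:, 0] *= 5.0   # make the first patient column dominate the variance

proj = patient_axis_projections(X)
# Index of the patient axis contributing most to PC1 (sign is arbitrary).
top_pc1 = int(np.argmax(np.abs(proj[:, 0])))
```

Because the sign of a principal axis is arbitrary, the comparison uses absolute values.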
Cluster analysis may reveal other aspects of the data.
Together with PCA, the t-test, ANOVA, and similar tests can be used to determine the cutoff size, i.e., the number of genes to pick for further analysis.
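As one possible illustration of picking genes with a t-test, the sketch below ranks genes by the absolute value of a per-gene Welch t-statistic between two patient groups and keeps the top `cutoff` genes. Everything specific here is an assumption made for the example: the split of the 15 patients into two groups, the synthetic data, the value of `cutoff`, and the helper name `welch_t`. The notes do not say how the cutoff is chosen.

```python
import numpy as np

def welch_t(a, b):
    """Per-gene Welch t-statistic; a and b have shape (n_genes, n_samples)."""
    mean_a, mean_b = a.mean(axis=1), b.mean(axis=1)
    var_a, var_b = a.var(axis=1, ddof=1), b.var(axis=1, ddof=1)
    return (mean_a - mean_b) / np.sqrt(var_a / a.shape[1] + var_b / b.shape[1])

# Synthetic stand-in data (assumption): 6000 genes x 15 patients.
rng = np.random.default_rng(2)
X = rng.normal(size=(6000, 15))

# Hypothetical grouping of the patients, e.g. 8 of one class and 7 of another.
group_a = np.arange(0, 8)
group_b = np.arange(8, 15)
X[:50, group_a] += 3.0        # shift 50 synthetic genes in group A

t = welch_t(X[:, group_a], X[:, group_b])

cutoff = 50                                   # hypothetical number of genes to keep
selected = np.argsort(-np.abs(t))[:cutoff]    # genes with the largest |t|
X_selected = X[selected]                      # reduced matrix for PCA or clustering
```

The reduced matrix `X_selected` can then be passed to PCA or cluster analysis in place of the full 6000-gene matrix.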