[HumanMolecularGenetics] '''CHAPTER 11 Genetic mapping of mendelian characters'''
----
== 11.1 Recombinants and nonrecombinants ==
 
 - In humans, the aim of genetic mapping is to find out how often two loci are separated by meiotic recombination.

 - father (-A1--B1-,homozygote) --- mother (-A2--B2-,homozygote)
                                 l
               -A1--B1-
               -A2--B2-(heterozygote) : '''nonrecombinant'''
               
               -A1--B2-
               -A2--B1-(heterozygote) : '''recombinant'''

 '''recombination fraction''' : the proportion of children who are recombinant between loci A and B

 === 11.1.1 The recombination fraction is a measure of genetic distance ===
  -  If two loci are on different chromosomes, they segregate independently.
     -A2--B2-
     -A1--B1- : during spermatogenesis, chance that a gamete with A1 receives allele B2 : 50% (recombinant)
                                         chance that a gamete with A1 receives allele B1 : 50% (nonrecombinant)
     -> '''recombination fraction : 0.5'''
 
 - If two loci are syntenic (on the same chromosome), they segregate together (ignoring meiotic cross-over) -> no recombination

 - Recombination rarely separates loci that lie close together (although it occasionally does).
    Closely spaced loci are transmitted as a set, like a block. Such a block of alleles is called a haplotype.

 - As long as it is not broken up by recombination, a haplotype can be used in mapping as if it were an allele at a single highly polymorphic locus.

 - recombination fraction = a measure of genetic distance ( between two loci )

 - '''Two loci that show 1% recombination are defined as being 1 cM apart'''
      
 === 11.1.2 Recombination fractions do not exceed 0.5 however great the physical distance ===

 - Figure 11.2

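 - If two loci are far apart, double recombination can occur, and the two crossovers may involve two, three, or four strands.

 - Averaged over these cases, the overall proportion of recombinants is 50%. So however far apart two loci are, and however readily crossovers form between them, the recombination fraction never exceeds 50%.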
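
 The 50% ceiling can be checked by brute-force enumeration. The sketch below is only an illustration: it assumes no chromatid interference (each crossover picks one paternal and one maternal chromatid at random), which these notes do not state explicitly, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def mean_recombinant_fraction(k):
    """Average fraction of recombinant strands in a four-strand bundle
    with exactly k crossovers between loci A and B.
    Strands 0,1 are paternal chromatids; strands 2,3 are maternal."""
    pairs = [(0, 2), (0, 3), (1, 2), (1, 3)]  # non-sister chromatid pairs
    total = Fraction(0)
    for config in product(pairs, repeat=k):
        # distal[s] = which original strand supplies the B end of strand s
        distal = [0, 1, 2, 3]
        for i, j in config:
            distal[i], distal[j] = distal[j], distal[i]  # crossover swaps distal parts
        # a strand is recombinant if its A end and B end come from different parents
        rec = sum(1 for s in range(4) if (s < 2) != (distal[s] < 2))
        total += Fraction(rec, 4)
    return total / len(pairs) ** k

for k in range(5):
    print(k, mean_recombinant_fraction(k))
```

 With zero crossovers no strand is recombinant; with one or more crossovers the average is one half, whatever the number of crossovers.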

 === 11.1.3 Mapping functions define the relationship between recombination fraction and genetic distance ===

 - --A-B-C-D-E-.........-M-...: adjacent loci are 5 cM apart, so A and M are 60 cM apart on the map, but the recombination fraction between A and M is not 60%.
      5 5 5 5.......

 - The relationship between recombination fraction and genetic map distance is described by a '''mapping function'''
   
   * If crossovers occur at random and do not influence one another, use '''Haldane's function'''
    
      w = -1/2 ln(1-2θ) or θ = 1/2{1-exp(-2w)}
     
       w = map distance (in Morgans), θ = recombination fraction
   
      -> but, '''crossovers do not occur at random.'''
 
 - Interference : the presence of one chiasma inhibits the formation of a second chiasma nearby

   * Because the degree of interference varies, '''Kosambi's function''' is widely used

     w = 1/4 ln{(1+2θ)/(1-2θ)} or θ = 1/2{exp(4w)-1}/{exp(4w)+1}
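
 A small sketch of the two mapping functions, applied to the A to M example (60 cM = 0.6 Morgans). The function names are made up for illustration; w is taken in Morgans, as the formulas require.

```python
import math

def haldane_theta(w):
    """Recombination fraction from map distance w (Morgans), no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * w))

def haldane_w(theta):
    """Map distance (Morgans) from recombination fraction, Haldane."""
    return -0.5 * math.log(1.0 - 2.0 * theta)

def kosambi_theta(w):
    """Recombination fraction from map distance w (Morgans), Kosambi."""
    e = math.exp(4.0 * w)
    return 0.5 * (e - 1.0) / (e + 1.0)

def kosambi_w(theta):
    """Map distance (Morgans) from recombination fraction, Kosambi."""
    return 0.25 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))

w_am = 0.60  # A to M: 12 intervals of 5 cM
print(round(haldane_theta(w_am), 3))  # about 0.349
print(round(kosambi_theta(w_am), 3))  # about 0.417
```

 Both functions keep θ below 0.5 however large w becomes, and for short distances both give θ close to w.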

 === 11.1.4 The relation between physical and genetic distances is not constant across the genome ===

 - In human male meiosis there are on average 49 crossovers (chiasmata) per cell. Since each crossover yields 50% recombinants, the chiasma count corresponds to a genetic map length of 2450 cM.
   (according to the database, total length = 2851 cM)

 - Crossing over is more frequent in females

   The total female map length is 4296 cM (excluding the X).
   Taking the autosomal genome as 3000 Mb,

       male average   : 3000 Mb / 2851 cM = 1.05 Mb/cM
       female average : 3000 Mb / 4296 cM = 0.70 Mb/cM
          so the sex average is (1.05 + 0.70)/2 = 0.88 Mb/cM --> roughly '''1 cM = 1 Mb''' is used as a rule of thumb
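
 The arithmetic above, written out as a short check. The variable names are illustrative; the numbers are the ones quoted in these notes.

```python
AUTOSOMAL_MB = 3000.0   # physical size used in the notes
MALE_CM = 2851.0        # male map length from the database
FEMALE_CM = 4296.0      # female map length (X excluded)

male_mb_per_cm = AUTOSOMAL_MB / MALE_CM
female_mb_per_cm = AUTOSOMAL_MB / FEMALE_CM
sex_average = (male_mb_per_cm + female_mb_per_cm) / 2

print(f"male   {male_mb_per_cm:.2f} Mb/cM")
print(f"female {female_mb_per_cm:.2f} Mb/cM")
print(f"mean   {sex_average:.2f} Mb/cM")

# chiasma-based estimate: 49 crossovers x 50 cM each
print(49 * 50, "cM")
```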

 - figure 11.3
   male : higher recombination towards the telomeres
   female : higher recombination towards the centromeres
 
 - The pseudoautosomal region at the tips of the short arms of X and Y deviates strongly from the average.
    -> In males a crossover occurs within this 2.6 Mb region, so the region corresponds to 50 cM. So 1 Mb = 19 cM (50 cM / 2.6 Mb).
       In females, 1 Mb = 2.7 cM (7 cM / 2.6 Mb).

 - Outside the pseudoautosomal region the Y chromosome does not cross over in meiosis, so it has no genetic map.
  
== 11.2 Genetic Markers ==
 === 11.2.1 Mapping human disease genes requires genetic markers ===

 - Calculating recombination fractions between diseases would be a good way to map them, but disease-disease mapping is not feasible in humans.
   (It requires double heterozygotes, and people heterozygous for two different diseases are extremely rare.)
   -> Human mapping therefore relies on markers (mendelian characters used as markers).
  
 - Markers must be sufficiently polymorphic, have good heterozygosity, and be spread across the whole genome.

 - With no recombinants, 10 meioses are enough for linkage analysis;
   with a recombination fraction of 0.3, 85 meioses are needed (Box 11.3).
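
 A rough way to see where numbers of this size come from. This sketch assumes phase-known, fully informative meioses and uses the expected lod per meiosis at the true θ; these assumptions and the function names are not from the notes, and the calculation behind Box 11.3 may differ in detail.

```python
import math

def expected_lod_per_meiosis(theta):
    """Expected lod contributed by one informative meiosis
    when the true recombination fraction is theta."""
    if theta == 0:
        return math.log10(2)
    return (math.log10(2)
            + theta * math.log10(theta)
            + (1 - theta) * math.log10(1 - theta))

def meioses_needed(theta, threshold=3.0):
    """Smallest number of meioses whose expected lod reaches the threshold."""
    return math.ceil(threshold / expected_lod_per_meiosis(theta))

print(meioses_needed(0.0))   # 10
print(meioses_needed(0.3))   # in the mid-80s
```

 Under these assumptions the θ = 0.3 case comes out at about 84 meioses, close to the 85 quoted above.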

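 - For rare diseases it is hard to find families with more than 30 meioses.

 - Markers used for mapping should be no more than 20 cM apart. Taking the genome as 3000 Mb (about 3000 cM at 1 cM = 1 Mb), at least 150 markers are needed.

 - One of the major achievements of the Human Genome Project was finding more than 10,000 highly polymorphic markers.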

 === 11.2.2 The heterozygosity or polymorphism information content measure how informative a marker is ===
 
 - Linkage analysis needs informative meioses
    Uninformative cases: the parent is homozygous for the marker, or both parents are heterozygous for the same marker alleles

 - For a marker with alleles A1, A2, A3... at frequencies p1, p2, p3..., the proportion of people who are heterozygous is 1-(p1^2 + p2^2 + p3^2...).

 - A more useful measure is the '''PIC (polymorphism information content)''', which also allows for couples in which both partners are {{{A1A2}}} heterozygotes
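
 A sketch of both measures. The heterozygosity formula is the one given above; the PIC formula used here (heterozygosity minus the sum over allele pairs of 2·pi²·pj²) is the standard textbook definition and is not written out in these notes, so treat it as an assumption. Function names are illustrative.

```python
from itertools import combinations

def heterozygosity(freqs):
    """Expected proportion of heterozygotes: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)

def pic(freqs):
    """Polymorphism information content:
    heterozygosity minus the share lost to matings between
    identical heterozygotes (sum over pairs of 2 * p_i^2 * p_j^2)."""
    lost = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return heterozygosity(freqs) - lost

two_alleles = [0.5, 0.5]
four_alleles = [0.25, 0.25, 0.25, 0.25]
print(heterozygosity(two_alleles), pic(two_alleles))
print(heterozygosity(four_alleles), pic(four_alleles))
```

 PIC is always a little lower than heterozygosity, and the gap shrinks as the number of alleles grows.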

 === 11.2.3 DNA polymorphisms are the basis of all current genetic markers ===

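 - The first DNA polymorphisms were reported in the early 1980s
 
 - An advantage of DNA markers is that they can all be typed by the same technique, and they can be placed on chromosomes physically (FISH or radiation hybrid mapping)

 - With protein markers, linkage to the paraoxonase polymorphism could be found without knowing where on the chromosomes it lay.

 - The development of DNA markers started human gene mapping in earnest

  * RFLPs
    limited informativeness; only two alleles are possible (presence or absence of the site)

  * Minisatellites
    not evenly spread across the genome
 
  * Microsatellites
    alleles differ in size and are distinguished by PCR

  * SNPs
    typed using solid arrays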

== 11.3 Two-point mapping ==

 === 11.3.1 Scoring recombinations in human pedigrees is not always simple ===

 - After collecting families:

   '''How do we know when we have found linkage?'''
      1.  what recombination fraction ?
      2.  what statistical test ?

 - figure 11.4

 === 11.3.2 Computerized lod score analysis is the best way to analyze complex pedigree for linkage between mendelian characters ===

 - In Figure 11.4B recombinants cannot be identified unambiguously, but the overall likelihood of the pedigree can still be calculated

 - if the loci are linked, recombination fraction = θ
         if not linked,                           = 0.5

 - The ratio of these two likelihoods gives the odds of linkage, and the logarithm of the odds is the '''lod score'''.

 - The lod score is the best statistic for evaluating pedigrees for linkage

 - Human linkage analysis relies on computer programs, which work from the pedigree data and the gene frequencies.

 === 11.3.3 Lod scores of +3 and -2 are the criteria for linkage and exclusion (for a single test) ===

 - Box 11.3

 - positive lod = evidence for linkage, negative lod = evidence against linkage.

 - Every lod score is 0 at recombination fraction (θ) = 0.5.

 - θ = recombination fraction, which is also the likelihood of a recombinant meiosis
   1-θ = likelihood of a nonrecombinant meiosis
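
 For the simple case where every meiosis can be scored as recombinant or nonrecombinant (phase known), the lod score follows directly from these likelihoods. A minimal sketch with a hypothetical function name:

```python
import math

def lod_score(theta, recombinants, nonrecombinants):
    """Z(theta) = log10[ theta^R * (1-theta)^NR / 0.5^(R+NR) ]
    for phase-known meioses; theta must lie in [0, 0.5]."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # a single recombinant rules out theta = 0
    log_linked = nonrecombinants * math.log10(1.0 - theta)
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    n = recombinants + nonrecombinants
    return log_linked - n * math.log10(0.5)

# 1 recombinant in 10 meioses, lod at a few values of theta
for theta in (0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(theta, round(lod_score(theta, 1, 9), 3))
```

 The lod curve peaks at θ = R/(R+NR) and is exactly 0 at θ = 0.5, as stated above. Real pedigrees with unknown phase need the full likelihood calculation done by linkage programs.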
 
 - figure 11.5

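 - The second question (which statistical test?) comes down to the threshold of significance

 - Z = 3.0 is the threshold for accepting linkage with a 5% chance of error. (p<0.05 is the conventional significance threshold)

 - Z = 3.0 corresponds to odds of 1000:1. Such a stringent threshold is chosen because two loci picked at random are inherently unlikely to be linked.

 - If something is inherently improbable, strong evidence is needed to show that it is true.
   This can be quantified with a Bayesian calculation. (1000:1 odds correspond to p=0.05)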
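
 The Bayesian step can be written out in a few lines. The prior used here (about 1 chance in 50 that two randomly chosen loci are linked) is the conventional figure and is an assumption of this sketch; it does not appear in these notes. The function name is illustrative.

```python
def posterior_prob_linked(lod, prior_linked=1 / 50):
    """Posterior probability of linkage given a lod score,
    combining the likelihood ratio 10**lod with a prior probability."""
    likelihood_ratio = 10.0 ** lod
    joint_linked = prior_linked * likelihood_ratio
    joint_unlinked = (1.0 - prior_linked) * 1.0
    return joint_linked / (joint_linked + joint_unlinked)

p = posterior_prob_linked(3.0)
print(round(p, 3), round(1 - p, 3))  # roughly 0.95 linked, 0.05 error
```

 With that prior, 1000:1 odds from the data leave roughly a 5% chance that the linkage is spurious, which is why Z = 3.0 matches the usual p = 0.05.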

 - For linkage between an X-linked character and an X-chromosome marker, a threshold lod of 2.3 is suggested

 - Distances on human genetic maps are very often imprecise estimates.

 - '''exclusion mapping''' : a negative lod score ( Z < -2 ) means the disease gene is not in that region

 === 11.3.4 For whole genome searches a genome-wide threshold of significance must be used ===

 - In a disease study, markers are typed in the families until a positive lod is obtained.

 - A false positive at p=0.05 can turn up anywhere in the genome.

 - Using 50 markers makes a false positive more likely than using one.
   -> A stringent procedure is to multiply the p value by 50 before testing significance.

 - When (n) markers are used, the threshold lod score becomes '''3 + log(n)''' (log to base 10).
    ex) with 10 markers, 3 + log(10) = 4; with 100 markers, 3 + log(100) = 5 
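
 The same correction as a small helper (hypothetical name), reading log as log to base 10:

```python
import math

def corrected_lod_threshold(n_markers, single_test_threshold=3.0):
    """Threshold lod after correcting for n independent marker tests."""
    if n_markers < 1:
        raise ValueError("need at least one marker")
    return single_test_threshold + math.log10(n_markers)

for n in (1, 10, 50, 100):
    print(n, round(corrected_lod_threshold(n), 2))
```

 Multiplying the p value by n and adding log(n) to the lod threshold are the same correction expressed on two different scales.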

 - In general a threshold of 3.3 is used for mendelian characters. In practice, any lod score below '''5''' is treated as provisional, regardless of the number of markers.

== 11.4 Multipoint mapping is more efficient than two-point mapping ==

 === 11.4.1 Multipoint linkage can locate a disease locus on a framework of markers ===

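 - Linkage analysis is more efficient when more than two loci are analyzed simultaneously. Multilocus analysis is particularly useful for establishing the chromosomal order of a set of linked loci.
   -> geneticists use the three-point cross

 - The rarest recombinant class is the one that requires a double recombination.

 - Multilocus mapping can also overcome problems caused by the limited informativeness of markers in humans.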

 - Table 11.1

 === 11.4.2 Multipoint mapping by computer ===

 - Used to work out the position of a disease gene

 - LINKMAP, GENEHUNTER : calculate the overall likelihood of the pedigree data at each candidate position

 - Figure 11.6

 === 11.4.3 Multipoint linkage is essential for constructing marker framework maps ===

 - Disease-marker mapping depends on finding families with the disease of interest, and such families seldom have an ideal structure.
   --> marker-marker mapping avoids this problem

 - Markers can be studied in any family, so families with an ideal structure for linkage and with many children can be chosen.

 - The first goal of the HGP was to build a high-density framework map of highly polymorphic markers.

 === 11.4.4 Integrated maps combine genetic and physical data ===

 - Working out the order of loci in multipoint mapping is a very hard problem.
  For ''n'' markers there are n!/2 possible orders, and each chromosome carries hundreds of markers.
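
 A quick look at how fast n!/2 grows (an order and its mirror image count as one, hence the division by 2). The function name is illustrative.

```python
import math

def possible_orders(n_markers):
    """Number of distinct linear orders of n markers,
    counting an order and its reverse as the same."""
    if n_markers < 2:
        return 1
    return math.factorial(n_markers) // 2

for n in (3, 5, 10, 20):
    print(n, possible_orders(n))
```

 Ten markers already give 1,814,400 orders, so evaluating every order for hundreds of markers is out of reach.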

 - Rather than brute force, smarter approaches that draw on physical mapping are used

 - The aim of mapping is an integrated map: loci listed in chromosomal order, with distances given on both the genetic and the physical scale and related to chromosomal bands.

== 11.5 Standard lod score analysis is not without problems ==

- Standard lod score analysis is useful for placing a disease gene within a segment of about 20 Mb, but it has its difficulties.

 === 11.5.1 Errors in genotyping and misdiagnoses can generate spurious recombinants ===

 - Common errors (misread gels, switched samples, nonpaternity) can produce a child whose genotype is inconsistent with the parents.
   -> misdiagnosis is a further source of error
   -> the added false recombinants inflate the length of the genetic map

 - Multilocus analysis helps with this problem
  ( because a false recombinant shows up as an apparent close double recombinant )
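
 One way to picture this check: along an ordered set of closely spaced markers, a single marker whose parental origin differs from both of its neighbours implies two crossovers very close together, which is far more likely to be a typing error. The sketch below is a toy illustration only; the string encoding ("P"/"M" for grandpaternal/grandmaternal origin at each marker) and the function name are assumptions, not part of any linkage program.

```python
def suspect_markers(origins):
    """Return indices of markers flanked on both sides by the opposite
    parental origin, i.e. apparent close double recombinants."""
    suspects = []
    for i in range(1, len(origins) - 1):
        left, here, right = origins[i - 1], origins[i], origins[i + 1]
        if left == right and here != left:
            suspects.append(i)
    return suspects

print(suspect_markers("PPPMPPP"))  # [3]  isolated switch: check the genotype
print(suspect_markers("PPPMMMM"))  # []   one ordinary crossover
```

 A flagged marker is a candidate for retyping, not proof of an error.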

 - figure 11.7

 === 11.5.2 Computational difficulties limit the pedigree that can be analyzed ===

 - human linkage analysis - by computer programs
 
 - LIPED, MLINK : use the Elston-Stewart algorithm
   -> can handle large pedigrees, but computing time increases exponentially with the number of possible haplotypes
   -> MLINK is not up to analyzing multipoint data

 - GENEHUNTER : uses the Lander-Green algorithm
  -> handles any number of loci, but computing time increases exponentially with pedigree size
  -> good for analyzing the whole genome in pedigrees of moderate size

 === 11.5.3 Locus heterogeneity is always a pitfall in human gene mapping ===

 - Mutations in several unlinked genes can produce the same clinical phenotype.

 - Even for a dominant condition with large families, mapping can be difficult if there is locus heterogeneity among the families.

 - GENEHUNTER or HOMOG can compare the likelihoods under locus homogeneity and under heterogeneity.

 === 11.5.4 The limited resolution of human genetic mapping may be overcome by typing single sperm or by using linkage disequilibrium ===

 - One possible way round the limited resolution of marker-marker mapping is to type sperm instead of children.
   Children are few, whereas single sperm from a doubly heterozygous man can be isolated and typed by PCR.

 - One drawback is that a sperm cannot be resampled to confirm an interesting result, whereas a child can.

 - Unfortunately, sperm typing cannot be used for disease-marker mapping unless the disease mutation has been characterized.

 - Linkage disequilibrium can narrow down the candidate region in disease-marker mapping.
  
 === 11.5.5 Autozygosity mapping can map recessive conditions efficiently in extended inbred families ===

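 - '''Autozygosity''' : homozygosity for markers that are identical by descent, inherited from a recent common ancestor

 - In a consanguineous family, a person with a rare recessive disease is expected to be autozygous for markers linked to the disease locus.

 - If a child is homozygous for a particular marker allele, this is either because of autozygosity or because a second copy of the same allele entered the family independently.

 - Even very small inbred families can generate significant lod scores.
  Autozygosity mapping is a good tool for linkage analysis if families can be found with many affected people in two or more related branches.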

 - figure 11.8

 === 11.5.6 Characters whose inheritance is not mendelian are not suitable for mapping by the methods described in this chapter ===

 - Lod score analysis requires a precise genetic model
       (mode of inheritance, gene frequencies, penetrance of each genotype..)
    For mendelian characters, penetrance is the main problem area. 

 - Errors in the order of the marker framework map cause problems; these can be resolved by cross-checking the genetic map against physical mapping data.

 - Given enough meioses, the main obstacle in linkage analysis of mendelian characters is locus heterogeneity.
   (Complex diseases are harder to deal with - chapter 12)

 - Any genetic model is only a hypothesis.

 - Because there is no real knowledge of the gene frequency, allele penetrance, or mode of inheritance, all of the methods mentioned above are applied.