A program that predicts protein-coding regions in EST sequences.
http://www.ch.embnet.org/software/ESTScan.html
The TrEst DB consists of UniGene sequences translated into protein with this program. For training data it uses the same tables as GenScan and is compatible with them.
The downloaded program ships with training data for six species:
- Arabidopsis thaliana
- Drosophila melanogaster
- Danio rerio
- Homo sapiens
- Mus musculus
- Rattus norvegicus
For any other species you have to build the training data yourself. That requires the following information (example for human):
- organism: Homo sapiens
- database files are: /db/refseq/hs-up.gbff /db/refseq/hs.gbff
- UniGene data is in: /db/unigene/Hs.data
- ESTs for testing: /db/dbest/est_hum-??.seq
- data directory: /export/scratch/ESTScan/Hs
- mRNA file is: /export/scratch/ESTScan/Hs/mrna.seq
- EST file is: /export/scratch/ESTScan/Hs/ests.seq
- ESTs with coding: /export/scratch/ESTScan/Hs/Evaluate/estcds.seq
- ESTs without coding: /export/scratch/ESTScan/Hs/Evaluate/estutr.seq
- training file is: /export/scratch/ESTScan/Hs/training.seq
- test file is: /export/scratch/ESTScan/Hs/test.seq
- clean UTR file is: /export/scratch/ESTScan/Hs/Evaluate/rnautr.seq
- clean CDS file is: /export/scratch/ESTScan/Hs/Evaluate/rnacds.seq
- HMM paramters file: /export/scratch/ESTScan/Hs/Matrices/6_00030_0000001_4242.smat
- tuple size: 6
- min redundancy mask: 30
- added pseudocounts: 1
- minimum score: -100
- start profile length/preroll: 4/2
- stop profile length/preroll: 4/2
- Isochores: 0-43, 43-47, 47-51, 51-100
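The isochore ranges in the listing read as GC-content classes in percent. As a rough illustration of what such a binning means, here is a minimal Python sketch that computes the GC percentage of a sequence and picks the matching range. The function names are hypothetical, and the boundary handling (lower bound inclusive, upper bound exclusive, 100% assigned to the last class) is an assumption, not something confirmed by ESTScan itself.

```python
# Isochore boundaries (percent GC) copied from the parameter listing.
ISOCHORES = [(0, 43), (43, 47), (47, 51), (51, 100)]


def gc_percent(seq: str) -> float:
    """Return the GC content of seq in percent, ignoring non-ACGT symbols."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc_count = sum(1 for b in bases if b in "GC")
    return 100.0 * gc_count / len(bases)


def isochore_index(seq: str) -> int:
    """Return the index of the isochore class whose range contains the GC%.

    Assumption: ranges are [low, high), except that exactly 100% falls
    into the last class.
    """
    gc = gc_percent(seq)
    last = len(ISOCHORES) - 1
    for i, (low, high) in enumerate(ISOCHORES):
        if low <= gc < high or (i == last and gc == high):
            return i
    raise ValueError(f"GC content {gc} is outside all isochore ranges")


if __name__ == "__main__":
    for s in ("ATATATAT", "ACGT", "GCGCGCGC"):
        print(s, gc_percent(s), ISOCHORES[isochore_index(s)])
```

A sequence with 50% GC, for example, would land in the 47-51 class under these assumptions.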
I need to run ESTScan on Bos taurus EST sequences. Where and how can I obtain the training data? AnswerMe. -- yong27, 2004-09-16
Be careful with the "-T" option:
-T <int*> 8 integers used as log-probabilities for transitions, start->5'UTR, start->CDS, start->3'UTR, 5'UTR->CDS, 5'UTR->end, CDS->3'UTR, CDS->end, 3'UTR->end (-10,-10,-5,-80,-40,-80,-40,-20)
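Because the eight integers are positional, it is easy to mix them up. The following Python sketch simply pairs each value with its transition name, in the order given in the help text, so a set of values can be checked before use. The helper name is hypothetical, and the sketch deliberately does not assume how the integers are written on the command line, since the help text does not say.

```python
# Transition names in the order listed in the -T help text.
TRANSITIONS = [
    "start->5'UTR",
    "start->CDS",
    "start->3'UTR",
    "5'UTR->CDS",
    "5'UTR->end",
    "CDS->3'UTR",
    "CDS->end",
    "3'UTR->end",
]

# Default values quoted in the help text, in the same order.
DEFAULTS = [-10, -10, -5, -80, -40, -80, -40, -20]


def label_transitions(values):
    """Pair each of the 8 integers with the transition it applies to."""
    if len(values) != len(TRANSITIONS):
        raise ValueError(
            f"expected {len(TRANSITIONS)} integers, got {len(values)}"
        )
    return dict(zip(TRANSITIONS, values))


if __name__ == "__main__":
    for name, value in label_transitions(DEFAULTS).items():
        print(f"{name:>14}: {value}")
```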