Various statistics about genes, computed from the Gene and Gene Ontology data.
Materials
- Gene: NCBI Gene FTP - Homo_sapiens_gene_info.gz (2012-06-28)
- GO: Gene Ontology site - GO.obo (2012-06-28)
- GOA: NCBI Gene FTP - gene2go (2012-06-28)
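As a rough illustration of how the gene file listed above might be read before loading it into a database, here is a minimal sketch. It assumes the tab-delimited gene_info layout in which GeneID, Symbol, chromosome and type_of_gene sit in columns 1, 2, 6 and 9 (0-based). The function name `read_gene_info` and the sample rows are hypothetical and exist only for this example; they are not the loader actually used for these statistics.

```python
import csv
import io

# Column positions (0-based), assumed from the '#Format:' header of gene_info files.
GENE_ID, SYMBOL, CHROMOSOME, TYPE_OF_GENE = 1, 2, 6, 9


def read_gene_info(handle):
    """Yield one dict per data row of a gene_info-style tab-delimited stream."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header line
        yield {
            "gene_id": int(row[GENE_ID]),
            "symbol": row[SYMBOL],
            "chromosome": row[CHROMOSOME],
            "type_of_gene": row[TYPE_OF_GENE],
        }


# Made-up rows for illustration only; these are not real NCBI records.
SAMPLE = (
    "#tax_id\tGeneID\tSymbol\tLocusTag\tSynonyms\tdbXrefs\tchromosome"
    "\tmap_location\tdescription\ttype_of_gene\n"
    "9606\t101\tGENE_A\t-\t-\t-\t1\t1p36\texample gene A\tprotein-coding\n"
    "9606\t102\tGENE_B\t-\t-\t-\tX|Y\t-\texample gene B\tpseudo\n"
)

genes = list(read_gene_info(io.StringIO(SAMPLE)))
for gene in genes:
    print(gene["gene_id"], gene["symbol"], gene["chromosome"], gene["type_of_gene"])
```

For the real file, the same function should work on a handle opened with `gzip.open("Homo_sapiens_gene_info.gz", "rt")`, provided the column layout matches the assumption above.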
Count
Gene: Gene.objects.count() -> 43,484
GO: GOTerm.objects.count() -> 35,847
GOA: GOAssociation.objects.count() -> 190,477
Gene category
- protein-coding: 20,258
- rRNA: 478
- snRNA: 96
- unknown: 2,954
- scRNA: 5
- pseudo: 12,671
- other: 817
- miscRNA: 5,219
- tRNA: 599
- snoRNA: 391
- ncRNA: 1
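The note does not show how the per-category counts were produced. One possible way to get such a tally is sketched below with `collections.Counter`. The `records` list is hypothetical toy data, and treating the category as a plain string per gene (such as the type_of_gene column) is an assumption, not something the source confirms.

```python
from collections import Counter

# Hypothetical (symbol, category) pairs; not real records.
records = [
    ("GENE_A", "protein-coding"),
    ("GENE_B", "pseudo"),
    ("GENE_C", "protein-coding"),
    ("GENE_D", "tRNA"),
]

# Tally how many genes fall into each category.
category_counts = Counter(kind for _, kind in records)

# Print in the same "- category: count" style as the list above.
for kind, n in category_counts.most_common():
    print(f"- {kind}: {n:,}")
```

With a Django model, the same tally could presumably be done in the database by grouping on the category field, but the field name would depend on how the Gene model is defined.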
Number of genes per chromosome
>>> from collections import defaultdict
>>> from operator import itemgetter
>>> # Count genes per value of the chromosome field
>>> chrs = defaultdict(int)
>>> for gene in Gene.objects.all():
...     chrs[gene.chromosome] += 1
...
>>> # Sort by chromosome label (string order, so u'10' comes before u'2')
>>> sorted(chrs.items(), key=itemgetter(0))
[(u'-', 458),
 (u'1', 4001),
 (u'10', 1606),
 (u'10|19|3', 1),
 (u'11', 2514),
 (u'12', 1960),
 (u'12|Un', 1),
 (u'13', 1119),
 (u'13|Un', 1),
 (u'14', 1704),
 (u'15', 1458),
 (u'16', 1546),
 (u'17', 2062),
 (u'17|Un', 1),
 (u'18', 686),
 (u'18|Un', 2),
 (u'19', 2267),
 (u'2', 2795),
 (u'20', 1024),
 (u'21', 543),
 (u'22', 1004),
 (u'2|Un', 1),
 (u'3', 2303),
 (u'3|11', 1),
 (u'3|Un', 5),
 (u'4', 1694),
 (u'5', 1900),
 (u'6', 2437),
 (u'7', 2248),
 (u'8', 1574),
 (u'9', 1756),
 (u'MT', 74),
 (u'Un', 205),
 (u'X', 2015),
 (u'X|Y', 33),
 (u'Y', 485)]
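Because the labels are strings, the listing above is in lexicographic order. If a karyotype-like order is preferred (1, 2, ..., 22, then the named labels), a custom sort key is one option. The sketch below uses a handful of (label, count) pairs copied from the listing. The helper `chromosome_key` is hypothetical, and ordering multi-chromosome labels such as `3|Un` by their first part is an arbitrary choice made for this example.

```python
# A few (label, count) pairs copied from the listing above.
counts = {
    "-": 458, "1": 4001, "10": 1606, "2": 2795, "3|Un": 5,
    "9": 1756, "MT": 74, "X": 2015, "X|Y": 33, "Y": 485,
}


def chromosome_key(label):
    """Sort numeric chromosomes by number, then all other labels alphabetically."""
    first = label.split("|")[0]  # order ambiguous labels by their first part
    if first.isdigit():
        return (0, int(first), label)
    return (1, 0, label)


ordered = sorted(counts.items(), key=lambda item: chromosome_key(item[0]))
for label, n in ordered:
    print(label, n)
```

The tuple key keeps the comparison well defined in Python 3, since integers and strings are never compared with each other directly.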