BioPythonTip for GenBank

== Objects available after parsing ==
=== When using RecordParser ===
{{{
from Bio import GenBank

handle = open("example.gb")  # placeholder path to a GenBank flat file
parser = GenBank.RecordParser()
iterator = GenBank.Iterator(handle, parser)

for cur_record in iterator:
    # any attribute listed below can be read here; locus is one example
    print(cur_record.locus)
}}}
The parser above works for both nucleotide and protein database records.
 * locus - The name specified after the LOCUS keyword in the GenBank record. This may be the accession number, a clone id, or something else.
 * size - The size of the record.
 * residue_type - The type of residues making up the sequence in this record. Normally something like RNA, DNA or PROTEIN, but may be as esoteric as 'ss-RNA circular'.
 * data_file_division - The division this record is stored under in GenBank (e.g. PLN -> plants; PRI -> humans, primates; BCT -> bacteria...)
 * date - The date of submission of the record, in a form like '28-JUL-1998'.
 * accession - A list of all accession numbers for the sequence.
 * nid - Nucleotide identifier number.
 * pid - Protein identifier number.
 * version - The accession number + version (e.g. AB01234.2).
 * db_source - Information about the database the record came from.
 * gi - The NCBI gi identifier for the record.
 * keywords - A list of keywords related to the record.
 * segment - If the record is one of a series, this is info about which segment this record is (something like '1 of 6').
 * source - The source of material where the sequence came from.
 * organism - The genus and species of the organism (e.g. 'Homo sapiens').
 * taxonomy - A listing of the taxonomic classification of the organism, starting general and getting more specific.
 * references - A list of Reference objects.
   * number - The number of the reference in the listing of references.
   * bases - The bases in the sequence the reference refers to.
   * authors - String with all of the authors.
   * title - The title of the reference.
   * journal - Information about the journal where the reference appeared.
   * medline_id - The medline id for the reference.
   * pubmed_id - The pubmed_id for the reference.
   * remark - Free-form remarks about the reference.
 * comment - Text with any kind of comment about the record.
 * features - A listing of Features making up the feature table.
   * key - The key name of the feature (e.g. source).
   * location - The string specifying the location of the feature.
   * qualifiers - A listing of Qualifier objects in the feature.
     * key - The key name of the qualifier (e.g. /organism=).
     * value - The value of the qualifier ("Dictyostelium discoideum").
 * base_counts - A string with the counts of bases for the sequence.
 * origin - A string specifying info about the origin of the sequence.
 * sequence - A string with the sequence itself.

=== When using FeatureParser ===
{{{
from Bio import GenBank

handle = open("example.gb")  # placeholder path to a GenBank flat file
parser = GenBank.FeatureParser()
iterator = GenBank.Iterator(handle, parser)

for cur_record in iterator:
    # any attribute listed below can be read here; features is one example
    print(cur_record.features)
}}}
This parser produces the most detailed parse. The cur_record object above follows the SeqFeature model and can also be used together with BioCorba.
 * features
   * location - The location of the feature on the sequence.
     * position - The position of the boundary.
     * extension - An optional argument which must be zero since we don't have an extension. The argument is provided so that the same number of arguments can be passed to all position types.
   * type - The specified type of the feature (e.g. CDS, exon, repeat...).
   * ref - A reference to another sequence. This could be an accession number for some different sequence.
   * ref_db - A different database for the reference accession number.
   * qualifier - A dictionary of qualifiers on the feature. These are analogous to the qualifiers from a GenBank feature table. The keys of the dictionary are qualifier names, the values are the qualifier values.
   * sub_features - Additional SeqFeatures which fall under this 'parent' feature. For instance, if we have something like: {{{CDS join(1..10,30..40,50..60)}}} then the top-level feature would be a CDS from 1 to 60, and the sub features would be of 'CDS_span' type and would be from 1 to 10, 30 to 40 and 50 to 60, respectively.
 * references
   * location - A list of Location objects specifying regions of the sequence that the references correspond to. If no locations are specified, the entire sequence is assumed.
   * authors - A single string, or a list split by author, of the authors for the reference.
   * title - The title of the reference.
   * journal - Journal the reference was published in.
   * medline_id - A medline reference for the article.
   * pubmed_id - A pubmed reference for the article.
   * comment - A place to put any comments about the reference.
 * name : Version
 * annotations
 * description : Definition
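
The sub_features entry above describes how a location such as join(1..10,30..40,50..60) yields one parent feature spanning 1 to 60 plus one sub feature per span. The following is a minimal standard-library sketch of that split, only to make the arithmetic concrete. It is not Biopython's implementation: split_join is a hypothetical helper name, and it assumes plain start..end spans with no complement() or fuzzy boundaries.

```python
import re


def split_join(location):
    """Return (parent_span, sub_spans) for a 'join(a..b,c..d)' location string.

    Hypothetical helper for illustration only. It understands plain
    'start..end' spans and nothing else.
    """
    # Collect every 'start..end' pair in the order it appears.
    spans = [(int(start), int(end))
             for start, end in re.findall(r"(\d+)\.\.(\d+)", location)]
    if not spans:
        raise ValueError("no start..end spans found in %r" % location)
    # The parent feature covers the smallest start to the largest end.
    parent = (min(start for start, _ in spans), max(end for _, end in spans))
    return parent, spans


parent, subs = split_join("join(1..10,30..40,50..60)")
print(parent)  # (1, 60)
print(subs)    # [(1, 10), (30, 40), (50, 60)]
```

In a record parsed with FeatureParser, the parent span corresponds to the top-level feature and each tuple in subs corresponds to one entry in its sub_features.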