[15] Grand challenges in bioinformatics (QnAboard)

BioinfoSarangNet's FreeBoard

Name: Yong (yong27@nownuri.net) (male)

Homepage: http://madang.ajou.ac.kr/~yong27

2000/9/29 (Fri) 23:37

Grand challenges in bioinformatics

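This article was originally posted on Bric.
Can every function of a living system be inferred from genome information, or not...
That is the question the article raises.
To me, it seems hard to do from sequence information alone.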

This is a translation of an article that Dr. Minoru Kanehisa of Japan, who runs KEGG, published in the journal Bioinformatics (Vol. 14 no. 4, 1998, p. 309).

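Predicting the three-dimensional structure of a protein from its amino acid sequence is one of the greatest challenges in computational molecular biology. Because a protein's three-dimensional structure is determined by thermodynamic stability, all the information needed to determine that structure can be considered to be contained in the sequence. In other words, given a suitable environment, a protein folds spontaneously; this is called Anfinsen's thermodynamic principle.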

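This principle has held up well for a number of selected proteins under in vitro experimental conditions. In vivo, however, protein folding is now known to be a far more complex and dynamic process that involves other molecules such as chaperones. The environment around a protein is likewise not a simple thermodynamic environment; it is understood as the sum of many kinds of interactions with other molecules. The protein folding problem in nature will therefore not be solved unless these specific molecular interactions are taken into account. The same issue already arose in protein secondary structure prediction: however good the algorithm, a prediction program is limited as long as it considers only short-range interactions. The same kind of limit will apply to programs that predict three-dimensional structure as long as they examine only a single molecule.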

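With the arrival of the whole-genome sequencing era, we face a new grand problem that may be called the organism reconstruction problem. That is, given a complete genome sequence, the challenge is to predict computationally the entire course of development from a single cell to a mature organism, along with its ongoing function as a living organism. Here, as with the protein folding problem, the traditional view is that the genome is the blueprint of an organism and contains all the information needed to build it. A clone of an organism can be made by replacing the original nucleus, which holds all of the genetic information, and this might be called Dolly's cloning principle.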

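If this assumption holds, we should eventually be able to predict the function of every gene from sequence information alone. Because the function of a gene is meaningful only in relation to its environment, the assumption implies that not only gene function but also that environment can be predicted from the sequence. Thus, for example, bioinformatic analysis of the genome sequence could predict all the molecular architectures and reaction pathways in a germ cell. We would then end up saying that the form and function of an organism are represented in the nucleus.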

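A more recently proposed alternative view holds that the genome is only a collection of parts that make up an organism, and that the real blueprint is the entire cell, organized as a network of molecular interactions. Whichever view one takes, the genome sequence alone cannot in practice be fully understood without additional information, such as the interactions between molecules and the timing and location of gene expression. In fact, hypothetical proteins still account for one-third to one-half of the genes in every genome sequenced so far, and obtaining clues to their function requires experiments such as disruption experiments to reveal gene-gene interactions and yeast two-hybrid experiments to reveal protein-protein interactions.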

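As sequence information has grown rapidly, bioinformatics has emerged as a major new discipline, and by developing new databases and computational technologies it plays a large role in uncovering the biological information encoded in sequences. In the coming era of systematic functional analysis, bioinformatics is expected to go beyond merely storing and processing information and to provide a complete catalogue of the interactions between molecules. With this richer level of information, the grand problems that bioinformatics faces today may one day be resolved, though not in the form in which they were originally posed.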

<Original text>

Grand challenges in bioinformatics
The protein folding problem has been one of the grand challenges in computational molecular biology. The problem is to predict the native three-dimensional structure of a protein from its amino acid sequence. It is widely believed that the amino acid sequence contains all the necessary information to make up the correct three-dimensional structure, since the protein folding is apparently thermodynamically determined; namely, given a proper environment, a protein would fold up spontaneously. This is called Anfinsen's thermodynamic principle.

While this principle is well established in selected proteins under in vitro experimental conditions, protein folding in vivo is a more complex and dynamic process involving a number of other molecules such as chaperones. The environment has to be considered as a collection of various interactions with molecules rather than a smooth thermodynamic environment. It is not unreasonable to expect that the protein folding problem cannot be solved for the majority of proteins in nature without considering specific molecular interactions. This is reminiscent of the problem of secondary structure prediction in proteins. However good the algorithms developed for secondary structure prediction are, the success rate will be limited as long as only the short-range interactions are considered. Similarly, however good the algorithms developed for the three-dimensional structure prediction are, the success rate will be limited as long as only the information of a single molecule is examined.

In the era of whole-genome sequencing, we are faced with another grand challenge problem, which may be called the organism reconstruction problem. Given a complete genome sequence, the problem is to predict computationally the development of the adult from a single cell and its continual function as a biological organism. Here again, a traditional view is that the genome is a blueprint of life containing all the necessary information that would make up an organism. A clone can be made by replacing the nucleus, which is the localized area containing all genetic information. Thus, this might be called Dolly's cloning principle.

According to this genetic determinism principle, we should eventually be able to predict the function of every gene in the genome by its sequence information alone. Implicitly, this assumes that the environment of each gene is also computable from the complete genome sequence because the function of a molecule can only become meaningful in relation to its environment. Therefore, the entire molecular architectures and molecular reaction pathways in a germ cell, for example, may be computable from the genomic sequence. We thus end up asserting that the form and function of an organism are represented in the nucleus.

In an alternative view, the genome is simply a warehouse of parts, or building blocks of life, and a real blueprint of life is written in the entire cell, perhaps as a network of molecular interactions. Whichever view one takes, it is impossible in practice to make sense fully out of the sequence data without additional information, including time and localization of expression and, especially, the information on molecular interactions. In fact, in order to obtain any functional clue of hypothetical proteins that still form one-third to one-half of the genes in every genome that has been sequenced, new systematic experiments are being designed to observe, for example, gene-gene interactions by disruption experiments and protein-protein interactions by yeast two-hybrid system experiments.

Bioinformatics has emerged as a major discipline due to the rapid increase in sequence information, developing new databases and computational technologies that help us to understand the biological meaning encoded in the sequence data. In a post-genomic era of systematic functional analysis, the basis of bioinformatics is not only the complete catalogue of building blocks, but also the complete catalogue of their interactions. With this new level of information, the grand challenge problems in bioinformatics, both old and new, and both structural and functional, may one day be elucidated, although not in the manner in which they were originally formulated.
