BioinformaticsTheMachineLearningApproach Chap 6.

History of application of NeuralNetwork to BiologicalSequenceAnalysis.

6.1 Sequence Encoding and Output Interpretation

Importance of Input representation

During training, an MLP tries to segregate the input space into decision regions using hyperplanes.

a window of size W :

|A| possible monomers for each position
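A minimal sketch of orthogonal (one-hot) encoding for such a window, so that the network sees W * |A| binary inputs. The function name, the 20-letter amino acid alphabet, and the all-zeros treatment of positions outside the sequence are assumptions made for illustration, not details taken from the chapter.

```python
# Assumed alphabet: the 20 standard amino acids (|A| = 20).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def encode_window(sequence, center, window_size, alphabet=ALPHABET):
    """Orthogonal (one-hot) encoding of a window centered on `center`.

    `window_size` is expected to be odd. Positions that fall outside
    the sequence, or symbols not in the alphabet, become all zeros.
    """
    half = window_size // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        unit = [0] * len(alphabet)
        if 0 <= pos < len(sequence):
            idx = alphabet.find(sequence[pos])
            if idx >= 0:
                unit[idx] = 1
        vector.extend(unit)
    return vector

x = encode_window("MKTAYIAK", center=3, window_size=5)
print(len(x))  # 100 inputs: W * |A| = 5 * 20
print(sum(x))  # 5: one active bit per window position
```

Each monomer gets its own input unit, so no artificial ordering or distance between monomers is imposed on the input space, at the cost of a much larger input layer.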

trade-offs between different encoding schemes

real-numbered quantification --> harmful impact on the input space

decreasing nonlinearity of a prediction problem

In some applications, the whole sequence or large segments of it are cast into a global preprocessed measure that can be used as input information.

Output interpretation or postprocessing

6.2 Sequence Correlations and Neural Networks

ProteinStructure can be highly conserved despite a very low sequence similarity

An NN can sense this cooperativity through its ability to correlate the different input values to each other. NNs can complement what one can obtain by weight matrices and by HMMs.

6.3 Prediction of Protein Secondary Structure

The assignment of the secondary structure categories to the experimentally determined 3D structure

6.3.1 Secondary Structure Prediction Using MLPs

Qian, Sejnowski -- fully connected MLP with a single hidden layer.
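A rough sketch of the forward pass of a fully connected MLP with one hidden layer and three output units (helix, sheet, coil). The layer sizes, the random untrained weights, and all function names are assumptions chosen for illustration; this is not the published network.

```python
import math
import random

CLASSES = ("helix", "sheet", "coil")

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_layer(n_in, n_out, rng):
    """Random weights plus one trailing bias per unit (untrained)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
            for _ in range(n_out)]

def forward(layer, inputs):
    """Fully connected layer: every unit sees every input plus its bias."""
    return [sigmoid(sum(w * v for w, v in zip(unit[:-1], inputs)) + unit[-1])
            for unit in layer]

def predict(hidden, output, inputs):
    """Winner-takes-all interpretation of the three output activations."""
    scores = forward(output, forward(hidden, inputs))
    best = max(range(len(CLASSES)), key=lambda k: scores[k])
    return CLASSES[best], scores

rng = random.Random(0)
n_inputs = 13 * 21                      # assumed: window of 13, 20 residues + spacer
hidden = make_layer(n_inputs, 40, rng)  # assumed hidden layer size
output = make_layer(40, len(CLASSES), rng)

x = [0] * n_inputs
for position in range(13):
    x[position * 21 + rng.randrange(21)] = 1  # random one-hot window

label, scores = predict(hidden, output, x)
print(label, [round(s, 3) for s in scores])
```

With untrained weights the predicted label is meaningless; the point is only the data flow from an encoded window to a three-way class decision for the central residue.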

6.3.2 Prediction Based on Evolutionary Information and Amino Acid Composition

NN + Chou-Fasman rule

bayesian

NN + bayesian method.

PHD prediction server (Rost and Sander) : 1996 Asilomar competition CASP2 --> 65-68% accuracy

6.3.3 Network Ensembles and Adaptive Encoding

Riis and Krogh address the overfitting problem by careful design of the NN architecture

result

6.3.4 Secondary Structure Prediction Based on Profiles Made by Position-Specific Scoring Matrices

The profile quality depends on the alignment approach used to select the sequences behind the profile. Here the position-specific scoring matrices replace the HSSP profiles used in the PHD method.

6.3.5 Prediction by Averaging over 800 Different Networks

Output Expansion

6.4 Prediction of Signal Peptides and Their Cleavage Sites

The identification problem is to some extent organism-specific. NN-based prediction is successful when Gram(+) bacteria, Gram(-) bacteria, and eukaryotes are treated separately.

6.4.1 SignalP

6.5 Application for DNA and RNA Nucleotide Sequences

6.5.1 The Structure and Origin of the Genetic Code

The codon assignments are correlated to the physical properties of the amino acids in a systematic and error-correcting manner.

The NN approach to the genetic code is unbiased and completely data-driven --> no a priori relationship is assumed

backpropagation to train the feed-forward architecture --> to get a low analog network error E, but not necessarily a low classification error Ec.
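A small sketch of why a low analog error E does not guarantee a low classification error Ec. The two sets of network outputs are made-up numbers, and the 0.5 decision threshold and function names are assumptions.

```python
def analog_error(outputs, targets):
    """E: sum of squared differences between real-valued outputs and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))

def classification_error(outputs, targets, threshold=0.5):
    """Ec: number of examples on the wrong side of the decision threshold."""
    return sum((o >= threshold) != (t >= threshold)
               for o, t in zip(outputs, targets))

targets = [1, 1, 0, 0]
net_a = [0.45, 0.45, 0.05, 0.05]  # hypothetical outputs: lower E, two misclassified
net_b = [0.55, 0.55, 0.45, 0.45]  # hypothetical outputs: higher E, none misclassified

for name, outputs in (("net_a", net_a), ("net_b", net_b)):
    print(name,
          round(analog_error(outputs, targets), 3),
          classification_error(outputs, targets))
```

Here net_a has the smaller squared error yet misclassifies two of the four examples, while net_b has a larger squared error and classifies all four correctly.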

internal representation of 2 hidden units --> 3 groups dividing the GES scale of transfer free energy into 3 intervals (except arginine)

genetic code is inherently nonlinear.

The weight of the trained network

6.5.2 Eukaryotic Gene Finding and Intron Splice Site Prediction

6.5.3 Examples of Gene Structure Prediction by Sensor Integration

The use of a combination of sensors for detection of various signals related to a complex object has a long history in the theory of pattern recognition.

GRAIL

GeneParser

6.5.4 Prediction of Intron Splice Sites by Combining Local and Global Sequence Information

1991 NetGene : 3 networks

Arabidopsis thaliana NetPlantGene

6.5.5 Doing Sequence Analysis by Inspecting the Order in Which Neural Networks Learn

6.6 Prediction Performance Evaluation

It is often relevant to measure accuracy of prediction at different levels.

At higher levels, the measures tend to be more complicated and problem-specific.

How do we assess the accuracy of M or how do we compare M to D?

6.7 Different Performance Measures

6.7.1 Percentages

6.7.2 Hamming Distance

6.7.3 Quadratic "Distance"

6.7.4 Lp Distances

6.7.5 Correlation

6.7.6 Approximate Correlation

6.7.7 Relative Entropy

6.7.8 Mutual Information

6.7.9 Sensitivity and Specificity

6.7.10 Summary
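A minimal sketch of a few of the measures listed above for the binary case, comparing a prediction M to the data D. The function names and the toy vectors are assumptions. Note that definitions of specificity vary: the sketch computes TP/(TP+FP), which gene-finding papers often report as specificity, under the name ppv.

```python
import math

def confusion_counts(D, M):
    """Return (TP, TN, FP, FN) for binary data D and prediction M."""
    tp = sum(1 for d, m in zip(D, M) if d == 1 and m == 1)
    tn = sum(1 for d, m in zip(D, M) if d == 0 and m == 0)
    fp = sum(1 for d, m in zip(D, M) if d == 0 and m == 1)
    fn = sum(1 for d, m in zip(D, M) if d == 1 and m == 0)
    return tp, tn, fp, fn

def hamming_distance(D, M):
    """Number of positions where prediction and data disagree."""
    return sum(d != m for d, m in zip(D, M))

def percent_correct(D, M):
    return 100.0 * (len(D) - hamming_distance(D, M)) / len(D)

def correlation(D, M):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    tp, tn, fp, fn = confusion_counts(D, M)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def sensitivity(D, M):
    tp, _, _, fn = confusion_counts(D, M)
    return tp / (tp + fn) if tp + fn else 0.0

def ppv(D, M):
    """TP / (TP + FP); often called specificity in gene-finding work."""
    tp, _, fp, _ = confusion_counts(D, M)
    return tp / (tp + fp) if tp + fp else 0.0

D = [1, 1, 1, 0, 0, 0, 0, 1]  # toy data
M = [1, 1, 0, 0, 0, 1, 0, 1]  # toy prediction

print(hamming_distance(D, M))  # 2
print(percent_correct(D, M))   # 75.0
print(correlation(D, M))       # 0.5
print(sensitivity(D, M))       # 0.75
print(ppv(D, M))               # 0.75
```

The percentage and the Hamming distance carry the same information, while the correlation coefficient also accounts for how the errors split between false positives and false negatives.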

BioinfoMla/NeuralNetworkApplication (last edited 2011-08-03 11:01:01 by localhost)