BioinformaticsTheMachineLearningApproach Chap 4.
MachineLearning Algorithms
Introduction
Once a parameterized model M(w) for the data has been constructed, the next steps are:
- The estimation of the complete distribution P(w,D) and the posterior P(w|D)
- The estimation of the optimal set of parameters w by maximizing P(w|D)
- The estimation of marginals and expectations with respect to the posterior
Dynamic Programming
DynamicProgramming is a general optimization technique: using recursion, the problem is subdivided into two similar subproblems of smaller size.
- Finding the shortest path between two nodes in a graph.
Used widely in BiologicalSequenceAnalysis. Since a table with N^2 entries must be built and each entry visited, the ComputationalComplexity is O(N^2).
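The book gives no code here; as an illustration of the O(N^2) table-filling idea, a minimal sketch computing the Levenshtein edit distance between two sequences (the function name and example strings are my own choices):

```python
# Dynamic-programming sketch: edit distance between two sequences.
# Filling the (n+1) x (m+1) table visits every cell once -> O(n*m) time.

def edit_distance(a, b):
    n, m = len(a), len(b)
    # d[i][j] = minimal number of edits turning a[:i] into b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # delete all of a[:i]
    for j in range(m + 1):
        d[0][j] = j                      # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # match / substitution
    return d[n][m]

dist = edit_distance("kitten", "sitting")   # classic example: 3 edits
```

The same recurrence pattern, with a scoring matrix and gap penalties instead of unit costs, underlies sequence-alignment algorithms such as Needleman-Wunsch.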
Gradient Descent
GradientDescent: a numerical method for finding a minimum; an iterative procedure.
The learning method for NeuralNetworks (the BackpropagationAlgorithm).
Rather than finding the GlobalOptima, this method often gets stuck in LocalOptima. For this reason various modified GradientDescent methods exist; ConjugateGradientDescent is one of them.
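Not from the book, but the iterative update is easy to show on a toy problem. A minimal sketch on the 1-D convex function f(w) = (w - 3)^2, whose gradient is 2(w - 3); the learning rate and step count are illustrative choices:

```python
# Gradient-descent sketch: repeatedly step against the gradient.

def gradient_descent(grad, w0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)   # move downhill along the negative gradient
    return w

# f(w) = (w - 3)^2  ->  grad f(w) = 2 * (w - 3); minimum at w = 3
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```

On this convex function the iteration contracts toward w = 3; on the non-convex error surfaces of NeuralNetworks the same update can instead settle in a LocalOptimum, which is the failure mode noted above.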
Random-Direction Descent
A descent procedure that can be used in place of SteepestDescent when escaping LocalMinima matters and the gradient is not easy to compute.
Also used in ExpectationMaximization and various evolutionary algorithms.
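As a sketch of the gradient-free idea (my own minimal version, with illustrative constants): propose a random direction, keep the step only if it lowers f, shown on the 2-D quadratic f(x, y) = x^2 + y^2:

```python
# Random-direction descent sketch: no gradient information needed.

import math, random

def random_direction_descent(f, x, step=0.5, iters=2000, seed=0):
    rng = random.Random(seed)
    x = list(x)
    fx = f(x)
    for _ in range(iters):
        theta = rng.uniform(0, 2 * math.pi)       # random direction in 2-D
        cand = [x[0] + step * math.cos(theta),
                x[1] + step * math.sin(theta)]
        fc = f(cand)
        if fc < fx:                               # accept only downhill moves
            x, fx = cand, fc
        else:
            step *= 0.999                         # shrink step on rejection
    return x, fx

best, fbest = random_direction_descent(lambda p: p[0] ** 2 + p[1] ** 2,
                                       [3.0, 4.0])
```

Only function evaluations are required, which is why the idea carries over to settings such as evolutionary algorithms where gradients are unavailable.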
EM/GEM Algorithm
ExpectationMaximization (EM) 및 generalized ExpectationMaximization algorithm
Useful for models and situations with hidden values (hidden units in a NeuralNetwork, hidden states in a HiddenMarkovModel).
- Divided into an E step, where the hidden values are computed from the given parameters, and an M step, where the parameters are updated again from the hidden values.
If we define the energy of a hidden configuration H to be f(H), then the BoltzmannGibbsDistribution is given by the posterior.
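The E/M alternation above can be sketched on a standard textbook problem (not the book's example): fitting the means of a two-component 1-D Gaussian mixture with fixed unit variances and equal weights, where each point's component label is the hidden value. Data and initial means are illustrative:

```python
# EM sketch for a two-component 1-D Gaussian mixture (unit variances).

import math

def em_two_gaussians(data, mu=(-1.0, 1.0), iters=50):
    mu1, mu2 = mu
    for _ in range(iters):
        # E step: posterior responsibility of component 1 for each point
        r = []
        for x in data:
            p1 = math.exp(-0.5 * (x - mu1) ** 2)
            p2 = math.exp(-0.5 * (x - mu2) ** 2)
            r.append(p1 / (p1 + p2))
        # M step: re-estimate each mean as a responsibility-weighted average
        mu1 = sum(ri * x for ri, x in zip(r, data)) / sum(r)
        mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / sum(1 - ri for ri in r)
    return mu1, mu2

data = [0.1, -0.2, 0.3, 4.9, 5.2, 5.0]        # two well-separated clusters
m1, m2 = em_two_gaussians(data, mu=(0.0, 4.0))
```

Each iteration is guaranteed not to decrease the data likelihood, which is the key property of EM; a GEM algorithm merely requires the M step to improve (not maximize) the expected complete-data likelihood.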
Markov-Chain Monte-Carlo Methods
The goal of the general Bayesian framework is to compute expectations over a high-dimensional probability distribution P(x1,...,xn). The MarkovChainMonteCarlo (MCMC) approach to this:
First, approximate the expectation by a MonteCarloMethod: E(f) ~ (1/T) * sum_{t=1}^{T} f(x1^t,...,xn^t)
In order to sample from P, construct a MarkovChain having P as its equilibrium distribution, then simulate the chain and try to sample from its equilibrium distribution.
A MarkovChain is entirely defined by the initial distribution P(S0) and the TransitionProbability Pt = P(S(t+1)|S(t)).
In order to achieve our goal of sampling from P(x1,...,xn), we now turn to the two main MCMC algorithms: GibbsSampling and the MetropolisAlgorithm.
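A hedged sketch of the MetropolisAlgorithm (my own minimal version, not the book's code): sampling from the unnormalized 1-D target exp(-f(x)) with f(x) = x^2/2, i.e. a standard Gaussian; the proposal width and chain length are illustrative:

```python
# Metropolis sketch: symmetric proposal + accept/reject against exp(-f).

import math, random

def metropolis(f, x0=0.0, width=1.0, n=20000, seed=0):
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n):
        cand = x + rng.uniform(-width, width)   # symmetric random proposal
        # accept with probability min(1, exp(f(x) - f(cand)))
        if rng.random() < math.exp(min(0.0, f(x) - f(cand))):
            x = cand
        samples.append(x)                       # rejected moves repeat x
    return samples

samples = metropolis(lambda x: 0.5 * x * x)     # target: standard Gaussian
mean = sum(samples) / len(samples)
```

Because only the ratio of target probabilities enters the acceptance test, the normalization constant Z never needs to be computed, which is what makes the method practical for high-dimensional posteriors.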
Simulated Annealing
SimulatedAnnealing: a general-purpose optimization algorithm inspired by StatisticalMechanics, used in combination with MCMC ideas such as the MetropolisAlgorithm.
The probability of being in state s at temperature T is given by the BoltzmannGibbsDistribution:
P(s) = P(x1,...,xn) = e^(-f(s)/kT) / Z
- Start with a high temperature and progressively lower it according to some annealing schedule.
logarithmic annealing schedule, Tt = T / log t --> very slow, impractical
geometric annealing schedule, Tt = u * T(t-1), with 0 < u < 1
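Combining the pieces above (Metropolis acceptance rule plus a geometric schedule), a hedged sketch on a 1-D function with several local minima; the test function, temperatures, and step counts are all illustrative choices, not from the book:

```python
# Simulated-annealing sketch: Metropolis moves under a decreasing
# temperature, geometric schedule T_t = u * T_{t-1}.

import math, random

def simulated_annealing(f, x0, T=5.0, u=0.995, steps=3000, seed=1):
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    for _ in range(steps):
        cand = x + rng.gauss(0.0, 1.0)          # random perturbation
        fc = f(cand)
        # always accept downhill; accept uphill with probability e^(-df/T)
        if fc < fx or rng.random() < math.exp((fx - fc) / T):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = x, fx             # track best state visited
        T *= u                                  # geometric annealing schedule
    return best, fbest

# f has several local minima; at high T the chain can climb out of them
best, fbest = simulated_annealing(lambda x: x * x + 10 * math.sin(3 * x),
                                  x0=2.0)
```

At high temperature the chain behaves like a random walk and can escape local minima; as T -> 0 it degenerates into pure descent and freezes into a (hopefully near-global) minimum.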
Evolutionary and Genetic Algorithm
A broad class of optimization algorithms that simulate evolution.
- Generation of random perturbations, or mutations, and the presence of a fitness function that is used to assess the quality of a given point and filter out mutations that are not useful.
GeneticAlgorithms are a subclass of EvolutionaryAlgorithms.
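A toy GeneticAlgorithm sketch showing mutation, a fitness function, and selection on the standard "one-max" problem (evolve bit strings toward all ones); population size, mutation rate, and selection scheme are illustrative choices of mine:

```python
# Genetic-algorithm sketch: selection + one-point crossover + mutation.

import random

def genetic_algorithm(n_bits=20, pop_size=30, generations=100,
                      p_mut=0.05, seed=0):
    rng = random.Random(seed)
    fitness = lambda ind: sum(ind)              # "one-max": count of 1-bits
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection
        children = [parents[0][:]]              # elitism: keep the best as-is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)      # one-point crossover
            child = a[:cut] + b[cut:]
            # mutation: flip each bit independently with probability p_mut
            child = [bit ^ 1 if rng.random() < p_mut else bit
                     for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

best = genetic_algorithm()
```

The fitness function filters out unhelpful mutations, exactly the role described above; crossover is the ingredient that distinguishes GeneticAlgorithms within the broader evolutionary family.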
Learning Algorithm: Miscellaneous Aspects
Control of Model Complexity
Online/Batch Learning
Training/Test/Validation
Early Stopping
Ensembles