Notation

The transition probability from state $x \in \mathcal{X}$ to state $y \in \mathcal{X}$, given that a state jump occurs, is denoted $\nu(x, y)$, so that $\sum_{y : y \neq x} \nu(x, y) = 1$ and $\nu(x, x) = 0$.

Only point-wise evaluation of $\lambda(\cdot)$ and $\nu(\cdot, \cdot)$, and simulation from $\nu(x, \cdot)$ for all $x \in \mathcal{X}$, are needed.

We first assume that $\nu$ and $\lambda$ are fixed; their estimation is addressed later.

For sampled CTMC paths, the following notation is essential:

The visited states are denoted $X_1, X_2, \dots$ and form the jump chain $X_i \to X_{i+1}$. The corresponding holding times are denoted $H_1, H_2, \dots$. The model is characterized by the distributions $X_{i+1} \mid X_i \sim \nu(X_i, \cdot)$ and $H_i \mid X_i \sim F(\lambda(X_i))$, where $F(\lambda)$ denotes the CDF of the exponential distribution with rate $\lambda$. $P_x$ denotes the probability distribution induced by the model given that $X_1 = x$. $N$ is the number of states visited, counting multiplicities, in the time interval $(0, T)$:

$$(N = n) = \Big( \sum_{i=1}^{n-1} H_i \le T < \sum_{i=1}^{n} H_i \Big)$$

Proposal distribution

The method's proposal distribution builds on the idea of simulating a sequence of jumps from $\nu$ until $y$ is reached. This idea needs modification for two reasons. First, because the number of states is countably infinite, the probability that the simulated jumps ever reach the target state can be very small, and even with a finite state space, reaching the target could take a large number of steps. Second, the idea assigns zero probability to paths that visit state $y$ more than once.
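The generative model above, and the reason plain forward simulation makes a poor proposal, can be illustrated with a minimal sketch. It assumes a hypothetical toy chain on the non-negative integers (`toy_rate` and `toy_jump_probs` are invented for illustration and are not part of the original model); `simulate_path` draws the jump chain and holding times until the time horizon is passed, and `naive_transition_estimate` simply counts how often the path ends at the target.

```python
import random

def simulate_path(x0, T, rate, jump_probs, rng):
    """Simulate one CTMC path on the time interval (0, T) starting at x0.

    rate(x) plays the role of lambda(x); jump_probs(x) returns a dict
    {y: nu(x, y)} over states y != x. Returns the visited states
    X_1..X_N and holding times H_1..H_N (the last one overshoots T).
    """
    states, holds, elapsed = [x0], [], 0.0
    while True:
        x = states[-1]
        h = rng.expovariate(rate(x))       # H_i | X_i ~ Exp(lambda(X_i))
        holds.append(h)
        if elapsed + h > T:                # sum_{i<n} H_i <= T < sum_{i<=n} H_i
            return states, holds
        elapsed += h
        targets, weights = zip(*jump_probs(x).items())
        # X_{i+1} | X_i ~ nu(X_i, .)
        states.append(rng.choices(targets, weights=weights)[0])

def naive_transition_estimate(x0, y, T, rate, jump_probs, n_paths, seed=0):
    """Fraction of forward-simulated paths that are in state y at time T."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_paths):
        states, _ = simulate_path(x0, T, rate, jump_probs, rng)
        hits += states[-1] == y
    return hits / n_paths

# Hypothetical toy chain (an assumption for illustration only).
def toy_rate(x):
    return 1.0 + x

def toy_jump_probs(x):
    return {1: 1.0} if x == 0 else {x - 1: 0.5, x + 1: 0.5}

if __name__ == "__main__":
    for target in (2, 6, 10):
        est = naive_transition_estimate(0, target, 1.0, toy_rate,
                                        toy_jump_probs, n_paths=5000)
        print(target, est)
```

For targets far from the start state, most or all simulated paths miss the target, so the naive estimate is dominated by zeros; this is the difficulty the modified proposal is meant to address.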

Tree and sequences of observations

So far, the observations take the form of a single branch, with the state observed at each endpoint. To generalise the approach to most types of observation, the sampling algorithm is replaced by a sequential Monte Carlo (SMC) algorithm. This approach can be used, for example, for a series of partially observed states. The SMC algorithm is limited, however, in that hidden states are difficult to recover when the observations are only weakly informative.

Parameter estimation

So far we have assumed that the dynamics driving the process are known, that is, that $\nu$ and $\lambda$ are known. Now suppose instead that the dynamics depend on an unknown parameter $\theta$, which governs both the holding-time rate $\lambda_\theta$ and the jump transition probabilities $\nu_\theta$. Pseudo-marginal and particle Markov chain Monte Carlo methods are employed to approximate the posterior distribution on $\theta$, including in fixed end-point setups.

Using the grouped independence Metropolis-Hastings (GIMH) algorithm, at each MCMC iteration $t$ the algorithm keeps in memory a pair $x^{(t)} = (\theta^{(t)}, \hat{Z}^{(t)})$ containing the current parameter $\theta^{(t)}$ and an estimate $\hat{Z}^{(t)} \approx P_{\theta^{(t)}}(y)$ of the marginal probability of the observations $y$ given $\theta^{(t)}$. The GIMH sampler has the advantage that its stationary distribution is the exact posterior, even though the marginal probability is only approximated with a finite number of particles. With few particles, however, the noisy estimate can make the chain mix slowly. The method also requires a proposal density on the parameters, $q(\theta' \mid \theta^{(t)})$, to be specified, where $\theta'$ is the proposed value.
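The bookkeeping of GIMH, a stored pair of parameter and marginal-probability estimate that is only replaced on acceptance, can be sketched on a toy problem. Everything model-specific here is an assumption made for illustration: the latent-variable model ($z \sim N(0,1)$, $y \mid z \sim N(\theta + z, 1)$), the normal prior, the random-walk proposal, and the plain Monte Carlo estimator standing in for the CTMC path sampler.

```python
import math
import random

def normal_pdf(v, mean, sd):
    return math.exp(-0.5 * ((v - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def estimate_marginal(theta, y, n_particles, rng):
    """Unbiased Monte Carlo estimate of p(y | theta) in the toy model:
    z ~ N(0, 1) is latent and y | z ~ N(theta + z, 1)."""
    total = 0.0
    for _ in range(n_particles):
        z = rng.gauss(0.0, 1.0)
        total += normal_pdf(y, theta + z, 1.0)
    return total / n_particles

def gimh(y, n_iters, n_particles=50, step=1.5, prior_sd=10.0, seed=0):
    """Pseudo-marginal Metropolis-Hastings; returns the chain of
    (theta, Z_hat) pairs kept in memory at each iteration."""
    rng = random.Random(seed)
    theta = 0.0
    z_hat = estimate_marginal(theta, y, n_particles, rng)
    chain = []
    for _ in range(n_iters):
        prop = rng.gauss(theta, step)   # symmetric proposal, so q cancels
        z_prop = estimate_marginal(prop, y, n_particles, rng)
        ratio = (z_prop * normal_pdf(prop, 0.0, prior_sd)) / (
            z_hat * normal_pdf(theta, 0.0, prior_sd))
        if rng.random() < ratio:
            theta, z_hat = prop, z_prop  # accept the pair jointly
        # On rejection the old estimate is reused, never recomputed.
        chain.append((theta, z_hat))
    return chain

if __name__ == "__main__":
    chain = gimh(y=1.0, n_iters=3000)
    thetas = [theta for theta, _ in chain[500:]]
    print(sum(thetas) / len(thetas))
```

Reusing the stored estimate on rejection is what keeps the stationary distribution exact; refreshing it at every iteration would turn the sampler into a different, approximate algorithm.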

Numerical examples

RNA folding pathway

The approach is evaluated experimentally on theoretical and real data from the two CTMC examples mentioned earlier. The primary example is the RNA folding pathway. Folding pathways predict how nucleic acid molecules fold into their secondary and tertiary structures through intramolecular interactions. The secondary structures of the RNA molecule form the state space of the folding process. Understanding the folding pathway of a nucleic acid has potential applications in medicine, because it entails understanding RNA and DNA secondary structures, which are key to RNA function and to determining the rate of gene transcription, respectively. Note that nucleic acid molecules fold into secondary structures dynamically.

Our approach estimates the probability that a nucleic acid molecule starting from secondary structure $x$ will, after time $T$, have acquired a target structure $y$. This quantity is the transition probability. It can be calculated exactly either by solving a system of linear differential equations or by computing the exponential of a large matrix.
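For a chain small enough to write down its generator matrix, the matrix-exponential route can be sketched directly. The example below is a minimal illustration, not the method used in the experiments: it computes $P(T) = e^{TQ}$ by uniformization (a standard series expansion chosen here because it needs only the standard library), and the two-state generator is a hypothetical toy.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(Q, T, tol=1e-12, max_terms=10_000):
    """Return P(T) = exp(T * Q) for a small generator matrix Q.

    Uniformization: with q >= max_i |Q[i][i]| and M = I + Q / q,
    P(T) = sum_k Poisson(k; q * T) * M^k.
    """
    n = len(Q)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    q = max(-Q[i][i] for i in range(n))
    if q == 0.0:
        return identity                  # no state is ever left
    M = [[identity[i][j] + Q[i][j] / q for j in range(n)] for i in range(n)]
    power = identity                     # M^k, starting at k = 0
    weight = math.exp(-q * T)            # Poisson pmf at k = 0
    cumulative = weight
    P = [[weight * power[i][j] for j in range(n)] for i in range(n)]
    k = 0
    while 1.0 - cumulative > tol and k < max_terms:
        k += 1
        power = matmul(power, M)
        weight *= q * T / k
        cumulative += weight
        for i in range(n):
            for j in range(n):
                P[i][j] += weight * power[i][j]
    return P

if __name__ == "__main__":
    # Hypothetical two-state chain: leave state 0 at rate 1, state 1 at rate 2.
    Q = [[-1.0, 1.0], [2.0, -2.0]]
    print(transition_matrix(Q, T=1.0))
```

The cost of each matrix product grows with the cube of the number of states, which is why this exact route becomes impractical once the state space is very large or infinite.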

Model

An RNA secondary structure is characterised by a set of base pairs, each given by the positions in the sequence of the two bases involved in the pairing. Focusing on pseudoknot-free RNA secondary structures, it is essential to note that these structures can be represented by a planar circle graph. In the graph, the sequence is arranged along a circle, and positions in the sequence that are base paired are joined by arcs drawn across the circle.

$X_i$ and $X_{i+1}$ represent successive structures, which differ by a single base pair. Let $X_1 = x$ and $X_N = y$, where $x$ is the start structure and $y$ is the target structure. To formalize the folding pathway, a generator matrix $Q$ is introduced, indexed by all possible pairs of start and final secondary structures. The Kawasaki rule gives the rate of transition from $x$ to $x'$ as

$$\lambda(x)\,\nu(x, x') = \exp\!\left(\frac{E(x) - E(x')}{kT}\right)$$

if $x' \in R(x)$, and zero otherwise. Here $E(x)$ is the energy of secondary structure $x$, $R(x)$ is the set of secondary structures that differ from $x$ by a single base pair, $k$ is the Boltzmann constant, and $T$ in $kT$ is the temperature.
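The way the Kawasaki rule determines both the holding rates and the jump probabilities can be shown on a tiny example. The four structures (written in dot-bracket notation), their energies, and the value of `kT` below are all hypothetical values chosen for illustration; a real application would obtain energies from a thermodynamic folding model.

```python
import math

def kawasaki_rates(energy, neighbours, kT):
    """rates[x][x2] = exp((E(x) - E(x2)) / kT) for x2 in R(x); pairs that
    are not neighbours have rate zero and are simply left out."""
    return {x: {x2: math.exp((energy[x] - energy[x2]) / kT)
                for x2 in neighbours[x]}
            for x in energy}

def split_rates(rates):
    """Recover lambda(x) (total exit rate) and nu(x, .) (jump probabilities)."""
    lam = {x: sum(out.values()) for x, out in rates.items()}
    nu = {x: {x2: r / lam[x] for x2, r in out.items()}
          for x, out in rates.items()}
    return lam, nu

# Hypothetical energies for four structures of a six-base sequence.
ENERGY = {"......": 0.0, "(....)": -1.0, ".(..).": -0.5, "((..))": -3.0}
# R(x): structures that differ from x by exactly one base pair.
NEIGHBOURS = {
    "......": ["(....)", ".(..)."],
    "(....)": ["......", "((..))"],
    ".(..).": ["......", "((..))"],
    "((..))": ["(....)", ".(..)."],
}
KT = 0.6  # assumed value, in the same units as ENERGY

if __name__ == "__main__":
    lam, nu = split_rates(kawasaki_rates(ENERGY, NEIGHBOURS, KT))
    for x in ENERGY:
        print(x, round(lam[x], 3), nu[x])
```

Moves that lower the energy receive rates above one and moves that raise it receive rates below one, so the jump chain is biased toward more stable structures.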

For a nucleic acid sequence of $m$ bases, the size of the generator matrix $Q$ grows exponentially in the sequence length $m$. This is because many secondary structures can be generated from a given sequence.
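The growth in the number of states can be made concrete with a short counting sketch. It rests on simplifying assumptions that are not in the original text: any two bases may pair, and paired bases must have at least one unpaired base between them. Under those assumptions the count is an upper bound on the number of pseudoknot-free structures of any actual sequence of that length.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_structures(m):
    """Number of pseudoknot-free secondary structures on m bases, assuming
    any two bases can pair and a hairpin loop has at least one base."""
    if m <= 2:
        return 1                          # only the unfolded structure
    total = count_structures(m - 1)       # last base is unpaired
    for j in range(1, m - 1):             # last base pairs with base j
        # j - 1 bases lie to the left of j, m - j - 1 lie inside the pair.
        total += count_structures(j - 1) * count_structures(m - j - 1)
    return total

if __name__ == "__main__":
    for m in (5, 10, 20, 30):
        print(m, count_structures(m))
```

Since $Q$ has one row and one column per structure, its number of entries is the square of this count.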

String-valued evolutionary model

Evolutionary models of biomolecular sequences in phylogenetics are an important application of CTMCs. In these models, mutations such as insertions and deletions are the jumps, and the biomolecular sequences are the states.

Model

The SSM-aware model (SSM: slipped-strand mispairing) is used to approximate transition probabilities. In the model, each marginal variable, denoted $X_t$, has a countably infinite domain consisting of all possible molecular sequences. The rate $\lambda(x)$ is defined as a function of the following parameters: the mutation rate per base $\theta_{\text{sub}}$; the nucleotide insertion rate $\lambda_{\text{pt}}$; the rate of deletion of a single nucleotide per base $\mu_{\text{pt}}$; the global SSM insertion rate $\lambda_{\text{SSM}}$ (an SSM insertion copies a substring of length up to three to the right of that substring); and the SSM deletion rate per valid SSM deletion location $\mu_{\text{SSM}}$ (analogous to SSM insertion, except that nucleotides are removed rather than added):

$$\lambda(x) = m(x)\,\theta_{\text{sub}} + \lambda_{\text{pt}} + m(x)\,\mu_{\text{pt}} + \lambda_{\text{SSM}} + k(x)\,\mu_{\text{SSM}}$$

Here $m(x)$ is the length of the string $x$ and $k(x)$ is the number of valid locations for SSM deletion. The evolutionary parameters are collected in $\theta = (\theta_{\text{sub}}, \lambda_{\text{pt}}, \mu_{\text{pt}}, \lambda_{\text{SSM}}, \mu_{\text{SSM}})$. The jump transition probabilities are obtained by normalizing each of these rates. For all $\theta$ the process is explosion-free, since the total insertion rate is independent of the string length.
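The total rate and its normalization can be sketched as follows. The function names, the parameter keys, and in particular the definition of a valid SSM deletion location (a unit of length one to three immediately followed by a copy of itself) are assumptions made for this illustration; the normalization shown is over event types only, not over individual target strings.

```python
def count_ssm_deletion_sites(x, max_unit=3):
    """k(x) under the assumed definition: positions where a unit of
    length 1..max_unit is immediately followed by a copy of itself."""
    count = 0
    for unit in range(1, max_unit + 1):
        for i in range(len(x) - 2 * unit + 1):
            if x[i:i + unit] == x[i + unit:i + 2 * unit]:
                count += 1
    return count

def rate_components(x, theta):
    """Split lambda(x) into one term per event type."""
    m, k = len(x), count_ssm_deletion_sites(x)
    return {
        "substitution": m * theta["sub"],
        "point_insertion": theta["ins_pt"],
        "point_deletion": m * theta["del_pt"],
        "ssm_insertion": theta["ins_ssm"],
        "ssm_deletion": k * theta["del_ssm"],
    }

def total_rate(x, theta):
    return sum(rate_components(x, theta).values())

def event_type_probabilities(x, theta):
    """Probability that the next jump is of each type (rates normalized)."""
    parts = rate_components(x, theta)
    lam = sum(parts.values())
    return {name: value / lam for name, value in parts.items()}

if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    theta = {"sub": 0.1, "ins_pt": 0.05, "del_pt": 0.05,
             "ins_ssm": 0.2, "del_ssm": 0.3}
    print(total_rate("ACAC", theta))
    print(event_type_probabilities("ACAC", theta))
```

Only the substitution, point-deletion, and SSM-deletion terms grow with the string, which matches the observation that the total insertion rate does not depend on string length.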

Conclusion and recommendations

The Markov chain Monte Carlo approach used is evidently efficient in...
