

Table 2 Potential applications

From: Enviro-geno-pheno state approach and state based biomarkers for differentiation, prognosis, subtypes, and staging

Task | Study description
(a) Expression data differentiation | We find d-states as biomarkers by examining one \({g}_a\) versus one \({e}_a\) on a 2D scatter map, where \({e}_a\) is itself a g-variable (e.g., another \({g}_a\) or a \({g}_b\)). In addition, one \({e}_c\) can be examined jointly, with one map for \({e}_c=1\) and another for \({e}_c=0\).
(b) Mutation analysis | We examine one \({g}_D\) versus \({e}_c\). First, form a \(2\times 2\) table for \({g}_D\). Then split the table into a 3D one, with one slice for \({e}_c=1\) and the other for \({e}_c=0\). Alternatively, an additional \({g}_D\) may serve as \({e}_c\) to obtain a 3D table. Moreover, each slice may be split further by considering a new \({e}_c\). All resulting slices are analysed in a way similar to Table 1(3)(a).
(c) SNP analysis | The situation is similar to (b), except that the \(2\times 2\) table becomes a \(2\times 3\) table, because \({g}_D\) takes ternary values denoting the genotypes AA, Aa, and aa. When another SNP is used as \({e}_c\), its ternary \({g}_D\) is replaced by a binary variable that takes 0 if the sample has no SNP at this site and 1 otherwise.
(f) High-risk samples | Based on the above studies, we estimate the posterior probability \(p(+ |x)\) for each sample x and flag a sample as high-risk if this value exceeds a threshold, which is directly applicable to expression data. For sequencing data, particularly for finding SNPs, \(p(+ |x)\) is difficult to estimate because only a few samples have variants at a particular site of \(g_c\). Instead, a sample is regarded as high-risk simply when it has a variant at the site of \(g_c\) or a sufficient number of variants at the sites of multiple SNPs.
(g) Expression-sequencing echoing | We obtain d-states and trees on expression data and on sequencing data, and examine whether the results from the two types of data agree with each other.
(h) Expression-sequencing combining (ESC) test | Assume the null \(H_0\) holds on both the E-side (expression) and the S-side (sequencing), and let \(E_{\lnot {H}^*}\) and \(S_{\lnot {H}^*}\) denote raising an alarm on the corresponding side. We then get \(p(E_{\lnot {H}^*}, S_{\lnot {H}^*}|s)=p(E_{\lnot {H}^*}| S_{\lnot {H}^*},s)\,p_S\), where \(p_S=p( S_{\lnot {H}^*}|s)\) is the p value obtained on the S-side and \(p(E_{\lnot {H}^*}| S_{\lnot {H}^*},s )\approx Card(B_E)/Card(B_S)\) is the probability of rejecting \(H_0\) on the E-side given that \(H_0\) is rejected on the S-side. Here \(B_S\) consists of the biomarkers on which \(H_0\) is rejected significantly on the S-side, and \(B_E\subseteq B_S\) consists of those on which \(H_0\) is also regarded as significantly rejected on the E-side.
(i) E-GPS based integration | Integration may also be performed by examining one \({g}_a\), the expression of a gene, versus one \({g}_c\) summarising multiple SNPs within the DNA sequence of that gene (e.g., either the number of those SNPs or their average score).
* General settings
\(\mathbf{g}\): each of its elements is a g-variable, which could be
\({g}_a\) a real variable for the expression of an RNA unit, e.g., mRNA, lncRNA, or circRNA;
\({g}_b\) a real variable for a signature expression (i.e., the collective expression of a set of RNA units);
\({g}_c\) a discrete label for an SNP in the DNA sequence (there could be multiple SNPs per RNA unit);
\({g}_D\) a binary variable that indicates whether there is a mutation within a bio-unit sequence (e.g., gene, pathway, etc.). There are usually multiple such variables for different types of mutation
\(\mathbf{\pmb {\phi }}\): each of its elements is a \(\phi\)-variable, which could be
\({\phi }_a\) a binary variable that indicates ‘case vs control’ or ‘abnormal vs normal’;
\({\phi }_b\) a binary or discrete variable that indicates clinical features;
\({\phi }_c\) a discrete label that indicates one of the subtypes, grades, or stages;
\({\phi }_D\) a real variable that indicates the occurrence of an event (e.g., survival time)
\(\mathbf{e}\): each of its elements is an e-variable, which could be
\({e}_a\) a g-variable that acts as a condition for our examination;
\({e}_b\) a \(\phi\)-variable that acts as a condition for our examination;
\({e}_c\) a binary variable that indicates whether a treatment is given, e.g., adjuvant chemotherapy;
\({e}_D\) an environmental variable, either discrete (e.g., sex, M/F) or real-valued (e.g., age)
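Rows (b) and (c) of the table rest on simple counting tables. The following is a minimal Python sketch, not the authors' implementation, under several assumptions the table does not state: the second dimension of each table is taken to be the case/control label \({\phi}_a\); SNP genotypes are coded 0/1/2 for AA/Aa/aa; and AA is assumed to mean "no SNP at this site". The names `contingency`, `split_by_ec`, and `binarize_snp` are hypothetical.

```python
from collections import Counter


def contingency(phi_a, g, levels):
    """Count table with rows = phi_a (0 control, 1 case) and one column per level of g.

    levels=(0, 1) gives the 2x2 table of row (b); levels=(0, 1, 2) gives the
    2x3 genotype table (AA/Aa/aa) of row (c).
    """
    counts = Counter(zip(phi_a, g))  # missing (phi_a, g) pairs count as 0
    return [[counts[(p, v)] for v in levels] for p in (0, 1)]


def split_by_ec(phi_a, g, e_c, levels):
    """3D table: one contingency slice for e_c == 0 and one for e_c == 1."""
    return {
        c: contingency(
            [p for p, e in zip(phi_a, e_c) if e == c],
            [v for v, e in zip(g, e_c) if e == c],
            levels,
        )
        for c in (0, 1)
    }


def binarize_snp(genotype):
    """Turn a 0/1/2 genotype into 0 (assumed AA, no SNP) or 1 (Aa or aa)."""
    return [0 if v == 0 else 1 for v in genotype]
```

Splitting a slice further by a new \({e}_c\) amounts to calling `split_by_ec` on the samples of that slice; when another SNP plays the role of \({e}_c\), its genotype would be passed through `binarize_snp` first, as row (c) describes.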
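The two selection rules of row (f) can be sketched in Python as follows. This is an illustration only: the posteriors \(p(+|x)\) are taken as given inputs (their estimation is not shown), and the function names, the `threshold` value, and the `min_variants` cut-off standing in for "a sufficient number of variants" are hypothetical parameters.

```python
def high_risk_expression(posteriors, threshold):
    """Expression data: indices of samples whose estimated p(+|x) exceeds threshold."""
    return [i for i, p in enumerate(posteriors) if p > threshold]


def high_risk_sequencing(has_variant_gc, snp_variant_counts, min_variants):
    """Sequencing data: a sample is high-risk if it has a variant at the g_c site,
    or at least min_variants variants across the sites of multiple SNPs
    (min_variants is a hypothetical cut-off)."""
    return [
        i
        for i, (at_gc, n) in enumerate(zip(has_variant_gc, snp_variant_counts))
        if at_gc or n >= min_variants
    ]
```

For example, with posteriors `[0.9, 0.2, 0.75, 0.5]` and a threshold of 0.7, the first rule flags samples 0 and 2.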
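Once \(p_S\), \(B_S\), and \(B_E\) are available, the ESC combination in row (h) reduces to \(p(E_{\lnot {H}^*}, S_{\lnot {H}^*}|s)\approx \frac{Card(B_E)}{Card(B_S)}\,p_S\). A minimal Python sketch follows; `esc_p_value` is a hypothetical name, biomarkers are assumed to be hashable identifiers (e.g., gene symbols), and how \(B_S\) and \(B_E\) are selected is left to the procedure described in the table.

```python
def esc_p_value(p_S, B_S, B_E):
    """Approximate p(E_alarm, S_alarm | s) = p(E_alarm | S_alarm, s) * p_S,
    with p(E_alarm | S_alarm, s) estimated as Card(B_E) / Card(B_S)."""
    B_S, B_E = set(B_S), set(B_E)
    if not B_S:
        raise ValueError("B_S must be non-empty")
    if not B_E <= B_S:
        raise ValueError("B_E must be a subset of B_S")
    p_E_given_S = len(B_E) / len(B_S)
    return p_E_given_S * p_S
```

For instance, if the S-side gives \(p_S=0.01\) with four biomarkers in \(B_S\), of which one is also rejected on the E-side, the combined value is \(0.25\times 0.01=0.0025\).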
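Row (i) pairs one \({g}_a\) with a \({g}_c\) that summarises several SNPs of the same gene. The Python sketch below builds those pairs under assumptions of our own: `snps_by_sample` (hypothetical) maps each sample ID to the scores of the SNPs detected within the gene for that sample, "count" uses the number of such SNPs, "average" uses their mean score, and a sample without SNPs gets an average of 0.0. The function names are likewise hypothetical.

```python
from statistics import mean


def snp_summary(snp_scores, mode="count"):
    """Collapse the scores of multiple SNPs within one gene into a single g_c value."""
    if mode == "count":
        return len(snp_scores)
    if mode == "average":
        # Assumption: a sample with no SNPs in the gene gets an average of 0.0.
        return mean(snp_scores) if snp_scores else 0.0
    raise ValueError(f"unknown mode: {mode}")


def pair_expression_with_snps(expression, snps_by_sample, mode="count"):
    """Return (g_a, g_c) pairs per sample: g_a is the gene's expression and
    g_c is the SNP summary for that gene (samples absent from snps_by_sample
    are treated as having no SNPs)."""
    return [
        (g_a, snp_summary(snps_by_sample.get(sample, []), mode))
        for sample, g_a in expression.items()
    ]
```

The resulting pairs can then be examined in the same way as a \({g}_a\) versus \({e}_a\) map in row (a).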