From: Further advances on Bayesian Ying-Yang harmony learning
Year | Outcomes
---|---
1995 | The following fundamental points of BYY harmony learning were first proposed in Xu (1995):
(a) | The BYY system is proposed as a unified perspective for statistical learning.
(b) | Under the name of BKYY learning, Ying-Yang best matching by minimisation of KL(p(Y|X)p(X)∥q(X|Y)q(Y)) was proposed for learning the parameters θ.
(c) | One simplified version of H(θ) is proposed to get a hard-cut version of the EM algorithm, see Eqs. (19) and (20), together with a criterion for selecting the number of components in a Gaussian mixture (i.e. the cluster number), see Eqs. (22) and (24) in Xu (1995).
(d) | One preliminary version of BYY harmony learning based automatic model selection was presented, see Sect. 5.2 in Xu (1995).
(e) | The relationship H(p∥q) = H_{R|X} − KL(p∥q) in Equation 10 was also first identified, see Eqs. (8), (11) and (12) in Xu (1995).
1996 | Points (c) and (d) were verified experimentally in Xu (1996).
1997 | Four advances were made, as follows:
(a) | Beyond 1995(d), H(θ) was suggested in a general expression as a model selection criterion, see Eq. (12) in Xu (1997a) and Eq. (3.8) in Xu (1997b). Its special cases for Gaussian mixtures were also addressed.
(b) | Proposed to use $p_{h}^{N}(X)$ by Equation 8 and to learn h for regularisation, see Eq. (3.10) in Xu (1997b). A smoothed EM algorithm was proposed for Gaussian mixtures, see Eq. (18) in Xu (1997c).
(c) | Proposed a semi-supervised EM algorithm for Gaussian mixtures, see Eq. (7.14) in Xu (1997b).
(d) | Extended BKYY to BCYY by replacing the Kullback divergence with its convex counterpart, see Sect. 5 in Xu (1997a) and Eqs. (19)-(23) in Xu (1997c).
1998 | The following advances were made:
(a) | Proposed equation (A) in Table 2 as a criterion for model complexity, e.g. see Eq. (49) in Xu (1998a) and Eq. (22) in Xu (1998b).
(b) | As an exemplar of 1997(a), derived model selection criteria for the three-layer net and the RBF net (see Eq. (56) and Eqs. (61)-(64) in Xu (1998a)) and also for FA (see Eqs. (37) and (43) in Xu (1998b)).
(c) | Beyond 1995(c), developed adaptive EM algorithms for learning the RBF net (see Sect. 3.2) and FA (see Sect. 4.2.4) in Xu (1998b), and Sect. 3.2 in Xu (1998c).
1999 | Further efforts were made, among which the major ones are as follows:
(a) | Beyond 1997(a), proposed a general form for parameter learning and model selection, see Sect. 2 in Xu (1999b), Sect. 2.2 in Xu (1999a), and Sect. 2.2 in Xu (1999c).
(b) | Beyond 1997(b), systematically studied data smoothing regularisation in Xu (1999d), with an approximation technique in Equation 18 and estimation techniques for h.
(c) | Proposed a Taylor expansion approximation by Equation 18 to remove the integral in BYY implementation, see Eq. (90) and Eq. (91) in Xu (1999e), and later in the journal papers (Xu 2000c, 2001b).
2000 | In Xu (2000d, 2000a), H(θ) based harmony learning was elaborated into its present formulation, supported by a mathematical analysis of Ying-Yang best harmony versus Ying-Yang best matching, and featured three innovative points:
(a) | Beyond 1999(a), proposed a general form of max_θ H(θ) with automatic model selection, see Eq. (29) in Xu (2000d) and Sect. 4 in Xu (2000a).
(b) | Proposed Eq. (23) in Xu (2000a) to implement equation (A) in Table 2 by learning θ with automatic model selection.
(c) | Also proposed normalisation regularisation in parallel to the data smoothing regularisation in the above 1999(b), see Sect. 2 and Sect. 3 in Xu (2000a) and Eq. (21) in Xu (2000d).
2001 | Further progress was made as follows:
(a) | Used p(Y|θ, X_N) = q(Y|θ, X_N) in Equation 12 to get the Yang structure for max_θ H(θ), see Eq. (40) in Xu (2001a) and Eqs. (24) and (27) in Xu (2001c).
(b) | Developed a BYY harmony learning algorithm for kernel regression and support vectors, see Sect. 4.5 and Table 7 in Xu (2001a).
(c) | Understood H(θ) in its general form from an information-transfer perspective via three-layer encoding, see Sect. 4.3 in Xu (2001c).
(d) | Beyond 1998(b), derived model selection criteria for local PCA, see Eq. (23) in Xu (2001c), and for local ICA, see Eq. (33) in Xu (2001d).
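Several entries above (1995(c), 1998(c)) refer to hard-cut variants of the EM algorithm for Gaussian mixtures, in which the posterior probabilities of the E-step are replaced by a winner-take-all assignment. The sketch below illustrates that generic hard-cut scheme only; it is not a transcription of Xu's Eqs. (19)-(20), and the isotropic-Gaussian model, the farthest-point initialisation, and all names are illustrative assumptions.

```python
import numpy as np

def log_gauss(X, mu, var):
    """Row-wise log-density of an isotropic Gaussian N(mu, var * I)."""
    d = X.shape[1]
    sq = ((X - mu) ** 2).sum(axis=1)
    return -0.5 * (d * np.log(2 * np.pi * var) + sq / var)

def hard_cut_em(X, k, n_iter=50):
    """Hard-cut (winner-take-all) EM for an isotropic Gaussian mixture."""
    n, d = X.shape
    # Deterministic farthest-point initialisation of the k means.
    idx = [0]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(X - X[i], axis=1) for i in idx], axis=0)
        idx.append(int(dists.argmax()))
    mu = X[idx].astype(float)
    var = np.full(k, X.var())        # shared isotropic variance to start
    w = np.full(k, 1.0 / k)          # mixing weights
    for _ in range(n_iter):
        # Hard E-step: each point is assigned to its best component
        # (the "hard cut" replaces posterior probabilities with 0/1).
        scores = np.stack([np.log(w[j]) + log_gauss(X, mu[j], var[j])
                           for j in range(k)], axis=1)
        labels = scores.argmax(axis=1)
        # M-step: re-estimate each component from its assigned points only.
        for j in range(k):
            pts = X[labels == j]
            if len(pts) == 0:
                continue  # empty component; a fuller version would prune it
            w[j] = len(pts) / n
            mu[j] = pts.mean(axis=0)
            var[j] = max(((pts - mu[j]) ** 2).mean(), 1e-6)
        w /= w.sum()
    return labels, mu, var, w
```

On well-separated data the hard assignments stabilise after a few iterations. Pruning components whose weight w[j] shrinks toward zero is the hook through which such hard-cut schemes relate to selecting the number of clusters, the theme of the 1995(c) criterion above.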