Learning scene-aware image priors with high-order Markov random fields
Applied Informatics volume 4, Article number: 12 (2017)
Abstract
Many methods have been proposed to learn image priors from natural images for ill-posed image restoration tasks. However, most prior learning algorithms assume that a single general prior distribution suits all kinds of images. Since the contents of natural images and the corresponding low-level statistical characteristics vary from scene to scene, we argue that learning a universal generative prior for all natural images may be imperfect. Although a universal generative prior can remove artifacts and preserve natural smoothness in image restoration, it also tends to introduce unrealistic flatness and cluttered textures. To address this issue, we propose to learn a scene-aware image prior based on the high-order Markov random field (MRF) model (SA-MRF). With this model, we jointly learn a set of shared low-level features and different potentials for specific scene contents. In prediction, a suitable prior is adapted to the given degraded image according to its perceived scene content. Experimental results on image denoising and inpainting tasks demonstrate the effectiveness of the SA-MRF in both numerical evaluation and visual comparison.
Introduction
Image restoration tasks, such as denoising (Tappen et al. 2007; Schmidt et al. 2010; Schmidt and Roth 2014), deblurring (Krishnan and Fergus 2009; Krishnan et al. 2011; Levin et al. 2009; Zhang et al. 2013; Gong et al. 2016, 2017) and super-resolution (Tappen and Liu 2012), are all inherently ill-posed. Knowledge of natural images is therefore used as a prior to stabilize the estimation and to recover information lost in non-ideal imaging processes. Recently, many image priors have been defined on image gradients for simplicity of modeling and better performance (Fergus et al. 2006; Levin et al. 2007, 2009; Krishnan et al. 2011; Krishnan and Fergus 2009; Xu et al. 2013; Zhang et al. 2013). However, representing the image prior distribution in the gradient domain is fragile with respect to the sophisticated concept of a “natural” image, because variations in image content and/or scale make the gradient characteristics unstable for modeling individual clear images.
Motivated by the demand for stable and accurate prior knowledge of natural images, many low-level modeling techniques, including feature representations and their associated distributions, have been studied. Recent years have seen a trend to address this issue with probabilistic graphical models (e.g., MRFs and CRFs) with non-Gaussian potential functions (Roth and Black 2005; Weiss and Freeman 2007; Samuel and Tappen 2009; Schmidt et al. 2010, 2014; Schmidt and Roth 2014; Chen et al. 2015), such as fields of experts (FoE) (Roth and Black 2005; Schmidt et al. 2010).
Both manually designed priors and learned priors aim to model a universal distribution that represents all real-world natural images (in the specific domain under discussion). Unfortunately, images with different scene contents have varying statistics on common low-level features such as gradients or the responses of learned filters in high-order MRF cliques. Figure 1 shows that images with different contents (left) respond differently to the gradient filter (middle) and to the learned high-order filters of Schmidt et al. (2010) (right). Therefore, relying on a universal generative image prior to recover every specific image is inappropriate.
Considering the gap between a universal image prior and the specific properties of individual images, a series of content-related image priors have been exploited in image restoration tasks (Tappen et al. 2007; Cho et al. 2010; Sun et al. 2010; Schmidt and Roth 2014; McAuley et al. 2006). In Tappen et al. (2007) and Cho et al. (2010), local features are used to adapt the prior to local areas in restoration tasks. However, because local features such as gradient filter responses (Tappen et al. 2007) and local texture (Cho et al. 2010) are usually not distinctive on weak edges or in regions with ambiguous content, these locally specific models suffer from inaccurate labeling, and the restoration results often contain artifacts. In addition, the models in Tappen et al. (2007) and Schmidt and Roth (2014) can only be learned for a specific state of degradation, which limits their range of application. Previous work on content-aware priors mainly connects the contents with simple features such as gradient statistics, since connecting complex low-level features (e.g., arbitrary filter responses) with high-level features that represent the scene contents is more difficult. Additionally, McAuley et al. (2006) proposed learning high-order MRF priors for color images. In Feng et al. (2016), a high-order natural image prior model was proposed for reducing Poisson noise. Ren et al. (2013) introduced the “context-aware” concept into sparse representation for image denoising and super-resolution. Considering the limited expressive ability of the classical MRF, Wu et al. (2016) proposed to combine MRFs with deep neural networks.
In a natural image, the low-level statistical characteristics are usually generated by the contents of the captured scene (Torralba and Oliva 2003), and the scene perception of an image is usually more robust than its pixel-level (low-level) characteristics. Based on this observation, we focus on developing a scene-aware prior model that adapts to the manifold of the scene-related content of an image globally instead of relying on local structures. In this paper, we propose a scene-aware Markov random field (SA-MRF) model to capture the scene-discriminating statistical prior of a whole natural image; the SA-MRF model has high-order non-Gaussian potentials conditioned on a scene coefficient extracted from high-level concepts of the observation. This relies on the assumption that the high-level contents are fairly well preserved even in degraded observations. We then propose efficient algorithms for learning and inference. Experiments on two image restoration tasks, denoising and inpainting, illustrate that the SA-MRF-based scene-aware image prior captures image statistics accurately and improves the quality of the restored images.
Scene-aware image prior based on MRFs
The purpose of this paper is to build a system in which (1) a high-order MRF model that depends on the scene content of the image models the low-level statistical distribution and (2) the observed image is matched to a suitable prior during restoration. An overview of the system is illustrated in Fig. 2.
Scene-aware MRF model
The distribution of a natural image \(\mathbf {x}\) is formulated as a high-order MRF (Schmidt et al. 2010). To let the scene content information guide the modeling, we introduce an explicit scene coefficient as a parameter of the distribution.
Let \(\{\mathbf {F}_i\}_{i=1}^{N}\) denote a set of filter-based features that capture the low-level characteristics of natural images, and let \(f(\mathbf{x})\) denote the scene-content-perceiving feature, which we call the scene coefficient. With the scene coefficient, the probability distribution of the corresponding clear image is defined as
where \({\mathcal C}\) is the set of maximal cliques (Koller and Friedman 2009) of the MRF; \(\mathbf{x}_c\) denotes the subvector of \(\mathbf{x}\) corresponding to the clique c (see Note 1); \(\phi (\cdot )\) represents the potential function; \(\mathbf {w}_i\) is the parameter of the potential function \(\phi (\cdot )\), which depends on \(f(\mathbf{x})\) and the parameter \(\varvec{\theta }\) through a function \(\mathbf{w}(\cdot )\) (see Note 2); \(\varvec{\Theta }\) is the collection of the parameters \(\mathbf{F}_i\)’s and \(\mathbf{w}_i\)’s; and \(Z(\cdot )\) denotes the partition function that normalizes the product of the potential functions (Koller and Friedman 2009). Because our MRF model (1) explicitly considers the scene content, we call it a scene-aware MRF. Model (1) is called a scene-aware prior because the parameters of the prior distribution in (1), i.e., \(\mathbf{w}_i\), depend on the scene coefficient \(f(\mathbf{x})\), which captures the scene content of a specific image \(\mathbf{x}\) in practice.
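The display for model (1) did not survive in this copy. A form consistent with the symbols defined above, and with the fields-of-experts factorization over cliques and filters that the text builds on, would be the following; it is a reconstruction under those assumptions, not a quotation of the original equation:

\[
p(\mathbf{x}; \varvec{\Theta}) = \frac{1}{Z(\varvec{\Theta})} \prod_{c \in \mathcal{C}} \prod_{i=1}^{N} \phi\left(\mathbf{F}_i^{\mathrm{T}} \mathbf{x}_c;\ \mathbf{w}_i(f(\mathbf{x}), \varvec{\theta})\right) \qquad (1)
\]

Here \(\mathbf{F}_i^{\mathrm{T}} \mathbf{x}_c\) is read as the response of the ith filter on clique c, so in this form the scene content enters only through the potential parameters \(\mathbf{w}_i\).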
Potential function conditioned on scene coefficient
In (1), the formulation of the potential function is still not given. In this section, we will focus on the modeling of the potential function depending on the scene coefficient.
On account of the heavy-tailed filter statistics of natural images, we formulate the non-Gaussian potential function based on Gaussian scale mixture (GSM) models:
where J is the number of mixture components, set to 15 in this paper; \(w_{ij}\) is the jth component of the parameter vector \(\mathbf {w}_i\); and \(\sigma _i^2\) and \(s_j\) denote the base variance and the scale of the Gaussian components, respectively. Following Schmidt et al. (2010), we set the scales as \(s={\text {exp}}(-9, -7, -5, -4, \ldots , -1, 0, 1, \ldots , 4, 5, 7, 9)\). Thanks to the mixture-of-Gaussians formulation in (2), inference in the model is simplified by conjugacy. From (2), the scene coefficient \(f(\mathbf{x})\) influences the potential function through the weights \(w_{ij}\) of the Gaussian components, and thereby links the low-level characteristics with the high-level properties associated with the scene contents. Next, we describe how to build this linkage and define \(\mathbf{w}(\cdot )\) and \(f(\cdot )\).
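The display for (2) is not present in this copy. A form consistent with the definitions above and with the GSM experts of Schmidt et al. (2010) would be \(\phi (\mathbf{F}_i^{\mathrm{T}}\mathbf{x}_c; \mathbf{w}_i) = \sum _{j=1}^{J} w_{ij}\, \mathcal{N}(\mathbf{F}_i^{\mathrm{T}}\mathbf{x}_c; 0, \sigma _i^2/s_j)\); this is an assumption rather than a quotation. The Python sketch below evaluates such a mixture for a few filter responses. The function name, the uniform weights, and the unit base variance are hypothetical choices made only for illustration.

```python
import numpy as np

# Log-scales assumed from the text: exp(-9, -7, -5, -4, ..., 4, 5, 7, 9), so J = 15.
LOG_SCALES = np.array([-9, -7, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 7, 9], dtype=float)
SCALES = np.exp(LOG_SCALES)


def gsm_potential(response, weights, base_var=1.0, scales=SCALES):
    """Evaluate sum_j w_j * N(response; 0, base_var / s_j) elementwise."""
    response = np.asarray(response, dtype=float)[..., None]  # add a component axis
    var = base_var / scales                                   # variance of each component
    gauss = np.exp(-0.5 * response ** 2 / var) / np.sqrt(2.0 * np.pi * var)
    return gauss @ np.asarray(weights, dtype=float)           # mix over the J components


# Hypothetical uniform mixture weights, used only for illustration.
w = np.full(SCALES.size, 1.0 / SCALES.size)
values = gsm_potential([0.0, 1.0, -1.0, 5.0], w)
print(values)
```

Changing `weights` while keeping the scales fixed reshapes the peak and the tails of the potential, which is the mechanism through which a scene-dependent \(\mathbf{w}_i\) can alter the prior.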
Link the image \(\mathbf{x}\) and the scene perception through \(f(\mathbf{x})\)
Given an image \(\mathbf{x}\), an easy way to represent its scene is to assign discrete labels associated with its content (e.g., objects or scene categories), as in many scene understanding works (Li et al. 2009). However, because there is a gap between the high-level perception of the content and low-level features [e.g., SIFT (Lowe 2004) and GIST (Oliva and Torralba 2001)] (Li et al. 2010), even images with the same content labels may have dissimilar low-level statistical distributions. Instead of tackling this issue directly, we take advantage of it. Because our task is rooted in low-level vision, we do not need to assign exact labels to the contents of the scene. We directly use the bag-of-words (BoW) histogram of SIFT descriptors for scene perception. Given an image \(\mathbf{x}\), we extract dense SIFT (DSIFT) descriptors from it and generate a DSIFT-BoW histogram \(\mathbf{b}_{\mathbf{x}}\) with a vocabulary of 200 words; \(f(\mathbf{x})\) is defined as \(f(\mathbf{x})=\mathbf{b}_{\mathbf{x}}\). To extract dense SIFT, we run the SIFT feature extractor on a dense grid of locations covering the image at a fixed scale and orientation. In the prediction task, given an observation \(\mathbf{y}\), we first roughly recover a clear image \(\hat{\mathbf {x}}(\mathbf {y})\). For example, for a noisy observation \(\mathbf{y}\), we denoise it with a simple Wiener filter (Sonka et al. 2014) or a Gaussian low-pass filter. We then extract the DSIFT-BoW feature from \(\hat{\mathbf {x}}(\mathbf {y})\) and let \(\mathbf{b}_{\hat{\mathbf {x}}(\mathbf {y})}\) represent the corresponding feature of the latent clear image. The DSIFT-BoW encoder is denoted as \(\mathrm {D}\). Note that such a roughly recovered image is adequate as an initialization for extracting the BoW feature, but it does not reproduce many pixel-level details.
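The encoding step can be sketched as follows, assuming the dense descriptors have already been extracted and a 200-word codebook has already been built. Hard nearest-word assignment and L1 normalization are assumptions, since the text does not specify the encoder \(\mathrm {D}\) in that detail; the descriptors here are synthetic stand-ins and `bow_histogram` is a hypothetical helper name.

```python
import numpy as np


def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword; return an L1-normalized histogram."""
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distances via |d|^2 - 2 d.c + |c|^2, shape (n_desc, n_words).
    d2 = ((descriptors ** 2).sum(axis=1)[:, None]
          - 2.0 * descriptors @ codebook.T
          + (codebook ** 2).sum(axis=1)[None, :])
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    return hist / max(hist.sum(), 1.0)


rng = np.random.default_rng(0)
codebook = rng.normal(size=(200, 128))     # 200 words; 128-D vectors stand in for SIFT
descriptors = rng.normal(size=(500, 128))  # synthetic stand-in for dense SIFT descriptors
b_x = bow_histogram(descriptors, codebook)
print(b_x.shape, b_x.sum())
```

At prediction time the same function would be applied to descriptors taken from the roughly restored image \(\hat{\mathbf {x}}(\mathbf {y})\) rather than from the degraded observation itself.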
Link the scene coefficient \(f(\mathbf{x})\) and low-level statistics through \(\mathbf{w}(f(\mathbf{x}), \varvec{\theta })\)
Images containing similar scene contents usually follow similar low-level distributions. Based on this observation, for simplicity, we first assume that all images \(\mathbf{x}\) can be grouped into K clusters w.r.t. \(f(\mathbf{x})\). Accordingly, we assign each image to the cluster whose centroid is closest in Euclidean distance. An image belonging to the kth cluster is given the tag k and denoted \(\mathbf{x}^k\). We assume that the training images are drawn from a distribution whose parameters combine K sets of parameters \(\{\mathbf{w}_i^k\}\). In the learning process, we assume that each \(\{\mathbf{w}_i^k\}\) can be learned by fitting the observations belonging to the kth cluster, i.e., \(\{\mathbf{x}^k_i\}\). We denote the centroid of the kth cluster as \(\mathbf{x}'^{k}\). We then approximate each \(\mathbf{w}_i\) as a linear combination of the K principal \(\mathbf{w}_i^k\):
where \(\kappa _k(\cdot , \cdot )\) is a similarity measure on \(f(\mathbf{x})\). We let \(\kappa _k(\cdot , \cdot )\) be a simple Gaussian kernel:
where \(\sigma _k^2\) is the bandwidth of the kernel for the kth cluster. Thanks to this clustering-based representation, the distribution model (1) can be learned efficiently (see the “Learning algorithm” section).
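The displays for (3) and (4) are missing from this copy. Forms consistent with the surrounding text would be \(\mathbf{w}_i = \sum _{k=1}^{K} \kappa _k(f(\mathbf{x}), f(\mathbf{x}'^{k}))\, \mathbf{w}_i^k\) and \(\kappa _k(\mathbf{a}, \mathbf{b}) = \exp (-\Vert \mathbf{a}-\mathbf{b}\Vert ^2 / (2\sigma _k^2))\); both are assumptions. Whether the kernel values are rescaled to sum to one is not stated, so the sketch below makes that an option (enabled by default so that the combined GSM weights remain a valid mixture). All names and sizes are hypothetical.

```python
import numpy as np


def kernel_weights(b, centroids, sigma2, normalize=True):
    """Gaussian-kernel similarity between a scene coefficient b and each cluster centroid."""
    b = np.asarray(b, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    d2 = ((centroids - b) ** 2).sum(axis=1)              # squared distance to each centroid
    log_k = -d2 / (2.0 * np.asarray(sigma2, dtype=float))
    if not normalize:
        return np.exp(log_k)
    k = np.exp(log_k - log_k.max())                      # shift for numerical stability
    return k / k.sum()                                   # assumption: weights sum to one


def combine_expert_weights(kappa, cluster_weights):
    """w_i = sum_k kappa_k * w_i^k, with cluster_weights of shape (K, N, J)."""
    return np.tensordot(kappa, cluster_weights, axes=(0, 0))


K, N, J = 4, 8, 15                                       # hypothetical sizes
rng = np.random.default_rng(0)
centroids = rng.random((K, 200))                         # stand-ins for the centroids b^k
cluster_w = rng.dirichlet(np.ones(J), size=(K, N))       # each w_i^k sums to one
kappa = kernel_weights(centroids[2], centroids, sigma2=np.full(K, 0.05))
w = combine_expert_weights(kappa, cluster_w)             # shape (N, J)
print(kappa, w.shape)
```

With a small bandwidth the combination behaves almost like a hard assignment to the nearest cluster, while a larger bandwidth blends the potentials of several clusters.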
Learning algorithm
In this section, we introduce an efficient learning algorithm that estimates the model parameters from high-quality training samples, and an inference algorithm for image restoration.
Given a set of training images, \(\{\mathbf {x}_t\}_{t=1}^{T}\), the parameters of the model \(\varvec{\Theta }\) and the low-level features \(\{\mathbf {F}_i\}\) are estimated by maximizing the likelihood of the training data. We maximize the likelihood by minimizing the Kullback–Leibler divergence (KLD) between the model distribution and the empirical distribution of the training data.
Substituting (2) into (1), the log-likelihood of the observations is formulated as
Note that in our model, the filters \(\mathbf{F}_i\)’s are shared by all images in K different clusters, while the weights \(\mathbf{w}_i^k\)’s are different for K different clusters.
Relying on the clustering-based definition of \(\mathbf{w}_i\), the whole learning scheme comprises two steps: (1) calculating \(\kappa (f(\mathbf{x}_t), f(\mathbf{x}'^k))\) for all images, and (2) estimating \(\{\mathbf{F}_i\}\) and \(\{\mathbf{w}_i^k\}\). First, we extract dense SIFT features from \(\{\mathbf{x}_t\}\), build the encoder dictionary \(\mathrm {D}\), and generate the DSIFT-BoW features \(\mathbf{b}_{\mathbf{x}}\) for all images. Second, we cluster the images w.r.t. \(\{\mathbf{b}_{\mathbf{x}_t}\}\) using a Gaussian mixture model (GMM). We let \(\mathbf{b}^k\) denote the center of the kth cluster, and set \(\sigma _k^2\) in (4) to the average of the diagonal of the covariance of the corresponding kth Gaussian component. With the clustering result, \(\kappa (f(\mathbf{x}_t), f(\mathbf{x}'^k))\) can be easily computed. Last, we estimate \(\{\mathbf{F}_i\}\) and \(\{\mathbf{w}_i^k\}\) via contrastive divergence (CD) learning with one-step sampling (Hinton 2002) and stochastic gradient descent (SGD) (Bottou 2010). Taking partial derivatives of the energy function (5) w.r.t. the parameters leads to the following update:
where \({\frac{\partial E(\mathbf{x}_t)}{\partial \mathbf {F}_i}}\) is the derivative w.r.t. \(\mathbf{F}_i\) at \(\mathbf{x}_t\), and \(\mathbf {E}_{p(\mathbf{x};\{\mathbf{F}_i\},\varvec{\Theta })}(\mathbf{x})\) is the expectation w.r.t. the model distribution.
An auxiliary-variable-based Gibbs sampler (Schmidt et al. 2010) is used to draw samples from the model distribution, and the expectation is computed by averaging over the samples. The full learning scheme is given in Algorithm 1.
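The update display (6) is missing from this copy. For an energy-based model \(p(\mathbf{x}) \propto \exp (-E(\mathbf{x}))\), the generic gradient-ascent step on the log-likelihood, which matches the two terms described above, can be written as follows. The step size \(\eta \) is a symbol introduced here for illustration, and an analogous step would apply to the weights \(\mathbf{w}_i^k\); this is the standard rule, not necessarily the exact expression of the original:

\[
\mathbf{F}_i \leftarrow \mathbf{F}_i + \eta \left[ \mathbb{E}_{p(\mathbf{x};\varvec{\Theta })}\left[ \frac{\partial E(\mathbf{x})}{\partial \mathbf{F}_i} \right] - \frac{1}{T} \sum _{t=1}^{T} \frac{\partial E(\mathbf{x}_t)}{\partial \mathbf{F}_i} \right]
\]

In contrastive divergence with one-step sampling, the expectation term is approximated with samples obtained by running a single sampler sweep started from the training data, and SGD replaces the sum over all T images with a sum over a mini-batch.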
Applications and experiments
To evaluate the modeling ability of the scene-aware prior directly on real-world images, we evaluate the performance of the learned prior on image denoising and image inpainting. Before the evaluation, we first give some implementation details for learning the scene-aware image prior and describe the learned model. We then revisit the standard Bayesian restoration formulation and derive an MMSE estimation approach for our scene-aware image prior.
Learning details and learning results of the SA-MRF
To learn the MRF model, 450 images from the Berkeley Segmentation Database (BSD500) (Martin et al. 2001) are used as the training dataset. The training images are converted to grayscale. In the learning procedure, the scene coefficients, i.e., the dense BoW vectors, are extracted from each whole image for data clustering. In contrast, only 10 patches of size \(50 \times 50\), cropped randomly from each image, are used to update the low-level MRF model parameters. Note that the test images are excluded from the training dataset. The number of GMM components is set to \(K=4\) in this paper. Before extracting DSIFT for both learning and prediction, images are smoothed with a Gaussian kernel whose standard deviation is 1.0.
With the number of GMM components K set to 4, the training images are split into four sets. We randomly select several representative samples from each cluster and show them in Fig. 4. As shown in Fig. 4, images within the same cluster have similar appearances, whereas images in different clusters have different visual properties. Although the clustering result does not follow the contents strictly, it reflects the low-level properties well. For example, in Fig. 4, the cluster on the left contains many clear and flat background areas, and the bottom-right one has more complex textures and clutter. The clustering result provides a useful intermediate result that lets the algorithm learn diverse and meaningful features and distributions. The learned filters and four sets of experts (potential functions) are shown in Fig. 3. Figure 3a shows the learned filters, and Fig. 3b–e show the learned weights and the curves of the potential functions for the four clusters, respectively. Compared with the learning result in Schmidt et al. (2010), our filters vary over a wider range, and the experts have spikier peaks and heavier tails, which confine the favored images to a narrower region.
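The data preparation described above can be sketched as follows. The luma weights used for the grayscale conversion are an assumption (the text only says “grayscale”), the helper names are hypothetical, and a random array stands in for a BSD500 image.

```python
import numpy as np


def to_grayscale(rgb):
    """Convert an RGB image to grayscale with ITU-R BT.601 luma weights (assumed)."""
    return np.asarray(rgb, dtype=float)[..., :3] @ np.array([0.299, 0.587, 0.114])


def random_patches(image, num_patches=10, size=50, rng=None):
    """Crop num_patches square patches of side `size` at uniformly random positions."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    if h < size or w < size:
        raise ValueError("image is smaller than the patch size")
    rows = rng.integers(0, h - size + 1, size=num_patches)
    cols = rng.integers(0, w - size + 1, size=num_patches)
    return np.stack([image[r:r + size, c:c + size] for r, c in zip(rows, cols)])


rng = np.random.default_rng(0)
rgb = rng.random((321, 481, 3))           # synthetic stand-in for one training image
gray = to_grayscale(rgb)
patches = random_patches(gray, num_patches=10, size=50, rng=rng)
print(gray.shape, patches.shape)
```

The scene coefficient would still be computed from the whole image; only the update of the filters and potential weights uses the cropped patches.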
Bayesian image restoration formulation
An observed image \(\mathbf {y}\) is assumed to be degraded from a latent high-quality image \(\mathbf {x}\), whose distribution follows the model in Eq. (1) conditioned on the scene coefficient \(f(\mathbf{x})\). The restoration algorithm consists of two steps: (1) generating \(\mathbf {b}_{\hat{\mathbf{x}}(\mathbf{y})}\) as described in the “Potential function conditioned on scene coefficient” section with the learned DSIFT-BoW encoder \(\mathrm {D}\), and calculating \(\kappa (f(\hat{\mathbf{x}}(\mathbf{y})), f(\mathbf{x}'^k))\), and (2) estimating \(\hat{\mathbf {x}}\) with the MRF model by computing the Bayesian minimum mean squared error (MMSE) estimate through Gibbs sampling:
where \(p(\mathbf {x} \mid \mathbf {y}; \{\mathbf {F}_i\}, \mathbf {w})\) denotes the posterior distribution built on the image prior from the MRF model in Eq. (1) with the calculated \(\kappa (f(\hat{\mathbf{x}}(\mathbf{y})), f(\mathbf{x}'^k))\). A posterior version of the auxiliary-variable Gibbs sampler is used to sample the GSM scales and the latent image alternately (Schmidt et al. 2010).
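The display for the MMSE estimate is missing from this copy. By definition, the MMSE estimate is the posterior mean, \(\hat{\mathbf {x}} = \mathbb{E}[\mathbf {x} \mid \mathbf {y}]\), which a sampler approximates by averaging posterior samples. The sketch below shows only that averaging step on a toy per-pixel Gaussian model whose posterior is known in closed form, so the result can be checked; it does not implement the SA-MRF sampler, and the burn-in length is a hypothetical choice.

```python
import numpy as np


def mmse_from_samples(samples, burn_in=0):
    """Approximate E[x | y] by the mean of posterior samples after discarding burn-in."""
    samples = np.asarray(samples, dtype=float)
    if burn_in >= samples.shape[0]:
        raise ValueError("burn_in must be smaller than the number of samples")
    return samples[burn_in:].mean(axis=0)


# Toy model: prior x ~ N(0, tau2), observation y = x + n with n ~ N(0, sigma2), per pixel.
rng = np.random.default_rng(0)
tau2, sigma2 = 4.0, 1.0
y = rng.normal(size=(8, 8)) * np.sqrt(tau2 + sigma2)
post_mean = y * tau2 / (tau2 + sigma2)            # closed-form posterior mean
post_var = tau2 * sigma2 / (tau2 + sigma2)        # closed-form posterior variance
samples = post_mean + np.sqrt(post_var) * rng.normal(size=(2000, 8, 8))
x_hat = mmse_from_samples(samples, burn_in=100)
print(np.abs(x_hat - post_mean).max())
```

With a Gibbs sampler the samples are correlated, so more of them are typically needed for the same accuracy than in this independent-sample toy case.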
Evaluation on image denoising
We compare our denoising results with the reconstructions obtained with the state-of-the-art generative MRF prior of Schmidt et al. (2010) and with another widely used denoising technique, BM3D (Dabov et al. 2007), using the peak signal-to-noise ratio (PSNR) and the grayscale structural similarity (SSIM) (Wang et al. 2004). Denoising results on 10 test images from BSD500 are reported. We consider Gaussian noise at two levels, \(\sigma =10\) and \(\sigma =25\) (Schmidt et al. 2010); these noise levels refer to images on the scale [0, 255]. Because of space limitations, we only show the 10 noisy images with \(\sigma =25\) in Fig. 5.
Figure 6 shows the numerical comparison of denoising results among the BM3D method (Dabov et al. 2007), the learned image priors of Schmidt et al. (2010) and Schmidt and Roth (2014), and the proposed scene-aware prior. For fairness, we use the version of the model in Schmidt and Roth (2014) with \(3\times 3\) filters, which matches the setting of the other learning-based methods [the MRF model in Schmidt et al. (2010) and the proposed method]. The scene-aware prior is consistently superior on nearly all test images, and the two MRF-prior-based methods perform better than BM3D. When the noise level is \(\sigma =25\), the results of the scene-aware prior exceed those of the other methods, because the scene-aware prior model captures the statistical characteristics of natural images more precisely. When real details and textures are heavily degraded, the scene-aware prior is more beneficial for recovering texture details and hallucinating the lost information. Figure 7 shows denoising results on the test image goat: the proposed method recovers the details better and avoids the fake texture that appears in the results of Schmidt et al. (2010). Without comparison with the ground truth, these fake details could easily be taken for real details, because they match the average human notion of a “natural” image. Figure 8 compares the results recovered by the generative MRF prior and by our scene-aware prior; the scene-aware prior recovers more detail.
When the noise level is low (\(\sigma =10\)), the performance of the scene-aware prior is very similar to that of Schmidt et al. (2010). A possible explanation is that some noise is always hard to remove, and both the prior of Schmidt et al. (2010) and ours reach the inherent limit of this family of algorithms. A visual example is shown in Fig. 9, where our result is much closer to the ground truth than the others. Hence, although the scene-aware prior and the MRF model learned in Schmidt et al. (2010) are close to each other in the numerical evaluation, the proposed method achieves more natural and accurate results, which illustrates the power of the scene-aware concept in prior learning.
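For reference, PSNR on the [0, 255] scale can be computed as in the sketch below, which uses the standard definition \(10\log _{10}(255^2/\mathrm{MSE})\). The synthetic image and the absence of clipping are assumptions made for illustration. For unclipped Gaussian noise with \(\sigma =25\), the value should be close to \(20\log _{10}(255/25) \approx 20.17\) dB, and for \(\sigma =10\) close to 28.13 dB.

```python
import numpy as np


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB for images on a [0, peak] intensity scale."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)


rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(256, 256)).astype(float)  # synthetic stand-in image
noisy = clean + rng.normal(scale=25.0, size=clean.shape)     # sigma = 25, no clipping
noisy_psnr = psnr(clean, noisy)
print(noisy_psnr)
```

Comparing the PSNR of a restored image against this noisy-input baseline gives the gain attributable to the denoiser.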
Evaluation on image inpainting
Image inpainting aims to recover a high-quality image from a degraded image in which some pixels are lost or deteriorated. In addition to the image denoising task, we also test the proposed method on image inpainting in this section. As shown in Fig. 10, the input image in this experiment has a region that is creased and deteriorated due to folding. Given a binary mask indicating the deteriorated pixels, the MRF model of Schmidt et al. (2010) and the proposed scene-aware prior both recover an intact image well. The result of the proposed method, however, is more natural, especially at the pixels near the creases in the original image. Since no ground-truth image is available for this real-world deteriorated image, only a visual comparison is shown in Fig. 10.
Conclusion and future work
The proposed high-order MRF-based scene-aware image prior models the low-level distribution of an image conditioned on the high-level scene characteristics of the observation, and improves the restoration of degraded observations. Experimental results demonstrate that the proposed method can generate desirable restoration results.
Our work provides a possible path for bridging low-level prior learning and high-level concepts, and shows that the idea can achieve better results than state-of-the-art methods in similar settings. At the same time, it opens up several possible directions for future research:

Our proposed model learns low-level features in a small local area and uses the simple DSIFT-BoW representation to express the high-level scene concept, which restricts the expressive ability of the model. Combining the proposed method with deep convolutional neural networks (Bengio et al. 2013) might enable greater expressive ability.

We evaluated the effectiveness of the proposed method on image denoising and inpainting tasks. In the future, we may extend this work to more applications, including image super-resolution, image deblurring, and optical flow.
Notes
1. In model (1), if the \(\mathbf{F}_i\)’s are \(l\times l\) filters, each \(\mathbf{x}_c\) is an \(l\times l\) subvector of \(\mathbf{x}\).
2. We slightly abuse the notation \(\{\mathbf {w}_i\}_{i=1}^N\) to denote both the parameters of the potentials and the functions \(\mathbf{w}_i(f(\mathbf {x}), \varvec{\theta })\).
References
Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828
Bottou L (2010) Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT’2010. Springer, Berlin, pp 177–186
Chen Y, Yu W, Pock T (2015) On learning optimized reaction diffusion processes for effective image restoration. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 5261–5269
Cho TS, Joshi N, Zitnick CL, Kang SB, Szeliski R, Freeman WT (2010) A content-aware image prior. In: IEEE conference on computer vision and pattern recognition (CVPR)
Dabov K, Foi A, Katkovnik V, Egiazarian K (2007) Image denoising by sparse 3-D transform-domain collaborative filtering. In: IEEE transactions on image processing
Feng W, Qiao H, Chen Y (2016) Poisson noise reduction with higher-order natural image prior model. SIAM J Imaging Sci 9(3):1502–1524
Fergus R, Singh B, Hertzmann A, Roweis ST, Freeman WT (2006) Removing camera shake from a single photograph. In: ACM transactions on graphics (TOG)
Gong D, Tan M, Zhang Y, van den Hengel A, Shi Q (2016) Blind image deconvolution by automatic gradient activation. In: IEEE conference on computer vision and pattern recognition (CVPR)
Gong D, Yang J, Liu L, Zhang Y, Reid I, Shen C et al (2017) From motion blur to motion flow: a deep learning solution for removing heterogeneous motion blur. In: The IEEE conference on computer vision and pattern recognition (CVPR)
Hinton GE (2002) Training products of experts by minimizing contrastive divergence. Neural computation
Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT Press, Cambridge
Krishnan D, Fergus R (2009) Fast image deconvolution using hyper-Laplacian priors. In: NIPS
Krishnan D, Tay T, Fergus R (2011) Blind deconvolution using a normalized sparsity measure. In: IEEE conference on computer vision and pattern recognition (CVPR)
Levin A, Fergus R, Durand F, Freeman WT (2007) Image and depth from a conventional camera with a coded aperture. ACM Trans Gr 26:70
Levin A, Weiss Y, Durand F, Freeman WT (2009) Understanding and evaluating blind deconvolution algorithms. In: IEEE conference on computer vision and pattern recognition (CVPR)
Li LJ, Socher R, Fei-Fei L (2009) Towards total scene understanding: classification, annotation and segmentation in an automatic framework. In: IEEE conference on computer vision and pattern recognition, 2009. CVPR 2009. IEEE, New York
Li LJ, Su H, Fei-Fei L, Xing EP (2010) Object bank: a high-level image representation for scene classification & semantic feature sparsification. In: Advances in neural information processing systems
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
Martin D, Fowlkes C, Tal D, Malik J (2001) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: IEEE international conference on computer vision (ICCV)
McAuley JJ, Caetano TS, Smola AJ, Franz MO (2006) Learning high-order MRF priors of color images. In: Proceedings of the 23rd international conference on machine learning. ACM, New York, pp 617–624
Oliva A, Torralba A (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope. Int J Comput Vis 42(3):145–175
Ren J, Liu J, Guo Z (2013) Context-aware sparse decomposition for image denoising and super-resolution. IEEE Trans Image Process 22(4):1456–1469
Roth S, Black MJ (2005) Fields of experts: a framework for learning image priors. In: IEEE conference on computer vision and pattern recognition (CVPR)
Samuel KGG, Tappen MF (2009) Learning optimized MAP estimates in continuously-valued MRF models. In: IEEE conference on computer vision and pattern recognition (CVPR)
Schmidt U, Gao Q, Roth S (2010) A generative perspective on MRFs in low-level vision. In: IEEE conference on computer vision and pattern recognition (CVPR)
Schmidt U, Jancsary J, Nowozin S, Roth S, Rother C (2014) Cascades of regression tree fields for image restoration
Schmidt U, Roth S (2014) Shrinkage fields for effective image restoration. In: IEEE conference on computer vision and pattern recognition (CVPR)
Sonka M, Hlavac V, Boyle R (2014) Image processing, analysis, and machine vision. Cengage Learning, Boston
Sun J, Zhu J, Tappen MF (2010) Context-constrained hallucination for image super-resolution. In: IEEE conference on computer vision and pattern recognition (CVPR)
Tappen MF, Liu C (2012) A Bayesian approach to alignment-based image hallucination. In: ECCV
Tappen MF, Liu C, Adelson EH, Freeman WT (2007) Learning Gaussian conditional random fields for low-level vision. In: IEEE conference on computer vision and pattern recognition (CVPR)
Torralba A, Oliva A (2003) Statistics of natural image categories. Netw Comput Neural Syst 14:391–412
Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. In: IEEE transactions on image processing
Weiss Y, Freeman WT (2007) What makes a good model of natural images? In: IEEE conference on computer vision and pattern recognition (CVPR)
Wu Z, Lin D, Tang X (2016) Deep Markov random field for image modeling. In: European conference on computer vision. Springer, Berlin, pp 295–312
Xu L, Zheng S, Jia J (2013) Unnatural l0 sparse representation for natural image deblurring. In: IEEE conference on computer vision and pattern recognition (CVPR). IEEE, New York, pp 1107–1114
Zhang H, Wipf D, Zhang Y (2013) Multi-image blind deblurring using a coupled adaptive sparse prior. In: IEEE conference on computer vision and pattern recognition (CVPR)
Authors' contributions
DG drafted the manuscript. YZ, QY, and HL participated in its design and coordination and/or helped to revise the manuscript. All authors read and approved the final manuscript.
Acknowledgements
This work was supported in part by National Natural Science Foundation of China (61231016, 61572405), China 863 Project 2015AA016402.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Gong, D., Zhang, Y., Yan, Q. et al. Learning scene-aware image priors with high-order Markov random fields. Appl Inform 4, 12 (2017) doi:10.1186/s40535-017-0039-0
Keywords
 Image restoration
 Markov random fields
 Scene-aware image prior learning