A spatial-constrained multi-target regression model for human brain activity prediction
 Zhenfu Wen^{1, 2} and
 Yuanqing Li^{1, 2}
DOI: 10.1186/s40535-016-0026-x
© The Author(s) 2016
Received: 14 September 2016
Accepted: 9 November 2016
Published: 24 November 2016
Abstract
Analyzing functional magnetic resonance imaging (fMRI) data from the encoding perspective provides a powerful tool for exploring human vision. Using voxel-wise encoding models, previous studies have successfully predicted the brain activity evoked by external stimuli. However, these studies constructed a regularized regression model for each voxel separately, which overlooks the intrinsic spatial structure of fMRI data. In this work, we propose a multi-target regression model that predicts the activities of adjacent voxels simultaneously. Unlike the previous models, our model incorporates a spatial constraint. The effectiveness of the proposed model is demonstrated by comparing it with two state-of-the-art voxel-wise models on a publicly available dataset. The results indicate that the proposed method predicts voxel responses more accurately than the competing methods.
Keywords
fMRI, Encoding, Spatial constraint, Multi-target regression
Background
One important goal of neuroscience is to understand the relationship between external visual stimuli and human brain activity. We can gain this understanding by analyzing fMRI data from the complementary perspectives of neural decoding and neural encoding (Naselaris et al. 2011). In neural decoding, we attempt to predict information about the stimuli from measured brain activity. Numerous studies have explored human vision using decoding models (Haxby et al. 2001, 2014; Norman et al. 2006). Conversely, in neural encoding, we model how brain activity varies with the external stimulus and attempt to predict brain activity from stimulus features. Previous studies have indicated that encoding models are more efficient than decoding models in describing the function of brain areas (Naselaris et al. 2011), which suggests an advantage of analyzing fMRI data from the encoding perspective.
In recent years, voxel-based encoding models have been proposed and have attracted much attention (Kay et al. 2008). A typical encoding model can be divided into two parts. The first part finds a feature space that describes the external stimulus. The second part constructs regression models that use the stimulus features to predict the corresponding brain activity. Much effort has been devoted to finding ways to represent the stimulus images. Previous studies used the Gabor wavelet pyramid model (Kay et al. 2008; Vu et al. 2011), a two-layer sparse coding model (Güçlü and van Gerven 2014), and convolutional neural networks (Agrawal et al. 2014) to extract features that represent natural images effectively. However, fewer studies have focused on constructing efficient regression models.
In the regression part of encoding, regularized linear regression models such as lasso (Kay et al. 2008), ridge regression (Güçlü and van Gerven 2014) and the graph-constrained elastic net (Kay et al. 2008; Schoenmakers et al. 2013) have been most commonly used. Recently, a more advanced sparse nonparametric regression model was proposed (Vu et al. 2011). Although these models predict brain activity successfully, one drawback of the voxel-wise models in previous studies is that the response of each voxel is modeled separately; thus, the estimated parameters of different voxels are independent. As a result, these regression models cannot fully exploit the correlations between voxels and brain regions. Numerous studies have indicated the benefits of taking the spatial smoothness of fMRI data into account. For example, in decoding models, considering the spatial structure of the data yields higher decoding accuracies and more informative and interpretable results (Michel et al. 2011; de Brecht and Yamagishi 2012). In functional brain mapping, combining local brain activity often results in more consistent patterns across subjects (Kriegeskorte et al. 2006). All these results suggest that the spatial structure of fMRI data should also be considered in encoding models.
In this paper, we focus on the construction of the regression model in encoding models, i.e., given the features of the external stimulus images, we construct a regression model that predicts internal brain activity efficiently. We exploit the spatial smoothness of fMRI data and construct a multi-target linear regression model (Evgeniou and Pontil 2004; Argyriou et al. 2008) in which the activities of locally adjacent voxels are predicted simultaneously, and we propose a spatial constraint to restrict the model parameters. To demonstrate the effectiveness of this model, we compare the brain activity prediction performance of the proposed method with that of two state-of-the-art voxel-wise models on a public fMRI dataset.
Methods
Data description
The publicly available fMRI data (Kay et al. 2011) were used for model validation; this dataset is widely used for comparing models (Güçlü and van Gerven 2014; Naselaris et al. 2009; Agrawal et al. 2014), and detailed information on the experiment is available in the original papers (Kay et al. 2008; Naselaris et al. 2009). The fMRI responses were recorded while human subjects viewed grayscale natural images and fixated on a central white square. Two subjects took part in the experiments. They viewed 1750 training images (for encoding model training), each presented twice, and 120 validation images (for encoding model testing), each presented ten times. For each subject, the data were acquired in five scanner sessions on five different days. Each scan session consisted of five training runs, each lasting 11 min, and two validation runs, each lasting 12 min.
Brain activity in the occipital cortex was recorded at a spatial resolution of 2 mm × 2 mm × 2.5 mm and a temporal resolution of 1 s using a 4T INOVA MR scanner (Varian, Inc.). Brain volumes were co-registered to correct for head movements, and the time-series data were deconvolved to account for the delay in the hemodynamic response (Friston et al. 1994). Thus, after preprocessing, each stimulus image corresponds to one brain volume. The voxels in early visual areas were further divided into visual area one (V1), visual area two (V2), and visual area three (V3). In this study, we only considered brain activity prediction in these areas.
Problem formulation
In a standard regression framework, the design matrix \(X \in \mathbb {R}^{N\times M}\) is formed by the \(1\times M\) feature vectors \(x_{s},s=1,2,\dots ,N\) of N samples. The goal is to predict the \(N\times 1\) target vector y, which contains the target values corresponding to the \(x_{s}\). In this work, the design matrix comprises the features of N stimulus images, and the target vector is composed of the intensities of a voxel, with each intensity corresponding to one image feature vector. The problem is thus to find a model that accurately predicts voxel activity in response to stimuli.
Voxel-wise models
To determine the optimal \(\lambda _{v}\) in the models, we conduct a nested three-fold cross-validation and, for each voxel v, choose the model that maximizes the correlation between \(Xb_{v}\) and \(y_{v}\) on held-out data. As in a previous study (Schoenmakers et al. 2013), we sample \(\lambda _{v}\) in the range \((10^{-5},10^{5})\) on a log scale. For convenience of discussion, we refer to the voxel-wise model with the ridge regularizer as Ridge and the model with the lasso regularizer as Lasso.
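The selection of \(\lambda _{v}\) described above can be illustrated with a minimal sketch. It assumes the standard closed-form ridge solution \(b_{v}=(X^{T}X+\lambda _{v}I)^{-1}X^{T}y_{v}\) and shows only a single three-fold loop (the inner loop of a nested scheme) on synthetic data. The function names `ridge_fit` and `select_lambda`, the 11-point grid, and the data are illustrative assumptions, not the authors' code.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution b = (X^T X + lam * I)^{-1} X^T y."""
    M = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(M), X.T @ y)

def select_lambda(X, y, lambdas, n_folds=3, seed=0):
    """Return the lambda with the highest mean held-out correlation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(X.shape[0]), n_folds)
    scores = []
    for lam in lambdas:
        fold_corrs = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            b = ridge_fit(X[train], y[train], lam)
            # Correlation between predicted and measured held-out responses
            fold_corrs.append(np.corrcoef(X[test] @ b, y[test])[0, 1])
        scores.append(np.mean(fold_corrs))
    return lambdas[int(np.argmax(scores))]

# Synthetic stand-in for one voxel: 90 stimuli, 20 features (assumed sizes)
rng = np.random.default_rng(1)
X = rng.standard_normal((90, 20))
y = X @ rng.standard_normal(20) + 0.5 * rng.standard_normal(90)

lambdas = np.logspace(-5, 5, 11)      # log-scale grid over (1e-5, 1e5)
best_lam = select_lambda(X, y, lambdas)
b_hat = ridge_fit(X, y, best_lam)     # refit on all training data
```

In a full nested scheme, this selection would be repeated inside each outer training split, so that the data used to report accuracy are never used to choose \(\lambda _{v}\).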
Proposed model
The voxel-wise models proposed in previous studies construct a regression model for each voxel separately and ignore the dependencies between voxels. However, fMRI data are often spatially smooth, and voxels from the same local brain area often exhibit similar properties. To improve the performance of brain activity prediction, we exploit the spatial smoothness of fMRI data and construct a multi-target regression model.
Model estimation
This is a Sylvester equation, with \(B_v\) the unknown parameter matrix to be determined. The equation can be solved efficiently (Bartels and Stewart 1972). As in the estimation of the Ridge and Lasso models, we used a nested three-fold cross-validation to determine \(\lambda _1\) and \(\lambda _2\) in the range \((10^{-5},10^{5})\) on a log scale.
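To make the Sylvester structure concrete, the sketch below solves one plausible instance. It assumes (this form is not stated in the text above) an objective \(\Vert Y-XB\Vert _F^2+\lambda _1\Vert B\Vert _F^2+\lambda _2\,\mathrm {tr}(BLB^{T})\), where the columns of B are the weight vectors of K neighbouring voxels and L is a graph Laplacian over those voxels. Setting the gradient to zero gives \((X^{T}X+\lambda _1 I)B+\lambda _2 BL=X^{T}Y\), which has the form \(AB+BC=Q\). The function name `fit_spatial_multitarget`, the chain-shaped neighbourhood, and the data are hypothetical; `scipy.linalg.solve_sylvester` implements the Bartels-Stewart algorithm.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def fit_spatial_multitarget(X, Y, L, lam1, lam2):
    """Solve (X^T X + lam1 I) B + lam2 B L = X^T Y for B (M x K).

    Assumed objective (illustrative, not taken from the paper):
    ||Y - X B||_F^2 + lam1 ||B||_F^2 + lam2 tr(B L B^T).
    """
    M = X.shape[1]
    A = X.T @ X + lam1 * np.eye(M)   # M x M, positive definite for lam1 > 0
    C = lam2 * L                     # K x K, positive semidefinite
    Q = X.T @ Y                      # M x K
    return solve_sylvester(A, C, Q)  # solves A B + B C = Q

# Hypothetical neighbourhood: 4 voxels connected in a chain
K = 4
W = np.zeros((K, K))
for i in range(K - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W       # graph Laplacian of the chain

# Synthetic stimuli features and voxel responses (assumed sizes)
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
Y = X @ rng.standard_normal((8, K)) + 0.1 * rng.standard_normal((50, K))

lam1, lam2 = 1.0, 10.0
B = fit_spatial_multitarget(X, Y, L, lam1, lam2)

# Residual of the stationarity condition; should be numerically zero
residual = (X.T @ X + lam1 * np.eye(8)) @ B + lam2 * B @ L - X.T @ Y
```

Under this assumed objective, setting \(\lambda _2=0\) removes the coupling between voxels and the solution reduces to an independent ridge fit per column, which is a convenient sanity check.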
Prediction
Implementation details
Results and discussion
Table 1 Percentage of voxels that survived an \(R^2\) threshold of 0.1 for different models
Models    Subject 1                    Subject 2
          V1 (%)   V2 (%)   V3 (%)     V1 (%)   V2 (%)   V3 (%)
Ridge     25.50    16.61    6.09       12.29    7.78     2.20
Lasso     25.73    16.13    5.75       12.29    7.25     2.09
Proposed  29.44    21.84    10.22      16.30    11.11    4.29
Table 1 lists the percentage of voxels that survived an \(R^2\) threshold of 0.1 for the different models in brain areas V1, V2, and V3; the activity of these voxels is considered well predicted. For all models, the performance in V1 is better than in V2 and V3. For subject 1, the percentage of surviving voxels decreased from about 29% in V1 to about 10% in V3 with the proposed method, whereas for the voxel-wise models (Ridge and Lasso) it decreased from about 25% in V1 to about 6% in V3. A similar trend is observed for subject 2, although the performance is not as good as for subject 1.
Fig. 1 compares the mean \(R^2\) of the different models across the surviving voxels. The mean \(R^2\) of the proposed method is about 0.26 in V1 and decreases to 0.19 in V3. In contrast, the mean \(R^2\) values of the voxel-wise Ridge and Lasso models are similar to each other, decreasing from 0.24 in V1 to 0.17 in V3.
Figures 2 and 3 compare the performance of the different models across voxels and brain areas. Figure 2 shows the distribution of the prediction \(R^2\) for the surviving voxels. For most values of \(R^2\), the proposed method yields more voxels than the Ridge and Lasso models. The prediction \(R^2\) values for all voxels are displayed in Fig. 3, where points above the diagonal indicate that the model on the y-axis outperforms the one on the x-axis. In each brain area, most voxels are better predicted by the proposed model than by the traditional voxel-wise models.
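The quantity reported in Table 1 can be sketched as follows, assuming that \(R^2\) is computed per voxel as the usual coefficient of determination, \(1-SS_{res}/SS_{tot}\), on the validation responses. The function names and the toy data are illustrative only and are not taken from the dataset.

```python
import numpy as np

def r2_per_voxel(Y_true, Y_pred):
    """Coefficient of determination for each voxel (column)."""
    ss_res = np.sum((Y_true - Y_pred) ** 2, axis=0)
    ss_tot = np.sum((Y_true - Y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot

def percent_surviving(r2, threshold=0.1):
    """Percentage of voxels whose R^2 exceeds the threshold."""
    return 100.0 * np.mean(r2 > threshold)

# Toy example: 4 validation images, 2 voxels
Y_true = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
# Voxel 0 is predicted perfectly; voxel 1 is predicted by its mean
Y_pred = np.column_stack([Y_true[:, 0], np.full(4, 2.5)])

r2 = r2_per_voxel(Y_true, Y_pred)   # R^2 of 1 for voxel 0, 0 for voxel 1
pct = percent_surviving(r2)         # one of two voxels survives: 50.0
```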
Conclusions
In this paper, we proposed a multi-target regression model to predict brain activity while subjects view grayscale images. Based on the hypothesis that the properties of a voxel are similar to those of its local neighbors, we constructed a spatial constraint on the model parameters. The parameters can be estimated efficiently. We showed that the proposed method achieves better prediction performance on a public fMRI dataset than the voxel-wise ridge and lasso models. The prediction \(R^2\) of the proposed model was higher than that of the voxel-wise models, and more voxels survived an \(R^2\) threshold of 0.1. These results suggest the benefit of considering the spatial structure of fMRI data in encoding models.
Abbreviations
 fMRI: functional magnetic resonance imaging
 \(R^{2}\): the coefficient of determination
 V1: visual area one
 V2: visual area two
 V3: visual area three
 PCA: principal component analysis
Declarations
Authors' contributions
ZW and YL participated in the design of the study, performed the data analysis, and drafted the manuscript. Both authors read and approved the final manuscript.
Acknowledgements
This work was supported by the National Key Basic Research Program of China (973 Program) under the Grant 2015CB351703, the National Natural Science Foundation of China under the Grants 61633010, 91420302, and 61573150, and Guangdong Natural Science Foundation under the Grant 2014A030312005.
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
 Agrawal P, Stansbury D, Malik J, Gallant JL (2014) Pixels to voxels: modeling visual representation in the human brain. arXiv preprint arXiv:1407.5104
 Argyriou A, Evgeniou T, Pontil M (2008) Convex multi-task feature learning. Mach Learn 73(3):243–272
 Bartels RH, Stewart G (1972) Solution of the matrix equation AX + XB = C [F4]. Commun ACM 15(9):820–826
 Bishop CM (2006) Pattern recognition and machine learning. Springer, New York
 de Brecht M, Yamagishi N (2012) Combining sparseness and smoothness improves classification accuracy and interpretability. Neuroimage 60(2):1550–1561
 Evgeniou T, Pontil M (2004) Regularized multi-task learning. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, pp 109–117
 Friston KJ, Holmes AP, Worsley KJ, Poline J, Frith CD, Frackowiak RS (1994) Statistical parametric maps in functional imaging: a general linear approach. Hum Brain Mapp 2(4):189–210
 Güçlü U, van Gerven MA (2014) Unsupervised feature learning improves prediction of human brain activity in response to natural images. PLoS Comput Biol 10:e1003724
 Haxby JV, Connolly AC, Guntupalli JS (2014) Decoding neural representational spaces using multivariate pattern analysis. Annu Rev Neurosci 37:435–456
 Haxby JV, Gobbini MI, Furey ML, Ishai A, Schouten JL, Pietrini P (2001) Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293(5539):2425–2430
 Jones JP, Palmer LA (1987) An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. J Neurophysiol 58(6):1233–1258
 Kay KN, Naselaris T, Prenger RJ, Gallant JL (2008) Identifying natural images from human brain activity. Nature 452(7185):352–355
 Kay K, Naselaris T, Gallant J (2011) fMRI of human visual areas in response to natural images. http://CRCNS.org/. Accessed 18 June 2015
 Kriegeskorte N, Goebel R, Bandettini P (2006) Information-based functional brain mapping. Proc Natl Acad Sci USA 103(10):3863–3868
 Michel V, Gramfort A, Varoquaux G, Eger E, Thirion B (2011) Total variation regularization for fMRI-based prediction of behavior. IEEE Trans Med Imaging 30(7):1328–1340
 Naselaris T, Kay KN, Nishimoto S, Gallant JL (2011) Encoding and decoding in fMRI. Neuroimage 56(2):400–410
 Naselaris T, Prenger RJ, Kay KN, Oliver M, Gallant JL (2009) Bayesian reconstruction of natural images from human brain activity. Neuron 63(6):902–915
 Norman KA, Polyn SM, Detre GJ, Haxby JV (2006) Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends Cogn Sci 10(9):424–430
 Schoenmakers S, Barth M, Heskes T, van Gerven M (2013) Linear reconstruction of perceived images from human brain activity. Neuroimage 83:951–961
 Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288
 Vu VQ, Ravikumar P, Naselaris T, Kay KN, Gallant JL, Yu B (2011) Encoding and decoding V1 fMRI responses to natural images with sparse nonparametric models. Ann Appl Stat 5(2B):1159