| Dataset | Method | Task | | |
|---|---|---|---|---|
| | | Image\(\rightarrow\)Text | Text\(\rightarrow\)Image | Average |
| Wikipedia dataset | CCA | 0.176 | 0.178 | 0.177 |
| | CFA | 0.330 | 0.306 | 0.318 |
| | KCCA (Poly) | 0.230 | 0.224 | 0.227 |
| | KCCA (Gaussian) | 0.357 | 0.328 | 0.343 |
| | Bimodal AE | 0.301 | 0.267 | 0.284 |
| | Multimodal DBN | 0.204 | 0.145 | 0.175 |
| | Corr-AE | 0.373 | 0.357 | 0.365 |
| | JRL | 0.408 | 0.353 | 0.381 |
| | LGCFL | 0.416 | 0.360 | 0.388 |
| | CMDN | 0.409 | 0.364 | 0.387 |
| | Deep-SM | 0.458 | 0.345 | 0.402 |
| | RCN (OnlyCorrelation) | 0.465 | 0.407 | 0.436 |
| | Our RCN | 0.489 | 0.418 | 0.454 |
| NUS-WIDE-10k dataset | CCA | 0.159 | 0.189 | 0.174 |
| | CFA | 0.299 | 0.301 | 0.300 |
| | KCCA (Poly) | 0.129 | 0.157 | 0.143 |
| | KCCA (Gaussian) | 0.295 | 0.162 | 0.229 |
| | Bimodal AE | 0.234 | 0.376 | 0.305 |
| | Multimodal DBN | 0.178 | 0.144 | 0.161 |
| | Corr-AE | 0.306 | 0.340 | 0.323 |
| | JRL | 0.410 | 0.444 | 0.427 |
| | LGCFL | 0.408 | 0.374 | 0.391 |
| | CMDN | 0.410 | 0.450 | 0.430 |
| | Deep-SM | 0.389 | 0.496 | 0.443 |
| | RCN (OnlyCorrelation) | 0.360 | 0.406 | 0.383 |
| | Our RCN | 0.497 | 0.517 | 0.507 |
| Pascal Sentences dataset | CCA | 0.110 | 0.116 | 0.113 |
| | CFA | 0.341 | 0.308 | 0.325 |
| | KCCA (Poly) | 0.271 | 0.280 | 0.276 |
| | KCCA (Gaussian) | 0.312 | 0.329 | 0.321 |
| | Bimodal AE | 0.404 | 0.447 | 0.426 |
| | Multimodal DBN | 0.438 | 0.363 | 0.401 |
| | Corr-AE | 0.411 | 0.475 | 0.443 |
| | JRL | 0.416 | 0.377 | 0.397 |
| | LGCFL | 0.381 | 0.435 | 0.408 |
| | CMDN | 0.458 | 0.444 | 0.451 |
| | Deep-SM | 0.440 | 0.414 | 0.427 |
| | RCN (OnlyCorrelation) | 0.433 | 0.443 | 0.438 |
| | Our RCN | 0.472 | 0.453 | 0.463 |
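
The Average column appears to be the arithmetic mean of the Image\(\rightarrow\)Text and Text\(\rightarrow\)Image scores, rounded to three decimals. The following is a minimal sketch of that consistency check on a few rows copied from the Wikipedia block; the helper name `average_is_consistent` and the tolerance `tol` are illustrative assumptions, not part of the original work.

```python
# Rows copied from the Wikipedia block of the table:
# (method, image->text score, text->image score, reported average).
rows = [
    ("CCA", 0.176, 0.178, 0.177),
    ("Deep-SM", 0.458, 0.345, 0.402),
    ("Our RCN", 0.489, 0.418, 0.454),
]


def average_is_consistent(img2txt, txt2img, reported, tol=0.0005):
    """Return True if `reported` matches the mean of the two scores.

    `tol` (an assumed value) allows for rounding to three decimals;
    the extra 1e-9 absorbs floating-point noise at the boundary.
    """
    return abs((img2txt + txt2img) / 2 - reported) <= tol + 1e-9


for method, i2t, t2i, avg in rows:
    print(method, average_is_consistent(i2t, t2i, avg))
```

A tolerance is used instead of `round` because the reported averages may have been computed from unrounded scores, so an exact match on the rounded inputs is not guaranteed.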