

Cross-Media Retrieval based on Pseudo-Label Learning and Semantic Consistency Algorithm

Volume 14, Number 9, September 2018, pp. 2219-2229
DOI: 10.23940/ijpe.18.09.p31.22192229

Gongwen Xu (a), Zhiqi Sang (b), and Zhijun Zhang (c)

(a) School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
(b) College of Architecture and Urban Planning, Shandong Jianzhu University, Jinan, 250101, China
(c) School of Computer Science and Technology, Shandong Jianzhu University, Jinan, 250101, China

(Submitted on June 13, 2018; Revised on July 24, 2018; Accepted on August 30, 2018)


Many algorithms have been proposed for retrieving heterogeneous multimodal data that share the same semantics, and the organization and analysis of heterogeneous data have become a focus of intensive research. Here, a new and efficient cross-media retrieval algorithm is proposed based on pseudo-label learning and semantic consistency (PLSC). The algorithm introduces an adaptive optimization method for learning the projection matrices, and this method fully considers the semantic information of both the labeled and the unlabeled samples during learning. As a result, PLSC can utilize more useful information than other methods and can learn more efficient projection matrices. First, the class centers of the labeled text are computed, with median feature vectors used as the class center vectors. Next, unlabeled images are projected onto the text space and assigned pseudo-labels by comparing them with the class center vectors of the text data. Finally, a new training dataset, which includes labeled and unlabeled data, is generated for training the projection matrix. Once the projection matrix has projected image or text data onto the same feature space, the data can be compared for similarity, with the distance between data points calculated using the Euclidean metric. Validation experiments suggest that PLSC outperforms other state-of-the-art algorithms.
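
The pseudo-labeling steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy feature vectors, and the fixed image-to-text projection matrix `W` are all assumptions made for illustration, and the abstract does not say how the initial projection is obtained. The sketch only shows class centers taken as per-dimension medians of labeled text features, and unlabeled images labeled by their nearest center under the Euclidean metric.

```python
from math import dist          # Euclidean distance (Python 3.8+)
from statistics import median


def class_centers(text_feats, labels):
    """Return {label: center}, where each center is the per-dimension
    median of the labeled text feature vectors in that class."""
    by_class = {}
    for vec, lab in zip(text_feats, labels):
        by_class.setdefault(lab, []).append(vec)
    return {lab: [median(col) for col in zip(*vecs)]
            for lab, vecs in by_class.items()}


def project(vec, matrix):
    """Map an image feature (row vector) into the text space: vec @ matrix."""
    return [sum(v * m for v, m in zip(vec, col)) for col in zip(*matrix)]


def assign_pseudo_labels(image_feats, matrix, centers):
    """Give each unlabeled image the label of its nearest class center."""
    pseudo = []
    for vec in image_feats:
        p = project(vec, matrix)
        pseudo.append(min(centers, key=lambda lab: dist(p, centers[lab])))
    return pseudo


# Toy data (hypothetical): 2-D text features, 3-D image features.
text_feats = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1],
              [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
text_labels = ["a", "a", "a", "b", "b", "b"]
unlabeled_images = [[0.2, 0.1, 5.0], [0.8, 1.2, -3.0]]

# Assumed image-to-text projection (3x2); a placeholder, not a learned matrix.
W = [[1, 0],
     [0, 1],
     [0, 0]]

centers = class_centers(text_feats, text_labels)
pseudo_labels = assign_pseudo_labels(unlabeled_images, W, centers)

# The pseudo-labeled images can then be pooled with the labeled data
# to form the enlarged training set used to fit the projection matrix.
pseudo_labeled_images = list(zip(unlabeled_images, pseudo_labels))
```

Using a per-dimension median rather than a mean keeps each center less sensitive to outlying text samples, which is one plausible reason for the choice stated in the abstract.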



