Short Text Classification based on Feature Extension using Information in Images

Volume 15, Number 2, February 2019, pp. 667-675
DOI: 10.23940/ijpe.19.02.p31.667675

Shengjie Zhao (a, b) and Qianyun Jiang (a)

(a) College of Electronic and Information Engineering, Tongji University, Shanghai, 200800, China
(b) School of Software Engineering, Tongji University, Shanghai, 200800, China

(Submitted on November 10, 2018; Revised on December 12, 2018; Accepted on January 5, 2019)


With the rapid development and widespread use of the Internet, people increasingly share their lives and opinions on social networks, which produces a large volume of short texts. Short texts are brief, have sparse features, and lack contextual information, so it is difficult for conventional methods to classify them accurately. To achieve higher classification accuracy, this paper proposes a novel short text classification method based on feature extension that incorporates information from images. Specifically, we first use image captioning to generate a sentence that describes the image, and then combine the generated sentence with the short text as the input to the classifier. In addition, we introduce a similarity module that uses the correlation between the image and the short text to determine whether the two sentences should be combined. Simulation results show that the proposed model significantly outperforms state-of-the-art methods in terms of classification accuracy.
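The feature-extension step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the caption has already been produced by some image captioning model and is passed in as a plain string. It also substitutes a simple bag-of-words cosine similarity for the paper's similarity module. The function names and the threshold value of 0.2 are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the bag-of-words count vectors of two strings.

    Assumption: this is a simple stand-in for the paper's similarity module.
    """
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def extend_text(short_text, caption, threshold=0.2):
    """Append the caption to the short text only if the two are similar enough.

    `threshold` is a hypothetical value, not one taken from the paper.
    """
    if cosine_similarity(short_text, caption) >= threshold:
        return f"{short_text} {caption}"
    return short_text


if __name__ == "__main__":
    # Related caption: the text is extended before classification.
    print(extend_text("my dog loves the beach", "a dog running on the beach"))
    # Unrelated caption: the original text is passed through unchanged.
    print(extend_text("stock prices fell today", "a cat sitting on a sofa"))
```

The string returned by `extend_text` would then be fed to any text classifier. Gating on similarity keeps an unrelated caption from adding noise to a short text whose features are already sparse.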


