Deep Web Entity Identification Method with Unique Constraint

Volume 14, Number 10, October 2018, pp. 2470-2482
DOI: 10.23940/ijpe.18.10.p23.24702482

Xuefeng Xian (a), Pengpeng Zhao (b), Zhaobin Liu (a), Caidong Gu (a), and Victor S. Sheng (c)

(a) School of Computer Engineering, Suzhou Vocational University, Suzhou, 215104, China
(b) The Institute of Intelligent Information Processing and Application, Soochow University, Suzhou, 215006, China
(c) Computer Science Department, University of Central Arkansas, Conway, 72035, USA

(Submitted on July 5, 2018; Revised on August 8, 2018; Accepted on September 15, 2018)


In practice, some attributes satisfy a unique constraint: each entity has a unique value for the attribute. This paper presents a deep web entity identification method that addresses data error correction, uniqueness constraint enforcement, and local data fusion in deep web data integration. The method casts entity identification as a k-partite graph clustering problem that considers both the similarity and the association of attribute values. It performs global record linkage and data fusion simultaneously, and it can identify incorrect values and distinguish them from correct ones from the outset. Experimental results demonstrate the high precision and scalability of the method.
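
As a rough illustration of the k-partite graph idea mentioned in the abstract, the following Python sketch builds a graph whose nodes are attribute values (one partition per attribute) and whose edge weights count how often two values are provided together. It is not the authors' algorithm: the sample records, the function names build_kpartite_graph and similarity, and the choice of difflib for string similarity are all assumptions made for illustration only.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records returned by different deep web sources.
# Each record maps an attribute name to the value the source provides.
records = [
    {"name": "Golden Dragon", "phone": "555-0101"},
    {"name": "Golden Dragon Restaurant", "phone": "555-0101"},
    {"name": "Blue Lagoon", "phone": "555-0199"},
]


def build_kpartite_graph(records):
    """Return association weights between values of different attributes.

    Nodes are (attribute, value) pairs, so each attribute forms one partition.
    An edge weight counts how many records provide both values together.
    """
    edges = defaultdict(int)
    for rec in records:
        # Attribute names are distinct within a record, so every pair of
        # nodes taken from one record lies in two different partitions.
        nodes = sorted(rec.items())
        for u, v in combinations(nodes, 2):
            edges[(u, v)] += 1
    return dict(edges)


def similarity(a, b):
    """String similarity in [0, 1] between two values of the same attribute."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


edges = build_kpartite_graph(records)
for (u, v), weight in sorted(edges.items()):
    print(u, "--", v, "weight:", weight)
```

A clustering step would then group value nodes so that each cluster describes one entity, using the association weights across partitions and the similarity scores within a partition. Under a unique constraint, each cluster would retain a single value per attribute. That step is omitted here because the abstract does not specify it.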



