An XML Streaming Data Processing Method based on Forest Transducer
Volume 13, Number 5, September 2017 - Paper 7 - pp. 620-632 DOI: 10.23940/ijpe.17.05.p7.620632
Zhixue Hea,b,*, Husheng Liaoa
aCollege of Computer Science, Beijing University of Technology, Beijing, 100124, ChinabCollege of Computer, North China Institute of Aerospace Engineering, Langfang, 065000, China (Submitted on March 28, 2017; Revised on July 2, 2017; Accepted on August 10, 2017)
Abstract:
XML is the de facto standard for data representation and exchanging on web. The query processing technique of XML streaming data is a hotspot in current research. Focused on the characteristics of processing semi-structure XML streaming data such as the stream arriving continuously, requiring to be read sequentially and only once into memory, the querying must be processed on the fly, a method of processing XPath query based on forest transducer is proposed. Firstly, conversion rules of forest transducer are defined for XPath query. And then the transducer is driven by input streaming data nodes. Stack and abstract syntax tree are applied to implement match and state transformation in running procedure. The relationships between state functions and intermediate results are kept by the abstract syntax tree, and the query results are output in reducing process. Finally, the experimental results show that our approach is effective and efficient on this problem, and outperforms about 30 percent of the state-of-the-art algorithms especially for large processed data. At the same time, memory usage is nearly constant. This method resolves the balance between time and space complexity, and it is a useful reference for other methods.
References: 29
- M. Altinel and M. J. Franklin, “Efficient Filtering of XML Documents for Selective Dissemination of Information,” in Proceedings of the 26th International Conference on Very Large Data Bases, pp.53-64, Cairo, Egypt, September 2000
- C. Barton, P. Charles, D. Goyal, M. Raghavachari, M. Fontoura, and V. Josifovski, “Streaming XPath Processing with Forward and Backward Axes,” in Proceedings of the 19th International Conference on Data Engineering, pp.455–466, Bangalore, India, March 2003
- S. Boag, D. Chamberlin, M.F. Fernández, D. Florescu, J. Robie, and J. Siméon, “XQuery 1.0: An XML Query Language (Second Edition): W3C Recommendation,” (2015-09-07). http://www.w3.org/TR/xquery/, 2016
- D. Brownell and D. Megginson, “SAX: Simple API for XML,” (2004-04-27). http://www.saxproject.org/, 2016
- N. Bruno, L. Gravano, N. Koudas, and D. Srivastava, “Navigation-vs. Index-based XML Multi-query Processing,” in Proceedings of the 19th International Conference on Data Engineering, pp.139−150, Bangalore, India, March 2003
- C. Chan, P. Felber, M. Garofalakis, and R. Rastogi, “Efficient Filtering of XML Document with XPath Expressions,” The VLDB Journal, vol. 11, no. 4,pp. 354-379, December 2002
- Y. Chen, S. B. Davidson, and Y. Zheng, “An Efficient XPath Query Processor for XML Streams,” in Proceedings of the 22nd International Conference on Data Engineering, pp.79-79, Atlanta, GA, USA, April 2006
- J. Clark and S. Derose, “XML Path Language (XPath) Version 1.0: W3C Recommendation,” (2015-09-07). http://www.w3.org/TR/xpath/, 2016
- Y. L. Diao, M. Altinel, M. J. Franklin, H. Zhang, and P. Fischer, “Path Sharing and Predicate Evaluation for High-performance XML Filtering,” ACM Transactions on Database Systems, vol. 28, no. 4, pp.467-516, December 2003
- J. Gao, D. Q. Yang, S. W. Tang, and T. J. Wang, “Tree Automata based Efficient XPath Evaluation over XML Data Stream”. Journal of Software, vol. 16, no. 2, pp.223-232, February 2005
- X. Q. Gong, W.N.Qian, Y. Yan, and A. Zhou, “Bloom Filter-based XML Packets Filtering for Millions of Path Queries,” in Proceedings of the 21st International Conference on Data Engineering, pp.890-901, Tokyo, Japan, April 2005
- G. Gou and R. Chirkova, “Efficient Algorithms for Evaluating XPath over Streams,” in Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pp.269-280, Beijing, China, June 2007
- A. Gupta and D. Suciu, “Stream Processing of XPath Queries With Predicates,” in Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp.419−430, San Diego, California, USA, June 2003
- S. Hakuta , S. Maneth , K. Nakano , and H. Iwasaki. “XQuery Streaming by Forest Transducer,” in Proceedings of the IEEE 30th International Conference on Data Engineering, pp. 952-963, Chicago, USA, March 2014.
- W.S. Han, H.F. Jiang, H. Howard, and Q. Li, “StreamTX: Extracting Tuples from Streaming XML Data,” in Proceedings of the VLDB Endowment, pp.289-300, Auckland, New Zealand, August 2008
- V. Josifovski, M. Fontoura, and A. Barta, “Querying XML Streams,” The VLDB Journal, vol. 14, no. 2, pp. 197-210, April 2005
- J. Kwon, P. Rao, B. Moon, and S. Lee, “FiST: Scalable XML Document Filtering by Sequencing Twig Patterns,” in Proceedings of the 31st International Conference on Very Large Data Bases, pp.217-228, Trondheim, Norway, August 2005
- M. Ley. DBLP XML Records. http://dblp.uni-trier.de/xml/
- D. Olteanu, “SPEX: Streamed and Progressive Evaluation of XPath,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 7, pp.934-949, July 2007
- M. Onizuka, “Processing XPath Queries with Forward and Downward Axes over XML Streams,” In Proceedings of the 13th International Conference on Extending Database Technology, pp.27-38, Lausanne, Switzerland, March 2010
- F. Peng and S. C. Sudarshan, “XSQ: A Streaming XPath Engine,” ACM Transactions on Database Systems, vol. 30, no. 2, pp.577-623, June 2005
- T. Perst and H. Seidl, “Macro Forest Transducers,” Information Processing Letters, vol. 89, no. 3, pp.141-149, February 2004
- A. Schmidt, F. Waas, M. Kersten, M. J. Carey, I. Manolescu, and R. Busse, “XMark: A Benchmark for XML Data Management,” In Proceedings of the 28th international conference on Very Large Data Bases, pp.974-985, Hong Kong, China, August 2002
- M. Schmidt, S. Scherzinger, and C. Koch, “Combined Static and Dynamic Analysis for Effective Buffer Minimization in Streaming XQuery Evaluation,” in Proceedings of the 23rd International Conference on Data Engineering, pp.236-245, Istanbul, Turkey, April 2007
- A. Taylor, M. Marcus, and B. Santorini, “The Penn Treebank: An Overview,” In Treebanks, pp.5-22, Springer Netherlands, 2003
- X. D. Wu, , X. Q. Zhu, G. Q. Wu, and W. Ding, “Data Mining with Big Data,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp.97-107, January 2014
- X. Y. Wu, and D. Theodoratos, “A Survey on XML Streaming Evaluation Techniques,” The VLDB Journal, vol. 22, no. 2, pp.177-202, April 2013
- W. D. Yang, Q. M. Wang, and B. L. Shi, “Complex Twig Pattern Query Processing over XML Streams,” Journal of Software, vol. 18, no. 4, pp.893-904, April 2007
- W. D. Yang, and B. L. Shi, “A Survey of XML Stream Management,” Journal of Computer Research and Development, vol. 46, no. 10, pp.1721-1728, October 2009
Click here to download the paper.
Please note : You will need Adobe Acrobat viewer to view the full articles. |