Username   Password       Forgot your password?  Forgot your username? 


Compression Method of Factor Oracle by Triple-Array Structures

Volume 15, Number 4, April 2019, pp. 1247-1254
DOI: 10.23940/ijpe.19.04.p20.12471254

Koji Bandoa, Takato Nakanob, Kazuhiro Moritac, and Masao Fuketac

aNTT Plala Inc., 24th Florr, Sunshine 60, 3-1-1 Higashi Ikebukuro, Toshima-ku, Tokyo, 170-6024, Japan
bNichia Corporation, 491 Oka, Kaminaka-Cho, Anan-Shi, Tokushima, 774-8601, Japan
cTokushima University, 2-1 Minami-Josanjima, Tokushima, 770-8506, Japan

(Submitted on March 12, 2019; Revised on March 14, 2019; Accepted on April 15, 2019)


Pattern matching is an important technique in text processing and is used for character string replacement and search. A factor oracle is a data structure for pattern matching, and it is a finite state automaton that can search substrings. This data structure consists of internal and external transitions and has the characteristic property of accepting at least all substrings. The automaton including the factor oracle is represented by a two-dimensional array called Table, Johnson method, etc. Search using Table is fast, but the memory capacity is large. The Johnson method has the feature of representing the factor oracle using a small amount of storage, and since it can be achieved with a transition speed of O(1), it is considered as fast as Table. The memory of the Johnson method depends on the size of the element itself and the number of elements. In other words, by reducing these, the applicability of the factor oracle to enormous data will be further enhanced. In this research, using compact double-array, we propose a method consisting of compressing the size of the elements themselves, improving the Johnson method in order to achieve a construction using external transitions only, and compressing the number of elements. As shown by the experimental results, the proposed method has higher storage efficiency compared to the Johnson method and is capable under certain conditions of high-speed search.

References: 14

    1. P. Weiner, “Linear Pattern Matching Algorithms,” in Proceedings of the 14th Annual IEEE Symposium on Switching and Automata Theory, pp. 1-11, 1973
    2. C. Allauzen, M. Crochemore, and M. Raffinot, “Factor Oracle: A New Structure for Pattern Matching,” in Proceedings of the 26th Conference on Current Trends in Theory and Practice of Informatics, pp. 291- 306, 1999
    3. S. C. Johnson, “YACC: Yet Another Compiler Compiler,” Bell Laboratories, Murray Hill, NJ, 1978
    4. A. V. Aho, R. Sethi, J. D. Ullman, “Compilers-Principles Techniques and Tools,” Addison-Wesley, 1986
    5. M. Fuketa, K. Morita, and J. Aoe, “Comparisons of Efficient Implementations for DAWG,” International Journal of Computer Theory and Engineering, Vol. 8, No. 1, pp. 48-52, 2016
    6. R. Baeza-Yates and B. Ribeiro-Neto, “Modern Information Retrieval,” 2nd Edition, Addison Wesley, 2011
    7. D. E. Knuth, J. Morris Jr, and V. Pratt, “Fast Pattern Matching in Strings,” SIAM Journal on Computing, Vol. 6, No. 2, pp. 323-350, 1977
    8. A. Maeda and K. Mizushima, “A Compressed-Array Representation of Automata and its Application to Programming Language,” in Proceedings of the 49th Programming Symposium on Information Processing Society of Japan, pp. 49-54, 2008
    9. M. Toro, “Probabilistic Extension to the Concurrent Constraint Factor Oracle Model for Music Improvisation,” Inteligencia Artificial, Vol. 19, No. 57, pp. 37-73, 2016
    10. J. Aoe, “An Efficient Digital Search Algorithm by using a Double-Array Structure,” IEEE Transactions on Software Engineering, Vol. 15, No. 9, pp. 1066-1077, 1989
    11. H. Bast, C.W. Mortensen, and I. Weber, “Output-Sensitive Autocompletion Search,” Information Retrieval, Vol. 11, No. 4, pp. 269-286, 2008
    12. D. E. Knuth, “The Art of Computer Programming, 3,” Sorting and Searching, 2nd Edition, Addison Wesley, 1998
    13. G. Assayag and S. Dubnov, “Using Factor Oracles for Machine Improvisation,” Soft Computing, Vol. 8, No. 9, pp. 604-610, 2004
    14. S. Yata, K. Morita, M. Fuketa, and J. Aoe, “Fast String Matching with Space-Efficient Word Graphs,” in Proceedings of 2008 International Conference on Innovations in Information Technology, pp. 79-83, 2008


    Please note : You will need Adobe Acrobat viewer to view the full articles.Get Free Adobe Reader

    This site uses encryption for transmitting your passwords.