Association Matrix Method and Its Applications in Mining DNA Sequences

Mao, Guojun

doi:10.1007/978-3-030-20454-9_15

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 965)

Included in the following conference series:

International Conference on Applied Human Factors and Ergonomics

Abstract

Many mining algorithms have been proposed for business big data such as market baskets, but they are neither effective nor efficient for mining DNA sequences, which typically have a small alphabet but are very long. This paper designs a compact data structure called the Association Matrix and gives an algorithm specifically for mining long DNA sequences. The Association Matrix is a novel in-memory data structure that is compact enough to handle very long DNA sequences in limited memory. Based on the Association Matrix structure, we design algorithms for efficiently mining key segments from DNA sequences. We also report our related experiments and results in this paper.
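The abstract does not define the Association Matrix, so the following Python sketch is only an illustration of the general idea it relies on: with a four-letter alphabet, a table indexed by symbols has a fixed size regardless of how long the sequence is. The names `build_pair_matrix` and `frequent_pairs`, and the choice to count adjacent base pairs, are assumptions made for this example and are not taken from the paper.

```python
ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def build_pair_matrix(bases):
    """Count how often each base is immediately followed by each other base.

    `bases` can be any iterable of characters (a string, or a generator
    reading a file), so the sequence itself never has to be held in memory.
    The result is always a 4 x 4 table. Characters outside ACGT (such as N)
    are skipped and break the adjacency.
    """
    matrix = [[0] * 4 for _ in range(4)]
    prev = None
    for ch in bases:
        cur = INDEX.get(ch)
        if prev is not None and cur is not None:
            matrix[prev][cur] += 1
        prev = cur
    return matrix


def frequent_pairs(matrix, min_count):
    """Return (pair, count) for every ordered pair seen at least min_count times."""
    return [
        (ALPHABET[a] + ALPHABET[b], matrix[a][b])
        for a in range(4)
        for b in range(4)
        if matrix[a][b] >= min_count
    ]


if __name__ == "__main__":
    m = build_pair_matrix("ACGTACGT")
    print(frequent_pairs(m, 2))
```

Because the table has 16 counters no matter how many bases are read, memory use in this toy version is independent of sequence length. The structure described in the paper may index longer segments or store different information.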

Acknowledgements

I am deeply indebted to the National Natural Science Foundation of China (NSFC), whose funding support under Grant No. 61773415 strengthened the research reported in this paper.

Author information

Authors and Affiliations

School of Mathematics and Physics, AI & AK Institute, Fujian University of Technology, Fuzhou, 350118, People’s Republic of China
Guojun Mao

Corresponding author

Correspondence to Guojun Mao.

Editor information

Editors and Affiliations

Institute for Advanced Systems Engineering, University of Central Florida, Orlando, FL, USA
Tareq Ahram

About this paper

Cite this paper

Mao, G. (2020). Association Matrix Method and Its Applications in Mining DNA Sequences. In: Ahram, T. (eds) Advances in Artificial Intelligence, Software and Systems Engineering. AHFE 2019. Advances in Intelligent Systems and Computing, vol 965. Springer, Cham. https://doi.org/10.1007/978-3-030-20454-9_15

DOI: https://doi.org/10.1007/978-3-030-20454-9_15
Published: 11 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20453-2
Online ISBN: 978-3-030-20454-9
eBook Packages: Engineering, Engineering (R0)
