
Association Matrix Method and Its Applications in Mining DNA Sequences

  • Conference paper
Advances in Artificial Intelligence, Software and Systems Engineering (AHFE 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 965)


Abstract

Many mining algorithms have been proposed for business big data such as market baskets, but they are neither effective nor efficient for mining DNA sequences, each of which typically has a small alphabet but a very long length. This paper designs a compact data structure called the Association Matrix and gives an algorithm specifically for mining long DNA sequences. The Association Matrix is a novel in-memory data structure that is compact enough to handle very long DNA sequences within limited memory. Based on the Association Matrix structure, we design algorithms for efficiently mining key segments from DNA sequences. We also present related experiments and results.
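The abstract does not define the Association Matrix itself. As a rough illustration only, the sketch below assumes one plausible reading: a 4 x 4 matrix of counts of adjacent nucleotide pairs over the alphabet {A, C, G, T}, whose size depends on the alphabet rather than on the sequence length. The names `build_pair_matrix`, `frequent_pairs`, and the `min_support` threshold are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch: NOT the paper's Association Matrix, whose exact
# definition is not given in the abstract. We assume a 4x4 matrix of
# counts of adjacent nucleotide pairs (dinucleotides).
ALPHABET = "ACGT"
INDEX: Dict[str, int] = {base: i for i, base in enumerate(ALPHABET)}


def build_pair_matrix(sequence: str) -> List[List[int]]:
    """Count each adjacent pair (x, y) in a single pass over the sequence.

    Memory is O(|alphabet|^2) = 16 counters, independent of sequence length.
    Characters outside ACGT (e.g. 'N') break the chain and are skipped.
    """
    matrix = [[0] * len(ALPHABET) for _ in ALPHABET]
    prev = None
    for ch in sequence.upper():
        cur = INDEX.get(ch)
        if prev is not None and cur is not None:
            matrix[prev][cur] += 1
        prev = cur  # becomes None after an unknown symbol
    return matrix


def frequent_pairs(matrix: List[List[int]], min_support: float) -> List[Tuple[str, float]]:
    """Return dinucleotides whose relative frequency is at least min_support."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return []
    result = []
    for i, x in enumerate(ALPHABET):
        for j, y in enumerate(ALPHABET):
            support = matrix[i][j] / total
            if support >= min_support:
                result.append((x + y, support))
    # Most frequent pairs first; ties broken alphabetically.
    result.sort(key=lambda item: (-item[1], item[0]))
    return result


if __name__ == "__main__":
    m = build_pair_matrix("ACGTACGTAC")
    print(m[INDEX["A"]][INDEX["C"]])  # AC occurs 3 times
    print(frequent_pairs(m, 0.3))
```

Because only 16 counters are kept, the structure stays the same size however long the input is; handling truly long sequences would additionally require streaming characters from disk rather than holding the whole string. This is a generic dinucleotide-counting technique used for illustration, not the author's algorithm.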

Acknowledgements

I am deeply indebted to the NSFC (National Natural Science Foundation of China) for its funding support under Grant No. 61773415, which improved the research in this paper.

Author information

Corresponding author

Correspondence to Guojun Mao.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Mao, G. (2020). Association Matrix Method and Its Applications in Mining DNA Sequences. In: Ahram, T. (eds) Advances in Artificial Intelligence, Software and Systems Engineering. AHFE 2019. Advances in Intelligent Systems and Computing, vol 965. Springer, Cham. https://doi.org/10.1007/978-3-030-20454-9_15

  • DOI: https://doi.org/10.1007/978-3-030-20454-9_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20453-2

  • Online ISBN: 978-3-030-20454-9

  • eBook Packages: Engineering, Engineering (R0)