Skip to main content
Log in

L-BiX: incremental sliding-window aggregation over data streams using linear bidirectional aggregating indexes

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

The number of real-time information sources, or so-called streams, has rapidly increased, leading to a greater demand for complex analyses over streams. Although many stream analysis methods exist, aggregation is fundamental to ascertain higher levels of knowledge from raw data. In particular, sliding-window aggregation, where aggregations over sliding windows are repeatedly computed, is useful in many real-life applications. Two stacks is the state-of-the-art method to compute sliding-window aggregations incrementally with a O(1) time complexity. However, its performance seriously degrades as the window size increases due to the high overhead to maintain its index. To address this problem, this paper proposes a linear bidirectional index (L-BiX) that exploits two kinds of partial aggregations. Specifically, forward (old-to-new) and backward (new-to-old) aggregations allow efficient computations in an incremental manner. The proposed algorithm requires the same time complexity as two stacks (O(1)). Our experimental evaluation shows that the throughput of L-BiX can be faster by up to 1.71 times than that of two stacks with a 50% smaller memory footprint.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19
Fig. 20
Fig. 21
Fig. 22

Similar content being viewed by others

References

  1. Abadi DJ et al (2003) Aurora: a new model and architecture for data stream management. VLDB J 2(12):120–139

    Article  Google Scholar 

  2. Abadi DJ et al (2005) The design of the borealis stream processing engine. In: Proceedings of the CIDR, pp 277–289

  3. Akidau T et al (2013) MillWheel: fault-tolerant stream processing at internet scale. Proc VLDB Endow 11(6):1033–1044

    Article  Google Scholar 

  4. Motwani R et al (2003) Query processing, approximation, and resource management in a data stream management system. In: Proceedings of the CIDR

  5. Toshniwal A et al (2014) Storm@twitter. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 147–156

  6. Murray DG et al (2013) Naiad: a timely dataflow system. In: Proceedings of the ACM symposium on operating systems principles, pp 439–455

  7. Ruxton GD (2006) The unequal variance t test is an underused alternative to student’s t test and the Mann Whitney U test. Behav Ecol 17(4):688–690

    Article  Google Scholar 

  8. Shein AU, Chrysanthis PK, Labrinidis A (2015) Processing of aggregate continuous queries in a distributed environment. Real-time business intelligence and analytics - international workshops, BIRTE 2015, Kohala Coast, HI, USA, August 31, 2015, BIRTE 2016, New Delhi, India, September 5, 2016, BIRTE 2017, Munich, Germany, August 28, 2017, Revised Selected Papers, pp 45–62. https://doi.org/10.1007/978-3-030-24124-7_4

  9. Gulisano V et al (2012) Streamcloud: an elastic and scalable data streaming system. IEEE Trans Parallel Distrib Syst 12(23):2351–2351

    Article  Google Scholar 

  10. Arasu A et al (2006) The CQL continuous query language: semantic foundations and query execution. VLDB J 2(15):121–142

    Article  Google Scholar 

  11. Li J et al (2005) Semantics and evaluation techniques for window aggregates in data streams. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 311–322

  12. Akidau T et al (2015) The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. Proc VLDB Endow 12(8):1792–1803

    Article  Google Scholar 

  13. Gedik B (2014) Generic windowing support for extensible stream processing systems. Softw Pract Exp 9(44):1105–1128

    Article  Google Scholar 

  14. Ali MH et al (2010) Spatio-temporal stream processing in microsoft streaminsight. Softw Pract Exp 2(33):69–74

    Google Scholar 

  15. Yang J, Widom J (2003) Incremental computation and maintenance of temporal aggregates. VLDB J 3(12):262–283

    Article  Google Scholar 

  16. Moon B et al (2000) Scalable algorithms for large temporal aggregation. In: International conference on data engineering (ICDE), pp 145–154

  17. Soisalon-Soininen E, Widmayer P (2003) Single and bulk updates in stratified trees: an amortized and worst-case analysis. In: Goos G, Hartmanis J, van Leeuwen J (eds) Computer science in perspective: essays dedicated to Thomas Ottmann, LNCS (Lecture Notes in Computer Science), vol 2598. Springer, Berlin, pp 278–292

    Chapter  Google Scholar 

  18. Bou S et al (2019) Streamingcube-based analytical framework for environmental data analysis. In: BigComp 2019, pp 1–8

  19. Bou S et al (2019) Scalable keyword search over relational data streams by aggressive candidate network consolidation. Inf Syst 81:117–135

    Article  Google Scholar 

  20. Bou S et al (2014) Keyword search with path-based filtering over XML streams. In: The 33rd IEEE symposium on reliable distributed systems (SRDS 2014), pp 337–338

  21. Bou S et al (2014) Filtering XML streams by XPath and Keywords. In: Proceedings of the 16th international conference on information integration and web-based applications and services (iiWAS 2014), pp 410–419

  22. Bou S et al (2015) Path-based keyword search over XML streams. Int J Web Inf Syst (IJWIS) 11(3):347–369

    Article  Google Scholar 

  23. Bou S et al (2016) An improved method of keyword search over relational data streams by aggressive candidate network consolidation. In: Database and expert systems applications, pp 336–351

  24. Qlik. https://www.qlik.com/. Accessed 18 Mar 2019

  25. Versive. https://jp.cloudera.com/solutions/gallery/versive-security-engine.html. Accessed 18 Mar 2019

  26. Ayasdi. https://www.ayasdi.com/. Accessed 18 Mar 2019

  27. CDS. https://www.himss.org/clinical-decision-support-cds. Accessed 18 Mar 2019

  28. Blount M et al (2010) Real-time analysis for intensive care: development and deployment of the artemis analytic system. IEEE Eng Med Biol Mag 29:110–118

    Article  Google Scholar 

  29. EHRs. https://www.capterra.com/infographics/top-emr-software. Accessed 18 Mar 2019

  30. Gray J et al (1997) Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub totals. Data Min Knowl Discov 1(1):29–53

    Article  Google Scholar 

  31. Wesley R, Xu F (2016) Incremental computation of common windowed holistic aggregates. Proc VLDB Endow 9(12):1221–1232

    Article  Google Scholar 

  32. Li J (2005) No pane, no gain: efficient evaluation of sliding-window aggregates over data streams. ACM SIGMOD Rec 34(1):39–44

    Article  Google Scholar 

  33. Krishnamurthy S et al (2006) On-the-fly sharing for streamed aggregation. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 623–634

  34. Carbone P et al (2016) Cutty: aggregate sharing for user-defined windows. In: Proceedings of the ACM international on conference on information and knowledge management, pp 1201–1210

  35. Bhatotia P et al (2014) Slider: incremental sliding window analytics. In: Proceedings of the 15th international middleware conference, pp 61–72

  36. Shein AU et al (2017) Flatfit: accelerated incremental sliding-window aggregation for real-time analytics. In: Proceedings of international conference on scientific and statistical database management, vol 5, pp 1–12

  37. Tangwongsan K et al (2015) General incremental sliding-window aggregation. Proc VLDB Endow 8(7):702–713

    Article  Google Scholar 

  38. Tangwongsan K et al (2017) Low-latency sliding-window aggregation in worst-case constant time. In: Proceedings of the 11th ACM international conference on distributed and event-based systems, pp 66–77

  39. Arasu A, Widom J (2004) Resource sharing in continuous sliding-window aggregates. In: Proceedings of the VLDB conference, pp 336–347

  40. Arasu A, Manku GS (2004) Approximate counts and quantiles over sliding windows. In: Proceedings of the ACM SIGMOD–SIGACT–SIGART symposium on principles of database systems, pp 286–296

  41. Bulut A, Singh AK (2003) SWAT: hierarchical stream summarization in large networks. In: International conference on data engineering (ICDE), pp 303–314

  42. Jiao Y (2006) Maintaining stream statistics over multiscale sliding windows. ACM Trans Database Syst (TODS) 31(4):1305–1334

    Article  MathSciNet  Google Scholar 

  43. Gibbons PB, Tirthapura S et al (2002) Distributed streams algorithms for sliding windows. In: Proceedings of the ACM symposium on parallel algorithms and architectures, pp 63–72

  44. Guirguis S et al (2011) Optimized processing of multiple aggregate continuous queries. In: Proceedings of the ACM international conference on information and knowledge management, pp 1515–1524

  45. Shein AU et al (2015) F1: accelerating the optimization of aggregate continuous queries. In: Proceedings of the ACM international conference on information and knowledge management, pp 1151–1160

  46. Jerzak Z et al (2012) The DEBS 2012 grand challenge. In: Proceedings of the international conference on distributed event-based systems, pp 393–398

  47. Benchmark T (2015) TPC-H benchmark dataset. http://www.tpc.org/tpch/. Accessed 18 Mar 2019

Download references

Acknowledgements

This work has been partly supported by the NICT BigClouT Project and JSPS KAKENHI Grant Number JP19H04114.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Savong Bou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Bou, S., Kitagawa, H. & Amagasa, T. L-BiX: incremental sliding-window aggregation over data streams using linear bidirectional aggregating indexes. Knowl Inf Syst 62, 3107–3131 (2020). https://doi.org/10.1007/s10115-020-01444-5

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-020-01444-5

Keywords

Navigation