Abstract
Achieving large-scale content-based publish-subscribe has been an ambitious research agenda that has received tremendous attention over the last two decades from researchers in the distributed systems and networking communities. Simulation has been the most common approach for evaluating solutions. However, the research community on this topic has neither shared workload assumptions nor a standard workload generation methodology. As a result, each effort has introduced its own assumptions and ad hoc workload generation method, and comparison with related alternatives has often been neglected. This has made it difficult to assess the performance gains of one contribution over related alternatives. This paper reports an effort to enhance a workload generation tool for content-based publish-subscribe research using Google Groups data. The tool is extended with a visual characterization of the workload generated from a given set of parameters. The generated workload can be characterized in terms of popularity and locality. The resulting software helps generate well-specified workloads, facilitates experiment reproducibility, and saves time in the evaluation process.
Keywords: performance evaluation; distributed systems; workload characterization; workload generation; content-based publish-subscribe
Footnotes
- One of the pioneering works on content-based publish-subscribe.
- Complementary cumulative distribution function. It is more practical for highlighting the maximum popularity and the absence of interest.
- http://msrg.org/datasets/BEGen.
- https://openmessaging.cloud/docs/benchmarks/.