CLOUDRAID: DETECTING DISTRIBUTED CONCURRENCY BUGS VIA LOG MINING AND ENHANCEMENT
Keywords:
distributed concurrency bugs, cloud systems, CLOUDRAID, automatic tool, message orderings, log mining, log enhancing
Abstract
Cloud systems suffer from distributed concurrency bugs, which often lead to data loss and service outages. This paper
presents CLOUDRAID, a new automatic tool for finding distributed concurrency bugs efficiently and effectively.
Distributed concurrency bugs are notoriously difficult to find because they are triggered by untimely interactions among
nodes, i.e., unexpected message orderings. To detect concurrency bugs in cloud systems efficiently and effectively,
CLOUDRAID automatically analyzes and tests only the message orderings that are likely to expose errors.
Specifically, CLOUDRAID mines the logs from previous executions to uncover the message orderings that are
feasible but inadequately tested. We also propose a log enhancing technique that automatically introduces new logs
into the system being tested. These additional logs further improve the effectiveness of CLOUDRAID
without introducing any noticeable performance overhead. Because it is log-based, the approach is well suited to live
systems. We have applied CLOUDRAID to six representative distributed systems: Hadoop2/Yarn, HBase,
HDFS, Cassandra, Zookeeper, and Flink. CLOUDRAID tested 60 different versions of these six
systems (10 versions per system) in 35 hours, uncovering 31 concurrency bugs, including nine new bugs that had
never been reported before. All nine new bugs have been confirmed by their original
developers; three are critical and have already been fixed.
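
The log-mining idea in the abstract can be illustrated with a minimal sketch. This is not CLOUDRAID's actual algorithm: it assumes, hypothetically, that each previous execution has already been reduced to an ordered list of message names, and it only flags message pairs that were always observed in one order, so that the flipped order is a candidate for testing. It does not attempt to establish that a flipped ordering is feasible, which the abstract names as a requirement. The function names and the sample message names are invented for illustration.

```python
from itertools import combinations


def observed_orderings(runs):
    """Collect every (earlier, later) message pair seen in any run.

    `runs` is a hypothetical input: one list of message names per
    previous execution, in the order the messages were logged.
    """
    seen = set()
    for run in runs:
        for i, j in combinations(range(len(run)), 2):
            if run[i] != run[j]:
                seen.add((run[i], run[j]))
    return seen


def untested_flips(runs):
    """Return flipped orderings that never appear in the logs.

    A pair (a, b) observed only as "a before b" yields the candidate
    (b, a). Feasibility of the candidate is NOT checked here.
    """
    seen = observed_orderings(runs)
    return sorted((b, a) for (a, b) in seen if (b, a) not in seen)


# Illustrative logs from two executions (message names are made up).
runs = [
    ["register", "assign", "commit"],
    ["register", "commit", "assign"],
]
candidates = untested_flips(runs)
print(candidates)
```

In this toy input, "assign" and "commit" were seen in both orders, so only orderings that place "register" later remain as candidates. A real tool would additionally need to rule out orderings that the system's causality makes impossible.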
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.