CLUE: System Trace Analytics for Cloud Service Performance Diagnosis

Abstract

In this paper, we present CLUE, a system event analytics tool for black-box performance diagnosis in production cloud computing systems. CLUE provides a unified and extensible means of profiling the transactional behavior of services, and builds structured data called event sketches. CLUE further offers a set of analytic tools that summarize and analyze event sketches by integrating data mining and statistical analysis. CLUE has been developed at NEC as an internal tool and applied to diagnosing a diverse set of real performance problems in multi-tiered IT applications running on multi-core servers of major platforms, including Linux (Red Hat, Fedora), Unix (HP-UX), and Windows (Windows Server 2008). We evaluate our framework on real-world IT systems and show how it provides visibility into service system performance problems and enables their effective diagnosis.

Publication
IEEE/IFIP Network Operations and Management Symposium, Krakow, Poland