Diagnosing Production-Run Concurrency-Bug Failures

Thumbnail

Event details

Date 03.02.2014
Hour 10:3011:30
Speaker Shan LU
Location
Category Conferences - Seminars
Failures caused by software bugs are widespread in production runs, causing severe losses for end users. Unfortunately, diagnosing production-run failures, especially failures caused by concurrency bugs in multi-threaded software, is challenging. Existing work cannot satisfy privacy, run-time overhead, diagnosis capability, and diagnosis latency requirements all at once.

This talk will present a series of attempts from our group to address the above challenges. Our first attempt, called CCI, applies the cooperative bug isolation (CBI) approach, which was initially designed for sequential bugs, to concurrency bugs. Our carefully designed interleaving predicates and sampling schemes allow CCI to diagnose a wide variety of concurrency-bug failures with decent overhead. Our second attempt, called PBI, further improves the performance and preserves the diagnosis capability of CCI through a novel use of hardware performance counters. Our final attempt, called LXR, addresses the long diagnosis latency problem of CCI and PBI. Different from CCI and PBI that both obtain run-time information through sampling, LXR obtains run-time information through hardware support that maintains recent execution history with negligible overhead. I will conclude the talk by discussing other research in my group that tackles concurrency bugs and performance bugs.

Practical information

  • Informed public
  • Free

Organizer

  • Babak Falsafi

Contact

  • Stéphanie Baillargues

Event broadcasted in

Share