IC Colloquium : Finding Datacenter Software Tail Latency

Event details
Date | 16.10.2017 |
Hour | 16:15 - 17:30 |
Location | |
Category | Conferences - Seminars |
By : Richard L. Sites - Invited Professor within LABOS
Video of his talk
Abstract :
Datacenter computers are the other half of cell phones -- the anonymous servers somewhere in the world that make every cell phone browser, app, and operation work. Unlike traditional throughput-oriented computing, datacenter software is measured by user-facing transaction latency. For a given service, a histogram of the latencies usually has a long tail of very slow responses, with the 99th percentile latency 10x or more of the median latency. The "interesting" slow transactions are only slow under live load during the busiest hour of the day; they are fast if run again. They cannot be reproduced during offline load testing, and their underlying causes remain a mystery for months or years, hurting overall datacenter capacity. As an industry, we have very poor tools for observing and therefore fixing the unknown sources of interference.
The talk discusses several low-overhead tools for identifying where all the transaction wallclock time goes in such complex software. Versions of these have been in production use at Google for a few years.
Bio :
Dick Sites is currently an Invited Professor at EPFL Lausanne, teaching a graduate class on Datacenter Software Dynamics. In 2016 he taught an earlier version at the National University of Singapore. Prior to that, he worked at Google, Adobe Systems, Digital Equipment Corporation, Hewlett-Packard, Burroughs, and IBM. He also taught Computer Science at UC/San Diego in the 1970s. His accomplishments include co-architecting the DEC Alpha computers and building computer performance monitoring and tracing tools at the above companies. At Google, this included understanding CPU, disk, and network performance anomalies. Dr. Sites holds a PhD in Computer Science from Stanford and a BS in Mathematics from MIT. He also attended the Master's program in Computer Science at UNC Chapel Hill. He holds 39 patents and is a member of the U.S. National Academy of Engineering.
Practical information
- General public
- Free
- This event is internal
Contact
- Host : Willy Zwaenepoel