Beating CountSketch for Heavy Hitters in Insertion Streams

Event details
Date | 29.02.2016 |
Hour | 16:15 - 17:15 |
Speaker | David P. Woodruff |
Location | |
Category | Conferences - Seminars |
Abstract:
We consider the problem of finding the most frequent items in a stream of items from a universe of size n. Namely, we consider returning all l_2-heavy hitters, i.e., those items j for which f_j >= eps sqrt{F_2}, where f_j is the number of occurrences of item j, and F_2 = sum_i f_i^2 is the second moment of the stream. In 2002, Charikar, Chen, and Farach-Colton suggested the CountSketch data structure, which solves this problem using O(log^2 n) bits of space (for constant eps). The only known lower bound is Omega(log n) bits. Using Gaussian processes, we show it is possible to achieve O(log n log log n) bits of space.
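For readers unfamiliar with the baseline, here is a minimal Python sketch of a CountSketch-style structure and of the l_2-heavy-hitter test f_j >= eps sqrt{F_2} from the abstract. It illustrates only the 2002 baseline, not the algorithm presented in the talk. The class and function names, the table dimensions, the hash family, and the toy stream are all assumptions chosen for illustration. The code does not model the bit-level space accounting discussed above.

```python
import random
from statistics import median

P = (1 << 61) - 1  # Mersenne prime used as the modulus of the hash families (assumption)


class CountSketch:
    """Toy CountSketch over integer items: `depth` rows of `width` signed counters."""

    def __init__(self, depth=5, width=256, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.table = [[0] * width for _ in range(depth)]
        # Per row: (a, b) select the bucket, (c, d) select the sign.
        self.params = [
            tuple(rng.randrange(1, P) for _ in range(4)) for _ in range(depth)
        ]

    def _bucket_and_sign(self, row, item):
        a, b, c, d = self.params[row]
        bucket = ((a * item + b) % P) % self.width
        sign = 1 if ((c * item + d) % P) & 1 else -1
        return bucket, sign

    def update(self, item, count=1):
        # Insertion stream: each arrival adds a signed count to one bucket per row.
        for row in range(len(self.table)):
            bucket, sign = self._bucket_and_sign(row, item)
            self.table[row][bucket] += sign * count

    def estimate(self, item):
        # Median over rows of the signed counter that the item hashes to.
        ests = []
        for row in range(len(self.table)):
            bucket, sign = self._bucket_and_sign(row, item)
            ests.append(sign * self.table[row][bucket])
        return median(ests)

    def f2_estimate(self):
        # Median over rows of the sum of squared counters, a rough F_2 estimate.
        return median(sum(c * c for c in row) for row in self.table)


def heavy_hitters(sketch, universe, eps):
    """Return items whose estimated frequency is at least eps * sqrt(estimated F_2)."""
    threshold = eps * sketch.f2_estimate() ** 0.5
    return [j for j in universe if sketch.estimate(j) >= threshold]


# Toy stream: item 0 occurs 1000 times, items 1..500 occur once each.
sketch = CountSketch()
for _ in range(1000):
    sketch.update(0)
for j in range(1, 501):
    sketch.update(j)
found = heavy_hitters(sketch, range(501), eps=0.5)
```

In this toy stream F_2 = 1000^2 + 500, so sqrt{F_2} is about 1000 and only item 0 should clear the eps = 0.5 threshold. Scanning the whole universe at query time is a simplification made for brevity.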
Bio:
David Woodruff has been a research scientist in the principles and methodologies group at IBM Almaden since 2007, which he joined after completing his Ph.D. in theoretical computer science at MIT. His interests are in data streams, distributed computation, machine learning, and numerical linear algebra, among other things. He received the EATCS Presburger Award in 2014, and Best Paper Awards in STOC 2013 and PODS 2010. He is a member of the IBM Academy of Technology and a Master Inventor at IBM.
Practical information
- General public
- Free
- This event is internal
Organizer
- Theory of Computation Laboratory 4 - THL4 - Prof. M. Kapralov
Contact
- Simone Muller