Inferring application performance regardless of data completeness

Event details
Date | 19.06.2014 |
Hour | 16:30 |
Speaker | Alessandra Sala, Bell Labs Ireland |
Category | Conferences - Seminars |
Modern communication networks, such as online social networks and call networks, give us a unique opportunity to observe, analyze and better understand human behavior. In the telecommunications industry, user data are considered a precious source of insight to drive the design of future communication platforms. Unfortunately, given factors such as increasing privacy awareness, restrictions on application programming interfaces (APIs) and constrained sampling strategies, analyzing complete datasets is often unrealistic. For instance, partial network views are essentially the default in telco analytics, since customers typically have frequent contacts with customers of other providers, which naturally cannot be observed. Similarly, accurately inferring user activity is the Holy Grail of mobile advertising and targeted service offerings, but it remains difficult because privacy restrictions usually do not allow the logging of complete URLs.
This talk discusses the potential and risks of mining partial data through the analysis of two specific use cases. In the first use case, we unveil the hidden effects on the evaluation of marketing campaigns in social networks when the spread of information is estimated from a partial view of the network. The proposed methodology quantifies the error introduced by network partiality, based on a theoretical oracle scenario, and corrects for that error to a large extent. In the second use case, we show an approach to mine mobile web traces from heavily truncated URLs and infer user activities with high accuracy. Truncated URLs are stripped of information such as location or purchased products, to mask possibly sensitive end-user data. Furthermore, URLs derived from real web traces are highly noisy because they are dominated by unintentional web traffic such as advertisements, web analytics or third-party scripts. We have developed a statistical model to separate the representative URLs that characterize user activities from unintentional web traffic, and we demonstrate that our approach classifies user activities with 92% accuracy.
Practical information
- General public
- Free
Organizer
- Matthias Grossglauser
Contact
- Sylvie Thomet