IC Colloquium: Query-Based Data Pricing

Event details
Date | 21.01.2013 |
Hour | 16:15 › 17:30 |
Speaker | Dan Suciu, University of Washington |
Location | |
Category | Conferences - Seminars |
Abstract
Data has value, and is increasingly being bought and sold on the Web. Some large data vendors produce highly valuable data in-house and sell it directly to customers, e.g. Gartner reports or Navteq maps. Smaller vendors often produce data by aggregating public sources and sell it on data markets, such as Azure DataMarkets or Aggdata. Personal data lockers centralize private data with the goal of allowing end users to profit from its use, e.g. personal.com or lockerproject.org. Current pricing mechanisms, however, are very naive. By far the most common case is a fixed price for the entire data set. Big customers can typically afford to purchase the data they need (e.g. the price of one Gartner report is in the range of thousands of dollars), but small customers often need only a few data items from the entire data set and cannot afford to pay the full price.
In this talk I will discuss a framework for pricing data that allows the seller to set explicit prices for a set of views of her choice, and allows the buyer to buy ANY query; the price of the query is derived automatically from the explicit prices set by the seller. We call this framework "query-based pricing". A pricing function must satisfy an important property: it must be "arbitrage-free", in the sense that it must prevent the buyer from obtaining the answer to some query by purchasing and combining cheaper queries. For traditional conjunctive queries on a relational database, arbitrage-freeness is related to "query-view determinacy", a concept that has been well studied in database theory. When the queries are perturbed answers to linear queries over private data, arbitrage-freeness is related to the "privacy budget" that has been studied in the context of differential privacy. I will show the theoretical complexity of computing an arbitrage-free price (which is high), as well as a practical way to circumvent the high complexity. Joint work with M. Balazinska, B. Howe, P. Koutris, Daniel Li, Chao Li, G. Miklau, P. Upadhyaya.
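As a rough illustration of the idea of deriving a query's price from seller-set view prices, the following toy sketch prices a query as the cheapest bundle of views that covers it. It is not the method presented in the talk: "coverage" of data items is used here as a simplified stand-in for query-view determinacy, the brute-force search is exponential, and the function name `query_price` and the example views and prices are hypothetical.

```python
from itertools import combinations


def query_price(needed, view_prices):
    """Return the cost of the cheapest bundle of priced views whose union
    covers every item in `needed`, or None if no bundle covers it.

    `view_prices` maps a frozenset of data items (a view) to its price.
    Assumption: a view "answers" a query if it contains the needed items;
    this is a toy substitute for query-view determinacy.
    """
    needed = frozenset(needed)
    views = list(view_prices.items())
    best = None
    # Brute force over all subsets of views (exponential; illustration only).
    for size in range(len(views) + 1):
        for bundle in combinations(views, size):
            covered = frozenset().union(*(items for items, _ in bundle))
            if needed <= covered:
                cost = sum(price for _, price in bundle)
                if best is None or cost < best:
                    best = cost
    return best


# Hypothetical price list: per-state views and one bundled view.
views = {
    frozenset({"WA"}): 30,
    frozenset({"OR"}): 30,
    frozenset({"WA", "OR", "CA"}): 70,
}

wa = query_price({"WA"}, views)
orr = query_price({"OR"}, views)
both = query_price({"WA", "OR"}, views)

# No arbitrage in this toy model: buying the two single-state queries
# separately is never cheaper than buying the combined query.
print(both <= wa + orr)
```

Because each price is defined as a minimum over all covering bundles, a buyer cannot undercut the price of a query by combining cheaper purchases in this simplified model.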
Biography
Dan Suciu is a Professor in Computer Science at the University of Washington. He received his Ph.D. from the University of Pennsylvania in 1995, was a principal member of the technical staff at AT&T Labs, and joined the University of Washington in 2000. Suciu conducts research in data management, with an emphasis on topics related to Big Data and data sharing, such as probabilistic data, data pricing, parallel data processing, and data security. He is a co-author of two books: Data on the Web: from Relations to Semistructured Data and XML (1999) and Probabilistic Databases (2011). He is a Fellow of the ACM, holds twelve US patents, received the ACM SIGMOD Best Paper Award in 2000 and the ACM PODS Alberto Mendelzon Test of Time Award in 2010 and in 2012, and is a recipient of the NSF Career Award and of an Alfred P. Sloan Fellowship. Suciu serves on the VLDB Board of Trustees, is an associate editor for the VLDB Journal, ACM TOIS, ACM TWEB, and Information Systems, and is a past associate editor for ACM TODS. Suciu's PhD students Gerome Miklau and Christopher Re received the ACM SIGMOD Best Dissertation Award in 2006 and 2010 respectively, and Nilesh Dalvi was a runner-up in 2008.
Practical information
- Informed public
- Free
- This event is internal
Organizer
- Christoph Koch
Contact
- Simone Muller / Christine Moscioni