Usher: Improving Data Quality with Dynamic Forms

Technology Used: Java/J2EE. Knowledge and Data Engineering, 2011. Data quality is a critical problem in modern databases. Data-entry forms present the first and arguably best opportunity for detecting and mitigating errors, but there has been little research into automatic methods for improving data quality at entry time. In this paper, we propose Usher, an end-to-end system for form design, entry, and data quality assurance. Using previous form submissions, Usher learns a probabilistic model over the questions of the form. Usher then applies…


Publishing Search Logs – A Comparative Study of Privacy Guarantees

Technology Used: Java/J2EE. Knowledge and Data Engineering, 2011. Search engine companies collect the “database of intentions,” the histories of their users’ search queries. These search logs are a gold mine for researchers. Search engine companies, however, are wary of publishing search logs in order not to disclose sensitive information. In this paper, we analyze algorithms for publishing frequent keywords, queries and clicks of a search log. We first show how methods that achieve variants of k-anonymity are vulnerable to active attacks. We then demonstrate that the stronger guarantee ensured…
