Speaker: Guy van den Broeck

Title: Symmetry in Probabilistic Databases

Abstract: Researchers in databases, AI, and machine learning have all proposed representations of probability distributions over relational databases (possible worlds). In a tuple-independent probabilistic database, the possible worlds all have distinct probabilities, because the tuple probabilities are distinct. In AI and machine learning, however, one typically learns highly symmetric distributions, where large numbers of symmetric databases get assigned identical probability. This symmetry helps with generalizing from data. In this talk I discuss what happens to standard database notions of data and combined complexity when considering AI-style symmetric probabilistic databases. The question proves to be a fertile ground for database theory, with interesting connections to counting complexity and 0-1 laws.

Speaker: Wagner Meira

Title: "Data Mining: Cause or Consequence"

Abstract: Data mining arose as a merger of several areas, such as databases, statistics, and artificial intelligence, and has been growing steadily over the last 20 years. Recently, the popularization of the concepts of "data science" and "big data" accelerated the process. In this seminar we try to answer the question of whether data mining is a cause or a consequence of these recent developments through an integrated view of four key components of data mining research and development, namely models, algorithms, systems, and applications, and how they are employed in scenarios such as the Internet and the Web. We will also discuss some trends related to knowledge and information discovery from massive data.

Speaker: Dan Olteanu

Title: "Factorized Databases: Past and Future Past"

Abstract: In this talk I will overview the FDB project at Oxford on succinct, lossless representations of relational data that I call factorized databases. I will first present a characterization of the succinctness of results to conjunctive queries and how factorizations can speed up query processing. I will then comment on how this succinctness characterization relates to seemingly disparate results on: readability of provenance polynomials, representation systems for incomplete information, one-pass query evaluation using finite cursor machines, tractability in probabilistic databases, and parallel query evaluation with one synchronization step. I will conclude with two near-future projects that brought me back to factorized data representations: scalable machine learning over relational data and distributed database systems with low communication cost.

Speaker: Victor Vianu

Title: Analysis of Data-Centric Workflows

Abstract: Workflows centered around data have become pervasive in a wide variety of applications, including health-care management, e-commerce, business processes, scientific workflows, and e-government. Such workflows are often very complex and involve numerous interacting actors. They are prone to costly bugs, whence the need for static analysis in order to verify critical properties. Analysis tools are also needed to facilitate the integration, interoperation and evolution of workflows, and to provide runtime assistance to participating actors. This talk will present an overview of recent research carried out with collaborators at UC San Diego and INRIA on the analysis of data-centric workflows, an area of growing interest in both academia and industry.

Sponsored by
Yahoo! Labs, CIWS, URP, UNMSM, VLDB Endowment