Keynote Talks

Keynote Speakers

Raghu Ramakrishnan (Microsoft)

Azure Data Lake

Abstract

Azure Data Lake is Microsoft’s serverless Big Data platform and is designed to support multiple compute engines, multiple storage tiers, exabyte scale, and comprehensive security and data sharing. ADL builds on our internal experience with an exabyte-scale Big Data service (called Cosmos) and integrates deeply with the Hadoop ecosystem. It has two complementary parts, ADL Analytics and ADL Store. ADL Store (ADLS) is a fully-managed, elastic, scalable, and secure file system that supports Hadoop distributed file system (HDFS) and Cosmos semantics. ADL Analytics (ADLA) is a framework for delivering managed serverless analytics, including those based on our own Microsoft engines (e.g., Scope, U-SQL variants) and OSS engines (e.g., Hive, Spark), based on the standard Hadoop pattern of plugging into HDFS and YARN. In this talk, I will present an overview of ADL. I will cover ADLS architecture, design points including the Cosmos experience, and performance. I will also describe the work we are doing with the Apache OSS community on YARN and how that is leveraged in the ADLA framework.

Bio

Raghu Ramakrishnan is a Technical Fellow and CTO for Data at Microsoft. He also heads engineering for Big Data platforms and services. From 1987 to 2006, he was a professor at University of Wisconsin-Madison, where he wrote the widely-used text “Database Management Systems” and led a wide range of research projects in database systems (e.g., the CORAL deductive database, the DEVise data visualization tool, SQL extensions to handle sequence data) and data mining (scalable clustering, mining over data streams). In 1999, he founded QUIQ, a company that introduced a cloud-based question-answering service. He joined Yahoo! in 2006 as a Yahoo! Fellow, and over the next six years served as Chief Scientist for the Audience (portal), Cloud and Search divisions, driving content recommendation algorithms (CORE), cloud data stores (PNUTS), and semantic search (“Web of Things”). Ramakrishnan has received several awards, including the ACM SIGKDD Innovations Award, the SIGMOD 10-year Test-of-Time Award, the IIT Madras Distinguished Alumnus Award, the NSF Presidential Young Investigator Award, and the Packard Fellowship in Science and Engineering. He is a Fellow of the ACM and IEEE. He has served as Chair of ACM SIGMOD and the Board of the VLDB Foundation, and is on the Board of ACM SIGKDD.


Anurag Gupta (Amazon Web Services)

Modern database architectures and the network bottleneck

Abstract

In this talk, we discuss how the bottleneck in database architectures has moved from CPU to IO and, today, to the network. We then explore how AWS has addressed this with architectures such as Amazon Aurora and Amazon Redshift Spectrum. We close with our beliefs on how data architectures will evolve in the future.

Bio

Anurag Gupta is a Vice President at Amazon Web Services, where he is responsible for a number of transaction and analytic services. His transaction services are Amazon Aurora (a commercial-grade database compatible with MySQL and PostgreSQL), RDS MariaDB, RDS MySQL, and RDS PostgreSQL. His analytic services are Amazon Athena (a serverless query service), Amazon CloudSearch and Amazon ElasticSearch (managed search services), Amazon Redshift (a data warehousing service), and AWS Data Pipeline and AWS Glue (managed ETL services). Prior to Amazon, Anurag worked at a number of startups (Arbor, TradingDynamics, Interlace) and the companies that eventually acquired them (Hyperion, Ariba, Oracle). He began his career as a database kernel programmer at Oracle.


Magda Balazinska (University of Washington)

Performance SLAs for Cloud Data Analytics

Abstract

A variety of data analytics systems are available as cloud services today, including Amazon Elastic MapReduce (EMR), Redshift, Azure's HDInsight, and others. To buy these services, users select and pay for a given cluster configuration, i.e., the number and type of service instances. It is well known, however, that users often have difficulty selecting configurations that meet their needs. In this talk, we present our recent work developing the PSLAManager and PerfEnforce systems that together enable users to directly purchase performance levels expressed as runtimes for queries over users' specific datasets. Given a cloud data analytics service, the PSLAManager generates a database-specific performance-oriented SLA with multiple choices of service tiers. PerfEnforce uses elastic scaling to meet, at a low cost, the SLA runtime guarantees that the user purchases. We present the core algorithms behind each system as well as their evaluation using the Myria cloud data analytics system running on Amazon EC2.

Bio

Magdalena Balazinska is a Professor in the Paul G. Allen School of Computer Science and Engineering at the University of Washington and the Director of the University of Washington eScience Institute. She is also the director of the IGERT PhD Program in Big Data and Data Science and the director of the associated Advanced Data Science PhD Option. Magdalena's research interests are in the field of database management systems. Her current research focuses on data management for data science, big data systems, and cloud computing. Magdalena holds a Ph.D. from the Massachusetts Institute of Technology (2006). She is a Microsoft Research New Faculty Fellow (2007), received the inaugural VLDB Women in Database Research Award (2016), an ACM SIGMOD Test-of-Time Award (2017), an NSF CAREER Award (2009), a 10-year most influential paper award (2010), a Google Research Award (2011), an HP Labs Research Innovation Award (2009 and 2010), a Rogel Faculty Support Award (2006), a Microsoft Research Graduate Fellowship (2003-2005), and multiple best-paper awards.
