Data Lakehouse with Michael Armbrust
57 min

A data warehouse is a system for performing fast queries on large amounts of data. A data lake is a system for storing high volumes of data in a format that is slow to access. A typical workflow for a data engineer is to pull data sets from this slow data lake storage into the data warehouse for faster querying.

Apache Spark is a system for fast, distributed processing of large datasets. Spark is not usually thought of as a data warehouse technology, but it can fulfill some of a warehouse's responsibilities. Delta is an open source storage layer that sits on top of a data lake. Delta integrates closely with Spark, creating a system that Databricks refers to as a “data lakehouse.”

Michael Armbrust is an engineer with Databricks. He joins the show to talk about his experience building the company, his perspective on data engineering, and his work on Delta, the storage system built for the Spark ecosystem.
