Delta Lake

What is Delta Lake?

Delta Lake is open-source software that extends Parquet data files with a file-based transaction log, which provides ACID transactions and scalable metadata handling.

In other words, Delta Lake makes a data lake behave like a database.

Why use Delta Lake?

There are two key reasons to use Delta Lake in pfeed:

  1. ACID Transactions: Writing streaming data to a data lake is safer and more efficient with ACID guarantees. Without them, parallel writers appending rows to Parquet files can leave the data inconsistent or corrupted.
  2. Versioned Data & Time Travel: Delta Lake allows you to version your data and time travel to a specific version. This is useful for reproducibility, debugging, and preventing accidental data loss.
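To make the two points above concrete, here is a minimal toy model of a file-based transaction log, written with only the Python standard library. It is not Delta Lake's implementation and not pfeed's API: the helper names `commit` and `live_files` are hypothetical, the action records are stripped down to a bare `path`, and no Parquet files are actually written. It only borrows two ideas from Delta Lake: commits are stored as numbered JSON files under `_delta_log`, and a table version is reconstructed by replaying those commits in order.

```python
import json
import os
import tempfile

LOG_DIR = "_delta_log"  # directory name used by Delta Lake; the rest is simplified


def commit(table_path, version, actions):
    """Publish `actions` as commit number `version` (hypothetical helper).

    Opening with mode "x" fails if the file already exists, so two writers
    racing for the same version number cannot both succeed.
    """
    log_dir = os.path.join(table_path, LOG_DIR)
    os.makedirs(log_dir, exist_ok=True)
    log_file = os.path.join(log_dir, f"{version:020d}.json")
    with open(log_file, "x") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")


def live_files(table_path, version=None):
    """Replay the log up to `version` (latest if None) and list visible data files."""
    log_dir = os.path.join(table_path, LOG_DIR)
    files = set()
    for name in sorted(os.listdir(log_dir)):
        if version is not None and int(name.split(".")[0]) > version:
            break
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return sorted(files)


table = tempfile.mkdtemp()
commit(table, 0, [{"add": {"path": "part-0.parquet"}}])
commit(table, 1, [{"add": {"path": "part-1.parquet"}}])
# An overwrite: old files are removed and a new one added in a single commit.
commit(table, 2, [
    {"remove": {"path": "part-0.parquet"}},
    {"remove": {"path": "part-1.parquet"}},
    {"add": {"path": "part-2.parquet"}},
])

latest = live_files(table)               # state after the overwrite
as_of_v1 = live_files(table, version=1)  # "time travel" to version 1

# A second writer trying to reuse version 2 is rejected instead of corrupting the table.
try:
    commit(table, 2, [{"add": {"path": "conflict.parquet"}}])
    conflict_rejected = False
except FileExistsError:
    conflict_rejected = True
```

In this sketch, readers only ever see files that a completed commit has added, and because old commits are never rewritten, any earlier version can still be reconstructed after an overwrite.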

To see how to use Delta Lake in pfeed, please refer to Storage and Integration with Delta Lake.

For more details, please refer to the Delta Lake documentation.