#28 - The architectural choices for handling big data

May 26, 2024

Do you and your team handle big data? If so, which architecture choices should you consider? This week's issue is a refresher on big data architecture.

What is “true” big data?

Data is considered "big" when it exceeds the capacity of traditional data processing tools to handle, store, and analyse efficiently. This often involves datasets that are terabytes or petabytes in size, but the exact threshold can vary based on the specific use case and technological context.

In practice, most organisations do not deal with truly "big" data. Many enterprises have data warehouses smaller than a terabyte, and even large tech companies often manage data sizes in the range of terabytes rather than petabytes (MotherDuck).
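A back-of-the-envelope size estimate can help check whether a dataset is "big" in this sense. The sketch below is illustrative only: the function names, the example row counts, and the 1 TB single-machine threshold are assumptions made for this example, not figures from the sources cited above. The real threshold depends on your tooling and use case.

```python
def estimated_size_tb(rows: int, avg_bytes_per_row: int) -> float:
    """Approximate raw dataset size in terabytes (1 TB = 10**12 bytes)."""
    return rows * avg_bytes_per_row / 10**12


# Assumed threshold for what a single well-provisioned machine can handle.
# This is a placeholder; adjust it to your own technological context.
SINGLE_MACHINE_LIMIT_TB = 1.0


def exceeds_single_machine(rows: int, avg_bytes_per_row: int) -> bool:
    """True when the estimate is above the assumed single-machine limit."""
    return estimated_size_tb(rows, avg_bytes_per_row) > SINGLE_MACHINE_LIMIT_TB


# Hypothetical example: one billion rows of roughly 200 bytes each.
size = estimated_size_tb(1_000_000_000, 200)
print(f"{size:.2f} TB, exceeds limit: {exceeds_single_machine(1_000_000_000, 200)}")
```

Under these assumptions, a billion-row table comes in well under a terabyte, in line with the observation that many warehouses are smaller than they feel.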

What are the common architecture patterns for handling big data?

Lambda Architecture: This pattern involves two data processing paths: a batch layer (cold path) for high accuracy and a speed layer (hot path) for low latency. It combines real-time and batch processing to provide timely and accurate data insights.
Lambda architecture. Source: Microsoft
Kappa Architecture: An alternative to Lambda, Kappa architecture simplifies the process by using a single stream processing path for all data, eliminating the need for separate batch and speed layers. It focuses on real-time data processing; when data needs to be reprocessed, the stored event stream is replayed through the same path.
Kappa architecture. Source: Microsoft
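
The two patterns above can be sketched in a few lines of plain Python. This is a toy, in-memory illustration rather than a production design: the event shape (user, clicks), the class names, and the method names are all hypothetical, and a real system would use a distributed log and processing engine instead of Python lists.

```python
from collections import Counter

Event = tuple[str, int]  # hypothetical event shape: (user_id, clicks)


class LambdaPipeline:
    """Two paths: a recomputed batch view plus an incremental speed view."""

    def __init__(self) -> None:
        self.master_log: list[Event] = []  # every event ever received
        self.batch_view: Counter = Counter()
        self.speed_view: Counter = Counter()

    def ingest(self, event: Event) -> None:
        self.master_log.append(event)
        # Hot path: update the low-latency view immediately.
        self.speed_view[event[0]] += event[1]

    def run_batch(self) -> None:
        # Cold path: recompute from the full log, then reset the speed view
        # because the batch view now covers everything seen so far.
        self.batch_view = Counter()
        for user_id, clicks in self.master_log:
            self.batch_view[user_id] += clicks
        self.speed_view = Counter()

    def query(self, user_id: str) -> int:
        # Serving: merge the accurate batch view with the recent speed view.
        return self.batch_view[user_id] + self.speed_view[user_id]


class KappaPipeline:
    """One path: a single stream processor over a replayable log."""

    def __init__(self) -> None:
        self.log: list[Event] = []
        self.view: Counter = Counter()

    def _process(self, event: Event) -> None:
        self.view[event[0]] += event[1]

    def ingest(self, event: Event) -> None:
        self.log.append(event)
        self._process(event)

    def reprocess(self) -> None:
        # Rebuild the view by replaying the log through the same logic.
        self.view = Counter()
        for event in self.log:
            self._process(event)

    def query(self, user_id: str) -> int:
        return self.view[user_id]


events = [("alice", 2), ("bob", 1), ("alice", 3)]
lam, kap = LambdaPipeline(), KappaPipeline()
for e in events[:2]:
    lam.ingest(e)
    kap.ingest(e)
lam.run_batch()          # batch view now covers the first two events
lam.ingest(events[2])    # the third event lives only in the speed view
kap.ingest(events[2])
print(lam.query("alice"), kap.query("alice"))
```

Both pipelines return the same answer here. The difference is in how they get there: Lambda maintains two code paths and merges them at query time, while Kappa keeps one path and replays the log when it needs to recompute.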

What are the differences between Lambda and Kappa architectures?

Lambda Architecture:

Use Cases: Ideal for applications requiring both real-time and historical data processing, such as fraud detection, recommendation systems, and analytics platforms.
Advantages: It ensures low-latency access to real-time data while maintaining the accuracy and completeness of batch-processed data.
Companies that use Lambda architecture: Netflix

Kappa Architecture:

Use Cases: Ideal for scenarios requiring real-time insights and low latency, such as tracking user engagement during live events or monitoring real-time metrics.
Challenges: It is less suitable for applications needing extensive historical data analysis. Despite its streamlined approach, Kappa architecture can be complex and expensive to implement. Streaming systems can scale to large data volumes but are often harder to operate and manage than traditional batch processing systems.
Companies that use Kappa architecture: X
Data & Beyond
A newsletter for current, and future, data leaders. Curated and summarised insights to help you make the best use of your data assets.
By Ella Pham

How to choose between Lambda and Kappa architecture?

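Consider the need for historical data analysis: Lambda suits workloads that require both real-time and historical processing, while Kappa is less suitable when extensive historical analysis is needed.
Factor in the business context and enterprise architecture when estimating the cost of maintenance.
Make reversible architecture choices: Kappa is essentially a variation of Lambda, so design an experiment to benchmark the two architectures before committing to one.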
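
A benchmarking experiment of this kind could start from a small harness like the one below. It is a sketch under heavy assumptions: `batch_style` and `stream_style` are hypothetical stand-ins for the two processing paths, the events are synthetic, and wall-clock time on one machine says little about cost or operational effort. A real comparison would run representative workloads on the actual candidate infrastructure.

```python
import time
from collections import Counter
from typing import Callable

Event = tuple[str, int]  # hypothetical event shape: (user_id, clicks)


def batch_style(events: list[Event]) -> Counter:
    """Stand-in for a batch path: aggregate the whole dataset in one pass."""
    totals: Counter = Counter()
    for user_id, clicks in events:
        totals[user_id] += clicks
    return totals


def stream_style(events: list[Event]) -> Counter:
    """Stand-in for a streaming path: apply one small update per event."""
    view: Counter = Counter()
    for user_id, clicks in events:
        view.update({user_id: clicks})  # Counter.update adds to existing counts
    return view


def benchmark(pipeline: Callable[[list[Event]], Counter],
              events: list[Event]) -> tuple[Counter, float]:
    """Run one pipeline and return its result plus elapsed seconds."""
    start = time.perf_counter()
    result = pipeline(events)
    return result, time.perf_counter() - start


# Synthetic workload: 10,000 events spread across 100 users.
events = [(f"user{i % 100}", 1) for i in range(10_000)]

batch_result, batch_secs = benchmark(batch_style, events)
stream_result, stream_secs = benchmark(stream_style, events)

# Check correctness first: both paths must agree before timings mean anything.
results_match = batch_result == stream_result
print(f"results match: {results_match}")
print(f"batch: {batch_secs:.4f}s, stream: {stream_secs:.4f}s")
```

Checking that both approaches produce the same answer before comparing timings keeps the experiment honest, and keeping the harness independent of either pipeline makes it easier to swap one out later.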

