Data Streaming Workflow and Kinesis Data Firehose

Overview

How it works - Simplified

Why Kinesis Data Firehose?

In-transit Dynamic Partitioning

Traditionally, customers use Kinesis Data Firehose delivery streams to capture and load their data into Amazon S3-based data lakes for analytics. Partitioning the data while storing it on Amazon Simple Storage Service (S3) is a best practice for optimizing performance and reducing the cost of analytics queries, because partitions minimize the amount of data scanned. By default, Kinesis Data Firehose creates a static, Coordinated Universal Time (UTC)-based folder structure in the format YYYY/MM/dd/HH. This structure is then appended to the provided prefix before objects are written to Amazon S3.
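As a minimal sketch, the default UTC-based layout described above can be reproduced with a small helper (the function name and sample prefix are hypothetical, used only for illustration):

```python
from datetime import datetime, timezone

def default_firehose_prefix(arrival: datetime, prefix: str = "") -> str:
    """Mimic Firehose's default S3 folder structure: prefix + YYYY/MM/dd/HH/."""
    utc = arrival.astimezone(timezone.utc)  # Firehose uses UTC, not local time
    return f"{prefix}{utc:%Y/%m/%d/%H}/"

# A record arriving at 13:24 UTC on 2022-07-26 with prefix "logs/" lands under:
print(default_firehose_prefix(datetime(2022, 7, 26, 13, 24, tzinfo=timezone.utc), "logs/"))
# logs/2022/07/26/13/
```

Because the structure is purely time-based, all records arriving in the same hour share one prefix regardless of their content, which is what dynamic partitioning (below) changes.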

With dynamic partitioning, Kinesis Data Firehose continuously groups data in transit by dynamically or statically defined data keys and delivers it to individual Amazon S3 prefixes by key. This reduces time-to-insight by minutes or hours, lowers costs, and simplifies architectures. Combined with Kinesis Data Firehose's Apache Parquet and Apache ORC format conversion feature, dynamic partitioning makes Kinesis Data Firehose an ideal option for capturing, preparing, and loading data that is ready for analytic queries and data processing. Review the Kinesis Data Firehose documentation for additional details on the dynamic partitioning feature.
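A sketch of what enabling this might look like, assuming JSON records carrying a customer_id field to partition on (the bucket, role ARN, and field name are placeholders, not from the source):

```python
# Sketch of an ExtendedS3DestinationConfiguration that enables dynamic
# partitioning. A JQ expression extracts customer_id from each record, and the
# S3 prefix references it via the partitionKeyFromQuery namespace.
extended_s3_config = {
    "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",  # placeholder
    "BucketARN": "arn:aws:s3:::my-analytics-bucket",                     # placeholder
    # Each distinct customer_id value becomes its own S3 prefix:
    "Prefix": "customer_id=!{partitionKeyFromQuery:customer_id}/",
    "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/",
    "DynamicPartitioningConfiguration": {"Enabled": True},
    "ProcessingConfiguration": {
        "Enabled": True,
        "Processors": [{
            "Type": "MetadataExtraction",
            "Parameters": [
                {"ParameterName": "MetadataExtractionQuery",
                 "ParameterValue": "{customer_id: .customer_id}"},
                {"ParameterName": "JsonParsingEngine",
                 "ParameterValue": "JQ-1.6"},
            ],
        }],
    },
}

# With boto3 and valid credentials, this would be passed as:
# boto3.client("firehose").create_delivery_stream(
#     DeliveryStreamName="orders-partitioned",  # hypothetical name
#     DeliveryStreamType="DirectPut",
#     ExtendedS3DestinationConfiguration=extended_s3_config,
# )
print(extended_s3_config["Prefix"])
```

Partitioning by a query-based key like this replaces the hour-based grouping with content-based grouping, so queries that filter on customer_id scan only the matching prefixes.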

Use Cases

Best Practices

Consider the following best practices when deploying Kinesis Data Firehose: