Data Streaming Workflow and Kinesis Data Firehose

Overview

How it works - Simplified

Why Kinesis Data Firehose?

In-transit Dynamic Partitioning

Traditionally, customers use Kinesis Data Firehose delivery streams to capture and load their data into Amazon S3-based data lakes for analytics. Partitioning the data while storing it on Amazon Simple Storage Service (S3) is a best practice for optimizing performance and reducing the cost of analytics queries, because partitions minimize the amount of data scanned. By default, Kinesis Data Firehose creates a static, Coordinated Universal Time (UTC)-based folder structure in the format YYYY/MM/dd/HH. This structure is then appended to the provided prefix before objects are written to Amazon S3.
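As a minimal sketch, the default UTC-based layout described above can be reproduced with a small helper (the function name and sample prefix are hypothetical, used only for illustration):

```python
from datetime import datetime, timezone

def default_firehose_prefix(arrival: datetime, prefix: str = "") -> str:
    """Mimic Firehose's default S3 folder structure: prefix + YYYY/MM/dd/HH/."""
    utc = arrival.astimezone(timezone.utc)  # Firehose uses UTC, not local time
    return f"{prefix}{utc:%Y/%m/%d/%H}/"

# A record arriving at 13:24 UTC on 2022-07-26 with prefix "logs/" lands under:
print(default_firehose_prefix(datetime(2022, 7, 26, 13, 24, tzinfo=timezone.utc), "logs/"))
# logs/2022/07/26/13/
```

Because the structure is purely time-based, all records arriving in the same hour share one prefix regardless of their content, which is what dynamic partitioning (below) changes.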

With dynamic partitioning, Kinesis Data Firehose continuously groups data in transit by dynamically or statically defined data keys and delivers it to individual Amazon S3 prefixes by key. This reduces time-to-insight by minutes or hours, lowers costs, and simplifies architectures. Combined with Kinesis Data Firehose's Apache Parquet and Apache ORC format conversion feature, dynamic partitioning makes Kinesis Data Firehose an ideal option for capturing, preparing, and loading data that is ready for analytic queries and data processing. Review the Kinesis Data Firehose documentation for additional details on the dynamic partitioning feature.
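A sketch of what enabling this might look like, assuming JSON records carrying a customer_id field to partition on (the bucket, role ARN, and field name are placeholders, not from the source):

```python
# Sketch of an ExtendedS3DestinationConfiguration that enables dynamic
# partitioning. A JQ expression extracts customer_id from each record, and the
# S3 prefix references it via the partitionKeyFromQuery namespace.
extended_s3_config = {
    "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",  # placeholder
    "BucketARN": "arn:aws:s3:::my-analytics-bucket",                     # placeholder
    # Each distinct customer_id value becomes its own S3 prefix:
    "Prefix": "customer_id=!{partitionKeyFromQuery:customer_id}/",
    "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/",
    "DynamicPartitioningConfiguration": {"Enabled": True},
    "ProcessingConfiguration": {
        "Enabled": True,
        "Processors": [{
            "Type": "MetadataExtraction",
            "Parameters": [
                {"ParameterName": "MetadataExtractionQuery",
                 "ParameterValue": "{customer_id: .customer_id}"},
                {"ParameterName": "JsonParsingEngine",
                 "ParameterValue": "JQ-1.6"},
            ],
        }],
    },
}

# With boto3 and valid credentials, this would be passed as:
# boto3.client("firehose").create_delivery_stream(
#     DeliveryStreamName="orders-partitioned",  # hypothetical name
#     DeliveryStreamType="DirectPut",
#     ExtendedS3DestinationConfiguration=extended_s3_config,
# )
print(extended_s3_config["Prefix"])
```

Partitioning by a query-based key like this replaces the hour-based grouping with content-based grouping, so queries that filter on customer_id scan only the matching prefixes.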

Use Cases

Best Practices

Consider the following best practices when deploying Kinesis Data Firehose: