# Streaming

You can stream write to the Append table in a very flexible way through Flink, or read the Append table through
Flink, using it like a queue. The only difference is that its latency is in the order of minutes. Its advantages are
very low cost and the ability to push down filters and projections.
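
A minimal sketch of both paths in Flink SQL, assuming a Paimon catalog is already in use; the table names `append_t`
and `source_t` are illustrative, not from the original page.

```sql
-- An append table: no primary key, written and read as a stream.
CREATE TABLE append_t (
    id INT,
    data STRING
);

-- Streaming write: records become visible at each Flink checkpoint,
-- which is why end-to-end latency is in the order of minutes.
INSERT INTO append_t SELECT id, data FROM source_t;

-- Streaming read, consuming the table like a queue.
SELECT * FROM append_t;
```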

## Pre small files merging

"Pre" means that this compaction occurs before committing files to the snapshot.

If Flink's checkpoint interval is short (for example, 30 seconds), each snapshot may produce lots of small changelog
files. Too many files may put a burden on the distributed storage cluster.
To compact small changelog files into large ones, you can set the table option `precommit-compact` to `true`; this adds
a compact coordinator and worker operator after the writer operator, which copies changelog files into large ones.
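
A one-line sketch of enabling this, assuming the `precommit-compact` table option named above (`append_t` is an
illustrative table name):

```sql
-- Merge small changelog files before they are committed to the snapshot.
ALTER TABLE append_t SET ('precommit-compact' = 'true');
```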

## Post small files merging

"Post" means that this compaction occurs after committing files to the snapshot.

In a streaming write job without a bucket definition, there is no compaction in the writer; instead, a
`Compact Coordinator` scans for small files and passes compaction tasks to a `Compact Worker`. In streaming mode, if you
run an insert SQL in Flink, the topology will be like this:

Do not worry about backpressure; compaction never causes backpressure.

If you set `write-only` to true, the `Compact Coordinator` and `Compact Worker` will be removed from the topology.

Auto compaction is only supported in Flink engine streaming mode. You can also start a dedicated compaction job in
Flink via the Flink action in Paimon, and disable all other compactions by setting `write-only`, as sketched below.
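
A hedged sketch of that setup in Flink SQL. It assumes the `write-only` table option mentioned above and the
`sys.compact` procedure shipped with recent Paimon Flink integrations (the table name `append_t` is illustrative); the
same compaction job can alternatively be submitted through Paimon's Flink action jar.

```sql
-- Keep the write job lean: disable all compaction inside the writer topology.
ALTER TABLE append_t SET ('write-only' = 'true');

-- Run compaction as a separate job instead.
-- Procedure name as in recent Paimon releases; verify against your version.
CALL sys.compact(`table` => 'default.append_t');
```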

## Streaming Query

You can stream-read the Append table and use it like a message queue. As with primary key tables, there are two options
for streaming reads:
1. By default, streaming read produces the latest snapshot of the table upon first startup, and then continues to read
the latest incremental records.
2. You can specify `scan.mode`, `scan.snapshot-id`, `scan.timestamp-millis`, or `scan.file-creation-time-millis` to
stream-read incremental records only, as shown below.
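
A hedged sketch of both read modes using Flink SQL dynamic table options; the option names are the Paimon scan options
listed above, while the table name `append_t` and the concrete values are illustrative.

```sql
-- Default: latest snapshot on first startup, then incremental records.
SELECT * FROM append_t;

-- Incremental only, starting from a specific snapshot.
SELECT * FROM append_t
/*+ OPTIONS('scan.mode' = 'from-snapshot', 'scan.snapshot-id' = '5') */;

-- Incremental only, starting from a timestamp (epoch milliseconds).
SELECT * FROM append_t
/*+ OPTIONS('scan.timestamp-millis' = '1700000000000') */;
```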

Similar to flink-kafka, order is not guaranteed by default. If your data has an ordering requirement, you also need to
consider defining a `bucket-key`; see [Bucketed Append]({{< ref "append-table/bucketed" >}}).
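
For completeness, a sketch of an append table with a `bucket-key` defined (table and column names illustrative; the
`bucket` and `bucket-key` options are described in the linked page):

```sql
CREATE TABLE bucketed_t (
    id INT,
    data STRING
) WITH (
    'bucket' = '4',       -- fixed bucket count
    'bucket-key' = 'id'   -- same key always lands in the same bucket, preserving per-key order
);
```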
