Commit a9cb053

docs: add examples for the inserter feature (#179)

1 parent da27018 · commit a9cb053

6 files changed: +204 −8 lines

Cargo.toml

Lines changed: 4 additions & 0 deletions
@@ -37,6 +37,10 @@ harness = false
 name = "select"
 harness = false

+[[example]]
+name = "inserter"
+required-features = ["inserter"]
+
 [[example]]
 name = "mock"
 required-features = ["test-util"]
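
Since the example is gated by `required-features`, cargo refuses to build it unless the feature is enabled; following the command pattern shown in examples/README.md, it can be run with `cargo run --package clickhouse --example inserter --features inserter`.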

README.md

Lines changed: 2 additions & 0 deletions
@@ -160,6 +160,8 @@ if stats.rows > 0 {
 }
 ```

+Please read the [examples](https://github.com/ClickHouse/clickhouse-rs/tree/main/examples/inserter.rs) to understand how to use it properly in different real-world cases.
+
 * `Inserter` ends an active insert in `commit()` if thresholds (`max_bytes`, `max_rows`, `period`) are reached.
 * The interval between ending active `INSERT`s can be biased by using `with_period_bias` to avoid load spikes by parallel inserters.
 * `Inserter::time_left()` can be used to detect when the current period ends. Call `Inserter::commit()` again to check limits if your stream emits items rarely.
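
A minimal sketch of the loop those bullets describe, using only the `Inserter` calls that appear in this commit; the table name, row type, and data source are placeholders:

```rust
use std::time::Duration;

use clickhouse::{error::Result, Client, Row};
use serde::Serialize;

#[derive(Row, Serialize)]
struct Event {
    no: u32,
}

// Hypothetical driver; the `Inserter` API mirrors the README and
// `examples/inserter.rs`.
async fn run(client: &Client, rows: Vec<Event>) -> Result<()> {
    let mut inserter = client
        .inserter("events")?
        .with_max_rows(100_000)
        .with_period(Some(Duration::from_secs(5)))
        // Spreads commits of parallel writers: 5s ± 10%.
        .with_period_bias(0.1);

    for row in rows {
        inserter.write(&row)?;
        // Ends the active `INSERT` only if a threshold
        // (`max_bytes`, `max_rows`, `period`) has been reached.
        let stats = inserter.commit().await?;
        if stats.rows > 0 {
            println!("committed {} rows", stats.rows);
        }
    }

    // Flush whatever is still buffered before returning.
    inserter.end().await?;
    Ok(())
}
```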

examples/README.md

Lines changed: 3 additions & 2 deletions
@@ -8,8 +8,9 @@ If something is missing, or you found a mistake in one of these examples, please

 ### General usage

-- [usage.rs](usage.rs) - creating tables, executing other DDLs, inserting the data, and selecting it back. Additionally, it covers the client-side batching via the `inserter` feature, as well as `WATCH` queries. Optional cargo features: `inserter`, `watch`.
+- [usage.rs](usage.rs) - creating tables, executing other DDLs, inserting the data, and selecting it back. Additionally, it covers `WATCH` queries. Optional cargo features: `inserter`, `watch`.
 - [mock.rs](mock.rs) - writing tests with `mock` feature. Cargo features: requires `test-util`.
+- [inserter.rs](inserter.rs) - using the client-side batching via the `inserter` feature. Cargo features: requires `inserter`.
 - [async_insert.rs](async_insert.rs) - using the server-side batching via the [asynchronous inserts](https://clickhouse.com/docs/en/optimize/asynchronous-inserts) ClickHouse feature
 - [clickhouse_cloud.rs](clickhouse_cloud.rs) - using the client with ClickHouse Cloud, highlighting a few relevant settings (`wait_end_of_query`, `select_sequential_consistency`). Cargo features: requires `rustls-tls`; the code also works with `native-tls`.
 - [clickhouse_settings.rs](clickhouse_settings.rs) - applying various ClickHouse settings on the query level

@@ -56,4 +57,4 @@ If a particular example requires a cargo feature, you could run it as follows:
 cargo run --package clickhouse --example usage --features inserter watch
 ```

-Additionally, the individual examples should be runnable via the IDE such as CLion or RustRover.
+Additionally, the individual examples should be runnable via an IDE such as CLion or RustRover.

examples/inserter.rs

Lines changed: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
use std::time::Duration;

use serde::{Deserialize, Serialize};
use tokio::{
    sync::mpsc::{self, error::TryRecvError, Receiver},
    time::timeout,
};

use clickhouse::{error::Result, sql::Identifier, Client, Row};

const TABLE_NAME: &str = "chrs_inserter";

#[derive(Debug, Row, Serialize, Deserialize)]
struct MyRow {
    no: u32,
}

// Pattern 1: dense streams
// ------------------------
// This pattern is useful when the stream is dense, i.e. with no/small pauses
// between rows. For instance, when reading from a file or another database.
// In other words, this pattern is applicable to ETL-like tasks.
async fn dense(client: &Client, mut rx: Receiver<u32>) -> Result<()> {
    let mut inserter = client
        .inserter(TABLE_NAME)?
        // We limit the number of rows to be inserted in a single `INSERT` statement.
        // We use a small value (100) for the example only.
        // See the documentation of `with_max_rows` for details.
        .with_max_rows(100)
        // You can also use other limits. For instance, limit by size.
        // The first condition reached will end the current `INSERT`.
        .with_max_bytes(1_048_576);

    while let Some(no) = rx.recv().await {
        inserter.write(&MyRow { no })?;
        inserter.commit().await?;
    }

    inserter.end().await?;
    Ok(())
}

// Pattern 2: sparse streams
// -------------------------
// This pattern is useful when the stream is sparse, i.e. with pauses between
// rows. For instance, when streaming a real-time stream of events into CH.
// Some rows arrive one by one with a delay, some batched.
async fn sparse(client: &Client, mut rx: Receiver<u32>) -> Result<()> {
    let mut inserter = client
        .inserter(TABLE_NAME)?
        // Slice the stream into chunks (one `INSERT` per chunk) by time.
        // See the documentation of `with_period` for details.
        .with_period(Some(Duration::from_millis(100)))
        // If you have a lot of parallel inserters (e.g. on multiple nodes),
        // it's reasonable to add some bias to the period to spread the load.
        .with_period_bias(0.1)
        // We can also use other limits. This is useful when the stream is
        // recovered after a long time of inactivity (e.g. a restart of the service or CH).
        .with_max_rows(500_000);

    loop {
        let no = match rx.try_recv() {
            Ok(event) => event,
            Err(TryRecvError::Empty) => {
                // If no events are available, we should wait for the next one.
                // However, we don't know when the next event will arrive.
                // So, we should wait no longer than the time left in the current period.
                let time_left = inserter.time_left().expect("with_period is set");

                // Note: `rx.recv()` must be cancel safe for your channel.
                // This is true for the popular `tokio`, `futures-channel`, and `flume` channels.
                match timeout(time_left, rx.recv()).await {
                    Ok(Some(event)) => event,
                    // The stream is closed.
                    Ok(None) => break,
                    // Timeout.
                    Err(_) => {
                        // If the period is over, we allow the inserter to end the current `INSERT`
                        // statement. If no `INSERT` is in progress, this call is a no-op.
                        inserter.commit().await?;
                        continue;
                    }
                }
            }
            Err(TryRecvError::Disconnected) => break,
        };

        inserter.write(&MyRow { no })?;
        inserter.commit().await?;

        // You can use the result of `commit()` to get the number of rows inserted.
        // It's useful not only for statistics but also to implement
        // at-least-once delivery by sending this info back to the sender,
        // where all unacknowledged events should be stored in this case.
    }

    inserter.end().await?;
    Ok(())
}

fn spawn_data_generator(n: u32, sparse: bool) -> Receiver<u32> {
    let (tx, rx) = mpsc::channel(1000);

    tokio::spawn(async move {
        for no in 0..n {
            if sparse {
                let delay_ms = if no % 100 == 0 { 20 } else { 2 };
                tokio::time::sleep(Duration::from_millis(delay_ms)).await;
            }

            tx.send(no).await.unwrap();
        }
    });

    rx
}

async fn fetch_batches(client: &Client) -> Result<Vec<(String, u64)>> {
    client
        .query(
            "SELECT toString(insertion_time), count()
             FROM ?
             GROUP BY insertion_time
             ORDER BY insertion_time",
        )
        .bind(Identifier(TABLE_NAME))
        .fetch_all::<(String, u64)>()
        .await
}

#[tokio::main]
async fn main() -> Result<()> {
    let client = Client::default().with_url("http://localhost:8123");

    client
        .query(
            "CREATE OR REPLACE TABLE ? (
                no UInt32,
                insertion_time DateTime64(6) DEFAULT now64(6)
            )
            ENGINE = MergeTree
            ORDER BY no",
        )
        .bind(Identifier(TABLE_NAME))
        .execute()
        .await?;

    println!("Pattern 1: dense streams");
    let rx = spawn_data_generator(1000, false);
    dense(&client, rx).await?;

    // Prints 10 batches with 100 rows in each.
    for (insertion_time, count) in fetch_batches(&client).await? {
        println!("{}: {} rows", insertion_time, count);
    }

    client
        .query("TRUNCATE TABLE ?")
        .bind(Identifier(TABLE_NAME))
        .execute()
        .await?;

    println!("\nPattern 2: sparse streams");
    let rx = spawn_data_generator(1000, true);
    sparse(&client, rx).await?;

    // Prints batches every 100±10ms.
    for (insertion_time, count) in fetch_batches(&client).await? {
        println!("{}: {} rows", insertion_time, count);
    }

    Ok(())
}
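
The comment in `sparse()` about at-least-once delivery can be made concrete with a small sketch. It reuses `MyRow`, `TABLE_NAME`, `Receiver`, and `Result` from the example above; the `ack_tx` channel is a hypothetical path back to the producer, which keeps events in a retry buffer until they are acknowledged:

```rust
use tokio::sync::mpsc::Sender;

async fn insert_with_acks(
    client: &Client,
    mut rx: Receiver<u32>,
    ack_tx: Sender<u64>,
) -> Result<()> {
    let mut inserter = client.inserter(TABLE_NAME)?.with_max_rows(100);

    while let Some(no) = rx.recv().await {
        inserter.write(&MyRow { no })?;
        let stats = inserter.commit().await?;
        if stats.rows > 0 {
            // `stats.rows` events are now durable in ClickHouse,
            // so the producer can drop them from its retry buffer.
            ack_tx.send(stats.rows).await.ok();
        }
    }

    // The channel is closed: flush the tail and acknowledge it too.
    let stats = inserter.end().await?;
    if stats.rows > 0 {
        ack_tx.send(stats.rows).await.ok();
    }
    Ok(())
}
```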

examples/usage.rs

Lines changed: 2 additions & 4 deletions
@@ -37,6 +37,8 @@ async fn insert(client: &Client) -> Result<()> {
     insert.end().await
 }

+// This is a very basic example of using the `inserter` feature.
+// See `inserter.rs` for real-world patterns.
 #[cfg(feature = "inserter")]
 async fn inserter(client: &Client) -> Result<()> {
     let mut inserter = client
@@ -45,10 +47,6 @@ async fn inserter(client: &Client) -> Result<()> {
         .with_period(Some(std::time::Duration::from_secs(15)));

     for i in 0..1000 {
-        if i == 500 {
-            inserter.set_max_rows(300);
-        }
-
         inserter.write(&MyRow { no: i, name: "foo" })?;
         inserter.commit().await?;
     }

src/inserter.rs

Lines changed: 20 additions & 2 deletions
@@ -53,6 +53,7 @@ impl<T> Inserter<T>
 where
     T: Row,
 {
+    // TODO: (breaking change) remove `Result`.
     pub(crate) fn new(client: &Client, table: &str) -> Result<Self> {
         Ok(Self {
             client: client.clone(),
@@ -83,6 +84,9 @@ where

     /// The maximum number of uncompressed bytes in one `INSERT` statement.
     ///
+    /// This is a soft limit, which can be exceeded if rows written between
+    /// [`Inserter::commit()`] calls are larger than the set value.
+    ///
     /// Note: ClickHouse inserts batches atomically only if all rows fit in the
     /// same partition and their number is less [`max_insert_block_size`].
     ///
@@ -96,6 +100,13 @@ where

     /// The maximum number of rows in one `INSERT` statement.
     ///
+    /// To reduce the overhead of merging small parts on the ClickHouse side,
+    /// use larger values (e.g. 100_000 or even larger). Consider also (or
+    /// instead) [`Inserter::with_max_bytes()`] if rows can be large.
+    ///
+    /// This is a soft limit, which can be exceeded if multiple rows are
+    /// written between [`Inserter::commit()`] calls.
+    ///
     /// Note: ClickHouse inserts batches atomically only if all rows fit in the
     /// same partition and their number is less [`max_insert_block_size`].
     ///
@@ -114,6 +125,11 @@ where
     /// However, it's possible to use [`Inserter::time_left()`] and set a
     /// timer up to call [`Inserter::commit()`] to check passed time again.
    ///
+    /// Usually, it's reasonable to use a 1-10s period, but it depends on the
+    /// desired delay for reading the data from the table.
+    /// Larger values mean less overhead for merging parts by CH.
+    /// Smaller values mean less delay for readers.
+    ///
     /// Extra ticks are skipped if the previous `INSERT` is still in progress:
     /// ```text
     /// Expected ticks: | 1 | 2 | 3 | 4 | 5 | 6 |
@@ -141,7 +157,8 @@ where
         self
     }

-    /// Similar to [`Client::with_option`], but for the INSERT statements generated by this [`Inserter`] only.
+    /// Similar to [`Client::with_option`], but for the INSERT statements
+    /// generated by this [`Inserter`] only.
     pub fn with_option(mut self, name: impl Into<String>, value: impl Into<String>) -> Self {
         self.client.add_option(name, value);
         self
@@ -192,7 +209,8 @@ where

     /// Serializes the provided row into an internal buffer.
     ///
-    /// To check the limits and send the data to ClickHouse, call [`Inserter::commit()`].
+    /// To check the limits and send the data to ClickHouse, call
+    /// [`Inserter::commit()`].
     ///
     /// # Panics
     /// If called after the previous call that returned an error.
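
The "soft limit" wording in these doc comments can be made concrete: limits are checked only inside `commit()`, so every row written since the previous commit lands in the same `INSERT`, even past `max_rows`. A minimal sketch, assuming a hypothetical table and a pre-grouped `batches` source:

```rust
use clickhouse::{error::Result, Client, Row};
use serde::Serialize;

#[derive(Row, Serialize)]
struct MyRow {
    no: u32,
}

async fn soft_limit_demo(client: &Client, batches: Vec<Vec<MyRow>>) -> Result<()> {
    let mut inserter = client.inserter("my_table")?.with_max_rows(10);

    for batch in batches {
        for row in &batch {
            // No limit is enforced here: rows only accumulate in the buffer.
            inserter.write(row)?;
        }
        // Limits are checked only here, so the `INSERT` that ends now may
        // contain more than 10 rows if the batch overshot the limit.
        inserter.commit().await?;
    }

    inserter.end().await?;
    Ok(())
}
```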
