
Commit a544f33

[MINOR] Fix broken urls
Signed-off-by: wforget <[email protected]>
1 parent 846f1e4 commit a544f33

3 files changed: 14 additions & 14 deletions


README.md

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@ The connector supports to read from and write to StarRocks through Apache Spark
 ## Documentation
 
 For the user manual of the released version of the Spark connector, please visit the StarRocks official documentation.
-* [Read data from StarRocks using Spark connector](https://docs.starrocks.io/en-us/latest/loading/Spark-connector-starrocks)
-* [Load data using Spark connector](https://docs.starrocks.io/en-us/latest/unloading/Spark_connector)
+* [Read data from StarRocks using Spark connector](https://docs.starrocks.io/docs/loading/Spark-connector-starrocks)
+* [Load data using Spark connector](https://docs.starrocks.io/docs/unloading/Spark_connector)
 
 For the new features in the snapshot version of the Spark connector, please see the docs in this repo.
 * [Read from StarRocks](docs/connector-read.md)

docs/connector-read.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ You can also map the StarRocks table to a Spark DataFrame or a Spark RDD, and th
 
 > **NOTICE**
 >
-> Reading data from StarRocks tables with Spark connector needs SELECT privilege. If you do not have the privilege, follow the instructions provided in [GRANT](https://docs.starrocks.io/en-us/latest/sql-reference/sql-statements/account-management/GRANT) to grant the privilege to the user that you use to connect to your StarRocks cluster.
+> Reading data from StarRocks tables with Spark connector needs SELECT privilege. If you do not have the privilege, follow the instructions provided in [GRANT](https://docs.starrocks.io/docs/sql-reference/sql-statements/account-management/GRANT) to grant the privilege to the user that you use to connect to your StarRocks cluster.
 
 ## Usage notes
 
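For orientation, the NOTICE above belongs to the connector's read path. A minimal read sketch, assuming Spark 3.x with the connector JAR on the classpath; the option names (`starrocks.fe.http.url`, `starrocks.fe.jdbc.url`, `starrocks.table.identifier`) are taken from the connector's option reference, not from this diff, and host, ports, and table are placeholders:

```scala
// Minimal read sketch for spark-shell; the user must already hold the SELECT
// privilege described in the NOTICE above.
val df = spark.read
  .format("starrocks") // DataSource short name registered by the connector
  .option("starrocks.table.identifier", "test_db.score_board")
  .option("starrocks.fe.http.url", "127.0.0.1:8030")
  .option("starrocks.fe.jdbc.url", "jdbc:mysql://127.0.0.1:9030")
  .option("starrocks.user", "root")
  .option("starrocks.password", "")
  .load()

df.show()
```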
docs/connector-write.md

Lines changed: 11 additions & 11 deletions
@@ -1,10 +1,10 @@
 # Load data using Spark connector
 
-StarRocks provides a self-developed connector named StarRocks Connector for Apache Spark™ (Spark connector for short) to help you load data into a StarRocks table by using Spark. The basic principle is to accumulate the data and then load it all at a time into StarRocks through [STREAM LOAD](https://docs.starrocks.io/en-us/latest/sql-reference/sql-statements/data-manipulation/STREAM%20LOAD). The Spark connector is implemented based on Spark DataSource V2. A DataSource can be created by using Spark DataFrames or Spark SQL. And both batch and structured streaming modes are supported.
+StarRocks provides a self-developed connector named StarRocks Connector for Apache Spark™ (Spark connector for short) to help you load data into a StarRocks table by using Spark. The basic principle is to accumulate the data and then load it all at a time into StarRocks through [STREAM LOAD](https://docs.starrocks.io/docs/sql-reference/sql-statements/data-manipulation/STREAM_LOAD). The Spark connector is implemented based on Spark DataSource V2. A DataSource can be created by using Spark DataFrames or Spark SQL. And both batch and structured streaming modes are supported.
 
 > **NOTICE**
 >
-> Loading data into StarRocks tables with Spark connector needs SELECT and INSERT privileges. If you do not have these privileges, follow the instructions provided in [GRANT](https://docs.starrocks.io/en-us/latest/sql-reference/sql-statements/account-management/GRANT) to grant these privileges to the user that you use to connect to your StarRocks cluster.
+> Loading data into StarRocks tables with Spark connector needs SELECT and INSERT privileges. If you do not have these privileges, follow the instructions provided in [GRANT](https://docs.starrocks.io/docs/sql-reference/sql-statements/account-management/GRANT) to grant these privileges to the user that you use to connect to your StarRocks cluster.
 
 ## Version requirements
 
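As a concrete illustration of the paragraph above (DataSource V2, DataFrame or Spark SQL, batch or structured streaming), here is a minimal batch-write sketch for spark-shell; connection option names are assumed from the connector's option table, and the target table is hypothetical:

```scala
// The connector buffers these rows in memory and flushes them to StarRocks
// through Stream Load, per the loading principle described above.
import spark.implicits._

val df = Seq((1, "starrocks"), (2, "spark")).toDF("id", "name")

df.write
  .format("starrocks")
  .option("starrocks.table.identifier", "test_db.score_board") // hypothetical table
  .option("starrocks.fe.http.url", "127.0.0.1:8030")
  .option("starrocks.fe.jdbc.url", "jdbc:mysql://127.0.0.1:9030")
  .option("starrocks.user", "root")
  .option("starrocks.password", "")
  .mode("append")
  .save()
```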
@@ -92,15 +92,15 @@ Directly download the corresponding version of the Spark connector JAR from the
 | starrocks.user | YES | None | The username of your StarRocks cluster account. |
 | starrocks.password | YES | None | The password of your StarRocks cluster account. |
 | starrocks.write.label.prefix | NO | spark- | The label prefix used by Stream Load. |
-| starrocks.write.enable.transaction-stream-load | NO | TRUE | Whether to use [Stream Load transaction interface](https://docs.starrocks.io/en-us/latest/loading/Stream_Load_transaction_interface) to load data. It requires StarRocks v2.5 or later. This feature can load more data in a transaction with less memory usage, and improve performance. <br/> **NOTICE:** Since 1.1.1, this parameter takes effect only when the value of `starrocks.write.max.retries` is non-positive because Stream Load transaction interface does not support retry. |
+| starrocks.write.enable.transaction-stream-load | NO | TRUE | Whether to use [Stream Load transaction interface](https://docs.starrocks.io/docs/loading/Stream_Load_transaction_interface) to load data. It requires StarRocks v2.5 or later. This feature can load more data in a transaction with less memory usage, and improve performance. <br/> **NOTICE:** Since 1.1.1, this parameter takes effect only when the value of `starrocks.write.max.retries` is non-positive because Stream Load transaction interface does not support retry. |
 | starrocks.write.buffer.size | NO | 104857600 | The maximum size of data that can be accumulated in memory before being sent to StarRocks at a time. Setting this parameter to a larger value can improve loading performance but may increase loading latency. |
 | starrocks.write.buffer.rows | NO | Integer.MAX_VALUE | Supported since version 1.1.1. The maximum number of rows that can be accumulated in memory before being sent to StarRocks at a time. |
 | starrocks.write.flush.interval.ms | NO | 300000 | The interval at which data is sent to StarRocks. This parameter is used to control the loading latency. |
 | starrocks.write.max.retries | NO | 3 | Supported since version 1.1.1. The number of times that the connector retries to perform the Stream Load for the same batch of data if the load fails. <br/> **NOTICE:** Because the Stream Load transaction interface does not support retry, if this parameter is positive, the connector always uses the Stream Load interface and ignores the value of `starrocks.write.enable.transaction-stream-load`. |
 | starrocks.write.retry.interval.ms | NO | 10000 | Supported since version 1.1.1. The interval to retry the Stream Load for the same batch of data if the load fails. |
 | starrocks.columns | NO | None | The StarRocks table columns into which you want to load data. You can specify multiple columns, which must be separated by commas (,), for example, `"col0,col1,col2"`. |
 | starrocks.column.types | NO | None | Supported since version 1.1.1. Customize the column data types for Spark instead of using the defaults inferred from the StarRocks table and the [default mapping](#data-type-mapping-between-spark-and-starrocks). The parameter value is a schema in DDL format same as the output of Spark [StructType#toDDL](https://github.com/apache/spark/blob/master/sql/api/src/main/scala/org/apache/spark/sql/types/StructType.scala#L449), such as `col0 INT, col1 STRING, col2 BIGINT`. Note that you only need to specify columns that need customization. One use case is to load data into columns of [BITMAP](#load-data-into-columns-of-bitmap-type) or [HLL](#load-data-into-columns-of-HLL-type) type. |
-| starrocks.write.properties.* | NO | None | The parameters that are used to control Stream Load behavior. For example, the parameter `starrocks.write.properties.format` specifies the format of the data to be loaded, such as CSV or JSON. For a list of supported parameters and their descriptions, see [STREAM LOAD](https://docs.starrocks.io/en-us/latest/sql-reference/sql-statements/data-manipulation/STREAM%20LOAD). |
+| starrocks.write.properties.* | NO | None | The parameters that are used to control Stream Load behavior. For example, the parameter `starrocks.write.properties.format` specifies the format of the data to be loaded, such as CSV or JSON. For a list of supported parameters and their descriptions, see [STREAM LOAD](https://docs.starrocks.io/docs/sql-reference/sql-statements/data-manipulation/STREAM_LOAD). |
 | starrocks.write.properties.format | NO | CSV | The file format based on which the Spark connector transforms each batch of data before the data is sent to StarRocks. Valid values: CSV and JSON. |
 | starrocks.write.properties.row_delimiter | NO | \n | The row delimiter for CSV-formatted data. |
 | starrocks.write.properties.column_separator | NO | \t | The column separator for CSV-formatted data. |
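To show how the buffering options above interact, a hedged sketch (the values are illustrative, and the connection options are the same as in the earlier write sketch):

```scala
// Trade latency for throughput: flush when ~32 MB accumulate, or at least once
// a minute, and ship each batch to Stream Load as JSON instead of the CSV default.
df.write
  .format("starrocks")
  // ... connection options as in the earlier write sketch ...
  .option("starrocks.write.buffer.size", "33554432")
  .option("starrocks.write.flush.interval.ms", "60000")
  .option("starrocks.write.properties.format", "json")
  .mode("append")
  .save()
```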
@@ -385,7 +385,7 @@ The following example explains how to load data with Spark SQL by using the `INS
 ### Load data to primary key table
 
 This section will show how to load data to StarRocks primary key table to achieve partial update and conditional update.
-You can see [Change data through loading](https://docs.starrocks.io/en-us/latest/loading/Load_to_Primary_Key_tables) for the introduction of those features.
+You can see [Change data through loading](https://docs.starrocks.io/docs/loading/Load_to_Primary_Key_tables) for the introduction of those features.
 These examples use Spark SQL.
 
 #### Preparations
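The doc's own examples here use Spark SQL; as a rough DataFrame-side sketch of a partial update, under the assumption that Stream Load's `partial_update` property can be passed through `starrocks.write.properties.*` (per the option table above), with invented table and column names:

```scala
// Update only `score` for existing primary keys; untouched columns keep their
// current values because Stream Load runs in partial-update mode.
val updates = Seq((1, 100), (2, 100)).toDF("id", "score")

updates.write
  .format("starrocks")
  // ... connection options as in the earlier write sketch ...
  .option("starrocks.columns", "id,score")                     // subset of the table's columns
  .option("starrocks.write.properties.partial_update", "true") // Stream Load pass-through (assumed)
  .mode("append")
  .save()
```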
@@ -517,7 +517,7 @@ takes effect only when the new value for `score` is has a greater or equal to th
 
 ### Load data into columns of BITMAP type
 
-[`BITMAP`](https://docs.starrocks.io/en-us/latest/sql-reference/sql-statements/data-types/BITMAP) is often used to accelerate count distinct, such as counting UV, see [Use Bitmap for exact Count Distinct](https://docs.starrocks.io/en-us/latest/using_starrocks/Using_bitmap).
+[`BITMAP`](https://docs.starrocks.io/docs/sql-reference/sql-statements/data-types/BITMAP) is often used to accelerate count distinct, such as counting UV, see [Use Bitmap for exact Count Distinct](https://docs.starrocks.io/docs/using_starrocks/Using_bitmap).
 Here we take the counting of UV as an example to show how to load data into columns of the `BITMAP` type.
 
 1. Create a StarRocks Aggregate table
@@ -536,7 +536,7 @@ Here we take the counting of UV as an example to show how to load data into colu
 
 3. Create a Spark table
 
-The schema of the Spark table is inferred from the StarRocks table, and the Spark does not support the `BITMAP` type. So you need to customize the corresponding column data type in Spark, for example as `BIGINT`, by configuring the option `"starrocks.column.types"="visit_users BIGINT"`. When using Stream Load to ingest data, the connector uses the [`to_bitmap`](https://docs.starrocks.io/en-us/latest/sql-reference/sql-functions/bitmap-functions/to_bitmap) function to convert the data of `BIGINT` type into `BITMAP` type.
+The schema of the Spark table is inferred from the StarRocks table, and the Spark does not support the `BITMAP` type. So you need to customize the corresponding column data type in Spark, for example as `BIGINT`, by configuring the option `"starrocks.column.types"="visit_users BIGINT"`. When using Stream Load to ingest data, the connector uses the [`to_bitmap`](https://docs.starrocks.io/docs/sql-reference/sql-functions/bitmap-functions/to_bitmap) function to convert the data of `BIGINT` type into `BITMAP` type.
 
 Run the following DDL in `spark-sql`:
 
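To make the changed line above concrete, a sketch of the DataFrame-side counterpart (the doc's example itself uses spark-sql DDL; column names echo its UV example, connection options as in the earlier write sketch):

```scala
// Spark has no BITMAP type, so declare the column as BIGINT on the Spark side;
// during Stream Load the connector converts it with to_bitmap, as noted above.
val uv = Seq((1, 13L), (1, 23L), (2, 33L)).toDF("page_id", "visit_users")

uv.write
  .format("starrocks")
  // ... connection options as in the earlier write sketch ...
  .option("starrocks.column.types", "visit_users BIGINT") // override the inferred type
  .mode("append")
  .save()
```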
@@ -580,13 +580,13 @@ Here we take the counting of UV as an example to show how to load data into colu
 ```
 > **NOTICE:**
 >
-> The connector uses [`to_bitmap`](https://docs.starrocks.io/en-us/latest/sql-reference/sql-functions/bitmap-functions/to_bitmap)
+> The connector uses [`to_bitmap`](https://docs.starrocks.io/docs/sql-reference/sql-functions/bitmap-functions/to_bitmap)
 > function to convert data of the `TINYINT`, `SMALLINT`, `INTEGER`, and `BIGINT` types in Spark to the `BITMAP` type in StarRocks, and uses
 > [`bitmap_hash`](https://docs.starrocks.io/zh-cn/latest/sql-reference/sql-functions/bitmap-functions/bitmap_hash) function for other Spark data types.
 
 ### Load data into columns of HLL type
 
-[`HLL`](https://docs.starrocks.io/en-us/latest/sql-reference/sql-statements/data-types/HLL) can be used for approximate count distinct, see [Use HLL for approximate count distinct](https://docs.starrocks.io/en-us/latest/using_starrocks/Using_HLL).
+[`HLL`](https://docs.starrocks.io/docs/sql-reference/sql-statements/data-types/HLL) can be used for approximate count distinct, see [Use HLL for approximate count distinct](https://docs.starrocks.io/docs/using_starrocks/Using_HLL).
 
 Here we take the counting of UV as an example to show how to load data into columns of the `HLL` type. **`HLL` is supported since version 1.1.1**.
 
@@ -606,7 +606,7 @@ DISTRIBUTED BY HASH(`page_id`);
 
 2. Create a Spark table
 
-The schema of the Spark table is inferred from the StarRocks table, and the Spark does not support the `HLL` type. So you need to customize the corresponding column data type in Spark, for example as `BIGINT`, by configuring the option `"starrocks.column.types"="visit_users BIGINT"`. When using Stream Load to ingest data, the connector uses the [`hll_hash`](https://docs.starrocks.io/en-us/latest/sql-reference/sql-functions/aggregate-functions/hll_hash) function to convert the data of `BIGINT` type into `HLL` type.
+The schema of the Spark table is inferred from the StarRocks table, and the Spark does not support the `HLL` type. So you need to customize the corresponding column data type in Spark, for example as `BIGINT`, by configuring the option `"starrocks.column.types"="visit_users BIGINT"`. When using Stream Load to ingest data, the connector uses the [`hll_hash`](https://docs.starrocks.io/docs/sql-reference/sql-functions/aggregate-functions/hll_hash) function to convert the data of `BIGINT` type into `HLL` type.
 
 Run the following DDL in `spark-sql`:
 
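The HLL path mirrors the BITMAP one above; a minimal sketch under the same assumptions:

```scala
// Same BIGINT override as for BITMAP; the only difference is that the connector
// applies hll_hash rather than to_bitmap when converting the values.
uv.write
  .format("starrocks")
  // ... connection options as in the earlier write sketch ...
  .option("starrocks.column.types", "visit_users BIGINT")
  .mode("append")
  .save()
```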
@@ -651,7 +651,7 @@ DISTRIBUTED BY HASH(`page_id`);
 
 
 
-The following example explains how to load data into columns of the [`ARRAY`](https://docs.starrocks.io/en-us/latest/sql-reference/sql-statements/data-types/Array) type.
+The following example explains how to load data into columns of the [`ARRAY`](https://docs.starrocks.io/docs/sql-reference/sql-statements/data-types/Array) type.
 
 1. Create a StarRocks table
 
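For completeness, a sketch of the ARRAY case referenced above; the array declaration in `starrocks.column.types` is assumed to follow the column-type option described earlier, and the names are invented:

```scala
// Spark's native ArrayType carries the nested values; declaring the column in
// starrocks.column.types tells the connector the StarRocks side is ARRAY<INT>.
val arr = Seq((1, Seq(1, 2, 3)), (2, Seq(4, 5))).toDF("id", "scores")

arr.write
  .format("starrocks")
  // ... connection options as in the earlier write sketch ...
  .option("starrocks.column.types", "scores ARRAY<INT>") // assumed to be required here
  .mode("append")
  .save()
```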