
Commit
fix make lint
kevinjqliu committed Aug 31, 2024
1 parent c018d26 commit 4410dd8
Showing 8 changed files with 50 additions and 47 deletions.
2 changes: 1 addition & 1 deletion .markdownlint.yaml
@@ -19,4 +19,4 @@
default: true

# MD013/line-length - Line length
MD013: false
MD013: false
5 changes: 3 additions & 2 deletions mkdocs/docs/SUMMARY.md
@@ -18,6 +18,7 @@
<!-- prettier-ignore-start -->

<!-- markdown-link-check-disable -->
# Summary

- [Getting started](index.md)
- [Configuration](configuration.md)
@@ -26,8 +27,8 @@
- [Contributing](contributing.md)
- [Community](community.md)
- Releases
- [Verify a release](verify-release.md)
- [How to release](how-to-release.md)
- [Verify a release](verify-release.md)
- [How to release](how-to-release.md)
- [Code Reference](reference/)

<!-- markdown-link-check-enable-->
42 changes: 21 additions & 21 deletions mkdocs/docs/api.md
@@ -280,7 +280,7 @@ tbl.overwrite(df)

The data is written to the table, and when the table is read using `tbl.scan().to_arrow()`:

```
```python
pyarrow.Table
city: string
lat: double
@@ -303,7 +303,7 @@ tbl.append(df)

When reading the table `tbl.scan().to_arrow()` you can see that `Groningen` is now also part of the table:

```
```python
pyarrow.Table
city: string
lat: double
@@ -342,7 +342,7 @@ tbl.delete(delete_filter="city == 'Paris'")
In the above example, any records where the city field value equals `Paris` will be deleted.
Running `tbl.scan().to_arrow()` will now yield:

```
```python
pyarrow.Table
city: string
lat: double
@@ -363,9 +363,9 @@ To explore the table metadata, tables can be inspected.
To inspect a table's metadata with the time travel feature, call the inspect table method with the `snapshot_id` argument.
Time travel is supported on all metadata tables except `snapshots` and `refs`.

```python
table.inspect.entries(snapshot_id=805611270568163028)
```
```python
table.inspect.entries(snapshot_id=805611270568163028)
```

<!-- prettier-ignore-end -->

Expand All @@ -377,7 +377,7 @@ Inspect the snapshots of the table:
table.inspect.snapshots()
```

```
```python
pyarrow.Table
committed_at: timestamp[ms] not null
snapshot_id: int64 not null
@@ -405,7 +405,7 @@ Inspect the partitions of the table:
table.inspect.partitions()
```

```
```python
pyarrow.Table
partition: struct<dt_month: int32, dt_day: date32[day]> not null
child 0, dt_month: int32
@@ -446,7 +446,7 @@ To show all the table's current manifest entries for both data and delete files.
table.inspect.entries()
```

```
```python
pyarrow.Table
status: int8 not null
snapshot_id: int64 not null
@@ -604,7 +604,7 @@ To show a table's known snapshot references:
table.inspect.refs()
```

```
```python
pyarrow.Table
name: string not null
type: string not null
@@ -629,7 +629,7 @@ To show a table's current file manifests:
table.inspect.manifests()
```

```
```python
pyarrow.Table
content: int8 not null
path: string not null
@@ -679,7 +679,7 @@ To show table metadata log entries:
table.inspect.metadata_log_entries()
```

```
```python
pyarrow.Table
timestamp: timestamp[ms] not null
file: string not null
@@ -702,7 +702,7 @@ To show a table's history:
table.inspect.history()
```

```
```python
pyarrow.Table
made_current_at: timestamp[ms] not null
snapshot_id: int64 not null
@@ -723,7 +723,7 @@ Inspect the data files in the current snapshot of the table:
table.inspect.files()
```

```
```python
pyarrow.Table
content: int8 not null
file_path: string not null
@@ -850,7 +850,7 @@ readable_metrics: [

Expert Iceberg users may choose to commit existing parquet files to the Iceberg table as data files, without rewriting them.

```
```python
# Given that these parquet files have schema consistent with the Iceberg table
file_paths = [
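The snippet is truncated in the diff view; for context, a minimal end-to-end sketch of this workflow using PyIceberg's `add_files` (the catalog name, table name, and file paths below are hypothetical):

```python
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")
tbl = catalog.load_table("default.cities")

# Hypothetical Parquet files whose schema matches the Iceberg table's schema
file_paths = [
    "s3://warehouse/default/data-1.parquet",
    "s3://warehouse/default/data-2.parquet",
]

# Commit the existing files to the table as data files, without rewriting them
tbl.add_files(file_paths=file_paths)
```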
@@ -930,7 +930,7 @@ with table.update_schema() as update:

Now the table has the union of the two schemas `print(table.schema())`:

```
```python
table {
1: city: optional string
2: lat: optional double
Expand Down Expand Up @@ -1180,7 +1180,7 @@ table.scan(

This will return a PyArrow table:

```
```python
pyarrow.Table
VendorID: int64
tpep_pickup_datetime: timestamp[us, tz=+00:00]
Expand Down Expand Up @@ -1222,7 +1222,7 @@ table.scan(

This will return a Pandas dataframe:

```
```python
VendorID tpep_pickup_datetime tpep_dropoff_datetime
0 2 2021-04-01 00:28:05+00:00 2021-04-01 00:47:59+00:00
1 1 2021-04-01 00:39:01+00:00 2021-04-01 00:57:39+00:00
Expand Down Expand Up @@ -1295,7 +1295,7 @@ ray_dataset = table.scan(

This will return a Ray dataset:

```
```python
Dataset(
num_blocks=1,
num_rows=1168798,
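Likewise for Ray, the scan ends in `to_ray()`; a sketch with the same assumed filter (assuming the `ray` extra is installed):

```python
# Materialize the scan as a Ray dataset for distributed processing
ray_dataset = table.scan(
    row_filter="tpep_pickup_datetime >= '2021-04-01T00:00:00+00:00'",
    selected_fields=("VendorID", "tpep_pickup_datetime", "tpep_dropoff_datetime"),
).to_ray()
```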
Expand Down Expand Up @@ -1346,7 +1346,7 @@ df = df.select("VendorID", "tpep_pickup_datetime", "tpep_dropoff_datetime")

This returns a Daft Dataframe which is lazily materialized. Printing `df` will display the schema:

```
```python
╭──────────┬───────────────────────────────┬───────────────────────────────╮
│ VendorID ┆ tpep_pickup_datetime ┆ tpep_dropoff_datetime │
│ ---      ┆ ---                           ┆ ---                           │
@@ -1364,7 +1364,7 @@ This is correctly optimized to take advantage of Iceberg features such as hidden
df.show(2)
```

```
```python
╭──────────┬───────────────────────────────┬───────────────────────────────╮
│ VendorID ┆ tpep_pickup_datetime ┆ tpep_dropoff_datetime │
│ ---      ┆ ---                           ┆ ---                           │
20 changes: 11 additions & 9 deletions mkdocs/docs/configuration.md
@@ -22,6 +22,8 @@ hide:
- under the License.
-->

# Configuration

## Tables

Iceberg tables support table properties to configure table behavior.
@@ -77,15 +79,15 @@ For the FileIO there are several configuration options available:

| Key | Example | Description |
| -------------------- | ------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| s3.endpoint | https://10.0.19.25/ | Configure an alternative endpoint of the S3 service for the FileIO to access. This could be used to use S3FileIO with any s3-compatible object storage service that has a different endpoint, or access a private S3 endpoint in a virtual private cloud. |
| s3.endpoint | <https://10.0.19.25/> | Configure an alternative endpoint of the S3 service for the FileIO to access. This could be used to use S3FileIO with any s3-compatible object storage service that has a different endpoint, or access a private S3 endpoint in a virtual private cloud. |
| s3.access-key-id | admin | Configure the static access key id used to access the FileIO. |
| s3.secret-access-key | password | Configure the static secret access key used to access the FileIO. |
| s3.session-token | AQoDYXdzEJr... | Configure the static session token used to access the FileIO. |
| s3.signer | bearer | Configure the signature version of the FileIO. |
| s3.signer.uri | http://my.signer:8080/s3 | Configure the remote signing uri if it differs from the catalog uri. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. |
| s3.signer.uri | <http://my.signer:8080/s3> | Configure the remote signing uri if it differs from the catalog uri. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. |
| s3.signer.endpoint | v1/main/s3-sign | Configure the remote signing endpoint. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. (default : v1/aws/s3/sign). |
| s3.region | us-west-2 | Sets the region of the bucket |
| s3.proxy-uri | http://my.proxy.com:8080 | Configure the proxy server to be used by the FileIO. |
| s3.proxy-uri | <http://my.proxy.com:8080> | Configure the proxy server to be used by the FileIO. |
| s3.connect-timeout | 60.0 | Configure socket connection timeout, in seconds. |

<!-- markdown-link-check-enable-->
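For context, these `s3.*` keys are passed as catalog properties; a minimal sketch with `load_catalog` (the endpoint and credentials below are placeholders):

```python
from pyiceberg.catalog import load_catalog

# FileIO properties ride along with the catalog properties
catalog = load_catalog(
    "default",
    **{
        "uri": "http://127.0.0.1:8181",
        "s3.endpoint": "http://127.0.0.1:9000",
        "s3.access-key-id": "admin",
        "s3.secret-access-key": "password",
    },
)
```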
@@ -96,7 +98,7 @@ For the FileIO there are several configuration options available:

| Key | Example | Description |
| -------------------- | ------------------- | ------------------------------------------------ |
| hdfs.host | https://10.0.19.25/ | Configure the HDFS host to connect to |
| hdfs.host | <https://10.0.19.25/> | Configure the HDFS host to connect to |
| hdfs.port | 9000 | Configure the HDFS port to connect to. |
| hdfs.user | user | Configure the HDFS username used for connection. |
| hdfs.kerberos_ticket | kerberos_ticket | Configure the path to the Kerberos ticket cache. |
@@ -109,7 +111,7 @@ For the FileIO there are several configuration options available:

| Key | Example | Description |
| ----------------------- | ----------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| adlfs.connection-string | AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqF...;BlobEndpoint=http://localhost/ | A [connection string](https://learn.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string). This could be used to use FileIO with any adlfs-compatible object storage service that has a different endpoint (like [azurite](https://github.com/azure/azurite)). |
| adlfs.connection-string | AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqF...;BlobEndpoint=<http://localhost/> | A [connection string](https://learn.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string). This could be used to use FileIO with any adlfs-compatible object storage service that has a different endpoint (like [azurite](https://github.com/azure/azurite)). |
| adlfs.account-name | devstoreaccount1 | The account that you want to connect to |
| adlfs.account-key | Eby8vdM02xNOcqF... | The key to authentication against the account. |
| adlfs.sas-token | NuHOuuzdQN7VRM%2FOpOeqBlawRCA845IY05h9eu1Yte4%3D | The shared access signature |
@@ -133,7 +135,7 @@ For the FileIO there are several configuration options available:
| gcs.cache-timeout | 60 | Configure the cache expiration time in seconds for object metadata cache |
| gcs.requester-pays | False | Configure whether to use requester-pays requests |
| gcs.session-kwargs | {} | Configure a dict of parameters to pass on to aiohttp.ClientSession; can contain, for example, proxy settings. |
| gcs.endpoint | http://0.0.0.0:4443 | Configure an alternative endpoint for the GCS FileIO to access (format protocol://host:port) If not given, defaults to the value of environment variable "STORAGE_EMULATOR_HOST"; if that is not set either, will use the standard Google endpoint. |
| gcs.endpoint | <http://0.0.0.0:4443> | Configure an alternative endpoint for the GCS FileIO to access (format protocol://host:port) If not given, defaults to the value of environment variable "STORAGE_EMULATOR_HOST"; if that is not set either, will use the standard Google endpoint. |
| gcs.default-location | US | Configure the default location where buckets are created, like 'US' or 'EUROPE-WEST3'. |
| gcs.version-aware | False | Configure whether to support object versioning on the GCS bucket. |

@@ -200,7 +202,7 @@ catalog:
| Key | Example | Description |
| ------------------- | -------------------------------- | -------------------------------------------------------------------------------------------------- |
| uri | https://rest-catalog/ws | URI identifying the REST Server |
| uri | <https://rest-catalog/ws> | URI identifying the REST Server |
| ugi | t-1234:secret | Hadoop UGI for Hive client. |
| credential | t-1234:secret | Credential to use for OAuth2 credential flow when initializing the catalog |
| token | FEW23.DFSDF.FSDF | Bearer token value to use for `Authorization` header |
Expand All @@ -210,7 +212,7 @@ catalog:
| rest.sigv4-enabled | true | Sign requests to the REST Server using AWS SigV4 protocol |
| rest.signing-region | us-east-1 | The region to use when SigV4 signing a request |
| rest.signing-name | execute-api | The service signing name to use when SigV4 signing a request |
| oauth2-server-uri | https://auth-service/cc | Authentication URL to use for client credentials authentication (default: uri + 'v1/oauth/tokens') |
| oauth2-server-uri | <https://auth-service/cc> | Authentication URL to use for client credentials authentication (default: uri + 'v1/oauth/tokens') |

<!-- markdown-link-check-enable-->
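A sketch of loading a REST catalog with these properties (the URI and credential are placeholders):

```python
from pyiceberg.catalog import load_catalog

# "credential" triggers the OAuth2 client-credentials flow at initialization
catalog = load_catalog(
    "default",
    **{
        "type": "rest",
        "uri": "https://rest-catalog/ws",
        "credential": "t-1234:secret",
    },
)
```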

@@ -325,7 +327,7 @@ catalog:
| ---------------------- | ------------------------------------ | ------------------------------------------------------------------------------- |
| glue.id | 111111111111 | Configure the 12-digit ID of the Glue Catalog |
| glue.skip-archive | true | Configure whether to skip the archival of older table versions. Default to true |
| glue.endpoint | https://glue.us-east-1.amazonaws.com | Configure an alternative endpoint of the Glue service for GlueCatalog to access |
| glue.endpoint | <https://glue.us-east-1.amazonaws.com> | Configure an alternative endpoint of the Glue service for GlueCatalog to access |
| glue.profile-name | default | Configure the static profile used to access the Glue Catalog |
| glue.region | us-east-1 | Set the region of the Glue Catalog |
| glue.access-key-id | admin | Configure the static access key id used to access the Glue Catalog |
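Following the same pattern, a sketch for the Glue catalog (the region and 12-digit catalog ID are placeholders):

```python
from pyiceberg.catalog import load_catalog

# Glue-specific keys use the glue. prefix, mirroring the table above
catalog = load_catalog(
    "glue",
    **{
        "type": "glue",
        "glue.region": "us-east-1",
        "glue.id": "111111111111",
    },
)
```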
