diff --git a/.github/workflows/check-md-link.yml b/.github/workflows/check-md-link.yml
new file mode 100644
index 0000000000..eec019a19c
--- /dev/null
+++ b/.github/workflows/check-md-link.yml
@@ -0,0 +1,16 @@
+name: Check Markdown links
+
+on:
+  push:
+    paths:
+      - mkdocs/**
+    branches:
+      - 'main'
+  pull_request:
+
+jobs:
+  markdown-link-check:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@master
+    - uses: gaurav-nelson/github-action-markdown-link-check@v1
diff --git a/mkdocs/docs/SUMMARY.md b/mkdocs/docs/SUMMARY.md
index 40ba0bffd7..5cf753d4c3 100644
--- a/mkdocs/docs/SUMMARY.md
+++ b/mkdocs/docs/SUMMARY.md
@@ -17,6 +17,8 @@
+
+
 - [Getting started](index.md)
 - [Configuration](configuration.md)
 - [CLI](cli.md)
@@ -28,4 +30,6 @@
 - [How to release](how-to-release.md)
 - [Code Reference](reference/)
+
+
diff --git a/mkdocs/docs/configuration.md b/mkdocs/docs/configuration.md
index 8acc0a98cb..d0c71b598d 100644
--- a/mkdocs/docs/configuration.md
+++ b/mkdocs/docs/configuration.md
@@ -81,6 +81,8 @@ For the FileIO there are several configuration options available:
 ### S3
+
+
 | Key | Example | Description |
 | -------------------- | ------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | s3.endpoint | https://10.0.19.25/ | Configure an alternative endpoint of the S3 service for the FileIO to access. This could be used to use S3FileIO with any s3-compatible object storage service that has a different endpoint, or access a private S3 endpoint in a virtual private cloud. |
@@ -91,8 +93,12 @@ For the FileIO there are several configuration options available:
 | s3.proxy-uri | http://my.proxy.com:8080 | Configure the proxy server to be used by the FileIO. |
 | s3.connect-timeout | 60.0 | Configure socket connection timeout, in seconds. |
+
+
 ### HDFS
+
+
 | Key | Example | Description |
 | -------------------- | ------------------- | ------------------------------------------------ |
 | hdfs.host | https://10.0.19.25/ | Configure the HDFS host to connect to |
@@ -100,8 +106,12 @@ For the FileIO there are several configuration options available:
 | hdfs.user | user | Configure the HDFS username used for connection. |
 | hdfs.kerberos_ticket | kerberos_ticket | Configure the path to the Kerberos ticket cache. |
+
+
 ### Azure Data lake
+
+
 | Key | Example | Description |
 | ----------------------- | ----------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | adlfs.connection-string | AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqF...;BlobEndpoint=http://localhost/ | A [connection string](https://learn.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string). This could be used to use FileIO with any adlfs-compatible object storage service that has a different endpoint (like [azurite](https://github.com/azure/azurite)). |
@@ -112,8 +122,12 @@ For the FileIO there are several configuration options available:
 | adlfs.client-id | ad667be4-b811-11ed-afa1-0242ac120002 | The client-id |
 | adlfs.client-secret | oCA3R6P\*ka#oa1Sms2J74z... | The client-secret |
+
+
 ### Google Cloud Storage
+
+
 | Key | Example | Description |
 | -------------------------- | ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | gcs.project-id | my-gcp-project | Configure Google Cloud Project for GCS FileIO. |
@@ -128,6 +142,8 @@ For the FileIO there are several configuration options available:
 | gcs.default-location | US | Configure the default location where buckets are created, like 'US' or 'EUROPE-WEST3'. |
 | gcs.version-aware | False | Configure whether to support object versioning on the GCS bucket. |
+
+
 ## REST Catalog
 ```yaml
@@ -145,6 +161,8 @@ catalog:
 cabundle: /absolute/path/to/cabundle.pem
 ```
+
+
 | Key | Example | Description |
 | ---------------------- | ----------------------- | -------------------------------------------------------------------------------------------------- |
 | uri | https://rest-catalog/ws | URI identifying the REST Server |
@@ -155,6 +173,8 @@ catalog:
 | rest.signing-name | execute-api | The service signing name to use when SigV4 signing a request |
 | rest.authorization-url | https://auth-service/cc | Authentication URL to use for client credentials authentication (default: uri + 'v1/oauth/tokens') |
+
+
 ## SQL Catalog
 The SQL catalog requires a database for its backend. PyIceberg supports PostgreSQL and SQLite through psycopg2. The database connection has to be configured using the `uri` property. See SQLAlchemy's [documentation for URL format](https://docs.sqlalchemy.org/en/20/core/engines.html#backend-specific-urls):
diff --git a/mkdocs/docs/index.md b/mkdocs/docs/index.md
index a8c2c6bd3c..1fee9cc69b 100644
--- a/mkdocs/docs/index.md
+++ b/mkdocs/docs/index.md
@@ -61,7 +61,7 @@ You either need to install `s3fs`, `adlfs`, `gcsfs`, or `pyarrow` to be able to
 ## Connecting to a catalog
-Iceberg leverages the [catalog to have one centralized place to organize the tables](https://iceberg.apache.org/catalog/). This can be a traditional Hive catalog to store your Iceberg tables next to the rest, a vendor solution like the AWS Glue catalog, or an implementation of Icebergs' own [REST protocol](https://github.com/apache/iceberg/tree/main/open-api). Checkout the [configuration](configuration.md) page to find all the configuration details.
+Iceberg leverages the [catalog to have one centralized place to organize the tables](https://iceberg.apache.org/concepts/catalog/). This can be a traditional Hive catalog to store your Iceberg tables next to the rest, a vendor solution like the AWS Glue catalog, or an implementation of Icebergs' own [REST protocol](https://github.com/apache/iceberg/tree/main/open-api). Checkout the [configuration](configuration.md) page to find all the configuration details.
 For the sake of demonstration, we'll configure the catalog to use the `SqlCatalog` implementation, which will store information in a local `sqlite` database. We'll also configure the catalog to store data files in the local filesystem instead of an object store. This should not be used in production due to the limited scalability.