
Commit

Merge branch 'main' into bucket-transforms
sungwy committed Jan 16, 2025
2 parents 7079265 + b806cfa commit 77246d5
Showing 54 changed files with 3,559 additions and 1,759 deletions.
14 changes: 2 additions & 12 deletions mkdocs/requirements.txt → .codespellrc
@@ -14,15 +14,5 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

mkdocs==1.6.1
griffe==1.5.1
jinja2==3.1.5
mkdocstrings==0.27.0
mkdocstrings-python==1.12.2
mkdocs-literate-nav==0.6.1
mkdocs-autorefs==1.2.0
mkdocs-gen-files==0.5.0
mkdocs-material==9.5.49
mkdocs-material-extensions==1.3.1
mkdocs-section-index==0.3.9
[codespell]
ignore-words-list = BoundIn,fo,MoR,NotIn,notIn,oT
10 changes: 5 additions & 5 deletions .github/workflows/python-ci-docs.yml
@@ -36,12 +36,12 @@ jobs:

steps:
- uses: actions/checkout@v4
- name: Install poetry
run: make install-poetry
- uses: actions/setup-python@v5
with:
python-version: 3.12
- name: Install
working-directory: ./mkdocs
run: pip install -r requirements.txt
- name: Build
working-directory: ./mkdocs
run: mkdocs build --strict
run: make docs-install
- name: Build docs
run: make docs-build
12 changes: 6 additions & 6 deletions .github/workflows/python-release-docs.yml
@@ -31,15 +31,15 @@ jobs:

steps:
- uses: actions/checkout@v4
- name: Install poetry
run: make install-poetry
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python }}
- name: Install
working-directory: ./mkdocs
run: pip install -r requirements.txt
- name: Build
working-directory: ./mkdocs
run: mkdocs build --strict
- name: Install docs
run: make docs-install
- name: Build docs
run: make docs-build
- name: Copy
working-directory: ./mkdocs
run: mv ./site /tmp/site
19 changes: 8 additions & 11 deletions .pre-commit-config.yaml
@@ -28,26 +28,19 @@ repos:
- id: check-yaml
- id: check-ast
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version (Used for linting)
rev: v0.7.4
rev: v0.8.6
hooks:
- id: ruff
args: [ --fix, --exit-non-zero-on-fix, --preview ]
args: [ --fix, --exit-non-zero-on-fix ]
- id: ruff-format
args: [ --preview ]
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.8.0
rev: v1.14.1
hooks:
- id: mypy
args:
[--install-types, --non-interactive, --config=pyproject.toml]
- repo: https://github.com/hadialqattan/pycln
rev: v2.4.0
hooks:
- id: pycln
args: [--config=pyproject.toml]
- repo: https://github.com/igorshubovych/markdownlint-cli
rev: v0.42.0
rev: v0.43.0
hooks:
- id: markdownlint
args: ["--fix"]
@@ -69,6 +62,10 @@ repos:
# --line-length is set to a high value to deal with very long lines
- --line-length
- '99999'
- repo: https://github.com/codespell-project/codespell
rev: v2.3.0
hooks:
- id: codespell
ci:
autofix_commit_msg: |
[pre-commit.ci] auto fixes from pre-commit.com hooks
11 changes: 10 additions & 1 deletion Makefile
@@ -27,7 +27,7 @@ install-poetry: ## Install poetry if the user has not done that yet.
echo "Poetry is already installed."; \
fi

install-dependencies: ## Install dependencies including dev and all extras
install-dependencies: ## Install dependencies including dev, docs, and all extras
poetry install --all-extras

install: | install-poetry install-dependencies
@@ -97,3 +97,12 @@ clean: ## Clean up the project Python working environment
@find . -name "*.pyd" -exec echo Deleting {} \; -delete
@find . -name "*.pyo" -exec echo Deleting {} \; -delete
@echo "Cleanup complete"

docs-install:
poetry install --with docs

docs-serve:
poetry run mkdocs serve -f mkdocs/mkdocs.yml

docs-build:
poetry run mkdocs build -f mkdocs/mkdocs.yml --strict
5 changes: 2 additions & 3 deletions mkdocs/README.md
@@ -22,7 +22,6 @@ The pyiceberg docs are stored in `docs/`.
## Running docs locally

```sh
pip3 install -r requirements.txt
mkdocs serve
open http://localhost:8000/
make docs-install
make docs-serve
```
22 changes: 15 additions & 7 deletions mkdocs/docs/api.md
@@ -1005,7 +1005,7 @@ tbl.add_files(file_paths=file_paths)

## Schema evolution

PyIceberg supports full schema evolution through the Python API. It takes care of setting the field-IDs and makes sure that only non-breaking changes are done (can be overriden).
PyIceberg supports full schema evolution through the Python API. It takes care of setting the field-IDs and makes sure that only non-breaking changes are done (can be overridden).

In the examples below, the `.update_schema()` is called from the table itself.

@@ -1072,30 +1072,36 @@ Using `add_column` you can add a column, without having to worry about the field
with table.update_schema() as update:
update.add_column("retries", IntegerType(), "Number of retries to place the bid")
# In a struct
update.add_column("details.confirmed_by", StringType(), "Name of the exchange")
update.add_column("details", StructType())
with table.update_schema() as update:
update.add_column(("details", "confirmed_by"), StringType(), "Name of the exchange")
```

A complex type must exist before columns can be added to it. Fields in complex types are added in a tuple.

### Rename column

Renaming a field in an Iceberg table is simple:

```python
with table.update_schema() as update:
update.rename_column("retries", "num_retries")
# This will rename `confirmed_by` to `exchange`
update.rename_column("properties.confirmed_by", "exchange")
# This will rename `confirmed_by` to `processed_by` in the `details` struct
update.rename_column(("details", "confirmed_by"), "processed_by")
```

### Move column

Move a field inside of struct:
Move order of fields:

```python
with table.update_schema() as update:
update.move_first("symbol")
# This will move `bid` after `ask`
update.move_after("bid", "ask")
# This will move `confirmed_by` before `exchange`
update.move_before("details.created_by", "details.exchange")
# This will move `confirmed_by` before `exchange` in the `details` struct
update.move_before(("details", "confirmed_by"), ("details", "exchange"))
```

### Update column
@@ -1127,6 +1133,8 @@ Delete a field, careful this is a incompatible change (readers/writers might exp
```python
with table.update_schema(allow_incompatible_changes=True) as update:
update.delete_column("some_field")
# In a struct
update.delete_column(("details", "confirmed_by"))
```

## Partition evolution
30 changes: 15 additions & 15 deletions mkdocs/docs/configuration.md
@@ -102,21 +102,21 @@ For the FileIO there are several configuration options available:

<!-- markdown-link-check-disable -->

| Key | Example | Description |
|----------------------|----------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| s3.endpoint | <https://10.0.19.25/> | Configure an alternative endpoint of the S3 service for the FileIO to access. This could be used to use S3FileIO with any s3-compatible object storage service that has a different endpoint, or access a private S3 endpoint in a virtual private cloud. |
| s3.access-key-id | admin | Configure the static access key id used to access the FileIO. |
| s3.secret-access-key | password | Configure the static secret access key used to access the FileIO. |
| s3.session-token | AQoDYXdzEJr... | Configure the static session token used to access the FileIO. |
| s3.role-session-name | session | An optional identifier for the assumed role session. |
| s3.role-arn | arn:aws:... | AWS Role ARN. If provided instead of access_key and secret_key, temporary credentials will be fetched by assuming this role. |
| s3.signer | bearer | Configure the signature version of the FileIO. |
| s3.signer.uri | <http://my.signer:8080/s3> | Configure the remote signing uri if it differs from the catalog uri. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. |
| s3.signer.endpoint | v1/main/s3-sign | Configure the remote signing endpoint. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. (default : v1/aws/s3/sign). |
| s3.region | us-west-2 | Sets the region of the bucket |
| s3.proxy-uri | <http://my.proxy.com:8080> | Configure the proxy server to be used by the FileIO. |
| s3.connect-timeout | 60.0 | Configure socket connection timeout, in seconds. |
| s3.force-virtual-addressing | False | Whether to use virtual addressing of buckets. If true, then virtual addressing is always enabled. If false, then virtual addressing is only enabled if endpoint_override is empty. This can be used for non-AWS backends that only support virtual hosted-style access. |
| Key | Example | Description |
|----------------------|----------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| s3.endpoint | <https://10.0.19.25/> | Configure an alternative endpoint of the S3 service for the FileIO to access. This could be used to use S3FileIO with any s3-compatible object storage service that has a different endpoint, or access a private S3 endpoint in a virtual private cloud. |
| s3.access-key-id | admin | Configure the static access key id used to access the FileIO. |
| s3.secret-access-key | password | Configure the static secret access key used to access the FileIO. |
| s3.session-token | AQoDYXdzEJr... | Configure the static session token used to access the FileIO. |
| s3.role-session-name | session | An optional identifier for the assumed role session. |
| s3.role-arn | arn:aws:... | AWS Role ARN. If provided instead of access_key and secret_key, temporary credentials will be fetched by assuming this role. |
| s3.signer | bearer | Configure the signature version of the FileIO. |
| s3.signer.uri | <http://my.signer:8080/s3> | Configure the remote signing uri if it differs from the catalog uri. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. |
| s3.signer.endpoint | v1/main/s3-sign | Configure the remote signing endpoint. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/<s3.signer.endpoint>`. (default : v1/aws/s3/sign). |
| s3.region | us-west-2 | Configure the default region used to initialize an `S3FileSystem`. `PyArrowFileIO` attempts to automatically resolve the region for each S3 bucket, falling back to this value if resolution fails. |
| s3.proxy-uri | <http://my.proxy.com:8080> | Configure the proxy server to be used by the FileIO. |
| s3.connect-timeout | 60.0 | Configure socket connection timeout, in seconds. |
| s3.force-virtual-addressing | False | Whether to use virtual addressing of buckets. If true, then virtual addressing is always enabled. If false, then virtual addressing is only enabled if endpoint_override is empty. This can be used for non-AWS backends that only support virtual hosted-style access. |

<!-- markdown-link-check-enable-->
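
The options in the table above are supplied as catalog properties. A minimal sketch of wiring a few of them up with `load_catalog` is shown below, assuming a local REST catalog and MinIO-style credentials; the catalog name, endpoint, and credential values are placeholders for illustration, not settings taken from this change.

```python
from pyiceberg.catalog import load_catalog

# Minimal sketch: FileIO options are passed alongside the other catalog
# properties. The endpoint, credentials, and region below are placeholder
# values, not configuration shipped with this commit.
catalog = load_catalog(
    "default",
    **{
        "uri": "http://localhost:8181",
        "s3.endpoint": "http://localhost:9000",
        "s3.access-key-id": "admin",
        "s3.secret-access-key": "password",
        "s3.region": "us-west-2",
    },
)
```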

2 changes: 1 addition & 1 deletion mkdocs/docs/how-to-release.md
@@ -31,7 +31,7 @@ This guide outlines the process for releasing PyIceberg in accordance with the [

* A GPG key must be registered and published in the [Apache Iceberg KEYS file](https://downloads.apache.org/iceberg/KEYS). Follow [the instructions for setting up a GPG key and uploading it to the KEYS file](#set-up-gpg-key-and-upload-to-apache-iceberg-keys-file).
* SVN Access
* Permission to upload artifacts to the [Apache development distribution](https://dist.apache.org/repos/dist/dev/iceberg/) (requires Apache Commmitter access).
* Permission to upload artifacts to the [Apache development distribution](https://dist.apache.org/repos/dist/dev/iceberg/) (requires Apache Committer access).
* Permission to upload artifacts to the [Apache release distribution](https://dist.apache.org/repos/dist/release/iceberg/) (requires Apache PMC access).
* PyPI Access
* The `twine` package must be installed for uploading releases to PyPi.
2 changes: 1 addition & 1 deletion mkdocs/docs/verify-release.md
@@ -111,7 +111,7 @@ To run the full test coverage, with both unit tests and integration tests:
make test-coverage
```

This will spin up Docker containers to faciliate running test coverage.
This will spin up Docker containers to facilitate running test coverage.

# Cast the vote
