Lw delete set fix (#174)
* Move lightweight delete settings to per query for HTTP stickiness fix

* Minor cleanup and doc updates
genzgd authored Jul 27, 2023
1 parent 3a28a66 commit 1a0649e
Showing 8 changed files with 55 additions and 25 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/pypi.yml
@@ -1,12 +1,14 @@
---
name: "PyPI Release"

# yamllint disable-line rule:truthy
on:
  push:
    tags:
      - 'v*'
  workflow_dispatch:


jobs:
  publish:
    name: PyPI Release
1 change: 1 addition & 0 deletions .github/workflows/test_cloud.yml
@@ -17,6 +17,7 @@ jobs:
      DBT_CH_TEST_HOST: ${{ secrets.INTEGRATIONS_TEAM_TESTS_CLOUD_HOST }}
      DBT_CH_TEST_PASSWORD: ${{ secrets.INTEGRATIONS_TEAM_TESTS_CLOUD_PASSWORD }}
      DBT_CH_TEST_CLUSTER_MODE: true
+     DBT_CH_TEST_CLOUD: true

    steps:
      - name: Checkout
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,9 @@
### Release [1.4.6], 2023-07-27
#### Bug fix
- Lightweight deletes could fail in environments where the HTTP session was not preserved (such as clusters behind a non-sticky
load balancer). This has been fixed by sending the required settings with every request instead of relying on a SET statement.
A similar approach has been used to persist the 'insert_distributed_sync' setting for Distributed table materializations.

### Release [1.4.5], 2023-07-27
#### Improvement
- Adds additional experimental support for Distributed table engine models and incremental materialization. See the README for
19 changes: 13 additions & 6 deletions README.md
@@ -34,6 +34,14 @@ pip install dbt-clickhouse

# Usage Notes

## SET Statement Warning
In many environments, using the SET statement to persist a ClickHouse setting across all dbt queries is not reliable
and can cause unexpected failures. This is particularly true when using HTTP connections through a load balancer that
distributes queries across multiple nodes (such as ClickHouse Cloud), although in some circumstances this can also
happen with native ClickHouse connections. Accordingly, we recommend configuring any required ClickHouse settings in the
"custom_settings" property of the dbt profile as a best practice, instead of relying on a pre-hook "SET" statement as
is occasionally suggested.
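For example, a minimal sketch of carrying such settings in the profile instead of a pre-hook SET statement; the profile name, connection details, and setting values below are illustrative only and are not part of this change:

```yaml
# profiles.yml (sketch) -- names and values are examples, not prescriptions
your_profile_name:
  target: dev
  outputs:
    dev:
      type: clickhouse
      schema: default
      host: localhost
      port: 8123
      user: default
      password: ''
      # Settings listed here are sent with each request rather than via a session-level SET,
      # so they survive load balancers that do not pin HTTP sessions to a single node.
      custom_settings:
        allow_experimental_lightweight_delete: 1
        insert_distributed_sync: 1
```

Because the settings travel with every query, they do not depend on the server-side session state that a SET statement would require.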

## Database

The dbt model relation identifier `database.schema.table` is not compatible with Clickhouse because Clickhouse does not support a `schema`.
@@ -68,7 +76,7 @@ your_profile_name:
use_lw_deletes: [False] # Use the strategy `delete+insert` as the default incremental strategy.
check_exchange: [True] # Validate that ClickHouse supports the atomic EXCHANGE TABLES command. (Not needed for most ClickHouse versions)
local_suffix: [local] # Table suffix of local tables on shards for distributed materializations
- custom_settings: [{}] # A dicitonary/mapping of custom ClickHouse settings for the connection - default is empty.
+ custom_settings: [{}] # A dictionary/mapping of custom ClickHouse settings for the connection - default is empty.
# Native (clickhouse-driver) connection settings
sync_request_timeout: [5] # Timeout for server ping
@@ -158,12 +166,11 @@ See the [S3 test file](https://github.com/ClickHouse/dbt-clickhouse/blob/main/te

# Distributed materializations

- Note: Distributed materializations experimental and are not currently included in the automated test suite.
-
- WARNING:
+ Notes:

- To use distributed materializations correctly you should set **insert_distributed_sync** = 1 (or use as prehook) in order to have correct data while SELECT queries. Otherwise, downstream operations could produce invalid results
- if the distributed insert has not completed before additional updates are executed.
+ - Distributed materializations are experimental and are not currently included in the automated test suite.
+ - dbt-clickhouse queries now automatically include the setting `insert_distributed_sync = 1` in order to ensure that downstream incremental
+   materialization operations execute correctly. This could cause some distributed table inserts to run more slowly than expected.
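For illustration, a hedged sketch of how a distributed model might be configured; the `distributed_table` materialization name and the `engine`, `order_by`, and `sharding_key` config keys are assumptions based on the experimental distributed support described here, not something introduced by this commit, so verify them against the materialization sections below:

```yaml
# dbt_project.yml (sketch) -- materialization and config names are assumed; check
# the distributed materialization sections below for the options your version supports
models:
  my_project:
    events:
      +materialized: distributed_table   # assumed experimental materialization name
      +engine: MergeTree()               # engine for the underlying shard-local tables
      +order_by: event_time
      +sharding_key: cityHash64(user_id) # how rows are spread across shards
```

With `insert_distributed_sync = 1` applied automatically, an insert through the distributed table returns only after the shard-local writes finish, which is what keeps downstream incremental steps consistent.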

## Distributed table materialization

2 changes: 1 addition & 1 deletion dbt/adapters/clickhouse/__version__.py
@@ -1 +1 @@
- version = '1.4.5'
+ version = '1.4.6'
42 changes: 27 additions & 15 deletions dbt/adapters/clickhouse/dbclient.py
@@ -6,6 +6,9 @@
from dbt.adapters.clickhouse.credentials import ClickHouseCredentials
from dbt.adapters.clickhouse.logger import logger

+ LW_DELETE_SETTING = 'allow_experimental_lightweight_delete'
+ ND_MUTATION_SETTING = 'allow_nondeterministic_mutations'


def get_db_client(credentials: ClickHouseCredentials):
driver = credentials.driver
@@ -63,6 +66,7 @@ def __init__(self, credentials: ClickHouseCredentials):
self._conn_settings['database_replicated_enforce_synchronous_settings'] = '1'
self._conn_settings['insert_quorum'] = 'auto'
self._conn_settings['mutations_sync'] = '2'
+ self._conn_settings['insert_distributed_sync'] = '1'
self._client = self._create_client(credentials)
check_exchange = credentials.check_exchange and not credentials.cluster_mode
try:
@@ -108,27 +112,35 @@ def _server_version(self):
pass

    def _check_lightweight_deletes(self, requested: bool):
-       lw_deletes = self.get_ch_setting('allow_experimental_lightweight_delete')
-       if lw_deletes is None:
+       lw_deletes = self.get_ch_setting(LW_DELETE_SETTING)
+       nd_mutations = self.get_ch_setting(ND_MUTATION_SETTING)
+       if lw_deletes is None or nd_mutations is None:
            if requested:
                logger.warning(
                    'use_lw_deletes requested but are not available on this ClickHouse server'
                )
            return False, False
-       lw_deletes = int(lw_deletes)
-       if lw_deletes == 1:
+       lw_deletes = int(lw_deletes) > 0
+       if not lw_deletes:
+           try:
+               self.command(f'SET {LW_DELETE_SETTING} = 1')
+               self._conn_settings[LW_DELETE_SETTING] = '1'
+               lw_deletes = True
+           except DbtDatabaseError:
+               pass
+       nd_mutations = int(nd_mutations) > 0
+       if lw_deletes and not nd_mutations:
+           try:
+               self.command(f'SET {ND_MUTATION_SETTING} = 1')
+               self._conn_settings[ND_MUTATION_SETTING] = '1'
+               nd_mutations = True
+           except DbtDatabaseError:
+               pass
+       if lw_deletes and nd_mutations:
            return True, requested
-       if not requested:
-           return False, False
-       try:
-           self.command('SET allow_experimental_lightweight_delete = 1')
-           self.command('SET allow_nondeterministic_mutations = 1')
-           return True, True
-       except DbtDatabaseError as ex:
-           logger.warning(
-               'use_lw_deletes requested but cannot enable on this ClickHouse server %s', str(ex)
-           )
-           return False, False
+       if requested:
+           logger.warning('use_lw_deletes requested but cannot enable on this ClickHouse server')
+       return False, False

def _ensure_database(self, database_engine, cluster_name) -> None:
if not self.database:
6 changes: 4 additions & 2 deletions tests/integration/adapter/persist_docs/test_persist_docs.py
@@ -101,14 +101,16 @@ def project_config_update(self):
}
}

-   def test_has_comments_pglike(self, project):
+   def test_has_comments_pg_like(self):
+       if os.environ.get('DBT_CH_TEST_CLOUD', '').lower() in ('1', 'true', 'yes'):
+           pytest.skip('Not running comment test for cloud')
        run_dbt(["docs", "generate"])
        with open("target/catalog.json") as fp:
            catalog_data = json.load(fp)
        assert "nodes" in catalog_data
        assert len(catalog_data["nodes"]) == 4
        table_node = catalog_data["nodes"]["model.test.table_model"]
-       view_node = self._assert_has_table_comments(table_node)
+       self._assert_has_table_comments(table_node)

        view_node = catalog_data["nodes"]["model.test.view_model"]
        self._assert_has_view_comments(view_node)
2 changes: 1 addition & 1 deletion tests/integration/adapter/test_comments.py
@@ -67,7 +67,7 @@ def models(self):
        ['table_comment', 'view_comment'],
    )
    def test_comment(self, project, model_name):
-       if '_cloud' in os.environ.get('GITHUB_REF', ''):
+       if os.environ.get('DBT_CH_TEST_CLOUD', '').lower() in ('1', 'true', 'yes'):
            pytest.skip('Not running comment test for cloud')
        run_dbt(["run"])
        run_dbt(["docs", "generate"])
