feat: impl Task for private #18311

KKould · 2025-07-04T11:22:50Z

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Currently, Task-related functions rely on Cloud. This PR adds support for Task functions in private deployments.

TODO:

Tests

Unit Test
Logic Test
Benchmark Test
No Test

Type of change

Bug Fix (non-breaking change which fixes an issue)
New Feature (non-breaking change which adds functionality)
Breaking Change (fix or feature that could cause existing functionality not to work as expected)
Documentation Update
Refactoring
Performance Improvement
Other (please describe):

zhang2014 · 2025-07-04T12:07:25Z

src/meta/proto-conv/src/task_from_to_protobuf_impl.rs

+            None => None,
+            Some(ref w) => Some(mt::WarehouseOptions {
+                warehouse: w.warehouse.clone(),
+                using_warehouse_size: w.using_warehouse_size.clone(),


Why is warehouse_size needed?

It keeps the structure consistent with the current task structure defined by Cloud; warehouse_options is rarely used in private deployments.

drmingdrmer

It's in general clean and rational to me. Good job!

I did not quite grasp the logic of task execution between databend-query and meta. Can you provide a doc explaining this part, especially the related key-value layout on databend-meta and the role of the semaphore in task execution?

Please also add a new version changelog entry here, and add a test for this change.

databend/src/meta/proto-conv/src/util.rs

Lines 166 to 171 in 6533304

    
    (134, "2025-06-27: Add: SequenceMeta.storage_version"),
    // Dear developer:
    //      If you're gonna add a new metadata version, you'll have to add a test for it.
    //      You could just copy an existing test file(e.g., `../tests/it/v024_table_meta.rs`)
    //      and replace two of the variable `bytes` and `want`.
];

Reviewed 10 of 35 files at r1, 38 of 38 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @zhang2014)

src/query/users/src/user_task.rs line 42 at r2 (raw file):

        task_api.execute_task(task_name).await??;
        Ok(())
    }

These UserApiProvider methods do not seem necessary; calling UserApiProvider.task_api(tenant).xxx() directly looks good enough to me 🤔

Code quote:

    #[async_backtrace::framed]
    pub async fn execute_task(&self, tenant: &Tenant, task_name: &str) -> Result<()> {
        let task_api = self.task_api(tenant);
        task_api.execute_task(task_name).await??;
        Ok(())
    }

src/query/management/src/task/task_mgr.rs line 238 at r2 (raw file):

            .await?
            .try_collect::<Vec<_>>()
            .await?;

What is list_task_fallback for? It looks the same as list_task.

Code quote:

    pub async fn list_task(&self) -> Result<Vec<Task>, MetaError> {
        let key = DirName::new(TaskIdent::new(&self.tenant, ""));
        let strm = self.kv_api.list_pb_values(&key).await?;

        match strm.try_collect().await {
            Ok(tasks) => Ok(tasks),
            Err(_) => self.list_task_fallback().await,
        }
    }

    #[async_backtrace::framed]
    #[fastrace::trace]
    pub async fn list_task_fallback(&self) -> Result<Vec<Task>, MetaError> {
        let key = TaskIdent::new(&self.tenant, "dummy");
        let dir = DirName::new(key);
        let tasks = self
            .kv_api
            .list_pb_values(&dir)
            .await?
            .try_collect::<Vec<_>>()
            .await?;

KKould · 2025-07-15T02:31:49Z

Currently, query uses the watch mechanism in meta to imitate a channel for obtaining tasks. When a task message is sent to the channel, it is stored in meta under TaskMessage::key.

TaskMessage::key yields only 4 types of keys, and messages with the same key overwrite each other, which avoids repeated storage and repeated processing.

Whenever a key is inserted or overwritten, every query node receives the corresponding key change and processes it, which is how the channel is realized.

The init-type key of the watch is used to let the Service load the Schedule. When processing Execute, After, and Delete TaskMessages, TaskService deletes the corresponding key (TaskMgr::accept) to avoid repeated processing. @drmingdrmer
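The overwrite-and-accept behaviour described above can be sketched with a toy in-memory store. This is only an illustration under assumptions: `ToyMeta`, `Kind`, `send`, and `accept` are hypothetical names, and the real meta-service watch API and `TaskMgr::accept` are not used here.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the four message kinds discussed in this thread.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Kind {
    Execute,
    Schedule,
    Delete,
    After,
}

/// Toy in-memory stand-in for the meta-service key space (not the real KVApi).
#[derive(Default)]
struct ToyMeta {
    kv: BTreeMap<(Kind, String), String>,
    /// Change events that every watcher would observe.
    events: Vec<(Kind, String)>,
}

impl ToyMeta {
    /// "Send" a message: the same (kind, task) overwrites the previous value,
    /// but a change event is still emitted for the watchers.
    fn send(&mut self, kind: Kind, task: &str, payload: &str) {
        self.kv.insert((kind, task.to_string()), payload.to_string());
        self.events.push((kind, task.to_string()));
    }

    /// "Accept" a message: remove the key so that it is not processed again.
    /// Only the first caller gets the payload back.
    fn accept(&mut self, kind: Kind, task: &str) -> Option<String> {
        self.kv.remove(&(kind, task.to_string()))
    }
}

fn main() {
    let mut meta = ToyMeta::default();
    meta.send(Kind::Execute, "t1", "run #1");
    meta.send(Kind::Execute, "t1", "run #2"); // overwrites the previous entry
    meta.send(Kind::Schedule, "t1", "every 5s");

    println!("stored keys: {}", meta.kv.len());
    println!("events seen by watchers: {}", meta.events.len());
    println!("first accept: {:?}", meta.accept(Kind::Execute, "t1"));
    println!("second accept: {:?}", meta.accept(Kind::Execute, "t1"));
}
```

In this sketch the stored state stays bounded by the number of (kind, task) pairs, while watchers still see one event per send.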

KKould · 2025-07-15T02:46:09Z

TaskMetaHandle::acquire_with_guard tries to acquire the semaphore within a short time to implement distributed task preemption, because when the watch mechanism reports a key change, all query nodes receive it.

KKould · 2025-07-15T02:48:22Z

list_task_fallback was modeled on UdfMgr; this method can probably be removed from TaskMgr.

drmingdrmer · 2025-07-15T03:54:50Z

@KKould
Based on our previous discussion, you mentioned that message processing follows an at-most-once pattern.

If that's still the case, you could simply get and delete the task message key from the meta-service within a single transaction. This ensures only one consumer can acquire the task for execution, eliminating the need for a semaphore.

PS: This logic should be documented in the source code so future readers can easily understand the design rationale.

KKould · 2025-07-15T06:33:01Z

@drmingdrmer Thanks for your suggestion. I tried KVApi::transaction, but it seems this API cannot perform complex operations after obtaining the semaphore, as I am doing now; I can only use TxnOp. The other comments should be resolved, though.

zhang2014 · 2025-07-15T06:40:37Z

src/query/service/src/task/service.rs

+    }
+
+    pub async fn prepare(&self) -> Result<()> {
+        let prepare_key = format!("{}/task_run_prepare/lock", self.tenant.tenant_name());


Is this the task's execution history? Maybe it's better to reuse system_history instead. cc: @dqhl76

In fact, the current cloud task already has task_history for querying run records. Would it be better to forward system_task.task_run to task_history?

github-actions · 2025-07-15T07:51:30Z

Docker Image for PR

tag: pr-18311-cb2a6ab-1752565796

note: this image tag is only available for internal use.

drmingdrmer · 2025-07-15T08:15:39Z

> @drmingdrmer Thanks for your suggestion. I tried KVApi::transaction, but it seems that this API cannot perform complex operations after obtaining semaphore as I am doing now. I can only use TxnOp. However, the other comments should be solved.

To provide at-most-once semantics, it would be simpler to use an upsert(key, None) to fetch-and-delete a task message from the queue stored on meta-service.

And the semaphore can be removed. The transaction I mentioned above is meant to atomically fetch-and-remove a task message, not for exclusive task running.
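A minimal sketch of the fetch-and-delete idea, assuming a toy in-memory queue rather than the real meta-service client: `ToyQueue`, `take`, `race_for_message`, and the key string are hypothetical, and the `Mutex` merely stands in for whatever atomicity the meta-service provides for a delete that returns the previous value.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Toy queue: the Mutex makes "read the value and delete the key" one atomic step.
#[derive(Default)]
struct ToyQueue {
    kv: Mutex<HashMap<String, String>>,
}

impl ToyQueue {
    fn put(&self, key: &str, value: &str) {
        self.kv
            .lock()
            .unwrap()
            .insert(key.to_string(), value.to_string());
    }

    /// Fetch-and-delete: only the caller that actually removed the key gets `Some`.
    fn take(&self, key: &str) -> Option<String> {
        self.kv.lock().unwrap().remove(key)
    }
}

/// Spawn `consumers` threads that all react to the same key change and
/// count how many of them end up owning the message.
fn race_for_message(consumers: usize) -> usize {
    let queue = Arc::new(ToyQueue::default());
    // Hypothetical key name, chosen only for this example.
    queue.put("task_message/execute/t1", "payload");

    let handles: Vec<_> = (0..consumers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || queue.take("task_message/execute/t1").is_some())
        })
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|won| *won)
        .count()
}

fn main() {
    println!("consumers that got the message: {}", race_for_message(8));
}
```

Because removal and retrieval happen in one step, at most one consumer observes the message, so no separate semaphore is needed to pick a winner in this sketch.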

github-actions · 2025-07-15T10:28:48Z

🤖 Smart Auto-retry Analysis (Retry 9)

Workflow: 16387552174

📊 Summary

Total Jobs: 80
Failed Jobs: 1
Retryable: 0
Code Issues: 1

❌ NO RETRY NEEDED

All failures appear to be code/test issues requiring manual fixes.

🔍 Job Details

❌ linux / test_logs: Not retryable (Code/Test)

🤖 About

Automated analysis using job annotations to distinguish infrastructure issues (auto-retried) from code/test issues (manual fixes needed).

sundy-li

LGTM, please fix the conflicts

drmingdrmer

Reviewed 8 of 16 files at r3, 7 of 8 files at r6, 2 of 8 files at r7, 5 of 7 files at r8, 4 of 5 files at r9, 1 of 3 files at r10, 8 of 8 files at r11, 1 of 1 files at r12, all commit messages.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @dqhl76, @KKould, @sundy-li, and @zhang2014)

src/meta/app/src/principal/task.rs line 100 at r12 (raw file):

    pub fn key(&self) -> String {
        format!("{}@{}", self.task.task_name, self.run_id)
    }

This method is also called key, but it does not follow the convention of other meta-service keys, such as TaskIdent, which has a fixed prefix __fd_tasks.

This method also does not seem to be used anywhere.

If it is a key used in meta-service, it should be defined with TIdent, such as TaskIdent.

Code quote:

    pub fn key(&self) -> String {
        format!("{}@{}", self.task.task_name, self.run_id)
    }

src/meta/app/src/principal/task.rs line 109 at r12 (raw file):

    DeleteTask(String),
    AfterTask(Task),
}

This struct and other core structs would benefit from documentation comments to clarify their purpose.

Currently the naming is ambiguous:

ExecuteTask suggests a command to run a task immediately, right?
ScheduleTask implies periodic scheduling, but it's missing schedule information (interval, timing, etc.)
AfterTask is unclear - does it mean "run task B after task A completes" or something else?

Consider adding doc comments that explain each variant's specific behavior and use cases.

Code quote:

#[derive(Debug, Clone, PartialEq)]
pub enum TaskMessage {
    ExecuteTask(Task),
    ScheduleTask(Task),
    DeleteTask(String),
    AfterTask(Task),
}

src/meta/app/src/principal/task.rs line 133 at r12 (raw file):

    pub fn schedule_key(task_name: &str) -> String {
        format!("{}-1-{task_name}", TaskMessage::prefix())
    }

Why is schedule_key special when there is already a key() method?

Please explain it in the doc.

Code quote:

    pub fn key(&self) -> String {
        let ty = match self {
            TaskMessage::ExecuteTask(_) => 0,
            TaskMessage::ScheduleTask(_) => 1,
            TaskMessage::DeleteTask(_) => 2,
            TaskMessage::AfterTask(_) => 3,
        };
        format!("{}-{}-{}", TaskMessage::prefix(), ty, self.task_name())
    }

    pub fn schedule_key(task_name: &str) -> String {
        format!("{}-1-{task_name}", TaskMessage::prefix())
    }
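One possible reading, offered only as an assumption, is that schedule_key lets a caller build the key of a ScheduleTask message from a task name alone, without constructing a full Task. The sketch below reuses the quoted key() and schedule_key() bodies; the `Task` struct is a simplified stand-in and the `task_name()` body is assumed, since neither is shown in the quote.

```rust
/// Simplified stand-in for the real `Task`; only the name matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub task_name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum TaskMessage {
    ExecuteTask(Task),
    ScheduleTask(Task),
    DeleteTask(String),
    AfterTask(Task),
}

impl TaskMessage {
    /// Assumed implementation: the quoted code calls `task_name()` but does not show it.
    pub fn task_name(&self) -> &str {
        match self {
            TaskMessage::ExecuteTask(t)
            | TaskMessage::ScheduleTask(t)
            | TaskMessage::AfterTask(t) => &t.task_name,
            TaskMessage::DeleteTask(name) => name,
        }
    }

    pub fn prefix() -> i64 {
        0
    }

    /// Same body as the quoted `key()`.
    pub fn key(&self) -> String {
        let ty = match self {
            TaskMessage::ExecuteTask(_) => 0,
            TaskMessage::ScheduleTask(_) => 1,
            TaskMessage::DeleteTask(_) => 2,
            TaskMessage::AfterTask(_) => 3,
        };
        format!("{}-{}-{}", TaskMessage::prefix(), ty, self.task_name())
    }

    /// Same body as the quoted `schedule_key()`.
    pub fn schedule_key(task_name: &str) -> String {
        format!("{}-1-{task_name}", TaskMessage::prefix())
    }
}

fn main() {
    let task = Task { task_name: "t1".to_string() };
    let from_message = TaskMessage::ScheduleTask(task).key();
    let from_name = TaskMessage::schedule_key("t1");
    println!("{from_message} vs {from_name}: equal = {}", from_message == from_name);
}
```

The hard-coded `1` in schedule_key duplicates the variant-to-number mapping in key(), which is the kind of coupling a doc comment would need to spell out.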

src/meta/app/src/principal/task.rs line 137 at r12 (raw file):

    pub fn prefix() -> i64 {
        0
    }

Aside from other considerations, the keys and prefixes used in the meta-service should be human-readable strings rather than bare digits. This improves debuggability and makes the system easier to troubleshoot when inspecting the underlying storage.

Code quote:

    pub fn prefix() -> i64 {
        0
    }
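A human-readable layout along the lines suggested here could look like the following sketch. Everything in it is hypothetical: the prefix `__fd_task_messages`, the `MessageKind` enum, and `readable_key` are illustrative names, and a real implementation would presumably go through TIdent as requested earlier in this review.

```rust
/// Hypothetical message kinds mirroring the `TaskMessage` variants quoted above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MessageKind {
    Execute,
    Schedule,
    Delete,
    After,
}

impl MessageKind {
    /// A readable segment instead of a bare digit.
    fn as_str(self) -> &'static str {
        match self {
            MessageKind::Execute => "execute",
            MessageKind::Schedule => "schedule",
            MessageKind::Delete => "delete",
            MessageKind::After => "after",
        }
    }
}

/// Hypothetical prefix; the actual name is a design decision for the PR.
const TASK_MESSAGE_PREFIX: &str = "__fd_task_messages";

/// Build a key that can be understood when inspecting the raw storage.
fn readable_key(tenant: &str, kind: MessageKind, task_name: &str) -> String {
    format!("{TASK_MESSAGE_PREFIX}/{tenant}/{}/{task_name}", kind.as_str())
}

fn main() {
    for kind in [
        MessageKind::Execute,
        MessageKind::Schedule,
        MessageKind::Delete,
        MessageKind::After,
    ] {
        println!("{}", readable_key("tenant_a", kind, "t1"));
    }
}
```

With keys shaped like this, listing a prefix in the underlying store shows at a glance which tenant, message kind, and task each entry belongs to.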

drmingdrmer

Reviewed 2 of 2 files at r13, all commit messages.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @dqhl76, @KKould, @sundy-li, and @zhang2014)

github-actions bot added the pr-feature (this PR introduces a new feature to the codebase) label Jul 4, 2025

zhang2014 reviewed Jul 4, 2025

KKould force-pushed the feat/task_v2 branch 2 times, most recently from 22084d0 to dff7a50 Compare July 14, 2025 09:58

KKould marked this pull request as ready for review July 14, 2025 12:02

KKould requested a review from drmingdrmer as a code owner July 14, 2025 12:02

drmingdrmer requested a review from zhang2014 July 14, 2025 16:14

drmingdrmer requested changes Jul 14, 2025

KKould force-pushed the feat/task_v2 branch from 35916e4 to 5a252b4 Compare July 15, 2025 02:24

zhang2014 reviewed Jul 15, 2025

zhang2014 reviewed Jul 15, 2025

KKould force-pushed the feat/task_v2 branch from 94fcde4 to e15e561 Compare July 15, 2025 07:12

BohuTANG added the ci-cloud (Build docker image for cloud test) label Jul 15, 2025

KKould force-pushed the feat/task_v2 branch from aab3a66 to 76eae14 Compare July 16, 2025 06:44

sundy-li reviewed Jul 16, 2025

sundy-li approved these changes Jul 17, 2025

KKould requested review from zhang2014 and drmingdrmer July 17, 2025 07:13

KKould added 4 commits July 17, 2025 15:15

feat: impl Task for private

bf17de1

feat: imp when_condition for Task & store TaskRun on system table

4ccb7b7

feat: use Meta's Watch mechanism to distribute TaskMessage

6baff0a

refactor: Using SQL to simplify DAG

c42aac3

KKould added 18 commits July 17, 2025 15:15

chore: fix meta_store init

5fe7a8b

chore: log error on spawn

b4eb4bd

test: add test for private task

d9b4576

chore: add license for private task

51bcf06

chore: fix test_display_license_info

cc80564

chore: codefmt

181a438

chore: add accept for Delete & After

3a8a3dc

chore: add restart test

d7ba36b

chore: codefmt

f0721b1

chore: add system.task_history for Private Task & use `TaskMgr::accept` replace `TaskMetaHandle::acquire_with_guard`

0b8b72f

fix: TableFunctionFactory create fail

c15189b

ci: rename to private task test

739b8be

chore: add cron test

2e9d55a

feat: add system table: system.tasks

a5f4931

chore: fix update_or_create_task_run correct on ScheduleTask

863f120

chore: add Task when test on test-private-task.sh

547aed3

chore: add private task config check on GlobalServices::init_with

fc5fe53

chore: update Task version on Meta

22ea27f

KKould force-pushed the feat/task_v2 branch from 99febfd to 22ea27f Compare July 17, 2025 07:19

KKould added 2 commits July 17, 2025 15:43

chore: codefmt

578694f

chore: remove TaskMgr::list_task_fallback

98addfd

drmingdrmer requested review from dqhl76 and sundy-li July 18, 2025 14:37

drmingdrmer approved these changes Jul 18, 2025

chore: codefmt

6848da5

drmingdrmer approved these changes Jul 19, 2025

sundy-li merged commit d6ddbae into databendlabs:main Jul 21, 2025
84 of 87 checks passed

KKould mentioned this pull request Jul 21, 2025

docs: add docs for EEFeature 'PRIVATE TASK' databendlabs/databend-docs#2521

Open

