-
Notifications
You must be signed in to change notification settings - Fork 794
feat: impl Task for private #18311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: impl Task for private #18311
Conversation
None => None, | ||
Some(ref w) => Some(mt::WarehouseOptions { | ||
warehouse: w.warehouse.clone(), | ||
using_warehouse_size: w.using_warehouse_size.clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why need warehouse_size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consistent with the current task structure defined by cloud, warehouse_options is rarely used in private
22084d0
to
dff7a50
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's in general clean and rational to me. Good job!
I did not quite grasp the logic about the task execution between databend-query and meta. Can you provide a doc explaining this part? especially about the related key-value layout on databend-meta, and the duty of the semaphore in the task execution.
And add a new version change log entry here, and add a test for this change.
databend/src/meta/proto-conv/src/util.rs
Lines 166 to 171 in 6533304
(134, "2025-06-27: Add: SequenceMeta.storage_version"), | |
// Dear developer: | |
// If you're gonna add a new metadata version, you'll have to add a test for it. | |
// You could just copy an existing test file(e.g., `../tests/it/v024_table_meta.rs`) | |
// and replace two of the variable `bytes` and `want`. | |
]; |
Reviewed 10 of 35 files at r1, 38 of 38 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @zhang2014)
src/query/users/src/user_task.rs
line 42 at r2 (raw file):
task_api.execute_task(task_name).await??; Ok(()) }
these UserApiProvider method does not seem necessary. just UserApiProvider.task_api(tenant).xxx()
looks good enough to me 🤔
Code quote:
#[async_backtrace::framed]
pub async fn execute_task(&self, tenant: &Tenant, task_name: &str) -> Result<()> {
let task_api = self.task_api(tenant);
task_api.execute_task(task_name).await??;
Ok(())
}
src/query/management/src/task/task_mgr.rs
line 238 at r2 (raw file):
.await? .try_collect::<Vec<_>>() .await?;
what is list_task_fallback
for? it looks the same as list_task
.
Code quote:
pub async fn list_task(&self) -> Result<Vec<Task>, MetaError> {
let key = DirName::new(TaskIdent::new(&self.tenant, ""));
let strm = self.kv_api.list_pb_values(&key).await?;
match strm.try_collect().await {
Ok(tasks) => Ok(tasks),
Err(_) => self.list_task_fallback().await,
}
}
#[async_backtrace::framed]
#[fastrace::trace]
pub async fn list_task_fallback(&self) -> Result<Vec<Task>, MetaError> {
let key = TaskIdent::new(&self.tenant, "dummy");
let dir = DirName::new(key);
let tasks = self
.kv_api
.list_pb_values(&dir)
.await?
.try_collect::<Vec<_>>()
.await?;
Currently, query uses the watch in meta to imitate channel to obtain tasks. When task messages are sent to channels, they are stored in meta using TaskMessage::key is divided into only 4 types of keys that will overwrite each other to avoid repeated storage and repeated processing. Whenever a new key is inserted for overwriting, each query will receive the corresponding key change and process it, thus realizing the channel The init type key of watch is used to let the Service load the Schedule, and TaskService will delete the corresponding key ( |
|
|
@KKould If that's still the case, you could simply get and delete the task message key from the meta-service within a single transaction. This ensures only one consumer can acquire the task for execution, eliminating the need for a semaphore. PS, This logic should be documented in the source code so future readers can easily understand the design rationale. |
@drmingdrmer Thanks for your suggestion. I tried |
} | ||
|
||
pub async fn prepare(&self) -> Result<()> { | ||
let prepare_key = format!("{}/task_run_prepare/lock", self.tenant.tenant_name()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the task's execution history? Maybe it's better to reuse system_history instead. cc: @dqhl76
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, there is task_history
in the current cloud task for querying running records. Is it better to forward system_task.task_run
to task_history
?
Docker Image for PR
|
In order to provide a And the semaphore can be removed. The transaction I mentioned above is meant to automatically fetch-and-remove a task message, not for exclusive task running. |
🤖 Smart Auto-retry Analysis (Retry 9)
📊 Summary
❌ NO RETRY NEEDEDAll failures appear to be code/test issues requiring manual fixes. 🔍 Job Details
🤖 AboutAutomated analysis using job annotations to distinguish infrastructure issues (auto-retried) from code/test issues (manual fixes needed). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, please fix the conflicts
…ept` replace `TaskMetaHandle::acquire_with_guard`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 16 files at r3, 7 of 8 files at r6, 2 of 8 files at r7, 5 of 7 files at r8, 4 of 5 files at r9, 1 of 3 files at r10, 8 of 8 files at r11, 1 of 1 files at r12, all commit messages.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @dqhl76, @KKould, @sundy-li, and @zhang2014)
src/meta/app/src/principal/task.rs
line 100 at r12 (raw file):
pub fn key(&self) -> String { format!("{}@{}", self.task.task_name, self.run_id) }
This method is also called a key
but it does follow the convention of other meta-service key, such TaskIdent
, which has a fixed prefix __fd_tasks
.
And this method seems not used anywhere.
If it is a key used in meta-service, it should be defined with TIdent
, such as TaskIdent
.
Code quote:
pub fn key(&self) -> String {
format!("{}@{}", self.task.task_name, self.run_id)
}
src/meta/app/src/principal/task.rs
line 109 at r12 (raw file):
DeleteTask(String), AfterTask(Task), }
This struct and other core structs would benefit from documentation comments to clarify their purpose.
Currently the naming is ambiguous:
ExecuteTask
suggests a command to run a task immediately, right?ScheduleTask
implies periodic scheduling, but it's missing schedule information (interval, timing, etc.)AfterTask
is unclear - does it mean "run task B after task A completes" or something else?
Consider adding doc comments that explain each variant's specific behavior and use cases.
Code quote:
#[derive(Debug, Clone, PartialEq)]
pub enum TaskMessage {
ExecuteTask(Task),
ScheduleTask(Task),
DeleteTask(String),
AfterTask(Task),
}
src/meta/app/src/principal/task.rs
line 133 at r12 (raw file):
pub fn schedule_key(task_name: &str) -> String { format!("{}-1-{task_name}", TaskMessage::prefix()) }
why is schedule_key
special, while there is already key()
method?
Please explain it in the doc.
Code quote:
pub fn key(&self) -> String {
let ty = match self {
TaskMessage::ExecuteTask(_) => 0,
TaskMessage::ScheduleTask(_) => 1,
TaskMessage::DeleteTask(_) => 2,
TaskMessage::AfterTask(_) => 3,
};
format!("{}-{}-{}", TaskMessage::prefix(), ty, self.task_name())
}
pub fn schedule_key(task_name: &str) -> String {
format!("{}-1-{task_name}", TaskMessage::prefix())
}
src/meta/app/src/principal/task.rs
line 137 at r12 (raw file):
pub fn prefix() -> i64 { 0 }
Aside from other considerations, the keys and prefixes used in the meta-service should be human-readable strings rather than bare digits. This improves debuggability and makes the system easier to troubleshoot when inspecting the underlying storage.
Code quote:
pub fn prefix() -> i64 {
0
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r13, all commit messages.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @dqhl76, @KKould, @sundy-li, and @zhang2014)
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
Currently, Task-related functions rely on Cloud. This pr is used to support Task functions when privately deployed.
TODO:
system.task_history
system.task_dependents
system.tasks
Tests
Type of change
This change is