feat: implement process manager and information_schema.process_list #5865
Conversation
**Refactor Process Management in Meta Module**

- Introduced `ProcessManager` for handling process registration and deregistration.
- Added methods for managing and querying process states, including `register_query`, `deregister_query`, and `list_all_processes`.
- Removed redundant process management code from the query module.
- Updated error handling to reflect changes in process management.
- Enhanced test coverage for process management functionalities.
**Add Process Management Enhancements**

- **`manager.rs`**: Introduced `process_manager` to `SystemCatalog` and `KvBackendCatalogManager` for improved process handling.
- **`information_schema.rs`**: Updated table insertion logic to conditionally include `PROCESS_LIST`.
- **`frontend.rs`, `standalone.rs`**: Enhanced `StartCommand` to clone `process_manager` for better resource management.
- **`instance.rs`, `builder.rs`**: Integrated `ProcessManager` into `Instance` and `FrontendBuilder` to manage query
### Add Process Listing and Error Handling Enhancements

- **Error Handling**: Introduced a new error variant `ListProcess` in `error.rs` to handle failures when listing running processes.
- **Process List Implementation**: Enhanced `InformationSchemaProcessList` in `process_list.rs` to track running queries, including defining column names and implementing the `make_process_list` function to build the process list.
- **Frontend Builder**: Added a `#[allow(clippy::too_many_arguments)]` attribute in `builder.rs` to suppress Clippy warnings for the `FrontendBuilder::new` function.

These changes improve error handling and process tracking capabilities within the system.
**Refactor imports in `process_list.rs`**

- Updated import paths for `Predicates` and `InformationTable` in `process_list.rs` to align with the new module structure.
**Refactor process list generation in `process_list.rs`**

- Simplified the process list generation by removing intermediate row storage and directly building vectors.
- Updated the `process_to_row` function to use a mutable vector for current row data, improving memory efficiency.
- Removed the `rows_to_record_batch` function, integrating its logic directly into the main loop for streamlined processing.
- **Refactor Row Construction**: Updated row construction in multiple files to use references for `Value` objects, improving memory efficiency. Affected files include `cluster_info.rs`, `columns.rs`, `flows.rs`, `key_column_usage.rs`, `partitions.rs`, `procedure_info.rs`, `process_list.rs`, `region_peers.rs`, `region_statistics.rs`, `schemata.rs`, `table_constraints.rs`, `tables.rs`, `views.rs`, `pg_class.rs`, `pg_database.rs`, and `pg_namespace.rs`.
- **Remove Unused Code**: Deleted unused functions and error variants related to process management in `process_list.rs` and `error.rs`.
- **Predicate Evaluation Update**: Modified predicate evaluation functions in `predicate.rs` to work with references, enhancing performance.
Force-pushed from 931d85b to ab8ea60.
### Implement Process Management Enhancements

- **Error Handling Enhancements**:
  - Added new error variants `BumpSequence`, `StartReportTask`, `ReportProcess`, and `BuildProcessManager` in `error.rs` to improve error handling for process management tasks.
  - Updated `ErrorExt` implementations to handle the new error types.
- **Process Manager Improvements**:
  - Introduced `ProcessManager` enhancements in `process_manager.rs` to manage process states using the `ProcessWithState` and `ProcessState` enums.
  - Implemented the periodic task `ReportTask` to report running queries to the KV backend.
  - Modified the `register_query` and `deregister_query` methods to use the new state management system.
- **Testing and Validation**:
  - Updated tests in `process_manager.rs` to validate the new process management logic.
  - Replaced the `dump` method with `list_all_processes` for listing processes.
- **Integration with Frontend and Standalone**:
  - Updated `frontend.rs` and `standalone.rs` to handle `ProcessManager` initialization errors using the `BuildProcessManager` error variant.
- **Schema Adjustments**:
  - Modified `process_list.rs` in `system_schema/information_schema` to use the updated process listing method.
- **Key-Value Conversion**:
  - Added a `TryFrom` implementation for converting `Process` to `KeyValue` in `process_list.rs`.
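For readers following along, here is a minimal, self-contained sketch of the state machine this commit message describes. Only the names `ProcessWithState`, `ProcessState`, and the report-task idea come from the commit; the variants and fields below are illustrative guesses, not GreptimeDB's actual definitions.

```rust
// Illustrative sketch only: variant and field names are guesses based on the
// commit message above, not GreptimeDB's actual definitions.

/// Lifecycle of a tracked query inside the frontend-local registry.
#[derive(Debug, Clone, PartialEq)]
enum ProcessState {
    /// Registered locally, not yet reported to the KV backend.
    Pending,
    /// Persisted to the KV backend by the periodic report task.
    Reported,
    /// Finished locally; the next report tick should remove the KV entry.
    Finished,
}

/// A process entry paired with its reporting state.
#[derive(Debug, Clone)]
struct ProcessWithState {
    id: u64,
    catalog: String,
    query: String,
    start_timestamp_ms: i64,
    state: ProcessState,
}

fn main() {
    // A freshly registered query starts as `Pending` and is picked up by the
    // report task on its next tick.
    let p = ProcessWithState {
        id: 1,
        catalog: "greptime".to_string(),
        query: "SELECT 1".to_string(),
        start_timestamp_ms: 1_700_000_000_000,
        state: ProcessState::Pending,
    };
    assert_eq!(p.state, ProcessState::Pending);
    println!("{p:?}");
}
```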
Force-pushed from ab8ea60 to caadc7b.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct ProcessValue {
    /// Database name.
    pub database: String,
Note that a query can cross schemas, so it's hard to judge which "schema" the query belongs to. We can use the catalog here. When using GreptimeDB in multi-tenant mode, we use catalogs to isolate tenants; each connection should belong to only one catalog.
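A hedged sketch of what that suggestion could look like, with `catalog` replacing the `database` field; the derives mirror the struct quoted above, but this is not the final field set:

```rust
// Sketch of the suggested change: key the process on the connection's catalog
// (the tenant boundary) instead of a single database. Derives mirror the
// struct quoted above; this is not the final field set.
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct ProcessValue {
    /// Catalog the issuing connection is bound to (one connection, one catalog).
    pub catalog: String,
    /// The running query sql.
    pub query: String,
    /// Query start timestamp in milliseconds.
    pub start_timestamp_ms: i64,
}
```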
    /// The running query sql.
    pub query: String,
    /// Query start timestamp in milliseconds.
    pub start_timestamp_ms: i64,
We will also need to include some information about the client; there is a chance that multiple clients issue the same query at the same time, so it's hard to judge the process owner just with `start_timestamp_ms`.
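As a rough illustration of that point, a richer process identity could combine the owning frontend, a per-frontend sequence number, and the client endpoint. Everything below is hypothetical naming, not code from this PR:

```rust
// Hypothetical sketch: combine the owning frontend, a per-frontend sequence
// number, and the client endpoint so that two identical queries started in the
// same millisecond stay distinguishable. None of these names are from the PR.
use std::net::SocketAddr;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ProcessKey {
    /// Address of the frontend that accepted the query.
    frontend: String,
    /// Monotonic id handed out by that frontend (e.g. from a sequence).
    local_id: u64,
}

#[derive(Debug, Clone)]
struct ClientInfo {
    /// Authenticated user name, if any.
    user: Option<String>,
    /// Peer address of the client connection.
    peer: Option<SocketAddr>,
}

fn main() {
    let key = ProcessKey {
        frontend: "10.0.0.1:4001".to_string(),
        local_id: 42,
    };
    let client = ClientInfo {
        user: Some("greptime_user".to_string()),
        peer: "192.168.1.5:52311".parse().ok(),
    };
    println!("{key:?} issued by {client:?}");
}
```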
let mut query_builder = StringVectorBuilder::with_capacity(queries.len());
let mut start_time_builder = TimestampMillisecondVectorBuilder::with_capacity(queries.len());
let mut elapsed_time_builder = DurationMillisecondVectorBuilder::with_capacity(queries.len());
We need to ensure this table only contains queries from the current catalog. When running in multi-tenant mode, we should not allow one tenant to see the global process list.
There is one exception: when connected to the greptime catalog, we can show the global state.
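A minimal sketch of that visibility rule, assuming the default catalog is named `greptime` and that each tracked process records its catalog (names are illustrative, not the actual implementation):

```rust
// Minimal sketch of the visibility rule, assuming the default catalog is
// "greptime" and every tracked process records its catalog. Illustrative only.

const DEFAULT_CATALOG: &str = "greptime";

#[derive(Debug, Clone)]
struct Process {
    catalog: String,
    query: String,
}

/// Keep only the processes the current connection may see: its own catalog,
/// or everything when connected to the default `greptime` catalog.
fn visible_processes(current_catalog: &str, all: Vec<Process>) -> Vec<Process> {
    if current_catalog == DEFAULT_CATALOG {
        return all; // operator view: global process list
    }
    all.into_iter()
        .filter(|p| p.catalog == current_catalog)
        .collect()
}

fn main() {
    let all = vec![
        Process { catalog: "tenant_a".to_string(), query: "SELECT 1".to_string() },
        Process { catalog: "tenant_b".to_string(), query: "SELECT 2".to_string() },
    ];
    assert_eq!(visible_processes("tenant_a", all.clone()).len(), 1);
    assert_eq!(visible_processes("greptime", all).len(), 2);
}
```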
Client information like the user and the client IP/port would be better included in this table.
https://dev.mysql.com/doc/refman/8.4/en/information-schema-processlist-table.html
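For comparison, a column set closer to MySQL's `INFORMATION_SCHEMA.PROCESSLIST` (`ID`, `USER`, `HOST`, `DB`, `COMMAND`, `TIME`, `STATE`, `INFO`) might look like the following. These names are a proposal for discussion, not the schema implemented in this PR:

```rust
// Proposed column set for discussion, loosely following MySQL's
// INFORMATION_SCHEMA.PROCESSLIST (ID, USER, HOST, DB, COMMAND, TIME, STATE, INFO).
// These names are not the schema implemented in this PR.
const PROCESS_LIST_COLUMNS: &[&str] = &[
    "id",              // globally unique process id
    "catalog",         // tenant boundary instead of a single db/schema
    "user",            // authenticated user of the connection
    "client",          // client ip:port
    "frontend",        // frontend node that owns the query
    "query",           // running statement
    "start_timestamp", // when the query started
    "elapsed_time",    // how long it has been running
];

fn main() {
    for column in PROCESS_LIST_COLUMNS {
        println!("{column}");
    }
}
```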
One more idea: instead of this push model where each frontend reports this information to meta, can we make it a pull model, where a query against `information_schema.process_list` makes the frontend issue a request to fetch the instant state from all siblings?

Pros:

Cons:
I evaluated that approach at first, but a frontend does not know the addresses of other frontends; to implement this we would need a service discovery mechanism. Maybe
If we use pull mode, we can first query meta to get a list of live frontend nodes, then broadcast to all these frontends. I think we can afford the cost because it won't be a frequent query. The question is, I remember we cannot do async operations on information schema. Not sure if that's still valid.
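A hedged sketch of that pull-model read path, with hypothetical stand-in clients for metasrv and the sibling frontends (the real clients and their APIs are not shown in this PR); note the scan itself has to be async, which is exactly the open question above:

```rust
// Pull-model sketch: ask meta for live frontends, then fan out a
// "list local processes" call to each and merge the answers. MetaClient and
// FrontendClient are hypothetical stand-ins, not real GreptimeDB APIs.

#[derive(Debug, Clone)]
struct Process {
    frontend: String,
    query: String,
}

struct MetaClient;
struct FrontendClient;

impl MetaClient {
    /// Ask metasrv which frontends are currently alive (stubbed here).
    async fn live_frontends(&self) -> Vec<String> {
        vec!["10.0.0.1:4001".to_string(), "10.0.0.2:4001".to_string()]
    }
}

impl FrontendClient {
    /// Ask one frontend for its in-memory process list (stubbed here).
    async fn list_local_processes(&self, addr: &str) -> Vec<Process> {
        vec![Process {
            frontend: addr.to_string(),
            query: "SELECT 1".to_string(),
        }]
    }
}

/// Read path for information_schema.process_list in pull mode: cheap because
/// it only runs when someone queries the table, but the scan has to be async.
async fn pull_process_list(meta: &MetaClient, fe: &FrontendClient) -> Vec<Process> {
    let mut merged = Vec::new();
    for addr in meta.live_frontends().await {
        merged.extend(fe.list_local_processes(&addr).await);
    }
    merged
}

fn main() {
    // A frontend would drive this with its async runtime; here we just block
    // on it with the small executor from the `futures` crate.
    let merged = futures::executor::block_on(pull_process_list(&MetaClient, &FrontendClient));
    println!("{merged:?}");
}
```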
@@ -331,11 +332,17 @@ impl StartCommand {

    let information_extension =
        Arc::new(DistributedInformationExtension::new(meta_client.clone()));

    let process_manager = Arc::new(
        ProcessManager::new(opts.grpc.server_addr.clone(), cached_meta_backend.clone())
Prefer to use the method that resolves the frontend address the way the frontend's heartbeat does:
peer_addr: addrs::resolve_addr(&opts.grpc.bind_addr, Some(&opts.grpc.server_addr)),
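To make the concern concrete, here is a toy stand-in for the address-resolution rule the reviewer points at; the real `addrs::resolve_addr` lives in the GreptimeDB tree and its exact behavior may differ from this sketch:

```rust
// Toy stand-in for the address-resolution rule: advertise the configured
// server address when it is set, otherwise fall back to the bind address.
// The real `addrs::resolve_addr` in the GreptimeDB tree may behave differently.
fn resolve_addr(bind_addr: &str, server_addr: Option<&str>) -> String {
    match server_addr {
        Some(advertised) if !advertised.is_empty() => advertised.to_string(),
        _ => bind_addr.to_string(),
    }
}

fn main() {
    // The frontend should register itself under an address its peers can
    // reach, not the wildcard it binds to.
    let advertised = resolve_addr("0.0.0.0:4001", Some("10.0.0.1:4001"));
    assert_eq!(advertised, "10.0.0.1:4001");
}
```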
There's another issue regarding the push model: the pressure on metasrv. The process list might be huge in a busy cluster, and the reporting tasks may report too frequently to metasrv in a large cluster. Both produce extra network and storage load on metasrv.
Most queries live for less than the report interval, so they won't be stored in metasrv or backend storage. It's not expected that any cluster will have many long-running queries.
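One way to realize that claim is for the report task to only persist processes that have already outlived the report interval. The sketch below is illustrative, with made-up names and thresholds, not the PR's actual implementation:

```rust
// Illustrative sketch: a report tick that only persists processes which have
// already outlived the report interval, so short-lived queries never reach
// metasrv. Names and thresholds are not the PR's actual implementation.
use std::time::{Duration, SystemTime, UNIX_EPOCH};

const REPORT_INTERVAL: Duration = Duration::from_secs(5);

#[derive(Debug, Clone)]
struct Process {
    query: String,
    start_timestamp_ms: i64,
}

fn now_ms() -> i64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_millis() as i64
}

/// Select the processes worth reporting on this tick: only those that have
/// been running for at least one full report interval.
fn to_report(running: &[Process]) -> Vec<&Process> {
    let cutoff = now_ms() - REPORT_INTERVAL.as_millis() as i64;
    running
        .iter()
        .filter(|p| p.start_timestamp_ms <= cutoff)
        .collect()
}

fn main() {
    let running = vec![
        // Just started: likely finishes before it would ever be reported.
        Process { query: "SELECT 1".to_string(), start_timestamp_ms: now_ms() },
        // Running for a minute already: worth pushing to the KV backend.
        Process { query: "SELECT * FROM big_table".to_string(), start_timestamp_ms: now_ms() - 60_000 },
    ];
    assert_eq!(to_report(&running).len(), 1);
}
```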
What's the status of this PR?
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
This PR adds an implementation of `ProcessManager` and the `information_schema.process_list` table used to track running queries. This is the first step towards process management. To reduce PR size, currently no query will be registered to `ProcessManager`.
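For context, here is a rough, self-contained sketch of how the `ProcessManager` API could be used once queries are wired in. Only the method names `register_query`, `deregister_query`, and `list_all_processes` come from this PR; the struct layout and signatures below are hypothetical:

```rust
// Hedged sketch of the ProcessManager API once queries are registered; only
// the method names register_query / deregister_query / list_all_processes
// come from this PR, the layout and signatures below are hypothetical.
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, Clone)]
struct Process {
    id: u64,
    catalog: String,
    query: String,
}

#[derive(Default)]
struct ProcessManager {
    next_id: Mutex<u64>,
    running: Mutex<HashMap<u64, Process>>,
}

impl ProcessManager {
    /// Track a newly started query and hand back a ticket for deregistration.
    fn register_query(&self, catalog: &str, query: &str) -> u64 {
        let mut next = self.next_id.lock().unwrap();
        *next += 1;
        let id = *next;
        self.running.lock().unwrap().insert(
            id,
            Process {
                id,
                catalog: catalog.to_string(),
                query: query.to_string(),
            },
        );
        id
    }

    /// Drop a finished query from the registry.
    fn deregister_query(&self, id: u64) {
        self.running.lock().unwrap().remove(&id);
    }

    /// Snapshot of everything currently tracked; this is what an
    /// information_schema.process_list scan would render as rows.
    fn list_all_processes(&self) -> Vec<Process> {
        self.running.lock().unwrap().values().cloned().collect()
    }
}

fn main() {
    let pm = ProcessManager::default();
    let id = pm.register_query("greptime", "SELECT 1");
    assert_eq!(pm.list_all_processes().len(), 1);
    pm.deregister_query(id);
    assert!(pm.list_all_processes().is_empty());
}
```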
PR Checklist

Please convert it to a draft if some of the following conditions are not met.