feat: implement process manager and information_schema.process_list #5865


Open
wants to merge 16 commits into main

Conversation

Contributor

@v0y4g3r v0y4g3r commented Apr 9, 2025

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

This PR adds an implementation of `ProcessManager` and the `information_schema.process_list` table, used to track running queries.

This is the first step towards process management. To keep the PR size manageable, no queries are registered with `ProcessManager` yet.
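As a rough illustration only (not this PR's actual code), a query-tracking registry with register/deregister/list operations can be sketched in plain std Rust. The field names mirror the `ProcessValue` struct shown later in the diff; everything else is an assumption:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Illustrative process record; fields follow the `ProcessValue`
/// struct shown in the diff below.
#[derive(Debug, Clone)]
pub struct Process {
    pub id: u64,
    pub database: String,
    pub query: String,
    pub start_timestamp_ms: i64,
}

/// Hypothetical in-memory registry sketch, not the PR's implementation.
#[derive(Default)]
pub struct ProcessManager {
    next_id: AtomicU64,
    processes: Mutex<HashMap<u64, Process>>,
}

impl ProcessManager {
    /// Registers a running query and returns its process id.
    pub fn register_query(&self, database: &str, query: &str, start_timestamp_ms: i64) -> u64 {
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        let process = Process {
            id,
            database: database.to_string(),
            query: query.to_string(),
            start_timestamp_ms,
        };
        self.processes.lock().unwrap().insert(id, process);
        id
    }

    /// Removes a finished query from the registry.
    pub fn deregister_query(&self, id: u64) {
        self.processes.lock().unwrap().remove(&id);
    }

    /// Lists all currently registered queries.
    pub fn list_all_processes(&self) -> Vec<Process> {
        self.processes.lock().unwrap().values().cloned().collect()
    }
}
```

A `SELECT * FROM information_schema.process_list` would then, conceptually, render the output of `list_all_processes` as table rows.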

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.


coderabbitai bot commented Apr 9, 2025

Review skipped: auto reviews are disabled on this repository.



@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Apr 9, 2025
@v0y4g3r v0y4g3r mentioned this pull request Apr 9, 2025
@github-actions github-actions bot added docs-required This change requires docs update. and removed docs-not-required This change does not impact docs. labels Apr 10, 2025
v0y4g3r added 13 commits April 14, 2025 13:10
 Refactor Process Management in Meta Module

 - Introduced `ProcessManager` for handling process registration and deregistration.
 - Added methods for managing and querying process states, including `register_query`, `deregister_query`, and `list_all_processes`.
 - Removed redundant process management code from the query module.
 - Updated error handling to reflect changes in process management.
 - Enhanced test coverage for process management functionalities.
 **Add Process Management Enhancements**

 - **`manager.rs`**: Introduced `process_manager` to `SystemCatalog` and `KvBackendCatalogManager` for improved process handling.
 - **`information_schema.rs`**: Updated table insertion logic to conditionally include `PROCESS_LIST`.
 - **`frontend.rs`, `standalone.rs`**: Enhanced `StartCommand` to clone `process_manager` for better resource management.
 - **`instance.rs`, `builder.rs`**: Integrated `ProcessManager` into `Instance` and `FrontendBuilder` to manage query
 ### Add Process Listing and Error Handling Enhancements

 - **Error Handling**: Introduced a new error variant `ListProcess` in `error.rs` to handle failures when listing running processes.
 - **Process List Implementation**: Enhanced `InformationSchemaProcessList` in `process_list.rs` to track running queries, including defining column names and implementing the `make_process_list` function to build the process list.
 - **Frontend Builder**: Added a `#[allow(clippy::too_many_arguments)]` attribute in `builder.rs` to suppress Clippy warnings for the `FrontendBuilder::new` function.

 These changes improve error handling and process tracking capabilities within the system.
 Refactor imports in `process_list.rs`

 - Updated import paths for `Predicates` and `InformationTable` in `process_list.rs` to align with the new module structure.
 Refactor process list generation in `process_list.rs`

 - Simplified the process list generation by removing intermediate row storage and directly building vectors.
 - Updated `process_to_row` function to use a mutable vector for current row data, improving memory efficiency.
 - Removed `rows_to_record_batch` function, integrating its logic directly into the main loop for streamlined processing.
 - **Refactor Row Construction**: Updated row construction in multiple files to use references for `Value` objects, improving memory efficiency. Affected files include:
   - `cluster_info.rs`
   - `columns.rs`
   - `flows.rs`
   - `key_column_usage.rs`
   - `partitions.rs`
   - `procedure_info.rs`
   - `process_list.rs`
   - `region_peers.rs`
   - `region_statistics.rs`
   - `schemata.rs`
   - `table_constraints.rs`
   - `tables.rs`
   - `views.rs`
   - `pg_class.rs`
   - `pg_database.rs`
   - `pg_namespace.rs`
 - **Remove Unused Code**: Deleted unused functions and error variants related to process management in `process_list.rs` and `error.rs`.
 - **Predicate Evaluation Update**: Modified predicate evaluation functions in `predicate.rs` to work with references, enhancing performance.
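The direct column-building refactor described in these commit notes can be illustrated with a minimal std-only analogue: one builder per column, a value pushed per row, no intermediate row buffer. `StringColumnBuilder` here is a simplified stand-in for GreptimeDB's real vector builders (`StringVectorBuilder` and friends):

```rust
/// Simplified stand-in for a columnar string builder.
struct StringColumnBuilder {
    values: Vec<Option<String>>,
}

impl StringColumnBuilder {
    /// Pre-allocates capacity for the expected number of rows.
    fn with_capacity(cap: usize) -> Self {
        Self { values: Vec::with_capacity(cap) }
    }

    /// Appends one (possibly null) value to the column.
    fn push(&mut self, v: Option<&str>) {
        self.values.push(v.map(str::to_string));
    }

    /// Finishes the column, yielding the built vector.
    fn finish(self) -> Vec<Option<String>> {
        self.values
    }
}

/// Builds the `query` column directly from process rows, without an
/// intermediate row buffer (mirroring the refactor described above).
fn build_query_column(queries: &[&str]) -> Vec<Option<String>> {
    let mut builder = StringColumnBuilder::with_capacity(queries.len());
    for q in queries {
        builder.push(Some(q));
    }
    builder.finish()
}
```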
@v0y4g3r v0y4g3r force-pushed the feat/show-process-list branch from 931d85b to ab8ea60 Compare April 14, 2025 13:20
v0y4g3r added 2 commits April 14, 2025 13:24
 ### Implement Process Management Enhancements

 - **Error Handling Enhancements**:
   - Added new error variants `BumpSequence`, `StartReportTask`, `ReportProcess`, and `BuildProcessManager` in `error.rs` to improve error handling for process management tasks.
   - Updated `ErrorExt` implementations to handle new error types.

 - **Process Manager Improvements**:
   - Introduced `ProcessManager` enhancements in `process_manager.rs` to manage process states using `ProcessWithState` and `ProcessState` enums.
   - Implemented periodic task `ReportTask` to report running queries to the KV backend.
   - Modified `register_query` and `deregister_query` methods to use the new state management system.

 - **Testing and Validation**:
   - Updated tests in `process_manager.rs` to validate new process management logic.
   - Replaced `dump` method with `list_all_processes` for listing processes.

 - **Integration with Frontend and Standalone**:
   - Updated `frontend.rs` and `standalone.rs` to handle `ProcessManager` initialization errors using `BuildProcessManager` error variant.

 - **Schema Adjustments**:
   - Modified `process_list.rs` in `system_schema/information_schema` to use the updated process listing method.

 - **Key-Value Conversion**:
   - Added `TryFrom` implementation for converting `Process` to `KeyValue` in `process_list.rs`.
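A hedged sketch of what such a `TryFrom` conversion from `Process` to `KeyValue` could look like; the key layout and value encoding below are purely illustrative assumptions, not the PR's actual format:

```rust
use std::convert::TryFrom;

/// Illustrative process record (subset of fields).
#[derive(Debug)]
struct Process {
    id: u64,
    database: String,
    query: String,
}

/// Illustrative key-value pair for a KV backend.
#[derive(Debug)]
struct KeyValue {
    key: Vec<u8>,
    value: Vec<u8>,
}

impl TryFrom<&Process> for KeyValue {
    type Error = String;

    fn try_from(p: &Process) -> Result<Self, Self::Error> {
        if p.query.is_empty() {
            return Err("process has no query".to_string());
        }
        // Hypothetical key layout: scope keys by database/catalog so that
        // listings can later be filtered per tenant.
        let key = format!("__process/{}/{}", p.database, p.id).into_bytes();
        let value = p.query.clone().into_bytes();
        Ok(KeyValue { key, value })
    }
}
```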
@v0y4g3r v0y4g3r force-pushed the feat/show-process-list branch from ab8ea60 to caadc7b Compare April 14, 2025 13:41
@v0y4g3r v0y4g3r changed the title feat: support information_schema.process_list table to show running queries feat: implement process manager and information_schema.process_list Apr 14, 2025
@v0y4g3r v0y4g3r marked this pull request as ready for review April 14, 2025 13:43
@v0y4g3r v0y4g3r requested review from MichaelScofield and a team as code owners April 14, 2025 13:43
@v0y4g3r v0y4g3r requested a review from sunng87 April 14, 2025 13:43
@github-actions github-actions bot removed the docs-required This change requires docs update. label Apr 14, 2025
@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Apr 14, 2025
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct ProcessValue {
/// Database name.
pub database: String,
Member

Note that a query can cross schemas, so it's hard to judge which "schema" a query belongs to. We can use the catalog here: when running GreptimeDB in multi-tenant mode, we use the catalog to isolate tenants, and each connection should belong to only one catalog.

/// The running query sql.
pub query: String,
/// Query start timestamp in milliseconds.
pub start_timestamp_ms: i64,
Member

We will also need to include some information about the client; multiple clients may issue the same query at the same time, so it's hard to identify the process owner from start_timestamp_ms alone.

let mut query_builder = StringVectorBuilder::with_capacity(queries.len());
let mut start_time_builder = TimestampMillisecondVectorBuilder::with_capacity(queries.len());
let mut elapsed_time_builder = DurationMillisecondVectorBuilder::with_capacity(queries.len());

Member

We need to ensure this table only contains queries from the current catalog. When running in multi-tenant mode, we should not allow one tenant to see the global process list.

There is an exception: when connected to the greptime catalog, we can show the global state.

Member

Client information such as the user and client IP/port should also be included in this table; compare MySQL's PROCESSLIST table:
https://dev.mysql.com/doc/refman/8.4/en/information-schema-processlist-table.html

@sunng87
Member

sunng87 commented Apr 15, 2025

One more idea: instead of this push model, where each frontend reports this information to the meta service, can we make it a pull model, where a query against information_schema.process_list makes the frontend issue requests to fetch the instant state from all of its siblings?

Pros:

  1. More instant data
  2. No overhead when no query is on process_list

Cons:

  1. Need a cache when there is a large quantity of queries in process_list

@v0y4g3r
Contributor Author

v0y4g3r commented Apr 15, 2025

I evaluated that approach at first, but a frontend does not know the addresses of the other frontends; to implement this we would need some kind of service discovery. Maybe information_schema.cluster_info will do the trick, but the frontend list in that table is only updated along with heartbeats, which is not realtime.

@sunng87
Member

sunng87 commented Apr 16, 2025

If we use pull mode, we can first query meta to get a list of live frontend nodes, then broadcast to all these frontends. I think we can afford the cost because it won't be a frequent query.

The open question is that, as I remember, we cannot do async operations in information schema. Not sure if that's still the case.
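The pull flow described above (ask meta for live frontend nodes, then fan out a "list processes" request to each) can be sketched with hypothetical synchronous traits; all names and signatures here are assumptions, and a real implementation would use async RPC clients:

```rust
/// Hypothetical client for the meta service.
trait MetaClient {
    /// Returns the addresses of live frontend nodes.
    fn live_frontends(&self) -> Vec<String>;
}

/// Hypothetical client for peer frontends.
trait FrontendClient {
    /// Fetches the running queries on the frontend at `addr`.
    fn list_processes(&self, addr: &str) -> Vec<String>;
}

/// Pull-model aggregation: discover live frontends via meta, then
/// collect the instant process state from each sibling.
fn gather_process_list(meta: &dyn MetaClient, client: &dyn FrontendClient) -> Vec<String> {
    meta.live_frontends()
        .iter()
        .flat_map(|addr| client.list_processes(addr))
        .collect()
}
```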

@@ -331,11 +332,17 @@ impl StartCommand {

let information_extension =
Arc::new(DistributedInformationExtension::new(meta_client.clone()));

let process_manager = Arc::new(
ProcessManager::new(opts.grpc.server_addr.clone(), cached_meta_backend.clone())
Collaborator

Prefer to use the method that resolves the frontend address, as in the frontend's heartbeat:

peer_addr: addrs::resolve_addr(&opts.grpc.bind_addr, Some(&opts.grpc.server_addr)),

@MichaelScofield
Collaborator

There's another issue regarding the push mode: the pressure on the metasrv. The process list might be huge in a busy cluster, and the reporting tasks may fire too frequently for the metasrv in a large cluster. Both add extra network and storage load on the metasrv.

@v0y4g3r
Contributor Author

v0y4g3r commented Apr 16, 2025

Most queries live for less than the report interval, so they won't be stored in the metasrv or backend storage. No cluster is expected to have many long-running queries.

@killme2008
Contributor

What's the status of this PR?

Labels
docs-not-required This change does not impact docs.