chore: Update api docs for SessionContext, TaskContext, etc #6106

Merged
merged 3 commits into from
Apr 26, 2023
63 changes: 49 additions & 14 deletions datafusion/core/src/execution/context.rs
@@ -15,7 +15,7 @@
// specific language governing permissions and limitations
// under the License.

//! SessionContext contains methods for registering data sources and executing queries
//! [`SessionContext`] contains methods for registering data sources and executing queries
use crate::{
catalog::catalog::{CatalogList, MemoryCatalogList},
datasource::{
@@ -158,11 +158,15 @@ where
}
}

/// SessionContext is the main interface for executing queries with DataFusion. It stands for
/// the connection between user and DataFusion/Ballista cluster.
/// The context provides the following functionality
/// Main interface for executing queries with DataFusion. Maintains
/// the state of the connection between a user and an instance of the
/// DataFusion engine.
///
/// * Create DataFrame from a CSV or Parquet data source.
/// # Overview
///
/// [`SessionContext`] provides the following functionality:
///
/// * Create a DataFrame from a CSV or Parquet data source.
/// * Register a CSV or Parquet data source as a table that can be referenced from a SQL query.
/// * Register a custom data source that can be referenced from a SQL query.
/// * Execute a SQL query
@@ -199,6 +203,20 @@ where
/// # Ok(())
/// # }
/// ```
///
/// # `SessionContext`, `SessionState`, and `TaskContext`
///
/// A [`SessionContext`] can be created from a [`SessionConfig`] and
/// stores the state for a particular query session. A single
/// [`SessionContext`] can run multiple queries.
///
/// [`SessionState`] contains information available during query
/// planning (creating [`LogicalPlan`]s and [`ExecutionPlan`]s).
///
/// [`TaskContext`] contains the state available during query
/// execution [`ExecutionPlan::execute`]. It contains a subset of the
/// information in [`SessionState`] and is created from a
/// [`SessionContext`] or a [`SessionState`].
#[derive(Clone)]
pub struct SessionContext {
/// UUID for the session
@@ -216,7 +234,7 @@ impl Default for SessionContext {
}

impl SessionContext {
/// Creates a new execution context using a default session configuration.
/// Creates a new `SessionContext` using the default [`SessionConfig`].
pub fn new() -> Self {
Self::with_config(SessionConfig::new())
}
@@ -241,19 +259,35 @@ impl SessionContext {
Ok(())
}

/// Creates a new session context using the provided session configuration.
/// Creates a new `SessionContext` using the provided
/// [`SessionConfig`] and a new [`RuntimeEnv`].
///
/// See [`Self::with_config_rt`] for more details on resource
/// limits.
pub fn with_config(config: SessionConfig) -> Self {
let runtime = Arc::new(RuntimeEnv::default());
Self::with_config_rt(config, runtime)
}

/// Creates a new session context using the provided configuration and [`RuntimeEnv`].
/// Creates a new `SessionContext` using the provided
/// [`SessionConfig`] and a [`RuntimeEnv`].
///
/// # Resource Limits
///
/// By default, each new `SessionContext` creates a new
/// `RuntimeEnv`, and therefore will not enforce memory or disk
/// limits for queries run on different `SessionContext`s.
///
/// To enforce resource limits (e.g. to limit the total amount of
Contributor

great, thanks

/// memory used) across all DataFusion queries in a process,
/// all `SessionContext`s should be configured with the
/// same `RuntimeEnv`.
pub fn with_config_rt(config: SessionConfig, runtime: Arc<RuntimeEnv>) -> Self {
let state = SessionState::with_config_rt(config, runtime);
Self::with_state(state)
}

/// Creates a new session context using the provided session state.
/// Creates a new `SessionContext` using the provided [`SessionState`]
pub fn with_state(state: SessionState) -> Self {
Self {
session_id: state.session_id.clone(),
@@ -262,7 +296,7 @@ impl SessionContext {
}
}

/// Returns the time this session was created
/// Returns the time this `SessionContext` was created
pub fn session_start_time(&self) -> DateTime<Utc> {
self.session_start_time
}
@@ -282,12 +316,12 @@ impl SessionContext {
)
}

/// Return the [RuntimeEnv] used to run queries with this [SessionContext]
/// Return the [RuntimeEnv] used to run queries with this `SessionContext`
pub fn runtime_env(&self) -> Arc<RuntimeEnv> {
self.state.read().runtime_env.clone()
}

/// Return the `session_id` of this Session
/// Returns an id that uniquely identifies this `SessionContext`.
pub fn session_id(&self) -> String {
self.session_id.clone()
}
@@ -1205,7 +1239,7 @@ impl QueryPlanner for DefaultQueryPlanner {
/// Execution context for registering data sources and executing queries
#[derive(Clone)]
pub struct SessionState {
/// UUID for the session
/// A unique UUID that identifies the session
session_id: String,
/// Responsible for analyzing and rewriting a logical plan before optimization
analyzer: Analyzer,
@@ -1252,7 +1286,8 @@ pub fn default_session_builder(config: SessionConfig) -> SessionState {
}

impl SessionState {
/// Returns new SessionState using the provided configuration and runtime
/// Returns new [`SessionState`] using the provided
/// [`SessionConfig`] and [`RuntimeEnv`].
pub fn with_config_rt(config: SessionConfig, runtime: Arc<RuntimeEnv>) -> Self {
let catalog_list = Arc::new(MemoryCatalogList::new()) as Arc<dyn CatalogList>;
Self::with_config_rt_and_catalog_list(config, runtime, catalog_list)
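
The "Resource Limits" note added to `with_config_rt` above can be illustrated with a small, standard-library-only model. This is a sketch, not DataFusion code: `ToyRuntimeEnv`, `ToySession`, `try_reserve`, and `run_query` are hypothetical stand-ins invented for this example, and the real `RuntimeEnv` manages memory through its `MemoryPool`. The sketch only shows why a limit is enforced across sessions when they share one `Arc`, and is not enforced when each session builds its own runtime.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for `RuntimeEnv`: one byte budget, tracked atomically.
struct ToyRuntimeEnv {
    limit: usize,
    used: AtomicUsize,
}

impl ToyRuntimeEnv {
    fn new(limit: usize) -> Self {
        Self { limit, used: AtomicUsize::new(0) }
    }

    /// Reserve `bytes`, failing if the budget of this runtime would be exceeded.
    fn try_reserve(&self, bytes: usize) -> Result<(), String> {
        self.used
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |used| {
                used.checked_add(bytes).filter(|total| *total <= self.limit)
            })
            .map(|_| ())
            .map_err(|used| {
                format!(
                    "cannot reserve {} bytes: {} of {} already in use",
                    bytes, used, self.limit
                )
            })
    }
}

/// Hypothetical stand-in for `SessionContext`: it only holds a runtime handle.
struct ToySession {
    runtime: Arc<ToyRuntimeEnv>,
}

impl ToySession {
    fn with_runtime(runtime: Arc<ToyRuntimeEnv>) -> Self {
        Self { runtime }
    }

    /// Pretend to run a query that needs `bytes_needed` bytes of memory.
    fn run_query(&self, bytes_needed: usize) -> Result<(), String> {
        self.runtime.try_reserve(bytes_needed)
    }
}

fn main() {
    // Separate runtimes: each session has its own 100-byte budget,
    // so 160 bytes are reserved in total and nothing stops it.
    let a = ToySession::with_runtime(Arc::new(ToyRuntimeEnv::new(100)));
    let b = ToySession::with_runtime(Arc::new(ToyRuntimeEnv::new(100)));
    assert!(a.run_query(80).is_ok());
    assert!(b.run_query(80).is_ok());

    // Shared runtime: both sessions draw from the same 100-byte budget.
    let shared = Arc::new(ToyRuntimeEnv::new(100));
    let c = ToySession::with_runtime(Arc::clone(&shared));
    let d = ToySession::with_runtime(Arc::clone(&shared));
    assert!(c.run_query(80).is_ok());
    assert!(d.run_query(80).is_err());
}
```

In DataFusion terms, the shared case corresponds to passing clones of one `Arc<RuntimeEnv>` to `with_config_rt` for every `SessionContext` in the process.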
25 changes: 1 addition & 24 deletions datafusion/core/src/execution/mod.rs
@@ -15,30 +15,7 @@
// specific language governing permissions and limitations
// under the License.

//! This module contains the shared state available at different parts
Contributor Author

this was hard to find, so I moved the content on to the structs that were referenced

//! of query planning and execution
//!
//! # Runtime Environment
//!
//! [`runtime_env::RuntimeEnv`] can be created from a [`runtime_env::RuntimeConfig`] and
//! stores state to be shared across multiple sessions. In most applications there will
//! be a single [`runtime_env::RuntimeEnv`] for the entire process
//!
//! # Session Context
//!
//! [`context::SessionContext`] can be created from a [`context::SessionConfig`] and
//! an optional [`runtime_env::RuntimeConfig`], and stores the state for a particular
//! query session.
//!
//! In particular [`context::SessionState`] is the information available to query planning
//!
//! # Task Context
//!
//! [`context::TaskContext`] is typically created from a [`context::SessionContext`] or
//! [`context::SessionState`], and represents the state available to query execution.
//!
//! In particular it is the state passed to [`crate::physical_plan::ExecutionPlan::execute`]
//!
//! Shared state for query planning and execution.

pub mod context;
// backwards compatibility
14 changes: 11 additions & 3 deletions datafusion/execution/src/runtime_env.rs
@@ -15,8 +15,8 @@
// specific language governing permissions and limitations
// under the License.

//! Execution runtime environment that holds object Store, memory manager, disk manager
//! and various system level components that are used during physical plan execution.
//! Execution [`RuntimeEnv`] environment that manages access to object
//! store, memory manager, disk manager.

use crate::{
disk_manager::{DiskManager, DiskManagerConfig},
@@ -32,7 +32,15 @@ use std::sync::Arc;
use url::Url;

#[derive(Clone)]
/// Execution runtime environment.
/// Execution runtime environment that manages system resources such
/// as memory, disk and storage.
///
/// A [`RuntimeEnv`] is created from a [`RuntimeConfig`] and has the
/// following resource management functionality:
///
/// * [`MemoryPool`]: Manage memory
/// * [`DiskManager`]: Manage temporary files on local disk
/// * [`ObjectStoreRegistry`]: Manage mapping URLs to object store instances
pub struct RuntimeEnv {
/// Runtime memory management
pub memory_pool: Arc<dyn MemoryPool>,
7 changes: 6 additions & 1 deletion datafusion/execution/src/task.rs
@@ -32,6 +32,11 @@ use crate::{
};

/// Task Execution Context
///
/// A [`TaskContext`] represents the state available during a single query's
/// execution.
///
/// # Task Context
pub struct TaskContext {
/// Session Id
session_id: String,
@@ -98,7 +103,7 @@ impl TaskContext {
))
}

/// Return the SessionConfig associated with the Task
/// Return the SessionConfig associated with this [TaskContext]
pub fn session_config(&self) -> &SessionConfig {
&self.session_config
}
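
The relationship described in this PR (a `TaskContext` holds a subset of the information in `SessionState` and is created from it) can be sketched with toy types. Everything below is hypothetical and uses only the standard library: `ToySessionState`, `ToyTaskContext`, their fields, and the `From` impl are invented for illustration and do not mirror the real struct layouts. The point is only that planning-only state stays behind while execution state is copied or shared.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `RuntimeEnv`.
struct ToyRuntimeEnv;

/// Hypothetical stand-in for `SessionState`: everything planning may need.
struct ToySessionState {
    session_id: String,
    batch_size: usize,
    /// Planning-only information; execution never looks at this.
    optimizer_rules: Vec<String>,
    runtime: Arc<ToyRuntimeEnv>,
}

impl ToySessionState {
    fn new(session_id: &str) -> Self {
        Self {
            session_id: session_id.to_string(),
            batch_size: 8192,
            optimizer_rules: vec!["push_down_filter".to_string()],
            runtime: Arc::new(ToyRuntimeEnv),
        }
    }
}

/// Hypothetical stand-in for `TaskContext`: only what execution needs.
struct ToyTaskContext {
    session_id: String,
    batch_size: usize,
    runtime: Arc<ToyRuntimeEnv>,
}

impl From<&ToySessionState> for ToyTaskContext {
    fn from(state: &ToySessionState) -> Self {
        Self {
            session_id: state.session_id.clone(),
            batch_size: state.batch_size,
            // The runtime is shared, not copied, so resource limits still apply.
            runtime: Arc::clone(&state.runtime),
        }
    }
}

fn main() {
    let state = ToySessionState::new("session-1");
    let task = ToyTaskContext::from(&state);

    // The task sees the same session id, settings, and runtime as the session...
    assert_eq!(task.session_id, state.session_id);
    assert_eq!(task.batch_size, state.batch_size);
    assert!(Arc::ptr_eq(&task.runtime, &state.runtime));

    // ...while planning-only state remains on the session side.
    assert_eq!(state.optimizer_rules.len(), 1);
}
```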
10 changes: 6 additions & 4 deletions datafusion/physical-expr/src/execution_props.rs
@@ -20,10 +20,12 @@ use chrono::{DateTime, TimeZone, Utc};
use std::collections::HashMap;
use std::sync::Arc;

/// Holds per-execution properties and data (such as starting timestamps, etc).
/// An instance of this struct is created each time a [`LogicalPlan`] is prepared for
/// execution (optimized). If the same plan is optimized multiple times, a new
/// `ExecutionProps` is created each time.
/// Holds per-query execution properties and data (such as statement
/// starting timestamps).
///
/// An [`ExecutionProps`] is created each time a [`LogicalPlan`] is
/// prepared for execution (optimized). If the same plan is optimized
/// multiple times, a new `ExecutionProps` is created each time.
///
/// It is important that this structure be cheap to create as it is
/// done so during predicate pruning and expression simplification
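
The per-query behaviour documented for `ExecutionProps` can be modelled in a few lines. This is a sketch under stated assumptions: `ToyExecutionProps`, its `now` method, and the `prepare` function are hypothetical names invented here, and `std::time::Instant` stands in for the chrono timestamp the real struct works with. It shows the two properties the doc comment describes: a fresh value is created every time a plan is prepared, and the value is small and cheap to create.

```rust
use std::time::Instant;

/// Hypothetical stand-in for `ExecutionProps`: a single captured start time.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ToyExecutionProps {
    query_start: Instant,
}

impl ToyExecutionProps {
    /// Cheap to create: one clock read, no allocation.
    fn new() -> Self {
        Self { query_start: Instant::now() }
    }

    /// What a `now()`-style expression would evaluate to under these props.
    /// Every evaluation within one prepared plan sees the same value.
    fn now(&self) -> Instant {
        self.query_start
    }
}

/// Hypothetical "prepare for execution" step: fresh props on every call,
/// even when the same plan is prepared again.
fn prepare(plan: &str) -> (String, ToyExecutionProps) {
    let props = ToyExecutionProps::new();
    (format!("optimized({plan})"), props)
}

fn main() {
    let (plan, first) = prepare("scan");
    assert_eq!(plan, "optimized(scan)");

    // Within one prepared plan, the timestamp is stable.
    assert_eq!(first.now(), first.now());

    // Preparing the same plan again captures a new, never earlier, timestamp.
    let (_, second) = prepare("scan");
    assert!(second.now() >= first.now());
}
```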