diff --git a/codex-rs/core/gpt_5_codex_prompt.md b/codex-rs/core/gpt_5_codex_prompt.md index 2c49fafec62a..9a298f460f41 100644 --- a/codex-rs/core/gpt_5_codex_prompt.md +++ b/codex-rs/core/gpt_5_codex_prompt.md @@ -26,37 +26,41 @@ When using the planning tool: ## Codex CLI harness, sandboxing, and approvals -The Codex CLI harness supports several different sandboxing, and approval configurations that the user can choose from. +The Codex CLI harness supports several different configurations for sandboxing and escalation approvals that the user can choose from. -Filesystem sandboxing defines which files can be read or written. The options are: -- **read-only**: You can only read files. -- **workspace-write**: You can read files. You can write to files in this folder, but not outside it. -- **danger-full-access**: No filesystem sandboxing. +Filesystem sandboxing defines which files can be read or written. The options for `sandbox_mode` are: +- **read-only**: The sandbox only permits reading files. +- **workspace-write**: The sandbox permits reading files, and editing files in `cwd` and `writable_roots`. Editing files in other directories requires approval. +- **danger-full-access**: No filesystem sandboxing - all commands are permitted. -Network sandboxing defines whether network can be accessed without approval. Options are +Network sandboxing defines whether network can be accessed without approval. Options for `network_access` are: - **restricted**: Requires approval - **enabled**: No approval needed -Approvals are your mechanism to get user consent to perform more privileged actions. Although they introduce friction to the user because your work is paused until the user responds, you should leverage them to accomplish your important work. Do not let these settings or the sandbox deter you from attempting to accomplish the user's task unless it is set to "never", in which case never ask for approvals. - -Approval options are +Approvals are your mechanism to get user consent to run shell commands without the sandbox. Possible configuration options for `approval_policy` are - **untrusted**: The harness will escalate most commands for user approval, apart from a limited allowlist of safe "read" commands. - **on-failure**: The harness will allow all commands to run in the sandbox (if enabled), and failures will be escalated to the user for approval to run again without the sandbox. - **on-request**: Commands will be run in the sandbox by default, and you can specify in your tool call if you want to escalate a command to run without sandboxing. (Note that this mode is not always available. If it is, you'll see parameters for it in the `shell` command description.) - **never**: This is a non-interactive mode where you may NEVER ask the user for approval to run commands. Instead, you must always persist and work around constraints to solve the task for the user. You MUST do your utmost best to finish the task and validate your work before yielding. If this mode is paired with `danger-full-access`, take advantage of it to deliver the best outcome for the user. Further, in this mode, your default testing philosophy is overridden: Even if you don't see local patterns for testing, you may add tests and scripts to validate your work. Just remove them before yielding. -When you are running with approvals `on-request`, and sandboxing enabled, here are scenarios where you'll need to request approval: -- You need to run a command that writes to a directory that requires it (e.g. running tests that write to /tmp) +When you are running with `approval_policy == on-request`, and sandboxing enabled, here are scenarios where you'll need to request approval: +- You need to run a command that writes to a directory that requires it (e.g. running tests that write to /var) - You need to run a GUI app (e.g., open/xdg-open/osascript) to open browsers or files. - You are running sandboxed and need to run a command that requires network access (e.g. installing packages) -- If you run a command that is important to solving the user's query, but it fails because of sandboxing, rerun the command with approval. +- If you run a command that is important to solving the user's query, but it fails because of sandboxing, rerun the command with approval. ALWAYS proceed to use the `with_escalated_permissions` and `justification` parameters - do not message the user before requesting approval for the command. - You are about to take a potentially destructive action such as an `rm` or `git reset` that the user did not explicitly ask for - (for all of these, you should weigh alternative paths that do not require approval) -When sandboxing is set to read-only, you'll need to request approval for any command that isn't a read. +When `sandbox_mode` is set to read-only, you'll need to request approval for any command that isn't a read. You will be told what filesystem sandboxing, network sandboxing, and approval mode are active in a developer or user message. If you are not told about this, assume that you are running with workspace-write, network sandboxing enabled, and approval on-failure. +Although they introduce friction to the user because your work is paused until the user responds, you should leverage them when necessary to accomplish important work. If the completing the task requires escalated permissions, Do not let these settings or the sandbox deter you from attempting to accomplish the user's task unless it is set to "never", in which case never ask for approvals. + +When requesting approval to execute a command that will require escalated privileges: + - Provide the `with_escalated_permissions` parameter with the boolean value true + - Include a short, 1 sentence explanation for why you need to enable `with_escalated_permissions` in the justification parameter + ## Special user requests - If the user makes a simple request (such as asking for the time) which you can fulfill by running a terminal command (such as `date`), you should do so. diff --git a/codex-rs/core/src/codex.rs b/codex-rs/core/src/codex.rs index 34d7a67243ee..474e459c25dc 100644 --- a/codex-rs/core/src/codex.rs +++ b/codex-rs/core/src/codex.rs @@ -2113,6 +2113,7 @@ async fn submission_loop( debug!("Agent loop exited"); } +// Intentionally omit upstream review thread spawning; our fork handles review flows differently. /// Takes a user message as input and runs a loop where, at each turn, the model /// replies with either: /// @@ -2134,6 +2135,8 @@ async fn run_agent(sess: Arc, sub_id: String, input: Vec) { if sess.tx_event.send(event).await.is_err() { return; } + // Continue with our fork's history and input handling. + // Debug logging for ephemeral images let ephemeral_count = input diff --git a/codex-rs/core/src/config.rs b/codex-rs/core/src/config.rs index 7729e733d055..5d89c3b01500 100644 --- a/codex-rs/core/src/config.rs +++ b/codex-rs/core/src/config.rs @@ -42,7 +42,7 @@ use toml_edit::Item as TomlItem; use toml_edit::Table as TomlTable; const OPENAI_DEFAULT_MODEL: &str = "gpt-5"; -const OPENAI_DEFAULT_REVIEW_MODEL: &str = "gpt-5"; +const OPENAI_DEFAULT_REVIEW_MODEL: &str = "gpt-5-codex"; pub const GPT_5_CODEX_MEDIUM_MODEL: &str = "gpt-5-codex"; /// Maximum number of bytes of the documentation that will be embedded. Larger @@ -2202,7 +2202,7 @@ model_verbosity = "high" assert_eq!( Config { model: "o3".to_string(), - review_model: "gpt-5".to_string(), + review_model: OPENAI_DEFAULT_REVIEW_MODEL.to_string(), model_family: find_family_for_model("o3").expect("known model slug"), model_context_window: Some(200_000), model_max_output_tokens: Some(100_000), @@ -2265,7 +2265,7 @@ model_verbosity = "high" )?; let expected_gpt3_profile_config = Config { model: "gpt-3.5-turbo".to_string(), - review_model: "gpt-5".to_string(), + review_model: OPENAI_DEFAULT_REVIEW_MODEL.to_string(), model_family: find_family_for_model("gpt-3.5-turbo").expect("known model slug"), model_context_window: Some(16_385), model_max_output_tokens: Some(4_096), @@ -2344,7 +2344,7 @@ model_verbosity = "high" )?; let expected_zdr_profile_config = Config { model: "o3".to_string(), - review_model: "gpt-5".to_string(), + review_model: OPENAI_DEFAULT_REVIEW_MODEL.to_string(), model_family: find_family_for_model("o3").expect("known model slug"), model_context_window: Some(200_000), model_max_output_tokens: Some(100_000), @@ -2409,7 +2409,7 @@ model_verbosity = "high" )?; let expected_gpt5_profile_config = Config { model: "gpt-5".to_string(), - review_model: "gpt-5".to_string(), + review_model: OPENAI_DEFAULT_REVIEW_MODEL.to_string(), model_family: find_family_for_model("gpt-5").expect("known model slug"), model_context_window: Some(400_000), model_max_output_tokens: Some(128_000), diff --git a/codex-rs/core/src/conversation_history.rs b/codex-rs/core/src/conversation_history.rs index f20197f5e590..3e56ba35d80e 100644 --- a/codex-rs/core/src/conversation_history.rs +++ b/codex-rs/core/src/conversation_history.rs @@ -58,8 +58,9 @@ fn is_api_message(message: &ResponseItem) -> bool { | ResponseItem::CustomToolCall { .. } | ResponseItem::CustomToolCallOutput { .. } | ResponseItem::LocalShellCall { .. } - | ResponseItem::Reasoning { .. } => true, - ResponseItem::WebSearchCall { .. } | ResponseItem::Other => false, + | ResponseItem::Reasoning { .. } + | ResponseItem::WebSearchCall { .. } => true, + ResponseItem::Other => false, } } diff --git a/codex-rs/core/src/environment_context.rs b/codex-rs/core/src/environment_context.rs index 1d755841e8df..47e2b1bdea3c 100644 --- a/codex-rs/core/src/environment_context.rs +++ b/codex-rs/core/src/environment_context.rs @@ -71,8 +71,33 @@ impl EnvironmentContext { shell, } } + + /// Compares two environment contexts, ignoring the shell. Useful when + /// comparing turn to turn, since the initial environment_context will + /// include the shell, and then it is not configurable from turn to turn. + #[cfg(test)] + pub fn equals_except_shell(&self, other: &EnvironmentContext) -> bool { + let EnvironmentContext { + cwd, + approval_policy, + sandbox_mode, + network_access, + writable_roots, + // should compare all fields except shell + shell: _, + } = other; + + self.cwd == *cwd + && self.approval_policy == *approval_policy + && self.sandbox_mode == *sandbox_mode + && self.network_access == *network_access + && self.writable_roots == *writable_roots + } } +// Note: The core no longer exposes `TurnContext` here; callers construct +// `EnvironmentContext` directly via `EnvironmentContext::new(...)`. + impl EnvironmentContext { /// Serializes the environment context to XML. Libraries like `quick-xml` /// require custom macros to handle Enums with newtypes, so we just do it @@ -140,6 +165,9 @@ impl From for ResponseItem { #[cfg(test)] mod tests { + use crate::shell::BashShell; + use crate::shell::ZshShell; + use super::*; use pretty_assertions::assert_eq; @@ -210,4 +238,82 @@ mod tests { assert_eq!(context.serialize_to_xml(), expected); } + + #[test] + fn equals_except_shell_compares_approval_policy() { + // Approval policy + let context1 = EnvironmentContext::new( + Some(PathBuf::from("/repo")), + Some(AskForApproval::OnRequest), + Some(workspace_write_policy(vec!["/repo"], false)), + None, + ); + let context2 = EnvironmentContext::new( + Some(PathBuf::from("/repo")), + Some(AskForApproval::Never), + Some(workspace_write_policy(vec!["/repo"], true)), + None, + ); + assert!(!context1.equals_except_shell(&context2)); + } + + #[test] + fn equals_except_shell_compares_sandbox_policy() { + let context1 = EnvironmentContext::new( + Some(PathBuf::from("/repo")), + Some(AskForApproval::OnRequest), + Some(SandboxPolicy::new_read_only_policy()), + None, + ); + let context2 = EnvironmentContext::new( + Some(PathBuf::from("/repo")), + Some(AskForApproval::OnRequest), + Some(SandboxPolicy::new_workspace_write_policy()), + None, + ); + + assert!(!context1.equals_except_shell(&context2)); + } + + #[test] + fn equals_except_shell_compares_workspace_write_policy() { + let context1 = EnvironmentContext::new( + Some(PathBuf::from("/repo")), + Some(AskForApproval::OnRequest), + Some(workspace_write_policy(vec!["/repo", "/tmp", "/var"], false)), + None, + ); + let context2 = EnvironmentContext::new( + Some(PathBuf::from("/repo")), + Some(AskForApproval::OnRequest), + Some(workspace_write_policy(vec!["/repo", "/tmp"], true)), + None, + ); + + assert!(!context1.equals_except_shell(&context2)); + } + + #[test] + fn equals_except_shell_ignores_shell() { + let context1 = EnvironmentContext::new( + Some(PathBuf::from("/repo")), + Some(AskForApproval::OnRequest), + Some(workspace_write_policy(vec!["/repo"], false)), + Some(Shell::Bash(BashShell { + shell_path: "/bin/bash".into(), + bashrc_path: "/home/user/.bashrc".into(), + })), + ); + let context2 = EnvironmentContext::new( + Some(PathBuf::from("/repo")), + Some(AskForApproval::OnRequest), + Some(workspace_write_policy(vec!["/repo"], false)), + Some(Shell::Zsh(ZshShell { + shell_path: "/bin/zsh".into(), + zshrc_path: "/home/user/.zshrc".into(), + })), + ); + + assert!(context1.equals_except_shell(&context2)); + } } diff --git a/codex-rs/core/src/lib.rs b/codex-rs/core/src/lib.rs index f97b53c814fe..ec38191c0129 100644 --- a/codex-rs/core/src/lib.rs +++ b/codex-rs/core/src/lib.rs @@ -52,6 +52,7 @@ pub use model_provider_info::built_in_model_providers; pub use model_provider_info::create_oss_provider_with_base_url; mod conversation_manager; mod event_mapping; +pub mod review_format; pub use codex_protocol::protocol::InitialHistory; pub use conversation_manager::ConversationManager; pub use conversation_manager::NewConversation; @@ -93,6 +94,8 @@ pub use crate::client_common::Prompt; pub use crate::client_common::TextFormat; pub use crate::client_common::ResponseEvent; pub use crate::client_common::ResponseStream; +// Upstream also exports REVIEW_PROMPT; include it to preserve compatibility. +pub use crate::client_common::REVIEW_PROMPT; pub use codex_protocol::models::ContentItem; pub use codex_protocol::models::ReasoningItemContent; pub use codex_protocol::models::ResponseItem; diff --git a/codex-rs/core/src/protocol.rs b/codex-rs/core/src/protocol.rs index 7f0ec9036474..502fecefae77 100644 --- a/codex-rs/core/src/protocol.rs +++ b/codex-rs/core/src/protocol.rs @@ -25,6 +25,13 @@ use crate::model_provider_info::ModelProviderInfo; use crate::parse_command::ParsedCommand; use crate::plan_tool::UpdatePlanArgs; +// Re-export review types from the shared protocol crate so callers can use +// `codex_core::protocol::ReviewFinding` and friends. +pub use codex_protocol::protocol::ReviewCodeLocation; +pub use codex_protocol::protocol::ReviewFinding; +pub use codex_protocol::protocol::ReviewLineRange; +pub use codex_protocol::protocol::ReviewOutputEvent; + /// Submission Queue Entry - requests from user #[derive(Debug, Clone, Deserialize, Serialize)] pub struct Submission { diff --git a/codex-rs/core/src/review_format.rs b/codex-rs/core/src/review_format.rs new file mode 100644 index 000000000000..272010d56431 --- /dev/null +++ b/codex-rs/core/src/review_format.rs @@ -0,0 +1,55 @@ +use crate::protocol::ReviewFinding; + +// Note: We keep this module UI-agnostic. It returns plain strings that +// higher layers (e.g., TUI) may style as needed. + +fn format_location(item: &ReviewFinding) -> String { + let path = item.code_location.absolute_file_path.display(); + let start = item.code_location.line_range.start; + let end = item.code_location.line_range.end; + format!("{path}:{start}-{end}") +} + +/// Format a full review findings block as plain text lines. +/// +/// - When `selection` is `Some`, each item line includes a checkbox marker: +/// "[x]" for selected items and "[ ]" for unselected. Missing indices +/// default to selected. +/// - When `selection` is `None`, the marker is omitted and a simple bullet is +/// rendered ("- Title — path:start-end"). +pub fn format_review_findings_block( + findings: &[ReviewFinding], + selection: Option<&[bool]>, +) -> String { + let mut lines: Vec = Vec::new(); + + // Header + let header = if findings.len() > 1 { + "Full review comments:" + } else { + "Review comment:" + }; + lines.push(header.to_string()); + + for (idx, item) in findings.iter().enumerate() { + lines.push(String::new()); + + let title = &item.title; + let location = format_location(item); + + if let Some(flags) = selection { + // Default to selected if index is out of bounds. + let checked = flags.get(idx).copied().unwrap_or(true); + let marker = if checked { "[x]" } else { "[ ]" }; + lines.push(format!("- {marker} {title} — {location}")); + } else { + lines.push(format!("- {title} — {location}")); + } + + for body_line in item.body.lines() { + lines.push(format!(" {body_line}")); + } + } + + lines.join("\n") +} diff --git a/codex-rs/core/src/rollout/policy.rs b/codex-rs/core/src/rollout/policy.rs index 185c8a7d34e7..0a63cf901c65 100644 --- a/codex-rs/core/src/rollout/policy.rs +++ b/codex-rs/core/src/rollout/policy.rs @@ -28,8 +28,9 @@ pub(crate) fn should_persist_response_item(item: &ResponseItem) -> bool { | ResponseItem::FunctionCall { .. } | ResponseItem::FunctionCallOutput { .. } | ResponseItem::CustomToolCall { .. } - | ResponseItem::CustomToolCallOutput { .. } => true, - ResponseItem::WebSearchCall { .. } | ResponseItem::Other => false, + | ResponseItem::CustomToolCallOutput { .. } + | ResponseItem::WebSearchCall { .. } => true, + ResponseItem::Other => false, } } diff --git a/codex-rs/core/src/shell.rs b/codex-rs/core/src/shell.rs index dd2def5c357c..bcd5fa9a696e 100644 --- a/codex-rs/core/src/shell.rs +++ b/codex-rs/core/src/shell.rs @@ -6,20 +6,20 @@ use std::path::PathBuf; #[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize)] pub struct ZshShell { - shell_path: String, - zshrc_path: String, + pub(crate) shell_path: String, + pub(crate) zshrc_path: String, } #[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize)] pub struct BashShell { - shell_path: String, - bashrc_path: String, + pub(crate) shell_path: String, + pub(crate) bashrc_path: String, } #[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize)] pub struct PowerShellConfig { - exe: String, // Executable name or path, e.g. "pwsh" or "powershell.exe". - bash_exe_fallback: Option, // In case the model generates a bash command. + pub(crate) exe: String, // Executable name or path, e.g. "pwsh" or "powershell.exe". + pub(crate) bash_exe_fallback: Option, // In case the model generates a bash command. } #[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize)] diff --git a/codex-rs/core/tests/suite/prompt_caching.rs b/codex-rs/core/tests/suite/prompt_caching.rs index 94fce96ec896..288bb3ce1114 100644 --- a/codex-rs/core/tests/suite/prompt_caching.rs +++ b/codex-rs/core/tests/suite/prompt_caching.rs @@ -12,6 +12,7 @@ use codex_core::protocol::Op; use codex_core::protocol::SandboxPolicy; use codex_core::protocol_config_types::ReasoningEffort; use codex_core::protocol_config_types::ReasoningSummary; +use codex_core::shell::Shell; use codex_core::shell::default_user_shell; use core_test_support::load_default_config_for_test; use core_test_support::load_sse_fixture_with_id; @@ -23,6 +24,30 @@ use wiremock::ResponseTemplate; use wiremock::matchers::method; use wiremock::matchers::path; +fn text_user_input(text: String) -> serde_json::Value { + serde_json::json!({ + "type": "message", + "role": "user", + "content": [ { "type": "input_text", "text": text } ] + }) +} + +fn default_env_context_str(cwd: &str, shell: &Shell) -> String { + format!( + r#" + {} + on-request + read-only + restricted +{}"#, + cwd, + match shell.name() { + Some(name) => format!(" {name}\n"), + None => String::new(), + } + ) +} + /// Build minimal SSE stream with completed marker using the JSON fixture. fn sse_completed(id: &str) -> String { load_sse_fixture_with_id("tests/fixtures/completed_template.json", id) @@ -553,12 +578,262 @@ async fn per_turn_overrides_keep_cached_prefix_and_key_constant() { "role": "user", "content": [ { "type": "input_text", "text": "hello 2" } ] }); + let expected_env_text_2 = format!( + r#" + {} + never + workspace-write + enabled + + {} + +"#, + new_cwd.path().to_string_lossy(), + writable.path().to_string_lossy(), + ); + let expected_env_msg_2 = serde_json::json!({ + "type": "message", + "role": "user", + "content": [ { "type": "input_text", "text": expected_env_text_2 } ] + }); let expected_body2 = serde_json::json!( [ body1["input"].as_array().unwrap().as_slice(), - [expected_user_message_2].as_slice(), + [expected_env_msg_2, expected_user_message_2].as_slice(), ] .concat() ); assert_eq!(body2["input"], expected_body2); } + +#[tokio::test(flavor = "multi_thread", worker_threads = 2)] +async fn send_user_turn_with_no_changes_does_not_send_environment_context() { + use pretty_assertions::assert_eq; + + let server = MockServer::start().await; + + let sse = sse_completed("resp"); + let template = ResponseTemplate::new(200) + .insert_header("content-type", "text/event-stream") + .set_body_raw(sse, "text/event-stream"); + + Mock::given(method("POST")) + .and(path("/v1/responses")) + .respond_with(template) + .expect(2) + .mount(&server) + .await; + + let model_provider = ModelProviderInfo { + base_url: Some(format!("{}/v1", server.uri())), + ..built_in_model_providers()["openai"].clone() + }; + + let cwd = TempDir::new().unwrap(); + let codex_home = TempDir::new().unwrap(); + let mut config = load_default_config_for_test(&codex_home); + config.cwd = cwd.path().to_path_buf(); + config.model_provider = model_provider; + config.user_instructions = Some("be consistent and helpful".to_string()); + + let default_cwd = config.cwd.clone(); + let default_approval_policy = config.approval_policy; + let default_sandbox_policy = config.sandbox_policy.clone(); + let default_model = config.model.clone(); + let default_effort = config.model_reasoning_effort; + let default_summary = config.model_reasoning_summary; + + let conversation_manager = + ConversationManager::with_auth(CodexAuth::from_api_key("Test API Key")); + let codex = conversation_manager + .new_conversation(config) + .await + .expect("create new conversation") + .conversation; + + codex + .submit(Op::UserTurn { + items: vec![InputItem::Text { + text: "hello 1".into(), + }], + cwd: default_cwd.clone(), + approval_policy: default_approval_policy, + sandbox_policy: default_sandbox_policy.clone(), + model: default_model.clone(), + effort: default_effort, + summary: default_summary, + }) + .await + .unwrap(); + wait_for_event(&codex, |ev| matches!(ev, EventMsg::TaskComplete(_))).await; + + codex + .submit(Op::UserTurn { + items: vec![InputItem::Text { + text: "hello 2".into(), + }], + cwd: default_cwd.clone(), + approval_policy: default_approval_policy, + sandbox_policy: default_sandbox_policy.clone(), + model: default_model.clone(), + effort: default_effort, + summary: default_summary, + }) + .await + .unwrap(); + wait_for_event(&codex, |ev| matches!(ev, EventMsg::TaskComplete(_))).await; + + let requests = server.received_requests().await.unwrap(); + assert_eq!(requests.len(), 2, "expected two POST requests"); + + let body1 = requests[0].body_json::().unwrap(); + let body2 = requests[1].body_json::().unwrap(); + + let shell = default_user_shell().await; + let expected_ui_text = + "\n\nbe consistent and helpful\n\n"; + let expected_ui_msg = text_user_input(expected_ui_text.to_string()); + + let expected_env_msg_1 = text_user_input(default_env_context_str( + &cwd.path().to_string_lossy(), + &shell, + )); + let expected_user_message_1 = text_user_input("hello 1".to_string()); + + let expected_input_1 = serde_json::Value::Array(vec![ + expected_ui_msg.clone(), + expected_env_msg_1.clone(), + expected_user_message_1.clone(), + ]); + assert_eq!(body1["input"], expected_input_1); + + let expected_user_message_2 = text_user_input("hello 2".to_string()); + let expected_input_2 = serde_json::Value::Array(vec![ + expected_ui_msg, + expected_env_msg_1, + expected_user_message_1, + expected_user_message_2, + ]); + assert_eq!(body2["input"], expected_input_2); +} + +#[tokio::test(flavor = "multi_thread", worker_threads = 2)] +async fn send_user_turn_with_changes_sends_environment_context() { + use pretty_assertions::assert_eq; + + let server = MockServer::start().await; + + let sse = sse_completed("resp"); + let template = ResponseTemplate::new(200) + .insert_header("content-type", "text/event-stream") + .set_body_raw(sse, "text/event-stream"); + + Mock::given(method("POST")) + .and(path("/v1/responses")) + .respond_with(template) + .expect(2) + .mount(&server) + .await; + + let model_provider = ModelProviderInfo { + base_url: Some(format!("{}/v1", server.uri())), + ..built_in_model_providers()["openai"].clone() + }; + + let cwd = TempDir::new().unwrap(); + let codex_home = TempDir::new().unwrap(); + let mut config = load_default_config_for_test(&codex_home); + config.cwd = cwd.path().to_path_buf(); + config.model_provider = model_provider; + config.user_instructions = Some("be consistent and helpful".to_string()); + + let default_cwd = config.cwd.clone(); + let default_approval_policy = config.approval_policy; + let default_sandbox_policy = config.sandbox_policy.clone(); + let default_model = config.model.clone(); + let default_effort = config.model_reasoning_effort; + let default_summary = config.model_reasoning_summary; + + let conversation_manager = + ConversationManager::with_auth(CodexAuth::from_api_key("Test API Key")); + let codex = conversation_manager + .new_conversation(config.clone()) + .await + .expect("create new conversation") + .conversation; + + codex + .submit(Op::UserTurn { + items: vec![InputItem::Text { + text: "hello 1".into(), + }], + cwd: default_cwd.clone(), + approval_policy: default_approval_policy, + sandbox_policy: default_sandbox_policy.clone(), + model: default_model, + effort: default_effort, + summary: default_summary, + }) + .await + .unwrap(); + wait_for_event(&codex, |ev| matches!(ev, EventMsg::TaskComplete(_))).await; + + codex + .submit(Op::UserTurn { + items: vec![InputItem::Text { + text: "hello 2".into(), + }], + cwd: default_cwd.clone(), + approval_policy: AskForApproval::Never, + sandbox_policy: SandboxPolicy::DangerFullAccess, + model: "o3".to_string(), + effort: Some(ReasoningEffort::High), + summary: ReasoningSummary::Detailed, + }) + .await + .unwrap(); + wait_for_event(&codex, |ev| matches!(ev, EventMsg::TaskComplete(_))).await; + + let requests = server.received_requests().await.unwrap(); + assert_eq!(requests.len(), 2, "expected two POST requests"); + + let body1 = requests[0].body_json::().unwrap(); + let body2 = requests[1].body_json::().unwrap(); + + let shell = default_user_shell().await; + let expected_ui_text = + "\n\nbe consistent and helpful\n\n"; + let expected_ui_msg = serde_json::json!({ + "type": "message", + "role": "user", + "content": [ { "type": "input_text", "text": expected_ui_text } ] + }); + let expected_env_text_1 = default_env_context_str(&default_cwd.to_string_lossy(), &shell); + let expected_env_msg_1 = text_user_input(expected_env_text_1); + let expected_user_message_1 = text_user_input("hello 1".to_string()); + let expected_input_1 = serde_json::Value::Array(vec![ + expected_ui_msg.clone(), + expected_env_msg_1.clone(), + expected_user_message_1.clone(), + ]); + assert_eq!(body1["input"], expected_input_1); + + let expected_env_msg_2 = text_user_input(format!( + r#" + {} + never + danger-full-access + enabled +"#, + default_cwd.to_string_lossy() + )); + let expected_user_message_2 = text_user_input("hello 2".to_string()); + let expected_input_2 = serde_json::Value::Array(vec![ + expected_ui_msg, + expected_env_msg_1, + expected_user_message_1, + expected_env_msg_2, + expected_user_message_2, + ]); + assert_eq!(body2["input"], expected_input_2); +} diff --git a/codex-rs/core/tests/suite/review.rs b/codex-rs/core/tests/suite/review.rs index f65913a2ca1b..a20807e4eceb 100644 --- a/codex-rs/core/tests/suite/review.rs +++ b/codex-rs/core/tests/suite/review.rs @@ -1,9 +1,14 @@ use codex_core::CodexAuth; use codex_core::CodexConversation; +use codex_core::ContentItem; use codex_core::ConversationManager; use codex_core::ModelProviderInfo; +use codex_core::REVIEW_PROMPT; +use codex_core::ResponseItem; use codex_core::built_in_model_providers; use codex_core::config::Config; +use codex_core::protocol::ConversationPathResponseEvent; +use codex_core::protocol::ENVIRONMENT_CONTEXT_OPEN_TAG; use codex_core::protocol::EventMsg; use codex_core::protocol::ExitedReviewModeEvent; use codex_core::protocol::InputItem; @@ -13,6 +18,8 @@ use codex_core::protocol::ReviewFinding; use codex_core::protocol::ReviewLineRange; use codex_core::protocol::ReviewOutputEvent; use codex_core::protocol::ReviewRequest; +use codex_core::protocol::RolloutItem; +use codex_core::protocol::RolloutLine; use codex_core::spawn::CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR; use core_test_support::load_default_config_for_test; use core_test_support::load_sse_fixture_with_id_from_str; @@ -115,6 +122,46 @@ async fn review_op_emits_lifecycle_and_review_output() { assert_eq!(expected, review); let _complete = wait_for_event(&codex, |ev| matches!(ev, EventMsg::TaskComplete(_))).await; + // Also verify that a user message with the header and a formatted finding + // was recorded back in the parent session's rollout. + codex.submit(Op::GetPath).await.unwrap(); + let history_event = + wait_for_event(&codex, |ev| matches!(ev, EventMsg::ConversationPath(_))).await; + let path = match history_event { + EventMsg::ConversationPath(ConversationPathResponseEvent { path, .. }) => path, + other => panic!("expected ConversationPath event, got {other:?}"), + }; + let text = std::fs::read_to_string(&path).expect("read rollout file"); + + let mut saw_header = false; + let mut saw_finding_line = false; + for line in text.lines() { + if line.trim().is_empty() { + continue; + } + let v: serde_json::Value = serde_json::from_str(line).expect("jsonl line"); + let rl: RolloutLine = serde_json::from_value(v).expect("rollout line"); + if let RolloutItem::ResponseItem(ResponseItem::Message { role, content, .. }) = rl.item + && role == "user" + { + for c in content { + if let ContentItem::InputText { text } = c { + if text.contains("full review output from reviewer model") { + saw_header = true; + } + if text.contains("- Prefer Stylize helpers — /tmp/file.rs:10-20") { + saw_finding_line = true; + } + } + } + } + } + assert!(saw_header, "user header missing from rollout"); + assert!( + saw_finding_line, + "formatted finding line missing from rollout" + ); + server.verify().await; } @@ -419,17 +466,73 @@ async fn review_input_isolated_from_parent_history() { .await; let _complete = wait_for_event(&codex, |ev| matches!(ev, EventMsg::TaskComplete(_))).await; - // Assert the request `input` contains only the single review user message. + // Assert the request `input` contains the environment context followed by the review prompt. let request = &server.received_requests().await.unwrap()[0]; let body = request.body_json::().unwrap(); - let expected_input = serde_json::json!([ + let input = body["input"].as_array().expect("input array"); + assert_eq!( + input.len(), + 2, + "expected environment context and review prompt" + ); + + let env_msg = &input[0]; + assert_eq!(env_msg["type"].as_str().unwrap(), "message"); + assert_eq!(env_msg["role"].as_str().unwrap(), "user"); + let env_text = env_msg["content"][0]["text"].as_str().expect("env text"); + assert!( + env_text.starts_with(ENVIRONMENT_CONTEXT_OPEN_TAG), + "environment context must be the first item" + ); + assert!( + env_text.contains(""), + "environment context should include cwd" + ); + + let review_msg = &input[1]; + assert_eq!(review_msg["type"].as_str().unwrap(), "message"); + assert_eq!(review_msg["role"].as_str().unwrap(), "user"); + assert_eq!( + review_msg["content"][0]["text"].as_str().unwrap(), + format!("{REVIEW_PROMPT}\n\n---\n\nNow, here's your task: Please review only this",) + ); + + // Also verify that a user interruption note was recorded in the rollout. + codex.submit(Op::GetPath).await.unwrap(); + let history_event = + wait_for_event(&codex, |ev| matches!(ev, EventMsg::ConversationPath(_))).await; + let path = match history_event { + EventMsg::ConversationPath(ConversationPathResponseEvent { path, .. }) => path, + other => panic!("expected ConversationPath event, got {other:?}"), + }; + let text = std::fs::read_to_string(&path).expect("read rollout file"); + let mut saw_interruption_message = false; + for line in text.lines() { + if line.trim().is_empty() { + continue; + } + let v: serde_json::Value = serde_json::from_str(line).expect("jsonl line"); + let rl: RolloutLine = serde_json::from_value(v).expect("rollout line"); + if let RolloutItem::ResponseItem(ResponseItem::Message { role, content, .. }) = rl.item + && role == "user" { - "type": "message", - "role": "user", - "content": [{"type": "input_text", "text": review_prompt}] + for c in content { + if let ContentItem::InputText { text } = c + && text.contains("User initiated a review task, but was interrupted.") + { + saw_interruption_message = true; + break; + } + } + } + if saw_interruption_message { + break; } - ]); - assert_eq!(body["input"], expected_input); + } + assert!( + saw_interruption_message, + "expected user interruption message in rollout" + ); server.verify().await; } diff --git a/docs/config.md b/docs/config.md index 3983844c5c9e..df67240f1a0a 100644 --- a/docs/config.md +++ b/docs/config.md @@ -69,17 +69,6 @@ base_url = "https://api.mistral.ai/v1" env_key = "MISTRAL_API_KEY" ``` -Note that Azure requires `api-version` to be passed as a query parameter, so be sure to specify it as part of `query_params` when defining the Azure provider: - -```toml -[model_providers.azure] -name = "Azure" -# Make sure you set the appropriate subdomain for this URL. -base_url = "https://YOUR_PROJECT_NAME.openai.azure.com/openai" -env_key = "AZURE_OPENAI_API_KEY" # Or "OPENAI_API_KEY", whichever you use. -query_params = { api-version = "2025-04-01-preview" } -``` - It is also possible to configure a provider to include extra HTTP headers with a request. These can be hardcoded values (`http_headers`) or values read from environment variables (`env_http_headers`): ```toml @@ -96,6 +85,22 @@ http_headers = { "X-Example-Header" = "example-value" } env_http_headers = { "X-Example-Features" = "EXAMPLE_FEATURES" } ``` +### Azure model provider example + +Note that Azure requires `api-version` to be passed as a query parameter, so be sure to specify it as part of `query_params` when defining the Azure provider: + +```toml +[model_providers.azure] +name = "Azure" +# Make sure you set the appropriate subdomain for this URL. +base_url = "https://YOUR_PROJECT_NAME.openai.azure.com/openai" +env_key = "AZURE_OPENAI_API_KEY" # Or "OPENAI_API_KEY", whichever you use. +query_params = { api-version = "2025-04-01-preview" } +wire_api = "responses" +``` + +Export your key before launching Codex: `export AZURE_OPENAI_API_KEY=…` + ### Per-provider network tuning The following optional settings control retry behaviour and streaming idle timeouts **per model provider**. They must be specified inside the corresponding `[model_providers.]` block in `config.toml`. (Older releases accepted top‑level keys; those are now ignored.) diff --git a/docs/release_management.md b/docs/release_management.md index 1b81bc3eb23c..8f3ad7966395 100644 --- a/docs/release_management.md +++ b/docs/release_management.md @@ -23,14 +23,7 @@ When the workflow finishes, the GitHub Release is "done," but you still have to ## Publishing to npm -After the GitHub Release is done, you can publish to npm. Note the GitHub Release includes the appropriate artifact for npm (which is the output of `npm pack`), which should be named `codex-npm-VERSION.tgz`. To publish to npm, run: - -``` -VERSION=0.21.0 -./scripts/publish_to_npm.py "$VERSION" -``` - -Note that you must have permissions to publish to https://www.npmjs.com/package/@openai/codex for this to succeed. +The GitHub Action is responsible for publishing to npm. ## Publishing to Homebrew