52 changes: 52 additions & 0 deletions scripts/run_evolution.py
@@ -0,0 +1,52 @@
import asyncio
import structlog

from skyvern.evolution.evolve import Evolve
from skyvern.evolution.prompt_manager import PromptManager

LOG = structlog.get_logger()

async def main():
    """
    Main function to run the prompt evolution loop.
    """
    LOG.info("Initializing prompt evolution process...")

    prompt_manager = PromptManager()
    evolver = Evolve(prompt_manager)

    # Check if the baseline prompt was loaded correctly
    if not prompt_manager.get_prompt("baseline"):
        LOG.error("Failed to load baseline prompt. Aborting evolution process.")
        return

    LOG.info("Starting evolution loop...")

    # Run the evolution loop for a few generations as a demonstration
    num_generations = 5
    for i in range(num_generations):
        LOG.info(f"--- Generation {i+1}/{num_generations} ---")

        # Evolve the prompts to create new variations
        await evolver.evolve_prompts()

        # Evaluate the performance of the new prompts
        evolver.evaluate_and_score_prompts()

        # Log the best prompt of the current generation
        best_prompt = prompt_manager.get_best_prompt()
        if best_prompt:
            LOG.info(f"Best prompt of generation {i+1}: '{best_prompt.name}' with score {best_prompt.score}")
        else:
            LOG.warning("No prompts in manager after evolution and evaluation.")

        # In a real application, you might add a delay or run this as a continuous background process
        await asyncio.sleep(5)

    LOG.info("Evolution loop finished.")
Comment on lines +9 to +46
🛠️ Refactor suggestion | 🟠 Major

Add type hints to the main function.

The main() function is missing a return type annotation, which the coding guidelines require.

As per coding guidelines:

-async def main():
+async def main() -> None:
     """
     Main function to run the prompt evolution loop.
     """
🤖 Prompt for AI Agents
In scripts/run_evolution.py around lines 9 to 46, the async main() function is
missing a return type annotation; update its signature to include an explicit
return type (async def main() -> None:) to satisfy the coding guidelines and
ensure the coroutine is annotated as returning None; no other behavioral changes
are required.


if __name__ == "__main__":
    # This script needs to be run in an environment where the skyvern package is installed
    # and the necessary configurations (like .env for LLM providers) are set up.
    # Example: poetry run python scripts/run_evolution.py
    asyncio.run(main())
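The closing comment in main() notes that, in a real application, the loop could run continuously instead of for a fixed number of generations. A minimal sketch of that variant, reusing the same imports and classes from this script (illustrative only; the run_forever name and the interval are assumptions, not part of this PR):

async def run_forever(interval_seconds: float = 300.0) -> None:
    """Hypothetical continuous variant of the evolution loop."""
    prompt_manager = PromptManager()
    evolver = Evolve(prompt_manager)
    if not prompt_manager.get_prompt("baseline"):
        LOG.error("Failed to load baseline prompt. Aborting evolution process.")
        return
    while True:
        # One generation: mutate the best prompt, then re-score the population.
        await evolver.evolve_prompts()
        evolver.evaluate_and_score_prompts()
        await asyncio.sleep(interval_seconds)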
1 change: 1 addition & 0 deletions skyvern/evolution/__init__.py
@@ -0,0 +1 @@
# This file is intentionally left blank to mark the directory as a Python package.
74 changes: 74 additions & 0 deletions skyvern/evolution/evolve.py
@@ -0,0 +1,74 @@
import structlog
import random
Remove the 'random' import if it is unused.

Suggested change
import random

🛠️ Refactor suggestion | 🟠 Major

Remove unused import.

The random module is imported but never used in this file.

 import structlog
-import random
 
 from skyvern.forge.prompts import prompt_engine
🤖 Prompt for AI Agents
In skyvern/evolution/evolve.py around line 2, the file imports the random module
which is unused; remove the unused import statement (delete or comment out the
"import random" line) to clean up imports and avoid linter warnings.


from skyvern.forge.prompts import prompt_engine
from skyvern.forge.sdk.llm import LLM_API_HANDLER

LOG = structlog.get_logger()

class Evolve:
    def __init__(self, prompt_manager):
        self.prompt_manager = prompt_manager
        self.evolution_count = 0

    async def evolve_prompts(self):
        """
        Takes the top-performing prompts and uses an LLM to generate new variations.
        """
        best_prompt = self.prompt_manager.get_best_prompt()
        if not best_prompt:
            LOG.warning("No prompts found to evolve.")
            return

        LOG.info(f"Evolving prompt '{best_prompt.name}' with score {best_prompt.score}")

        # Use an LLM to generate a new variation of the prompt.
        evolution_prompt = prompt_engine.load_prompt(
            "evolve-prompt",
            prompt_to_evolve=best_prompt.template,
        )

        # In a real implementation, a 'step' object would be passed here.
        # This is a placeholder for demonstration purposes.
        response = await LLM_API_HANDLER(prompt=evolution_prompt, step=None)
Consider adding error handling around the LLM_API_HANDLER call to catch unexpected failures.
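One hedged sketch of such a guard, in the same suggestion style used elsewhere in this review (illustrative only; skipping the evolution round on failure is an assumption, not something the PR or the reviewer specifies):

-        response = await LLM_API_HANDLER(prompt=evolution_prompt, step=None)
+        try:
+            response = await LLM_API_HANDLER(prompt=evolution_prompt, step=None)
+        except Exception:
+            # Hypothetical fallback: log and skip this round rather than crash the loop.
+            LOG.error("LLM call failed while evolving prompt.", exc_info=True)
+            return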


        # Assuming the response is the raw string of the new prompt
        evolved_prompt_str = response if isinstance(response, str) else str(response)

        # Add the new prompt to the population
        self.evolution_count += 1
        new_prompt_name = f"evolved_v{self.evolution_count}"
        self.prompt_manager.add_prompt(new_prompt_name, evolved_prompt_str, score=0)

        LOG.info(f"Evolved new prompt '{new_prompt_name}': {evolved_prompt_str[:100]}...")

    def evaluate_and_score_prompts(self):
        """
        Simulates the evaluation of prompts and updates their scores based on deterministic criteria.
        In a real-world scenario, this would involve running benchmarks.
        """
        LOG.info("Evaluating and scoring prompts...")
        for name, prompt in self.prompt_manager.prompts.items():
            # Skip the baseline prompt as its score is fixed.
            if name == "baseline":
                continue

            score = 0
            # Score based on length (ideal length between 500 and 1500 characters)
            length = len(prompt.template)
            if 500 <= length <= 1500:
                score += 0.5
            else:
                score -= 0.2

            # Score based on presence of keywords
            keywords = ["action", "reasoning", "COMPLETE", "TERMINATE", "element", "goal"]
            for keyword in keywords:
                if keyword in prompt.template.lower():
                    score += 0.2

            # Normalize score to be between 0 and 2 for this simulation
            normalized_score = max(0, min(2, score))

            self.prompt_manager.update_score(name, normalized_score)
            LOG.info(f"Evaluated '{name}', assigned score: {normalized_score}")
Comment on lines +9 to +74

🛠️ Refactor suggestion | 🟠 Major

Add type hints to the Evolve class.

The entire Evolve class is missing type hints for method parameters and return values, which violates the coding guidelines for Python 3.11+.

As per coding guidelines:

+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from skyvern.evolution.prompt_manager import PromptManager
+
 class Evolve:
-    def __init__(self, prompt_manager):
+    def __init__(self, prompt_manager: "PromptManager") -> None:
         self.prompt_manager = prompt_manager
         self.evolution_count = 0
 
-    async def evolve_prompts(self):
+    async def evolve_prompts(self) -> None:
         """
         Takes the top-performing prompts and uses an LLM to generate new variations.
         """
         # ... rest of method
 
-    def evaluate_and_score_prompts(self):
+    def evaluate_and_score_prompts(self) -> None:
         """
         Simulates the evaluation of prompts and updates their scores based on deterministic criteria.
         In a real-world scenario, this would involve running benchmarks.
         """
         # ... rest of method
🤖 Prompt for AI Agents
In skyvern/evolution/evolve.py around lines 9-74, the Evolve class and its
methods lack Python 3.11+ type hints; add explicit type annotations for the
class attributes and method signatures: annotate __init__ to accept
prompt_manager: "PromptManager" (use a forward reference or import the
PromptManager type), self.evolution_count: int, and self.prompt_manager:
"PromptManager"; annotate async def evolve_prompts(self) -> None and def
evaluate_and_score_prompts(self) -> None; annotate local variables where helpful
(e.g., best_prompt: Optional[Prompt], response: Any, evolved_prompt_str: str,
new_prompt_name: str, score: float, normalized_score: float) and ensure you
import necessary typing items (Optional, Any, Optional["Prompt"] or a Prompt
type, and if needed Coroutine) or reference existing project types; update
function and variable annotations accordingly without changing logic.
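A sketch of how those local annotations might look inside evolve_prompts (illustrative only; it assumes "from typing import Any" and "from skyvern.evolution.prompt_manager import Prompt" at module level):

    async def evolve_prompts(self) -> None:
        best_prompt: Prompt | None = self.prompt_manager.get_best_prompt()
        if not best_prompt:
            LOG.warning("No prompts found to evolve.")
            return
        evolution_prompt = prompt_engine.load_prompt(
            "evolve-prompt",
            prompt_to_evolve=best_prompt.template,
        )
        response: Any = await LLM_API_HANDLER(prompt=evolution_prompt, step=None)
        evolved_prompt_str: str = response if isinstance(response, str) else str(response)
        self.evolution_count += 1
        new_prompt_name: str = f"evolved_v{self.evolution_count}"
        self.prompt_manager.add_prompt(new_prompt_name, evolved_prompt_str, score=0)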

68 changes: 68 additions & 0 deletions skyvern/evolution/prompt_manager.py
@@ -0,0 +1,68 @@
import structlog

from skyvern.forge.prompts import prompt_engine

LOG = structlog.get_logger()

class Prompt:
    def __init__(self, name, template, score=0):
        self.name = name
        self.template = template
        self.score = score
Comment on lines +7 to +11

🛠️ Refactor suggestion | 🟠 Major

Add type hints and class docstring.

The Prompt class is missing type hints for its __init__ parameters and lacks a class-level docstring describing its purpose.

As per coding guidelines, apply this diff:

 class Prompt:
+    """
+    Represents a prompt template with its associated metadata.
+    
+    Attributes:
+        name: Unique identifier for the prompt.
+        template: The Jinja2 template string.
+        score: Performance score for ranking (default: 0).
+    """
-    def __init__(self, name, template, score=0):
+    def __init__(self, name: str, template: str, score: float = 0) -> None:
         self.name = name
         self.template = template
         self.score = score
🤖 Prompt for AI Agents
In skyvern/evolution/prompt_manager.py around lines 7 to 11, the Prompt class
lacks a class-level docstring and type hints; add a concise docstring explaining
that Prompt represents a named prompt template with an associated score,
annotate the class attributes (name: str, template: str, score: int = 0) and
update the __init__ signature to use type hints (def __init__(self, name: str,
template: str, score: int = 0) -> None:) so static type checkers and IDEs can
validate usage.


class PromptManager:
    def __init__(self):
        self.prompts = {}
        self._load_baseline_prompt()

    def _load_baseline_prompt(self):
        """
        Loads the original 'extract-action.j2' prompt as the baseline.
        """
        try:
            # Access the Jinja2 environment from the prompt_engine
            env = prompt_engine.env
            # Construct the path to the template within the Jinja2 environment
            template_path = "skyvern/extract-action.j2"
            # Get the template source from the loader
            baseline_template = env.loader.get_source(env, template_path)[0]

            self.add_prompt("baseline", baseline_template, score=1.0)  # Assuming baseline is good.
            LOG.info("Loaded baseline prompt 'extract-action.j2'.")
        except Exception as e:
            LOG.error(f"Failed to load baseline prompt: {e}", exc_info=True)

    def add_prompt(self, name, template, score=0):
        """
        Adds a new prompt to the population.
        """
        if name in self.prompts:
            LOG.warning(f"Prompt with name '{name}' already exists. Overwriting.")

        self.prompts[name] = Prompt(name, template, score)
        LOG.info(f"Added prompt '{name}' with score {score}.")

    def get_prompt(self, name):
        """
        Retrieves a prompt object by its name.
        """
        return self.prompts.get(name)

    def get_best_prompt(self):
        """
        Returns the prompt with the highest score.
        """
        if not self.prompts:
            return None

        return max(self.prompts.values(), key=lambda p: p.score)

    def update_score(self, name, score):
        """
        Updates the score of a prompt after evaluation.
        """
        if name in self.prompts:
            self.prompts[name].score = score
            LOG.info(f"Updated score for prompt '{name}' to {score}.")
        else:
            LOG.warning(f"Prompt '{name}' not found for score update.")
Comment on lines +13 to +68

🛠️ Refactor suggestion | 🟠 Major

Add type hints to all methods.

The PromptManager class methods are missing type hints for parameters and return values, which the coding guidelines for Python 3.11+ require.

As per coding guidelines, apply these changes:

+from typing import Optional
+
 class PromptManager:
-    def __init__(self):
+    def __init__(self) -> None:
         self.prompts = {}
         self._load_baseline_prompt()
 
-    def _load_baseline_prompt(self):
+    def _load_baseline_prompt(self) -> None:
         """
         Loads the original 'extract-action.j2' prompt as the baseline.
         """
         # ... rest of method
 
-    def add_prompt(self, name, template, score=0):
+    def add_prompt(self, name: str, template: str, score: float = 0) -> None:
         """
         Adds a new prompt to the population.
         """
         # ... rest of method
 
-    def get_prompt(self, name):
+    def get_prompt(self, name: str) -> Optional[Prompt]:
         """
         Retrieves a prompt object by its name.
         """
         return self.prompts.get(name)
 
-    def get_best_prompt(self):
+    def get_best_prompt(self) -> Optional[Prompt]:
         """
         Returns the prompt with the highest score.
         """
         # ... rest of method
 
-    def update_score(self, name, score):
+    def update_score(self, name: str, score: float) -> None:
         """
         Updates the score of a prompt after evaluation.
         """
         # ... rest of method

60 changes: 46 additions & 14 deletions skyvern/forge/agent.py
@@ -1316,11 +1316,31 @@ async def _build_extract_action_prompt(
)

task_type = task.task_type if task.task_type else TaskType.general
template = ""

# Determine which template to use. Evolved prompts are handled as raw strings,
# while standard prompts are handled by name.
template_name: str | None = None
template_str: str | None = None

if task_type == TaskType.general:
template = "extract-action"
# For general tasks, try to use the best prompt from our evolution manager.
best_prompt = app.PROMPT_MANAGER.get_best_prompt()
if best_prompt:
LOG.info(f"Using evolved prompt: {best_prompt.name} with score {best_prompt.score}")
template_str = best_prompt.template
else:
# If no evolved prompts, fall back to the baseline prompt.
LOG.warning("PromptManager has no prompts. Falling back to baseline 'extract-action'.")
baseline_prompt = app.PROMPT_MANAGER.get_prompt("baseline")
if baseline_prompt:
template_str = baseline_prompt.template
else:
# If even the baseline is missing, this is a critical error.
LOG.error("Baseline prompt could not be loaded from PromptManager.")
# As a last resort, use the template name.
template_name = "extract-action"
elif task_type == TaskType.validation:
template = "decisive-criterion-validate"
template_name = "decisive-criterion-validate"
elif task_type == TaskType.action:
prompt = prompt_engine.load_prompt("infer-action-type", navigation_goal=navigation_goal)
json_response = await app.LLM_API_HANDLER(prompt=prompt, step=step)
@@ -1329,26 +1349,22 @@
reason=json_response.get("thought"), error_type=json_response.get("error")
)

action_type: str = json_response.get("action_type") or ""
action_type = ActionType[action_type.upper()]
action_type_str: str = json_response.get("action_type") or ""
action_type = ActionType[action_type_str.upper()]

if action_type == ActionType.CLICK:
template = "single-click-action"
template_name = "single-click-action"
elif action_type == ActionType.INPUT_TEXT:
template = "single-input-action"
template_name = "single-input-action"
elif action_type == ActionType.UPLOAD_FILE:
template = "single-upload-action"
template_name = "single-upload-action"
elif action_type == ActionType.SELECT_OPTION:
template = "single-select-action"
template_name = "single-select-action"
else:
raise UnsupportedActionType(action_type=action_type)

if not template:
raise UnsupportedTaskType(task_type=task_type)

context = skyvern_context.ensure_context()
return prompt_engine.load_prompt(
template=template,
render_kwargs = dict(
navigation_goal=navigation_goal,
navigation_payload_str=json.dumps(final_navigation_payload),
starting_url=starting_url,
@@ -1363,6 +1379,22 @@
terminate_criterion=task.terminate_criterion,
)

if template_str is not None:
# Render the prompt from a raw string (used for evolved prompts)
return prompt_engine.load_prompt_from_string(
template=template_str,
**render_kwargs,
)

if template_name is not None:
# Render the prompt from a template file by name (standard behavior)
return prompt_engine.load_prompt(
template=template_name,
**render_kwargs,
)

raise UnsupportedTaskType(task_type=task_type)

def _build_navigation_payload(
self,
task: Task,
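Note that the new template_str branch calls prompt_engine.load_prompt_from_string, a helper that is not shown in this hunk. A minimal sketch of what such a helper might look like on the prompt engine, assuming it exposes the same Jinja2 environment that prompt_manager.py reads via prompt_engine.env (hypothetical, not necessarily the actual implementation in this PR):

    def load_prompt_from_string(self, template: str, **kwargs: Any) -> str:
        """Render a raw template string with the engine's Jinja2 environment (assumes typing.Any is imported)."""
        jinja_template = self.env.from_string(template)
        return jinja_template.render(**kwargs)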
2 changes: 2 additions & 0 deletions skyvern/forge/app.py
Expand Up @@ -2,6 +2,7 @@

from fastapi import FastAPI

from skyvern.evolution.prompt_manager import PromptManager
from skyvern.forge.agent import ForgeAgent
from skyvern.forge.agent_functions import AgentFunction
from skyvern.forge.sdk.api.llm.api_handler_factory import LLMAPIHandlerFactory
@@ -43,4 +44,5 @@
authentication_function: Callable[[str], Awaitable[Organization]] | None = None
setup_api_app: Callable[[FastAPI], None] | None = None

PROMPT_MANAGER = PromptManager()
agent = ForgeAgent()
17 changes: 17 additions & 0 deletions skyvern/forge/prompts/skyvern/evolve-prompt.j2
@@ -0,0 +1,17 @@
You are an expert in prompt engineering for large language models that control web automation agents.
Your task is to evolve the following prompt to make it more effective. The goal is to improve the agent's ability to understand a webpage and decide on the next best action to achieve a user's goal.

Here are some principles for a good prompt:
- **Clarity and Conciseness:** The prompt should be easy for the LLM to understand. Avoid ambiguity.
- **Role-setting:** Clearly define the role and capabilities of the agent.
- **Comprehensive Context:** Ensure all necessary information (like page elements, user goal, history) is presented logically.
- **Action-oriented:** The prompt should guide the LLM towards producing a concrete, executable action.
- **Robustness:** The prompt should encourage the model to handle unexpected situations gracefully (e.g., by providing fallback actions or reasoning about errors).

Here is the prompt to evolve:
---
{{ prompt_to_evolve }}
---

Based on the principles above, please provide a new, improved version of this prompt.
Only output the new prompt template. Do not include any other text or explanation.