
docs: add Google-style docstrings to AvatarOptimizer#9555

Open
aditya4232 wants to merge 1 commit into stanfordnlp:main from aditya4232:aditya4232-patch-1

Conversation

@aditya4232

  • Added comprehensive class docstring explaining optimization process
  • Added docstrings to __init__, process_example, thread_safe_evaluator
  • Added docstrings to _get_pos_neg_results and compile methods
  • Follows Google Python Style Guide format
  • Includes Args, Returns, Examples, and Raises sections

Resolves #9554
Related to #8926
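For readers unfamiliar with the format, a minimal Google-style docstring with the sections listed above might look like the following. This is an illustrative sketch only: the function body and names are hypothetical, not the actual code added in this PR.

```python
def process_example(example, actor, return_outputs=False):
    """Evaluate a single example with the given actor.

    Illustrative sketch of the Google Python Style Guide docstring format
    (Args/Returns/Raises sections); not the real avatar_optimizer.py code.

    Args:
        example: The input example to evaluate.
        actor: A callable that produces a prediction for the example.
        return_outputs: If True, return (example, prediction, score);
            otherwise return only the score.

    Returns:
        A numeric score, or an (example, prediction, score) tuple when
        return_outputs is True.

    Raises:
        ValueError: If example is None.
    """
    if example is None:
        raise ValueError("example cannot be None")
    prediction = actor(example)
    # Trivial exact-match metric, purely for demonstration.
    score = 1 if prediction == example else 0
    return (example, prediction, score) if return_outputs else score
```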


Signed-off-by: Aditya Shenvi <95111404+aditya4232@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 2, 2026 09:12

Copilot AI left a comment


Pull request overview

Adds Google-style docstrings to AvatarOptimizer and related helper methods to better document the optimization flow and API usage (resolving #9554 as part of the broader docstring push in #8926).

Changes:

  • Added a comprehensive class docstring for AvatarOptimizer, including an example usage snippet.
  • Added/updated docstrings for process_example, thread_safe_evaluator, _get_pos_neg_results, and compile.
  • Reflowed prompt/instruction docstrings in Comparator and FeedbackBasedInstruction.


Comment on lines +23 to +27
"""After executing the given actions on user inputs using the given instruction, some inputs have yielded
good, results, while others have not. I'll provide you the inputs along with their, corresponding evaluation
metrics:

Task:
(1) Firstly, identify and contrast the patterns of inputs that have achieved good results with those that have not.
(2) Then, review the computational logic for any inconsistencies in the previous actions.
(3) Lastly, specify the modification in tools used that can lead to improved performance on the negative inputs."""

Copilot AI Apr 2, 2026


The Comparator Signature docstring is used as instructions for prompting (see dspy/signatures/signature.py:266-267), so the added hard line-breaks/punctuation issues here change the actual prompt text. Please either keep the prompt content stable (only reformat without adding new newline boundaries) or intentionally update it and fix the grammatical errors (e.g., remove the stray commas in “good, results” and “their, corresponding”).

Copilot uses AI. Check for mistakes.
Comment on lines +59 to +65
inputs, and the inputs that did not do well are negative inputs. You received feedback on how they can better
use the tools to improve your performance on the negative inputs. You have been provided with the previous
instruction, that they followed to use tools to complete the task, and the feedback on your performance. Your
task is to incorporate the feedback and generate a detailed instruction for the group to follow to improve
their performance on the task. Make sure that the new instruction talks about how to use the tools effectively
and should be no more than 3 paragraphs long. The previous instruction contains general guidelines that you
must retain in the new instruction."""

Copilot AI Apr 2, 2026


This prompt/docstring has a pronoun mismatch that makes the instructions ambiguous: “You received feedback on how they can better use the tools to improve your performance…”. Please rewrite so the subject is consistent (either “you can… improve your performance” or “they can… improve their performance”).

Suggested change
inputs, and the inputs that did not do well are negative inputs. You received feedback on how they can better
use the tools to improve your performance on the negative inputs. You have been provided with the previous
instruction, that they followed to use tools to complete the task, and the feedback on your performance. Your
task is to incorporate the feedback and generate a detailed instruction for the group to follow to improve
their performance on the task. Make sure that the new instruction talks about how to use the tools effectively
and should be no more than 3 paragraphs long. The previous instruction contains general guidelines that you
must retain in the new instruction."""
inputs, and the inputs that did not do well are negative inputs. You received feedback on how you can better
use the tools to improve your performance on the negative inputs. You have been provided with the previous
instruction that you followed to use tools to complete the task, and the feedback on your performance. Your
task is to incorporate the feedback and generate a detailed instruction for you and your group to follow to
improve your performance on the task. Make sure that the new instruction talks about how to use the tools
effectively and should be no more than 3 paragraphs long. The previous instruction contains general guidelines
that you must retain in the new instruction."""

2. Separates results into positive (score >= upper_bound) and negative (score <= lower_bound)
3. Uses a comparator LM to analyze differences and generate improvement feedback
4. Creates refined instructions based on the feedback
5. Repeats for max_iters or until performance plateaus
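Step 2's threshold split can be sketched as follows (illustrative only; the function name and tuple layout are assumptions, not the actual _get_pos_neg_results code):

```python
def split_pos_neg(results, upper_bound, lower_bound):
    """Split (example, prediction, score) results into positive and negative
    buckets by score thresholds, as in step 2 above. Hypothetical sketch."""
    pos = [r for r in results if r[2] >= upper_bound]  # scored at/above upper_bound
    neg = [r for r in results if r[2] <= lower_bound]  # scored at/below lower_bound
    return pos, neg
```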

Copilot AI Apr 2, 2026


The class docstring says the optimizer repeats “for max_iters or until performance plateaus”, but compile() currently always runs for i in range(self.max_iters) and does not implement any plateau/early-stopping condition. Please update the docstring to match the actual behavior (or add early stopping logic if that’s intended).
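If plateau-based early stopping were in fact intended, it could look roughly like this hedged sketch (helper names are hypothetical; this is not the current compile() implementation, which always runs the full loop):

```python
def compile_with_early_stopping(evaluate, refine, actor, max_iters, patience=2):
    """Plateau-aware optimization loop sketch.

    Args:
        evaluate: Callable returning a numeric score for an actor.
        refine: Callable returning a (hopefully) improved actor.
        actor: Initial actor.
        max_iters: Maximum number of refinement iterations.
        patience: Stop after this many iterations without improvement.
    """
    best_score, best_actor, stale = evaluate(actor), actor, 0
    for _ in range(max_iters):
        actor = refine(actor)
        score = evaluate(actor)
        if score > best_score:
            best_score, best_actor, stale = score, actor, 0
        else:
            stale += 1
            if stale >= patience:
                break  # performance plateaued
    return best_actor
```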

Suggested change
5. Repeats for max_iters or until performance plateaus
5. Repeats for a fixed number of iterations specified by max_iters

... upper_bound=1.0,
... lower_bound=0.0
... )
>>> optimized_actor = optimizer.compile(actor, trainset=train_examples)

Copilot AI Apr 2, 2026


The class-level example calls optimizer.compile(actor, trainset=...), but compile() expects student to have an .actor.signature.instructions and .tools (see the compile docstring + implementation). This example is likely to fail/mislead unless actor is that wrapper type—please adjust the example variable name and/or show the required wrapper structure.

Suggested change
>>> optimized_actor = optimizer.compile(actor, trainset=train_examples)
>>> # `student` should be an avatar-like wrapper exposing `.actor.signature.instructions` and `.tools`
>>> student = dspy.Avatar(actor=actor, tools=tools)
>>> optimized_student = optimizer.compile(student, trainset=train_examples)

Comment on lines 125 to 135
@@ -86,48 +135,63 @@ def __init__(
assert metric is not None, "`metric` argument cannot be None. Please provide a metric function."

Copilot AI Apr 2, 2026


PR description claims __init__ has a Google-style docstring, but __init__ currently has none (only the class docstring documents constructor args). Please either add an __init__ docstring (if required by #9554) or update the PR description to reflect the chosen documentation approach.

Comment on lines +152 to +157
return_outputs: If True, returns (example, prediction, score). If False, returns only score.

Returns:
If return_outputs is True: tuple of (example, prediction, score)
If return_outputs is False: score (float)
On exception: returns (example, None, 0) if return_outputs else 0

Copilot AI Apr 2, 2026


process_example’s docstring says the returned score is a float, but self.metric() can return any numeric type (often int for accuracy). To keep the docstring strictly correct, describe the return as a numeric score (and keep the exception-path return type consistent with that wording).

Suggested change
return_outputs: If True, returns (example, prediction, score). If False, returns only score.
Returns:
If return_outputs is True: tuple of (example, prediction, score)
If return_outputs is False: score (float)
On exception: returns (example, None, 0) if return_outputs else 0
return_outputs: If True, returns (example, prediction, score). If False, returns only the score.
Returns:
If return_outputs is True: tuple of (example, prediction, score)
If return_outputs is False: score (numeric)
On exception: returns (example, None, 0) if return_outputs is True, else 0 (numeric score)

Comment on lines 174 to +187
def thread_safe_evaluator(self, devset, actor, return_outputs=False, num_threads=None):
"""Evaluate an actor on a dataset using parallel thread execution.

Args:
devset: List of dspy.Example instances to evaluate.
actor: The DSPy actor module to evaluate.
return_outputs: If True, returns (avg_metric, results) where results is a list of
(example, prediction, score) tuples. If False, returns only avg_metric.
num_threads: Number of parallel threads. If None, uses dspy.settings.num_threads.

Returns:
If return_outputs is True: tuple of (avg_metric, results)
If return_outputs is False: avg_metric (float)
"""

Copilot AI Apr 2, 2026


thread_safe_evaluator will raise ZeroDivisionError when devset is empty (avg_metric = total_score / total_examples). The docstring currently implies it always returns a float; please document the non-empty requirement or add an explicit empty-devset check and raise a clearer ValueError (and document it).

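A minimal sketch of the suggested guard (hypothetical helper, not the actual thread_safe_evaluator, which computes avg_metric = total_score / total_examples):

```python
def safe_average_score(scores):
    """Average a list of numeric scores, failing clearly on empty input.

    Raises:
        ValueError: If `scores` is empty, instead of the opaque
            ZeroDivisionError that a bare division would produce.
    """
    if not scores:
        raise ValueError("devset is empty; provide at least one example to evaluate.")
    return sum(scores) / len(scores)
```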
Comment on lines +226 to +228
ValueError: If no positive examples found (try lowering upper_bound) or no negative
examples found (try raising lower_bound).
"""

Copilot AI Apr 2, 2026


_get_pos_neg_results documents only ValueError for missing pos/neg examples, but an empty trainset will currently fail earlier via ZeroDivisionError in thread_safe_evaluator. Please either handle empty trainset explicitly (preferred) or document this additional failure mode so the docstring stays accurate.

Suggested change
ValueError: If no positive examples found (try lowering upper_bound) or no negative
examples found (try raising lower_bound).
"""
ValueError: If trainset is empty; if no positive examples are found (try lowering
upper_bound); or if no negative examples are found (try raising lower_bound).
"""
if not trainset:
raise ValueError(
"Training set is empty; provide at least one training example before calling "
"_get_pos_neg_results."
)

Comment on lines +270 to +271

Examples:

Copilot AI Apr 2, 2026


compile’s docstring example uses metric=accuracy (undefined in the snippet) and doesn’t mention the ValueError raised by _get_pos_neg_results when no pos/neg examples exist. Please make the example self-contained (define the metric) and add a Raises: section that reflects the actual exceptions this method can raise.

Suggested change
Examples:
Raises:
ValueError: If no positive or negative examples can be identified in the given
``trainset`` when computing evaluation results.
Examples:
>>> def accuracy(example, prediction):
... \"\"\"Return 1.0 if the prediction is correct for the example, else 0.0.\"\"\"
... return 1.0 if example.get("label") == getattr(prediction, "label", None) else 0.0



Development

Successfully merging this pull request may close these issues.

[docs] Add Google-style docstrings to dspy/teleprompt/avatar_optimizer.py
