docs: add Google-style docstrings to AvatarOptimizer #9555
aditya4232 wants to merge 1 commit into stanfordnlp:main from
Conversation
- Added comprehensive class docstring explaining optimization process
- Added docstrings to __init__, process_example, thread_safe_evaluator
- Added docstrings to _get_pos_neg_results and compile methods
- Follows Google Python Style Guide format
- Includes Args, Returns, Examples, and Raises sections

Resolves stanfordnlp#9554
Related to stanfordnlp#8926

Signed-off-by: Aditya Shenvi <95111404+aditya4232@users.noreply.github.com>
Pull request overview
Adds Google-style docstrings to AvatarOptimizer and related helper methods to better document the optimization flow and API usage (resolving #9554 as part of the broader docstring push in #8926).
Changes:
- Added a comprehensive class docstring for `AvatarOptimizer`, including an example usage snippet.
- Added/updated docstrings for `process_example`, `thread_safe_evaluator`, `_get_pos_neg_results`, and `compile`.
- Reflowed prompt/instruction docstrings in `Comparator` and `FeedbackBasedInstruction`.
```python
"""After executing the given actions on user inputs using the given instruction, some inputs have yielded
good, results, while others have not. I'll provide you the inputs along with their, corresponding evaluation
metrics:

Task:
(1) Firstly, identify and contrast the patterns of inputs that have achieved good results with those that have not.
(2) Then, review the computational logic for any inconsistencies in the previous actions.
(3) Lastly, specify the modification in tools used that can lead to improved performance on the negative inputs."""
```
The Comparator Signature docstring is used as instructions for prompting (see dspy/signatures/signature.py:266-267), so the added hard line-breaks/punctuation issues here change the actual prompt text. Please either keep the prompt content stable (only reformat without adding new newline boundaries) or intentionally update it and fix the grammatical errors (e.g., remove the stray commas in “good, results” and “their, corresponding”).
```python
inputs, and the inputs that did not do well are negative inputs. You received feedback on how they can better
use the tools to improve your performance on the negative inputs. You have been provided with the previous
instruction, that they followed to use tools to complete the task, and the feedback on your performance. Your
task is to incorporate the feedback and generate a detailed instruction for the group to follow to improve
their performance on the task. Make sure that the new instruction talks about how to use the tools effectively
and should be no more than 3 paragraphs long. The previous instruction contains general guidelines that you
must retain in the new instruction."""
```
This prompt/docstring has a pronoun mismatch that makes the instructions ambiguous: “You received feedback on how they can better use the tools to improve your performance…”. Please rewrite so the subject is consistent (either “you can… improve your performance” or “they can… improve their performance”).
```diff
- inputs, and the inputs that did not do well are negative inputs. You received feedback on how they can better
- use the tools to improve your performance on the negative inputs. You have been provided with the previous
- instruction, that they followed to use tools to complete the task, and the feedback on your performance. Your
- task is to incorporate the feedback and generate a detailed instruction for the group to follow to improve
- their performance on the task. Make sure that the new instruction talks about how to use the tools effectively
- and should be no more than 3 paragraphs long. The previous instruction contains general guidelines that you
- must retain in the new instruction."""
+ inputs, and the inputs that did not do well are negative inputs. You received feedback on how you can better
+ use the tools to improve your performance on the negative inputs. You have been provided with the previous
+ instruction that you followed to use tools to complete the task, and the feedback on your performance. Your
+ task is to incorporate the feedback and generate a detailed instruction for you and your group to follow to
+ improve your performance on the task. Make sure that the new instruction talks about how to use the tools
+ effectively and should be no more than 3 paragraphs long. The previous instruction contains general guidelines
+ that you must retain in the new instruction."""
```
```python
2. Separates results into positive (score >= upper_bound) and negative (score <= lower_bound)
3. Uses a comparator LM to analyze differences and generate improvement feedback
4. Creates refined instructions based on the feedback
5. Repeats for max_iters or until performance plateaus
```
The class docstring says the optimizer repeats “for max_iters or until performance plateaus”, but compile() currently always runs for i in range(self.max_iters) and does not implement any plateau/early-stopping condition. Please update the docstring to match the actual behavior (or add early stopping logic if that’s intended).
```diff
- 5. Repeats for max_iters or until performance plateaus
+ 5. Repeats for a fixed number of iterations specified by max_iters
```
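If plateau-based early stopping is actually intended (rather than fixing the docstring to match the fixed loop), a minimal check could be sketched as follows. This is a standalone illustration, not the actual compile() implementation; `evaluate` and `refine` are hypothetical stand-ins for the optimizer's scoring and instruction-refinement steps:

```python
def optimize_with_plateau(evaluate, refine, max_iters, patience=2, min_delta=1e-6):
    """Run up to max_iters refinement rounds, stopping early when the score plateaus.

    `evaluate` returns the current score; `refine` performs one refinement step.
    Stops after `patience` consecutive rounds without at least `min_delta` improvement.
    """
    best_score = evaluate()
    stale_rounds = 0
    iters_run = 0
    for _ in range(max_iters):
        refine()
        iters_run += 1
        score = evaluate()
        if score > best_score + min_delta:
            best_score = score
            stale_rounds = 0
        else:
            stale_rounds += 1
            if stale_rounds >= patience:
                break  # performance plateaued
    return best_score, iters_run
```

With this shape, the docstring claim "until performance plateaus" would be accurate, and `patience` makes the stopping criterion explicit.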
```python
...     upper_bound=1.0,
...     lower_bound=0.0
... )
>>> optimized_actor = optimizer.compile(actor, trainset=train_examples)
```
The class-level example calls optimizer.compile(actor, trainset=...), but compile() expects student to have an .actor.signature.instructions and .tools (see the compile docstring + implementation). This example is likely to fail/mislead unless actor is that wrapper type—please adjust the example variable name and/or show the required wrapper structure.
```diff
- >>> optimized_actor = optimizer.compile(actor, trainset=train_examples)
+ >>> # `student` should be an avatar-like wrapper exposing `.actor.signature.instructions` and `.tools`
+ >>> student = dspy.Avatar(actor=actor, tools=tools)
+ >>> optimized_student = optimizer.compile(student, trainset=train_examples)
```
```diff
@@ -86,48 +135,63 @@ def __init__(
     assert metric is not None, "`metric` argument cannot be None. Please provide a metric function."
```
PR description claims __init__ has a Google-style docstring, but __init__ currently has none (only the class docstring documents constructor args). Please either add an __init__ docstring (if required by #9554) or update the PR description to reflect the chosen documentation approach.
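If the decision is to add the missing `__init__` docstring, a minimal Google-style sketch could look like the following. This is a standalone illustration; the parameter names and defaults are assumed from the class docstring and suggestion snippets in this review, not verified against the real signature:

```python
class AvatarOptimizerSketch:
    """Standalone stand-in used only to illustrate the docstring shape."""

    def __init__(self, metric, max_iters=10, upper_bound=1.0, lower_bound=0.0):
        """Initialize the optimizer.

        Args:
            metric: Callable that scores an (example, prediction) pair.
            max_iters: Maximum number of refinement iterations to run.
            upper_bound: Scores >= this value mark an example as positive.
            lower_bound: Scores <= this value mark an example as negative.

        Raises:
            AssertionError: If `metric` is None.
        """
        assert metric is not None, "`metric` argument cannot be None. Please provide a metric function."
        self.metric = metric
        self.max_iters = max_iters
        self.upper_bound = upper_bound
        self.lower_bound = lower_bound
```

Either this docstring lands in `__init__`, or the PR description should drop the claim; both resolve the mismatch.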
```python
return_outputs: If True, returns (example, prediction, score). If False, returns only score.

Returns:
    If return_outputs is True: tuple of (example, prediction, score)
    If return_outputs is False: score (float)
    On exception: returns (example, None, 0) if return_outputs else 0
```
process_example’s docstring says the returned score is a float, but self.metric() can return any numeric type (often int for accuracy). To keep the docstring strictly correct, describe the return as a numeric score (and keep the exception-path return type consistent with that wording).
```diff
- return_outputs: If True, returns (example, prediction, score). If False, returns only score.
- Returns:
-     If return_outputs is True: tuple of (example, prediction, score)
-     If return_outputs is False: score (float)
-     On exception: returns (example, None, 0) if return_outputs else 0
+ return_outputs: If True, returns (example, prediction, score). If False, returns only the score.
+ Returns:
+     If return_outputs is True: tuple of (example, prediction, score)
+     If return_outputs is False: score (numeric)
+     On exception: returns (example, None, 0) if return_outputs is True, else 0 (numeric score)
```
```python
def thread_safe_evaluator(self, devset, actor, return_outputs=False, num_threads=None):
    """Evaluate an actor on a dataset using parallel thread execution.

    Args:
        devset: List of dspy.Example instances to evaluate.
        actor: The DSPy actor module to evaluate.
        return_outputs: If True, returns (avg_metric, results) where results is a list of
            (example, prediction, score) tuples. If False, returns only avg_metric.
        num_threads: Number of parallel threads. If None, uses dspy.settings.num_threads.

    Returns:
        If return_outputs is True: tuple of (avg_metric, results)
        If return_outputs is False: avg_metric (float)
    """
```
thread_safe_evaluator will raise ZeroDivisionError when devset is empty (avg_metric = total_score / total_examples). The docstring currently implies it always returns a float; please document the non-empty requirement or add an explicit empty-devset check and raise a clearer ValueError (and document it).
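The suggested empty-devset guard can be sketched independently of DSPy, using a plain `ThreadPoolExecutor` in place of DSPy's evaluator (hypothetical helper, for illustration only):

```python
from concurrent.futures import ThreadPoolExecutor


def thread_safe_average(devset, score_fn, num_threads=4):
    """Average score_fn over devset in parallel, guarding against empty input.

    Raises:
        ValueError: If devset is empty (avoids ZeroDivisionError when computing
            total_score / total_examples).
    """
    if not devset:
        raise ValueError("devset is empty; provide at least one example to evaluate.")
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        scores = list(pool.map(score_fn, devset))
    return sum(scores) / len(scores)
```

With an explicit check like this, the docstring's "Returns: avg_metric (float)" claim holds, and the failure mode is a documented `ValueError` instead of an opaque `ZeroDivisionError`.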
```python
    ValueError: If no positive examples found (try lowering upper_bound) or no negative
        examples found (try raising lower_bound).
    """
```
_get_pos_neg_results documents only ValueError for missing pos/neg examples, but an empty trainset will currently fail earlier via ZeroDivisionError in thread_safe_evaluator. Please either handle empty trainset explicitly (preferred) or document this additional failure mode so the docstring stays accurate.
| ValueError: If no positive examples found (try lowering upper_bound) or no negative | |
| examples found (try raising lower_bound). | |
| """ | |
| ValueError: If trainset is empty; if no positive examples are found (try lowering | |
| upper_bound); or if no negative examples are found (try raising lower_bound). | |
| """ | |
| if not trainset: | |
| raise ValueError( | |
| "Training set is empty; provide at least one training example before calling " | |
| "_get_pos_neg_results." | |
| ) |
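The positive/negative split and both documented failure modes can be sketched independently of DSPy. This is a hypothetical helper for illustration; `results` holds (example, prediction, score) triples as documented above:

```python
def partition_results(results, upper_bound, lower_bound):
    """Split (example, prediction, score) triples into positive and negative sets.

    Positive: score >= upper_bound; negative: score <= lower_bound.

    Raises:
        ValueError: On an empty results list, or when either partition is empty.
    """
    if not results:
        raise ValueError("results is empty; evaluate a non-empty trainset first.")
    pos = [r for r in results if r[2] >= upper_bound]
    neg = [r for r in results if r[2] <= lower_bound]
    if not pos:
        raise ValueError("No positive examples found; try lowering upper_bound.")
    if not neg:
        raise ValueError("No negative examples found; try raising lower_bound.")
    return pos, neg
```

Guarding the empty case up front is the "handle explicitly (preferred)" option from the comment above: the caller sees one consistent exception type for every way the split can fail.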
```python
    Examples:
```
compile’s docstring example uses metric=accuracy (undefined in the snippet) and doesn’t mention the ValueError raised by _get_pos_neg_results when no pos/neg examples exist. Please make the example self-contained (define the metric) and add a Raises: section that reflects the actual exceptions this method can raise.
```diff
- Examples:
+ Raises:
+     ValueError: If no positive or negative examples can be identified in the given
+         ``trainset`` when computing evaluation results.
+
+ Examples:
+     >>> def accuracy(example, prediction):
+     ...     \"\"\"Return 1.0 if the prediction is correct for the example, else 0.0.\"\"\"
+     ...     return 1.0 if example.get("label") == getattr(prediction, "label", None) else 0.0
```
Resolves #9554
Related to #8926