docs: add Google-style docstrings to AvatarOptimizer #9555
aditya4232 wants to merge 1 commit into stanfordnlp:main from
Conversation
- Added comprehensive class docstring explaining optimization process
- Added docstrings to __init__, process_example, thread_safe_evaluator
- Added docstrings to _get_pos_neg_results and compile methods
- Follows Google Python Style Guide format
- Includes Args, Returns, Examples, and Raises sections

Resolves stanfordnlp#9554
Related to stanfordnlp#8926

Signed-off-by: Aditya Shenvi <95111404+aditya4232@users.noreply.github.com>
Pull request overview
Adds Google-style docstrings to AvatarOptimizer and related helper methods to better document the optimization flow and API usage (resolving #9554 as part of the broader docstring push in #8926).
Changes:
- Added a comprehensive class docstring for `AvatarOptimizer`, including an example usage snippet.
- Added/updated docstrings for `process_example`, `thread_safe_evaluator`, `_get_pos_neg_results`, and `compile`.
- Reflowed prompt/instruction docstrings in `Comparator` and `FeedbackBasedInstruction`.
```python
"""After executing the given actions on user inputs using the given instruction, some inputs have yielded
good, results, while others have not. I'll provide you the inputs along with their, corresponding evaluation
metrics:

Task:
(1) Firstly, identify and contrast the patterns of inputs that have achieved good results with those that have not.
(2) Then, review the computational logic for any inconsistencies in the previous actions.
(3) Lastly, specify the modification in tools used that can lead to improved performance on the negative inputs."""
```
The Comparator Signature docstring is used as instructions for prompting (see dspy/signatures/signature.py:266-267), so the added hard line-breaks/punctuation issues here change the actual prompt text. Please either keep the prompt content stable (only reformat without adding new newline boundaries) or intentionally update it and fix the grammatical errors (e.g., remove the stray commas in “good, results” and “their, corresponding”).
```python
inputs, and the inputs that did not do well are negative inputs. You received feedback on how they can better
use the tools to improve your performance on the negative inputs. You have been provided with the previous
instruction, that they followed to use tools to complete the task, and the feedback on your performance. Your
task is to incorporate the feedback and generate a detailed instruction for the group to follow to improve
their performance on the task. Make sure that the new instruction talks about how to use the tools effectively
and should be no more than 3 paragraphs long. The previous instruction contains general guidelines that you
must retain in the new instruction."""
```
This prompt/docstring has a pronoun mismatch that makes the instructions ambiguous: “You received feedback on how they can better use the tools to improve your performance…”. Please rewrite so the subject is consistent (either “you can… improve your performance” or “they can… improve their performance”).
```diff
- inputs, and the inputs that did not do well are negative inputs. You received feedback on how they can better
- use the tools to improve your performance on the negative inputs. You have been provided with the previous
- instruction, that they followed to use tools to complete the task, and the feedback on your performance. Your
- task is to incorporate the feedback and generate a detailed instruction for the group to follow to improve
- their performance on the task. Make sure that the new instruction talks about how to use the tools effectively
- and should be no more than 3 paragraphs long. The previous instruction contains general guidelines that you
- must retain in the new instruction."""
+ inputs, and the inputs that did not do well are negative inputs. You received feedback on how you can better
+ use the tools to improve your performance on the negative inputs. You have been provided with the previous
+ instruction that you followed to use tools to complete the task, and the feedback on your performance. Your
+ task is to incorporate the feedback and generate a detailed instruction for you and your group to follow to
+ improve your performance on the task. Make sure that the new instruction talks about how to use the tools
+ effectively and should be no more than 3 paragraphs long. The previous instruction contains general guidelines
+ that you must retain in the new instruction."""
```
```python
2. Separates results into positive (score >= upper_bound) and negative (score <= lower_bound)
3. Uses a comparator LM to analyze differences and generate improvement feedback
4. Creates refined instructions based on the feedback
5. Repeats for max_iters or until performance plateaus
```
The class docstring says the optimizer repeats “for max_iters or until performance plateaus”, but compile() currently always runs for i in range(self.max_iters) and does not implement any plateau/early-stopping condition. Please update the docstring to match the actual behavior (or add early stopping logic if that’s intended).
```diff
- 5. Repeats for max_iters or until performance plateaus
+ 5. Repeats for a fixed number of iterations specified by max_iters
```
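If plateau-based early stopping is actually intended (rather than fixing the docstring to match the fixed loop), a minimal check could be sketched as follows. This is a standalone illustration, not the actual compile() implementation; `evaluate` and `refine` are hypothetical stand-ins for the optimizer's scoring and instruction-refinement steps:

```python
def optimize_with_plateau(evaluate, refine, max_iters, patience=2, min_delta=1e-6):
    """Run up to max_iters refinement rounds, stopping early when the score plateaus.

    `evaluate` returns the current score; `refine` performs one refinement step.
    Stops after `patience` consecutive rounds without at least `min_delta` improvement.
    """
    best_score = evaluate()
    stale_rounds = 0
    iters_run = 0
    for _ in range(max_iters):
        refine()
        iters_run += 1
        score = evaluate()
        if score > best_score + min_delta:
            best_score = score
            stale_rounds = 0
        else:
            stale_rounds += 1
            if stale_rounds >= patience:
                break  # performance plateaued
    return best_score, iters_run
```

With this shape, the docstring claim "until performance plateaus" would be accurate, and `patience` makes the stopping criterion explicit.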
```python
...     upper_bound=1.0,
...     lower_bound=0.0
... )
>>> optimized_actor = optimizer.compile(actor, trainset=train_examples)
```
The class-level example calls optimizer.compile(actor, trainset=...), but compile() expects student to have an .actor.signature.instructions and .tools (see the compile docstring + implementation). This example is likely to fail/mislead unless actor is that wrapper type—please adjust the example variable name and/or show the required wrapper structure.
```diff
- >>> optimized_actor = optimizer.compile(actor, trainset=train_examples)
+ >>> # `student` should be an avatar-like wrapper exposing `.actor.signature.instructions` and `.tools`
+ >>> student = dspy.Avatar(actor=actor, tools=tools)
+ >>> optimized_student = optimizer.compile(student, trainset=train_examples)
```
```diff
@@ -86,48 +135,63 @@ def __init__(
     assert metric is not None, "`metric` argument cannot be None. Please provide a metric function."
```
PR description claims __init__ has a Google-style docstring, but __init__ currently has none (only the class docstring documents constructor args). Please either add an __init__ docstring (if required by #9554) or update the PR description to reflect the chosen documentation approach.
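If the decision is to add the missing `__init__` docstring, a minimal Google-style sketch could look like the following. This is a standalone illustration; the parameter names and defaults are assumed from the class docstring and suggestion snippets in this review, not verified against the real signature:

```python
class AvatarOptimizerSketch:
    """Standalone stand-in used only to illustrate the docstring shape."""

    def __init__(self, metric, max_iters=10, upper_bound=1.0, lower_bound=0.0):
        """Initialize the optimizer.

        Args:
            metric: Callable that scores an (example, prediction) pair.
            max_iters: Maximum number of refinement iterations to run.
            upper_bound: Scores >= this value mark an example as positive.
            lower_bound: Scores <= this value mark an example as negative.

        Raises:
            AssertionError: If `metric` is None.
        """
        assert metric is not None, "`metric` argument cannot be None. Please provide a metric function."
        self.metric = metric
        self.max_iters = max_iters
        self.upper_bound = upper_bound
        self.lower_bound = lower_bound
```

Either this docstring lands in `__init__`, or the PR description should drop the claim; both resolve the mismatch.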
```python
return_outputs: If True, returns (example, prediction, score). If False, returns only score.

Returns:
    If return_outputs is True: tuple of (example, prediction, score)
    If return_outputs is False: score (float)
    On exception: returns (example, None, 0) if return_outputs else 0
```
process_example’s docstring says the returned score is a float, but self.metric() can return any numeric type (often int for accuracy). To keep the docstring strictly correct, describe the return as a numeric score (and keep the exception-path return type consistent with that wording).
```diff
- return_outputs: If True, returns (example, prediction, score). If False, returns only score.
- Returns:
-     If return_outputs is True: tuple of (example, prediction, score)
-     If return_outputs is False: score (float)
-     On exception: returns (example, None, 0) if return_outputs else 0
+ return_outputs: If True, returns (example, prediction, score). If False, returns only the score.
+ Returns:
+     If return_outputs is True: tuple of (example, prediction, score)
+     If return_outputs is False: score (numeric)
+     On exception: returns (example, None, 0) if return_outputs is True, else 0 (numeric score)
```
```python
def thread_safe_evaluator(self, devset, actor, return_outputs=False, num_threads=None):
    """Evaluate an actor on a dataset using parallel thread execution.

    Args:
        devset: List of dspy.Example instances to evaluate.
        actor: The DSPy actor module to evaluate.
        return_outputs: If True, returns (avg_metric, results) where results is a list of
            (example, prediction, score) tuples. If False, returns only avg_metric.
        num_threads: Number of parallel threads. If None, uses dspy.settings.num_threads.

    Returns:
        If return_outputs is True: tuple of (avg_metric, results)
        If return_outputs is False: avg_metric (float)
    """
```
thread_safe_evaluator will raise ZeroDivisionError when devset is empty (avg_metric = total_score / total_examples). The docstring currently implies it always returns a float; please document the non-empty requirement or add an explicit empty-devset check and raise a clearer ValueError (and document it).
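The suggested empty-devset guard can be sketched independently of DSPy, using a plain `ThreadPoolExecutor` in place of DSPy's evaluator (hypothetical helper, for illustration only):

```python
from concurrent.futures import ThreadPoolExecutor


def thread_safe_average(devset, score_fn, num_threads=4):
    """Average score_fn over devset in parallel, guarding against empty input.

    Raises:
        ValueError: If devset is empty (avoids ZeroDivisionError when computing
            total_score / total_examples).
    """
    if not devset:
        raise ValueError("devset is empty; provide at least one example to evaluate.")
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        scores = list(pool.map(score_fn, devset))
    return sum(scores) / len(scores)
```

With an explicit check like this, the docstring's "Returns: avg_metric (float)" claim holds, and the failure mode is a documented `ValueError` instead of an opaque `ZeroDivisionError`.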
```python
    ValueError: If no positive examples found (try lowering upper_bound) or no negative
        examples found (try raising lower_bound).
    """
```
_get_pos_neg_results documents only ValueError for missing pos/neg examples, but an empty trainset will currently fail earlier via ZeroDivisionError in thread_safe_evaluator. Please either handle empty trainset explicitly (preferred) or document this additional failure mode so the docstring stays accurate.
| ValueError: If no positive examples found (try lowering upper_bound) or no negative | |
| examples found (try raising lower_bound). | |
| """ | |
| ValueError: If trainset is empty; if no positive examples are found (try lowering | |
| upper_bound); or if no negative examples are found (try raising lower_bound). | |
| """ | |
| if not trainset: | |
| raise ValueError( | |
| "Training set is empty; provide at least one training example before calling " | |
| "_get_pos_neg_results." | |
| ) |
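The positive/negative split and both documented failure modes can be sketched independently of DSPy. This is a hypothetical helper for illustration; `results` holds (example, prediction, score) triples as documented above:

```python
def partition_results(results, upper_bound, lower_bound):
    """Split (example, prediction, score) triples into positive and negative sets.

    Positive: score >= upper_bound; negative: score <= lower_bound.

    Raises:
        ValueError: On an empty results list, or when either partition is empty.
    """
    if not results:
        raise ValueError("results is empty; evaluate a non-empty trainset first.")
    pos = [r for r in results if r[2] >= upper_bound]
    neg = [r for r in results if r[2] <= lower_bound]
    if not pos:
        raise ValueError("No positive examples found; try lowering upper_bound.")
    if not neg:
        raise ValueError("No negative examples found; try raising lower_bound.")
    return pos, neg
```

Guarding the empty case up front is the "handle explicitly (preferred)" option from the comment above: the caller sees one consistent exception type for every way the split can fail.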
```python
    Examples:
```
compile’s docstring example uses metric=accuracy (undefined in the snippet) and doesn’t mention the ValueError raised by _get_pos_neg_results when no pos/neg examples exist. Please make the example self-contained (define the metric) and add a Raises: section that reflects the actual exceptions this method can raise.
```diff
- Examples:
+ Raises:
+     ValueError: If no positive or negative examples can be identified in the given
+         ``trainset`` when computing evaluation results.
+
+ Examples:
+     >>> def accuracy(example, prediction):
+     ...     \"\"\"Return 1.0 if the prediction is correct for the example, else 0.0.\"\"\"
+     ...     return 1.0 if example.get("label") == getattr(prediction, "label", None) else 0.0
```
Resolves #9554
Related to #8926