Skip to content

BREAKING CHANGE: Make EvaluationReport and ReportCase into generic dataclasses #1799

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dmontagu
Copy link
Contributor

Right now, these types are (non-generic) BaseModels. We received a request in the public slack to have these types be generic. As I noted there, it's much more difficult to do that with BaseModel than with a dataclass, so I converted the types into dataclasses and made things generic.

I'm not 100% convinced yet that we should make this change (though I'd be easily convinced if other maintainers consider the consequences and agree with the change), but the changes in this PR make these classes generic so you can have type-checked access to the report case inputs, outputs, and metadata.

The main downside (and reason this is a breaking change) is the loss of model_* methods on EvaluationReport and ReportCase. I introduced EvaluationReportAdapter and ReportCaseAdapter to make it somewhat easier to serialize these things on your own, but either way this is definitely less ergonomic than BaseModel, it's just harder to deal with generics properly in this way with a generic BaseModel subclass. (Generic BaseModels are more designed to reduce boilerplate when making many concrete instantiations of a parametrized type, and not designed for more type-theoretic applications like what is happening here.)

Comment on lines +63 to +65
scores: dict[str, EvaluationResult[int | float]]
labels: dict[str, EvaluationResult[str]]
assertions: dict[str, EvaluationResult[bool]]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The = field(init=False) currently on these fields is just a straight-up mistake in the current implementation (maybe it was supposed to be there at some point, but I'm 99% sure it shouldn't be there now).

I think the only reason it didn't cause issues was because these objects are generally created by us, and in a way that it was never created without explicitly specifying the value. (Because this is currently a BaseModel, the =field(init=False) doesn't remove the field from the __init__.)

Copy link

Docs Preview

commit: 6b499f5
Preview URL: https://de8356b4-pydantic-ai-previews.pydantic.workers.dev

@dmontagu dmontagu requested a review from DouweM May 21, 2025 16:25
@DouweM
Copy link
Contributor

DouweM commented May 21, 2025

@dmontagu To not lose the model_dump[_json] methods and maintain backward compatibility (at least of the BaseModel bits we think people are likely to use), would it be crazy to implement them explicitly on the classes in question, delegating to the TypeAdapter?

I could imagine a decorator that works on any dataclass and does just that -- that may be useful more generally to get consistent validation/serialization APIs between BaseModels and dataclasses.

@DouweM DouweM assigned dmontagu and unassigned DouweM May 21, 2025
@dmontagu
Copy link
Contributor Author

dmontagu commented May 21, 2025

I thought about that but the reason I didn't do it is because it won't be type-aware (well, specifically, aware of the generic parameters involved; they will always be Any), so may end up with some surprising behavior. Admittedly it would be compatible with the way things work now, but I also feel it's a bit confusing to have model_X methods on a dataclass, it might make you think it's not actually a dataclass. Given we aren't in v1 yet and most people aren't manually serializing the reports, my feeling is it's not unreasonable to make a breaking change here and just make it a proper dataclass now.

If people are making serious use of model_dump etc. on the ReportCase / EvaluationReport classes then I might feel differently, but I would be surprised if that was the case and I'm not sure we have a good way to determine that anyway (short of user feedback, which I assume we won't get until breaking their code 😞).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants