Currently vf-eval doesn't log additional state information. Calling env.evaluate will log custom state info if one passes state_colums: list[str] to make_dataset.
There's an open issue about this on verifiers PrimeIntellect-ai/verifiers#418. We should update our environments to log judge responses when this feature is implemented.