Bug Description
The groundedness_measure_with_cot_reasons_consider_answerability function returns NaN if filter_trivial_statements=True and all statements are filtered out in _remove_trivial_statements(). Similar behaviour likely occurs in the other groundedness functions, since they may also filter out every statement.
In that case, hypotheses will be an empty list, so results will be an empty list, so groundedness_scores will be an empty dict. The final score is then computed as a mean over that empty dict, which yields NaN (see the numpy warning below).
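A minimal illustration of that last step, assuming (as the numpy warning below suggests) that the final score is a plain np.mean over the values of groundedness_scores. The empty dict is a stand-in for the filtered result, not TruLens code:

```python
import warnings

import numpy as np

# Hypothetical stand-in for groundedness_scores after every statement has
# been filtered out as trivial: nothing is left to average.
groundedness_scores: dict[str, float] = {}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeats
    score = np.mean(list(groundedness_scores.values()))

print(score)  # nan
# Should include "Mean of empty slice." and an invalid-value warning,
# matching the RuntimeWarnings in the logs.
print(sorted({str(w.message) for w in caught}))
```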
To Reproduce
import boto3
from trulens.providers.bedrock import Bedrock

client = boto3.client(service_name="bedrock-runtime")
bedrock = Bedrock(
    model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
    client=client,
)

question = "How do you implement a binary search algorithm in Python?"
source = """* FUNCTION DEFINITION* def calculate_area(radius): return 3.14 * radius ** 2"""
statement = """The layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of abstraction and the layers of"""

bedrock.groundedness_measure_with_cot_reasons_consider_answerability(
    source=source, statement=statement, question=question
)
Expected behavior
I expect 0 to be returned if no statements are left after filtering out trivial statements, since NaN is not a value between 0.0 and 1.0 (as documented). For example, an early return along these lines:
if not hypotheses:
    return 0.0, {"reason": "No non-trivial statements to evaluate"}
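A sketch of where such a guard could sit in the aggregation step. aggregate_groundedness is a hypothetical helper, not a TruLens function, and it uses statistics.fmean in place of np.mean so that it depends only on the standard library:

```python
from statistics import fmean
from typing import Any


def aggregate_groundedness(
    groundedness_scores: dict[str, float],
) -> tuple[float, dict[str, Any]]:
    """Hypothetical aggregation: average per-statement scores into one value.

    Guards the case where trivial-statement filtering removed everything.
    """
    if not groundedness_scores:
        # Early return instead of averaging an empty collection, which is
        # what produces NaN when np.mean is used.
        return 0.0, {"reason": "No non-trivial statements to evaluate"}
    score = fmean(groundedness_scores.values())
    return score, {"statement_scores": groundedness_scores}


print(aggregate_groundedness({}))
print(aggregate_groundedness({"s1": 1.0, "s2": 0.5}))
```

Returning 0.0 keeps the result inside the documented range, but it also makes "nothing to evaluate" indistinguishable from "not grounded", which is the concern raised in the replies.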
Relevant Logs/Tracebacks
Here's the numpy warning:
/home/user/miniconda3/envs/my-env/lib/python3.12/site-packages/numpy/_core/fromnumeric.py:3904: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/home/user/miniconda3/my-env/project/lib/python3.12/site-packages/numpy/_core/_methods.py:147: RuntimeWarning: invalid value encountered in scalar divide
ret = ret.dtype.type(ret / rcount)
Environment:
OS: Ubuntu 22.04
Python Version: 3.12.5
TruLens version: 1.2.6
numpy: 2.1.3
Hey @drooms-sandrus - not sure if this is so clear-cut. I agree with your view that an LLM response consisting only of trivial statements is undesirable, but groundedness would not be the right metric to detect that issue.
For examples like yours, or similar ones where the response has only trivial statements, I would expect this to be detected by a lower answer relevance score.
Hi @sfc-gh-jreini, I agree that answer relevance might be the right metric to detect whether all statements are trivial (or rather, irrelevant to the user query).
The point I'm making is from a software engineering perspective: The documentation states for the return value: "A tuple containing a value between 0.0 (not grounded) and 1.0 (grounded) and a dictionary containing the reasons for the evaluation."
Downstream code can therefore expect the first tuple element to be between 0 and 1. Likewise, downstream code will probably not expect NaN as the first element, which may lead to issues. If NaN is a desired possible output, I suggest documenting it.
Further, it may be desirable to update the return type hint. While float in Tuple[float, dict] (Tuple is deprecated, btw) also encompasses NaN, explicit is better than implicit. The pydantic type hint AllowInfNan is a suitable candidate to convey that downstream code must handle potential NaN values (alternatively, it's possible to construct more precise type hints via pydantic and have them enforced, such as "NumberBetween0and1AllowsInfNan", but this may be overkill).
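For downstream code, a stdlib-only sketch of what handling a possibly-NaN score could look like. GroundednessResult and safe_score are hypothetical names; the alias uses the built-in tuple[...] generic instead of the deprecated typing.Tuple, and the pydantic AllowInfNan route is not shown here:

```python
import math
from typing import Any

# Hypothetical alias: score in [0.0, 1.0], or NaN when nothing was evaluated.
GroundednessResult = tuple[float, dict[str, Any]]


def safe_score(result: GroundednessResult, default: float = 0.0) -> float:
    """Hypothetical downstream guard: map NaN to a default, then check range."""
    score, _reasons = result
    if math.isnan(score):
        # Caller decides what "no non-trivial statements" should mean.
        return default
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score outside documented range [0, 1]: {score}")
    return score


print(safe_score((float("nan"), {})))
print(safe_score((0.4, {})))
```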
Thanks @drooms-sandrus - I like your suggestion to document the option for NaN in the case that only trivial statements are evaluated, and to update the typehint. @daniel-huang-1230 what do you think about this change?