Huge Difference in prediction confidence score #5

@prasadautomationtesting

For the same input, the base and large models flag the same span but report very different confidence scores: about 0.99 for the base model versus about 0.76 for the large model. I understand that model size can affect this, but why is the gap so large?

Base:

from lettucedetect.models.inference import HallucinationDetector

detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/lettucedect-base-modernbert-en-v1",
)

contexts = ["France is a country in Europe. The capital of France is Paris. The population of France is 67 million."]
question = "What is the capital of France? What is the population of France?"
answer = "The capital of France is Paris. The population of France is 69 million."

predictions = detector.predict(context=contexts, question=question, answer=answer, output_format="spans")
print("Predictions:", predictions)

Output:
Predictions: [{'start': 31, 'end': 71, 'confidence': 0.9891987442970276, 'text': ' The population of France is 69 million.'}]

Large:

from lettucedetect.models.inference import HallucinationDetector

detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/lettucedect-large-modernbert-en-v1",
)

contexts = ["France is a country in Europe. The capital of France is Paris. The population of France is 67 million."]
question = "What is the capital of France? What is the population of France?"
answer = "The capital of France is Paris. The population of France is 69 million."

predictions = detector.predict(context=contexts, question=question, answer=answer, output_format="spans")
print("Predictions:", predictions)

Output:
Predictions: [{'start': 31, 'end': 71, 'confidence': 0.7649378180503845, 'text': ' The population of France is 69 million.'}]
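
To make the gap concrete, here is a minimal sketch that lines up the two outputs above by character offsets and prints the difference in confidence. The span dictionaries are copied from the outputs above; `compare_spans` is a hypothetical helper written for this comparison and is not part of lettucedetect.

```python
# Span outputs copied verbatim from the two runs above.
answer = "The capital of France is Paris. The population of France is 69 million."
base_spans = [{"start": 31, "end": 71, "confidence": 0.9891987442970276,
               "text": " The population of France is 69 million."}]
large_spans = [{"start": 31, "end": 71, "confidence": 0.7649378180503845,
                "text": " The population of France is 69 million."}]


def compare_spans(first, second):
    """Pair spans that share (start, end) offsets and return their confidence gap.

    Hypothetical helper for illustration; not part of lettucedetect.
    """
    second_by_offsets = {(s["start"], s["end"]): s for s in second}
    rows = []
    for span in first:
        key = (span["start"], span["end"])
        match = second_by_offsets.get(key)
        if match is None:
            continue  # span flagged by only one of the two models
        rows.append({
            "start": span["start"],
            "end": span["end"],
            "first": span["confidence"],
            "second": match["confidence"],
            "gap": span["confidence"] - match["confidence"],
        })
    return rows


for row in compare_spans(base_spans, large_spans):
    # Offsets index into the answer string, so the slice recovers the flagged text.
    flagged = answer[row["start"]:row["end"]]
    print(f"{flagged!r}: base={row['first']:.4f} "
          f"large={row['second']:.4f} gap={row['gap']:.4f}")
```

Both models agree on which part of the answer is unsupported (offsets 31 to 71); only the reported confidence differs.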
