34,964 changes: 34,964 additions & 0 deletions canary_analysis/evidence_token_overlap.json

Large diffs are not rendered by default.

32,762 changes: 32,762 additions & 0 deletions canary_analysis/per_bucket_agree_rates.json

Large diffs are not rendered by default.

1,509 changes: 1,509 additions & 0 deletions canary_analysis/results_dataset_canary/gemma3_12b/base_chemqa.jsonl

Large diffs are not rendered by default.

57 changes: 57 additions & 0 deletions canary_analysis/results_dataset_canary/gemma3_12b/base_chemqa.log

Large diffs are not rendered by default.

@yuehu99 add code. GitHub is for code.

@@ -0,0 +1,8 @@
{
"n": 1509,
"file": "/data/yue/Luis_data/ChemQA/eval/results_dataset_final/gemma3_12b/base_chemqa.jsonl",
"CIDEr": 0.007813678111936988,
"BLEU-1": 0.1742750660854814,
"BLEU-4": 0.01952830904850304,
"ROUGE-L": 0.14044466257598223
}
1,509 changes: 1,509 additions & 0 deletions canary_analysis/results_dataset_canary/gemma3_12b/ft_chemqa.jsonl

Large diffs are not rendered by default.

169 changes: 169 additions & 0 deletions canary_analysis/results_dataset_canary/gemma3_12b/ft_chemqa.log

Large diffs are not rendered by default.

@@ -0,0 +1,8 @@
{
"n": 1509,
"file": "/data/yue/Luis_data/ChemQA/eval/results_dataset_final/gemma3_12b/ft_chemqa.jsonl",
"CIDEr": 0.3017220394119498,
"BLEU-1": 0.39298091390123585,
"BLEU-4": 0.12178643557164551,
"ROUGE-L": 0.2854779966380466
}
1,509 changes: 1,509 additions & 0 deletions canary_analysis/results_dataset_canary/llama3_1_8b/base_chemqa.jsonl

Large diffs are not rendered by default.

52 changes: 52 additions & 0 deletions canary_analysis/results_dataset_canary/llama3_1_8b/base_chemqa.log

Large diffs are not rendered by default.

@@ -0,0 +1,8 @@
{
"n": 1509,
"file": "/data/yue/Luis_data/ChemQA/eval/results_dataset_final/llama3_1_8b/base_chemqa.jsonl",
"CIDEr": 0.007154326827312512,
"BLEU-1": 0.1739095396393665,
"BLEU-4": 0.024405300152486317,
"ROUGE-L": 0.14639087486269248
}
1,509 changes: 1,509 additions & 0 deletions canary_analysis/results_dataset_canary/llama3_1_8b/ft_chemqa.jsonl

Large diffs are not rendered by default.

55 changes: 55 additions & 0 deletions canary_analysis/results_dataset_canary/llama3_1_8b/ft_chemqa.log

Large diffs are not rendered by default.

@@ -0,0 +1,8 @@
{
"n": 1509,
"file": "/data/yue/Luis_data/ChemQA/eval/results_dataset_final/llama3_1_8b/ft_chemqa.jsonl",
"CIDEr": 0.275332119430238,
"BLEU-1": 0.3806338204175452,
"BLEU-4": 0.1132892398893474,
"ROUGE-L": 0.27707005740155305
}
1,509 changes: 1,509 additions & 0 deletions canary_analysis/results_dataset_canary/qwen2_5_14b/base_chemqa.jsonl

Large diffs are not rendered by default.

56 changes: 56 additions & 0 deletions canary_analysis/results_dataset_canary/qwen2_5_14b/base_chemqa.log

Large diffs are not rendered by default.

@@ -0,0 +1,8 @@
{
"n": 1509,
"file": "/data/yue/Luis_data/ChemQA/eval/results_dataset_final/qwen2_5_14b/base_chemqa.jsonl",
"CIDEr": 0.0020658071237885283,
"BLEU-1": 0.1690406209963065,
"BLEU-4": 0.02195909746742649,
"ROUGE-L": 0.15548318474879316
}
1,509 changes: 1,509 additions & 0 deletions canary_analysis/results_dataset_canary/qwen2_5_14b/ft_chemqa.jsonl

Large diffs are not rendered by default.

59 changes: 59 additions & 0 deletions canary_analysis/results_dataset_canary/qwen2_5_14b/ft_chemqa.log

Large diffs are not rendered by default.

@@ -0,0 +1,8 @@
{
"n": 1509,
"file": "/data/yue/Luis_data/ChemQA/eval/results_dataset_final/qwen2_5_14b/ft_chemqa.jsonl",
"CIDEr": 0.31432345504993275,
"BLEU-1": 0.39555647304724456,
"BLEU-4": 0.12470210604655615,
"ROUGE-L": 0.2863386757828791
}
12 changes: 12 additions & 0 deletions canary_analysis/results_dataset_canary/summary.md
@@ -0,0 +1,12 @@
# dataset_final Evaluation Results

Flattened from `dataset_final.jsonl` (compound-level QA pairs); reference = `phase2_answer`.

| model | variant | n | CIDEr | BLEU-1 | BLEU-4 | ROUGE-L |
|---|---|---|---|---|---|---|
| gemma3_12b | base | 1509 | 0.0078 | 0.1743 | 0.0195 | 0.1404 |
| gemma3_12b | finetuned | 1509 | 0.3017 | 0.3930 | 0.1218 | 0.2855 |
| llama3_1_8b | base | 1509 | 0.0072 | 0.1739 | 0.0244 | 0.1464 |
| llama3_1_8b | finetuned | 1509 | 0.2753 | 0.3806 | 0.1133 | 0.2771 |
| qwen2_5_14b | base | 1509 | 0.0021 | 0.1690 | 0.0220 | 0.1555 |
| qwen2_5_14b | finetuned | 1509 | 0.3143 | 0.3956 | 0.1247 | 0.2863 |
Comment on lines +5 to +12

Copilot AI Apr 24, 2026

The results table rows start with || (double pipe), so the Markdown table won't render correctly. Use a single leading | for each row.
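
One way to avoid hand-edited pipes is to generate the rows from the metrics JSON. The following is only a minimal sketch, assuming each metrics file carries the keys visible in this diff (`n`, `CIDEr`, `BLEU-1`, `BLEU-4`, `ROUGE-L`). The names `format_row`, `format_table`, and `example` are hypothetical and not part of this PR.

```python
# Hypothetical helpers; not taken from the PR's code.
METRICS = ("CIDEr", "BLEU-1", "BLEU-4", "ROUGE-L")


def format_row(model, variant, metrics):
    """Render one table row with a single leading and trailing pipe."""
    cells = [model, variant, str(metrics["n"])]
    # Four decimal places, matching the precision used in summary.md.
    cells += [f"{metrics[key]:.4f}" for key in METRICS]
    return "| " + " | ".join(cells) + " |"


def format_table(rows):
    """Render header, separator, and one row per (model, variant, metrics)."""
    header = "| model | variant | n | " + " | ".join(METRICS) + " |"
    separator = "|" + "---|" * (3 + len(METRICS))
    return "\n".join([header, separator] + [format_row(*row) for row in rows])


# Values copied from the gemma3_12b finetuned metrics shown in this diff.
example = {
    "n": 1509,
    "CIDEr": 0.3017220394119498,
    "BLEU-1": 0.39298091390123585,
    "BLEU-4": 0.12178643557164551,
    "ROUGE-L": 0.2854779966380466,
}
table = format_table([("gemma3_12b", "finetuned", example)])
```

Building each row from a list of cells means the leading pipe is written in exactly one place, so a doubled pipe cannot creep in per row.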

8 changes: 8 additions & 0 deletions canary_analysis/summary_ft_canary.md
@@ -0,0 +1,8 @@
| model | variant | n | CIDEr | BLEU-1 | BLEU-4 | ROUGE-L |
|---|---|---|---|---|---|---|
| gemma3_12b | base | 1509 | 0.0078 | 0.1743 | 0.0195 | 0.1404 |
| gemma3_12b | finetuned | 1509 | 0.3017 | 0.3930 | 0.1218 | 0.2855 |
| llama3_1_8b | base | 1509 | 0.0072 | 0.1739 | 0.0244 | 0.1464 |
| llama3_1_8b | finetuned | 1509 | 0.2753 | 0.3806 | 0.1133 | 0.2771 |
| qwen2_5_14b | base | 1509 | 0.0021 | 0.1690 | 0.0220 | 0.1555 |
| qwen2_5_14b | finetuned | 1509 | 0.3143 | 0.3956 | 0.1247 | 0.2863 |
Comment on lines +1 to +8

Copilot AI Apr 24, 2026

The Markdown table is prefixed with || (double pipe) on every row, which prevents GitHub from rendering it as a table. Use a single leading | per row (and remove the extra leading space) so the table renders correctly.
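
If the file is kept as-is rather than regenerated, the prefix could be repaired mechanically. This is a rough sketch, assuming the only defect is optional leading whitespace followed by one or more pipes at the start of a row. The function name `normalize_table_line` is hypothetical.

```python
import re


def normalize_table_line(line):
    """Collapse leading whitespace and repeated pipes into one leading pipe.

    Lines that do not start with a pipe (after optional whitespace) are
    returned unchanged. Assumption: the table has no intentionally empty
    first column written as a bare "||".
    """
    return re.sub(r"^\s*\|+", "|", line)


fixed_header = normalize_table_line("|| model | variant | n |")
fixed_separator = normalize_table_line(" ||---|---|---|")
untouched = normalize_table_line("# dataset_final Evaluation Results")
```

Applying this to every line of the Markdown file leaves headings and prose alone and touches only the table rows.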
