From d2281bbc04b1d26d5055c1d8967c6ac33938adfe Mon Sep 17 00:00:00 2001
From: Akar <67700732+akrztrk@users.noreply.github.com>
Date: Thu, 19 Dec 2024 20:34:25 +0100
Subject: [PATCH] upload new benchmark table (#1669)

* upload new benchmark table
* upload llm benchmark table
* update typo
* Update benchmark.md
---
 docs/en/benchmark.md     | 28 ++++++++++++++++++++++++++++
 docs/en/benchmark_llm.md | 27 ++++++++++++++++++++++++++-
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/docs/en/benchmark.md b/docs/en/benchmark.md
index 0a8ef32720..51effc2445 100644
--- a/docs/en/benchmark.md
+++ b/docs/en/benchmark.md
@@ -659,6 +659,34 @@ deid_pipeline = Pipeline().setStages([

PS: Pipelines with the same stages can still have different costs because of the number of layers in their NER models and the hardcoded regexes in Deidentification.

- ZeroShot Deidentification Pipelines Speed Comparison (a loading sketch follows the CPU table below)

  - **[clinical_deidentification](https://nlp.johnsnowlabs.com/2024/03/27/clinical_deidentification_en.html)** 2 NER, 1 clinical embedding, 13 Rule-based NER, 3 chunk merger, 1 Deidentification

  - **[clinical_deidentification_zeroshot_medium](https://nlp.johnsnowlabs.com/2024/12/04/clinical_deidentification_zeroshot_medium_en.html)** 1 ZeroShotNER, 18 Rule-based NER, 2 chunk merger

  - **[clinical_deidentification_docwise_medium_wip](https://nlp.johnsnowlabs.com/2024/12/03/clinical_deidentification_docwise_medium_wip_en.html)** 1 ZeroShotNER, 4 NER, 1 clinical embedding, 18 Rule-based NER, 3 chunk merger, 1 Deidentification

  - **[clinical_deidentification_zeroshot_large](https://nlp.johnsnowlabs.com/2024/12/04/clinical_deidentification_zeroshot_large_en.html)** 1 ZeroShotNER, 18 Rule-based NER, 2 chunk merger

  - **[clinical_deidentification_docwise_large_wip](https://nlp.johnsnowlabs.com/2024/12/03/clinical_deidentification_docwise_large_wip_en.html)** 1 ZeroShotNER, 4 NER, 1 clinical embedding, 18 Rule-based NER, 3 chunk merger, 1 Deidentification

- CPU Testing:

{:.table-model-big.db}

| partition | clinical_deidentification | clinical_deidentification<br>zeroshot_medium | clinical_deidentification<br>docwise_medium_wip | clinical_deidentification<br>zeroshot_large | clinical_deidentification<br>docwise_large_wip |
|-----------|---------------------------|----------------------------------------------|-------------------------------------------------|---------------------------------------------|------------------------------------------------|
| 4         | 295.8                     | 520.8                                        | 862.7                                           | 1537.9                                      | 1832.4                                         |
| 8         | 195.0                     | 345.6                                        | 577.0                                           | 1013.9                                      | 1228.3                                         |
| 16        | 133.3                     | 227.2                                        | 401.8                                           | 666.2                                       | 835.2                                          |
| 32        | 109.5                     | 160.9                                        | 305.3                                           | 456.9                                       | 614.7                                          |
| 64        | 92.0                      | 166.8                                        | 291.5                                           | 465.0                                       | 584.9                                          |
| 100       | 79.3                      | 174.1                                        | 274.8                                           | 495.3                                       | 587.8                                          |
| 1000      | 56.3                      | 181.4                                        | 270.7                                           | 502.4                                       | 556.4                                          |
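Below is a minimal, hedged sketch (not part of the patch itself) of how one of the pipelines compared above can be loaded and applied. It assumes a licensed Spark NLP for Healthcare session already bound to `spark`; the sample text, the `text` column name, and the 32-partition choice are illustrative assumptions.

```python
# Minimal sketch: load a pretrained deidentification pipeline and run it.
# Assumes a licensed Spark NLP for Healthcare session is available as `spark`.
from sparknlp.pretrained import PretrainedPipeline

deid_pipeline = PretrainedPipeline(
    "clinical_deidentification_zeroshot_medium", "en", "clinical/models"
)

# Quick single-document check on a plain string.
result = deid_pipeline.annotate(
    "Dr. John Lee examined Maria Garcia at Bethlehem Hospital on 2024-01-15."
)

# For throughput runs like the table above, set the partition count before
# transform(); per that table, the zero-shot pipelines stop gaining much
# beyond roughly 32 partitions, while clinical_deidentification keeps scaling.
df = spark.createDataFrame(
    [("Record for Maria Garcia, seen on 2024-01-15.",)], ["text"]
).repartition(32)
deidentified_df = deid_pipeline.model.transform(df)
```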
### Deidentification Pipelines Cost Benchmarks

diff --git a/docs/en/benchmark_llm.md b/docs/en/benchmark_llm.md
index 05fdc17a99..817532dcd9 100644
--- a/docs/en/benchmark_llm.md
+++ b/docs/en/benchmark_llm.md
@@ -10,6 +10,31 @@ show_nav: true
sidebar:
  nav: sparknlp-healthcare
---

## Medical Benchmarks

### Benchmarking

{:.table-model-big.db}

| Model           | Average | MedMCQA | MedQA  | MMLU<br>anatomy | MMLU<br>clinical<br>knowledge | MMLU<br>college<br>biology | MMLU<br>college<br>medicine | MMLU<br>medical<br>genetics | MMLU<br>professional<br>medicine | PubMedQA |
|-----------------|---------|---------|--------|-----------------|-------------------------------|----------------------------|-----------------------------|-----------------------------|----------------------------------|----------|
| jsl_medm_q4_v3  | 0.6884  | 0.6421  | 0.6889 | 0.7333          | 0.834                         | 0.8681                     | 0.7514                      | 0.9                         | 0.8493                           | 0.782    |
| jsl_medm_q8_v3  | 0.6947  | 0.6416  | 0.707  | 0.7556          | 0.8377                        | 0.9097                     | 0.7688                      | 0.9                         | 0.8713                           | 0.79     |
| jsl_medm_q16_v3 | 0.6964  | 0.6436  | 0.7117 | 0.7481          | 0.8453                        | 0.9028                     | 0.7688                      | 0.87                        | 0.8676                           | 0.794    |
| jsl_meds_q4_v3  | 0.5522  | 0.5104  | 0.48   | 0.6444          | 0.7472                        | 0.8333                     | 0.6532                      | 0.68                        | 0.6691                           | 0.752    |
| jsl_meds_q8_v3  | 0.5727  | 0.53    | 0.4933 | 0.6593          | 0.7623                        | 0.8681                     | 0.6301                      | 0.76                        | 0.7647                           | 0.762    |
| jsl_meds_q16_v3 | 0.5793  | 0.5482  | 0.4839 | 0.637           | 0.7585                        | 0.8403                     | 0.6532                      | 0.77                        | 0.7022                           | 0.766    |
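As a usage sketch (again not part of the patch), the models in this table can be pulled and queried through the `LLMLoader` interface that recent `sparknlp_jsl` releases provide; the prompt below is illustrative, and treating the q4/q8/q16 suffixes as quantization variants is an assumption.

```python
# Hedged sketch: load one of the benchmarked medical LLMs and query it.
# Assumes a sparknlp_jsl version that ships LLMLoader and a licensed `spark`.
from sparknlp_jsl.llm import LLMLoader

medm = LLMLoader(spark).pretrained("jsl_medm_q8_v3", "en", "clinical/models")

# The accuracies above come from multiple-choice questions of this shape.
prompt = (
    "Answer with the letter of the correct option only.\n"
    "Which vitamin deficiency causes scurvy?\n"
    "A) Vitamin A  B) Vitamin B12  C) Vitamin C  D) Vitamin D"
)
print(medm.generate(prompt))
```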
### Benchmark Summary

We evaluated six John Snow Labs LLMs across nine task categories: MedMCQA, MedQA, MMLU Anatomy, MMLU Clinical Knowledge, MMLU College Biology, MMLU College Medicine, MMLU Medical Genetics, MMLU Professional Medicine, and PubMedQA.

Each model's performance was measured by accuracy, reflecting how well it handled medical reasoning, clinical knowledge, and biomedical question answering.
@@ -204,4 +229,4 @@ GPT4o demonstrates strength in Clinical Relevance, especially in Biomedical and
Neutral and "None" ratings across categories highlight areas for further optimization for both models.

This analysis underscores the strengths of JSL-MedM in producing concise and factual outputs, while GPT4o shows a stronger contextual understanding in certain specialized tasks.
-</div>
\ No newline at end of file
+</div>