From 73b7b70d39df8c0c7e63b85929fe49642b9a3192 Mon Sep 17 00:00:00 2001 From: Vildan <64216738+Meryem1425@users.noreply.github.com> Date: Wed, 9 Oct 2024 18:21:33 -0400 Subject: [PATCH] Models hub internal (#1542) --- ...al_deidentification_nameAugmented_v2_en.md | 140 +++++++++++++ ...0-08-loinc_numeric_resolver_pipeline_en.md | 112 ++++++++++ .../2024-10-08-loinc_resolver_pipeline_en.md | 113 ++++++++++ ...0-07-sbiobertresolve_loinc_augmented_en.md | 180 ++++++++++++++++ .../2024-10-07-sbiobertresolve_loinc_en.md | 197 ++++++++++++++++++ ...obertresolve_loinc_numeric_augmented_en.md | 186 +++++++++++++++++ ...10-08-biolordresolve_loinc_augmented_en.md | 184 ++++++++++++++++ .../akrztrk/2024-10-08-jsl_medm_q16_v2_en.md | 134 ++++++++++++ .../akrztrk/2024-10-08-jsl_medm_q8_v3_en.md | 139 ++++++++++++ ...0-08-loinc_numeric_resolver_pipeline_en.md | 112 ++++++++++ .../2024-10-08-loinc_resolver_pipeline_en.md | 113 ++++++++++ ...0-08-loinc_numeric_resolver_pipeline_en.md | 112 ++++++++++ .../2024-10-08-loinc_resolver_pipeline_en.md | 113 ++++++++++ 13 files changed, 1835 insertions(+) create mode 100644 docs/_posts/Cabir40/2024-10-07-clinical_deidentification_nameAugmented_v2_en.md create mode 100644 docs/_posts/Meryem1425/2024-10-08-loinc_numeric_resolver_pipeline_en.md create mode 100644 docs/_posts/Meryem1425/2024-10-08-loinc_resolver_pipeline_en.md create mode 100644 docs/_posts/akrztrk/2024-10-07-sbiobertresolve_loinc_augmented_en.md create mode 100644 docs/_posts/akrztrk/2024-10-07-sbiobertresolve_loinc_en.md create mode 100644 docs/_posts/akrztrk/2024-10-07-sbiobertresolve_loinc_numeric_augmented_en.md create mode 100644 docs/_posts/akrztrk/2024-10-08-biolordresolve_loinc_augmented_en.md create mode 100644 docs/_posts/akrztrk/2024-10-08-jsl_medm_q16_v2_en.md create mode 100644 docs/_posts/akrztrk/2024-10-08-jsl_medm_q8_v3_en.md create mode 100644 docs/_posts/akrztrk/2024-10-08-loinc_numeric_resolver_pipeline_en.md create mode 100644 
docs/_posts/akrztrk/2024-10-08-loinc_resolver_pipeline_en.md create mode 100644 docs/_posts/bugeki/2024-10-08-loinc_numeric_resolver_pipeline_en.md create mode 100644 docs/_posts/bugeki/2024-10-08-loinc_resolver_pipeline_en.md diff --git a/docs/_posts/Cabir40/2024-10-07-clinical_deidentification_nameAugmented_v2_en.md b/docs/_posts/Cabir40/2024-10-07-clinical_deidentification_nameAugmented_v2_en.md new file mode 100644 index 0000000000..81d338bb44 --- /dev/null +++ b/docs/_posts/Cabir40/2024-10-07-clinical_deidentification_nameAugmented_v2_en.md @@ -0,0 +1,140 @@ +--- +layout: model +title: Clinical Deidentification Pipeline (Sentence Wise) +author: John Snow Labs +name: clinical_deidentification_nameAugmented_v2 +date: 2024-10-07 +tags: [deidentification, deid, en, licensed, clinical, pipeline, sent_wise] +task: [De-identification, Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.4.0 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline can be used to deidentify PHI information from medical texts. The PHI information will be masked and obfuscated in the resulting text. +The pipeline can mask and obfuscate `MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `NAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`, `ZIP`, `STATE`, +`COUNTRY`, `STREET`, `PHONE`, `LOCATION`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, +`IP` entities. 
+ +## Predicted Entities + +`MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `NAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`, `ZIP`, `STATE`, +`COUNTRY`, `STREET`, `PHONE`, `LOCATION`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, +`IP` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.4.0_3.4_1728315719478.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.4.0_3.4_1728315719478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +from sparknlp.pretrained import PretrainedPipeline + +deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models") + +text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. +The patient’s medical record number is 56467890. +The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 .""" + +deid_result = deid_pipeline.fullAnnotate(text)[0]  # fullAnnotate returns a list with one result per input text + +print(''.join([i.metadata['masked'] for i in deid_result['obfuscated']])) +print(''.join([i.result for i in deid_result['obfuscated']])) +``` +```scala +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models") + +val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. +The patient’s medical record number is 56467890. +The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 .""" + +val deid_result = deid_pipeline.fullAnnotate(text) + +println(deid_result("obfuscated").map(_.metadata("masked")).mkString("")) +println(deid_result("obfuscated").map(_.result).mkString("")) +``` +
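To make the idea of masking with entity labels concrete, here is a minimal, self-contained sketch. It is not the pipeline's implementation: the helper names `mask` and `span_of`, the shortened sample sentence, and the hand-built spans are all hypothetical, and it assumes non-overlapping spans with inclusive end offsets.

```python
from typing import List, Tuple

# (begin, end, label); end is assumed to be inclusive
Span = Tuple[int, int, str]


def span_of(text: str, phrase: str, label: str) -> Span:
    """Hypothetical helper: build a span for the first occurrence of phrase."""
    begin = text.index(phrase)
    return (begin, begin + len(phrase) - 1, label)


def mask(text: str, spans: List[Span]) -> str:
    """Replace each span with <LABEL>, assuming spans do not overlap."""
    out = text
    # Work from the rightmost span so earlier offsets stay valid.
    for begin, end, label in sorted(spans, reverse=True):
        out = out[:begin] + f"<{label}>" + out[end + 1:]
    return out


text = "Dr. John Lee attended to the patient on 11/05/2024."
spans = [span_of(text, "John Lee", "NAME"), span_of(text, "11/05/2024", "DATE")]
masked = mask(text, spans)
print(masked)
```

Obfuscation differs only in what is substituted: a realistic surrogate value of the same entity type instead of the label.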
+ +## Results + +```bash +Masked with entity labels +------------------------------ +Dr. , from in , attended to the patient on . +The patient’s medical record number is . +The patient, , is years old, her Contact number: . + +Obfuscated +------------------------------ +Dr. Rhodia Cera, from 252 Mchenry St in UNTERLAND, attended to the patient on 18/06/2024. +The patient’s medical record number is 16109604. +The patient, Eulice Hickory, is 44 years old, her Contact number: 540-981-1914 . +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinical_deidentification_nameAugmented_v2| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.4.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.9 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- NerDLModel +- NerConverterInternalModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- ContextualParserModel +- RegexMatcherInternalModel +- ContextualParserModel +- ContextualParserModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- TextMatcherInternalModel +- ContextualParserModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- RegexMatcherInternalModel +- ChunkMergeModel +- ChunkMergeModel +- DeIdentificationModel +- Finisher diff --git a/docs/_posts/Meryem1425/2024-10-08-loinc_numeric_resolver_pipeline_en.md b/docs/_posts/Meryem1425/2024-10-08-loinc_numeric_resolver_pipeline_en.md new file mode 100644 index 0000000000..eaea5062fe --- /dev/null +++ b/docs/_posts/Meryem1425/2024-10-08-loinc_numeric_resolver_pipeline_en.md @@ -0,0 +1,112 @@ +--- +layout: model +title: Pipeline for Logical 
Observation Identifiers Names and Codes (LOINC-Numeric) +author: John Snow Labs +name: loinc_numeric_resolver_pipeline +date: 2024-10-08 +tags: [licensed, en, clinical, loinc, pipeline, resolver] +task: [Entity Resolution, Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline extracts `TEST` entities and maps them to their corresponding Logical Observation Identifiers Names and Codes (LOINC) codes using `sbiobert_base_cased_mli` sentence embeddings. It was prepared with the numeric LOINC codes, without the inclusion of LOINC “Document Ontology” codes starting with the letter “L”. It also provides the official resolution of the codes within the brackets. + +## Predicted Entities + +`TEST`, `Test_Result` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/loinc_numeric_resolver_pipeline_en_5.5.0_3.0_1728417134407.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/loinc_numeric_resolver_pipeline_en_5.5.0_3.0_1728417134407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +from sparknlp.pretrained import PretrainedPipeline + +loinc_pipeline = PretrainedPipeline("loinc_numeric_resolver_pipeline", "en", "clinical/models") + +text = """A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination is unremarkable. Laboratory studies show the following: +Hemoglobin: 9.8 g/dL +Hematocrit: 32% +Mean Corpuscular Volume: 110 μm3""" + +result = loinc_pipeline.fullAnnotate(text) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val loinc_pipeline = PretrainedPipeline("loinc_numeric_resolver_pipeline", "en", "clinical/models") + +val text = """A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination is unremarkable. Laboratory studies show the following: +Hemoglobin: 9.8 g/dL +Hematocrit: 32% +Mean Corpuscular Volume: 110 μm3""" + +val result = loinc_pipeline.fullAnnotate(text) + +``` +
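The description of this pipeline says it covers only numeric LOINC codes and leaves out codes starting with the letter "L". The sketch below shows one way to tell the two apart in plain Python. The function names are hypothetical, and the Mod 10 (Luhn) check-digit rule is an assumption: it agrees with the numeric codes quoted in this post, but it is not taken from the pipeline itself.

```python
import re

# A numeric LOINC code is assumed to look like "<digits>-<check digit>".
NUMERIC_LOINC = re.compile(r"^(\d+)-(\d)$")


def mod10_check_digit(digits: str) -> int:
    """Luhn (Mod 10) check digit for the part before the hyphen."""
    total = 0
    # Walk right to left, doubling every other digit starting with the rightmost.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def is_numeric_loinc(code: str) -> bool:
    """True for codes such as 55286-9, False for LP7801-6 or ATTACH.LAB."""
    m = NUMERIC_LOINC.match(code)
    if m is None:
        return False
    return mod10_check_digit(m.group(1)) == int(m.group(2))


codes = ["55286-9", "LP7801-6", "26436-6", "ATTACH.LAB", "10346-5"]
numeric_only = [c for c in codes if is_numeric_loinc(c)]
print(numeric_only)
```

A filter like this can be useful when comparing output from the numeric pipeline with output from a resolver that also returns "L"-prefixed codes.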
+ +## Results + +```bash + ++-----------------------+-----+---+-------+-----------------------------------------------------------------+-----------------------------------------------------------------+-----------------------------------------------------------------+ +| chunks|begin|end| code| all_codes| resolutions| all_distances| ++-----------------------+-----+---+-------+-----------------------------------------------------------------+-----------------------------------------------------------------+-----------------------------------------------------------------+ +| A physical examination| 443|464|55286-9|[55286-9, 11384-5, 29544-4, 29545-1, 32427-7, 11435-5, 29271-4...|[Physical exam by body areas [Physical exam by body areas], Ph...|[0.0713, 0.0913, 0.0910, 0.0961, 0.1114, 0.1119, 0.1153, 0.112...| +| Laboratory studies| 483|500|26436-6|[26436-6, 52482-7, 11502-2, 34075-2, 100455-5, 85069-3, 101129...|[Laboratory studies (set) [Laboratory studies (set)], Laborato...|[0.0469, 0.0648, 0.0748, 0.0947, 0.0967, 0.1285, 0.1257, 0.129...| +| Hemoglobin| 522|531|10346-5|[10346-5, 15082-1, 11559-2, 2030-5, 34618-9, 38896-7, 717-9, 1...|[Haemoglobin [Hemoglobin A [Units/volume] in Blood by Electrop...|[0.0214, 0.0356, 0.0563, 0.0654, 0.0886, 0.0891, 0.1005, 0.105...| +| Hematocrit| 543|552|32354-3|[32354-3, 20570-8, 11153-4, 13508-7, 104874-3, 42908-4, 11559-...|[Hematocrit [Volume Fraction] of Arterial blood [Hematocrit [V...|[0.0590, 0.0625, 0.0675, 0.0737, 0.0890, 0.1035, 0.1060, 0.107...| +|Mean Corpuscular Volume| 559|581|30386-7|[30386-7, 101864-7, 20161-6, 18033-1, 19853-1, 101150-1, 59117...|[Erythrocyte mean corpuscular diameter [Length] [Erythrocyte m...|[0.1344, 0.1333, 0.1350, 0.1359, 0.1353, 0.1427, 0.1523, 0.147...| 
++-----------------------+-----+---+-------+-----------------------------------------------------------------+-----------------------------------------------------------------+-----------------------------------------------------------------+ + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|loinc_numeric_resolver_pipeline| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|2.8 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- Chunk2Doc +- BertSentenceEmbeddings +- SentenceEntityResolverModel diff --git a/docs/_posts/Meryem1425/2024-10-08-loinc_resolver_pipeline_en.md b/docs/_posts/Meryem1425/2024-10-08-loinc_resolver_pipeline_en.md new file mode 100644 index 0000000000..023f34fc98 --- /dev/null +++ b/docs/_posts/Meryem1425/2024-10-08-loinc_resolver_pipeline_en.md @@ -0,0 +1,113 @@ +--- +layout: model +title: Pipeline for Logical Observation Identifiers Names and Codes (LOINC) +author: John Snow Labs +name: loinc_resolver_pipeline +date: 2024-10-08 +tags: [licensed, en, loinc, pipeline, resolver] +task: [Entity Resolution, Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline extracts `Test` entities from clinical texts and maps them to their corresponding Logical Observation Identifiers Names and Codes (LOINC) codes using `sbiobert_base_cased_mli` Sentence Bert Embeddings. 
+ +## Predicted Entities + +`loinc_code` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/loinc_resolver_pipeline_en_5.5.0_3.0_1728412941145.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/loinc_resolver_pipeline_en_5.5.0_3.0_1728412941145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +from sparknlp.pretrained import PretrainedPipeline + +ner_pipeline = PretrainedPipeline("loinc_resolver_pipeline", "en", "clinical/models") + +result = ner_pipeline.annotate("""A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. + She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension + for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that + includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. Her vital signs are + within normal limits. A physical examination is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8 g/dL, + Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3""") + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val ner_pipeline = PretrainedPipeline("loinc_resolver_pipeline", "en", "clinical/models") + +val result = ner_pipeline.annotate("""A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. + She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension + for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that + includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. Her vital signs are + within normal limits. A physical examination is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8 g/dL, + Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3""") + +``` +
+ +## Results + +```bash + ++-----------------------+-----+----------+----------------------------------------------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------+ +| chunk|label|loinc_code| resolution| all_codes| all_resolutions| ++-----------------------+-----+----------+----------------------------------------------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------+ +| Vital signs| Test| 8716-3| Vital signs [Vital signs]|8716-3:::LP133943-3:::LP204118-6:::80339-5:::34566-0:::29274-8:::95...|Vital signs [Vital signs]:::EMS vital signs [EMS vital signs]:::Vit...| +| A physical examination| Test| LP7801-6| Physical exam [Physical exam]|LP7801-6:::LP269267-3:::LP94385-9:::55286-9:::11384-5:::LP133607-4:...|Physical exam [Physical exam]:::Estimated from physical examination...| +| Laboratory studies| Test| LP74124-6| Laboratory studies [Laboratory studies]|LP74124-6:::26436-6:::LP36394-2:::52482-7:::ATTACH.LAB:::11502-2:::...|Laboratory studies [Laboratory studies]:::Laboratory studies (set) ...| +| Hemoglobin| Test| LP14449-0| Hemoglobin [Hemoglobin]|LP14449-0:::LP30929-1:::LP16455-5:::10346-5:::LP16428-2:::LP14554-7...|Hemoglobin [Hemoglobin]:::Hemoglobin G [Hemoglobin G]:::Hemoglobin ...| +| Hematocrit| Test| LP15101-6| Hematocrit [Hematocrit]|LP15101-6:::LP308151-2:::32354-3:::20570-8:::11153-4:::LP74090-9:::...|Hematocrit [Hematocrit]:::Hematocrit/Hemoglobin [Hematocrit/Hemoglo...| +|Mean Corpuscular Volume| Test| LP15191-7|Erythrocyte mean corpuscular volume [Erythrocyte mean corpuscular v...|LP15191-7:::LP17688-0:::LP62885-6:::LP29006-1:::LP66395-2:::LP41110...|Erythrocyte mean corpuscular volume [Erythrocyte mean corpuscular v...| 
++-----------------------+-----+----------+----------------------------------------------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------+ + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|loinc_resolver_pipeline| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|3.2 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- Chunk2Doc +- BertSentenceEmbeddings +- SentenceEntityResolverModel diff --git a/docs/_posts/akrztrk/2024-10-07-sbiobertresolve_loinc_augmented_en.md b/docs/_posts/akrztrk/2024-10-07-sbiobertresolve_loinc_augmented_en.md new file mode 100644 index 0000000000..331c2278cc --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-07-sbiobertresolve_loinc_augmented_en.md @@ -0,0 +1,180 @@ +--- +layout: model +title: Sentence Entity Resolver for LOINC (sbiobert_base_cased_mli embeddings) +author: John Snow Labs +name: sbiobertresolve_loinc_augmented +date: 2024-10-07 +tags: [licensed, en, entity_resolution, loinc, clinical] +task: Entity Resolution +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: SentenceEntityResolverModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model maps extracted clinical NER entities to Logical Observation Identifiers Names and Codes (LOINC) codes using `sbiobert_base_cased_mli` Sentence Bert Embeddings. It was trained on an augmented version of the dataset used in previous LOINC resolver models. It also provides the official resolution of the codes within the brackets. 
+ +## Predicted Entities + +`loinc_code` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/sbiobertresolve_loinc_augmented_en_5.5.0_3.0_1728318394102.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/sbiobertresolve_loinc_augmented_en_5.5.0_3.0_1728318394102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +tokenizer = Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings") + +ner_model = MedicalNerModel.pretrained("ner_radiology", "en", "clinical/models") \ + .setInputCols(["sentence", "token", "embeddings"]) \ + .setOutputCol("ner") + +ner_converter = NerConverterInternal() \ + .setInputCols(["sentence", "token", "ner"]) \ + .setOutputCol("ner_chunk")\ + .setWhiteList(["Test"]) + +chunk2doc = Chunk2Doc()\ + .setInputCols("ner_chunk")\ + .setOutputCol("ner_chunk_doc") + +sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models")\ + .setInputCols(["ner_chunk_doc"])\ + .setOutputCol("sbert_embeddings")\ + .setCaseSensitive(False) + +resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_loinc_augmented","en", "clinical/models") \ + .setInputCols(["sbert_embeddings"]) \ + .setOutputCol("resolution")\ + .setDistanceFunction("EUCLIDEAN") + + +nlpPipeline = Pipeline(stages=[document_assembler, + sentence_detector, + tokenizer, + word_embeddings, + ner_model, + ner_converter, + chunk2doc, + sbert_embedder, + resolver]) + +data = spark.createDataFrame([["""The patient is a 22-year-old female with a history of obesity. 
She has a Body mass index (BMI) of 33.5 kg/m2, aspartate aminotransferase 64, and alanine aminotransferase 126."""]]).toDF("text") + +result = nlpPipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models") + .setInputCols(Array("sentence","token")) + .setOutputCol("embeddings") + +val ner_model = MedicalNerModel.pretrained("ner_radiology","en","clinical/models") + .setInputCols(Array("sentence","token","embeddings")) + .setOutputCol("ner") + +val ner_converter = new NerConverterInternal() + .setInputCols(Array("sentence","token","ner")) + .setOutputCol("ner_chunk") + .setWhiteList(Array("Test")) + +val chunk2doc = new Chunk2Doc() + .setInputCols("ner_chunk") + .setOutputCol("ner_chunk_doc") + +val sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models") + .setInputCols(Array("ner_chunk_doc")) + .setOutputCol("sbert_embeddings") + .setCaseSensitive(false) + +val resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_loinc_augmented","en","clinical/models") + .setInputCols(Array("sbert_embeddings")) + .setOutputCol("resolution") + .setDistanceFunction("EUCLIDEAN") + +val nlpPipeline = new Pipeline().setStages(Array( + document_assembler, + sentence_detector, + tokenizer, + word_embeddings, + ner_model, + ner_converter, + chunk2doc, + sbert_embedder, + resolver)) + +val data = Seq("""The patient is a 22-year-old female with a history of obesity. 
She has a Body mass index (BMI) of 33.5 kg/m2, aspartate aminotransferase 64, and alanine aminotransferase 126.""").toDF("text") + +val result = nlpPipeline.fit(data).transform(data) +``` +
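The resolver output for this model lists candidate codes and their auxiliary labels as single strings joined by `:::`, with the top candidate first. A minimal sketch of splitting those strings in plain Python follows. The function name `split_candidates` is hypothetical, and the sample strings are only the visible prefix of the BMI row in this post's results, not a full model output.

```python
from typing import List, Tuple

SEP = ":::"  # separator used between candidates in the results


def split_candidates(all_codes: str, aux_labels: str) -> List[Tuple[str, str]]:
    """Pair each candidate code with its auxiliary label, best match first."""
    codes = all_codes.split(SEP)
    labels = aux_labels.split(SEP)
    # zip stops at the shorter list, so a truncated field cannot raise.
    return list(zip(codes, labels))


# Truncated sample taken from the BMI row of the results.
all_codes = "39156-5:::100305-2:::LP266933-3:::100225-2"
aux_labels = "Observation:::Observation:::Observation:::Observation"

pairs = split_candidates(all_codes, aux_labels)
best_code, best_label = pairs[0]
print(best_code, best_label)
```

Keeping the candidates as ordered pairs makes it easy to fall back to the second or third code when the top match is not the one wanted.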
+ +## Results + +```bash ++--------------------------+-----+---+---------+----------+-------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+ +| chunk|begin|end|ner_label|loinc_code| description| resolutions| all_codes| aux_labels| ++--------------------------+-----+---+---------+----------+-------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+ +| BMI| 90| 92| Test| 39156-5| BMI [Body mass index (BMI) [Ratio]]|BMI [Body mass index (BMI) [Ratio]]:::BM [IDH1 gene exon ...|39156-5:::100305-2:::LP266933-3:::100225-2:::LP241982-0::...|Observation:::Observation:::Observation:::Observation:::O...| +|aspartate aminotransferase| 110|135| Test| LP15426-7|Aspartate aminotransferase [Aspartate aminotransferase]|Aspartate aminotransferase [Aspartate aminotransferase]::...|LP15426-7:::100739-2:::LP307348-5:::LP15333-5:::LP307326-...|Observation:::Observation:::Observation:::Observation:::O...| +| alanine aminotransferase| 145|168| Test| LP15333-5| Alanine aminotransferase [Alanine aminotransferase]|Alanine aminotransferase [Alanine aminotransferase]:::Ala...|LP15333-5:::LP307326-1:::100738-4:::LP307348-5:::LP15426-...|Observation:::Observation:::Observation:::Observation:::O...| ++--------------------------+-----+---+---------+----------+-------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sbiobertresolve_loinc_augmented| 
+|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence_embeddings]| +|Output Labels:|[loinc_code]| +|Language:|en| +|Size:|1.1 GB| +|Case sensitive:|false| + +## References +This model is trained with an augmented version of the LOINC v2.78 dataset released on 2024-08-06. diff --git a/docs/_posts/akrztrk/2024-10-07-sbiobertresolve_loinc_en.md b/docs/_posts/akrztrk/2024-10-07-sbiobertresolve_loinc_en.md new file mode 100644 index 0000000000..5086511b7c --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-07-sbiobertresolve_loinc_en.md @@ -0,0 +1,197 @@ +--- +layout: model +title: Sentence Entity Resolver for Logical Observation Identifiers Names and Codes (LOINC) codes +author: John Snow Labs +name: sbiobertresolve_loinc +date: 2024-10-07 +tags: [licensed, en, entity_resolution, loinc, clinical] +task: Entity Resolution +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: SentenceEntityResolverModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model maps extracted medical entities to Logical Observation Identifiers Names and Codes (LOINC) codes using `sbiobert_base_cased_mli` Sentence Bert Embeddings. +It also provides the official resolution of the codes within the brackets. + +## Predicted Entities + +`loinc_code` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/sbiobertresolve_loinc_en_5.5.0_3.0_1728321808601.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/sbiobertresolve_loinc_en_5.5.0_3.0_1728321808601.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +tokenizer = Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings") + +ner_model = MedicalNerModel.pretrained("ner_jsl", "en", "clinical/models") \ + .setInputCols(["sentence", "token", "embeddings"]) \ + .setOutputCol("ner") + +ner_converter = NerConverterInternal() \ + .setInputCols(["sentence", "token", "ner"]) \ + .setOutputCol("ner_chunk")\ + .setWhiteList(["Test"]) + +chunk2doc = Chunk2Doc()\ + .setInputCols("ner_chunk")\ + .setOutputCol("ner_chunk_doc") + +sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models")\ + .setInputCols(["ner_chunk_doc"])\ + .setOutputCol("sbert_embeddings")\ + .setCaseSensitive(False) + +resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_loinc","en", "clinical/models") \ + .setInputCols(["sbert_embeddings"]) \ + .setOutputCol("resolution")\ + .setDistanceFunction("EUCLIDEAN") + + +nlpPipeline = Pipeline(stages=[document_assembler, + sentence_detector, + tokenizer, + word_embeddings, + ner_model, + ner_converter, + chunk2doc, + sbert_embedder, + resolver]) + +data = spark.createDataFrame([["""A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. + She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension + for which she receives appropriate medications. She is married and lives with her husband. 
She eats a balanced diet that + includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination + is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8g/dL, Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3"""]]).toDF("text") + +result = nlpPipeline.fit(data).transform(data) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models") + .setInputCols(Array("sentence","token")) + .setOutputCol("embeddings") + +val ner_model = MedicalNerModel.pretrained("ner_jsl","en","clinical/models") + .setInputCols(Array("sentence","token","embeddings")) + .setOutputCol("ner") + +val ner_converter = new NerConverterInternal() + .setInputCols(Array("sentence","token","ner")) + .setOutputCol("ner_chunk") + .setWhiteList(Array("Test")) + +val chunk2doc = new Chunk2Doc() + .setInputCols("ner_chunk") + .setOutputCol("ner_chunk_doc") + +val sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models") + .setInputCols(Array("ner_chunk_doc")) + .setOutputCol("sbert_embeddings") + .setCaseSensitive(false) + +val resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_loinc","en","clinical/models") + .setInputCols(Array("sbert_embeddings")) + .setOutputCol("resolution") + .setDistanceFunction("EUCLIDEAN") + +val nlpPipeline = new Pipeline().setStages(Array( + document_assembler, + sentence_detector, + tokenizer, + word_embeddings, + ner_model, + ner_converter, + chunk2doc, + sbert_embedder, + resolver)) + +val data = Seq("""A 
65-year-old woman presents to the office with generalized fatigue for the last 4 months. + She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension + for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that + includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination + is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8g/dL, Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3""").toDF("text") + +val result = nlpPipeline.fit(data).transform(data) + +``` +
+ +## Results + +```bash + ++-----------------------+-----+---+---------+----------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+ +| chunk|begin|end|ner_label|loinc_code| description| resolutions| all_codes| aux_labels| ++-----------------------+-----+---+---------+----------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+ +| physical examination| 490|509| Test| 29544-4| Physical findings [Physical findings]|Physical findings [Physical findings]:::Physical exam by ...|29544-4:::55286-9:::11435-5:::11384-5:::29545-1:::8709-8:...|ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACT...| +| Laboratory studies| 528|545| Test| 26436-6| Laboratory studies (set) [Laboratory studies (set)]|Laboratory studies (set) [Laboratory studies (set)]:::Lab...|26436-6:::52482-7:::11502-2:::34075-2:::100455-5:::85069-...|ACTIVE:::DISCOURAGED:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:...| +| Hemoglobin| 567|576| Test| 10346-5|Haemoglobin [Hemoglobin A [Units/volume] in Blood by Elec...|Haemoglobin [Hemoglobin A [Units/volume] in Blood by Elec...|10346-5:::15082-1:::11559-2:::2030-5:::34618-9:::38896-7:...|ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACT...| +| Hematocrit| 590|599| Test| 32354-3|Hematocrit [Volume Fraction] of Arterial blood [Hematocri...|Hematocrit [Volume Fraction] of Arterial blood [Hematocri...|32354-3:::20570-8:::11153-4:::13508-7:::104874-3:::42908-...|ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACT...| +|Mean Corpuscular Volume| 607|629| Test| 30386-7|Erythrocyte mean corpuscular diameter [Length] [Erythrocy...|Erythrocyte mean corpuscular diameter [Length] 
[Erythrocy...|30386-7:::101864-7:::20161-6:::18033-1:::19853-1:::101150...|ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACTIVE:::ACT...| ++-----------------------+-----+---+---------+----------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+ + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sbiobertresolve_loinc| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence_embeddings]| +|Output Labels:|[loinc_code]| +|Language:|en| +|Size:|666.8 MB| +|Case sensitive:|false| + +## References +This model is trained with LOINC v2.78 dataset released in 2024-08-06. diff --git a/docs/_posts/akrztrk/2024-10-07-sbiobertresolve_loinc_numeric_augmented_en.md b/docs/_posts/akrztrk/2024-10-07-sbiobertresolve_loinc_numeric_augmented_en.md new file mode 100644 index 0000000000..bdf9cdbaa4 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-07-sbiobertresolve_loinc_numeric_augmented_en.md @@ -0,0 +1,186 @@ +--- +layout: model +title: Sentence Entity Resolver for LOINC (sbiobert_base_cased_mli embeddings) +author: John Snow Labs +name: sbiobertresolve_loinc_numeric_augmented +date: 2024-10-07 +tags: [licensed, en, entity_resolution, loinc, clinical] +task: Entity Resolution +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: SentenceEntityResolverModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model maps extracted clinical NER entities to Logical Observation Identifiers Names and Codes(LOINC) codes using `sbiobert_base_cased_mli` Sentence Bert Embeddings. It is trained with the numeric LOINC codes, without the inclusion of LOINC "Document Ontology" codes starting with the letter "L". 
It also provides the official resolution of the codes within the brackets. + +## Predicted Entities + +`loinc_code` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/sbiobertresolve_loinc_numeric_augmented_en_5.5.0_3.0_1728331598728.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/sbiobertresolve_loinc_numeric_augmented_en_5.5.0_3.0_1728331598728.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +tokenizer = Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings") + +ner_model = MedicalNerModel.pretrained("ner_radiology", "en", "clinical/models") \ + .setInputCols(["sentence", "token", "embeddings"]) \ + .setOutputCol("ner") + +ner_converter = NerConverterInternal() \ + .setInputCols(["sentence", "token", "ner"]) \ + .setOutputCol("ner_chunk")\ + .setWhiteList(["Test"]) + +chunk2doc = Chunk2Doc()\ + .setInputCols("ner_chunk")\ + .setOutputCol("ner_chunk_doc") + +sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models")\ + .setInputCols(["ner_chunk_doc"])\ + .setOutputCol("sbert_embeddings")\ + .setCaseSensitive(False) + +resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_loinc_numeric_augmented","en", "clinical/models") \ + .setInputCols(["sbert_embeddings"]) \ + .setOutputCol("resolution")\ + .setDistanceFunction("EUCLIDEAN") + + +nlpPipeline = Pipeline(stages=[document_assembler, + sentence_detector, + tokenizer, + word_embeddings, + ner_model, + ner_converter, + chunk2doc, + sbert_embedder, + resolver]) + +data = spark.createDataFrame([["""The patient is a 22-year-old female with a history of obesity. 
She has a Body mass index (BMI) of 33.5 kg/m2, aspartate aminotransferase 64, and alanine aminotransferase 126."""]]).toDF("text") + +result = nlpPipeline.fit(data).transform(data) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models") + .setInputCols(Array("sentence","token")) + .setOutputCol("embeddings") + +val ner_model = MedicalNerModel.pretrained("ner_radiology","en","clinical/models") + .setInputCols(Array("sentence","token","embeddings")) + .setOutputCol("ner") + +val ner_converter = new NerConverterInternal() + .setInputCols(Array("sentence","token","ner")) + .setOutputCol("ner_chunk") + .setWhiteList(Array("Test")) + +val chunk2doc = new Chunk2Doc() + .setInputCols("ner_chunk") + .setOutputCol("ner_chunk_doc") + +val sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models") + .setInputCols(Array("ner_chunk_doc")) + .setOutputCol("sbert_embeddings") + .setCaseSensitive(false) + +val resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_loinc_numeric_augmented","en","clinical/models") + .setInputCols(Array("sbert_embeddings")) + .setOutputCol("resolution") + .setDistanceFunction("EUCLIDEAN") + +val nlpPipeline = new Pipeline().setStages(Array( + document_assembler, + sentence_detector, + tokenizer, + word_embeddings, + ner_model, + ner_converter, + chunk2doc, + sbert_embedder, + resolver)) + +val data = Seq("""The patient is a 22-year-old female with a history of obesity. She has a Body mass index (BMI) of 33.5 kg/m2, aspartate aminotransferase 64, and alanine aminotransferase 126.""").toDF("text") + +val result = nlpPipeline.fit(data).transform(data) + +``` +
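The resolver packs its ranked candidates into `:::`-delimited strings, as the `resolutions`, `all_codes`, and `aux_labels` columns of the results table show. A minimal sketch of unpacking them in plain Python might look like the following. `split_candidates` is a hypothetical helper, not part of Spark NLP, and the descriptions in the sample call are placeholders because the table truncates the real ones.

```python
from typing import List, Tuple

SEPARATOR = ":::"  # delimiter seen in the resolver output columns


def split_candidates(all_codes: str, resolutions: str) -> List[Tuple[str, str]]:
    """Pair each candidate code with its description, keeping the original order."""
    codes = all_codes.split(SEPARATOR)
    names = resolutions.split(SEPARATOR)
    # zip stops at the shorter list, so columns of unequal length do not raise.
    return list(zip(codes, names))


# Codes are taken from the BMI row; descriptions are placeholders.
pairs = split_candidates(
    "39156-5:::100305-2:::100225-2",
    "first description:::second description:::third description",
)
print(pairs[0])
```

The first pair corresponds to the value reported in the `loinc_code` column; the remaining pairs are the alternative candidates.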
+ +## Results + +```bash + ++--------------------------+-----+---+---------+----------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+ +| chunk|begin|end|ner_label|loinc_code| description| resolutions| all_codes| aux_labels| ++--------------------------+-----+---+---------+----------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+ +| BMI| 90| 92| Test| 39156-5| BMI [Body mass index (BMI) [Ratio]]|BMI [Body mass index (BMI) [Ratio]]:::BM [IDH1 gene exon ...|39156-5:::100305-2:::100225-2:::38410-7:::72087-0:::33573...|Observation:::Observation:::Observation:::Observation:::O...| +|aspartate aminotransferase| 110|135| Test| 100739-2|Aspartate transaminase [Aspartate aminotransferase.macrom...|Aspartate transaminase [Aspartate aminotransferase.macrom...|100739-2:::43822-6:::77063-6:::53877-7:::100738-4:::21081...|Observation:::Observation:::Observation:::Observation:::O...| +| alanine aminotransferase| 145|168| Test| 100738-4|Alanine transaminase [Alanine aminotransferase.macromolec...|Alanine transaminase [Alanine aminotransferase.macromolec...|100738-4:::100739-2:::69383-8:::59245-1:::25302-1:::43822...|Observation:::Observation:::Observation:::Observation:::O...| ++--------------------------+-----+---+---------+----------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+ + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model 
Name:|sbiobertresolve_loinc_numeric_augmented| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence_embeddings]| +|Output Labels:|[loinc_code]| +|Language:|en| +|Size:|667.3 MB| +|Case sensitive:|false| + +## References +This model is trained with an augmented version of the LOINC v2.78 dataset released on 2024-08-06. diff --git a/docs/_posts/akrztrk/2024-10-08-biolordresolve_loinc_augmented_en.md b/docs/_posts/akrztrk/2024-10-08-biolordresolve_loinc_augmented_en.md new file mode 100644 index 0000000000..0d06bb51eb --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-08-biolordresolve_loinc_augmented_en.md @@ -0,0 +1,184 @@ +--- +layout: model +title: Sentence Entity Resolver for LOINC Codes - Augmented (mpnet_embeddings_biolord_2023_c embeddings) +author: John Snow Labs +name: biolordresolve_loinc_augmented +date: 2024-10-08 +tags: [licensed, en, biolord, loinc, entity_resolution, clinical] +task: Entity Resolution +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: SentenceEntityResolverModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model maps medical entities to Logical Observation Identifiers Names and Codes (LOINC) codes using `mpnet_embeddings_biolord_2023_c` embeddings. +It is trained on an augmented version of the dataset used in previous LOINC resolver models. It also provides the official resolution of the codes within the brackets. 
+ +## Predicted Entities + +`loinc_code` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/biolordresolve_loinc_augmented_en_5.5.0_3.0_1728403801539.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/biolordresolve_loinc_augmented_en_5.5.0_3.0_1728403801539.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentenceDetectorDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models") \ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("word_embeddings") + +ner = MedicalNerModel.pretrained("ner_radiology", "en", "clinical/models") \ + .setInputCols(["sentence", "token", "word_embeddings"]) \ + .setOutputCol("ner") + +ner_converter = NerConverterInternal()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk")\ + .setWhiteList(["Test"]) + +c2doc = Chunk2Doc()\ + .setInputCols("ner_chunk")\ + .setOutputCol("ner_chunk_doc") + +biolord_embedding = MPNetEmbeddings.pretrained("mpnet_embeddings_biolord_2023_c", "en")\ + .setInputCols(["ner_chunk_doc"])\ + .setOutputCol("embeddings")\ + .setCaseSensitive(False) + +loinc_resolver = SentenceEntityResolverModel.pretrained("biolordresolve_loinc_augmented","en", "clinical/models")\ + .setInputCols(["embeddings"]) \ + .setOutputCol("loinc_code")\ + .setDistanceFunction("EUCLIDEAN") + +resolver_pipeline = Pipeline( + stages = [ + document_assembler, + sentenceDetectorDL, + tokenizer, + word_embeddings, + ner, + ner_converter, + c2doc, + biolord_embedding, + loinc_resolver]) + +data = spark.createDataFrame([["""The patient is a 22-year-old female with a history of obesity. 
She has a Body mass index (BMI) of 33.5 kg/m2, aspartate aminotransferase 64, and alanine aminotransferase 126."""]]).toDF("text") + +result = resolver_pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetectorDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models") + .setInputCols(Array("sentence", "token")) + .setOutputCol("word_embeddings") + +val ner = MedicalNerModel.pretrained("ner_radiology", "en", "clinical/models") + .setInputCols(Array("sentence", "token", "word_embeddings")) + .setOutputCol("ner") + +val ner_converter = new NerConverterInternal() + .setInputCols(Array("sentence", "token", "ner")) + .setOutputCol("ner_chunk") + .setWhiteList(Array("Test")) + +val c2doc = new Chunk2Doc() + .setInputCols("ner_chunk") + .setOutputCol("ner_chunk_doc") + +val biolord_embedding = MPNetEmbeddings.pretrained("mpnet_embeddings_biolord_2023_c", "en") + .setInputCols(Array("ner_chunk_doc")) + .setOutputCol("embeddings") + .setCaseSensitive(false) + +val loinc_resolver = SentenceEntityResolverModel.pretrained("biolordresolve_loinc_augmented","en", "clinical/models") + .setInputCols(Array("embeddings")) + .setOutputCol("loinc_code") + .setDistanceFunction("EUCLIDEAN") + +val resolver_pipeline = new Pipeline().setStages(Array( + document_assembler, + sentenceDetectorDL, + tokenizer, + word_embeddings, + ner, + ner_converter, + c2doc, + biolord_embedding, + loinc_resolver)) + + +val data = Seq("""The patient is a 22-year-old female with a history of obesity. She has a Body mass index (BMI) of 33.5 kg/m2, aspartate aminotransferase 64, and alanine aminotransferase 126.""").toDF("text") + +val result = resolver_pipeline.fit(data).transform(data) +``` +
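Unlike the numeric resolvers in this patch, which are described as excluding codes that start with the letter "L", this resolver can return `LP`-prefixed codes as its top match (see the aspartate aminotransferase row in the results). If a downstream consumer only accepts numeric LOINC codes, one option is to walk the `all_codes` candidates and take the first one without that prefix. The sketch below assumes the `:::`-delimited format shown in the results table; `first_numeric_code` is a hypothetical helper, not a Spark NLP function.

```python
from typing import Optional


def first_numeric_code(all_codes: str, sep: str = ":::") -> Optional[str]:
    """Return the first candidate that does not start with "L", or None."""
    for code in all_codes.split(sep):
        # Skip empty fragments and "L"-prefixed codes such as LP15426-7.
        if code and not code.startswith("L"):
            return code
    return None


# Leading candidates copied from the aspartate aminotransferase row.
best = first_numeric_code("LP15426-7:::100739-2:::LP307348-5")
print(best)
```

Whether falling back to a lower-ranked candidate is acceptable depends on the use case; the skipped candidate was the closer match.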
+ +## Results + +```bash ++--------------------------+-----+---+---------+----------+-------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+ +| chunk|begin|end|ner_label|loinc_code| description| resolutions| all_codes| aux_labels| ++--------------------------+-----+---+---------+----------+-------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+ +| BMI| 90| 92| Test| 39156-5| BMI [Body mass index (BMI) [Ratio]]|BMI [Body mass index (BMI) [Ratio]]:::BMI Est [Body mass ...|39156-5:::89270-3:::94138-5:::59574-4:::LP415677-6:::5957...|Observation:::Observation:::Observation:::Observation:::M...| +|aspartate aminotransferase| 110|135| Test| LP15426-7|Aspartate aminotransferase [Aspartate aminotransferase]|Aspartate aminotransferase [Aspartate aminotransferase]::...|LP15426-7:::100739-2:::LP307348-5:::LP307326-1:::LP307433...|Observation:::Observation:::Observation:::Observation:::O...| +| alanine aminotransferase| 145|168| Test| LP15333-5| Alanine aminotransferase [Alanine aminotransferase]|Alanine aminotransferase [Alanine aminotransferase]:::L-a...|LP15333-5:::59245-1:::100738-4:::LP307326-1:::69383-8:::L...|Observation:::Observation:::Observation:::Observation:::O...| ++--------------------------+-----+---+---------+----------+-------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+------------------------------------------------------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biolordresolve_loinc_augmented| 
+|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[mpnet_embeddings]| +|Output Labels:|[loinc_code]| +|Language:|en| +|Size:|1.1 GB| +|Case sensitive:|false| + +## References +This model is trained with an augmented version of the LOINC v2.78 dataset released on 2024-08-06. diff --git a/docs/_posts/akrztrk/2024-10-08-jsl_medm_q16_v2_en.md b/docs/_posts/akrztrk/2024-10-08-jsl_medm_q16_v2_en.md new file mode 100644 index 0000000000..29bf1f6e71 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-08-jsl_medm_q16_v2_en.md @@ -0,0 +1,134 @@ +--- +layout: model +title: JSL_MedM_v2 (LLM - q16) +author: John Snow Labs +name: jsl_medm_q16_v2 +date: 2024-10-08 +tags: [en, licensed, clinical, medical, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM is trained to extract and link entities in a document. Users need to define an input schema as explained in the example section. For example, drug is defined as a list, which tells the model that the document may contain multiple drugs and that it has to extract all of them. Each drug has properties such as name and reaction. Since a drug has only one “name”, it is a string; there can be multiple reactions, so reaction is a list. Similarly, users can define any schema for any type of entity. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_medm_q16_v2_en_5.5.0_3.0_1728410388989.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_medm_q16_v2_en_5.5.0_3.0_1728410388989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_medm_q16_v2", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +prompt = """ +A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. +Which of the following is the best treatment for this patient? +A: Ampicillin +B: Ceftriaxone +C: Ciprofloxacin +D: Doxycycline +E: Nitrofurantoin +""" + +data = spark.createDataFrame([[prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_medm_q16_v2", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val prompt = """ +A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. 
She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. +Which of the following is the best treatment for this patient? +A: Ampicillin +B: Ceftriaxone +C: Ciprofloxacin +D: Doxycycline +E: Nitrofurantoin +""" + +val data = Seq(prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=false) + +``` +
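For multiple-choice prompts like the one above, the completion is free text, so the selected option has to be pulled out before it can be scored or stored. The sketch below is one simple way to do that with a regular expression, assuming the model echoes the option in the `LETTER: text` form used in the prompt. `extract_choice` is a hypothetical helper and the matching rule is an assumption, not documented behavior of `MedicalLLM`.

```python
import re
from typing import Optional


def extract_choice(completion: str, options: str = "ABCDE") -> Optional[str]:
    """Return the first option letter that is followed by a colon, or None."""
    pattern = rf"\b([{re.escape(options)}]):"
    match = re.search(pattern, completion)
    return match.group(1) if match else None


# First sentence of the sample completion shown in the Results section.
choice = extract_choice("The correct answer is E: Nitrofurantoin.")
print(choice)
```

A completion that restates several options before answering would need a stricter rule, so treat this as a starting point only.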
+ +## Results + +```bash + +The correct answer is E: Nitrofurantoin. + +The patient is presenting with symptoms of urinary tract infection (UTI), which is common during pregnancy. Nitrofurantoin is a first-line antibiotic for uncomplicated UTI during pregnancy. It is safe and effective in treating UTI during pregnancy and has been used for many years without any adverse effects on the fetus. + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_medm_q16_v2| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|12.7 GB| diff --git a/docs/_posts/akrztrk/2024-10-08-jsl_medm_q8_v3_en.md b/docs/_posts/akrztrk/2024-10-08-jsl_medm_q8_v3_en.md new file mode 100644 index 0000000000..a7fc498cd7 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-08-jsl_medm_q8_v3_en.md @@ -0,0 +1,139 @@ +--- +layout: model +title: JSL_MedM_v3 (LLM - q8) +author: John Snow Labs +name: jsl_medm_q8_v3 +date: 2024-10-08 +tags: [en, licensed, clinical, medical, llm, ner, tensorflow] +task: [Summarization, Question Answering, Named Entity Recognition] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MedicalLLM +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This LLM is trained to extract and link entities in a document. Users need to define an input schema as explained in the example section. For example, drug is defined as a list, which tells the model that the document may contain multiple drugs and that it has to extract all of them. Each drug has properties such as name and reaction. Since a drug has only one “name”, it is a string; there can be multiple reactions, so reaction is a list. Similarly, users can define any schema for any type of entity. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_medm_q8_v3_en_5.5.0_3.0_1728412912575.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_medm_q8_v3_en_5.5.0_3.0_1728412912575.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +medical_llm = MedicalLLM.pretrained("jsl_medm_q8_v3", "en", "clinical/models")\ + .setInputCols("document")\ + .setOutputCol("completions")\ + .setBatchSize(1)\ + .setNPredict(100)\ + .setUseChatTemplate(True)\ + .setTemperature(0) + + +pipeline = Pipeline( + stages = [ + document_assembler, + medical_llm +]) + +prompt = """ +A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. +Which of the following is the best treatment for this patient? +A: Ampicillin +B: Ceftriaxone +C: Ciprofloxacin +D: Doxycycline +E: Nitrofurantoin +""" + +data = spark.createDataFrame([[prompt]]).toDF("text") + +results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=False) + +``` +```scala + +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val medical_llm = MedicalLLM.pretrained("jsl_medm_q8_v3", "en", "clinical/models") + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(1) + .setNPredict(100) + .setUseChatTemplate(true) + .setTemperature(0) + + +val pipeline = new Pipeline().setStages(Array( + document_assembler, + medical_llm +)) + +val prompt = """ +A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. 
She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. +Which of the following is the best treatment for this patient? +A: Ampicillin +B: Ceftriaxone +C: Ciprofloxacin +D: Doxycycline +E: Nitrofurantoin +""" + +val data = Seq(prompt).toDF("text") + +val results = pipeline.fit(data).transform(data) + +results.select("completions").show(truncate=false) + +``` +
+ +## Results + +```bash + +The best treatment for a pregnant woman at 22 weeks gestation presenting with symptoms of a urinary tract infection (UTI) is: + +E: Nitrofurantoin + +Here's the rationale: + +- The patient's symptoms of burning upon urination, worsening over a day, and absence of costovertebral angle tenderness suggest a urinary tract infection (UTI). +- The patient is pregnant, which increases the risk of UTIs and their complications, such as pyelonephritis + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jsl_medm_q8_v3| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|15.0 GB| diff --git a/docs/_posts/akrztrk/2024-10-08-loinc_numeric_resolver_pipeline_en.md b/docs/_posts/akrztrk/2024-10-08-loinc_numeric_resolver_pipeline_en.md new file mode 100644 index 0000000000..901d13a920 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-08-loinc_numeric_resolver_pipeline_en.md @@ -0,0 +1,112 @@ +--- +layout: model +title: Pipeline for Logical Observation Identifiers Names and Codes (LOINC-Numeric) +author: John Snow Labs +name: loinc_numeric_resolver_pipeline +date: 2024-10-08 +tags: [licensed, en, clinical, loinc, pipeline, resolver] +task: [Entity Resolution, Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.2 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline extracts `TEST` entities and maps them to their corresponding Logical Observation Identifiers Names and Codes (LOINC) codes using `sbiobert_base_cased_mli` sentence embeddings. It was prepared with the numeric LOINC codes, without the inclusion of LOINC “Document Ontology” codes starting with the letter “L”. It also provides the official resolution of the codes within the brackets. 
+ +## Predicted Entities + +`TEST`, `Test_Result` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/loinc_numeric_resolver_pipeline_en_5.5.0_3.2_1728416293432.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/loinc_numeric_resolver_pipeline_en_5.5.0_3.2_1728416293432.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +from sparknlp.pretrained import PretrainedPipeline + +loinc_pipeline = PretrainedPipeline("loinc_numeric_resolver_pipeline", "en", "clinical/models") + +text = """A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination is unremarkable. Laboratory studies show the following: +Hemoglobin: 9.8 g/dL +Hematocrit: 32% +Mean Corpuscular Volume: 110 μm3""" + +result = loinc_pipeline.fullAnnotate(text) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val loinc_pipeline = PretrainedPipeline("loinc_numeric_resolver_pipeline", "en", "clinical/models") + +val text = """A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination is unremarkable. Laboratory studies show the following: +Hemoglobin: 9.8 g/dL +Hematocrit: 32% +Mean Corpuscular Volume: 110 μm3""" + +val result = loinc_pipeline.fullAnnotate(text) + +``` +
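The pipeline output lists candidate codes alongside their distances in the `all_codes` and `all_distances` columns. In the sample results the distances are not strictly ascending (for the first chunk, 0.0913 appears before 0.0910), so a consumer that needs a strict nearest-first ordering may want to re-sort. The following is a minimal sketch in plain Python, assuming the two columns have already been read into parallel lists; `rank_candidates` is a hypothetical helper, not part of Spark NLP.

```python
from typing import List, Sequence, Tuple


def rank_candidates(
    codes: Sequence[str], distances: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair codes with distances and sort by ascending distance."""
    # sorted() is stable, so candidates with equal distance keep their order.
    return sorted(zip(codes, distances), key=lambda pair: pair[1])


# First three candidates of the "A physical examination" row.
ranked = rank_candidates(
    ["55286-9", "11384-5", "29544-4"],
    [0.0713, 0.0913, 0.0910],
)
print(ranked)
```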
+ +## Results + +```bash + ++-----------------------+-----+---+-------+-----------------------------------------------------------------+-----------------------------------------------------------------+-----------------------------------------------------------------+ +| chunks|begin|end| code| all_codes| resolutions| all_distances| ++-----------------------+-----+---+-------+-----------------------------------------------------------------+-----------------------------------------------------------------+-----------------------------------------------------------------+ +| A physical examination| 443|464|55286-9|[55286-9, 11384-5, 29544-4, 29545-1, 32427-7, 11435-5, 29271-4...|[Physical exam by body areas [Physical exam by body areas], Ph...|[0.0713, 0.0913, 0.0910, 0.0961, 0.1114, 0.1119, 0.1153, 0.112...| +| Laboratory studies| 483|500|26436-6|[26436-6, 52482-7, 11502-2, 34075-2, 100455-5, 85069-3, 101129...|[Laboratory studies (set) [Laboratory studies (set)], Laborato...|[0.0469, 0.0648, 0.0748, 0.0947, 0.0967, 0.1285, 0.1257, 0.129...| +| Hemoglobin| 522|531|10346-5|[10346-5, 15082-1, 11559-2, 2030-5, 34618-9, 38896-7, 717-9, 1...|[Haemoglobin [Hemoglobin A [Units/volume] in Blood by Electrop...|[0.0214, 0.0356, 0.0563, 0.0654, 0.0886, 0.0891, 0.1005, 0.105...| +| Hematocrit| 543|552|32354-3|[32354-3, 20570-8, 11153-4, 13508-7, 104874-3, 42908-4, 11559-...|[Hematocrit [Volume Fraction] of Arterial blood [Hematocrit [V...|[0.0590, 0.0625, 0.0675, 0.0737, 0.0890, 0.1035, 0.1060, 0.107...| +|Mean Corpuscular Volume| 559|581|30386-7|[30386-7, 101864-7, 20161-6, 18033-1, 19853-1, 101150-1, 59117...|[Erythrocyte mean corpuscular diameter [Length] [Erythrocyte m...|[0.1344, 0.1333, 0.1350, 0.1359, 0.1353, 0.1427, 0.1523, 0.147...| 
++-----------------------+-----+---+-------+-----------------------------------------------------------------+-----------------------------------------------------------------+-----------------------------------------------------------------+ + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|loinc_numeric_resolver_pipeline| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|2.8 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- Chunk2Doc +- BertSentenceEmbeddings +- SentenceEntityResolverModel diff --git a/docs/_posts/akrztrk/2024-10-08-loinc_resolver_pipeline_en.md b/docs/_posts/akrztrk/2024-10-08-loinc_resolver_pipeline_en.md new file mode 100644 index 0000000000..4451defde6 --- /dev/null +++ b/docs/_posts/akrztrk/2024-10-08-loinc_resolver_pipeline_en.md @@ -0,0 +1,113 @@ +--- +layout: model +title: Pipeline for Logical Observation Identifiers Names and Codes (LOINC) +author: John Snow Labs +name: loinc_resolver_pipeline +date: 2024-10-08 +tags: [licensed, en, loinc, pipeline, resolver] +task: [Entity Resolution, Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.2 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline extracts `Test` entities from clinical texts and maps them to their corresponding Logical Observation Identifiers Names and Codes (LOINC) codes using `sbiobert_base_cased_mli` Sentence Bert Embeddings. 
+ +## Predicted Entities + +`loinc_code` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/loinc_resolver_pipeline_en_5.5.0_3.2_1728411999529.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/loinc_resolver_pipeline_en_5.5.0_3.2_1728411999529.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +from sparknlp.pretrained import PretrainedPipeline + +ner_pipeline = PretrainedPipeline("loinc_resolver_pipeline", "en", "clinical/models") + +result = ner_pipeline.annotate("""A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. + She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension + for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that + includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. Her vital signs are + within normal limits. A physical examination is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8 g/dL, + Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3""") + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val ner_pipeline = PretrainedPipeline("loinc_resolver_pipeline", "en", "clinical/models") + +val result = ner_pipeline.annotate("""A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. + She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension + for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that + includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. Her vital signs are + within normal limits. A physical examination is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8 g/dL, + Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3""") + +``` +
+ +## Results + +```bash + ++-----------------------+-----+----------+----------------------------------------------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------+ +| chunk|label|loinc_code| resolution| all_codes| all_resolutions| ++-----------------------+-----+----------+----------------------------------------------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------+ +| Vital signs| Test| 8716-3| Vital signs [Vital signs]|8716-3:::LP133943-3:::LP204118-6:::80339-5:::34566-0:::29274-8:::95...|Vital signs [Vital signs]:::EMS vital signs [EMS vital signs]:::Vit...| +| A physical examination| Test| LP7801-6| Physical exam [Physical exam]|LP7801-6:::LP269267-3:::LP94385-9:::55286-9:::11384-5:::LP133607-4:...|Physical exam [Physical exam]:::Estimated from physical examination...| +| Laboratory studies| Test| LP74124-6| Laboratory studies [Laboratory studies]|LP74124-6:::26436-6:::LP36394-2:::52482-7:::ATTACH.LAB:::11502-2:::...|Laboratory studies [Laboratory studies]:::Laboratory studies (set) ...| +| Hemoglobin| Test| LP14449-0| Hemoglobin [Hemoglobin]|LP14449-0:::LP30929-1:::LP16455-5:::10346-5:::LP16428-2:::LP14554-7...|Hemoglobin [Hemoglobin]:::Hemoglobin G [Hemoglobin G]:::Hemoglobin ...| +| Hematocrit| Test| LP15101-6| Hematocrit [Hematocrit]|LP15101-6:::LP308151-2:::32354-3:::20570-8:::11153-4:::LP74090-9:::...|Hematocrit [Hematocrit]:::Hematocrit/Hemoglobin [Hematocrit/Hemoglo...| +|Mean Corpuscular Volume| Test| LP15191-7|Erythrocyte mean corpuscular volume [Erythrocyte mean corpuscular v...|LP15191-7:::LP17688-0:::LP62885-6:::LP29006-1:::LP66395-2:::LP41110...|Erythrocyte mean corpuscular volume [Erythrocyte mean corpuscular v...| 
++-----------------------+-----+----------+----------------------------------------------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------+ + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|loinc_resolver_pipeline| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|3.2 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- Chunk2Doc +- BertSentenceEmbeddings +- SentenceEntityResolverModel diff --git a/docs/_posts/bugeki/2024-10-08-loinc_numeric_resolver_pipeline_en.md b/docs/_posts/bugeki/2024-10-08-loinc_numeric_resolver_pipeline_en.md new file mode 100644 index 0000000000..69e097f83e --- /dev/null +++ b/docs/_posts/bugeki/2024-10-08-loinc_numeric_resolver_pipeline_en.md @@ -0,0 +1,112 @@ +--- +layout: model +title: Pipeline for Logical Observation Identifiers Names and Codes (LOINC-Numeric) +author: John Snow Labs +name: loinc_numeric_resolver_pipeline +date: 2024-10-08 +tags: [licensed, en, clinical, loinc, pipeline, resolver] +task: [Entity Resolution, Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline extracts `TEST` entities and maps them to their corresponding Logical Observation Identifiers Names and Codes (LOINC) codes using `sbiobert_base_cased_mli` sentence embeddings. It was prepared with the numeric LOINC codes, without the inclusion of LOINC “Document Ontology” codes starting with the letter “L”.
It also provides the official resolution of the codes within the brackets. + +## Predicted Entities + +`TEST`, `Test_Result` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/loinc_numeric_resolver_pipeline_en_5.5.0_3.4_1728415230214.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/loinc_numeric_resolver_pipeline_en_5.5.0_3.4_1728415230214.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +from sparknlp.pretrained import PretrainedPipeline + +loinc_pipeline = PretrainedPipeline("loinc_numeric_resolver_pipeline", "en", "clinical/models") + +text = """A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination is unremarkable. Laboratory studies show the following: +Hemoglobin: 9.8 g/dL +Hematocrit: 32% +Mean Corpuscular Volume: 110 μm3""" + +result = loinc_pipeline.fullAnnotate(text) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val loinc_pipeline = PretrainedPipeline("loinc_numeric_resolver_pipeline", "en", "clinical/models") + +val text = """A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination is unremarkable. Laboratory studies show the following: +Hemoglobin: 9.8 g/dL +Hematocrit: 32% +Mean Corpuscular Volume: 110 μm3""" + +val result = loinc_pipeline.fullAnnotate(text) + +``` +
+ +## Results + +```bash + ++-----------------------+-----+---+-------+-----------------------------------------------------------------+-----------------------------------------------------------------+-----------------------------------------------------------------+ +| chunks|begin|end| code| all_codes| resolutions| all_distances| ++-----------------------+-----+---+-------+-----------------------------------------------------------------+-----------------------------------------------------------------+-----------------------------------------------------------------+ +| A physical examination| 443|464|55286-9|[55286-9, 11384-5, 29544-4, 29545-1, 32427-7, 11435-5, 29271-4...|[Physical exam by body areas [Physical exam by body areas], Ph...|[0.0713, 0.0913, 0.0910, 0.0961, 0.1114, 0.1119, 0.1153, 0.112...| +| Laboratory studies| 483|500|26436-6|[26436-6, 52482-7, 11502-2, 34075-2, 100455-5, 85069-3, 101129...|[Laboratory studies (set) [Laboratory studies (set)], Laborato...|[0.0469, 0.0648, 0.0748, 0.0947, 0.0967, 0.1285, 0.1257, 0.129...| +| Hemoglobin| 522|531|10346-5|[10346-5, 15082-1, 11559-2, 2030-5, 34618-9, 38896-7, 717-9, 1...|[Haemoglobin [Hemoglobin A [Units/volume] in Blood by Electrop...|[0.0214, 0.0356, 0.0563, 0.0654, 0.0886, 0.0891, 0.1005, 0.105...| +| Hematocrit| 543|552|32354-3|[32354-3, 20570-8, 11153-4, 13508-7, 104874-3, 42908-4, 11559-...|[Hematocrit [Volume Fraction] of Arterial blood [Hematocrit [V...|[0.0590, 0.0625, 0.0675, 0.0737, 0.0890, 0.1035, 0.1060, 0.107...| +|Mean Corpuscular Volume| 559|581|30386-7|[30386-7, 101864-7, 20161-6, 18033-1, 19853-1, 101150-1, 59117...|[Erythrocyte mean corpuscular diameter [Length] [Erythrocyte m...|[0.1344, 0.1333, 0.1350, 0.1359, 0.1353, 0.1427, 0.1523, 0.147...| 
++-----------------------+-----+---+-------+-----------------------------------------------------------------+-----------------------------------------------------------------+-----------------------------------------------------------------+ + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|loinc_numeric_resolver_pipeline| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|2.8 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- Chunk2Doc +- BertSentenceEmbeddings +- SentenceEntityResolverModel diff --git a/docs/_posts/bugeki/2024-10-08-loinc_resolver_pipeline_en.md b/docs/_posts/bugeki/2024-10-08-loinc_resolver_pipeline_en.md new file mode 100644 index 0000000000..107b269968 --- /dev/null +++ b/docs/_posts/bugeki/2024-10-08-loinc_resolver_pipeline_en.md @@ -0,0 +1,113 @@ +--- +layout: model +title: Pipeline for Logical Observation Identifiers Names and Codes (LOINC) +author: John Snow Labs +name: loinc_resolver_pipeline +date: 2024-10-08 +tags: [licensed, en, loinc, pipeline, resolver] +task: [Entity Resolution, Pipeline Healthcare] +language: en +edition: Healthcare NLP 5.5.0 +spark_version: 3.4 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline extracts `Test` entities from clinical texts and maps them to their corresponding Logical Observation Identifiers Names and Codes (LOINC) codes using `sbiobert_base_cased_mli` Sentence Bert Embeddings. 
+ +## Predicted Entities + +`loinc_code` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/loinc_resolver_pipeline_en_5.5.0_3.4_1728411236477.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/loinc_resolver_pipeline_en_5.5.0_3.4_1728411236477.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python + +from sparknlp.pretrained import PretrainedPipeline + +ner_pipeline = PretrainedPipeline("loinc_resolver_pipeline", "en", "clinical/models") + +result = ner_pipeline.annotate("""A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. + She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension + for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that + includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. Her vital signs are + within normal limits. A physical examination is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8 g/dL, + Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3""") + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val ner_pipeline = PretrainedPipeline("loinc_resolver_pipeline", "en", "clinical/models") + +val result = ner_pipeline.annotate("""A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. + She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension + for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that + includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. Her vital signs are + within normal limits. A physical examination is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8 g/dL, + Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3""") + +``` +
+ +## Results + +```bash + ++-----------------------+-----+----------+----------------------------------------------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------+ +| chunk|label|loinc_code| resolution| all_codes| all_resolutions| ++-----------------------+-----+----------+----------------------------------------------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------+ +| Vital signs| Test| 8716-3| Vital signs [Vital signs]|8716-3:::LP133943-3:::LP204118-6:::80339-5:::34566-0:::29274-8:::95...|Vital signs [Vital signs]:::EMS vital signs [EMS vital signs]:::Vit...| +| A physical examination| Test| LP7801-6| Physical exam [Physical exam]|LP7801-6:::LP269267-3:::LP94385-9:::55286-9:::11384-5:::LP133607-4:...|Physical exam [Physical exam]:::Estimated from physical examination...| +| Laboratory studies| Test| LP74124-6| Laboratory studies [Laboratory studies]|LP74124-6:::26436-6:::LP36394-2:::52482-7:::ATTACH.LAB:::11502-2:::...|Laboratory studies [Laboratory studies]:::Laboratory studies (set) ...| +| Hemoglobin| Test| LP14449-0| Hemoglobin [Hemoglobin]|LP14449-0:::LP30929-1:::LP16455-5:::10346-5:::LP16428-2:::LP14554-7...|Hemoglobin [Hemoglobin]:::Hemoglobin G [Hemoglobin G]:::Hemoglobin ...| +| Hematocrit| Test| LP15101-6| Hematocrit [Hematocrit]|LP15101-6:::LP308151-2:::32354-3:::20570-8:::11153-4:::LP74090-9:::...|Hematocrit [Hematocrit]:::Hematocrit/Hemoglobin [Hematocrit/Hemoglo...| +|Mean Corpuscular Volume| Test| LP15191-7|Erythrocyte mean corpuscular volume [Erythrocyte mean corpuscular v...|LP15191-7:::LP17688-0:::LP62885-6:::LP29006-1:::LP66395-2:::LP41110...|Erythrocyte mean corpuscular volume [Erythrocyte mean corpuscular v...| 
++-----------------------+-----+----------+----------------------------------------------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------+ + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|loinc_resolver_pipeline| +|Type:|pipeline| +|Compatibility:|Healthcare NLP 5.5.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|3.2 GB| + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- TokenizerModel +- WordEmbeddingsModel +- MedicalNerModel +- NerConverterInternalModel +- MedicalNerModel +- NerConverterInternalModel +- ChunkMergeModel +- Chunk2Doc +- BertSentenceEmbeddings +- SentenceEntityResolverModel