Models hub internal (#1542)
Meryem1425 authored Oct 9, 2024
1 parent c7b88b8 commit 73b7b70
Showing 13 changed files with 1,835 additions and 0 deletions.
@@ -0,0 +1,140 @@
---
layout: model
title: Clinical Deidentification Pipeline (Sentence Wise)
author: John Snow Labs
name: clinical_deidentification_nameAugmented_v2
date: 2024-10-07
tags: [deidentification, deid, en, licensed, clinical, pipeline, sent_wise]
task: [De-identification, Pipeline Healthcare]
language: en
edition: Healthcare NLP 5.4.0
spark_version: 3.4
supported: true
annotator: PipelineModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

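This pipeline de-identifies protected health information (PHI) in medical texts. It returns both a masked version of the text, in which each entity is replaced by its label, and an obfuscated version, in which each entity is replaced by a surrogate value.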
The pipeline can mask and obfuscate `MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `NAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`, `ZIP`, `STATE`,
`COUNTRY`, `STREET`, `PHONE`, `LOCATION`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`,
`IP` entities.

## Predicted Entities

`MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `NAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`, `ZIP`, `STATE`,
`COUNTRY`, `STREET`, `PHONE`, `LOCATION`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`,
`IP`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.4.0_3.4_1728315719478.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.4.0_3.4_1728315719478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python
from sparknlp.pretrained import PretrainedPipeline

deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models")

text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
The patient’s medical record number is 56467890.
The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""

deid_result = deid_pipeline.fullAnnotate(text)

# fullAnnotate returns a list with one dict per input text, so index the first element.
print(''.join([i.metadata['masked'] for i in deid_result[0]['obfuscated']]))
print(''.join([i.result for i in deid_result[0]['obfuscated']]))
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models")

val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
The patient’s medical record number is 56467890.
The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""

val deid_result = deid_pipeline.fullAnnotate(text)

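// Each element is an Annotation; read its metadata map and result field directly.
println(deid_result("obfuscated").collect { case a: com.johnsnowlabs.nlp.Annotation => a.metadata("masked") }.mkString(""))
println(deid_result("obfuscated").collect { case a: com.johnsnowlabs.nlp.Annotation => a.result }.mkString(""))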
```
</div>

## Results

```bash
Masked with entity labels
------------------------------
Dr. <NAME>, from <LOCATION> in <CITY>, attended to the patient on <DATE>.
The patient’s medical record number is <MEDICALRECORD>.
The patient, <NAME>, is <AGE> years old, her Contact number: <PHONE> .

Obfuscated
------------------------------
Dr. Rhodia Cera, from 252 Mchenry St in UNTERLAND, attended to the patient on 18/06/2024.
The patient’s medical record number is 16109604.
The patient, Eulice Hickory, is 44 years old, her Contact number: 540-981-1914 .
```
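
To make the masked output above concrete, here is a minimal sketch of label masking that runs without the pipeline. It assumes the entity surface forms and labels are already known. The `mask_with_labels` helper and the hand-written `entities` list are hypothetical illustrations, not part of the Healthcare NLP API; the real pipeline derives its spans from the NER and rule-based stages listed under Included Models.

```python
# Toy illustration only: replaces known entity spans with <LABEL> tags.
# `mask_with_labels` and the hand-written entity list are hypothetical.

def mask_with_labels(text, entities):
    """Replace the first occurrence of each (surface_form, label) pair with <label>."""
    spans = []
    for surface, label in entities:
        begin = text.find(surface)
        if begin == -1:
            continue  # surface form not present in the text; skip it
        spans.append((begin, begin + len(surface), label))
    # Rewrite right-to-left so that earlier offsets stay valid.
    for begin, end, label in sorted(spans, reverse=True):
        text = text[:begin] + f"<{label}>" + text[end:]
    return text

sentence = "Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024."
entities = [
    ("John Lee", "NAME"),
    ("Royal Medical Clinic", "LOCATION"),
    ("Chicago", "CITY"),
    ("11/05/2024", "DATE"),
]
masked = mask_with_labels(sentence, entities)
print(masked)
```

Obfuscation would follow the same span-replacement pattern, but it would substitute a surrogate value (for example a different name or date) for each span instead of its label.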

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|clinical_deidentification_nameAugmented_v2|
|Type:|pipeline|
|Compatibility:|Healthcare NLP 5.4.0+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|1.9 GB|

## Included Models

- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- NerDLModel
- NerConverterInternalModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- ChunkMergeModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- RegexMatcherInternalModel
- ContextualParserModel
- ContextualParserModel
- TextMatcherInternalModel
- TextMatcherInternalModel
- TextMatcherInternalModel
- ContextualParserModel
- RegexMatcherInternalModel
- RegexMatcherInternalModel
- RegexMatcherInternalModel
- ChunkMergeModel
- ChunkMergeModel
- DeIdentificationModel
- Finisher
@@ -0,0 +1,112 @@
---
layout: model
title: Pipeline for Logical Observation Identifiers Names and Codes (LOINC-Numeric)
author: John Snow Labs
name: loinc_numeric_resolver_pipeline
date: 2024-10-08
tags: [licensed, en, clinical, loinc, pipeline, resolver]
task: [Entity Resolution, Pipeline Healthcare]
language: en
edition: Healthcare NLP 5.5.0
spark_version: 3.0
supported: true
annotator: PipelineModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This pipeline extracts `TEST` entities and maps them to their corresponding Logical Observation Identifiers Names and Codes (LOINC) codes using `sbiobert_base_cased_mli` sentence embeddings. It was prepared with the numeric LOINC codes only and excludes the LOINC “Document Ontology” codes that start with the letter “L”. It also provides the official resolution of each code in brackets.
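
As a rough intuition for embedding-based resolution, the sketch below ranks candidate codes by cosine distance between a chunk vector and each candidate vector. The three-dimensional vectors are invented for illustration, and `cosine_distance` and `rank_candidates` are hypothetical helpers. How the pipeline's resolver stage actually computes and ranks distances is not described on this page, so treat this as an analogy rather than its implementation.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def rank_candidates(chunk_vec, candidates):
    """Sort (code, distance) pairs from closest to farthest."""
    scored = [(code, cosine_distance(chunk_vec, vec)) for code, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Codes appear in the Results table of this page; the vectors are made up.
candidates = {
    "10346-5": [0.9, 0.1, 0.0],
    "32354-3": [0.1, 0.9, 0.0],
    "30386-7": [0.0, 0.2, 0.9],
}
chunk_vec = [0.8, 0.2, 0.1]  # stand-in for a sentence embedding of one chunk

ranked = rank_candidates(chunk_vec, candidates)
best_code, best_distance = ranked[0]
print(best_code, round(best_distance, 4))
```

Real sentence embeddings have hundreds of dimensions, but the ranking idea stays the same: the candidate whose vector lies closest to the chunk vector comes first.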

## Predicted Entities

`TEST`, `Test_Result`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/loinc_numeric_resolver_pipeline_en_5.5.0_3.0_1728417134407.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/loinc_numeric_resolver_pipeline_en_5.5.0_3.0_1728417134407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python

from sparknlp.pretrained import PretrainedPipeline

loinc_pipeline = PretrainedPipeline("loinc_numeric_resolver_pipeline", "en", "clinical/models")

text = """A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination is unremarkable. Laboratory studies show the following:
Hemoglobin: 9.8 g/dL
Hematocrit: 32%
Mean Corpuscular Volume: 110 μm3"""

result = loinc_pipeline.fullAnnotate(text)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val loinc_pipeline = PretrainedPipeline("loinc_numeric_resolver_pipeline", "en", "clinical/models")

val text = """A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination is unremarkable. Laboratory studies show the following:
Hemoglobin: 9.8 g/dL
Hematocrit: 32%
Mean Corpuscular Volume: 110 μm3"""

val result = loinc_pipeline.fullAnnotate(text)

```
</div>

## Results

```bash

+-----------------------+-----+---+-------+-----------------------------------------------------------------+-----------------------------------------------------------------+-----------------------------------------------------------------+
| chunks|begin|end| code| all_codes| resolutions| all_distances|
+-----------------------+-----+---+-------+-----------------------------------------------------------------+-----------------------------------------------------------------+-----------------------------------------------------------------+
| A physical examination| 443|464|55286-9|[55286-9, 11384-5, 29544-4, 29545-1, 32427-7, 11435-5, 29271-4...|[Physical exam by body areas [Physical exam by body areas], Ph...|[0.0713, 0.0913, 0.0910, 0.0961, 0.1114, 0.1119, 0.1153, 0.112...|
| Laboratory studies| 483|500|26436-6|[26436-6, 52482-7, 11502-2, 34075-2, 100455-5, 85069-3, 101129...|[Laboratory studies (set) [Laboratory studies (set)], Laborato...|[0.0469, 0.0648, 0.0748, 0.0947, 0.0967, 0.1285, 0.1257, 0.129...|
| Hemoglobin| 522|531|10346-5|[10346-5, 15082-1, 11559-2, 2030-5, 34618-9, 38896-7, 717-9, 1...|[Haemoglobin [Hemoglobin A [Units/volume] in Blood by Electrop...|[0.0214, 0.0356, 0.0563, 0.0654, 0.0886, 0.0891, 0.1005, 0.105...|
| Hematocrit| 543|552|32354-3|[32354-3, 20570-8, 11153-4, 13508-7, 104874-3, 42908-4, 11559-...|[Hematocrit [Volume Fraction] of Arterial blood [Hematocrit [V...|[0.0590, 0.0625, 0.0675, 0.0737, 0.0890, 0.1035, 0.1060, 0.107...|
|Mean Corpuscular Volume| 559|581|30386-7|[30386-7, 101864-7, 20161-6, 18033-1, 19853-1, 101150-1, 59117...|[Erythrocyte mean corpuscular diameter [Length] [Erythrocyte m...|[0.1344, 0.1333, 0.1350, 0.1359, 0.1353, 0.1427, 0.1523, 0.147...|
+-----------------------+-----+---+-------+-----------------------------------------------------------------+-----------------------------------------------------------------+-----------------------------------------------------------------+

```
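
The `all_codes` and `all_distances` columns above appear to be parallel lists, so candidates can be post-filtered by distance. The sketch below is a minimal example under that assumption. `filter_by_distance` and the `0.06` cut-off are hypothetical choices for illustration, not pipeline parameters.

```python
# Values copied from the fully visible part of the "Hemoglobin" row above.
# The table truncates both lists, so only the first seven entries are used.
codes = ["10346-5", "15082-1", "11559-2", "2030-5", "34618-9", "38896-7", "717-9"]
distances = [0.0214, 0.0356, 0.0563, 0.0654, 0.0886, 0.0891, 0.1005]

def filter_by_distance(codes, distances, max_distance):
    """Keep (code, distance) pairs whose distance does not exceed max_distance."""
    return [(c, d) for c, d in zip(codes, distances) if d <= max_distance]

MAX_DISTANCE = 0.06  # hypothetical cut-off, chosen only for this example
kept = filter_by_distance(codes, distances, MAX_DISTANCE)
for code, distance in kept:
    print(code, distance)
```

A suitable cut-off depends on the data and would need to be validated against labeled examples.
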

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|loinc_numeric_resolver_pipeline|
|Type:|pipeline|
|Compatibility:|Healthcare NLP 5.5.0+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|2.8 GB|

## Included Models

- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- ChunkMergeModel
- Chunk2Doc
- BertSentenceEmbeddings
- SentenceEntityResolverModel
113 changes: 113 additions & 0 deletions docs/_posts/Meryem1425/2024-10-08-loinc_resolver_pipeline_en.md
@@ -0,0 +1,113 @@
---
layout: model
title: Pipeline for Logical Observation Identifiers Names and Codes (LOINC)
author: John Snow Labs
name: loinc_resolver_pipeline
date: 2024-10-08
tags: [licensed, en, loinc, pipeline, resolver]
task: [Entity Resolution, Pipeline Healthcare]
language: en
edition: Healthcare NLP 5.5.0
spark_version: 3.0
supported: true
annotator: PipelineModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This pipeline extracts `Test` entities from clinical texts and maps them to their corresponding Logical Observation Identifiers Names and Codes (LOINC) codes using `sbiobert_base_cased_mli` Sentence-BERT embeddings.

## Predicted Entities

`loinc_code`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/loinc_resolver_pipeline_en_5.5.0_3.0_1728412941145.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/loinc_resolver_pipeline_en_5.5.0_3.0_1728412941145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python

from sparknlp.pretrained import PretrainedPipeline

ner_pipeline = PretrainedPipeline("loinc_resolver_pipeline", "en", "clinical/models")

result = ner_pipeline.annotate("""A 65-year-old woman presents to the office with generalized fatigue for the last 4 months.
She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension
for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that
includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. Her vital signs are
within normal limits. A physical examination is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8 g/dL,
Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3""")

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val ner_pipeline = PretrainedPipeline("loinc_resolver_pipeline", "en", "clinical/models")

val result = ner_pipeline.annotate("""A 65-year-old woman presents to the office with generalized fatigue for the last 4 months.
She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension
for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that
includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. Her vital signs are
within normal limits. A physical examination is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8 g/dL,
Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3""")

```
</div>

## Results

```bash

+-----------------------+-----+----------+----------------------------------------------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------+
| chunk|label|loinc_code| resolution| all_codes| all_resolutions|
+-----------------------+-----+----------+----------------------------------------------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------+
| Vital signs| Test| 8716-3| Vital signs [Vital signs]|8716-3:::LP133943-3:::LP204118-6:::80339-5:::34566-0:::29274-8:::95...|Vital signs [Vital signs]:::EMS vital signs [EMS vital signs]:::Vit...|
| A physical examination| Test| LP7801-6| Physical exam [Physical exam]|LP7801-6:::LP269267-3:::LP94385-9:::55286-9:::11384-5:::LP133607-4:...|Physical exam [Physical exam]:::Estimated from physical examination...|
| Laboratory studies| Test| LP74124-6| Laboratory studies [Laboratory studies]|LP74124-6:::26436-6:::LP36394-2:::52482-7:::ATTACH.LAB:::11502-2:::...|Laboratory studies [Laboratory studies]:::Laboratory studies (set) ...|
| Hemoglobin| Test| LP14449-0| Hemoglobin [Hemoglobin]|LP14449-0:::LP30929-1:::LP16455-5:::10346-5:::LP16428-2:::LP14554-7...|Hemoglobin [Hemoglobin]:::Hemoglobin G [Hemoglobin G]:::Hemoglobin ...|
| Hematocrit| Test| LP15101-6| Hematocrit [Hematocrit]|LP15101-6:::LP308151-2:::32354-3:::20570-8:::11153-4:::LP74090-9:::...|Hematocrit [Hematocrit]:::Hematocrit/Hemoglobin [Hematocrit/Hemoglo...|
|Mean Corpuscular Volume| Test| LP15191-7|Erythrocyte mean corpuscular volume [Erythrocyte mean corpuscular v...|LP15191-7:::LP17688-0:::LP62885-6:::LP29006-1:::LP66395-2:::LP41110...|Erythrocyte mean corpuscular volume [Erythrocyte mean corpuscular v...|
+-----------------------+-----+----------+----------------------------------------------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------+

```
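
The `all_codes` column above joins candidates with `:::`. The sketch below splits such a string and separates numeric-style LOINC codes (digits, a hyphen, and a check digit) from the rest. `split_candidates` and `partition_codes` are hypothetical helper names, and the input is only the visible part of the "Laboratory studies" row, since the table truncates the full list.

```python
import re

# Visible prefix of the "Laboratory studies" row; the table cuts the rest off.
all_codes = "LP74124-6:::26436-6:::LP36394-2:::52482-7:::ATTACH.LAB:::11502-2"

NUMERIC_LOINC = re.compile(r"\d+-\d")  # digits, hyphen, single check digit

def split_candidates(joined, sep=":::"):
    """Split a ':::'-joined candidate string into a list, dropping empty items."""
    return [item for item in joined.split(sep) if item]

def partition_codes(codes):
    """Return (numeric_codes, other_codes), preserving the original order."""
    numeric = [c for c in codes if NUMERIC_LOINC.fullmatch(c)]
    other = [c for c in codes if not NUMERIC_LOINC.fullmatch(c)]
    return numeric, other

numeric, other = partition_codes(split_candidates(all_codes))
print("numeric:", numeric)
print("other:  ", other)
```

The numeric codes recovered this way are the ones that lead the same row in the `loinc_numeric_resolver_pipeline` results.
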

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|loinc_resolver_pipeline|
|Type:|pipeline|
|Compatibility:|Healthcare NLP 5.5.0+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|3.2 GB|

## Included Models

- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- ChunkMergeModel
- Chunk2Doc
- BertSentenceEmbeddings
- SentenceEntityResolverModel