Commit 73b7b70 (parent c7b88b8): 13 changed files with 1,835 additions and 0 deletions.
docs/_posts/Cabir40/2024-10-07-clinical_deidentification_nameAugmented_v2_en.md (140 additions, 0 deletions)
---
layout: model
title: Clinical Deidentification Pipeline (Sentence Wise)
author: John Snow Labs
name: clinical_deidentification_nameAugmented_v2
date: 2024-10-07
tags: [deidentification, deid, en, licensed, clinical, pipeline, sent_wise]
task: [De-identification, Pipeline Healthcare]
language: en
edition: Healthcare NLP 5.4.0
spark_version: 3.4
supported: true
annotator: PipelineModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This pipeline can be used to deidentify PHI in medical texts. It processes the text sentence by sentence, and each detected PHI span is both masked (replaced with its entity label, e.g. `<NAME>`) and obfuscated (replaced with a fake value of the same type) in the output.
The pipeline can mask and obfuscate the following entities: `MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `NAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`, `ZIP`, `STATE`,
`COUNTRY`, `STREET`, `PHONE`, `LOCATION`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`,
and `IP`.

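The difference between the two output styles can be illustrated without Spark. The following minimal sketch applies both policies to the first sentence of the example in "How to use". It assumes the PHI spans are already known and do not overlap. Spans are located with `str.index`, and the fake values come from a hypothetical `SURROGATES` table. The pipeline's `DeIdentificationModel` generates its own replacement values, so this shows the idea only, not that annotator's implementation.

```python
from typing import Callable, Dict, List, Tuple

# A PHI span: (begin, end_exclusive, label). Assumes spans do not overlap.
Span = Tuple[int, int, str]

# Hypothetical fake values, for illustration only.
SURROGATES: Dict[str, str] = {
    "NAME": "Jane Roe",
    "LOCATION": "General Hospital",
    "CITY": "Springfield",
    "DATE": "01/01/2020",
}


def span_of(text: str, surface: str, label: str) -> Span:
    """Locate the first occurrence of `surface` and tag it with `label`."""
    begin = text.index(surface)
    return (begin, begin + len(surface), label)


def replace_spans(text: str, spans: List[Span], replacement: Callable[[str, str], str]) -> str:
    """Rebuild `text`, swapping each span for replacement(label, original_text)."""
    pieces, cursor = [], 0
    for begin, end, label in sorted(spans):
        pieces.append(text[cursor:begin])
        pieces.append(replacement(label, text[begin:end]))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def mask(text: str, spans: List[Span]) -> str:
    # Masking: the span becomes its entity label in angle brackets.
    return replace_spans(text, spans, lambda label, _original: f"<{label}>")


def obfuscate(text: str, spans: List[Span]) -> str:
    # Obfuscation: the span becomes a fake value of the same type;
    # fall back to the label rather than leaking the original if none is defined.
    return replace_spans(text, spans, lambda label, _original: SURROGATES.get(label, f"<{label}>"))


sentence = "Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024."
spans = [
    span_of(sentence, "John Lee", "NAME"),
    span_of(sentence, "Royal Medical Clinic", "LOCATION"),
    span_of(sentence, "Chicago", "CITY"),
    span_of(sentence, "11/05/2024", "DATE"),
]

print(mask(sentence, spans))
# Dr. <NAME>, from <LOCATION> in <CITY>, attended to the patient on <DATE>.
print(obfuscate(sentence, spans))
# Dr. Jane Roe, from General Hospital in Springfield, attended to the patient on 01/01/2020.
```

The masked line matches the first line of the masked output in Results. Consistent replacement of repeated names and realistic surrogate generation are outside the scope of this sketch.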
## Predicted Entities

`MEDICALRECORD`, `ORGANIZATION`, `PROFESSION`, `HEALTHPLAN`, `NAME`, `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `DATE`, `ZIP`, `STATE`,
`COUNTRY`, `STREET`, `PHONE`, `LOCATION`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `AGE`, `LOCATION_OTHER`, `DLN`, `CONTACT`, `ID`, `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`,
`IP`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.4.0_3.4_1728315719478.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_nameAugmented_v2_en_5.4.0_3.4_1728315719478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python
from sparknlp.pretrained import PretrainedPipeline

deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models")

text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
The patient’s medical record number is 56467890.
The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""

# fullAnnotate on a single string returns a list with one result dict per input text
deid_result = deid_pipeline.fullAnnotate(text)[0]

# Each "obfuscated" annotation is one sentence; its metadata holds the masked version
print('\n'.join([i.metadata['masked'] for i in deid_result['obfuscated']]))
print('\n'.join([i.result for i in deid_result['obfuscated']]))
```
```scala
import com.johnsnowlabs.nlp.Annotation
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val deid_pipeline = PretrainedPipeline("clinical_deidentification_nameAugmented_v2", "en", "clinical/models")

val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
The patient’s medical record number is 56467890.
The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""

// fullAnnotate on a single string returns a map from output column name to its annotations
val deid_result = deid_pipeline.fullAnnotate(text)

// Each "obfuscated" annotation is one sentence; its metadata holds the masked version
val obfuscated = deid_result("obfuscated").map(_.asInstanceOf[Annotation])
println(obfuscated.map(_.metadata("masked")).mkString("\n"))
println(obfuscated.map(_.result).mkString("\n"))
```
</div>

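The first `print` call in the example writes the masked text, in which every PHI span is replaced by its entity label in angle brackets. If you only need a per-label tally, for example to spot-check what a batch of notes contained, a regular expression over that string is enough. The following is a minimal sketch. It assumes the masked output is already a Python string. `count_labels` is a hypothetical helper, not part of Spark NLP. It would also count any angle-bracketed uppercase text that was already present in the input.

```python
import re
from collections import Counter

# Placeholders look like <NAME>, <MEDICALRECORD>, <LOCATION_OTHER> or <LOCATION-OTHER>.
PLACEHOLDER = re.compile(r"<([A-Z_-]+)>")


def count_labels(masked_text: str) -> Counter:
    """Count how often each entity-label placeholder occurs in masked text."""
    return Counter(PLACEHOLDER.findall(masked_text))


# Masked output copied from the Results section of this card.
masked = (
    "Dr. <NAME>, from <LOCATION> in <CITY>, attended to the patient on <DATE>.\n"
    "The patient’s medical record number is <MEDICALRECORD>.\n"
    "The patient, <NAME>, is <AGE> years old, her Contact number: <PHONE> ."
)

counts = count_labels(masked)
print(counts["NAME"])        # 2
print(sum(counts.values()))  # 8
```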
## Results

```bash
Masked with entity labels
------------------------------
Dr. <NAME>, from <LOCATION> in <CITY>, attended to the patient on <DATE>.
The patient’s medical record number is <MEDICALRECORD>.
The patient, <NAME>, is <AGE> years old, her Contact number: <PHONE> .

Obfuscated
------------------------------
Dr. Rhodia Cera, from 252 Mchenry St in UNTERLAND, attended to the patient on 18/06/2024.
The patient’s medical record number is 16109604.
The patient, Eulice Hickory, is 44 years old, her Contact number: 540-981-1914 .
```

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|clinical_deidentification_nameAugmented_v2|
|Type:|pipeline|
|Compatibility:|Healthcare NLP 5.4.0+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|1.9 GB|

## Included Models

- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- NerDLModel
- NerConverterInternalModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- ChunkMergeModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- RegexMatcherInternalModel
- ContextualParserModel
- ContextualParserModel
- TextMatcherInternalModel
- TextMatcherInternalModel
- TextMatcherInternalModel
- ContextualParserModel
- RegexMatcherInternalModel
- RegexMatcherInternalModel
- RegexMatcherInternalModel
- ChunkMergeModel
- ChunkMergeModel
- DeIdentificationModel
- Finisher
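The stages above combine chunks from several NER models, contextual parsers, regex matchers and text matchers. `ChunkMergeModel` stages then consolidate those chunks before `DeIdentificationModel` runs. To show why a merge step is needed when sources disagree, here is a simplified sketch of one possible overlap policy: keep the longer chunk, and when lengths are equal, prefer the higher-priority source. This policy, the `source` priority field and the candidate chunks are all hypothetical. `ChunkMergeModel` has its own configurable merge behavior, which this sketch does not reproduce. Chunk end offsets are inclusive, following the `begin`/`end` convention of Spark NLP annotations.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    begin: int   # first character offset (inclusive)
    end: int     # last character offset (inclusive), as in Spark NLP annotations
    label: str
    source: int  # hypothetical source priority: lower value wins ties


def make_chunk(text: str, surface: str, label: str, source: int) -> Chunk:
    begin = text.index(surface)
    return Chunk(begin, begin + len(surface) - 1, label, source)


def overlaps(a: Chunk, b: Chunk) -> bool:
    return a.begin <= b.end and b.begin <= a.end


def merge_chunks(chunks: List[Chunk]) -> List[Chunk]:
    """Greedy overlap resolution: longer chunks first, then higher-priority sources."""
    ranked = sorted(chunks, key=lambda c: (-(c.end - c.begin), c.source, c.begin))
    kept: List[Chunk] = []
    for chunk in ranked:
        if not any(overlaps(chunk, other) for other in kept):
            kept.append(chunk)
    return sorted(kept, key=lambda c: c.begin)


text = "Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024."
candidates = [
    make_chunk(text, "John Lee", "NAME", source=0),
    make_chunk(text, "Royal Medical", "ORGANIZATION", source=1),  # hypothetical partial match
    make_chunk(text, "Royal Medical Clinic", "LOCATION", source=0),
    make_chunk(text, "Chicago", "LOCATION", source=1),
    make_chunk(text, "Chicago", "CITY", source=0),
]

merged = merge_chunks(candidates)
print([(c.label, text[c.begin:c.end + 1]) for c in merged])
# [('NAME', 'John Lee'), ('LOCATION', 'Royal Medical Clinic'), ('CITY', 'Chicago')]
```

A greedy pass like this is simple and deterministic. Note that it discards the shorter chunk entirely instead of combining labels.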
docs/_posts/Meryem1425/2024-10-08-loinc_numeric_resolver_pipeline_en.md (112 additions, 0 deletions)
---
layout: model
title: Pipeline for Logical Observation Identifiers Names and Codes (LOINC-Numeric)
author: John Snow Labs
name: loinc_numeric_resolver_pipeline
date: 2024-10-08
tags: [licensed, en, clinical, loinc, pipeline, resolver]
task: [Entity Resolution, Pipeline Healthcare]
language: en
edition: Healthcare NLP 5.5.0
spark_version: 3.0
supported: true
annotator: PipelineModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This pipeline extracts `TEST` entities and maps them to their corresponding Logical Observation Identifiers Names and Codes (LOINC) codes using `sbiobert_base_cased_mli` sentence embeddings. It was prepared with numeric LOINC codes only, excluding the LOINC “Document Ontology” codes that start with the letter “L”. It also provides the official resolution of each code in brackets.

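Conceptually, the resolver embeds each extracted chunk, compares that embedding with embeddings of candidate LOINC concepts, and returns the closest codes together with their distances. These appear in the `all_codes` and `all_distances` columns of the Results. The toy sketch below shows this nearest-neighbor idea. It uses made-up three-dimensional vectors in place of `sbiobert_base_cased_mli` embeddings and cosine distance as the metric. The vectors, the metric and the tiny candidate index are assumptions for illustration. Only the codes are taken from the Results, where they are the top codes returned for `Hemoglobin`, `Hematocrit` and `Mean Corpuscular Volume`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def resolve(query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k candidate codes closest to the query embedding, nearest first."""
    scored = [(code, cosine_distance(query, vector)) for code, vector in index.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Made-up 3-d vectors keyed by codes from the Results table; real embeddings are far larger.
index = {
    "10346-5": [0.9, 0.1, 0.0],  # top code returned for "Hemoglobin"
    "32354-3": [0.1, 0.9, 0.1],  # top code returned for "Hematocrit"
    "30386-7": [0.0, 0.2, 0.9],  # top code returned for "Mean Corpuscular Volume"
}
chunk_embedding = [0.85, 0.15, 0.05]  # hypothetical embedding of the chunk "Hemoglobin"

for code, distance in resolve(chunk_embedding, index):
    print(code, round(distance, 4))
```

A production resolver searches a much larger code index and may apply additional ranking logic. This sketch only illustrates the distance-based lookup.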
## Predicted Entities

`TEST`, `Test_Result`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/loinc_numeric_resolver_pipeline_en_5.5.0_3.0_1728417134407.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/loinc_numeric_resolver_pipeline_en_5.5.0_3.0_1728417134407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python
from sparknlp.pretrained import PretrainedPipeline

loinc_pipeline = PretrainedPipeline("loinc_numeric_resolver_pipeline", "en", "clinical/models")

text = """A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination is unremarkable. Laboratory studies show the following:
Hemoglobin: 9.8 g/dL
Hematocrit: 32%
Mean Corpuscular Volume: 110 μm3"""

# fullAnnotate returns one result per input text; each maps output column names to annotations
result = loinc_pipeline.fullAnnotate(text)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val loinc_pipeline = PretrainedPipeline("loinc_numeric_resolver_pipeline", "en", "clinical/models")

val text = """A 65-year-old woman presents to the office with generalized fatigue for the last 4 months. She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. A physical examination is unremarkable. Laboratory studies show the following:
Hemoglobin: 9.8 g/dL
Hematocrit: 32%
Mean Corpuscular Volume: 110 μm3"""

val result = loinc_pipeline.fullAnnotate(text)
```
</div>

## Results

```bash
+-----------------------+-----+---+-------+-----------------------------------------------------------------+-----------------------------------------------------------------+-----------------------------------------------------------------+
|                 chunks|begin|end|   code|                                                        all_codes|                                                      resolutions|                                                    all_distances|
+-----------------------+-----+---+-------+-----------------------------------------------------------------+-----------------------------------------------------------------+-----------------------------------------------------------------+
| A physical examination|  443|464|55286-9|[55286-9, 11384-5, 29544-4, 29545-1, 32427-7, 11435-5, 29271-4...|[Physical exam by body areas [Physical exam by body areas], Ph...|[0.0713, 0.0913, 0.0910, 0.0961, 0.1114, 0.1119, 0.1153, 0.112...|
|     Laboratory studies|  483|500|26436-6|[26436-6, 52482-7, 11502-2, 34075-2, 100455-5, 85069-3, 101129...|[Laboratory studies (set) [Laboratory studies (set)], Laborato...|[0.0469, 0.0648, 0.0748, 0.0947, 0.0967, 0.1285, 0.1257, 0.129...|
|             Hemoglobin|  522|531|10346-5|[10346-5, 15082-1, 11559-2, 2030-5, 34618-9, 38896-7, 717-9, 1...|[Haemoglobin [Hemoglobin A [Units/volume] in Blood by Electrop...|[0.0214, 0.0356, 0.0563, 0.0654, 0.0886, 0.0891, 0.1005, 0.105...|
|             Hematocrit|  543|552|32354-3|[32354-3, 20570-8, 11153-4, 13508-7, 104874-3, 42908-4, 11559-...|[Hematocrit [Volume Fraction] of Arterial blood [Hematocrit [V...|[0.0590, 0.0625, 0.0675, 0.0737, 0.0890, 0.1035, 0.1060, 0.107...|
|Mean Corpuscular Volume|  559|581|30386-7|[30386-7, 101864-7, 20161-6, 18033-1, 19853-1, 101150-1, 59117...|[Erythrocyte mean corpuscular diameter [Length] [Erythrocyte m...|[0.1344, 0.1333, 0.1350, 0.1359, 0.1353, 0.1427, 0.1523, 0.147...|
+-----------------------+-----+---+-------+-----------------------------------------------------------------+-----------------------------------------------------------------+-----------------------------------------------------------------+
```

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|loinc_numeric_resolver_pipeline|
|Type:|pipeline|
|Compatibility:|Healthcare NLP 5.5.0+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|2.8 GB|

## Included Models

- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- ChunkMergeModel
- Chunk2Doc
- BertSentenceEmbeddings
- SentenceEntityResolverModel
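Because this pipeline is restricted to numeric LOINC codes, a cheap sanity check on its `code` output is to validate both the format and the check digit. The sketch below assumes the Mod 10 check-digit scheme described in the LOINC documentation, whose arithmetic matches the Luhn algorithm. `loinc_check_digit` and `is_valid_numeric_loinc` are hypothetical helpers. A valid check digit only shows that a code is well formed, not that it is the right code for the chunk. The five top-ranked codes in the Results table all pass this check.

```python
import re

# Numeric LOINC code: digits, a hyphen, and a single check digit (e.g. "10346-5").
NUMERIC_LOINC = re.compile(r"(\d{1,7})-(\d)")


def loinc_check_digit(body: str) -> int:
    """Mod 10 (Luhn-style) check digit for the digits before the hyphen."""
    total = 0
    # Walk right to left; the rightmost digit is doubled, then every second one.
    for position, char in enumerate(reversed(body)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return (10 - total % 10) % 10


def is_valid_numeric_loinc(code: str) -> bool:
    match = NUMERIC_LOINC.fullmatch(code)
    if match is None:
        return False  # e.g. part codes such as "LP14449-0" are not numeric
    body, check = match.groups()
    return loinc_check_digit(body) == int(check)


for code in ["55286-9", "26436-6", "10346-5", "32354-3", "30386-7", "LP14449-0", "10346-4"]:
    print(code, is_valid_numeric_loinc(code))
```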
docs/_posts/Meryem1425/2024-10-08-loinc_resolver_pipeline_en.md (113 additions, 0 deletions)
---
layout: model
title: Pipeline for Logical Observation Identifiers Names and Codes (LOINC)
author: John Snow Labs
name: loinc_resolver_pipeline
date: 2024-10-08
tags: [licensed, en, loinc, pipeline, resolver]
task: [Entity Resolution, Pipeline Healthcare]
language: en
edition: Healthcare NLP 5.5.0
spark_version: 3.0
supported: true
annotator: PipelineModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This pipeline extracts `Test` entities from clinical texts and maps them to their corresponding Logical Observation Identifiers Names and Codes (LOINC) codes using `sbiobert_base_cased_mli` Sentence BERT embeddings. The returned codes can include LOINC part codes (prefixed `LP`) as well as numeric codes, as shown in the Results.

## Predicted Entities

`loinc_code`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/loinc_resolver_pipeline_en_5.5.0_3.0_1728412941145.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/loinc_resolver_pipeline_en_5.5.0_3.0_1728412941145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python
from sparknlp.pretrained import PretrainedPipeline

ner_pipeline = PretrainedPipeline("loinc_resolver_pipeline", "en", "clinical/models")

# annotate() on a single string returns a dict that maps each output column to its string results
result = ner_pipeline.annotate("""A 65-year-old woman presents to the office with generalized fatigue for the last 4 months.
She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension
for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that
includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. Her vital signs are
within normal limits. A physical examination is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8 g/dL,
Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3""")
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val ner_pipeline = PretrainedPipeline("loinc_resolver_pipeline", "en", "clinical/models")

val result = ner_pipeline.annotate("""A 65-year-old woman presents to the office with generalized fatigue for the last 4 months.
She used to walk 1 mile each evening but now gets tired after 1-2 blocks. She has a history of Crohn disease and hypertension
for which she receives appropriate medications. She is married and lives with her husband. She eats a balanced diet that
includes chicken, fish, pork, fruits, and vegetables. She rarely drinks alcohol and denies tobacco use. Her vital signs are
within normal limits. A physical examination is unremarkable. Laboratory studies show the following: Hemoglobin: 9.8 g/dL,
Hematocrit: 32%, Mean Corpuscular Volume: 110 μm3""")
```
</div>

## Results

```bash
+-----------------------+-----+----------+----------------------------------------------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------+
|                  chunk|label|loinc_code|                                                            resolution|                                                             all_codes|                                                       all_resolutions|
+-----------------------+-----+----------+----------------------------------------------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------+
|            Vital signs| Test|    8716-3|                                             Vital signs [Vital signs]|8716-3:::LP133943-3:::LP204118-6:::80339-5:::34566-0:::29274-8:::95...|Vital signs [Vital signs]:::EMS vital signs [EMS vital signs]:::Vit...|
| A physical examination| Test|  LP7801-6|                                         Physical exam [Physical exam]|LP7801-6:::LP269267-3:::LP94385-9:::55286-9:::11384-5:::LP133607-4:...|Physical exam [Physical exam]:::Estimated from physical examination...|
|     Laboratory studies| Test| LP74124-6|                               Laboratory studies [Laboratory studies]|LP74124-6:::26436-6:::LP36394-2:::52482-7:::ATTACH.LAB:::11502-2:::...|Laboratory studies [Laboratory studies]:::Laboratory studies (set) ...|
|             Hemoglobin| Test| LP14449-0|                                               Hemoglobin [Hemoglobin]|LP14449-0:::LP30929-1:::LP16455-5:::10346-5:::LP16428-2:::LP14554-7...|Hemoglobin [Hemoglobin]:::Hemoglobin G [Hemoglobin G]:::Hemoglobin ...|
|             Hematocrit| Test| LP15101-6|                                               Hematocrit [Hematocrit]|LP15101-6:::LP308151-2:::32354-3:::20570-8:::11153-4:::LP74090-9:::...|Hematocrit [Hematocrit]:::Hematocrit/Hemoglobin [Hematocrit/Hemoglo...|
|Mean Corpuscular Volume| Test| LP15191-7|Erythrocyte mean corpuscular volume [Erythrocyte mean corpuscular v...|LP15191-7:::LP17688-0:::LP62885-6:::LP29006-1:::LP66395-2:::LP41110...|Erythrocyte mean corpuscular volume [Erythrocyte mean corpuscular v...|
+-----------------------+-----+----------+----------------------------------------------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------+
```

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|loinc_resolver_pipeline|
|Type:|pipeline|
|Compatibility:|Healthcare NLP 5.5.0+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|3.2 GB|

## Included Models

- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- ChunkMergeModel
- Chunk2Doc
- BertSentenceEmbeddings
- SentenceEntityResolverModel
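In the Results, the `all_codes` and `all_resolutions` columns pack the ranked candidates into single strings separated by `:::`. The candidates mix numeric codes, LOINC part codes such as `LP14449-0`, and other identifiers such as `ATTACH.LAB`. Below is a minimal sketch that unpacks these strings into (code, resolution) pairs and keeps only the numeric codes. It assumes the two strings are aligned position by position. The sample values are the first two candidates of the `Vital signs` row, because the rest of each cell is truncated in the table. `unpack_candidates` and `is_numeric_code` are hypothetical helpers, not Spark NLP functions.

```python
import re
from typing import List, Tuple

NUMERIC_CODE = re.compile(r"\d+-\d")  # e.g. "8716-3"; excludes "LP7801-6" and "ATTACH.LAB"


def unpack_candidates(all_codes: str, all_resolutions: str, sep: str = ":::") -> List[Tuple[str, str]]:
    """Pair each candidate code with its resolution text, position by position."""
    codes = all_codes.split(sep)
    resolutions = all_resolutions.split(sep)
    if len(codes) != len(resolutions):
        raise ValueError(f"{len(codes)} codes but {len(resolutions)} resolutions")
    return list(zip(codes, resolutions))


def is_numeric_code(code: str) -> bool:
    return NUMERIC_CODE.fullmatch(code) is not None


# First two candidates of the "Vital signs" row in the Results table.
pairs = unpack_candidates(
    "8716-3:::LP133943-3",
    "Vital signs [Vital signs]:::EMS vital signs [EMS vital signs]",
)
numeric_only = [(code, text) for code, text in pairs if is_numeric_code(code)]
print(numeric_only)  # [('8716-3', 'Vital signs [Vital signs]')]
```

The length check fails loudly rather than silently truncating, since `zip` alone would drop unmatched trailing items.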