BFD-3664: Pipeline job for SAMHSA tag backfill. #2506

dondevun · 2024-12-03T16:08:01Z

JIRA Ticket:
BFD-3664

What Does This PR Do?

This PR adds a pipeline job to backfill the SAMHSA tags tables. The job can run concurrently with the RDA pipeline job, but it is disabled on CCW pipeline instances.

The job processes every table that could contain SAMHSA codes concurrently, each with its own entityManager. Some tradeoffs were made for performance; the biggest is that the job does not construct entities from the SQL query results. Instead, each query returns arrays of objects, and the code must know the type at each array position. This is not ideal: if the array elements are not read in the correct order, the result can be a ClassCastException. However, without an entity class to map the columns to types, this is unavoidable for this implementation.
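One way to contain the positional-array risk described above is to convert each row in a single place and check every position before casting. The following is a minimal sketch under stated assumptions, not the code in this PR: the `ClaimRow` record, its three columns, and the `expect` helper are hypothetical names chosen for illustration.

```java
import java.time.LocalDate;

public class RowMappingSketch {
  /** Hypothetical typed view of one result row; the columns are illustrative only. */
  record ClaimRow(String claimId, String code, LocalDate fromDate) {}

  /** Converts a positional row into a ClaimRow, checking each position before casting. */
  static ClaimRow toClaimRow(Object[] row) {
    if (row.length != 3) {
      throw new IllegalArgumentException("Expected 3 columns but got " + row.length);
    }
    return new ClaimRow(
        expect(row, 0, String.class),
        expect(row, 1, String.class),
        expect(row, 2, LocalDate.class));
  }

  /** Returns row[index] as the given type, or fails with a message naming the column. */
  static <T> T expect(Object[] row, int index, Class<T> type) {
    Object value = row[index];
    if (value != null && !type.isInstance(value)) {
      throw new IllegalArgumentException(
          "Column " + index + " should be " + type.getSimpleName()
              + " but was " + value.getClass().getSimpleName());
    }
    return type.cast(value); // cast(null) returns null, so nullable columns pass through
  }

  public static void main(String[] args) {
    // Illustrative values only.
    Object[] row = {"12345", "F10.10", LocalDate.of(2024, 1, 15)};
    System.out.println(toClaimRow(row));
  }
}
```

A mismatch in column order then fails with a message naming the offending position instead of a bare ClassCastException somewhere downstream.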

What Should Reviewers Watch For?

If you're reviewing this PR, please check for these things in particular:

What Security Implications Does This PR Have?

Please indicate if this PR does any of the following:

Adds any new software dependencies
Modifies any security controls
Adds new transmission or storage of data
Any other changes that could possibly affect security?
I have considered the above security implications as it relates to this PR. (If one or more of the above apply, it cannot be merged without the ISSO or team security engineer's (@sb-benohe) approval.)

Validation

Have you fully verified and tested these changes? Is the acceptance criteria met? Please provide reproducible testing instructions, code snippets, or screenshots as applicable.

aschey-forpeople · 2024-12-06T17:51:04Z

apps/bfd-pipeline/bfd-pipeline-app/src/main/java/gov/cms/bfd/pipeline/app/AppConfiguration.java

+      ConfigLoader config, boolean ccwPipelineEnabled) {
+    boolean enabled = config.booleanOption(SSM_PATH_SAMHSA_BACKFILL_ENABLED).orElse(false);
+    // We don't want to run if we're on a CCW Pipeline instance
+    if (!enabled || ccwPipelineEnabled) {


It is a bit confusing that this runs on the RDA pipeline, but I understand the reasoning. Ideally this could run as its own pipeline, but that would increase the complexity a fair bit. I think this is fine for now and we can revisit once we're running in ECS.
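The gating condition in the hunk above can be sketched in isolation. This is an illustrative sketch only: `shouldRunBackfill` is a hypothetical name, and an `Optional<Boolean>` stands in for the result of `config.booleanOption(...)`.

```java
import java.util.Optional;

public class BackfillGateSketch {
  /**
   * Returns true only when the backfill is explicitly enabled and this is not a CCW
   * pipeline instance. A missing setting is treated as disabled.
   */
  static boolean shouldRunBackfill(Optional<Boolean> configuredEnabled, boolean ccwPipelineEnabled) {
    boolean enabled = configuredEnabled.orElse(false);
    return enabled && !ccwPipelineEnabled;
  }

  public static void main(String[] args) {
    System.out.println(shouldRunBackfill(Optional.of(true), false));
    System.out.println(shouldRunBackfill(Optional.of(true), true));
    System.out.println(shouldRunBackfill(Optional.empty(), false));
  }
}
```

Under this rule the job runs only on a non-CCW instance with the flag explicitly set to true, which matches the default of "false" in the configuration.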

aschey-forpeople · 2024-12-10T21:47:40Z

...s/src/main/java/gov/cms/bfd/pipeline/sharedutils/samhsa/backfill/AbstractSamhsaBackfill.java

+    String queryStr =
+        strSub.replace(
+            startingClaim.isPresent() ? QUERY_WITH_STARTING_CLAIM : QUERY_WITH_NO_STARTING_CLAIM);
+    return entityManager.createNativeQuery(queryStr, tableEntry.getClaimClass());


Would it be possible to use createQuery instead of createNativeQuery? That will at least perform some type checking to ensure the type is assignable to the query result. I believe that would remove the need for select * as well.

Yeah, we can do that. The check for existing tags may be a bit tricky, but I'll see what I can do.

I think these changes should work fine. Testing performance now.

aschey-forpeople · 2024-12-10T22:06:31Z

ops/terraform/services/base/values/test.yaml

@@ -152,6 +152,8 @@
 /bfd/${env}/pipeline/nonsensitive/rda/cleanup/run_size: UNDEFINED
 /bfd/${env}/pipeline/nonsensitive/rda/cleanup/transaction_size: UNDEFINED
 /bfd/${env}/pipeline/nonsensitive/rda/instance_type: m6a.large
+/bfd/${env}/pipeline/nonsensitive/rda/samhsa/backfill/enabled: "false"
+/bfd/${env}/pipeline/nonsensitive/rda/samhsa/backfill/batch_size: 15000  


Might want to test with a larger batch size here to see if it helps. 100,000 or so shouldn't be a problem.
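To make the batch-size tradeoff concrete, here is a rough sketch that counts how many fetches are needed to walk a table at a given batch size. It assumes the job pages through claims in claim ID order and resumes after the last ID it processed; that is an assumption for illustration, not a description of the actual queries. An in-memory list stands in for the table, and `fetchBatch` and `countBatches` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class BatchSketch {
  /** Stand-in for a query: up to batchSize ids after the starting id, in ascending order. */
  static List<Long> fetchBatch(List<Long> sortedIds, Optional<Long> startingClaim, int batchSize) {
    return sortedIds.stream()
        .filter(id -> startingClaim.map(start -> id > start).orElse(true))
        .limit(batchSize)
        .collect(Collectors.toList());
  }

  /** Walks the whole list in batches and returns how many fetches returned rows. */
  static int countBatches(List<Long> sortedIds, int batchSize) {
    Optional<Long> lastProcessed = Optional.empty();
    int batches = 0;
    while (true) {
      List<Long> batch = fetchBatch(sortedIds, lastProcessed, batchSize);
      if (batch.isEmpty()) {
        return batches;
      }
      batches++;
      // Remember the last id so the next fetch (or a restart) continues from here.
      lastProcessed = Optional.of(batch.get(batch.size() - 1));
    }
  }

  public static void main(String[] args) {
    List<Long> ids = new ArrayList<>();
    for (long i = 1; i <= 100; i++) {
      ids.add(i);
    }
    System.out.println(countBatches(ids, 15));
    System.out.println(countBatches(ids, 30));
  }
}
```

A larger batch size means fewer round trips, at the cost of more rows held in memory per fetch; whether that helps in practice is what the suggested test would show.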

dondevun · 2024-12-11T13:46:42Z

dondevun and others added 16 commits November 19, 2024 18:58

Rebased commit for CCW RIF ingestion tag creation.

c623eb5

Update apps/bfd-model/bfd-model-rif/src/main/java/gov/cms/bfd/model/rif/samhsa/CcwTagKey.java

Co-authored-by: aschey-forpeople <[email protected]>

15a7958

Refactor SamhsaUtil to combine a lot of the logic.

99a9bda

Refactor CCW Samhsa ingestion.

1d5206b

Fix broken table field in samhsa adapters.

4e5c249

Refactor samhsa adapters

ec53538

Removed erroneous file

18754ba

Refactored the way SAMHSA methods are executed.

e2f5960

Removed unused documentation for throws.

88219b4

Added boolean on SamhsaUtil to indicate if a tag was created.

da7848f

Add javadoc for return statement in SamhsaUtil

56351c9

Create pipeline job to backfill SAMHSA data.:

c4a82dd

Add configuration for backfill pipeline.

b5c3a2b

Move majority of CCW Samsha tag processing to adapters.

69bc595

Rearrange methods in SamhsaUtil.

42535d5

Update backfill query to ignore claims that already have tags.

ead04dc

dondevun changed the title ~~Bfd 3664~~ BFD-3664: Pipeline job for SAMHSA tag backfill. Dec 4, 2024

dondevun added 4 commits December 4, 2024 19:21

Formatting changes

67f26a4

Merge branch 'feature/samhsa2.0' into BFD-3664

52da6dd

Fix formatting.

e978d0a

Create ability for the backfill to restart where it left off if interrupted.

b3a9b15

dondevun marked this pull request as ready for review December 5, 2024 19:25

dondevun requested review from mjburling, aschey-forpeople and malessi as code owners December 5, 2024 19:25

dondevun added 4 commits December 5, 2024 19:48

Clean up switch statement in getClaimId methods.

2ff48dc

Fixed checkstyle violation.

87f2de9

Create integration test for backfill service.

c84a48e

Changes to Samhsa backfill service to reuse same entityManager throughout.

6ae26d1

aschey-forpeople reviewed Dec 6, 2024

Refactor backfill to use enums.

e11f2dc

aschey-forpeople reviewed Dec 10, 2024

Using Query instead of NativeQuery

4d6954d

dondevun added 5 commits December 11, 2024 16:32

Testing omitting the not exists statement in the queries

eb0f047

Refactor code to query codes from tables directly.

068c4fc

Refactor CCW Samhsa backfill to use SQL queries.

9a6f719

Fix null value for appStateCcw

ab6b9f7

Added new fields to progress table.

3373f25

dondevun marked this pull request as draft January 6, 2025 16:18

dondevun force-pushed the BFD-3664 branch 2 times, most recently from 890887b to b75b481 Compare January 7, 2025 19:06

Give each thread its own entity manager. Create a logging interval.

dd6e285

dondevun force-pushed the BFD-3664 branch from b75b481 to dd6e285 Compare January 8, 2025 15:39

dondevun added 5 commits January 10, 2025 12:04

Move from and thru dates fetch to a separate query, eliminating joins.

c004558

Fix typo in query.

9c86480

Fix typo in query.

f7b38cf

Cleaned up comments.

0791074

Added samhsa backfill log interval to base yaml files.

3491629

dondevun marked this pull request as ready for review January 10, 2025 14:32

Run formatter

dab5366

dondevun force-pushed the BFD-3664 branch from 0393aa7 to dab5366 Compare January 10, 2025 16:52

dondevun and others added 2 commits January 10, 2025 11:53

Merge branch 'feature/samhsa2.0' into BFD-3664

0c56e7a

Fixed type in run-bfd-pipeline

0257f75

dondevun requested a review from aschey-forpeople January 10, 2025 17:05

dondevun added 4 commits January 15, 2025 14:31

Make the queries a little more manageable.

a09dc5f

Add comment to PiplineApplication.

66f37e1

Clean up comments.

dc14505

Add more comments, formatting changes.

ea43adf
