Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add de-duping logic for multiple claim years #61

Open
sarah-tuva opened this issue Mar 19, 2024 · 0 comments
Open

Add de-duping logic for multiple claim years #61

sarah-tuva opened this issue Mar 19, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@sarah-tuva
Copy link
Member

sarah-tuva commented Mar 19, 2024

This issue has been created from a discussion in Slack. Duplicated claims are showing up in the run-out and new year files. The following code is an example of how this has been resolved from the community:

That's true. Thanks
@Chase Jones. Hope this is helpful to
@Daniel Goldberg or others. Instead of checking for dups on data load, we figured it was easier to just load all the dup data and then do an after-the-fact check to identify and delete dups from all CCLF raw tables. We used statements similar to this on each of the raw CCLF tables to clean them up prior to running DBT models. We implemented this as the last step of our monthly ETL which should keep from any future dups entering the DBT models. You can adapt for each of the CCLF files.

delete from cclf_source.cclf.partb_physicians 
where (cur_clm_uniq_id, clm_line_num, create_date) in (
   select cur_clm_uniq_id, clm_line_num, create_date
   from (
       select 
            cur_clm_uniq_id
           ,clm_line_num
           ,create_date
           ,row_number() over (partition by cur_clm_uniq_id, clm_line_num order by create_date) as rownum
       from cclf_source.cclf.partb_physicians 
   ) as sub
   where rownum > 1
);

Update:

@Richard Queen
Thanks so much for providing that. That helped get me in the right direction. Maybe this was a translation error on my part, but I did want to mention that what you provided seems to have removed all data for any claim that contained a duplicate rather than retaining a row due to the where clause.
I ended up with the following at the end of my ETL process that seems to be working. One gotcha to be aware of is the new column row_num in the resulting table that might as well be removed.

create or replace table tuva.raw_cclf.partb_physicians as select * from (
       select
            *
           ,row_number() over (partition by cur_clm_uniq_id, clm_line_num order by clm_line_num) as row_num
       from tuva.raw_cclf.partb_physicians
  qualify row_num = 1
);
@sarah-tuva sarah-tuva added the enhancement New feature or request label Mar 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant