You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This issue has been created from a discussion in Slack. Duplicated claims are showing up in the run-out and new year files. The following code is an example of how this has been resolved from the community:
That's true. Thanks @Chase Jones. Hope this is helpful to @Daniel Goldberg or others. Instead of checking for dups on data load, we figured it was easier to just load all the dup data and then do an after-the-fact check to identify and delete dups from all CCLF raw tables. We used statements similar to this on each of the raw CCLF tables to clean them up prior to running DBT models. We implemented this as the last step of our monthly ETL which should keep from any future dups entering the DBT models. You can adapt for each of the CCLF files.
delete from cclf_source.cclf.partb_physicians
where (cur_clm_uniq_id, clm_line_num, create_date) in (
select cur_clm_uniq_id, clm_line_num, create_date
from (
select
cur_clm_uniq_id
,clm_line_num
,create_date
,row_number() over (partition by cur_clm_uniq_id, clm_line_num order by create_date) as rownum
from cclf_source.cclf.partb_physicians
) as sub
where rownum > 1
);
Update:
@Richard Queen
Thanks so much for providing that. That helped get me in the right direction. Maybe this was a translation error on my part, but I did want to mention that what you provided seems to have removed all data for any claim that contained a duplicate rather than retaining a row due to the where clause.
I ended up with the following at the end of my ETL process that seems to be working. One gotcha to be aware of is the new column row_num in the resulting table that might as well be removed.
create or replace table tuva.raw_cclf.partb_physicians as select * from (
select
*
,row_number() over (partition by cur_clm_uniq_id, clm_line_num order by clm_line_num) as row_num
from tuva.raw_cclf.partb_physicians
qualify row_num = 1
);
The text was updated successfully, but these errors were encountered:
This issue has been created from a discussion in Slack. Duplicated claims are showing up in the run-out and new year files. The following code is an example of how this has been resolved from the community:
Update:
The text was updated successfully, but these errors were encountered: