Skip to content

Running LandRCBM in RIA #62

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 51 commits into
base: development
Choose a base branch
from

Conversation

DominiqueCaron
Copy link
Contributor

Preface

This is a draft PR, I am not sure it will ever be merged. The purpose is mainly to give an idea of the changes I made to run CBM_core with LandR inputs` (https://github.com/DominiqueCaron/LandRCBM/blob/main/global_LandRCBMsmallRIA.R). It is still pretty rough code.

Description

To make CBM_core run with LandR, I've made a number of changes. To avoid interfering with "standard" CBM simulations, most changes are within if else statements. Briefly, the simulation runs for a small area within RIA, but I am 99% that it does not works correctly, since I have not tested anything and have an incomplete understanding of CBM_core. This PR is coupled with changes made to CBM_dataPrep_RIA (PredictiveEcology/CBM_dataPrep_RIA#8) and LandRCBM_split3pools (PredictiveEcology/LandRCBM_split3pools#25).

  • There wasn't any changes to do in CBM_core to make the spinup work. I did some changes in the other repos. I still haven't test whether it works correctly thought.
  • I added an eventPriority = 10 argument to the annual event to make sure that LandR does its things between the spinup and the annual event (e.g., for the first year, we want growth & mortality to happen between the spinup and annual).
  • As of now, each cohort has a separate cohortGroups. For now, LandRCBM each cohort as a different gcids (set of increments), but in the future we could group cohorts based on increments. This is more of an optimization task, so I haven't bother to implement it yet.
  • The objects cohortGroups and cohortGroupKeep gets updated each year and only contains the cohorts that are present at time t. I think that works for the annual event, but create problems for tracking pools and emissions across years (see points of discussion).

Points of discussion and clarification.

  • Disturbed cohorts: I am not sure what is the expected increments for disturbed cohorts. My understanding is that in "standard" CBM, it uses the first point in the growth curves. If so, I need to review how LandR works after a disturbance. As of now, I believe that the increments for disturbed cohorts returned by LandRCBM_split3pools is: -(biomass at t-1). If there is regeneration+growth, there is a new cohort (same species but with age 1) with the initial growth.
  • Tracking pools and emissions across years: Currently, I think that CBM_core tracks pools and emissions across years per cohortGroups. However, unlike standard CBM runs, with LandRCBM, cohortGroupsID and cohortID are not static, they get updated each year. So, I don't see how we can retrieve what cohortGroup X at year Y represent once the simulation is done.

@camillegiuliano
Copy link
Contributor

This is awesome!!

When you say pools and emissions, I'm not sure if you mean the cbm_vars tables related to those (flux and pools) or if you mean the cbmPools and emissionsProducts tables that are build from them. Regardless, here are some of my thoughts for all three:

  • for the cbm_vars tables: If tracking cohortGroups is a problem, could we have an object in core that gets overwritten every year that keeps track of the "old groups" of the previous year vs the current year's group IDs, and use it to update the old groups in cbm_vars$pools/flux to the new groups of the current simulation year? I'm not sure how this would work in practice or if it's feasible, because I'm not super familiar with how groups are created and tracked in the LandRCBM version. If this is possible, the user won't be able to know what X group was at year X once the simulation is done, but the simulation itself will be able to keep track year to year while it is simulating.
  • for emissionsProducts: This is technically tracked by cohortGroup, but only in the sense that each row in the cbm_vars$flux table represents a cohort. It shouldn't matter whether those group IDs are static or not, it doesn't even look at the IDs. Each emission type just adds up the emissions for its whole column for the year into the total sim$emissionProducts for the year, regardless of how those rows are grouped. So this shouldn't be a problem here, as long as cbm_vars$flux is properly updated for that year.
  • for cbmPools, This table is definitely a point of discussions but I'm not sure that having consistent cohortGroupIDs is really that important at least in terms of building and updating the table. I haven't spent much time messing with this table (and if I did it was a while ago) so I may be wrong here, but the table gets essentially overwritten (and saved as an .rds file in the outputs folder) every year. I'm not sure if the way it's currently built depends on the IDs being static (I don't think it does), but if it does, I don't think it would be too hard of a fix. However, even if this is all true and it runs fine with dynamic IDs, this does mean that a user can't cross reference cbmPools tables for a cohort between years. If we want this to be possible, we could potentially add species/ecozone/spu columns to the table to mitigate that?

Hopefully this all makes some sense! I'm excited to see how it runs with more than 1 simulation year! I know the cbm_vars tables with the python functions can be picky to deal with when it comes to cohorts and IDs.

@DominiqueCaron
Copy link
Contributor Author

Thanks for the info, that helps!

I'm not sure if you mean the cbm_vars tables related to those (flux and pools) or if you mean the cbmPools and emissionsProducts tables that are build from them

I mean the cbmPools and emissionsProducts.

Could we have an object in core that gets overwritten every year that keeps track of the "old groups" of the previous year vs the current year's group IDs, and use it to update the old groups in cbm_vars$pools/flux to the new groups of the current simulation year?

I believe that is done with cohortGroupKeep. So each year, we update the cbm_vars$pools/flux by linking the new groups with the group ID of the previous timestep. These are determined by looking at the species, age, and pixel Index. So tracking the cohorts between year t and t-1 is not really a problem, it is tracking through the entire simulation.

  • Each emission type just adds up the emissions for its whole column for the year into the total sim$emissionProducts for the year,

Ok, so emissions are tracked for the entire landscape? I guess that's fine then. So, it is only the cbmPools that cause problem.

  • for cbmPools, ...

Yes, I agree. I don't think that it does not work as it is, I just think we might need to rethink how pools are stored across year. How it is written, I don't see how we can, for example, look at how the pools in a given pixel (or subset of pixel) evolve in time, or how a certain species contributes to the carbon dynamics across time.

@camillegiuliano
Copy link
Contributor

Ok, so emissions are tracked for the entire landscape?

Yep, that table only tracks the entire simulated landscape for each year. So if you only have 1 simulation year, that table should only have one row for that year.

I don't see how we can, for example, look at how the pools in a given pixel (or subset of pixel) evolve in time, or how a certain species contributes to the carbon dynamics across time.

I think this is also somewhat true in that table for standard spadesCBM too, at least tracking a subset of pixels over time using the cbmPools table alone isn't really possible. You have no way to follow between old and new cohort groups when looking only at cbmPools, you can only track them if they never get disturbed. (unless you link this table with cohortGroupKeep every year, which I suppose would also work for the LandR version too, but I don't think is very efficient). 'spatialPlot' is capable of making yearly maps of total carbon on the landscape per pixel, something that I think LandRCBM could do too. Looking at the code for that plot now, I have some minor fixes to make to make this work (it would only be accurate for the last simulation year right now, but it should be an easy fix), but this does allow to visually track the landscape over time. There may be a better way though, I guess we could track per pixel, but that doesn't seem efficient at all. Definitely something to discuss all together.

You can however track cbmPools by gcid (which is equivalent to species in standard CBM) and age in standard CBM which I think is where the main difference/problem lies. If I remember correctly, in LandRCBM, gcid doesn't necessarily represent a species, so right now as is, we lose the opportunity to track carbon per species. We'd have to link species to the cbmPools table (through gcMeta maybe?) in the LandR version to be able to track species carbon dynamics over time the same way you would in standard CBM.

@cboisvenue
Copy link
Contributor

Great work and good discussion all. Here are a few thoughts on Dominique's questions:

  • I don't think that having the cohortGroups and cohortGroupKeep updated each year is a problem (I agree with Camille). What we need from LandR /LandRCBM is yearly increments (except for the spinup we need the curves - but for the spinup there is one gcid per "pixels that have the same species combination", so it should work the same as standard CBM).
  • I am a bit concerned about your comment about setting a priority Dominique. What event did you change that for and why? Since CBM has a very specific order of carbon transfers already (see Overview section of the manual), and that the year 1 increments post spinup should be coming from the LandR/LandRCBM side, I don't quite follow.
  • cbmPools being outputted each year was a thing for SK runs. In RIA, that was too much, we we only saved at set intervals (different for different scenarios - 5 for present day, and 100 for the FRI scenario).
  • I don't think we need to track cohort dynamics through time, not more than what LandR does anyway. So if we want to look at species evolution through time it would be on the LandR side and only for AGBiomass.

@DominiqueCaron
Copy link
Contributor Author

DominiqueCaron commented May 1, 2025

Ok, great. For tracking cohorts (point 1,2,4), that answer a lot of the answers I had! One thing I need implement is how DOM is tracked. We discussed previously the potential challenge with DOM being more of a pixel attribute and less of a cohort attribute: when a cohort "dies out", there is still carbon in the soil. How it is coded now, when we "update" the cohorts ID, we only keep the cohorts in cohortDT which are the cohorts that have above ground biomass. So, we loose that legacy of DOM from cohorts that disappeared that year.

  • I am a bit concerned about your comment about setting a priority Dominique. What event did you change that for and why? Since CBM has a very specific order of carbon transfers already (see Overview section of the manual), and that the year 1 increments post spinup should be coming from the LandR/LandRCBM side, I don't quite follow.

This does particularly worries me. I set the priority so that the annual event of CBM happens after LandR and LandRCBM events. Otherwise, it was happening before, and so getting the increments from the previous year. Here is the order of the events for the first year:

> completed(simOut)[c(20:40),]
    eventTime           moduleName             eventType eventPriority         ._clockTime ._prevEventTimeFinish
        <num>               <char>                <char>         <num>              <POSc>                <POSc>
 1:      1985  Biomass_yieldTables                  init          1.00 2025-05-01 09:09:59   2025-05-01 09:09:29
 2:      1985         Biomass_core                  init          1.00 2025-05-01 09:10:31   2025-05-01 09:09:59
 3:      1985 LandRCBM_split3pools                  init          1.00 2025-05-01 09:10:38   2025-05-01 09:10:31
 4:      1985             CBM_core                  init          1.00 2025-05-01 09:10:38   2025-05-01 09:10:38
 5:      1985     CBM_dataPrep_RIA readDisturbanceEvents          5.00 2025-05-01 09:11:30   2025-05-01 09:10:38
 6:      1985 LandRCBM_split3pools                plotYC          5.00 2025-05-01 09:11:52   2025-05-01 09:11:30
 7:      1985             CBM_core                spinup          5.00 2025-05-01 09:12:01   2025-05-01 09:11:52
 8:      1985         Biomass_core    mortalityAndGrowth          6.00 2025-05-01 09:12:02   2025-05-01 09:12:01
 9:      1985         Biomass_core            summaryBGM          8.25 2025-05-01 09:12:04   2025-05-01 09:12:02
10:      1985         Biomass_core  plotSummaryBySpecies          9.00 2025-05-01 09:12:05   2025-05-01 09:12:04
11:      1985 LandRCBM_split3pools      annualIncrements          9.00 2025-05-01 09:12:07   2025-05-01 09:12:05
12:      1985 LandRCBM_split3pools    annualDisturbances          9.00 2025-05-01 09:12:08   2025-05-01 09:12:07
13:      1985         Biomass_core              plotMaps          9.25 2025-05-01 09:12:28   2025-05-01 09:12:08
14:      1985         Biomass_core              plotAvgs          9.50 2025-05-01 09:12:28   2025-05-01 09:12:28
15:      1985 LandRCBM_split3pools     summarizeAGBPools         10.00 2025-05-01 09:12:28   2025-05-01 09:12:28
16:      1985             CBM_core                annual         10.00 2025-05-01 09:12:41   2025-05-01 09:12:28
17:      1985 LandRCBM_split3pools              plotMaps         11.00 2025-05-01 09:12:51   2025-05-01 09:12:41

So, 1) we do the init event of all modules to get all the data needed (biomass data, yield curves, etc.) at the begining of the simulation. 2) we do the CBM spinup with the initial state of the landscape. 3) we have growth and mortality for the first year. 4) calculate the increments for year 1 (nothing happens at annualDisturbances here). 5) finally, we do the first CBM annual event.

@suz-estella
Copy link
Contributor

Leaving some thoughts & a brainstorm for discussion here! Interested in everyone's thoughts.

Annual event priority

Prioritizing the annual event to occur a bit later to allow for space to do some work between the spinup and the annual event sounds relatively harmless to me. It seems as though the annual event generally should run after all the disturbances and other data has been set for the year. I suspect that it might be useful to prioritize it a bit higher (5-8?) to allow space for plotting/summarizing/reporting events to occur later.

Tracking cohort groups

Putting a little breakdown of the CBM_core objects here for the sake of highlighting a potential issue and solution that may make reporting easier.

CBM_core objects

Looking at our CBM_core objects, I seem them as falling roughly into 3 categories:

Input objects: created by other modules to be used in the spinup and annual events; not expected to change during the simulation:

  • standDT
  • cohortDT
  • gcMeta
  • growth_increments
  • disturbanceMeta
  • disturbanceEvents (rows may be added, but not usually updated or deleted)

Intermediate objects: these can be "tweaked" as needed throughout the simulation; they are not considered "results" - their purpose is to be used in the annual event; they need to match each other, and also may need to match with the input objects in some cases:

  • cohortGroups
  • cohortGroupKeep
  • cbm_vars

Output objects: rows are added each year with the annual event results; only ever used in plotting or other results summaries:

  • NPP: 1 row per year and cohort group
  • cbmPools: 1 row per year and cohort group
  • emissionsProducts: 1 row per year

NPP and cbmPools output objects: incomplete?

Unlike with LandR, CBM_core doesn't expect the input tables (e.g. cohortDT) to change at all, and therefore it always expects the input tables to match the intermediate tables and the output tables. Importantly (as we have been discussing), CBM_core doesn't expect the pixelIndex:cohortID key to change every year. This allows for our NPP and cbmPools output tables to be relatively lightweight because we can easily join in our cohort and stand attributes (cohortDT and standDT) to these tables as needed via our cohortGroupKeep table. However, the downside of this is that NPP and cbmPools are not complete enough objects on their own for reporting results: we still need to use the cohortGroupKeep, cohortDT, and standDT tables to create maps and other summary tables. This means that we need several often hefty objects on hand as part of what we can call the "results" and some of these are input objects (standDT, cohortDT) and some are potentially intermediate (cohortGroupKeep).

Proposing a solution!

I think we can potentially solve the issue of tracking cohort groups with LandRCBM_split3pools while also solving 2 other issues: 1) update our NPP and cbmPools objects to be better standalone outputs that do not need cohortGroupKeep, cohortDT, and standDT to produce meaningful results tables & plots; 2) reduce the amount of large tables held in memory in CBM_core.

I think it would require a brainstorm, but I'm imagining:

  1. Saving all the yearly NPP and cbmPools data either to CSV or to an SQLite database and removing them from the simList
  2. For each of these make them standalone output objects by either:
  • Remove the cohortGroupID key from all output objects: save a row for every year and cohortID and include a pixelIndex column for easy mapping of results to input raster locations
  • Leave NPP and cbmPools as-is with the year and cohortGroupID columns, but also save an altered version of the cohortGroupKeep table that includes a year column so that the cohorts and cohort groups can be different each year but still join directly with NPP and cbmPools . It might have the columns year, cohortID, and cohortGroupID, and also include a pixelIndex column for easy mapping of results to input raster locations.

It would be useful to ask if there's any cohort attributes (e.g. GCID, age) that would be useful to track over time.

Looking ahead this may allow us to simplify the cohortGroupKeep table to only keep the columns cohortID, cohortGroupID, cohortGroupPrev, and spinup if we want to reduce the size of that table as well; it would only be needed as an "intermediate" object for the annual event, and never as an "output" that would be used to track cohort activity over time.

@suz-estella
Copy link
Contributor

Looking super awesome and clean! love it! My thoughts:

cbmAdmin

Just wanted to bring to attention this task & where it links to a PR discussion of replacing cbmAdmin with a direct query to the CBM database spatial_unitstable: https://github.com/orgs/PredictiveEcology/projects/7/views/15?pane=issue&itemId=105569539

Event design

Presenting a bit of a brainstorm idea with lots of pros/cons to consider I'm sure: I wonder if some of this work could be moved to the LandR_CBM side if the events were re-designed a bit. Of course, maybe there's some work divvying justification to having certain steps in CBM_core that I don't have perspective into.

For example: perhaps this section be moved to an event in LandR CBM that runs after the spinup, e.g. a postSpinupAdjustBiomass event.

Lines 339-367: Here, we replace the live biomass given after the spinup with live biomass of the observed data. The equation for calculating below ground biomass need to be reviewed and put into a function. Other than that, I think the code is sufficiently commented to be understandable. Note that at the end, we have 1 line per cohort.

Maybe this could be in a postAnnualChecks event (e.g.): https://github.com/PredictiveEcology/CBM_core/pull/62/files#diff-d6ec2fc963ea1e974e891e3bf926ad608e1acca4683023562eafd652614c1e1eR841-R850

Further: the annual event could potentially be broken into chunks that LandR_CBM could schedule work between. For example, split it into 2 events: annual_prepareDisturbedCohorts and annual_cbmExnStep (bad names maybe, but hopefully makes sense).

Annual event priority

I have left the event priority of the annual event to 10, but it can be changed to a lower value. However, I think that the lowest value it can take (without playing on the LandR side) is 8, since it needs to happen after mortality & growth (eventPriority = 6) and cohort reclassification (eventPriority = 7)

I think we could easily go ahead and make this super low in the main CBM repo as well if we want. I think in general it makes sense. Maybe 8? leave 9-10 for accumulateResults & plotting (etc). I can't imagine much would ever need to be done after the annual event but I could be wrong.

Tracking cohorts & CBM_core outputs

1.3 Lines 500-511: We simply put this new information in the data table cohortGroups and cohortGroupKeep. Note that we still keep the history of cohort groups over time in cohortGroupKeep. However, I don't see how this information is useful except for cohortGroupPrev, cohortGroupID, and cohortID.

@cboisvenue: tracking the cohortGroup via cohortGroupKeep beyond cohortGroupPrev I think is only useful for mapping carbon (@suz-estella , is this how you see it?). Like I said before, when the simulations got very long (in the RIA 500 years of fire), I would only keep the pixelGroups (as they were called back then, now cohortGroups) for the years I outputted cbmPools. We can discuss if we need to keep all those or not.

Tracking cohorts will always be useful for mapping carbon to pixels for repoting & plotting. It is also currently required to make sense of the current NPP and cbmPools outputs which are saved by year & cohortGroupID, but as we discussed, I'm hoping we can change that. I've made a task draft about this.

I think until we take the time to focus on this I'd say just let the cohortGroupKeep history be the mess it will be! The emissionsProducts table just has a row for every year so that's still useful for reviewing results.

Disturbances: where to process disturbanceRasters into disturbanceEvents

@cboisvenue: handling of disturbances: we will have to work with @suz-estella and figure out how to get the disturbances into spadesCBM without going through LandRCBM_split3pool, directly from (in this case) scfm. I think that both LandRCBM and spadesCBM should get disturbances externally and not have to do any GIS. Currently (unless I am looking at the wrong spot), LandRCBM_split3pools has a raster as an inputObject (rstCurrentBurn), and an outputObject disturbanceEvents as a data.table. A bit of thinking needs to go into this. It might mean using CBM_dataPrep_XX as in standard spadesCBM, it handles yearly disturbances.

One solution would be to create a simple module (CBM_disturbance, or something) that translate the outputs of disturbance modules into inputs for CBM_core. Currently, LandRCBM_split3pools does it for fire, but it could be pulled out and make it non-specific to LandRCBM. But yeah, alternatively, CBM_dataPrepX could track them, especially since it already handles disturbances in standard CBM runs.

Coincidentally, I've been thinking about this over the past week as well! This likely calls for an in person meeting (deserves its own thread at least) but here's a summary of what I've been thinking about so far:

I'm wondering if we would benefit from having an area-generic CBM_dataPrep module that can take the area-specific inputs (disturbanceRasters, disturbanceMeta, etc) from our other data prep modules (CBM_dataPrep_SK, CBM_dataPrep_RIA) and handle the common spatial transformations and data wrangling tasks we need to prepare them for CBM_core. This could include the yearly event that wrangles multiple disturbanceRasters and collapses them into the disturbanceEvents table. It could also include matching user disturbances and species to CBM IDs, and perhaps be the home of where we match our ecozones & boundaries to get spatial_unit_id (re: cbmAdmin).

To make this LandR (and any other module) friendly: I think it would be smart to design this module to only work with the inputs it finds in the simulation so that it would be agnostic to certain objects being there (such as gcMeta which might only be needed by CBM_vol2biomass).

CONS:

  • We'd have to run 5 modules instead of 4 for a standard CBM run (CBM_dataPrep_SK and CBM_dataPrep, e.g.)

PROS:

  • Area-specific data prep modules would have a greatly reduced workload and much less shared code: they would only need to handle working with inputs that are unique to that area.
  • Updating data prep steps (say, in the event of a change to the CBM_core module inputs) would be easy as you'd only need to do this in 1 place instead of in the module of every study area.
  • In summary: it would be really easy to plug in new study areas as we go forward

I think it makes sense to think about the relationship of a module like this to CBM_defaults. How is it distinct? Would it instead make sense to have 1 module that does all of this?

@cboisvenue
Copy link
Contributor

Looking super awesome and clean! love it! My thoughts:

Yes - nice work.

cbmAdmin

Just wanted to bring to attention this task & where it links to a PR discussion of replacing cbmAdmin with a direct query to the CBM database spatial_unitstable: CFS SpaDES (view)

Yes, I would like to get rid of the cbmAdmin. Can somebody handle that? @camillegiuliano maybe?

Event design

Presenting a bit of a brainstorm idea with lots of pros/cons to consider I'm sure: I wonder if some of this work could be moved to the LandR_CBM side if the events were re-designed a bit. Of course, maybe there's some work divvying justification to having certain steps in CBM_core that I don't have perspective into.

For example: perhaps this section be moved to an event in LandR CBM that runs after the spinup, e.g. a postSpinupAdjustBiomass event.

Lines 339-367: Here, we replace the live biomass given after the spinup with live biomass of the observed data. The equation for calculating below ground biomass need to be reviewed and put into a function. Other than that, I think the code is sufficiently commented to be understandable. Note that at the end, we have 1 line per cohort.

Maybe this could be in a postAnnualChecks event (e.g.): https://github.com/PredictiveEcology/CBM_core/pull/62/files#diff-d6ec2fc963ea1e974e891e3bf926ad608e1acca4683023562eafd652614c1e1eR841-R850

Further: the annual event could potentially be broken into chunks that LandR_CBM could schedule work between. For example, split it into 2 events: annual_prepareDisturbedCohorts and annual_cbmExnStep (bad names maybe, but hopefully makes sense).

I like this. @DominiqueCaron what to you think? The cleaner and more streamlined these events can be made, the better. Maybe we put this on our things to discuss for a group meeting early next week? Thoughts? @suz-estella, could you pleae make sure we cover this next week?

Annual event priority

I have left the event priority of the annual event to 10, but it can be changed to a lower value. However, I think that the lowest value it can take (without playing on the LandR side) is 8, since it needs to happen after mortality & growth (eventPriority = 6) and cohort reclassification (eventPriority = 7)

I think we could easily go ahead and make this super low in the main CBM repo as well if we want. I think in general it makes sense. Maybe 8? leave 9-10 for accumulateResults & plotting (etc). I can't imagine much would ever need to be done after the annual event but I could be wrong.

Looks like this is already done. (check)

Tracking cohorts & CBM_core outputs

1.3 Lines 500-511: We simply put this new information in the data table cohortGroups and cohortGroupKeep. Note that we still keep the history of cohort groups over time in cohortGroupKeep. However, I don't see how this information is useful except for cohortGroupPrev, cohortGroupID, and cohortID.

@cboisvenue: tracking the cohortGroup via cohortGroupKeep beyond cohortGroupPrev I think is only useful for mapping carbon (@suz-estella , is this how you see it?). Like I said before, when the simulations got very long (in the RIA 500 years of fire), I would only keep the pixelGroups (as they were called back then, now cohortGroups) for the years I outputted cbmPools. We can discuss if we need to keep all those or not.

Tracking cohorts will always be useful for mapping carbon to pixels for repoting & plotting. It is also currently required to make sense of the current NPP and cbmPools outputs which are saved by year & cohortGroupID, but as we discussed, I'm hoping we can change that. I've made a task draft about this.

I think until we take the time to focus on this I'd say just let the cohortGroupKeep history be the mess it will be! The emissionsProducts table just has a row for every year so that's still useful for reviewing results.

Ok. @suz-estella try to see what you can come-up with to make this happen.

Disturbances: where to process disturbanceRasters into disturbanceEvents

@cboisvenue: handling of disturbances: we will have to work with @suz-estella and figure out how to get the disturbances into spadesCBM without going through LandRCBM_split3pool, directly from (in this case) scfm. I think that both LandRCBM and spadesCBM should get disturbances externally and not have to do any GIS. Currently (unless I am looking at the wrong spot), LandRCBM_split3pools has a raster as an inputObject (rstCurrentBurn), and an outputObject disturbanceEvents as a data.table. A bit of thinking needs to go into this. It might mean using CBM_dataPrep_XX as in standard spadesCBM, it handles yearly disturbances.

One solution would be to create a simple module (CBM_disturbance, or something) that translate the outputs of disturbance modules into inputs for CBM_core. Currently, LandRCBM_split3pools does it for fire, but it could be pulled out and make it non-specific to LandRCBM. But yeah, alternatively, CBM_dataPrepX could track them, especially since it already handles disturbances in standard CBM runs.

Coincidentally, I've been thinking about this over the past week as well! This likely calls for an in person meeting (deserves its own thread at least) but here's a summary of what I've been thinking about so far:

I'm wondering if we would benefit from having an area-generic CBM_dataPrep module that can take the area-specific inputs (disturbanceRasters, disturbanceMeta, etc) from our other data prep modules (CBM_dataPrep_SK, CBM_dataPrep_RIA) and handle the common spatial transformations and data wrangling tasks we need to prepare them for CBM_core. This could include the yearly event that wrangles multiple disturbanceRasters and collapses them into the disturbanceEvents table. It could also include matching user disturbances and species to CBM IDs, and perhaps be the home of where we match our ecozones & boundaries to get spatial_unit_id (re: cbmAdmin).

To make this LandR (and any other module) friendly: I think it would be smart to design this module to only work with the inputs it finds in the simulation so that it would be agnostic to certain objects being there (such as gcMeta which might only be needed by CBM_vol2biomass).

CONS:

  • We'd have to run 5 modules instead of 4 for a standard CBM run (CBM_dataPrep_SK and CBM_dataPrep, e.g.)

PROS:

  • Area-specific data prep modules would have a greatly reduced workload and much less shared code: they would only need to handle working with inputs that are unique to that area.
  • Updating data prep steps (say, in the event of a change to the CBM_core module inputs) would be easy as you'd only need to do this in 1 place instead of in the module of every study area.
  • In summary: it would be really easy to plug in new study areas as we go forward

I think it makes sense to think about the relationship of a module like this to CBM_defaults. How is it distinct? Would it instead make sense to have 1 module that does all of this?

Yes, I have been thinking we could separate the CBM_dataPrep in two for a while. How hard would this be do you think @suz-estella ?
I think it would be a worth while change as we are trying to make it as easy as possible to change study area.

Nice work all.

@DominiqueCaron
Copy link
Contributor Author

DominiqueCaron commented May 21, 2025

Thanks for the comments and thoughts!

Here are mine:
cbmAdmin: Is the idea is to have a CBMutils function that returns the spatial units directly from the spatial_units? It would take ecoregion and jurisdiction as inputs?

event design: Yes, I think that dividing some of the events make sense, and would make the module a bit cleaner. I am on vacation next week, so I am not available to meet.

event priority: The priority currently works fine. It is pretty easy to change if ever we need. If we add new events, we can also give them numeric values (e.g., the priority for annual_prepareDisturbedCohorts could be 8.0 and annual_cbmExnStep 8.5).

cohort tracking: I have nothing to add. We can keep it as it is for now.

disturbanceEvents: I agree that a lot of spatial data preparation are not study specific. The simpler the study area-specific modules are, the simpler it will be to use spadesCBM + LandRCBM in new study area. A CBM_dataPrep module seems like the correct place to handle disturbance data.

@suz-estella
Copy link
Contributor

Yes, I have been thinking we could separate the CBM_dataPrep in two for a while. How hard would this be do you think @suz-estella ? I think it would be a worth while change as we are trying to make it as easy as possible to change study area.

@cboisvenue I think this could potentially be pretty easy. - just a matter of splitting the code in CBM_dataPrep_SK in 2 for the most part. If this is something we want ASAP, I could focus on drafting something out next week, then @DominiqueCaron you'd have something to work with when you get back!

@DominiqueCaron
Copy link
Contributor Author

Yes - I can take care of the "handling disturbances from modules part" when I am back!

@DominiqueCaron
Copy link
Contributor Author

I added some basic tests. The GHA seems to be running only the "integration" tests. I don't know if that is by design @suz-estella

Anyway, I think that everything works. So, once everyone approve, this PR could be merged if we want.

@suz-estella
Copy link
Contributor

@DominiqueCaron it's actually skipping the integration tests (with argument invert = TRUE) since were finding it wasn't helpful to have all the integration tests fail when we were making an update with PRs from multiple repos at the same time.

About merge-ability:I still think it's worth reviewing this to see what could be moved to either LandRCBM_split3pools or another module just for the purpose of bridging LandRCBM_split3pools and CBM_core (I don't know off the top of my head what makes sense for that).

From above:

Event design

Presenting a bit of a brainstorm idea with lots of pros/cons to consider I'm sure: I wonder if some of this work could be moved to the LandR_CBM side if the events were re-designed a bit. Of course, maybe there's some work divvying justification to having certain steps in CBM_core that I don't have perspective into.

For example: perhaps this section be moved to an event in LandR CBM that runs after the spinup, e.g. a postSpinupAdjustBiomass event.

Lines 339-367: Here, we replace the live biomass given after the spinup with live biomass of the observed data. The equation for calculating below ground biomass need to be reviewed and put into a function. Other than that, I think the code is sufficiently commented to be understandable. Note that at the end, we have 1 line per cohort.

Maybe this could be in a postAnnualChecks event (e.g.): https://github.com/PredictiveEcology/CBM_core/pull/62/files#diff-d6ec2fc963ea1e974e891e3bf926ad608e1acca4683023562eafd652614c1e1eR841-R850

Further: the annual event could potentially be broken into chunks that LandR_CBM could schedule work between. For example, split it into 2 events: annual_prepareDisturbedCohorts and annual_cbmExnStep (bad names maybe, but hopefully makes sense).

My 2 cents: I think it makes sense to have the "LandR-bridging-CBM" tasks occurring somewhere other than CBM_core (as must as possible) so that we can keep CBM_core focused on its module purpose. Ideally any changes we commit to CBM_core will make the module more flexible for any module trying to connect to it -- I'd go as far to say that ideally we'd keep any if(moduleName %in% modules(sim)) { switches out of our primary branches. We could always add parameters that can alter the default behaviour of the module in some specific way instead Of course, always easier said than done! and if impossible, maybe it's OK to keep a branch for LandR.

@cboisvenue
Copy link
Contributor

I would also like to keep any if(moduleName %in% modules(sim)) { switches out of our primary branches. Can you see if that is possible?

@DominiqueCaron
Copy link
Contributor Author

DominiqueCaron commented Jun 2, 2025

Ok sounds good!

Here are a few changes I plan to make after seeing your comments:

  1. Moving a lot of the LandRCBM specific operations to LandRCBM_split3pool:
    1.1. Create a postSpinupAdjustBiomass event to adjust the live biomass given by the spinup to match with the observed data.
    1.2. Create a annual_prepareCohorts event that creates the objects cohortDT, cohortGroups, cohortGroupKeep every time step.
    1.3. Create a annual_prepareCBMvars event that prepares the input for libcbmr::cbm_exn_step (i.e., cbm_vars).
    1.4. Create a postAnnualChecks event that checks that the CBM above ground biomass is still synchronize with LandR (maybe optional?)

  2. Create a boolean parameter withLandR to CBM_core. Some operations of CBM_core will need to be skipped, so this parameter could be used to control whether or not we skip them.

  3. Changing LandRCBM_split3pools, given the new CBM_dataPrep module:
    3.1 Instead of creating directly disturbanceEvents, LandRCBM_split3pools will create the objects disturbanceRasters and disturbanceMeta. These objects will then be handled by CBM_dataPrep to prepare disturbance data for CBM_core.
    3.2 LandRCBM_split3pools will let CBM_dataPrep do the spatial data preparation (creating standDT).

@cboisvenue
Copy link
Contributor

Sounds good.

Some thoughts on this part:

  1. Changing LandRCBM_split3pools, given the new CBM_dataPrep module:
    3.1 Instead of creating directly disturbanceEvents, LandRCBM_split3pools will create the objects disturbanceRasters and disturbanceMeta. These objects will then be handled by CBM_dataPrep to prepare disturbance data for CBM_core.
    3.2 LandRCBM_split3pools will let CBM_dataPrep do the spatial data preparation (creating standDT).

The disturbances will always be coming from either an external model/module (like scfm) and be integrated in both LandR and spadesCBM/LandRCBM. Do we need LandRCBM_split3pools to handle disturbances and rasters ? Maybe it can take the table that will come out of CBM_dataPrep to get the information of which cohorts get disturbed? I am thinking it would be a good idea to keep the GIS operations in one module, which in this case would be CBM_dataPrep. Thoughts?

@suz-estella
Copy link
Contributor

suz-estella commented Jun 2, 2025

Sounds like a good plan to me!

Create a boolean parameter withLandR to CBM_core. Some operations of CBM_core will need to be skipped, so this parameter could be used to control whether or not we skip them.

My only feedback is that I'd aim to even take it a step further and create fully descriptive parameters where possible instead, like skipStepSomethingSomething so that they are potentially readily available to any module that would want to do the same thing, without them having to use a switch called withLandR. Of course, very theoretical - I don't have an example in mind, so maybe this would never happen!

The disturbances will always be coming from either an external model/module (like scfm) and be integrated in both LandR and spadesCBM/LandRCBM. Do we need LandRCBM_split3pools to handle disturbances and rasters ? Maybe it can take the table that will come out of CBM_dataPrep to get the information of which cohorts get disturbed? I am thinking it would be a good idea to keep the GIS operations in one module, which in this case would be CBM_dataPrep. Thoughts?

I imagine the list of disturbanceRasters will have to be added to yearly somewhere (say, after a scfm annual run), but as long as the list is structured correctly, they will be able to processed by CBM_dataPrep. We could consider having CBM_dataPrep be aware of the modules we want to integrate it with, potentially, and handle that itself.

@cboisvenue
Copy link
Contributor

My only feedback is that I'd aim to even take it a step further and create fully descriptive parameters where possible instead, like skipStepSomethingSomething so that they are potentially readily available to any module that would want to do the same thing, without them having to use a switch called withLandR. Of course, very theoretical - I don't have an example in mind, so maybe this would never happen!

This is exactly what I am hoping would happen: increments could come from other vegetation models then LandR, even from some cohort-based growth and yield models. So I like this idea of a more generic switch with a descriptive name.

Maybe you two can go back and forth on this since Dominique has the best understanding of LandRCBM_split3pools and you Susan the best understanding of our new CBM_core structure?

@DominiqueCaron
Copy link
Contributor Author

The disturbances will always be coming from either an external model/module (like scfm) and be integrated in both LandR and spadesCBM/LandRCBM. Do we need LandRCBM_split3pools to handle disturbances and rasters ? Maybe it can take the table that will come out of CBM_dataPrep to get the information of which cohorts get disturbed? I am thinking it would be a good idea to keep the GIS operations in one module, which in this case would be CBM_dataPrep. Thoughts?

Similar to what I wrote here: PredictiveEcology/CBM_dataPrep#1 . For now LandRCBM_split3pools does the translation from the disturbance module to spadesCBM. However, I don't see why it wouldnt be in CBM_dataPrep. The on 'annoying' bit is that each disturbance module produce outputs with different names and possibly types. So regardless of whether it is LandRCBM_split3pools or CBM_dataPrep, as @suz-estella the module that handles disturbances will need a bit of work for each disturance module we want to integrate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants