Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add leaderboard component to ClimaLand's long runs #890

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ph-kev
Copy link
Member

@ph-kev ph-kev commented Oct 28, 2024

closes #872 - This PR adds a leaderboard component for ClimaLand's long runs. See the plots below.

gpp_bias_plot
lwu_bias_plot
et_bias_plot
global_rmse_and_bias_graphs

The values near the coastlines are Nans because the observational data is resampled so that it matches the same grid as the simulation data.

I added a new file land_leaderboard.jl since no simulation being done for long runs are longer are two years or more. I am not sure if this should be done since land_leaderboard.jl is exactly the same as land.jl beside the length of the simulation. The leaderboard code still works even if the length of the simulation is only a year, it would just not throw away the first year of the simulation and use it instead for comparing against the observational data.

The other thing is that the simulation is not correct because the era5 land forcing data is from 2021.

@ph-kev ph-kev force-pushed the kp/leaderboard branch 2 times, most recently from d5d99e3 to 4f6a203 Compare October 28, 2024 18:44
@ph-kev ph-kev marked this pull request as draft October 28, 2024 18:44
@ph-kev ph-kev requested a review from Sbozzolo October 28, 2024 20:02
@Sbozzolo Sbozzolo requested a review from kmdeck October 28, 2024 22:10
Copy link
Member

@Sbozzolo Sbozzolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a page in the documentation describing the bias plots/leaderboard, how to add them? You can start from the comments you already have in the file

experiments/long_runs/leaderboard/data_sources.jl Outdated Show resolved Hide resolved
@ph-kev ph-kev force-pushed the kp/leaderboard branch 5 times, most recently from 6a53f9a to adc5cd9 Compare October 29, 2024 20:05
ClimaLand.Artifacts.ilamb_dataset_path(;
context = "evspsbl_MODIS_et_0.5x0.5.nc",
),
"et",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we rely on using the same short_name in the data and the diagnostic output from the simulation anywhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or it seems like this line could be anything, as long as we set obs_var_dict["et"]

Copy link
Member Author

@ph-kev ph-kev Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't rely on the same short_name in the (observational?) data and the diagnostic output from the simulation. The short name (eton line 72) is the name that is given in the observations (.nc file).

It is like that because ClimaAnalysis sometimes can't correctly infers what variable is or if there is more than one variable in the .nc file.

The line can be anything as long as obs_var_dict["et"] is a function that takes in a start date and return a OutputVar.

shift_by = Dates.firstdayofmonth,
)
# More preprocessing to match the units with the simulation data
ClimaAnalysis.units(obs_var) == "kg/m2/s" &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could link to ClimaAnalysis documentation explaining the formatting for units? If I understand correctly, this will carry out unit conversion, but that means the string must be in the format expected by ClimaAnalysis

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For that part of the code, it is just checking that the string of the units is "kg/m2/s" and if it is, then set the units to "kg m^-2 s^-1". The check is just there to make sure the units are set correctly.

There is no convention for the formatting of units unless one want to use automatic unit conversion. We only do this because ClimaAnalysis can't tell that "kg/m2/s" (in the observational data) is the same as "kg m^-2 s^-1" (in the simulation data).

"""
compute_leaderboard(leaderboard_base_path, diagnostics_folder_path)

Plot the biases against observations and simulation data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that both obs_var_dict and sim_var_dict are used in this function but not passed in as arguments. For clarity and to make it easy on future users, could we pass in those as arguments?

Copy link
Member

@kmdeck kmdeck Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This argument is not used: diagnostics_folder_path, it was already used in data_sources.jl. No need to pass in, instead we should pass in sim_var_dict.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to make the entire data_sources.jl file a function instead since I realized it is confusing how things are being initialized before.

ClimaAnalysis.window(sim_var, "time", left = spinup_cutoff)
)

# Get 12 or less months of data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this here? What if we had run for many years, and only used the first year as spinup?

Copy link
Member Author

@ph-kev ph-kev Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added that there in case if the simulation run for longer than one year, but less than two years. For instance, if one were to run the simulation for 1.5 years, then only six months will be used after throwing away the first year.

The code will always use the second year regardless if the simulation runs for multiple years. It assumed that we start the simulation at year 2012 (although, it could be any year as long as data exists in the observations) The only reason is because some datasets are limited in time. For example, for the observational data for et, the last point in time is 2013-12-16.

@@ -0,0 +1,142 @@
import ClimaAnalysis
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that we change the units of the simulation output in some cases (GPP), and in the observations (LWU) in others. it would be good to make this consistent (simulation unit changes to match observations)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this should be consistent, but I think we should change all observation units to match simulation units. This is because there is some consistency with how the units are written for the simulation data, but not for the observational data.

g_rmse = ClimaAnalysis.global_rmse(
ClimaAnalysis.slice(sim_var, time = t),
ClimaAnalysis.slice(obs_var, time = t),
mask = mask,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClimaAnalysis uses the mask to change the denominator in RMSE and mean bias?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the mask is used to normalize the global RMSE and global bias.

Interpolations.Flat(),
Interpolations.Flat(),
)
soil_params_mask = SpaceVaryingInput(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once this PR merges: #883, could you change this script to use the helper functions for canopy and soil spatially varying parameters?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add leaderboard component to ClimaLand's long runs
3 participants