Add leaderboard component to ClimaLand's long runs #890

ph-kev · 2024-10-28T18:34:55Z

closes #872 - This PR adds a leaderboard component for ClimaLand's long runs. See the plots below.

The values near the coastlines are NaNs because the observational data is resampled onto the same grid as the simulation data.

I added a new file, land_leaderboard.jl, because none of the current long-run simulations last two years or more. I am not sure this is the right approach, since land_leaderboard.jl is identical to land.jl apart from the length of the simulation. The leaderboard code still works if the simulation lasts only one year; in that case it does not discard the first year and instead uses it for the comparison against the observational data.

Note also that the simulation itself is not correct, because the ERA5 land forcing data is from 2021.

experiments/long_runs/leaderboard/data_sources.jl

Sbozzolo

Can you add a page to the documentation describing the bias plots/leaderboard and how to add new variables to them? You can start from the comments you already have in the file.


kmdeck · 2024-10-29T20:13:01Z

experiments/long_runs/leaderboard/data_sources.jl

+            ClimaLand.Artifacts.ilamb_dataset_path(;
+                context = "evspsbl_MODIS_et_0.5x0.5.nc",
+            ),
+            "et",


Do we rely on using the same short_name in the data and the diagnostic output from the simulation anywhere?

Or could this line be anything, as long as we set obs_var_dict["et"]?

We don't rely on the same short_name in the (observational?) data and the diagnostic output from the simulation. The short name ("et" on line 72) is the name used in the observations (the .nc file).

It is done this way because ClimaAnalysis sometimes cannot correctly infer which variable to load, for example when the .nc file contains more than one variable.

The line can be anything, as long as obs_var_dict["et"] is a function that takes a start date and returns an OutputVar.
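To illustrate the point above, here is a self-contained sketch of the dictionary-of-loaders pattern. `FakeOutputVar` and `load_obs` are hypothetical stand-ins for `ClimaAnalysis.OutputVar` and the real file-reading code; the point is only that the dictionary key pairs observations with simulation output, while the short name passed to the loader just has to match the variable name inside the .nc file.

```julia
using Dates

# Hypothetical stand-in for ClimaAnalysis.OutputVar, used only to keep the sketch self-contained.
struct FakeOutputVar
    short_name::String   # name of the variable inside the .nc file
    start_date::DateTime # start date of the simulation the data is aligned with
end

# Hypothetical loader; the real code reads the .nc file with ClimaAnalysis.
load_obs(path::String, file_short_name::String, start_date::DateTime) =
    FakeOutputVar(file_short_name, start_date)

# The key ("et") is what the leaderboard uses to match simulation and observations.
# The second argument to load_obs only needs to match the name stored in the file.
obs_var_dict = Dict{String, Function}(
    "et" => start_date -> load_obs("evspsbl_MODIS_et_0.5x0.5.nc", "et", start_date),
)

# Each entry is a function of the start date that returns the observational variable.
obs_et = obs_var_dict["et"](DateTime(2012, 1, 1))
```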

kmdeck · 2024-10-29T20:18:57Z

docs/src/leaderboard/leaderboard.md

+            shift_by = Dates.firstdayofmonth,
+        )
+        # More preprocessing to match the units with the simulation data
+        ClimaAnalysis.units(obs_var) == "kg/m2/s" &&


Maybe we could link to the ClimaAnalysis documentation explaining the formatting of units? If I understand correctly, this will carry out a unit conversion, which means the string must be in the format expected by ClimaAnalysis.

That part of the code only checks whether the units string is "kg/m2/s" and, if so, sets the units to "kg m^-2 s^-1". The check is there to make sure the units are set correctly.

There is no required convention for formatting units unless one wants to use automatic unit conversion. We only do this because ClimaAnalysis cannot tell that "kg/m2/s" (in the observational data) is the same as "kg m^-2 s^-1" (in the simulation data).
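The check described above amounts to mapping known alternative spellings onto the spelling used by the simulation output. A minimal sketch of that idea as a lookup table follows; `UNIT_ALIASES` and `normalize_units` are hypothetical names, and the code only rewrites the string, it does not perform any numerical unit conversion.

```julia
# Hypothetical table of unit spellings that denote the same physical unit.
# Keys are spellings found in observational files; values are the spelling used in the simulation output.
const UNIT_ALIASES = Dict(
    "kg/m2/s" => "kg m^-2 s^-1",
)

# Return the simulation-style spelling of `units` if it is a known alias, otherwise return it unchanged.
normalize_units(units::AbstractString) = get(UNIT_ALIASES, units, String(units))

normalize_units("kg/m2/s")  # returns "kg m^-2 s^-1"
normalize_units("W m^-2")   # not in the table, returned unchanged
```

Keeping the aliases in one table would make it easier to convert all observational units to the simulation's spelling in one place, rather than adding a separate `if` check per variable.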


kmdeck · 2024-10-29T20:28:43Z

experiments/long_runs/leaderboard/leaderboard.jl

+"""
+    compute_leaderboard(leaderboard_base_path, diagnostics_folder_path)
+
+Plot the biases against observations and simulation data.


I see that both obs_var_dict and sim_var_dict are used in this function but not passed in as arguments. For clarity, and to make things easier for future users, could we pass those in as arguments?

The argument diagnostics_folder_path is not used here; it was already used in data_sources.jl. Instead of passing it in, we should pass in sim_var_dict.

I am going to turn the entire data_sources.jl file into a function instead, since I realized that the way things were being initialized before is confusing.
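A sketch of the refactor discussed above, assuming a hypothetical `get_data_sources` function that replaces the top-level code of data_sources.jl and a hypothetical `mean_biases` consumer that receives every dictionary as an argument. Plain vectors stand in for the `OutputVar`s that the real script would load, so this only illustrates the control flow, not the actual data loading or plotting.

```julia
using Dates

# Hypothetical replacement for the top-level code in data_sources.jl: build and return
# everything the leaderboard needs instead of defining globals when the file is included.
function get_data_sources(diagnostics_folder_path::AbstractString)
    # The real script would read diagnostics from diagnostics_folder_path and the
    # observational .nc files; plain vectors stand in for that data here.
    sim_var_dict = Dict{String, Function}("et" => () -> [1.0, 2.0, 3.0])
    obs_var_dict = Dict{String, Function}("et" => start_date -> [0.5, 1.5, 2.5])
    compare_vars = ["et"]
    return sim_var_dict, obs_var_dict, compare_vars
end

# Every input is an explicit argument, so nothing depends on state created in another file.
function mean_biases(sim_var_dict, obs_var_dict, compare_vars, start_date)
    biases = Dict{String, Float64}()
    for name in compare_vars
        sim = sim_var_dict[name]()
        obs = obs_var_dict[name](start_date)
        biases[name] = sum(sim .- obs) / length(sim)  # simple (unweighted) mean bias
    end
    return biases
end

sim_var_dict, obs_var_dict, compare_vars = get_data_sources("path/to/diagnostics")
biases = mean_biases(sim_var_dict, obs_var_dict, compare_vars, Date(2012, 1, 1))
```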


kmdeck · 2024-10-29T20:31:34Z

experiments/long_runs/leaderboard/leaderboard.jl

+                ClimaAnalysis.window(sim_var, "time", left = spinup_cutoff)
+        )
+
+        # Get 12 or less months of data


Why is this here? What if we had run for many years, and only used the first year as spinup?

I added that in case the simulation runs for longer than one year but less than two years. For instance, if the simulation runs for 1.5 years, only six months remain after discarding the first year.

The code always uses the second year, regardless of whether the simulation runs for more years. It assumes that the simulation starts in 2012 (although it could start in any year for which observational data exists). The reason is that some datasets cover a limited time range. For example, the last time point in the observational data for et is 2013-12-16.
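The windowing rule described above can be sketched with the `Dates` standard library alone. `comparison_window` is a hypothetical helper, not the PR's code: it skips a one-year spin-up when output remains afterwards, falls back to the start of the run otherwise (as the PR description says happens for one-year runs), and keeps at most 12 months.

```julia
using Dates

# Hypothetical helper mirroring the rule described above:
# discard the first year as spin-up (if anything is left afterwards), then keep at most 12 months.
function comparison_window(start_date::Date, end_date::Date;
                           spinup::Period = Year(1), max_length::Period = Month(12))
    # Only skip the spin-up if some output remains after it; otherwise compare from the start.
    window_start = end_date > start_date + spinup ? start_date + spinup : start_date
    window_end = min(end_date, window_start + max_length)
    return window_start, window_end
end

# A 1.5-year run starting in 2012 leaves six months: returns (2013-01-01, 2013-07-01).
comparison_window(Date(2012, 1, 1), Date(2013, 7, 1))
# A three-year run still only uses the second year: returns (2013-01-01, 2014-01-01).
comparison_window(Date(2012, 1, 1), Date(2015, 1, 1))
```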


kmdeck · 2024-10-29T20:36:39Z

experiments/long_runs/leaderboard/data_sources.jl

@@ -0,0 +1,142 @@
+import ClimaAnalysis


I noticed that we change the units of the simulation output in some cases (GPP) and of the observations in others (LWU). It would be good to make this consistent (i.e., always change the simulation units to match the observations).

I agree that this should be consistent, but I think we should change all observation units to match the simulation units. The units in the simulation data are written in a consistent way, while those in the observational data are not.

kmdeck · 2024-10-29T20:38:49Z

experiments/long_runs/leaderboard/leaderboard.jl

+            g_rmse = ClimaAnalysis.global_rmse(
+                ClimaAnalysis.slice(sim_var, time = t),
+                ClimaAnalysis.slice(obs_var, time = t),
+                mask = mask,


Does ClimaAnalysis use the mask to change the denominator in the RMSE and mean bias?

Yes, the mask is used to normalize the global RMSE and global bias.
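To make the role of the mask concrete, here is a minimal sketch of a masked RMSE and mean bias in plain Julia. It is not ClimaAnalysis's implementation: `masked_rmse_and_bias` is a hypothetical name, and area (latitude) weighting, which the real global functions may apply, is deliberately omitted. Masked-out points and NaNs (such as the coastline values left by resampling) are dropped from both the sum and the count, so the denominator is the number of valid points.

```julia
# Hypothetical sketch: unweighted RMSE and mean bias over the points where the mask is true
# and neither field is NaN. Excluded points affect neither the numerator nor the denominator.
function masked_rmse_and_bias(sim::AbstractArray, obs::AbstractArray, mask::AbstractArray{Bool})
    valid = mask .& .!isnan.(sim) .& .!isnan.(obs)
    n = count(valid)
    n == 0 && error("no valid points to compare")
    diff = sim[valid] .- obs[valid]
    rmse = sqrt(sum(abs2, diff) / n)
    bias = sum(diff) / n
    return rmse, bias
end

sim = [1.0 2.0; 3.0 4.0]
obs = [1.0 NaN; 2.0 6.0]     # NaN mimics a coastline point after resampling
mask = [true true; true false]
rmse, bias = masked_rmse_and_bias(sim, obs, mask)  # only two valid points remain
```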

kmdeck · 2024-10-29T20:40:15Z

experiments/long_runs/land_leaderboard.jl

+        Interpolations.Flat(),
+        Interpolations.Flat(),
+    )
+    soil_params_mask = SpaceVaryingInput(


Once #883 merges, could you change this script to use the helper functions for the canopy and soil spatially varying parameters?

ph-kev force-pushed the kp/leaderboard branch 2 times, most recently from d5d99e3 to 4f6a203, October 28, 2024 18:44

ph-kev marked this pull request as draft October 28, 2024 18:44

ph-kev force-pushed the kp/leaderboard branch from 4f6a203 to ca89738, October 28, 2024 18:49

ph-kev requested a review from Sbozzolo October 28, 2024 20:02

Sbozzolo requested a review from kmdeck October 28, 2024 22:10

Sbozzolo reviewed Oct 28, 2024

experiments/long_runs/leaderboard/data_sources.jl (outdated, resolved)

Sbozzolo reviewed Oct 28, 2024

experiments/long_runs/leaderboard/data_sources.jl (outdated, resolved)

ph-kev force-pushed the kp/leaderboard branch 5 times, most recently from 6a53f9a to adc5cd9, October 29, 2024 20:05

kmdeck reviewed Oct 29, 2024

experiments/long_runs/leaderboard/data_sources.jl (outdated, resolved)

kmdeck reviewed Oct 29, 2024

experiments/long_runs/leaderboard/leaderboard.jl (outdated, resolved)

kmdeck reviewed Oct 29, 2024

experiments/long_runs/leaderboard/leaderboard.jl (resolved)

kmdeck reviewed Oct 29, 2024

experiments/long_runs/leaderboard/leaderboard.jl (resolved)

kmdeck reviewed Oct 29, 2024

experiments/long_runs/land_leaderboard.jl (outdated, resolved)

kmdeck reviewed Oct 29, 2024


ph-kev added 2 commits October 30, 2024 10:41

Add Artifacts.ilamb_dataset_path

9d80318

Update ClimaAnalysis to 0.5.11

8e5802d

ph-kev force-pushed the kp/leaderboard branch from adc5cd9 to 99283da, October 30, 2024 20:51

Add leaderboard component

eb728b5

ph-kev force-pushed the kp/leaderboard branch from 99283da to eb728b5, October 30, 2024 23:14
