-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add leaderboard component to ClimaLand's long runs #890
base: main
Are you sure you want to change the base?
Conversation
d5d99e3
to
4f6a203
Compare
4f6a203
to
ca89738
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a page in the documentation describing the bias plots/leaderboard, how to add them? You can start from the comments you already have in the file
6a53f9a
to
adc5cd9
Compare
ClimaLand.Artifacts.ilamb_dataset_path(; | ||
context = "evspsbl_MODIS_et_0.5x0.5.nc", | ||
), | ||
"et", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we rely on using the same short_name in the data and the diagnostic output from the simulation anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or it seems like this line could be anything, as long as we set obs_var_dict["et"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't rely on the same short_name in the (observational?) data and the diagnostic output from the simulation. The short name (et
on line 72) is the name that is given in the observations (.nc file).
It is like that because ClimaAnalysis sometimes can't correctly infers what variable is or if there is more than one variable in the .nc file.
The line can be anything as long as obs_var_dict["et"]
is a function that takes in a start date and return a OutputVar
.
shift_by = Dates.firstdayofmonth, | ||
) | ||
# More preprocessing to match the units with the simulation data | ||
ClimaAnalysis.units(obs_var) == "kg/m2/s" && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could link to ClimaAnalysis documentation explaining the formatting for units? If I understand correctly, this will carry out unit conversion, but that means the string must be in the format expected by ClimaAnalysis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For that part of the code, it is just checking that the string of the units is "kg/m2/s" and if it is, then set the units to "kg m^-2 s^-1". The check is just there to make sure the units are set correctly.
There is no convention for the formatting of units unless one want to use automatic unit conversion. We only do this because ClimaAnalysis can't tell that "kg/m2/s" (in the observational data) is the same as "kg m^-2 s^-1" (in the simulation data).
""" | ||
compute_leaderboard(leaderboard_base_path, diagnostics_folder_path) | ||
|
||
Plot the biases against observations and simulation data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that both obs_var_dict and sim_var_dict are used in this function but not passed in as arguments. For clarity and to make it easy on future users, could we pass in those as arguments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This argument is not used: diagnostics_folder_path, it was already used in data_sources.jl. No need to pass in, instead we should pass in sim_var_dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to make the entire data_sources.jl
file a function instead since I realized it is confusing how things are being initialized before.
ClimaAnalysis.window(sim_var, "time", left = spinup_cutoff) | ||
) | ||
|
||
# Get 12 or less months of data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this here? What if we had run for many years, and only used the first year as spinup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added that there in case if the simulation run for longer than one year, but less than two years. For instance, if one were to run the simulation for 1.5 years, then only six months will be used after throwing away the first year.
The code will always use the second year regardless if the simulation runs for multiple years. It assumed that we start the simulation at year 2012 (although, it could be any year as long as data exists in the observations) The only reason is because some datasets are limited in time. For example, for the observational data for et
, the last point in time is 2013-12-16
.
@@ -0,0 +1,142 @@ | |||
import ClimaAnalysis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that we change the units of the simulation output in some cases (GPP), and in the observations (LWU) in others. it would be good to make this consistent (simulation unit changes to match observations)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this should be consistent, but I think we should change all observation units to match simulation units. This is because there is some consistency with how the units are written for the simulation data, but not for the observational data.
g_rmse = ClimaAnalysis.global_rmse( | ||
ClimaAnalysis.slice(sim_var, time = t), | ||
ClimaAnalysis.slice(obs_var, time = t), | ||
mask = mask, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ClimaAnalysis uses the mask to change the denominator in RMSE and mean bias?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the mask is used to normalize the global RMSE and global bias.
Interpolations.Flat(), | ||
Interpolations.Flat(), | ||
) | ||
soil_params_mask = SpaceVaryingInput( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once this PR merges: #883, could you change this script to use the helper functions for canopy and soil spatially varying parameters?
adc5cd9
to
99283da
Compare
99283da
to
eb728b5
Compare
closes #872 - This PR adds a leaderboard component for ClimaLand's long runs. See the plots below.
The values near the coastlines are
Nan
s because the observational data is resampled so that it matches the same grid as the simulation data.I added a new file
land_leaderboard.jl
since no simulation being done for long runs are longer are two years or more. I am not sure if this should be done sinceland_leaderboard.jl
is exactly the same asland.jl
beside the length of the simulation. The leaderboard code still works even if the length of the simulation is only a year, it would just not throw away the first year of the simulation and use it instead for comparing against the observational data.The other thing is that the simulation is not correct because the era5 land forcing data is from 2021.