Restarting from a checkpoint should restore the time step? #3845

Open
qingli411 opened this issue Oct 10, 2024 · 1 comment
Labels
bug 🐞 Even a perfect program still has bugs

Comments

@qingli411

Restarting from a checkpoint does not appear to be bit-for-bit reproducible. I think the cause is that the time step is not restored on pickup: the run still uses the initial time step passed to Simulation(), not the last_Δt stored in the Clock object saved in the checkpoint file. See the example below, which is the output of the attached test case.

In the pickup run I changed the on-screen output from every 10 iterations to every iteration so that the time step is visible. Rather than using the previous time step (5.973 s) from the checkpoint, the pickup run uses a time step of 10 s, which is the value passed in simulation = Simulation(model, Δt=10, stop_iteration=220). I’m using v0.91.5.

Initial run

[ Info: Initializing simulation...
Iteration: 0000, time: 0 seconds, Δt: 11 seconds, max(|u|) = 2.5e-01 ms⁻¹, wall time: 0 seconds
[ Info:     ... simulation initialization complete (13.909 seconds)
[ Info: Executing initial time step...
[ Info:     ... initial time step complete (4.696 seconds).
Iteration: 0010, time: 1.833 minutes, Δt: 11.212 seconds, max(|u|) = 2.6e-01 ms⁻¹, wall time: 19.116 seconds
Iteration: 0020, time: 3.702 minutes, Δt: 10.681 seconds, max(|u|) = 2.8e-01 ms⁻¹, wall time: 19.345 seconds
Iteration: 0030, time: 5.482 minutes, Δt: 10.215 seconds, max(|u|) = 2.9e-01 ms⁻¹, wall time: 19.629 seconds
Iteration: 0040, time: 7.185 minutes, Δt: 9.802 seconds, max(|u|) = 3.0e-01 ms⁻¹, wall time: 19.854 seconds
Iteration: 0050, time: 8.819 minutes, Δt: 9.433 seconds, max(|u|) = 3.1e-01 ms⁻¹, wall time: 20.082 seconds
Iteration: 0060, time: 10.391 minutes, Δt: 9.100 seconds, max(|u|) = 3.2e-01 ms⁻¹, wall time: 20.306 seconds
Iteration: 0070, time: 11.907 minutes, Δt: 8.798 seconds, max(|u|) = 3.3e-01 ms⁻¹, wall time: 20.559 seconds
Iteration: 0080, time: 13.374 minutes, Δt: 8.523 seconds, max(|u|) = 3.4e-01 ms⁻¹, wall time: 20.773 seconds
Iteration: 0090, time: 14.794 minutes, Δt: 8.270 seconds, max(|u|) = 3.5e-01 ms⁻¹, wall time: 21.001 seconds
Iteration: 0100, time: 16.173 minutes, Δt: 8.036 seconds, max(|u|) = 3.6e-01 ms⁻¹, wall time: 21.230 seconds
Iteration: 0110, time: 17.512 minutes, Δt: 7.820 seconds, max(|u|) = 3.7e-01 ms⁻¹, wall time: 21.456 seconds
Iteration: 0120, time: 18.815 minutes, Δt: 7.617 seconds, max(|u|) = 3.8e-01 ms⁻¹, wall time: 21.688 seconds
Iteration: 0130, time: 20.085 minutes, Δt: 7.424 seconds, max(|u|) = 3.9e-01 ms⁻¹, wall time: 21.914 seconds
Iteration: 0140, time: 21.322 minutes, Δt: 7.238 seconds, max(|u|) = 4.0e-01 ms⁻¹, wall time: 22.140 seconds
Iteration: 0150, time: 22.528 minutes, Δt: 7.046 seconds, max(|u|) = 4.0e-01 ms⁻¹, wall time: 22.366 seconds
Iteration: 0160, time: 23.703 minutes, Δt: 6.792 seconds, max(|u|) = 4.1e-01 ms⁻¹, wall time: 22.597 seconds
Iteration: 0170, time: 24.835 minutes, Δt: 6.500 seconds, max(|u|) = 4.2e-01 ms⁻¹, wall time: 22.822 seconds
Iteration: 0180, time: 25.918 minutes, Δt: 6.230 seconds, max(|u|) = 4.3e-01 ms⁻¹, wall time: 23.047 seconds
Iteration: 0190, time: 26.956 minutes, Δt: 5.973 seconds, max(|u|) = 4.3e-01 ms⁻¹, wall time: 23.271 seconds
Iteration: 0200, time: 27.952 minutes, Δt: 5.930 seconds, max(|u|) = 4.3e-01 ms⁻¹, wall time: 23.504 seconds
Iteration: 0210, time: 28.940 minutes, Δt: 5.979 seconds, max(|u|) = 4.1e-01 ms⁻¹, wall time: 24.139 seconds
[ Info: Simulation is stopping after running for 24.362 seconds.
[ Info: Model iteration 220 equals or exceeds stop iteration 220.
Iteration: 0220, time: 29.937 minutes, Δt: 5.898 seconds, max(|u|) = 4.1e-01 ms⁻¹, wall time: 24.362 seconds

Pickup run

[ Info: Initializing simulation...
[ Info:     ... simulation initialization complete (51.655 ms)
[ Info: Executing initial time step...
[ Info:     ... initial time step complete (5.017 seconds).
Iteration: 0201, time: 28.118 minutes, Δt: 10 seconds, max(|u|) = 4.2e-01 ms⁻¹, wall time: 0 seconds
Iteration: 0202, time: 28.285 minutes, Δt: 10 seconds, max(|u|) = 4.2e-01 ms⁻¹, wall time: 10.477 seconds
Iteration: 0203, time: 28.452 minutes, Δt: 10 seconds, max(|u|) = 4.1e-01 ms⁻¹, wall time: 10.499 seconds
Iteration: 0204, time: 28.618 minutes, Δt: 10 seconds, max(|u|) = 4.1e-01 ms⁻¹, wall time: 10.520 seconds
Iteration: 0205, time: 28.785 minutes, Δt: 10 seconds, max(|u|) = 4.1e-01 ms⁻¹, wall time: 10.544 seconds
Iteration: 0206, time: 28.952 minutes, Δt: 10 seconds, max(|u|) = 4.1e-01 ms⁻¹, wall time: 10.570 seconds
Iteration: 0207, time: 29.118 minutes, Δt: 10 seconds, max(|u|) = 4.1e-01 ms⁻¹, wall time: 10.595 seconds
Iteration: 0208, time: 29.285 minutes, Δt: 10 seconds, max(|u|) = 4.1e-01 ms⁻¹, wall time: 10.621 seconds
Iteration: 0209, time: 29.452 minutes, Δt: 10 seconds, max(|u|) = 4.1e-01 ms⁻¹, wall time: 10.647 seconds
Iteration: 0210, time: 29.618 minutes, Δt: 6.183 seconds, max(|u|) = 4.0e-01 ms⁻¹, wall time: 10.673 seconds
Iteration: 0211, time: 29.722 minutes, Δt: 6.183 seconds, max(|u|) = 4.0e-01 ms⁻¹, wall time: 14.392 seconds
Iteration: 0212, time: 29.825 minutes, Δt: 6.183 seconds, max(|u|) = 4.0e-01 ms⁻¹, wall time: 14.413 seconds
Iteration: 0213, time: 29.928 minutes, Δt: 6.183 seconds, max(|u|) = 4.0e-01 ms⁻¹, wall time: 14.435 seconds
Iteration: 0214, time: 30.031 minutes, Δt: 6.183 seconds, max(|u|) = 4.0e-01 ms⁻¹, wall time: 14.459 seconds
Iteration: 0215, time: 30.134 minutes, Δt: 6.183 seconds, max(|u|) = 4.0e-01 ms⁻¹, wall time: 14.485 seconds
Iteration: 0216, time: 30.237 minutes, Δt: 6.183 seconds, max(|u|) = 4.0e-01 ms⁻¹, wall time: 14.510 seconds
Iteration: 0217, time: 30.340 minutes, Δt: 6.183 seconds, max(|u|) = 4.0e-01 ms⁻¹, wall time: 14.536 seconds
Iteration: 0218, time: 30.443 minutes, Δt: 6.183 seconds, max(|u|) = 4.0e-01 ms⁻¹, wall time: 14.559 seconds
Iteration: 0219, time: 30.546 minutes, Δt: 6.183 seconds, max(|u|) = 4.0e-01 ms⁻¹, wall time: 14.582 seconds
[ Info: Simulation is stopping after running for 14.606 seconds.
[ Info: Model iteration 220 equals or exceeds stop iteration 220.
Iteration: 0220, time: 30.649 minutes, Δt: 6.023 seconds, max(|u|) = 4.0e-01 ms⁻¹, wall time: 14.606 seconds
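
The mechanism can be reproduced in miniature without Oceananigans. The sketch below is a toy, not the attached test case: `ToyState`, `tendency`, `adapt`, and `advance` are hypothetical names, the dynamics and the `cfl` and `max_change` values are made up, and the "wizard" is assumed to update Δt only every 10 iterations, as in the logs above. It compares a straight run to iteration 220 with two restarts from iteration 200, one that restores the saved Δt and one that falls back to the initial Δt.

```julia
# Toy "model" state (hypothetical; not an Oceananigans type).
struct ToyState
    u::Float64
    t::Float64
    iteration::Int
end

# Made-up dynamics: slow exponential growth, so the adaptive Δt shrinks over time.
tendency(u) = 1e-3 * u

# Hypothetical stand-in for an adaptive time-step wizard.
adapt(Δt, u; cfl=2.5, max_change=1.1) = min(max_change * Δt, cfl / abs(u))

# Forward-Euler stepping; Δt is only updated every `wizard_interval` iterations.
function advance(state::ToyState, Δt; stop_iteration, wizard_interval=10)
    u, t, n = state.u, state.t, state.iteration
    while n < stop_iteration
        u += Δt * tendency(u)
        t += Δt
        n += 1
        if n % wizard_interval == 0
            Δt = adapt(Δt, u)
        end
    end
    return ToyState(u, t, n), Δt
end

initial_Δt = 10.0
initial_state = ToyState(0.25, 0.0, 0)

# Reference: run straight through to iteration 220.
reference, _ = advance(initial_state, initial_Δt; stop_iteration=220)

# "Checkpoint" at iteration 200: the state plus the Δt in use at that point.
checkpoint, checkpoint_Δt = advance(initial_state, initial_Δt; stop_iteration=200)

# Restart with the saved Δt, and restart with the initial Δt instead.
restored_run, _ = advance(checkpoint, checkpoint_Δt; stop_iteration=220)
reset_run, _ = advance(checkpoint, initial_Δt; stop_iteration=220)

# `===` on an immutable struct of plain numbers compares the stored bits.
println("restart with saved Δt is bit-for-bit:   ", restored_run === reference)
println("restart with initial Δt is bit-for-bit: ", reset_run === reference)
```

In this toy, only the restart that carries Δt across the checkpoint matches the straight run. The restart that resets Δt takes ten steps with the wrong step size before the next wizard update, which is the same pattern as iterations 201 to 209 in the pickup log.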

run.jl.zip

@glwagner glwagner added the bug 🐞 Even a perfect program still has bugs label Oct 15, 2024
@glwagner
Member

This is another reason why the Checkpointer essentially has to depend on the simulation rather than just the model.
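
As a rough illustration of that point, here is a minimal sketch of a checkpoint record that carries simulation-level state next to the model state. It assumes nothing about how Oceananigans stores checkpoints: `ToyModelState` and `ToySimulationCheckpoint` are hypothetical types, the values are illustrative numbers read off iteration 200 in the log above, and the file is written with Julia's standard `Serialization` library rather than the Checkpointer's own format.

```julia
using Serialization

# Hypothetical containers; these are not Oceananigans types.
struct ToyModelState
    u::Vector{Float64}
    time::Float64
    iteration::Int
end

struct ToySimulationCheckpoint
    model::ToyModelState
    Δt::Float64   # simulation-level state that a model-only checkpoint would miss
end

# Illustrative values only (time and Δt as printed at iteration 200 above).
model_state = ToyModelState([0.25, 0.5], 1677.12, 200)
saved = ToySimulationCheckpoint(model_state, 5.93)

# Round-trip through a temporary file.
loaded = mktempdir() do dir
    path = joinpath(dir, "toy_checkpoint.bin")
    serialize(path, saved)
    deserialize(path)
end

println("Δt restored:        ", loaded.Δt == saved.Δt)
println("model state intact: ", loaded.model.u == saved.model.u)
```

With a record like this, a pickup can read Δt from the file instead of falling back to the value given when the simulation was constructed.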
