add support for s3 checkpoint #220
if after != orig_vocab_size:
    print("i'm in")
cleanup comment
@@ -3,6 +3,7 @@
import os
import random
import socket
import re
Remove this import, as it seems to be unused.
Please run pre-commit so that it automatically fixes everything related to coding style.
In any case, good job!
Goal
Allow uploading checkpoints to S3 during training, and resuming from a checkpoint stored on S3.
How
Add an S3UploadArgs class, defined in the config.
Add support for an S3 path in resume_checkpoint_path: the checkpoint is copied (with s5cmd) to checkpoints_path. This is done in the pre_init phase by the parse_ckpt_path(config=self.config, parallel_context=self.parallel_context) function.
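To make the resume step described above concrete, here is a minimal sketch of how an S3 value in resume_checkpoint_path could be resolved to a local directory under checkpoints_path. It is not the code from this PR: the helper names (is_s3_path, build_copy_command, resolve_resume_path) are hypothetical, the choice of local subdirectory is an assumption, and the sketch ignores parallel_context entirely. The only external call it relies on is the s5cmd cp command.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def is_s3_path(path: str) -> bool:
    """Return True if the path points at an S3 location."""
    return path.startswith("s3://")


def build_copy_command(s3_path: str, local_dir: str) -> List[str]:
    """Build an s5cmd command that copies everything under an S3 prefix."""
    # The trailing "/*" wildcard selects all objects under the prefix.
    src = s3_path.rstrip("/") + "/*"
    # The trailing "/" marks the destination as a directory.
    dst = local_dir.rstrip("/") + "/"
    return ["s5cmd", "cp", src, dst]


def resolve_resume_path(
    resume_checkpoint_path: str,
    checkpoints_path: str,
    run: Callable = subprocess.run,
) -> str:
    """Hypothetical helper: return a local path to resume from.

    Local paths are returned unchanged. S3 paths are copied into a
    subdirectory of checkpoints_path named after the last path component
    (an assumption made for this sketch).
    """
    if not is_s3_path(resume_checkpoint_path):
        return resume_checkpoint_path

    name = resume_checkpoint_path.rstrip("/").split("/")[-1]
    local_dir = Path(checkpoints_path) / name
    local_dir.mkdir(parents=True, exist_ok=True)

    # check=True raises CalledProcessError if s5cmd exits with an error.
    run(build_copy_command(resume_checkpoint_path, str(local_dir)), check=True)
    return str(local_dir)
```

The run parameter is injectable so the function can be exercised without s5cmd installed. In a multi-process training job, the copy would presumably need to be coordinated across ranks, which this sketch does not attempt.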