Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for optimizers that require an .eval() step #758

Closed
netw0rkf10w opened this issue Apr 11, 2024 · 5 comments
Closed

Support for optimizers that require an .eval() step #758

netw0rkf10w opened this issue Apr 11, 2024 · 5 comments

Comments

@netw0rkf10w
Copy link

Description

First of all if this feature is already supported then please consider this as a question.

I'm trying to reproduce some results of existing algorithms such as SGD, AdamW, and also the recently proposed ScheduleFree. For the latter, in particular, there is an .eval() step that needs to be done before the validation phase, and I haven't figured out how to do that properly (there doesn't seem to be any indication in their submission code).

Any help would be greatly appreciated! Thank you in advance!

@adefazio
Copy link

I use the closure form of the algorithm to avoid needing to do the eval() call. There has been a request from another user to support this this sort of eval() mode so that exponential weight averaging can be implemented. The organizers said that this would be something they will look at adding in the future. It should help with performance.

@netw0rkf10w
Copy link
Author

Thanks a lot, @adefazio. Nice trick!

As I understand, you basically swap the extrapolated points at every step, is that correct? That seems to be putting your algorithm at a disadvantage though. Have you observed considerable slowdown compared to the .eval() version?

And I'm quite surprised that the proposed feature hasn't been added to support your method. It's just two lines of code (actually we only need one). Or maybe I'm missing something?

Would love to hear your opinion on this, @priyakasimbeg. Thanks.

@adefazio
Copy link

The overhead is minimal, parameter copying is very fast.
The issue was raised very close to the deadline and so given the low overhead of the copy, it was decided that changing the competition API close to the deadline was not warranted.

@priyakasimbeg
Copy link
Contributor

Hi we're planning on discussing this feature request on Th 9/5 during the WG meeting

@priyakasimbeg
Copy link
Contributor

I believe this is partially addressed in #798 (decouple BN statistic updates from using running statistics) and will be fully addressed once pr/789 for #719 has been submitted. So I am deduping this. Please feel free to reopen if I have missed anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants