[4.1 postmortem] Unit test module for inference repo #1865

Open
nvzhihanj opened this issue Oct 1, 2024 · 4 comments

@nvzhihanj (Contributor)

We propose to add a basic unit test framework (likely pytest) and tests to the inference repo. Ideally, it should test that:

  • All configuration (mlperf.conf, user.conf) is valid and working (i.e. no typos, no items outside of an existing dictionary, etc.)
  • All reference benchmark dataset and model download modules are working
  • All reference models are running and producing reasonable output
  • The submission checker can catch errors as expected
  • etc.

To get this started, we can aim at some simple tests:
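
For illustration, a minimal pytest sketch of the config-validation item; the conf file locations and the allowed-key set are assumptions and would need to match the actual repo:

```python
# test_conf_files.py -- illustrative sketch; paths and KNOWN_KEYS are assumptions.
import re
import pytest

CONF_FILES = ["mlperf.conf", "user.conf"]  # assumed to sit at the repo root
KNOWN_KEYS = {
    "target_qps", "target_latency", "min_duration",
    "min_query_count", "performance_sample_count_override",
}  # deliberately incomplete, for illustration only

def parse_conf(path):
    """Parse lines of the form 'model.scenario.key = value', skipping comments."""
    entries = []
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()
            if not line:
                continue
            match = re.match(r"^(\S+)\.(\S+)\.(\S+)\s*=\s*(\S+)$", line)
            assert match, f"{path}: malformed line {line!r}"
            entries.append(match.groups())
    return entries

@pytest.mark.parametrize("path", CONF_FILES)
def test_conf_keys_are_known(path):
    for model, scenario, key, value in parse_conf(path):
        assert key in KNOWN_KEYS, f"{path}: unknown key {key!r} for {model}.{scenario}"
```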

@pgmpablo157321 to help implement
@nv-alicheng for more suggestions.

@nvzhihanj nvzhihanj changed the title Unit test module for inference repo [4.1 postmortem] Unit test module for inference repo Oct 1, 2024
@arjunsuresh (Contributor)

We also need a deadline for these to be done before every submission round. Maybe the same as the code freeze date?

Currently we do have GitHub tests for reference benchmark runs, which include downloading the models and datasets and even running the submission checker. These are complete for the small models, and we also have GPTJ and SDXL tests running on self-hosted GitHub runners. We hope to cover LLAMA2, DLRMv2, and Mixtral this month.

The main concern is with upcoming benchmarks: if a benchmark needs multiple GPUs and terabytes of memory, like the GNN benchmark, we do not have the infrastructure to do the runs. There, unit tests for the conf files and log parsing may be the only option.
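
As a sketch of the log-parsing option (the fixture path is hypothetical; the key/value format follows the usual mlperf_log_summary.txt output):

```python
# test_log_parsing.py -- sketch only; the fixture log path is an assumption.
from pathlib import Path

SUMMARY_LOG = Path("tests/fixtures/mlperf_log_summary.txt")  # hypothetical fixture

def parse_summary(path):
    """Collect 'key : value' pairs from a loadgen summary log."""
    results = {}
    for line in path.read_text().splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            results[key.strip()] = value.strip()
    return results

def test_summary_result_is_valid():
    results = parse_summary(SUMMARY_LOG)
    assert results.get("Result is") == "VALID"
```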

@nvzhihanj (Contributor, Author) commented Oct 2, 2024

@arjunsuresh Thank you for the suggestions. I think we can split this into steps and cover the most important parts first (submission checker, config files, loadgen).
For the benchmarks, we can implement them one by one according to urgency.
Regarding the deadline, I agree it should align with the code freeze date.

@arjunsuresh (Contributor)

@nvzhihanj

Yes, we do need unit tests for the submission checker. Currently we check it by running it on the previous submission repository, which can catch most of the issues but not all. It is currently failing because the "images" folder was removed before publication for stable diffusion, but this was not updated in the submission checker.
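
One way to catch that kind of regression earlier would be a pytest case that runs the checker on small fixture trees instead of the full previous submission repository; the fixture paths are hypothetical and the flags below should be verified against the checker's actual CLI:

```python
# test_submission_checker.py -- sketch; fixture trees and CLI flags are assumptions.
import subprocess
import sys

CHECKER = "tools/submission/submission_checker.py"
GOOD_TREE = "tests/fixtures/valid_submission"   # hypothetical minimal valid tree
BAD_TREE = "tests/fixtures/missing_images"      # same tree with the images folder removed

def run_checker(tree):
    return subprocess.run(
        [sys.executable, CHECKER, "--input", tree, "--version", "v4.1"],
        capture_output=True, text=True,
    )

def test_checker_accepts_valid_tree():
    assert run_checker(GOOD_TREE).returncode == 0

def test_checker_flags_broken_tree():
    assert run_checker(BAD_TREE).returncode != 0
```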

The situation is similar for loadgen: we currently have tests for it and we run it on the previous benchmarks, but if a new benchmark comes and new configs are added, they are not tested.
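
For that gap, one option is a parametrized test that asks loadgen itself to parse mlperf.conf for every benchmark/scenario pair; this assumes the mlperf_loadgen Python bindings and their TestSettings.FromConfig call, and the benchmark list is only illustrative:

```python
# test_loadgen_conf.py -- sketch; the benchmark list is illustrative, not exhaustive.
import pytest
import mlperf_loadgen as lg

BENCHMARKS = ["resnet50", "bert", "gptj", "stable-diffusion-xl"]  # extend each round
SCENARIOS = ["Offline", "Server", "SingleStream", "MultiStream"]

@pytest.mark.parametrize("model", BENCHMARKS)
@pytest.mark.parametrize("scenario", SCENARIOS)
def test_mlperf_conf_parses(model, scenario):
    settings = lg.TestSettings()
    # FromConfig is assumed to return 0 when the config parses cleanly.
    assert settings.FromConfig("mlperf.conf", model, scenario) == 0
```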

@mrmhodak (Contributor) commented Oct 8, 2024

@pgmpablo157321 to work on this

@pgmpablo157321 pgmpablo157321 self-assigned this Oct 8, 2024