[4.1 postmortem] Unit test module for inference repo #1865

nvzhihanj · 2024-10-01T16:37:17Z

We propose adding a basic unit test framework (likely pytest) and tests to the inference repo. Ideally, it should test that:

All configuration files (mlperf.conf, user.conf) are valid and working (e.g. no typos, no items outside of an existing dictionary)
All reference benchmark dataset and model download modules are working
All reference models are running and producing reasonable output
The submission checker catches errors as expected
etc.

To get this started, we can aim at some simple tests:

configuration walkthrough and validation
submission checker tests for some corner cases (once "Submission checker: better modularize the code, and have better documentation of the expectation of results" #1670 has started)
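
As a starting point for the configuration validation item, here is a minimal pytest-style sketch. It assumes conf lines follow a `model.scenario.key = value` layout with numeric values; the function name `validate_conf_text` and the allowed-name sets are hypothetical placeholders, not the repo's actual lists or API.

```python
# Hypothetical validator for "model.scenario.key = value" lines.
# The allowed names below are illustrative placeholders, not the real lists.
KNOWN_MODELS = {"*", "resnet50", "gptj"}
KNOWN_SCENARIOS = {"*", "Offline", "Server", "SingleStream", "MultiStream"}
KNOWN_KEYS = {"target_latency", "min_duration", "min_query_count"}


def validate_conf_text(text):
    """Return a list of human-readable problems found in a conf file body."""
    errors = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "=" not in line:
            errors.append(f"line {lineno}: missing '='")
            continue
        lhs, value = (part.strip() for part in line.split("=", 1))
        fields = lhs.split(".")
        if len(fields) != 3:
            errors.append(f"line {lineno}: expected model.scenario.key")
            continue
        model, scenario, key = fields
        if model not in KNOWN_MODELS:
            errors.append(f"line {lineno}: unknown model '{model}'")
        if scenario not in KNOWN_SCENARIOS:
            errors.append(f"line {lineno}: unknown scenario '{scenario}'")
        if key not in KNOWN_KEYS:
            errors.append(f"line {lineno}: unknown key '{key}'")
        try:
            float(value)  # assumption: every value is numeric
        except ValueError:
            errors.append(f"line {lineno}: non-numeric value '{value}'")
    return errors


def test_valid_conf_has_no_errors():
    assert validate_conf_text("*.Server.target_latency = 10\n") == []


def test_typo_in_key_is_reported():
    errors = validate_conf_text("resnet50.Offline.min_duraton = 600000\n")
    assert errors == ["line 1: unknown key 'min_duraton'"]
```

Returning a list of messages rather than raising on the first problem lets one test run report every typo in a file at once.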

@pgmpablo157321 to help implement
@nv-alicheng for more suggestions.

arjunsuresh · 2024-10-02T08:26:53Z

We also need a deadline for these to be done before every submission round. Maybe the same as the code freeze date?

Currently we do have GitHub tests for reference benchmark runs, which include the download of models and datasets and even the submission checker. This is complete for small models, and we also have GPTJ and SDXL tests running on self-hosted GitHub runners. We hope to cover LLAMA2, DLRMv2 and Mixtral this month.

The main concern is with upcoming benchmarks. If a benchmark needs multiple GPUs and terabytes of memory, like the GNN, we do not have the infrastructure to do the runs. There, unit tests for the conf files and log parsing may be the only option.
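
A log-parsing unit test of this kind needs no GPUs, since it can run against a small synthetic log. The sketch below assumes log records are JSON objects behind a `:::MLLOG ` prefix with `key` and `value` fields; the parser name `parse_log_text` and the sample keys are illustrative assumptions, not the repo's actual parser.

```python
import json

MLLOG_PREFIX = ":::MLLOG "  # assumed record prefix


def parse_log_text(text):
    """Collect key -> value from lines that start with the assumed prefix."""
    entries = {}
    for raw in text.splitlines():
        if not raw.startswith(MLLOG_PREFIX):
            continue  # ignore anything that is not a structured record
        record = json.loads(raw[len(MLLOG_PREFIX):])
        entries[record["key"]] = record["value"]
    return entries


# Synthetic log body; the keys and values are placeholders for illustration.
SAMPLE_LOG = "\n".join([
    "some unrelated line",
    ':::MLLOG {"key": "effective_scenario", "value": "Offline"}',
    ':::MLLOG {"key": "result_validity", "value": "VALID"}',
])


def test_parser_extracts_expected_keys():
    parsed = parse_log_text(SAMPLE_LOG)
    assert parsed == {
        "effective_scenario": "Offline",
        "result_validity": "VALID",
    }
```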

nvzhihanj · 2024-10-02T17:18:58Z

@arjunsuresh Thank you for the suggestions. I think we can split this into steps and cover the most important parts first (submission checker, config files, loadgen).
For the benchmarks, we can implement tests one by one according to urgency.
Regarding the deadline, I agree it should align with the freeze date.

arjunsuresh · 2024-10-04T10:38:05Z

@nvzhihanj

Yes, we do need unit tests for the submission checker. Currently we check it by running it on the previous submission repository, which can catch most of the issues but not all. It is currently failing because the "images" folder was removed before publication for stable diffusion, but this was not updated in the submission checker.
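
A corner case like the "images" folder could be pinned down with a small fixture-based test that builds a throwaway directory tree. This is only a sketch: `missing_required_dirs` and the "accuracy" folder name are hypothetical and do not reflect how the submission checker is actually structured.

```python
import tempfile
from pathlib import Path


def missing_required_dirs(result_dir, required):
    """Return the required sub-directory names that are absent, sorted."""
    root = Path(result_dir)
    return sorted(name for name in required if not (root / name).is_dir())


def test_missing_images_dir_is_flagged():
    with tempfile.TemporaryDirectory() as tmp:
        # Build a fake result tree that has "accuracy" but no "images".
        (Path(tmp) / "accuracy").mkdir()
        assert missing_required_dirs(tmp, {"accuracy", "images"}) == ["images"]
        # If the rule is relaxed to match the published layout, the check passes.
        assert missing_required_dirs(tmp, {"accuracy"}) == []
```

Keeping the expected layout in a test fixture means a change to the published folder structure shows up as a failing test rather than as a failure on the previous submission repository.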

The situation is similar for loadgen: we currently have tests for it and we run it on the previous benchmarks. But if a new benchmark arrives and new configs are added, they are not tested.

mrmhodak · 2024-10-08T16:48:28Z

@pgmpablo157321 to work on this

nvzhihanj changed the title ~~Unit test module for inference repo~~ [4.1 postmortem] Unit test module for inference repo Oct 1, 2024

arjunsuresh added the postmortem 4.1 label Oct 4, 2024

arjunsuresh mentioned this issue Oct 4, 2024

Adding nightly/weekly github actions for MLPerf inference runs mlcommons/cm4mlops#333


pgmpablo157321 self-assigned this Oct 8, 2024
