Release SuperBench v0.10.0
SuperBench 0.10.0 Release Notes
SuperBench Improvements
Support monitoring for AMD GPUs.
Add ROCm 5.7 and ROCm 6.0 Dockerfiles.
Add MSCCL support for NVIDIA GPUs.
Fix NUMA domain swap issue in the NDv4 topology file.
Add NDv5 topology file.
Pin NCCL and nccl-tests to 2.18.3 to fix a hang issue with CUDA 12.2.
Micro-benchmark Improvements
Add the HPL random number generator to gemm-flops for ROCm.
Add DirectXGPURenderFPS benchmark to measure the FPS of rendering simple frames.
Add HWDecoderFPS benchmark to measure hardware decoder performance in FPS.
Update Docker image for H100 support.
Update MLC to version 3.10 in the CUDA/ROCm Dockerfiles.
Fix a bug in the GPU Burn test.
Support INT8 in the cuBLASLt function benchmark.
Add hipBLASLt function benchmark.
Support cpu-gpu and gpu-cpu in ib-validation.
Support graph mode in NCCL/RCCL benchmarks for latency metrics.
Support a C++ implementation in the distributed inference benchmark.
Add -O2 option for the gpu-copy ROCm build.
Support different hipBLASLt data types in the distributed inference benchmark.
Support in-place operations in NCCL/RCCL benchmarks.
Support data type option in NCCL/RCCL benchmark.
Improve P2P performance with fine-grained GPU memory in GPU-copy test for AMD GPUs.
Update the hipBLASLt GEMM metric unit to TFLOPS.
Support FP8 in the hipBLASLt benchmark.
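The hipBLASLt GEMM metric is now reported in TFLOPS. The snippet below is a minimal sketch of how such a figure is usually derived. It assumes the conventional count of 2·M·N·K floating-point operations for an (M×K)·(K×N) matrix product: one multiply and one add per inner-product term. It is not taken from the SuperBench source, and the helper name `gemm_tflops` is hypothetical.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput of one (m x k) @ (k x n) GEMM in TFLOPS.

    Hypothetical helper: assumes 2*m*n*k floating-point operations,
    ignoring the extra work of alpha/beta scaling.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    flops = 2 * m * n * k  # one multiply + one add per inner-product term
    return flops / seconds / 1e12  # convert FLOP/s to TFLOP/s


# Example: an 8192 x 8192 x 8192 GEMM that completes in 2 ms.
print(f"{gemm_tflops(8192, 8192, 8192, 2e-3):.1f} TFLOPS")  # prints "549.8 TFLOPS"
```

Some tools also count the alpha/beta epilogue, so reported numbers can differ slightly for small matrices.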
Model Benchmark Improvements
Change torch.distributed.launch to torchrun.
Support Megatron-LM/Megatron-DeepSpeed GPT pretrain benchmark.
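One practical difference behind the switch from torch.distributed.launch to torchrun is how a worker learns its local rank. torchrun exports it as the `LOCAL_RANK` environment variable. The legacy launcher, when run without `--use_env`, instead passed it as a `--local_rank` command-line argument (spelled `--local-rank` in newer PyTorch releases). The helper below is a hypothetical sketch that accepts either convention. It is not SuperBench's actual code.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None) -> int:
    """Return this worker's local rank under torchrun or the legacy launcher.

    Hypothetical helper: prefers the LOCAL_RANK environment variable set by
    torchrun, then falls back to the --local_rank/--local-rank argument that
    torch.distributed.launch passed, and finally defaults to 0.
    """
    environ = os.environ if environ is None else environ
    # torchrun sets LOCAL_RANK (along with RANK, WORLD_SIZE, ...) for every worker.
    if "LOCAL_RANK" in environ:
        return int(environ["LOCAL_RANK"])
    # Legacy launcher: parse only the rank flag and ignore any other arguments.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--local_rank", "--local-rank", type=int, default=0)
    args, _unknown = parser.parse_known_args(argv)
    return args.local_rank
```

For example, a script started with `torchrun --nproc_per_node=8 train.py` would read its rank from `LOCAL_RANK`, so no extra command-line argument is needed.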
Result Analysis
Support baseline generation from multiple nodes.