SuperBench 0.10.0 Release Notes

SuperBench Improvements

Add HPL random generator to gemm-flops with ROCm.
Add DirectXGPURenderFPS benchmark to measure the FPS of rendering simple frames.
Add HWDecoderFPS benchmark to measure the FPS of the hardware decoder.
Update Docker image for H100 support.
Update MLC version to 3.10 in the CUDA/ROCm Dockerfiles.
Fix bug in GPU Burn test.
Support INT8 in cuBLASLt function.
Add hipBLASLt function benchmark.
Support cpu-gpu and gpu-cpu in ib-validation.
Support graph mode in NCCL/RCCL benchmarks for latency metrics.
Support C++ implementation in distributed inference benchmark.
Add O2 option for GPU copy ROCm build.
Support different hipBLASLt data types in distributed inference benchmark.
Support in-place operations in NCCL/RCCL benchmarks.
Support data type option in NCCL/RCCL benchmarks.
Improve P2P performance with fine-grained GPU memory in GPU-copy test for AMD GPUs.
Update hipBLASLt GEMM metric unit to TFLOPS.
Support FP8 in hipBLASLt benchmark.
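
For readers comparing GEMM results across releases, the unit change to TFLOPS noted above amounts to a scale factor of 1e12 on a FLOPS figure. The sketch below is illustrative only and is not SuperBench code: the function names are hypothetical, and it assumes the conventional count of 2*m*n*k floating-point operations for one m x n x k matrix multiply.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput in TFLOPS for one m x n x k GEMM that took `seconds`.

    Hypothetical helper for illustration; not part of SuperBench.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Conventional GEMM cost: one multiply and one add per inner-product term.
    flops = 2 * m * n * k
    return flops / seconds / 1e12


def flops_to_tflops(flops_per_second: float) -> float:
    """Convert a raw FLOPS figure to TFLOPS (1 TFLOPS = 1e12 FLOPS)."""
    return flops_per_second / 1e12


if __name__ == "__main__":
    # Assumed example: a 4096^3 GEMM finishing in 1 ms.
    print(f"{gemm_tflops(4096, 4096, 4096, 1e-3):.2f} TFLOPS")
```

When comparing against results recorded before this release, check which unit the older numbers were reported in before applying any conversion.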