
Resnet50 error while building engines (NVIDIA) #12

Closed
mahmoodn opened this issue Jun 28, 2024 · 1 comment

@mahmoodn

Following the NVIDIA README, and after adding the system configuration and building the workloads, I tried to run the resnet50 benchmark, but at the beginning of execution I get the following error:

(mlperf) mahmood@mlperf-inference-mahmood-x86-64-26486:/work$ make run RUN_ARGS="--benchmarks=resnet50 --scenarios=offline"

make[1]: Entering directory '/work'
[2024-06-28 07:33:14,784 main.py:229 INFO] Detected system ID: KnownSystem.rtx3080_ryzen3700x
[2024-06-28 07:33:15,654 generate_engines.py:173 INFO] Building engines for resnet50 benchmark in Offline scenario...
[06/28/2024-07:33:15] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 35, GPU 823 (MiB)
[06/28/2024-07:33:18] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1799, GPU +306, now: CPU 1969, GPU 1135 (MiB)
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/work/code/actionhandler/base.py", line 189, in subprocess_target
    return self.action_handler.handle()
  File "/work/code/actionhandler/generate_engines.py", line 176, in handle
    total_engine_build_time += self.build_engine(job)
  File "/work/code/actionhandler/generate_engines.py", line 159, in build_engine
    builder = get_benchmark(job.config)
  File "/work/code/__init__.py", line 87, in get_benchmark
    return cls(conf)
  File "/work/code/resnet50/tensorrt/ResNet50.py", line 332, in __init__
    super().__init__(ResNet50EngineBuilderOp(**args))
  File "/work/code/resnet50/tensorrt/ResNet50.py", line 148, in __init__
    if self.batch_size % self.gpu_res2res3_loop_count != 0:
ZeroDivisionError: integer division or modulo by zero
[2024-06-28 07:33:19,719 generate_engines.py:173 INFO] Building engines for resnet50 benchmark in Offline scenario...
[06/28/2024-07:33:19] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 35, GPU 823 (MiB)
[06/28/2024-07:33:21] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1799, GPU +310, now: CPU 1969, GPU 1139 (MiB)
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/work/code/actionhandler/base.py", line 189, in subprocess_target
    return self.action_handler.handle()
  File "/work/code/actionhandler/generate_engines.py", line 176, in handle
    total_engine_build_time += self.build_engine(job)
  File "/work/code/actionhandler/generate_engines.py", line 159, in build_engine
    builder = get_benchmark(job.config)
  File "/work/code/__init__.py", line 87, in get_benchmark
    return cls(conf)
  File "/work/code/resnet50/tensorrt/ResNet50.py", line 332, in __init__
    super().__init__(ResNet50EngineBuilderOp(**args))
  File "/work/code/resnet50/tensorrt/ResNet50.py", line 148, in __init__
    if self.batch_size % self.gpu_res2res3_loop_count != 0:
ZeroDivisionError: integer division or modulo by zero
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/work/code/main.py", line 231, in <module>
    main(main_args, DETECTED_SYSTEM)
  File "/work/code/main.py", line 144, in main
    dispatch_action(main_args, config_dict, workload_setting)
  File "/work/code/main.py", line 202, in dispatch_action
    handler.run()
  File "/work/code/actionhandler/base.py", line 82, in run
    self.handle_failure()
  File "/work/code/actionhandler/base.py", line 186, in handle_failure
    self.action_handler.handle_failure()
  File "/work/code/actionhandler/generate_engines.py", line 184, in handle_failure
    raise RuntimeError("Building engines failed!")
RuntimeError: Building engines failed!
make[1]: *** [Makefile:37: generate_engines] Error 1
make[1]: Leaving directory '/work'
make: *** [Makefile:31: run] Error 2

The default generated configuration in configs/resnet50/Offline is shown below:

# Generated file by scripts/custom_systems/add_custom_system.py
# Contains configs for all custom systems in code/common/systems/custom_list.py

from . import *


@ConfigRegistry.register(HarnessType.LWIS, AccuracyTarget.k_99, PowerSetting.MaxP)
class RTX3080_RYZEN3700X(OfflineGPUBaseConfig):
    system = KnownSystem.rtx3080_ryzen3700x

    # Applicable fields for this benchmark are listed below. Not all of these are necessary, and some may be defined in the BaseConfig already and inherited.
    # Please see NVIDIA's submission config files for example values and which fields to keep.
    # Required fields (Must be set or inherited to run):
    gpu_batch_size: int = 0
    input_dtype: str = ''
    input_format: str = ''
    map_path: str = ''
    precision: str = ''
    tensor_path: str = ''

    # Optional fields:
    active_sms: int = 0
    assume_contiguous: bool = False
    buffer_manager_thread_count: int = 0
    cache_file: str = ''
    complete_threads: int = 0
    deque_timeout_usec: int = 0
    disable_beta1_smallk: bool = False
    energy_aware_kernels: bool = False
    gpu_copy_streams: int = 0
    gpu_inference_streams: int = 0
    gpu_res2res3_loop_count: int = 0
    instance_group_count: int = 0
    model_path: str = ''
    offline_expected_qps: float = 0.0
    performance_sample_count_override: int = 0
    preferred_batch_size: str = ''
    request_timeout_usec: int = 0
    run_infer_on_copy_streams: bool = False
    use_batcher_thread_per_device: bool = False
    use_cuda_thread_per_device: bool = False
    use_deque_limit: bool = False
    use_graphs: bool = False
    use_jemalloc: bool = False
    use_same_context: bool = False
    use_spin_wait: bool = False
    verbose_glog: int = 0
    warmup_duration: float = 0.0
    workspace_size: int = 0


@ConfigRegistry.register(HarnessType.Triton, AccuracyTarget.k_99, PowerSetting.MaxP)
class RTX3080_RYZEN3700X_Triton(RTX3080_RYZEN3700X):
    use_triton = True

    # Applicable fields for this benchmark are listed below. Not all of these are necessary, and some may be defined in the BaseConfig already and inherited.
    # Please see NVIDIA's submission config files for example values and which fields to keep.
    # Required fields (Must be set or inherited to run):
    gpu_batch_size: int = 0
    input_dtype: str = ''
    input_format: str = ''
    map_path: str = ''
    precision: str = ''
    tensor_path: str = ''

    # Optional fields:
    active_sms: int = 0
    assume_contiguous: bool = False
    batch_triton_requests: bool = False
    buffer_manager_thread_count: int = 0
    cache_file: str = ''
    complete_threads: int = 0
    deque_timeout_usec: int = 0
    disable_beta1_smallk: bool = False
    energy_aware_kernels: bool = False
    gather_kernel_buffer_threshold: int = 0
    gpu_copy_streams: int = 0
    gpu_inference_streams: int = 0
    gpu_res2res3_loop_count: int = 0
    instance_group_count: int = 0
    max_queue_delay_usec: int = 0
    model_path: str = ''
    num_concurrent_batchers: int = 0
    num_concurrent_issuers: int = 0
    offline_expected_qps: float = 0.0
    output_pinned_memory: bool = False
    performance_sample_count_override: int = 0
    preferred_batch_size: str = ''
    request_timeout_usec: int = 0
    run_infer_on_copy_streams: bool = False
    use_batcher_thread_per_device: bool = False
    use_concurrent_harness: bool = False
    use_cuda_thread_per_device: bool = False
    use_deque_limit: bool = False
    use_graphs: bool = False
    use_jemalloc: bool = False
    use_same_context: bool = False
    use_spin_wait: bool = False
    verbose_glog: int = 0
    warmup_duration: float = 0.0
    workspace_size: int = 0

I thought that gpu_batch_size: int = 0 was causing the problem, but changing it to 1 resulted in the same error. I also checked that nvidia-smi works, as shown below:

(mlperf) mahmood@mlperf-inference-mahmood-x86-64-26486:/work$ nvidia-smi 
Fri Jun 28 07:42:30 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080        Off | 00000000:2D:00.0  On |                  N/A |
|  0%   54C    P8              33W / 370W |    239MiB / 10240MiB |      7%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

Any idea what is going wrong here?
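For context on the traceback: the failing check at ResNet50.py line 148 takes a modulo whose divisor is gpu_res2res3_loop_count, and the generated config above leaves that field at 0, so the error occurs no matter what gpu_batch_size is set to. A minimal sketch of that behaviour (field names copied from the config; the check itself is simplified):

```python
# Python's % operator raises ZeroDivisionError for a zero divisor, which
# is what the engine builder hits when gpu_res2res3_loop_count stays 0.
batch_size = 1                 # gpu_batch_size from the config
gpu_res2res3_loop_count = 0    # default in the generated custom config

try:
    divisible = batch_size % gpu_res2res3_loop_count == 0
except ZeroDivisionError as exc:
    print(f"ZeroDivisionError: {exc}")
```

This matches the "integer division or modulo by zero" line in the log above, and explains why changing only gpu_batch_size did not help.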

@mahmoodn (Author)

Apparently, the user has to comment out or remove all parameters in configs/resnet50/Offline/custom.py and keep only these two, with the required values:

    gpu_batch_size: int = 1
    offline_expected_qps: float = 37000
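With that change, the trimmed custom.py reduces to something like the following sketch (class, registration, and system names are copied from the generated file shown earlier; every other field is inherited from the base config, and the right offline_expected_qps value depends on your GPU):

```python
# Sketch of a trimmed configs/resnet50/Offline/custom.py, keeping only
# the two fields named above. All other fields fall back to defaults
# inherited from OfflineGPUBaseConfig instead of being zeroed out here.
from . import *


@ConfigRegistry.register(HarnessType.LWIS, AccuracyTarget.k_99, PowerSetting.MaxP)
class RTX3080_RYZEN3700X(OfflineGPUBaseConfig):
    system = KnownSystem.rtx3080_ryzen3700x

    gpu_batch_size: int = 1
    offline_expected_qps: float = 37000
```

The key point is that the generated template sets every optional field (including gpu_res2res3_loop_count) to 0 or '', which overrides the inherited defaults; deleting those lines lets the base config supply sane values.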
