Optimizing GPU Memory

Mask RCNN is designed for accuracy rather than memory efficiency. It's not a lightweight model. If you have a small GPU, you might notice that inference runs correctly but training fails with an Out of Memory error. That's because training requires a lot more memory than running in inference mode. Ideally, you'd want a GPU with 12GB or more, but you can train on smaller GPUs by choosing good settings and making the right trade-offs.

This is a list of things to consider if you're running out of memory. Many of these can be set in your Config class; see the explanation of each setting in config.py. A combined example appears after the list.

  1. Use a smaller backbone network. The default is resnet101, but you can use resnet50 to reduce memory load significantly and it's sufficient for most applications. It also trains faster.

    BACKBONE = "resnet50"
  2. Train fewer layers. If you're starting from pre-trained COCO or ImageNet weights, the early layers are already trained to extract low-level features and you can benefit from that, especially if your images are natural images like the ones in COCO and ImageNet.

    model.train(..., layers='heads', ...)  # Train heads branches (least memory)
    model.train(..., layers='3+', ...)     # Train resnet stage 3 and up
    model.train(..., layers='4+', ...)     # Train resnet stage 4 and up
    model.train(..., layers='all', ...)    # Train all layers (most memory)
  3. Use smaller images. The default settings resize images to squares of size 1024x1024. If you can use smaller images, you'll reduce memory requirements and cut training and inference time as well. Image size is controlled by these settings in Config:

    IMAGE_MIN_DIM = 800
    IMAGE_MAX_DIM = 1024
  4. Use a smaller batch size. The default used by the paper authors is already small (2 images per GPU, using 8 GPUs), but if you have no other choice, consider reducing it further. This is set in the Config class as well.

    GPU_COUNT = 1
    IMAGES_PER_GPU = 2
  5. Use fewer ROIs in training the second stage. This setting is like the batch size for the second stage of the model.

    TRAIN_ROIS_PER_IMAGE = 200
  6. Reduce the maximum number of instances per image if your images don't have a lot of objects.

    MAX_GT_INSTANCES = 100
  7. Train on crops of images rather than full images. This method is used in the nucleus sample to pick 512x512 crops out of larger images.

    # Random crops of size 512x512
    IMAGE_RESIZE_MODE = "crop"
    IMAGE_MIN_DIM = 512
    IMAGE_MAX_DIM = 512
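
For example, several of these settings can be combined in a single Config subclass. This is a minimal sketch, not a recommendation: the class name, NAME, and the exact values are illustrative, and the import path assumes the mrcnn package layout.

    # Minimal low-memory configuration sketch. Class name, NAME, and the exact
    # values are illustrative; tune them against your data and GPU.
    from mrcnn.config import Config

    class LowMemoryConfig(Config):
        NAME = "low_memory"            # hypothetical project name
        BACKBONE = "resnet50"          # 1. smaller backbone
        GPU_COUNT = 1
        IMAGES_PER_GPU = 1             # 4. smaller batch size
        IMAGE_MIN_DIM = 512            # 3. smaller images
        IMAGE_MAX_DIM = 512
        TRAIN_ROIS_PER_IMAGE = 100     # 5. fewer ROIs in the second stage
        MAX_GT_INSTANCES = 50          # 6. fewer instances per image

    # 2. Start by training only the head branches (least memory):
    # model.train(dataset_train, dataset_val,
    #             learning_rate=config.LEARNING_RATE, epochs=20, layers='heads')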

Important: Each of these changes has implications for training time and final accuracy. Read the comments next to each setting in config.py and refer to the code and the Mask RCNN paper to assess the full impact of each change.

Training with RGB-D or Grayscale images

The model is designed to work with RGB images. But if your images are grayscale (1 color channel) or RGB-D (3 color + 1 depth channel), you'll need to change a few places in the code to accommodate your input.

This is a list of the places in the code that you need to update, collected from answers in the Issues section. Here N is the number of channels (1 for grayscale, 4 for RGB-D). A sketch combining these changes follows the list:

  1. In config.py find the line that sets the value of IMAGE_SHAPE and change the last dimension from 3 to N.
  2. In your Config class, change MEAN_PIXEL from 3 values to N values.
  3. The load_image() method in the Dataset class is designed for RGB. It converts grayscale images to RGB and removes the 4th channel if present (because typically it's an alpha channel). You'll need to override this method to handle your images.
  4. Since you're changing the shape of the input, the shape of the first Conv layer will change as well, so you can't use the provided pre-trained weights as-is. To get around that, use the exclude parameter when you load the weights to exclude the first layer. Then you'll need to train the heads plus that first layer (or train all layers).
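
Putting these together, here is a rough sketch for RGB-D (4-channel) input. The Config and Dataset base classes and the exclude parameter of load_weights() come from this codebase; the dataset class, the depth_path key, and the pixel values are hypothetical placeholders, and step 1 (editing IMAGE_SHAPE in config.py) still has to be done by hand.

    # Rough sketch for 4-channel (RGB-D) input. Names marked "hypothetical"
    # are placeholders, not part of the repo.
    import numpy as np
    import skimage.io
    from mrcnn.config import Config
    from mrcnn import utils

    class RGBDConfig(Config):
        NAME = "rgbd"  # hypothetical project name
        # Step 2: one mean value per channel (3 color + 1 depth). Placeholder values.
        MEAN_PIXEL = np.array([123.7, 116.8, 103.9, 1000.0])
        # Step 1 is a manual edit in config.py: change the last dimension of
        # IMAGE_SHAPE from 3 to 4.

    class RGBDDataset(utils.Dataset):
        def load_image(self, image_id):
            """Step 3: override the RGB-only loader to return a [H, W, 4] array."""
            info = self.image_info[image_id]
            rgb = skimage.io.imread(info["path"])           # [H, W, 3] color image
            depth = skimage.io.imread(info["depth_path"])   # [H, W], hypothetical key
            return np.dstack([rgb, depth])

    # Step 4: the first conv layer ("conv1" in model.py) changes shape, so exclude
    # it when loading pre-trained weights, then include it when training.
    # model.load_weights(COCO_WEIGHTS_PATH, by_name=True, exclude=["conv1"])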