September 2019
tl;dr: Seminal paper from MSRA that improves upon Faster R-CNN.
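Faster R-CNN's computation grows with the number of ROIs, because every ROI is passed through its own fully connected sub-network after ROI pooling. R-FCN improves efficiency by moving the convolutional computation before ROI pooling, so that nearly all of it is shared across ROIs: a fully convolutional network generates position-sensitive (PS) score maps, and the per-ROI work reduces to pooling and voting. Each PS score map is responsible for firing at a particular relative position (e.g., the top-left corner) of a particular class.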
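The pooling-and-voting step can be sketched in a few lines of NumPy. This is only a minimal illustration, not the paper's implementation: the function name `ps_roi_pool`, the bin-major channel layout, and the end-exclusive ROI format in feature-map coordinates are all assumptions made for the example. It uses average pooling inside each bin and average voting across bins.

```python
import numpy as np

def ps_roi_pool(score_maps, roi, num_classes, p):
    """Position-sensitive ROI pooling followed by average voting (sketch).

    score_maps: array of shape (p * p * num_classes, H, W). ASSUMED layout:
        channel (i * p + j) * num_classes + c is the map for bin (i, j), class c.
        num_classes here already includes the background class.
    roi: (x0, y0, x1, y1) in feature-map coordinates, end-exclusive,
        assumed to lie inside the feature map.
    Returns one score per class, shape (num_classes,).
    """
    x0, y0, x1, y1 = roi
    # p bins per axis => p + 1 bin edges.
    ys = np.linspace(y0, y1, p + 1)
    xs = np.linspace(x0, x1, p + 1)
    pooled = np.zeros((p, p, num_classes))
    for i in range(p):
        for j in range(p):
            ya, yb = int(np.floor(ys[i])), int(np.ceil(ys[i + 1]))
            xa, xb = int(np.floor(xs[j])), int(np.ceil(xs[j + 1]))
            ch = (i * p + j) * num_classes
            # Bin (i, j) reads only from its own group of score maps.
            region = score_maps[ch:ch + num_classes, ya:yb, xa:xb]
            pooled[i, j] = region.mean(axis=(1, 2))
    # Voting: average the p * p bin responses for each class.
    return pooled.mean(axis=(0, 1))
```

Note that the function contains no learned weights: everything after the score maps is parameter-free, which is why the per-ROI cost stays small.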
Note that R-FCN usually has slightly lower accuracy, especially compared to FPN-powered Faster R-CNN.
R-FCN cannot leverage FPN directly, as the number of channels is too large for large datasets such as COCO. Light-Head R-CNN improves on this by reducing the number of score maps from #class x p x p to 10 x p x p. Because the thinner maps no longer correspond to classes, the simple voting mechanism is replaced by a fully connected layer.
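To make the channel counts concrete, here is a small arithmetic sketch. The function names are hypothetical, and it assumes p = 7, one extra background class for R-FCN, and that Light-Head R-CNN keeps 10 maps per bin (10 x p x p channels in total); the box-regression channels are ignored.

```python
def rfcn_score_channels(num_classes, p):
    """Classification score-map channels in R-FCN: (C + 1) * p * p.

    The +1 is the background class (assumed, as in most detectors).
    """
    return (num_classes + 1) * p * p

def light_head_channels(p, alpha=10):
    """Thin feature-map channels, assuming alpha maps per bin: alpha * p * p."""
    return alpha * p * p

voc = rfcn_score_channels(20, 7)   # PASCAL VOC: 21 * 49 = 1029
coco = rfcn_score_channels(80, 7)  # COCO:       81 * 49 = 3969
thin = light_head_channels(7)      # class-agnostic: 10 * 49 = 490
```

The thin count does not depend on the number of classes, which is why a learned layer rather than per-class voting is needed to produce the final prediction.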
- This Medium blog post from Jonathan Hui explains the intuition very well.