March 2020
tl;dr: ImageNet pretraining speeds up convergence but does not necessarily increase accuracy.
Tons of ablation studies. Another solid work from FAIR.
We should start exploring group normalization.
- ImageNet pretraining does not necessarily improve final performance unless the training set is small, i.e., below 10k COCO images (about 7 objects per image; for PASCAL, with about 2 objects per image, overfitting appears even at 15k images). ImageNet pretraining does not give better regularization and does not help reduce overfitting.
- ImageNet pretraining is still useful for shortening research cycles.
- GroupNorm is used with a batch size of 2 images x 8 GPUs (see the sketch after this list).
- Questions and notes on how to improve/revise the current work
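
As a minimal PyTorch sketch (my own illustration, not the paper's actual Detectron code), this shows the kind of swap implied by the GroupNorm bullet: replacing BatchNorm with `nn.GroupNorm`, whose statistics are computed per sample and therefore do not degrade at a per-GPU batch size of 2.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, num_groups=32):
    # GroupNorm normalizes over channel groups within each sample,
    # so it behaves the same whether the batch size is 2 or 256.
    # num_groups=32 is a common default and must divide out_ch.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.GroupNorm(num_groups, out_ch),
        nn.ReLU(inplace=True),
    )

if __name__ == "__main__":
    block = conv_block(64, 128)
    x = torch.randn(2, 64, 56, 56)  # per-GPU batch of 2, as in the paper's setup
    print(block(x).shape)           # torch.Size([2, 128, 56, 56])
```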