We present a new method for training pedestrian detectors on an unannotated image set captured by a moving camera with a fixed height and angle from the ground. Our approach is general and robust, making no further assumptions about the image dataset or the number of pedestrians. We automatically extract the vanishing point and the scale of the pedestrians to calibrate the virtual camera and generate a probability map for pedestrian spawn locations. Using these features, we overlay synthetic human-like agents at plausible locations in the images from the unannotated dataset. We also present novel techniques to increase the realism of these synthetic agents and use the augmented images to train a Faster R-CNN detector. Our approach improves accuracy by 12 to 13 percent over prior methods for unannotated image datasets.
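The two calibration ingredients described above — a spawn-probability map and a pedestrian scale tied to the vanishing point — can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single horizontal horizon line and linear ground-plane scaling, and all function and variable names (`sample_spawn_points`, `agent_height_px`, `ref_y`, `ref_height`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_spawn_points(prob_map, n):
    """Sample n (x, y) pixel locations from a spawn-probability map.

    prob_map: 2-D array of non-negative weights; higher values mark
    image regions where pedestrians are more likely to appear.
    """
    flat = prob_map.ravel().astype(float)
    flat /= flat.sum()
    idx = rng.choice(flat.size, size=n, p=flat)
    ys, xs = np.unravel_index(idx, prob_map.shape)
    return np.stack([xs, ys], axis=1)

def agent_height_px(y_foot, horizon_y, ref_y, ref_height):
    """Linear perspective scaling for a synthetic agent.

    Apparent height shrinks to zero at the horizon (vanishing line,
    y = horizon_y) and equals ref_height when the agent's feet are
    at the reference row ref_y.
    """
    return ref_height * (y_foot - horizon_y) / (ref_y - horizon_y)

# Example: restrict spawning to the lower (ground) half of a 480x640 image.
prob_map = np.zeros((480, 640))
prob_map[240:, :] = 1.0
points = sample_spawn_points(prob_map, 5)

# Scale each agent by the vertical position of its feet.
heights = [agent_height_px(y, horizon_y=100, ref_y=400, ref_height=180)
           for _, y in points]
```

A full pipeline would estimate `horizon_y` from the extracted vanishing point and fit the reference scale from detected pedestrian exemplars before compositing the rendered agents into the frames.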