Category-independent foreground segmentation in images is a challenging task for a machine learning system, because the system must learn the general concept of an object, even for object categories it has not seen during training. In videos, the problem is compounded because both the object and the background change appearance over the course of the video. We propose a method based on deep neural networks for learning the general concept of object appearance in videos. In addition to learning the object appearance in each frame, our system learns the temporal changes between frames, which capture object motion, and thus leverages the temporal information available in videos. Because the learned object segmentation is category independent, we can perform unsupervised video object segmentation. In the semi-supervised setting, where one frame of the video is annotated, we further train our system to recognize the specific object that appears in the video. In both scenarios, our system compares favorably with the state of the art.
Furthermore, we demonstrate a novel use case for video object segmentation by implementing a mobile application in which a user captures a video of an object, and our system segments the object and displays it in an augmented reality (AR) setting.