Learning to Segment Every Thing

Producing accurate pixel-level masks around specific objects within images is of course a common task in VFX. Current solutions can be labor-intensive, and the results from one task cannot be used directly to improve the quality of future work. Existing tools generally do not know the semantic context of the object whose mask is being extracted.

Segmentation is a very active area of research within the deep learning community because it applies to a wide range of real-world tasks. As a result, a large number of datasets are available. It's not difficult to imagine tools in the very near future that can produce quality masks from a single click, or even fit 3D proxy geometry to arbitrary objects within an image or image sequence.

Existing methods for object instance segmentation require all training instances to be labeled with segmentation masks. This requirement makes it expensive to annotate new categories and has restricted instance segmentation models to ∼100 well-annotated classes. The goal of this paper is to propose a new partially supervised training paradigm, together with a novel weight transfer function, that enables training instance segmentation models on a large set of categories, all of which have box annotations but only a small fraction of which have mask annotations. These contributions allow us to train Mask R-CNN to detect and segment 3000 visual concepts using box annotations from the Visual Genome dataset and mask annotations from the 80 classes in the COCO dataset. We carefully evaluate our proposed approach in a controlled study on the COCO dataset. This work is a first step towards instance segmentation models that have broad comprehension of the visual world.
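To make the two ideas above a little more concrete, here is a minimal NumPy sketch, not the authors' code. It assumes a small two-layer MLP with a Leaky ReLU as the weight transfer function, mapping each class's detection weights to that class's mask-head weights, so classes with only box labels still get a mask predictor. Partial supervision is modelled by computing the mask loss only for RoIs whose class has mask annotations. The names `WeightTransfer`, `partial_mask_loss` and `leaky_relu`, the layer sizes, the 0.1 slope and the random stand-in weights and features are all hypothetical. The sketch is forward-pass only, with no training loop or autograd.

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, slope=0.1):
    # Hypothetical slope; the sketch only assumes *a* leaky ReLU.
    return np.where(x > 0, x, slope * x)

class WeightTransfer:
    """Hypothetical 2-layer MLP: per-class detection weights -> per-class mask weights."""
    def __init__(self, det_dim, hidden_dim, seg_dim, rng):
        self.W1 = rng.normal(0, 0.01, (det_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = rng.normal(0, 0.01, (hidden_dim, seg_dim))
        self.b2 = np.zeros(seg_dim)

    def __call__(self, w_det):
        # w_det: (K, det_dim), one row per class -> returns (K, seg_dim)
        return leaky_relu(w_det @ self.W1 + self.b1) @ self.W2 + self.b2

def partial_mask_loss(logits, gt_classes, gt_masks, has_mask):
    """Mean per-pixel binary cross-entropy, restricted to classes that have masks.

    logits: (N, K, H, W) mask logits; gt_classes: (N,) class index per RoI;
    gt_masks: (N, H, W) binary targets; has_mask: (K,) bool, True if class has masks.
    """
    keep = has_mask[gt_classes]          # RoIs whose class has mask labels
    if not keep.any():
        return 0.0                       # box-only batch: no mask supervision
    sel = logits[np.arange(len(gt_classes)), gt_classes][keep]  # (M, H, W)
    y = gt_masks[keep]
    # Numerically stable BCE computed directly from logits.
    loss = np.maximum(sel, 0) - sel * y + np.log1p(np.exp(-np.abs(sel)))
    return float(loss.mean())

# Toy setup: K classes, all with box weights, only some with mask labels.
K, det_dim, hidden, C = 5, 20, 16, 8
w_det = rng.normal(size=(K, det_dim))       # stand-in for learned box-head weights
T = WeightTransfer(det_dim, hidden, C, rng)
w_seg = T(w_det)                            # (K, C): a mask predictor for every class

feats = rng.normal(size=(3, C, 14, 14))     # stand-in RoI mask features
logits = np.einsum("nchw,kc->nkhw", feats, w_seg)   # class-specific 1x1 "conv"
gt_classes = np.array([0, 3, 4])
gt_masks = (rng.random((3, 14, 14)) > 0.5).astype(float)
has_mask = np.array([True, True, False, False, True])  # class 3 is box-only
loss = partial_mask_loss(logits, gt_classes, gt_masks, has_mask)
print(w_seg.shape, logits.shape, loss)
```

The key design point is that the mask weights are *predicted* rather than learned per class. Once `T` is trained on the mask-labeled classes, it can produce a mask predictor for any class that has detection weights. Meanwhile, the RoI belonging to class 3 contributes nothing to the mask loss.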
