End-to-end learning for segmenting generic objects in images and videos

Computing & Wireless : Computing Methods

Available for licensing


  • Kristen Grauman, Computer Science
  • Suyog Jain, Computer Science
  • Bo Xiong, Computer Science

Background/unmet need

In any image or video understanding task, separating important objects from the background is an essential first step. It lets a computer vision system focus on the regions that matter and suppress the influence of the background. Existing techniques either require a human to partially guide the system (e.g., Photoshop software) or do not generalize well to a large number of object categories. Our system is fully automatic and can separate objects from the background for a large number of object categories.

Invention Description

Researchers at The University of Texas at Austin have invented a computer vision system capable of segmenting (i.e., finding the boundaries of) generic objects in images and videos. For images, our system learns generic patterns that are indicative of objects and uses them to separate objects from the background. For videos, our system relies on both appearance and motion patterns and combines them in a unified way to segment objects. This preprocessing can benefit several computer vision tasks, such as image and video search, scene understanding, and editing.


  • Fully automatic system for segmenting objects in images and videos
  • Very efficient during runtime
  • Highly scalable 


  • The system generalizes to thousands of object categories, which makes it widely applicable for segmenting objects in images and videos at a large scale. Existing methods are either restricted in their performance or provide good results only for a fixed number of object categories.
  • For video segmentation, this method learns to combine appearance and motion in a principled way. Existing methods do not make effective use of both cues in a unified way.
  • Our system is efficient to run and can process each image within 2 to 3 seconds. For videos, it can process a frame in about 10 seconds. Existing methods, especially for video segmentation, can take up to a minute to process each frame.
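
To make the idea of combining appearance and motion cues concrete, here is a minimal illustrative sketch. It is not the inventors' implementation: this listing does not describe how the cues are fused, and in the actual system the combination is learned from data. The sketch assumes each cue has already been turned into a per-pixel score map, and it fuses the two maps with hand-set placeholder weights before thresholding. All names (`fuse_streams`, `to_mask`, `w_app`, `w_mot`, `bias`) are hypothetical.

```python
import math


def sigmoid(x):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_streams(appearance, motion, w_app=1.0, w_mot=1.0, bias=0.0):
    """Fuse two per-pixel score maps into one foreground-probability map.

    `appearance` and `motion` are lists of rows of scores (assumed inputs).
    The weights are placeholders; a learned system would fit them from data.
    """
    same_shape = len(appearance) == len(motion) and all(
        len(a) == len(m) for a, m in zip(appearance, motion)
    )
    if not same_shape:
        raise ValueError("appearance and motion maps must have the same shape")
    return [
        [sigmoid(w_app * a + w_mot * m + bias) for a, m in zip(row_a, row_m)]
        for row_a, row_m in zip(appearance, motion)
    ]


def to_mask(prob, threshold=0.5):
    """Turn a probability map into a binary object/background mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob]


# Toy 2x2 frame: positive scores suggest "object", negative "background".
appearance = [[2.0, -1.0], [-3.0, 0.5]]
motion = [[1.0, 2.0], [-1.0, -2.0]]
mask = to_mask(fuse_streams(appearance, motion))
print(mask)
```

In this toy frame, the top-right pixel looks like background by appearance alone but is marked as object once the motion cue is added, which is the kind of complementary evidence a unified combination is meant to exploit.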

Market potential/applications

Technology companies that are interested in image and video understanding

Development Stage

Lab/bench prototype

IP Status

  • 1 PCT patent application filed