Project developed during the Introduction to Research course at UPC, Master in Advanced Telecommunications Technologies. You can check the details of the project here: OAGCNN_for_VOS.pdf
Video object segmentation has grown in popularity since the release of DAVIS 2016, in which a single object had to be segmented per sequence. With the release of DAVIS 2017 and YouTube-VOS, the task moved to multi-object segmentation, increasing its difficulty. In this work we focus on that scenario, presenting a novel graph convolutional neural network in which each node maintains the state of one object of the video sequence, working entirely in the feature-space domain. The nodes are initialized with an encoder that takes as input image features together with each object's mask. After graph message passing, a decoder is applied to each final node state to segment the object that node refers to.
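The pipeline above (encode each object into a node, exchange messages between nodes, decode each final node state) can be sketched as follows. This is a minimal NumPy illustration, not the project's actual implementation: the dimensions, the random linear maps standing in for the convolutional encoder/decoder, and the fully connected mean-aggregation update are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N objects (one node each), D-dim states, T message steps.
N, D, T = 3, 16, 2

def init_nodes(image_feats, mask_feats, W_enc):
    """Node initialization: a stand-in for the encoder that maps
    (image features, per-object mask) to one state vector per object."""
    return np.tanh(np.concatenate([image_feats, mask_feats], axis=1) @ W_enc)

image_feats = rng.normal(size=(N, D))        # shared image features, tiled per object
mask_feats = rng.normal(size=(N, D))         # per-object mask features
W_enc = 0.1 * rng.normal(size=(2 * D, D))
h = init_nodes(image_feats, mask_feats, W_enc)   # (N, D): one node state per object

# Message passing on a fully connected graph: every node aggregates the mean
# of the other nodes' states, then updates its own state.
W_self = 0.1 * rng.normal(size=(D, D))
W_msg = 0.1 * rng.normal(size=(D, D))
for _ in range(T):
    msgs = (h.sum(axis=0, keepdims=True) - h) / (N - 1)  # mean over the other nodes
    h = np.tanh(h @ W_self + msgs @ W_msg)

# Per-node decoding: each final state yields that object's mask logits
# (a real decoder would upsample to a full-resolution mask).
W_dec = 0.1 * rng.normal(size=(D, D))
mask_logits = h @ W_dec
print(mask_logits.shape)
```

Keeping one node per object is what lets the graph scale to a variable number of objects: the same encoder, message function, and decoder weights are shared across all nodes.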