In the past decade, supervised learning combined with deep networks has significantly improved performance on many computer vision tasks. This success is mainly attributed to the rise of large-scale annotated datasets and advances in computing hardware. However, supervised learning approaches are notoriously "data-hungry," and in practice we often face a shortage of perfectly annotated data. This makes many supervised algorithms impractical in real-world industrial applications.
To address this issue, Yunchao Wei, a professor at Beijing Jiaotong University, has conducted extensive research since 2014 on training deep networks beyond the traditional fully supervised paradigm by using imperfect data. His work includes object detection and semantic segmentation under weakly supervised conditions, domain-adaptive and few-shot visual segmentation, and semi-supervised video segmentation.
In 2014, while a visiting student at the National University of Singapore, Wei won the ImageNet object detection challenge. Through an in-depth analysis of deep-learning-based computer vision algorithms, he found that they rely heavily on large numbers of labeled samples and otherwise perform poorly, so he decided to explore how to improve their performance when the data is not perfectly labeled.
Wei is one of the pioneers and leading advocates of this challenging research area. He focuses on helping computers understand objects in complex scenes by learning from imperfectly annotated data.
In particular, Wei developed a series of techniques for learning semantic segmentation models that recognize the semantics of each pixel using only image-level labels as supervision, which are much easier and cheaper to obtain than pixel-level masks. The key challenge lies in accurately and efficiently propagating category information from the image level to the pixel level so that fully convolutional networks can be trained as reliable segmentation models. To this end, he proposed saliency-based, adversarial-erasing-based, and attention-shift-based solutions, improving performance by over 20% on the popular PASCAL benchmark over the past three years.
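As a rough illustration of this image-to-pixel label propagation idea (a generic class-activation-map sketch in NumPy, not Wei's actual method), a classifier trained with only image-level labels can score each spatial location for each class, and thresholding those scores yields pseudo pixel labels for training a segmentation network. The function names and the threshold value here are illustrative assumptions.

```python
import numpy as np

def class_activation_map(features, weights, class_idx):
    """Compute a class activation map (CAM) for one class.

    features: (C, H, W) feature maps from the last conv layer.
    weights:  (num_classes, C) weights of a global-average-pooling
              classification head trained with image-level labels.
    """
    # Weighted sum over channels gives a per-pixel class score map.
    cam = np.tensordot(weights[class_idx], features, axes=([0], [0]))  # (H, W)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()  # normalize to [0, 1]
    return cam

def pseudo_pixel_labels(features, weights, image_labels, threshold=0.3):
    """Propagate image-level labels to pixel-level pseudo labels.

    image_labels: class indices known to be present in the image.
    Returns an (H, W) map of class indices, with 255 marking
    pixels left unlabeled (to be ignored in the segmentation loss).
    """
    _, h, w = features.shape
    best_score = np.zeros((h, w))
    labels = np.full((h, w), 255, dtype=np.int64)
    for c in image_labels:
        cam = class_activation_map(features, weights, c)
        # Assign a pixel to class c if its activation is both above
        # the threshold and the strongest seen so far.
        mask = (cam > threshold) & (cam > best_score)
        labels[mask] = c
        best_score = np.maximum(best_score, cam)
    return labels
```

The resulting pseudo labels are typically sparse and noisy, which is why refinements such as saliency cues or adversarial erasing are needed to expand them into full-object masks.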
Related research has been cited more than 8,700 times. The work has been widely published in top venues such as TPAMI, CVPR, ICCV, and NeurIPS, and has also helped Wei and his team win first place on multiple benchmark datasets and in international competitions.
Wei’s work has significantly reduced the dependence of visual recognition algorithms on perfectly annotated data and achieved accurate pixel-level understanding under imperfect conditions. His research can benefit many real-world applications where perfectly annotated data is difficult to obtain, such as medical and agricultural images and videos. In the future, he will focus on exploring more kinds of imperfect data and integrating them into a unified learning framework.