In deep learning, the vanishing gradient problem hindered the training and deployment of deep models for many years. Among the many efforts to address it, DenseNet is one of the best-known and most effective.
The work on DenseNet won the Best Paper Award at CVPR 2017, a top-tier conference in computer vision and in AI more broadly. Gao Huang, now an assistant professor at Tsinghua University, is one of the creators of DenseNet.
In fact, he was the first author of the paper, working as a postdoctoral fellow at Cornell University when it was published. Prior to this, he studied at Tsinghua University and Beihang University. His research interests lie in machine learning and computer vision, in particular deep learning, dynamic neural networks, resource-efficient learning, weakly-supervised/unsupervised learning, and reinforcement learning.
When Huang started his first postdoctoral project in 2015, he chose to investigate the nature of very deep neural networks by dropping entire layers of deep residual networks during training. This was a unique and counter-intuitive move, as the traditional view of neural networks suggested it would be detrimental to performance.
It turned out that deep networks were not as well understood as people thought. Huang’s work showed that there is excessive redundancy within the learned internal representations, and that this redundancy was the “secret component” that made deep nets generalize so well. Dropping layers during training would “force increased redundancy” upon the network, acting as a regularizer and thus improving performance.
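The layer-dropping idea, published as Stochastic Depth, can be sketched in plain Python. The function below is an illustrative toy, not the paper's actual implementation: a residual block keeps its identity shortcut always, but randomly skips its learned transformation during training with some survival probability (the name and the list-of-floats representation are assumptions for clarity).

```python
import random

def residual_block_with_stochastic_depth(x, transform, survival_prob=0.8, training=True):
    """Toy residual block that may drop its transformation during training.

    x             -- the block input, here a simple list of floats
    transform     -- stand-in for the block's learned function
    survival_prob -- chance the block is kept for this forward pass
    """
    if training and random.random() > survival_prob:
        # Block is dropped: the identity shortcut carries x through unchanged,
        # so later layers must be robust to this block's absence.
        return list(x)
    if training:
        # Block survives: apply the transformation on top of the shortcut.
        return [xi + ti for xi, ti in zip(x, transform(x))]
    # At test time every block is active; scaling by survival_prob keeps the
    # expected output consistent with training.
    return [xi + survival_prob * ti for xi, ti in zip(x, transform(x))]
```

Because whole layers disappear at random, the network cannot rely on any single layer being present, which is the regularizing effect the work describes.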
“This challenges the predominant belief of how neural networks learn and what kind of information the individual layers capture,” says Dr. Kilian Q. Weinberger, an Associate Professor at Cornell University who supervised Huang’s postdoctoral work. The work was accepted at the European Conference on Computer Vision (ECCV 2016) and invited for an oral presentation at the NIPS 2016 deep learning symposium.
This recognition motivated Huang to incorporate the new findings into the design of more efficient deep neural networks, so he collaborated with researchers from Facebook AI, Cornell University, and Tsinghua University to propose a new connectivity scheme. The goal was to drastically reduce redundancy by allowing layers to access information from anywhere in the network, removing the need to copy learned features from one layer to the next, which had been standard deep network practice for decades.
This novel connectivity paradigm was later named DenseNet.
“By introducing dense connectivity into deep networks, we can elegantly solve the gradient vanishing problem,” says Huang. More importantly, the network can be made much more efficient: there is no information bottleneck in DenseNet, so each layer may retain only a small number of nodes/channels, instead of hundreds or thousands as in traditional neural networks. This has brought major improvements in the training stability, computational efficiency, and generalization performance of deep models, according to Huang.
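The connectivity pattern itself can be sketched in a few lines of plain Python. This is a structural illustration only, not DenseNet's actual convolutional implementation: each "layer" is a stand-in function, and the key point is that every layer receives the concatenation of all features produced so far, rather than only its immediate predecessor's output.

```python
def dense_block(x, layers):
    """Toy dense block: every layer sees all preceding features.

    x      -- initial feature list for the block
    layers -- functions mapping the full feature list to a few new
              features (analogous to DenseNet's small "growth rate")
    """
    features = list(x)                   # running concatenation of all features
    for layer in layers:
        new_features = layer(features)   # layer has direct access to everything so far
        features.extend(new_features)    # append only; nothing is copied forward or overwritten
    return features

# Toy usage: each "layer" emits one new feature, the sum of all it sees.
toy_layers = [lambda f: [sum(f)] for _ in range(3)]
result = dense_block([1.0, 2.0], toy_layers)  # [1.0, 2.0, 3.0, 6.0, 12.0]
```

Because earlier features stay directly reachable, gradients also flow straight back to every layer through these shortcut connections, which is the sense in which dense connectivity sidesteps the vanishing gradient problem.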
So far, the paper introducing DenseNet has collected more than 17,400 citations. Dr. Yann LeCun, the father of convolutional networks and the 2018 Turing Award laureate, recognized DenseNet as one of the most popular and representative deep CNN architectures. Popular deep learning platforms like TensorFlow and PyTorch also incorporate it as a standard CNN model.
In part fueled by the success of DenseNet, Huang began taking computational resources into account and created two designs for training neural networks that are computationally efficient and less resource hungry. The two resulting papers, on lightweight CNNs and on dynamic neural networks, were accepted at CVPR 2018 and ICLR 2018, respectively.
Huang’s current work focuses on dynamic neural networks, a new learning paradigm in which models adapt their computation to each input, much as human brains do. “I believe that dynamic models will lead to the next generation of deep learning, bringing us at least two orders of magnitude of improvement in energy efficiency, higher interpretability, and robustness,” says Huang.