Interaction of Neural Architecture and Optimization in Deep Learning

Event details
Date | 02.09.2022 |
Hour | 13:00 - 15:00 |
Speaker | Atli Kosson |
Category | Conferences - Seminars |
EDIC candidacy exam
Exam president: Prof. Amir Zamir
Thesis advisor: Prof. Martin Jaggi
Co-examiner: Prof. François Fleuret
Abstract
Modern deep neural networks have a complex structure, consisting of many layers as well as different types of trainable parameters such as convolutional filters, gains, and biases. They are predominantly optimized using some form of stochastic gradient descent (SGD). The structure and parameterization of a neural network can strongly affect the conditioning of the optimization problem, which in turn greatly influences the performance of SGD and related methods. My research interests lie in understanding how neural architecture impacts the optimization dynamics and in developing more robust optimization methods that account for the structure of the network.
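As a rough illustration of the kind of update rule at stake, a minimal plain-Python sketch of one SGD step with L2 weight decay (an assumption for illustration only, not the speaker's method) might look like:

import numpy as np

# Illustrative sketch only: one plain SGD update with L2 weight decay on a
# single weight matrix, the type of update analyzed in the background papers
# on normalized networks and weight decay.
def sgd_step(W, grad, lr=0.1, weight_decay=1e-4):
    """Return W after one SGD step with L2 weight decay."""
    return W - lr * (grad + weight_decay * W)

# Toy usage with a random weight matrix and gradient.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
grad = rng.standard_normal((4, 4))
W_new = sgd_step(W, grad)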
Background papers
Wan, R., Zhu, Z., Zhang, X. and Sun, J., 2021. Spherical Motion Dynamics: Learning Dynamics of Normalized Neural Network using SGD and Weight Decay. Advances in Neural Information Processing Systems, 34.
https://proceedings.neurips.cc/paper/2021/hash/326a8c055c0d04f5b06544665d8bb3ea-Abstract.html
Neyshabur, B., Salakhutdinov, R.R. and Srebro, N., 2015. Path-SGD: Path-Normalized Optimization in Deep Neural Networks. Advances in Neural Information Processing Systems, 28.
https://proceedings.neurips.cc/paper/2015/hash/eaa32c96f620053cf442ad32258076b9-Abstract.html
Dauphin, Y., de Vries, H. and Bengio, Y., 2015. Equilibrated Adaptive Learning Rates for Non-convex Optimization. Advances in Neural Information Processing Systems, 28.
https://proceedings.neurips.cc/paper/2015/hash/430c3626b879b4005d41b8a46172e0c0-Abstract.html
Practical information
- General public
- Free