BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Memento EPFL//
BEGIN:VEVENT
SUMMARY:On the Connection Between Learning Two-Layers Neural Networks and 
 Tensor Decomposition
DTSTART:20180710T140000
DTEND:20180710T150000
DTSTAMP:20260407T183815Z
UID:1fb5e41b651c2c66948b4a7265007b67cf5453df1b781f203173d302
CATEGORIES:Conferences - Seminars
DESCRIPTION:Dr. Marco Mondelli\, Stanford University\n\nWe establish c
 onnections between the problem of learning a two-layer neural network wit
 h good generalization error and tensor decomposition. We consider a model 
 with d-dimensional input x\, r hidden units with weights w_i and output y\
 , i.e.\, y = \\sum_{i=1}^r \\sigma(<x\, w_i>)\, where <.\, .> denotes the sc
 alar product and \\sigma the activation function.\n\nFirst\, we show tha
 t\, if we cannot learn the weights w_i accurately\, then the neural networ
 k does not generalize well. More specifically\, the generalization error i
 s close to that of a trivial predictor with access only to the norm of the
  input. We prove this result in a model with separated isotropic weights a
 nd in a model with random weights. In both settings\, we assume that the i
 nput distribution is Gaussian\, which is common in the theoretical literat
 ure. Then\, we show that the problem of learning the weights w_i is at lea
 st as hard as the problem of tensor decomposition. We prove this result fo
 r any input distribution\, and we assume that the activation function is a
  polynomial whose degree is related to the order of the tensor to be decom
 posed. By putting everything together\, we prove that learning a two-laye
 r neural network that generalizes well is at least as hard as tensor dec
 omposition. It has been observed that neural network models with more para
 meters than training samples often generalize well\, even if the problem i
 s highly underdetermined. This means that the learning algorithm does not 
 estimate the weights accurately and yet is able to yield a good generaliza
 tion error. This paper shows that such a phenomenon cannot occur with a tw
 o-layer neural network when the input distribution is Gaussian. We also p
 rovide numerical evidence supporting our theoretical findings.\n\nBased on
 joint work with Andrea Montanari [https://arxiv.org/abs/1802.07301].
LOCATION:INR 113 https://plan.epfl.ch/?room=INR113
STATUS:CONFIRMED
END:VEVENT
END:VCALENDAR
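
As a rough illustration of the model described in the abstract, the sketch
below simulates y = \sum_{i=1}^r \sigma(<x, w_i>) with Gaussian inputs and
checks the moment identity behind the reduction to tensor decomposition. The
degree-3 Hermite activation He_3(t) = t^3 - 3t, the unit-norm weights, and
the identity E[y x^{(tensor)3}] = 3! \sum_i w_i^{(tensor)3} are standard
illustrative assumptions chosen for this sketch, not details taken from the
talk or the paper.

import numpy as np

rng = np.random.default_rng(0)
d, r, n = 10, 3, 200_000  # input dimension, hidden units, samples

# Hidden-unit weights w_i, normalized so that <w_i, x> is standard Gaussian.
W = rng.standard_normal((r, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

def sigma(t):
    # He_3(t) = t^3 - 3t: a polynomial activation of degree 3, matching
    # the order of the tensor formed below.
    return t**3 - 3.0 * t

# Gaussian inputs and network outputs y = sum_i sigma(<x, w_i>).
X = rng.standard_normal((n, d))
y = sigma(X @ W.T).sum(axis=1)

# Empirical third-moment tensor T ~ E[y * x(tensor)x(tensor)x]. For
# sigma = He_3 and unit-norm w_i, this expectation equals
# 3! * sum_i w_i(tensor)w_i(tensor)w_i, so recovering the weights from T
# is exactly a rank-r tensor decomposition problem.
T = np.einsum('j,ja,jb,jc->abc', y, X, X, X, optimize=True) / n
T_model = 6.0 * np.einsum('ia,ib,ic->abc', W, W, W)
print(np.linalg.norm(T - T_model) / np.linalg.norm(T_model))

The relative error printed at the end shrinks as n grows, which is the sense
in which the network's input-output pairs give access to a noisy version of
the tensor whose decomposition recovers the weights w_i.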
