Unravelling aspects of machine learning through a latent lens


Event details

Date 16.06.2025
Hour 10:00–11:00
Speaker Carl Allen
Location
Category Conferences - Seminars
Event Language English
Part 1: Variational Classification: A Probabilistic Generalization of the Softmax Classifier
Abstract: Neural network softmax classifiers output label predictions q(y|z) that achieve high accuracy, but how well do those outputs reflect the ground truth label distributions p(y|z)? For example, are labels assigned ≈0.7 on a majority class correct ≈70% of the time (known as "calibration")? Empirical studies suggest typically not. To understand why, we theoretically probe softmax classifiers by treating the inputs to the softmax layer as a latent variable z and deriving a variational training objective that generalises standard cross entropy. This reveals a potential inconsistency between the learned latent class distributions q(z|y) and the distributions p(z|y) that the softmax layer requires in order to output true label distributions. Adding a term to mitigate this produces a "variational classifier", which offers new insight into the inner workings of widely used softmax classifiers; it simultaneously improves several desirable properties: calibration, adversarial robustness, robustness to distribution shift and sample efficiency, while maintaining accuracy (evaluated on several image/text datasets).
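The idea of augmenting cross entropy with a term that aligns the learned latent distribution q(z|y) with an assumed p(z|y) can be sketched as follows. This is an illustrative toy, not the paper's actual objective: the function names, the simple squared-distance penalty toward per-class means (a stand-in for matching Gaussian class-conditionals), and the weight `alpha` are all assumptions for the sake of the example.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def variational_classifier_loss(z, y, W, b, class_means, alpha=0.1):
    """Standard cross entropy on latents z, plus a hypothetical consistency
    penalty pulling each latent toward its class mean, so that the learned
    q(z|y) better matches the p(z|y) the softmax layer implicitly assumes.

    z: (n, d) latents; y: (n,) integer labels; W: (d, k) weights;
    b: (k,) biases; class_means: (k, d) assumed class centres.
    """
    probs = softmax(z @ W + b)                    # q(y|z)
    n = len(y)
    ce = -np.log(probs[np.arange(n), y]).mean()   # cross-entropy term
    # nonnegative penalty: squared distance of each z to its class mean
    consistency = ((z - class_means[y]) ** 2).sum(axis=1).mean()
    return ce + alpha * consistency
```

Setting `alpha=0` recovers plain cross entropy; the extra term only ever adds a nonnegative penalty.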
Part 2: Unpicking Data at the Seams: VAEs, Disentanglement and Independent Components
Abstract: Disentanglement, or identifying statistically independent factors of the data, is relevant to much of machine learning, from controlled data generation and robust classification to efficient encoding and improving our understanding of the data itself. Disentanglement arises in several generative paradigms, including Variational Autoencoders (VAEs), Generative Adversarial Networks and diffusion models. Prior work takes a step towards understanding disentanglement in VAEs by showing that the common choice of diagonal posterior covariance matrices promotes orthogonality between columns of the decoder's Jacobian. Taking this further, we close the gap in understanding disentanglement by showing how it follows from Jacobian column-orthogonality, and that it equates to factoring the data distribution into statistically independent components over the data manifold.
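The Jacobian column-orthogonality property mentioned above can be probed numerically: estimate the decoder's Jacobian at a latent point by finite differences and inspect the Gram matrix of its normalised columns. The toy `decoder` below is an assumed stand-in for a trained VAE decoder, purely for illustration.

```python
import numpy as np

def decoder(z):
    """Toy nonlinear decoder mapping 2-D latents to 3-D data
    (assumed stand-in for a trained VAE decoder)."""
    return np.array([np.sin(z[0]), np.cos(z[1]), 0.5 * z[0] + 0.5 * z[1]])

def jacobian_column_gram(f, z, eps=1e-5):
    """Central-difference Jacobian of f at z, returned as the Gram matrix
    of its unit-normalised columns. Off-diagonal entries near zero would
    indicate the column orthogonality associated with disentanglement."""
    d = len(z)
    cols = []
    for i in range(d):
        dz = np.zeros(d)
        dz[i] = eps
        cols.append((f(z + dz) - f(z - dz)) / (2 * eps))
    J = np.stack(cols, axis=1)  # shape: (data_dim, latent_dim)
    Jn = J / np.linalg.norm(J, axis=0, keepdims=True)
    return Jn.T @ Jn
```

The Gram matrix has unit diagonal by construction; how close its off-diagonal entries are to zero measures how orthogonal the decoder's Jacobian columns are at that latent point.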

Practical information

  • Informed public
  • Free

Contact

  • lenaic.chizat@epfl.ch
