BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Memento EPFL//
BEGIN:VEVENT
SUMMARY:RCP Workshop: Advanced Deep Learning with PyTorch and NVIDIA Ecosy
 stem
DTSTART:20250929T130000
DTEND:20250930T170000
DTSTAMP:20260408T115638Z
UID:c1500fa96e79eed1b576b9a4f859f107d0b4695a19d06215f17a5582
CATEGORIES:Internal trainings
DESCRIPTION:This event is organized by the Research Computing Platform (R
 CP)\, which provides campus-wide IT infrastructure for the research commun
 ity.\n\nThis two half-day workshop is designed for practitioners and resea
 rchers eager to deepen their understanding of advanced deep learning conce
 pts\, tools\, and best practices using PyTorch\, NVIDIA NeMo\, CUDA\, and 
 other state-of-the-art frameworks. The agenda blends lectures\, demonstrat
 ions\, and collaborative discussions to provide a thorough exploration of 
 generative AI\, parallelism\, checkpointing\, resiliency and model deploym
 ent across multi-GPU and multi-node environments.\n\n! Registration is req
 uired to attend this workshop. Access is restricted to EPFL participants. 
 !\n\nRegistration form (restricted to EPFL email addresses): https://form
 s.office.com/e/g0sPMJzm2h\n\nThe two half-day sessions cover different con
 tent\, and we strongly encourage participants to attend both sessions for 
 the full learning experience.\n\nAGENDA\n\nDay 1: Sept 29 Afternoon (Half 
 Day) - Room SV 1717\n\n13:00 – 13:15 | Registration and Welcome\n\n	Par
 ticipant check-in\n	Workshop objectives and introductions\n\n13:15 – 13:
 30 | Welcome talk\, by Prof. Martin Jaggi (EPFL)\n\n13:30 – 14:00 | Fund
 amentals of GPU Architecture and Accelerated Computing\n\n\n	Introductio
 n to modern GPU architectures\n	Understanding memory hierarchies\, bandwi
 dth\, and compute units\n	Overview of CUDA C/C++ and CUDA Python\n\n14:0
 0 – 15:00 | Fundamentals of Deep Learning\n\n\n	An Introduction to Dee
 p Learning\n	How a Neural Network Trains\n	Data Augmentation\n	Pre-Traine
 d Models\n	Generative AI\n\n15:00 – 15:15 | Towards a Graph Foundation 
 Model for Digital Pathology – by Sevda Ögüt (EPFL LTS4 PhD Student)\n\
 n\n	An introduction to our project on large-scale self-supervised pre-trai
 ning with graphs for histopathology.\n\n15:30 – 16:30 | Data Parallelism
 : Training Deep Learning Models on Multiple GPUs \n\n\n	Concepts of data 
 parallelism and distributed computing\n	Implementing data-parallel strateg
 ies with PyTorch’s DDP (Distributed Data Parallel)\n\n16:30 – 17:30 | 
 Model Parallelism and Large Model Deployment\n\n\n	Scaling and paralleliz
 ing large neural networks\n	Techniques for managing large-model memory fo
 otprints and optimizing training performance\n	Introduction to model para
 llelism frameworks\n	Example: a transformer model using model parallelis
 m\n\n17:30 | End of Day 1 (Half Day)\n\n\n	Recap and open Q&A\n	Previe
 w of the Day 2 session\n\n\nDay 2: Sept 30 Afternoon (Half Day) - Room B
 C 420
 \n\n13:00 – 13:15 | Welcome Back and Review\n\n\n	Summary of Day 1 key l
 earnings\n	Outline of Day 2 agenda\n\n13:15 – 13:30 | Curating legally
  compliant and transparent training data at scale: insights from SwissAI's
  Apertus data collection and preparation – by Sven Najem-Meyer (EPFL PhD)
 \n\nThis presentation explores how SwissAI addresses legal compliance and 
 data transparency in training the Apertus LLM. It outlines the challenges 
 encountered and the methodologies applied to curate regulation-aligned dat
 asets\, as well as the tools used to enable efficient parallel data prepro
 cessing.\n\n13:30 – 14:00 | Generative AI with Diffusion Models\n\n\n	In
 troduction to generative AI concepts and applications\n	Exploring diffusio
 n models and their significance\n	Overview of NVIDIA’s generative AI too
 ls\n\n14:00 – 14:45 | Building Transformer-Based Natural Language Proces
 sing Pipelines\n\n\n	Advanced NLP techniques\n	Best practices for traini
 ng and fine-tuning large language models\n	Best practices for optimizati
 on: speed and memory\n\n14:45 – 15:00 | Coffee Break\n\n\n	Snacks and net
 working\n\n15:00 – 15:15 | Protein Design on RCP – by Julius Wenckste
 rn (EPFL PhD Student)\n\n\n	How GPU-accelerated\, large-scale protein desi
 gn can help us learn more about biology.\n\n15:15 – 16:00 | Checkpointin
 g & Resiliency: Concepts\, Strategies\, and Frameworks\n\n\n	Deep dive i
 nto checkpointing and resiliency for model recovery and efficient traini
 ng\n	Overview of checkpointing and resiliency tools in PyTorch and NVIDI
 A frameworks\n	Example of robust checkpointing based on NeMo and PyTorch
 \n\n16:00 – 16:30 | Scaling CUDA Applications to Multiple Nodes\n\n\n	M
 ulti-GPU and Multi-Node Programming: Frameworks and Libraries\n	Consider
 ations for scaling across GPU clusters\n	Profiling and performance optim
 izatio
 n strategies\n\n16:30 – 17:00 | Closing Remarks\, Q&A\, and Next Steps\n
 \n\n	Open discussion and feedback\n	Meet the experts. Talk 1:1 or 1:N ab
 out your projects and challenges.\n	Resources for continued learning – DLI\
 , teacher kit\, ambassador program (Cristel)\n	Certificate distribution an
 d farewell\n\n! Registration is required to attend this workshop. Access
  is restricted to EPFL participants. !
LOCATION:EPFL SV 1717 (Day 1) / BC 420 (Day 2)
STATUS:CONFIRMED
END:VEVENT
END:VCALENDAR
