HPC for Learning

Seminars: When and Where?

Seminars are held in person, about once a month on a Thursday evening, with a limited number of seats. They are generally hosted at SCAI (Sorbonne Center for Artificial Intelligence, see a map here) or at CICSU (Centre International de Conférences Sorbonne Université, see a map here), both located at Jussieu (metro line 7). Because space is limited, registration is free but required.

Upcoming

6 February 2026 · 18:00–19:00 — Ulysse Beaugnon (Google)
Title: ML for Systems: Practical Lessons from Deploying AI in Google's Production Infrastructure

Location: Seminar room of SCAI, Sorbonne Université, Paris. To attend, please register here.

Abstract

The foundational software systems that power machine learning advances (operating systems, compilers, and storage stacks) remain largely governed by sub-optimal and often outdated hand-tuned heuristics. As the complexity of modern systems grows and the pressure on compute resources intensifies, these static rules are becoming a bottleneck. In this talk, I will cover a few selected examples of how we applied ML to address system problems within Google's production infrastructure. Beyond the success stories, I will also discuss the challenges and the lessons learned the hard way from applying ML in a system serving billions of users each day.

13 March 2026 · 16:00–17:00 — Alena Kopanicakova (Toulouse INP)
Title: Training of Deep Neural Networks Using Multilevel and Domain-Decomposition Methods

Location: Exceptionally, the seminar will be online. To attend, please register here.

Abstract

Training deep neural networks (DNNs) is predominantly carried out using stochastic gradient descent and its variants. While these methods are robust and widely applicable, their convergence often deteriorates for large-scale, ill-conditioned, or stiff problems commonly encountered in scientific machine learning. This has motivated the development of more advanced training strategies that can accelerate convergence, offer better parallelism, enable convergence control, and facilitate the automatic tuning of hyperparameters. To this end, we introduce a novel training framework for DNNs inspired by nonlinear multilevel and domain-decomposition (ML-DD) methods. Starting from deterministic ML-DD algorithms, we will discuss how to ensure convergence in the presence of subsampling noise. Moreover, we will present several strategies for constructing a hierarchy of subspaces by exploring the properties of the network architecture, data representation, and the loss function. The performance of the proposed ML-DD training algorithms will be demonstrated through a series of numerical experiments from the field of scientific machine learning, such as physics-informed neural networks or operator learning approaches.

[1] Gratton, S., Kopaničáková, A., & Toint, P. L. (2023). Multilevel objective-function-free optimization with an application to neural networks training. SIAM Journal on Optimization, 33(4), 2772-2800.
[2] Gratton, S., Kopaničáková, A., & Toint, P. L. (2025). Recursive bound-constrained AdaGrad with applications to multilevel and domain decomposition minimization. arXiv preprint arXiv:2507.11513.

19 March 2026 · 18:00–19:00 — Max Zimmer (Zuse Institute Berlin)
Title: Local Pruning: Efficient Post-Training Compression at Scale

Location: Seminar room of SCAI, Sorbonne Université, Paris. To attend, please register here.

Abstract

Pruning -- the removal of parameters from neural networks -- is a well-known technique for reducing the inference compute and memory requirements of large models. State-of-the-art methods for LLMs operate layer-wise, minimizing a per-layer objective on a small calibration dataset. The underlying NP-hard problem is that of finding a binary mask that determines which weights to keep, the so-called mask selection problem. While many existing approaches effectively ignore weight interactions in that selection, this talk presents two alternatives. Operating in discrete space, SparseSwaps is a local search method that refines any given mask via pairwise weight exchanges, each of which can be evaluated efficiently. Operating in continuous space, SparseFW instead relaxes the combinatorial constraints to their convex hull and solves the resulting convex program using the Frank-Wolfe algorithm. Across modern GPT architectures, both methods reduce per-layer pruning error by up to 60-80% over existing approaches, with consistent improvements in perplexity and downstream accuracy.
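For readers unfamiliar with the mask selection problem, the following is a minimal illustrative sketch, not the SparseSwaps method from the talk. It assumes a single weight row `w`, a small calibration set `X`, and a squared output error as the per-layer objective; all function names are hypothetical. It starts from a magnitude-based mask and greedily accepts pairwise swaps (prune one kept weight, restore one pruned weight) that lower the error. Unlike the efficient swap evaluation mentioned in the abstract, this naive version recomputes the error from scratch for every candidate swap.

```python
import random

def pruning_error(w, mask, X):
    """Squared output error caused by the pruned weights of one row.

    For each calibration sample x, the dense and pruned outputs differ by
    the sum of w[j] * x[j] over the pruned positions j (mask[j] is False).
    """
    total = 0.0
    for x in X:
        diff = sum(wj * xj for wj, mj, xj in zip(w, mask, x) if not mj)
        total += diff * diff
    return total

def magnitude_mask(w, keep):
    """Baseline mask: keep the `keep` largest-magnitude weights."""
    order = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    kept = set(order[:keep])
    return [j in kept for j in range(len(w))]

def swap_refine(w, mask, X, max_passes=10):
    """Greedy local search over pairwise swaps (illustrative assumption).

    Each swap prunes one kept weight and restores one pruned weight, so the
    number of kept weights never changes. A swap is accepted only if it
    strictly lowers the error.
    """
    mask = list(mask)
    best = pruning_error(w, mask, X)
    for _ in range(max_passes):
        improved = False
        for i in range(len(w)):
            for j in range(len(w)):
                if mask[i] and not mask[j]:
                    mask[i], mask[j] = False, True      # try the swap
                    err = pruning_error(w, mask, X)
                    if err < best - 1e-12:
                        best, improved = err, True      # keep the swap
                    else:
                        mask[i], mask[j] = True, False  # undo the swap
        if not improved:
            break
    return mask, best

# Toy data (made up for illustration): 8 weights, 16 calibration samples.
random.seed(0)
w = [random.gauss(0, 1) for _ in range(8)]
X = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
start = magnitude_mask(w, keep=4)
refined, refined_err = swap_refine(w, start, X)
print(pruning_error(w, start, X), refined_err)
```

Because swaps are only accepted when they strictly reduce the error, the refined mask is never worse than the starting mask on the calibration data and keeps the same sparsity level.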

20 May 2026 · 18:00–19:00 — Hamza Benchekroun (H company)
Title: From Text to Action: Scaling VLM Post-Training for Agents

Location: Room 105, Tower 44, CICSU, Sorbonne Université, Paris. Exceptionally, the seminar is being held on a Wednesday. To attend, please register here.

Abstract

While recent breakthroughs in distributed training have largely commoditized the training of large language models—exemplified by rapid milestones like the NanoGPT speedrun[1]—extending this efficiency to Vision-Language Models (VLMs) remains a major hurdle. As we push toward more complex agentic systems, the computational and architectural friction of integrating and aligning additional modalities at scale is becoming a significant bottleneck. In this talk, I will explore the different methods we use at H to scale multimodal post-training for agentic use cases. Beyond these techniques, I will also share the practical engineering challenges and the lessons we learned the hard way from training massive models across multiple modalities.

[1] modded-nanogpt: Speedrunning the NanoGPT baseline, Keller Jordan et al., https://github.com/KellerJordan/modded-nanogpt

4 June 2026 · 18:00–19:00 — Eugene Belilovsky (MILA)
Title: Toward Globally Distributed Training of Foundation Models

Location: Seminar room of SCAI, Sorbonne Université, Paris. To attend, please register here.

Abstract

Frontier foundation model training has been growing in scale and resource demands. It is largely dominated by homogeneous centralized training clusters with co-located compute and expensive high-bandwidth interconnects, and is often limited by power consumption. Harnessing globally distributed and heterogeneous computational resources for these jobs is bottlenecked by the communication cost of moving data between accelerators. This also often dictates how, where, and by whom these models can be trained. Building on a line of work in communication-efficient optimization, we will discuss recent work that considers practical low-bandwidth pre-training of foundation models, and particularly LLMs. We will first look at a line of work on data-parallel communication-efficient methods based on infrequent communication and gradient compression, discussing how these methods perform and scale to larger training scenarios. We will then consider settings where models far exceed the memory of individual accelerators, and how this can be addressed by low-bandwidth alternatives to traditional model parallelism that allow broader participation with lower-resource compute.
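As background on gradient compression, here is a minimal sketch of one generic technique, top-k sparsification with error feedback. It is shown only as an illustration and is not claimed to be the method presented in the talk; the class and function names are hypothetical. Each worker sends only the k largest-magnitude gradient entries and stores what was dropped in a local residual, which is added back to the next gradient before compression.

```python
def top_k_compress(grad, k):
    """Return (indices, values) of the k largest-magnitude entries."""
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    idx.sort()
    return idx, [grad[i] for i in idx]

def decompress(idx, vals, n):
    """Rebuild a dense vector of length n from sparse (indices, values)."""
    out = [0.0] * n
    for i, v in zip(idx, vals):
        out[i] = v
    return out

class ErrorFeedbackWorker:
    """Hypothetical worker that remembers what compression dropped."""

    def __init__(self, n, k):
        self.residual = [0.0] * n  # gradient mass not yet transmitted
        self.k = k

    def step(self, grad):
        # Add back the previously dropped entries before compressing.
        corrected = [g + r for g, r in zip(grad, self.residual)]
        idx, vals = top_k_compress(corrected, self.k)
        sent = decompress(idx, vals, len(grad))
        # Whatever was not sent is kept for a later round.
        self.residual = [c - s for c, s in zip(corrected, sent)]
        return idx, vals

# Toy usage with made-up gradients: only 1 of 4 entries is sent per step.
worker = ErrorFeedbackWorker(n=4, k=1)
first = worker.step([1.0, 0.5, -3.0, 0.25])
second = worker.step([0.0, 0.0, 0.0, 0.0])
print(first, second, worker.residual)
```

In this toy run the largest entry is sent first, and the next-largest entry follows in the second step even though the new gradient is zero, which is the point of the residual: dropped information is delayed rather than lost.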

25 June 2026, postponed to 2 July 2026 due to the heatwave in France · 17:00–18:00 — Onofrio Semeraro (CNRS/LISN)
Title: On the Reliability of Machine Learning in Scientific Computing: From Ergodic Dataset Design to Physics-Based Classification

Location: Room 105, Tower 44, CICSU, Sorbonne Université, Paris. To attend, please register here.

Abstract

Machine learning is rapidly transforming scientific computing; yet its apparent simplicity often masks critical issues such as limited generalizability, lack of guarantees, and strong case dependency. Increasing dataset size or model complexity does not necessarily address these problems and can incur high computational costs. In this presentation, we address these challenges through two illustrative examples that highlight both methodological and practical perspectives. First, we examine modeling and prediction of dynamical systems using neural networks. Specifically, we use Long Short-Term Memory (LSTM) networks to assess how the structure of training data and the role of memory gates affect long-term predictions. Drawing on insights from ergodic theory and curriculum learning, we analyze how dataset design can ensure physically consistent modeling and open pathways for active learning. The second example addresses a more practical application where the challenge arises from two coupled limitations: the scarcity of labelled simulations and the very large number of degrees of freedom per sample. We present two strategies to overcome these limitations. The first is a physics-based clustering approach, where the computational domain is segmented into meaningful regions by clustering quantities derived from the governing equations. The second is a morphing-based approach, in which heterogeneous simulations are aligned onto a common reference domain through smooth deformation models. The proposed framework is validated on application scenarios of increasing complexity, including two-dimensional aerodynamic flows around airfoils with controlled geometric variations and three-dimensional simulations of airflow in patient-specific upper airways for pathology classification.

Past seminars

No past seminars yet.

Program Committee

Program Chairs

Edouard Oyallon — CNRS, Sorbonne University
Alexandre Allauzen — Paris Dauphine
Alexandre Défossez — Kyutai
Michael Eickenberg — H
Thomas Moreau — Inria
Sixin Zhang — IRIT

Resources

Mailing list

To subscribe to our mailing list, please fill in this form.

Contact

To propose a talk, please send a title and a short abstract to edouard.oyallon at cnrs dot fr.

Partners

As a member of the GDR C4P, the HPC for Learning GT holds seminars under the Paris ELLIS Unit, hosted on-site in the SCAI building. We gratefully acknowledge PEPR SHARP for financial support.