Abstracts
Slides from the spring school and the talks are available in this repository.
Spring school
Valentin de Bortoli (ENS Ulm)
Introduction to diffusion models - Slides - Video
Denoising diffusion models are a new paradigm in the field of generative modeling in machine learning. In the past three years all recent state-of-the-art models have relied on this technique (image and music synthesis, video and 3D generation, protein modeling...). In this talk, I will introduce the basics of generative modeling and diffusion models from a statistical point of view. I will discuss some practical and theoretical properties of these models and present some open questions in the field.
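As a rough illustration of the noising half of a diffusion model, the sketch below draws a noisy sample x_t directly from a data point x_0. It assumes one common formulation (a discrete-time, variance-preserving process with a linear noise schedule); the schedule values and the names `alpha_bar` and `noise_sample` are illustrative choices, not taken from the talk.

```python
import math
import random

def alpha_bar(t, betas):
    """Cumulative product of (1 - beta_s) over the first t noising steps."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod

def noise_sample(x0, t, betas, rng):
    """Draw x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps, with a = alpha_bar(t) and eps ~ N(0, I)."""
    a = alpha_bar(t, betas)
    return [math.sqrt(a) * xi + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for xi in x0]

# Assumed linear schedule with 1000 steps; any increasing schedule in (0, 1) would do.
betas = [1e-4 + (0.02 - 1e-4) * k / 999 for k in range(1000)]
rng = random.Random(0)
x0 = [1.0, -2.0, 0.5]
xT = noise_sample(x0, 1000, betas, rng)  # close to a pure Gaussian sample
```

After all 1000 steps the coefficient on x_0 is nearly zero, so x_T is almost pure noise. A generative model is then trained to reverse this process step by step.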
Adeline Fermanian (Califrais) and Linus Bleistein (Inria Paris)
Neural ODEs have emerged in recent years as a prominent model in a variety of areas of modern machine learning, such as analyzing neural networks in the infinite-depth limit, modeling time series, or designing generative models. We start by discussing the links between ResNets and neural ODEs. We then present a few recent results on generalization, training and initialization of neural ODEs. Finally, we conclude the course with an in-depth review of continuous-time models for time series.
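The link between ResNets and neural ODEs can be sketched in a few lines: a residual network whose update is scaled by 1/depth is an explicit Euler discretization of an ODE on [0, 1]. The toy vector field and the assumption that all layers share the same function `f` are simplifications made here for illustration only.

```python
import math

def resnet_forward(h0, f, depth):
    """Apply `depth` residual layers h <- h + f(h) / depth (explicit Euler on [0, 1])."""
    h = h0
    for _ in range(depth):
        h = h + f(h) / depth
    return h

# Toy vector field (assumption): dh/dt = -h, whose exact flow at time 1 is h0 * exp(-1).
f = lambda h: -h

shallow = resnet_forward(1.0, f, 10)
deep = resnet_forward(1.0, f, 1000)
exact = math.exp(-1.0)
```

As the depth grows, the output of the residual network approaches the solution of the ODE at time 1, which is the infinite-depth limit mentioned above.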
Keynotes
Darrick Lee (University of Oxford)
Random Surfaces and Higher Algebra (Part II)
Classical vector-valued paths are widespread across pure and applied mathematics: from stochastic processes in probability to time series data in machine learning. Parallel transport (or path development) and path signatures provide an effective method to characterize such paths while preserving the concatenation structure of paths. In this talk, we extend this framework to build structure-preserving characterizations of surfaces using surface holonomy. This is based on joint work with Harald Oberhauser.
Yuzuru Inahama (Kyushu University)
Wong-Zakai approximation of density functions - Slides
We discuss the Wong-Zakai approximation of probability density functions of solutions (at a fixed time) of rough differential equations driven by a fractional Brownian rough path with Hurst parameter H (1/4 < H <= 1/2). Besides rough path theory, we also use Hu-Watanabe's approximation theorem in the framework of Watanabe's distributional Malliavin calculus. When H = 1/2, the random rough differential equations coincide with the corresponding Stratonovich-type stochastic differential equations. Even in that case, our main result seems to be new.
Hao Ni (University College London)
Inverting the path signature from its unitary development
The signature of a path, as a core object in rough path theory, is a faithful transformation from the path space into the tensor algebra space. While the signature is regarded as non-commutative monomials on the path space, the development of the path under some Lie group can be viewed as the moment generating function of the signature. Both signature and development serve as the principled and effective feature representations for streamed data in machine learning. Using an algebraic approach, [Chevyrev and Lyons 2016] showed that the development under the unitary group can uniquely determine the signature. In this talk, we provide an alternative, novel and constructive proof that offers a method for the explicit inversion of the signature from the unitary development, analogous to deriving moments from a moment-generating function. Our approach not only can be applied to show the uniqueness of the development under some other Lie groups, but also leads to a novel, general and computable metric on the probability measures on the path space. We illustrate the practical applications of our method through hypothesis testing on stochastic processes, highlighting its potential in generative models for time series generation.
Oleksandr Shchur (AWS AI)
Chronos: Pretrained Models for Time Series Forecasting - Slides
Time series forecasting is an essential component of decision-making in domains such as energy, retail, and finance. Traditionally, machine learning practitioners have focused on developing task-specific forecasting models that are restricted to a certain dataset or application domain. Inspired by the success of pretrained Large Language Models (LLMs) in natural language processing, it becomes imperative to explore whether a similar approach can be applied to forecasting: Can we train a single large model on huge amounts of diverse time series data, that will generalize to new unseen time series tasks? In this talk, we introduce Chronos, a family of pretrained forecasting models based on minimal modifications to LLM architectures, that accomplishes this goal. Chronos demonstrates remarkable zero-shot performance on unseen datasets, positioning pretrained models as a viable tool to greatly simplify forecasting pipelines.
Adeline Fermanian (Califrais)
Dynamic Survival Analysis with Controlled Latent States - Slides
We consider the task of learning individual-specific intensities of counting processes from a set of static variables and irregularly sampled time series. We introduce a novel modeling approach in which the intensity is the solution to a controlled differential equation. We first design a neural estimator by building on neural controlled differential equations. Second, we show that our model can be linearized in the signature space under sufficient regularity conditions, yielding a signature-based estimator which we call CoxSig. We provide theoretical learning guarantees for both estimators, before showcasing the performance of our models on a vast array of simulated and real-world datasets from finance, predictive maintenance and food supply chain management.
Kenji Fukumizu (Institute of Statistical Mathematics)
Neural Fourier Transform: learning group representation from data - Slides
In this study, we introduce a novel deep learning framework designed to infer group representations from data under the assumption that the data space is subject to an unknown group action. The data consists of examples of the group action, comprising a point and its transformation under a group element, or sequences generated through the successive application of a group element. Utilizing an autoencoder architecture, our approach maps the data to a latent space in a manner that is equivariant to the group action, achieving linear group action on the latent variables and thus approximating a group representation. Further, by applying block-diagonalization, we approximately decompose the representation into irreducible representations. We call this method the Neural Fourier Transform. This presents a generalized, data-driven approach to Fourier transform. We validate our framework across various scenarios, including one notable case where the group and its actions are not explicitly known, yet our method successfully learns the equivariant mapping and a group representation using only sequential data. Employing image sequences altered by transformations affecting color or shape, we demonstrate that our derived irreducible representations effectively disentangle the underlying generative processes of the data. Theoretical results supporting our methodology are also presented.
Mario Stanke (University of Greifswald)
Models for Evolution: Continuous-Time Markov Chains on a Tree - Slides
A standard model for the evolution of biological sequences is a continuous-time Markov chain on a discrete state space that evolves characters along the edges of a tree. Only the characters at the leaves are observed. The methods that are currently routinely applied at large scale to estimate the tree, its edge lengths and the parameters of the Markov chain are computationally intensive and unsatisfactory from a theoretical standpoint: the gradient of the likelihood with respect to the rate parameters of the process is estimated numerically, and the tree is optimized by making random "moves" in the discrete space of all trees. I will present an auto-differentiable machine-learning layer that computes the likelihood of given sequence data. We have used this layer to discriminatively train multiple rate matrices end-to-end with a recurrent neural network for the classification of input sequences as either coding for a protein or not. We are currently using this to find genes in 64 mammalian genomes, thereby leveraging the fact that characters that code for a protein are under a particular evolutionary pressure. While this layer can also be used to compute the gradient with respect to the branch lengths, it is to my knowledge an open but worthwhile challenge to embed the discrete tree space such that, in a joint model of tree, branch lengths and rate matrices, the gradient with respect to all parameters of evolution can be computed and thus phylogenetic reconstruction or evolutionary classification be sped up or improved.
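To make the likelihood computation concrete, here is a minimal sketch for the smallest possible case: one alignment column on a tree with a root and two leaves. It assumes the Jukes-Cantor (JC69) model, chosen only because its transition probabilities have a closed form. The layer described in the talk handles general rate matrices and trees and is auto-differentiable, which this sketch is not.

```python
import math

def jc69_prob(a, b, t):
    """Jukes-Cantor transition probability from state a to state b along a branch of length t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay

def cherry_likelihood(leaf1, leaf2, t1, t2, states="ACGT"):
    """Likelihood of the two observed leaf characters, summing over the unobserved root state."""
    # 0.25 is the uniform stationary distribution of JC69 at the root.
    return sum(0.25 * jc69_prob(r, leaf1, t1) * jc69_prob(r, leaf2, t2) for r in states)

lik = cherry_likelihood("A", "C", 0.1, 0.3)
```

Because JC69 is time-reversible, the result depends only on the total branch length t1 + t2 between the two leaves. On larger trees the same sum over hidden states is carried out recursively from the leaves to the root.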
Invited talks
Fabian Harang (BI Norwegian Business School)
On the Signature of an Image
An analytic and algebraic understanding of iterated integral signatures associated to continuous paths has played a central role in a wide range of mathematical areas, from the construction of stochastic integration for non-martingales with rough path theory to formal representations and expansions of solutions to (partial) differential equations. In recent years, the signature has proven to be an efficient feature map for machine learning tasks, where the learning task is related to time series data, or data streams. In contrast to time series data, image data can naturally be seen as two-parameter fields taking values in multi-dimensional space, and in recent years there has been some research into the extension of the path signature to multi-parameter fields (see e.g. Chouk/Gubinelli 14, Lee/Oberhauser (21 and 23)). In this talk I will propose a new extension of the path signature to two-parameter fields motivated by expansions of solutions to certain hyperbolic PDEs with multiplicative noise. The algebraic structure of this object turns out to be rather complicated and I will discuss our current understanding of the challenges with going from 1 to 2 parameters, and provide some interesting observations related to a Chen type relation and a Shuffle type relation. Finally, I will briefly discuss the universality of the 2D signature, providing a universal approximation theorem, and discuss some open problems. This talk is based on forthcoming joint work with Joscha Diehl, Kurusch Ebrahimi-Fard, and Samy Tindel, and is part of the Signatures for Images project for 2023/2024 at CAS.
Antonio Orvieto (ELLIS Institute Tübingen)
Accurate and Efficient Processing of Long Sequences and Large Graphs without Attention - Slides
When applied to sequential data, transformers have an inherent challenge: their attention mechanism leads to quadratic complexity with respect to sequence length. This issue extends to graph transformers, where complexity scales quadratically with the number of nodes in the network. Today, we'll explore theoretically grounded alternatives to the attention mechanism that hinge on carefully parametrized linear recurrent neural networks. Unlike the more commonly known LSTMs and GRUs, linear RNNs are particularly GPU-efficient. This efficiency enables us to scale up the architecture, successfully study signal propagation, and achieve competitive performance. We'll present how, with a Linear Recurrent Unit (LRU) replacing attention, we can achieve state-of-the-art results on sequence modeling and graph data. This approach offers a promising direction for future research, especially in genetics, protein structure prediction, and audio/video processing and generation. Moreover, the architecture presents interesting connections with the randomized signature approach.
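The property that makes linear recurrences hardware-friendly can be sketched as follows: with a diagonal state matrix, each channel evolves independently, and its state is an explicit weighted sum of past inputs that does not have to be computed step by step. The specific eigenvalues, input weights and function names below are illustrative assumptions. The actual LRU uses a particular parametrization and normalization that are not reproduced here.

```python
import cmath

def linear_recurrence(lams, bs, inputs):
    """Run h_t = lam * h_{t-1} + b * u_t independently on each diagonal channel."""
    h = [0j] * len(lams)
    states = []
    for u in inputs:
        h = [lam * hi + b * u for lam, hi, b in zip(lams, h, bs)]
        states.append(h)
    return states

def unrolled_state(lam, b, inputs):
    """Closed form for the final state of one channel: sum_k lam^(T-1-k) * b * u_k."""
    T = len(inputs)
    return sum(lam ** (T - 1 - k) * b * u for k, u in enumerate(inputs))

# Assumed complex eigenvalues inside the unit disk, given as (modulus, phase).
lams = [cmath.rect(0.9, 0.3), cmath.rect(0.5, -1.0)]
bs = [1.0, 0.5]
inputs = [1.0, -2.0, 0.5, 3.0]
states = linear_recurrence(lams, bs, inputs)
```

The sequential loop and the unrolled sum agree. Since there is no nonlinearity between time steps, the sum can be evaluated with a parallel scan, which is not possible for LSTMs or GRUs.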
Carlos Amendola (TU Berlin)
Convex Hulls of Curves: Volumes and Signatures
We study the use of path signatures to compute the volume of the convex hull of a curve. We present sufficient conditions for a curve so that the volume of its convex hull can be computed by such formulae. The canonical example is the classical moment curve, and our class of curves, which we call cyclic, includes other known classes such as d-order curves and curves with totally positive torsion. We also conjecture a necessary and sufficient condition on curves for the signature volume formula to hold. Joint work with Darrick Lee and Chiara Meroni.
Maud Lemercier (University of Oxford)
A high-order numerical method for computing signature kernels
Signature kernels are at the core of several machine learning algorithms for analysing multivariate time series. The kernels of bounded variation paths, such as piecewise linear interpolations of time series data, are typically computed by solving a linear hyperbolic second-order PDE. However, this approach becomes considerably less practical for highly oscillatory inputs, due to significant time and memory complexities. To mitigate this issue, I will introduce a high-order method which involves replacing the original PDE, which has rapidly varying coefficients, with a system of coupled equations with piecewise constant coefficients. These coefficients are derived from the first few terms of the log-signatures of the input paths and can be computed efficiently using existing Python libraries.
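For context, the baseline that the high-order method improves upon can be sketched as follows. The signature kernel k(s, t) of two paths x and y solves the PDE ∂²k/∂s∂t = ⟨x'(s), y'(t)⟩ k with k(0, ·) = k(·, 0) = 1. The code below applies a simple explicit finite-difference scheme on the grid given by the sample points. This is an illustration of the standard PDE approach, not the method presented in the talk.

```python
def signature_kernel(x, y):
    """Approximate the signature kernel of two piecewise linear paths given as lists of points."""
    dxs = [[b - a for a, b in zip(p, q)] for p, q in zip(x, x[1:])]
    dys = [[b - a for a, b in zip(p, q)] for p, q in zip(y, y[1:])]
    # k[i][j] approximates the kernel of x up to its i-th point and y up to its j-th point.
    k = [[1.0] * (len(dys) + 1) for _ in range(len(dxs) + 1)]
    for i, dx in enumerate(dxs):
        for j, dy in enumerate(dys):
            inc = sum(a * b for a, b in zip(dx, dy))  # inner product of the two increments
            k[i + 1][j + 1] = (k[i + 1][j] + k[i][j + 1] - k[i][j]
                               + 0.5 * inc * (k[i + 1][j] + k[i][j + 1]))
    return k[-1][-1]

# One-dimensional straight line from 0 to 1, sampled at 201 points.
line = [[k / 200] for k in range(201)]
value = signature_kernel(line, line)
```

For this straight line the kernel has the closed form sum over n of 1/(n!)^2, which the scheme approximates on a fine grid. The cost grows with the product of the two path lengths, which is why highly oscillatory inputs that require fine grids become expensive.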
Tim Seynnaeve (KU Leuven)
Decomposing tensor spaces via path signatures - Slides
The signature of a path is a sequence of tensors whose entries are iterated integrals, playing a key role in stochastic analysis and applications. The set of all signature tensors at a particular level gives rise to the universal signature variety. We show that the parametrization of this variety induces a natural decomposition of the tensor space via representation theory, and connect this to the study of path invariants. We also reveal certain constraints that apply to the rank and symmetry of a signature tensor. This talk is based on joint work with Carlos Améndola, Francesco Galuppi, Ángel David Ríos Ortiz, and Pierpaola Santarsiero.
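As a small illustration of these iterated integrals, the sketch below computes the first two signature levels of a piecewise linear path by appending one linear segment at a time with Chen's identity. The function name and the truncation at level 2 are choices made here for brevity.

```python
def signature_depth2(points):
    """Levels 1 and 2 of the signature of the piecewise linear path through `points`."""
    d = len(points[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for p, q in zip(points, points[1:]):
        dx = [qi - pi for pi, qi in zip(p, q)]
        # Chen's identity for appending a linear segment with increment dx:
        # S2 <- S2 + S1 (x) dx + (dx (x) dx) / 2, then S1 <- S1 + dx.
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * dx[j] + 0.5 * dx[i] * dx[j]
        for i in range(d):
            s1[i] += dx[i]
    return s1, s2

# Unit square traversed counterclockwise, starting and ending at the origin.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
s1, s2 = signature_depth2(square)
levy_area = 0.5 * (s2[0][1] - s2[1][0])
```

For the closed square, level 1 vanishes and the antisymmetric part of level 2 recovers the enclosed signed area, which equals 1 here.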
Nikolas Tapia (Weierstrass Institute)
Stability of Deep Neural Networks via discrete rough paths
Using rough path techniques, we provide a priori estimates for the output of Deep Residual Neural Networks in terms of both the input data and the (trained) network weights. As trained network weights are typically very rough when seen as functions of the layer, we propose to derive stability bounds in terms of the total p-variation of trained weights for any p∈[1,3]. Unlike the C1-theory underlying the neural ODE literature, our estimates remain bounded even in the limiting case of weights behaving like Brownian motions, as suggested in (Cohen-Cont-Rossier-Xu, "Scaling Properties of Deep Residual Networks", 2021). Mathematically, we interpret residual neural networks as solutions to (rough) difference equations, and analyse them based on recent results on discrete-time signatures and rough path theory. Based on joint work with C. Bayer, S. Breneis and P. K. Friz.
Marco Rauscher (Technische Universität München)
Shortest-path recovery from signature with an optimal control approach
In this talk, we consider the signature-to-path reconstruction problem from the control-theoretic perspective. Namely, we design an optimal control problem whose solution leads to the minimal-length path that generates a given signature. In order to do that, we minimize a cost functional consisting of two competing terms, i.e., a weighted final-time cost combined with the L2-norm squared of the controls. Moreover, we can show that, as the parameter that tunes the final-time cost tends to infinity, the problem Γ-converges to the problem of finding a sub-Riemannian geodesic connecting two signatures. Finally, we provide an alternative reformulation of the latter problem, which is particularly suitable for numerical implementation.
Kurusch Ebrahimi-Fard (NTNU Norwegian University of Science and Technology)
Log-signature of a surface holonomy
We will discuss the concept of log-signature in the context of surface holonomy. Based on joint work with I. Chevyrev, J. Diehl and N. Tapia.
Applications afternoon (Friday)
Giovanni Ballarin (University of Mannheim)
Memory of recurrent networks: Do we compute it right? - Slides
Numerical evaluations of the memory capacity (MC) of recurrent neural networks reported in the literature often contradict well-established theoretical bounds. In this paper, we study the case of linear echo state networks, for which the total memory capacity has been proven to be equal to the rank of the corresponding Kalman controllability matrix. We shed light on various reasons for the inaccurate numerical estimations of the memory, and we show that these issues, often overlooked in the recent literature, are of an exclusively numerical nature. More explicitly, we prove that when the Krylov structure of the linear MC is ignored, a gap between the theoretical MC and its empirical counterpart is introduced. As a solution, we develop robust numerical approaches by exploiting a result of MC neutrality with respect to the input mask matrix. Simulations show that the memory curves that are recovered using the proposed methods fully agree with the theory. Joint work with Lyudmila Grigoryeva and Juan-Pablo Ortega.
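The theoretical quantity in question can be sketched for a toy network. Assuming a linear echo state network with scalar input, x_t = A x_{t-1} + c z_t, the code below builds the Kalman controllability matrix [c, Ac, ..., A^(n-1) c] and computes its rank. Exact rational arithmetic is used here purely as an illustrative way to sidestep floating-point effects. It is not the robust method developed in the talk.

```python
from fractions import Fraction

def controllability_matrix(A, c):
    """Matrix with columns c, A c, ..., A^(n-1) c (Kalman controllability matrix)."""
    cols, v = [], list(c)
    for _ in range(len(A)):
        cols.append(v)
        v = [sum(a * x for a, x in zip(row, v)) for row in A]
    return [list(row) for row in zip(*cols)]  # transpose the list of columns into rows

def exact_rank(M):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    rank = 0
    for col in range(len(M[0])):
        pivot = next((r for r in range(rank, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rank + 1, len(M)):
            factor = M[r][col] / M[rank][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[rank])]
        rank += 1
    return rank

def diag(values):
    """Diagonal matrix with the given entries (hypothetical helper for the toy examples)."""
    return [[v if i == j else Fraction(0) for j in range(len(values))]
            for i, v in enumerate(values)]

# Toy reservoirs (assumptions): distinct eigenvalues versus a repeated eigenvalue.
full = exact_rank(controllability_matrix(diag([Fraction(1, 2), Fraction(1, 3), Fraction(-1, 4)]), [1, 1, 1]))
deficient = exact_rank(controllability_matrix(diag([Fraction(1, 2), Fraction(1, 2), Fraction(-1, 4)]), [1, 1, 1]))
```

With distinct eigenvalues the rank equals the state dimension, while a repeated eigenvalue lowers it. In floating point, the columns A^k c shrink geometrically, so a naive numerical rank or memory estimate can fall short of the theoretical value.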
Remi Vaucher (University of Lyon 2 / Halias)
Detecting anomalous dynamics in multivariate time series using signature methods
The signature of a path is a wonderful tool for extracting geometric information from a multivariate time series. This information is particularly useful for prediction and classification. On our end, we are working on detecting abnormal changes in dynamics, especially applied to real data. In this presentation, we will discuss two methods: studying the distribution of well-chosen coefficients, as well as studying the topological structure underlying the set of channels induced by the signature transform.
Nozomi Sugiura (JAMSTEC - Japan Agency for Marine-Earth Science)
Ocean Data Assimilation Focusing on Integral Quantities Characterizing Observation Profiles - Slides
An observation operator in data assimilation was formalized based on the signatures extracted from the integral quantities contained within observed vertical profiles in the ocean. A four-dimensional variational global ocean data assimilation system, founded on this observation operator, was developed and utilized to conduct preliminary data assimilation experiments over a ten-year assimilation window, comparing the proposed method, namely profile-by-profile matching, with the traditional method, namely point-by-point matching. The proposed method not only demonstrated a point-by-point skill comparable to the traditional method but also provided superior analysis fields in terms of profile shapes on the temperature-salinity plane. This is an indication of a well-balanced analysis field, in contrast to the traditional method, which can produce extremely poor relative errors for certain metrics.