I am a PhD candidate in machine learning at the Technical University of Munich in cooperation with the learning systems research group at Siemens AG. At Siemens, I work on industrial applications of machine learning. We solve modelling problems together with domain experts and work on related infrastructure tasks.
My research focuses on encoding expert knowledge into hierarchical probabilistic models to facilitate inference and specify what to learn from data. Models become more data efficient and more trustworthy if the result of learning is a collection of expert-interpretable components. In my work, I explore how Bayesian non-parametric models can be composed to enforce abstract constraints, yield principled reasoning under uncertainty, and enable scalable and reliable inference.
M.Sc. in Computer Science, 2017
KTH Royal Institute of Technology
M.Sc. in Computer Science, 2016
Technical University of Munich
B.Sc. in Computer Science, 2013
Technical University of Munich
In this extension, we demonstrate how semantic decompositions of dynamics models for Reinforcement Learning significantly increase data efficiency. We show how good model specification is critical for success and how the decomposition can be used for reward shaping.
I present how machine learning tasks in industrial settings differ from typical applications on the internet. As they require explicit handling of uncertainties and the incorporation of expert knowledge, the Bayesian paradigm is a good fit to formulate models.
Together with Carl Henrik Ek and Neill Campbell, I organized a workshop on uncertainty propagation in composite models at the Siemens AI Lab.
I argue that the task a model will be used for should be an explicit part of the modelling process. As an example, I show how ideas from Bayesian Optimization and Probabilistic Numerics can be used to reinterpret Reinforcement Learning.
We discuss how mean-field assumptions for inference in deep Gaussian processes lead to a collapse of uncertainties. We propose possible modifications to discover compositional structure in training data and yield informative uncertainties.
We show how additional structure can be placed on surrogate models for Bayesian optimization to find the trends useful to exploit in search of the optimum. At the core of our approach is the use of a Latent Gaussian Process Regression model that allows us to modulate the input domain with an orthogonal latent space.
We interpret the data-association problem of multimodal regression in the context of deep Gaussian processes and present an inference scheme based on doubly stochastic variational inference.
We demonstrate how expert knowledge can be incorporated in probabilistic policy search by imposing Bayesian structure on the learning problem. Our models yield human-interpretable insights about the underlying dynamics and significantly increase data efficiency.
We extend multi-output Gaussian processes with nonlinear alignments and warpings. The resulting model connects multiple deep Gaussian processes with a shared layer that allows us to extract shared latent data from multiple time series.
In this talk, I present an introduction to pseudo-input methods for sparse GP approximations. I derive the variational lower bounds for SGPR and SVGP and give some intuition for how they should be interpreted.
This project contains the code required for the installation and configuration of the different services running on my Linux server. To simplify dependency management, I use Docker-based deployments.
In my master’s thesis I explore a variant of PILCO for Bayesian model-based reinforcement learning using Gaussian processes. Instead of optimizing a closed-form parameterized policy, I select actions by applying particle swarm optimization to the expected reward, which takes uncertainties about the system dynamics into account.
Power diagrams are a generalization of Voronoi diagrams where the cell centers attract points with different forces. In this report, I present an algorithm that calculates the incidence structure of such a diagram using the convex hull of a set of dual points.
LLVM-IL is a Scala library that emits a subset of textual LLVM IR code. Besides the direct commands, it supports some OOP-specific features, such as the creation of simple v-tables together with field access and virtual method resolution. It works together with a simple runtime written in C.
The slides I created while teaching the tutorial for theoretical computer science at TU Munich. Theoretical computer science is taught in the fourth semester of the Bachelor's program. It is an introduction to automata theory, formal grammars, computability and complexity theory.
Oblivious routing is a generalization of multi-commodity flows where the actual demand function is unknown. In this report, I present an $\mathcal{O}(\log n)$ approximation algorithm using tree metrics. This result is then applied to the minimum bisection problem, which asks for a vertex bisection with minimal cost in the edges between the two sets, also resulting in an $\mathcal{O}(\log n)$ approximation.
The slides I created while teaching the tutorial for discrete structures at TU Munich. Discrete structures is the first mathematical course for computer scientists, taught in the first semester of the Bachelor's program. It is an introduction to mathematical proofs, combinatorics, graph theory and algebra.