Scientific publications by Dotphoton

We generalize the continuous-time framework for score-based generative models from an underlying Brownian motion (BM) to an approximation of fractional Brownian motion (FBM). We derive a continuous reparameterization trick and the reverse-time model by representing FBM as a stochastic integral over a family of Ornstein-Uhlenbeck processes to define generative fractional diffusion models (GFDM) with driving noise converging to a non-Markovian process of infinite quadratic variation. The Hurst index H∈(0,1) of FBM enables control of the roughness of the distribution-transforming path. To the best of our knowledge, this is the first attempt to build a generative model upon a stochastic process with infinite quadratic variation.

AUTHORS

Gabriel Nobis, Marco Aversa, Maximilian Springenberg, Michael Detzel, Stefano Ermon, Shinichi Nakajima, Roderick Murray-Smith, Sebastian Lapuschkin, Christoph Knochenhauer, Luis Oala, Wojciech Samek

KEYWORDS

Diffusion models

SOURCE

In NeurIPS 2023 Workshop on Diffusion Models, 2023

DiffInfinite is a hierarchical diffusion model that generates arbitrarily large histological images while preserving long-range correlation structural information. Our approach first generates synthetic segmentation masks, subsequently used as conditions for the high-fidelity generative diffusion process. The proposed sampling method can be scaled up to any desired image size while only requiring small patches for fast training. Moreover, it can be parallelized more efficiently than previous large-content generation methods while avoiding tiling artefacts.

AUTHORS

Marco Aversa, Gabriel Nobis, Miriam Hägele, Kai Standvoss, Mihaela Chirica, Roderick Murray-Smith, Ahmed Alaa, Lukas Ruff, Daniela Ivanova, Wojciech Samek, Frederick Klauschen, Bruno Sanguinetti, Luis Oala

KEYWORDS

Generative Models, Medical Imaging, Histopathology, Diffusion Models

SOURCE

NeurIPS 2023 spotlight paper

While there are methods to prospectively validate the robustness of machine learning models to such dataset drifts, existing approaches do not account for explicit models of machine learning’s primary object of interest: the data. This limits our ability to study and understand the relationship between data generation and downstream machine learning model performance in a physically accurate manner. In this study, we demonstrate how to overcome this limitation by pairing traditional machine learning with physical optics to obtain explicit and differentiable data models.

AUTHORS

Luis Oala, Marco Aversa, Gabriel Nobis, Kurt Willis, Yoan Neuenschwander, Michèle Buck, Christian Matek, Jerome Extermann, Enrico Pomarico, Wojciech Samek, Roderick Murray-Smith, Christoph Clausen, Bruno Sanguinetti

KEYWORDS

Machine Learning, Artificial Intelligence, Computer Vision, Pattern Recognition, Data Drift

SOURCE

TMLR (Transactions on Machine Learning Research), 2023. Presented as workshop paper at: ICML Spurious Correlations, Invariance, and Stability Workshop, 2023 • ICML Differentiable Almost Everything Workshop, 2023

The goal of this report by the Focus Group on Artificial Intelligence for Health (FG-AI4H) is to assist in understanding the expectations of the regulatory bodies, promote the step-by-step implementation of the safety and effectiveness of AI/ML-based software-as-medical devices, and fill the current gap in international AI/ML-based medical device standards to the greatest extent possible.

AUTHORS

Luis Oala, Christian Johner, Peter G. Goldschmidt, Pradeep Balachandran. Contributors (in alphabetical order): Aaron Y. Lee, Alixandro Werneck, Andrew Murchison, Anle Lin, Christoph Molnar, Johannes Tanne, Juliet Rumball-Smith, Pat Baird, Peter G. Goldschmidt, Pierre Quartarolo, Shan Xu, Sven Piechottka, Zack Hornberger

KEYWORDS

AI/ML in healthcare, AI/ML standards in healthcare, AI/ML-based medical devices, AI checklist, regulatory framework, software-as-a-medical device

SOURCE

Available from ITU website, 2023

Once the raw data is collected, it is processed through a complex image signal processing (ISP) pipeline to produce an image compatible with human perception. However, this processing is rarely considered in machine learning modelling because available benchmark data sets are generally not in raw format. This study shows how to embed the forward acquisition process into the machine learning model.

AUTHORS

Marco Aversa, Luis Oala, Christoph Clausen, Roderick Murray-Smith, Bruno Sanguinetti

KEYWORDS

machine learning, image signal processing, ISP, physical data model

SOURCE

Machine Learning and the Physical Sciences workshop, NeurIPS 2022, selected for a contributed talk

Jetraw images and functions may be used in end-to-end models to generate synthetic data with statistics matching those of genuine raw images, and play an important role in data-centric AI methodologies. Here we show how these features are used for a machine-learning task: the segmentation of cars in an urban, suburban and rural environment. Starting from a drone and airship image dataset in the Jetraw format (with calibrated sensor and optics), we use an end-to-end model to emulate realistic satellite raw images with on-demand parameters.

AUTHORS

Marco Aversa, Ziad Malik, Phillip Geier, Fabien Droz, Andres Upegui, Roderick Murray-Smith, Christoph Clausen, Bruno Sanguinetti

KEYWORDS

synthetic data, machine learning, AI, data-centric AI, satellite, drones, compression

SOURCE

8th International Workshop on On-Board Payload Data Compression, Athens, 26 September 2022

Interestingly, a recent metrologically accurate algorithm, offering a compression ratio of up to 10:1, provides a prediction spread equivalent to that stemming from raw noise. The method described here makes it possible to set a lower bound on the predictive uncertainty of an SL task and can be generalized to determine the statistical distortions originating from a variety of processing pipelines in AI-assisted fields.

AUTHORS

Enrico Pomarico, Cédric Schmidt, Florian Chays, David Nguyen, Arielle Planchette, Audrey Tissot, Adrien Roux, Stéphane Pagès, Laura Batti, Christoph Clausen, Theo Lasser, Aleksandra Radenovic, Bruno Sanguinetti & Jérôme Extermann

KEYWORDS

Artificial Intelligence (AI), Supervised Learning (SL) models, Deep Learning (DL) algorithms

SOURCE

Scientific Reports (2022) 12:3464

In this work, we target the paper-to-practice gap by applying an ML4H audit framework proposed by the ITU/WHO Focus Group on Artificial Intelligence for Health (FG-AI4H) to three use cases: diagnostic prediction of diabetic retinopathy, diagnostic prediction of Alzheimer’s disease, and cytomorphologic classification for leukemia diagnostics.

AUTHORS

Luis Oala, Jana Fehr, Luca Gilli, Pradeep Balachandran, Alixandro Werneck Leite, Saul Calderon-Ramirez, Danny Xie Li, Gabriel Nobis, Erick Alejandro Muñoz Alvarado, Giovanna Jaramillo-Gutierrez, Christian Matek, Arun Shroff, Ferath Kherif, Bruno Sanguinetti, Thomas Wiegand

KEYWORDS

Machine Learning, Health, Testing

SOURCE

Proceedings of the Machine Learning for Health, PMLR 136:280-317, 2020

The current movement towards increased use of lossy compression is highly risky, because even careful and tedious parameter tuning cannot guarantee that no applications are compromised. We implemented and validated a compression method that simultaneously provides a strong data reduction and preserves analysis results for all possible applications.

AUTHORS

Christoph Clausen, Bruno Sanguinetti, Yosef Akhtman, Enrico Pomarico, Jérôme Extermann

KEYWORDS

hyperspectral imaging, machine learning, Earth Observation, satellites, compression

SOURCE

Proceedings of ATTRACT Online Conference "Igniting the Deep Tech Revolution", 22 September 2020, online

In this paper, we discuss requirements for compression tuned for machine vision and demonstrate an implementation achieving a compression ratio in the range 5:1–10:1 at a rate of 200 MB/s/core in software and 400 MB/s on a VHDL FPGA simulation with a 5k-LUT footprint. We also show that adding a machine-learning component to our compressor increases the compression ratio by 10% and allows for easy portability of an otherwise complex algorithm to heterogeneous architectures.

AUTHORS

Bruno Sanguinetti, Christoph Clausen, Michael Desert, and Evgeniya Balysheva

KEYWORDS

compression, satellites, machine learning, AI, Earth Observation, ESA

SOURCE

7th International Workshop on On-Board Payload Data Compression by ESA and CNES, virtual online workshop, 2020