We propose an unsupervised image segmentation method using features from pre-trained text-to-image diffusion models. Inspired by classic spectral clustering approaches, we construct adjacency matrices from the self-attention layers between image patches and recursively partition them using Normalized Cuts. A key insight is that self-attention probability distributions, which capture semantic relations between patches, can be interpreted as a transition matrix for random walks across the image. We leverage this by first using Random Walk Normalized Cuts directly on these self-attention activations to partition the image, minimizing transition probabilities between clusters while maximizing coherence within clusters. Applied recursively, this yields a hierarchical segmentation that reflects the rich semantics in the pre-trained attention layers, without any additional training. Next, we explore other ways to build the NCuts adjacency matrix from features, and how the random walk interpretation of self-attention can be used to capture long-range relationships. Finally, we propose an approach to automatically determine the NCut cost criterion, avoiding the need to tune it manually. We quantitatively analyse the effect of incorporating different features, of a constant versus a dynamic NCut threshold, and of incorporating multi-node paths when constructing the NCuts adjacency matrix. We show that our approach surpasses all existing methods for zero-shot unsupervised segmentation, achieving state-of-the-art results on COCO-Stuff-27 and Cityscapes.
Unsupervised Semantic Segmentation, Computer Vision, Diffusion Models
arXiv preprint
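For intuition only, the sketch below shows the basic spectral bipartition step that a random-walk Normalized Cut performs on a self-attention map. It is a simplified illustration under our own naming (attn, ncut_bipartition), not the paper's implementation, and it omits attention extraction, multi-node paths, and the automatic stopping criterion.

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(attn):
    """Two-way Normalized Cut on a self-attention map.

    attn: (N, N) self-attention probabilities between N patches
          (rows sum to 1, i.e. a random-walk transition matrix).
    Returns a boolean mask assigning each patch to one of two clusters.
    """
    # Symmetrise to obtain an affinity matrix W between patches.
    W = 0.5 * (attn + attn.T)
    d = W.sum(axis=1)
    D = np.diag(d)
    # Shi & Malik: solve (D - W) y = lambda * D y and take the eigenvector
    # with the second-smallest eigenvalue (the Fiedler vector).
    vals, vecs = eigh(D - W, D)
    fiedler = vecs[:, 1]
    # Thresholding the Fiedler vector gives the two clusters; recursing on
    # each cluster would yield a hierarchical segmentation.
    return fiedler > 0

# Illustrative usage with a random row-stochastic "attention" matrix.
rng = np.random.default_rng(0)
attn = rng.random((64, 64))
attn /= attn.sum(axis=1, keepdims=True)
mask = ncut_bipartition(attn)
```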
Accurately detecting and classifying damage in analogue media such as paintings, photographs, textiles, mosaics, and frescoes is essential for cultural heritage preservation. While machine learning models excel at correcting degradation when the damage operator is known a priori, we show that they fail to robustly predict where the damage is, even after supervised training; thus, reliable damage detection remains a challenge. Motivated by this, we introduce ARTeFACT, a dataset for damage detection in diverse types of analogue media, with over 11,000 annotations covering 15 kinds of damage across various subjects, media, and historical provenance. Furthermore, we contribute human-verified text prompts describing the semantic contents of the images, and derive additional textual descriptions of the annotated damage. We evaluate CNN, Transformer, diffusion-based segmentation models, and foundation vision models in zero-shot, supervised, unsupervised and text-guided settings, revealing their limitations in generalising across media types. By publicly sharing our dataset, we provide a benchmark for the analogue damage detection task, with the aim of advancing automated analogue media restoration and preservation.
Anomaly Detection, Unsupervised Semantic Segmentation, Computer Vision, Diffusion Models
IEEE/CVF Winter Conference on Applications of Computer Vision, 2025
Presented as workshop paper at: European Conference on Computer Vision (ECCV) Workshop on VISART, 2024
In this work, we introduce Pixelsmith, a zero-shot text-to-image generative framework that samples images at higher resolutions with a single GPU. We are the first to show that it is possible to scale the output of a pre-trained diffusion model by a factor of 1000, paving the way for gigapixel image generation at no additional cost. Our cascading method uses the image generated at the lowest resolution as a baseline to sample at higher resolutions. For guidance, we introduce the Slider, a tunable mechanism that fuses the overall structure of the first-generated image with enhanced fine details. At each inference step, we denoise patches rather than the entire latent space, minimizing memory demands so that a single GPU can handle the process regardless of the image's resolution. Our experimental results show that Pixelsmith not only achieves higher quality and diversity than existing techniques, but also reduces sampling time and artifacts.
Diffusion Models, Foundation models, Computer Vision, Generative Models, Synthetic Data
Advances in Neural Information Processing Systems, 2024
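A minimal sketch of the general idea of patch-wise processing of a large latent with overlap averaging, assuming a hypothetical denoise_patch callable; it illustrates why peak memory scales with the patch size rather than the output resolution, and it is not Pixelsmith's actual sampler (the cascade and the Slider are omitted).

```python
import numpy as np

def denoise_in_patches(latent, denoise_patch, patch=64, stride=48):
    """Apply a denoising function patch-by-patch over a large latent.

    latent: (C, H, W) array; denoise_patch: callable on (C, patch, patch).
    Overlapping patches are averaged; the sketch assumes the patch/stride
    combination tiles the latent completely.
    """
    C, H, W = latent.shape
    out = np.zeros_like(latent)
    weight = np.zeros((1, H, W))
    for y in range(0, max(H - patch, 0) + 1, stride):
        for x in range(0, max(W - patch, 0) + 1, stride):
            tile = latent[:, y:y + patch, x:x + patch]
            out[:, y:y + patch, x:x + patch] += denoise_patch(tile)
            weight[:, y:y + patch, x:x + patch] += 1.0
    return out / np.maximum(weight, 1e-8)

# Illustrative usage with an identity "denoiser" standing in for the model.
latent = np.random.randn(4, 256, 256)
result = denoise_in_patches(latent, lambda t: t)
```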
Croissant 🥐 is a metadata format that helps standardise machine learning datasets. Croissant aims to enhance the discoverability and usability of datasets across various tools and platforms, making them more accessible to everyone. Today's release includes the format documentation, an open-source library, and a visual editor, with industry support from HuggingFace, Google Dataset Search, Kaggle, and OpenML amongst others.
Dataset metadata format, ML infrastructure, Dataset standardisation
Advances in Neural Information Processing Systems, 2024
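A brief sketch of how a Croissant description might be consumed with the open-source mlcroissant library; the JSON-LD URL and record-set name below are placeholders, and the exact API may differ between library versions.

```python
# Load a dataset through its Croissant metadata (placeholder URL and
# record-set name, shown only to illustrate the intended workflow).
import mlcroissant as mlc

dataset = mlc.Dataset(jsonld="https://example.org/my_dataset/croissant.json")
for record in dataset.records(record_set="default"):
    print(record)  # each record exposes the fields declared in the metadata
    break
```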
We introduce the first continuous-time score-based generative model that leverages fractional diffusion processes for its underlying dynamics. Although diffusion models have excelled at capturing data distributions, they still suffer from various limitations such as slow convergence, mode-collapse on imbalanced data, and lack of diversity. These issues are partially linked to the use of light-tailed Brownian motion (BM) with independent increments. In this paper, we replace BM with an approximation of its non-Markovian counterpart, fractional Brownian motion (fBM), characterized by correlated increments and Hurst index H ∈ (0,1), where H = 0.5 recovers the classical BM. To ensure tractable inference and learning, we employ a recently popularized Markov approximation of fBM (MA-fBM) and derive its reverse-time model, resulting in generative fractional diffusion models (GFDM). We characterize the forward dynamics using a continuous reparameterization trick and propose augmented score matching to efficiently learn the score function, which is partly known in closed form, at minimal added cost. The ability to drive our diffusion model via MA-fBM offers flexibility and control. H ≤ 0.5 enters the regime of rough paths whereas H > 0.5 regularizes diffusion paths and invokes long-term memory. The Markov approximation allows added control by varying the number of Markov processes linearly combined to approximate fBM. Our evaluations on real image datasets demonstrate that GFDM achieves greater pixel-wise diversity and enhanced image quality, as indicated by a lower FID, offering a promising alternative to traditional diffusion models.
Generative Models, Diffusion Models, Score-based Models
Advances in Neural Information Processing Systems, 2024.
ICML 2024, Workshop on Structured Probabilistic Inference & Generative Modeling and NeurIPS 2023 Workshop on Diffusion Models, 2023.
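For intuition, the Markov approximation of fBM referenced above writes the process as a weighted sum of Ornstein–Uhlenbeck processes driven by a single Brownian motion; the schematic form below uses illustrative symbols (weights ω_k, speeds γ_k, count K) and is not a substitute for the paper's exact construction.

```latex
\hat{B}^{H}_{t} \;=\; \sum_{k=1}^{K} \omega_k \, Y^{k}_{t},
\qquad
\mathrm{d}Y^{k}_{t} \;=\; -\gamma_k \, Y^{k}_{t}\,\mathrm{d}t + \mathrm{d}B_{t},
\qquad Y^{k}_{0} = 0 .
```

The weights and speeds are chosen so that the combination approximates fBM with Hurst index H, and increasing K tightens the approximation, which is the extra control mentioned in the abstract.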
Data is critical to advancing AI technologies, yet its quality and documentation remain significant challenges, leading to adverse downstream effects (e.g., potential biases) in AI applications. This paper addresses these issues by introducing Croissant-RAI, a machine-readable metadata format designed to enhance the discoverability, interoperability, and trustworthiness of AI datasets. Croissant-RAI extends the Croissant metadata format and builds upon existing responsible AI (RAI) documentation frameworks, offering a standardized set of attributes and practices to facilitate community-wide adoption. Leveraging established web-publishing practices, such as Schema.org, Croissant-RAI enables dataset users to easily find and utilize RAI metadata regardless of the platform on which the datasets are published. Furthermore, it is seamlessly integrated into major data search engines, repositories, and machine learning frameworks, streamlining the reading and writing of responsible AI metadata within practitioners' existing workflows. Croissant-RAI was developed through a community-led effort. It has been designed to be adaptable to evolving documentation requirements and is supported by a Python library and a visual editor.
Dataset metadata, ML infrastructure
arXiv preprint
Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings prior, in this report we outline the relevance of community engagement and infrastructure development for the creation of next-generation public datasets that will advance machine learning science. We chart a path forward as a collective effort to sustain the creation and maintenance of these datasets and methods towards positive scientific, societal and business impact.
Data-centric machine learning, Artificial intelligence, Datasets
Journal of Data-centric Machine Learning Research
DiffInfinite is a hierarchical diffusion model that generates arbitrarily large histological images while preserving long-range structural correlations. Our approach first generates synthetic segmentation masks, which are subsequently used as conditions for the high-fidelity generative diffusion process. The proposed sampling method can be scaled up to any desired image size while only requiring small patches for fast training. Moreover, it can be parallelized more efficiently than previous large-content generation methods while avoiding tiling artefacts.
Generative Models, Medical Imaging, Histopathology, Diffusion Models
NeurIPS 2023 spotlight paper
While there are methods to prospectively validate the robustness of machine learning models to such dataset drifts, existing approaches do not account for explicit models of machine learning’s primary object of interest: the data. This limits our ability to study and understand the relationship between data generation and downstream machine learning model performance in a physically accurate manner. In this study, we demonstrate how to overcome this limitation by pairing traditional machine learning with physical optics to obtain explicit and differentiable data models.
Machine Learning, Artificial Intelligence, Computer Vision, Pattern Recognition, Data Drift
TMLR (Transactions on Machine Learning Research), 2023.
Presented as workshop paper at: ICML Spurious Correlations, Invariance, and Stability Workshop, 2023 • ICML Differentiable Almost Everything Workshop, 2023.
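As a rough illustration of what an explicit and differentiable data model can look like (not the model used in the paper), the sketch below implements a schematic sensor forward pass with Gaussian-approximated shot noise and read noise; the parameter values are illustrative rather than calibrated to any real device.

```python
# Schematic, differentiable sensor forward model in the spirit of pairing
# ML with physical optics: expected photons -> shot noise -> gain -> read noise.
import torch

def camera_forward(photons, gain=2.0, read_noise_std=1.5):
    """photons: expected photon counts per pixel (any shape, gradients allowed)."""
    # Gaussian approximation of Poisson shot noise keeps the model
    # differentiable w.r.t. the expected photon count (reparameterisation).
    shot = photons + torch.sqrt(photons.clamp(min=0.0)) * torch.randn_like(photons)
    electrons = gain * shot
    return electrons + read_noise_std * torch.randn_like(electrons)

photons = torch.full((1, 1, 8, 8), 100.0, requires_grad=True)
raw = camera_forward(photons)
raw.mean().backward()            # gradients flow back to the scene/exposure
print(photons.grad.shape)
```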
The goal of this report by the Focus Group on Artificial Intelligence for Health (FG-AI4H) is to assist in understanding the expectations of regulatory bodies, to promote the step-by-step implementation of safety and effectiveness for AI/ML-based software as a medical device, and to fill the current gap in international AI/ML-based medical device standards to the greatest extent possible.
AI/ML in healthcare, AI/ML standards in healthcare, AI/ML-based medical devices, AI checklist, regulatory framework, software-as-a-medical device
Available from ITU website, 2023
Once the raw data is collected, it is processed through a complex image signal processing (ISP) pipeline to produce an image compatible with human perception. However, this processing is rarely considered in machine learning modelling because available benchmark data sets are generally not in raw format. This study shows how to embed the forward acquisition process into the machine learning model.
machine learning, image signal processing, ISP, physical data model
Machine Learning and the Physical Sciences workshop, NeurIPS 2022, selected for a contributed talk
Jetraw images and functions may be used in end-to-end models to generate synthetic data with statistics matching those of genuine raw images, and play an important role in data-centric AI methodologies. Here we show how these features are used for a machine-learning task: the segmentation of cars in urban, suburban, and rural environments. Starting from a drone and airship image dataset in the Jetraw format (with calibrated sensor and optics), we use an end-to-end model to emulate realistic raw satellite images with on-demand parameters.
synthetic data, machine learning, AI, data-centric AI, satellite, drones, compression
8th International Workshop on On-Board Payload Data Compression, Athens, 26 September 2022
Interestingly, a recent metrologically accurate algorithm, offering up to a 10:1 compression ratio, provides a prediction spread equivalent to that stemming from raw noise. The method described here makes it possible to set a lower bound on the predictive uncertainty of an SL task and can be generalized to determine the statistical distortions originating from a variety of processing pipelines in AI-assisted fields.
Artificial Intelligence (AI), Supervised Learning (SL) models, Deep Learning (DL) algorithms
Scientific Reports (2022) 12:3464
In this work, we target the paper-to-practice gap by applying an ML4H audit framework proposed by the ITU/WHO Focus Group on Artificial Intelligence for Health (FG-AI4H) to three use cases: diagnostic prediction of diabetic retinopathy, diagnostic prediction of Alzheimer’s disease, and cytomorphologic classification for leukemia diagnostics.
Machine Learning, Health, Testing
Proceedings of the Machine Learning for Health, PMLR 136:280-317, 2020
The current movement towards increased use of lossy compression is highly risky, because even careful and tedious parameter tuning cannot guarantee that no applications are compromised. We implemented and validated a compression method that simultaneously provides a strong data reduction and preserves analysis results for all possible applications.
hyperspectral imaging, machine learning, Earth Observation, satellites, compression
Proceedings of ATTRACT Online Conference "Igniting the Deep Tech Revolution", 22 September 2020, online
In this paper, we discuss requirements for compression tuned for machine vision and demonstrate an implementation achieving a compression ratio in the range 5:1–10:1 at a rate of 200 MB/s/core in software and 400 MB/s in a VHDL FPGA simulation with a 5k-LUT footprint. We also show that adding a machine-learning component to our compressor increases the compression ratio by 10% and allows for easy portability of an otherwise complex algorithm to heterogeneous architectures.
compression, satellites, machine learning, AI, Earth Observation, ESA
7th International Workshop on On-Board Payload Data Compression by ESA and CNES, virtual online workshop, 2020