We propose an unsupervised image segmentation method using features from pre-trained text-to-image diffusion models. Inspired by classic spectral clustering approaches, we construct adjacency matrices from the self-attention layers between image patches and recursively partition them using Normalized Cuts. A key insight is that self-attention probability distributions, which capture semantic relations between patches, can be interpreted as a transition matrix for random walks across the image. We leverage this by first using Random Walk Normalized Cuts directly on these self-attention activations to partition the image, minimizing transition probabilities between clusters while maximizing coherence within clusters. Applied recursively, this yields a hierarchical segmentation that reflects the rich semantics in the pre-trained attention layers, without any additional training. Next, we explore other ways to build the NCuts adjacency matrix from features, and how the random walk interpretation of self-attention can be used to capture long-range relationships. Finally, we propose an approach to automatically determine the NCut cost criterion, avoiding the need to tune it manually. We quantitatively analyse the effect of incorporating different features, of a constant versus a dynamic NCut threshold, and of incorporating multi-node paths when constructing the NCuts adjacency matrix. We show that our approach surpasses all existing methods for zero-shot unsupervised segmentation, achieving state-of-the-art results on COCO-Stuff-27 and Cityscapes.
Unsupervised Semantic Segmentation, Computer Vision, Diffusion Models
arXiv preprint
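For intuition only, the sketch below shows the basic spectral bipartition step that a random-walk Normalized Cut performs on a self-attention map. It is a simplified illustration under our own naming (attn, ncut_bipartition), not the paper's implementation, and it omits attention extraction, multi-node paths, and the automatic stopping criterion.

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(attn):
    """Two-way Normalized Cut on a self-attention map.

    attn: (N, N) self-attention probabilities between N patches
          (rows sum to 1, i.e. a random-walk transition matrix).
    Returns a boolean mask assigning each patch to one of two clusters.
    """
    # Symmetrise to obtain an affinity matrix W between patches.
    W = 0.5 * (attn + attn.T)
    d = W.sum(axis=1)
    D = np.diag(d)
    # Shi & Malik: solve (D - W) y = lambda * D y and take the eigenvector
    # with the second-smallest eigenvalue (the Fiedler vector).
    vals, vecs = eigh(D - W, D)
    fiedler = vecs[:, 1]
    # Thresholding the Fiedler vector gives the two clusters; recursing on
    # each cluster would yield a hierarchical segmentation.
    return fiedler > 0

# Illustrative usage with a random row-stochastic "attention" matrix.
rng = np.random.default_rng(0)
attn = rng.random((64, 64))
attn /= attn.sum(axis=1, keepdims=True)
mask = ncut_bipartition(attn)
```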
Accurately detecting and classifying damage in analogue media such as paintings, photographs, textiles, mosaics, and frescoes is essential for cultural heritage preservation. While machine learning models excel at correcting degradation when the damage operator is known a priori, we show that they fail to robustly predict where the damage is, even after supervised training; thus, reliable damage detection remains a challenge. Motivated by this, we introduce ARTeFACT, a dataset for damage detection in diverse types of analogue media, with over 11,000 annotations covering 15 kinds of damage across various subjects, media, and historical provenance. Furthermore, we contribute human-verified text prompts describing the semantic contents of the images, and derive additional textual descriptions of the annotated damage. We evaluate CNN, Transformer, diffusion-based segmentation models, and foundation vision models in zero-shot, supervised, unsupervised and text-guided settings, revealing their limitations in generalising across media types. By publicly sharing our dataset, we provide a benchmark for the analogue damage detection task, with the aim of advancing automated analogue media restoration and preservation.
Anomaly Detection, Unsupervised Semantic Segmentation, Computer Vision, Diffusion Models
IEEE/CVF Winter Conference on Applications of Computer Vision, 2025
Presented as workshop paper at: European Conference on Computer Vision (ECCV) Workshop on VISART, 2024
In this work, we introduce Pixelsmith, a zero-shot text-to-image generative framework that samples images at higher resolutions with a single GPU. We are the first to show that it is possible to scale the output of a pre-trained diffusion model by a factor of 1000, paving the way for gigapixel image generation at no additional cost. Our cascading method uses the image generated at the lowest resolution as a baseline to sample at higher resolutions. For guidance, we introduce the Slider, a tunable mechanism that fuses the overall structure of the first-generated image with enhanced fine details. At each inference step, we denoise patches rather than the entire latent space, minimizing memory demands so that a single GPU can handle the process regardless of the image's resolution. Our experimental results show that Pixelsmith not only achieves higher quality and diversity than existing techniques, but also reduces sampling time and artifacts.
Diffusion Models, Foundation models, Computer Vision, Generative Models, Synthetic Data
Advances in Neural Information Processing Systems, 2024
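A minimal sketch of the general idea of patch-wise processing of a large latent with overlap averaging, assuming a hypothetical denoise_patch callable; it illustrates why peak memory scales with the patch size rather than the output resolution, and it is not Pixelsmith's actual sampler (the cascade and the Slider are omitted).

```python
import numpy as np

def denoise_in_patches(latent, denoise_patch, patch=64, stride=48):
    """Apply a denoising function patch-by-patch over a large latent.

    latent: (C, H, W) array; denoise_patch: callable on (C, patch, patch).
    Overlapping patches are averaged; the sketch assumes the patch/stride
    combination tiles the latent completely.
    """
    C, H, W = latent.shape
    out = np.zeros_like(latent)
    weight = np.zeros((1, H, W))
    for y in range(0, max(H - patch, 0) + 1, stride):
        for x in range(0, max(W - patch, 0) + 1, stride):
            tile = latent[:, y:y + patch, x:x + patch]
            out[:, y:y + patch, x:x + patch] += denoise_patch(tile)
            weight[:, y:y + patch, x:x + patch] += 1.0
    return out / np.maximum(weight, 1e-8)

# Illustrative usage with an identity "denoiser" standing in for the model.
latent = np.random.randn(4, 256, 256)
result = denoise_in_patches(latent, lambda t: t)
```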
Croissant 🥐 is a metadata format that helps standardise machine learning datasets. Croissant aims to enhance the discoverability and usability of datasets across various tools and platforms, making them more accessible to everyone. Today's release includes the format documentation, an open-source library, and a visual editor, with industry support from HuggingFace, Google Dataset Search, Kaggle, and OpenML amongst others.
Dataset metadata format, ML infrastructure, Dataset standardisation
Advances in Neural Information Processing Systems, 2024
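A brief sketch of how a Croissant description might be consumed with the open-source mlcroissant library; the JSON-LD URL and record-set name below are placeholders, and the exact API may differ between library versions.

```python
# Load a dataset through its Croissant metadata (placeholder URL and
# record-set name, shown only to illustrate the intended workflow).
import mlcroissant as mlc

dataset = mlc.Dataset(jsonld="https://example.org/my_dataset/croissant.json")
for record in dataset.records(record_set="default"):
    print(record)  # each record exposes the fields declared in the metadata
    break
```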
We introduce the first continuous-time score-based generative model that leverages fractional diffusion processes for its underlying dynamics. Although diffusion models have excelled at capturing data distributions, they still suffer from various limitations such as slow convergence, mode-collapse on imbalanced data, and lack of diversity. These issues are partially linked to the use of light-tailed Brownian motion (BM) with independent increments. In this paper, we replace BM with an approximation of its non-Markovian counterpart, fractional Brownian motion (fBM), characterized by correlated increments and Hurst index H ∈ (0,1), where H = 0.5 recovers the classical BM. To ensure tractable inference and learning, we employ a recently popularized Markov approximation of fBM (MA-fBM) and derive its reverse-time model, resulting in generative fractional diffusion models (GFDM). We characterize the forward dynamics using a continuous reparameterization trick and propose augmented score matching to efficiently learn the score function, which is partly known in closed form, at minimal added cost. The ability to drive our diffusion model via MA-fBM offers flexibility and control. H ≤ 0.5 enters the regime of rough paths whereas H > 0.5 regularizes diffusion paths and invokes long-term memory. The Markov approximation allows added control by varying the number of Markov processes linearly combined to approximate fBM. Our evaluations on real image datasets demonstrate that GFDM achieves greater pixel-wise diversity and enhanced image quality, as indicated by a lower FID, offering a promising alternative to traditional diffusion models.
Generative Models, Diffusion Models, Score-based Models
Advances in Neural Information Processing Systems, 2024.
ICML 2024, Workshop on Structured Probabilistic Inference & Generative Modeling and NeurIPS 2023 Workshop on Diffusion Models, 2023.
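For intuition, the Markov approximation of fBM referenced above writes the process as a weighted sum of Ornstein–Uhlenbeck processes driven by a single Brownian motion; the schematic form below uses illustrative symbols (weights ω_k, speeds γ_k, count K) and is not a substitute for the paper's exact construction.

```latex
\hat{B}^{H}_{t} \;=\; \sum_{k=1}^{K} \omega_k \, Y^{k}_{t},
\qquad
\mathrm{d}Y^{k}_{t} \;=\; -\gamma_k \, Y^{k}_{t}\,\mathrm{d}t + \mathrm{d}B_{t},
\qquad Y^{k}_{0} = 0 .
```

The weights and speeds are chosen so that the combination approximates fBM with Hurst index H, and increasing K tightens the approximation, which is the extra control mentioned in the abstract.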
Data is critical to advancing AI technologies, yet its quality and documentation remain significant challenges, leading to adverse downstream effects (e.g., potential biases) in AI applications. This paper addresses these issues by introducing Croissant-RAI, a machine-readable metadata format designed to enhance the discoverability, interoperability, and trustworthiness of AI datasets. Croissant-RAI extends the Croissant metadata format and builds upon existing responsible AI (RAI) documentation frameworks, offering a standardized set of attributes and practices to facilitate community-wide adoption. Leveraging established web-publishing practices, such as Schema.org, Croissant-RAI enables dataset users to easily find and utilize RAI metadata regardless of the platform on which the datasets are published. Furthermore, it is seamlessly integrated into major data search engines, repositories, and machine learning frameworks, streamlining the reading and writing of responsible AI metadata within practitioners' existing workflows. Croissant-RAI was developed through a community-led effort. It has been designed to be adaptable to evolving documentation requirements and is supported by a Python library and a visual editor.
Dataset metadata, ML infrastructure
arXiv preprint
Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings prior, in this report we outline the relevance of community engagement and infrastructure development for the creation of next-generation public datasets that will advance machine learning science. We chart a path forward as a collective effort to sustain the creation and maintenance of these datasets and methods towards positive scientific, societal and business impact.
Data-centric machine learning, Artificial intelligence, Datasets
Journal of Data-centric Machine Learning Research
DiffInfinite is a hierarchical diffusion model that generates arbitrarily large histological images while preserving long-range structural correlations. Our approach first generates synthetic segmentation masks, which are subsequently used as conditions for the high-fidelity generative diffusion process. The proposed sampling method can be scaled up to any desired image size while only requiring small patches for fast training. Moreover, it can be parallelized more efficiently than previous large-content generation methods while avoiding tiling artefacts.
Generative Models, Medical Imaging, Histopathology, Diffusion Models
NeurIPS 2023 spotlight paper
While there are methods to prospectively validate the robustness of machine learning models to such dataset drifts, existing approaches do not account for explicit models of machine learning’s primary object of interest: the data. This limits our ability to study and understand the relationship between data generation and downstream machine learning model performance in a physically accurate manner. In this study, we demonstrate how to overcome this limitation by pairing traditional machine learning with physical optics to obtain explicit and differentiable data models.
Machine Learning, Artificial Intelligence, Computer Vision, Pattern Recognition, Data Drift
TMLR (Transactions on Machine Learning Research), 2023.
Presented as workshop paper at: ICML Spurious Correlations, Invariance, and Stability Workshop, 2023 • ICML Differentiable Almost Everything Workshop, 2023.
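As a rough illustration of what an explicit and differentiable data model can look like (not the model used in the paper), the sketch below implements a schematic sensor forward pass with Gaussian-approximated shot noise and read noise; the parameter values are illustrative rather than calibrated to any real device.

```python
# Schematic, differentiable sensor forward model in the spirit of pairing
# ML with physical optics: expected photons -> shot noise -> gain -> read noise.
import torch

def camera_forward(photons, gain=2.0, read_noise_std=1.5):
    """photons: expected photon counts per pixel (any shape, gradients allowed)."""
    # Gaussian approximation of Poisson shot noise keeps the model
    # differentiable w.r.t. the expected photon count (reparameterisation).
    shot = photons + torch.sqrt(photons.clamp(min=0.0)) * torch.randn_like(photons)
    electrons = gain * shot
    return electrons + read_noise_std * torch.randn_like(electrons)

photons = torch.full((1, 1, 8, 8), 100.0, requires_grad=True)
raw = camera_forward(photons)
raw.mean().backward()            # gradients flow back to the scene/exposure
print(photons.grad.shape)
```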
The goal of this report by the Focus Group on Artificial Intelligence for Health (FG-AI4H) is to assist in understanding the expectations of regulatory bodies, to promote the step-by-step implementation of safety and effectiveness for AI/ML-based software as a medical device, and to fill the current gap in international AI/ML-based medical device standards to the greatest extent possible.
AI/ML in healthcare, AI/ML standards in healthcare, AI/ML-based medical devices, AI checklist, regulatory framework, software-as-a-medical device
Available from ITU website, 2023
Once the raw data is collected, it is processed through a complex image signal processing (ISP) pipeline to produce an image compatible with human perception. However, this processing is rarely considered in machine learning modelling because available benchmark data sets are generally not in raw format. This study shows how to embed the forward acquisition process into the machine learning model.
machine learning, image signal processing, ISP, physical data model
Machine Learning and the Physical Sciences workshop, NeurIPS 2022, selected for a contributed talk
Jetraw images and functions may be used in end-to-end models to generate synthetic data with statistics matching those of genuine raw images, and play an important role in data-centric AI methodologies. Here we show how these features are used for a machine-learning task: the segmentation of cars in urban, suburban, and rural environments. Starting from a drone and airship image dataset in the Jetraw format (with calibrated sensor and optics), we use an end-to-end model to emulate realistic raw satellite images with on-demand parameters.
synthetic data, machine learning, AI, data-centric AI, satellite, drones, compression
8th International Workshop on On-Board Payload Data Compression, Athens, 26 September 2022
Interestingly, a recent metrologically accurate algorithm, offering up to a 10:1 compression ratio, provides a prediction spread equivalent to that stemming from raw noise. The method described here makes it possible to set a lower bound on the predictive uncertainty of an SL task and can be generalized to determine the statistical distortions originating from a variety of processing pipelines in AI-assisted fields.
Artificial Intelligence (AI), Supervised Learning (SL) models, Deep Learning (DL) algorithms
Scientific Reports (2022) 12:3464
In this work, we target the paper-to-practice gap by applying an ML4H audit framework proposed by the ITU/WHO Focus Group on Artificial Intelligence for Health (FG-AI4H) to three use cases: diagnostic prediction of diabetic retinopathy, diagnostic prediction of Alzheimer’s disease, and cytomorphologic classification for leukemia diagnostics.
Machine Learning, Health, Testing
Proceedings of the Machine Learning for Health, PMLR 136:280-317, 2020
The current movement towards increased use of lossy compression is highly risky, because even careful and tedious parameter tuning cannot guarantee that no applications are compromised. We implemented and validated a compression method that simultaneously provides a strong data reduction and preserves analysis results for all possible applications.
hyperspectral imaging, machine learning, Earth Observation, satellites, compression
Proceedings of ATTRACT Online Conference "Igniting the Deep Tech Revolution", 22 September 2020, online
In this paper, we discuss requirements for compression tuned for machine vision and demonstrate an implementation achieving a compression ratio in the range 5:1–10:1 at a rate of 200 MB/s/core in software and 400 MB/s in a VHDL FPGA simulation with a 5k-LUT footprint. We also show that adding a machine-learning component to our compressor increases the compression ratio by 10% and allows for easy portability of an otherwise complex algorithm to heterogeneous architectures.
compression, satellites, machine learning, AI, Earth Observation, ESA
7th International Workshop on On-Board Payload Data Compression by ESA and CNES, virtual online workshop, 2020