DMLR Data-centric Machine Learning Research

ICML 2024

A timeline of inflection points in the development of data-centric ideas. See the editorial DMLR: Data-centric Machine Learning Research -- Past, Present and Future for more details.


Topics and Theme

This is the fifth edition of a highly successful workshop series on data-centric AI, following the Data-Centric AI workshop at NeurIPS 2021 and subsequent editions at ICML 2022, ICML 2023, and ICLR 2024.


Large-scale foundation models are revolutionizing machine learning, particularly in the vision and language domains. While model architecture received significant attention in the past, the focus has recently shifted toward the importance of data quality, size, diversity, and provenance.

This workshop aims to highlight cutting-edge advances in data-centric approaches for large-scale foundation models in new domains beyond language and vision. It also aims to engage the vibrant interdisciplinary community of researchers, practitioners, and engineers who tackle practical data challenges related to foundation models. By featuring innovative research and facilitating collaboration, the workshop seeks to bridge the gap between dataset-centric methodologies and the development of robust, versatile foundation models that work in and across a variety of domains in service of humanity.

Topics will include, but are not limited to:

  • Data sources for large-scale datasets
  • Construction of datasets from large quantities of unlabeled/uncurated data
  • Model-assisted dataset construction
  • Quality signals for large-scale datasets
  • Datasets for evaluation
  • Datasets for specific applications
  • Impact of dataset drift on large-scale models
  • Ethical considerations for and governance of large-scale datasets
  • Data curation and HCI
  • Submissions to benchmarks such as DataPerf, DynaBench, and DataComp

If you are looking for examples of works previously presented at DMLR, you can find a list of papers here.

An overview of the history and vision behind DMLR, including links to previous keynotes, can be found in our editorial DMLR: Data-centric Machine Learning Research – Past, Present and Future.


A few exceptional papers from the DMLR 2024 workshop will be invited to submit extended versions to the DMLR journal, the latest member of the JMLR family, which aims to provide a top archival venue for high-quality scholarly articles focused on the data aspects of machine learning research.


Important Dates

(Time zone: Anywhere on Earth)

  • Paper submission deadline: May 30, 2024 (extended from May 24, 2024)
  • Notification of acceptance: June 17, 2024
  • Camera-ready copy due: July 12, 2024

Session Organization: Virtual and In-Person Engagement

We aim for a discussion-centric workshop that allows in-depth coverage of state-of-the-art and work-in-progress efforts, through panel discussions and poster presentations, across the data lifecycle in machine learning research and engineering: creation, quality and processing, governance, and management/infrastructure.

The workshop will be organized into four components:

  • Keynotes and invited talks
  • Open panel discussions
  • Poster sessions
  • Networking sessions

For the list of accepted papers, please see the Program page.

About DMLR

DMLR is an open, distributed community organizing activities to discuss and advance research in data-centric machine learning.

We organize workshops and research retreats, maintain a journal, and run a working group at Machine Learning Commons (MLC) to support infrastructure projects.

You can find more details about the scope and history of our activities in the editorial Data-centric Machine Learning Research – Past, Present and Future.