DMLR@ICLR'24

Data-centric Machine Learning Research (DMLR) Workshop at ICLR 2024 (May 11 in Vienna, Austria)


Topics and Theme

This is the fourth edition in a series of highly successful workshops focused on data-centric AI, following the Data-Centric AI workshops at NeurIPS 2021, ICML 2022, and ICML 2023.

Topics will include, but are not limited to:

  • Data collection and benchmarking techniques
  • Data governance frameworks for ML
  • Impact of data bias, variance, and drift
  • Role of data in foundation models: pre-training, prompting, fine-tuning
  • Optimal data for standard evaluation frameworks in the context of a changing model landscape
  • Domain-specific data issues
  • Data-centric explainable AI
  • Data-centric approaches to AI alignment
  • Active learning, data cleaning, and data acquisition for ML

If you are looking for examples of works previously presented at DMLR, you can find a list of papers here.

You can find an overview of the history and vision behind DMLR, including links to previous keynotes, in our editorial DMLR: Data-centric Machine Learning Research – Past, Present and Future.

A timeline of inflection points in the development of data-centric ideas. See the editorial DMLR: Data-centric Machine Learning Research -- Past, Present and Future for more details.


AI for Science: In addition, the DMLR workshop at ICLR 2024 encourages submissions on this year’s theme, AI for Science. Unlike general AI, AI for Science uses AI to tackle unique scientific challenges, uncover rare phenomena, deepen our understanding of scientific domains, and accelerate discoveries. Data-centric topics include, but are not limited to:

  • Scientific research as inherently data-driven.
  • AI system integrity depending on high-quality training data, crucial in high-stakes science.
  • Robust data management as essential for vast scientific datasets, as seen in projects at CERN and NASA.
  • Ethical considerations such as data privacy, biases, and diverse representation in scientific research.
  • The role of science domain experts in ensuring data aligns with scientific objectives, vital for research reliability.
  • The need for rigorous, data-driven machine learning standards, akin to those in mathematical and statistical modeling.

In collaboration with the United Nations AI for Good program, we now offer top submissions the opportunity to give a spotlight talk at the AI for Good Discovery Track. Authors can opt in to be considered for a spotlight presentation during the submission process.


Logistics

Session organization: virtual + in-person engagement

We aim for a discussion-centric workshop that allows in-depth coverage of state-of-the-art and work-in-progress efforts, through panel discussions and poster presentations, across the data lifecycle in machine learning research and engineering: creation, quality and processing, governance, and management/infrastructure.

The workshop will be organized in four components:

  • Keynotes and invited talks
  • Open panel discussions
  • Poster sessions
  • Networking sessions

About DMLR

DMLR is an open, distributed community organizing activities to discuss and advance research in data-centric machine learning.

We organize workshops and research retreats, maintain a journal, and run a working group at Machine Learning Commons (MLC) to support infrastructure projects.

You can find more details about the scope and history of our activities in the editorial Data-centric Machine Learning Research – Past, Present and Future.