DMLR Data-centric Machine Learning Research

ICLR 2024

All authors and submissions should adhere to the ICLR policy.

We welcome two types of paper submissions:
- Research papers: up to 8 pages (not including references and appendices). Acceptable material includes original, high-quality, unpublished contributions to theory and practice, as well as position papers relevant to the workshop topics.
- Extended abstracts: up to 2 pages (not including references and appendices). Acceptable material includes work that has already been submitted or published, preliminary results, and controversial findings.
Posting any version of a paper submitted to the DMLR workshop on preprint servers such as arXiv is permitted. Once the paper is accepted, the preprint version should be marked with the publication information.
The use of LLMs is allowed as a general-purpose writing-assistance tool. Authors should understand that they take full responsibility for the contents of their papers, including content generated by LLMs that could be construed as plagiarism or scientific misconduct (e.g., fabrication of facts). LLMs are not eligible for authorship.
Authors who choose to create new datasets must provide access to the datasets (view and download) to help reviewers assess submitted works. We strongly encourage authors to submit supplementary material (as a separate PDF) including:
- Data Card: we recommend that authors consult the data card template.
- Datasheet: see the datasheet example.
Authors are strongly encouraged to include a paragraph-long Reproducibility Statement at the end of the main text (before the references) describing the efforts made to ensure reproducibility. This optional statement does not count toward the page limit, but should not exceed 1 page. We encourage authors to consult the model card template.
Submissions should adhere to the DMLR style templates: LaTeX template
Submissions are only accepted in written English.
All papers must be proofread (not just spell-checked) by the authors before submission.
Submission URL: https://openreview.net/group?id=ICLR.cc/2024/Workshop/DMLR (The submission site is open.)

For poster submission

In-person attendees should bring their poster in the ICLR format (see details).
Virtual attendees should send their poster by e-mail to danilo.brajovic@ipa.fraunhofer.de by 26 April 2024.

Important Dates

(Time zone: Anywhere on Earth)

Paper Submission deadline: ~~03 February 2024~~ 08 February 2024
Notification of Acceptance: 03 March 2024
Camera-Ready Copy due: 05 April 2024 (optional; however, if no camera-ready copy is submitted, we (i) cannot link to the paper on our website and (ii) will not release the paper via OpenReview)
Poster submission for virtual presenters: 26 April 2024
Workshop: 11 May 2024

Topics and Theme

Topics will include, but are not limited to:

Data collection and benchmarking techniques
Data governance frameworks for ML
Impact of data bias, variance, and drift
Role of data in foundation models: pre-training, prompting, fine-tuning
Optimal data for standard evaluation frameworks in the context of a changing model landscape
Domain-specific data issues
Data-centric explainable AI
Data-centric approaches to AI alignment
Active learning, data cleaning, and data acquisition for ML

If you are looking for examples of works previously presented at DMLR, you can find a list of papers here.

For an overview of the history and vision behind DMLR, including links to previous keynotes, see our editorial DMLR: Data-centric Machine Learning Research – Past, Present and Future.

Figure: A timeline of inflection points in the development of data-centric ideas. See the editorial DMLR: Data-centric Machine Learning Research – Past, Present and Future for more details.

AI for Science: In addition, the DMLR workshop at ICLR 2024 encourages submissions on this year’s theme, AI for Science. Unlike general AI, AI for Science applies AI to tackle unique scientific challenges, uncover rare phenomena, deepen our understanding of scientific domains, and accelerate discoveries. Data-centric topics include, but are not limited to:

Scientific research as inherently data-driven.
The dependence of AI system integrity on high-quality training data, which is crucial in high-stakes science.
The essential role of robust data management for vast scientific data, as seen in projects at CERN and NASA.
Ethical considerations such as data privacy, biases, and diverse representation in scientific research.
The role of science domain experts in ensuring data aligns with scientific objectives, vital for research reliability.
The need for rigorous, data-driven machine learning standards, akin to those in mathematical and statistical modeling.

In collaboration with the United Nations AI for Good program, we now offer top submissions the opportunity to give a spotlight talk in the AI for Good Discovery Track. Authors can opt in to be considered for a spotlight presentation during the submission process.

DMLR Journal

The Journal of Data-centric Machine Learning Research (DMLR) is the latest member of the JMLR family, aiming to provide a top archival venue for high-quality scholarly articles focused on the data aspect of machine learning research. Authors of the top submissions to the DMLR workshops will be invited to submit an extended version of their paper to the DMLR journal.

Workshop Organizers

Fatimah Alzamzami · Jerone Andrews · Lilith Bat-Leah · Danilo Brajovic · Holger Caesar · Mayee Chen · Paolo Climaco · Bernard Koch · Bolei Ma · Manil Maskey · Chanjun Park · Praveen Paritosh · Alicia Parrish · Sang Truong · Steffen Vogler · Zhangyang “Atlas” Wang · Xiaozhe Yao

Contact

If you have any questions about paper submission or the workshop, please join our Discord channel: https://discord.gg/jYk3FNfYqG.