Accepted Papers
DMLR@ICLR 2024
We thank all authors who submitted to DMLR@ICLR 2024. All accepted manuscripts are listed below in random order. Authors who did not opt in to publishing their manuscript on the DMLR site have only the title of their work listed. Congratulations to all author teams on being accepted to DMLR@ICLR 2024!
Posters with manuscript
-
When is Off-Policy Evaluation Useful? A Data-Centric Perspective by Hao Sun, Alex James Chan, Nabeel Seedat, Alihan Hüyük, Mihaela van der Schaar
-
Identifying Spurious Correlations Early in Training through the Lens of Simplicity Bias by Yu Yang, Eric Gan, Gintare Karolina Dziugaite, Baharan Mirzasoleiman
-
Calibrated prediction of scarce adverse drug reaction labels with conditional neural processes by Miguel Garcia Ortegon, Srijit Seal, Shantanu Singh, Andreas Bender, Sergio Bacallado
-
Data Distribution Valuation by Xinyi Xu, Shuaiqi Wang, Chuan-Sheng Foo, Bryan Kian Hsiang Low, Giulia Fanti
-
Corrective Machine Unlearning by Shashwat Goel, Ameya Prabhu, Philip Torr, Ponnurangam Kumaraguru, Amartya Sanyal
-
Heterogeneous Normal Classes Pose a Challenge for Anomaly Detection by Alain Ryser, Thomas M. Sutter, Alexander Marx, Julia E Vogt
-
Retail-786k: a Large-Scale Dataset for Visual Entity Matching by Bianca Lamm, Janis Keuper
-
Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models by Navid Rajabi, Jana Kosecka
-
Learning to Rank for One-Round Active Learning by Zixin Ding, Si Chen, Ruoxi Jia, Yuxin Chen
-
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning by Hao Zhao, Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
-
WINDSET: Weather Insights and Novel Data for Systematic Evaluation and Testing by Rajat Shinde, Christopher E Phillips, Sujit Roy, Aman Gupta, Aditi Sheshadri, Manil Maskey, Rahul Ramachandran
-
PRE: Vision-Language Prompt Learning with Reparameterization Encoder by Thi Minh Anh Pham, An Duc Nguyen, Vasileios Argyriou, Georgios Tzimiropoulos, Cephas Svosve
-
Model-Based Data-Centric AI: Bridging the Divide Between Academic Ideals and Industrial Pragmatism by Chanjun Park, Minsoo Khang, Dahyun Kim
-
Deploying Data Selection Techniques on Dynamic Datasets by Maximilian Böther, Ana Klimovic
-
AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent by Tongzhou Mu, Yijie Guo, Jie Xu, Ankit Goyal, Hao Su, Dieter Fox, Animesh Garg
-
Atomic Data Groups: An issue in train-test splits for the real world as demonstrated through digital hardware design by Andrew David Gunter, Steven J E Wilton
-
Is margin all you need? An extensive empirical study of deep active learning on tabular data by Dara Bahri, Heinrich Jiang, Tal Schuster, Afshin Rostamizadeh
-
OODRobustBench: a benchmark and large-scale analysis of adversarial robustness under distribution shift by Lin Li, Yifei Wang, Chawin Sitawarin, Michael W. Spratling
-
Learning representations of learning representations by Rita González-Márquez, Dmitry Kobak
-
Urban Sound Propagation: a Benchmark for 1-Step Generative Modeling of Complex Physical Systems by Martin Spitznagel, Janis Keuper
-
Birbal: An efficient 7B instruct-model fine-tuned with curated datasets by Ashvini Kumar Jindal, Pawan Kumar Rajpoot, Ankur Parikh
-
Towards Robust Data Pruning by Artem M Vysogorets, Julia Kempe
-
Beyond Scale: The Diversity Coefficient as a Data Quality Metric for Variability in Natural Language Data by Brando Miranda, Alycia Lee, Sudharsan Sundar, Allison Casasola, Sanmi Koyejo
-
Graph Kernel Convolutions for Interpretable Classification by Magdalena Proszewska, Siddharth N
-
GRASP-GCN: Graph-Shape Prioritization for Neural Architecture Search under Distribution Shifts by Sofia Casarin, Emanuele Caruso, Oswald Lanz
-
Improving Semantic Segmentation Models through Synthetic Data Generation via Diffusion Models by Jonas Rabensteiner, Cynthia Ifeyinwa Ugwu, Oswald Lanz
-
Quantifying the Importance of Data Alignment in Downstream Model Performance by Krrish Chawla, Mario DePavia, Aryan Sahai, Sudharsan Sundar, Brando Miranda
-
Enabling High Data Throughput Reinforcement Learning on GPUs: A Domain Agnostic Framework for Data-Driven Scientific Research by Tian Lan, Huan Wang, Caiming Xiong, Silvio Savarese
-
Towards Efficient Active Learning in NLP via Pretrained Representations by Artem M Vysogorets, Achintya Gopal
-
Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress by Ameya Prabhu, Vishaal Udandarao, Philip Torr, Matthias Bethge, Adel Bibi, Samuel Albanie
-
Distributional Dataset Distillation with Subtask Decomposition by Tian Qin, Zhiwei Deng, David Alvarez-Melis
-
Towards Algorithmic Fairness by means of Instance-level Data Re-weighting based on Shapley Values by Adrian Arnaiz-Rodriguez, Nuria M Oliver
-
Unveiling the Intertwined Relationship Between Essential Sparsity and Robustness in Large Pre-trained Models by Saebyeol Shin, Ajay Kumar Jaiswal, Shiwei Liu, Zhangyang Wang
-
Annotating Ambiguous Images: General Annotation Strategy for High-Quality Data with Real-World Biomedical Validation by Lars Schmarje, Vasco Grossmann, Claudius Zelenka, Johannes Brünger, Reinhard Koch
-
Bidirectional Long-Range Parser for Sequential Data Understanding by George Leotescu, Daniel Voinea, Alin-Ionut Popa
-
Autoregressive activity prediction for low-data drug discovery by Johannes Schimunek, Lukas Friedrich, Daniel Kuhn, Günter Klambauer
-
CLE-SMOTE: Addressing Extreme Imbalanced Data Classification with Contrastive Learning-Enhanced SMOTE by Cara Lee, Faisal Nabulsi, Michael Xu, Christopher Kan, Andrew Kan, Rachel Yun, Bryan Jiang, Aiden Yun, Rana Suleiman, Talal Nabulsi, Isam Kharouf, Zaid Nabulsi
-
Computational Copyright: Towards A Royalty Model for AI Music Generation Platforms by Junwei Deng, Jiaqi Ma
-
Measuring Diversity in Datasets by Dora Zhao, Jerone Andrews, Orestis Papakyriakopoulos, Alice Xiang
-
Open Domain Generalization with a Single Network by Regularization Exploiting Pre-trained Features by Inseop Chung, KiYoon Yoo, Nojun Kwak
-
VTruST: Controllable value function based subset selection for Data-Centric Trustworthy AI by Soumi Das, Shubhadip Nag, Shreyyash Sharma, Suparna Bhattacharya, Sourangshu Bhattacharya
-
Private Data Measurements for Decentralized Data Markets by Charles Lu, Mohammad Mohammadi Amiri, Ramesh Raskar
-
TOTEM: Tokenized Time Series Embeddings for General Time Series Analysis by Sabera J Talukder, Yisong Yue, Georgia Gkioxari
-
Verified Training for Counterfactual Explanation Robustness under Data Shift by Anna P. Meyer, Yuhao Zhang, Loris D’Antoni, Aws Albarghouthi
-
Building Scalable Video Understanding Benchmarks through Sports by Aniket Agarwal, Alex L Zhang, Karthik R Narasimhan, Igor Gilitschenski, Vishvak Murahari, Yash Kant
-
From Categories to Classifier: Name-Only Continual Learning by Exploring the Web by Ameya Prabhu, Hasan Abed Al Kader Hammoud, Ser-Nam Lim, Bernard Ghanem, Philip Torr, Adel Bibi
-
Style-Content Disentanglement Under Conditional Shift by Dan Andrei Iliescu, Damon Wischik
-
Enhanced Variational Autoencoder Estimation from Incomplete Data using Mixture Variational Families by Vaidotas Simkus, Michael U. Gutmann
-
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution by Alex Gu, Baptiste Roziere, Hugh James Leather, Armando Solar-Lezama, Gabriel Synnaeve, Sida Wang
-
Coactive Learning for Large Language Models using Implicit User Feedback by Aaron David Tucker, Kianté Brantley, Adam Cahall, Thorsten Joachims
-
Fractals as Pre-training Datasets for Anomaly Detection and Localization by Cynthia Ifeyinwa Ugwu, Sofia Casarin, Oswald Lanz
-
PointSAGE: Mesh-independent superresolution approach to fluid flow predictions by Rajat Sarkar, Krishna Sai Sudhir Aripirala, Vishal Sudam Jadhav, Sagar Srinivas Sakhinana, Venkataramana Runkana
-
Analyzing Diffusion Models on Synthesizing Training Datasets by Shin’ya Yamaguchi
-
Empowering Large Language Models for Textual Data Augmentation by Yichuan Li, Kaize Ding, Jianling Wang, Kyumin Lee
-
Re-evaluating Retrosynthesis Algorithms with Syntheseus by Krzysztof Maziarz, Austin Tripp, Guoqing Liu, Megan Stanley, Shufang Xie, Piotr Gaiński, Philipp Seidl, Marwin Segler
-
LLM-Guided Counterfactual Data Generation for Fairer AI by Ashish Mishra, Gyanaranjan Nayak, Suparna Bhattacharya, Tarun Kumar, Arpit Shah, Martin Foltin
-
Step-DAD: Semi-Amortized Policy-Based Bayesian Experimental Design by Desi R. Ivanova, Marcel Hedman, Cong Guan, Tom Rainforth
-
Learning Galaxy Intrinsic Alignment Correlations by Sneh Pandya, Yuanyuan Yang, Nicholas Van Alfen, Jonathan Blazek, Robin Walters
-
Towards Quantifying the Effect of Datasets for Benchmarking: A Look at Tabular Machine Learning by Ravin Kohli, Matthias Feurer, Bernd Bischl, Katharina Eggensperger, Frank Hutter
-
One Law, Many Languages: Benchmarking Multilingual Legal Reasoning for Judicial Support by Vishvaksenan Rasiah, Ronja Stern, Veton Matoshi, Matthias Stürmer, Ilias Chalkidis, Daniel E. Ho, Joel Niklaus
-
You can’t handle the (dirty) truth: Data-centric insights improve pseudo-labeling by Nabeel Seedat, Nicolas Huynh, Fergus Imrie, Mihaela van der Schaar
-
H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps by Haoyi Niu, Tianying Ji, Bingqi Liu, Haocheng Zhao, Xiangyu Zhu, Jianying Zheng, Pengfei Huang, Guyue Zhou, Jianming Hu, Xianyuan Zhan
-
Interpretable Graph Neural Networks for Tabular Data by Amr Alkhatib, Sofiane Ennadir, Henrik Boström, Michalis Vazirgiannis
-
Information Compensation: A Fix for Any-scale Dataset Distillation by Peng Sun, Bei Shi, Xinyi Shang, Tao Lin
-
Combining Time Series Modalities to Create Endpoint-driven Patient Records by Robin van de Water, Axel Winter, Max M Maurer, Felix August Treykorn, Bjarne Pfitzner, Igor M. Sauer, Bert Arnrich
-
Genetic Learning for Designing Sim-to-Real Data Augmentations by Bram Vanherle, Nick Michiels, Frank Van Reeth
Posters without manuscript
-
On the Scalability of GNNs for Molecular Graphs by Maciej Sypetkowski, Frederik Wenkel, Farimah Poursafaei, Nia Dickson, Karush Suri, Philip Fradkin, Dominique Beaini
-
Feedback-guided Data Synthesis for Imbalanced Classification by Reyhane Askari Hemmat, Mohammad Pezeshki, Florian Bordes, Michal Drozdzal, Adriana Romero-Soriano
-
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models by Avi Singh, John D Co-Reyes, Rishabh Agarwal
-
Pretraining Probabilistic Models for Scalable Precision Agriculture by Ruhana Azam, Sang T. Truong, Samuel B. Fernandes, Andrew D.B. Leakey, Alexander Lipka, Mohammed El-Kebir, Sanmi Koyejo
-
Environment-adjusted Topic Models by Dominic Sobhani, Amir Feder, David Blei
-
FTFT: efficient and robust Fine-Tuning by transFerring Training Dynamics by Yupei Du, Albert Gatt, Dong Nguyen
-
GitChameleon: Breaking the version barrier for code generation models by Nizar Islah, Justine Gehring, Diganta Misra, Massimo Caccia, Irina Rish
-
Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift by Yihao Xue, Siddharth Joshi, Dang Nguyen, Baharan Mirzasoleiman
-
Is a picture of a bird a bird? A mixed-methods approach to understanding diverse human perspectives and ambiguity in machine vision models by Alicia Parrish, Susan Hao, Sarah Laszlo, Lora Aroyo
-
Denoising Drug Discovery ADMET Data for Improved Regression Task Performance by Matthew Adrian, Yunsie Chung, Alan C Cheng
-
Pushing the Decision Boundaries: Discovering New Classes in Audio Data by Ryuhaerang Choi, Soumyajit Chatterjee, Dimitris Spathis, Fahim Kawsar, Mohammad Malekzadeh
-
Multi-model evaluation with labeled and unlabeled data by Divya M Shanmugam, Shuvom Sadhuka, Manish Raghavan, John Guttag, Bonnie Berger, Emma Pierson
-
Exploring the Efficacy of Meta-Learning: Unveiling Superior Data Diversity Utilization of MAML Over Pre-training by Kavita Selva, Satita Vittayaareekul, Brando Miranda
-
The Science of Data Filtering: Data Curation cannot be Compute Agnostic by Sachin Goyal, Pratyush Maini, Zachary Chase Lipton, Aditi Raghunathan, J Zico Kolter
-
QuRating: Selecting High-Quality Data for Training Language Models by Alexander Wettig, Aatmik Gupta, Saumya Malik, Danqi Chen
-
Data-Efficient Multi-Modal Contrastive Learning: Prioritizing Data Quality over Quantity by Siddharth Joshi, Arnav Jain, Ali Payani, Baharan Mirzasoleiman
-
Language Models as Science Tutors by Alexis Chevalier, Jiayi Geng, Alexander Wettig, Howard Chen, Sebastian Mizera, Toni Annala, Max Aragon, Arturo Rodriguez Fanlo, Simon Frieder, Simon Machado, Akshara Prabhakar, Ellie Thieu, Jiachen T. Wang, Zirui Wang, Xindi Wu, Mengzhou Xia, Wenhan Xia, Jiatong Yu, Junjie Zhu, Zhiyong Ren, Sanjeev Arora, Danqi Chen
-
Annotation Sensitivity: Drivers of Training Data Quality by Jacob Beck, Bolei Ma, Stephanie Eckman, Christoph Kern, Rob Chew, Frauke Kreuter
-
QualEval: Qualitative Evaluation for Model Improvement by Vishvak Murahari, Ameet Deshpande, Peter Clark, Tanmay Rajpurohit, Ashish Sabharwal, Karthik R Narasimhan, Ashwin Kalyan