Accepted Papers

We thank all authors who submitted to DMLR’23. All accepted manuscripts are listed below in random order. For authors who did not opt in to publish their manuscript on the DMLR site, only the title and authors of their work are listed. Congrats to all author teams for being accepted to DMLR’23!

Posters with manuscript

Paper Title Authors Files  
Data Integration for Driver Telematics with Selection Biases Hashan Peiris (Simon Fraser University); Himchan Jeong (Simon Fraser University)*; Jae Kwang Kim (Iowa State University) pdf  
Accelerating Batch Active Learning Using Continual Learning Techniques Gantavya Bhatt (University of Washington, Seattle)*; Arnav Das (University of Washington); Megh M Bhalerao (University of Washington); Rui Yang (Memorial Sloan Kettering Cancer Center); Vianne R Gao (Weill Medical College); Jeff Bilmes (UW) pdf  
Taming Small-sample Bias in Low-budget Active Learning Linxin Song (Waseda University)*; Jieyu Zhang (University of Washington); Xiaotian Lu (Kyoto University); Tianyi Zhou (University of Maryland, College Park) pdf  
Training with Low-Label-Quality Data: Rank Pruning and Multi-Review Yue Xing (Michigan State University)*; Ashutosh Pandey (Meta Platforms); David Yan (Meta Platforms); Fei Wu (Meta); Michael Fronda (Meta Platforms); Pamela Bhattacharya (Meta Platforms) pdf  
Training on Thin Air: Improve Image Classification with Generated Data Yongchao Zhou (University of Toronto)*; Hshmat U Sahak (University of Toronto); Jimmy Ba (University of Toronto) pdf  
DMOps: Data Management Operations and Recipes Eujeong Choi (Upstage); Chanjun Park (Upstage)* pdf  
Inter-Annotator Agreement in the Wild: Uncovering Its Emerging Roles and Considerations in Real-World Scenarios NamHyeok Kim (Upstage); Chanjun Park (Upstage)* pdf  
Transcending Traditional Boundaries: Leveraging Inter-Annotator Agreement (IAA) for Enhancing Data Management Operations (DMOps) Damrin Kim (Konkuk University); NamHyeok Kim (Upstage); Chanjun Park (Upstage)*; Harksoo Kim (Konkuk University) pdf  
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining Sang Michael Xie (Stanford University)*; Hieu Pham (Google); Xuanyi Dong (University of Technology Sydney); Nan Du (Google Brain); Hanxiao Liu (Google Brain); Yifeng Lu (Google Brain); Percy Liang (Stanford University); Quoc Le (Google Brain); Tengyu Ma (Stanford); Adams Wei Yu (Google Brain) pdf  
On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training Jieyu Zhang (University of Washington)*; Bohan Wang (University of Science and Technology of China); zhengyu hu (NA); Pang Wei Koh (University of Washington); Alexander J Ratner (University of Washington) pdf  
Algorithm Selection for Deep Active Learning with Imbalanced Datasets Jifan Zhang (University of Wisconsin)*; Shuai Shao (Meta); Saurabh Verma (Meta); Robert Nowak (University of Wisconsin, Madison) pdf  
How to Improve Imitation Learning Performance with Sub-optimal Supplementary Data? Ziniu Li (The Chinese University of Hong Kong, Shenzhen)*; Tian Xu (Nanjing University); Zeyu Qin (HKUST); Yang Yu (Nanjing University); Zhiquan Luo (The Chinese University of Hong Kong, Shenzhen and Shenzhen Research Institute of Big Data) pdf  
Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction Chanjun Park (Upstage)*; Seonmin Koo (Korea University); Seolhwa Lee (University of Copenhagen); Jaehyung Seo (Korea University); Sugyeong Eo (Korea University); Hyeonseok Moon (Korea University); Heuiseok Lim (Korea University) pdf  
Contrastive clustering of tabular data Piotr Przemielewski (Jagiellonian University)*; Witold Wydmański (Jagiellonian University); Marek Śmieja (Jagiellonian University) pdf  
Detecting Errors in Numerical Data via any Regression Model Hang Zhou (UC Davis); Jonas Mueller (Cleanlab)*; Mayank Kumar (Cleanlab); Jane-Ling Wang (UC Davis); Jing Lei (Carnegie Mellon University) pdf  
THOS: A Benchmark Dataset for Targeted Hate and Offensive Speech Saad A Almohaimeed (University of Central Florida)*; Saleh Almohaimeed (University of Central Florida); Ashfaq Ali Shafin (Florida International University); Bogdan Carbunar (Florida International University); Ladislau Boloni (University of Central Florida) pdf  
Understanding Unfairness via Training Concept Influence Yuanshun Yao (ByteDance); Yang Liu (UC Santa Cruz)* pdf  
Promises and Pitfalls of Threshold-based Auto-labeling Harit Vishwakarma (University of Wisconsin Madison)*; Heguang Lin (University of Wisconsin-Madison); Frederic Sala (University of Wisconsin-Madison); Ramya Korlakai Vinayak (University of Wisconsin-Madison) pdf  
Detecting Dataset Drift and Non-IID Sampling via k-Nearest Neighbors Jesse E Cummings (MIT)*; Jonas Mueller (Cleanlab); Elías Snorrason (Cleanlab) pdf  
Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation Seungjun Lee (Korea University)*; Hyeonseok Moon (Korea University); Chanjun Park (Upstage); Heuiseok Lim (Korea University) pdf  
Towards Declarative Systems for Data-Centric Machine Learning Stefan Grafberger (University of Amsterdam); Bojan Karlaš (Harvard University); Paul Groth (University of Amsterdam); Sebastian Schelter (University of Amsterdam)* pdf  
DataCI: A Platform for Data-Centric AI on Streaming Data Huaizheng Zhang (BreezeML)*; Yizheng Huang (BreezeML); Yuanming Li (Independent Researcher) pdf  
EPIC: Graph Augmentation with Edit Path Interpolation via Learnable Cost Jaeseung Heo (POSTECH)*; Seungbeom Lee (POSTECH); Sungsoo Ahn (POSTECH); Dongwoo Kim (POSTECH) pdf  
Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning Patrik Okanovic (ETH Zurich); Roger Waleffe (University of Wisconsin-Madison)*; Vasilis Mageirakos (ETH Zurich); Konstantinos Nikolakakis (Yale University); Amin Karbasi (Yale); Dionysios Kalogerias (Yale University); Nezihe Merve Gürel (ETH Zürich); Theodoros Rekatsinas (ETH Zurich) pdf  
Data-Centric Defense: Shaping Loss Landscape with Augmentations to Counter Model Inversion Si Chen (Virginia Tech)*; Feiyang Kang (Virginia Tech); Nikhil Abhyankar (Virginia Tech); Ming Jin (Virginia Tech); Ruoxi Jia (Virginia Tech) pdf  
Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation Joshua L Vendrow (MIT)*; Saachi Jain (MIT); Logan Engstrom (MIT); Aleksander Madry (MIT) pdf  
Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources Feiyang Kang (Virginia Tech)*; Hoang Anh Just (Virginia Tech); Anit Kumar Sahu (Amazon Alexa AI); Ruoxi Jia (Virginia Tech) pdf  
Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline Seonmin Koo (Korea University)*; Chanjun Park (Upstage); Jinsung Kim (Korea University); Jaehyung Seo (Korea University); Sugyeong Eo (Korea University); Hyeonseok Moon (Korea University); Heuiseok Lim (Korea University) pdf  
A Skew-Sensitive Evaluation Framework for Imbalanced Data Classification Min Du (Palo Alto Networks)*; Nesime Tatbul (Intel Labs and MIT); Brian Rivers (Intel); Akhilesh Kumar Gupta (University of Pennsylvania); Lucas Hu (Palo Alto Networks); Wei Wang (Palo Alto Networks); Ryan C Marcus (MIT); Shengtian Zhou (Snap); Insup Lee (University of Pennsylvania); Justin Gottschlich (Merly and Stanford University) pdf  
Investigating minimizing the training set fill distance in machine learning regression Paolo Climaco (Institut für Numerische Simulation, Universität Bonn)*; Jochen Garcke (University Bonn) pdf  
Evaluating the Capabilities of Multi-modal Reasoning Models with Synthetic Task Data Nathan Vaska (MIT Lincoln Laboratories)*; Victoria Helus (MIT Lincoln Laboratory) pdf  
Addressing Discrepancies in Semantic and Visual Alignment in Neural Networks Natalie Abreu (MIT Lincoln Laboratory); Nathan Vaska (MIT Lincoln Laboratories)*; Victoria Helus (MIT Lincoln Laboratory) pdf  
Knowledge Graph-Augmented Korean Generative Commonsense Reasoning Dahyun Jung (Korea University)*; Jaehyung Seo (Korea University); Jaewook Lee (Korea University); Chanjun Park (Upstage); Heuiseok Lim (Korea University) pdf  
Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias Yue Yu (Georgia Institute of Technology)*; Yuchen Zhuang (Georgia Institute of Technology); Jieyu Zhang (University of Washington); Yu Meng (University of Illinois Urbana-Champaign); Alexander J Ratner (University of Washington); Ranjay Krishna (University of Washington); Jiaming Shen (Google Research); Chao Zhang (Georgia Institute of Technology) pdf  
Principlism Guided Responsible Data Curation Jerone T A Andrews (Sony AI)*; Dora Zhao (Sony AI); William Thong (Sony AI); Apostolos Modas (Sony); Orestis Papakyriakopoulos (Sony AI); Alice Xiang (Sony AI) pdf  
RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting Lei Shu (Google); Liangchen Luo (Google)*; Jayakumar Hoskere (Google); Yun Zhu (Google); Yinxiao Liu (Google); Simon Tong (Google); Jindong Chen (Google); Lei Meng (Google) pdf  
Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value Yongchan Kwon (Columbia University); James Zou (Stanford University)* pdf  
Partial Label Learning meets Active Learning: Enhancing Annotation Efficiency through Binary Questioning Shivangana Rawat (Indian Institute of Technology, Hyderabad)*; Chaitanya Devaguptapu (Fujitsu Research); Vineeth Balasubramanian (Indian Institute of Technology Hyderabad) pdf  
Characterizing Risk Regimes for Safe Deployment of Deep Regression Models Jayaraman J. Thiagarajan (Lawrence Livermore National Laboratory)*; Vivek Narayanaswamy (Lawrence Livermore National Laboratory); Puja Trivedi (University of Michigan); Rushil Anirudh (Lawrence Livermore National Laboratory) pdf  
Improve Model Inference Cost with Image Gridding Shreyas Krishnaswamy (University of California, Berkeley)*; Lisa Dunlap (UC Berkeley); Lingjiao Chen (University of Wisconsin-Madison); Matei Zaharia (Stanford and Databricks); James Zou (Stanford University); Joey Gonzalez (Berkeley) pdf  
Towards an Efficient Algorithm for Time Series Forecasting with Anomalies Hao Cheng (University of California, Santa Cruz); Qingsong Wen (Alibaba DAMO Academy)*; Yang Liu (UC Santa Cruz); Liang Sun (Alibaba Group) pdf  
Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models Mayee Chen (Stanford University)*; Nicholas Roberts (University of Wisconsin-Madison); Kush Bhatia (Stanford University); Jue WANG (Zhejiang University); Ce Zhang (ETH); Frederic Sala (University of Wisconsin-Madison); Christopher Re (Stanford University) pdf  
The Matrix Reloaded: A Counterfactual Perspective on Bias in Machine Learning Andre V Carreiro (Fraunhofer Portugal AICOS); Mariana Pinto (Faculty of Science and Technology, Nova University of Lisbon); Pedro S Madeira (Fraunhofer Portugal AICOS)*; Alberto Lopez (Imprensa Nacional - Casa da Moeda); Hugo Gamboa (LIBPhys, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa) pdf  
Graphtester: Exploring Theoretical Boundaries of GNNs on Graph Datasets M. Eren Akbiyik (ETH Zurich)*; Florian Grötschla (ETH Zürich); Beni Egressy (ETH Zurich); Roger Wattenhofer (ETH Zurich) pdf  
No Imputation without Representation Oliver U Lenz (Universiteit Gent)*; Daniel Peralta (Ghent University); Chris Cornelis (Ghent University) pdf  
L3Cube-MahaSent-MD: A Multi-domain Marathi Sentiment Analysis Dataset and Transformer Models Aabha Pingle (Pune Institute of Computer Technology)*; Aditya Vyawahare (Pune Institute of Computer Technology); Isha Joshi (Pune Institute of Computer Technology); Rahul Tangsali (SCTR’s Pune Institute of Computer Technology); Raviraj Joshi (Indian Institute of Technology Madras) pdf  
Point Cloud Classification with ModelNet40: What is left? Jarne Van den Herrewegen (Oqton / Ghent University)*; Tom Tourwé (Oqton); Francis Wyffels (Ghent University) pdf  
In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation Julian Bitterwolf (University of Tübingen)*; Maximilian Mueller (University of Tübingen); Matthias Hein (University of Tübingen) pdf  
Localized Data Work as a Precondition for Data-Centric ML: A Case Study of Full Lifecycle Crop Disease Identification in Ghana Darlington Akogo (minoHealth); Issah A Samori (minoHealth AI Labs); Cyril S K Akafia (minoHealth AI Labs); Harriet Dede Fiagbor (minoHealth AI Labs); Andrews A Kangah (KaraAgro AI Labs); Donald Donald (KaraAgro); Kwabena Fuachie (Kara Agro AI); Luis Oala (Dotphoton AG)* pdf  
Offline Reinforcement Learning with Imbalanced Datasets Li Jiang (Tsinghua University)*; Sijie Chen (Fudan University); Jielin Qiu (Carnegie Mellon University); Haoran Xu (JD Technology); Victor Chan (TBSI); DING ZHAO (Carnegie Mellon University) pdf  
Bayesian Optimisation Against Climate Change: Applications and Benchmarks Sigrid Passano Hellan (University of Edinburgh)*; Christopher Lucas (University of Edinburgh); Nigel Goddard (University of Edinburgh) pdf  
On the Usefulness of Synthetic Tabular Data Generation Dionysis Manousakas (Amazon)*; Sergul Aydore (Amazon) pdf  
Active learning for time instant classification Nauman Ahad (Georgia Institute of Technology)*; Namrata Nadagouda (Georgia Institute of Technology); Eva L Dyer (Georgia Tech); Mark Davenport (Georgia Institute of Technology) pdf  
Speech Wikimedia: A 77 Language Multilingual Speech Dataset Rafael Mosquera Gómez (MLCommons); Julian Eusse (MLCommons); Juan Manuel Ciro (Factored); Daniel Galvez (NVIDIA)*; Ryan Hileman (Talon Voice); Kurt Bollacker (The Long Now Foundation); David Kanter (MLCommons) pdf  
Data-Efficient Contrastive Self-supervised Learning: Most Beneficial Examples for Supervised Learning Contribute the Least Siddharth Joshi (UCLA)*; Baharan Mirzasoleiman (UCLA) pdf  
Characterizing the Impacts of Semi-supervised Learning for Weak Supervision Jeffrey Li (University of Washington)*; Jieyu Zhang (University of Washington); Ludwig Schmidt (University of Washington); Alexander J Ratner (University of Washington) pdf  
Estimating label quality and errors in semantic segmentation data via any model Vedang Lad (MIT); Jonas Mueller (Cleanlab)* pdf  
STG-MTL: Scalable Task Grouping for Multi-Task Learning Using Data Maps Ammar Sherif (Nile University)*; Abubakar Abid (Hugging Face); Mustafa Elattar (Nile University); Mohamed ElHelw (Nile University) pdf  
ObjectLab: Automated Diagnosis of Mislabeled Images in Object Detection Data Ulyana Tkachenko (Cleanlab); Aditya Thyagarajan (CleanLab); Jonas Mueller (Cleanlab)* pdf  
Beyond Scale: the Diversity Coefficient as a Data Quality Metric Demonstrates LLMs are Pre-trained on Formally Diverse Data Alycia Y Lee (Stanford University)*; Brando Miranda (Stanford University); Sanmi Koyejo (Stanford University) pdf  
Learning pipeline-invariant representation for robust brain phenotype prediction Xinhui Li (Georgia Institute of Technology)*; Alex Fedorov (Georgia Institute of Technology); Mrinal Mathur (Georgia State University); Anees Abrol (TReNDS); Gregory Kiar (Child Mind Institute); Sergey Plis (Georgia State University); Vince Calhoun (TReNDS) pdf  
Is Pre-training Truly Better Than Meta-Learning? Brando Miranda (Stanford University)*; Patrick Yu (University of Illinois Urbana-Champaign); Saumya Goyal (Stanford University); Yu-Xiong Wang (University of Illinois at Urbana-Champaign); Sanmi Koyejo (Stanford University) pdf  
Adaptive Aggregated Drift Detector Beverly A Quon (University of California, Irvine)*; Jean-Luc Gaudiot (University of California, Irvine) pdf  
On Estimating the Epistemic Uncertainty of Graph Neural Networks using Stochastic Centering Puja Trivedi (University of Michigan)*; Mark Heimann (Lawrence Livermore); Rushil Anirudh (Lawrence Livermore National Laboratory); Danai Koutra (U Michigan); Jayaraman J. Thiagarajan (Lawrence Livermore National Laboratory) pdf  
LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning Jifan Zhang (University of Wisconsin)*; Yifang Chen (University of Washington); Gregory H Canal (University of Wisconsin-Madison); Stephen O Mussmann (University of Washington); Yinglun Zhu (University of Wisconsin-Madison); Simon Du (University of Washington); Kevin Jamieson (U Washington); Robert Nowak (University of Wisconsin, Madison) pdf  
Internet Explorer: Targeted Representation Learning on the Open Web Alexander C Li (Carnegie Mellon University)*; Ellis L Brown (Carnegie Mellon University); Alexei A Efros (UC Berkeley); Deepak Pathak (Carnegie Mellon University) pdf  
Uncovering Neural Scaling Law in Molecular Representation Learning Dingshuo Chen (University of Chinese Academy of Sciences)*; Yanqiao ZHU (University of California, Los Angeles); Jieyu Zhang (University of Washington); Yuanqi Du (Cornell University); Zhixun Li (The Chinese University of Hong Kong); Qiang Liu (Institute of Automation, Chinese Academy of Sciences); Shu Wu (NLPR, China); Liang Wang (NLPR, China) pdf  
MultiLegalPile: A 689GB Multilingual Legal Corpus Joel Niklaus (University of Bern)*; Veton Matoshi (Bern University of Applied Sciences); Matthias Stürmer (University of Bern); Ilias Chalkidis (University of Copenhagen); Daniel Ho (Stanford Law) pdf  
Self-supervised Autoencoder for Correlation-Preserving in Tabular GANs Siddarth Ramesh (Adobe); Surgan Jandial (MDSR Labs, Adobe)*; Gauri Gupta (MIT); Piyush Gupta (Adobe Systems India Pvt Ltd); Balaji Krishnamurthy () pdf  
D4: Improving LLM Pretraining via Document De-Duplication and Diversification Kushal Tirumala (FAIR)*; Daniel Simig (Meta AI); Armen Aghajanyan (FAIR); Ari S Morcos (Facebook AI Research (FAIR)) pdf  
Ensemble Fractional Imputation for Incomplete Categorical Data with a Graphical Model Yonghyun Kwon (Iowa State University)*; Jae Kwang Kim (Iowa State University) pdf  
Put on your detective hat: What’s wrong in this video? Rohith Peddi (The University of Texas at Dallas)*; Shivvrat Arya (The University of Texas at Dallas); Bharath Challa (The University of Texas at Dallas); Likhitha Pallapothula (University of Texas at Dallas); AKSHAY VYAS (University of Texas at Dallas); Qifan Zhang (The University of Texas at Dallas); Jikai Wang (University of Texas at Dallas); Vasundhara Komaragiri (UT Dallas); Eric Ragan (University of Florida); Nicholas Ruozzi (UT Dallas); Yu Xiang (The University of Texas at Dallas); Vibhav Gogate (UT Dallas) pdf  

Posters without manuscript

Paper Title Authors
Regularizing Neural Networks with Meta-Learning Generative Models Shin’ya Yamaguchi (NTT / Kyoto University)*; Daiki Chijiwa (NTT); Sekitoshi Kanai (NTT); Atsutoshi Kumagai (NTT Computer and Data Science Laboratories); Hisashi Kashima (Kyoto University)
To Aggregate or Not? Learning with Separate Noisy Labels Jiaheng Wei (UCSC)*; Zhaowei Zhu (University of California, Santa Cruz); Tianyi Luo (Amazon); Ehsan Amid (Google Brain); Abhishek Kumar (Google Brain); Yang Liu (UC Santa Cruz)
Early Experiments in Scalable Dataset Selection for Self-Supervised Learning in Geospatial Imagery Models Muhammed T Razzak (University of Oxford)*; Anthony Ortiz (Microsoft); Caleb Robinson (Microsoft AI for Good Research Lab)
Unitail: A Benchmark for Detecting, Reading, and Matching in Retail Scene Fangyi Chen (Carnegie Mellon University)*; Han Zhang (CMU); Hao Chen (Carnegie Mellon University); Kai Hu (Carnegie Mellon University); Jiachen Dou (Carnegie Mellon University); zaiwang li (pitt); Chenchen Zhu (Meta); Marios Savvides (Carnegie Mellon University)
CD-GraB: Coordinating Distributed Example Orders for Provably Accelerated Training A. Feder Cooper (Cornell University)*; Wentao Guo (Cornell University); Duc Khiem Pham (Cornell University); Tiancheng Yuan (Cornell University); Charlie F Ruan (Cornell University); Yucheng Lu (Cornell University); Christopher De Sa (Cornell University)
How to Cope with Gradual Data Drift? Rasool Fakoor (AWS)*; Jonas Mueller (Cleanlab); Zachary Lipton (Carnegie Mellon University); Pratik A Chaudhari (University of Pennsylvania); Alex J Smola (Amazon)
Programmable Synthetic Tabular Data Generation Mark Vero (ETH Zurich)*; Mislav Balunovic (ETH Zurich); Martin Vechev (ETH Zurich)
Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning Jinyi Liu (Tianjin University)*; Yi Ma (Tianjin University); Jianye Hao (Tianjin University); Yujing Hu (NetEase Fuxi AI Lab); Yan Zheng (Tianjin University); Tangjie Lv (NetEase Fuxi AI Lab); Changjie Fan (NetEase Fuxi AI Lab)
Probing Heterogeneous Pretraining Datasets with Small Curated Datasets Gregory Yauney (Cornell University)*; Emily Reif (Google); David Mimno (Cornell University)
Participatory Personalization in Classification Hailey Joren (UC San Diego)*; Chirag Nagpal (Carnegie Mellon University); Katherine Heller (Google); Berk Ustun (UCSD)
Fair Machine Unlearning: Data Removal while Mitigating Disparities Alex Oesterling (Harvard University)*; Jiaqi Ma (University of Illinois Urbana-Champaign); Flavio Calmon (Harvard University); Himabindu Lakkaraju (Harvard)
Evaluating the Evaluators: Are Current Few-Shot Learning Benchmarks Fit for Purpose? Luísa B Shimabucoro (Universidade de São Paulo); Timothy Hospedales (Edinburgh University); Henry Gouk (University of Edinburgh)*
TMARS: Improving Visual Representations by Circumventing Text Feature Learning Pratyush Maini (IIT Delhi); Sachin Goyal (Carnegie Mellon University)*; Zachary Lipton (Carnegie Mellon University); Zico Kolter (Carnegie Mellon University); Aditi Raghunathan (Carnegie Mellon University)
Do Machine Learning Models Learn Statistical Rules Inferred from Data? Aaditya Naik (University of Pennsylvania)*; Yinjun Wu (University of Pennsylvania); Mayur Naik (University of Pennsylvania); Eric Wong (University of Pennsylvania)
Predicting Article Time Periods with Text2Time: A Transformer-based Approach KARTHICK GUNASEKARAN (KARTHICK GUNASEKARAN)*
Making Scalable Meta Learning Practical Sang Keun Choe (Carnegie Mellon University)*; Sanket Vaibhav Mehta (Carnegie Mellon University); Hwijeen Ahn (Carnegie Mellon University); Willie Neiswanger (Stanford University); Pengtao Xie (UC San Diego); Emma Strubell (Carnegie Mellon University); Eric Xing (MBZUAI, CMU, and Petuum Inc.)
Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning Guozheng Ma (Tsinghua University)*; Linrui Zhang (Tsinghua University); Haoyu Wang (Tsinghua University); Lu Li (Tsinghua University); Zilin Wang (Tsinghua University); Zhen Wang (The University of Sydney); Li Shen (JD Explore Academy); Xueqian Wang (Tsinghua University); Dacheng Tao (The University of Sydney)
Enhancing Time Series Forecasting Models under Concept Drift by Data-centric Online Ensembling Yi-Fan Zhang (NLPR, China)*; Qingsong Wen (Alibaba DAMO Academy); Xue Wang (Alibaba DAMO Academy); Weiqi Chen (Alibaba Group); Liang Sun (Alibaba Group); Zhang Zhang (Institute of Automation, Chinese Academy of Sciences); Liang Wang (NLPR, China); Rong Jin (Twitter); Tieniu Tan (NLPR, China)
Data Banzhaf: A Robust Data Valuation Framework for Machine Learning Jiachen T. Wang (Princeton University)*; Ruoxi Jia (Virginia Tech)
A Privacy-Friendly Approach to Data Valuation Jiachen T. Wang (Princeton University)*; Yuqing Zhu (UC Santa Barbara); Yu-Xiang Wang (UC Santa Barbara); Ruoxi Jia (Virginia Tech); Prateek Mittal (Princeton University)
On Robustness-Accuracy Characterization of Large Language Models using Synthetic Datasets Ching-Yun Ko (MIT)*; Pin-Yu Chen (IBM Research); Payel Das (IBM Research); Yung-Sung Chuang (MIT); Luca Daniel (Massachusetts Institute of Technology)
On Memorization and Privacy risks of Sharpness Aware Minimization Young In Kim (Purdue University)*; Pratiksha Agrawal (Purdue University); Johannes Royset (Naval Postgraduate School); Rajiv Khanna (Purdue University)
Does Progress On Object Recognition Benchmarks Improve Real-World Generalization? Megan Richards (Meta)*; Diane Bouchacourt (Meta); Mark Ibrahim (Meta); Polina Kirichenko (New York University)
On the Reproducibility of Data Valuation under Learning Stochasticity Jiachen T. Wang (Princeton University)*; Feiyang Kang (Virginia Tech); Chiyuan Zhang (MIT); Ruoxi Jia (Virginia Tech); Prateek Mittal (Princeton University)
Why Do Self-Supervised Models Transfer? On Data Augmentation and Feature Properties Linus Ericsson (University of Edinburgh)*; Henry Gouk (University of Edinburgh); Timothy Hospedales (Edinburgh University)
On Data Quality and Speed of Training: Bad Data Slows Training Newsha Ardalani (Meta AI (FAIR))*; Mostafa Elhoushi (Meta); Carole-Jean Wu (Meta AI)
Suboptimal Data Can Bottleneck Scaling Jacob Buckman (Mila)*; Kshitij Gupta (Mila); Ethan Caballero (Mila); Rishabh Agarwal (Google Research, Brain Team); Marc G. Bellemare (Google Brain)
Prediction without Preclusion: Recourse Verification with Reachable Sets Avni Kothari (UC San Diego)*; Berk Ustun (UCSD); Lily Weng (UCSD); Bogdan Kulynych (EPFL)
Birds of an Odd Feather: Guaranteed Out-of-Distribution (OOD) Novel Category Detection Yoav Wald (Johns Hopkins)*; Suchi Saria (Johns Hopkins University)
Mobile Internet Quality Estimation using Self-Tuning Kernel Regression Hanyang Jiang (Georgia Institute of Technology)*; Yao Xie (Georgia Tech); Ellen Zegura (Georgia Tech); Elizabeth Belding (University of California, Santa Barbara); Shaowu Yuchi (Georgia Institute of Technology)
Decoupled Graph Label Denoising for Robust Semi-Supervised Node Classification Kaize Ding (Arizona State University)*; Yancheng Wang (Arizona State University); Huan Liu (Arizona State University)
Can Expert Demonstration Guarantee Offline Performance in Sparse Reward Environment? Jeyeon Eo (Soongsil University)*; Dongsu Lee (Soongsil University); Minhae Kwon (Soongsil University)
Improving multimodal datasets with image captioning Thao T Nguyen (University of Washington)*; Samir Gadre (Columbia University); Gabriel Ilharco (University of Washington); Sewoong Oh (University of Washington); Ludwig Schmidt (University of Washington)
Identifying Implicit Social Biases in Vision-Language Models Kimia Hamidieh (University of Toronto, Vector Institute)*; Haoran Zhang (MIT); Thomas Hartvigsen (MIT); Marzyeh Ghassemi (University of Toronto, Vector Institute)
SemDeDup: Data-efficient learning at web-scale through semantic deduplication Amro Abbas (Meta)*; Daniel Simig (Meta AI); Surya Ganguli (Stanford University); Ari S Morcos (Facebook AI Research (FAIR)); Kushal Tirumala (FAIR)
PhysicsCAP: Dynamic Captions For Natural Scene Changes Hidetomo Sakaino (Weathernews Inc.)*