Towards Efficient Data Valuation Based on the Shapley Value Ruoxi Jia∗, David Dao∗, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gurel, Bo Li, Ce Zhang, Dawn Song, Costas Spanos 2019
Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations Andrew Ross, Michael C. Hughes, and Finale Doshi-Velez 2017
Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks Curtis G. Northcutt, Anish Athalye, Jonas Mueller 2021
Tabular Engineering with Automunge Teague, Nicholas* 2021
Automatic Knowledge Augmentation for Generative Commonsense Reasoning Seo, Jaehyung*; Park, Chanjun; Eo, Sugyeong; Moon, Hyeonseok; Lim, Heuiseok 2021
A Hybrid Bayesian Model to Analyse Healthcare Data Pourshahrokhi, Narges*; Kouchaki, Samaneh; Kober, Kord; Miaskowski, Christine; Barnaghi, Payam 2021
Picket: Guarding Against Corrupted Data in Tabular Data during Learning and Inference Zifan Liu, Zhechun Zhou, Theodoros Rekatsinas 2021
How should human translation coexist with NMT? Efficient tool for building high quality parallel corpus Park, Chanjun*; Lee, Seolhwa; Moon, Hyeonseok; Eo, Sugyeong; Seo, Jaehyung; Lim, Heuiseok 2021
AirSAS: Controlled Dataset Generation for Physics-Informed Machine Learning Cowen, Benjamin*; Park, J. Daniel; Blanford, Thomas E.; Goehle, Geoff; Brown, Daniel C. 2021
Decreasing Annotation Burden of Pairwise Comparisons with Human-in-the-Loop Sorting: Application in Medical Image Artifact Rating Jang, Ikbeom*; Danley, Garrison; Chang, Ken; Kalpathy-Cramer, Jayashree 2021
Sampling To Improve Predictions For Underrepresented Observations In Imbalanced Data Kjærsgaard, Rune D.*; Grønberg, Manja; Clemmensen, Line 2021
DiagnosisQA: A semi-automated pipeline for developing clinician validated diagnosis specific QA datasets Mishra, Shreya; Awasthi, Raghav; Papay, Frankie; Maheshwari, Kamal; Cywinski, Jacek; Khanna, Ashish; Mathur, Piyush* 2021
A New Tool for Efficiently Generating Quality Estimation Datasets Eo, Sugyeong; Park, Chanjun*; Seo, Jaehyung; Moon, Hyeonseok; Lim, Heuiseok 2021
A First Look Towards One-Shot Object Detection with SPOT for Data-Efficient Learning Chakraborty, Ria*; Popli, Madhur; Lamba, Rachit; Verma, Rishi 2021
YMIR: A Rapid Data-centric Development Platform for Vision Applications Huang, Phoenix X.; Hu, Wenze*; Brendel, William; Chandraker, Manmohan; Li, Li-Jia; Wang, Xiaoyu 2021
PyHard: a novel tool for generating hardness embeddings to support data-centric analysis Paiva, Pedro Yuri Arbs*; Smith-Miles, Kate; Valeriano, Maria; Lorena, Ana 2021
Annotation Quality Framework - Accuracy, Credibility, and Consistency Lavitas, Liliya*; Lee, Allen; Redfield, Olivia; Fletcher, Daniel; Eck, Matthias; Janardhanan, Sunil 2021
Ontolabeling: Re-Thinking Data Labeling For Computer Vision Croce, Nicola*; Nieto, Marcos 2021
A Data-Centric Approach for Training Deep Neural Networks with Less Data Motamedi, Mohammad*; Sakharnykh, Nikolay; Kaldewey, Tim 2021
Finding Label Errors in Autonomous Vehicle Data With Learned Observation Assertions Kang, Daniel*; Arechiga, Nikos; Pillai, Sudeep; Bailis, Peter D; Zaharia, Matei 2021
Single-Click 3D Object Annotation on LiDAR Point Clouds Trung Nguyen; Binh-Son Hua; Duc Thanh Nguyen; Dinh Phung 2021
A Data-Centric Image Classification Benchmark Schmarje, Lars*; Liao, Yuan-Hong; Koch, Reinhard 2021
Effect of Radiology Report Labeler Quality on Deep Learning Models for Chest X-Ray Interpretation Jain, Saahil*; Smit, Akshay; Ng, Andrew; Rajpurkar, Pranav 2021
Automatic Data Quality Evaluation for Text Classification Li, Jiazheng* 2021