28 Towards a Shared Rubric for Dataset Annotation Andrew Marc Greene 2021
27 Data Augmentation for Intent Classification Derek Chen, Claire Yin 2022
26 Fix your Models by Fixing your Datasets Atindriyo Sanyal, Vikram Chatterji, Nidhi Vyas, Ben Epstein, Nikita Demir, Anthony Corletti 2021
25 Can machines learn to see without visual databases? Alessandro Betti, Marco Gori, Stefano Melacci, Marcello Pelillo, Fabio Roli 2021
24 Towards Efficient Data Valuation Based on the Shapley Value Ruoxi Jia∗, David Dao∗, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gurel, Bo Li, Ce Zhang, Dawn Song, Costas Spanos 2019
23 Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations Andrew Ross, Michael C. Hughes, and Finale Doshi-Velez 2017
22 Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks Curtis G. Northcutt, Anish Athalye, Jonas Mueller 2021
21 Tabular Engineering with Automunge Teague, Nicholas* 2021
20 Automatic Knowledge Augmentation for Generative Commonsense Reasoning Seo, Jaehyung*; Park, Chanjun; Eo, Sugyeong; Moon, Hyeonseok; Lim, Heuiseok 2021
19 A Hybrid Bayesian Model to Analyse Healthcare Data Pourshahrokhi, Narges*; Kouchaki, Samaneh; Kober, Kord; Miaskowski, Christine; Barnaghi, Payam 2021
18 Picket: Guarding Against Corrupted Data in Tabular Data during Learning and Inference Zifan Liu, Zhechun Zhou, Theodoros Rekatsinas 2021
17 How should human translation coexist with NMT? Efficient tool for building high quality parallel corpus Park, Chanjun*; Lee, Seolhwa; Moon, Hyeonseok; Eo, Sugyeong; Seo, Jaehyung; Lim, Heuiseok 2021
16 AirSAS: Controlled Dataset Generation for Physics-Informed Machine Learning Cowen, Benjamin*; Park, J. Daniel; Blanford, Thomas E.; Goehle, Geoff; Brown, Daniel C. 2021
15 Decreasing Annotation Burden of Pairwise Comparisons with Human-in-the-Loop Sorting: Application in Medical Image Artifact Rating Jang, Ikbeom*; Danley, Garrison; Chang, Ken; Kalpathy-Cramer, Jayashree 2021
14 Sampling To Improve Predictions For Underrepresented Observations In Imbalanced Data Kjærsgaard, Rune D.*; Grønberg, Manja; Clemmensen, Line 2021
13 DiagnosisQA: A semi-automated pipeline for developing clinician validated diagnosis specific QA datasets Mishra, Shreya; Awasthi, Raghav; Papay, Frankie; Maheshwari, Kamal; Cywinski, Jacek; Khanna, Ashish; Mathur, Piyush* 2021
12 A New Tool for Efficiently Generating Quality Estimation Datasets Eo, Sugyeong; Park, Chanjun*; Seo, Jaehyung; Moon, Hyeonseok; Lim, Heuiseok 2021
11 A First Look Towards One-Shot Object Detection with SPOT for Data-Efficient Learning Chakraborty, Ria*; Popli, Madhur; Lamba, Rachit; Verma, Rishi 2021
10 YMIR: A Rapid Data-centric Development Platform for Vision Applications Huang, Phoenix X.; Hu, Wenze*; Brendel, William; Chandraker, Manmohan; Li, Li-Jia; Wang, Xiaoyu 2021
9 PyHard: a novel tool for generating hardness embeddings to support data-centric analysis Paiva, Pedro Yuri Arbs*; Smith-Miles, Kate; Valeriano, Maria; Lorena, Ana 2021
8 Annotation Quality Framework - Accuracy, Credibility, and Consistency Lavitas, Liliya*; Lee, Allen; Redfield, Olivia; Fletcher, Daniel; Eck, Matthias; Janardhanan, Sunil 2021
7 Ontolabeling: Re-Thinking Data Labeling For Computer Vision Croce, Nicola*; Nieto, Marcos 2021
6 A Data-Centric Approach for Training Deep Neural Networks with Less Data Motamedi, Mohammad*; Sakharnykh, Nikolay; Kaldewey, Tim 2021
5 Finding Label Errors in Autonomous Vehicle Data With Learned Observation Assertions Kang, Daniel*; Arechiga, Nikos; Pillai, Sudeep; Bailis, Peter D; Zaharia, Matei 2021
4 Single-Click 3D Object Annotation on LiDAR Point Clouds Trung Nguyen; Binh-Son Hua; Duc Thanh Nguyen; Dinh Phung 2021
3 A Data-Centric Image Classification Benchmark Schmarje, Lars*; Liao, Yuan-Hong; Koch, Reinhard 2021
2 Effect of Radiology Report Labeler Quality on Deep Learning Models for Chest X-Ray Interpretation Jain, Saahil*; Smit, Akshay; Ng, Andrew; Rajpurkar, Pranav 2021
1 Automatic Data Quality Evaluation for Text Classification Li, Jiazheng* 2021