Research Papers in Data-Centric AI


ID	TITLE	AUTHORS	PUBLISHED
28	Towards a Shared Rubric for Dataset Annotation	Andrew Marc Greene	2021
27	Data Augmentation for Intent Classification	Derek Chen, Claire Yin	2022
26	Fix your Models by Fixing your Datasets	Atindriyo Sanyal, Vikram Chatterji, Nidhi Vyas, Ben Epstein, Nikita Demir, Anthony Corletti	2021
25	Can machines learn to see without visual databases?	Alessandro Betti, Marco Gori, Stefano Melacci, Marcello Pelillo, Fabio Roli	2021
24	Towards Efficient Data Valuation Based on the Shapley Value	Ruoxi Jia*, David Dao*, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gurel, Bo Li, Ce Zhang, Dawn Song, Costas Spanos	2019
23	Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations	Andrew Ross, Michael C. Hughes, Finale Doshi-Velez	2017
22	Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks	Curtis G. Northcutt, Anish Athalye, Jonas Mueller	2021
21	Tabular Engineering with Automunge	Nicholas Teague*	2021
20	Automatic Knowledge Augmentation for Generative Commonsense Reasoning	Jaehyung Seo*, Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim	2021
19	A Hybrid Bayesian Model to Analyse Healthcare Data	Narges Pourshahrokhi*, Samaneh Kouchaki, Kord Kober, Christine Miaskowski, Payam Barnaghi	2021
18	Picket: Guarding Against Corrupted Data in Tabular Data during Learning and Inference	Zifan Liu, Zhechun Zhou, Theodoros Rekatsinas	2021
17	How should human translation coexist with NMT? Efficient tool for building high quality parallel corpus	Chanjun Park*, Seolhwa Lee, Hyeonseok Moon, Sugyeong Eo, Jaehyung Seo, Heuiseok Lim	2021
16	AirSAS: Controlled Dataset Generation for Physics-Informed Machine Learning	Benjamin Cowen*, J. Daniel Park, Thomas E. Blanford, Geoff Goehle, Daniel C. Brown	2021
15	Decreasing Annotation Burden of Pairwise Comparisons with Human-in-the-Loop Sorting: Application in Medical Image Artifact Rating	Ikbeom Jang*, Garrison Danley, Ken Chang, Jayashree Kalpathy-Cramer	2021
14	Sampling To Improve Predictions For Underrepresented Observations In Imbalanced Data	Rune D. Kjærsgaard*, Manja Grønberg, Line Clemmensen	2021
13	DiagnosisQA: A semi-automated pipeline for developing clinician validated diagnosis specific QA datasets	Shreya Mishra, Raghav Awasthi, Frankie Papay, Kamal Maheshwari, Jacek Cywinski, Ashish Khanna, Piyush Mathur*	2021
12	A New Tool for Efficiently Generating Quality Estimation Datasets	Sugyeong Eo, Chanjun Park*, Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim	2021
11	A First Look Towards One-Shot Object Detection with SPOT for Data-Efficient Learning	Ria Chakraborty*, Madhur Popli, Rachit Lamba, Rishi Verma	2021
10	YMIR: A Rapid Data-centric Development Platform for Vision Applications	Phoenix X. Huang, Wenze Hu*, William Brendel, Manmohan Chandraker, Li-Jia Li, Xiaoyu Wang	2021
9	PyHard: a novel tool for generating hardness embeddings to support data-centric analysis	Pedro Yuri Arbs Paiva*, Kate Smith-Miles, Maria Valeriano, Ana Lorena	2021
8	Annotation Quality Framework - Accuracy, Credibility, and Consistency	Liliya Lavitas*, Allen Lee, Olivia Redfield, Daniel Fletcher, Matthias Eck, Sunil Janardhanan	2021
7	Ontolabeling: Re-Thinking Data Labeling For Computer Vision	Nicola Croce*, Marcos Nieto	2021
6	A Data-Centric Approach for Training Deep Neural Networks with Less Data	Mohammad Motamedi*, Nikolay Sakharnykh, Tim Kaldewey	2021
5	Finding Label Errors in Autonomous Vehicle Data With Learned Observation Assertions	Daniel Kang*, Nikos Arechiga, Sudeep Pillai, Peter D. Bailis, Matei Zaharia	2021
4	Single-Click 3D Object Annotation on LIDAR Point Clouds	Trung Nguyen, Binh-Son Hua, Duc Thanh Nguyen, Dinh Phung	2021
3	A Data-Centric Image Classification Benchmark	Lars Schmarje*, Yuan-Hong Liao, Reinhard Koch	2021
2	Effect of Radiology Report Labeler Quality on Deep Learning Models for Chest X-Ray Interpretation	Saahil Jain*, Akshay Smit, Andrew Ng, Pranav Rajpurkar	2021
1	Automatic Data Quality Evaluation for Text Classification	Jiazheng Li*	2021

Explore our list of open-source tools for data-centric AI.
