Research papers in Data Centric AI
The most promising research in the field of Data Centric AI, curated by the Mindkosh team.
TITLE | AUTHORS | PUBLISHED |
---|---|---|
Towards Efficient Data Valuation Based on the Shapley Value | Ruoxi Jia∗, David Dao∗, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gurel, Bo Li, Ce Zhang, Dawn Song, Costas Spanos | 2019 |
Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations | Andrew Ross, Michael C. Hughes, Finale Doshi-Velez | 2017 |
Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks | Curtis G. Northcutt, Anish Athalye, Jonas Mueller | 2021 |
Tabular Engineering with Automunge | Teague, Nicholas* | 2021 |
Automatic Knowledge Augmentation for Generative Commonsense Reasoning | Seo, Jaehyung*; Park, Chanjun; Eo, Sugyeong; Moon, Hyeonseok; Lim, Heuiseok | 2021 |
A Hybrid Bayesian Model to Analyse Healthcare Data | Pourshahrokhi, Narges*; Kouchaki, Samaneh; Kober, Kord; Miaskowski, Christine; Barnaghi, Payam | 2021 |
Picket: Guarding Against Corrupted Data in Tabular Data during Learning and Inference | Zifan Liu, Zhechun Zhou, Theodoros Rekatsinas | 2021 |
How should human translation coexist with NMT? Efficient tool for building high quality parallel corpus | Park, Chanjun*; Lee, Seolhwa; Moon, Hyeonseok; Eo, Sugyeong; Seo, Jaehyung; Lim, Heuiseok | 2021 |
AirSAS: Controlled Dataset Generation for Physics-Informed Machine Learning | Cowen, Benjamin*; Park, J. Daniel; Blanford, Thomas E.; Goehle, Geoff; Brown, Daniel C. | 2021 |
Decreasing Annotation Burden of Pairwise Comparisons with Human-in-the-Loop Sorting: Application in Medical Image Artifact Rating | Jang, Ikbeom*; Danley, Garrison; Chang, Ken; Kalpathy-Cramer, Jayashree | 2021 |
Sampling To Improve Predictions For Underrepresented Observations In Imbalanced Data | Kjærsgaard, Rune D.*; Grønberg, Manja; Clemmensen, Line | 2021 |
DiagnosisQA: A semi-automated pipeline for developing clinician validated diagnosis specific QA datasets | Mishra, Shreya; Awasthi, Raghav; Papay, Frankie; Maheshwari, Kamal; Cywinski, Jacek; Khanna, Ashish; Mathur, Piyush* | 2021 |
A New Tool for Efficiently Generating Quality Estimation Datasets | Eo, Sugyeong; Park, Chanjun*; Seo, Jaehyung; Moon, Hyeonseok; Lim, Heuiseok | 2021 |
A First Look Towards One-Shot Object Detection with SPOT for Data-Efficient Learning | Chakraborty, Ria*; Popli, Madhur; Lamba, Rachit; Verma, Rishi | 2021 |
YMIR: A Rapid Data-centric Development Platform for Vision Applications | Huang, Phoenix X.; Hu, Wenze*; Brendel, William; Chandraker, Manmohan; Li, Li-Jia; Wang, Xiaoyu | 2021 |
PyHard: a novel tool for generating hardness embeddings to support data-centric analysis | Paiva, Pedro Yuri Arbs*; Smith-Miles, Kate; Valeriano, Maria; Lorena, Ana | 2021 |
Annotation Quality Framework - Accuracy, Credibility, and Consistency | Lavitas, Liliya*; Lee, Allen; Redfield, Olivia; Fletcher, Daniel; Eck, Matthias; Janardhanan, Sunil | 2021 |
Ontolabeling: Re-Thinking Data Labeling For Computer Vision | Croce, Nicola*; Nieto, Marcos | 2021 |
A Data-Centric Approach for Training Deep Neural Networks with Less Data | Motamedi, Mohammad*; Sakharnykh, Nikolay; Kaldewey, Tim | 2021 |
Finding Label Errors in Autonomous Vehicle Data With Learned Observation Assertions | Kang, Daniel*; Arechiga, Nikos; Pillai, Sudeep; Bailis, Peter D; Zaharia, Matei | 2021 |
Single-Click 3D Object Annotation on LiDAR Point Clouds | Trung Nguyen; Binh-Son Hua; Duc Thanh Nguyen; Dinh Phung | 2021 |
A Data-Centric Image Classification Benchmark | Schmarje, Lars*; Liao, Yuan-Hong; Koch, Reinhard | 2021 |
Effect of Radiology Report Labeler Quality on Deep Learning Models for Chest X-Ray Interpretation | Jain, Saahil*; Smit, Akshay; Ng, Andrew; Rajpurkar, Pranav | 2021 |
Automatic Data Quality Evaluation for Text Classification | Li, Jiazheng* | 2021 |