Research papers in Data-Centric AI
The most promising research in the field of Data-Centric AI, curated by the Mindkosh team.
ID | TITLE | AUTHORS | PUBLISHED |
---|---|---|---|
28 | Towards a Shared Rubric for Dataset Annotation | Andrew Marc Greene | 2021 |
27 | Data Augmentation for Intent Classification | Derek Chen, Claire Yin | 2022 |
26 | Fix your Models by Fixing your Datasets | Atindriyo Sanyal, Vikram Chatterji, Nidhi Vyas, Ben Epstein, Nikita Demir, Anthony Corletti | 2021 |
25 | Can machines learn to see without visual databases? | Alessandro Betti, Marco Gori, Stefano Melacci, Marcello Pelillo, Fabio Roli | 2021 |
24 | Towards Efficient Data Valuation Based on the Shapley Value | Ruoxi Jia∗, David Dao∗, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gurel, Bo Li, Ce Zhang, Dawn Song, Costas Spanos | 2019 |
23 | Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations | Andrew Ross, Michael C. Hughes, Finale Doshi-Velez | 2017 |
22 | Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks | Curtis G. Northcutt, Anish Athalye, Jonas Mueller | 2021 |
21 | Tabular Engineering with Automunge | Teague, Nicholas* | 2021 |
20 | Automatic Knowledge Augmentation for Generative Commonsense Reasoning | Seo, Jaehyung*; Park, Chanjun; Eo, Sugyeong; Moon, Hyeonseok; Lim, Heuiseok | 2021 |
19 | A Hybrid Bayesian Model to Analyse Healthcare Data | Pourshahrokhi, Narges*; Kouchaki, Samaneh; Kober, Kord; Miaskowski, Christine; Barnaghi, Payam | 2021 |
18 | Picket: Guarding Against Corrupted Data in Tabular Data during Learning and Inference | Zifan Liu, Zhechun Zhou, Theodoros Rekatsinas | 2021 |
17 | How should human translation coexist with NMT? Efficient tool for building high quality parallel corpus | Park, Chanjun*; Lee, Seolhwa; Moon, Hyeonseok; Eo, Sugyeong; Seo, Jaehyung; Lim, Heuiseok | 2021 |
16 | AirSAS: Controlled Dataset Generation for Physics-Informed Machine Learning | Cowen, Benjamin*; Park, J. Daniel; Blanford, Thomas E.; Goehle, Geoff; Brown, Daniel C. | 2021 |
15 | Decreasing Annotation Burden of Pairwise Comparisons with Human-in-the-Loop Sorting: Application in Medical Image Artifact Rating | Jang, Ikbeom*; Danley, Garrison; Chang, Ken; Kalpathy-Cramer, Jayashree | 2021 |
14 | Sampling To Improve Predictions For Underrepresented Observations In Imbalanced Data | Kjærsgaard, Rune D.*; Grønberg, Manja; Clemmensen, Line | 2021 |
13 | DiagnosisQA: A semi-automated pipeline for developing clinician validated diagnosis specific QA datasets | Mishra, Shreya; Awasthi, Raghav; Papay, Frankie; Maheshwari, Kamal; Cywinski, Jacek; Khanna, Ashish; Mathur, Piyush* | 2021 |
12 | A New Tool for Efficiently Generating Quality Estimation Datasets | Eo, Sugyeong; Park, Chanjun*; Seo, Jaehyung; Moon, Hyeonseok; Lim, Heuiseok | 2021 |
11 | A First Look Towards One-Shot Object Detection with SPOT for Data-Efficient Learning | Chakraborty, Ria*; Popli, Madhur; Lamba, Rachit; Verma, Rishi | 2021 |
10 | YMIR: A Rapid Data-centric Development Platform for Vision Applications | Huang, Phoenix X.; Hu, Wenze*; Brendel, William; Chandraker, Manmohan; Li, Li-Jia; Wang, Xiaoyu | 2021 |
9 | PyHard: a novel tool for generating hardness embeddings to support data-centric analysis | Paiva, Pedro Yuri Arbs*; Smith-Miles, Kate; Valeriano, Maria; Lorena, Ana | 2021 |
8 | Annotation Quality Framework - Accuracy, Credibility, and Consistency | Lavitas, Liliya*; Lee, Allen; Redfield, Olivia; Fletcher, Daniel; Eck, Matthias; Janardhanan, Sunil | 2021 |
7 | Ontolabeling: Re-Thinking Data Labeling For Computer Vision | Croce, Nicola*; Nieto, Marcos | 2021 |
6 | A Data-Centric Approach for Training Deep Neural Networks with Less Data | Motamedi, Mohammad*; Sakharnykh, Nikolay; Kaldewey, Tim | 2021 |
5 | Finding Label Errors in Autonomous Vehicle Data With Learned Observation Assertions | Kang, Daniel*; Arechiga, Nikos; Pillai, Sudeep; Bailis, Peter D; Zaharia, Matei | 2021 |
4 | Single-Click 3D Object Annotation on LIDAR Point Clouds | Trung Nguyen; Binh-Son Hua; Duc Thanh Nguyen; Dinh Phung | 2021 |
3 | A Data-Centric Image Classification Benchmark | Schmarje, Lars*; Liao, Yuan-Hong; Koch, Reinhard | 2021 |
2 | Effect of Radiology Report Labeler Quality on Deep Learning Models for Chest X-Ray Interpretation | Jain, Saahil*; Smit, Akshay; Ng, Andrew; Rajpurkar, Pranav | 2021 |
1 | Automatic Data Quality Evaluation for Text Classification | Li, Jiazheng* | 2021 |