HOWTO: Work with Small Datasets

(last updated on January 12, 2020; not complete yet)

 

In general:

  1. 7 Effective Ways to Deal with a Small Dataset [link]
  2. Dealing with very small datasets [link]
  3. What to do with "small" data? [link]

 

 

For Images:

  1. Breaking the curse of small datasets in Machine Learning: Part 1 [link]
  2. Breaking the curse of small data sets in Machine Learning: Part 2 [link]
  3. You can probably use deep learning even if your data isn't that big [link]
  4. Applying deep learning to real-world problems [link]

HOWTO: Data Augmentation

(last updated on January 12, 2020; not complete yet)

 

Data Augmentation:

  1. Research Guide: Data Augmentation for Deep Learning, [Nearly] Everything you need to know in 2019 [link], keywords: Random Erasing Data Augmentation (2017), AutoAugment: Learning Augmentation Strategies from Data (CVPR 2019), Fast AutoAugment (2019), Learning Data Augmentation Strategies for Object Detection (2019), SpecAugment: for Automatic Speech Recognition (Interspeech 2019), EDA: for Boosting Performance on Text Classification Tasks (EMNLP-IJCNLP 2019), Unsupervised Data Augmentation for Consistency Training (2019)
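  2. Data augmentation on entire dataset before splitting [link], conclusion: this practice is incorrect, because augmented copies of one original sample can end up in both the training set and the test set, which inflates the test score. Split first, then augment only the training set.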
  3. How does data augmentation reduce overfitting? [link]
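
A minimal sketch of the "split first, then augment" order from item 2, using only the Python standard library. The `augment` function (small Gaussian noise on each feature), the toy data, and all parameter values are hypothetical placeholders, not taken from the linked discussion.

```python
import random


def augment(sample, rng):
    """Hypothetical augmentation: add small Gaussian noise to each feature."""
    features, label = sample
    return ([x + rng.gauss(0.0, 0.01) for x in features], label)


def split_then_augment(samples, test_fraction=0.2, copies=2, seed=0):
    """Hold out the test set first, then augment only the training samples."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test = shuffled[:n_test]      # untouched originals only
    train = shuffled[n_test:]
    # Augmented copies are derived from training samples only,
    # so no variant of a test sample can leak into training.
    augmented = [augment(s, rng) for s in train for _ in range(copies)]
    return train + augmented, test


# Toy dataset: (features, label) pairs.
data = [([float(i), float(i) * 2.0], i % 2) for i in range(10)]
train_set, test_set = split_then_augment(data)
```

The test set contains only original samples, so the score measured on it is not affected by the augmentation step.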

 

Data Augmentation for Regression Tasks:

Online articles that mention data augmentation (DA) for regression tasks:

  1. Shehroz Khan's answer to What does the term data augmentation mean in the context of machine learning? [link]
  2. What you need to know about data augmentation for machine learning [link]
  3. Data augmentation techniques for general datasets? [link] (Teng: To me, it seems they were discussing feature engineering instead of adding more data points.)
  4. Data Augmentation Techniques for Cat/Binary/Continuous Numerical Dataset [link], keywords: SMOTE

 

 

Data Augmentation for Unbalanced Dataset in Classification Tasks:

  1. Oversampling and undersampling in data analysis [link]
  2. imbalanced-learn [GitHub] [docs]
  3. A collection of 85 minority oversampling techniques (SMOTE) for imbalanced learning with multi-class oversampling and model selection features [docs] [GitHub]
  4. A Deep Dive Into Imbalanced Data: Over-Sampling [link]
  5. SMOTE for high-dimensional class-imbalanced data, Rok Blagus and Lara Lusa, 2013 [link]
  6. SMOTE explained for noobs - Synthetic Minority Over-sampling TEchnique line by line [link]
  7. Detecting representative data and generating synthetic samples to improve learning accuracy with imbalanced data sets [link]
  8. ADASYN: Adaptive Synthetic Sampling Method for Imbalanced Data [link]
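
To make the idea behind SMOTE concrete, here is a rough standard-library sketch of its core step: a synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote_like`, the toy points, and the parameter values are assumptions for illustration. For real work, the imbalanced-learn package in item 2 is the better choice.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class with two features per sample.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_like(minority, n_new=6)
```

Because every synthetic point lies between two existing minority points, the new samples stay inside the region the minority class already occupies.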

 

Data Augmentation for Images:

  1. Data Augmentation for Deep Learning [link], keywords: image augmentation packages, PyTorch framework
  2. 1000x Faster Data Augmentation [link], keywords: learn augmentation policies, Population Based Augmentation, Tune Framework
  3. A survey on Image Data Augmentation for Deep Learning, Connor Shorten and Taghi M. Khoshgoftaar [link]
  4. Python | Data Augmentation [link]
  5. How to Configure Image Data Augmentation in Keras [link]
  6. Data Augmentation | How to use Deep Learning when you have Limited Data -- Part 2 [link], keywords: online augmentation, offline augmentation
  7. Data augmentation for improving deep learning in image classification problem, Mikolajczyk et al. [link]
  8. The Effectiveness of Data Augmentation in Image Classification using Deep Learning, Jason Wang and Luis Perez [link]
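
The offline/online distinction mentioned in item 6 can be sketched as follows. This is only an illustration under simple assumptions: images are nested lists of pixel values, the only transform is a horizontal flip, and the names `offline_augment` and `online_batches` are made up here. The packages listed above offer far richer transforms.

```python
import random


def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def offline_augment(images):
    """Offline: build the enlarged dataset once, before training starts."""
    return images + [horizontal_flip(img) for img in images]


def online_batches(images, batch_size, epochs, seed=0):
    """Online: apply a random transform each time a batch is drawn,
    so the stored dataset keeps its original size."""
    rng = random.Random(seed)
    for _ in range(epochs):
        order = images[:]
        rng.shuffle(order)
        for i in range(0, len(order), batch_size):
            batch = order[i:i + batch_size]
            yield [horizontal_flip(img) if rng.random() < 0.5 else img
                   for img in batch]


# Two tiny 2x3 "images".
images = [[[0, 1, 2], [3, 4, 5]], [[9, 8, 7], [6, 5, 4]]]
offline = offline_augment(images)
batches = list(online_batches(images, batch_size=2, epochs=3))
```

Offline augmentation costs storage but is computed once. Online augmentation costs compute on every batch but can show the model a different variant in each epoch.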

 

Data Augmentation for Audio:

to be added ...

 

Data Augmentation for Texts:

  1. These are the Easiest Data Augmentation Techniques in Natural Language Processing you can think of -- and they work. Simple text editing techniques can make huge performance gains for small datasets. [link]
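
As a rough illustration of the "simple text editing" idea, the sketch below implements random swap and random deletion, two of the four operations in the EDA paper listed under the keywords above. The other two (synonym replacement and random insertion) need a synonym lexicon and are left out. Function names, the example sentence, and the parameter values are assumptions.

```python
import random


def random_swap(words, n_swaps, rng):
    """Swap two randomly chosen word positions, n_swaps times."""
    words = words[:]
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Drop each word with probability p, but always keep at least one."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


rng = random.Random(0)
sentence = "simple text editing can help small datasets".split()
swapped = random_swap(sentence, n_swaps=1, rng=rng)
deleted = random_deletion(sentence, p=0.2, rng=rng)
```

Each call yields a slightly different variant of the sentence that keeps the original label, which is how a small text classification dataset gets enlarged.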

 

Data Augmentation for Time Series:

  1. Data Augmentation strategies for Time Series Forecasting [link]
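
For orientation, here is a sketch of two generic time series augmentations, jittering (adding noise) and window slicing. These are common techniques chosen for illustration and are not necessarily the strategies covered in the linked article. The function names and parameter values are assumptions.

```python
import random


def jitter(series, sigma, rng):
    """Add independent Gaussian noise to every time step."""
    return [x + rng.gauss(0.0, sigma) for x in series]


def window_slices(series, window, step=1):
    """Cut overlapping fixed-length windows out of one long series."""
    return [series[i:i + window]
            for i in range(0, len(series) - window + 1, step)]


rng = random.Random(0)
series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noisy = jitter(series, sigma=0.05, rng=rng)
windows = window_slices(series, window=4, step=1)
```

For forecasting, any such transform should be applied only to the training portion of the series, and the temporal order inside each window must be kept.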