Can we derive formulas from neural networks or decision trees?

(updated on June 10, 2020)

I was wondering whether we could derive numerical relationships between input variables and output variables from nonlinear model structures such as neural networks and decision trees. I found a Q&A on ResearchGate, and I like one of the answers.

One respondent wrote: "It is possible to obtain an equation after developing an ANN for prediction. in fact, you will end up with a long equation including the inputs, weights, biases, etc."
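To see what such a "long equation" looks like, here is a minimal sketch for a tiny network with two inputs, two tanh hidden units, and one linear output. The weights and biases (W1, b1, W2, b2) are made-up numbers for illustration, not values from a trained model, and the helper names forward and equation are my own. The sketch writes the network as a single expression in x1 and x2 and checks that evaluating the expression gives the same value as running the network layer by layer.

```python
import math

# Hypothetical weights for a 2-input, 2-hidden-unit, 1-output network.
W1 = [[0.5, -1.2], [0.8, 0.3]]  # W1[j][i]: weight from input i to hidden unit j
b1 = [0.1, -0.4]                # hidden-layer biases
W2 = [1.5, -0.7]                # weights from hidden units to the output
b2 = 0.2                        # output bias


def forward(x):
    """Evaluate the network layer by layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2


def equation():
    """Write the same computation as one closed-form expression in x1, x2."""
    terms = []
    for row, b, w_out in zip(W1, b1, W2):
        inner = " + ".join(f"({w})*x{i + 1}" for i, w in enumerate(row))
        terms.append(f"({w_out})*tanh({inner} + ({b}))")
    return " + ".join(terms) + f" + ({b2})"


formula = equation()
print("y =", formula)

# Check the formula against the network at one input point.
# eval is used here only to evaluate the string this script built itself.
x1, x2 = 0.3, -1.0
y_formula = eval(formula, {"tanh": math.tanh, "x1": x1, "x2": x2})
y_network = forward([x1, x2])
print(y_formula, y_network)
```

Even for this toy network the expression is already hard to read, and its length grows with every added unit and layer, which is why the equation exists but is rarely useful for interpretation.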

Multi-output Regression Models

(updated on June 10, 2020)

A multi-output regression task predicts multiple numerical properties for each sample (reference).

The article titled "Regression Models with multiple target variables" by Kiran Karkera (link) covers exactly what I am interested in. Here are the key points other than the modeling details.

  1. Terminology: multi-output regression or multi-target regression; related terms for classification tasks are multi-label classification, multi-class classification, and multioutput-multiclass classification (aka multi-task classification).
  2. Popular open source ML libraries have little support for the multi-output regression task.

These are the two papers that are mentioned in Kiran Karkera's article.

Here is an article, posted online on March 27, 2020, that discusses how to develop multi-output regression models with Python.
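As a concrete illustration of the task, here is a minimal sketch of the simplest baseline I know of: fit one independent single-output model per target. This is my own toy example, not code from the articles above. The data are made up, each model is a one-feature least-squares line, and the function names (fit_line, fit_multi_output, predict_multi_output) are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope  # (intercept, slope)


def fit_multi_output(xs, Y):
    """Fit one independent line per target column of Y."""
    n_targets = len(Y[0])
    return [fit_line(xs, [row[t] for row in Y]) for t in range(n_targets)]


def predict_multi_output(models, x):
    """Return one prediction per target for a single input x."""
    return [a + b * x for a, b in models]


# Toy data (made up): target 0 follows 2x + 1, target 1 follows -x + 4.
xs = [0.0, 1.0, 2.0, 3.0]
Y = [[1.0, 4.0], [3.0, 3.0], [5.0, 2.0], [7.0, 1.0]]

models = fit_multi_output(xs, Y)
prediction = predict_multi_output(models, 4.0)
print(prediction)
```

The per-target approach ignores any correlation between the targets, which is the main thing that dedicated multi-output methods try to exploit.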


2020 OR Talks

(updated on April 20, 2020)

Analytics for a Better World Webinars (ABW-W) [more information]

  1. Wednesday, April 29, 2020, 11:00 AM EST (17:00 CET), Speaker: Dimitris Bertsimas (MIT, Cambridge) | on a variety of aspects of COVID-19
  2. Wednesday, May 27, 2020, 17:00 CET, Speakers: Koen Peters (World Food Programme and Zero Hunger Lab, Tilburg University), Hein Fleuren (Zero Hunger Lab, Tilburg University) | They will highlight the work of Tilburg University (The Netherlands) at the United Nations World Food Programme (WFP) headquarters in Rome

Discrete Optimization Talks (DOTs) [homepage]

Example Videos for Data Analytics

This post is for my students in OPIM3103-008 Spring 2020. Compare these examples to the two for Management Information Systems.

The Math Behind Basketball's Wildest Moves | Rajiv Maheswaran | TED Talks

How data transformed the NBA | The Economist

This is an easter egg. Email me (1) the following quote; (2) your full name; and (3) the names of three types of Access database objects. You will get two bonus points if your answer to (3) is correct and one bonus point otherwise. This easter egg is active from 0:00 on March 30, 2020 to 11:59 PM on April 5, 2020. (Email me within this time period.)

Here is the quote.

Don't be pushed by your problems. Be led by your dreams. -- Ralph Waldo Emerson

HOWTO: Work with Small Datasets

(updated on January 12, 2020; not complete yet)


In general:

  1. 7 Effective Ways to Deal with a Small Dataset [link]
  2. Dealing with very small datasets [link]
  3. What to do with "small" data? [link]



For Images:

  1. Breaking the curse of small datasets in Machine Learning: Part 1 [link]
  2. Breaking the curse of small data sets in Machine Learning: Part 2 [link]
  3. You can probably use deep learning even if your data isn't that big [link]
  4. Applying deep learning to real-world problems [link]

HOWTO: Data Augmentation

[last updated on January 12, 2020; not complete yet]


Data Augmentation:

  1. Research Guide: Data Augmentation for Deep Learning, [Nearly] Everything you need to know in 2019 [link], keywords: Random Erasing Data Augmentation (2017), AutoAugment: Learning Augmentation Strategies from Data (CVPR 2019), Fast AutoAugment (2019), Learning Data Augmentation Strategies for Object Detection (2019), SpecAugment: for Automatic Speech Recognition (Interspeech 2019), EDA: for Boosting Performance on Text Classification Tasks (EMNLP-IJCNLP 2019), Unsupervised Data Augmentation for Consistency Training (2019)
  2. Data augmentation on entire dataset before splitting [link], conclusion: this practice is incorrect.
  3. How does data augmentation reduce overfitting? [link]
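The conclusion in item 2 can be sketched in a few lines: split first, then augment only the training portion, so that no augmented copy of a test sample leaks into training. This is a minimal sketch with made-up one-feature data and Gaussian noise as a stand-in augmentation. The function name split_then_augment and its parameters are my own.

```python
import random


def split_then_augment(samples, test_fraction=0.25, copies=2, noise=0.05, seed=0):
    """Split (x, y) pairs first, then add noisy copies of training pairs only."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)

    # Step 1: hold out the test set before any augmentation happens.
    n_test = int(len(shuffled) * test_fraction)
    test = shuffled[:n_test]
    train = shuffled[n_test:]

    # Step 2: augment the training set only; the test set stays untouched.
    augmented = []
    for x, y in train:
        for _ in range(copies):
            augmented.append((x + rng.gauss(0.0, noise), y))
    return train + augmented, test


# Toy data (made up): 20 samples with one feature and a binary label.
data = [(float(i), i % 2) for i in range(20)]
train_set, test_set = split_then_augment(data)
print(len(train_set), len(test_set))
```

If the order were reversed, noisy near-duplicates of the same original sample could land on both sides of the split, and the test score would look better than it should.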


Data Augmentation for Regression Tasks:

Online articles that mentioned DA for regression tasks:

  1. Shehroz Khan's answer to What does the term data augmentation mean in the context of machine learning? [link]
  2. What you need to know about data augmentation for machine learning [link]
  3. Data augmentation techniques for general datasets? [link] (Teng: To me, it seems they were discussing feature engineering instead of adding more data points.)
  4. Data Augmentation Techniques for Cat/Binary/Continuous Numerical Dataset [link], keywords: SMOTE



Data Augmentation for Unbalanced Dataset in Classification Tasks:

  1. Oversampling and undersampling in data analysis [link]
  2. imbalanced-learn [GitHub] [docs]
  3. A collection of 85 minority oversampling techniques (SMOTE) for imbalanced learning with multi-class oversampling and model selection features [docs] [GitHub]
  4. A Deep Dive Into Imbalanced Data: Over-Sampling [link]
  5. SMOTE for high-dimensional class-imbalanced data, Rok Blagus and Lara Lusa, 2013 [link]
  6. SMOTE explained for noobs - Synthetic Minority Over-sampling TEchnique line by line [link]
  7. Detecting representative data and generating synthetic samples to improve learning accuracy with imbalanced data sets [link]
  8. ADASYN: Adaptive Synthetic Sampling Method for Imbalanced Data [link]
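The core idea behind SMOTE, which several of the links above discuss, is to create a synthetic minority sample on the line segment between a minority point and one of its nearest minority neighbours. Below is a minimal sketch of that interpolation step only. It is not the imbalanced-learn implementation, and the function name smote_like, its parameters, and the toy points are my own assumptions.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority point as the base.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point at a random position on the connecting segment.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class (made up): five points with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7)]
new_points = smote_like(minority, n_new=10)
print(new_points[:3])
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region the minority class already occupies. For real work, the imbalanced-learn package listed above is the safer choice.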


Data Augmentation for Image:

  1. Data Augmentation for Deep Learning [link], keywords: image augmentation packages, PyTorch framework
  2. 1000x Faster Data Augmentation [link], keywords: learn augmentation policies, Population Based Augmentation, Tune Framework
  3. A survey on Image Data Augmentation for Deep Learning, Connor Shorten and Taghi M. Khoshgoftaar [link]
  4. Python | Data Augmentation [link]
  5. How to Configure Image Data Augmentation in Keras [link]
  6. Data Augmentation | How to use Deep Learning when you have Limited Data -- Part 2 [link], keywords: online augmentation, offline augmentation
  7. Data augmentation for improving deep learning in image classification problem, Mikolajczyk et al. [link]
  8. The Effectiveness of Data Augmentation in Image Classification using Deep Learning, Jason Wang and Luis Perez [link]
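The difference between the offline and online augmentation mentioned in item 6 can be sketched with a single transform, a horizontal flip. This is a minimal sketch on tiny nested-list "images" with made-up pixel values. The function names are my own, and a real pipeline would use an image library or the Keras and PyTorch tools linked above.

```python
import random


def horizontal_flip(image):
    """Mirror a 2D image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def offline_augment(images):
    """Offline: build and store the enlarged dataset once, before training."""
    return images + [horizontal_flip(img) for img in images]


def online_batches(images, epochs, p_flip=0.5, seed=0):
    """Online: apply the random transform each time a sample is drawn."""
    rng = random.Random(seed)
    for _ in range(epochs):
        yield [horizontal_flip(img) if rng.random() < p_flip else img
               for img in images]


# Toy "images" (made up): two 2x3 grids of pixel values.
images = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [0, 1, 2]]]

stored = offline_augment(images)
batches = list(online_batches(images, epochs=3))
print(len(stored), len(batches))
```

Offline augmentation multiplies the stored dataset size, which suits small datasets. Online augmentation keeps storage constant and shows the model a different random variant in each epoch.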


Data Augmentation for Audio:

to be added ...


Data Augmentation for Texts:

  1. These are the Easiest Data Augmentation Techniques in Natural Language Processing you can think of -- and they work. Simple text editing techniques can make huge performance gains for small datasets. [link]
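The EDA paper listed earlier describes four simple text edits: synonym replacement, random insertion, random swap, and random deletion. Here is a minimal sketch of the two that need no thesaurus, random swap and random deletion. The function names, parameters, and the sample sentence are my own, and this is not the authors' reference code.

```python
import random


def random_swap(words, n_swaps, rng):
    """Swap the positions of two randomly chosen words, n_swaps times."""
    words = words[:]
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Drop each word with probability p, but never return an empty list."""
    if len(words) <= 1:
        return words[:]
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


rng = random.Random(0)
words = "simple text editing can help small datasets".split()

swapped = random_swap(words, n_swaps=2, rng=rng)
deleted = random_deletion(words, p=0.2, rng=rng)
print(" ".join(swapped))
print(" ".join(deleted))
```

Both edits keep the label of the original sentence, so each augmented sentence is added to the training set with the same class.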


Data Augmentation for Time Series:

  1. Data Augmentation strategies for Time Series Forecasting [link]
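Two augmentations that are commonly used for time series are jittering (adding small random noise to each value) and window slicing (cutting overlapping sub-series out of a longer series). Below is a minimal sketch of both. These are general techniques, not necessarily the strategies described in the linked article, and the function names and toy series are my own.

```python
import random


def jitter(series, sigma, rng):
    """Add independent Gaussian noise to every value of the series."""
    return [x + rng.gauss(0.0, sigma) for x in series]


def window_slices(series, window, step=1):
    """Cut overlapping sub-series of a fixed length, preserving time order."""
    return [series[i:i + window]
            for i in range(0, len(series) - window + 1, step)]


rng = random.Random(0)
series = [10.0, 11.0, 13.0, 12.0, 14.0, 15.0]  # toy series (made up)

noisy = jitter(series, sigma=0.1, rng=rng)
slices = window_slices(series, window=4)
print(noisy)
print(slices)
```

For forecasting, any augmentation should keep the temporal order intact within each sample, which both of these do.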