Can we derive formulas from neural networks or decision trees?

(updated on June 10, 2020)

I was wondering whether we could derive numerical relationships between input variables and output variables from nonlinear model structures such as neural networks and decision trees. I found a Q&A on ResearchGate, and I like the answer.

One respondent wrote: "It is possible to obtain an equation after developing an ANN for prediction. in fact, you will end up with a long equation including the inputs, weights, biases, etc."
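
To make the quoted answer concrete, here is a minimal sketch in Python of a tiny network with two inputs, two tanh hidden units, and one linear output. The weights and biases are made-up numbers for illustration, not taken from any trained model. The layer-by-layer forward pass and the written-out equation should give the same value.

```python
import math

# Hypothetical weights and biases (illustrative values, not a trained model).
W1 = [[0.5, -1.0],   # weights into hidden unit 1
      [1.5, 0.25]]   # weights into hidden unit 2
b1 = [0.1, -0.2]     # hidden-layer biases
W2 = [2.0, -3.0]     # weights from the hidden units to the output
b2 = 0.5             # output bias


def forward(x):
    """Layer-by-layer forward pass of the 2-2-1 network."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2


def formula(x1, x2):
    """The same network written out as one explicit equation."""
    return (2.0 * math.tanh(0.5 * x1 - 1.0 * x2 + 0.1)
            - 3.0 * math.tanh(1.5 * x1 + 0.25 * x2 - 0.2)
            + 0.5)


x = (0.3, -0.7)
print(forward(x), formula(*x))  # the two values agree
```

With more hidden units and layers the equation has the same shape but many more terms, which is the "long equation" the answer refers to.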

Multi-output Regression Models

(updated on June 10, 2020)

A multi-output regression task predicts multiple numerical properties for each sample (reference).
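
As a small illustration of the task, the sketch below fits one independent least-squares line per target, which is probably the simplest baseline for multi-output regression. The toy data and the helper names (`fit_line`, `fit_multi_output`, `predict`) are made up for this example, and only one input feature is assumed.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope  # (intercept, slope)


def fit_multi_output(xs, Y):
    """Fit one independent line per target column of Y."""
    n_targets = len(Y[0])
    return [fit_line(xs, [row[j] for row in Y]) for j in range(n_targets)]


def predict(models, x):
    """Return one prediction per target for a single input x."""
    return [a + b * x for a, b in models]


# Toy data (made up): each sample has one input and two numeric targets,
# generated from y1 = 1 + 2x and y2 = 10 - 2x.
xs = [0.0, 1.0, 2.0, 3.0]
Y = [[1.0, 10.0], [3.0, 8.0], [5.0, 6.0], [7.0, 4.0]]

models = fit_multi_output(xs, Y)
print(predict(models, 4.0))
```

This baseline treats the targets as unrelated. Methods that exploit correlation between targets need more than this.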


The article titled "Regression Models with multiple target variables" by Kiran Karkera (link) covers exactly what I am interested in. Here are the key points, apart from the modeling details.

  1. Terminology: multi-output regression or multi-target regression; related terms for classification tasks are multi-label classification, multi-class classification, and multioutput-multiclass classification (aka multi-task classification).
  2. Popular open source ML libraries have little support for the multi-output regression task.

These are the two papers that are mentioned in Kiran Karkera's article.


Here is an article, posted online on March 27, 2020, that discusses how to develop multi-output regression models with Python.

 

2020 OR Talks

(updated on April 20, 2020)

Analytics for a Better World Webinars (ABW-W) [more information]

  1. Wednesday, April 29, 2020, 11:00 AM EST (17:00 CET), Speaker: Dimitris Bertsimas (MIT, Cambridge) | on a variety of aspects of COVID-19
  2. Wednesday, May 27, 2020, 17:00 CET, Speakers: Koen Peters (World Food Programme and Zero Hunger Lab, Tilburg University), Hein Fleuren (Zero Hunger Lab, Tilburg University) | They will highlight the work of Tilburg University (The Netherlands) at the United Nations World Food Programme (WFP) headquarters in Rome

Discrete Optimization Talks (DOTs) [homepage]

Example Videos for Data Analytics

This post is for my students in OPIM3103-008 Spring 2020. Compare these examples to the two for Management Information Systems.

The Math Behind Basketball's Wildest Moves | Rajiv Maheswaran | TED Talks

How data transformed the NBA | The Economist

This is an easter egg. Email me at teng.huang@uconn.edu with (1) the following quote; (2) your full name; and (3) the names of three types of Access database objects. You will get two bonus points if your answer to (3) is correct and one bonus point otherwise. This easter egg is active from 0:00 on March 30, 2020 to 11:59pm on April 5, 2020. (Email me within this time period.)

Here is the quote.

Don't be pushed by your problems. Be led by your dreams. -- Ralph Waldo Emerson

HOWTO: Work with Small Datasets

(updated on January 12, 2020; not complete yet)

 

In general:

  1. 7 Effective Ways to Deal with a Small Dataset [link]
  2. Dealing with very small datasets [link]
  3. What to do with "small" data? [link]

 

 

For Images:

  1. Breaking the curse of small datasets in Machine Learning: Part 1 [link]
  2. Breaking the curse of small data sets in Machine Learning: Part 2 [link]
  3. You can probably use deep learning even if your data isn't that big [link]
  4. Applying deep learning to real-world problems [link]

HOWTO: Data Augmentation

(updated on January 12, 2020; not complete yet)

 

Data Augmentation:

  1. Research Guide: Data Augmentation for Deep Learning, [Nearly] Everything you need to know in 2019 [link], keywords: Random Erasing Data Augmentation (2017), AutoAugment: Learning Augmentation Strategies from Data (CVPR 2019), Fast AutoAugment (2019), Learning Data Augmentation Strategies for Object Detection (2019), SpecAugment: for Automatic Speech Recognition (Interspeech 2019), EDA: for Boosting Performance on Text Classification Tasks (EMNLP-IJCNLP 2019), Unsupervised Data Augmentation for Consistency Training (2019)
  2. Data augmentation on entire dataset before splitting [link], conclusion: this practice is incorrect.
  3. How does data augmentation reduce overfitting? [link]
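
The conclusion of item 2 can be sketched as follows: split first, then augment only the training portion, so that no variant of a test sample leaks into training. The noise-based augmentation and the function name `split_then_augment` are hypothetical choices for illustration only.

```python
import random


def split_then_augment(samples, test_fraction=0.25, noise=0.05, seed=0):
    """Split first, then add jittered copies to the training set only."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)

    # Step 1: hold out the test set before any augmentation happens.
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]

    # Step 2: augment the training samples only.
    # Hypothetical augmentation: one noisy copy of each feature vector.
    augmented = [([v + rng.gauss(0.0, noise) for v in x], y) for x, y in train]
    return train + augmented, test


# Toy dataset (made up): (features, label) pairs.
data = [([float(i), float(i) * 2.0], i % 2) for i in range(8)]
train, test = split_then_augment(data)
print(len(train), len(test))
```

Augmenting before the split would let noisy copies of the same original sample land on both sides, which makes the test score look better than it should.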

 

Data Augmentation for Regression Tasks:

Online articles that mention data augmentation (DA) for regression tasks:

  1. Shehroz Khan's answer to What does the term data augmentation mean in the context of machine learning? [link]
  2. What you need to know about data augmentation for machine learning [link]
  3. Data augmentation techniques for general datasets? [link] (Teng: To me, it seems they were discussing feature engineering instead of adding more data points.)
  4. Data Augmentation Techniques for Cat/Binary/Continuous Numerical Dataset [link], keywords: SMOTE

 

 

Data Augmentation for Unbalanced Dataset in Classification Tasks:

  1. Oversampling and undersampling in data analysis [link]
  2. imbalanced-learn [GitHub] [docs]
  3. A collection of 85 minority oversampling techniques (SMOTE) for imbalanced learning with multi-class oversampling and model selection features [docs] [GitHub]
  4. A Deep Dive Into Imbalanced Data: Over-Sampling [link]
  5. SMOTE for high-dimensional class-imbalanced data, Rok Blagus and Lara Lusa, 2013 [link]
  6. SMOTE explained for noobs - Synthetic Minority Over-sampling TEchnique line by line [link]
  7. Detecting representative data and generating synthetic samples to improve learning accuracy with imbalanced data sets [link]
  8. ADASYN: Adaptive Synthetic Sampling Method for Imbalanced Data [link]
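
For intuition, here is a minimal sketch of the interpolation step at the core of SMOTE: pick a minority sample, pick one of its nearest minority neighbours, and create a new point somewhere on the segment between them. The function name `smote_like`, its parameters, and the toy points are assumptions for this example. It is not the API of imbalanced-learn or of any other library listed above, and for real work a maintained library is the better choice.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating toward near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of base (squared Euclidean distance).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        # Random position along the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority-class points (made up).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_like(minority, n_new=5)
print(new_points)
```

Because every synthetic point lies between two existing minority points, the new samples stay inside the region the minority class already occupies.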

 

Data Augmentation for Image:

  1. Data Augmentation for Deep Learning [link], keywords: image augmentation packages, PyTorch framework
  2. 1000x Faster Data Augmentation [link], keywords: learn augmentation policies, Population Based Augmentation, Tune Framework
  3. A survey on Image Data Augmentation for Deep Learning, Connor Shorten and Taghi M. Khoshgoftaar [link]
  4. Python | Data Augmentation [link]
  5. How to Configure Image Data Augmentation in Keras [link]
  6. Data Augmentation | How to use Deep Learning when you have Limited Data -- Part 2 [link], keywords: online augmentation, offline augmentation
  7. Data augmentation for improving deep learning in image classification problem, Mikolajczyk et al. [link]
  8. The Effectiveness of Data Augmentation in Image Classification using Deep Learning, Jason Wang and Luis Perez [link]
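
A minimal sketch of two basic image augmentations, a horizontal flip and a small shift, on an image stored as a plain list of rows. The helper names are made up for this example. Real projects would normally use an image library, and `shift_right` assumes 0 <= pixels <= image width.

```python
import random


def horizontal_flip(image):
    """Mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in image]


def shift_right(image, pixels, fill=0):
    """Shift every row right by `pixels`, padding the left edge with `fill`."""
    width = len(image[0])
    return [[fill] * pixels + row[:width - pixels] for row in image]


def random_augment(image, rng):
    """Online-style augmentation: draw a fresh random variant on each call."""
    out = horizontal_flip(image) if rng.random() < 0.5 else [row[:] for row in image]
    return shift_right(out, rng.randint(0, 1))


image = [[1, 2, 3],
         [4, 5, 6]]
print(horizontal_flip(image))  # [[3, 2, 1], [6, 5, 4]]
print(shift_right(image, 1))   # [[0, 1, 2], [0, 4, 5]]
print(random_augment(image, random.Random(0)))
```

Calling `random_augment` during training corresponds to the online augmentation mentioned in item 6, while generating and saving the variants up front corresponds to offline augmentation.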

 

Data Augmentation for Audio:

to be added ...

 

Data Augmentation for Texts:

  1. These are the Easiest Data Augmentation Techniques in Natural Language Processing you can think of -- and they work. Simple text editing techniques can make huge performance gains for small datasets. [link]
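
Two of the simple editing operations used in EDA-style text augmentation, random swap and random deletion, can be sketched in a few lines. The function names and parameters are assumptions for this example. Synonym replacement and random insertion need a thesaurus and are left out.

```python
import random


def random_swap(words, n_swaps, rng):
    """Swap two randomly chosen positions, n_swaps times."""
    words = words[:]
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


rng = random.Random(0)
sentence = "data augmentation can help small text datasets".split()
print(" ".join(random_swap(sentence, 2, rng)))
print(" ".join(random_deletion(sentence, 0.2, rng)))
```

Both operations keep the label unchanged, so each augmented sentence is added to the training set with the label of the sentence it came from.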

 

Data Augmentation for Time Series:

  1. Data Augmentation strategies for Time Series Forecasting [link]
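
Two commonly used time-series augmentations are jittering (adding small random noise) and window slicing (training on sub-series). The sketch below shows both. These are general techniques and not necessarily the ones covered in the linked article, and the function names and toy series are made up.

```python
import random


def jitter(series, sigma, rng):
    """Add independent Gaussian noise with standard deviation sigma to each point."""
    return [v + rng.gauss(0.0, sigma) for v in series]


def window_slices(series, window):
    """Return all contiguous sub-series of length `window`."""
    return [series[i:i + window] for i in range(len(series) - window + 1)]


# Toy series (made up).
series = [1.0, 2.0, 3.0, 4.0, 5.0]
rng = random.Random(0)
print(jitter(series, 0.1, rng))
print(window_slices(series, 3))  # [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
```

For forecasting, the augmented series should still be built from the training period only, so that future values do not leak into the model.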