Machine learning basics

History / Edit / PDF / EPUB / BIB
Created: May 15, 2016 / Updated: December 22, 2018 / Status: in progress / 4 min read (~787 words)

  • Is it currently possible to batch process in parallel? If so, how is the network updated?

math basics
operations (addition, subtraction, multiplication by scalar, matrix/vector and matrix/matrix multiplication, inverse, transpose)

cost function
gradient descent
normal equation
classification (binary vs multiclass)

  • one vs all/rest
    linear/non-linear boundaries
  • reduce the number of features
  • regularization
    neural network
  • layer
  • bias
  • weight
  • input/output
  • activation
  • forward/backward propagation
    training/test/validation sets
    diagnosis of learning algorithm
  • bias (underfit) vs variance (overfit)
    learning curve (error based on training set size)


  • get more training examples -> fix high variance
  • try smaller sets of features -> fix high variance
  • try getting additional features -> fix high bias
  • try adding polynomial features -> fix high bias
  • try decreasing lambda -> fix high bias
  • try increasing lambda -> fix high variance

neural networks architecture

  • small neural network -> prone to underfitting
  • large neural network -> prone to overfitting

how to prioritize your work

  • start with a simple algorithm that can be implemented quickly
    • implement it and test it on cross-validation data
  • plot learning curves to decide if more data, features, etc. are likely to help
  • do error analysis: manually examine the examples (in the cross validation set) that your algorithm made errors on. See if you spot any systematic trend in what type of examples it is making errors on
  • use pareto rule (80/20)
  • use numerical evaluation to determine if a change is an improvement or not

skewed classes

  • precision/recall
  • precision = true positive/# predicted positive = (true+/(true+ + false+))
  • recall = true positive/# actual positive = (true +/(true+ + false-))
  • f1 score = 2pr/(p+r)

support vector machine

  • linear
  • gaussian

unsupervised learning

  • clustering
    • k-means (cluster assignment, move centroid to the cluster's mean location)
    • elbow method (choose the number of clusters automatically)
  • dimensionality reduction (data compression)
    • principal component analysis

To avoid overfitting, the number of parameters estimated from the data must be considerably less than the number of data points

  • Linear regression: estimate the coordinates of a value
  • Logistic regression: answer a yes/no question
  • Softmax classification: answer a multiple choice question

  • Define the task
  • Obtain data (write a data fetcher)
  • Prepare data
  • Implement network architecture
    • Define cost, error and optimize computation methods
  • Feed data into network

import matplotlib.pyplot as plt
from matplotlib import patches
from scipy import ndimage

image = ndimage.imread('file.png', mode='RGB')
fig, ax = plt.subplots(1)
p = [
        (507, 768),
        63, 46,
for patch in p: