Data Mining

John Samuel
CPE Lyon

Year: 2019-2020
Email: john(dot)samuel(at)cpe(dot)fr

Goals

Artificial Neural Networks
Deep Learning
Reinforcement Learning
Data Licences, Ethics and Privacy

Inspired by biological neural networks
Collection of connected nodes called artificial neurons.
Artificial neurons can transmit a signal from one to another (like in a synapse).
The signal between artificial neurons is a real number.
The output of a neuron is a function of the weighted sum of its inputs.
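
A minimal sketch of a single artificial neuron, assuming a step activation function applied to the weighted sum of the inputs. The names `step` and `neuron_output` are illustrative and do not come from the slides.

```python
def step(z):
    """Heaviside step activation (an assumed choice): 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def neuron_output(weights, inputs, activation=step):
    """Apply the activation function to the weighted sum of the inputs."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return activation(z)


# The first input is fixed to 1 so that weights[0] plays the role of a bias.
print(neuron_output([-1.5, 1.0, 1.0], [1, 1, 1]))  # weighted sum is 0.5, prints 1
print(neuron_output([-1.5, 1.0, 1.0], [1, 0, 1]))  # weighted sum is -0.5, prints 0
```

With these hand-picked weights the neuron behaves like a logical AND of its last two inputs.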

Artificial neural networks

Perceptron

Algorithm for supervised learning of binary classifiers
Binary classifier

Perceptron: Formal definition

Let y = f(z) be the output of the perceptron for an input vector z
Let N be the number of training examples
Let X be the input feature space
Let {(x₁, d₁),...,(x_N, d_N)} be the N training examples, where
- x_j is the feature vector of the j^th training example.
- d_j is the desired output value for the j^th training example.
- x_j,i is the i^th feature of the j^th training example.
- x_j,0 = 1 (a constant input, so that the weight w_0 acts as a bias).

Weights are represented in the following manner:
- w_i is the i^th value of the weight vector.
- w_i(t) is the i^th value of the weight vector at a given time t.

Perceptron: Steps

Initialize the weights and the threshold
For each example (x_j, d_j) in the training set
- Calculate the output: y_j(t) = f[w(t)·x_j]
- Update the weights: w_i(t + 1) = w_i(t) + (d_j - y_j(t)) x_j,i
Repeat the previous step until the iteration error (1/N) Σ |d_j - y_j(t)| is less than a user-specified threshold.
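
The steps above can be sketched as follows. This is a minimal illustration, assuming a step activation function, zero-initialized weights, and a learning rate of 1 (as in the update rule above). The function names, the `max_epochs` safeguard, and the logical AND training set are assumptions made for the example.

```python
def step(z):
    """Step activation (assumed): 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def predict(w, x):
    """Perceptron output y = f(w . x)."""
    return step(sum(w_i * x_i for w_i, x_i in zip(w, x)))


def train_perceptron(examples, threshold=0.01, max_epochs=100):
    """examples: list of (x_j, d_j) pairs, where x_j[0] == 1 is the bias input."""
    n_features = len(examples[0][0])
    w = [0.0] * n_features                     # initialize the weights
    for _ in range(max_epochs):                # safeguard against non-separable data
        total_error = 0
        for x_j, d_j in examples:              # loop over the training set
            y_j = predict(w, x_j)              # calculate the output
            for i in range(n_features):        # update every weight
                w[i] += (d_j - y_j) * x_j[i]
            total_error += abs(d_j - y_j)
        if total_error / len(examples) < threshold:
            break                              # iteration error is small enough
    return w


# Logical AND, a linearly separable problem; x[0] is the constant bias input.
and_examples = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
weights = train_perceptron(and_examples)
print([predict(weights, x) for x, _ in and_examples])  # prints [0, 0, 0, 1]
```

The perceptron only converges when the two classes are linearly separable, which is why the sketch caps the number of passes over the training set.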

Backpropagation

Backward propagation of errors
Adjusts the weights of the neurons using the gradient of the loss function
The error is calculated at the output and propagated back through the network layers
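
A minimal sketch of backpropagation on the smallest possible network: one input, one hidden neuron, and one output neuron. The sigmoid activation, the squared-error loss, the learning rate, and all function names are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1, w2, x):
    """Forward pass: input -> hidden neuron -> output neuron."""
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    return h, y


def loss(w1, w2, x, d):
    """Squared error between the desired output d and the network output."""
    _, y = forward(w1, w2, x)
    return 0.5 * (d - y) ** 2


def backprop(w1, w2, x, d):
    """Return (dL/dw1, dL/dw2), applying the chain rule from the output backwards."""
    h, y = forward(w1, w2, x)
    delta_out = (y - d) * y * (1.0 - y)              # error at the output neuron
    grad_w2 = delta_out * h
    delta_hidden = delta_out * w2 * h * (1.0 - h)    # error propagated to the hidden neuron
    grad_w1 = delta_hidden * x
    return grad_w1, grad_w2


def gradient_step(w1, w2, x, d, learning_rate=0.5):
    """Adjust both weights against the gradient of the loss."""
    g1, g2 = backprop(w1, w2, x, d)
    return w1 - learning_rate * g1, w2 - learning_rate * g2


w1, w2 = 0.5, -0.3
before = loss(w1, w2, 1.0, 1.0)
w1, w2 = gradient_step(w1, w2, 1.0, 1.0)
after = loss(w1, w2, 1.0, 1.0)
print(after < before)  # the loss decreases after one step
```

Real networks apply the same chain rule layer by layer over weight matrices; the scalar version keeps each term visible.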

Deep neural networks

Multiple hidden layers between the input and output layers
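
As an illustration, the forward pass through a network with several hidden layers can be sketched as a loop over fully connected layers. The sigmoid activation, the layer sizes, and the weight values below are arbitrary assumptions, not taken from the slides.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(weights, biases, inputs):
    """One fully connected layer; weights[k] holds the input weights of neuron k."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def network_forward(layers, inputs):
    """Feed the inputs through each layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(weights, biases, activations)
    return activations


# 2 inputs -> hidden layer (3 neurons) -> hidden layer (2 neurons) -> 1 output
layers = [
    ([[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]], [0.0, 0.1, -0.1]),
    ([[0.5, -0.3, 0.2], [0.1, 0.1, 0.1]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
output = network_forward(layers, [1.0, 0.5])
print(len(output))  # prints 1, a single output neuron
```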

Applications

Computer vision
Speech recognition
Drug design
Natural language processing
Machine translation

Convolutional deep neural networks

Analysis of images
Inspired by neurons in the visual cortex
Network learns the filters
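
To show what a filter does, here is a minimal sketch of sliding a small filter over an image without padding. The filter values are hand-picked for the example (a vertical edge detector); in a convolutional network they would be learned during training. The function name and the toy image are assumptions.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the element-wise products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]
print(convolve2d(image, vertical_edge))  # prints [[0, 2, 0], [0, 2, 0]]
```

The response is non-zero only where the filter straddles the dark-to-bright boundary.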

Reinforcement learning

Inspired by behaviourist psychology
Learns the actions to take in order to maximize the cumulative reward.
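
A minimal sketch of the quantity being maximized, assuming the reward sequence that follows each action is already known. In practice an agent has to estimate these values by interacting with its environment. The discount factor, the function names, and the reward values are illustrative assumptions.

```python
def cumulative_reward(rewards, discount=1.0):
    """Sum of the rewards, each weighted by discount ** (time step)."""
    return sum(discount ** t * r for t, r in enumerate(rewards))


def best_action(reward_sequences, discount=1.0):
    """Pick the action whose reward sequence gives the highest cumulative reward."""
    return max(
        reward_sequences,
        key=lambda action: cumulative_reward(reward_sequences[action], discount),
    )


reward_sequences = {
    "left": [1, 0, 0],    # small immediate reward
    "right": [0, 0, 5],   # larger delayed reward
}
print(best_action(reward_sequences))                # prints right
print(best_action(reward_sequences, discount=0.4))  # prints left
```

A discount below 1 makes delayed rewards count for less, which is why the preferred action changes in the second call.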

Data usage licences
Confidentiality and Privacy
Ethics

Big Data

Volume
Variety
Velocity
Veracity
Value

Privacy

Open Data

Linked Open Data cloud

Archived data

Online resources

Colors

Color Tool - Material Design

Images

Wikimedia Commons