## 1.1.1 - 1.1.2

Machine learning has two important phases, the first of which is the *training phase*. This phase uses input data paired with known answers, together called *training data*. Each pair of input data and its answer is known as an *example*.

The training process uses the examples to produce automatically discovered *rules*. Although the discovery is automatic, the rules are not found from scratch: a blueprint called a *model* is provided to guide the search. The model forms a *hypothesis space*, the set of rules the machine could possibly learn. Without this constraint, searching for rules would not be an effective way to solve problems. For neural networks, models vary in how many layers they have, what types of layers those are, and how the layers are wired together. Training takes the model and iteratively nudges its outputs toward the desired outputs. Learning from examples with known answers in this way is called *supervised learning*.
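The two phases can be illustrated with a minimal sketch. It assumes a deliberately tiny model whose hypothesis space is every rule of the form `w * x + b`, and it nudges `w` and `b` with gradient descent on the mean squared error. The example data, the function names `predict` and `train`, and the step count and learning rate are all chosen for illustration and are not taken from the source.

```python
# Training data: each example pairs an input x with its known answer y.
# These illustrative examples happen to follow y = 2x + 1.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def predict(w, b, x):
    """The model: every rule in the hypothesis space has the form w * x + b."""
    return w * x + b


def train(examples, steps=2000, learning_rate=0.05):
    """Iteratively nudge w and b so predictions move toward the known answers."""
    w, b = 0.0, 0.0  # arbitrary starting rule
    n = len(examples)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in examples:
            error = predict(w, b, x) - y
            # Gradient of the mean squared error with respect to w and b.
            grad_w += 2 * error * x / n
            grad_b += 2 * error / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Training phase: discover the rule from the examples.
w, b = train(examples)

# Inference phase: apply the learned rule to an input not seen in training.
prediction = predict(w, b, 10.0)
```

Only `w` and `b` are learned here; the form of the rule is fixed in advance by the model, which is the constraint the hypothesis space provides.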

The second phase is the *inference phase*, in which the model applies the learned rules to new data. The fundamental process that enables such learning is the transformation of data from one form into another in which the solution is easier to find. The way the data is represented, also called its *encoding*, therefore matters a great deal. The search for better representations, or encodings, of data applies to all machine learning algorithms, including neural networks, decision trees, and kernel methods.
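As a hedged illustration of why representation matters, the sketch below uses made-up points labeled by whether they fall inside the unit circle. In Cartesian coordinates no single threshold on `x` or `y` separates the two classes, but after re-encoding each point in polar form a single threshold on the radius does. The data, the helper `to_polar`, and the `classify` rule are assumptions for illustration only.

```python
import math

# Hypothetical data: ((x, y), label), where label is True inside the unit circle.
points = [
    ((0.3, 0.4), True),
    ((-0.5, 0.2), True),
    ((1.5, 0.0), False),
    ((-1.0, -1.2), False),
    ((0.0, -0.9), True),
    ((0.8, 1.1), False),
]


def to_polar(x, y):
    """Re-encode a Cartesian point as (radius, angle)."""
    return math.hypot(x, y), math.atan2(y, x)


def classify(x, y, threshold=1.0):
    """In the polar representation, one comparison on the radius is enough."""
    radius, _angle = to_polar(x, y)
    return radius < threshold


predictions = [classify(x, y) for (x, y), _label in points]
labels = [label for _point, label in points]
```

Here the better representation is written by hand; a learning algorithm would instead search for such a transformation on its own.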

## 1.1.3 Neural Networks and Deep Learning

Neural networks mimic the layered structure of mammalian brains. The stages that process data are referred to as *layers*. Each layer transforms the representation it receives, with the goal of moving closer to the answer. Neural network layers differ from pure functions in that they are stateful.

A layer’s state, or memory, is usually stored in its *weights*. The frequently used dense layer multiplies its input by a matrix and adds a vector to the result. The matrix and the vector are the dense layer’s weights. During training, the weights are adjusted repeatedly using the training data so that the value of the *loss function*, which measures how far the outputs are from the desired answers, is systematically reduced. Gradient descent is the primary way in which these neural networks are trained.
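The dense layer computation can be sketched in plain Python. The class name `Dense`, the attribute names `kernel` and `bias`, the `squared_error` loss, and all numeric values are hypothetical choices for illustration. To keep the example short, the single gradient descent step updates only the bias vector, whereas real training would update the matrix as well.

```python
class Dense:
    """A dense layer: multiply the input by a matrix, then add a vector."""

    def __init__(self, kernel, bias):
        self.kernel = kernel  # matrix as a list of rows, shape (inputs, units)
        self.bias = bias      # vector, shape (units,)

    def __call__(self, x):
        units = len(self.bias)
        return [
            sum(x[i] * self.kernel[i][j] for i in range(len(x))) + self.bias[j]
            for j in range(units)
        ]


def squared_error(output, target):
    """Loss: the sum of squared differences between output and target."""
    return sum((o - t) ** 2 for o, t in zip(output, target))


layer = Dense(kernel=[[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]], bias=[0.5, -1.0])
x = [1.0, 2.0, 3.0]
target = [4.0, 6.0]

before = layer(x)
loss_before = squared_error(before, target)

# One gradient descent step on the bias only (a simplification).
# The gradient of the loss with respect to bias[j] is 2 * (output[j] - target[j]).
learning_rate = 0.5
layer.bias = [
    b - learning_rate * 2 * (o - t)
    for b, o, t in zip(layer.bias, before, target)
]

# The same input now gives a different output because the layer's state changed.
after = layer(x)
loss_after = squared_error(after, target)
```

Calling `layer(x)` before and after the update gives different results for the same input, which is what makes the layer stateful rather than a pure function.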

*Deep learning* is the study and application of deep neural networks (neural networks with *many layers*). The number of layers is called the model’s *depth*.