- Know how images and other perceptual data such as audio are represented as multidimensional tensors
- Know what convnets are, how they works and what makes them well suited to machine learning with images.
- Know how to write and train data to solve the task of classifying hand written digits
- Know how to train models with Node.js
- Know how to use convnets for spoken word recognition
4.1 Vectors to Tensors: Representing Image Data
Representing an image requires a 3D tensor. The first two dimensions are the height and width of the image, whereas the third dimension has a size of 3 and is referred to as the color channel. Grayscale or blackand white images only require a color channel tensor of size 1.
This method of encoding is called a height width channel or HWC
Multiple images can be batched together using a tensor shap of NHWC where N is the number of images in the batch
4.1.1 The MNIST Dataset
When a dataset is said to be balanced this means there are approximately equal numbers of examples for the categories in the classification.
Given the problem of classifying an image into its handwritten number representation we know the input of the problem and an output format. The input will be the NHWC format and the output shape will be a tensor [null,10] where 10 is a one hot encoded tensor indicating which digit is represented in the image.
Convnets stands for convolutional¹ networks ⁽ ¹ ⁾ convolutional is just a type of mathematical operation