Understanding Supervised Machine Learning
In supervised machine learning, the goal is to learn a function f(x) that maps an input x to an output y, based on known pairs (x, y) from a dataset. The idea is to find a function f that best fits the data so that we can predict y for new values of x.

Linear Models
If the function is linear, it can be represented as

f(x) = wx + b

Here, w (the weight) and b (the bias) are parameters that need to be determined.
Since we assume the function is a straight line, the task is to adjust w and b to minimize the loss function, which measures how far off the predicted values are from the actual values. Common loss functions include Mean Squared Error (MSE) and Mean Absolute Error (MAE).
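As a concrete illustration, here is a minimal NumPy sketch of the linear model and the two losses mentioned above. The toy data, parameter values, and function names are purely illustrative assumptions, not part of any particular library.

```python
import numpy as np

def predict(x, w, b):
    """Linear model: f(x) = w*x + b."""
    return w * x + b

def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared differences."""
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    """Mean Absolute Error: average of the absolute differences."""
    return np.mean(np.abs(y_true - y_pred))

# Toy data that roughly follows y = 2x + 1 (illustrative values only)
x = np.array([0.0, 1.0, 2.0, 3.0])
y_true = np.array([1.1, 2.9, 5.2, 6.8])

y_pred = predict(x, w=2.0, b=1.0)
print("MSE:", mse(y_true, y_pred))
print("MAE:", mae(y_true, y_pred))
```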
Optimization

An optimizer algorithm is used to find the best values for the parameters w and b. One common optimization technique is Gradient Descent. Instead of brute-forcing through all possible values, Gradient Descent calculates the derivative of the loss function with respect to each parameter and updates the parameters iteratively to minimize the loss. The size of each update step is controlled by a scalar known as the learning rate.

Gradient Descent updates the parameters as follows (a short code sketch follows the symbol list below):

θ = θ − α · ∇L(θ)

where:
θ represents the parameters (e.g., w and b)
α is the learning rate
∇L(θ) is the gradient of the loss function with respect to θ
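The sketch below applies this update rule to the linear model from earlier, using the MSE loss. The gradients are written out by hand; the learning rate and number of iterations are arbitrary choices for illustration.

```python
import numpy as np

# Toy data that roughly follows y = 2x + 1 (illustrative values only)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

w, b = 0.0, 0.0   # initial parameter guesses
alpha = 0.05      # learning rate

for step in range(500):
    y_pred = w * x + b
    error = y_pred - y
    # Gradients of the MSE loss with respect to w and b
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Update step: theta = theta - alpha * grad L(theta)
    w -= alpha * grad_w
    b -= alpha * grad_b

print(f"w = {w:.2f}, b = {b:.2f}")  # should end up close to w = 2, b = 1
```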
Deep Learning and Complex Functions
In deep learning, the function f(x) is far more complex, often involving millions of parameters. Unlike linear regression, where we know the form of the function beforehand, deep learning models (like Convolutional Neural Networks, CNNs) learn the function directly from data.
To achieve this, deep learning models use activation functions such as the Rectified Linear Unit (ReLU), which introduces non-linearity into the model, allowing it to learn more complex patterns. These activations are applied at each layer of the network, and the model keeps adjusting its parameters, again via gradient descent, until it finds a function that fits the data well.
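As a rough sketch of how that non-linearity enters the picture, here is a tiny two-layer network written with NumPy. The layer sizes, random weights, and function names are arbitrary assumptions for illustration, not a real model or library API.

```python
import numpy as np

def relu(z):
    """ReLU activation: keeps positive values, zeroes out negatives."""
    return np.maximum(0, z)

def two_layer_net(x, W1, b1, W2, b2):
    """A simple non-linear model: f(x) = W2 @ relu(W1 @ x + b1) + b2."""
    hidden = relu(W1 @ x + b1)   # non-linearity applied after the first layer
    return W2 @ hidden + b2

# Arbitrary sizes for illustration: 3 inputs, 4 hidden units, 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

x = np.array([0.5, -1.0, 2.0])
print(two_layer_net(x, W1, b1, W2, b2))
```

In practice the weights would be learned by gradient descent on a loss, exactly as in the linear case, just with many more parameters.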