Welcome to my blog!

Hello! I’m Mayank. I blog here about what I’m learning, which usually revolves around machine learning, backend engineering, DevOps, and software architecture.


Understanding Supervised Machine Learning

In supervised machine learning, the goal is to learn a function f(x) that maps an input x to an output y, based on known pairs (x, y) from a dataset. The idea is to find a function f that fits the data well enough that we can predict y for new values of x.

Linear Models

If the function is linear, it can be represented as:

f(x) = wx + b

Here, w (weight) and b (bias) are parameters that need to be determined.

Since we assume the function is a straight line, the task is to adjust w and b to minimize a loss function, which measures how far the predicted values are from the actual values. Common loss functions include Mean Squared Error (MSE) and Mean Absolute Error (MAE).

Optimization
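As a concrete illustration, here is a minimal sketch of a linear model evaluated with both loss functions. The dataset values are made up purely for demonstration:

```python
import numpy as np

# Toy dataset: inputs x and targets y (hypothetical values for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

def predict(x, w, b):
    """Linear model f(x) = w*x + b."""
    return w * x + b

def mse(y_true, y_pred):
    """Mean Squared Error: average of squared residuals."""
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute residuals."""
    return np.mean(np.abs(y_true - y_pred))

y_pred = predict(x, w=2.0, b=0.0)
print(mse(y, y_pred))  # squared errors punish large misses more heavily
print(mae(y, y_pred))
```

Note how MSE squares each residual, so a single bad prediction dominates the loss, while MAE weighs all errors linearly.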

An optimization algorithm is used to find the best values for the parameters w and b. One common technique is Gradient Descent. Instead of brute-forcing through all possible values, Gradient Descent computes the derivative of the loss function with respect to each parameter and updates the parameters iteratively to reduce the loss. The size of each update step is controlled by a scalar known as the learning rate.

Gradient Descent updates the parameters as follows:

θ = θ − α⋅∇L(θ)

where:

θ represents the parameters (e.g., w and b)
α is the learning rate
∇L(θ) is the gradient of the loss function with respect to θ
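The update rule above can be sketched end to end for the linear model with an MSE loss. The dataset here is synthetic (generated from y = 2x), so we can watch w converge toward 2:

```python
import numpy as np

# Toy dataset generated by y = 2x, so w should approach 2 and b should stay near 0
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

w, b = 0.0, 0.0   # initial parameters (theta)
alpha = 0.01      # learning rate

for _ in range(2000):
    y_pred = w * x + b
    error = y_pred - y
    # Gradients of the MSE loss L = mean((w*x + b - y)^2) w.r.t. w and b
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # The update step: theta = theta - alpha * grad_L(theta)
    w -= alpha * grad_w
    b -= alpha * grad_b

print(round(w, 2), round(b, 2))
```

The learning rate matters: too large and the updates overshoot and diverge; too small and convergence takes many more iterations.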

Deep Learning and Complex Functions

In deep learning, the function f(x) is far more complex, often involving millions of parameters. Unlike linear regression, where we know the form of the function beforehand, deep learning models (like Convolutional Neural Networks, CNNs) learn the function directly from data.

To achieve this, deep learning models use activation functions such as the Rectified Linear Unit (ReLU), which introduces non-linearity to the model, allowing it to learn more complex patterns. These functions are applied at each layer of the network, and the model keeps adjusting its parameters until it finds a function that fits the data well.
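To make the role of the activation function concrete, here is a minimal sketch of a two-layer network forward pass in NumPy. The layer sizes and random weights are arbitrary, chosen only to show where the non-linearity sits between the linear layers:

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z), applied elementwise."""
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)

# A tiny two-layer network: linear -> ReLU -> linear.
# Without relu() in the middle, the two linear layers would
# collapse into a single linear function.
W1 = rng.normal(size=(4, 3))   # first layer weights (arbitrary shapes)
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))   # second layer weights
b2 = np.zeros(1)

def forward(x):
    h = relu(W1 @ x + b1)      # non-linearity applied after the first layer
    return W2 @ h + b2

x = np.array([0.5, -1.0, 2.0])
print(forward(x))
```

Stacking many such layers, each followed by a non-linearity, is what lets the network approximate far more complex functions than a single linear model.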