Neural Networks: life as alchemy
[ Last Updated: 2024-07-21 ]
After two weeks of deep-diving into code and feeling the soul-crushing weight of hyperparameter tuning, I’ve finally gathered the courage to write about Neural Networks. The core of this algorithm is incredibly "brute-force," yet the implementation details require immense patience.
What is a Neural Network?
Neural Networks (NN) aren't new; they gained academic attention back in the 1980s, inspired by the biological interactions between neurons in the human brain. Engineers love this analogy, but a modern NN's inner workings are quite distinct from those of an actual human brain.
After a period of stagnation due to limited data and computing power, the field exploded recently under the name "Deep Learning." Today, NNs solve complex tasks in image recognition, natural language processing, and more. Let's break down what actually happens inside the "box."
The Components of a Neural Network

1. Hidden Layers
The most significant difference between a simple regression and a Neural Network is the inclusion of Hidden Layers. These layers allow the model to capture interactions between input features to create new, higher-level information.

Imagine predicting whether someone will buy a product based on Price ($x_1$) and Utility ($x_2$). A hidden layer might combine these to create new "indices":
- Value-for-Money Index: High utility + Low price.
- Necessity Index: High utility (where price matters less, like a washing machine).
- Bargain Hunter Index: Extremely low price, even if utility is low.
By adding more layers, these indices can interact with each other, creating even more complex patterns.
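To make this concrete, here is a minimal NumPy sketch of one hidden layer doing exactly that combination. The weights are hand-picked for illustration (a real network would learn them), and the input values are made up:

```python
import numpy as np

# Inputs: [price, utility], already scaled to roughly 0-1.
x = np.array([0.2, 0.9])   # cheap and very useful

# Hand-picked weights, one row per hidden "index":
#   value-for-money : high utility, low price
#   necessity       : high utility, price barely matters
#   bargain-hunter  : very low price, utility barely matters
W = np.array([
    [-1.0, 1.0],   # value-for-money
    [-0.1, 1.2],   # necessity
    [-1.5, 0.1],   # bargain-hunter
])
b = np.array([0.0, -0.5, 0.5])

hidden = W @ x + b   # each entry is one "index"
print(hidden)        # three "index" activations, roughly [0.7, 0.56, 0.29]
```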
2. Activation Functions: The "On/Off" Switch
If every neuron only performed linear calculations (multiplication and addition), the entire network would just collapse into one big linear function. To capture non-linear patterns, we use Activation Functions.

- ReLU (Rectified Linear Unit): $f(x) = \max(0, x)$. It sets all negative inputs to 0 and keeps positive inputs as they are. It is computationally efficient and speeds up convergence compared to Sigmoid.
- Sigmoid: Maps values to a (0, 1) range, often used in the output layer for binary classification.
In our "Buy/No-Buy" example, if an item is useful but expensive, the "Bargain Hunter" neuron might output a negative value. ReLU switches that neuron "off" (setting its output to 0), so it contributes nothing rather than dragging the final decision down in proportion to how negative it is.
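For reference, here are the textbook definitions of both activations as a small NumPy sketch (nothing here is specific to my project):

```python
import numpy as np

def relu(z):
    # Negative inputs are clamped to 0; positive inputs pass through unchanged.
    return np.maximum(0, z)

def sigmoid(z):
    # Squashes any real number into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))     # [0.  0.  0.  1.5]
print(sigmoid(z))  # approx [0.12 0.38 0.5  0.82]
```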
3. The Output Layer and Loss Functions
The Output Layer produces the final prediction. The choice of activation function here depends on the task:
- Binary Classification: Sigmoid.
- Multi-class Classification: Softmax.
- Regression: Linear function.
Softmax is particularly cool for multi-class problems (like identifying digits 0-9). It turns raw scores into probabilities that sum to 1:

$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$

where $z_i$ is the raw score for class $i$.
We then use Cross-Entropy Loss to measure how far our prediction is from the truth.
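Here is a minimal NumPy sketch of how softmax and cross-entropy fit together; the three raw scores are made up for illustration:

```python
import numpy as np

def softmax(z):
    # Subtracting the max keeps exp() from overflowing; the result is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(probs, true_class):
    # Loss is the negative log-probability assigned to the correct class.
    return -np.log(probs[true_class])

scores = np.array([1.0, 3.0, 0.2])          # raw scores z_i for 3 classes
probs = softmax(scores)
print(probs, probs.sum())                    # probabilities, summing to 1
print(cross_entropy(probs, true_class=1))    # small loss: class 1 got most of the mass
```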
4. Backpropagation: The "Recall" Process
How does the network learn? Through Backpropagation. We calculate the error at the output and use the Chain Rule from calculus to "propagate" that error backward through the layers, updating every weight ($w$) and bias ($b$) along the way.
It’s a massive, iterative game of "tuning the knobs" until the total error is minimized.
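To see the knob-tuning in action, here is a deliberately tiny sketch: one input, one weight, one bias, a squared-error loss, and the chain rule written out by hand. Real networks do the same bookkeeping layer by layer, and frameworks automate it:

```python
# Minimal backprop sketch: fit y = w*x + b to a single target with squared error.
x, y_true = 2.0, 7.0
w, b = 0.5, 0.0          # arbitrary starting values
lr = 0.05                # learning rate

for step in range(200):
    y_pred = w * x + b               # forward pass
    loss = (y_pred - y_true) ** 2    # squared-error loss

    # Backward pass via the chain rule:
    #   dloss/dy_pred = 2*(y_pred - y_true), dy_pred/dw = x, dy_pred/db = 1
    grad_y = 2 * (y_pred - y_true)
    grad_w = grad_y * x
    grad_b = grad_y * 1

    # Nudge every weight and bias a small step against its gradient.
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b, loss)   # w*x + b should now be very close to 7.0
```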
Conclusion: The "Alchemy" of Coefficients
In a deep network, individual neurons lose their obvious "meaning" (like our "Bargain Hunter" example). Below is a visualization of 60 neurons' weights from my handwritten digit recognition (MNIST) project. Some look like strokes of numbers; others look like random static.
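(For anyone who wants to reproduce a picture like this: a rough Matplotlib sketch, assuming the first-layer weights sit in a (60, 784) NumPy array, one row of 784 weights per neuron, each reshaped back to the 28x28 MNIST grid.)

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in array with the assumed shape (60, 784);
# replace it with your trained first-layer weights.
weights = np.random.randn(60, 784)

fig, axes = plt.subplots(6, 10, figsize=(10, 6))
for ax, row in zip(axes.ravel(), weights):
    ax.imshow(row.reshape(28, 28), cmap="gray")  # one neuron's weights as a 28x28 image
    ax.axis("off")
plt.tight_layout()
plt.show()
```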

It’s magical that these "static" patterns, when combined, can recognize digits with incredible accuracy. It feels a bit like alchemy—if you provide enough data and turn the knobs long enough, you eventually find gold.
I’m currently looking into Convolutional Neural Networks (CNNs) for better performance on MNIST, though that might take another week or two to write up!

Honks:
Honestly, successfully "demystifying" Neural Networks for myself has been a highlight of the month. They aren't "unfathomable"—just complex and incredibly powerful.
— Untitled Penguin 2024/07/21 18:30