What are the key hyper-parameters of a neural network model?

Related Questions:
– Explain the basic architecture and training process of a Neural Network model

– What is an activation function? Discuss the different types and their pros and cons

Hyper-parameters in a Neural Network Model

Figure: Hyperparameters of a Neural Network (Source: AIML.com Research)

Hyperparameters are the configuration settings of a neural network that are set before the training process begins. These parameters are not learned during training; instead, they influence how the model learns and generalizes from the data. Here are some key hyperparameters of a neural network model (a short code sketch tying them together follows the list):

  • Weight initialization sets the starting values of the weights at the beginning of network training and can impact how quickly the model converges. Eg: random initialization, Xavier/Glorot initialization, and He initialization
  • Number of hidden layers, also known as the depth of the network, impacts the learning capacity and generalization ability of the network
  • Number of neurons per layer impacts the network’s capacity to capture complex patterns. Larger layers can capture more intricate features but might also lead to overfitting
  • Activation function introduces non-linearity into the model, allowing it to capture complex relationships in the data. Eg: ReLU (Rectified Linear Unit), tanh, Leaky ReLU

  • Loss function quantifies the difference between the predicted and actual values. Neural Network training aims to minimize the loss function. The choice of loss function depends on the type of problem you’re solving. Eg: Mean Squared Error (MSE), Huber Loss for regression tasks, and Cross-entropy for classification
  • Optimization algorithm determines how the model’s weights are updated during training. Eg: Stochastic Gradient Descent (SGD), Adam, RMSProp
  • Learning Rate determines the step size taken during weight updates in the optimization process. A larger learning rate can lead to faster convergence but risks overshooting the optimal parameters, while a smaller learning rate leads to slower but more stable convergence. Commonly used learning rate values are 0.1, 0.01, and 0.001

    It’s common to start with a larger learning rate and decrease it gradually during training using techniques like learning rate decay. Early in training, when the model’s parameters are far from their optimal values, a larger learning rate helps the model converge faster; as the parameters approach the optimum, a smaller learning rate ensures the model converges accurately and does not overshoot the optimal solution.
  • Batch Size is the number of training examples used in a single iteration of gradient descent. Typical values for batch size are 16, 32, 64, 128, 256, 512 and 1024.
  • Number of epochs is the number of times the entire training dataset is passed through the network during training. Eg: 1, 10, 50, 100
  • Number of iterations: An iteration refers to one update of the model’s parameters based on a batch of training examples. During each iteration, a mini-batch of data (a subset of the training data) is used to compute gradients and update the model’s weights and bias parameters. The total number of iterations can be calculated using the following formula:
    No. of iterations = (Total training examples / Batch size) * Total epochs
    For example, 10,000 training examples with a batch size of 100, trained for 10 epochs, gives (10,000 / 100) * 10 = 1,000 iterations.
  • Regularization techniques are used to prevent overfitting. Some of the common regularization methods are Dropout, L1 (Lasso) / L2 (Ridge) penalties, Batch Normalization, and Early Stopping.
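To make the roles of these hyperparameters concrete, here is a minimal sketch using TensorFlow/Keras (one possible framework, not prescribed by this article). The dataset is synthetic, and every specific value, including the layer sizes, dropout rate, decay schedule, batch size, and epoch count, is an illustrative assumption rather than a recommendation.

```python
import numpy as np
import tensorflow as tf

# Synthetic tabular data: 10,000 examples, 20 features, 3 classes (purely illustrative)
X = np.random.rand(10_000, 20).astype("float32")
y = np.random.randint(0, 3, size=10_000)

# Depth, neurons per layer, activation function, and weight initialization
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dropout(0.2),                      # regularization: dropout
    tf.keras.layers.Dense(32, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dense(3, activation="softmax"),    # output layer for 3 classes
])

# Learning rate decay: start larger (0.01) and shrink the rate as training progresses
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=1_000, decay_rate=0.9)

# Optimization algorithm, learning rate (schedule), and loss function
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
    loss="sparse_categorical_crossentropy",            # cross-entropy for classification
    metrics=["accuracy"])

# Regularization: early stopping monitors validation loss and halts training
early_stop = tf.keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True)

# Batch size and epochs; with an 80/20 split, iterations per epoch = 8,000 / 64 = 125
model.fit(X, y, batch_size=64, epochs=50,
          validation_split=0.2, callbacks=[early_stop])
```

In practice these values are only starting points; they are typically tuned using the search methods covered in the videos referenced below.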

A few practical examples of hyperparameter values:

  1. Hyperparameters for training a Deep Neural Network (DNN)

  2. Hyperparameters for training the GPT series (large language models)

Table: Sizes, architectures, and learning hyperparameters (batch size in tokens and learning rate) of GPT models (Source: GPT-3 paper, “Language Models are Few-Shot Learners”)

Video Explanation

  • In the “Neural Networks Hyperparameters explained” video, Misra Turp draws out the neural network structure and clearly explains the hyperparameters used in Neural Network training at different parts of the network (Runtime: 7 mins)
  • In the “Hyperparameter Tuning” video, Misra Turp goes a step further into hyperparameter tuning for Neural Networks and discusses different tuning methods, including Grid search, Random search, manually zooming in, and more sophisticated techniques such as Bayesian search, gradient-based search, and evolutionary algorithms; a small random-search sketch follows these bullets. (Runtime: 10 mins)
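As a companion to the tuning video, below is a minimal sketch of random search over two hyperparameters. The search space, trial count, and the train_and_evaluate() helper are hypothetical stand-ins, not taken from the video or this article.

```python
import random

def train_and_evaluate(learning_rate, batch_size):
    # Placeholder: in practice this would train a model (e.g., the Keras sketch
    # above) with the given hyperparameters and return its validation accuracy.
    # Here we simply simulate a score so the sketch runs on its own.
    return random.random()

# Candidate values to sample from (illustrative)
search_space = {
    "learning_rate": [0.1, 0.01, 0.001, 0.0001],
    "batch_size": [16, 32, 64, 128, 256],
}

best_score, best_config = -1.0, None
for trial in range(20):                      # number of random trials
    config = {name: random.choice(values) for name, values in search_space.items()}
    score = train_and_evaluate(**config)
    if score > best_score:
        best_score, best_config = score, config

print("Best configuration found:", best_config, "score:", best_score)
```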

