Gradient Descent
Gradient descent is the optimization algorithm that trains most machine-learning models by repeatedly nudging parameters in the direction that most reduces prediction error.
Gradient descent is the workhorse optimization algorithm behind most modern machine learning, including neural networks. Training a model means finding parameter values that minimize a loss function, a measure of how wrong the model's predictions are. Gradient descent does this iteratively: it computes the gradient, the direction in which the loss increases fastest, and then takes a small step in the opposite direction. Repeating this many times walks the parameters steadily downhill toward a configuration with lower error.
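As a minimal sketch of the loop described above, the Python below fits a one-variable linear model y = w·x + b by full-batch gradient descent on mean squared error. It repeatedly applies the update θ ← θ − η·∇L(θ), where η is the learning rate. The toy data, the starting point, the learning rate of 0.05, and the step count are illustrative assumptions, not values taken from any production model.

```python
# Minimal full-batch gradient descent for 1-D linear regression (illustrative sketch).
# Model: y_hat = w * x + b; loss: mean squared error (MSE) over the whole dataset.

def mse_gradients(w, b, xs, ys):
    """Return (dL/dw, dL/db) for the MSE loss over all examples."""
    n = len(xs)
    grad_w = 0.0
    grad_b = 0.0
    for x, y in zip(xs, ys):
        residual = (w * x + b) - y  # prediction error for one example
        grad_w += 2.0 * residual * x / n
        grad_b += 2.0 * residual / n
    return grad_w, grad_b


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    w, b = 0.0, 0.0  # arbitrary starting point
    for _ in range(steps):
        grad_w, grad_b = mse_gradients(w, b, xs, ys)
        # Step *against* the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1, so the loss is minimized at w = 2, b = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = gradient_descent(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")  # prints: w = 2.0000, b = 1.0000
```

Because this loss is a convex bowl, every step with a small enough learning rate moves the parameters closer to the single minimum; the loss surfaces of deep networks are not convex, so there the same loop only finds a good local solution.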
The size of each step is governed by the learning rate, a key hyperparameter: too large and the process overshoots or diverges, too small and training is painfully slow. In practice, models are trained with variants such as stochastic gradient descent, which estimates the gradient from small random batches of data for speed, and adaptive optimizers like Adam, which adjust the effective step size per parameter. For deep networks, gradients are computed efficiently by backpropagation, which applies the chain rule through the network's layers.
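The sketch below illustrates two points from the paragraph above under simple, assumed settings. Part 1 runs plain gradient descent on f(θ) = θ², whose gradient is 2θ, so each step multiplies θ by (1 − 2η): a tiny learning rate crawls, a moderate one converges quickly, and one that is too large overshoots with growing amplitude. Part 2 is a basic minibatch stochastic gradient descent on the same toy regression as before, estimating each gradient from a random batch of two examples. The learning rates, batch size, step counts, seed, and the function names descend_quadratic and minibatch_sgd are hypothetical choices for illustration; Adam's per-parameter step scaling is not shown.

```python
import random

# Part 1: effect of the learning rate on f(theta) = theta**2 (gradient = 2 * theta).
def descend_quadratic(learning_rate, steps=50, theta=1.0):
    for _ in range(steps):
        theta -= learning_rate * 2.0 * theta
    return theta

# Each step multiplies theta by (1 - 2 * lr): 0.998 (slow), 0.8 (fast), -1.2 (diverges).
for lr in (0.001, 0.1, 1.1):
    print(f"lr={lr}: theta after 50 steps = {descend_quadratic(lr):.6g}")


# Part 2: minibatch SGD on toy data from y = 2x + 1 (true minimum at w = 2, b = 1).
def minibatch_sgd(xs, ys, learning_rate=0.02, batch_size=2, steps=5000, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(steps):
        batch = rng.sample(indices, batch_size)  # random subset of examples
        grad_w = grad_b = 0.0
        for i in batch:
            residual = (w * xs[i] + b) - ys[i]
            grad_w += 2.0 * residual * xs[i] / batch_size
            grad_b += 2.0 * residual / batch_size
        # Same update rule as full-batch descent, but with a noisy gradient estimate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
print("minibatch SGD:", minibatch_sgd(xs, ys))
```

Each minibatch step touches only a fraction of the data, which is what makes stochastic variants cheap on large datasets; the price is a noisier path, which is one reason learning rate and batch size interact.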
For investors, gradient descent is not something you interact with directly, but understanding it clarifies what model training actually is: an iterative numerical search, not magic. It also explains why training choices such as learning rate, batch size, and stopping point matter, and why two training runs (for example, with different random initializations or batch orderings) can land on slightly different solutions. When combined with regularization, the optimization is steered toward solutions that generalize to new data rather than merely memorize the training set.
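To show how regularization plugs into the same optimization loop, the sketch below adds an L2 penalty (closely related to weight decay) on the slope w to the mean-squared-error loss; its gradient, 2·λ·w, pulls the weight toward zero at every step. The choice of an L2 penalty, the strength λ = 1, leaving the bias unpenalized, and the function name train are assumptions for illustration. On this noiseless toy data the penalty simply shrinks the fitted slope, which demonstrates the mechanism rather than the generalization benefit seen on real, noisy data.

```python
# L2-regularized (ridge) linear regression trained by gradient descent: illustrative sketch.
# Loss: mean squared error + l2_strength * w**2 (the bias b is left unpenalized here).

def train(xs, ys, l2_strength=0.0, learning_rate=0.05, steps=2000):
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            residual = (w * x + b) - y
            grad_w += 2.0 * residual * x / n
            grad_b += 2.0 * residual / n
        # The penalty's gradient, 2 * l2_strength * w, always pulls w toward zero.
        grad_w += 2.0 * l2_strength * w
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
print("no penalty:    ", train(xs, ys))       # approaches (2, 1)
print("l2_strength=1: ", train(xs, ys, 1.0))  # approaches (4/3, 7/3): a smaller slope
```

For this data the penalized optimum can be checked by hand: the slope becomes cov(x, y) / (var(x) + λ) = 4 / (2 + 1) = 4/3, and the bias adjusts to b = mean(y) − w·mean(x) = 7/3.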
hedgewing.ai's four deep-learning models (LSTM, GRU, TCN, and Transformer) are all trained with gradient-based optimization on roughly 45 engineered features. The learned parameters are then validated through nightly walk-forward backtesting, which checks whether what the optimizer found on historical data holds up on subsequent out-of-sample periods.
Related terms
Neural Network · Deep Learning · Supervised Learning · Regularization