Mar 27, 2020
I once taught “Learning Rate” as “Desperate Index”. You should use something like 0.001 so you aren’t too desperate for a solution :)
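For what it’s worth, here is a minimal sketch (assuming PyTorch and its Adam optimizer; the tiny linear model is just a stand-in for illustration) of what a modest learning rate looks like in practice:

```python
import torch

# Stand-in model for illustration; any torch.nn.Module works the same way.
model = torch.nn.Linear(10, 1)

# A "not too desperate" learning rate of 0.001.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```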
Besides that, I don’t really see gradient clipping or the exploding/vanishing gradient problems mentioned much anymore. I know they still occur; I just don’t see them being as relevant today as they once were.
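In case it helps, a rough sketch (assuming PyTorch; the model, dummy data, and hyperparameters are placeholders) of how gradient clipping typically slots into a training step:

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
criterion = torch.nn.MSELoss()

x, y = torch.randn(32, 10), torch.randn(32, 1)  # dummy batch

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()

# Rescale gradients so their global norm is at most 1.0,
# which guards against exploding gradients.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```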
On a side note about batch norm: I recently became aware that you can train a ResNet for CIFAR-10 by training only the batch norm layers. I have written an article on that, and I think you might like it.
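If it’s useful, here’s a hedged sketch (assuming PyTorch and torchvision’s resnet18; the exact setup in my article may differ) of freezing everything except the batch norm parameters:

```python
import torch
import torchvision

# 10 output classes for CIFAR-10.
model = torchvision.models.resnet18(num_classes=10)

# Freeze all parameters first, then re-enable gradients
# for the BatchNorm layers only.
for param in model.parameters():
    param.requires_grad = False
for module in model.modules():
    if isinstance(module, torch.nn.BatchNorm2d):
        for param in module.parameters():
            param.requires_grad = True

# Optimize only the batch norm parameters.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=0.001)
```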