Ygor Serpa
1 min read · Mar 27, 2020


I once taught “Learning Rate” as a “Desperation Index”: you should use something like 0.001 so as not to be too desperate for a solution :)

Besides that, I don’t really see gradient clipping or the exploding/vanishing gradient problems mentioned as much anymore. I know they still occur; I just don’t see them being as relevant today as they once were.
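
For anyone unfamiliar with the trick, here is a minimal PyTorch sketch of gradient clipping (the tiny model, data, and the max_norm=1.0 value are placeholders of my own, not anything from the discussion above):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()

x, y = torch.randn(32, 10), torch.randn(32, 1)  # dummy batch

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
# Rescale gradients so their global norm is at most 1.0,
# guarding against exploding gradients before the update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```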

A side note on batch norm: I recently became aware that you can train a ResNet on CIFAR-10 by training only the batch norm layers. I have written an article on that; I think you might like it.
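
The setup is roughly this (a sketch of my own, not the article’s code): freeze every parameter, then re-enable only the affine parameters (gamma and beta) of the BatchNorm2d modules and optimize those.

```python
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(num_classes=10)  # 10 classes for CIFAR-10

# Freeze everything...
for param in model.parameters():
    param.requires_grad = False

# ...then unfreeze only the batch norm layers (their gamma and beta).
for module in model.modules():
    if isinstance(module, nn.BatchNorm2d):
        for param in module.parameters():
            param.requires_grad = True

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1, momentum=0.9)
```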
