Aug 17, 2021
I don't know the details of the problem you solved, but I did something similar once. A quick tip: split your network at the first dropout layer. If you're working with images and only apply dropout in the last layers, you can run the part before the first dropout once, keep its output, and then run only the last layers several times. That is much faster than running the whole network over and over.
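Here is a minimal pure-Python sketch of the split, assuming you're sampling with dropout active at inference time (Monte Carlo dropout) for a regression output. `trunk`, `head`, `HEAD_WEIGHTS` and `mc_dropout_predict` are hypothetical toy stand-ins, not anyone's real model. In practice `trunk` would be your conv layers up to the first dropout and `head` everything after it.

```python
import random
import statistics

# Hypothetical toy model split at the first dropout layer.

def trunk(x):
    """Deterministic part before the first dropout (expensive in a real CNN)."""
    trunk.calls += 1  # count invocations to show it runs only once per prediction
    # Toy feature extractor: ReLU of a fixed affine map.
    return [max(0.0, 0.5 * v + 0.1) for v in x]

trunk.calls = 0

HEAD_WEIGHTS = [0.3, -0.2, 0.8, 0.5]  # hypothetical weights of the output layer

def head(features, p, rng):
    """Inverted dropout (drop prob p) followed by a linear regression output."""
    keep = 1.0 - p
    dropped = [f / keep if rng.random() < keep else 0.0 for f in features]
    return sum(w * f for w, f in zip(HEAD_WEIGHTS, dropped))

def mc_dropout_predict(x, n_samples=100, p=0.5, seed=0):
    """Run the trunk once, then sample the dropout head n_samples times."""
    rng = random.Random(seed)
    features = trunk(x)  # cached: reused for every stochastic pass
    samples = [head(features, p, rng) for _ in range(n_samples)]
    # Mean is the prediction; stdev is a rough uncertainty estimate.
    return statistics.mean(samples), statistics.stdev(samples)
```

For example, `mc_dropout_predict([1.0, 2.0, 3.0, 4.0], n_samples=200)` calls `trunk` once and `head` 200 times. With `p=0.0` every pass is identical, so the standard deviation is 0 and the mean is the plain deterministic output.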
Regression does indeed have its peculiarities. Batch norm also isn't well suited to regression.