Ygor Serpa
Aug 17, 2020


I find the idea interesting, but the experimentation is lacking. CIFAR-10 is a small dataset that is really fast to train on, so you could have tried more learning-rate variations than just three (including reduced learning rates). Besides, there is another easy way to do pooling that has the benefits you mentioned but that you didn't include: a 2x2 convolution with stride 2, sometimes called a learnable downsampling function. People usually use 4x4 convolutions with stride 2 for that, but 2x2 should work fine, as in the sketch below.
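For concreteness, here is a minimal sketch of that learnable downsampling, assuming PyTorch (the channel count and input shape are placeholders, not from your post):

```python
import torch
import torch.nn as nn

# Drop-in replacement for a 2x2 pooling layer: a 2x2 convolution with
# stride 2 that learns its own downsampling weights.
# (channels is a placeholder; use whatever your feature maps have.)
channels = 64
learnable_pool = nn.Conv2d(channels, channels, kernel_size=2, stride=2)

x = torch.randn(8, channels, 32, 32)  # e.g., a batch of CIFAR-10-sized feature maps
y = learnable_pool(x)                 # spatial size halves: (8, 64, 16, 16)
```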

Another thing to look at is pool size. Maybe this approach lets you pool larger windows with similar accuracy; that would be a huge win. You might also try to adapt the softmax function to be "weaker", so that it lies closer to the middle between average and max pooling instead of being "almost max pooling".
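One way that "weaker" softmax could look is a temperature parameter: dividing each window's activations by a temperature above 1 flattens the weights toward average pooling, while temperatures near 0 recover max pooling. A minimal PyTorch sketch, with the function name and parameters mine rather than from your post:

```python
import torch
import torch.nn.functional as F

def softmax_pool2d(x, kernel_size=2, temperature=1.0):
    """Softmax-weighted pooling over non-overlapping windows.

    temperature > 1 flattens the weights (closer to average pooling);
    temperature -> 0 sharpens them (closer to max pooling).
    Sketch only -- not necessarily how the original post implemented it.
    """
    b, c, h, w = x.shape
    # Extract non-overlapping kernel_size x kernel_size patches:
    # (b, c * k*k, num_windows) -> (b, c, k*k, num_windows)
    patches = F.unfold(x, kernel_size, stride=kernel_size)
    patches = patches.view(b, c, kernel_size * kernel_size, -1)
    # Each window's values weight themselves via a tempered softmax.
    weights = F.softmax(patches / temperature, dim=2)
    pooled = (weights * patches).sum(dim=2)
    return pooled.view(b, c, h // kernel_size, w // kernel_size)

x = torch.randn(8, 64, 32, 32)
y = softmax_pool2d(x, kernel_size=2, temperature=2.0)  # softer than plain softmax pooling
```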

