Mar 27, 2020
I don’t know if you are aware of the “Reformer” and “Stop Thinking with Your Head” papers. Both propose algorithmic changes that make attention-based models lighter than a standard transformer. I wonder if we are going to see a tiny Reformer in the near future.