The data collection is done from the game assets themselves, to which I have access as one of the game developers.
The main issue, though, is that many of the in-game sprites are composed of two or three sub-sprites using layers (such as Lucy and Lucy's tail). Thus, I had to port some of the game's layering / transform / animation code to Python in order to merge the sprites as they appear in-game.
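At its core, that layer merging boils down to alpha-compositing each sub-sprite onto a shared canvas with the standard "over" operator. Here is a minimal sketch of the idea, using NumPy rather than the actual game code (the function name and the offset-only "transform" are my simplifications for illustration):

```python
import numpy as np

def composite_over(base, overlay, offset=(0, 0)):
    """Alpha-composite `overlay` onto `base` at the given (row, col)
    offset, using the standard 'over' operator. Both inputs are RGBA
    float arrays with channel values in [0, 1]."""
    out = base.copy()
    r0, c0 = offset
    h, w = overlay.shape[:2]
    region = out[r0:r0 + h, c0:c0 + w]  # view into the output canvas

    a_top = overlay[..., 3:4]           # overlay alpha
    a_bot = region[..., 3:4]            # canvas alpha underneath
    a_out = a_top + a_bot * (1 - a_top)

    # Blend premultiplied colors, then un-premultiply where visible.
    rgb = overlay[..., :3] * a_top + region[..., :3] * a_bot * (1 - a_top)
    region[..., :3] = np.divide(rgb, a_out,
                                out=np.zeros_like(rgb), where=a_out > 0)
    region[..., 3:4] = a_out
    return out

# Example: paste a 2x2 opaque blue patch onto a 4x4 opaque red canvas.
body = np.zeros((4, 4, 4)); body[..., 0] = 1.0; body[..., 3] = 1.0
tail = np.zeros((2, 2, 4)); tail[..., 2] = 1.0; tail[..., 3] = 1.0
merged = composite_over(body, tail, offset=(1, 1))
```

The real pipeline additionally has to apply each layer's per-frame transform before compositing, which is the part that required porting the game's animation code.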
I couldn't get all of this done by the time I published this article, so I had to use characters that don't rely on layering too much. This is one of the reasons, besides those already mentioned, that Sarah is the main evaluation dataset.
After getting the sprites right, I cropped them to 256x256 for performance; this greatly improves training speed while losing barely any detail.
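The crop itself is straightforward: place each sprite on a fixed 256x256 canvas, cutting off the excess for large sprites and zero-padding small ones so every training sample has the same shape. A sketch of that step (my own hypothetical helper, not the actual pipeline code):

```python
import numpy as np

def crop_to_canvas(img, size=256):
    """Center-crop (or zero-pad, if smaller) an RGBA image array
    onto a fixed size x size canvas."""
    h, w = img.shape[:2]
    out = np.zeros((size, size) + img.shape[2:], dtype=img.dtype)
    ch, cw = min(h, size), min(w, size)        # extent to copy
    sy, sx = (h - ch) // 2, (w - cw) // 2      # source top-left
    dy, dx = (size - ch) // 2, (size - cw) // 2  # dest top-left
    out[dy:dy + ch, dx:dx + cw] = img[sy:sy + ch, sx:sx + cw]
    return out
```

A fixed, power-of-two resolution like 256x256 also plays nicely with the downsampling stages of a U-Net-style generator.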
Regarding Pix2Pix: Lucy's 500 or so sprites did yield decent results on the shading problem, so I would say it is a yes. For the coloring problem, though, Pix2Pix is not the best option. A semantic segmentation model would likely have fared better as a generator than the U-Net regression model Pix2Pix uses, and I don't believe adding more sprites would help that problem much further.
I hope this answers your questions; feel free to ask more.
Hopefully, this year I will be bringing better results (and maybe a released game as well). I will keep you posted when there is news.