The state of the art in image generation is BigGAN.
They are pretty fun.
What is more, they make it clear that the latent space clearly captures very meaningful shared properties across classes. The poses of quite different animals are conserved, and “cat eyes” clearly map onto “dog eyes” during interpolation. These sort of properties suggest that the network ‘understands’ the scene it is generating.
Here are some more:
(this one moves in latent space as well as class space, hence the change of pose:)
Churches to mosques: