Pix2pix is a conditional adversarial network, an extension of the unconditional generative adversarial network (GAN). It is used for paired image-to-image translation: an image is given as input, and the translated version of that image is generated as output. Like any GAN, it comprises two networks, a generator and a discriminator. The generator is a U-Net architecture, which downsamples an image to a bottleneck layer and then upsamples it again, with skip connections linking mirrored downsampling and upsampling layers.
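To make the downsample/bottleneck/upsample-with-skips structure concrete, here is a minimal U-Net-style generator sketch in PyTorch. This is an illustrative toy (3-channel 64x64 inputs, two down blocks), not the full pix2pix generator, which stacks eight down/up blocks for 256x256 images; all layer sizes here are my own choices.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy U-Net generator: downsample to a bottleneck, upsample with skips."""

    def __init__(self):
        super().__init__()
        # Downsampling path: each conv halves the spatial size.
        self.down1 = nn.Sequential(nn.Conv2d(3, 32, 4, 2, 1), nn.LeakyReLU(0.2))
        self.down2 = nn.Sequential(nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2))
        # Bottleneck: the most compressed representation.
        self.bottleneck = nn.Sequential(nn.Conv2d(64, 64, 4, 2, 1), nn.ReLU())
        # Upsampling path: transposed convs; inputs include concatenated skips.
        self.up1 = nn.Sequential(nn.ConvTranspose2d(64, 64, 4, 2, 1), nn.ReLU())
        self.up2 = nn.Sequential(nn.ConvTranspose2d(64 + 64, 32, 4, 2, 1), nn.ReLU())
        self.up3 = nn.Sequential(nn.ConvTranspose2d(32 + 32, 3, 4, 2, 1), nn.Tanh())

    def forward(self, x):
        d1 = self.down1(x)                       # (N, 32, 32, 32)
        d2 = self.down2(d1)                      # (N, 64, 16, 16)
        b = self.bottleneck(d2)                  # (N, 64, 8, 8)
        u1 = self.up1(b)                         # (N, 64, 16, 16)
        u2 = self.up2(torch.cat([u1, d2], 1))    # skip connection from d2
        return self.up3(torch.cat([u2, d1], 1))  # skip connection from d1

out = TinyUNet()(torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64]) -- same size as the input
```

The skip connections carry fine spatial detail from the encoder directly to the decoder, which would otherwise be lost in the bottleneck.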
For the discriminator, a PatchGAN model is used: a deep convolutional neural network that classifies patches of an image as real or fake, rather than judging the whole image at once. The discriminator is run convolutionally across the image, and all the patch responses are averaged into a single score.
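A PatchGAN-style discriminator can be sketched as below. This toy version has fewer layers than the 70x70 PatchGAN used in the pix2pix paper, and the layer sizes are my own assumptions; the key property is that it outputs a grid of per-patch scores rather than one scalar.

```python
import torch
import torch.nn as nn

# Toy PatchGAN discriminator: the conditioning image and the real/generated
# image are concatenated channel-wise (3 + 3 = 6 input channels).
patch_disc = nn.Sequential(
    nn.Conv2d(6, 64, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, 4, 1, 1),  # 1-channel score map (logits, no sigmoid)
)

src = torch.randn(1, 3, 64, 64)  # input (conditioning) image
tgt = torch.randn(1, 3, 64, 64)  # real or generated image
scores = patch_disc(torch.cat([src, tgt], dim=1))
print(scores.shape)              # torch.Size([1, 1, 15, 15]): one logit per patch
single_score = scores.mean()     # average the patch responses into one score
```

Each element of the 15x15 output map judges one receptive-field patch of the input, so the discriminator only penalizes structure at the patch scale, leaving low-frequency correctness to the L1 term of the loss.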
In the pix2pix GAN, an adversarial loss (binary cross-entropy) and an L1 loss (the mean absolute error between the target image and the generated image) are combined into a composite loss for training the generator.
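The composite generator objective can be written as a short function. The BCE term pushes the discriminator's patch logits toward "real", and the L1 term is weighted by a factor lambda (the pix2pix paper uses lambda = 100); the example tensor shapes below are illustrative.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # adversarial loss on raw (pre-sigmoid) logits
l1 = nn.L1Loss()              # mean absolute error
LAMBDA = 100.0                # L1 weight from the pix2pix paper

def generator_loss(disc_logits, generated, target):
    # The generator wants every patch to be classified as real (label 1).
    adversarial = bce(disc_logits, torch.ones_like(disc_logits))
    return adversarial + LAMBDA * l1(generated, target)

# Illustrative shapes: a 15x15 patch-logit map and 64x64 RGB images.
fake_logits = torch.randn(1, 1, 15, 15)
gen_img = torch.rand(1, 3, 64, 64)
real_img = torch.rand(1, 3, 64, 64)
loss = generator_loss(fake_logits, gen_img, real_img)
```

Because `BCEWithLogitsLoss` applies the sigmoid internally, the discriminator sketch above can emit raw logits; the heavy L1 weighting is what keeps the outputs close to the paired ground truth while the adversarial term adds sharpness.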
The dataset I used can be downloaded from here: http://efrosgans.eecs.berkeley.edu/pix2pix/datasets/