Picture that Sketch:

Photorealistic Image Generation from Abstract Sketches

¹SketchX, CVSSP, University of Surrey, United Kingdom  ²iFlyTek-Surrey Joint Research Centre on Artificial Intelligence

(a) Set of photos generated by the proposed method. (b) While existing methods can generate faithful photos from perfectly pixel-aligned edgemaps, they fall drastically short on highly deformed and sparse free-hand sketches. In contrast, our autoregressive sketch-to-photo generation model produces highly photorealistic outputs from highly abstract sketches.


Given an abstract, deformed, ordinary sketch from untrained amateurs like you and me, this paper turns it into a photorealistic image, just like those shown in Fig. 1(a), all non-cherry-picked. We differ significantly from prior art in that we do not dictate an edgemap-like sketch to start with, but aim to work with abstract free-hand human sketches. In doing so, we essentially democratise the sketch-to-photo pipeline, "picturing" a sketch regardless of how well you sketch. Our contribution at the outset is a decoupled encoder-decoder training paradigm, where the decoder is a StyleGAN trained on photos only. This importantly ensures that generated results are always photorealistic. The rest is then all centred around how best to deal with the abstraction gap between sketch and photo. To that end, we propose an autoregressive sketch mapper, trained on sketch-photo pairs, that maps a sketch to the StyleGAN latent space. We further introduce specific designs to tackle the abstract nature of human sketches, including a fine-grained discriminative loss built on a pre-trained sketch-photo retrieval model, and a partial-aware sketch augmentation strategy. Finally, we showcase a few downstream tasks our generation model enables, among them showing how fine-grained sketch-based image retrieval, a well-studied problem in the sketch community, can be reduced to a (generated) image-to-image retrieval task, surpassing the state of the art. We put forward generated results in the supplementary for everyone to scrutinise.
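The decoupled design above can be sketched in a few lines: a learned mapper predicts a latent code in the frozen decoder's space. This is only an illustrative toy, not the paper's implementation — the dimensions, the random linear map standing in for the trained autoregressive mapper, and the function names are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions. StyleGAN's extended W+ space is 18 x 512 for 1024px
# outputs; the 32x32 raster and random linear map below are stand-ins
# for a real sketch encoder and the paper's trained sketch mapper.
N_LAYERS, LATENT_DIM = 18, 512
SKETCH_DIM = 32 * 32

W_map = rng.normal(scale=0.01, size=(SKETCH_DIM, N_LAYERS * LATENT_DIM))

def sketch_to_latent(sketch):
    """Map a flattened sketch raster to an (18, 512) W+ latent code."""
    return (sketch.reshape(-1) @ W_map).reshape(N_LAYERS, LATENT_DIM)

# The decoder (a StyleGAN trained on photos only) stays frozen while the
# mapper is trained, so any predicted latent decodes to a photorealistic
# image; only the mapping from sketch to latent is learned.
sketch = (rng.random((32, 32)) > 0.95).astype(np.float32)  # sparse strokes
w_plus = sketch_to_latent(sketch)
print(w_plus.shape)  # (18, 512)
```

The key design choice is that photorealism comes for free from the frozen photo-only decoder; the mapper only has to bridge the abstraction gap in latent space.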


The sketch mapper aims to predict the latent code of the associated photo in the manifold of a pre-trained GAN.


The sketch mapper learns to map a sketch to the latent code of its paired photo in a pre-trained StyleGAN manifold, trained with a mix of reconstruction, fine-grained discriminative, and distillation losses.
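The three training signals named in the caption can be illustrated with toy loss functions. This is a hedged sketch only: the function names, the triplet formulation of the fine-grained discriminative loss, and the weighting coefficients are assumptions, not the paper's exact formulation:

```python
import numpy as np

def reconstruction_loss(w_pred, w_gt):
    """L2 distance between predicted and ground-truth latent codes."""
    return float(np.mean((w_pred - w_gt) ** 2))

def discriminative_loss(f_gen, f_pos, f_neg, margin=0.2):
    """Triplet-style fine-grained loss: the generated photo's embedding
    (from a pre-trained sketch-photo retrieval model) should lie closer
    to the paired photo than to a non-matching photo."""
    d_pos = np.linalg.norm(f_gen - f_pos)
    d_neg = np.linalg.norm(f_gen - f_neg)
    return float(max(0.0, d_pos - d_neg + margin))

def distillation_loss(f_student, f_teacher):
    """Match the mapper's features to those of a teacher network."""
    return float(np.mean((f_student - f_teacher) ** 2))

def total_loss(w_pred, w_gt, f_gen, f_pos, f_neg, f_s, f_t,
               lam_rec=1.0, lam_disc=0.5, lam_dist=0.5):
    # Weighting coefficients are illustrative, not the paper's values.
    return (lam_rec * reconstruction_loss(w_pred, w_gt)
            + lam_disc * discriminative_loss(f_gen, f_pos, f_neg)
            + lam_dist * distillation_loss(f_s, f_t))
```

In this reading, the reconstruction term anchors the mapper to the paired photo's latent, while the discriminative term pushes the generated photo's retrieval features toward its ground-truth match at a fine-grained level.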


Qualitative comparison with various state-of-the-art competitors on the ShoeV2 dataset. The Ours-ref results (column 3) show that our method can faithfully replicate the appearance of a given reference photo (shown in the top-right inset).

Precise semantic editing.

Effect of noisy stroke addition (left) and generation from partial sketches (right).
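The robustness to noisy and partial inputs shown above is encouraged during training by the partial-aware sketch augmentation mentioned in the abstract. Below is a minimal illustration of the idea only; the stroke-list representation, function name, and parameters are assumptions rather than the paper's implementation:

```python
import random

def partial_augment(strokes, keep_min=0.5, noise=0.0, seed=None):
    """Keep a random prefix of the stroke sequence, covering at least
    `keep_min` of the strokes, and optionally jitter point coordinates
    to mimic noisy free-hand drawing (illustrative sketch only)."""
    rng = random.Random(seed)
    n = len(strokes)
    keep = rng.randint(max(1, int(n * keep_min)), n)
    return [
        [(x + rng.gauss(0, noise), y + rng.gauss(0, noise)) for (x, y) in stroke]
        for stroke in strokes[:keep]
    ]

# Example: a 4-stroke sketch; with noise=0 the kept strokes are unchanged.
sketch = [[(0, 0), (1, 1)], [(1, 1), (2, 0)], [(2, 0), (3, 1)], [(3, 1), (4, 0)]]
aug = partial_augment(sketch, seed=0)
```

Training the mapper on such truncated, jittered variants exposes it to the kinds of incomplete sketches it must handle at test time.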

Generalisation across sketch styles.

Multi-modal generation showing varied colour and appearance features.

Qualitative results on ShoeV2, ChairV2, and Handbag datasets.

Generating transitional photos between two given sketches.


@inproceedings{koley2023picture,
  title={{Picture that Sketch: Photorealistic Image Generation from Abstract Sketches}},
  author={Subhadeep Koley and Ayan Kumar Bhunia and Aneeshan Sain and Pinaki Nath Chowdhury and Tao Xiang and Yi-Zhe Song},
  booktitle={CVPR},
  year={2023}
}

Copyright: CC BY-NC-SA 4.0 © Subhadeep Koley | Last updated: 23 May 2023 | Template Credit: Nerfies