It's All About Your Sketch:

Democratising Sketch Control in Diffusion Models

1SketchX, CVSSP, University of Surrey, United Kingdom; 2iFlyTek-Surrey Joint Research Centre on Artificial Intelligence

Top-left: Comparison of images generated by our method with SGDM, ControlNet, and T2I-Adapter. Top-right: A set of photos generated by our method. Bottom: While existing methods generate realistic images from pixel-perfect edgemaps, they perform sub-optimally for freehand abstract sketches.

Abstract

This paper unravels the potential of sketches for diffusion models, addressing the deceptive promise of direct sketch control in generative AI. We importantly democratise the process, enabling amateur sketches to generate precise images, living up to the commitment of "what you sketch is what you get". A pilot study underscores the necessity, revealing that deformities in existing models stem from spatial conditioning. To rectify this, we propose an abstraction-aware framework, utilising a sketch adapter, adaptive time-step sampling, and discriminative guidance from a pre-trained fine-grained sketch-based image retrieval model, all working synergistically to reinforce fine-grained sketch-photo association. Our approach operates seamlessly during inference without the need for textual prompts; a simple, rough sketch akin to what you and I can create suffices! We welcome everyone to examine the results presented in the paper and its supplementary material. Our contributions include democratising sketch control, introducing an abstraction-aware framework, and leveraging discriminative guidance, validated through extensive experiments.
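The three ingredients named above can be pictured with a toy, self-contained PyTorch training step: a sketch adapter conditioning a denoiser, abstraction-aware time-step sampling, and a discriminative loss from a frozen fine-grained SBIR embedder. Every module, shape, schedule, and loss weight below is a placeholder assumption for illustration only, not the paper's released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SketchAdapter(nn.Module):          # placeholder sketch adapter
    def __init__(self, ch=4):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.SiLU(),
                                 nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, sketch):           # (B,1,H,W) sketch -> conditioning features
        return self.net(sketch)

class ToyDenoiser(nn.Module):            # stand-in for the diffusion U-Net
    def __init__(self, ch=4):
        super().__init__()
        self.net = nn.Conv2d(ch, ch, 3, padding=1)
    def forward(self, z_t, t, cond):     # adapter features injected additively (toy)
        return self.net(z_t + cond)

class ToyFGSBIR(nn.Module):              # stand-in for the frozen FG-SBIR embedder
    def __init__(self, ch=4, dim=8):
        super().__init__()
        self.proj = nn.Conv2d(ch, dim, 1)
    def forward(self, x):
        return F.normalize(self.proj(x).mean(dim=(2, 3)), dim=-1)

def sample_timesteps(abstraction, T=1000):
    # Abstraction-aware sampling (placeholder bias, not the paper's sampler):
    # more abstract sketches draw noisier (larger) time-steps.
    u = torch.rand_like(abstraction)
    return (u ** (1.0 / (1.0 + abstraction)) * T).long().clamp(0, T - 1)

adapter, denoiser, fgsbir = SketchAdapter(), ToyDenoiser(), ToyFGSBIR()
for p in fgsbir.parameters():            # discriminative guide stays frozen
    p.requires_grad_(False)

z0 = torch.randn(2, 4, 32, 32)           # clean latents (toy data)
sketch = torch.rand(2, 1, 32, 32)        # rasterised sketches (toy data)
abstraction = torch.rand(2)              # per-sketch abstraction score (toy)

t = sample_timesteps(abstraction)
noise = torch.randn_like(z0)
alpha = (1 - t.float() / 1000).view(-1, 1, 1, 1)   # toy linear schedule
z_t = alpha.sqrt() * z0 + (1 - alpha).sqrt() * noise

cond = adapter(F.interpolate(sketch, size=z_t.shape[-2:]))
pred_noise = denoiser(z_t, t, cond)
z0_hat = (z_t - (1 - alpha).sqrt() * pred_noise) / alpha.sqrt()

diffusion_loss = F.mse_loss(pred_noise, noise)
# Discriminative guidance (stand-in): pull the embeddings of the sketch
# features and the predicted clean latent together; the actual method uses a
# pre-trained FG-SBIR model, which this toy embedder merely mimics.
guidance_loss = 1 - F.cosine_similarity(fgsbir(cond), fgsbir(z0_hat)).mean()

loss = diffusion_loss + 0.1 * guidance_loss        # illustrative weighting
loss.backward()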

Live demo comparison with T2I-Adapter and ControlNet.

Pilot Study

Images generated by T2I-Adapter for different sketch-guidance factors (ω ∈ [0, 1]). Determining the optimum ω that strikes an ideal balance (green-bordered) between photorealism and sketch-fidelity requires manual intervention and is sample-specific: a high value of ω works well for well-drawn sketches, whereas the same value applied to a highly abstract sketch produces deformed outputs, and vice versa.
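For reference, such a sweep over the guidance factor can be scripted with the Hugging Face diffusers adapter pipeline, assuming the publicly hosted SD v1.5 base and the TencentARC sketch-adapter checkpoint named below; the input file, prompt, and ω grid are illustrative.

import torch
from PIL import Image
from diffusers import StableDiffusionAdapterPipeline, T2IAdapter

adapter = T2IAdapter.from_pretrained("TencentARC/t2iadapter_sketch_sd15v2",
                                     torch_dtype=torch.float16)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", adapter=adapter,
    torch_dtype=torch.float16).to("cuda")

sketch = Image.open("freehand_sketch.png").convert("L")   # assumed input sketch
for w in [0.2, 0.4, 0.6, 0.8, 1.0]:                       # illustrative ω grid
    out = pipe("a photo of a cat", image=sketch,
               adapter_conditioning_scale=w).images[0]
    out.save(f"omega_{w:.1f}.png")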

Architecture

Our overall training pipeline.

Results

Qualitative comparison with SOTA sketch-to-image generation models on Sketchy. For ControlNet, T2I-Adapter, and PITI, we use the fixed prompt “a photo of [CLASS]”, with [CLASS] replaced with corresponding class-labels of the input sketches.
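For clarity, the baseline prompts are formed by simple substitution (our method itself needs no prompt); the class labels below are illustrative.

class_labels = ["cat", "songbird", "shoe"]            # illustrative labels
prompts = [f"a photo of {c}" for c in class_labels]   # "a photo of [CLASS]"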

Examples showing generalisation potential across different datasets (left) and stroke-styles (right).

Illustration of cross-model generalisation. Our method, trained with SD v1.5, performs well on other unseen SD variants (e.g., v1.4) without further fine-tuning.
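A minimal illustration of what such cross-model reuse looks like, assuming the publicly hosted SD v1.5 and v1.4 U-Net weights on the Hugging Face Hub; the sketch-adapter features are faked with a placeholder tensor, and the additive injection at the U-Net input is a simplification, not the paper's actual conditioning path.

import torch
from diffusers import UNet2DConditionModel

z_t = torch.randn(1, 4, 64, 64)        # noisy latent (toy)
text_emb = torch.zeros(1, 77, 768)     # zero tensor standing in for CLIP text embeddings
cond = torch.zeros(1, 4, 64, 64)       # stand-in for trained sketch-adapter features

for repo in ["runwayml/stable-diffusion-v1-5", "CompVis/stable-diffusion-v1-4"]:
    unet = UNet2DConditionModel.from_pretrained(repo, subfolder="unet")
    with torch.no_grad():
        eps = unet(z_t + cond, timestep=500, encoder_hidden_states=text_emb).sample
    print(repo, tuple(eps.shape))      # same conditioning, different frozen base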

Examples depicting the effect of adding noisy strokes (left) and generation from partially-completed sketches (right).

Our method seamlessly transfers local semantic edits on input sketches into output photos.

Additional qualitative comparison with SOTAs. ControlNet, T2I-Adapter, and PITI use the same fixed prompt “a photo of [CLASS]” as above.

Images generated with our method.


BibTeX

@inproceedings{koley2024handle,
title={{It's All About Your Sketch: Democratising Sketch Control in Diffusion Models}},
author={Koley, Subhadeep and Bhunia, Ayan Kumar and Sekhri, Deeptanshu and Sain, Aneeshan and Chowdhury, Pinaki Nath and Xiang, Tao and Song, Yi-Zhe},
booktitle={CVPR},
year={2024}
}

Copyright: CC BY-NC-SA 4.0 © Subhadeep Koley | Last updated: 05 April 2024 | Good artists copy, great artists steal.