Abstract

We discuss recent advances in style transfer presented at CVPR ’17 and SIGGRAPH ’17, and provide pointers to the relevant papers.

Introduction

Since the introduction of neural style transfer by [Gatys et al. 2016], several commercial applications harnessing the underlying deep model have appeared and gained arguably large popularity.

The idea behind the scenes is quite simple: extract style and content features and generate a fused image based on those features. How? Deep neural networks come to the rescue. A pre-trained convnet is used to extract features from the style and the content image. Then, starting from a randomly initialized image, one may iteratively “refine” it by minimizing a notion of loss between the generated image and the features extracted at each layer. The combined content and style losses (closely related to the perceptual losses of [Johnson et al. 2016]) are then backpropagated to the output image itself.

After several iterations, we may then expect a stylized image that takes its structural content from the source image and its high-level style traits (e.g., texture patches, color histograms, etc.) from the reference image.
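To make this recipe concrete, below is a minimal PyTorch sketch of the optimization loop. The layer indices, loss weights, optimizer, and preprocessing assumptions are mine, chosen for illustration; this is not the authors’ implementation.

```python
# Minimal neural style transfer sketch (illustrative only; layer choices and
# loss weights are assumptions, not the exact settings of Gatys et al.).
# Inputs are assumed to be (1, 3, H, W) tensors, ImageNet-normalized.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

device = "cuda" if torch.cuda.is_available() else "cpu"
vgg = vgg19(pretrained=True).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)            # only the image is optimized

STYLE_LAYERS = [0, 5, 10, 19, 28]      # conv1_1 .. conv5_1 (assumed choice)
CONTENT_LAYER = 21                     # conv4_2 (assumed choice)

def extract(img):
    """Run `img` through VGG and collect the feature maps we need."""
    feats, x = {}, img
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS or i == CONTENT_LAYER:
            feats[i] = x
    return feats

def gram(feat):
    """Gram matrix: channel-to-channel correlations, the style statistic."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def stylize(content, style, steps=300, style_weight=1e6):
    target_c = extract(content)[CONTENT_LAYER].detach()
    target_s = {i: gram(f).detach()
                for i, f in extract(style).items() if i in STYLE_LAYERS}
    img = content.clone().requires_grad_(True)   # init from content (or noise)
    opt = torch.optim.Adam([img], lr=0.02)
    for _ in range(steps):
        feats = extract(img)
        content_loss = F.mse_loss(feats[CONTENT_LAYER], target_c)
        style_loss = sum(F.mse_loss(gram(feats[i]), target_s[i])
                         for i in STYLE_LAYERS)
        loss = content_loss + style_weight * style_loss
        opt.zero_grad()
        loss.backward()                 # gradients flow to the image itself
        opt.step()
    return img.detach()
```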

Unlike traditional approaches, which mostly operate on lower-level features such as texture patches, neural style transfer has proven more powerful at transferring higher-level features. And indeed, it is more interesting.

Recent Progress

Nevertheless, the model introduced by [Gatys et al. 2016] is not flawless. Indeed, in this year’s CVPR and SIGGRAPH we see several papers attempting to address its various problems and achieving intriguing results.

Multiple Style Transfer

The original model learns each style separately: new styles are not possible without retraining the network on the new reference images. [Chen et al. 2017] attacked this by introducing a bank of convolutional filters that decouples content from style.

Hence new styles can be trained incrementally on top of the existing StyleBank. It is also more convenient for style fusion, which can be done by weighting the filter banks.
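To illustrate the filter-bank idea (a rough sketch under my own shape and architecture assumptions, not the actual StyleBank network): a shared auto-encoder handles content, each style owns one convolution acting on the shared encoding, and fusion is a weighted sum of the per-style filtered features. A new style would be added by appending a fresh bank and training only it, keeping the shared parts fixed.

```python
# StyleBank-flavored sketch: shared encoder/decoder plus one convolution
# ("filter bank") per style. Layer sizes and shapes are illustrative guesses.
import torch
import torch.nn as nn

class StyleBankSketch(nn.Module):
    def __init__(self, n_styles, channels=128):
        super().__init__()
        # Shared encoder/decoder (drastically simplified here).
        self.encoder = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(nn.Conv2d(channels, 3, 3, padding=1))
        # One 3x3 filter bank per style, all acting on the same feature space.
        self.banks = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(n_styles)]
        )

    def forward(self, x, style_weights):
        """`style_weights`: one float per style; fusion is their weighted sum."""
        feat = self.encoder(x)
        fused = sum(w * bank(feat) for w, bank in zip(style_weights, self.banks))
        return self.decoder(fused)

# Usage: blend 70% of style 0 with 30% of style 2 on a content image.
model = StyleBankSketch(n_styles=4)
content = torch.rand(1, 3, 256, 256)
out = model(content, style_weights=[0.7, 0.0, 0.3, 0.0])
```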

Controls and Constraints

The goal of style transfer, at the end of the day, is to achieve perceptually consistent, aesthetically pleasing (well, at least in my opinion) results. The original model may go wild under certain circumstances, for example, due to its lack of context awareness. As has been widely noticed, the style of one context (say, the sky) may be applied to an irrelevant context (say, a house); such discrepancies are one direction people strive to overcome.

We see [Gatys et al. 2017; Luan et al. 2017; Liao et al. 2017] all more or less respond to this issue. [Gatys et al. 2017] introduces several controlling factors that act on spatial context (determining the corresponding regions for transfer, so that transfer only occurs between similar contexts); color (so that patterns may be transferred without altering the colors); and scale (applying separate styles at different scales).
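As a concrete taste of the color control, one strategy discussed in that line of work is to transfer style on luminance only and keep the content image’s chrominance. The helper below is my own sketch of that trick (the stylization routine it assumes could be any style transfer implementation, such as the one sketched earlier); it is not the paper’s code.

```python
# Color-preserving stylization sketch: take luminance from the stylized output
# and chrominance from the original content, so patterns transfer but colors
# stay put. Inputs are assumed to be float RGB arrays in [0, 1] of equal size.
import numpy as np
from PIL import Image

def transfer_preserving_color(content_rgb, stylized_rgb):
    """content_rgb, stylized_rgb: float arrays of shape (H, W, 3) in [0, 1]."""
    to_img = lambda a: Image.fromarray((a * 255).astype(np.uint8))
    content_ycc = np.asarray(to_img(content_rgb).convert("YCbCr"), dtype=np.float32)
    stylized_ycc = np.asarray(to_img(stylized_rgb).convert("YCbCr"), dtype=np.float32)
    merged = content_ycc.copy()
    merged[..., 0] = stylized_ycc[..., 0]      # Y from the stylized result
    out = Image.fromarray(merged.astype(np.uint8), mode="YCbCr").convert("RGB")
    return np.asarray(out, dtype=np.float32) / 255.0
```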

Unlike most existing models, which apply some “painting-like” style to real-world images, the deep photo style transfer proposed by [Luan et al. 2017] learns styles from real photographs. One advantage of this approach is the prevention of structural distortion: in many other algorithms, lines and regions are almost always distorted. This is achieved by constraining the transfer operation to occur only in color space.

A direct consequence of this constraint is that the model can only apply styles expressible in terms of color, e.g., the alternation between day and night, weather variation, etc. Nevertheless, their results are the most photorealistic compared to others.
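To see what a transfer confined to color space can look like in its simplest possible form, here is a Reinhard-style matching of per-channel color statistics between two photographs. This is only an illustrative stand-in: [Luan et al. 2017] actually augment a neural style objective with a matting-Laplacian photorealism regularizer, which is far more involved.

```python
# Illustrative stand-in for "transfer restricted to color space": match the
# per-channel mean and standard deviation of a reference photo. This is NOT
# the method of Luan et al.; it only shows the kind of change (e.g., shifting
# a daytime photo toward a dusk palette) that color-space transfer permits.
import numpy as np

def match_color_statistics(content, reference, eps=1e-6):
    """content, reference: float arrays (H, W, 3) in [0, 1]."""
    c_mean, c_std = content.mean(axis=(0, 1)), content.std(axis=(0, 1))
    r_mean, r_std = reference.mean(axis=(0, 1)), reference.std(axis=(0, 1))
    # Whiten the content colors, then re-color with the reference statistics.
    out = (content - c_mean) / (c_std + eps) * r_std + r_mean
    return np.clip(out, 0.0, 1.0)
```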

[Liao et al. 2017] tackles this problem in a pair-wise, supervised fashion, framed as transfer between visual attributes (that is, higher-level traits). Such attributes are learned by image analogy (that is, by establishing structural correspondences between the input images).

The use of image analogy offers this model a unique merit: style transfer acts on pairs. Given two inputs, the model simultaneously generates two outputs with their styles interchanged. Their results are the most impressive so far as I can tell (interestingly enough, a comparison with [Luan et al. 2017] is also given).
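A toy version of the correspondence step might look like the following: brute-force nearest neighbors between feature maps of the two images. This is my own bare-bones sketch; the actual deep image analogy performs a coarse-to-fine, PatchMatch-style search over deep features, which is far more efficient and robust.

```python
# Toy correspondence sketch: for each spatial location in feature map A, find
# the most similar location in feature map B by cosine similarity.
import torch
import torch.nn.functional as F

def nearest_neighbor_field(feat_a, feat_b):
    """feat_a, feat_b: (C, H, W) features from the same network layer.
    Returns, for every pixel of A, the flat index of its best match in B."""
    c, h, w = feat_a.shape
    a = F.normalize(feat_a.reshape(c, -1), dim=0)   # (C, H*W)
    b = F.normalize(feat_b.reshape(c, -1), dim=0)
    similarity = a.t() @ b                          # (H*W, H*W) cosine scores
    return similarity.argmax(dim=1).reshape(h, w)   # match index per pixel

# Usage: map every location of image A's features to its analog in image B.
fa, fb = torch.rand(256, 32, 32), torch.rand(256, 32, 32)
nnf = nearest_neighbor_field(fa, fb)                # (32, 32) indices into B
```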

Another work along this line is [Shu et al. 2017], where the transfer occurs exclusively in lighting. In particular, their algorithm, without exploiting deep neural networks, is capable of applying the desired illumination from one portrait to another without sabotaging facial features.
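The “mass transport” flavor is easiest to glimpse in one dimension: for scalar values such as per-pixel luminance, the optimal transport plan is simply a monotone (sorted) matching. The sketch below covers only that 1-D special case; the actual portrait relighting method of [Shu et al. 2017] solves a richer, face-aware transport problem.

```python
# 1-D optimal transport sketch: histogram matching of luminance via sorting.
# For scalar distributions, sorting yields the optimal transport map.
import numpy as np

def match_luminance(source, reference):
    """source, reference: float arrays (H, W) of luminance values."""
    flat = source.ravel()
    order = np.argsort(flat)                  # monotone transport plan
    ref_sorted = np.sort(reference.ravel())
    # The k-th darkest source pixel receives the k-th darkest reference value.
    idx = np.linspace(0, ref_sorted.size - 1, flat.size).astype(int)
    matched = np.empty_like(flat)
    matched[order] = ref_sorted[idx]
    return matched.reshape(source.shape)
```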

Faster Transfer

  • Video Style Transfer

Applying neural style transfer to video on a frame-by-frame basis may cause artifacts, since the temporal relationship between frames is not exploited. [Huang et al. 2017] noted the “flicker” caused by inconsistent style patches being applied to the same region, and proposed to alleviate this problem by feeding two consecutive frames at a time and minimizing their temporal inconsistency. Their model achieves real-time, temporally consistent performance; also of interest, by feeding frames that may be “far away” in time, long-term consistency can be preserved as well.
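The temporal term can be sketched roughly as follows: warp the previously stylized frame with optical flow and penalize disagreement wherever the flow is reliable. This is an assumption-laden simplification of the idea; the exact losses and network of [Huang et al. 2017] differ.

```python
# Temporal consistency sketch: the current stylized frame should agree with
# the previous stylized frame warped by optical flow, outside occluded areas.
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """frame: (B, C, H, W); flow: (B, 2, H, W) pixel offsets mapping each
    current-frame pixel to its location in `frame`."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).to(frame)          # (H, W, 2), (x, y)
    grid = base.unsqueeze(0) + flow.permute(0, 2, 3, 1)     # add flow offsets
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    gx = 2 * grid[..., 0] / (w - 1) - 1
    gy = 2 * grid[..., 1] / (h - 1) - 1
    return F.grid_sample(frame, torch.stack((gx, gy), dim=-1), align_corners=True)

def temporal_loss(stylized_t, stylized_prev, flow, occlusion_mask):
    """occlusion_mask: (B, 1, H, W), 1 where the flow is reliable, 0 otherwise."""
    warped_prev = warp(stylized_prev, flow)
    return F.mse_loss(occlusion_mask * stylized_t, occlusion_mask * warped_prev)
```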

  • High Resolution

As far as high-resolution images are concerned, the transfer process often fails to capture the intricate fine details of the texture. In [Wang et al. 2017], we see a coarse-to-fine approach, applied in a hierarchical manner, that is able to generate high-resolution stylized images with textures at the correct scale (surprisingly, though, I did not perceive much difference compared to the baseline model).
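In its crudest form, the coarse-to-fine idea can be sketched as follows: stylize a downsampled copy first to fix the large structures, then upsample the result and refine at progressively higher resolutions to recover fine texture. The stylization callable and its warm-start argument are assumptions of this sketch; the hierarchical network of [Wang et al. 2017] uses dedicated subnetworks per scale.

```python
# Coarse-to-fine sketch: stylize at low resolution, upsample, then refine.
# `stylize(content, style, init)` is an assumed routine that returns a
# stylized image, optionally warm-started from `init`.
import torch.nn.functional as F

def hierarchical_stylize(content, style, stylize, scales=(0.25, 0.5, 1.0)):
    """content, style: (1, 3, H, W) tensors."""
    result = None
    for s in scales:
        size = (int(content.shape[2] * s), int(content.shape[3] * s))
        c = F.interpolate(content, size=size, mode="bilinear", align_corners=False)
        st = F.interpolate(style, size=size, mode="bilinear", align_corners=False)
        # Warm-start each scale from the upsampled result of the previous one.
        init = None if result is None else F.interpolate(
            result, size=size, mode="bilinear", align_corners=False)
        result = stylize(c, st, init=init)
    return result
```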

Remark

Clearly, neural approaches now dominate the arena of style transfer. We see many problems being addressed, and the resulting stylized images are more and more realistic (or unrealistic, depending on the application) and less flawed. We expect more of the remaining discrepancies in fully stylized results to be settled in the future.

Moreover, it is quite inspiring to see work such as [Shu et al. 2017] that achieves impressive results without the help of neural networks.

References

  1. Chen, D., Yuan, L., Liao, J., Yu, N., and Hua, G. 2017. StyleBank: An Explicit Representation for Neural Image Style Transfer. CVPR ’17.
  2. Gatys, L.A., Ecker, A.S., and Bethge, M. 2016. Image style transfer using convolutional neural networks. CVPR ’16, 2414–2423.
  3. Gatys, L.A., Ecker, A.S., Bethge, M., Hertzmann, A., and Shechtman, E. 2017. Controlling Perceptual Factors in Neural Style Transfer. CVPR ’17.
  4. Huang, H., Wang, H., Luo, W., et al. 2017. Real-Time Neural Style Transfer for Videos. CVPR ’17.
  5. Johnson, J., Alahi, A., and Fei-Fei, L. 2016. Perceptual losses for real-time style transfer and super-resolution. ECCV ’16, Springer, 694–711.
  6. Liao, J., Yao, Y., Yuan, L., Hua, G., and Kang, S.B. 2017. Visual Attribute Transfer through Deep Image Analogy. ACM Transactions on Graphics 36, 4.
  7. Luan, F., Paris, S., Shechtman, E., and Bala, K. 2017. Deep Photo Style Transfer. CVPR ’17.
  8. Shu, Z., Hadap, S., Shechtman, E., Sunkavalli, K., Paris, S., and Samaras, D. 2017. Portrait lighting transfer using a mass transport approach. ACM Transactions on Graphics 36, 4, 145a.
  9. Wang, X., Oxholm, G., Zhang, D., and Wang, Y.-F. 2017. Multimodal Transfer: A Hierarchical Deep Convolutional Neural Network for Fast Artistic Style Transfer. CVPR ’17.