Partial disentanglement of hierarchical variational auto-encoder for texture synthesis

Multiple recent studies have demonstrated that deep networks can generate realistic-looking textures and stylised images from a single texture example. However, these approaches suffer from drawbacks. Generative adversarial networks are in general difficult to train, and the feature variations encoded in their latent representation require a priori information to generate images with specific features. Auto-encoders, in turn, are prone to producing blurry output, largely because of their inability to parameterise complex distributions. The authors present a novel texture generative model architecture that extends the variational auto-encoder approach and gradually increases the accuracy of details in the reconstructed images. The proposed architecture allows the model to learn finer levels of detail as a result of the partial disentanglement of latent variables, and the resulting generative model is capable of synthesising complex real-world textures. The model consists of multiple separate latent layers, each responsible for learning one level of texture detail. Training the latent representations separately increases the stability of the learning process and yields the partial disentanglement of latent variables. Experiments with the proposed architecture demonstrate the potential of variational auto-encoders in the domain of texture synthesis and tend to yield sharper reconstructed as well as synthesised texture images.
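The abstract describes the architecture only at a high level. The sketch below is a minimal PyTorch illustration of the general idea of a hierarchical variational auto-encoder with separate latent layers at two scales: a coarse latent for global texture structure and a finer latent for local detail. The two-level depth, layer sizes, and all names are assumptions made for illustration, not the authors' actual model.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentBlock(nn.Module):
    # One latent layer: maps features to (mu, logvar) and samples z
    # via the usual reparameterisation trick.
    def __init__(self, in_ch, z_dim):
        super().__init__()
        self.mu = nn.Conv2d(in_ch, z_dim, 1)
        self.logvar = nn.Conv2d(in_ch, z_dim, 1)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return z, mu, logvar

class HierarchicalVAE(nn.Module):
    # Illustrative two-level hierarchy: a coarse latent captures global
    # texture statistics, a finer latent refines local detail.
    def __init__(self, z_dim=32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU())    # 64x64 -> 32x32
        self.enc2 = nn.Sequential(nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU())  # 32x32 -> 16x16
        self.lat_fine = LatentBlock(64, z_dim)     # detail level
        self.lat_coarse = LatentBlock(128, z_dim)  # structure level
        self.dec_coarse = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 64, 4, 2, 1), nn.ReLU())             # 16x16 -> 32x32
        self.dec_fine = nn.ConvTranspose2d(64 + z_dim, 3, 4, 2, 1)         # 32x32 -> 64x64

    def forward(self, x):
        h1 = self.enc1(x)
        h2 = self.enc2(h1)
        z_c, mu_c, lv_c = self.lat_coarse(h2)  # global structure
        z_f, mu_f, lv_f = self.lat_fine(h1)    # local detail
        d = self.dec_coarse(z_c)
        x_hat = torch.sigmoid(self.dec_fine(torch.cat([d, z_f], dim=1)))
        return x_hat, [(mu_c, lv_c), (mu_f, lv_f)]

def elbo_loss(x, x_hat, stats, beta=1.0):
    # Standard VAE objective: reconstruction term plus one KL term
    # per latent layer, summed over the hierarchy.
    rec = F.mse_loss(x_hat, x, reduction='sum')
    kl = sum(-0.5 * torch.sum(1 + lv - mu.pow(2) - lv.exp()) for mu, lv in stats)
    return rec + beta * kl

model = HierarchicalVAE()
x = torch.rand(8, 3, 64, 64)   # a batch of 64x64 texture crops
x_hat, stats = model(x)
elbo_loss(x, x_hat, stats).backward()

The separate training of latent representations mentioned in the abstract could, for instance, be approximated here by optimising the coarse path first and freezing it before training the fine path; that detail is a guess at the authors' procedure rather than a description of it.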
