Multi-head mutual-attention CycleGAN for unpaired image-to-image translation

Image-to-image translation, i.e. translation from a source image domain to a target image domain, has made significant progress in recent years. The most popular method for unpaired image-to-image translation is CycleGAN; however, it often fails to learn the key features of the target domain accurately and rapidly, so the model converges slowly and the translation quality leaves room for improvement. In this study, a multi-head mutual-attention CycleGAN (MMA-CycleGAN) model is proposed for unpaired image-to-image translation. MMA-CycleGAN retains the cycle-consistency loss and adversarial loss of CycleGAN, but introduces a mutual-attention (MA) mechanism, which allows attention-driven, long-range dependency modelling between the two image domains. Moreover, to deal efficiently with large image sizes, the MA mechanism is further extended to a multi-head mutual-attention (MMA) mechanism. In addition, domain labels are adopted to simplify the MMA-CycleGAN architecture, so that only one generator is required to perform the bidirectional translation. Experiments on multiple datasets demonstrate that MMA-CycleGAN learns rapidly and obtains photo-realistic images in a shorter time than CycleGAN.
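The mutual-attention mechanism computes attention between feature maps of the two image domains rather than within a single image, and the multi-head variant splits the channels into several heads so that attention over large feature maps stays tractable. Below is a minimal PyTorch sketch of this idea; the 1x1-convolution projections, head layout and residual connection are assumptions for illustration, not the authors' exact implementation.

```python
# Minimal sketch of multi-head mutual (cross-domain) attention.
# This illustrates the general technique; all layer choices here are
# assumptions, not the architecture from the paper.
import torch
import torch.nn as nn


class MultiHeadMutualAttention(nn.Module):
    """Features from one domain attend to features from the other domain."""

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        assert channels % num_heads == 0, "channels must divide evenly into heads"
        self.num_heads = num_heads
        self.head_dim = channels // num_heads
        # 1x1 convolutions project feature maps to queries, keys and values.
        self.to_q = nn.Conv2d(channels, channels, kernel_size=1)
        self.to_k = nn.Conv2d(channels, channels, kernel_size=1)
        self.to_v = nn.Conv2d(channels, channels, kernel_size=1)
        self.out = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x_src: torch.Tensor, x_tgt: torch.Tensor) -> torch.Tensor:
        # x_src, x_tgt: (B, C, H, W) feature maps from the two image domains.
        b, c, h, w = x_src.shape
        n = h * w  # number of spatial positions

        def split_heads(t: torch.Tensor) -> torch.Tensor:
            # (B, C, H, W) -> (B, heads, N, head_dim)
            return t.view(b, self.num_heads, self.head_dim, n).transpose(2, 3)

        q = split_heads(self.to_q(x_src))  # queries from the source domain
        k = split_heads(self.to_k(x_tgt))  # keys from the target domain
        v = split_heads(self.to_v(x_tgt))  # values from the target domain

        # Scaled dot-product attention over all spatial positions, so every
        # source location can attend to every target location (long-range
        # dependency modelling between the two domains).
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        out = (attn @ v).transpose(2, 3).reshape(b, c, h, w)
        return self.out(out) + x_src  # residual connection back to the source
```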
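The abstract also notes that the cycle-consistency and adversarial losses of CycleGAN are retained, while domain labels let a single conditional generator handle both translation directions. The sketch below shows how such a shared generator could be trained with the cycle-consistency term; the signature G(image, domain_label) and the label arguments are hypothetical, since the paper's exact conditioning scheme is not given in this abstract.

```python
# Minimal sketch of the retained cycle-consistency loss with a single
# domain-conditioned generator G(image, domain_label). The generator
# signature and label tensors are hypothetical assumptions.
import torch.nn.functional as F


def cycle_consistency_loss(G, x_a, x_b, label_a, label_b):
    fake_b = G(x_a, label_b)    # translate domain A -> domain B
    rec_a = G(fake_b, label_a)  # translate back: B -> A
    fake_a = G(x_b, label_a)    # translate domain B -> domain A
    rec_b = G(fake_a, label_b)  # translate back: A -> B
    # L1 reconstruction error between the inputs and their round-trip
    # versions, as in CycleGAN's cycle-consistency term.
    return F.l1_loss(rec_a, x_a) + F.l1_loss(rec_b, x_b)
```

As in the original CycleGAN formulation, this reconstruction term would be weighted (typically with a coefficient around 10) and added to the adversarial losses from the two domain discriminators.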

Inspec keywords: language translation; learning (artificial intelligence); realistic images; virtual reality; image processing

Other keywords: multihead mutual-attention mechanism; image size; photorealistic images; long-range dependency modelling; MMA-CycleGAN architecture; translation quality; unpaired image-to-image translation; multihead mutual-attention CycleGAN model; source image domain

Subjects: Optical, image and video signal processing; Computer vision and image processing techniques; Machine translation; Virtual reality; Knowledge engineering techniques
