Journal of Textile Research ›› 2025, Vol. 46 ›› Issue (06): 203-211. DOI: 10.13475/j.fzxb.20241200401

• Apparel Engineering •

Cross-pose virtual try-on based on improved appearance flow network

LUO Ruiqi1, CHANG Dashun1, HU Xinrong1,2,3, LIANG Jinxing1,2, PENG Tao1,2,3, CHEN Jia1,2,3, LI Li1,2,3

  1. College of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, Hubei 430200, China
    2. Engineering Research Center of Hubei Province for Clothing Information, Wuhan, Hubei 430200, China
    3. Hubei Engineering Research Center for Intelligent Textile and Clothing, Wuhan, Hubei 430200, China
  • Received: 2024-12-04 Revised: 2025-03-05 Online: 2025-06-15 Published: 2025-07-02
  • Contact: HU Xinrong E-mail: hxr@wtu.edu.cn

Abstract:

Objective Current research on virtual try-on predominantly focuses on single garments under simple poses, and its efficacy relies heavily on frontal garment images, which limits practical applications. In contrast, cross-pose virtual try-on offers greater practical value by transferring complete outfits to target individuals, yet it faces substantial challenges because pose variations and garment complexity cause severe distortions. To address the suboptimal performance of cross-pose virtual try-on under challenging poses, this paper proposes an improved global appearance flow network.

Method First, a Co-Attention module is introduced to optimize the global style vector, guiding the modulated convolutional network in estimating the global appearance flow. Next, a channel attention mechanism is employed to enhance clothing feature information, effectively reducing information loss during cross-pose try-on. Finally, deformable convolutions replace the traditional convolutions in the global appearance flow refinement module, improving the estimated flow and better preserving clothing details.
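To make the first step concrete, the sketch below shows one plausible form of a co-attention block that fuses garment and person features and pools them into the global style vector guiding the modulated convolutions. It is a minimal PyTorch sketch under assumed tensor shapes, with hypothetical layer names, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoAttentionStyle(nn.Module):
    """Fuse garment and person features via co-attention, then pool the
    fused maps into a global style vector (hypothetical layer names)."""

    def __init__(self, channels: int, style_dim: int):
        super().__init__()
        self.affinity = nn.Linear(channels, channels, bias=False)  # learnable affinity weight W
        self.to_style = nn.Linear(2 * channels, style_dim)

    def forward(self, garment_feat: torch.Tensor, person_feat: torch.Tensor) -> torch.Tensor:
        # garment_feat, person_feat: (B, C, H, W) encoder outputs
        g = garment_feat.flatten(2)                      # (B, C, Ng)
        p = person_feat.flatten(2)                       # (B, C, Np)
        # Affinity S = (W g)^T p, shape (B, Ng, Np)
        s = torch.bmm(self.affinity(g.transpose(1, 2)), p)
        # Cross-attend: garment locations aggregate person features and vice versa
        g_att = torch.bmm(p, F.softmax(s, dim=2).transpose(1, 2))   # (B, C, Ng)
        p_att = torch.bmm(g, F.softmax(s, dim=1))                   # (B, C, Np)
        # Global average pooling of both attended maps -> style vector
        fused = torch.cat([g_att.mean(dim=2), p_att.mean(dim=2)], dim=1)  # (B, 2C)
        return self.to_style(fused)                      # (B, style_dim)


# Usage sketch: the resulting vector would modulate the flow-estimation convolutions.
if __name__ == "__main__":
    coatt = CoAttentionStyle(channels=256, style_dim=512)
    style = coatt(torch.randn(1, 256, 16, 12), torch.randn(1, 256, 16, 12))
    print(style.shape)  # torch.Size([1, 512])
```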

Results Quantitative experiments on the DeepFashion dataset show that the proposed model significantly outperforms the baseline on two key indicators, the structural similarity index measure (SSIM) and the Fréchet inception distance (FID). Specifically, the SSIM value increases by approximately 4.8%, indicating higher structural similarity between the generated images and the real images, while the FID value decreases by 23.5%, further confirming that the distribution of the generated images is closer to that of the real images. Qualitative experiments on the DeepFashion dataset compare the proposed model with CoCosNet, CT-Net and FS-VTON. CoCosNet and CT-Net do not incorporate global information, so they struggle with model images that contain only local regions and often generate unrealistic clothing; moreover, because TPS cannot handle large deformations, the images generated by CT-Net contain numerous artifacts. Although FS-VTON incorporates global information, it performs poorly on local clothing deformations. By combining global and local information, the proposed model reasonably estimates clothing warping in cross-pose scenarios.
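For reference, the two percentages quoted above can be reproduced from the values reported in Tab. 1, assuming FS-VTON is taken as the baseline:

```python
# Relative improvements over the FS-VTON baseline, computed from Tab. 1.
ssim_base, ssim_ours = 0.83, 0.87   # SSIM: higher is better
fid_base, fid_ours = 15.39, 11.77   # FID: lower is better

ssim_gain = (ssim_ours - ssim_base) / ssim_base * 100   # ~4.8 %
fid_drop = (fid_base - fid_ours) / fid_base * 100        # ~23.5 %
print(f"SSIM +{ssim_gain:.1f}%, FID -{fid_drop:.1f}%")
```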

Conclusion This paper proposes an improved appearance flow prediction network aimed at enhancing the realism and practicality of virtual try-on systems, thereby creating more possibilities for the fashion industry and e-commerce. By introducing a Co-Attention module to optimize the global style feature vector, the network is better equipped to handle large-scale deformations during virtual try-on, and the integration of deformable convolutions in the local appearance flow refinement network further enhances its ability to preserve clothing details. Extensive experiments on the DeepFashion dataset demonstrate that the proposed method significantly outperforms the compared approaches. Although the model achieves satisfactory results, there is still room for optimization. Future work could address virtual try-on for high-resolution images, extend the dataset with outdoor scenes, and make the model parser-free and end-to-end trainable, thereby promoting its industrial application.
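As a concluding illustration, the following minimal sketch shows how deformable convolutions might replace standard convolutions in the local flow-refinement stage, using torchvision.ops.DeformConv2d with sampling offsets predicted from local features; channel sizes and module names are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformFlowRefine(nn.Module):
    """Refine a coarse 2-channel appearance flow with a deformable convolution
    whose sampling offsets are predicted from the local features."""

    def __init__(self, in_channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # Two offsets (dx, dy) per kernel sampling location
        self.offset_conv = nn.Conv2d(in_channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        self.deform_conv = DeformConv2d(in_channels, 2, kernel_size, padding=pad)

    def forward(self, feat: torch.Tensor, coarse_flow: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_conv(feat)            # (B, 2*k*k, H, W)
        residual = self.deform_conv(feat, offsets)  # (B, 2, H, W) flow correction
        return coarse_flow + residual               # residual update of the coarse flow


# Usage sketch with assumed feature and flow sizes
if __name__ == "__main__":
    refine = DeformFlowRefine(in_channels=64)
    flow = refine(torch.randn(1, 64, 32, 24), torch.zeros(1, 2, 32, 24))
    print(flow.shape)  # torch.Size([1, 2, 32, 24])
```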

Key words: virtual try-on, attention mechanism, appearance flow, cross-pose, deformable convolution

CLC Number: TS942.8

Fig.1

Framework diagram based on improved appearance flow network

Fig.2

Co-Attention mechanism module

Fig.3

Coarse appearance flow prediction module

Fig.4

Global appearance flow refinement module

Tab.1

Quantitative comparison of various methods

Method	SSIM	FID
CoCosNet	0.77	31.85
CT-Net	0.82	21.25
FS-VTON	0.83	15.39
Proposed method	0.87	11.77

Fig.5

Visual effects of various methods. (a) Unpaired results 1; (b) Unpaired results 2; (c) Unpaired results 3; (d) Unpaired results 4; (e) Unpaired results 5

Fig.6

Visual effects of multi-view experiments. (a) Target person 1; (b) Target person 2; (c) Target person 3; (d) Target person 4

Fig.7

Visual effects of ablation experiments. (a) Unpaired results 6; (b) Unpaired results 7; (c) Unpaired results 8

[1] HE S, SONG Y Z, XIANG T. Style-based global appearance flow for virtual try-on[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C.: IEEE Press, 2022: 3470-3479.
[2] XIE Z, HUANG Z, DONG X, et al. GP-VTON: towards general purpose virtual try-on via collaborative local-flow global-parsing learning[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C.: IEEE Press, 2023: 23550-23559.
[3] YANG H, ZHANG R, GUO X, et al. Towards photo-realistic virtual try-on by adaptively generating-preserving image content[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C.: IEEE Press, 2020: 7850-7859.
[4] WANG B, ZHENG H, LIANG X, et al. Toward characteristic-preserving image-based virtual try-on network[C]// Proceedings of the European Conference on Computer Vision (ECCV). Berlin: Springer, 2018: 589-604.
[5] ISSENHUTH T, MARY J, CALAUZENES C. Do not mask what you do not need to mask: a parser-free virtual try-on[C]// Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2020: 619-635.
[6] HAN X, WU Z, WU Z, et al. VITON: an image-based virtual try-on network[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C.: IEEE Press, 2018: 7543-7552.
[7] DUCHON J. Splines minimizing rotation-invariant semi-norms in Sobolev spaces[C]// Proceedings of a Conference on Constructive Theory of Functions of Several Variables. Berlin: Springer, 1977: 85-100.
[8] HAN X, HU X, HUANG W, et al. ClothFlow: a flow-based model for clothed person generation[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Washington D.C.: IEEE Press, 2019: 10471-10480.
[9] 韩超远, 李健, 王泽震. 改进的PF-AFN在虚拟试衣中的应用[J]. 计算机辅助设计与图形学学报, 2023, 35(10): 1500-1509.
HAN Chaoyuan, LI Jian, WANG Zezhen. Application of Improved PF-AFN in virtual try-on[J]. Journal of Computer-Aided Design & Computer Graphics, 2023, 35(10): 1500-1509.
[10] 陈宝玉, 张怡, 于冰冰, 等. 两阶段可调节感知蒸馏网络的虚拟试衣方法[J]. 图学学报, 2022, 43(2): 316-323.
CHEN Baoyu, ZHANG Yi, YU Bingbing, et al. Two-stage adjustable perceptual distillation network for virtual try-on[J]. Journal of Graphics, 2022, 43(2): 316-323.
doi: 10.11996/JG.j.2095-302X.2022020316
[11] 谭泽霖, 白静, 陈冉, 等. FP-VTON:基于注意力机制的特征保持虚拟试衣网络[J]. 计算机工程与应用, 2022, 58(23): 186-196.
TAN Zelin, BAI Jing, CHEN Ran, et al. FP-VTON: attention-based feature preserving virtual try-on network[J]. Computer Engineering and Applications, 2022, 58(23): 186-196.
doi: 10.3778/j.issn.1002-8331.2105-0278
[12] WU Z, LIN G, TAO Q, et al. M2E-Try On Net: fashion from model to everyone[C]// Proceedings of the 27th ACM International Conference on Multimedia. New York: ACM Press, 2019: 293-301.
[13] GÜLER R A, NEVEROVA N, KOKKINOS I. DensePose: dense human pose estimation in the wild[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C.: IEEE Press, 2018: 7297-7306.
[14] RAJ A, SANGKLOY P, CHANG H, et al. SwapNet: image based garment transfer[C]// Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 679-695.
[15] YANG F, LIN G. CT-net: complementary transfering network for garment transfer with arbitrary geometric changes[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C.: IEEE Press, 2021: 9899-9908.
[16] DOSOVITSKIY A, FISCHER P, ILG E, et al. FlowNet: learning optical flow with convolutional networks[C]// Proceedings of the IEEE International Conference on Computer Vision. Washington D.C.: IEEE Press, 2015: 2758-2766.
[17] HUI T W, TANG X, LOY C C. LiteFlowNet: a lightweight convolutional neural network for optical flow estimation[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C.: IEEE Press, 2018: 8981-8989.
[18] ILG E, MAYER N, SAIKIA T, et al. FlowNet 2.0: evolution of optical flow estimation with deep networks[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C.: IEEE Press, 2017: 2462-2470.
[19] BAI S, ZHOU H, LI Z, et al. Single stage virtual try-on via deformable attention flows[C]// European Conference on Computer Vision. Berlin: Springer, 2022: 409-425.
[20] LIANG X, GONG K, SHEN X, et al. Look into person: joint body parsing & pose estimation network and a new benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(4):871-885.
[21] LU X, WANG W, MA C, et al. See more, know more: unsupervised video object segmentation with co-attention siamese networks[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C.: IEEE Press, 2019: 3623-3632.
[22] WU J, MAZUR T R, RUAN S, et al. A deep Boltzmann machine-driven level set method for heart motion tracking using cine MRI images[J]. Medical Image Analysis, 2018, 47: 68-80.
doi: S1361-8415(18)30128-2 pmid: 29679848
[23] LIU F, WANG K, LIU D, et al. Deep pyramid local attention neural network for cardiac structure segmentation in two-dimensional echocardiography[J]. Medical Image Analysis, 2021. DOI: 10.1016/j.media.2020.101873.
[24] AHN S S, TA K, THORN S L, et al. Co-attention spatial transformer network for unsupervised motion tracking and cardiac strain analysis in 3D echocardiography[J]. Medical Image Analysis, 2023. DOI: 10.1016/j.media.2022.102711.
[25] SUN D, ROTH S, BLACK M J. A quantitative analysis of current practices in optical flow estimation and the principles behind them[J]. International Journal of Computer Vision, 2014, 106: 115-137.
[26] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. Computer Science, 2014. DOI: 10.48550/arxiv.1409.1556.
[27] LIU Z, LUO P, QIU S, et al. DeepFashion: powering robust clothes recognition and retrieval with rich annotations[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C.: IEEE Press, 2016: 1096-1104.
[28] WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
doi: 10.1109/tip.2003.819861 pmid: 15376593
[29] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[J]. Advances in Neural Information Processing Systems, 2017. DOI: 10.48550/arxiv.1706.08500.
[30] ZHANG P, ZHANG B, CHEN D, et al. Cross-domain correspondence learning for exemplar-based image translation[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C.: IEEE Press, 2020: 5143-5153.