Journal of Textile Research ›› 2024, Vol. 45 ›› Issue (12): 180-188. DOI: 10.13475/j.fzxb.20240200801

• Apparel Engineering •

Intelligent customization recommendation for traditional Hanfu based on improved stacked generative adversarial network

CAI Liling1, WANG Mei2, SHAO Yibing1, CHEN Wei1, CAO Huaqing3, JI Xiaofen1,4

  1. School of International Education, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
    2. School of Fashion Design & Engineering, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
    3. College of Textile Science and Engineering, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
    4. China National Silk Museum, Hangzhou, Zhejiang 310002, China
  • Received: 2024-02-07; Revised: 2024-08-21; Online: 2024-12-15; Published: 2024-12-31
  • Contact: JI Xiaofen, E-mail: xiaofenji@zstu.edu.cn

Abstract:

Objective To improve communication efficiency and the overall experience when customizing Hanfu for users, an intelligent customization recommendation method based on an improved stacked generative adversarial network is proposed, built on text-to-image generation technology. With this method, a user's customization requirement text is converted directly into Hanfu sample images for reference, efficiently delivering personalized recommendations matched to the user's stated needs.

Method The improved method consisted of a two-stage model. In the first stage, a demand-text encoding model based on Embedding and Long Short-Term Memory (Embedding-LSTM) was constructed to address the sparse, mutually isolated text encodings of the original model. In the second stage, residual structures were introduced to strengthen the extraction and transmission of Hanfu image features, providing better guidance for second-stage training based on the primary features learned in the first stage. The mean absolute error was then used to construct a loss function to optimize the learning process, thereby improving the consistency between the text description and the generated images.
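As a rough illustration of the first-stage Embedding-LSTM text encoder described above, the PyTorch sketch below pairs a dense embedding layer with an LSTM; the layer sizes, class name, and tokenization interface are assumptions for illustration, not the authors' released code.

```python
# Illustrative sketch of an Embedding-LSTM demand-text encoder;
# all dimensions and names are assumptions, not the paper's code.
import torch
import torch.nn as nn

class DemandTextEncoder(nn.Module):
    """Encodes tokenized Hanfu demand text into a dense conditioning vector."""

    def __init__(self, vocab_size: int, embed_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        # Dense embeddings replace the sparse, mutually isolated text
        # encodings of the original model's pipeline.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # The LSTM captures ordering among attribute words
        # (dynasty, category, collar shape, sleeve shape, ...).
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer word indices
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embedded)      # final hidden state
        return h_n.squeeze(0)                  # (batch, hidden_dim) condition vector
```

The resulting condition vector would then be fed to the first-stage generator in place of the original model's sparse text encoding.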

Results A dataset of 8 156 Hanfu item images was established through data collection and augmentation. Traditional Hanfu was further subdivided into 70 sub-dimensions across 7 attributes, namely dynasty, category, collar shape, sleeve shape, color scheme, season, and image presentation, and a learning and mapping model from customization demand text to Hanfu images was established. To assess the authenticity of the generated Hanfu images, the Inception V3 model was used to calculate the Fréchet Inception Distance (FID). To effectively evaluate the quality of the generated Hanfu images, a pretrained Hanfu fine-grained classification model was used to calculate a second FID. The objective evaluation showed that, after the improvement, the two FID values decreased by 35.86% and 10.59%, respectively, indicating that the Hanfu images generated by the improved model were more discriminable and closer to real Hanfu than those of the original model. Thirty-eight respondents were invited to take part in a subjective evaluation, in which the Hanfu images generated by the improved model scored higher than those generated by the original model in similarity, structural integrity, overall aesthetics, and text matching. This indicated that the images generated by the improved model were more realistic and better received, with more complete and aesthetically pleasing styling and higher compatibility with the text labels. Comparative experiments were conducted against large-model methods, including the Transformer-based ERNIE-ViLG diffusion model, the U-Net convolutional neural network-based Composer, and a traditional diffusion model. The results showed that the proposed method yielded a higher FID when calculated with the Inception V3 model but the lowest FID when calculated with the Hanfu fine-grained classification model, indicating that the Hanfu generated by the proposed method contained more traditional Chinese clothing elements and was therefore better suited to generating traditional Hanfu.
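The two-FID protocol amounts to computing the same Fréchet distance over two feature spaces: Inception V3 features and features from the pretrained Hanfu fine-grained classifier. Below is a minimal sketch of that distance; the feature-extraction step is assumed and not shown.

```python
# Minimal sketch of the Fréchet distance behind both FID scores; the
# (num_images, feat_dim) feature arrays are assumed to come from either
# Inception V3 or the pretrained Hanfu fine-grained classifier.
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """FID between two feature sets, each of shape (num_images, feat_dim)."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_f = np.cov(feats_fake, rowvar=False)
    # Matrix square root of the covariance product; tiny imaginary parts
    # arising from numerical error are discarded.
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_f, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(sigma_r + sigma_f - 2.0 * covmean))
```

Calling this function once per feature extractor yields the two FID columns reported in Tab. 2 and Tab. 5.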

Conclusion This research focused on Hanfu customization recommendation and proposed an intelligent recommendation method based on an improved stacked generative adversarial network. The results show that the improved method produces more realistic, higher-quality Hanfu images and matches requirement texts more closely, demonstrating its effectiveness and applicability to Hanfu image generation. This study considered only traditional Hanfu. Future research will collect and organize more comprehensive Hanfu text and image data to improve the robustness and diversity of the generated results, enhance recommendation satisfaction, and optimize the customization experience. Future work may also combine the improved method with large-scale modeling techniques to further advance Hanfu customization recommendation technology.

Key words: deep learning, generative adversarial network, Hanfu customization, intelligent recommendation, generative model

CLC Number: TS941.2

Fig.1

Architecture of the improved stacked generative adversarial network model

Tab.1

Training parameter configuration

Hyperparameter    Meaning                          Setting
generator_Lr      Generator learning rate          1×10⁻⁵
discriminator_Lr  Discriminator learning rate      4×10⁻⁵
Epoch1            Number of iterations in stage 1  1 000
Epoch2            Number of iterations in stage 2  2 000
λ                 Reconstruction loss weight       100
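Read as a training configuration, Tab. 1 corresponds to a two time-scale update rule (slower generator, faster discriminator) plus a λ-weighted reconstruction term. A minimal PyTorch sketch under these settings, with trivial stand-in modules in place of the paper's two-stage networks:

```python
# Hypothetical wiring of the Tab.1 hyperparameters; `generator` and
# `discriminator` are stand-ins, not the paper's networks.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(256, 64 * 64 * 3))         # stand-in module
discriminator = nn.Sequential(nn.Linear(64 * 64 * 3, 1))       # stand-in module

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-5)      # generator_Lr
opt_d = torch.optim.Adam(discriminator.parameters(), lr=4e-5)  # discriminator_Lr
lambda_rec = 100.0                                             # λ, reconstruction loss weight

adv_criterion = nn.BCEWithLogitsLoss()  # adversarial term
rec_criterion = nn.L1Loss()             # mean absolute error (MAE) term

def generator_loss(d_fake_logits, fake_images, real_images):
    # Adversarial term pushes generated Hanfu toward the real distribution;
    # the λ-weighted MAE term ties the output to the paired real image.
    adv = adv_criterion(d_fake_logits, torch.ones_like(d_fake_logits))
    return adv + lambda_rec * rec_criterion(fake_images, real_images)
```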

Fig.2

Embedding-LSTM text learning flow chart

Fig.3

Structure diagram of the first-stage generative adversarial network model

Fig.4

Details of the first-stage generative adversarial network model. (a) Upsampling block structure; (b) Downsampling block structure

Fig.5

Structure diagram of the second-stage generative network model

Fig.6

Details of the second-stage generative adversarial network model. (a) Upsampling block structure; (b) Middle-layer sampling block structure; (c) Downsampling block structure

Fig.7

Comparison of generation results of the improved model across the two stages. (a) Hanfu generated in the first stage; (b) Hanfu generated in the second stage

Fig.8

Comparison of generation results between the improved and original models. (a) Hanfu generated by the original model; (b) Hanfu generated by the improved model

Tab.2

Objective evaluation results

Evaluation model                         FID (original model)  FID (improved model)  Difference/%
Inception V3                             122.235               78.401                -35.86
Hanfu fine-grained classification model  13.792                12.334                -10.59
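For reference, the Difference column is (FID_improved − FID_original)/FID_original × 100%; e.g., for the Inception V3 row, (78.401 − 122.235)/122.235 × 100% ≈ −35.86%.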

Tab.3

Subjective evaluation questionnaire setting

Module    Evaluation content  Question
Module 1  Similarity          Please rate the similarity between the generated Hanfu and the original Hanfu
Module 2  Generation quality  Please rate the structural integrity of the generated Hanfu
                              Please rate the overall aesthetics of the generated Hanfu
                              Please rate how well the generated Hanfu matches the text description

Tab.4

Subjective evaluation results

Module    Model           Question    Mean  Standard deviation
Module 1  Original model              5.09  0.45
          Improved model              5.24  0.22
Module 2  Original model  Question 1  5.13  0.78
                          Question 2  5.50  0.83
                          Question 3  4.55  0.89
          Improved model  Question 1  5.47  0.80
                          Question 2  5.53  0.83
                          Question 3  5.63  0.63

Fig.9

Comparison of images generated by different methods. (a) ERNIE-ViLG; (b) Composer; (c) Diffusion; (d) Proposed method

Tab.5

FID values of images generated by different methods

Method           FID (Inception V3 model)  FID (Hanfu fine-grained classification model)
ERNIE-ViLG       30.787                    466.967
Composer         77.807                    485.214
Diffusion Model  67.434                    475.587
Proposed method  76.481                    182.699
[1] CHAKRABORTY S, HOQUE M, JEEM N, et al. Fashion recommendation systems, models and methods: a review[J]. Informatics, 2021, 8(3): 1-34.
[2] YANG Yu, WU Guodong, LIU Yuliang, et al. Research on generative adversarial nets and personalized recommendation[J]. Journal of Chinese Computer Systems, 2022, 43(3): 574-581.
[3] JIANG Xuewei, TIAN Runyu, LU Fangxiao, et al. Improved clothing recommendation algorithm based on simulation scoring[J]. Journal of Textile Research, 2021, 42(12): 138-144.
[4] HIDAYATI S, HSU C, CHANG Y, et al. What dress fits me best? fashion recommendation on the clothing style for personal body shape[C]// 2018 ACM Multimedia Conference. Seoul: Association for Computing Machinery, 2018: 438-446.
[5] YANG B. Clothing design style recommendation using decision tree algorithm combined with deep learning[J]. Computational Intelligence and Neuroscience, 2022, 2022(1): 1-10.
[6] CAO Yin, QIN Junping, MA Qianli, et al. Survey of text-to-image synthesis[J]. Journal of Zhejiang University (Engineering Science), 2024, 58(2): 219-238.
[7] HUANG M, MAO Z, WANG P, et al. DSE-GAN: dynamic semantic evolution generative adversarial network for text-to-image generation[C]// Proceedings of the 30th ACM International Conference on Multimedia. Lisboa: Association for Computing Machinery, 2022: 4345-4354.
[8] ZHANG H, KOH J Y, BALDRIDGE J, et al. Cross-modal contrastive learning for text-to-image generation[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 833-842.
[9] ZHANG H, XU T, LI H, et al. StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks[C]// 2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 5908-5916.
[10] ZHANG H, GOODFELLOW I J, METAXAS D N, et al. Self-attention generative adversarial networks[C]// Proceedings of the 36th International Conference on Machine Learning. Long Beach: PMLR, 2019: 7354-7363.
[11] REN Yi. History of Chinese clothing[M]. Beijing: Peking University Press, 2015: 124-146.
[12] YANG Tingting, YU Jiaying, XIAO Yao, et al. Reservoir level prediction based on Embedding-GRU model[J]. South-to-North Water Transfers and Water Science & Technology, 2023, 21(5): 940-950.
[13] JORDAN M. Serial order: a parallel distributed processing approach[J]. Advances in Psychology, 1997, 121: 471-495.
[14] VAN HOUDT G, MOSQUERA C, NÁPOLES G. A review on the long short-term memory model[J]. Artificial Intelligence Review, 2020, 53(8): 5929-5955.
[15] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[16] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: Curran Associates Inc, 2017: 6629-6640.
[17] ZHANG H, YIN W, FANG Y, et al. ERNIE-ViLG: Unified generative pre-training for bidirectional vision-language generation[EB/OL]. (2021-12-31)[2022-04-13]. https://arxiv.org/abs/2112.15283.
[18] HUANG L, CHEN D, LIU Y, et al. Composer: creative and controllable image synthesis with composable conditions[EB/OL]. (2023-02-20)[2023-02-23]. https://arxiv.org/abs/2302.09778.
[19] ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 10684-10695.
[1] LIU Yanping, GUO Peiyao, WU Ying. Research progress in deep learning technology for fabric defect detection [J]. Journal of Textile Research, 2024, 45(12): 234-242.
[2] LI Yang, ZHANG Yongchao, PENG Laihu, HU Xudong, YUAN Yanhong. Fabric defect detection based on improved cross-scene Beetle global search algorithm [J]. Journal of Textile Research, 2024, 45(10): 89-94.
[3] LU Yinwen, HOU Jue, YANG Yang, GU Bingfei, ZHANG Hongwei, LIU Zheng. Single dress image video synthesis based on pose embedding and multi-scale attention [J]. Journal of Textile Research, 2024, 45(07): 165-172.
[4] WEN Jiaqi, LI Xinrong, FENG Wenqian, LI Hansen. Rapid extraction of edge contours of printed fabrics [J]. Journal of Textile Research, 2024, 45(05): 165-173.
[5] LU Weijian, TU Jiajia, WANG Junru, HAN Sijie, SHI Weimin. Model for empty bobbin recognition based on improved residual network [J]. Journal of Textile Research, 2024, 45(01): 194-202.
[6] CHI Panpan, MEI Chennan, WANG Yan, XIAO Hong, ZHONG Yueqi. Single soldier camouflage small target detection based on boundary-filling [J]. Journal of Textile Research, 2024, 45(01): 112-119.
[7] LIU Yuye, WANG Ping. High-precision intelligent algorithm for virtual fitting based on texture feature learning [J]. Journal of Textile Research, 2023, 44(05): 177-183.
[8] YANG Hongmai, ZHANG Xiaodong, YAN Ning, ZHU Linlin, LI Na'na. Robustness algorithm for online yarn breakage detection in warp knitting machines [J]. Journal of Textile Research, 2023, 44(05): 139-146.
[9] GU Bingfei, ZHANG Jian, XU Kaiyi, ZHAO Songling, YE Fan, HOU Jue. Human contour and parameter extraction from complex background [J]. Journal of Textile Research, 2023, 44(03): 168-175.
[10] LI Yang, PENG Laihu, LI Jianqiang, LIU Jianting, ZHENG Qiuyang, HU Xudong. Fabric defect detection based on deep-belief network [J]. Journal of Textile Research, 2023, 44(02): 143-150.
[11] WANG Bin, LI Min, LEI Chenglin, HE Ruhan. Research progress in fabric defect detection based on deep learning [J]. Journal of Textile Research, 2023, 44(01): 219-227.
[12] CHEN Jia, YANG Congcong, LIU Junping, HE Ruhan, LIANG Jinxing. Cross-domain generation for transferring hand-drawn sketches to garment images [J]. Journal of Textile Research, 2023, 44(01): 171-178.
[13] AN Yijin, XUE Wenliang, DING Yi, ZHANG Shunlian. Evaluation of textile color rubbing fastness based on image processing [J]. Journal of Textile Research, 2022, 43(12): 131-137.
[14] CHEN Jinguang, LI Xue, SHAO Jingfeng, MA Lili. Lightweight clothing detection method based on an improved YOLOv5 network [J]. Journal of Textile Research, 2022, 43(10): 155-160.
[15] JIANG Hui, MA Biao. Style similarity algorithm based on clothing style [J]. Journal of Textile Research, 2021, 42(11): 129-136.