纺织学报 ›› 2024, Vol. 45 ›› Issue (12): 180-188.doi: 10.13475/j.fzxb.20240200801

• 服装工程 •

基于改进堆叠生成对抗网络的传统汉服智能定制推荐

蔡丽玲1, 王梅2, 邵一兵1, 陈炜1, 曹华卿3, 季晓芬1,4()   

1. 浙江理工大学 国际教育学院, 浙江 杭州 310018
    2.浙江理工大学 服装学院, 浙江 杭州 310018
    3.浙江理工大学 纺织科学与工程学院(国际丝绸学院), 浙江 杭州 310018
    4.中国丝绸博物馆, 浙江 杭州 310002
  • 收稿日期:2024-02-07 修回日期:2024-08-21 出版日期:2024-12-15 发布日期:2024-12-31
  • Corresponding author: JI Xiaofen (b. 1971), female, professor, Ph.D. Her research focuses on personalized garment customization and intelligent manufacturing. E-mail: xiaofenji@zstu.edu.cn
  • First author: CAI Liling (b. 1980), female, associate professor, Ph.D. Her research focuses on personalized garment customization.
  • Funding: Zhejiang Provincial Philosophy and Social Sciences Planning Project (24NDJC170YB); Research Start-up Fund of Zhejiang Sci-Tech University (23196023-Y)

Intelligent customization recommendation for traditional Hanfu based on improved stack-generative adversarial network

CAI Liling1, WANG Mei2, SHAO Yibing1, CHEN Wei1, CAO Huaqing3, JI Xiaofen1,4()   

1. School of International Education, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
    2. School of Fashion Design & Engineering, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
    3. College of Textile Science and Engineering, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
    4. China National Silk Museum, Hangzhou, Zhejiang 310002, China
  • Received:2024-02-07 Revised:2024-08-21 Published:2024-12-15 Online:2024-12-31

摘要:

为优化消费者的定制体验,提升传统汉服定制推荐的效率,依据文本生成图片原理,提出一种基于改进堆叠生成对抗网络的智能定制推荐方法,该方法由2个生成对抗网络模型构成。首先构建基于嵌入式-长短期记忆网络(Embedding and Long Short-Term Memory,Embedding-LSTM)的需求文本编码模型,改进原模型存在的文本编码稀疏孤立问题;其次引入残差结构,提升汉服文图特征提取和传递能力,最后采用平均绝对误差构建损失函数优化学习过程,提升文本描述与生成图像的一致性。结果表明:与原模型相比,由改进模型所生成的汉服图像更逼真,质量更优,细节处理更精细,主客观评估结果证明模型改进的有效性;与ERNIE-ViLG、Composer和传统的Diffusion Model等大模型方法相比,所提方法在匹配需求文本和生成真实性方面表现更好,显示了其适用性。

关键词: 深度学习, 生成对抗网络, 汉服定制, 智能推荐, 生成模型

Abstract:

Objective To improve communication efficiency and optimize the customization experience when customizing Hanfu for users, an intelligent customization recommendation method based on an improved stack-generative adversarial network is proposed, building on text-to-image generation. With this method, a user's customized requirement text is translated directly into Hanfu sample images for reference, efficiently delivering personalized recommendations that match the user's customization needs.

Method The improved method consisted of a two-stage model. In the first stage, a demand-text encoding model based on Embedding and Long Short-Term Memory (Embedding-LSTM) networks was constructed to address the sparse and isolated text encodings of the original model. In the second stage, residual structures were introduced to strengthen the extraction and transmission of Hanfu text-image features, so that the primary features learned in the first stage better guided the second-stage training. Finally, the mean absolute error was used to construct a loss function that optimized the learning process, improving the consistency between the text descriptions and the generated images.
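As a concrete illustration of the first-stage encoder, the sketch below looks token ids up in a dense embedding table (replacing the sparse, mutually isolated one-hot codes of the original model) and runs them through an LSTM whose final hidden state serves as the text code. This is a minimal plain-NumPy sketch with assumed toy sizes (vocabulary 50, embedding 8, hidden 16), not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, hidden_dim = 50, 8, 16

# Dense embedding table: each token id maps to a learned dense vector,
# avoiding sparse, mutually isolated one-hot codes.
E = rng.normal(size=(vocab_size, embed_dim))

# LSTM parameters for the four gates (input, forget, cell, output), stacked.
W = rng.normal(size=(4 * hidden_dim, embed_dim + hidden_dim)) * 0.1
b = np.zeros(4 * hidden_dim)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_encode(token_ids):
    """Encode a token sequence; the final hidden state is the text code."""
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for t in token_ids:
        x = E[t]                                 # dense embedding lookup
        z = W @ np.concatenate([x, h]) + b       # all four gate pre-activations
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

code = lstm_encode([3, 17, 42, 5])               # a toy 4-token demand text
print(code.shape)                                # (16,)
```

In the full model this text code would condition both stages of the generator; here it simply demonstrates how the Embedding-LSTM combination produces a dense vector from raw token ids.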

Results A dataset comprising 8 156 Hanfu item images was established through data collection and augmentation. By subdividing traditional Hanfu into 70 subdimensions across 7 attributes (dynasty, category, collar shape, sleeve shape, color scheme, season, and image presentation), a learning and mapping model from customized demand text to Hanfu images was established. To assess the authenticity of the generated Hanfu images, the Inception V3 model was used to calculate the Fréchet Inception Distance (FID); to assess their quality, a pretrained Hanfu fine-grained classification model was used to calculate a second FID. The objective evaluation showed that after the improvement the two FID values dropped by 35.86% and 10.59%, respectively, indicating that the Hanfu images generated by the improved model were more discriminable and closer to real Hanfu than those of the original model. Thirty-eight respondents participated in a subjective evaluation, in which the Hanfu images generated by the improved model scored higher than those generated by the original model in similarity, structural integrity, overall aesthetics, and text matching. This indicated that the improved model's images were more realistic and more favored, with more complete and aesthetically pleasing styles and higher compatibility with the text labels. Comparative experiments were conducted against large-model methods: the Transformer-based ERNIE-ViLG diffusion model, the U-Net-based Composer, and a traditional Diffusion Model. The improved method proposed in this research yielded a higher FID when computed with the Inception V3 model, but the lowest FID when computed with the Hanfu fine-grained classification model.
This indicated that the Hanfu generated by the proposed method contained more traditional Chinese clothing elements, suggesting that the proposed method was more suitable for the generation of traditional Hanfu.
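For reference, the FID used above compares the Gaussian statistics (mean and covariance) of feature vectors extracted from real and generated images, here by Inception V3 or the Hanfu fine-grained classifier. The sketch below computes FID on random stand-in features; it illustrates the metric itself, not the paper's evaluation pipeline.

```python
import numpy as np

def sym_sqrt(a):
    # Matrix square root of a symmetric positive semi-definite matrix.
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0, None))) @ v.T

def fid(feat_real, feat_fake):
    """FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2))."""
    mu1, mu2 = feat_real.mean(0), feat_fake.mean(0)
    s1 = np.cov(feat_real, rowvar=False)
    s2 = np.cov(feat_fake, rowvar=False)
    s1h = sym_sqrt(s1)
    # Tr((S1 S2)^(1/2)) computed via the symmetric form S1^(1/2) S2 S1^(1/2).
    covmean = sym_sqrt(s1h @ s2 @ s1h)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1 + s2 - 2.0 * covmean))

rng = np.random.default_rng(1)
real = rng.normal(0.0, 1.0, size=(500, 5))   # stand-in "real" features
fake = rng.normal(0.5, 1.0, size=(500, 5))   # stand-in "generated" features
same = rng.normal(0.0, 1.0, size=(500, 5))   # features from the same distribution
print(fid(real, fake) > fid(real, same))     # True: closer distributions give lower FID
```

Lower FID means the generated-feature distribution is closer to the real one, which is why the drops in the two FID values are read as an improvement.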

Conclusion This research focused on Hanfu customization recommendation and proposed an intelligent recommendation method based on an improved stack-generative adversarial network. The results show that the improved method produces more realistic, higher-quality Hanfu images and performs better in matching requirement texts, demonstrating its effectiveness and applicability for Hanfu image generation. This study selected only traditional Hanfu as the research object. Future research will collect and organize more comprehensive Hanfu text and image data to improve the robustness and diversity of model generation, enhance recommendation satisfaction, and optimize the customization experience. In addition, future work may combine the improved method with large-model techniques to further advance Hanfu customization recommendation technology.

Key words: deep learning, generative adversarial network, Hanfu customization, intelligent recommendation, generative model

中图分类号: TS941.2

Fig. 1 Architecture of the improved stacked generative adversarial network model

Table 1 Parameter configuration

Hyperparameter      Meaning                        Setting
generator_Lr        Generator learning rate        1×10⁻⁵
discriminator_Lr    Discriminator learning rate    4×10⁻⁵
Epoch1              Stage-1 iterations             1 000
Epoch2              Stage-2 iterations             2 000
λ                   Reconstruction loss weight     100
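Table 1's λ = 100 weights a reconstruction term in the generator objective; combined with the mean-absolute-error loss described in the method, this suggests a loss of the form L_G = L_adv + λ·MAE. The sketch below uses a non-saturating adversarial term as an assumed formulation (the paper states only that mean absolute error is used) and random stand-in tensors:

```python
import numpy as np

lam = 100.0  # reconstruction loss weight λ from Table 1

def generator_loss(d_fake_logits, generated, target):
    # Non-saturating adversarial term -log D(G(z)) (assumed formulation).
    adv = -np.mean(np.log(1.0 / (1.0 + np.exp(-d_fake_logits))))
    # Mean absolute error keeps generated pixels close to the reference image.
    mae = np.mean(np.abs(generated - target))
    return adv + lam * mae

rng = np.random.default_rng(0)
fake_logits = rng.normal(size=8)             # discriminator outputs, stand-in
gen_img = rng.uniform(size=(8, 64, 64, 3))   # generated batch, stand-in
ref_img = rng.uniform(size=(8, 64, 64, 3))   # reference batch, stand-in
loss = generator_loss(fake_logits, gen_img, ref_img)
print(loss > 0)                              # True
```

With λ this large, the MAE term dominates early training and pulls the generated image toward the reference before the adversarial term refines realism, which is consistent with the stated goal of improving text-image consistency.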

Fig. 2 Flowchart of Embedding-LSTM text learning

Fig. 3 Structure of the stage-1 generative adversarial network model

Fig. 4 Details of the stage-1 generative adversarial network model

Fig. 5 Structure of the stage-2 generative network model

Fig. 6 Details of the stage-2 generative adversarial network model

Fig. 7 Comparison of generation results of the improved model at the two stages

Fig. 8 Comparison of generation results between the improved and original models

Table 2 Objective evaluation results

Evaluation model                       FID (original)  FID (improved)  Difference/%
Inception V3                           122.235         78.401          -35.86
Hanfu fine-grained recognition model   13.792          12.334          -10.59
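The relative drops reported in Table 2 follow from (improved − original) / original × 100; the quick check below reproduces them from the table's FID values (up to rounding of the reported figures):

```python
# Reproduce Table 2's difference column from its FID values.
pairs = {
    "Inception V3": (122.235, 78.401),
    "Hanfu fine-grained recognition model": (13.792, 12.334),
}
for name, (original, improved) in pairs.items():
    diff = (improved - original) / original * 100.0
    print(f"{name}: {diff:.2f}%")
```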

Table 3 Subjective evaluation questionnaire design

Module    Aspect              Question
Module 1  Similarity          Rate the similarity between the generated Hanfu and the original Hanfu
Module 2  Generation quality  Rate the structural integrity of the generated Hanfu
                              Rate the overall aesthetics of the generated Hanfu
                              Rate the text-matching degree of the generated Hanfu

Table 4 Subjective evaluation results

Module    Model           Item        Mean   Std. dev.
Module 1  Original model              5.09   0.45
          Improved model              5.24   0.22
Module 2  Original model  Question 1  5.13   0.78
                          Question 2  5.50   0.83
                          Question 3  4.55   0.89
          Improved model  Question 1  5.47   0.80
                          Question 2  5.53   0.83
                          Question 3  5.63   0.63

Fig. 9 Comparison of images generated by different methods

Table 5 FID values of images generated by different methods

Method            FID (Inception V3)   FID (Hanfu fine-grained recognition model)
ERNIE-ViLG        30.787               466.967
Composer          77.807               485.214
Diffusion Model   67.434               475.587
Proposed method   76.481               182.699
[1] CHAKRABORTY S, HOQUE M, JEEM N, et al. Fashion recommendation systems, models and methods: a review[J]. Informatics, 2021, 8(3): 1-34.
[2] 杨宇, 吴国栋, 刘玉良, 等. 生成对抗网络及其个性化推荐研究[J]. 小型微型计算机系统, 2022, 43(3): 574-581.
YANG Yu, WU Guodong, LIU Yuliang, et al. Research on generative adversarial nets and personalized recommendation[J]. Journal of Chinese Computer Systems, 2022, 43(3): 574-581.
[3] 江学为, 田润雨, 卢方骁, 等. 基于模拟评分的服装推荐改进算法[J]. 纺织学报, 2021, 42(12): 138-144.
JIANG Xuewei, TIAN Runyu, LU Fangxiao, et al. Improved clothing recommendation algorithm based on simulation scoring[J]. Journal of Textile Research, 2021, 42(12): 138-144.
[4] HIDAYATI S, HSU C, CHANG Y, et al. What dress fits me best? Fashion recommendation on the clothing style for personal body shape[C]// 2018 ACM Multimedia Conference. Seoul: Association for Computing Machinery, 2018: 438-446.
[5] YANG B. Clothing design style recommendation using decision tree algorithm combined with deep learning[J]. Computational Intelligence and Neuroscience, 2022, 2022(1): 1-10.
[6] 曹寅, 秦俊平, 马千里, 等. 文本生成图像研究综述[J]. 浙江大学学报(工学版), 2024, 58(2): 219-238.
CAO Yin, QIN Junping, MA Qianli, et al. Survey of text-to-image synthesis[J]. Journal of Zhejiang University (Engineering Science), 2024, 58(2):219-238.
[7] HUANG M, MAO Z, WANG P, et al. DSE-GAN: dynamic semantic evolution generative adversarial network for text-to-image generation[C]// Proceedings of the 30th ACM International Conference on Multimedia. Lisboa: Association for Computing Machinery, 2022: 4345-4354.
[8] ZHANG H, KOH J Y, BALDRIDGE J, et al. Cross-modal contrastive learning for text-to-image generation[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 833-842.
[9] ZHANG H, XU T, LI H, et al. StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks[C]// 2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 5908-5916.
[10] ZHANG H, GOODFELLOW I J, METAXAS D N, et al. Self-attention generative adversarial networks[C]// Proceedings of the 36th International Conference on Machine Learning. Long Beach: PMLR, 2019: 7354-7363.
[11] 任夷. 中国服装史[M]. 北京: 北京大学出版社, 2015:124-146.
REN Yi. History of Chinese clothing[M]. Beijing: Peking University Press, 2015:124-146.
[12] 杨婷婷, 虞佳颖, 肖姚, 等. 基于Embedding-GRU的水库水位预测模型[J]. 南水北调与水利科技(中英文), 2023, 21(5): 940-950.
YANG Tingting, YU Jiaying, XIAO Yao, et al. Reservoir level prediction based on Embedding-GRU model[J]. South-to-North Water Transfers and Water Science & Technology, 2023, 21(5): 940-950.
[13] JORDAN M I. Serial order: a parallel distributed processing approach[M]// Advances in Psychology: Vol. 121. Amsterdam: North-Holland, 1997: 471-495.
[14] VAN HOUDT G, MOSQUERA C, NÁPOLES G. A review on the long short-term memory model[J]. Artificial Intelligence Review, 2020, 53(8): 5929-5955.
[15] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[16] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: Curran Associates Inc., 2017: 6629-6640.
[17] ZHANG H, YIN W, FANG Y, et al. ERNIE-ViLG: Unified generative pre-training for bidirectional vision-language generation[EB/OL]. (2021-12-31)[2022-04-13]. https://arxiv.org/abs/2112.15283.
[18] HUANG L, CHEN D, LIU Y, et al. Composer: creative and controllable image synthesis with composable conditions[EB/OL]. (2023-02-20)[2023-02-23]. https://arxiv.org/abs/2302.09778.
[19] ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2022: 10684-10695.