Journal of Textile Research ›› 2026, Vol. 47 ›› Issue (1): 196-206. doi: 10.13475/j.fzxb.20250500501

• Apparel Engineering •

Real-time detection model for clothing keypoints based on deep learning

FENG Cailing1, YU Shijia2, HAN Shuguang3

  1. School of Fashion, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
  2. Entrepreneurship School, Zhejiang Polytechnic University of Mechanical and Electrical Engineering, Hangzhou, Zhejiang 310053, China
  3. School of Science, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
  • Received: 2025-05-06 Revised: 2025-11-11 Published: 2026-01-15 Online: 2026-01-15
  • Corresponding author: HAN Shuguang (born 1977), male, professor, Ph.D. Main research interests: intelligent apparel manufacturing, logistics system optimization, and mathematical modeling and its applications. E-mail: dawn1024@zstu.edu.cn
  • About the author: FENG Cailing (born 2001), female, master's student. Main research interest: digital technology for apparel.
  • Funding: National Natural Science Foundation of China (12471304)

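Abstract (Chinese version, translated):

To address the difficulty of achieving both accuracy and real-time performance in clothing keypoint detection under complex scenes, a deep-learning-based real-time clothing keypoint detection model is proposed, with the aim of improving detection accuracy while retaining real-time performance. The model builds on a real-time multi-person pose estimation architecture. First, a median-enhanced channel and spatial attention module is constructed; it performs global average pooling, max pooling, and median pooling in parallel and fuses the results into attention weights that strengthen the feature representation of key clothing regions. Second, a cross-scale feature fusion module is designed; it upsamples, concatenates, and fuses by cross convolution the feature maps from different levels of the backbone network, building a pyramid structure that carries both fine detail and semantic features. Third, a self-attention feature enhancement module is established; it dynamically generates an attention map by computing similarities between feature points and adaptively adjusts the feature weights of each region. Finally, a per-category fine-tuning strategy is applied, in which dedicated models are built for six typical clothing categories to optimize overall performance. The results show that the method reaches detection accuracies of 65.1% and 68.0% on the DeepFashion2 and DeepFashion datasets, respectively, while maintaining real-time processing speeds of 140.0 frames/s and 142.3 frames/s. The model improves the overall performance of clothing keypoint detection in complex scenes and could be applied in areas such as intelligent apparel manufacturing and virtual fitting.

Keywords: clothing keypoint, real-time detection, deep learning, self-attention mechanism, cross-scale feature fusion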
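
The median-enhanced attention described in the abstract can be illustrated with a minimal, framework-free sketch. It covers only the channel branch, and it assumes the three pooled descriptors are fused by a plain sum followed by a sigmoid. The abstract does not specify the actual fusion, so the function names and the fusion rule below are illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import median


def pooled_descriptors(channel):
    """Return (average, max, median) over all spatial positions of one channel."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values), max(values), median(values)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def median_enhanced_channel_attention(feature_map):
    """Reweight each channel of a C x H x W nested-list feature map.

    Assumption: the three pooled descriptors are fused by summing them and
    applying a sigmoid. A real module would likely use learned layers here.
    """
    weights = []
    for channel in feature_map:
        avg_v, max_v, med_v = pooled_descriptors(channel)
        weights.append(sigmoid(avg_v + max_v + med_v))
    reweighted = [
        [[w * v for v in row] for row in channel]
        for w, channel in zip(weights, feature_map)
    ]
    return weights, reweighted


# Toy input: channel 0 holds a single outlier spike, channel 1 is uniformly active.
features = [
    [[0.0, 0.0], [0.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
weights, out = median_enhanced_channel_attention(features)
print(pooled_descriptors(features[0]))
print(weights)
```

On the toy input, the median descriptor of the first channel stays at 0.0 even though a single spike pulls up its average and maximum, which is the kind of outlier robustness a median branch can add alongside average and max pooling.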

Abstract:

Objective This study develops a deep-learning-based real-time detection model for clothing keypoints to address the difficulty of balancing accuracy against real-time performance in complex scenarios. The work targets more robust and precise keypoint detection across clothing types, which is essential for applications in intelligent clothing manufacturing and virtual fitting. Existing models struggle with occlusion, diverse clothing styles, and varying keypoint sizes, and often cannot run in real time. The proposed model aims to overcome these limitations while maintaining high computational efficiency.

Method This study proposes the Real-Time Fashion Pose Estimation (RTFPose) model for real-time clothing keypoint detection, built on the RTMPose architecture. RTFPose adds three modules. The Median Enhanced Channel and Spatial Attention (MECS) module strengthens features in key regions and suppresses noise, which improves detection under occlusion. The Cross-Scale Feature Fusion (CSFF) module integrates multi-scale features to handle keypoints of varying size. The Self-Attention Feature Enhancement (SAFE) module focuses on keypoint regions to suppress background interference. In addition, a fine-tuning strategy addresses data imbalance.
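
To make the self-attention idea behind SAFE concrete, the following is a minimal sketch of scaled dot-product self-attention over a handful of feature points. It is not the authors' module: it assumes identity query, key, and value projections (no learned weights), and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(points):
    """Enhance N feature vectors (lists of D floats) by attending to each other.

    Assumption: queries, keys, and values are the raw features themselves.
    Returns (attention_map, enhanced), where attention_map[i][j] is the weight
    that point i assigns to point j.
    """
    dim = len(points[0])
    scale = math.sqrt(dim)
    attention_map = []
    for query in points:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in points]
        attention_map.append(softmax(scores))
    enhanced = [
        [sum(w * value[d] for w, value in zip(row, points)) for d in range(dim)]
        for row in attention_map
    ]
    return attention_map, enhanced


# Two similar feature points and one dissimilar point.
points = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
attn, enhanced = self_attention(points)
for row in attn:
    print([round(w, 3) for w in row])
```

Each row of the attention map sums to 1, and similar points weight each other more heavily than the dissimilar one. In a keypoint detector the same mechanism lets regions that resemble keypoint features reinforce each other while unrelated background contributes less.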

Results On the DeepFashion2 dataset, RTFPose ran at 140 frames/s with an Area Under the Curve (AUC) value of 65.1%, an improvement of 6.5 percentage points in accuracy over the baseline model RTMPose. On the DeepFashion dataset, it achieved a percentage of correct keypoints (P) of 68.0% at a real-time speed of 142.3 frames/s. These results show that the model improves keypoint detection accuracy while remaining efficient, and that it generalizes across datasets. In the ablation experiments, introducing MECS alone raised accuracy to 59.5%; its channel-spatial attention mechanism makes clothing keypoint features more visible in occluded scenes. Adding CSFF raised accuracy further to 62.0%. This module integrates multi-level features of the backbone network and handles keypoint size variability by fusing high-resolution detail with low-resolution semantic information. Adding SAFE brought accuracy to 63.8%: the self-attention mechanism adaptively focuses on keypoint areas, reduces interference from background texture (such as clothing folds and decorative patterns), and improves feature purity. Finally, adding the per-category fine-tuning strategy brought accuracy to 65.1%. Fine-tuning trains the six clothing categories independently, which balances keypoint detection accuracy across categories. Together, these results show that the proposed modules and the fine-tuning strategy improve the robustness and accuracy of RTFPose in complex scenarios, and its combination of speed and accuracy makes it suitable for real-time clothing keypoint detection in industrial applications.
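
The cumulative accuracies reported in the Results can be turned into per-stage gains with a few lines of arithmetic. The sketch below uses only figures stated on this page; the baseline value of 58.6% is the RTMPose entry in Table 1, and the stage labels are shorthand introduced here.

```python
# Accuracy (%) on DeepFashion2 after each cumulative stage.
stages = [
    ("RTMPose baseline", 58.6),
    ("+ MECS", 59.5),
    ("+ CSFF", 62.0),
    ("+ SAFE", 63.8),
    ("+ per-category fine-tuning", 65.1),
]


def incremental_gains(stages):
    """Return (stage name, gain over the previous stage) in percentage points."""
    return [
        (name, round(acc - prev_acc, 1))
        for (_, prev_acc), (name, acc) in zip(stages, stages[1:])
    ]


gains = incremental_gains(stages)
total_gain = round(stages[-1][1] - stages[0][1], 1)
relative_gain = round(100 * total_gain / stages[0][1], 1)

for name, gain in gains:
    print(f"{name}: +{gain} points")
print(f"total: +{total_gain} points ({relative_gain}% relative to the baseline)")
```

The per-stage gains are 0.9, 2.5, 1.8, and 1.3 percentage points, which sum to the overall 6.5-point improvement. Relative to the 58.6% baseline this is about 11.1%, which is why the improvement is better read as percentage points than as a percentage.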

Conclusion The proposed model effectively balances real-time performance and detection accuracy for clothing keypoint detection. Integrating MECS, CSFF, and SAFE improves its robustness and accuracy in complex scenarios, and the fine-tuning strategy addresses data imbalance, improving detection performance across clothing types. The lightweight design and high efficiency of the model make it particularly suitable for industrial applications such as smart clothing manufacturing and virtual fitting. Future work will focus on three directions: first, improving adaptability to dynamic scenes so that the system stays robust and real-time in changing environments; second, using multimodal data fusion to integrate depth information and texture features, thereby improving recognition accuracy; third, adopting a self-supervised learning paradigm to reduce dependence on manual annotation and improve generalization. These advances should further strengthen the applicability of the model in industrial settings.

Key words: keypoint of clothing, real-time detection, deep learning, attention enhancement mechanism, cross-scale feature fusion

CLC number: TS941.7

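Fig. 1 RTFPose network backbone

Fig. 2 MECS module

Fig. 3 Cross-scale feature fusion module

Fig. 4 CCF module

Fig. 5 Self-attention feature enhancement module

Fig. 6 Self-attention mechanism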

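Table 1 Comparison of experimental results of different methods on the DeepFashion2 validation set

Algorithm                                       Inference speed/(frames·s-1)   Accuracy/%
Mask R-CNN[6]                                   3.0~7.0                        52.9
DeepMark[9]                                     17.8                           53.2
DAFE[17]                                        50.0                           54.9
RTMPose[15]                                     145.6                          58.6
DeepMark++ (Hourglass 768×768)[10]              69.9                           59.1
Aggregation and Finetuning[7]                   4.3                            61.2
Multi-scale spatial feature-guided method[8]    10.6~14.1                      67.4
YOLO-T-Pose (trained on T-shirts only)[18]      99.0                           74.4
YOLO-T-Shirt (trained on T-shirts only)[19]     67.1                           76.0
Proposed algorithm                              140.0                          65.1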

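Table 2 Comparison of experimental results of different methods on the DeepFashion validation set

Algorithm            Inference speed/(frames·s-1)   N        P/%
FashionNet[16]       -                              0.0789   -
DFA[20]              4.5                            0.0680   -
RTMPose[15]          147.2                          0.0670   66.3
DLAN[21]             5.2                            0.0642   -
CSPN[22]             1.0                            0.0560   -
SKDAT[23]            -                              -        65.0
DBN[24]              -                              -        67.3
DUKED[25]            -                              -        70.0
Proposed algorithm   142.3                          0.0594   68.0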
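
Table 2 reports two metrics, N and P. The abstract defines P as the percentage of correct keypoints; N is assumed here to denote a normalized error, which is a common metric on DeepFashion. The sketch below shows one common way to compute both. The normalization by image size, the 0.1 threshold, and the function names are assumptions for illustration and may differ from the paper's exact protocol.

```python
import math


def normalized_error(pred, truth, width, height):
    """Mean Euclidean distance between predicted and true keypoints,
    with x and y rescaled to [0, 1] by the image size (assumed normalization)."""
    dists = [
        math.hypot((px - tx) / width, (py - ty) / height)
        for (px, py), (tx, ty) in zip(pred, truth)
    ]
    return sum(dists) / len(dists)


def percentage_correct_keypoints(pred, truth, norm_length, threshold=0.1):
    """Share (%) of keypoints whose pixel error is within threshold * norm_length."""
    hits = sum(
        math.hypot(px - tx, py - ty) <= threshold * norm_length
        for (px, py), (tx, ty) in zip(pred, truth)
    )
    return 100.0 * hits / len(pred)


# Hypothetical keypoints on a 200 x 200 image (not data from the paper).
truth = [(50, 40), (150, 40), (100, 180)]
pred = [(53, 44), (150, 40), (130, 180)]

ne = normalized_error(pred, truth, width=200, height=200)
pck = percentage_correct_keypoints(pred, truth, norm_length=200, threshold=0.1)
print(f"normalized error = {ne:.4f}, correct keypoints = {pck:.1f}%")
```

Under these definitions a lower N and a higher P both indicate better localization, which matches how the two columns should be read when comparing methods.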

Fig. 7 Visualization of experimental results on DeepFashion2

Fig. 8 Visualization of ablation experiment results

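Table 3 Clothing keypoint detection accuracy before and after fine-tuning (accuracy/%)

Clothing category    T-shirt   Outerwear   Bottoms   Dress   Vest   Sling
Before fine-tuning   66.7      54.9        68.9      60.1    63.7   59.8
After fine-tuning    67.9      57.5        70.3      63.7    65.3   65.9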
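
The effect of per-category fine-tuning in Table 3 is easier to see as per-category gains. The short sketch below recomputes them from the table values; the English category names are translations introduced here, and the "spread" (best minus worst category) is an illustrative summary, not a metric from the paper.

```python
categories = ["T-shirt", "Outerwear", "Bottoms", "Dress", "Vest", "Sling"]
before = [66.7, 54.9, 68.9, 60.1, 63.7, 59.8]  # accuracy (%) before fine-tuning
after = [67.9, 57.5, 70.3, 63.7, 65.3, 65.9]   # accuracy (%) after fine-tuning

# Gain per category in percentage points.
gains = {c: round(a - b, 1) for c, b, a in zip(categories, before, after)}
largest = max(gains, key=gains.get)

# Gap between the best and worst category, before and after fine-tuning.
spread_before = round(max(before) - min(before), 1)
spread_after = round(max(after) - min(after), 1)

for category, gain in gains.items():
    print(f"{category}: +{gain} points")
print(f"largest gain: {largest}; spread {spread_before} -> {spread_after} points")
```

Every category improves, with the largest gain for slings (6.1 points), and the gap between the best and worst category narrows from 14.0 to 12.8 points, consistent with the stated goal of balancing accuracy across clothing types.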

References:

[1] LI Z, WEI P F, YIN X, et al. Virtual try-on with pose-garment keypoints guided inpainting[C]// 2023 IEEE/CVF International Conference on Computer Vision (ICCV). New York: IEEE, 2024: 22731-22740.
[2] CHEN Aiai, LI Lai, LIU Guangcan, et al. Clothing retrieval based on landmarks[J]. Journal of Computer Applications, 2017, 37(11): 3249-3255.
doi: 10.11772/j.issn.1001-9081.2017.11.3249
[3] SHI Yingjie, YANG Ke, WANG Jianxin, et al. Survey on fashion outfit recommendation research based on machine learning[J]. Application Research of Computers, 2022, 39(4): 978-985.
[4] LI Pengfei, ZHENG Mingzhi, JING Junfeng. Application of image processing in on-line clothes size measurement[J]. Journal of Electronic Measurement and Instrumentation, 2016, 30(8): 1214-1219.
[5] WANG Shengwei. Garment and fabric quality inspection system based on image processing[D]. Nanjing: Southeast University, 2019: 15-49.
[6] GE Y Y, ZHANG R M, WANG X G, et al. DeepFashion2: a versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2020: 5332-5340.
[7] LIN T H. Aggregation and finetuning for clothes landmark detection[EB/OL]. [2025-05-04]. https://arxiv.org/abs/2005.00419.
[8] XIE Zhifeng, ZHOU Zhipeng, WANG Zhaosheng, et al. Multi-scale spatial feature-guided cloth landmark estimation[J]. Journal of Computer-Aided Design & Computer Graphics, 2022, 34(11): 1763-1771.
[9] SIDNEV A, TRUSHKOV A, KAZAKOV M, et al. DeepMark: one-shot clothing detection[C]// 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). New York: IEEE, 2020: 3201-3204.
[10] SIDNEV A, KRAPIVIN A, TRUSHKOV A, et al. DeepMark++: real-time clothing detection at the edge[C]// 2021 IEEE Winter Conference on Applications of Computer Vision (WACV). New York: IEEE, 2021: 2979-2987.
[11] KIM H J, LEE D H, NIAZ A, et al. Multiple-clothing detection and fashion landmark estimation using a single-stage detector[J]. IEEE Access, 2021, 9: 11694-11704.
doi: 10.1109/Access.6287639
[12] LYU C Q, ZHANG W W, HUANG H A, et al. RTMDet: an empirical study of designing real-time object detectors[EB/OL]. [2025-05-04]. https://arxiv.org/abs/2212.07784.
[13] SUN Fangwei, LI Chengyang, XIE Yongqiang, et al. Review of deep learning applied to occluded object detection[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(6): 1243-1259.
doi: 10.3778/j.issn.1673-9418.2112035
[14] AN Tianyi, LI Ning, WANG Chao. Research on lightweight algorithms for deep reinforcement learning[J]. Computer Science and Application, 2023(4): 779-788.
[15] JIANG T, LU P, ZHANG L, et al. RTMPose: real-time multi-person pose estimation based on MMPose[EB/OL]. [2025-05-04]. https://arxiv.org/abs/2303.07399.
[16] LIU Z W, LUO P, QIU S, et al. DeepFashion: powering robust clothes recognition and retrieval with rich annotations[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2016: 1096-1104.
[17] CHEN M, QIN Y J, QI L Z, et al. Improving fashion landmark detection by dual attention feature enhancement[C]// 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). New York: IEEE, 2019: 3101-3104.
[18] MAJI D, NAGORI S, MATHEW M, et al. YOLO-pose: enhancing YOLO for multi person pose estimation using object keypoint similarity loss[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). New York: IEEE, 2022: 2636-2645.
[19] CHEN Runlin, SHI Yingjie, DU Fang. YOLO-T-shirt: a T-shirt landmark detection method based on cascade architecture and fusion geometry information[J]. Journal of Beijing Institute of Clothing Technology, 2024, 44(2): 88-96.
[20] LIU Z W, YAN S J, LUO P, et al. Fashion landmark detection in the wild[C]// Computer Vision-ECCV 2016. Cham: Springer, 2016: 229-245.
[21] YAN S J, LIU Z W, LUO P, et al. Unconstrained fashion landmark detection via hierarchical recurrent transformer networks[C]// Proceedings of the 25th ACM International Conference on Multimedia. New York: ACM, 2017: 172-180.
[22] LI Weiqian, ZHANG Ziyun, WANG Hai, et al. Cascaded stacked pyramid network model for key point detection of clothing[J]. Computer Systems & Applications, 2020, 29(4): 254-259.
[23] YING N, ZHANG X W, HU M, et al. Self-supervised keypoint detection based on affine transformation[J]. Journal of the Franklin Institute, 2025, 362(8): 107648.
doi: 10.1016/j.jfranklin.2025.107648
[24] WU T, WANG K, TANG C M, et al. Diffusion-based network for unsupervised landmark detection[J]. Knowledge-Based Systems, 2024, 292: 111627.
doi: 10.1016/j.knosys.2024.111627
[25] HEDLIN E, SHARMA G, MAHAJAN S, et al. Unsupervised keypoints from pretrained diffusion models[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2024: 22820-22830.