Journal of Textile Research ›› 2026, Vol. 47 ›› Issue (1): 196-206. doi: 10.13475/j.fzxb.20250500501

• Apparel Engineering •

Real-time detection model for clothing keypoints based on deep learning

FENG Cailing1, YU Shijia2, HAN Shuguang3

  1. School of Fashion, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
    2. Entrepreneurship School, Zhejiang Polytechnic University of Mechanical and Electrical Engineering, Hangzhou, Zhejiang 310053, China
    3. School of Science, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
  • Received: 2025-05-06 Revised: 2025-11-11 Online: 2026-01-15 Published: 2026-01-15
  • Contact: HAN Shuguang E-mail: dawn1024@zstu.edu.cn

Abstract:

Objective This study develops a deep-learning-based real-time detection model for clothing keypoints, addressing the challenge of balancing accuracy and real-time performance in complex scenarios. The research focuses on enhancing the robustness and precision of keypoint detection across various clothing types, which is essential for advancing applications in intelligent clothing manufacturing and virtual fitting. Existing models struggle to handle occlusions, diverse clothing styles, and varying keypoint sizes while sustaining real-time performance; this work aims to overcome these limitations without sacrificing computational efficiency.

Method This study proposes the Real-Time Fashion Pose Estimation (RTFPose) model for real-time clothing keypoint detection, built on the RTMPose architecture. RTFPose introduces the Median Enhanced Channel and Spatial Attention module (MECS) to enhance key-area features and suppress noise, improving detection under occlusion. The Cross-Scale Feature Fusion module (CSFF) integrates multi-scale features to handle varying keypoint sizes, and the Self-Attention Feature Enhancement module (SAFE) focuses on keypoint regions to suppress background interference. Additionally, a fine-tuning strategy addresses data imbalance across clothing types.
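
The abstract does not give the internal layout of MECS, but a channel-then-spatial attention block enriched with a median statistic (alongside the usual average and max pooling descriptors) can be sketched as follows. This is a hedged NumPy illustration of the general technique, not the paper's implementation; the choice of sigmoid gating and of summing the three pooled descriptors is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def median_channel_spatial_attention(x):
    """Hypothetical median-enhanced channel + spatial attention.

    x: feature map of shape (C, H, W). Each stage squeezes one set of
    dimensions with average, max and median pooling, sums the three
    descriptors, and gates the features with a sigmoid.
    """
    # Channel attention: pool over the spatial dimensions.
    avg_c = x.mean(axis=(1, 2))            # (C,)
    max_c = x.max(axis=(1, 2))             # (C,)
    med_c = np.median(x, axis=(1, 2))      # (C,) - the "median enhanced" part
    x = x * sigmoid(avg_c + max_c + med_c)[:, None, None]

    # Spatial attention: pool over the channel dimension.
    avg_s = x.mean(axis=0)                 # (H, W)
    max_s = x.max(axis=0)                  # (H, W)
    med_s = np.median(x, axis=0)           # (H, W)
    return x * sigmoid(avg_s + max_s + med_s)[None, :, :]

# Toy feature map: 8 channels, 16x16 spatial grid.
feat = np.random.default_rng(0).standard_normal((8, 16, 16))
out = median_channel_spatial_attention(feat)
```

Because both gates lie in (0, 1), the block only rescales activations (it never flips signs or amplifies them), which is consistent with its stated role of emphasizing key areas while damping noisy responses.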

Results The RTFPose model demonstrated excellent performance on the DeepFashion2 dataset, achieving 140 frames/s with an area under the curve (AUC) of 65.1%, a 6.5-percentage-point improvement in accuracy over the baseline model RTMPose (58.6%). On the DeepFashion dataset, the model achieved a percentage of correct keypoints (P) of 68.0% at a real-time speed of 142.3 frames/s. These results further demonstrate the model's strong keypoint detection accuracy at high efficiency and validate its generalization across datasets. Ablation results show that introducing MECS alone improves accuracy to 59.5%: its channel-spatial attention mechanism effectively improves the feature visibility of clothing keypoints in occluded scenes. Adding CSFF further increases accuracy to 62.0%; this module integrates multi-level features from the backbone network and addresses keypoint size variability by fusing high- and low-resolution detail with semantic information. Further introducing SAFE raises performance to 63.8%: its self-attention mechanism adaptively focuses on keypoint regions, reduces background texture interference (such as clothing folds and decorative patterns), and improves feature purity. Finally, the classification fine-tuning strategy, in which the six clothing types are trained independently, brings accuracy to 65.1% and balances keypoint detection accuracy across clothing types. These results highlight the effectiveness of the proposed modules and the fine-tuning strategy in enhancing the robustness and accuracy of RTFPose in complex scenarios. The model's ability to improve detection accuracy while maintaining high efficiency makes it a valuable solution for real-time clothing keypoint detection in industrial applications.
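
The reported metrics, percentage of correct keypoints (P/PCK) and AUC, are standard in keypoint detection: PCK counts a prediction as correct if it falls within a threshold fraction of a normalization length from the ground truth, and AUC integrates PCK over a range of thresholds. The sketch below is a generic illustration of these definitions; the paper's exact normalization and threshold protocol is not stated in the abstract and may differ.

```python
import numpy as np

def pck(pred, gt, norm, thresh=0.5):
    """Percentage of correct keypoints: fraction of predictions whose
    distance to ground truth is within thresh * norm."""
    dists = np.linalg.norm(np.asarray(pred, float) - np.asarray(gt, float), axis=-1)
    return float(np.mean(dists / norm <= thresh))

def keypoint_auc(pred, gt, norm, max_thresh=1.0, steps=20):
    """Area under the PCK-vs-threshold curve on [0, max_thresh],
    normalized so a perfect detector scores 1.0 (trapezoidal rule)."""
    ts = np.linspace(0.0, max_thresh, steps + 1)
    vals = np.array([pck(pred, gt, norm, t) for t in ts])
    return float(np.mean((vals[:-1] + vals[1:]) / 2.0))

# Toy example: 4 keypoints; one prediction is off by half the
# normalization length, so it fails the 0.4 threshold.
gt   = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], float)
pred = gt + np.array([[0, 0], [0, 0], [0, 0], [5, 0]], float)
p   = pck(pred, gt, norm=10.0, thresh=0.4)   # 3 of 4 within threshold
auc = keypoint_auc(pred, gt, norm=10.0)
```

Reporting AUC rather than PCK at a single threshold rewards detectors that are accurate across the whole tolerance range, which is why the 65.1% AUC figure is a stronger claim than any single-threshold accuracy.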

Conclusion The proposed model effectively balances real-time performance and detection accuracy for clothing keypoint detection. By integrating MECS, CSFF, and SAFE, it significantly improves robustness and accuracy in complex scenarios, and the fine-tuning strategy effectively addresses data imbalance, improving detection performance across clothing types. The lightweight design and high efficiency of the proposed model make it particularly valuable for industrial applications such as smart clothing manufacturing and virtual fitting. Future work will focus on three directions: first, enhancing adaptability to dynamic scenes to improve robustness and real-time processing in dynamic environments; second, fusing multimodal data to integrate depth information and texture features, thereby improving recognition accuracy; third, adopting a self-supervised learning paradigm to reduce dependence on manual annotation and enhance generalization. These advances will further strengthen the applicability and effectiveness of the proposed model in industrial settings.

Key words: keypoint of clothing, real-time detection, deep learning, attention enhancement mechanism, cross-scale feature fusion

CLC Number: TS941.7

Fig.1

RTFPose network backbone diagram

Fig.2

MECS module

Fig.3

Cross-scale feature fusion module

Fig.4

CCF module

Fig.5

Self-attention feature enhancement module

Fig.6

Self-attention mechanism

Tab.1

Comparison of experimental results of various methods on DeepFashion2 validation set

Algorithm                                      Inference speed/(frame·s⁻¹)   Accuracy/%
Mask R-CNN[6]                                  3.0~7.0                       52.9
DeepMark[9]                                    17.8                          53.2
DAFE[17]                                       50.0                          54.9
RTMPose[15]                                    145.6                         58.6
DeepMark++ (Hourglass 768×768)[10]             69.9                          59.1
Aggregation and Finetuning[7]                  4.3                           61.2
Multi-scale spatial feature-guided method[8]   10.6~14.1                     67.4
YOLO-T-Pose (trained only on T-shirts)[18]     99.0                          74.4
YOLO-T-Shirt (trained only on T-shirts)[19]    67.1                          76.0
Proposed method                                140.0                         65.1

Tab.2

Comparison of experimental results of various methods on DeepFashion validation set

Algorithm         Inference speed/(frame·s⁻¹)   N (normalized error)   P/%
FashionNet[16]    —                             0.0789                 —
DFA[20]           4.5                           0.0680                 —
RTMPose[15]       147.2                         0.0670                 66.3
DLAN[21]          5.2                           0.0642                 —
CSPN[22]          1.0                           0.0560                 —
SKDAT[23]         —                             —                      65.0
DBN[24]           —                             —                      67.3
DUKED[25]         —                             —                      70.0
Proposed method   142.3                         0.0594                 68.0

Fig.7

Visualization of experimental results on DeepFashion2. (a) T-shirt; (b) Coat; (c) Vest; (d) Bottom wear; (e) Camisole; (f) Dress

Fig.8

Visualization results of ablation experiment. (a) T-shirt; (b) Coat; (c) Vest; (d) Bottom wear; (e) Camisole; (f) Dress

Tab.3

Accuracy of clothing keypoint detection before and after fine-tuning

Clothing type          T-shirt   Coat   Bottom wear   Dress   Vest   Camisole
Accuracy before/%      66.7      54.9   68.9          60.1    63.7   59.8
Accuracy after/%       67.9      57.5   70.3          63.7    65.3   65.9
[1] LI Z, WEI P F, YIN X, et al. Virtual try-on with pose-garment keypoints guided inpainting[C]// 2023 IEEE/CVF International Conference on Computer Vision (ICCV). New York: IEEE, 2024: 22731-22740.
[2] CHEN Aiai, LI Lai, LIU Guangcan, et al. Clothing retrieval based on landmarks[J]. Journal of Computer Applications, 2017, 37(11): 3249-3255. doi: 10.11772/j.issn.1001-9081.2017.11.3249.
[3] SHI Yingjie, YANG Ke, WANG Jianxin, et al. Survey on fashion outfit recommendation research based on machine learning[J]. Application Research of Computers, 2022, 39(4): 978-985.
[4] LI Pengfei, ZHENG Mingzhi, JING Junfeng. Application of image processing in on-line clothes size measurement[J]. Journal of Electronic Measurement and Instrumentation, 2016, 30(8): 1214-1219.
[5] WANG Shengwei. Garment and fabric quality inspection system based on image processing[D]. Nanjing: Southeast University, 2019: 15-49.
[6] GE Y Y, ZHANG R M, WANG X G, et al. DeepFashion2: a versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2020: 5332-5340.
[7] LIN T H. Aggregation and finetuning for clothes landmark detection[EB/OL]. [2025-05-04]. https://arxiv.org/abs/2005.00419.
[8] XIE Zhifeng, ZHOU Zhipeng, WANG Zhaosheng, et al. Multi-scale spatial feature-guided cloth landmark estimation[J]. Journal of Computer-Aided Design & Computer Graphics, 2022, 34(11): 1763-1771.
[9] SIDNEV A, TRUSHKOV A, KAZAKOV M, et al. DeepMark: one-shot clothing detection[C]// 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). New York: IEEE, 2020: 3201-3204.
[10] SIDNEV A, KRAPIVIN A, TRUSHKOV A, et al. DeepMark++: real-time clothing detection at the edge[C]// 2021 IEEE Winter Conference on Applications of Computer Vision (WACV). New York: IEEE, 2021: 2979-2987.
[11] KIM H J, LEE D H, NIAZ A, et al. Multiple-clothing detection and fashion landmark estimation using a single-stage detector[J]. IEEE Access, 2021, 9: 11694-11704. doi: 10.1109/Access.6287639.
[12] LYU C Q, ZHANG W W, HUANG H A, et al. RTMDet: an empirical study of designing real-time object detectors[EB/OL]. [2025-05-04]. https://arxiv.org/abs/2212.07784.
[13] SUN Fangwei, LI Chengyang, XIE Yongqiang, et al. Review of deep learning applied to occluded object detection[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(6): 1243-1259. doi: 10.3778/j.issn.1673-9418.2112035.
[14] AN Tianyi, LI Ning, WANG Chao. Research on lightweight algorithms for deep reinforcement learning[J]. Computer Science and Application, 2023(4): 779-788.
[15] JIANG T, LU P, ZHANG L, et al. RTMPose: real-time multi-person pose estimation based on MMPose[EB/OL]. [2025-05-04]. https://arxiv.org/abs/2303.07399.
[16] LIU Z W, LUO P, QIU S, et al. DeepFashion: powering robust clothes recognition and retrieval with rich annotations[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2016: 1096-1104.
[17] CHEN M, QIN Y J, QI L Z, et al. Improving fashion landmark detection by dual attention feature enhancement[C]// 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). New York: IEEE, 2019: 3101-3104.
[18] MAJI D, NAGORI S, MATHEW M, et al. YOLO-pose: enhancing YOLO for multi-person pose estimation using object keypoint similarity loss[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). New York: IEEE, 2022: 2636-2645.
[19] CHEN Runlin, SHI Yingjie, DU Fang. YOLO-T-shirt: a T-shirt landmark detection method based on cascade architecture and fusion geometry information[J]. Journal of Beijing Institute of Clothing Technology, 2024, 44(2): 88-96.
[20] LIU Z W, YAN S J, LUO P, et al. Fashion landmark detection in the wild[C]// Computer Vision - ECCV 2016. Cham: Springer, 2016: 229-245.
[21] YAN S J, LIU Z W, LUO P, et al. Unconstrained fashion landmark detection via hierarchical recurrent transformer networks[C]// Proceedings of the 25th ACM International Conference on Multimedia. New York: ACM, 2017: 172-180.
[22] LI Weiqian, ZHANG Ziyun, WANG Hai, et al. Cascaded stacked pyramid network model for key point detection of clothing[J]. Computer Systems & Applications, 2020, 29(4): 254-259.
[23] YING N, ZHANG X W, HU M, et al. Self-supervised keypoint detection based on affine transformation[J]. Journal of the Franklin Institute, 2025, 362(8): 107648. doi: 10.1016/j.jfranklin.2025.107648.
[24] WU T, WANG K, TANG C M, et al. Diffusion-based network for unsupervised landmark detection[J]. Knowledge-Based Systems, 2024, 292: 111627. doi: 10.1016/j.knosys.2024.111627.
[25] HEDLIN E, SHARMA G, MAHAJAN S, et al. Unsupervised keypoints from pretrained diffusion models[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2024: 22820-22830.