• CN:11-2187/TH
  • ISSN:0577-6686

Journal of Mechanical Engineering ›› 2019, Vol. 55 ›› Issue (17): 133-144. doi: 10.3901/JME.2019.17.133

• Digital Design and Manufacturing •

Two-Stage Unsupervised Feature Selection Method for Manufacturing Process Data

ZHANG Jie (张洁)1,2, SHENG Xia (盛夏)1, ZHANG Peng (张朋)1, QIN Wei (秦威)1, ZHAO Xinming (赵新明)1

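  1. School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240;
  2. College of Mechanical Engineering, Donghua University, Shanghai 201620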
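  • Received: 2018-09-19 Revised: 2019-02-03 Published: 2020-01-07
  • Corresponding author: ZHANG Jie (corresponding author), female, born in 1963, professor. Her main research interests include modeling of manufacturing systems, optimal scheduling and control, manufacturing information engineering, and intelligent manufacturing systems. E-mail: mezhangjie@dhu.edu.cn
  • About the authors: SHENG Xia, male, born in 1993. His main research interests include digital manufacturing and intelligent manufacturing. E-mail: shengxia0111@sjtu.edu.cn; ZHANG Peng, male, born in 1988, doctoral candidate. His main research interests include optimal scheduling and control of manufacturing systems. E-mail: pgnahz@qq.com; QIN Wei, male, born in 1985, associate professor. His main research interests include modeling, control, and optimization of complex (manufacturing) systems, manufacturing big data analytics, manufacturing intelligence, and applications of artificial intelligence in manufacturing. E-mail: wqin@sjtu.edu.cn; ZHAO Xinming, female, born in 1960, associate professor. Her main research interests include mechanical design and graphics processing technology. E-mail: zhaoxm@sjtu.edu.cn
  • Funding:
    Supported by the National Natural Science Foundation of China (U1537110, 51435009).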

Two-stage Unsupervised Feature Selection Method Oriented to Manufacturing Procedural Data

ZHANG Jie1,2, SHENG Xia1, ZHANG Peng1, QIN Wei1, ZHAO Xinming1   

  1. Institute of Intelligent Manufacturing and Information Engineering, Shanghai Jiao Tong University, Shanghai 200240;
  2. College of Mechanical Engineering, Donghua University, Shanghai 201620
  • Received: 2018-09-19 Revised: 2019-02-03 Published: 2020-01-07

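Abstract: Modern manufacturing workshops generate large volumes of data continuously, and most of it is stored as unlabeled, structured raw data in the industrial big data platforms of modern manufacturing enterprises. These manufacturing data have great potential value, but their high noise and high redundancy make them difficult to analyze and use directly. To address these characteristics of raw manufacturing process data, a two-stage unsupervised feature selection method is proposed that aims to remove redundancy from manufacturing data and to uncover the local structure of the raw data. In the first stage, low-dimensional subsets of the raw data generated by a genetic algorithm (GA) serve as the input to a radial basis function neural network (RBFNN), which is used to reconstruct all dimensions of the raw data. The dimensionality reduction ratio and the reconstruction accuracy together form the GA fitness function, and over multiple GA iterations the method learns a low-dimensional representation of the high-dimensional features and removes redundant and noisy features from the original data set. In the second stage, the Laplacian score (LS) is used to evaluate, feature by feature, how well each remaining feature reflects the local geometric structure of the data, identifying the features that are more effective for improving classification performance. Comparison with LS and other unsupervised feature selection algorithms verifies that the proposed two-stage method effectively reduces the redundancy of manufacturing data and improves classification performance.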

Keywords: unsupervised feature selection, genetic algorithm, radial basis function neural network, Laplacian score, manufacturing process data
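
The first-stage fitness evaluation described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the RBFNN is approximated by Gaussian units centred on randomly chosen samples with a ridge-regularised linear output layer, reconstruction accuracy is taken as one minus the normalised squared error, and the weight `alpha`, the width `gamma`, and all function names are hypothetical. A GA would call `fitness` on each candidate binary mask.

```python
import numpy as np

def rbf_activations(X, centers, gamma):
    # Gaussian RBF activations exp(-gamma * ||x - c||^2) for every sample/centre pair.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def reconstruction_accuracy(X, mask, n_centers=20, gamma=1.0, ridge=1e-3, seed=0):
    """Fit an RBF network on the selected columns and score how well it
    rebuilds ALL columns of X (assumed measure: 1 - normalised squared error).
    Assumes X is not constant, so the total variance is non-zero."""
    rng = np.random.default_rng(seed)
    X_sub = X[:, mask]
    # Assumption: centres are a random subset of the training samples.
    picks = rng.choice(X.shape[0], size=min(n_centers, X.shape[0]), replace=False)
    H = rbf_activations(X_sub, X_sub[picks], gamma)
    H = np.hstack([H, np.ones((X.shape[0], 1))])  # bias unit
    # Ridge-regularised least squares for the linear output weights.
    W = np.linalg.solve(H.T @ H + ridge * np.eye(H.shape[1]), H.T @ X)
    residual = X - H @ W
    total = ((X - X.mean(axis=0)) ** 2).sum()
    return 1.0 - (residual ** 2).sum() / total

def fitness(X, mask, alpha=0.5):
    """Hypothetical GA fitness: weighted sum of the dimensionality reduction
    ratio and the reconstruction accuracy for one binary feature mask."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0  # an empty subset cannot reconstruct anything
    reduction = 1.0 - mask.sum() / mask.size
    return alpha * reduction + (1.0 - alpha) * reconstruction_accuracy(X, mask)
```

The balance between the two terms matters: with a large `alpha` the search favours aggressive reduction even when an independent feature is lost, so in practice the weighting would need tuning against the data at hand.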

Abstract: In a modern manufacturing workshop, massive amounts of data are produced continuously, and most of them are stored in the industrial big data platform of the manufacturing enterprise as structured, unlabeled raw data. These manufacturing data have great latent value; however, because of their high noise and high redundancy, they are difficult to analyze and use directly. Aiming at reducing the redundancy of manufacturing procedural data and uncovering their local structure, a two-stage unsupervised feature selection method is proposed. In the first stage, a subset of the original feature set generated by a genetic algorithm (GA) is used as the input of a radial basis function neural network (RBFNN) to reconstruct the complete original feature set. The dimensionality reduction ratio and the reconstruction accuracy jointly form the fitness function of the GA, which is optimized iteratively to learn a low-dimensional representation of the high-dimensional features, removing redundant and noisy features from the original feature set. In the second stage, the Laplacian score (LS) is employed to evaluate the locality-preserving power of the remaining features, identifying those that are likely to improve classification performance. Comparison with other unsupervised feature selection methods shows that the proposed method is more effective in reducing the redundancy of manufacturing data while also improving classification performance.

Key words: unsupervised feature selection, genetic algorithm, radial basis function neural network, Laplacian score, manufacturing procedural data
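
The second-stage scoring can be illustrated with a short sketch of the standard Laplacian score computation: build a k-nearest-neighbour graph, weight its edges with a heat kernel, and score each feature by how little it varies across neighbouring samples relative to its overall degree-weighted variance. This is a sketch under assumptions rather than the paper's code: the neighbourhood size `k`, the kernel width `t`, and the function name are illustrative choices, and the features are assumed to be on comparable scales. Lower scores indicate features that better preserve local structure.

```python
import numpy as np

def laplacian_score(X, k=5, t=1.0):
    """Return one Laplacian score per column of X (lower is better)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances; the diagonal is masked out
    # so that a sample is never chosen as its own neighbour.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    # Symmetric k-nearest-neighbour adjacency.
    nearest = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nearest.ravel()] = True
    A = A | A.T
    # Heat-kernel edge weights, degree vector, and graph Laplacian L = D - S.
    S = np.where(A, np.exp(-d2 / t), 0.0)
    deg = S.sum(axis=1)
    L = np.diag(deg) - S
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        # Remove the degree-weighted mean before scoring.
        f_tilde = f - (f @ deg) / deg.sum()
        denom = f_tilde @ (deg * f_tilde)
        scores[r] = (f_tilde @ L @ f_tilde) / denom if denom > 0 else np.inf
    return scores
```

In a two-stage pipeline of the kind the abstract describes, this function would be applied only to the columns that survive the first stage, and the features with the smallest scores would be retained.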

CLC Number: