Researchers develop new feature selection method for limited-sample industrial data

Framework of the proposed robust feature selection method. Credit: NIMTE

A research team from the Ningbo Institute of Materials Technology and Engineering (NIMTE) of the Chinese Academy of Sciences has introduced a feature selection method that removes noise entropy from mutual information. The study was published in IEEE Transactions on Industrial Informatics.

Feature selection, a critical step in machine learning and data mining, aims to reduce dimensionality by eliminating irrelevant or redundant features, thereby improving model performance. However, industrial data, often characterized by small sample sizes and high dimensionality, pose significant challenges, including high computational costs and the risk of overfitting.

Traditional methods struggle to maintain accuracy when dealing with such data, particularly in the presence of sensor noise, which can distort mutual information metrics and degrade classification performance.
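The effect of noise on mutual information can be illustrated with a minimal sketch. It is not taken from the study: the toy data, the 30% flip rate, and the function name `mutual_information` are assumptions chosen for illustration, and the estimator is the ordinary plug-in estimate for discrete variables.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: a balanced binary label and a feature that copies it.
labels = [i % 2 for i in range(200)]
clean_feature = list(labels)

# Simulated sensor noise (an assumption): flip the feature in 3 of every
# 10 consecutive sample pairs, so both classes are corrupted equally.
noisy_feature = [
    1 - value if (i // 2) % 10 < 3 else value
    for i, value in enumerate(clean_feature)
]

mi_clean = mutual_information(clean_feature, labels)
mi_noisy = mutual_information(noisy_feature, labels)
print(f"clean feature: {mi_clean:.3f} nats")
print(f"noisy feature: {mi_noisy:.3f} nats")
```

In this toy case the clean feature carries the full ln 2 nats about the label, while the noisy copy scores far lower even though it measures the same underlying quantity. A ranking based on the raw score would undervalue such a feature.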

To overcome these limitations, the research team modeled feature noise as a censored normal distribution. Applying the principle of maximum entropy, they determined the entropy of the noise by solving the variance equation in transmission.
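The maximum entropy principle mentioned here can be made concrete with a small sketch. Among continuous distributions with a given variance, the normal distribution has the largest differential entropy, which is one common reason for choosing a normal-type noise model when only the variance is known. The code below compares three closed-form entropies at equal standard deviation. It uses an ordinary (uncensored) normal distribution, and the function names are illustrative; it does not reproduce the censored model or the variance equation from the study.

```python
import math


def normal_entropy(sigma):
    """Differential entropy (nats) of a normal distribution: 0.5 * ln(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


def laplace_entropy(sigma):
    """Entropy of a Laplace distribution with std sigma (scale b = sigma / sqrt(2)): 1 + ln(2b)."""
    b = sigma / math.sqrt(2)
    return 1 + math.log(2 * b)


def uniform_entropy(sigma):
    """Entropy of a uniform distribution with std sigma (width = sigma * sqrt(12)): ln(width)."""
    return math.log(sigma * math.sqrt(12))


for sigma in (0.1, 1.0, 5.0):
    print(
        f"sigma={sigma}: normal={normal_entropy(sigma):.4f}, "
        f"laplace={laplace_entropy(sigma):.4f}, "
        f"uniform={uniform_entropy(sigma):.4f}"
    )
```

For every standard deviation the normal entropy is the largest of the three, and all three shift by the same amount when the standard deviation changes, so knowing the noise variance is enough to pin down the normal entropy.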

Additionally, the researchers developed a noise-free mutual information metric to assess the relevance between a label and noise-corrupted features. The entropy of the unknown feature noise is thereby removed from the mutual information while the noisy samples themselves are retained, which eliminates the impact of noise on classification with limited samples.

The proposed method outperforms conventional techniques by providing a more reliable assessment of noise across all noisy samples. Building on this, the researchers introduced a novel criterion called Maximal Noise-Free Relevance and Minimal Redundancy (MNFR-MR), which ensures robust feature selection.
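The article does not give the formula for MNFR-MR, so the sketch below shows only the classic maximal-relevance, minimal-redundancy (mRMR) pattern that the criterion's name suggests it builds on. It uses plain mutual information for relevance, where the study instead uses its noise-free metric. The toy data and the names `mutual_information` and `mrmr_select` are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (count / n) * math.log(count * n / (px[x] * py[y]))
        for (x, y), count in joint.items()
    )


def mrmr_select(features, labels, k):
    """Greedily pick k features maximizing relevance minus mean redundancy."""
    relevance = {
        name: mutual_information(column, labels)
        for name, column in features.items()
    }
    selected = []
    remaining = list(features)

    def score(name):
        if not selected:
            return relevance[name]
        # Mean mutual information with the features chosen so far
        redundancy = sum(
            mutual_information(features[name], features[chosen])
            for chosen in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: a four-class label determined by two binary signals.
a = [(i // 2) % 2 for i in range(8)]
b = [i % 2 for i in range(8)]
labels = [2 * ai + bi for ai, bi in zip(a, b)]
features = {"a": a, "a_copy": list(a), "b": b}

selected = mrmr_select(features, labels, k=2)
print(selected)
```

All three features are equally relevant to the label on their own, but `a_copy` adds nothing once `a` has been chosen, so the redundancy penalty steers the second pick toward `b`.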

This approach addresses a critical bottleneck in processing industrial data, particularly in scenarios where sample sizes are constrained. As industries increasingly adopt data-driven technologies such as the Industrial Internet of Things (IIoT) and digital twins, this method holds significant promise for unlocking actionable insights and improving decision-making across various domains.

This study not only advances the theoretical understanding of feature selection in noisy, high-dimensional datasets but also offers practical solutions for real-world industrial applications, paving the way for more accurate and efficient data-driven intelligence.

More information:
Chan Xu et al, Robust Feature Selection by Removing Noise Entropy Within Mutual Information for Limited-Sample Industrial Data, IEEE Transactions on Industrial Informatics (2025). DOI: 10.1109/TII.2025.3534417

Provided by Chinese Academy of Sciences


Citation: Researchers develop new feature selection method for limited-sample industrial data (2025, March 12), retrieved 12 March 2025
