
# Noise mapping journal series
In manufacturing processes, datasets intended for data-driven decisions are mainly generated from time-sequenced sensor readings. Industrial sensor systems are prone to transmitting inaccurate readings, which results in noisy datasets. Noisy datasets inhibit machine learning and knowledge discovery. Using a multi-stage, multi-output process dataset as an experimental case, this article reports a methodology for replacing erroneous sensor values with their predicted likely values. In the methodology, invalid values specified by process owners are first converted to missing values. Then, the ReliefF algorithm is used to select the most relevant features to carry forward to prediction modelling, which also boosts the performance of the prediction model. A Random Forest classifier model is built to predict replacement values for the missing values. Finally, the predicted values are inserted into the dataset to fill in the missing entries. Because many attributes have a significant number of erroneous values, the replacement of invalid values is done one attribute at a time. To do this systematically, the process flow direction and the stages of the manufacturing process are exploited to partition the dataset into subsets for model building. The results indicate that the methodology is able to replace erroneous values with likely true values to a very high degree of accuracy. There is a paucity of this type of methodology for dealing with invalid entries in process datasets. The methodology is useful for both missing and invalid value correction in process datasets. In the future, the plan is to inject the prediction models into streaming data to enable erroneous value correction and predictive process monitoring simultaneously in real time.

Data-driven decisions in manufacturing systems rely heavily on time-sequenced readings from multiple time-synchronized sensors. In real-life settings, sensors are perturbed by mechanical vibrations and electronic signals. They are also prone to generating erroneous values when sensing product feature characteristics in high-speed production lines, for example where the product frequently shifts from the desired position. In other processes, such as extrusion lines, the extruded product may be stained, for example with cooling liquids, and this can cause sensors to misread. Erroneous sensor values can lead to inaccurate decisions, for example raising false alarms on defects. Additionally, noisy datasets are difficult to interpret for those not familiar with the system. Where noisy values make up a significant share of the dataset, they inhibit machine learning performance and can also lead to false predictions. It may be infeasible or expensive to change sensors or the system, so decision makers have no option other than to rely on noisy datasets. This research explores the use of classification prediction modelling to replace erroneous values in time series datasets of multi-stage manufacturing processes. Given that such datasets may have hundreds to thousands of sensor-generated parameters, feature set reduction through feature relevance ranking is considered for boosting the performance of the prediction model. A dataset generated by a Manufacturing Execution System was used for the research. The 116-feature, 14,088-entry dataset included time series information for ambient factory conditions, material property specifications, machine operating parameters, and dimensional characteristics of the output exiting each of the two stages. In addition to being multivariate and of medium dimensionality, the dataset contained many erroneous values. The objective was to denoise the dataset by replacing the erroneous values with values predicted using classification modelling. Noisy erroneous values in the batched dataset are first replaced with missing values. The missing values are then imputed with predicted values using a built classifier model.

An end-to-end methodology for replacing erroneous values in time series datasets of multi-stage processes is proposed.
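
The first step of the methodology, converting invalid values specified by process owners into missing values, can be illustrated with a minimal sketch. The column names, the validity rules, and the helper `mask_invalid` are all hypothetical and chosen only for illustration; the source does not list the actual rules used.

```python
def mask_invalid(rows, invalid_rules):
    """Return copies of the rows with rule-flagged values replaced by None."""
    masked = []
    for row in rows:
        new_row = dict(row)  # copy so the raw readings stay untouched
        for column, is_invalid in invalid_rules.items():
            value = new_row.get(column)
            if value is not None and is_invalid(value):
                new_row[column] = None  # None marks a missing entry
        masked.append(new_row)
    return masked


# Hypothetical rules, standing in for what a process owner might specify.
rules = {
    "stage1_width_mm": lambda v: v <= 0 or v > 500,  # physically impossible
    "ambient_temp_c": lambda v: v == -999,           # sensor error code
}

rows = [
    {"stage1_width_mm": 12.1, "ambient_temp_c": 21.5},
    {"stage1_width_mm": 0.0, "ambient_temp_c": 22.0},
    {"stage1_width_mm": 12.3, "ambient_temp_c": -999},
]

masked = mask_invalid(rows, rules)
```

Keeping the raw rows unchanged and returning masked copies makes it easy to audit later which entries were treated as invalid.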
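
The text says that the process flow direction and stages are used to partition the dataset into subsets for model building, but it does not spell out the rule. One plausible reading, offered here purely as an assumption, is that a target attribute is predicted only from attributes belonging to its own stage or to upstream stages. The stage names, column names, and the helper `predictors_for` are hypothetical.

```python
def predictors_for(target, stage_of, stage_order):
    """Columns from the target's stage and all upstream stages, minus the target."""
    cutoff = stage_order.index(stage_of[target])
    allowed = set(stage_order[:cutoff + 1])
    return [c for c in stage_of if c != target and stage_of[c] in allowed]


# Hypothetical process layout: upstream groups first, downstream last.
stage_order = ["ambient", "material", "stage1", "stage2"]
stage_of = {
    "ambient_temp_c": "ambient",
    "material_density": "material",
    "stage1_speed": "stage1",
    "stage1_width_mm": "stage1",
    "stage2_speed": "stage2",
    "stage2_width_mm": "stage2",
}

stage1_predictors = predictors_for("stage1_width_mm", stage_of, stage_order)
stage2_predictors = predictors_for("stage2_width_mm", stage_of, stage_order)
```

Under this assumption a stage 1 attribute never depends on stage 2 readings, which mirrors the direction in which material flows through the process.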
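
Feature relevance ranking with ReliefF can be sketched in plain Python. This is a simplified ReliefF-style scorer, not the implementation used in the research: it assumes numeric features with no missing values, uses every row as a probe instead of a random sample, and the toy data and the function name `relieff` are invented for illustration. A feature gains weight when it differs between a row and its nearest neighbours of other classes, and loses weight when it differs from its nearest neighbours of the same class.

```python
from collections import Counter


def relieff(X, y, k=2):
    """Return one relevance weight per feature (higher means more relevant)."""
    n, d = len(X), len(X[0])
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]  # avoid division by zero

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]  # range-normalised difference

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    prior = {label: count / n for label, count in Counter(y).items()}
    weights = [0.0] * d
    for i in range(n):
        for label in prior:
            # k nearest rows of this class, excluding the probe row itself
            neighbours = sorted(
                (t for t in range(n) if t != i and y[t] == label),
                key=lambda t: dist(X[i], X[t]),
            )[:k]
            if not neighbours:
                continue
            if label == y[i]:
                scale = -1.0  # nearest hits: penalise differences
            else:
                scale = prior[label] / (1.0 - prior[y[i]])  # nearest misses
            for j in range(d):
                mean_diff = sum(diff(j, X[i], X[t]) for t in neighbours) / len(neighbours)
                weights[j] += scale * mean_diff / n
    return weights


# Toy data: feature 0 separates the classes, feature 1 is unrelated noise.
X = [[0.10, 0.7], [0.20, 0.1], [0.15, 0.9], [0.90, 0.8], [0.80, 0.2], [0.85, 0.5]]
y = ["low", "low", "low", "high", "high", "high"]

weights = relieff(X, y, k=2)
ranking = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
```

Sorting features by weight and keeping the top-ranked ones gives the reduced feature set that is passed on to the prediction model.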
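
The imputation step, building a Random Forest classifier on rows where the attribute is known and using it to fill the gaps one attribute at a time, can be sketched as follows. This is a deliberately small, dependency-free forest (bootstrap sampling, a random feature subset per split, majority vote), written only to illustrate the idea. A library implementation would normally be used in practice. It assumes the predictor columns are numeric and complete, and that the target takes a limited set of discrete values treated as class labels. All column names and readings are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, rng, n_try, depth=0, max_depth=5):
    """Grow one tree; each split looks at a random subset of n_try features."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth >= max_depth:
        return ("leaf", majority)
    best = None
    for j in rng.sample(range(len(X[0])), n_try):
        values = sorted({row[j] for row in X})
        for a, b in zip(values, values[1:]):
            threshold = (a + b) / 2  # midpoint between neighbouring values
            left = [i for i, row in enumerate(X) if row[j] <= threshold]
            right = [i for i, row in enumerate(X) if row[j] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, j, threshold, left, right)
    if best is None:  # sampled features were constant in this node
        return ("leaf", majority)
    _, j, threshold, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          rng, n_try, depth + 1, max_depth)

    return ("split", j, threshold, grow(left), grow(right))


def predict_tree(node, row):
    while node[0] == "split":
        _, j, threshold, left, right = node
        node = left if row[j] <= threshold else right
    return node[1]


def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_try = max(1, int(len(X[0]) ** 0.5))  # common default: sqrt(n_features)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng, n_try))
    return forest


def predict_forest(forest, row):
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]  # majority vote across trees


def impute_column(rows, target, predictors):
    """Fill None entries of one target column with forest predictions."""
    known = [r for r in rows if r[target] is not None]
    forest = fit_forest([[r[p] for p in predictors] for r in known],
                        [r[target] for r in known])
    filled = []
    for r in rows:
        new_row = dict(r)
        if new_row[target] is None:
            new_row[target] = predict_forest(forest, [r[p] for p in predictors])
        filled.append(new_row)
    return filled


# Hypothetical readings: (speed, temperature, width); None marks a missing width.
readings = [
    (9.0, 98.0, 1.2), (10.0, 101.0, 1.2), (11.0, 103.0, 1.2), (12.0, 99.0, 1.2),
    (19.0, 198.0, 1.5), (20.0, 202.0, 1.5), (21.0, 205.0, 1.5), (22.0, 199.0, 1.5),
    (10.5, 102.0, None), (20.5, 201.0, None),
]
rows = [{"stage1_speed": s, "stage1_temp_c": t, "stage1_width_mm": w}
        for s, t, w in readings]

filled = impute_column(rows, "stage1_width_mm", ["stage1_speed", "stage1_temp_c"])
```

Calling `impute_column` once per noisy attribute, with predictors restricted to the relevant features, reflects the one-attribute-at-a-time replacement described in the text.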
