Recently I worked on a Proof of Concept (POC) for machine health analysis and prediction using IoT data. Machine Learning on IoT, blending Python ML with Microsoft technologies and getting the best out of both, proved to be a topic worth blogging about.
In this blog I will talk about my design for the POC. A few more blogs will follow on pre- and post-processing data analytics, an in-depth look at Machine Learning, IoT stream analytics, etc.
The project is a POC for analytics on food processing machines based on IoT information. Predictive maintenance, prediction of processing times and anomaly detection were the key areas of interest.
IoT data from various sensors in the processing machines, along with warehouse sensors for temperature, humidity and density, provided the variables used for Machine Learning.
There were two vertical analyses.
The project was intended for:
One of the challenges was to assess the metrics and determine the frequency at which sensor readings should be aggregated.
There were about 15 sensors per processing machine emitting data every 5 seconds, and 5 sensors outside the machine emitting every second. I decided to aggregate the data into 20-second intervals as the feed for Machine Learning.
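The 20-second aggregation can be sketched roughly as follows. This is a minimal illustration, not the POC's actual code: the tuple layout, the sensor names and the choice of the mean as the aggregate are all assumptions.

```python
from collections import defaultdict
from statistics import mean

WINDOW_SECONDS = 20  # aggregation interval chosen for the ML feed


def aggregate_readings(readings, window=WINDOW_SECONDS):
    """Average raw readings per sensor over fixed, non-overlapping windows.

    `readings` is an iterable of (timestamp_seconds, sensor_id, value) tuples
    (a hypothetical layout). Returns {(window_start, sensor_id): mean_value}.
    """
    buckets = defaultdict(list)
    for ts, sensor_id, value in readings:
        # Floor the timestamp to the start of its 20-second window
        window_start = int(ts // window) * window
        buckets[(window_start, sensor_id)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical data: a machine sensor every 5 s, a warehouse sensor every 1 s
machine = [(t, "machine_temp", 70.0 + t) for t in range(0, 40, 5)]
ambient = [(t, "warehouse_humidity", 50.0) for t in range(0, 40)]
aggregated = aggregate_readings(machine + ambient)
```

Both sensor types end up on the same 20-second grid, so each window contributes one row per sensor regardless of the original emission rate.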
Having performed analytics on IoT data earlier, I knew it is difficult to cut through the noise and understand where the actual value lies. I ran a few data analyses with live and historic data, which gave a fair view of the data cleansing required.
I was consulting for a Microsoft house! While learning Data Science technologies, I was told that non-Microsoft technologies work best. I even saw a couple of blogs saying “forget Microsoft while working with Data Science”. Hence integrating Machine Learning with Microsoft technologies seemed challenging, initially.
The solution discussed here is an IoT solution leveraging the power of the Microsoft BI suite for ETL/visualization and Machine Learning using Python. For the POC, IoT data stored in Azure Cloud was used. For production, I proposed Azure IoT Hub and Stream Analytics.
App Services bridge ETL, Machine Learning and Power BI. SSIS, a Windows application, Flask and Reporting Services were part of App Services.
The diagram below shows the many paths that data takes during processing.
Microsoft SSIS was used as the ETL platform.
Advantages with Microsoft SQL Server Integration Services (SSIS) are
Data pre-processing and initial EDA were performed using Python. All the process data was read from the SQL Data Warehouse into Python for further processing, where imputation and feature engineering were performed.
During the POC, the most common cause of missing data was processing material blurring the sensors. I imputed the missing values based on sensors in the same phase of the machine and historic data from the Data Warehouse.
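One way such an imputation could look is sketched below. The exact rule used in the POC is not described here, so this is an assumed scheme: take the sensor's historic mean as a baseline and shift it by the average deviation that the other sensors in the same phase currently show from their own historic means. The function name, sensor names and values are hypothetical.

```python
from statistics import mean


def impute_reading(row, sensor, phase_sensors, historic_means):
    """Fill a missing value for `sensor` in one aggregated row.

    row: dict of sensor_id -> value, with None marking a missing reading
    phase_sensors: sensor ids in the same machine phase as `sensor`
    historic_means: dict of sensor_id -> long-run mean from the warehouse
    """
    if row.get(sensor) is not None:
        return row[sensor]  # nothing to impute
    # How far are the peer sensors currently from their own historic means?
    deviations = [
        row[s] - historic_means[s]
        for s in phase_sensors
        if s != sensor and row.get(s) is not None
    ]
    baseline = historic_means[sensor]
    # Fall back to the plain historic mean when no peer reading is available
    return baseline + mean(deviations) if deviations else baseline


# Hypothetical phase with three temperature sensors, one of them obscured
historic_means = {"temp_a": 80.0, "temp_b": 82.0, "temp_c": 78.0}
row = {"temp_a": None, "temp_b": 84.0, "temp_c": 79.0}
imputed = impute_reading(row, "temp_a", list(historic_means), historic_means)
```

Here the peers run 2.0 and 1.0 above their historic means, so the missing sensor is filled with its own historic mean plus 1.5.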
For the machines whose processing time we were predicting, environmental factors played a key role. A key aspect of feature engineering was deducing the processing season.
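As a rough illustration of deriving a season feature from a processing date, the sketch below maps the month to a season and one-hot encodes it. The month-to-season table (meteorological seasons, Northern Hemisphere) is purely an assumption; the actual processing seasons in the POC depended on the product and are not reproduced here.

```python
from datetime import date

# Hypothetical mapping: meteorological seasons, Northern Hemisphere
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}
SEASONS = ("winter", "spring", "summer", "autumn")


def season_feature(d):
    """Return the season label for a processing date."""
    return SEASON_BY_MONTH[d.month]


def one_hot_season(d):
    """Encode the season as 0/1 indicator columns for a linear model."""
    season = season_feature(d)
    return {f"season_{name}": int(name == season) for name in SEASONS}


features = one_hot_season(date(2019, 7, 15))
```

Indicator columns keep the feature usable by a linear model without implying an ordering between seasons.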
The models discussed in this blog predict processing time based on various sensors and environmental factors.
Data was split into training and test sets. I used a stratified split so that both sets had a balanced mix of products. To obtain a validation set, I further split the training set. For this, I used a random split such that the new, smaller training set constituted roughly 60% of the overall product-specific data and the validation set roughly 20%.
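A minimal sketch of this two-step split with scikit-learn's `train_test_split` is shown below. The synthetic data, the product labels and the 20% test share (inferred from the 60%/20% figures) are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 4 features, two products (150 "A", 50 "B")
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
y = rng.normal(size=n)
product = np.repeat(["A", "B"], [150, 50])

# Step 1: stratified 80/20 split so each product keeps its share in both sets
X_train_full, X_test, y_train_full, y_test, p_train_full, p_test = train_test_split(
    X, y, product, test_size=0.20, stratify=product, random_state=0
)

# Step 2: plain random split of the training part;
# 25% of the remaining 80% is 20% of the overall data
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.25, random_state=0
)
```

The result is a 60/20/20 division into training, validation and test sets, with the product mix preserved in the test set by the `stratify` argument.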
I decided to go with simple models for the POC for the sake of interpretability. A regularized linear model (Lasso) was used, as it performs both variable selection and regularization, and can also improve prediction accuracy compared with an unregularized linear fit.
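To illustrate the variable-selection behaviour, here is a small sketch using scikit-learn's `Lasso` on synthetic data where only two of five features matter. The data, the `alpha` value and the scaling step are assumptions, not the POC's configuration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: only the first two features drive the target
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Standardize first so the L1 penalty treats all features on the same scale
model = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
model.fit(X, y)

# Features whose coefficient was not shrunk to zero
coefs = model.named_steps["lasso"].coef_
selected = [i for i, c in enumerate(coefs) if abs(c) > 1e-8]
```

The L1 penalty drives the coefficients of the irrelevant features to exactly zero, which is what makes the fitted model easy to interpret: the surviving coefficients show which sensors and environmental factors the prediction relies on.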
I am planning a dedicated blog on ML in the future.
The ML model was exposed as a web app using Flask. The ETL process (SSIS) was run to get predictions for the live data, which were persisted to the Data Warehouse, the golden source for all visualizations. SSIS also pushed data to Power BI for quick insights and alerts.
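A prediction endpoint of this kind might look like the sketch below. The `/predict` route, the JSON field names and the stand-in linear model with made-up coefficients are all hypothetical; in the real setup the trained model would be loaded instead.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for the trained model: hypothetical intercept and coefficients
INTERCEPT = 120.0
COEFFICIENTS = {"temperature": 0.5, "humidity": -0.2}


def predict_processing_time(features):
    """Linear prediction from a dict of feature name -> numeric value."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * float(features[name]) for name in COEFFICIENTS
    )


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True returns None instead of raising on a missing or invalid body
    payload = request.get_json(silent=True) or {}
    missing = [name for name in COEFFICIENTS if name not in payload]
    if missing:
        return jsonify(error=f"missing features: {missing}"), 400
    return jsonify(processing_time=predict_processing_time(payload))
```

A caller such as an ETL job would POST one JSON object per aggregated reading and store the returned `processing_time`. Returning a 400 with the list of missing features makes bad rows easy to spot on the calling side.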
Visualizations and analytics were delivered using Reporting Services and Power BI. Reports and dashboards were made available online, embedded in Windows applications and on mobile devices, with options to subscribe to reports.
Alerts on process stages and analytics status were pushed to users via a mobile app. Microsoft Flow sent emails at large scale. Data-driven alerts were delivered using Reporting Services.