Machine Learning Basic Steps
An end-to-end machine learning project is a big undertaking, with many steps to complete before reaching the end result. It starts with the conception of an idea, continues through implementation, and goes all the way to deployment. Even after deployment, we need to keep checking its performance and update it if needed. Let’s try to understand these steps in more detail.
The Basic Steps
These steps are not hard and fast rules but more of a guideline. There may be many sub-steps inside each of them, but the overall process should remain the same.
Problem Statement: The first and foremost thing is to understand the problem and how ML can help with it. For example, it could be a recommendation system for movies or an image classifier. The end goal determines the data sources and the features (data columns) that are needed. Thus the first step is to define the task and its aim.
Data Collection: The next step is to get hold of the relevant data for our task. We can get this data by scraping or by collecting it from already available databases. In this step we collect as much raw data as is available, without applying any data transformation.
Data Cleaning: The raw data from the previous step may not be usable as is, i.e. it may have missing values, or it may contain a lot of string data that most algorithms cannot use directly. Missing data needs to be either filled in or dropped, i.e. the whole row can be discarded. There are different ways to fill in missing values, such as using the column mean or the most common value.
Feature Selection: In this step the different features (columns) are analyzed and the best ones are selected. This can be done with techniques such as filter methods, wrapper methods, or visualization. Done correctly, it also reduces the memory footprint and thereby improves performance.
Model Selection and Application: Once the data has been cleaned and the features decided, we try different algorithms to check whether we can reach the desired accuracy. There are a lot of algorithms to choose from. One should start with the simplest and move on to more complex ones, comparing their training time, memory use, and accuracy.
Deployment and Maintenance: Once we get the desired result from our model, we have to deploy it where users can make use of it. This can be done in the cloud, as a web application, etc. Once deployed, the model’s performance is monitored so that if it falls below a certain threshold, the model can be updated or retrained.
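To make the steps above more concrete, here is a minimal sketch in plain Python (standard library only). It is not a definitive implementation: the data set, the column names (`hours`, `shoe_size`, `score`), the 0.5 correlation threshold, and the helper names are all assumptions made up for illustration. It fills missing values with the column mean, uses a simple filter method (correlation with the target) for feature selection, compares a trivial baseline against a fitted straight line, and ends with a toy monitoring check.

```python
import math
import statistics

# Hypothetical raw data: every name and number is made up for illustration.
# None marks a missing value.
rows = [
    {"hours": 1.0, "shoe_size": 42.0, "score": 52.0},
    {"hours": 2.0, "shoe_size": 38.0, "score": 55.0},
    {"hours": None, "shoe_size": 44.0, "score": 61.0},
    {"hours": 4.0, "shoe_size": 40.0, "score": 64.0},
    {"hours": 5.0, "shoe_size": None, "score": 70.0},
    {"hours": 6.0, "shoe_size": 43.0, "score": None},
    {"hours": 7.0, "shoe_size": 39.0, "score": 77.0},
    {"hours": 8.0, "shoe_size": 41.0, "score": 80.0},
]
FEATURES, TARGET = ["hours", "shoe_size"], "score"


def clean(rows):
    """Data cleaning: drop rows without a target, fill other gaps with the column mean."""
    kept = [dict(r) for r in rows if r[TARGET] is not None]
    for name in FEATURES:
        mean = statistics.mean(r[name] for r in kept if r[name] is not None)
        for r in kept:
            if r[name] is None:
                r[name] = mean
    return kept


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def select_features(rows, threshold=0.5):
    """Filter method: keep features whose |correlation| with the target is high enough."""
    ys = [r[TARGET] for r in rows]
    return [f for f in FEATURES if abs(pearson([r[f] for r in rows], ys)) >= threshold]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single feature."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def mean_abs_error(preds, ys):
    return statistics.mean(abs(p - y) for p, y in zip(preds, ys))


def needs_retraining(recent_errors, max_error):
    """Monitoring: flag the model once its average recent error exceeds the agreed limit."""
    return statistics.mean(recent_errors) > max_error


cleaned = clean(rows)
selected = select_features(cleaned)

# Hold out the last two rows for testing (a real project would shuffle first).
train, test = cleaned[:5], cleaned[5:]
x_name = selected[0]
train_x, train_y = [r[x_name] for r in train], [r[TARGET] for r in train]
test_x, test_y = [r[x_name] for r in test], [r[TARGET] for r in test]

# Start with the simplest model (always predict the mean), then try a fitted line.
baseline_mae = mean_abs_error([statistics.mean(train_y)] * len(test_y), test_y)
slope, intercept = fit_line(train_x, train_y)
line_mae = mean_abs_error([slope * x + intercept for x in test_x], test_y)

print("selected features:", selected)
print("baseline MAE:", round(baseline_mae, 2), "line MAE:", round(line_mae, 2))
```

In practice, a library such as scikit-learn or pandas would replace most of these hand-written helpers; they are spelled out here only to show what each step does.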
The diagram below summarizes the whole process:
So these are the broad steps to be undertaken for a successful ML deployment. There may be many smaller steps inside each of these, such as model fine-tuning and validating results.
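As one illustration of such smaller steps, the sketch below combines fine-tuning and validation: it uses k-fold cross-validation to pick the number of neighbours for a tiny k-nearest-neighbours regressor. The data, the candidate values, and the function names are assumptions chosen for the example, and the code relies only on the Python standard library.

```python
import statistics

# Hypothetical one-feature data set, made up for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0]


def k_fold_indices(n, n_folds):
    """Yield (train, test) index lists; fold i tests on rows i, i + n_folds, ..."""
    for fold in range(n_folds):
        test_idx = list(range(fold, n, n_folds))
        train_idx = [i for i in range(n) if i % n_folds != fold]
        yield train_idx, test_idx


def knn_predict(train_x, train_y, x, n_neighbors):
    """Predict by averaging the targets of the n_neighbors closest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:n_neighbors]
    return statistics.mean(y for _, y in nearest)


def cv_error(n_neighbors, n_folds=3):
    """Mean absolute error of the regressor, averaged over all folds."""
    fold_errors = []
    for train_idx, test_idx in k_fold_indices(len(xs), n_folds):
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        errors = [
            abs(knn_predict(train_x, train_y, xs[i], n_neighbors) - ys[i])
            for i in test_idx
        ]
        fold_errors.append(statistics.mean(errors))
    return statistics.mean(fold_errors)


# Fine-tuning: try a few settings and keep the one with the lowest validation error.
candidates = [1, 3, 5]
scores = {k: cv_error(k) for k in candidates}
best_k = min(scores, key=scores.get)
print("validation error per setting:", scores)
print("chosen number of neighbours:", best_k)
```

Because every row is used for testing exactly once, the averaged error is a fairer estimate of how a setting will behave on unseen data than a single train/test split.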
References:
- A lot of googling, among which the major sources were medium.com and Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
Originally published at http://evrythngunder3d.wordpress.com on November 3, 2021.