Machine Learning — The important keywords (Part 1)

Shrinand Kadekodi
4 min read · Mar 21, 2020


Machine Learning has been one of the most sought-after skill sets. For beginners, the start may be a bit overwhelming. This post aims to make beginners aware of the terms used in Machine Learning, so that they can understand them better on their journey. So let's dig in!

Dataset:

The first term you will come across is Dataset. This is the data that is the basic requirement for any Machine Learning algorithm. It can come from an Excel file, a CSV file or a database. A dataset is made up of Features and multiple Instances; we will see these terms below. Different kinds of datasets are used in ML. We will go through them one by one.

Training Dataset: This dataset is used for training the model or ML algorithm. Taking this data as input, the algorithm tries to find an equation that fits the data.

Test Dataset: This dataset is used to see how well the algorithm works, i.e. how often it gives the correct output. It can be used to check the accuracy of the trained model.
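To make the training/test split concrete, here is a minimal sketch in plain Python (the function name `train_test_split` and the 80/20 ratio are my own choices for illustration, not from any particular library):

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle the rows and split them into a training set and a test set."""
    rows = rows[:]                      # copy, so the caller's list is untouched
    random.Random(seed).shuffle(rows)   # seeded shuffle, so the split is reproducible
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]       # (training set, test set)

dataset = [[i, i * 2] for i in range(10)]   # ten toy instances
train, test = train_test_split(dataset)
print(len(train), len(test))                # 8 2
```

The model would be fitted on `train` only; `test` is held back and used once, to estimate how the model behaves on data it has never seen.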

Instance:

An instance is a single row of data in the dataset. Each row gives some information about the domain and is described by its attributes.

Features/Attributes:

The columns of the dataset are called Features or Attributes. A feature is a measurable property of the object being analyzed. Some features serve as inputs to the model, while others are the values to be predicted.

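A toy dataset may make the instance/feature distinction concrete (the column names and numbers here are invented purely for illustration):

```python
# A tiny dataset: each row is an instance, each column a feature.
# "price" is the target column we want to predict.
features = ["area_sqft", "bedrooms", "price"]
dataset = [
    [1200, 2, 150000],   # instance 1
    [1800, 3, 220000],   # instance 2
    [2400, 4, 300000],   # instance 3
]

# Inputs (X) are every column except the target; y is the target column.
X = [row[:-1] for row in dataset]
y = [row[-1] for row in dataset]
print(X[0], y[0])   # [1200, 2] 150000
```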

Model and Algorithm:

An algorithm is a method or procedure used to get a task done or to solve a problem. These are mathematical techniques derived by mathematicians and statisticians to get the desired result. A model is the well-defined computation produced as the result of running an algorithm; it takes a set of values as input and produces a value or a set of values as output.

The words algorithm and model are sometimes used interchangeably. But now you know the difference 😉
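One way to see the distinction in code: the algorithm is the learning procedure, the model is the artifact it produces. A deliberately trivial sketch (the "mean predictor" here is just a made-up learner for illustration):

```python
# The *algorithm* is the procedure; the *model* is its output:
# a concrete function carrying the learned parameter.
def fit_mean(ys):
    """Algorithm: a trivial learner that memorizes the training mean."""
    mean = sum(ys) / len(ys)
    def model(_x):
        """Model: predicts the training mean for any input."""
        return mean
    return model

model = fit_mean([2, 4, 6])
print(model(10))   # 4.0 — whatever the input, the model predicts the mean
```

Running `fit_mean` on a different dataset would yield a different model, but the algorithm stays the same.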

Bias:

Bias is the inherent prejudice in the observed result due to faulty assumptions. It arises from different factors, one of which is skewed data. An example of bias would be a dataset in which the data of one group predominates over another; this will produce an ML model biased towards the predominant category. Bias also creeps in through the simplifying assumptions a model makes in order to make the target function easier to learn. Simple algorithms like linear ones (for classification or regression) have high bias.

Algorithms with higher bias are also said to underfit. An underfitting model misses the relevant relations between features and target outputs. Such models may be simple and fast, but their accuracy suffers, which makes them of little use in real-life cases.
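Underfitting is easy to demonstrate: fit a straight line to data whose true relationship is quadratic, and the error stays high no matter how the line is placed. A self-contained sketch (the least-squares helper is my own toy implementation, not a library call):

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (a deliberately simple model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]        # the true relationship is quadratic

a, b = fit_line(xs, ys)
errors = [y - (a * x + b) for x, y in zip(xs, ys)]
mse = sum(e ** 2 for e in errors) / len(errors)
print(round(mse, 2))             # 2.8 — the line cannot capture the curve
```

A quadratic model would drive this error to zero; the line's high error comes entirely from its simplifying assumption, i.e. from bias.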

Variance:

Variance, in the context of Machine Learning, is a type of error that occurs due to a model's sensitivity to small fluctuations in the training set. If variance is high, the model takes even the noise in the data into consideration, which adversely affects accuracy. This leads to a complex model that works well on the data it was trained on but may not generalize to other datasets. Any small change in the training dataset produces a huge change in the model's estimates. Algorithms like decision trees and kNN have higher variance.

Algorithms with higher variance are said to overfit. An overfitting model tries to learn even the most random quirks of the training data and follows them too closely. These models are often complex and take more time.
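A classic way to show overfitting is to thread a polynomial through every training point exactly: the training error is zero, yet predictions on new data go badly wrong. A minimal sketch using Lagrange interpolation (the noisy data points are invented for this example):

```python
def lagrange_predict(xs, ys, x):
    """Degree-(n-1) polynomial through every training point: zero training error."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if i != j:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# The true relation is roughly y = x, but the observations carry a little noise.
xs = [0, 1, 2, 3, 4]
ys = [0.0, 1.2, 1.8, 3.1, 3.9]

train_err = max(abs(lagrange_predict(xs, ys, x) - y) for x, y in zip(xs, ys))
print(train_err)                               # 0.0 on the training data ...
print(round(lagrange_predict(xs, ys, 5), 1))   # 0.5 — nowhere near the true ~5
```

The model has memorized the noise instead of the trend, which is exactly the high-variance failure described above.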

Bias-Variance Trade off:

The goal of any ML algorithm is to have low bias and low variance; balancing these two factors to get the optimum output is the bias-variance trade-off. Different techniques can be used to achieve this. For example, in kNN, which has low bias and high variance, increasing k increases the bias while reducing the variance. There is no escaping this trade-off: decreasing variance will increase bias, and decreasing bias will increase variance 😢.
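The kNN example above can be sketched in a few lines. With k=1 the prediction chases every individual point, outliers included (high variance); with a larger k it averages over neighbours and smooths the noise away, at the cost of blurring real detail (higher bias). The 1-D regression helper and the outlier at x=2 are invented for illustration:

```python
def knn_predict(train, x, k):
    """Average the targets of the k nearest training points (1-D kNN regression)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

# Roughly y = x, except for one noisy outlier at x = 2.
train = [(0, 0.0), (1, 1.0), (2, 5.0), (3, 3.0), (4, 4.0)]

print(knn_predict(train, 2, k=1))   # 5.0 — k=1 chases the outlier (high variance)
print(knn_predict(train, 2, k=5))   # 2.6 — k=5 averages it out (higher bias, lower variance)
```

Neither extreme is right in general; choosing k is exactly the balancing act the trade-off describes.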

This is the end of Part 1 for now. I will keep adding more terms in upcoming posts. Feel free to correct me on any of the terms above, and let me know below if you would like any specific terms covered.

Links to learn more from:
- https://datascience.stackexchange.com/questions/37345/what-is-the-meaning-of-term-variance-in-machine-learning-model
- https://becominghuman.ai/machine-learning-bias-vs-variance-641f924e6c57
- https://machinelearningmastery.com/gentle-introduction-to-the-bias-variance-trade-off-in-machine-learning/
- https://www.quora.com/What-is-the-difference-between-an-algorithm-and-a-model-in-machine-learning
- https://www.datarobot.com/wiki/feature/
- https://machinelearningmastery.com/data-learning-and-modeling/
- https://towardsdatascience.com/understanding-and-reducing-bias-in-machine-learning-6565e23900ac

Originally published at http://evrythngunder3d.wordpress.com on March 21, 2020.
