MACHINE LEARNING ALGORITHMS

Machine Learning  Algorithms

And it’s Overview

 


 

ABSTRACT

In this paper, various machine learning algorithms have been discussed. These algorithms are used for various purposes like data mining, image processing, predictive analytics, etc. to name a few. The main advantage of using machine learning is that, once an algorithm learns what to do with data, it can do its work automatically.

Keywords – Machine learning, algorithms, pseudo code

(1)    INTRODUCTION

 

In Machine Learning, the algorithm is just the method that you’ll use to find those relationships within your data. Algorithms can be complex or simple, big or small, or any permutation of things: but at the core, they’re just ways of figuring out what, if anything, drives the changes you’re trying to predict. Heard of Deep Learning? That’s (basically) a type of algorithm. Much like the MergeSort algorithm is efficient at sorting arrays, Machine Learning algorithms are efficient at surfacing relationships and associations.Internet and its applications have become an integral part of today’s human lifestyle. It has become an essential tool in every aspect. Due to the tremendous demand and necessity, researchers went beyond connecting just computers into the web. These researches led to the birth of a sensational gizmo, Internet of Things (IoT). Communication over the internet has grown from user -user interaction to device –device interactions these days. The IoT concepts were proposed years back but still it’s in the initial stage of commercial deployment. Home automation industry and transportation industries are seeing rapid growth with IoT. Yet not many articles have been published in this field of study. Since most of the process is done through the internet, we must have an active high-speed internet connection. The technology can be simply explained as a connection between humans-computers-things. All the equipment’s we use in our day to day life can be controlled and monitored using the IoT.

Machine learning is used to teach machines how to handle the data more efficiently. Sometimes after viewing the data, we cannot interpret the pattern or extract information from the data. In that case, we apply machine learning [1]. With the abundance of datasets available, the demand for machine learning is in rise. Many industries from medicine to military apply machine learning to extract relevant information. The purpose of machine learning is to learn from the data. Many studies have been done on how to make machines learn by themselves [2] [3]. Many mathematicians and programmers apply several approaches to find the solution of this problem. Some of them are demonstrated in Fig. 1. All the techniques of machine learning are explained in Section 2. Section 3 concludes this paper.

 

(2)    TYPES OF MACHINE LEARNING :

 

2.1)Supervised Learning:

The supervised machine learning algorithms are those algorithms which needs external assistance. The input dataset is divided into train and test dataset. The train dataset has output variable which needs to be predicted or classified. All algorithms learn some kind of patterns from the training dataset and apply them to the test dataset for prediction or classification [4]. The workflow of supervised machine learning algorithms is given in Fig. 2. Three most famous supervised machine learning algorithms have been discussed here.

2.1.1)Decision Tree:

 Decision trees are those type of trees which groups attributes by sorting them based on their values. Decision tree is used mainly for classification purpose. Each tree consists of nodes and branches. Each nodes represents attributes in a group that is to be classified and each branch represents a value that the node can take [4]. An example of decision tree is given in Fig.

 


The pseudo code for Decision tree is described in Fig. 4; where S, A and y are training set, input attribute and target attribute respectively.

 

Workflow of supervised machine learning algorithm:

 

2)Naive Bayes:

Naïve Bayes mainly targets the text classification industry. It is mainly used for clustering and classification purpose [6]. The underlying architecture of Naïve Bayes depends on the conditional probability. It creates trees based on their probability of happening. These trees are also known as Bayesian Network.

 

Bayes’ Theorem is stated as:

P(h|d) = (P(d|h) * P(h)) / P(d)

·        P(h|d) is the probability of hypothesis h given the data d. This is called the posterior probability.

·        P(d|h) is the probability of data d given that the hypothesis h was true.

·        P(h) is the probability of hypothesis h being true (regardless of the data). This is called the prior probability of h.

·        P(d) is the probability of the data (regardless of the hypothesis).

 

 

 An example of the network is given in Fig

 

 

 

 


(PSEUDOCODE FOR NAÏVE BYTES)

3)Support Vector Machine:

 

 

 Another most widely used state-of-the-art machine learning technique is Support Vector Machine (SVM).is

mainly used for classification. SVM works on the principle of margin calculation. It basically, draw margins between the classes. The margins are drawn in such a fashion that the distance between the margin and the classes is maximum and hence, minimizing the classification error. A support vector machine (SVM) is a supervised machine learning model that uses classification algorithms for two-group classification problems. After giving an SVM model sets of labeled training data for each  category, they’re able to categorize new text.

 

An example of working and pseudo code of SVM is given in Fig:

4) Unsupervised Learning:

 The unsupervised learning algorithms learns few features from the data. When new data is introduced, it uses the previously learned features to recognize the class of the data. It is mainly used for clustering and feature reduction. Unsupervised machine learning algorithms infer patterns from a dataset without reference to known, or labeled, outcomes. Unlike supervised machine learning, unsupervised machine learning methods cannot be directly applied to a regression or a classification problem because you have no idea what the values for the output data might be, making it impossible for you to train the algorithm the way you normally would. Unsupervised learning can instead be used to discover the underlying structure of the data.

An example of workflow of unsupervised learning is given in:

 

The two main algorithms for clustering and dimensionality reduction techniques are discussed below.

 4.1)K-Means Clustering:

Clustering or grouping is a type of unsupervised learning technique that when initiates, creates groups automatically. The items which possesses similar characteristics are put in the same cluster. This algorithm is called k-means because it creates k distinct clusters. The mean of the values in a particular cluster is the center of that cluster. A clustered data is represented.

Kmeans algorithm is an iterative algorithm that tries to partition the dataset into Kpre-defined distinct non-overlapping subgroups (clusters) where each data point belongs to only one group. It tries to make the intra-cluster data points as similar as possible while also keeping the clusters as different (far) as possible. It assigns data points to a cluster such that the sum of the squared distance between the data points and the cluster’s centroid (arithmetic mean of all the data points that belong to that cluster) is at the minimum. The less variation we have within clusters, the more homogeneous (similar) the data points are within the same cluster.

The algorithm for k-means is given in Fig.

 

(4.2)Principal Component Analysis:

 In Principal Component Analysis or PCA, the dimension of the data is reduced to make the computations faster and easier. To understand how PCA works, let’s take an example of 2D data. When the data is being plot in a graph, it will take up two axes. PCA is applied on the data, the data then will be 1D. Principal component analysis of a data matrix extracts the dominant patterns in the matrix in terms of a complementary set of score and loading plots. It is the responsibility of the data analyst to formulate the scientific issue at hand in terms of PC projections, PLS regressions, etc. Ask yourself, or the investigator, why the data matrix was collected, and for what purpose the experiments and measurements were made. Specify before the analysis what kinds of patterns you would expect and what you would find exciting.

The results of the analysis depend on the scaling of the matrix, which therefore must be specified. Variance scaling, where each variable is scaled to unit variance, can be recommended for general use, provided that almost constant variables are left unscaled. Combining different types of variables warrants blockscaling.

In the initial analysis, look for outliers and strong groupings in the plots, indicating that the data matrix perhaps should be “polished” or whether disjoint modeling is the proper course.

For plotting purposes, two or three principal components are usually sufficient, but for modeling purposes the number of significant components should be properly determined, e.g. by cross-validation.

Use the resulting principal components to guide your continued investigation or chemical experimentation, not as an end in itself.

 

(An Example pseudocode for Principal component analysis)

5). Semi - Supervised Learning Semi –

supervised learning algorithms is a technique which combines the power of both supervised and unsupervised learning. It can be fruit-full in those areas of machine learning and data mining where the unlabeled data is already present and getting the labeled data is a tedious process [15]. There are many categories of semi-supervised learning [16]. Some of which are discussed below:

   5.1) Generative Models:

 Generative models are one of the oldest semi-supervised learning method assumes a structure like p(x,y) = p(y)p(x|y) where p(x|y) is a mixed distribution e.g. Gaussian mixture models. Within the unlabeled data, the mixed components can be identifiable. One labeled example per component is enough to confirm the mixture distribution.

  5.2)Self-Training:

 In self-training, a classifier is trained with a portion of labeled data. The classifier is then fed with unlabeled data. The unlabeled points and the predicted labels are added together in the training set. This procedure is then repeated further. Since the classifier is learning itself, hence the name self-training.

5.3)Transductive SVM:

 Transductive support vector machine or TSVM is an extension of SVM. In TSVM, the labeled and unlabeled data both are considered. It is used to label the unlabeled data in such a way that the margin is maximum between the labeled and unlabeled data. Finding an exact solution by TSVM is a NP-hard problem.

6). Reinforcement Learning:

Reinforcement learning is a type of learning which makes decisions based on which actions to take such that the outcome is more positive. The learner has no knowledge which actions to take until it’s been given a situation. The action which is taken by the learner may affect situations and their actions in the future. Reinforcement learning solely depends on two criteria: trial and error search and delayed outcome . This is a long overdue blog post on Reinforcement Learning (RL). RL is hot! You may have noticed that computers can now automatically learn to play ATARI games (from raw game pixels!), they are beating world champions at Go, simulated quadrupeds are learning to run and leap, and robots are learning how to perform complex manipulation tasks that defy explicit programming. It turns out that all of these advances fall under the umbrella of RL research. I also became interested in RL myself over the last ~year: I worked through Richard Sutton’s book, read through David Silver’s course, watched John Schulmann’s lectures, wrote an RL library in Javascript, over the summer interned at DeepMind working in the DeepRL group, and most recently pitched in a little with the design/development of OpenAI Gym, a new RL benchmarking toolkit. So I’ve certainly been on this funwagon for at least a year but until now I haven’t gotten around to writing up a short post on why RL is a big deal, what it’s about, how it all developed and where it might be going.

 

 The general model  for reinforcement learning is depicted in Fig

 

7). Multitask Learning:

Multi-Task learning is a sub-field of Machine Learning that aims to solve multiple different tasks at the same time, by taking advantage of the similarities between different tasks. This can improve the learning efficiency and also act as a regularizer which we will discuss in a while.
Formally, if there are 
n tasks (conventional deep learning approaches aim to solve just 1 task using 1 particular model), where these n tasks or a subset of them are related to each other but not exactly identical, Multi-Task Learning (MTL) will help in improving the learning of a particular model by using the knowledge contained in all the n tasks. Multitask learning has a simple goal of helping other learners to perform better. When multitask learning algorithms are applied on a task, it remembers the procedure how it solved the problem or how it reaches to the particular conclusion. The algorithm then uses these steps to find the solution of other similar problem or task. This helping of one algorithm to another can also be termed as inductive transfer mechanism. If the learners share their experience with each other, the learners can learn concurrently rather than individually and can be much faster

8). Ensemble Learning:

 When various individual learners are combined to form only one learner then that particular type of learning is called ensemble learning. The individual learner may be Naïve Bayes, decision tree, neural network, etc. Ensemble learning is a hot topic since 1990s. It has been observed that, a collection of learners is almost always better at doing a particular job rather than individual learners . Two popular Ensemble learning techniques are given below.

8.1) Boosting:

 Boosting is a technique in ensemble learning which is used to decrease bias and variance. Boosting creates a collection of weak learners and convert them to one strong learner. A weak learner is a classifier which is barely correlated with true classification. On the other hand, a strong learner is a type of classifier which is strongly correlated with true classification [21]. The pseudo code for AdaBoost (which is most popular example of boosting) is give in Fig.

8.2) Bagging:

A Bagging classifier is an ensemble meta-estimator that fits base classifiers each on random subsets of the original dataset and then aggregate their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it.
Each base classifier is trained in parallel with a training set which is generated by randomly drawing, with replacement, N examples(or data) from the original training dataset – where N is the size of the original training set. Training set for each of the base classifiers is independent of each other. Many of the original data may be repeated in the resulting training set while others may be left out.

Bagging reduces overfitting (variance) by averaging or voting, however, this leads to an increase in bias, which is compensated by the reduction in variance though.Bagging or bootstrap aggregating is applied where the accuracy and stability of a machine learning algorithm needs to be increased. It is applicable in classification and regression. Bagging also decreases variance and helps in handling overfitting. The pseudo code for bagging in given in Fig

9). Neural Network Learning :

Neural networks are artificial systems that were inspired by biological neural networks. These systems learn to perform tasks by being exposed to various datasets and examples without any task-specific rules. The idea is that the system generates identifying characteristics from the data they have been passed without being programmed with a pre-programmed understanding of these datasets.

Neural networks are based on computational models for threshold logic. Threshold logic is a combination of algorithms and mathematics. Neural networks are based either on the study of the brain or on the application of neural networks to artificial intelligence. The work has led to improvements in finite automata theory.

Components of a typical neural network involve neurons, connections, weights, biases, propagation function, and a learning rule. Neurons will receive an input  from predecessor neurons that have an activation , threshold , an activation function f, and an output function . Connections consist of connections, weights and biases which rules how neuron  transfers output to neuron . Propagation computes the input and outputs the output and sums the predecessor neurons function with the weight. The learning rule modifies the weights and thresholds of the variables in the network.

The neural network (or artificial neural network or ANN) is derived from the biological concept of neurons. A neuron is a cell like structure in a brain. To understand neural network, one must understand how a neuron works. A neuron has mainly four parts (see Fig). They are dendrites, nucleus, soma and axon.

The dendrites receive electrical signals. Soma processes the electrical signal. The output of the process is carried by the axon to the dendrite terminals where the output is sent to next neuron. The nucleus is the heart of the neuron. The inter-connection of neuron is called neural network where electrical impulses travel around the brain.

9.1).Supervised Neural Network:

As the name suggests, supervised learning takes place under the supervision of a teacher. This learning process is dependent. During the training of ANN under supervised learning, the input vector is presented to the network, which will produce an output vector. This output vector is compared with the desired/target output vector. An error signal is generated if there is a difference between the actual output and the desired/target output vector. On the basis of this error signal, the weights would be adjusted until the actual output is matched with the desired output.In the supervised neural network, the output of the input is already known. The predicted output of the neural network is compared with the actual output. Based on the error, the parameters are changed, and then fed into the neural network again. Fig. will summarize the process. Supervised neural network is used in feed forward neural network.

 

9.2) Unsupervised Neural Network:

As the name suggests, this type of learning is done without the supervision of a teacher. This learning process is independent. During the training of ANN under unsupervised learning, the input vectors of similar type are combined to form clusters. When a new input pattern is applied, then the neural network gives an output response indicating the class to which input pattern belongs. In this, there would be no feedback from the environment as to what should be the desired output and whether it is correct or incorrect. Hence, in this type of learning the network itself must discover the patterns, features from the input data and the relation for the input data over the output.

 Here, the      neural network has no prior clue about the   output the input. The main job of the network is to categorize the data according to some similarities. The neural network checks the correlation between various inputs and groups them. The schematic diagram is shown in Fig

9.3) Reinforced Neural Network:

Reinforcement learning is an area of Machine Learning. It is about taking suitable action to maximize reward in a particular situation. It is employed by various software and machines to find the best possible behavior or path it should take in a specific situation. Reinforcement learning differs from the supervised learning in a way that in supervised learning the training data has the answer key with it so the model is trained with the correct answer itself whereas in reinforcement learning, there is no answer but the reinforcement agent decides what to do to perform the given task. In the absence of training dataset, it is bound to learn from its experience.

 In reinforced neural network, the network behaves as if a human communicates with the environment. From the environment, a feedback has been provided to the network acknowledging the fact that whether the decision taken by the network is right or wrong. If the decision is right, the connections which points to that particular output is strengthened. The connections are weakened otherwise. The network has no previous information about the output. Reinforced neural network is represented in Fig.

10). Instance-Based Learning:

Instance-based learning refers to a family of techniques for  classification  and  regression, which produce a class label/predication based on the similarity of the query to its nearest neighbor(s) in the training set. In explicit contrast to other methods such as  decision trees and  neural networks, instance-based learning algorithms do not create an abstraction from specific instances. Rather, they simply store all the data, and at query time derive an answer from an examination of the query’s  nearest neighbour(s).

Somewhat more generally, instance-based learning can refer to a class of procedures for solving new problems based on the solutions of similar past problems.

In instance-based learning, the learner learns a particular type of pattern. It tries to apply the same pattern to the newly fed data. Hence the name instance-based. It is a type of lazy learner which waits for the test data to arrive and then act on it together with training data. The complexity of the learning algorithm increases with the size of the data. Given below is a well-known example of instance-based learning which is k-nearest neighbour.

 

10.1)K-Nearest Neighbor:

 In k-nearest neighbor (or KNN), the training data (which is well-labeled) is fed into the learner. When the test data is introduced to the learner, it compares both the data. k most correlated data is taken from training set. The majority of k is taken which serves as the new class for the test data [27]. The pseudo code for KNN is given in

 

                             CONCLUSION:

This paper surveys various machine learning algorithms. Today each and every person is using machine learning knowingly or unknowingly. From getting a recommended product in online shopping to updating photos in social networking sites. This paper gives an introduction to most of the popular machine learning algorithms.

 

                             REFERENCES:

[1] W. Richert, L. P. Coelho, “Building Machine Learning Systems with Python”, Packt Publishing Ltd., ISBN 978-1-78216-140-0

 [2] M. Welling, “A First Encounter with Machine Learning”

[3] M. Bowles, “Machine Learning in Python: Essential Techniques for Predictive Analytics”, John Wiley & Sons Inc., ISBN: 978-1-118- 96174-2

[4] S.B. Kotsiantis, “Supervised Machine Learning: A Review of Classification Techniques”, Informatica 31 (2007) 249-268

 [5] L. Rokach, O. Maimon, “Top – Down Induction of Decision Trees Classifiers – A Survey”, IEEE Transactions on Systems,

 [6] D. Lowd, P. Domingos, “Naïve Bayes Models for Probability Estimation”

[7]https://webdocs.cs.ualberta.ca/~greiner/C651/Homework2_Fall2008.html

 [8] D. Meyer, “Support Vector Machines – The Interface to libsvm in package e1071”, August 2015

[9] S. S. Shwartz, Y. Singer, N. Srebro, “Pegasos: Primal Estimated sub - Gradient Solver for SVM”, Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, 2007

[10] http://www.simplilearn.com/what-is-machine-learning-and-why-itmatters-article

 [11] P. Harrington, “Machine Learning in action”, Manning Publications Co., Shelter Island, New York, 2012

[12] http://pypr.sourceforge.net/kmeans.html

[13] K. Alsabati, S. Ranaka, V. Singh, “An efficient k-means clustering algorithm”, Electrical Engineering and Computer Science, 1997

 [14] M. Andrecut, “Parallel GPU Implementation of Iterative PCA Algorithms”, Institute of Biocomplexity and Informatics, University of Calgary, Canada, 2008

[15] X. Zhu, A. B. Goldberg, “Introduction to Semi – Supervised Learning”, Synthesis Lectures on Artificial Intelligence and Machine Learning, 2009, Vol. 3, No. 1, Pages 1-130

[16] X. Zhu, “Semi-Supervised Learning Literature Survey”, Computer Sciences, University of Wisconsin-Madison, No. 1530, 2005

[17] R. S. Sutton, “Introduction: The Challenge of Reinforcement Learning”, Machine Learning, 8, Page 225-227, Kluwer Academic Publishers, Boston, 1992

[18] L. P. Kaelbing, M. L. Littman, A. W. Moore, “Reinforcement Learning: A Survey”, Journal of Artificial Intelligence Research, 4, Page 237-285, 1996

[19] R. Caruana, “Multitask Learning”, Machine Learning, 28, 41-75, Kluwer Academic Publishers, 1997

[20] D. Opitz, R. Maclin, “Popular Ensemble Methods: An Empirical Study”, Journal of Artificial Intelligence Research, 11, Pages 169- 198, 1999

[21] Z. H. Zhou, “Ensemble Learning”, National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

[22] https://en.wikipedia.org/wiki/Boosting_(machine_learning) [23] https://en.wikipedia.org/wiki/Bootstrap_aggregating

[24] V. Sharma, S. Rai, A. Dev, “A Comprehensive Study of Artificial Neural Networks”, International Journal of Advanced Research in Computer Science and Software Engineering, ISSN 2277128X, Volume 2, Issue 10, October 2012

 [25] S. B. Hiregoudar, K. Manjunath, K. S. Patil, “A Survey: Research Summary on Neural Networks”, International Journal of Research in Engineering and Technology, ISSN: 2319 1163, Volume 03, Special Issue 03, pages 385-389, May, 2014

 [26] https://en.wikipedia.org/wiki/Instance-based_learning

[27] P. Harrington, “Machine Learning in Action”, Manning Publications Co., Shelter Island, New York, ISBN 9781617290183, 2012 [28] J. M. Keller, M. R. Gray, J. A. Givens Jr., “A Fuzzy K-Nearest Neighbor Algorithm”, IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-15, No. 4, August 1985

 

 

 


Comments

Popular Posts