MACHINE LEARNING ALGORITHMS
ABSTRACT
In this paper, various
machine learning algorithms have been discussed. These algorithms are used for
various purposes like data mining, image processing, predictive analytics, etc.
to name a few. The main advantage of using machine learning is that, once an
algorithm learns what to do with data, it can do its work automatically.
Keywords
– Machine learning, algorithms, pseudo code
(1) INTRODUCTION
In Machine Learning,
the algorithm is just the method that you’ll use to find
those relationships within your data. Algorithms can be complex or simple, big
or small, or any permutation of things: but at the core, they’re just ways of
figuring out what, if anything, drives the changes you’re trying to predict.
Heard of Deep Learning? That’s (basically) a type of algorithm. Much like the
MergeSort algorithm is efficient at sorting arrays, Machine Learning algorithms
are efficient at surfacing relationships and associations.Internet and its applications
have become an integral part of today’s human lifestyle. It has become an
essential tool in every aspect. Due to the tremendous demand and necessity,
researchers went beyond connecting just computers into the web. These
researches led to the birth of a sensational gizmo, Internet of Things (IoT).
Communication over the internet has grown from user -user interaction to device
–device interactions these days. The IoT concepts were proposed years back but
still it’s in the initial stage of commercial deployment. Home automation
industry and transportation industries are seeing rapid growth with IoT. Yet
not many articles have been published in this field of study. Since most of the
process is done through the internet, we must have an active high-speed
internet connection. The technology can be simply explained as a connection
between humans-computers-things. All the equipment’s we use in our day to day
life can be controlled and monitored using the IoT.
Machine learning is used to
teach machines how to handle the data more efficiently. Sometimes after viewing
the data, we cannot interpret the pattern or extract information from the data.
In that case, we apply machine learning [1]. With the abundance of datasets
available, the demand for machine learning is in rise. Many industries from
medicine to military apply machine learning to extract relevant information.
The purpose of machine learning is to learn from the data. Many studies have
been done on how to make machines learn by themselves [2] [3]. Many
mathematicians and programmers apply several approaches to find the solution of
this problem. Some of them are demonstrated in Fig. 1. All the techniques of
machine learning are explained in Section 2. Section 3 concludes this paper.
(2) TYPES OF MACHINE LEARNING :
2.1)Supervised Learning:
The supervised machine
learning algorithms are those algorithms which needs external assistance. The
input dataset is divided into train and test dataset. The train dataset has
output variable which needs to be predicted or classified. All algorithms learn
some kind of patterns from the training dataset and apply them to the test
dataset for prediction or classification [4]. The workflow of supervised
machine learning algorithms is given in Fig. 2. Three most famous supervised
machine learning algorithms have been discussed here.
2.1.1)Decision Tree:
Decision trees are those type of trees which
groups attributes by sorting them based on their values. Decision tree is used
mainly for classification purpose. Each tree consists of nodes and branches.
Each nodes represents attributes in a group that is to be classified and each
branch represents a value that the node can take [4]. An example of decision
tree is given in Fig.
The pseudo code for Decision tree is described
in Fig. 4; where S, A and y are training set, input attribute and target
attribute respectively.
Workflow of
supervised machine learning algorithm:
2)Naive
Bayes:
Naïve Bayes mainly targets the text classification
industry. It is mainly used for clustering and classification purpose [6]. The
underlying architecture of Naïve Bayes depends on the conditional probability.
It creates trees based on their probability of happening. These trees are also
known as Bayesian Network.
Bayes’ Theorem is stated as:
P(h|d) = (P(d|h) * P(h))
/ P(d)
·
P(h|d) is the probability of hypothesis h given the
data d. This is called the posterior probability.
·
P(d|h) is the probability of data d given that the
hypothesis h was true.
·
P(h) is the probability of hypothesis h being true
(regardless of the data). This is called the prior probability of h.
·
P(d) is the probability of the data (regardless of
the hypothesis).
An example of the network is given in Fig
(PSEUDOCODE
FOR NAÏVE BYTES)
3)Support Vector Machine:
Another most widely used state-of-the-art
machine learning technique is Support Vector Machine (SVM).is
mainly used for
classification. SVM works on the principle of margin calculation. It basically,
draw margins between the classes. The margins are drawn in such a fashion that
the distance between the margin and the classes is maximum and hence,
minimizing the classification error. A
support vector machine (SVM) is a supervised machine learning model that uses
classification algorithms for two-group classification problems. After giving
an SVM model sets of labeled training data for each category, they’re
able to categorize new text.
An example of working and
pseudo code of SVM is given in Fig:
4) Unsupervised Learning:
The
unsupervised learning algorithms learns few features from the data. When new
data is introduced, it uses the previously learned features to recognize the
class of the data. It is mainly used for clustering and feature reduction. Unsupervised machine learning
algorithms infer patterns from a dataset without reference to known, or
labeled, outcomes. Unlike supervised machine learning, unsupervised machine learning methods cannot be
directly applied to a regression or a classification problem because you have no idea what the
values for the output data might be, making it impossible for you to train the algorithm the way you normally would.
Unsupervised learning can instead be used to discover the underlying structure
of the data.
An example of workflow of unsupervised learning is
given in:
The two main algorithms for clustering and
dimensionality reduction techniques are discussed below.
4.1)K-Means Clustering:
Clustering or grouping is a type of unsupervised
learning technique that when initiates, creates groups automatically. The items
which possesses similar characteristics are put in the same cluster. This
algorithm is called k-means because it creates k distinct clusters. The mean of
the values in a particular cluster is the center of that cluster. A clustered
data is represented.
Kmeans algorithm is an iterative algorithm that tries
to partition the dataset into Kpre-defined distinct
non-overlapping subgroups (clusters) where each data point belongs to only
one group. It tries to
make the intra-cluster data points as similar as possible while also keeping
the clusters as different (far) as possible. It assigns data points to a
cluster such that the sum of the squared distance between the data points and
the cluster’s centroid (arithmetic mean of all the data points that belong to
that cluster) is at the minimum. The less variation we have within clusters,
the more homogeneous (similar) the data points are within the same cluster.
The algorithm for k-means is given in Fig.
(4.2)Principal Component
Analysis:
In Principal
Component Analysis or PCA, the dimension of the data is reduced to make the
computations faster and easier. To understand how PCA works, let’s take an
example of 2D data. When the data is being plot in a graph, it will take up two
axes. PCA is applied on the data, the data then will be 1D. Principal component analysis of a data matrix extracts
the dominant patterns in the matrix in terms of a complementary set of score
and loading plots. It is the responsibility of the data analyst to formulate
the scientific issue at hand in terms of PC projections, PLS regressions, etc.
Ask yourself, or the investigator, why the data matrix was collected, and for
what purpose the experiments and measurements were made. Specify before the
analysis what kinds of patterns you would expect and what you would find
exciting.
The results of the
analysis depend on the scaling of the matrix, which therefore must be
specified. Variance scaling, where each variable is scaled to unit variance,
can be recommended for general use, provided that almost constant variables are
left unscaled. Combining different types of variables warrants blockscaling.
In the initial
analysis, look for outliers and strong groupings in the plots, indicating that
the data matrix perhaps should be “polished” or whether disjoint modeling is
the proper course.
For plotting
purposes, two or three principal components are usually sufficient, but for
modeling purposes the number of significant components should be properly
determined, e.g. by cross-validation.
Use the resulting
principal components to guide your continued investigation or chemical experimentation, not as an end in itself.
(An Example pseudocode for Principal component
analysis)
5). Semi - Supervised
Learning Semi –
supervised learning algorithms is a technique which
combines the power of both supervised and unsupervised learning. It can be
fruit-full in those areas of machine learning and data mining where the
unlabeled data is already present and getting the labeled data is a tedious
process [15]. There are many categories of semi-supervised learning [16]. Some
of which are discussed below:
5.1) Generative Models:
Generative
models are one of the oldest semi-supervised learning method assumes a
structure like p(x,y) = p(y)p(x|y) where p(x|y) is a mixed distribution e.g.
Gaussian mixture models. Within the unlabeled data, the mixed components can be
identifiable. One labeled example per component is enough to confirm the
mixture distribution.
5.2)Self-Training:
In self-training,
a classifier is trained with a portion of labeled data. The classifier is then
fed with unlabeled data. The unlabeled points and the predicted labels are
added together in the training set. This procedure is then repeated further.
Since the classifier is learning itself, hence the name self-training.
5.3)Transductive SVM:
Transductive
support vector machine or TSVM is an extension of SVM. In TSVM, the labeled and
unlabeled data both are considered. It is used to label the unlabeled data in
such a way that the margin is maximum between the labeled and unlabeled data.
Finding an exact solution by TSVM is a NP-hard problem.
6). Reinforcement Learning:
Reinforcement
learning is a type of learning which makes decisions based on which actions to
take such that the outcome is more positive. The learner has no knowledge which
actions to take until it’s been given a situation. The action which is taken by
the learner may affect situations and their actions in the future.
Reinforcement learning solely depends on two criteria: trial and error search
and delayed outcome . This
is a long overdue blog post on Reinforcement Learning (RL). RL is hot! You may
have noticed that computers can now automatically learn to play
ATARI games (from raw game pixels!), they are beating world champions at Go, simulated quadrupeds are learning to run and leap, and robots are learning how to
perform complex
manipulation tasks that
defy explicit programming. It turns out that all of these advances fall under
the umbrella of RL research. I also became interested in RL myself over the
last ~year: I worked through
Richard Sutton’s book,
read through David
Silver’s course,
watched John
Schulmann’s lectures,
wrote an RL library in Javascript, over the summer interned at DeepMind
working in the DeepRL group, and most recently pitched in a little with the
design/development of OpenAI Gym, a new RL benchmarking toolkit. So I’ve
certainly been on this funwagon for at least a year but until now I haven’t
gotten around to writing up a short post on why RL is a big deal, what it’s
about, how it all developed and where it might be going.
The general
model for reinforcement learning is
depicted in Fig
7). Multitask Learning:
Multi-Task learning is a sub-field of Machine
Learning that aims to solve multiple different tasks at the same time, by
taking advantage of the similarities between different tasks. This can improve
the learning efficiency and also act as a regularizer which we will discuss in
a while.
Formally, if there are n tasks (conventional deep learning approaches aim to solve just 1
task using 1 particular model), where these n tasks or a subset of them are related to each other but not exactly
identical, Multi-Task Learning (MTL) will help in improving the learning of a particular model by using
the knowledge contained in all the n tasks. Multitask learning has a simple
goal of helping other learners to perform better. When multitask learning
algorithms are applied on a task, it remembers the procedure how it solved the
problem or how it reaches to the particular conclusion. The algorithm then uses
these steps to find the solution of other similar problem or task. This helping
of one algorithm to another can also be termed as inductive transfer mechanism.
If the learners share their experience with each other, the learners can learn
concurrently rather than individually and can be much faster
8). Ensemble Learning:
When various
individual learners are combined to form only one learner then that particular
type of learning is called ensemble learning. The individual learner may be
Naïve Bayes, decision tree, neural network, etc. Ensemble learning is a hot
topic since 1990s. It has been observed that, a collection of learners is
almost always better at doing a particular job rather than individual learners .
Two popular Ensemble learning techniques are given below.
8.1) Boosting:
Boosting is a
technique in ensemble learning which is used to decrease bias and variance.
Boosting creates a collection of weak learners and convert them to one strong
learner. A weak learner is a classifier which is barely correlated with true
classification. On the other hand, a strong learner is a type of classifier
which is strongly correlated with true classification [21]. The pseudo code for
AdaBoost (which is most popular example of boosting) is give in Fig.
8.2) Bagging:
A Bagging classifier is an ensemble meta-estimator
that fits base classifiers each on random subsets of the original dataset and
then aggregate their individual predictions (either by voting or by averaging)
to form a final prediction. Such a meta-estimator can typically be used as a
way to reduce the variance of a black-box estimator (e.g., a decision tree), by
introducing randomization into its construction procedure and then making an
ensemble out of it.
Each base classifier is trained in parallel with a training set which is generated
by randomly drawing, with replacement, N examples(or data) from the original
training dataset – where N is the size of the
original training set. Training set for each of the base classifiers
is independent of each other. Many of the original data may be repeated in the
resulting training set while others may be left out.
Bagging
reduces overfitting (variance) by averaging or voting, however, this leads to
an increase in bias, which is compensated by the reduction in variance though.Bagging or bootstrap aggregating is applied where the
accuracy and stability of a machine learning algorithm needs to be increased.
It is applicable in classification and regression. Bagging also decreases
variance and helps in handling overfitting. The pseudo code for bagging in
given in Fig
9). Neural Network Learning
:
Neural networks are artificial systems that were inspired by
biological neural networks. These systems learn to perform tasks by being
exposed to various datasets and examples without any task-specific rules. The
idea is that the system generates identifying characteristics from the data
they have been passed without being programmed with a pre-programmed
understanding of these datasets.
Neural
networks are based on computational models for threshold logic. Threshold logic
is a combination of algorithms and mathematics. Neural networks are based
either on the study of the brain or on the application of neural networks to
artificial intelligence. The work has led to improvements in finite automata
theory.
Components
of a typical neural network involve neurons, connections, weights, biases,
propagation function, and a learning rule. Neurons will receive an
input from predecessor neurons that have an activation ,
threshold , an activation function f, and an output function .
Connections consist of connections, weights and biases which rules how
neuron transfers output to
neuron . Propagation computes the input
and outputs the output and sums the predecessor neurons function with the
weight. The learning rule modifies the weights and thresholds of the variables
in the network.
The neural network (or artificial neural network or
ANN) is derived from the biological concept of neurons. A neuron is a cell like
structure in a brain. To understand neural network, one must understand how a
neuron works. A neuron has mainly four parts (see Fig). They are dendrites,
nucleus, soma and axon.
The dendrites receive electrical signals. Soma
processes the electrical signal. The output of the process is carried by the axon
to the dendrite terminals where the output is sent to next neuron. The nucleus
is the heart of the neuron. The inter-connection of neuron is called neural
network where electrical impulses travel around the brain.
9.1).Supervised Neural Network:
As the name suggests, supervised learning takes place
under the supervision of a teacher. This learning process is dependent. During
the training of ANN under supervised learning, the input vector is presented to
the network, which will produce an output vector. This output vector is compared
with the desired/target output vector. An error signal is generated if there is
a difference between the actual output and the desired/target output vector. On
the basis of this error signal, the weights would be adjusted until the actual
output is matched with the desired output.In the supervised
neural network, the output of the input is already known. The predicted output
of the neural network is compared with the actual output. Based on the error,
the parameters are changed, and then fed into the neural network again. Fig. will
summarize the process. Supervised neural network is used in feed forward neural
network.
9.2)
Unsupervised Neural Network:
As the name suggests,
this type of learning is done without the supervision of a teacher. This
learning process is independent. During the training of ANN under unsupervised
learning, the input vectors of similar type are combined to form clusters. When
a new input pattern is applied, then the neural network gives an output
response indicating the class to which input pattern belongs. In this, there
would be no feedback from the environment as to what should be the desired
output and whether it is correct or incorrect. Hence, in this type of learning
the network itself must discover the patterns, features from the input data and
the relation for the input data over the output.
Here, the
neural network has no prior clue about the output the input. The main job of the
network is to categorize the data according to some similarities. The neural
network checks the correlation between various inputs and groups them. The
schematic diagram is shown in Fig
9.3)
Reinforced Neural Network:
Reinforcement
learning is an area of Machine Learning. It is about taking suitable action to
maximize reward in a particular situation. It is employed by various software
and machines to find the best possible behavior or path it should take in a
specific situation. Reinforcement learning differs from the supervised learning
in a way that in supervised learning the training data has the answer key with
it so the model is trained with the correct answer itself whereas in
reinforcement learning, there is no answer but the reinforcement agent decides
what to do to perform the given task. In the absence of training dataset, it is
bound to learn from its experience.
In reinforced neural network, the network
behaves as if a human communicates with the environment. From the environment,
a feedback has been provided to the network acknowledging the fact that whether
the decision taken by the network is right or wrong. If the decision is right,
the connections which points to that particular output is strengthened. The
connections are weakened otherwise. The network has no previous information
about the output. Reinforced neural network is represented in Fig.
10). Instance-Based Learning:
Instance-based learning refers to a family of techniques
for classification and regression, which produce a
class label/predication based on the similarity of the query to its nearest
neighbor(s) in the training set. In explicit contrast to other methods such
as decision trees and neural networks, instance-based
learning algorithms do not create an abstraction from specific instances.
Rather, they simply store all the data, and at query time derive an answer from
an examination of the query’s nearest neighbour(s).
Somewhat more generally, instance-based learning can refer
to a class of procedures for solving new problems based on the solutions of
similar past problems.
In
instance-based learning, the learner learns a particular type of pattern. It
tries to apply the same pattern to the newly fed data. Hence the name
instance-based. It is a type of lazy learner which waits for the test data to
arrive and then act on it together with training data. The complexity of the
learning algorithm increases with the size of the data. Given below is a
well-known example of instance-based learning which is k-nearest neighbour.
10.1)K-Nearest Neighbor:
In k-nearest neighbor (or KNN), the training
data (which is well-labeled) is fed into the learner. When the test data is
introduced to the learner, it compares both the data. k most correlated data is
taken from training set. The majority of k is taken which serves as the new class
for the test data [27]. The pseudo code for KNN is given in
CONCLUSION:
This paper
surveys various machine learning algorithms. Today each and every person is
using machine learning knowingly or unknowingly. From getting a recommended
product in online shopping to updating photos in social networking sites. This
paper gives an introduction to most of the popular machine learning algorithms.
REFERENCES:
[1] W. Richert,
L. P. Coelho, “Building Machine Learning Systems with Python”, Packt Publishing
Ltd., ISBN 978-1-78216-140-0
[2] M. Welling, “A First Encounter with
Machine Learning”
[3] M. Bowles,
“Machine Learning in Python: Essential Techniques for Predictive Analytics”,
John Wiley & Sons Inc., ISBN: 978-1-118- 96174-2
[4] S.B.
Kotsiantis, “Supervised Machine Learning: A Review of Classification
Techniques”, Informatica 31 (2007) 249-268
[5] L. Rokach, O. Maimon, “Top – Down
Induction of Decision Trees Classifiers – A Survey”, IEEE Transactions on
Systems,
[6] D. Lowd, P. Domingos, “Naïve Bayes Models
for Probability Estimation”
[7]https://webdocs.cs.ualberta.ca/~greiner/C651/Homework2_Fall2008.html
[8] D. Meyer, “Support Vector Machines – The
Interface to libsvm in package e1071”, August 2015
[9] S. S.
Shwartz, Y. Singer, N. Srebro, “Pegasos: Primal Estimated sub - Gradient Solver
for SVM”, Proceedings of the 24th International Conference on Machine Learning,
Corvallis, OR, 2007
[10] http://www.simplilearn.com/what-is-machine-learning-and-why-itmatters-article
[11] P. Harrington, “Machine Learning in action”,
Manning Publications Co., Shelter Island, New York, 2012
[12] http://pypr.sourceforge.net/kmeans.html
[13] K. Alsabati,
S. Ranaka, V. Singh, “An efficient k-means clustering algorithm”, Electrical
Engineering and Computer Science, 1997
[14] M. Andrecut, “Parallel GPU Implementation
of Iterative PCA Algorithms”, Institute of Biocomplexity and Informatics,
University of Calgary, Canada, 2008
[15] X. Zhu, A.
B. Goldberg, “Introduction to Semi – Supervised Learning”, Synthesis Lectures
on Artificial Intelligence and Machine Learning, 2009, Vol. 3, No. 1, Pages
1-130
[16] X. Zhu,
“Semi-Supervised Learning Literature Survey”, Computer Sciences, University of
Wisconsin-Madison, No. 1530, 2005
[17] R. S.
Sutton, “Introduction: The Challenge of Reinforcement Learning”, Machine
Learning, 8, Page 225-227, Kluwer Academic Publishers, Boston, 1992
[18] L. P.
Kaelbing, M. L. Littman, A. W. Moore, “Reinforcement Learning: A Survey”,
Journal of Artificial Intelligence Research, 4, Page 237-285, 1996
[19] R. Caruana,
“Multitask Learning”, Machine Learning, 28, 41-75, Kluwer Academic Publishers,
1997
[20] D. Opitz, R.
Maclin, “Popular Ensemble Methods: An Empirical Study”, Journal of Artificial
Intelligence Research, 11, Pages 169- 198, 1999
[21] Z. H. Zhou,
“Ensemble Learning”, National Key Laboratory for Novel Software Technology,
Nanjing University, Nanjing, China
[22]
https://en.wikipedia.org/wiki/Boosting_(machine_learning) [23] https://en.wikipedia.org/wiki/Bootstrap_aggregating
[24] V. Sharma,
S. Rai, A. Dev, “A Comprehensive Study of Artificial Neural Networks”,
International Journal of Advanced Research in Computer Science and Software
Engineering, ISSN 2277128X, Volume 2, Issue 10, October 2012
[25] S. B. Hiregoudar, K. Manjunath, K. S.
Patil, “A Survey: Research Summary on Neural Networks”, International Journal
of Research in Engineering and Technology, ISSN: 2319 1163, Volume 03, Special
Issue 03, pages 385-389, May, 2014
[26] https://en.wikipedia.org/wiki/Instance-based_learning
[27] P.
Harrington, “Machine Learning in Action”, Manning Publications Co., Shelter
Island, New York, ISBN 9781617290183, 2012 [28] J. M. Keller, M. R. Gray, J. A.
Givens Jr., “A Fuzzy K-Nearest Neighbor Algorithm”, IEEE Transactions on
Systems, Man and Cybernetics, Vol. SMC-15, No. 4, August 1985
Comments
Post a Comment