(16) cates with its neighbor node (1, 2), which further communicates through the chain to the Lsize node, respectively.
Fig. 3. Optimal features selection using CAGA.
Fig. 4. Chain-like agent structure.
The feature extraction phase is the prerequisite for nuclei classification, which generates evidence about the condition of the disease. In the proposed method, three well-known supervised learning classifiers, i.e., Support Vector Machine (SVM), Naïve Bayesian (NB), and Random Forest (RF), are used for the detection and classification of malignant cells.
SVM is a supervised machine learning technique used for classification problems. It seeks the optimal hyperplane, defined by its normal vector w⃗, that best separates the data into two classes (malignant and benign). The data points xi that lie closest to the hyperplane are called support vectors; moving these points changes the position of the hyperplane. The classification performance of SVM is better when the data points lie far from the hyperplane.
In the case of non-linear separation, the data xi ∈ R^m is transferred to a higher-dimensional space R^n (where n > m) through transformation functions (also called kernel functions K(xi, xj)), such that the data can be linearly separated. The advantage of a kernel function is that it transforms the dataset into higher dimensions implicitly, without extra memory and at minimal computational cost. Several kernel functions can be used, for example, radial basis function, linear, polynomial, and quadratic. In the proposed approach, the Radial Basis Function (RBF) kernel (also known as the Gaussian kernel) is used, which gives the optimal boundary for the classification of cells into malignant or benign. For input data xi and xj representing feature vectors, the RBF kernel is given by the following equation:
$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right) \qquad (17)$$
where σ is a free parameter having value greater than 0.
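As a minimal sketch, assuming the common Gaussian form K(xi, xj) = exp(−‖xi − xj‖² / (2σ²)), the function below evaluates the kernel for two feature vectors. The name `rbf_kernel`, the default σ = 1, and the sample vectors are illustrative assumptions and are not taken from the proposed method.

```python
import math

def rbf_kernel(x_i, x_j, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    if sigma <= 0:
        raise ValueError("sigma must be greater than 0")
    if len(x_i) != len(x_j):
        raise ValueError("feature vectors must have the same length")
    # Squared Euclidean distance between the two vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Hypothetical 3-feature vectors, for illustration only.
same = rbf_kernel([0.2, 0.5, 0.1], [0.2, 0.5, 0.1])
far = rbf_kernel([0.2, 0.5, 0.1], [3.0, 4.0, 5.0])
```

Identical vectors give the maximum kernel value of 1, and the value decays toward 0 as the vectors move apart; a larger σ makes that decay slower.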
NB is also a supervised machine learning technique that classifies data on the basis of the probability distribution given by Bayes' rule [40, 41]. It works well for large, high-dimensional data. Bayes' rule is used in the manipulation of conditional probabilities. Based on Bayes' theorem, the Bayesian classifier aims to determine the posterior probability P(Yk | X) from P(Yk), P(X), and P(X | Yk):
$$P(Y_k \mid X) = \frac{P(Y_k)\,P(X \mid Y_k)}{P(X)} \qquad (18)$$
where:
• P(X | Yk): represents the likelihood of the features X for a given class Yk (Malignant or Benign).
• P(Yk | X): denotes the posterior probability of the target class Yk (Malignant or Benign) given the feature set X.
• P(Yk): represents the prior probability of class Yk (Malignant or Benign).
• P(X): shows the prior probability of the given features X.
Fig. 5. Block diagram of K-fold cross validation.
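The following worked sketch applies Bayes' rule, P(Yk | X) = P(Yk) P(X | Yk) / P(X), to two classes. The function name `posterior` and all probability values are hypothetical, and P(X) is assumed to be obtained by summing P(Yk) P(X | Yk) over the classes.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule: P(Yk | X) = P(Yk) * P(X | Yk) / P(X)."""
    # Joint probability P(Yk) * P(X | Yk) for every class.
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    # Evidence P(X), assumed here to be the sum of the joint over all classes.
    evidence = sum(joint.values())
    return {k: joint[k] / evidence for k in joint}

# Hypothetical numbers, for illustration only.
priors = {"malignant": 0.4, "benign": 0.6}
likelihoods = {"malignant": 0.30, "benign": 0.05}
post = posterior(priors, likelihoods)
# The classifier predicts the class with the largest posterior.
predicted = max(post, key=post.get)
```

With these numbers the joint terms are 0.12 and 0.03, so P(X) = 0.15 and the posterior for the malignant class is 0.12 / 0.15 = 0.8.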
The RF classifier, an ensemble learning algorithm, was first introduced in 1999 by Breiman. The basic idea is to build many small decision trees in parallel, each with a small number of features, and then combine them to make a forest. The details of the random forest technique are given in , and each step of the RF can briefly be explained as follows: for every tree in the forest, a bootstrap sample is selected from the sample S. The decision tree is learned by a customized decision tree learning algorithm. At every node of the tree, a subset of optimal features f is selected randomly from the set F of features. Each tree node (representing a feature set) is further divided into sub-nodes fi, which represent the optimal feature subsets from the parent node F.
2.5. Ensemble classifier
Ensemble learning is a recent trend in machine learning that trains a number of specific classifiers and selects the one that produces the best classification result. It has been shown that multiple classifiers give better results than any individual classifier. A classifier ensemble puts together multiple classifiers by applying the algebraic combination rules given in [46, 47] to improve the overall classification performance. In ensemble-based classification, a base classifier is selected randomly for the learning step.
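To illustrate how classifier outputs can be combined, the sketch below shows two commonly used rules: majority voting over predicted labels and the mean rule over predicted probabilities. Whether these match the exact rules in [46, 47] is an assumption; the function names and the sample outputs are hypothetical.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label predicted by most classifiers (ties: first seen wins)."""
    return Counter(labels).most_common(1)[0][0]

def mean_rule(probabilities):
    """Average the per-classifier probability of the positive class."""
    return sum(probabilities) / len(probabilities)

# Hypothetical outputs of three classifiers for one sample.
votes = ["malignant", "benign", "malignant"]
p_malignant = [0.9, 0.4, 0.7]
label = majority_vote(votes)
avg = mean_rule(p_malignant)
```

Majority voting uses only the hard decisions, whereas the mean rule also takes each classifier's confidence into account.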
In the proposed approach, while using ensemble-based classification, SVM is selected as the base classifier, and each classifier generates binary results. The dataset is divided into training and testing sets using cross-validation. For training and testing purposes, the data is divided into k disjoint sets; in each round, one set is used for testing and the remaining k − 1 sets are used for training, as shown in Fig. 5. To select the best classifier, simple majority voting is used in the proposed method. It has been observed that Naïve Bayesian is selected as the best classifier on the basis of majority voting using average probability.
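The k-fold split can be sketched as follows, assuming contiguous folds without shuffling and fold sizes that differ by at most one sample. The generator `k_fold_indices` and the values n = 10, k = 5 are illustrative and are not taken from the proposed method.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k disjoint folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Hypothetical dataset of 10 samples split into 5 folds.
folds = list(k_fold_indices(10, 5))
```

Each sample appears in exactly one test fold, so every sample is used for testing once and for training k − 1 times.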