# 1. Introduction

Radial basis function (RBF) networks have been widely studied and applied to a variety of regression and classification tasks, cf. [16, 19]. Since the concept of RBF neural networks was introduced in the literature [20], there have been a number of interesting and useful generalizations of the generic topology of these networks and their learning methods, cf. [1, 4, 12, 14]. A distinctive feature of RBF neural networks is their fast two-phase training method. During this learning process, the parameters of the radial basis functions are determined independently of the weights of the output layer. Typically, the parameters of the basis functions (referred to as receptive fields) are estimated by some relatively fast and general method of unsupervised learning applied to the input data. Once the basis functions have been determined, the output layer's weights are obtained as the least-squares solution to a system of linear equations (e.g., by using the Moore-Penrose pseudo-inverse [2]). Compared to the nonlinear optimization usually required in the training of neural networks, this two-stage method is typically much faster, helps avoid local minima, and eliminates difficulties with the convergence of the overall learning process [13].
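As an illustration of this two-phase scheme, the following minimal sketch (not the implementation used in this study; K-means placement and a single shared spread are simplifying assumptions) determines Gaussian receptive fields in an unsupervised manner and then solves the output weights with the pseudo-inverse:

```python
import numpy as np

def train_rbf(X, y, c=4, sigma=1.0, iters=20, seed=0):
    """Two-phase RBF training: (1) place centers by K-means,
    (2) solve output weights by least squares (pseudo-inverse)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), c, replace=False)].astype(float)
    for _ in range(iters):                       # phase 1: unsupervised K-means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for i in range(c):
            if np.any(labels == i):
                centers[i] = X[labels == i].mean(axis=0)
    G = np.exp(-np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) ** 2
               / (2 * sigma ** 2))               # hidden-layer activations
    w = np.linalg.pinv(G) @ y                    # phase 2: least-squares weights
    return centers, w

# usage: fit a simple 1-D regression target
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = np.sin(X).ravel()
centers, w = train_rbf(X, y)
```

Because phase 2 is linear in the weights, no iterative nonlinear optimization of the output layer is needed, which is the source of the speed advantage mentioned above.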

Let us look in more detail at this two-step design of RBF neural networks, highlighting the diversity of the optimization tools available.

(a) Optimization of the hidden layer: We encounter a significant variety of radial basis functions in use, along with diverse ways of developing them. Gaussian functions are the typical form of RBFs; other analytical versions of such functions are also available, see [5]. An alternative way of forming the RBFs (receptive fields) is to exploit various clustering techniques, including such commonly encountered representatives as K-means and Fuzzy C-Means (FCM) [19]. Furthermore, optimization methods such as Particle Swarm Optimization can be used to position the RBFs in the input space, refer to [21].

(b) Optimization of the output layer: For the linear neuron located at the output layer of the network, gradient-based methods and Expectation Maximization (EM)-based training are in common usage, see [9, 11].

Let us consider how to locate the receptive fields of RBF neural networks, i.e., how to analyze and describe the input space, which is inherently related to the output space through some unknown function (the output space being the space of real numbers in regression problems or a space of integers in classification tasks). As noted above, various unsupervised clustering methods such as K-means and Fuzzy C-Means have been proposed to construct receptive fields. In particular, for regression problems, Pedrycz [18] pointed out a certain drawback of objective-function-based clustering techniques such as Fuzzy C-Means. This shortcoming, commonly encountered when such clustering methods are used to form the linguistic terms of a fuzzy model over the input space, is that all of those terms are formed in a completely unsupervised manner even though some component of supervision is available in the form of the dependent (output) variables. To alleviate this shortcoming and take the output space into account, Conditional Fuzzy C-Means (c-FCM) clustering has been proposed. Given that information about the output is used in the method, it brings some component of supervision into the clustering process.

In this study, we develop a concept of RBF neural networks based on supervision-augmented clustering, related to the supervisory clustering realized for regression problems. When dealing with classification problems, the supervisory clustering has to be activated within the boundary area occupied by the patterns to be classified. We define the boundary area as the region of the input space where data (patterns) belonging to different classes are located. Given a mixture of data coming from different classes, or associated with a substantial variety of output values, the boundary region can be regarded as a source of useful discriminatory information. In contrast, the regions of the input space associated with the core of each class (where by the core we mean a region of the input space predominantly occupied by patterns belonging to the same class) might be a limited source of discriminatory information.

In order to activate the supervised clustering within the boundary area, we describe this area by using several linguistic terms (quantified in terms of fuzzy sets). This approach is legitimate considering that fuzzy sets are naturally geared to describe concepts (here, classes) that overlap with elements belonging to other classes. After determining the boundary area, we invoke supervisory clustering to analyze the structure of the space. The performance of the proposed classifier is contrasted with the results produced by polynomial Fuzzy Radial Basis Function Neural Networks (pFRBF NNs). To demonstrate its classification abilities, we also compare the generalization ability of the proposed classifier with that of several well-known classifiers.

This study is organized as follows. In Section 2, we review the architecture of the generic RBF NNs and the extended RBF NNs. Next, in Section 3, we propose and elaborate on the pFRBF NN classifier focused on the boundary decision area and conditional fuzzy clustering. Extensive experimental studies are covered in Section 4, while Section 5 offers some concluding comments.

# 2. Architecture of the Extended pRBFNNs

Several studies have reported that the generic pFRBF NNs exhibit some advantages, including global optimal approximation and classification capabilities as well as rapid convergence of the underlying learning procedures, see [6, 8]. The generic topology of pFRBF NNs is depicted in Fig. 1.

**Fig. 1.** General architecture of the generic pFRBF Neural Networks

In Fig. 1, Γi, i = 1, 2, …, c denotes the receptive fields (radial basis functions), while “m” stands for the number of input variables. The output of the generic pFRBF NN comes as a linear combination of the outputs Γi(x) of the corresponding nodes at the hidden layer with the connection weights w1, w2, ⋯, wc, as shown below

where x = [ x1 x2 ⋯ xm ] ∈ ℜm and Γi(x) is the activation level of the i-th node of the hidden layer.

Generally, Gaussian-type pFRBFs are used as receptive fields

where vi and σi are the apex (center) and the spread of the ith receptive field, respectively.
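Since the receptive-field equation itself is not reproduced in this excerpt, the following short sketch assumes the common parameterization exp(-||x - vi||² / (2σi²)); the function and variable names are illustrative:

```python
import numpy as np

def receptive_fields(x, V, sigmas):
    """Activation of each Gaussian receptive field with apex v_i (row of V)
    and spread sigma_i; the exp(-||x - v||^2 / (2 sigma^2)) form is the
    common convention assumed here."""
    d2 = np.sum((V - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigmas ** 2))

V = np.array([[0.0, 0.0], [2.0, 2.0]])   # apexes (centers)
sigmas = np.array([1.0, 0.5])            # spreads
print(receptive_fields(np.array([0.0, 0.0]), V, sigmas))  # first field at its apex: 1.0
```

The activation is maximal (equal to 1) at the apex and decays with the squared distance at a rate controlled by the spread.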

There are two major differences between the extended pFRBF NNs and the generic version of pFRBF NNs. The first one concerns the type of the underlying receptive fields. In the extended pFRBF NNs, the prototypes of the receptive fields (i.e., the nodes of the hidden layer) are determined by running fuzzy clustering. The output of each node in the hidden layer is the activation level of the corresponding linguistic term (fuzzy set)

The second difference arises in terms of the type of the connections (weights) between the hidden layer and the output layer. In the extended pFRBF NNs, we use linear functions or 2nd-order polynomials rather than confining ourselves to fixed numeric values. The architecture of the extended pFRBF NN and the type of connection weights considered above are shown in Fig. 2.

**Fig. 2.** Architecture of the extended pFRBF Neural Networks

In Fig. 2, fi denotes the connection (weight) between the i-th node of the hidden layer and the node in the output layer. The connection fi is expressed as a linear function or a 2nd-order polynomial. More specifically, we have

Here, ai = [ ai0 ai1 ⋯ aim ]T ∈ ℜ(m+1)

Here,

The activation level of each node in the hidden layer is determined using (3). The normalized activation level uik follows the expression

The following relationship holds

For the output node we obtain
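Although the referenced equations are not reproduced in this excerpt, the forward pass of the extended network can be sketched under standard FCM conventions (normalized memberships as activation levels, local linear models as the connections); all names below are illustrative assumptions:

```python
import numpy as np

def fcm_activation(x, V, p=2.0):
    """Normalized FCM-style activation of each prototype:
    u_i = 1 / sum_j (||x - v_i|| / ||x - v_j||)^(2/(p-1))."""
    d = np.linalg.norm(V - x, axis=1)
    if np.any(d == 0):                 # x coincides with a prototype
        u = (d == 0).astype(float)
        return u / u.sum()
    ratio = (d[:, None] / d[None, :]) ** (2.0 / (p - 1.0))
    return 1.0 / ratio.sum(axis=1)

def network_output(x, V, A):
    """Output of the extended network: sum_i u_i(x) * f_i(x), where
    f_i(x) = a_i0 + a_i . x is the local linear model of the i-th node."""
    u = fcm_activation(x, V)
    f = A[:, 0] + A[:, 1:] @ x         # local linear models
    return float(np.dot(u, f))

V = np.array([[0.0, 0.0], [2.0, 2.0]])    # prototypes from fuzzy clustering
A = np.array([[1.0, 0.5, -0.5],           # coefficients a_i0, a_i1, a_i2
              [0.0, 1.0,  1.0]])
print(network_output(np.array([1.0, 1.0]), V, A))  # equidistant point
```

For a point equidistant from the two prototypes, the memberships are 0.5 each, so the output is simply the average of the two local models.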

# 3. The Development of the pFRBFNN Classifier Activated Within the Boundary Area

When considering a two-class problem, as elaborated on in the Introduction, we use the extended pFRBF NNs as the primary classifier, whose receptive fields are constructed by running supervised clustering (i.e., the conditional Fuzzy C-Means).

The basic conjecture behind the proposed classifier is that more ambiguous information is present within the boundary area than within the core area (generally speaking, the core area contains homogeneous patterns belonging to the same class, whereas the boundary area typically embraces patterns belonging to several classes).

The boundary surface is formed within the boundary area. In this paper, the boundary surface (area) for each class is determined by using the extended pFRBF NNs presented in Section 2. The output of the extended pFRBF NNs is aggregated through a linear combination of the local models, which describe the relationship between the input variables and the output variable within the related local areas. Each local model (i.e., a linear function or a 2nd-order polynomial) of the pFRBF NNs defines a local boundary surface, formed within the local area defined by the corresponding receptive field.

We anticipate that an improvement in classification performance is associated with the use of receptive fields positioned within the boundary area.

## 3.1 Defining the boundary area

Let us recall that the boundary area pertains to the region of the input space in which we encounter patterns belonging to different classes. In contrast, the core area (region) is highly homogeneous, containing data belonging to the same class. Fig. 3 illustrates some core and boundary areas formed for two-class data.

**Fig. 3.** Examples of core and boundary areas

In order to define the boundary areas by means of linguistic terms, the data patterns involved in each class are first analyzed by Possibilistic C-Means (PCM) clustering. As far as the data set is concerned, we consider a finite set of “n” input-output data coming in the form of ordered pairs {xk, gk}, k = 1, 2, ⋯, n, xk ∈ ℜm, gk ∈ {1, 2, ⋯, l}, where l is the number of classes. The output variable gk is the class label. Denote by Li the set of indices of the data patterns involved in the i-th class.

The original FCM uses a probabilistic constraint, meaning that the membership grades of the same data point sum up to one. While this is useful in forming the partition, the membership values resulting from the FCM and related methods may not always correspond to the intuitive concept of degree of belongingness, compatibility, or typicality, as noted in the literature. Krishnapuram and Keller relaxed this constraint and introduced possibilistic clustering (PCM) by minimizing the following objective function

where ηi is a certain positive number and “p” is the fuzzification coefficient, a real number greater than 1 playing the same role as in the ordinary FCM.

The first term requires that the distances from data points to the prototypes be as low as possible while the second term forces the values of uik to be as large as possible, thus avoiding running into a trivial solution. It is recommended to select ηi as discussed in [10], that is

Typically, the value of K is chosen to be equal to 1. The prototypes are updated in the same way as in the FCM algorithm,

The membership degree (partition matrix) in the PCM is calculated as follows

We determine the prototypes and the activation levels for each class separately. The prototypes of class “j” (i.e., gk = j) are calculated as follows

is the prototype of the i-th cluster of the j-th class, nj is the number of elements of the index set Lj, and Lj{k} denotes the k-th element of the index set Lj.

The activation level of the i-th cluster for the j-th class is calculated as follows.

where ηi is now the value specific to the j-th class.

Note that these expressions are the modified versions of (12) and (13).

The higher the activation levels (15) are, the more visibly a data point is involved in the core area of the corresponding class.
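A brief sketch of the PCM quantities referenced above, assuming the standard membership update 1/(1 + (d²/ηi)^(1/(p-1))) and the recommended choice of ηi with K = 1 (the per-class modifications are omitted here):

```python
import numpy as np

def pcm_memberships(X, V, eta, p=2.0):
    # u_ik = 1 / (1 + (d_ik^2 / eta_i)^(1/(p-1))); no sum-to-one constraint,
    # so each membership reflects typicality to its own cluster
    d2 = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) ** 2  # (c, n)
    return 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (p - 1.0)))

def pcm_eta(U, X, V, p=2.0, K=1.0):
    # recommended eta_i: membership-weighted mean squared distance (K = 1)
    d2 = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) ** 2
    Up = U ** p
    return K * (Up * d2).sum(axis=1) / Up.sum(axis=1)

# usage: with eta = 1, membership drops to 0.5 exactly where d^2 = eta
X = np.array([[0.0], [1.0], [2.0]])
V = np.array([[1.0]])
U = pcm_memberships(X, V, np.array([1.0]))
```

Note that, unlike FCM, the memberships of one pattern need not sum to one, which is precisely what lets them act as typicality degrees for the core of each class.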

After calculating the activation levels and prototypes for all classes, we define the boundary area as follows.

Here T stands for some t-norm and S denotes a certain t-conorm (s-norm). In this study, the t-norm is realized as the minimum operator and the t-conorm is specified as the probabilistic sum.

The two symbols denote the activation levels of the i-th cluster of class “1” and the j-th cluster of class “2”, respectively. As shown in Fig. 4, for 2 classes where each class is composed of 2 clusters, the boundary area is defined in the following form

**Fig. 4.** Examples of the Boundary Area associated with the corresponding values of α-cuts of the fuzzy clusters (membership functions)
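Since the boundary formula is not reproduced in this excerpt, the following sketch encodes one plausible reading of it, using the operators named in the text (minimum t-norm, probabilistic-sum t-conorm); the aggregation order is an assumption:

```python
def prob_sum(values):
    # probabilistic-sum t-conorm S(a, b) = a + b - a*b, folded over clusters
    s = 0.0
    for a in values:
        s = s + a - s * a
    return s

def boundary_membership(acts_class1, acts_class2):
    # aggregate each class's cluster activations with the t-conorm, then
    # combine the classes with the min t-norm: the result is high only
    # where the pattern is noticeably activated by BOTH classes
    return min(prob_sum(acts_class1), prob_sum(acts_class2))

print(boundary_membership([1.0, 0.0], [0.0, 0.0]))   # core of class 1 -> 0.0
print(boundary_membership([0.5, 0.5], [0.5, 0.5]))   # overlap region -> 0.75
```

A pattern deep in the core of one class gets boundary membership near zero, while a pattern activated by both classes gets a high value, matching the intended role of the boundary area.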

## 3.2 Conditional fuzzy C-Means clustering within boundary area

The idea of Conditional Fuzzy C-Means (c-FCM, for short) clustering proposed in [18] was applied to the design of pFRBF neural networks as presented in [19]. To elaborate on the essence of the method, let us consider a set of patterns X = {x1, x2, ⋯, xN}, xk ∈ ℜm (where m stands for the dimensionality of the input space), along with an auxiliary information granule, defined here as the boundary area. Each element of X is then associated with the auxiliary information granule (fuzzy set) B given by (16).

In conditional clustering, the data patterns xk are clustered by taking into consideration the conditions (auxiliary information expressed in the form of B(x1), B(x2), ⋯, B(xn)) based on a linguistic term expressed as a fuzzy set B (B : ℜm → [0,1]). The objective function used in conditional fuzzy clustering is the same as in the FCM, namely

where J is the objective function, uik is the activation level associated with the linguistic term B defining the boundary area, vi is the prototype of the i-th cluster, and c is the number of clusters (rules) formed for this context. The difference between the FCM and c-FCM comes in the form of the constraint imposed on the partition matrix, where we now have

Here, B(xk) is the value of the linguistic term (fuzzy set) expressing the degree to which the input pattern xk is involved in the boundary area. Now the optimization problem is formulated in the following form

The iterative optimization scheme is governed by two update formulas, by which we successively modify the partition matrix and the prototypes
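The update formulas are not reproduced in this excerpt; the sketch below assumes the standard c-FCM updates, in which the memberships of pattern xk sum to B(xk) instead of 1 and the prototype update is the usual membership-weighted mean. Names are illustrative:

```python
import numpy as np

def cfcm(X, B, c=2, p=2.0, iters=30, seed=0):
    """Conditional FCM: memberships of x_k sum to B(x_k), so patterns
    outside the boundary area (B ~ 0) barely influence the prototypes."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), c, replace=False)].astype(float)
    for _ in range(iters):
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        # u_ik = B_k / sum_j (d_ik / d_jk)^(2/(p-1))
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (p - 1.0))
        U = B[None, :] / ratio.sum(axis=1)
        # v_i = sum_k u_ik^p x_k / sum_k u_ik^p (same form as in FCM)
        Up = U ** p
        V = (Up @ X) / Up.sum(axis=1, keepdims=True)
    return U, V

# usage: condition values B play the role of boundary memberships
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
B = rng.uniform(0.1, 1.0, size=40)
U, V = cfcm(X, B, c=3)
```

With B identically equal to 1 this reduces to the ordinary FCM; smaller values of B(xk) proportionally shrink the influence of xk on every prototype.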

## 3.3 pFRBFNN classifier: the use of conditional fuzzy C-Means clustering and a focus on the boundary area

In what follows, we propose the pFRBF NN classifier developed with c-FCM clustering supervised by the linguistic term that specifies the boundary area.

As mentioned earlier, we assume that in order to improve classification performance one has to locate the pFRBFs within the boundary area. The pFRBF NN is composed of a linear combination of local models defined on the local areas (receptive fields). In this way, the pFRBF NN classifier can be regarded as a linear combination of local boundary surfaces.

The local models of the pFRBF NNs are activated within the receptive fields (pFRBFs). Therefore, the pFRBFs located within the boundary area have the potential to form the “sound” boundary surface. Fig. 5 shows an overall development process of the proposed classifier.

**Fig. 5.** The overall development of the proposed classifier based on the extended pFRBF NNs, boundary area decision, and c-FCM

As shown in Fig. 2, the output of the proposed pFRBF NN classifier comes as a linear combination of the connection weights (f1, f2, ⋯, fc) with the activation levels of the nodes of the hidden layer (Γ1, Γ2, ⋯, Γc). The output of the network is calculated similarly to that of the extended pFRBF NNs. However, the activation level of each pFRBF of the proposed model is described by (21), which is quite different from the description provided by (2) and (6).

To estimate the connections, we use the orthogonal least squares method and the weighted least squares estimation method. Proceeding with the optimization details, the objective function of Least Squares Estimation (LSE) reads as follows

where

The optimal values of the coefficients are expressed in a well-known manner

When we use the weighted LSE to estimate the coefficients of the local models, we assume that each data pattern comes with its own priority: data patterns with high priority significantly affect the estimation process, whereas data with low priority participate to a limited degree and can be almost neglected. The activation levels of the linguistic variable defining the boundary area can be considered as this priority index. As said earlier, we emphasize the data positioned within the boundary area.

Unlike the conventional LSE, the objective function of the weighted LSE is defined as follows

where

In the above expression, q denotes the linguistic modifier of the activation level of the boundary area. If the value of q is higher than 1, we arrive at higher specificity of the underlying linguistic information, while the opposite effect occurs for lower values of q [3]. Note that the diagonal matrix D is composed of the activation levels of all data pairs with respect to the linguistic term B as its diagonal elements.

The optimal values of the coefficients by using the weighted LSE are expressed in a well-known manner.
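A compact sketch of the weighted LSE step (the design matrix Phi collecting the regressors is assumed to be already formed; D = diag(B(xk)^q) as described above, with illustrative data):

```python
import numpy as np

def weighted_lse(Phi, y, b, q=2.0):
    """Weighted least squares: minimize (y - Phi a)^T D (y - Phi a)
    with D = diag(b^q), so boundary patterns (large b) dominate the fit."""
    D = np.diag(b ** q)
    return np.linalg.solve(Phi.T @ D @ Phi, Phi.T @ D @ y)

# usage: an off-trend point with near-zero weight barely affects the fit
Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 2.0, 10.0])   # last point is off-trend...
b = np.array([1.0, 1.0, 1.0, 0.01])   # ...and carries a tiny priority
a = weighted_lse(Phi, y, b)           # close to [0, 1], the trend of the rest
```

The normal-equation form used here is the standard closed-form solution; numerically, a QR-based solver could replace `np.linalg.solve` without changing the result.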

The final output of the pFRBF NNs comes in the form

The estimated class label is calculated by using the decision rule

# 4. Experimental Study

In order to evaluate and quantify the classification effectiveness of the proposed classifier, we experiment with a series of numeric data: two synthetic datasets and several Machine Learning datasets (http://www.ics.uci.edu/~mlearn/MLRepository.html). In assessing the performance of the classifiers, we use the error rate of the resulting classifier.

We investigate and report the results of each experiment in terms of the mean and standard deviation of the performance index. We consider some predefined values of the parameters of the network, summarized in Table 1. The choice of these particular numeric values was motivated by the need to investigate the performance of the model over a fairly comprehensive range of scenarios.

**Table 1.** Selected Numeric Values of the Parameters of the Proposed Model

In what follows, we report on several experiments dealing with machine learning data sets (http://www.ics.uci.edu/~mlearn/MLRepository.html). For simplicity, we deal with two-class problems (the classifier can be extended to deal with more than two classes). The experiments were repeated 10 times using a random 70%-30% split of the data into training and testing subsets. Table 2 contrasts the classification error of the proposed classifier with other well-known methods from the literature [17]. In these experiments, generic neural networks (NNs), principal component analysis (PCA), and linear discriminant analysis (LDA) are used. The support vector machine (SVM) is available as a MATLAB toolbox, see http://theoval.sys.uea.ac.uk/~gcc/svm/toolbox/. For the decision tree methods, the C4.5 code came from the Classification Toolbox of MATLAB (http://www.yom-tov.info/cgi-bin/list_uploaded_files.pl), and the decision trees used functions from the Statistics Toolbox of MATLAB.

**Table 2.** Results of comparative analysis (The best results are shown in boldface)

Table 3 shows the comparison between the proposed classifier and the classification methods based on boundary analysis. In these experiments, we use 10-fold cross-validation to evaluate the classification abilities, and the final correct classification ratio is reported in terms of its average and standard deviation. From the results in Table 3, we can see that the proposed classifier outperforms the LBDA-based classifiers, achieving higher classification rates.

**Table 3.** LBDA - linear boundary discriminant analysis. LBDA+NN(non) uses only non-boundary patterns to train the Nearest Neighbor classifier, while LBDA+NN(all) uses all patterns to train the same classifier.

# 5. Conclusion

In this paper, we proposed a new design methodology of polynomial fuzzy radial basis function neural networks for classification problems. Unlike the usual design methods of RBFs, the proposed method concentrates on a detailed description of the boundary regions in the feature space. The learning algorithm used in the development of the conclusion part of the rules takes advantage of (weighted) least squares estimation. To evaluate the proposed model, we completed several experiments using 2-dimensional synthetic datasets and a number of machine learning datasets.