
The objective of this research is to present a neural network capable of solving complete nonlinear algebraic systems of n equations with n unknowns. The proposed neural solver uses the classical back propagation algorithm with the identity function as the output function, and supports an adaptive learning rate for the neurons of the second hidden layer. The paper presents the fundamental theory of this approach, together with a set of experimental results that evaluate the performance and accuracy of the proposed method against other methods found in the literature.

The estimation of the roots associated with nonlinear algebraic systems has been a major area of research in applied mathematics [

The paper is organized as follows. Section 2 presents the formulation of the problem to be solved, namely the structure of the complete n × n nonlinear algebraic system, and the classical solution methods with their advantages and disadvantages. Section 3 reviews related work that attempts to deal with the problems associated with the classical methods; the methods presented there are based on a variety of approaches, such as simulated annealing, genetic algorithms and neural networks, and a short description is given for each of them. Sections 4 and 5 present the proposed neural system, the rationale behind its structure, and its mathematical description together with a convergence analysis. Section 6 presents the experimental results that emerged from its application to sample nonlinear systems borrowed from the literature. These results are compared against those obtained by other methods, and the accuracy and performance of the proposed neural solver are evaluated. Finally, Section 7 presents the conclusions of this research and discusses potential future work on this subject.

As is well known from the basic principles of nonlinear algebra, a nonlinear system of n equations with n unknowns is defined as

or in vector form

where

is the vector of the system solutions (or roots). For the case of constant (namely, non-functional) coefficients, the above equations can also be expressed as [

holds, with the function

When all degrees coincide, that is,

Even though the notion of the resultant has been defined for homogeneous nonlinear equations, it can also describe non-homogeneous algebraic equations. In the general case, the resultant

where

The classical method for solving the nonlinear system defined above is the well known Newton’s method [

and generates a vector sequence

where x_{k} denotes the approximation at step k. Even though the method converges fast to the solution provided that the initial guess (namely the starting point x_{0}) is a good one, it is not considered an efficient algorithm, since it requires in each step the evaluation (or approximation) of n^{2} partial derivatives and the solution of an n × n linear system. A performance evaluation of Newton's method, as well as of other similar direct methods, shows that these methods are impractical for large scale problems, due to their large computational cost and memory requirements. In this work, besides the classical Newton's method, the fixed Newton's method was also used. As is well known, the difference between these variations is that in the fixed method the matrix of derivatives is not updated during the iterations, and therefore the algorithm always uses the derivative matrix associated with the initial condition x_{0}.
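The iteration described above can be sketched in a few lines. The following illustration (in Python, although the paper's own implementations are in MATLAB) uses a hypothetical 2 × 2 system, F₁ = x² + y² − 4 and F₂ = xy − 1, which is an assumed example rather than one of the paper's benchmarks; the `fixed` flag reproduces the fixed Newton variant that keeps the derivative matrix of the initial guess throughout.

```python
def F(x, y):
    # hypothetical 2x2 system (assumed example, not a paper benchmark):
    #   F1 = x^2 + y^2 - 4,  F2 = x*y - 1
    return [x*x + y*y - 4.0, x*y - 1.0]

def J(x, y):
    # analytic Jacobian; in general n^2 partial derivatives must be
    # evaluated (or approximated) at every step
    return [[2.0*x, 2.0*y], [y, x]]

def newton(x, y, tol=1e-12, max_iter=50, fixed=False):
    J0 = J(x, y)                        # kept throughout when fixed=True
    for _ in range(max_iter):
        f1, f2 = F(x, y)
        if max(abs(f1), abs(f2)) < tol:
            break
        (a, b), (c, d) = J0 if fixed else J(x, y)
        det = a*d - b*c                 # solve the 2x2 linear system by Cramer's rule
        x -= (f1*d - f2*b) / det
        y -= (a*f2 - c*f1) / det
    return x, y

root = newton(2.0, 0.5)                 # classical Newton
root_fixed = newton(2.0, 0.5, fixed=True)
```

Starting close to a root, the classical variant converges quadratically, while the fixed variant converges only linearly but avoids re-evaluating the derivative matrix.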

An improvement to the classical Newton’s method can be found in the work of Broyden [

・ Broyden method 1. This method allows the update of the Jacobian approximation B_{i} during step i in a stable and efficient way, and is related to the equation

where i and

・ Broyden method 2. This method allows the elimination of the requirement of a linear solver to compute the step direction and is related to the equation

with the parameters of this equation defined as previously.
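As a sketch of the second update rule, the following illustration maintains an approximation H of the inverse Jacobian and applies the rank-one "bad Broyden" correction H ← H + (s − HΔf)Δfᵀ/(ΔfᵀΔf), so that the step s = −HF(x) requires no linear solver; the 2 × 2 test system is an assumed example, not one of the paper's benchmarks.

```python
def F(v):
    # assumed 2x2 example system: F1 = x^2 + y^2 - 4, F2 = x*y - 1
    x, y = v
    return [x*x + y*y - 4.0, x*y - 1.0]

def broyden2(v, tol=1e-12, max_iter=100):
    # H approximates the INVERSE Jacobian, starting from the identity
    H = [[1.0, 0.0], [0.0, 1.0]]
    f = F(v)
    for _ in range(max_iter):
        if max(abs(c) for c in f) < tol:
            break
        # step s = -H*f: no linear solver is needed
        s = [-(H[0][0]*f[0] + H[0][1]*f[1]),
             -(H[1][0]*f[0] + H[1][1]*f[1])]
        v = [v[0] + s[0], v[1] + s[1]]
        f_new = F(v)
        d = [f_new[0] - f[0], f_new[1] - f[1]]          # difference of residuals
        Hd = [H[0][0]*d[0] + H[0][1]*d[1],
              H[1][0]*d[0] + H[1][1]*d[1]]
        denom = d[0]*d[0] + d[1]*d[1]
        # rank-one update: H <- H + (s - H*d) * d^T / (d^T * d)
        for i in range(2):
            for j in range(2):
                H[i][j] += (s[i] - Hd[i]) * d[j] / denom
        f = f_new
    return v

root = broyden2([2.0, 0.5])
```

Method 1 would instead update the Jacobian approximation B itself, at the cost of one n × n linear solve per step to obtain the direction.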

According to the literature [

Ren et al. [

In the work of Ren et al., the objective function

Pourjafari & Mojallali [

with the root-finding process composed of two phases, namely a global search, where plants abandon non-minimum points and settle down around minima, and an exact search, where the exact locations of the roots are determined via a clustering procedure that clusters the plants around each root; in this way, the search is restricted only to those clusters.

Oliveira and Petraglia [

Effati and Nazemi [

Liu et al. [

optimization vector x corresponds to the population habitual residence, the objective function

corresponds to the attractive place of residuals, whereas the global (local) optimal solution of the problem corresponds to the most attractive (the beneficial) areas. Furthermore, the population flow corresponds to the local random search of the algorithm, while the population migration corresponds to the way of choosing solutions. The quasi-Newton variation of this approach starts with a set of N uniformly and randomly generated points, and through an iterative procedure moves the most attractive point to the center of the search region and shrinks that region until a criterion is satisfied. At this point, the solution has been found.

The last family of methods used for the identification of nonlinear algebraic system roots is based on the use of neural network techniques. Typical examples of these methods include the use of recurrent neural networks [

where

Margaris, Goulianas and Adamopoulos [

In order to understand the logic behind the proposed method let us consider the complete 2 × 2 nonlinear algebraic system of equations

and the neural network of

The neural solver for the complete n × n nonlinear algebraic system is based exactly on the same logic, but there are also some new features that are not used in [

and its solution is the vector

The neural network architecture that is capable of solving the general case of a n × n nonlinear system of algebraic equations, is shown in

・ Layer 1 is the single input layer.

・ Layer 2 contains n neurons, each connected to the input of the first layer. The weights of the n synapses defined in this way are the only variable weights of the network. During network training, their values are updated according to the equations of back propagation, and after a successful training these weights contain the n components of a system root

・ Layer 3 is composed of n blocks of neurons with the

・ Layer 4 contains an output neuron for each equation, namely, a total number of n neurons that use the identity function
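The layer structure above can be illustrated on a small polynomial system. The sketch below assumes a hypothetical 2 × 2 system and a term-by-term reading of the architecture, in which layer 2 holds the candidate root, one third-layer neuron is allocated per monomial, and each fourth-layer neuron applies the identity function to a coefficient-weighted sum; the concrete monomials and coefficients are assumptions for illustration.

```python
# hypothetical 2x2 system (assumed example):
#   F1(x1, x2) = x1^2 + x2^2 - 4
#   F2(x1, x2) = x1*x2 - 1
# each equation is a list of (coefficient, (power of x1, power of x2)) terms
SYSTEM = [
    [(1.0, (2, 0)), (1.0, (0, 2)), (-4.0, (0, 0))],
    [(1.0, (1, 1)), (-1.0, (0, 0))],
]

def forward(w):
    """Forward pass: w holds the layer-1 -> layer-2 weights (the candidate root)."""
    outputs = []
    for terms in SYSTEM:
        # layer 3: one neuron per monomial term of the equation;
        # layer 4: identity output function over the weighted sum
        outputs.append(sum(c * w[0]**p1 * w[1]**p2 for c, (p1, p2) in terms))
    return outputs

print(forward([1.0, 1.0]))   # -> [-2.0, 0.0]  (residuals at the point (1, 1))
```

Under this reading, the network outputs are exactly the residuals F_i evaluated at the candidate root, so driving all outputs to zero is equivalent to solving the system.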

On the other hand, the matrices of the synaptic weights follow the logic used in [

is the only variable weight matrix, whose elements (after the successful training of the network) are the components of one of the system roots, or in a mathematical notation

The matrix W_{23} is composed of n rows, with the i-th row associated with the variable x_{i} and connecting the i-th neuron of the second layer with the complete set of neurons of the third layer. There is a total number of

neurons in this layer, and therefore the dimensions of the matrix W_{23} are n × k. The elements of this matrix are nonzero only for the synapses connecting the i-th neuron of the second layer with the third-layer neurons that involve the variable x_{i}; otherwise they have a zero value. Therefore, we have


Finally, the matrix W_{34} has dimensions

with elements


The figure shows only the synapses with a nonzero weight value.

Since the unique neuron of the first layer does not participate in the calculations, it is not included in the index notation. Therefore, if we use the symbol u to describe a neuron input and the symbol v to describe a neuron output, the symbols u_{1} and v_{1} are associated with the n neurons of the second layer, the symbols u_{2} and v_{2} are associated with the k neurons of the third layer, and the symbols u_{3} and v_{3} are associated with the n neurons of the fourth (output) layer. These symbols are accompanied by additional indices that identify a specific neuron inside a layer, and this notation is used throughout the remaining part of the paper.

In this section we build the equations of the back propagation algorithm. As is well known from the literature, this algorithm is composed of three stages, namely a forward pass, a backward pass associated with the estimation of the delta parameters, and a second forward pass related to the weight adaptation process. In a more detailed description, these stages regarding the problem under consideration are performed as follows:

The inputs and the outputs of the network neurons during the forward pass stage, are computed as follows:

LAYER 2

LAYER 3 As it has been mentioned in the description of the neural system, since the neurons in the third layer are organized in n groups with the

LAYER 4 The input and the output of the fourth layer neurons are

and

In this backward phase of the back propagation algorithm, the values of the δ parameters are estimated. The delta parameters of the neurons of the second, third and fourth layers are denoted as δ_{1}, δ_{2} and δ_{3}, respectively, with additional indices to identify the position of each neuron inside its layer. Starting from the output layer and using the well known back propagation equations we have

On the other hand, the delta parameters of the third layer neurons are

Note that, in this case, for each third layer neuron and for each one of the x_{k} parameters (where

Finally, the δ parameters of the second layer neurons have the form

where we use the notation

In the last two equations the j variable gets the values

By the mean square error defining equation

where
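Taken together, the three stages amount to gradient descent on the mean square error E = ½ Σᵢ Fᵢ(x)², since the only variable weights are the candidate root components. The following sketch makes this concrete on an assumed 2 × 2 system, with the Jacobian approximated by forward differences and a fixed learning rate; both the system and the step size are illustrative choices, not the paper's exact equations.

```python
def F(w):
    # hypothetical 2x2 system (assumed example): F1 = x^2 + y^2 - 4, F2 = x*y - 1
    x, y = w
    return [x*x + y*y - 4.0, x*y - 1.0]

def jacobian(w, h=1e-7):
    # forward-difference approximation of the n^2 partial derivatives dF_i/dx_j
    n, f0 = len(w), F(w)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        wp = list(w)
        wp[j] += h
        fj = F(wp)
        for i in range(n):
            J[i][j] = (fj[i] - f0[i]) / h
    return J

def train(w, eta=0.05, epochs=2000, tol=1e-10):
    n = len(w)
    for _ in range(epochs):
        f = F(w)                                  # stage 1: forward pass
        if sum(c * c for c in f) < tol:           # squared-error stopping criterion
            break
        J = jacobian(w)
        # stage 2: backward pass -- the delta terms collapse to the gradient
        # dE/dx_j = sum_i F_i * dF_i/dx_j of E = 1/2 * sum_i F_i^2
        grad = [sum(f[i] * J[i][j] for i in range(n)) for j in range(n)]
        # stage 3: weight adaptation with a fixed learning rate eta
        w = [w[j] - eta * grad[j] for j in range(n)]
    return w

root = train([2.0, 0.5])
```

After training, the second-layer weights hold an approximate root of the system.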

The adaptive learning rate is a novel feature of the proposed simulator that allows each neuron of the second layer to have its own learning rate

The energy function associated with the

If the energy function for the

From the weight update equation of the back propagation algorithm we have

and therefore,

The convergence condition of the back propagation algorithm is expressed as

it has to be

or equivalently

Defining the adaptive learning rate parameter (ALRP) μ, the above equation can be expressed as

where the vector involved is the i_{th} column of the Jacobian matrix

for the m_{th} iteration. Using this notation, the back propagation algorithm converges for adaptive learning rate parameter (ALRP) values μ < 2.
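A minimal sketch of this rule follows, under the reading that each second-layer neuron i takes an effective step of μ divided by the squared norm of the i-th Jacobian column, with convergence requiring μ < 2; the 2 × 2 system is an assumed example.

```python
def F(w):
    # assumed 2x2 example system: F1 = x^2 + y^2 - 4, F2 = x*y - 1
    x, y = w
    return [x*x + y*y - 4.0, x*y - 1.0]

def J(w):
    x, y = w
    return [[2.0*x, 2.0*y], [y, x]]

def solve_alr(w, mu=1.1, max_iter=500, tol=1e-12):
    n = len(w)
    for _ in range(max_iter):
        f = F(w)
        if max(abs(c) for c in f) < tol:
            break
        Jm = J(w)
        for i in range(n):
            col_sq = sum(Jm[r][i] ** 2 for r in range(n))    # ||i-th Jacobian column||^2
            grad_i = sum(f[r] * Jm[r][i] for r in range(n))  # dE/dx_i
            # per-neuron learning rate mu / ||J_i||^2, with ALRP mu < 2
            w[i] = w[i] - (mu / col_sq) * grad_i
    return w

root = solve_alr([2.0, 0.5])
```

Note that each component receives its own step size, which rescales automatically as the Jacobian changes during training.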

To examine and test the validity and the accuracy of the proposed method, sample nonlinear algebraic systems were selected and solved using the neural network approach, and the results were compared against those obtained from other methods. More specifically, the network solved five nonlinear example systems of n equations with n unknowns: three systems for n = 2, one system for n = 3 and one system for n = 6. In these simulations, the adaptive learning rate approach (ALR) is considered as the primary algorithm, but the network is also tested with a fixed learning rate (FLR) value. Even though the theoretical analysis and the construction of the associated equations used the classical back propagation algorithm, the simulations showed that the execution time can be further decreased (leading to a speedup of the simulation process) if in each cycle the synaptic weights are updated one after the other, with the new output values used as input parameters in the corrections associated with the next weight adaptation, an approach that is also used by Meng & Zeng [. In these simulations, tolerance values of the form 10^{−tol} were used, with a value of tol = 12 to give an accuracy of 6 decimal digits (as in the work of Effati & Nazemi [
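The sequential update just described can be sketched as a Gauss-Seidel-style sweep, in which the residuals and the Jacobian are refreshed after every single component correction rather than once per cycle; the system and the step rule below are illustrative assumptions.

```python
def F(w):
    # assumed 2x2 example system: F1 = x^2 + y^2 - 4, F2 = x*y - 1
    x, y = w
    return [x*x + y*y - 4.0, x*y - 1.0]

def J(w):
    x, y = w
    return [[2.0*x, 2.0*y], [y, x]]

def solve_sequential(w, mu=1.1, max_iter=500, tol=1e-12):
    n = len(w)
    for _ in range(max_iter):
        if max(abs(c) for c in F(w)) < tol:
            break
        for i in range(n):
            # refresh residuals and Jacobian after every component update,
            # so each correction sees the newest weight values
            f, Jm = F(w), J(w)
            col_sq = sum(Jm[r][i] ** 2 for r in range(n))
            grad_i = sum(f[r] * Jm[r][i] for r in range(n))
            w[i] = w[i] - (mu / col_sq) * grad_i
    return w

root = solve_sequential([2.0, 0.5])
```

Compared with updating all weights simultaneously, this sweep typically reduces the number of cycles needed, at the cost of re-evaluating the residuals within each cycle.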

According to the proposed method, the condition that ensures the convergence of the back propagation algorithm is given by the inequality μ < 2, where μ is the adaptive learning rate parameter (ALRP). Therefore, to test the validity of this statement, a large number of simulations was performed, with the value of the ALRP varying between μ = 0.1 and μ = 1.9 with a variation step equal to 0.1 (note, however, that the value μ = 1.95 was also used). The maximum allowed number of iterations was set to N = 10000, and a training procedure that reaches this limit is considered unsuccessful. In all cases, the initial condition is a set of the form

In almost all cases, the variation step of the system variables is equal to 0.1 or 0.2 even though the values of 0.5 and 1.0 were also used for large

After the description of the experimental conditions, let us now present the five example systems as well as the experimental results that emerged for each one of them. In the following presentation, the roots of the example systems are identified and compared with the roots estimated by the other methods.

Example 1: Consider the following nonlinear algebraic system of two equations with two unknowns:

This system has a unique root with a value of

Note, that this example can also be found in the papers of Effati & Nazemi [

The experimental results for this system and for

Example 2: This system includes also two equations with two unknowns and it is defined as

and it has infinitely many roots. Therefore, in this case the neural solver tries to identify those roots that belong to the search interval used in each case. This system is also examined in the papers of Effati & Nazemi [

The experimental results in this case are the identified roots in the intervals

Method | x_{1} | x_{2} | F_{1} | F_{2} |
---|---|---|---|---|

Effati & Nazemi | 0.09600 | 0.99760 | 0.019223 | 0.0167760 |

Grosan & Abraham | −0.00138 | 1.00270 | −0.002760 | −0.0000637 |

Neural ALR Method | 0.00000 | 1.00000 | 0.000000 | 0.0000000 |

Method | x_{1} | x_{2} | Global absolute error |
---|---|---|---|

Oliveira & Petraglia | 2.392236421590953 × 10^{−2} | 1.00000000000000 | 0.000000000000 |

Neural ALR Method | 0.00000000000000 | 1.00000000000000 | 0.000000000000 |

case; these roots are not reported in the results, since they are identified again in the next simulation, where the value of the α parameter increases. For example, for μ = 0.5 and α = 2, the network found 23 roots, but we keep only the roots in the interval −2 ≤ x_{1}, x_{2} ≤ 2. In the same way, for α = 10 and for the same μ value, the network found 149 roots, but only 13 of them belong to the search interval −10 ≤ x_{1}, x_{2} ≤ 10. It is not difficult to guess that the 23 roots found for α = 2 are a subset of the 149 roots found for α = 10, and so on. The simulation results for α = 2, associated with tolerance values of tol = 12 and tol = 31, are tabulated below, together with the 13 roots in the interval −10 ≤ x_{1}, x_{2} ≤ 10 identified with 15-digit accuracy, namely for a tolerance of tol = 31. The number of examined initial conditions is 441 for α = 2 (namely −2 ≤ x_{1}, x_{2} ≤ 2 with variation step Δx_{1} = Δx_{2} = 0.2), 10,201 for α = 10 (namely −10 ≤ x_{1}, x_{2} ≤ 10 with variation step Δx_{1} = Δx_{2} = 0.2) and 10,201 for α = 100 (namely −100 ≤ x_{1}, x_{2} ≤ 100 with variation step Δx_{1} = Δx_{2} = 0.2).

The results for α = 1000 are too lengthy to be presented here and, due to the enormous computation time, it was not possible to perform an exhaustive search as in the previous cases. Therefore, the algorithm was run with variation steps of Δx_{1} = Δx_{2} = 0.2, 0.5 and 1.0, and only for the ALRP value μ = 1.5. The number of identified roots in each case varies according to the variation step, but in all cases the network identified 1273 roots in the interval [−1000, +1000]. This is a new result that does not appear in the other works used for comparison.

Before proceeding to the next example system, let us present some results regarding the FLR (fixed learning rate) method, in which all the neurons share the same learning rate value. In this experiment the learning rate

Method | x_{1} | x_{2} | F_{1} | F_{2} |
---|---|---|---|---|

Effati & Nazemi | 0.15750 | 0.49700 | 0.005455 | 0.007390 |

Grosan & Abraham | 0.15772 | 0.49458 | 0.001640 | 0.000969 |

Neural ALR Method | 0.15652 | 0.49338 | −0.000001 | 0.000001 |

Method | x_{1} | x_{2} | Global absolute error |
---|---|---|---|

Oliveira & Petraglia | 0.1565200696831246 | 0.493376374223309 | 1.656070029373846175E−14 |

Neural ALR Method | 0.1565200696831300 | 0.493376374223239 | 0.0000000000000092 |

μ | α = 2 | α = 10 | α = 100 | |||
---|---|---|---|---|---|---|

Roots found | Perc % | Roots found | Perc % | Roots found | Perc % | |

0.10 | 11 | 9.09% | 110 | 11.82% | 204 | 62.25% |

0.20 | 11 | 9.09% | 095 | 13.68% | 195 | 65.13% |

0.30 | 13 | 7.69% | 136 | 09.56% | 250 | 50.80% |

0.40 | 16 | 6.25% | 147 | 08.84% | 215 | 59.07% |

0.50 | 23 | 4.35% | 136 | 09.56% | 216 | 58.80% |

0.60 | 18 | 5.56% | 125 | 10.40% | 218 | 58.26% |

0.70 | 21 | 4.76% | 127 | 10.24% | 220 | 57.73% |

0.80 | 22 | 4.55% | 149 | 08.72% | 236 | 53.81% |

0.90 | 21 | 4.76% | 120 | 10.83% | 219 | 57.99% |

1.00 | 28 | 3.57% | 142 | 09.15% | 240 | 52.92% |

1.10 | 26 | 3.85% | 147 | 08.84% | 225 | 56.44% |

1.20 | 30 | 3.33% | 150 | 08.67% | 234 | 54.27% |

1.30 | 41 | 2.44% | 179 | 07.26% | 261 | 48.66% |

1.40 | 54 | 1.85% | 267 | 04.87% | 311 | 40.84% |

1.50 | 47 | 2.13% | 238 | 05.46% | 296 | 42.91% |

1.60 | 50 | 2.00% | 251 | 05.18% | 329 | 38.60% |

1.70 | 49 | 2.04% | 229 | 05.68% | 296 | 42.91% |

1.80 | 54 | 1.85% | 260 | 05.00% | 295 | 43.05% |

1.90 | 56 | 1.79% | 220 | 05.91% | 289 | 43.94% |

1.95 | 61 | 1.64% | 231 | 05.63% | 301 | 42.19% |

Root No. | x_{1} | x_{2} |
---|---|---|

ROOT 01 | −9.268257891086250 | −8.931401586546140 |

ROOT 02 | −8.744542160840460 | −7.164787219352630 |

ROOT 03 | −5.602949507250660 | −4.023194565762840 |

ROOT 04 | −2.461356853660870 | −0.881601912173048 |

ROOT 05 | −2.985072583906650 | −2.648216279366540 |

ROOT 06 | 9.581298030452510 | 9.918154334992620 |

ROOT 07 | 3.298112723272930 | 3.634969027813040 |

ROOT 08 | 0.156520069683130 | 0.493376374223237 |

ROOT 09 | 0.680235799928919 | 2.259990741416740 |

ROOT 10 | −6.126665237496440 | −5.789808932956330 |

ROOT 11 | 3.821828453518710 | 5.401583395006530 |

ROOT 12 | 6.963421107108500 | 8.543176048596330 |

ROOT 13 | 6.439705376862720 | 6.776561681402830 |

was much smaller than in the ALR case, with the μ parameter taking values from μ = 0.01 to μ = 0.2, with a variation step of Δμ = 0.1. The tolerance value was equal to tol = 12 and the system was studied for the parameters α = 2, 10, 100. As a typical example, let us present the results associated with the value μ = 0.11. In this case, the network identified the unique root for α = 2 (as well as 4 additional roots outside the search region) after 17 iterations, and 127 roots for α = 100 (as well as 24 additional roots outside the search interval) for the same iteration number. The simulation showed that the performance of the network decreases as the FLR value varies from 0.15 to 0.2 (with a variation step of 0.01). More specifically, in this case, the neural network is not able to identify the unique root in the interval [−2, 2] and, furthermore, it finds only half of the roots found previously (namely, 6 of 13 roots in the interval [−10, 10] and 64 of 127 roots in the interval [−100, 100]). A construction of a diagram similar to the one shown in

Example 3: This nonlinear algebraic system can be found in [

and it has a unique root with a value of

It is interesting to note that the solution method proposed for this system in the literature is also a neural-based approach and, therefore, it is more suitable for comparison with the proposed method. The simulation results obtained from the proposed neural method, as well as from the method of Meng & Zeng, are shown in

Example 4: Consider the system of three equations with three unknowns

This system has only one root. The solution of this system using the Population Migration Algorithm (PMA) [

Nonlinear System Neural Solver | ||||||
---|---|---|---|---|---|---|

x_{1}(0) | x_{2}(0) | μ | Iter | Error | x_{1} | x_{2}

−1 | −1 | 1.1 | 12 | 0.000000001239 | −0.4511229388 | 0.4451777054 |

−1 | +1 | 1.1 | 11 | 0.000000001663 | −0.4511229388 | 0.4451777054 |

+1 | −1 | 1.1 | 12 | 0.000000001277 | −0.4511229388 | 0.4451777054 |

+1 | +1 | 1.1 | 11 | 0.000000001356 | −0.4511229388 | 0.4451777054 |

Method of Meng and Zeng | ||||||

x_{1}(0) | x_{2}(0) | μ | Iter | Error | x_{1} | x_{2}

−1 | −1 | 1.1 | 17 | 0 | −0.4511229388 | 0.4451777054 |

−1 | +1 | 1.1 | 17 | 4.44 × 10^{−16} | −0.4511229388 | 0.4451777054 |

+1 | −1 | 1.1 | 18 | 2.22 × 10^{−16} | −0.4511229388 | 0.4451777054 |

+1 | +1 | 1.1 | 17 | 2.22 × 10^{−16} | −0.4511229388 | 0.4451777054 |

evaluate the performance of the proposed neural solver, the system is solved for the search region −2 ≤ x_{1}, x_{2}, x_{3} ≤ 2 with a variation step of Δx_{1} = Δx_{2} = Δx_{3} = 0.2, thus leading to the examination of 9261 initial condition combinations. For each one of these combinations, the root identification was performed using the same ALRP values (namely from μ = 0.1 to μ = 1.95). The results for the best simulation runs can be found in

A typical graph that shows the variation of the average iteration number with respect to the value of the ALRP parameter is shown in

Example 5: The last example is associated with a sufficiently large nonlinear algebraic system of six equations with six unknowns, defined as

Method | μ | Iteration Number | x_{1} | x_{2} | x_{3} | Success Rate |
---|---|---|---|---|---|---|

PMA | N/A | 44 | −0.7677605533 | 0.0753306710 | −1.1621588410 | 88% |

QPMA | N/A | 50 | −0.7677605635 | 0.0753306762 | −1.1621511842 | 100% |

ALR | 0.7 | 34 | −0.7677605624 | 0.0753306764 | −1.1621511859 | 100% |

ALR | 1.1 | 12 | −0.7677605624 | 0.0753306764 | −1.1621511859 | 98% |

This system also appears in the work of Liu et al. [, with the root x_{1} = x_{3} = x_{5} = −1 and x_{2} = x_{4} = x_{6} = 1. The experimental results for this system are given in

Method | μ | Iteration Number | Solution vector | Success Rate | ||
---|---|---|---|---|---|---|

x_{1} | x_{2} | x_{3} | ||||

PMA | N/A | 43 | −1.0000012311 | 0.9999695481 | −1.0000000121 | 86% |

x_{4} | x_{5} | x_{6} | ||||

0.9999998799 | −1.0000003622 | 0.9999985874 | ||||

x_{1} | x_{2} | x_{3} | ||||

QPMA | N/A | 50 | −1.0000000000 | 1.0000000000 | −1.0000000000 | 100% |

x_{4} | x_{5} | x_{6} | ||||

1.0000000000 | −1.0000000000 | 1.0000000000 | ||||

x_{1} | x_{2} | x_{3} | ||||

ALR | 1.5 | 63 | −1.0000000000 | 1.0000000000 | −1.0000000000 | 71% |

x_{4} | x_{5} | x_{6} | ||||

1.0000000000 | −1.0000000000 | 1.0000000000 |

The variation of the success rate with respect to the ALRP parameter μ for the last two example systems is plotted in

A very crucial task in evaluating this type of arithmetic methods is to examine their scaling capabilities and, more specifically, to measure the iteration number and the execution time as the dimension of the system increases. In order to compare the results of the proposed method with those obtained by other well known methods, the following example systems were implemented and solved using the proposed algorithm:

・ Example 6: This system is Problem 1 in [

with initial conditions

・ Example 7: This system is Problem 3 in [

・ Example 8: This system is Problem 4 in [

・ Example 9: This system is Problem 2 in [

with initial conditions

・ Example 10: This system is Problem 1 in [

with initial conditions

・ Example 11: This system is Problem 2 in [

with initial conditions

・ Example 12: This system is Problem 3 in [

with initial conditions

・ Example 13: This system is Problem 5 in [

with initial conditions

The stopping criterion in these simulations is defined as

where tol is the tolerance value. To achieve the best results and an accuracy of 6 decimal points in the final solution, different values for tol and ALRP were used in each example, and more specifically,

・ the values tol = 12 and ALRP = 1.9 for Example 6

・ the values tol = 12 and ALRP = 0.8 for Example 7

・ the values tol = 12 and ALRP = 1.0 for Example 8

・ the values tol = 12 and ALRP = 0.6 for Example 9

・ the values tol = 18 and ALRP = 1.1 for Example 10

・ the values tol = 36 and ALRP = 1.9 for Example 11

・ the values tol = 10 and ALRP = 0.8 for Example 12

・ the values tol = 12 and ALRP = 1.0 for Example 13

The simulation results for the above examples are summarized in

The simulation runs were performed on an Intel Core i5 CPU machine at 2.66 GHz with 4 GB of memory, the applications were written in MATLAB, and the main conclusions drawn from them are the following:

・ In most of the examples (except Examples 8a and 9), the iteration number is almost the same, regardless of the dimension of the system.

・ In most of the examples, the proposed method needs fewer iterations than the Newton and Broyden methods, even for the large systems, and achieves better CPU time for convergence to a good solution with 6 decimal digits of accuracy, except for the case n ≥ 200, where the Broyden-2 method has better CPU time.

・ In Example 8a the Newton method converges only for small dimensions, and for values n ≥ 5 the only method that converges is the proposed one.

n | Newton | Fixed Newton | Broyden-1 | Broyden-2 | GBALR | ALRP | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|

ITER | CPU | ITER | CPU | ITER | CPU | ITER | CPU | ITER | CPU | |||

E X 6 | 5 | 17 | 0.0051 | − | − | 25 | 0.0012 | 25 | 0.0009 | 4 | 0.0003 | 1.9 |

10 | 17 | 0.0060 | − | − | 25 | 0.0022 | 25 | 0.0015 | 4 | 0.0005 | 1.9 | |

20 | 17 | 0.0052 | − | − | 25 | 0.0029 | 25 | 0.0017 | 4 | 0.0007 | 1.9 | |

50 | 17 | 0.0208 | − | − | 26 | 0.0066 | 26 | 0.0032 | 4 | 0.0018 | 1.9 | |

100 | 18 | 0.0385 | − | − | 26 | 0.0246 | 26 | 0.0174 | 4 | 0.0062 | 1.9 | |

200 | 18 | 0.1960 | − | − | 27 | 0.0897 | 27 | 0.0399 | 4 | 0.0343 | 1.9 | |

500 | 18 | 39.703 | − | − | 27 | 0.9723 | 27 | 0.2133 | 4 | 0.8369 | 1.9 | |

1000 | 19 | 389.304 | − | − | 27 | 92.776 | 25 | 11.240 | 4 | 87.197 | 1.9 | |

E X 7 | 5 | − | − | − | − | − | − | 16 | 0.0040 | − | − | − |

10 | − | − | − | − | − | − | − | − | − | − | − | |

20 | − | − | − | − | − | − | − | − | 49 | 0.0062 | 0.8 | |

50 | − | − | − | − | − | − | − | − | 49 | 0.0210 | 0.8 | |

100 | − | − | − | − | − | − | − | − | 49 | 0.0749 | 0.8 | |

200 | − | − | − | − | − | − | − | − | 49 | 0.4114 | 0.8 | |

500 | − | − | − | − | − | − | − | − | 49 | 89.818 | 0.8 | |

1000 | − | − | − | − | − | − | − | − | 49 | 849.829 | 0.8 |

E X 8a | 2 | 21 | 0.074 | 354 | 0.0078 | − | − | − | − | 8 | 0.0004 | 1.0 |
---|---|---|---|---|---|---|---|---|---|---|---|---|

3 | 50 | 0.0059 | − | − | − | − | − | − | 40 | 0.0013 | 1.0 | |

4 | 361 | 0.0243 | − | − | − | − | − | − | 23 | 0.0009 | 1.0 | |

5 | 21 | 0.0081 | − | − | − | − | − | − | 7 | 0.0004 | 1.0 | |

10 | − | − | − | − | − | − | − | − | 20 | 0.0014 | 1.0 | |

20 | − | − | − | − | − | − | − | − | 64 | 0.0096 | 1.0 | |

50 | − | − | − | − | − | − | − | − | − | − | − | |

100 | − | − | − | − | − | − | − | − | 142 | 0.3298 | − | |

200 | − | − | − | − | − | − | − | − | 248 | 26.998 | 1.0 | |

500 | − | − | − | − | − | − | − | − | − | − | − | |

1000 | − | − | − | − | − | − | − | − | − | − | − | |

E X 8b | 5 | 3 | 0.0022 | 12 | 0.0005 | 16 | 0.0009 | − | − | 3 | 0.0003 | 1.0 |

10 | 3 | 0.0024 | 11 | 0.0006 | 15 | 0.0012 | − | − | 3 | 0.0004 | 1.0 | |

20 | 3 | 0.0069 | 10 | 0.0023 | 15 | 0.0021 | 18 | 0.0019 | 3 | 0.0007 | 1.0 | |

50 | 3 | 0.0104 | 10 | 0.0078 | 15 | 0.0090 | − | − | 3 | 0.0021 | 1.0 | |

100 | 3 | 0.0271 | 9 | 0.0112 | 15 | 0.0268 | − | − | 3 | 0.0080 | 1.0 | |

200 | 3 | 0.0389 | 9 | 0.0588 | 15 | 0.0989 | − | − | 3 | 0.0369 | 1.0 | |

500 | 3 | 0.6665 | 8 | 0.5913 | 14 | 0.7548 | − | − | 3 | 0.6161 | 1.0 | |

1000 | 3 | 63.795 | 8 | 49.367 | 14 | 65.362 | − | − | 3 | 64.434 | 1.0 | |

E X 9 | 5 | − | − | − | − | − | − | − | − | 94 | 0.0031 | 0.6 |

10 | − | − | − | − | − | − | − | − | 104 | 0.0062 | 0.6 | |

20 | − | − | − | − | − | − | − | − | 110 | 0.0124 | 0.6 | |

50 | − | − | − | − | − | − | − | − | 114 | 0.0453 | 0.6 | |

100 | − | − | − | − | − | − | − | − | 118 | 0.1734 | 0.6 | |

200 | − | − | − | − | − | − | − | − | 120 | 0.9706 | 0.6 | |

500 | − | − | − | − | − | − | − | − | 124 | 227.027 | 0.6 | |

1000 | − | − | − | − | − | − | − | − | 127 | 2622.660 | 0.6 | |

E X 10 | 5 | − | − | − | − | − | − | − | − | 23 | 0.0013 | 1.1 |

10 | − | − | − | − | − | − | − | − | 24 | 0.0017 | 1.1 | |

20 | − | − | − | − | − | − | − | − | 28 | 0.0040 | 1.1 | |

50 | − | − | − | − | − | − | − | − | 28 | 0.0124 | 1.1 | |

100 | − | − | − | − | − | − | − | − | 28 | 0.0463 | 1.1 | |

200 | − | − | − | − | − | − | − | − | 28 | 0.2315 | 1.1 | |

500 | − | − | − | − | − | − | − | − | 28 | 49.068 | 1.1 | |

1000 | − | − | − | − | − | − | − | − | 28 | 463.956 | 1.1 |

E X 11 | 5 | 33 | 0.0037 | − | − | 48 | 0.0021 | 48 | 0.0015 | 14 | 0.0006 | 1.9 |
---|---|---|---|---|---|---|---|---|---|---|---|---|

10 | 34 | 0.0102 | − | − | 49 | 0.0036 | 49 | 0.0019 | 14 | 0.0009 | 1.9 | |

20 | 34 | 0.0160 | − | − | 49 | 0.0052 | 49 | 0.0026 | 14 | 0.0017 | 1.9 | |

50 | 34 | 0.0223 | − | − | 50 | 0.0125 | 50 | 0.0061 | 14 | 0.0059 | 1.9 | |

100 | 35 | 0.0780 | − | − | 50 | 0.0512 | 50 | 0.0312 | 14 | 0.0204 | 1.9 | |

200 | 35 | 0.3631 | − | − | 50 | 0.1601 | 50 | 0.0594 | 14 | 0.1083 | 1.9 | |

500 | 35 | 73.897 | − | − | 51 | 12.917 | 51 | 0.2826 | 15 | 26.230 | 1.9 | |

1000 | 36 | 705.095 | − | − | 51 | 85.712 | 51 | 29.776 | 15 | 249.442 | 1.9 | |

E X 12 | 5 | 4 | 0.0020 | 17 | 0.0006 | 6 | 0.0005 | 6 | 0.0004 | 1 | 0.0002 | 0.8 |

10 | 4 | 0.0020 | 18 | 0.0007 | 6 | 0.0005 | 6 | 0.0004 | 1 | 0.0001 | 0.8 | |

20 | 4 | 0.0037 | 18 | 0.0011 | 6 | 0.0007 | 6 | 0.0004 | 1 | 0.0002 | 0.8 | |

50 | 4 | 0.0101 | 19 | 0.0076 | 6 | 0.0018 | 6 | 0.0009 | 1 | 0.0005 | 0.8 | |

100 | 4 | 0.0185 | 20 | 0.0126 | 6 | 0.0064 | 6 | 0.0023 | 1 | 0.0019 | 0.8 | |

200 | 4 | 0.0622 | 02 | 0.0720 | 6 | 0.0258 | 6 | 0.0175 | 1 | 0.0089 | 0.8 | |

500 | 4 | 0.8655 | 21 | 0.9181 | 6 | 0.3926 | 6 | 0.0379 | 1 | 0.1891 | 0.8 | |

1000 | 4 | 76.968 | 20 | 64.971 | 6 | 26.772 | 6 | 0.1683 | 1 | 17.110 | 0.8 | |

E X 13a | 5 | 6 | 0.0037 | − | − | 7 | 0.0011 | 7 | 0.0010 | 6 | 0.0006 | 1.0 |

10 | 6 | 0.0039 | − | − | 8 | 0.0015 | 8 | 0.0011 | 6 | 0.0006 | 1.0 | |

20 | 6 | 0.0045 | − | − | 8 | 0.0022 | 8 | 0.0014 | 6 | 0.0010 | 1.0 | |

50 | 6 | 0.0091 | − | − | 8 | 0.0034 | 8 | 0.0016 | 6 | 0.0027 | 1.0 | |

100 | 6 | 0.0270 | − | − | 8 | 0.0086 | 8 | 0.0057 | 6 | 0.0102 | 1.0 | |

200 | 6 | 0.0668 | − | − | 8 | 0.0319 | 8 | 0.0084 | 6 | 0.0486 | 1.0 | |

500 | 6 | 13.930 | − | − | 8 | 0.3883 | 8 | 0.0519 | 6 | 11.701 | 1.0 | |

1000 | 6 | 113.84 | − | − | 8 | 28.732 | 8 | 19.110 | 6 | 89.599 | 1.0 | |

E X 13b | 5 | 5 | 0.0060 | 31 | 0.0010 | 11 | 0.0008 | 11 | 0.0006 | 5 | 0.0004 | 0.8 |

10 | 5 | 0.0061 | 32 | 0.0029 | 11 | 0.0010 | 13 | 0.0008 | 5 | 0.0005 | 0.8 | |

20 | 5 | 0.0028 | 33 | 0.0021 | 13 | 0.0014 | 13 | 0.0009 | 5 | 0.0008 | 0.8 | |

50 | 5 | 0.0086 | 34 | 0.0063 | 14 | 0.0041 | 13 | 0.0018 | 5 | 0.0022 | 0.8 | |

100 | 5 | 0.0270 | 34 | 0.0206 | 13 | 0.0127 | 13 | 0.0106 | 5 | 0.0075 | 0.8 | |

200 | 5 | 0.0632 | 35 | 0.1224 | 16 | 0.0537 | 13 | 0.0438 | 5 | 0.0431 | 0.8 | |

500 | 5 | 10.899 | 36 | 14.769 | 16 | 0.5260 | 13 | 0.0828 | 5 | 0.8842 | 0.8 | |

1000 | 5 | 93.228 | 36 | 99.048 | 19 | 41.113 | 13 | 0.3420 | 5 | 82.958 | 0.8 |

・ In Examples 7, 9 and 10 the proposed method (GBALR) is the only one that converges.

・ The fixed Newton method converges only in Examples 8b, 12 and 13b.

・ In Example 12 the proposed method needs only one iteration to converge.

・ Even though the Broyden-2 method leads generally to better CPU times for large system dimensions (e.g. n = 200), in a lot of cases it is unable to converge (e.g. in Examples 7, 8, 9, 10).

・ In Example 8a the proposed method exhibits an irregular behavior regarding the variation of the iteration number, while in Example 9 this variation is more regular. In all the other cases, the number of iterations is almost constant, with very small variations.

・ Except for Examples 8b and 13, the required number of iterations for convergence is smaller than the one associated with the other methods.

The objective of this research was the design and performance evaluation of a neural network architecture capable of solving a complete nonlinear algebraic system of n equations with n unknowns. The developed theory shows that the network must be used with an adaptive learning rate parameter μ < 2. According to the experiments, for small values of the system dimension n, good values of μ can be found in the region [0.7, 1.7] with a best value at μ = 1.1, whereas for large n, good values of μ can be found in the region [1.0, 1.9], with a best value at μ = 1.5. The network was able to identify all the available roots in the search region with exceptional accuracy. The proposed method was tested with large sparse systems with dimensions from n = 5 to n = 1000, and in most cases the number of iterations did not depend on the system dimension. The results showed that the GBALR method was better than the classical methods with respect to both the iteration number required for convergence to a solution and the CPU execution time, for system dimensions n ≤ 100. For larger system dimensions, the GBALR method is better than the Broyden-2 method with respect to the number of iterations, but requires more CPU execution time; however, the Broyden-2 method was unable to converge in many of the examples for the initial conditions found in the literature. Challenges for future research include the use of the network with other activation functions in the output layer, such as the hyperbolic tangent function, as well as the extension of the network to handle situations such as multiple roots (real and complex) and the case of overdetermined and underdetermined systems.

The research of K. Goulianas, A. Margaris, I. Refanidis, K. Diamantaras and T. Papadimitriou, has been co-financed by the European Union (European Social Fund-ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF)― Research Funding Program: THALES, Investing in knowledge society through the European Social Fund.

Goulianas, K., Margaris, A., Refanidis, I., Diamantaras, K. and Papadimitriou, T. (2016) A Back Propagation-Type Neural Network Architecture for Solving the Complete n × n Nonlinear Algebraic System of Equations. Advances in Pure Mathematics, 6, 455-480. doi: 10.4236/apm.2016.66033