Synthesizing multi-layer perceptron network with ant lion, biogeography-based dragonfly algorithm evolutionary strategy invasive weed and league champion optimization hybrid algorithms in predicting heating load in residential buildings

Accurate approximation of the heating load (HL) is the primary motivation of this research, which seeks to identify the most efficient predictive model among several neural-metaheuristic models. The proposed models are built by synthesizing a multi-layer perceptron (MLP) network with ant lion optimization (ALO), biogeography-based optimization (BBO), dragonfly algorithm (DA), evolutionary strategy (ES), invasive weed optimization (IWO), and league champion optimization (LCA) hybrid algorithms. Each ensemble is optimized in terms of the operating population. Accordingly, the ALO-MLP, BBO-MLP, DA-MLP, ES-MLP, IWO-MLP, and LCA-MLP presented their best performance for population sizes of 350, 400, 200, 500, 50, and 300, respectively. The comparison was carried out by implementing a ranking system. Based on the obtained overall scores (OSs), the BBO (OS = 36) emerged as the most capable optimization technique, followed by ALO (OS = 27) and ES (OS = 20). Due to the efficient performance of these algorithms, the corresponding MLPs can be promising substitutes for traditional methods used for HL analysis.

To date, diverse soft computing techniques (e.g., support vector machine (SVM) and artificial neural network (ANN)) have been effectively used for energy consumption modeling [87][88][89][90][91]. Roy et al. [92] proposed multivariate adaptive regression splines (MARS) coupled with an extreme learning machine (ELM) for predicting the HL and cooling load (CL). They used the first model to perform an importance analysis of the parameters that feed the second model. Likewise, Sholahudin and Han [93] used an ANN along with the Taguchi method to investigate the effect of the input factors on the HL. The feasibility of a random forest predictive method was investigated by Tsanas and Xifara [94] and Gao et al. [95] for both the HL and CL. The latter reference is a comprehensive comparative study of the simulation capability of sixteen machine learning models (e.g., elastic net and radial basis function regression). This study also confirmed the high efficiency of the alternating model tree and rules decision table models. Chou and Bui [91] proposed the combination of ANN and SVM as a proper model for new designs of energy-conserving buildings. The applicability of the neuro-fuzzy approach (ANFIS) for predicting the HL and CL was explored by Nilashi et al. [96]. They used expectation-maximization and principal component analysis along with the ANFIS for clustering and noise removal, respectively. Referring to the obtained values of mean absolute error (MAE) (0.16 and 0.52 for the HL and CL predictions, respectively), they concluded that the proposed model is accurate enough for this aim.
In addition, studies in different fields have shown that utilizing metaheuristic algorithms is an effective idea for improving the accuracy of typical predictors [97,98]. For energy-efficient buildings, Moayedi et al. [99] improved the ANN for forecasting the CL by benefiting from the foraging/social behavior of ants, Harris hawks, and elephants (i.e., the EHO algorithm). The results (e.g., correlation values over 85%) show that the applied algorithms can satisfactorily handle the optimization task. An EHO-based CL predictive formula was also presented. Amar and Zeraibi [100] used the firefly algorithm to optimize the SVM parameters for HL modeling in district heating systems. Their model outperformed genetic programming and ANN. Moayedi et al. [99] employed the grasshopper optimization algorithm (GOA) and grey wolf optimization (GWO) for enhancing the HL prediction of the ANN. A significant decrease in the MAE calculated for the ANN (from 2.0830 to 1.7373 and 1.6514 by incorporation of the GOA and GWO, respectively) means that the algorithms can build a more reliable ANN compared to the typical back-propagation one. In addition, other studies such as [26] outlined the competency of such algorithms in the same fields. As a visible knowledge gap, despite the variety of studies that have mainly focused on broadly used metaheuristic techniques [101], there are still some algorithms that need to be evaluated. Therefore, assessing the performance of six novel optimization techniques, namely ant lion optimization (ALO), biogeography-based optimization (BBO), many-objective sizing optimization [102][103][104], data-driven robust optimization [35,105], the dragonfly algorithm (DA), evolutionary strategy (ES), invasive weed optimization (IWO), and league champion optimization (LCA), is the central aim of the present paper.

Data Provision and Analysis
Providing a reliable dataset is an essential step in intelligent model implementation. These data are used in two stages. First, the larger share is analyzed by the models to infer the relationship between the target factor and the independent variables. The remainder is then used to represent unseen conditions of the problem and to assess the performance of the model on unfamiliar data.
In this article, the dataset was downloaded from a freely available data repository (http://archive.ics.uci.edu/mL/datasets/Energy+efficiency, accessed on 20 December 2020) based on a study by Tsanas and Xifara [94]. They analyzed 768 residential buildings with different geometries using Ecotect software [106] to obtain the HL and CL as the outputs. They recorded eight independent factors, namely relative compactness (RC), overall height (OH), surface area (SA), orientation, wall area (WA), glazing area (GA), roof area (RA), and glazing area distribution (GAD). Figure 1 shows the distribution of these factors versus the HL, which we aim to predict in this study. Following many previous studies [97], a random division process was carried out to assign 538 samples (i.e., 70% of the whole) and 230 samples (i.e., 30% of the whole) to the training and testing sets, respectively.
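The random 70/30 division described above can be sketched as follows. This is only an illustration on row indices: the seed and the variable names are assumptions, and the actual random assignment used in the study is not known.

```python
import numpy as np

N_SAMPLES = 768        # number of buildings in the dataset, as stated in the text
TRAIN_FRACTION = 0.7   # 70/30 division, as stated in the text

rng = np.random.default_rng(seed=42)     # the seed is an arbitrary assumption
shuffled = rng.permutation(N_SAMPLES)    # random ordering of the row indices

n_train = round(TRAIN_FRACTION * N_SAMPLES)   # 537.6 rounds to 538
train_idx = shuffled[:n_train]                # rows used for training
test_idx = shuffled[n_train:]                 # rows held out for testing

print(len(train_idx), len(test_idx))
```

Rounding 0.7 × 768 = 537.6 to the nearest integer reproduces the 538/230 partition reported in the text.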

Methodology
The overall methodology used in this study is shown in Figure 2.

Artificial Neural Network
ANNs are popular data mining techniques based on the biological mechanism of the neural network [107]. ANNs are able to deal with highly complicated engineering simulations because of their non-linear analysis capability [108,109]. This approach comes in different forms, including the multi-layer perceptron (MLP) [110], radial basis function [111], and general regression [112] networks. In this study, an MLP network was selected as the basic method. Figure 3 depicts the general structure of an MLP predicting M output variables from L input factors. It is important to note that in an MLP, more than one hidden layer can be sandwiched between the input and output layers. However, theoretical studies have demonstrated that an MLP with a single hidden layer can be efficient for any problem.
ANNs normally benefit from the Levenberg-Marquardt (LM) training scheme, an approximation to Newton's method [113] (Equation (1)). The LM is known to be quicker and more powerful than the conventional gradient descent technique [114,115].
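$$\Delta x = -\left[\nabla^2 V(x)\right]^{-1} \nabla V(x) \tag{1}$$
where ∇V(x) and ∇²V(x) are the gradient and the Hessian matrix, respectively. The following equation expresses V(x) as a sum of squares function of the N error terms e_i(x):
$$V(x) = \sum_{i=1}^{N} e_i^2(x) \tag{2}$$
Next, let J(x) be the Jacobian matrix of the error vector e(x); then it can be written:
$$\nabla V(x) = 2J^T(x)e(x), \qquad \nabla^2 V(x) = 2J^T(x)J(x) + 2S(x), \qquad S(x) = \sum_{i=1}^{N} e_i(x)\nabla^2 e_i(x) \tag{3}$$
Equation (1) can be written as follows when S(x) ≈ 0:
$$\Delta x = -\left[J^T(x)J(x)\right]^{-1} J^T(x)e(x) \tag{4}$$
Lastly, Equation (5) presents the central equation of the LM, based on the Gauss-Newton method, where I is the identity matrix.
$$\Delta x = -\left[J^T(x)J(x) + \mu I\right]^{-1} J^T(x)e(x) \tag{5}$$
Remarkably, high and low values of µ turn this algorithm into steepest descent (with a step proportional to 1/µ) and Gauss-Newton, respectively.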
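As a minimal numerical sketch of the LM update Δx = −[JᵀJ + µI]⁻¹Jᵀe, the snippet below applies one step to a small linear least-squares problem. The function name `lm_step`, the toy matrix `A`, and the vector `b` are hypothetical and serve only to show the two limiting behaviors of µ; this is not the training code used in the study.

```python
import numpy as np

def lm_step(jacobian, residuals, mu):
    """One Levenberg-Marquardt update: -(J^T J + mu I)^(-1) J^T e."""
    jtj = jacobian.T @ jacobian
    jte = jacobian.T @ residuals
    return -np.linalg.solve(jtj + mu * np.eye(jtj.shape[0]), jte)

# Hypothetical toy problem with residuals e(x) = A x - b, so the Jacobian is A.
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
x0 = np.zeros(2)
e0 = A @ x0 - b

gauss_newton_like = lm_step(A, e0, mu=1e-12)   # tiny mu: Gauss-Newton behavior
steepest_like = lm_step(A, e0, mu=1e6)         # huge mu: short step along -J^T e

print(x0 + gauss_newton_like)   # lands on the least-squares solution in one step
print(steepest_like)            # approximately -(1/mu) * J^T e
```

Because the toy residuals are linear in x, the small-µ step reaches the least-squares solution immediately, whereas the large-µ step is a short move along the negative gradient direction.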

Swarm-Based Metaheuristic Ideas
Optimization algorithms, which have recently become very popular for enhancing the performance of predictive models (e.g., ANNs), are based on the swarm behavior of a group of cooperating individuals. They are mostly inspired by nature and seek a globally optimal solution for a defined problem by analyzing the relationship between the existing parameters. Coupled with an ANN, these optimizers seek to adjust the biases and weights. This process is better explained in the next section. Here, the overall idea of the intended algorithms is briefly described.
Ant lion optimization (Mirjalili [116]) is a recently developed algorithm that mimics the hunting behavior of ant lions. It comprises different stages in which the prey (usually an ant), moving by a random walk, gets trapped and hunted in a pit. The capability of the individuals is evaluated by a "roulette wheel selection" function. Biogeography-based optimization is based on two items: (a) the information concerning biogeography and (b) the way different species are distributed. This algorithm was designed by Simon [117] and was used by Mirjalili et al. [118] to train an MLP network. In the BBO, there are migration and mutation steps, and the population is made up of "habitats". Note that these habitats are evaluated by two indices called the habitat suitability index and the suitability index variable. The dragonfly algorithm is another population-based optimization technique proposed by Mirjalili [119]. Based on Reynolds' swarm intelligence, the DA draws on three stages, namely separation, alignment, and cohesion. The name evolutionary strategy refers to a stochastic search approach proposed by Schwefel [120]. In the ES, two operators, selection and mutation, act during the evolution and adaptation stages. The population is produced with offspring variables, and the fitness of the offspring is compared to that of the parents. Inspired by the colonizing behavior of weeds, invasive weed optimization was presented by Mehrabian and Lucas [121]. The optimal solution of this algorithm is the most suitable site for the plants to grow and reproduce. The algorithm begins with initialization and, after reproduction, runs the stages called spatial dispersal and competitive exclusion, stopping once the termination criteria are met. Last but not least, league champion optimization was suggested by Kashan [122], mimicking sporting competitions in leagues. The LCA tries to find the best-fitted solution to the problem by implementing an artificial league, including scheduling the matches and determining the winner/loser teams. More information about the mentioned algorithms (e.g., mathematical relationships) is detailed in previous studies (for the ALO [123,124], BBO [125], DA [126], ES [127], IWO [128], and LCA [129,130]).
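The "roulette wheel selection" mentioned for the ALO can be illustrated generically: an individual is drawn with probability proportional to its fitness. The sketch below assumes non-negative fitness values where larger is better; the function name and the sample values are hypothetical, and the exact selection rule in the cited ALO implementation may differ.

```python
import numpy as np

def roulette_wheel_select(fitness, rng):
    """Return one index, chosen with probability proportional to its fitness."""
    fitness = np.asarray(fitness, dtype=float)
    if np.any(fitness < 0) or fitness.sum() == 0:
        raise ValueError("fitness values must be non-negative with a positive sum")
    probabilities = fitness / fitness.sum()
    return int(rng.choice(len(fitness), p=probabilities))

rng = np.random.default_rng(0)       # arbitrary seed
fitness = [1.0, 3.0, 6.0]            # hypothetical fitness values
picks = [roulette_wheel_select(fitness, rng) for _ in range(10_000)]
share_of_best = picks.count(2) / len(picks)   # expected to be near 6/10
print(share_of_best)
```

For a minimization objective such as the MSE, the fitness would first have to be transformed (for example, inverted) so that better solutions receive larger selection weights.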

Hybridization Process and Sensitivity Analysis
In order to develop the proposed neural-metaheuristic ensembles, the algorithms should be hybridized with the ANN. To this end, utilizing the provided data, the general equation of an MLP neural network is passed to the ALO, BBO, DA, ES, IWO, and LCA as the problem function. Before that, however, the most suitable structure (i.e., the number of neurons) of the network must be determined. As explained previously, the number of neurons in the first and last layers is equal to the number of input and output variables, respectively. Hence, only the number of hidden neurons can be varied. Based on a trial-and-error process, it was set to five. Therefore, the network architecture was set to 8 × 5 × 1.
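One common way to hand an MLP to a metaheuristic as a "problem function" is to flatten all weights and biases into a single vector and return the training error for any candidate vector. The sketch below does this for the 8 × 5 × 1 architecture; the parameter ordering, the tanh hidden activation with a linear output neuron, and the random stand-in data are assumptions rather than details taken from the study.

```python
import numpy as np

N_IN, N_HID, N_OUT = 8, 5, 1                                  # 8 x 5 x 1 architecture
N_PARAMS = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT       # 40 + 5 + 5 + 1 = 51

def unpack(theta):
    """Split a flat parameter vector into weights and biases (assumed ordering)."""
    i = 0
    w1 = theta[i:i + N_IN * N_HID].reshape(N_HID, N_IN)
    i += N_IN * N_HID
    b1 = theta[i:i + N_HID]
    i += N_HID
    w2 = theta[i:i + N_HID * N_OUT].reshape(N_OUT, N_HID)
    i += N_HID * N_OUT
    b2 = theta[i:i + N_OUT]
    return w1, b1, w2, b2

def mlp_predict(theta, X):
    """Forward pass: tanh hidden layer, linear output neuron."""
    w1, b1, w2, b2 = unpack(theta)
    hidden = np.tanh(X @ w1.T + b1)
    return (hidden @ w2.T + b2).ravel()

def mse_objective(theta, X, y):
    """Value a metaheuristic would minimize for a candidate solution theta."""
    return float(np.mean((y - mlp_predict(theta, X)) ** 2))

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(10, N_IN))     # random stand-in for the eight inputs
y_demo = rng.normal(size=10)             # random stand-in for the HL values
candidate = rng.uniform(-1.0, 1.0, size=N_PARAMS)
print(N_PARAMS, mse_objective(candidate, X_demo, y_demo))
```

Under this encoding, each individual of a population is one 51-element vector, and the optimizer only ever sees the scalar returned by `mse_objective`.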
Each ensemble was run for 1000 iterations, with the mean square error (MSE) serving as the objective function that measures the performance error. For greater reliability of the results, a sensitivity analysis was carried out in this part. Eleven different population sizes, namely 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, and 500, were tested for each model, and the best-performing configuration was used to predict the HL in what follows. The convergence curves belonging to the elite networks of each model are presented in Figure 4. According to these charts, for all algorithms, the error is chiefly reduced within the first half of the iterations. The best population sizes were determined to be 350, 400, 200, 500, 50, and 300 for the ALO-MLP, BBO-MLP, DA-MLP, ES-MLP, IWO-MLP, and LCA-MLP, respectively.
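The population-size sensitivity analysis can be sketched as a loop over the eleven sizes, recording a convergence history for each run and keeping the size with the lowest final error. The snippet below uses a simplified elitist evolutionary strategy and a toy sphere objective as a stand-in for the MSE of the network; the optimizer, the objective, the 20 iterations (instead of 1000), and all parameter values are assumptions made to keep the example short.

```python
import numpy as np

POP_SIZES = [25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500]  # from the text

def sphere(theta):
    """Toy objective standing in for the network MSE (assumption)."""
    return float(np.sum(theta ** 2))

def simple_es(objective, dim, pop_size, iterations, rng, sigma=0.1):
    """Simplified elitist ES: mutate all parents, keep the best pop_size."""
    parents = rng.normal(size=(pop_size, dim))
    fitness = np.array([objective(p) for p in parents])
    history = [fitness.min()]
    for _ in range(iterations):
        offspring = parents + sigma * rng.normal(size=parents.shape)
        offspring_fitness = np.array([objective(o) for o in offspring])
        pool = np.vstack([parents, offspring])
        pool_fitness = np.concatenate([fitness, offspring_fitness])
        keep = np.argsort(pool_fitness)[:pop_size]   # lowest error survives
        parents, fitness = pool[keep], pool_fitness[keep]
        history.append(fitness.min())
    return history

rng = np.random.default_rng(0)   # arbitrary seed
histories = {n: simple_es(sphere, dim=5, pop_size=n, iterations=20, rng=rng)
             for n in POP_SIZES}
final_error = {n: h[-1] for n, h in histories.items()}
best_pop = min(final_error, key=final_error.get)
print(best_pop, final_error[best_pop])
```

Each list in `histories` plays the role of one convergence curve, and `best_pop` corresponds to the "elite" configuration of a given algorithm.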

Statistical Accuracy Assessment
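Three broadly used criteria are applied to measure the prediction accuracy of the implemented models by reporting the error and correlation of the results. For this purpose, the MAE (along with the RMSE) and the coefficient of determination (R²) are used. These criteria are applied to the data belonging to the training and testing groups to demonstrate the qualities of learning and prediction, respectively. Assuming G as the total number of samples, and J_i^observed and J_i^predicted as the real and forecasted HL values, Equations (6)-(8) formulate the RMSE, MAE, and R².
$$RMSE = \sqrt{\frac{1}{G}\sum_{i=1}^{G}\left(J_i^{observed} - J_i^{predicted}\right)^2} \tag{6}$$
$$MAE = \frac{1}{G}\sum_{i=1}^{G}\left|J_i^{observed} - J_i^{predicted}\right| \tag{7}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{G}\left(J_i^{predicted} - J_i^{observed}\right)^2}{\sum_{i=1}^{G}\left(J_i^{observed} - \bar{J}^{observed}\right)^2} \tag{8}$$
where \bar{J}^observed denotes the mean of the J_i^observed values.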
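For illustration, the three criteria can be computed as below, using the standard definitions of the RMSE, MAE, and R² (with R² taken as one minus the ratio of the residual to the total sum of squares, which is an assumption about the exact form used in the study). The function names and the three-sample example are hypothetical.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def mae(observed, predicted):
    """Mean absolute error."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(observed - predicted)))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    ss_res = np.sum((predicted - observed) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

observed_hl = [1.0, 2.0, 3.0]     # hypothetical values
predicted_hl = [1.0, 2.0, 4.0]    # hypothetical values
print(rmse(observed_hl, predicted_hl),
      mae(observed_hl, predicted_hl),
      r_squared(observed_hl, predicted_hl))
```

In this small example the single error of 1 gives an RMSE of about 0.577, an MAE of about 0.333, and an R² of 0.5.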

Training Results
The results of the elite structures of each model are evaluated in this section. Figure 5 shows the training results. In this regard, the error (= real HL − forecasted HL) is calculated and marked for all 538 samples. In this phase, the minimum and maximum of the absolute error values were 0.0136 and 6.4455, 0.0018 and 6.0681, 0.0019 and 9.2773, 0.0248 and 7.3006, 0.0184 and 6.3776, and 0.0715 and 8.4620, respectively, for the learning process of the ALO-MLP, BBO-MLP, DA-MLP, ES-MLP, IWO-MLP, and LCA-MLP ensembles.
Referring to the calculated RMSEs (2.6054, 2.5359, 3.4314, 2.7146, 3.2506, and 3.8297), all six models achieved a reliable performance in understanding the non-linear relationship between the HL and the eight influential factors. Another piece of evidence that supports this claim is the MAE index (2.0992, 2.0846, 2.9402, 2.0848, 2.8709, and 3.4091). Furthermore, the correlation between the expected and real HLs is higher than 92% in all models. In detail, the values of R² are 0.9539, 0.9596, 0.9222, 0.9357, 0.9547, and 0.9386.

Validation Results
The developed models are then applied to the second group of data to assess their generalization capability. Figure 6 depicts the correlation between the expected HLs and the networks' outputs. As is seen, all obtained R² values (0.9406, 0.9516, 0.9340, 0.9318, 0.9431, and 0.9400) reflect higher than 93% accuracy for all models. In this phase, the errors range between −5.5792 and 6.9349, −5.6311 and 6.3000, −9.3137 and 6.8288, −7.0282 and 7.0647, −6.2505 and 5.8823, and −8.2384 and 6.1992, respectively.

Score-Based Comparison and Time Efficiency
Table 1 summarizes the values of the RMSE, MAE, and R² obtained for the training and testing phases. In this section, the performance of the used predictors is compared to determine the most reliable one. For this purpose, a ranking system is developed that takes all three accuracy criteria into consideration. In this system, a score is calculated for each criterion based on the relative performance of each model. The summation of these scores gives an overall score (OS) used to rank the models. Table 2 gives the scores assigned to each model. According to the results, the highest OS (= 18) is obtained for the BBO-MLP in both the training and testing phases. The ALO- and ES-based ensembles emerged as the second and third most accurate ones, respectively. However, the IWO in the training phase and the LCA in the testing phase gained a similar rank to the ES. In addition, it can be seen that the results of the DA-MLP are less consistent than those of the other models.
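A plausible reading of this ranking system is that, for each criterion, the six models receive scores from 1 (worst) to 6 (best), and the three scores are summed. The sketch below applies that assumed rule to the training-phase RMSE, MAE, and R² values quoted in the text; the scoring function itself is an assumption, since the exact rule is given only in Table 2.

```python
import numpy as np

MODELS = ["ALO", "BBO", "DA", "ES", "IWO", "LCA"]

# Training-phase values quoted in the text, in the model order above.
rmse_train = np.array([2.6054, 2.5359, 3.4314, 2.7146, 3.2506, 3.8297])
mae_train = np.array([2.0992, 2.0846, 2.9402, 2.0848, 2.8709, 3.4091])
r2_train = np.array([0.9539, 0.9596, 0.9222, 0.9357, 0.9547, 0.9386])

def rank_scores(values, higher_is_better):
    """Score 1 for the worst model up to len(values) for the best (no ties assumed)."""
    worst_first = np.argsort(values if higher_is_better else -values)
    scores = np.empty(len(values), dtype=int)
    scores[worst_first] = np.arange(1, len(values) + 1)
    return scores

overall = (rank_scores(rmse_train, higher_is_better=False)
           + rank_scores(mae_train, higher_is_better=False)
           + rank_scores(r2_train, higher_is_better=True))
overall_scores = dict(zip(MODELS, overall.tolist()))
print(overall_scores)
```

Under this assumed rule, the training-phase scores give the BBO an OS of 18 and place the ES and IWO level with each other, which agrees with the statements in the text.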
Moreover, Figure 7 illustrates the time required for implementing the used models. The computation time is also measured for other well-known optimization techniques (including Harris hawks optimization (HHO) [131], GWO [132], whale optimization algorithm (WOA) [133], artificial bee colony (ABC) [134], ant colony optimization (ACO) [135], elephant herding optimization (EHO) [136], genetic algorithm (GA) [137], imperialist competitive algorithm (ICA) [138], particle swarm optimization (PSO) [139], and wind driven optimization (WDO) [140]) to be compared with the ALO, BBO, DA, ES, IWO, and LCA. This figure indicates that the metaheuristic algorithms used in this study present good time efficiency in comparison with the other models. Moreover, it was observed that the ABC, HHO, and DA take the greatest amount of time for almost all of the population sizes.

Presenting the HL Predictive Equation
In the previous section, it was concluded that the BBO constructs the most reliable neural network. This means that the biases and connecting weights optimized by this technique can analyze and predict the HL more accurately compared to the other metaheuristic algorithms. Therefore, the governing relationships in the BBO-MLP ensemble are extracted and presented as the best HL predictive formula (Equation (9)). As is seen, there are five parameters (Z1, Z2, …, Z5) in this equation, which need to be calculated by Equation (10). Basically, the responses of the neurons in the hidden layer are represented by Z1, Z2, …, Z5. Remarkably, the term Tansig is the network activation function, which is expressed by Equation (11).
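For orientation, the generic form that such a formula takes for an 8 × 5 × 1 network is sketched below. The symbols w_k, b, W_{kj}, and b_k are placeholders for the optimized output weights, output bias, hidden weights, and hidden biases; they are assumptions introduced here for illustration and do not reproduce the fitted coefficients of the BBO-MLP. The Tansig expression is the standard hyperbolic tangent sigmoid.

```latex
% Generic structure only; w_k, b, W_{kj}, b_k are placeholder symbols.
HL_{BBO} = \sum_{k=1}^{5} w_k Z_k + b
\qquad
Z_k = \mathrm{Tansig}\!\left(\sum_{j=1}^{8} W_{kj}\, x_j + b_k\right), \quad k = 1, \dots, 5
\qquad
\mathrm{Tansig}(x) = \frac{2}{1 + e^{-2x}} - 1
```

Here x_1, …, x_8 stand for the eight input factors (RC, OH, SA, orientation, WA, GA, RA, and GAD), and Tansig(x) is mathematically identical to tanh(x).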

Further Discussion and Future Works
Because the dataset used in this study is a prepared dataset dedicated to residential buildings, the applicability of the used methods is established for this type of building. However, there are many studies that have successfully employed machine learning tools for predicting the thermal loads of buildings with other usages, such as office, commercial, and industrial ones [141]. Hence, utilizing multi-usage datasets in future works can overcome this limitation.
Another idea may be evaluating the accuracy of the new generation of hybrid models, which can be divided into (a) the combination of the existing metaheuristic tools with other intelligent models, e.g., ANFIS and SVM, or (b) utilizing more recent optimizers for the existing ANN models. Both ideas may help to recognize more efficient predictive methods. Moreover, a practical use of the implemented models is also of interest. In order to evaluate the generalizability of the methods, they can be applied to information taken from real-world buildings, noting that the input parameters considered for predicting the HL should be the same as those used in this study; otherwise, it would be a new development.

Conclusions
The high competency of optimization techniques in various engineering fields motivated the authors to employ and compare the efficacy of six novel metaheuristic techniques, namely ant lion optimization, biogeography-based optimization, dragonfly algorithm, evolutionary strategy, invasive weed optimization, and league champion optimization, in hybridizing the neural network for accurate estimation of the heating load. The proper structure of all six hybrid models was determined by sensitivity analysis, and it was shown that the most appropriate population size can vary from one algorithm to another. The smallest and largest populations (50 and 500) were used by the IWO and ES, respectively. The high rate of accuracy observed for all models indicated that metaheuristic techniques could successfully establish a non-linear ANN-based relationship that predicts the HL from the building characteristics. Comparison based on the used accuracy indices revealed that the BBO, ALO, and ES (with around 94% correlation of the results) are able to construct more reliable ANNs in comparison with the IWO, LCA, and DA. In addition, the models enjoy good time efficiency relative to some other existing algorithms. However, the authors believe that, due to recent advances in metaheuristic science, further comparative studies may be required for outlining the most efficient predictive method.

Figure 2. The general path of the study.

Figure 7. The computation time needed for various hybrid methods.

Table 1. The results of accuracy assessment.

Table 2. The executed ranking system.