Skip to main content
  • Research note
  • Open access
  • Published:

Assessment of factors affecting tourism satisfaction using K-nearest neighborhood and random forest models



This study aimed to identify factors affecting the satisfaction of tourists traveling to the city of Hamadan as Asian urban tourism capital in 2018. The data a random sample of 300 tourists were collected using a designed questionnaire. We applied random-forest and K-nearest-neighborhood methods to analyze the data.


The variables of society behavior, municipal equipment and cost of services were the three top rank variables in predicting tourist satisfaction. Considering the capacity of the ancient city of Hamadan for tourism, policy-makers can use our results in planning for providing a sustainable development and flourishing tourism industry in this city.


Currently, tourism is one of the most dynamic economic activities with many socio-economic/environmental/cultural benefits. Nowadays, economists call it invisible exports. Tourism industry involves the presence of different economic sectors, so, the revenues obtained from services and goods provided to tourists are distributed among a larger number of people in the community. Thus, tourism is closer to social justice (one of the main pillars of sustainable development) than other sectors of the economy in terms of distributing benefits. On the other hand, travelling to different areas makes societies familiar with the culture and customs of other communities, and provides appropriate conditions for cultural and social interactions which in turn lead to promote peace, security, and promote tolerance among different cultures that are pillars of sustainable development [1,2,3]. The multidimensional nature of the tourism industry causes major changes in the host society in addition to catering for tourists [4]. Yet, due to the role played by a number of governmental institutions and private sector enterprises, the tourism industry has a very complicated entity [5]. Hence, authorities are attempting to provide the opportunities to take advantage of the positive aspects of this industry by preparing tourism attraction and making them precious in potential areas [6]. One of the pillars of tourism development is the demand for tourism which has a complex structure, because increasing/decreasing contribution of tourism revenues depends on different factors.

Satisfaction of tourists is one of the most important factors that guarantee future profit growth. Nowadays, many organizations have considered tourist satisfaction as an important criterion for measuring the quality of their work. The tourist’s satisfaction is achieved by designing appropriate processes such that services provided meet the expectations of the tourists [7]. Studying of tourism literatures shows that the satisfaction of tourists from a destination/place is an important factor in selecting a destination which means that if the tourists are satisfied with their journey to a destination, they are expected to return or to offer the destination to others. Tourist satisfaction has become a substantial subject for most service industries [8, 9].

Considering that the tourists travel to different areas to achieve mental relaxation, the shortage in quantity/quality of provided services reduces the amount of tourists and prevents the realization of sustainable tourism [10]. Given the importance and increasing contribution of this sector in the modern economy, planning to strengthen the infrastructure of tourism and improving the quality-of-services and facilities for tourists are necessary more than ever.

The city of Hamadan, as an ecotourism and historic destination, can provide a platform for development of sustainable tourism through proper planning and provision of infrastructure. This study aimed a) to identify factors affecting the satisfaction of tourists traveling the city of Hamadan in line with the appropriate decision making for the development of tourism as well as increasing the satisfaction of tourists; and b) to compare performance of two data mining techniques of random forest (RF) and K-nearest neighborhood (KNN) in predicting tourism satisfaction.

Main text

Materials and methods

Study area

The city of Hamadan is located between latitude 23°59′ and 25° 45′ north and longitude 47° 34′ and 49° 36′ east of the Greenwich Meridian. Hamadan is located in the western part of Iran on the slopes of Alvand mountain range and it has a mountainous weather and has many ancient, historical, natural, cultural, recreational and sportive items for tourist attractions.

Hamadan is the first Persian capital that has been mentioned by Herodotus (the famous Greek historian) [11]. The Ganj Nameh inscriptions (500 BC), the stone lion statue related to of the Medes (the first dynasty of Iran; 728 to 549 BC), the holy tomb of Esther and Mordechai (for the Jews) mentioned in the Torah of the Old Testament, the tomb of Avicenna [the well-known physician and philosopher (about 1000 years ago)], the tomb of Baba Tahir famous poet (about 1000 years ago), the Lalehjin’s Handicrafts city (which was registered as the World Pottery City in UNESCO in 2017), Ali-Sadr Cave, the unique largest water cave in the world, the beautiful Alvand mountain range with the highest peak with an altitude of 3574 meters above sea level and very beautiful valleys, suitable weather, ski resort as well as large sports and recreational complexes are among the tourist areas of Hamadan city [12]. These particular conditions have made Hamadan as a place with a potential for attracting tourists.

Statistical Analysis

This study used information of 300 tourists (see Additional file 1 for information about data collection) and utilized two widely used non-parametric methods of RF and KNN to capture any nonlinear relationship between inputs and outputs.

Random forest

RF as an ensemble learning method works based on creating several (say 1000) regression trees [13]. In each tree the predictor with smaller prediction error is selected to be at the top of the tree and it is split into two parts (for continuous variable a cutoff point is created using minimizing prediction error). This partitioning continues recursively to create a tree. Prediction is then done using averaging the response variable in each leaves of the final tree (in regression setting). To improve the prediction performance of the tree regression methodology, the random forest technique applies random selection in two ways. In this regard, each tree is formed by using a random sample that is selected from all original observations (bootstrapped sampling), and a random sample of candidate variables (inputs) for splitting [13, 14]. The issue of instability of these trees is handled by these randomness because it leads to introducing differences in individual predictions that are obtained from each tree [15]. In order to obtain predictions for the final forest, the averaging rule is used for the results of all individual trees [14, 15].

K-nearest neighborhood

KNN method is one of the most popular non-parametric regression methods. In this method, the distribution function of predictive values is obtained by using a nonparametric distribution of a kernel function. This model predicts future observations based on the similar situation at present, i.e., the probable conditions in the future will be the same as those that occurred at the present time. The probability of occurrence of each state in the current situation depends on the similarity of the observed vector of the independent variables at present and the observed independent vector in the historical series [3].


To implement the models, variables in Table 1 were used as predictors and TS was used as the output. Then, both RF and K-NN techniques were implemented to identify important variables that affect TS. To provide some goodness of fit measures, we divided the data set into two sets of training and testing (80–20%). The two models were trained using the data in the training set and were tested on the testing set. Evaluation criteria used to investigate the performance of the methods included root mean square error (RMSE), Criterion-referenced measurement (CRM), Nash–Sutcliffe efficiency (NSE) [16], Pearson correlation coefficient (R) and Morgan-Granger-Newbold (MGN) statistic [17].

Table 1 Indicators and variables of research

Results and discussion

Table 2, shows the demographic characteristics of the individual participated in the study. According to the table, the majority of the participants was male (65.4%), aged between 20 and 45 (54.3%), single (67%) and had academic education (78.4%).

Table 2 Characteristics of the Individual Participated in the study

Both RF and KNN models were optimized and the evaluation criteria were computed over the testing set. Table 3, shows the performance criteria of RF and K-NN. Comparing the results of the two models showed that although RF model produced smaller error rate (RMSE = 0.34) and greater correlation between observed and predicted responses (R = 0.90) compared with the K-NN model, but the K-NN model had smaller biases (CRM = 0.0005) (see Additional file 2: Fig. S1) and skewness of the errors (0.32) (see Additional file 2: Fig. S2). In fact, the precision of the RF model and the bias rate of the K-NN model were better. Accordingly, in order to test the significance difference between the accuracy of the results of the two models, the MGN statistic was used. The value of this statistic was not significant (P value = 0.509) indicating that there was no statistical differences between the accuracy of the models.

Table 3 Performance criteria of random forest and K-NN

Additional file 2: Fig. S3 shows the variable importance (VIMP) for the predictors of satisfaction. As can be seen, society behavior and municipal equipment were the first two top rank variables and cost of services was the least important variable in predicting tourist satisfaction.

Discussion and conclusion

The tourism industry brings about diversity in economic activities and employment, and realizes the distribution of revenues to a wider range of individuals and groups in societies. In the context of the changes brought about by globalization, only those cities and regions with strategic and futuristic programs have the potential to make optimal use of the advantages of this industry. These cities are mostly creative cities.

The results of this research showed that the behavior of the host community index including the variables such as the honesty of the host community, hospitality spirit, citizens’ attitude towards tourists had the highest effect on the satisfaction of tourists. The way citizens treat tourists reflects the host society’s understanding of tourists in the cultural, social, and economical exchanges. This in turn leads to recognizing the attractions and the financial benefits of tourism as well as to enhancing the conditions for peace and sustainable security among communities. Therefore, the deeper interaction between the host and guest communities in the tourism sector leads to more satisfaction and consequently it results in a thriving in the tourism industry.

The second top most rank variable affecting tourism was municipal facilities and equipment including items such as access to housing centers, access to fuel cell, drinking water, ATMs, urban traffic, etc. This index had a significant impact on tourism with an importance of 0.81. Municipal facilities and equipment are a factor indicating the convenience and comfort as well as the utility of tourist spaces. These factors reduce the hardship of travel and the fatigue caused by displacement and brings pleasure and tranquility to tourists. The evaluation of the outputs of the model indicates the favorable situation that tourists have expressed about the types of urban equipment in Hamadan.

The quality of environment with the importance of 0.488 was the third top most important factor affecting the tourism process. The main elements of this construct are climatic comfort, environmental relaxation, visual quality and urban landscape. In this construct, special attention is paid to the elements of native-oriented tourism. Environmental quality is the underlying factor of paying attention to the protection of natural and human-made environmental identities. The urban landscape reveals the totality of a city as a text, allowing the reader to interpret this text by the interpreter (the tourist). The city of Hamadan has an appropriate environmental quality from the perspective of tourists due to the design and morphology of the city as well as its establishment in the beautiful slopes of Mount Alvand which are among the factors attracting tourists to this city. Other factors involving in the tourism process were the quality of services with coefficient 0.314 and cost of services with coefficient 0.167 that have had less impact on tourism.

Considering the ability and capacity of the ancient city of Hamadan for tourism (which has led to selecting this city as Asian urban tourism pilot and capital in 2018), the need for paying more attention to policies and programs by all policy-makers working in the field of tourism is clear. The consequence of such a broad participation is the continued prosperity of the tourism industry.


One main limitation of this study was the lack of enough international and foreign tourists to compare the results. It is suggested to consider foreign tourists in the future studies.

Availability of data and materials

The data is available upon the request from the first author.



random forest


K-nearest neighborhood


root mean square error


determination coefficient


Pearson correlation


root mean square error


criterion-referenced measurement


Nash–Sutcliffe efficiency




  1. Abesey S, Allah SS, Tahmasebi IS. Investigating the relative efficiency of tourism management in the provinces of the country in the third and fourth development plans. Econ Strategy. 2015;7:198.

    Google Scholar 

  2. Boroujeni HZ, Turkman N. Analysis of the development of religious tourism in Hamedan Province. Sci Manag Manag Iran. 2013;30:57–80.

    Google Scholar 

  3. Karamuz M, Araghinejad S (2014) Advanced hydrology (Vol.): Amir Kabir University.

  4. Azami M, Araghinejad S (2012) Development of k-nearest neighbor regression method in forecasting river stream flow.

  5. Yüksel A, Yüksel F, Culha O. Ministers’ statements: a policy implementation instrument for sustainable tourism? J Sustain Tour. 2012;20(4):513–32.

    Article  Google Scholar 

  6. Rosentraub MS, Joo M. Tourism and economic development: which investments produce gains for regions? Tourism Manag. 2009;30(5):759–70.

    Article  Google Scholar 

  7. Taghavi M, Soleimani AG. The factors influencing the growth of the tourism industry. Econ Res. 2017;3:157.

    Google Scholar 

  8. Nahid EB, Akbar AND. The effects of the mental image of isfahan tourists on the development of tourism. J Tourism Manag Stud. 2016;31:109–25.

    Google Scholar 

  9. Salleh M, Omar K, Yaakop AY, Mahmmod AR. Tourist satisfaction in Malaysia. Int J Bus Soc Sci. 2013;4(5):221–6.

    Google Scholar 

  10. Kermani M. Regional economics, theory and models. Tehran: Samt; 2017.

    Google Scholar 

  11. Jahanpour A (2018) Hamedan Sights: Fanavaran.

  12. Fatemi S (2017) Hamedan Tourism: Tourism Organization of Hamedan Province.

  13. Grömping U. Variable importance assessment in regression: linear regression versus random forest. Am Stat. 2009;63(4):308–19.

    Article  Google Scholar 

  14. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.

    Article  Google Scholar 

  15. Barnett T, Dümenil L, Schlese U, Roeckner E, Latif M. The effect of Eurasian snow cover on regional and global climate variations. J Atmos Sci. 1989;46(5):661–86.

    Article  Google Scholar 

  16. Lindström G. Lake water levels for calibration of the S-HYPE model. Hydrol Res. 2016;47(4):672–82.

    Article  Google Scholar 

  17. Ghorbani S, Afgheh SM. Forecasting the house price for ahvaz city: the comparison of the hedonic and artificial neural network models. J Urban Econ Manag. 2017;5(19):29–44.

    Article  Google Scholar 

Download references


We would like to appreciate the Vice-chancellor of Education of Hamadan University of Medical Science for technical support and the Vice-chancellor of Research and Technology of Hamadan University of Technology for their approval and support of this work (No.970128380). We also would like to thank the Vice-chancellor of Lorestan University.


This study was partially funded by Lorestan University (Grant No: 2-2-98-1-03). Lorestan University provided also technical support for the present study.

Author information

Authors and Affiliations



LT and HA conceived the research topic, explored that idea, performed the statistical analysis and drafted the manuscript. HM participated in data analysis and writing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Hamed Abbasi.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the Hamadan University of Medical Sciences Ethics Committee (Approval Code: IR.UMSHA.REC.1397.34). Written informed consent was obtained from all participants.

Consent to publish

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

Data collection.

Additional file 2.

Additional figures.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Tapak, L., Abbasi, H. & Mirhashemi, H. Assessment of factors affecting tourism satisfaction using K-nearest neighborhood and random forest models. BMC Res Notes 12, 749 (2019).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: