A Study on Consumer Satisfaction with Chinese Bed and Breakfasts Based on Big Data Reviews

Purpose: B&B stays have grown popular in China over recent decades, and online reviews offer a direct, large-scale reflection of consumer satisfaction. Using network text analysis, this study examines overall satisfaction, its trend over time, and the main factors affecting it, taking online reviews of China's top 50 B&Bs as the sample. The results show that overall satisfaction with the top 50 B&Bs is relatively high and improved steadily over the three years studied, although some negative evaluations remain. Service, with its focus on personalized needs and humanistic care, is the core competitive strength, while weaknesses in room hardware are the main obstacle to higher satisfaction. To raise consumer satisfaction, operators should improve personalized service, infrastructure, the service process, and the internal and external environment. The research aims to provide theoretical and practical references for improving homestay service quality.


Introduction
In 2015, the General Office of the State Council of the People's Republic of China issued guiding opinions on accelerating the development of the lifestyle service industry and upgrading the consumption structure, which called for actively developing niche business forms such as inns and homestays. The transformation and upgrading of China's tourism, rising household consumption, and growing demand for personalized experiences have driven the rapid development of the homestay industry. However, the industry has no unified standard and quality varies widely, which lowers consumer satisfaction and, to some extent, affects consumers' purchase behavior and the industry's future development. Research on the homestay industry remains limited and mainly covers the behavior of community residents and tourists, the development of homestays and related countermeasures, and industries related to homestays (G. H. Zhang & Meng, 2017).
Research on B&Bs outside China began in the 1980s. Dawson and Brown conducted a sampling survey of B&B tourism in New York State and found that B&B tourism choices are diverse (Dawson & Brown, 1988). Fang Chi-Kuo and colleagues argue that B&B management is closely related to the development of ecotourism (Kuo & Kuo, 2012). Rasoolimanesh et al. surveyed homestay tourists in Linglong Valley and showed that tourists' perceived value positively affects their satisfaction (Rasoolimanesh et al., 2016). Overall, research outside China mainly relies on empirical case studies, focuses on operation and management, and examines B&Bs from the perspectives of the target market and the host-guest relationship.
Before 2010, research on B&Bs in China mainly described concepts and phenomena, relied mostly on qualitative methods, and received little academic attention. After 2010, scholars turned to operability and practicality. Wang Huiling and colleagues applied text analysis to high-frequency words in reviews and constructed an evaluation scale for family hotels (H. L. Wang & Wu, 2015). Zheng described the development of B&Bs in mainland China and summarized the development and management experience of the Wanjia hotels in Zengcheng, Guangzhou (Zheng, 2015). Shi & Han (2019) explored the elements of guests' lodging experience in metropolitan homestay inns through a theoretical analysis of guest reviews on the international homestay platform Airbnb.
With the development of big data, review data are attracting growing attention, and the internet has become an important channel for consumers to publish and obtain tourism information. Online reviews are therefore increasingly important for B&Bs. However, existing research on B&B satisfaction relies mainly on questionnaire surveys, and few studies use large-scale review data. This paper therefore studies consumer satisfaction with homestays using online reviews, with the aim of informing improvements in service quality and the sustainable development of the industry.

Methodology
This study uses network text analysis, a content analysis method applied to large volumes of online text. It analyzes the content of the research object in depth to reveal the nature of the phenomenon (Qiu & Zou, 2004) and converts text data into quantifiable data. Many tourism studies have applied it, mainly to destination image perception and tourist satisfaction (Chen & Huang, 2008; Fu & Wang, 2012; Zhao & Chen, 2019). Network text analysis is commonly carried out with ROST CM, a content mining system developed by Professor Shen Yang of the School of Information Management, Wuhan University. ROST CM performs word segmentation, word frequency statistics, and semantic network analysis on web pages, forums, and other online sources for content mining and text analysis. In addition, this paper uses sentiment analysis to evaluate the emotional polarity of reviews of the top 50 B&Bs and obtain sentiment scores. Reviews are then divided into positive and negative comments to obtain the overall satisfaction with the top 50 B&Bs, the change in satisfaction over the past three years, and consumer satisfaction on each B&B factor, and to identify the main strengths and weaknesses of the top 50 B&Bs.
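The word frequency step can be illustrated with a minimal Python sketch. It assumes the reviews have already been segmented into words (ROST CM does this internally for Chinese text; the sketch uses English glosses), and the stop-word list and sample reviews are hypothetical. It is not ROST CM's implementation.

```python
from collections import Counter

# Hypothetical stop-word list; the study does not publish its own.
STOP_WORDS = {"the", "a", "and", "is", "very", "we"}


def top_words(segmented_comments, n=10, stop_words=STOP_WORDS):
    """Count words across pre-segmented comments, skipping stop words,
    and return the n most frequent (word, count) pairs."""
    counts = Counter(
        token
        for tokens in segmented_comments
        for token in tokens
        if token not in stop_words
    )
    # Ties keep the order in which words were first seen.
    return counts.most_common(n)


# Hypothetical, already segmented sample reviews.
comments = [
    ["service", "very", "considerate", "room", "clean"],
    ["breakfast", "delicious", "service", "good"],
    ["room", "small", "service", "good"],
]
print(top_words(comments, n=3))  # [('service', 3), ('room', 2), ('good', 2)]
```

Filtering stop words before counting keeps function words from crowding the top of the list, which is why the high-frequency lists in such studies are dominated by nouns and adjectives.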
The data are taken from Ctrip, one of China's best-known tourism websites, which hosts a large number of travel notes and reviews (Zhao & Chen, 2019). Ctrip also classifies homestays clearly, and its online review system is relatively mature and comprehensive, which makes it suitable for collecting accurate data for this study.
The sample consists of representative, widely popular homestays in China: the top 50 on the "Black Truffle" award list of Chinese homestays (hereafter "China homestay top 50"). To analyze the satisfaction trend longitudinally, the study uses review data for China's top 50 B&Bs from 2017 to 2019. Because some B&Bs are not listed on Ctrip and the top 50 lists overlap from year to year, a total of 21,981 reviews were collected.
To address invalid and redundant data, the reviews were preprocessed in two steps. First, duplicate data were removed: consecutive identical reviews posted by the same user were treated as duplicates, and only one was retained (Wei & Yangjie, 2013). Second, invalid data were removed: reviews whose content was irrelevant to this study or from which no text could be obtained were deleted. Examples include posts containing only pictures and reviews that convey no useful information about the consumer's attitude toward the B&B, such as business advertising, B&B publicity, and malicious comments.
After preprocessing, the cleaned reviews were saved as TXT files that ROST CM can read, yielding 5,361 valid reviews of the top 50 B&Bs.
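The two cleaning rules can be sketched in Python as follows. This is a minimal sketch under stated assumptions: "consecutive" is read as a review identical to the same user's immediately preceding review, and the `is_irrelevant` rule and sample reviews are hypothetical, since the study does not state how relevance was judged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    user: str
    text: str


def preprocess(comments, is_irrelevant=lambda text: False):
    """Apply the two cleaning rules:
    1. a review identical to the same user's immediately preceding review
       is a duplicate, and only the first copy is kept;
    2. reviews with no text (e.g. picture-only posts) or flagged by the
       caller-supplied is_irrelevant rule are dropped as invalid."""
    cleaned = []
    previous = None
    for c in comments:
        text = c.text.strip()
        if previous is not None and c.user == previous.user and text == previous.text.strip():
            continue  # consecutive duplicate from the same user
        previous = c
        if not text or is_irrelevant(text):
            continue  # invalid: empty or irrelevant
        cleaned.append(Comment(c.user, text))
    return cleaned


raw = [
    Comment("u1", "Great host"),
    Comment("u1", "Great host"),             # repeated post
    Comment("u2", ""),                       # picture-only post, no text
    Comment("u3", "Use discount code ABC"),  # advertising
    Comment("u4", "Quiet and clean"),
]
# Hypothetical advertising rule, for illustration only.
cleaned = preprocess(raw, is_irrelevant=lambda t: "discount code" in t)
print([(c.user, c.text) for c in cleaned])  # [('u1', 'Great host'), ('u4', 'Quiet and clean')]
```

Deduplication runs before the validity check so that a repeated invalid post is still recognized as a repeat; either order gives the same result here because invalid posts are dropped anyway.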

Results and Discussion
Using ROST CM, this paper analyzes the word frequency of reviews of China's top 50 B&Bs from 2017 to 2019, and the top 100 high-frequency words (Table 1) are selected for analysis. By part of speech, the high-frequency words are mainly nouns, adjectives, and verbs. Most nouns relate to the homestay itself, and the adjectives mainly express consumers' subjective feelings about and evaluations of the homestays. Verbs are relatively few; they describe the activities of consumers or homestay staff during the stay (such as pick-up, help, and prepare). The top six high-frequency words are hotel, service, room, breakfast, housekeeper, and environment. Further analysis of these words leads to the following observations.
1. Apart from "hotel", which ranks first, "service", "room", and "environment" appear frequently in consumers' reviews, each more than 1,000 times. This indicates that the homestay itself, its hardware and software facilities, and its services remain consumers' main concerns.
2. "Service" appears more often than "room", which shows that service strongly affects consumer satisfaction. Unlike hotels, B&Bs are characterized by a home atmosphere, humanistic care, and a sense of experience, which is also reflected in the high frequency of "housekeeper", "enthusiasm", and "considerate". The operator (service provider) is therefore an important factor in consumer satisfaction.
3. "Satisfaction" appears relatively often, indicating that most consumers are satisfied with the top 50 homestays.
ROST CM is also used to conduct semantic network analysis on the review text and construct the visualization in Figure 1, which has a "core, sub-core, periphery" structure spreading outward from the core. Words with more connecting lines occur more frequently; core words sit at the center and have the highest frequencies, and the closer a word is to a core word, the closer its relationship with that word. For example, most reviews center on "hotel", suggesting that many consumers do not distinguish between a B&B and a hotel. "Service", "room", "breakfast", "butler", and other words near the center show the main aspects of the homestay that consumers attend to. Adjectives such as "comfortable", "satisfied", and "clean" lie on the periphery and connect to the sub-core and core words. Overall, the semantic network of consumers' perception of homestays centers on the homestay and its hardware and software facilities, with consumers' feelings on the periphery.

Figure 1. Semantic network diagram
Consumers' emotional cognition is an important reflection of their satisfaction with B&Bs, and ROST CM is a popular tool for sentiment analysis (Lang, 2017). Following previous studies, satisfaction is divided into positive, neutral, and negative (Cai et al., 2015), and the statistical results are shown in Table 2. Positive and negative emotions are each graded as general, moderate, or high according to the sentiment score; for negative emotions, the moderate and high ranges are (-25, -15) and (-∞, -25), respectively. Neutral emotions are not subdivided. Table 2 shows that over the three years, the overall satisfaction rate with the top 50 B&Bs reached 87.51%, with high, moderate, and general positive emotions accounting for 46.04%, 17.04%, and 24.44%, respectively. Thus most consumers are satisfied with the B&Bs and leave positive reviews. Negative reviews accounted for 11.90%, with general, moderate, and high negative emotions accounting for 1.60%, 0.62%, and 0.24%, respectively. This indicates that some consumers are dissatisfied and a few are strongly dissatisfied, so there is still room to improve satisfaction.
The trend in overall satisfaction with China's top 50 homestays can be traced through the annual proportions of positive and negative reviews. Visualizing these proportions for each year gives the annual satisfaction trend chart (Figure 2): positive reviews rose from 79.3% in 2017 to 89.2% in 2019, while negative reviews fell from 19.8% to 10.3%. The share of positive reviews thus increased and the share of negative reviews decreased year by year, showing that consumers are increasingly satisfied with the top 50 B&Bs. However, about 10% of reviews are still negative, leaving considerable room for improvement. Emotional orientation reflects a person's likes and dislikes, and consumers' attitudes toward homestays are directly reflected in their emotional orientation (Wang & Zheng, 2020).
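The annual proportions behind Figure 2 amount to grouping scored reviews by year and computing the share of each polarity. The Python sketch below assumes sentiment scores have already been produced (for example by ROST CM) and that a score above zero is positive, below zero negative, and exactly zero neutral; this sign convention and the sample data are assumptions, not the study's figures.

```python
from collections import Counter, defaultdict

LABELS = ("positive", "neutral", "negative")


def polarity(score):
    """Map a sentiment score to a polarity label.
    Assumed convention: > 0 positive, < 0 negative, 0 neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def yearly_shares(scored_comments):
    """scored_comments: iterable of (year, score) pairs.
    Returns {year: {label: percentage of that year's reviews}},
    rounded to two decimals, with years in ascending order."""
    by_year = defaultdict(Counter)
    for year, score in scored_comments:
        by_year[year][polarity(score)] += 1
    shares = {}
    for year in sorted(by_year):
        counts = by_year[year]
        total = sum(counts.values())
        # Counter returns 0 for labels that never occurred.
        shares[year] = {label: round(100 * counts[label] / total, 2) for label in LABELS}
    return shares


# Hypothetical (year, sentiment score) pairs.
data = [(2017, 12), (2017, -3), (2017, 5), (2017, 0),
        (2018, 8), (2018, -20), (2019, 30), (2019, 4)]
shares = yearly_shares(data)
print(shares[2017])  # {'positive': 50.0, 'neutral': 25.0, 'negative': 25.0}
```

Normalizing within each year, rather than over all three years, is what makes the proportions comparable across years with different review volumes.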
Following Wang & Zheng (2020), reviews are divided by emotional tendency into positive reviews (87.51%) and negative reviews (11.90%). High-frequency word analysis is then carried out separately on each group to identify what consumers attend to under each emotional tendency. The counts and percentages of the top 30 high-frequency words are shown in Figure 3. "Housekeeper" and "service" receive more positive than negative mentions, showing that consumers are satisfied with the service and the staff. "Environment" and "design" also receive more positive than negative mentions, meaning that the homestay environment can greatly improve consumer satisfaction. For "room", however, the share of negative mentions is much higher than the share of positive mentions, so rooms remain the main weakness: consumers pay close attention to rooms but are often dissatisfied with them. Reviews mentioning "experience" and "facilities" also show that hardware and software facilities have much room for improvement. In short, consumers are more satisfied with the services and the external environment but dissatisfied with the hardware and software facilities.
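The positive-versus-negative comparison can be sketched in Python by computing, for each term, the share of reviews in each group that mention it. The study does not state its exact denominator, so this sketch assumes the share of reviews containing the term; the terms and reviews are hypothetical.

```python
from collections import Counter


def term_shares(segmented_comments, terms):
    """Percentage of comments in one group that mention each term
    (a comment counts once per term, however often the term appears)."""
    n = len(segmented_comments)
    hits = Counter()
    for tokens in segmented_comments:
        for term in set(tokens) & terms:
            hits[term] += 1
    return {t: round(100 * hits[t] / n, 1) for t in sorted(terms)}


def compare(positive, negative, terms):
    """Return {term: (positive share, negative share, difference)}.
    A negative difference means the term is relatively more common
    in negative reviews."""
    pos = term_shares(positive, terms)
    neg = term_shares(negative, terms)
    return {t: (pos[t], neg[t], round(pos[t] - neg[t], 1)) for t in sorted(terms)}


# Hypothetical segmented reviews, already split by emotional tendency.
positive = [["housekeeper", "service", "warm"], ["environment", "design", "service"],
            ["room", "clean"], ["service", "breakfast"]]
negative = [["room", "noise"], ["room", "smell", "service"]]
terms = {"service", "room", "environment"}
result = compare(positive, negative, terms)
print(result["room"])  # (25.0, 100.0, -75.0)
```

Comparing shares rather than raw counts matters here because the positive group is several times larger than the negative group, so raw counts would almost always favor positive reviews.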

2. Analysis of satisfaction with homestay elements
To further understand customer satisfaction with the elements of homestays, this paper constructs an evaluation index system of homestay factors. The factors are classified into four categories based on the high-frequency words and the semantic network analysis, combined with the domestic evaluation system for homestay quality and development and with reference to Fan (2019), Pi & Zheng (2017), and Zhang & Yang (2017). After deleting uninformative words such as "hotel", "afternoon", and "elder brother", the top 30 high-frequency words are classified with reference to previous studies (Hu, 2020), which yields the positive and negative review counts for each homestay factor and each factor's share of word frequency. Among these factors, "service", "environmental location", and "catering" stand out in positive reviews, while "room hardware", "catering", and "environment" stand out in negative reviews. The factors of homestay satisfaction are summarized in Table 3, and the emotional tendency of each factor is shown in Table 4.
Note: The emotional tendency proportion is calculated over all positive and negative reviews. The proportion of each homestay factor is calculated over the reviews in the four categories.
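Aggregating high-frequency words into the four factor categories can be sketched in Python as follows. The word-to-factor mapping and word counts are hypothetical stand-ins for Table 3 and the study's data, and the two proportions follow one plausible reading of the table note: a factor's positive share is its positive mentions divided by all its mentions, and its factor share is its mentions divided by the mentions of all four factors.

```python
from collections import Counter

# Hypothetical word-to-factor mapping; the study's Table 3 is not reproduced here.
FACTOR_OF = {
    "service": "service", "housekeeper": "service", "enthusiasm": "service",
    "environment": "environmental location", "location": "environmental location",
    "breakfast": "catering",
    "room": "room hardware", "facilities": "room hardware",
    "sound insulation": "room hardware",
}


def factor_table(pos_freq, neg_freq, factor_of=FACTOR_OF):
    """Aggregate word frequencies from positive and negative reviews into factors.
    Words without a factor (e.g. "hotel") are ignored.
    Returns {factor: (positive mentions, negative mentions,
                      positive share %, factor share %)}."""
    pos, neg = Counter(), Counter()
    for word, count in pos_freq.items():
        if word in factor_of:
            pos[factor_of[word]] += count
    for word, count in neg_freq.items():
        if word in factor_of:
            neg[factor_of[word]] += count
    total = sum(pos.values()) + sum(neg.values())
    table = {}
    for factor in sorted(set(factor_of.values())):
        p, n = pos[factor], neg[factor]
        mentions = p + n
        positive_share = round(100 * p / mentions, 1) if mentions else 0.0
        table[factor] = (p, n, positive_share, round(100 * mentions / total, 1))
    return table


# Hypothetical word frequencies in positive and negative reviews.
pos_freq = {"service": 60, "housekeeper": 30, "environment": 40,
            "breakfast": 20, "room": 10, "hotel": 100}
neg_freq = {"service": 5, "environment": 5, "breakfast": 10,
            "room": 30, "facilities": 10}
table = factor_table(pos_freq, neg_freq)
print(table["room hardware"])  # (10, 40, 20.0, 22.7)
```

In this toy data "room hardware" has the lowest positive share and "service" the highest, which is the kind of contrast the factor tables are meant to expose.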
Negative reviews reveal many problems with room hardware. The main problems are as follows. (1) Hardware. Reviews mainly concern sound insulation: "We can hear noise in the room at night" and "The floors creak, and the sound insulation is poor." Others complain about design and lighting: "The room lighting is very poor" and "The power sockets are badly placed." Most homestays are converted from ordinary houses, so their sound insulation is unsatisfactory, and room design also needs improvement.
(2) Facilities. Consumers complain that some facilities are missing, for example "there is no towel rack or laundry basket in the room" and "the room does not provide a refrigerator". This shows that guests of the top 50 B&Bs have high expectations that the B&Bs do not always meet. Some equipment is also old and of poor quality: "Service facilities need to be improved. The air conditioning does not heat, and there is no hot water in the morning" and "Facilities are old". Safety is particularly important to consumers, but current facilities are not always safe enough. These complaints suggest that even where homestays offer novel facilities, they neglect practicality and safety, so the novelty of the experience comes at the cost of satisfaction.
(3) Hygiene. "The room is small and smells strange. The bed stinks of sweat." "There are bugs and spiders in the room and stains on the sheets." Such reviews show that hygiene problems persist even among the top-listed homestays and that hygiene standards need improvement.
Positive reviews highlight several aspects of "service". (1) Service staff, such as the butler and other employees. Reviews include "Throughout the stay, the butler always helped me" and "The shopkeeper's service is very good". Service is the core competitive strength of the top 50 B&Bs, and the butler plays an important part in it. The reviews show that staff at the top 50 B&Bs are considerate and provide good, satisfying service.
(2) Meeting personalized needs. Beyond accommodation, guests' other needs (such as transfers and food) should be considered. Comments include "offer paid airport pick-up service for guests" and "we will do our best to meet guests' requests". In other words, B&Bs that meet consumers' personalized requirements can add considerable emotional value and raise guests' satisfaction with their stay.
(3) Humanistic care. Reviews include "The host is amiable and enthusiastic", "family warmth", and "your needs will be quickly met". Unlike hotels, B&Bs offer a home atmosphere and humanistic care, which greatly improves the sense of experience and satisfaction at the top 50 B&Bs.
The second aspect is the environment. Reviews describe the external environment as "quiet", "peaceful", and "beautiful", and the internal environment as "clean" and "sweet". Most homestays have a beautiful natural setting and a pleasant interior, with which consumers are satisfied. Other reviews, such as "the location of the hotel is remote", "the location is difficult to find", and "the village surroundings are messy, and it is dusty outside", show that a remote location secures a beautiful environment but also brings untidy surroundings and poor accessibility.
The last aspect is catering. Negative reviews include "breakfast is very delicious and delicate, though there are not many varieties", "breakfast is packed too simply", "breakfast is too late", and "there are not enough breakfast varieties". These suggest that some homestays pay little attention to breakfast, with problems such as lateness, incomplete preparation, and limited variety, while guests of the top 50 B&Bs have high expectations of breakfast. Positive reviews, such as "the breakfast is plentiful" and "breakfast is very rich", mainly praise its good taste and generous portions; careful preparation is the main reason for this praise.
In short, the high-frequency words reflect consumer satisfaction, and the top six high-frequency words reflect what tourists need. Among them, "service" matters most, indicating that service quality is the core competitive strength of B&Bs. Staff training should therefore be strengthened, because enthusiasm and attention to detail increase consumers' enjoyment. Among the other factors affecting satisfaction, infrastructure is the most important. Operators should strictly follow the industry standards issued by the National Tourism Administration and improve infrastructure so that supplies are complete and guest rooms are clean, and they should strengthen room sound insulation so that guests can rest well. Specialty offerings such as catering can also increase satisfaction, but satisfaction with catering is currently not high, so operators should standardize the catering service process to raise its quality.

Conclusions
Taking China's top 50 homestays as an example, this study uses ROST CM to segment the review texts and draws the following conclusions from high-frequency word analysis, semantic network analysis, and sentiment analysis. First, overall satisfaction with the top 50 B&Bs is high: the positive review rate is 87.51%, the negative review rate is 11.90%, and only a few consumers are strongly dissatisfied. Second, overall satisfaction increased year by year: the proportion of positive reviews of the top 50 rose each year from 2017 to 2019, reaching 89.2% in 2019, although about 10% of reviews remain negative. Third, among the homestay satisfaction factors, satisfaction with "service" is the highest, reflected in the home atmosphere and ample humanistic care. Satisfaction with "hardware" is low, mainly because facilities are outdated and incomplete, and new facilities that lack practicality and safety lead to a poor experience. Operators should therefore provide personalized service, improve infrastructure, standardize the service process, and upgrade the internal and external environment to raise satisfaction.