Analyzing tourist data on Twitter: a case study in the province of Granada at Spain
Journal of Hospitality and Tourism Insights
ISSN: 2514-9792
Article publication date: 15 March 2021
Issue publication date: 6 April 2022
Abstract
Purpose
The main aim of this paper is to build an approach to analyze the tourist content posted on social media. The approach incorporates information extraction, cleaning, data processing, and descriptive and content analysis, and can be used on different social media platforms such as Instagram and Facebook. This work proposes an approach to social media analytics for traveler-generated content (TGC), and the authors apply it to Twitter to examine data about the city and the province of Granada.
Design/methodology/approach
To identify what people are saying and posting on social media about places, events, restaurants, hotels, etc., the authors propose the following approach for data collection, cleaning and analysis. The authors first identify the main keywords for the place of study. A descriptive analysis is subsequently performed, and this includes post metrics with geo-tagged analysis and user metrics, retweets and likes, comments, videos, photos and followers. The text is then cleaned. Finally, content analysis is conducted, and this includes word frequency calculation, sentiment and emotion detection and word clouds. Topic modeling is also performed with latent Dirichlet allocation (LDA).
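The text cleaning and word frequency steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cleaning rules (dropping URLs and @mentions, keeping hashtag text), the tiny stop-word list, the function names `clean_tweet` and `word_frequencies`, and the sample tweets are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (English and Spanish);
# a real study would use a much fuller one.
STOPWORDS = {"the", "a", "in", "of", "and", "is", "to", "at",
             "la", "el", "de", "en", "y"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[^\W\d_]+")  # runs of Unicode letters only


def clean_tweet(text):
    """Return lowercase word tokens with URLs, mentions and stop words removed."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the symbol
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def word_frequencies(tweets):
    """Count how often each cleaned token appears across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(clean_tweet(tweet))
    return counts


# Invented sample tweets, not taken from the study's data set.
sample = [
    "Sunset at the #Alhambra in Granada https://example.com/photo",
    "Skiing in #SierraNevada, then tapas in Granada! @some_user",
]
freq = word_frequencies(sample)
print(freq.most_common(3))
```

The resulting counts could feed a word cloud or, after vectorization, a topic model; those steps are omitted here.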
Findings
The authors used the framework to collect 262,859 tweets about Granada. The most important hashtags are #Alhambra and #SierraNevada, and the most prolific user is @AlhambraCultura. The approach uses a seasonal context, and the posted tweets are divided into two periods (spring–summer and autumn–winter). Word frequency was calculated, and Granada and Alhambra are again the most frequent words in both periods, in English and in Spanish. The topic models show the subjects that are mentioned in both languages, and although there are small differences by language and season, the Alhambra, Sierra Nevada and gastronomy stand out as the most important topics.
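The seasonal split can be sketched as follows. The abstract does not state where the period boundaries fall, so the month ranges here (March to August as spring–summer, September to February as autumn–winter) are an assumption, as are the function names and the sample records.

```python
from datetime import datetime


def season_period(created_at):
    """Map a timestamp to one of two periods (assumed month boundaries)."""
    # Assumption: March-August is spring-summer, the rest is autumn-winter.
    return "spring-summer" if 3 <= created_at.month <= 8 else "autumn-winter"


def split_by_period(tweets):
    """Group (timestamp, text) pairs by seasonal period."""
    periods = {"spring-summer": [], "autumn-winter": []}
    for created_at, text in tweets:
        periods[season_period(created_at)].append(text)
    return periods


# Invented sample records, not taken from the study's data set.
sample = [
    (datetime(2019, 7, 14), "Visiting the Alhambra"),
    (datetime(2019, 12, 28), "Snow in Sierra Nevada"),
    (datetime(2020, 2, 3), "Tapas in Granada"),
]
groups = split_by_period(sample)
print({period: len(texts) for period, texts in groups.items()})
```

Each group can then be analyzed separately, for example by computing word frequencies per period and per language.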
Research limitations/implications
It is extremely difficult to identify sarcasm. In addition, posts may be ambiguous, users may mix Spanish and English words in their tweets, and tweets may contain spelling mistakes, colloquialisms or abbreviations. Multilingualism is also an important limitation, since it is not clear how tweets written in different languages should be processed. The size of the data set is another important factor, since the greater the amount of data, the better the results. One of the largest limitations is the small number of geo-tagged tweets, as geo-tagging would provide information about the place where a tweet was posted and opinions of that place.
Originality/value
This study proposes a way to analyze social media data that bridges the tourism and social media literature in the data analysis context, and it contributes to discovering patterns and features of a tourism destination through social media. The approach provides the prospective traveler with an overview of the most popular places and the major posters for a particular tourist destination. From a business perspective, it informs managers of the most influential users, and the information obtained can be extremely useful for managing their tourism products in that region.
Acknowledgements
This work has been funded by the Spanish Ministerio de Economía y Competitividad under project TIN2016-77902-C3-2-P and the European Regional Development Fund (ERDF-FEDER).
Citation
Viñán-Ludeña, M.S. and de Campos, L.M. (2022), "Analyzing tourist data on Twitter: a case study in the province of Granada at Spain", Journal of Hospitality and Tourism Insights, Vol. 5 No. 2, pp. 435-464. https://doi.org/10.1108/JHTI-11-2020-0209
Publisher
Emerald Publishing Limited
Copyright © 2021, Emerald Publishing Limited