Leopoldo Martínez D.: From Big Data to Big Insights - Case study: Sustainable and Luxury Tourism (1/4)

Monday, March 5, 2018

From Big Data to Big Insights - Case study: Sustainable and Luxury Tourism (1/4)


(Note: this post is also available in Spanish.)


1. Introduction



The rapid growth of user-generated content in online social media (discussion and comment forums, blogs, microblogs, among others) is one of the main reasons for the huge volume of opinion data recorded in different digital formats (text, images and video).

For companies, this data is a rich source of information about the behavior of their users and consumers, especially about who they are, what they do and why they do it. Treated properly, this information can be turned into actions that improve product quality, provide better service and identify new business opportunities, among other things.


Analyzing this information requires cutting-edge technologies to extract meaning and understanding (insights) about consumers (e.g. product and brand perception, user experience), staff (e.g. improvement opportunities) and the business (e.g. new markets and alliances).

This set of analysis technologies has immense potential across different types of businesses. This is especially true in the area of luxury and sustainable tourism, due to factors such as:

  • Tourists who increasingly pay attention to what is said in online social media about tourism and leisure services.
  • Tourists with high purchasing power and a preference for topics such as environmental protection and the promotion of social responsibility activities.

These factors, together with a market with high growth projections, create the conditions for the formation of virtual communities of users and consumers. From these communities it is possible to extract meaning and understanding that can be used to develop personalized offers that consolidate consumption trends in a specific sector.

Over four posts, I will present the insights obtained from a study I carried out based on the comments that people and organizations made on Twitter about luxury and sustainable tourism.

To carry out this study I used software tools for analyzing social media data, both commercial (IBM Watson, Google Cloud Platform and MeaningCloud) and open source (Gephi and SpagoBI).

Each post will present the following results:

  • First post: definition of the search space and identification of virtual communities through social network analysis.
  • Second post: identification of interconnection channels, influencers and brokers through social network analysis.
  • Third post: evaluation of the different topics being discussed and their relationship with brands and organizations through natural language processing.
  • Fourth post: identification of the psychological profile and consumption preferences of the detected virtual communities using the IBM Personality Insights tool.

The results of the first part follow.

2. Social listening: identifying virtual communities, influencers and brokers


2.1. Defining the search space


This task consists of defining the words or key terms that will be used to extract data from Twitter. Here it is necessary to decide whether you want to "listen" only to what is being discussed around the topic, or to how the topic is discussed around a specific organization, product or brand. For this study, the results cover what is being discussed in English around the theme of luxury and sustainable tourism.
The data that define the search space can be found in Table 1.


Table 1. Search space

The data extraction yielded a total of 390617 records, distributed as shown interactively in Figure 1 (hover the mouse over this and the other graphics):



Fig. 1. Results of data extraction

The results show that, overall, the tendency is to retweet (73.3%) rather than to reply (5.3%). This trend also remains uniform over the observation period, as the biweekly view in Figure 2 shows.


Fig. 2. Biweekly trend of the extracted data
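As a rough illustration of how the retweet and reply shares and the biweekly view could be computed, here is a minimal Python sketch. The record layout (a `type` and a `day` field), the `original` category, the sample records and the start date are all assumptions made for the example; they are not taken from the study's data.

```python
from collections import Counter, defaultdict
from datetime import date


def interaction_shares(records):
    """Return the percentage of records per interaction type."""
    counts = Counter(record["type"] for record in records)
    total = sum(counts.values())
    return {kind: round(100 * n / total, 1) for kind, n in counts.items()}


def biweekly_counts(records, start):
    """Count records per interaction type in consecutive 14-day periods."""
    buckets = defaultdict(Counter)
    for record in records:
        period = (record["day"] - start).days // 14  # 0 = first two weeks
        buckets[period][record["type"]] += 1
    return {period: dict(counts) for period, counts in sorted(buckets.items())}


# Hypothetical toy records, only to show the shape of the input.
sample = [
    {"type": "retweet", "day": date(2018, 1, 1)},
    {"type": "retweet", "day": date(2018, 1, 3)},
    {"type": "reply", "day": date(2018, 1, 10)},
    {"type": "original", "day": date(2018, 1, 16)},
    {"type": "retweet", "day": date(2018, 1, 20)},
]

shares = interaction_shares(sample)
trend = biweekly_counts(sample, start=date(2018, 1, 1))
```

Comparing the per-period counts in `trend` is one simple way to check whether the retweet/reply balance stays stable over time.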


Another relevant figure is the total number of Twitter accounts that took part in the discussion or were mentioned in it: 60877.
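A count like this can be obtained by collecting both the authors of the records and the accounts they mention. The sketch below is only an illustration: the `author` and `text` fields and the sample tweets are hypothetical, and the mention pattern is a simplification that assumes handles of up to 15 letters, digits or underscores.

```python
import re

# Simplified pattern for @mentions (assumption: handles are 1-15 word characters).
MENTION = re.compile(r"@([A-Za-z0-9_]{1,15})")


def accounts_involved(tweets):
    """Return the set of accounts that wrote a tweet or were mentioned in one."""
    accounts = set()
    for tweet in tweets:
        # Handles are case-insensitive, so normalize to lowercase.
        accounts.add(tweet["author"].lower())
        accounts.update(handle.lower() for handle in MENTION.findall(tweet["text"]))
    return accounts


# Hypothetical toy tweets.
tweets = [
    {"author": "EcoLodge", "text": "RT @GreenTravel: luxury can be sustainable"},
    {"author": "greentravel", "text": "Thanks @ecolodge and @Wanderer_01!"},
]

involved = accounts_involved(tweets)
```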

2.2. Virtual communities


Given the sheer amount of data that can be extracted from the comments and opinions posted on an online social medium such as Twitter, it is vital to segment this data into groups of accounts that can be categorized by one or more characteristics that describe them.

In the context of this study, the concept of virtual communities fits this requirement well: it defines groups based on the intensity of the conversations that take place between accounts, on the assumption that this intensity reflects common interests around a topic.
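To make the idea concrete, here is a minimal sketch of grouping accounts by conversation intensity. The post does not say which detection algorithm was used, so the example uses a simple weighted label propagation as a stand-in; the function names and the toy interactions are hypothetical.

```python
from collections import Counter, defaultdict


def build_graph(interactions):
    """Undirected weighted graph: weight = number of interactions between two accounts."""
    graph = defaultdict(Counter)
    for source, target in interactions:
        if source == target:
            continue  # ignore self-interactions
        graph[source][target] += 1
        graph[target][source] += 1
    return graph


def label_propagation(graph, max_rounds=20):
    """Each account repeatedly adopts the label it interacts with most intensely."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):  # fixed order keeps the result deterministic
            scores = Counter()
            for neighbor, weight in graph[node].items():
                scores[labels[neighbor]] += weight
            best = max(scores.values())
            candidates = {label for label, score in scores.items() if score == best}
            # Keep the current label on ties to avoid oscillation.
            if labels[node] not in candidates:
                labels[node] = min(candidates)
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return sorted(communities.values(), key=len, reverse=True)


# Hypothetical interactions: two tightly connected groups joined by one weak link.
interactions = (
    [("a", "b"), ("b", "c"), ("a", "c")] * 3
    + [("x", "y"), ("y", "z"), ("x", "z")] * 3
    + [("c", "x")]
)
communities = label_propagation(build_graph(interactions))
```

In this toy case the single interaction between `c` and `x` is too weak to merge the two groups, which is the behavior one would expect from an intensity-based grouping.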


For the data set identified in Table 1, a total of 9179 communities were detected, of which the 23 largest contain 50% of the accounts. Figure 3 shows the detected virtual communities (sized by the number of accounts in each).


Fig. 3. Virtual communities detected


A natural first focus for analysis is the largest communities, such as the first six, which together account for 26.8% of the accounts.
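Concentration figures like the ones above (how many of the largest communities cover half of the accounts, or what share the first six hold) can be derived from the list of community sizes. The following sketch shows one way to do it; the function names and the toy sizes are hypothetical and are not the study's data.

```python
def communities_needed_for_share(community_sizes, share):
    """Return how many of the largest communities are needed to cover `share` of all accounts."""
    total = sum(community_sizes)
    covered = 0
    for count, size in enumerate(sorted(community_sizes, reverse=True), start=1):
        covered += size
        if covered >= share * total:
            return count
    return len(community_sizes)


def share_of_largest(community_sizes, k):
    """Return the fraction of all accounts that belong to the k largest communities."""
    total = sum(community_sizes)
    return sum(sorted(community_sizes, reverse=True)[:k]) / total


# Hypothetical community sizes (number of accounts per community).
sizes = [5, 40, 10, 30, 5, 10]

needed_for_half = communities_needed_for_share(sizes, 0.5)
top_two_share = share_of_largest(sizes, 2)
```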

See you in post 2, where I will identify the interconnection channels that exist between communities, as well as the big influencers and brokers.

3. Conclusions and recommendations


This first part showed some of the potential of state-of-the-art technologies for analyzing online social media data. That potential was demonstrated by identifying group behaviors such as virtual communities, which in turn made it possible to identify the first focal points of interest.

In addition, in the context of luxury and sustainable tourism, there was a strong tendency to retweet rather than reply. This characteristic will be important when communication strategies have to be defined.

Finally, it would be advisable to extend this first study with the following component:

- Geolocation: knowing where the people, organizations or companies that own the accounts are located would make it possible to propose actions, commercial or non-commercial, adapted to the local preferences of those communities.

Leopoldo Martínez D.
(www.linkedin.com/in/winacore)
