Characterizing alternative and emerging tobacco product transition of use behavior on Twitter

The objective of this study was to develop an inductive coding approach specific to characterizing user-generated social media conversations about transition of use of different tobacco and alternative and emerging tobacco products (ATPs). A total of 40,206 tweets were collected from the Twitter public API stream that were geocoded from 2018 to 2019. Using data mining approaches, these tweets were then filtered for keywords associated with tobacco and ATP use behavior. This resulted in a subset of 5718 tweets, with 657 manually annotated and identified as associated with user-generated conversations about tobacco and ATP use behavior. The 657 tweets were coded into 9 parent codes: inquiry, interaction, observation, opinion, promote, reply, share knowledge, use characteristics, and transition of use behavior. The highest number of observations occurred under transition of use (43.38%, n = 285), followed by current use (39.27%, n = 258), opinions about use (7.00%, n = 46), and product promotion (5.63%, n = 37). Each of the other codes had fewer than ten tweets. Results provide early insights into how social media users discuss topics related to transition of use and their experiences with different and emerging tobacco product use behavior.


Introduction
Social media is now a common source of health-related information [1]. This includes user-generated conversations about a variety of topics, with an emerging field focused on better understanding knowledge, attitudes, and behaviors related to tobacco, alternative and emerging tobacco products (ATPs), and electronic nicotine delivery systems (ENDS) [2,3]. User-generated social media conversations can be assessed [4] to better understand how health behaviors are changing closer to real time [5]. This approach offers certain advantages over traditional survey methodology, including faster identification of emerging trends [6]. However, methods to appropriately code social media content for specific health-related topics remain underdeveloped, particularly in the context of characterizing transitions in behaviors that change over time.
Twitter is a microblogging social networking platform that allows users to tweet 280-character messages, which can then be retweeted, favorited, and shared across a network of online users [7]. Users can form online communities [8] by interacting with other users who share similar beliefs, interests, and opinions about topics. This includes users who initiate, use, and transition between different tobacco, ATP, and ENDS products [9,10]. In fact, Twitter has specifically become a platform for sharing information about electronic cigarettes (e-cigarettes) [11-14], a nicotine delivery device commercially available only in the past decade [15]. Evidencing the growing popularity of vaping behavior, studies have shown that online searches for electronic cigarettes have increased [16]. However, increased uptake of different types of e-cigarettes (e.g., Juul, heat-not-burn, etc.), particularly among youth and young adults, has not been without controversy [17]. Ongoing concerns about the long-term health impact of nicotine consumption [18], e-cigarette-related adverse events [19] (e.g., the 2019 outbreak of e-cigarette or vaping product use-associated lung injury) [20-22], and mixed evidence about the efficacy of ATPs as cessation devices continue to generate public health and patient safety concerns [23,24]. These concerns are accentuated when trying to assess the interaction of use behavior between traditional combustible tobacco products (e.g., cigarettes, cigars) and ENDS [25].
Bardier et al. BMC Res Notes (2021) 14:303
Understanding the pathways of transition of tobacco and ATP use, including what products users initiate on, why they switch between products, and unique health harms related to dual use (i.e., simultaneous use of both combustible products and ATPs/ENDS), is still a relatively underdeveloped area of study [26]. Hence, the objective of this study was to examine Twitter user conversations in relation to transition of use associated with ENDS, with a focus on developing an inductive coding approach specific to characterizing transition of use knowledge, attitudes, and behaviors.

Methods
We conducted a retrospective observational social media study in two phases: (1) data collection; and (2) content analysis using an unsupervised machine learning and inductive coding approach. Inductive content analysis was used to identify and characterize posts relevant to tobacco and ATP use (i.e., "signal" tweets) and involved manual annotation by coders with training in tobacco and substance use behavior, with results used to generate a codebook of transition and behavioral-related themes that could also be iterated on in future social media studies.

Data collection
Data were first collected from the Twitter public streaming API with a filter to collect all geocoded tweets located in the United States, with no further language or demographic restrictions. The data collection period was 07/21/2018 to 07/21/2019. This initial dataset of geocoded tweets was then filtered for the keywords and hashtags "vape" and "vaping" in order to better isolate Twitter posts relevant to the study aims and for purposes of preliminary data analysis about ENDS behavior. The collected data included the textual content of the tweet, user and account information, URLs, and the time and date of the post.
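The keyword filtering step can be illustrated with a minimal sketch. The exact matching rule is not specified in the text, so the sketch assumes case-insensitive, whole-word matching of "vape" and "vaping" with an optional leading hashtag, and it assumes each tweet is a dictionary with a `text` field. The names `KEYWORD_PATTERN`, `matches_keywords`, and `filter_tweets` are hypothetical and do not come from the study.

```python
import re

# Assumption: whole-word, case-insensitive match on "vape" or "vaping",
# with or without a leading "#". The study does not state its exact rule.
KEYWORD_PATTERN = re.compile(r"(?<!\w)#?(vape|vaping)\b", re.IGNORECASE)


def matches_keywords(text: str) -> bool:
    """Return True if the text contains one of the study keywords or hashtags."""
    return KEYWORD_PATTERN.search(text) is not None


def filter_tweets(tweets):
    """Keep only tweets (dicts with a 'text' field) that match a keyword."""
    return [tweet for tweet in tweets if matches_keywords(tweet["text"])]


if __name__ == "__main__":
    sample = [
        {"text": "Switched from cigarettes to a vape last month"},
        {"text": "Water will evaporate quickly in this heat"},
        {"text": "#Vaping all day"},
    ]
    for tweet in filter_tweets(sample):
        print(tweet["text"])
```

Under this whole-word assumption, inflected forms such as "vaped" would not match; a looser substring rule would capture them at the cost of more irrelevant posts.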

Data mining
To identify themes in our full corpus of tweets, we used an unsupervised machine learning approach called the Biterm Topic Model (BTM), which is designed to detect patterns in data and summarize an entire corpus of tweets into distinct, highly correlated categories [27]. BTM sorts short text into highly prevalent themes without the need for predetermined coding or training and has previously been used to explore key public health topics [28-31]. For each topic, BTM generates the top 20 words that represent the topic cluster. These topics were then reviewed and selected to identify clusters of Twitter conversations relevant to vaping and transition of use. Using the BTM output, we were able to identify "signal" topics and eliminate irrelevant topics. BTM topics were generated after applying keyword filters and were included for further analysis if they were pertinent to vape and vaping behavior. Topics were excluded if they were irrelevant or appeared to correlate with non-user-generated conversations (e.g., news tweets). We then extracted all the posts from the selected vaping BTM topics and manually coded the content of the tweets in these topics to ensure relevance to user-generated tobacco and ENDS use behavior. Posts were excluded as signals if they were: (1) news related and not organically user-generated content; (2) not written in English; or (3) retweets (a tweet that was retweeted was counted as only one tweet). However, all other tweets, including replies and tweets containing photos or videos, were included so that contextual information could be assessed in addition to content analysis of the tweet text. Transition of use was classified as switching from one tobacco or ATP/ENDS product to another.
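BTM models word co-occurrence through biterms, which are unordered pairs of words that appear together in the same short text. The sketch below illustrates only this biterm extraction and counting step, not topic inference, and it is not the implementation used in this study. Lowercased whitespace tokenization and the function names `extract_biterms` and `count_biterms` are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered word pairs (biterms) from one short text.

    Each pair is sorted so that ("vape", "quit") and ("quit", "vape")
    are treated as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(texts):
    """Count biterms over a corpus of short texts.

    Assumption: simple lowercased whitespace tokenization; a real pipeline
    would typically also remove stop words, URLs, and punctuation.
    """
    counts = Counter()
    for text in texts:
        counts.update(extract_biterms(text.lower().split()))
    return counts


if __name__ == "__main__":
    corpus = ["quit smoking vape", "vape quit"]
    for biterm, n in count_biterms(corpus).most_common():
        print(biterm, n)
```

A topic model fitted on these corpus-level biterm counts, rather than on per-document word counts, is what makes the approach suitable for short texts such as tweets.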

Content analysis
Tweets and any associated URLs/hyperlinks were aggregated into a table and imported into ATLAS.ti qualitative software for content analysis [32]. A first iterative, inductive analysis of the data was conducted (JSY) to identify thematic areas and classify tweets into codes with code descriptions. Tweets were read to identify thematic areas in the dataset, then coded based on thematic areas of interest. Codes and code descriptions were developed and modified iteratively throughout the coding process. A second analysis of the dataset was undertaken to expand the codebook to include subcodes. Subcodes and subcode descriptions were created and modified iteratively during this second round of data coding. Once a coding scheme was developed, the data were coded, extracted, and reviewed by a second coder (CB) to assess the validity of the coding scheme. The final coding scheme and distribution of codes are presented in Fig. 1 and Table 1.

Ethics and data collection
Data were collected from the Twitter public API stream and included publicly available tweets that were filtered for posts with geolocation/geotagged information. As the study did not involve human subjects, involved no interactions with online users, and only used publicly available data that were further de-identified for research purposes, ethics and IRB approval were not required, and Twitter users were not consented into this study [33]. Any user-identifiable information was removed from the study results.

Results
A total of 40,206 tweets were collected after filtering for "vape" and "vaping" keywords/hashtags. After data filtering, we ran BTM on the keyword filtered data to generate topic clusters and reviewed them for relevance to study aims. We chose 16 BTM clusters, which comprised a total of 5728 (14.25%) tweets selected based on word groupings relevant to vaping and ATP/ENDS behavior terms. After manually annotating these tweets for characteristics relevant to tobacco and ATP/ENDS use and behavior, we removed all non-signal tweets, leaving 589 signal tweets related to transition of use that were further analyzed. The 589 signal posts were categorized into 10 tobacco/ATP/ENDS general use and behavior thematic codes listed and identified in Table 1.
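The reported proportions follow from simple count-over-total arithmetic. As a minimal sketch, the hypothetical helper `percent_share` below reproduces two figures stated in this paper: the share of keyword-filtered tweets retained in the selected BTM clusters, and the share of annotated tweets coded as transition of use.

```python
def percent_share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 2)


if __name__ == "__main__":
    # Tweets in the 16 selected BTM clusters out of all keyword-filtered tweets.
    print(percent_share(5728, 40206))  # 14.25
    # Transition-of-use tweets out of the 657 annotated tweets.
    print(percent_share(285, 657))     # 43.38
```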
There were also transitions among different ATP product types as well as cannabis product types, one of which was vaping a cannabis product. Vaping use factors that were observed as influencing transition of use included self-reporting of addiction prompting use, reaction to adverse symptoms, cost of ATPs/ENDS, faulty or broken ATPs/ENDS, preference for flavors, losing or misplacing ATPs/ENDS, interest in polysubstance use, concern about reducing nicotine levels, stigma, and the alleged therapeutic effects of vaping, especially cannabis.
[Table fragment: code description "Vaping trick, such as blowing clouds"; example tweet "I have been trying to nail this trick for a while and I finally succeed, I was so shocked. #vape#vapetricks"]

Discussion
This study explored user-generated conversations occurring on Twitter in relation to tobacco and ATP/ENDS use, with a specific focus on transition of use between these highly addictive products. We observed that this subset of Twitter users actively tweeted about their experience using tobacco and ATPs/ENDS, providing valuable information about a behavior that is influenced by a changing landscape of new and emerging nicotine products. The majority of tweets reviewed related to tobacco and ATP/ENDS use and behavior characteristics, including users asking about tobacco/ATP/ENDS products, how to quit, observations of tobacco/ATP/ENDS use behavior, opinions about products and vaping (including claims that vaping is a healthier alternative to tobacco or has therapeutic benefits), sharing knowledge about tobacco/ATP/ENDS products, and specific characteristics of use (e.g., addiction, adverse events, costs, flavoring, tricks, etc.). Close to half of all conversations discussed transition of use behavior, with users actively discussing the types of tobacco/ATP/ENDS products they used and switched between and providing reasons for changing products. A wide variety of tobacco/ATP/ENDS products were mentioned, including combustible tobacco products (e.g., cigarettes), chewing tobacco, different types of e-cigarettes (Juul, vaping pens, etc.), and cannabis smoking products. Transition was observed between different products and within specific product classes (i.e., transitioning from one type of e-cigarette product to another), with some users (n = 32) self-reporting polytobacco and polysubstance behavior (e.g., smoking cigarettes and also vaping). Users expressed various sentiments about different products, including how products could act as substitutes for others, what products made them feel better, attempts to quit use of one product by switching to another, and issues related to cost and access. Some users stated that cannabis vaping products helped them with cessation of nicotine addiction.
Based on these preliminary results, Twitter appears to enable robust conversation and sharing of information related to tobacco and ATP/ENDS use. It can act as a digital forum for smokers and vapers to accumulate knowledge and share experiences, and it may lead to behavior change associated with nicotine use and addiction.

Conclusion
The results of our study are exploratory in nature and were derived from a sample of general geolocated tweets over a one-year period, which were then filtered for common vaping keywords and then analyzed using unsupervised machine learning. The results of this study are not generalizable to overall trends in tobacco or ATP/ENDS behavior, but nevertheless provide important insights into conversations occurring among Twitter users specific to transition of tobacco and nicotine product use. Themes associated with the transition of use were primarily focused on navigating quit attempts or having trouble quitting in the past, those who had relapsed to nicotine addiction, and those who had quit cigarettes but still vaped. These results provide early evidence that experiences in transition of use also present opportunities for more targeted cessation interventions, particularly in the context of increasing knowledge of known health harms related to tobacco use and nicotine addiction and exposure [34,35]. Future work should conduct further confirmatory studies to assess if themes related to transition of use knowledge, attitudes and behaviors observed hold true in other digital communities and use more structured research approaches to generalize findings. Future studies should also examine other platforms now popular among youth and young adults, such as Instagram, Snapchat, and TikTok.

Limitations
This study was exploratory and meant to generate hypotheses for future research. The study's limitations include the use of a single platform and the possibility that Twitter user demographics do not reflect those of the general population of tobacco/ATP/ENDS users. The sample of tweets was also limited to a convenience sample generated from geocoded tweets and hence may be subject to sampling bias, as it is estimated that only 1% of all tweets are geocoded [36,37]. Future studies should use multiple Twitter APIs to generate a more representative Twitter dataset.