Using data mining techniques to explore security issues in smart living environments in Twitter
Abstract
Consumers' homes now contain millions of Internet-connected devices that together make up the Internet of Things (IoT). The growth of the IoT industry has led to the emergence of connected devices and home assistants that create smart living environments. However, the data continuously generated and accumulated by these connected devices create security issues and raise users' privacy concerns. The present study aims to explore the main security issues in smart living environments using data mining techniques. To this end, we applied a three-step data mining analysis to 938,258 tweets collected from Twitter under the user-generated data (UGD) framework. First, sentiment analysis was applied using TextBlob, which was tested with a support vector classifier, multinomial naïve Bayes, logistic regression, and a random forest classifier; as a result, the analyzed tweets were divided into those expressing positive, negative, and neutral sentiment. Next, a Latent Dirichlet Allocation (LDA) algorithm was applied to divide the sample into topics related to security issues in smart living environments. Finally, insights were extracted by applying a textual analysis process in Python, validated by analyzing frequency and weighted percentage variables and by calculating the statistical measure known as mutual information (MI) for the identified n-grams (unigrams and bigrams). The analysis identified 10 topics, which indicate that the main security issues are malware, cybersecurity attacks, data storage vulnerabilities, the use of testing software in IoT, and possible leaks due to the lack of user experience. We discuss the circumstances and causes that may affect user security and privacy when using IoT devices and emphasize the importance of user-generated content (UGC) in the processing of personal data of IoT device users.
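The MI step can be illustrated with a minimal sketch. The abstract does not state the exact MI formulation, so this example assumes pointwise mutual information over adjacent word pairs; the function name `bigram_pmi`, the `min_count` parameter, and the toy tweets are hypothetical and are not taken from the study's data or code.

```python
import math
from collections import Counter


def bigram_pmi(tokenized_docs, min_count=1):
    """Return {bigram: PMI in bits} for adjacent token pairs.

    Assumption: PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), with
    probabilities estimated from raw unigram and bigram counts.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in tokenized_docs:
        unigrams.update(tokens)
        # Pair each token with its right-hand neighbour.
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unstable
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy tweets, used only to exercise the function.
tweets = [
    "smart home malware attack",
    "smart home data leak",
    "iot malware attack reported",
]
docs = [t.lower().split() for t in tweets]
pmi = bigram_pmi(docs)

# Rank bigrams from most to least strongly associated.
ranked = sorted(pmi.items(), key=lambda kv: kv[1], reverse=True)
for bigram, score in ranked:
    print(" ".join(bigram), round(score, 2))
```

A higher score means the two words co-occur more often than their individual frequencies would predict. On a real corpus, raising `min_count` would help, since PMI tends to overrate pairs that appear only once.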