• Title/Summary/Keyword: text mining analysis

Predicting numeric ratings for Google apps using text features and ensemble learning

  • Umer, Muhammad;Ashraf, Imran;Mehmood, Arif;Ullah, Saleem;Choi, Gyu Sang
    • ETRI Journal / v.43 no.1 / pp.95-108 / 2021
  • Application (app) ratings are feedback provided voluntarily by users and serve as important evaluation criteria for apps. However, these ratings can often be biased owing to insufficient or missing votes. Additionally, significant differences have been observed between numeric ratings and user reviews. This study aims to predict the numeric ratings of Google apps using machine learning classifiers. It exploits numeric app ratings provided by users as training data and returns authentic mobile app ratings by analyzing user reviews. An ensemble learning model is proposed for this purpose that considers term frequency/inverse document frequency (TF/IDF) features. Three TF/IDF features, including unigrams, bigrams, and trigrams, were used. The dataset was scraped from the Google Play store, extracting data from 14 different app categories. Biased and unbiased user ratings were discriminated using TextBlob analysis to formulate the ground truth, from which the classifier prediction accuracy was then evaluated. The results demonstrate the high potential for machine learning-based classifiers to predict authentic numeric ratings based on actual user reviews.
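
The unigram, bigram, and trigram TF/IDF features described above can be sketched as follows. This is a minimal illustration using one common TF-IDF variant (relative term frequency times log(N/df)); the abstract does not state the exact weighting the authors used, and the sample reviews and the helper names `ngrams` and `tfidf` are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return all n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf(docs, n):
    """TF-IDF weights of n-grams per document.

    TF = count / total n-grams in the document, IDF = log(N / df).
    Assumed variant for illustration only.
    """
    counts_per_doc = [Counter(ngrams(d.lower().split(), n)) for d in docs]
    df = Counter()
    for counts in counts_per_doc:
        df.update(counts.keys())  # each document counts once per n-gram
    n_docs = len(docs)
    weights = []
    for counts in counts_per_doc:
        total = sum(counts.values())
        weights.append({g: (c / total) * math.log(n_docs / df[g])
                        for g, c in counts.items()})
    return weights

# Hypothetical reviews standing in for scraped Google Play reviews.
reviews = [
    "great app works great",
    "app crashes on start",
    "great app but crashes often",
]
unigram_w = tfidf(reviews, 1)
bigram_w = tfidf(reviews, 2)
trigram_w = tfidf(reviews, 3)
```

A term that occurs in every review (here "app") gets weight zero, which is why TF-IDF tends to emphasize terms that discriminate between reviews.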

Safety Culture: A Retrospective Analysis of Occupational Health and Safety Mining Reports

  • Tetzlaff, Emily J.;Goggins, Katie A.;Pegoraro, Ann L.;Dorman, Sandra C.;Pakalnis, Vic;Eger, Tammy R.
    • Safety and Health at Work / v.12 no.2 / pp.201-208 / 2021
  • Background: In the mining industry, various methods of accident analysis have utilized official accident investigations to try and establish broader causation mechanisms. An emerging area of interest is identifying the extent to which cultural influences, such as safety culture, are acting as drivers in the reoccurrence of accidents. Thus, the overall objective of this study was to analyze occupational health and safety (OHS) reports in mining to investigate if/how safety culture has historically been framed in the mining industry, as it relates to accident causation. Methods: Using a computer-assisted qualitative data analysis software, 34 definitions of safety culture were analyzed to highlight key terms. Based on word count and contextual relevance, 26 key terms were captured. Ten OHS reports were then analyzed via an inductive thematic analysis, using the key terms. This analysis provided a concept map representing the 50-year data set and facilitated the use of text framing to highlight safety culture in the selected OHS mining reports. Results: Overall, 954 references and six themes, safety culture, attitude, competence, belief, patterns, and norms, were identified in the data set. Of the 26 key terms originally identified, 24 of them were captured within the text. The results made evident two distinct frames in which to interpret the data: the role of the individual and the role of the organization, in safety culture. Conclusion: Unless efforts are made to understand and alter cultural drivers and share these findings within and across industries, the same accidents are likely to continue to occur.

A Study on the Finding of Promising Export Items in Defense industry for Export Market Expansion-Focusing on Text Mining Analysis-

  • Yeo, Seoyoon;Jeong, Jong Hee;Kim, Seong Ho
    • Journal of the Korea Society of Computer and Information / v.27 no.10 / pp.235-243 / 2022
  • This paper aims to identify promising export items for expanding defense export markets. Germany, the UK, and France were selected as target countries, and unstructured forecast data on each country's weapon system acquisition plans for the next ten years were collected. Using TF-IDF in a text mining analysis, keywords that appeared frequently in the data from the three countries were derived, yielding keywords for each country's major acquisition projects. However, most of the derived keywords were related to mainstay weapon systems produced by each country's own domestic defense companies. To discover promising export items through text mining, we propose grouping the derived keywords into similar weapon systems. In addition, we sort out the weapon systems that all three countries plan to acquire. The results suggest that the current promising export items are weapon systems related to information systems. Prioritizing overseas demand using keywords makes it possible to set clear market entry goals. Domestic companies can establish specific entry strategies based on this demand, and relevant organizations can provide customized marketing support.

Analysis of Trends in Domestic Learning Counseling Research Using Text Mining Methods (텍스트 마이닝 방법을 활용한 국내 학습상담 연구 동향 분석)

  • Hyun, Yong-Chan;Yang, Ji-Hye;Park, Jung-Hwan
    • Journal of Convergence for Information Technology / v.12 no.3 / pp.302-310 / 2022
  • This study used text mining to examine research trends in learning counseling for adolescents and to suggest directions for subsequent research. Learning and career paths are the top two concerns of Korean youth. Topic modeling, a text mining technique that can minimize researcher subjectivity and bias, was applied to 201 academic papers from journals at or above KCI candidate status, retrieved through RISS with keywords such as "learning counseling" and "academic counseling". The analysis yielded four topics: counseling experience (topic 1), group counseling research (topic 2), parent counseling (topic 3), and learning technology program development (topic 4). Research related to learning counseling is developing around counseling for emotional stability, group counseling, parent counseling, and learning technology programs. Learning counseling research aimed at resolving adolescents' concerns is expected to continue in the direction of integrated support through psychological and emotional counseling, parent counseling, and collaboration with learning technology experts.

A Study on the Integration Between Smart Mobility Technology and Information Communication Technology (ICT) Using Patent Analysis

  • Alkaabi, Khaled Sulaiman Khalfan Sulaiman;Yu, Jiwon
    • Journal of the Korea Society of Computer and Information / v.24 no.6 / pp.89-97 / 2019
  • This study proposes a method for investigating current patents related to information communication technology and smart mobility to provide insights into future technology trends. The method is based on text mining clustering analysis. The method consists of two stages, which are data preparation and clustering analysis, respectively. In the first stage, tokenizing, filtering, stemming, and feature selection are implemented to transform the data into a usable format (structured data) and to extract useful information for the next stage. In the second stage, the structured data is partitioned into groups. The K-medoids algorithm is selected over the K-means algorithm for this analysis owing to its advantages in dealing with noise and outliers. The results of the analysis indicate that most current patents focus mainly on smart connectivity and smart guide systems, which play a major role in the development of smart mobility.
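
The clustering stage described above can be illustrated with a minimal K-medoids sketch. It uses a simple alternating scheme (assign each point to its nearest medoid, then re-pick each cluster's medoid), which is one of several K-medoids variants and not necessarily the one used in the study. The sample points, the Manhattan distance, and the naive initialization are assumptions made for illustration.

```python
def assign(points, medoids, dist):
    """Map each medoid index to the indices of the points nearest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: dist(p, points[m]))
        clusters[nearest].append(i)
    return clusters

def k_medoids(points, k, dist, max_iter=100):
    """Alternating K-medoids; medoids are always actual data points."""
    medoids = list(range(k))  # naive initialization: the first k points
    for _ in range(max_iter):
        clusters = assign(points, medoids, dist)
        # New medoid = cluster member with the smallest total distance
        # to the other members (an empty cluster keeps its old medoid).
        new_medoids = [
            min(members, key=lambda c: sum(dist(points[c], points[j])
                                           for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, dist)

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical 2-D feature vectors; the last point is an outlier.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9), (50, 50)]
medoids, clusters = k_medoids(points, 2, manhattan)
```

In this toy data the second cluster's medoid stays at the real point (8, 8), whereas the mean of that cluster would be pulled to (18.75, 18.75) by the outlier. That is the robustness to noise and outliers the abstract gives as the reason for preferring K-medoids over K-means.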

Study on prediction for a film success using text mining (텍스트 마이닝을 활용한 영화흥행 예측 연구)

  • Lee, Sanghun;Cho, Jangsik;Kang, Changwan;Choi, Seungbae
    • Journal of the Korean Data and Information Science Society / v.26 no.6 / pp.1259-1269 / 2015
  • Big data has recently become a key topic in academia, and its usefulness is being recognized by government, local public bodies, and enterprises as well, all of which are working to extract useful information from it. This study analyzes the box office success or failure of films using text mining. The data comprise film reviews, average ratings, and the number of screens, collected from portal site 'D' and the Korean Film Commission. The purpose of this paper is to propose a model that uses these data to predict whether a film will succeed. The proposed prediction model achieved a correct classification rate of 95.74%.

Sentiment Analysis Using Deep Learning Model based on Phoneme-level Korean (한글 음소 단위 딥러닝 모형을 이용한 감성분석)

  • Lee, Jae Jun;Kwon, Suhn Beom;Ahn, Sung Mahn
    • Journal of Information Technology Services / v.17 no.1 / pp.79-89 / 2018
  • Sentiment analysis is a text mining technique that extracts the feelings of the person who wrote a text, such as a movie review. Early sentiment analysis research identified sentiment using dictionaries of positive and negative words collected in advance. As deep learning research has become active, sentiment analysis using deep learning models with morpheme or word units has been carried out. However, such models have the disadvantages that the word dictionary varies by domain and that the number of morphemes or words is much larger than the number of phonemes. As a result, the dictionary becomes large and model complexity increases accordingly. We construct a sentiment analysis model using a recurrent neural network that divides the input into phonemes, which are smaller units than morphemes. To verify its performance, we use 30,000 movie reviews from Naver, the largest Korean portal. A morpheme-level sentiment analysis model is also implemented for comparison. The phoneme-level model outperforms the morpheme-level model, and in particular the phoneme-level model using LSTM performs better than the one using GRU. Korean text processing based on a phoneme-level model is expected to be applicable to various text mining and language models.
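
To make the phoneme-level input concrete, the sketch below splits precomposed Hangul syllables into their constituent jamo using the standard Unicode arithmetic (19 initial consonants, 21 vowels, 28 final positions including "none"). The abstract does not say how the authors performed this split, so the function name `to_jamo` and the choice of conjoining jamo code points are assumptions for illustration.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
S_COUNT = 19 * V_COUNT * T_COUNT  # 11,172 precomposed syllables

def to_jamo(text):
    """Decompose each Hangul syllable into initial, vowel, and optional final jamo.

    Characters outside the Hangul syllable block are passed through unchanged.
    """
    out = []
    for ch in text:
        index = ord(ch) - S_BASE
        if 0 <= index < S_COUNT:
            lead, rest = divmod(index, V_COUNT * T_COUNT)
            vowel, tail = divmod(rest, T_COUNT)
            out.append(chr(L_BASE + lead))
            out.append(chr(V_BASE + vowel))
            if tail:  # tail == 0 means the syllable has no final consonant
                out.append(chr(T_BASE + tail))
        else:
            out.append(ch)
    return "".join(out)

phonemes = to_jamo("한글")
# The arithmetic matches Unicode canonical decomposition (NFD).
same_as_nfd = phonemes == unicodedata.normalize("NFD", "한글")
```

Because every syllable maps onto a small fixed inventory of jamo, the resulting vocabulary is far smaller than a morpheme or word dictionary, which is the motivation the abstract gives for the phoneme-level model.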

Performance analysis of volleyball games using the social network and text mining techniques (사회네트워크분석과 텍스트마이닝을 이용한 배구 경기력 분석)

  • Kang, Byounguk;Huh, Mankyu;Choi, Seungbae
    • Journal of the Korean Data and Information Science Society / v.26 no.3 / pp.619-630 / 2015
  • The purpose of this study is to provide basic information for developing a team's future game strategy by using social network analysis and text mining to identify the attack and pass patterns of national men's professional volleyball teams and to extract core keywords related to game performance. Social network analysis of the whole data set partitioned the players into group '0' (6 players) and group '1' (11 players). In terms of degree centrality and betweenness centrality, group '1' showed more active game performance than group '0'. The text mining results for the two game outcomes (win and loss) differed significantly between the two groups ('0' and '1') obtained by social network analysis (p-value: 0.001). In the clustering of each network, group '0' tended to score points through 'set' plays involving players D and E. In group '1', player K tended to fail when attacking after a 'dig', whereas players C and D performed well through 'set' plays.
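
The two centrality measures mentioned above can be sketched for a small pass network. The example treats passes as undirected, unweighted links and uses hypothetical player labels (`p1` to `p4`), since the abstract does not describe how the authors built their network. Betweenness is computed with Brandes' algorithm.

```python
from collections import deque

def degree_centrality(adj):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                      # nodes in order of discovery
        preds = {v: [] for v in adj}    # shortest-path predecessors
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):       # accumulate dependencies back to s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice

# Hypothetical pass links: p2 exchanges passes with everyone else.
passes = [("p1", "p2"), ("p2", "p3"), ("p2", "p4")]
adj = {}
for a, b in passes:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

degree = degree_centrality(adj)
between = betweenness_centrality(adj)
```

Here `p2` has the highest degree centrality and lies on every shortest path between the other players, so it also has the highest betweenness.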

Big Data Analysis on the Perception of Home Training According to the Implementation of COVID-19 Social Distancing

  • Hyun-Chang Keum;Kyung-Won Byun
    • International Journal of Internet, Broadcasting and Communication / v.15 no.3 / pp.211-218 / 2023
  • With the implementation of COVID-19 social distancing, interest in and the number of users of 'home training' have increased rapidly. The purpose of this study is therefore to identify perceptions of 'home training' through big data analysis of social media channels and to provide basic data to the related business sector. Big data were collected from news and social content provided on Naver and Google, covering three years from March 22, 2020, when COVID-19 distancing was implemented in Korea. The collected data included 4,000 Naver blog posts, 2,673 news articles, 4,000 cafe posts, 3,989 Knowledge iN posts, and 953 Google news items. TF and TF-IDF were computed through text mining, and semantic network analysis was then conducted on 70 keywords. Big data analysis programs such as Textom and Ucinet were used for the social big data analysis, and NetDraw was used for visualization. In the text mining analysis, 'home training' was the most frequent term by TF, appearing 4,045 times, followed by 'exercise', 'Homt', 'house', 'apparatus', 'recommendation', and 'diet'. By TF-IDF, the main keywords were 'exercise', 'apparatus', 'home', 'house', 'diet', 'recommendation', and 'mat'. Based on these results, the 70 most frequent keywords were extracted, and semantic indicators and centrality were analyzed. Finally, CONCOR analysis grouped the keywords into a 'purchase cluster', an 'equipment cluster', a 'diet cluster', and an 'execute method cluster'. Based on these four clusters, basic data for the 'home training' business sector were presented, drawing on consumers' main perceptions of 'home training' and the semantic network analysis.
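
A semantic network of the kind analyzed above is usually built from keyword co-occurrence counts. The sketch below shows one simple way to derive such counts, assuming that two keywords are linked whenever they appear in the same document. The sample posts and the function name `cooccurrence` are hypothetical, and the sketch does not reproduce what Textom or Ucinet do internally.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs, keywords):
    """Count, for each keyword pair, the number of documents containing both."""
    vocab = set(keywords)
    edges = Counter()
    for doc in docs:
        present = sorted(vocab & set(doc.lower().split()))
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1  # pairs are stored in alphabetical order
    return edges

# Hypothetical posts standing in for collected blog and news text.
posts = [
    "home training exercise mat",
    "diet exercise recommendation",
    "exercise mat recommendation",
]
keywords = ["exercise", "mat", "diet", "recommendation"]
edges = cooccurrence(posts, keywords)
```

The resulting pair counts form a weighted edge list; centrality measures and block-modeling methods such as CONCOR operate on the matrix form of this network.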

Perceptions and Trends of Digital Fashion Technology - A Big Data Analysis - (빅데이터 분석을 이용한 디지털 패션 테크에 대한 인식 연구)

  • Song, Eun-young;Lim, Ho-sun
    • Fashion & Textile Research Journal / v.23 no.3 / pp.380-389 / 2021
  • This study aimed to reveal perceptions and trends of digital fashion technology through an informational approach. A big data analysis was conducted on text collected from the web between April 2019 and April 2021. Keywords were derived through text mining and network analysis, and the structure of perceptions of digital fashion technology was identified. Using Textom, we collected 8,144 texts after data refinement, conducted frequency and centrality analyses, and visualized the results with word clouds and N-grams. A matrix was also generated from the 70 most frequent words, and a structural equivalence analysis was performed. The results were presented with network visualizations and dendrograms. Fashion, digital, and technology were the most frequently mentioned topics, and the frequencies of platform, digital transformation, and start-ups were also high. Through clustering, four clusters of marketing were formed using fashion, digital technology, startups, and augmented reality/virtual reality technology. Future research on startups and smart factories with technologies based on stable platforms is needed. The results of this study contribute to increasing the fashion industry's knowledge of digital fashion technology and can serve as a foundation for further research on related topics.