• Title/Summary/Keyword: Big data analysis method


An exploratory study for the development of an education framework for supporting children's development in the convergence of "art activity" and "language activity": Focused on Text mining method ('미술'과 '언어' 활동 융합형의 아동 발달지원 교육 프레임워크 개발을 위한 탐색적 연구: 텍스트 마이닝을 중심으로)

  • Park, Yunmi; Kim, Sijeong
    • Journal of the Korea Convergence Society / v.12 no.3 / pp.297-304 / 2021
  • This study aims not only to draw on the visual thinking-oriented approach used in established art therapy and education, but also to integrate language education and therapeutic approaches to support the development of school-age children. To this end, text mining techniques were applied to identify areas where the distinct domains of language and art can be integrated. The research followed the procedure of background research, preliminary database construction, text screening, database pre-processing and verification, stop-word removal, text mining analysis, and derivation of convergence areas. The results identify convergence areas related to regional, communication, and learning functions; problem solving and the sensory organs; art and intelligence; information and communication; home and disability; and topics, conceptualization, peer relations, integration, reorganization, and attitudes. In conclusion, this study is meaningful in that it establishes a framework for designing activity-centered convergence programs of art and language and attempts a holistic approach to supporting child development.
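
As a rough illustration of the stop-word removal and keyword-frequency steps in a pipeline like the one above, the following sketch uses scikit-learn's CountVectorizer; the documents and stop-word list are placeholders, not the study's actual Korean-language database.

```python
# A sketch of stop-word removal and keyword-frequency counting in a text
# mining pipeline; the documents and stop-word list below are placeholders.
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "art activity supports the language development of school age children",
    "language activity and art therapy support child development",
]
stop_words = ["and", "of", "the"]  # hypothetical stop-word list

vectorizer = CountVectorizer(stop_words=stop_words)
term_matrix = vectorizer.fit_transform(documents)

# Aggregate term frequencies across the corpus to surface candidate convergence keywords.
frequencies = dict(zip(vectorizer.get_feature_names_out(),
                       term_matrix.sum(axis=0).A1))
for word, count in sorted(frequencies.items(), key=lambda item: -item[1])[:10]:
    print(word, count)
```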

A Study on Popular Sentiment for Generation MZ: Through social media (SNS) sentiment analysis (MZ세대에 대한 대중감성 연구: 소셜미디어(SNS) 감성 분석을 통해)

  • Myung-suk Ann
    • The Journal of the Convergence on Culture Technology / v.9 no.1 / pp.19-26 / 2023
  • In this study, public sentiment toward the 'MZ generation' was examined through a social media big data sentiment analysis method. For the analysis, consumer SNS texts were examined, and positive and negative emotional factors were identified by classifying the sentiments expressed toward the MZ generation from outside the generation and by the generation itself. Overall, positive emotions of liking and interest toward the 'MZ generation' accounted for 72.1%, higher than the negative ratio of 27.9%. Among the positive sentiments, the older generation showed 'a favorable feeling toward the individuality and confidence of the MZ generation' and 'interest in the MZ generation as a group with new values'. The MZ generation itself, in turn, felt favorably about 'being a bold, youthful, and individualistic generation of their own' and about 'small growthism'. Negative sentiments from outside the MZ generation were 'concern about the MZ generation's avoidance of marriage, employment difficulties, debt-financed investment, and tendency to resign', 'dislike of being treated as kkondae by the MZ generation', and 'difficulty in talking with the MZ generation'. The negative emotions felt by the MZ generation itself were 'rejection of generalization', 'rejection of generational and gender conflict', 'rejection of competition harsher than that faced by the older generation', 'a sense of relative failure in an era of wealth', and 'sadness at living under a predicted climate disaster'. Therefore, the older generation should view members of the MZ generation not as a monolithic group but as individuals, and conflicts should be alleviated through intergenerational understanding and empathy. Community-level consideration is also needed to address generational conflict, gender conflict, and environmental problems.
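
The abstract does not detail the sentiment analysis method, so the following is only a minimal lexicon-based sketch of classifying SNS posts as positive or negative and computing the overall sentiment ratio; the posts and the lexicons are invented placeholders, not the study's data or method.

```python
# A lexicon-based sentiment-ratio sketch; posts and lexicons are placeholders.
positive_words = {"like", "interest", "favorable", "young", "bold"}
negative_words = {"hate", "conflict", "rejection", "difficult", "concern"}

posts = [
    "i like the bold individuality of the mz generation",
    "it is difficult to talk to the mz generation",
    "interest in the new values of the mz generation",
]

def classify(post):
    # Count positive and negative lexicon hits and take the sign of the difference.
    tokens = post.split()
    score = sum(t in positive_words for t in tokens) - sum(t in negative_words for t in tokens)
    return "positive" if score >= 0 else "negative"

labels = [classify(p) for p in posts]
positive_ratio = labels.count("positive") / len(labels)
print(f"positive: {positive_ratio:.1%}, negative: {1 - positive_ratio:.1%}")
```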

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok; Yang, Seok Woo; Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.105-122 / 2019
  • Dimensionality reduction is one of the methods for handling big data in text mining. For dimensionality reduction, we should consider the density of the data, which has a significant influence on the performance of sentence classification. High-dimensional data require a large amount of computation and can lead to high computational cost and overfitting in the model. Thus, a dimension reduction process is necessary to improve model performance. Diverse methods have been proposed, ranging from simply reducing noise in the data, such as misspellings or informal text, to incorporating semantic and syntactic information. In addition, the representation and selection of text features affect the performance of the classifier in sentence classification, one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data in the observation space. Existing methods utilize various algorithms for dimensionality reduction, such as feature extraction and feature selection. Beyond these algorithms, word embeddings, which learn low-dimensional vector space representations of words capturing semantic and syntactic information, are also utilized. To improve performance, recent studies have suggested methods in which the word dictionary is modified according to the positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations: once a feature selection algorithm identifies unimportant words, words similar to those words are also assumed to have little impact on sentence classification. This study proposes two approaches to more accurate classification, which conduct selective word elimination under specific rules and construct word embeddings based on Word2Vec. To select words of low importance from the text, we use the information gain algorithm to measure importance and cosine similarity to search for similar words. First, we eliminate words with comparatively low information gain values from the raw text and build word embeddings. Second, we additionally remove words that are similar to the words with low information gain values and build word embeddings. Finally, the filtered text and word embeddings are applied to deep learning models: a Convolutional Neural Network and an attention-based bidirectional LSTM. This study uses customer reviews of Kindle products on Amazon.com, IMDB, and Yelp as datasets and classifies each dataset using the deep learning models. Reviews that received more than five helpful votes and whose ratio of helpful votes exceeded 70% were classified as helpful reviews; because Yelp only shows the number of helpful votes, we extracted 100,000 reviews with more than five helpful votes by random sampling from 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters, was applied to each dataset. To evaluate the proposed methods, we compared their performance with Word2Vec and GloVe word embeddings built with all the words. One of the proposed methods outperforms the embeddings that use all the words: removing unimportant words improves performance, but removing too many words lowers it.
For future research, it is necessary to consider diverse preprocessing strategies and an in-depth analysis of word co-occurrence for measuring similarity values among words. In addition, since the proposed method was applied only with Word2Vec, other embedding methods such as GloVe, fastText, and ELMo could be combined with the proposed elimination methods, making it possible to identify suitable combinations of word embedding and elimination methods.
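
A minimal sketch of the selective word-elimination idea described above, under stated assumptions: information gain is approximated with scikit-learn's mutual information estimator, gensim's Word2Vec provides the similarity search, and the reviews, labels, and thresholds are placeholders rather than the paper's settings.

```python
# A sketch of selective word elimination before embedding: low-information-gain
# words, and words similar to them under Word2Vec cosine similarity, are removed
# from the corpus before retraining the embedding.
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif

reviews = ["the plot was wonderful and moving",
           "terrible pacing and a boring plot",
           "wonderful characters and a moving story",
           "boring and terrible not helpful at all"]
labels = [1, 0, 1, 0]  # 1 = helpful review, 0 = not helpful

# 1) Score each word's importance against the label (mutual information as a stand-in for IG).
vec = CountVectorizer()
X = vec.fit_transform(reviews)
gains = mutual_info_classif(X, labels, discrete_features=True, random_state=0)
low_ig = {w for w, g in zip(vec.get_feature_names_out(), gains) if g < 0.05}

# 2) Expand the elimination set with words similar to the low-importance words.
tokenized = [r.split() for r in reviews]
w2v = Word2Vec(tokenized, vector_size=50, window=3, min_count=1, sg=1)
to_remove = set(low_ig)
for w in low_ig:
    to_remove.update(s for s, sim in w2v.wv.most_similar(w, topn=3) if sim > 0.9)

# 3) Filter the corpus and retrain the embedding on the remaining words;
#    the filtered text would then feed a CNN or attention-based BiLSTM classifier.
filtered = [[t for t in tokens if t not in to_remove] for tokens in tokenized]
filtered_w2v = Word2Vec(filtered, vector_size=50, window=3, min_count=1, sg=1)
print(sorted(to_remove))
```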

Semantic Visualization of Dynamic Topic Modeling (다이내믹 토픽 모델링의 의미적 시각화 방법론)

  • Yeon, Jinwook; Boo, Hyunkyung; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.131-154 / 2022
  • Recently, research on unstructured data analysis has been actively conducted with the development of information and communication technology. In particular, topic modeling is a representative technique for discovering core topics from massive text data. In the early stages of topic modeling, most studies focused only on topic discovery. As the field matured, studies on how topics change over time began to be carried out, and interest in dynamic topic modeling, which handles changes in the keywords constituting a topic, is also increasing. Dynamic topic modeling identifies major topics from the data of the initial period and manages the change and flow of topics by using the topic information of the previous period to derive topics in subsequent periods. However, the results of dynamic topic modeling are very difficult to understand and interpret: traditional results simply reveal changes in keywords and their rankings, which is insufficient to show how the meaning of a topic has changed. Therefore, in this study, we propose a method to visualize topics by period that reflects the meaning of the keywords in each topic, together with a method for intuitively interpreting changes in topics and the relationships among topics. The detailed method of visualizing topics by period is as follows. In the first step, dynamic topic modeling is applied to derive the top keywords of each period and their weights from the text data. In the second step, we derive vectors for the top keywords of each topic from a pre-trained word embedding model and perform dimension reduction on the extracted vectors; we then formulate a semantic vector for each topic by computing the weighted sum of the keyword vectors, using each keyword's topic weight. In the third step, we visualize the semantic vector of each topic using matplotlib and analyze the relationships among the topics based on the visualized result. The change of a topic can be interpreted as follows: from the result of dynamic topic modeling, we identify the top five rising and top five falling keywords for each period to show how the topic changes. Many existing topic visualization studies visualize the keywords of each topic, but the approach proposed in this study differs in that it attempts to visualize each topic itself. To evaluate the practical applicability of the proposed methodology, we performed an experiment on 1,847 abstracts of artificial intelligence-related papers, divided into three periods (2016-2017, 2018-2019, 2020-2021). We selected seven topics based on the coherence score and utilized a pre-trained Word2Vec word embedding model trained on Wikipedia. Based on the proposed methodology, we generated a semantic vector for each topic and, by reflecting the meaning of the keywords, visualized and interpreted the topics by period. Through these experiments, we confirmed that the rise and fall of a keyword's topic weight can be usefully employed to interpret the semantic change of the corresponding topic and to grasp the relationships among topics.
In this study, to overcome the limitations of dynamic topic modeling results, we used word embedding and dimension reduction techniques to visualize topics by period. The results of this study are meaningful in that they broaden the scope of topic understanding through the visualization of dynamic topic modeling results. In addition, an academic contribution is that the study lays the foundation for follow-up studies using various word embeddings and dimensionality reduction techniques to improve the performance of the proposed methodology.
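
A minimal sketch of the visualization step described above: top-keyword vectors are taken from a pre-trained embedding, reduced to two dimensions, combined into a semantic vector per topic via a weighted sum using each keyword's topic weight, and plotted with matplotlib. The keyword/weight table is hypothetical, PCA stands in for the unspecified dimension reduction method, and a small pre-trained GloVe model from gensim-data substitutes for the Wikipedia-trained Word2Vec model used in the study.

```python
# A sketch of semantic topic-vector visualization under the assumptions above.
import gensim.downloader as api
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

kv = api.load("glove-wiki-gigaword-50")  # small pre-trained embedding as a stand-in

# Hypothetical dynamic-topic-modeling output: (keyword, topic weight) per topic and period.
topics = {
    "topic1 (2016-2017)": [("neural", 0.30), ("network", 0.25), ("image", 0.20)],
    "topic1 (2020-2021)": [("language", 0.35), ("model", 0.30), ("translation", 0.20)],
}

# Reduce the keyword vectors to 2D, then form each topic's semantic vector
# as the weighted sum of its reduced keyword vectors.
keywords = [w for kws in topics.values() for w, _ in kws]
reduced = PCA(n_components=2).fit_transform(np.vstack([kv[w] for w in keywords]))
coords = dict(zip(keywords, reduced))

for name, kws in topics.items():
    x, y = sum(weight * coords[w] for w, weight in kws)
    plt.scatter(x, y)
    plt.annotate(name, (x, y))
plt.title("Semantic position of a topic by period")
plt.show()
```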

Analysis on the Uniformity of Temperature and Humidity According to Environment Control in Tomato Greenhouses (토마토 재배 온실의 환경조절에 따른 온습도 균일도 분석)

  • Nam, Sang-Woon; Kim, Young-Shik
    • Journal of Bio-Environment Control / v.18 no.3 / pp.215-224 / 2009
  • A survey on the actual state of heating, cooling, ventilation, and air flow, together with experimental measurement of temperature and humidity distribution in tomato greenhouses, was performed to provide fundamental data for the development of air-flow control technology. In the single-span plastic houses, which accounted for most of the 136 tomato greenhouses surveyed, roof windows, ventilation fans, and air-flow fans were installed at a low rate, and the installation specifications of those facilities varied widely. No farms had installed greenhouse cooling facilities. In the hot-air heating systems, which accounted for most of the heating types, the installation specifications of the hot-air ducts also varied widely, and the exhaust air temperature and wind speed in the ducts differed considerably depending on the distance from the heater. The maximum difference is commonly used as an indicator of whether the temperature distribution is uniform; however, if the temperature gradient is not identical throughout the greenhouse, it cannot represent the uniformity. We therefore analyzed the relation between the maximum difference and the uniformity of temperature and humidity distribution, where the uniformity was calculated using the mean and standard deviation of data from 12 measuring points. The two measures showed a high correlation, but the relationship was linear in the daytime and quadratic at night, and the uniformity of temperature and humidity distribution differed considerably according to greenhouse type and heating method. Installation guidelines for ventilation and air-flow fans, wider adoption of greenhouse cooling technology for stable year-round production, and improvement of air ducts and heating systems are needed.
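
A minimal sketch, under stated assumptions, of comparing the maximum difference with a dispersion-based uniformity index computed from the mean and standard deviation of 12 measuring points; the readings are placeholders and the exact uniformity formula used in the paper may differ.

```python
# Compare the max-difference indicator with a CV-based uniformity index (assumed form).
import numpy as np

# Temperatures (degrees C) at 12 measuring points in a greenhouse at one time step.
temps = np.array([21.3, 21.8, 22.0, 21.1, 22.4, 21.6,
                  22.1, 21.9, 20.8, 22.3, 21.5, 21.7])

max_difference = temps.max() - temps.min()   # commonly used uniformity indicator
mean, std = temps.mean(), temps.std(ddof=1)
uniformity = 1.0 - std / mean                # assumed coefficient-of-variation-based index

print(f"max difference: {max_difference:.2f} C, uniformity: {uniformity:.3f}")
```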

The 1:5,000 Forest Soil Map: Current Status and Future Directions (1:5,000 산림입지토양도의 제작과 활용 및 향후 발전 방향)

  • Kwon, Minyoung; Kim, Gaeun; Jeong, Jinhyun; Choi, Changeun; Park, Gwansoo; Kim, Choonsig; Son, Yowhan
    • Journal of Korean Society of Forest Science / v.110 no.4 / pp.479-495 / 2021
  • To improve the efficient management of forest resources, it is necessary to create a forest soil map, which serves as a comprehensive database of forest lands. Although a 1:25,000 scale forest site map has been used in Korea, the need has been identified for a large-scale, high-precision forest soil map with information on forest lands specialized for individual purposes. Moreover, to keep pace with advances in forest management and the transition to a digital society, it is essential to develop a method for constructing new forest soil maps that diversifies their use. Therefore, this paper presents the development process and uses of the 1:5,000 forest soil map and proposes future directions. National maps showing soil type, depth, and texture were produced based on surveys and analyses of forest soils, following the Forest Land Soil Map (1:5,000) Production Standard Manual. In addition, the forest soil map data served as the basis for various derived maps that can be used to predict and prevent forest disasters and to evaluate environmental capacity. Accordingly, based on the results of the current study, ways are proposed to provide appropriate information for achieving the national forest plan, securing forestry big data, and accomplishing sustainable forest management in line with the national development plan.

Molecular epidemiologic trends of norovirus and rotavirus infection and relation with climate factors: Cheonan, Korea, 2010-2019 (노로바이러스 및 로타바이러스 감염의 역학 및 기후요인과의 관계: 천안시, 2010-2019)

  • Oh, Eun Ju; Kim, Jang Mook; Kim, Jae Kyung
    • Journal of Digital Convergence / v.18 no.12 / pp.425-434 / 2020
  • Background: Viral infection outbreaks are an emerging public health concern. They often exhibit seasonal patterns that could be predicted by applying big data and bioinformatic analyses. Purpose: The purpose of this study was to identify trends in diarrhea-causing viruses such as rotavirus (Gr.A), norovirus G-I, and norovirus G-II in Cheonan, Korea. The identified related factors may be used to predict trends in these viruses and prevent infection. Method: A retrospective analysis of 4,009 fecal samples collected from June 2010 to December 2019 was carried out at Dankook University Hospital in Cheonan. Reverse transcription-PCR (RT-PCR) was employed to identify virus strains. Information on seasonal patterns of infection was extracted and compared with local weather data. Results: Of the 4,009 fecal samples tested using multiplex RT-PCR (mRT-PCR), 985 were positive for infection with Gr.A, G-I, or G-II. Of these 985 cases, 95.3% (n = 939) were under 10 years of age; Gr.A, G-I, and G-II all showed high infection rates in patients under 10 years of age. Student's t-test showed a significant relationship between the detection rate of Gr.A and relative humidity, and the detection rate of G-II was significantly related to wind-chill temperature. Conclusion: Climate factors differentially modulate rotavirus and norovirus infection patterns. These observations provide novel insights into the seasonal influences on the pathogenesis of Gr.A, G-I, and G-II.
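
As a rough illustration of relating a monthly detection rate to a climate factor, the following sketch splits months at the median relative humidity and compares detection rates with Student's t-test; the values are invented placeholders, not the study's surveillance or weather data.

```python
# Compare virus detection rates between high- and low-humidity months (placeholder data).
import numpy as np
from scipy import stats

detection_rate = np.array([0.31, 0.28, 0.22, 0.15, 0.10, 0.08,
                           0.07, 0.09, 0.12, 0.18, 0.25, 0.30])  # monthly Gr.A rate
humidity = np.array([58, 55, 54, 56, 62, 70,
                     78, 75, 69, 63, 60, 57])                    # monthly mean RH (%)

high = detection_rate[humidity >= np.median(humidity)]
low = detection_rate[humidity < np.median(humidity)]
t_stat, p_value = stats.ttest_ind(high, low, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```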

Enterprise Human Resource Management using Hybrid Recognition Technique (하이브리드 인식 기술을 이용한 전사적 인적자원관리)

  • Han, Jung-Soo; Lee, Jeong-Heon; Kim, Gui-Jung
    • Journal of Digital Convergence / v.10 no.10 / pp.333-338 / 2012
  • Human resource management is undergoing various changes with the development of IT. In particular, whereas traditional HRM relied on non-scientific methods such as group-level management, physical workplaces, fixed working hours, and personal contacts, current enterprise human resource management (e-HRM) differs greatly in that it manages at the individual level and supports virtual workspaces (for example, smart work centers and working from home), flexible and elastic working hours, and scientific analysis and management based on computerized statistical data. Accordingly, in response to these environmental changes, companies have introduced a variety of techniques, such as RFID cards and fingerprint time and attendance systems, to build more efficient and strategic human resource management systems. In this paper, a time and attendance and access control management system based on multi-camera 2D and 3D face recognition technology was developed for efficient enterprise human resource management. Existing 2D face recognition technology is sensitive to lighting and pose; despite this limited readability, a recognition rate of more than 90% was achieved. In addition, because 3D face recognition is computationally complex, hybrid recognition accuracy and speed were improved by using 2D and 3D recognition in parallel.
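
The paper's hybrid 2D/3D recognition pipeline is not described in enough detail to reproduce, so the following sketch only illustrates the 2D detection front-end of a face-recognition time-and-attendance system, assuming OpenCV's bundled Haar cascade; the employee ID and log file are hypothetical.

```python
# 2D face-detection front-end for a time-and-attendance log (illustrative only).
import csv
from datetime import datetime

import cv2

# OpenCV ships a pre-trained frontal-face Haar cascade with opencv-python.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def log_attendance(frame, employee_id, log_path="attendance.csv"):
    """Detect a face in a camera frame and, if found, record a clock-in event."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return False
    # In the actual system, the detected face would be passed to the 2D/3D
    # recognition models; here the identity is simply assumed to be known.
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([employee_id, datetime.now().isoformat()])
    return True

cap = cv2.VideoCapture(0)  # first attached camera
ok, frame = cap.read()
if ok:
    log_attendance(frame, employee_id="E001")
cap.release()
```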

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim; Kim, Ji Hui; Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.1-21 / 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, text mining has been employed to discover new market and technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies, and there has been continuous demand in various fields for market information at the level of specific products. However, such information has generally been provided at the industry level or for broad categories based on classification standards, making it difficult to obtain specific and appropriate information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than previously offered. We applied the Word2Vec algorithm, a neural network-based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows. First, data related to product information is collected, refined, and restructured into a form suitable for the Word2Vec model. Next, the preprocessed data is embedded into a vector space by Word2Vec, and product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data of the extracted products is summed to estimate the market size of each product group. As experimental data, product-name text from Statistics Korea's microdata (345,103 cases) was mapped into a multidimensional vector space by Word2Vec training. We performed parameter optimization and then applied a vector dimension of 300 and a window size of 15 in further experiments. We employed index words of the Korean Standard Industry Classification (KSIC) as a product-name dataset to cluster product groups more efficiently; product names similar to KSIC index words were extracted based on cosine similarity. The market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of selected items; the Pearson correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or multiple assumptions. In addition, the level of market category can be easily and efficiently adjusted according to the purpose of information use by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application, since it can address unmet needs for detailed market size information in the public and private sectors.
Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis reports published by private firms. A limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantics-based word embedding module can be advanced by imposing a proper ordering on the preprocessed dataset or by combining another measure such as Jaccard similarity with Word2Vec. The product group clustering method can also be replaced with other types of unsupervised machine learning algorithms. Our group is currently working on follow-up studies, which we expect will further improve the performance of the basic model conceptually proposed in this study.
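
A minimal sketch of the bottom-up estimation idea described above, assuming gensim's Word2Vec: product names are embedded, names similar to a KSIC index word are grouped by cosine similarity, and their sales are summed. The product-name corpus, sales table, index word, and similarity threshold are placeholders, while the vector dimension (300) and window size (15) follow the abstract.

```python
# Bottom-up market-size estimation by product group (placeholder data throughout).
import pandas as pd
from gensim.models import Word2Vec

# Each "sentence" is a tokenized product name reported by a company.
product_names = [["organic", "green", "tea"], ["green", "tea", "powder"],
                 ["instant", "coffee", "mix"], ["drip", "coffee"]]
model = Word2Vec(product_names, vector_size=300, window=15, min_count=1, sg=1)

# Sales per product token (placeholder); the study used company-level sales data.
sales = pd.DataFrame({"product": ["tea", "coffee", "powder", "mix"],
                      "sales":   [120.0, 340.0, 55.0, 80.0]})

def estimate_market_size(ksic_index_word, threshold=0.5):
    """Sum the sales of products whose names are similar to a KSIC index word."""
    similar = {w for w, score in model.wv.most_similar(ksic_index_word, topn=50)
               if score >= threshold}
    similar.add(ksic_index_word)
    return sales.loc[sales["product"].isin(similar), "sales"].sum()

print(estimate_market_size("tea"))
```

Adjusting the `threshold` parameter widens or narrows the product group, which corresponds to the abstract's point that the level of market category can be tuned by changing the cosine similarity threshold.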

An Analysis on the Expert Opinions of Future City Scenarios (미래도시 전망 분석)

  • Jo, Sung Su; Baek, Hyo Jin; Han, Hoon; Lee, Sang Ho
    • Journal of the Korean Regional Science Association / v.35 no.3 / pp.59-76 / 2019
  • This study aims to develop scenarios for future cities and to validate them using a Delphi method. The future city scenarios were derived for urban structure, land use, transportation, and urban infrastructure and development, using big data analysis, environmental scanning techniques, and a literature review. The Delphi survey interviewed 24 scholars and experts across several nations, including Korea, the USA, the UK, Japan, China, Australia, and India. The survey was designed to test the future city scenarios on a 5-point Likert scale and also asked when each scenario is likely to occur, in terms of the near, mid, or far future. The results of the Delphi survey reveal the following points. First, for future urban structure, urban concentration is anticipated to continue, with higher-density living in global megacities in the near future; in the mid-future, small and medium-sized cities may shrink. Second, land use in the near future is expected to feature increased space sharing and mixed or vertically layered land use, and underground space is likely to be extended in the mid-future. Third, in the near future, transport and infrastructure are expected to feature ICT-embedded integrated platforms and public and private smart transport. Finally, the Delphi results show that transit-oriented development (TOD) becomes a development norm, with greater emphasis on energy and the environment.