• Title/Summary/Keyword: Learning data set

Search Result 1,114, Processing Time 0.027 seconds

Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

  • Lee, Minchul;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.183-203
    • /
    • 2018
  • News articles are the most suitable medium for examining the events occurring at home and abroad. Especially, as the development of information and communication technology has brought various kinds of online news media, the news about the events occurring in society has increased greatly. So automatically summarizing key events from massive amounts of news data will help users to look at many of the events at a glance. In addition, if we build and provide an event network based on the relevance of events, it will be able to greatly help the reader in understanding the current events. In this study, we propose a method for extracting event networks from large news text data. To this end, we first collected Korean political and social articles from March 2016 to March 2017, and integrated the synonyms by leaving only meaningful words through preprocessing using NPMI and Word2Vec. Latent Dirichlet allocation (LDA) topic modeling was used to calculate the subject distribution by date and to find the peak of the subject distribution and to detect the event. A total of 32 topics were extracted from the topic modeling, and the point of occurrence of the event was deduced by looking at the point at which each subject distribution surged. As a result, a total of 85 events were detected, but the final 16 events were filtered and presented using the Gaussian smoothing technique. We also calculated the relevance score between events detected to construct the event network. Using the cosine coefficient between the co-occurred events, we calculated the relevance between the events and connected the events to construct the event network. Finally, we set up the event network by setting each event to each vertex and the relevance score between events to the vertices connecting the vertices. The event network constructed in our methods helped us to sort out major events in the political and social fields in Korea that occurred in the last one year in chronological order and at the same time identify which events are related to certain events. Our approach differs from existing event detection methods in that LDA topic modeling makes it possible to easily analyze large amounts of data and to identify the relevance of events that were difficult to detect in existing event detection. We applied various text mining techniques and Word2vec technique in the text preprocessing to improve the accuracy of the extraction of proper nouns and synthetic nouns, which have been difficult in analyzing existing Korean texts, can be found. In this study, the detection and network configuration techniques of the event have the following advantages in practical application. First, LDA topic modeling, which is unsupervised learning, can easily analyze subject and topic words and distribution from huge amount of data. Also, by using the date information of the collected news articles, it is possible to express the distribution by topic in a time series. Second, we can find out the connection of events in the form of present and summarized form by calculating relevance score and constructing event network by using simultaneous occurrence of topics that are difficult to grasp in existing event detection. It can be seen from the fact that the inter-event relevance-based event network proposed in this study was actually constructed in order of occurrence time. It is also possible to identify what happened as a starting point for a series of events through the event network. The limitation of this study is that the characteristics of LDA topic modeling have different results according to the initial parameters and the number of subjects, and the subject and event name of the analysis result should be given by the subjective judgment of the researcher. Also, since each topic is assumed to be exclusive and independent, it does not take into account the relevance between themes. Subsequent studies need to calculate the relevance between events that are not covered in this study or those that belong to the same subject.

Optimal supervised LSA method using selective feature dimension reduction (선택적 자질 차원 축소를 이용한 최적의 지도적 LSA 방법)

  • Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
    • Science of Emotion and Sensibility
    • /
    • v.13 no.1
    • /
    • pp.47-60
    • /
    • 2010
  • Most of the researches about classification usually have used kNN(k-Nearest Neighbor), SVM(Support Vector Machine), which are known as learn-based model, and Bayesian classifier, NNA(Neural Network Algorithm), which are known as statistics-based methods. However, there are some limitations of space and time when classifying so many web pages in recent internet. Moreover, most studies of classification are using uni-gram feature representation which is not good to represent real meaning of words. In case of Korean web page classification, there are some problems because of korean words property that the words have multiple meanings(polysemy). For these reasons, LSA(Latent Semantic Analysis) is proposed to classify well in these environment(large data set and words' polysemy). LSA uses SVD(Singular Value Decomposition) which decomposes the original term-document matrix to three different matrices and reduces their dimension. From this SVD's work, it is possible to create new low-level semantic space for representing vectors, which can make classification efficient and analyze latent meaning of words or document(or web pages). Although LSA is good at classification, it has some drawbacks in classification. As SVD reduces dimensions of matrix and creates new semantic space, it doesn't consider which dimensions discriminate vectors well but it does consider which dimensions represent vectors well. It is a reason why LSA doesn't improve performance of classification as expectation. In this paper, we propose new LSA which selects optimal dimensions to discriminate and represent vectors well as minimizing drawbacks and improving performance. This method that we propose shows better and more stable performance than other LSAs' in low-dimension space. In addition, we derive more improvement in classification as creating and selecting features by reducing stopwords and weighting specific values to them statistically.

  • PDF

An Analysis of the Use of Media Materials in School Health Education and Related Factors in Korea (학과보건교육에서의 매체활용실태 및 영향요인 분석)

  • Kim, Young-Im;Jung, Hye-Sun;Ahn, Ji-Young;Park, Jung-Young;Park, Eun-Ok
    • Journal of the Korean Society of School Health
    • /
    • v.12 no.2
    • /
    • pp.207-215
    • /
    • 1999
  • The objectives of this study are to explain the use of media materials in school health education with other related factors in elementary, middle, and high schools in Korea. The data were collected by questionnaires from June to September in 1998. The number of subjects were 294 school nurses. The PC-SAS program was used for statistical analysis such as percent distribution, chi-squared test, spearman correlation test, and logistic regression. The use of media materials in health education has become extremely common. Unfortunately, much of the early materials were of poor production quality, reflected low levels of interest, and generally did little to enhance health education programming. A recent trend in media materials is a move away from the fact filled production to a more affective, process-oriented approach. There is an obvious need for health educators to use high-quality, polished productions in order to counteract the same levels of quality used by commercial agencies that often promote "unhealthy" lifestyles. Health educators need to be aware of the advantages and disadvantages of the various forms of media. Selecting media materials should be based on more than cost, availability, and personal preference. Selection should be based on the goal of achieving behavioral objectives formulated before the review process begins. The decision to use no media materials rather than something of dubious quality usually be the right decision. Poor-quality, outdated, or boring materials will usually have a detrimental effect on the presentation. Media materials should be viewed as vehicles to enhance learning, not products that will stand in isolation. Process of materials is an essential part of the educational process. The major results were as follows : 1. The elementary schools used the materials more frequently. But the production rate of media materials was not enough. The budget was too small for a wide use of media materials in school health education. These findings suggest that all schools have to increase the budget of health education programs. 2. Computers offer an incredibly diverse set of possibilities for use in health education, ranging from complicated statistical analysis to elementary-school-level health education games. But the use rate of this material was not high. The development of related software is essential. Health educators would be well advised to develop a basic operating knowledge of media equipment. 3. In this study, the most effective materials were films in elementary school and videotapes in middle and high school. Film tends to be a more emotive medium than videotape. The difficulties of media selection involved the small amount of extant educational materials. Media selection is a multifaceted process and should be based on a combination of sound principles. 4. The review of material use following student levels showed that the more the contents were various, the more the use rate was high. 5. Health education videotapes and overhead projectors proved the most plentiful and widest media tools. The information depicted was more likely to be current. As a means to display both text and graphic information, this instructional medium has proven to be both effective and enduring. 6. An analysis of how effective the quality of school nurse and school use of media materials shows a result that is not complete (p=0.1113). But, the budget of health education is a significant variable. The increase of the budget therefore is essential to effective use of media materials. From these results it is recommended that various media materials be developed and be wide used.

  • PDF

The impact of functional brain change by transcranial direct current stimulation effects concerning circadian rhythm and chronotype (일주기 리듬과 일주기 유형이 경두개 직류전기자극에 의한 뇌기능 변화에 미치는 영향 탐색)

  • Jung, Dawoon;Yoo, Soomin;Lee, Hyunsoo;Han, Sanghoon
    • Korean Journal of Cognitive Science
    • /
    • v.33 no.1
    • /
    • pp.51-75
    • /
    • 2022
  • Transcranial direct current stimulation (tDCS) is a non-invasive brain stimulation that is able to alter neuronal activity in particular brain regions. Many studies have researched how tDCS modulates neuronal activity and reorganizes neural networks. However it is difficult to conclude the effect of brain stimulation because the studies are heterogeneous with respect to the stimulation parameter as well as individual difference. It is not fully in agreement with the effects of brain stimulation. In particular few studies have researched the reason of variability of brain stimulation in response to time so far. The study investigated individual variability of brain stimulation based on circadian rhythm and chronotype. Participants were divided into two groups which are morning type and evening type. The experiment was conducted by Zoom meeting which is video meeting programs. Participants were sent experiment tool which are Muse(EEG device), tdcs device, cell phone and cell phone holder after manuals for experimental equipment were explained. Participants were required to make a phone in frount of a camera so that experimenter can monitor online EEG data. Two participants who was difficult to use experimental devices experimented in a laboratory setting where experimenter set up devices. For all participants the accuracy of 98% was achieved by SVM using leave one out cross validation in classification in the the effects of morning stimulation and the evening stimulation. For morning type, the accuracy of 92% and 96% was achieved in classification in the morning stimulation and the evening stimulation. For evening type, it was 94% accuracy in classification for the effect of brain stimulation in the morning and the evening. Feature importance was different both in classification in the morning stimulation and the evening stimulation for morning type and evening type. Results indicated that the effect of brain stimulation can be explained with brain state and trait. Our study results noted that the tDCS protocol for target state is manipulated by individual differences as well as target state.