• Title/Summary/Keyword: 분류시스템 (classification system)

Analysis of News Agenda Using Text mining and Semantic Network Analysis: Focused on COVID-19 Emotions (텍스트 마이닝과 의미 네트워크 분석을 활용한 뉴스 의제 분석: 코로나 19 관련 감정을 중심으로)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.47-64 / 2021
  • The global spread of COVID-19 has affected not only many parts of daily life but also many other areas, including the economy and society. As the number of confirmed cases and deaths increases, medical staff and the public are reported to be experiencing psychological problems such as anxiety, depression, and stress. The collective tragedy that accompanies an epidemic raises fear and anxiety, which are known to cause enormous disruptions to the behavior and psychological well-being of many people. Long-term negative emotions can reduce people's immunity and upset their physical balance, so it is essential to understand people's psychological state during COVID-19. This study suggests a method of monitoring media news that reflects the current situation, in which the prolonged COVID-19 pandemic requires psychological as well as physical quarantine. It also shows how a simpler method of analyzing social media networks can be applied to such cases. The aim of this study is to assist health policymakers in fast and complex decision-making processes. News plays a major role in setting the policy agenda. Among the various major media, news headlines are considered important in the field of communication science because they summarize the core content that the media want to convey to their audiences. The news data used in this study were collected using "Bigkinds", which was created by integrating big data technology. With the collected news data, keywords were classified through text mining, and the relationships between words were visualized through semantic network analysis between keywords. Using the KrKwic program, a Korean semantic network analysis tool, text mining was performed and word frequencies were calculated to identify keywords.
The frequency of words appearing in the keywords of articles related to COVID-19 emotions was checked and visualized in a word cloud. 'China', 'anxiety', 'situation', 'mind', 'social', and 'health' appeared with high frequency in relation to COVID-19 emotions. In addition, UCINET, a specialized social network analysis program, was used to analyze connection centrality and perform cluster analysis, and the resulting graph was visualized using NetDraw. The analysis of connection centrality showed that the most central keywords in the keyword-centric network were 'psychology', 'COVID-19', 'blue', and 'anxiety'. The network of co-occurrence frequencies among the keywords appearing in news headlines was visualized as a graph. The thickness of each line in the graph is proportional to the co-occurrence frequency, so two words that frequently appear together are joined by a thick line. The 'COVID-blue' pair is displayed with the thickest line, and the 'COVID-emotion' and 'COVID-anxiety' pairs are displayed with relatively thick lines. 'Blue' in relation to COVID-19 means depression, and it was confirmed that COVID-19 and depression are keywords that deserve attention now. The research methodology used in this study makes it possible to measure social phenomena and changes quickly while reducing costs. By analyzing news headlines, we were able to identify people's feelings and perceptions on issues related to COVID-19 depression and to identify the main agendas to be analyzed by deriving important keywords. By presenting and visualizing the subjects and important keywords related to COVID-19 emotions at a glance, the study can offer medical policy managers a variety of perspectives when they identify and research the phenomenon.
The results are expected to serve as basic data for support, treatment, and service development for psychological quarantine issues related to COVID-19.
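The keyword counting, co-occurrence weighting, and centrality steps described in this abstract can be illustrated with a minimal Python sketch. It is not the KrKwic/UCINET workflow the authors used: the headlines are hypothetical, already-tokenized examples, and "connection centrality" is assumed here to correspond to degree centrality.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already-tokenized headlines (not data from the study).
headlines = [
    ["covid", "blue", "anxiety"],
    ["covid", "blue", "psychology"],
    ["covid", "anxiety", "health"],
    ["psychology", "quarantine", "covid"],
    ["covid", "blue"],
]

# Keyword frequency: number of headlines in which each keyword appears.
frequency = Counter(word for h in headlines for word in set(h))

# Co-occurrence frequency: how often two keywords share a headline.
# This value would drive the line thickness in the network graph.
cooccurrence = Counter()
for h in headlines:
    for a, b in combinations(sorted(set(h)), 2):
        cooccurrence[(a, b)] += 1

# Degree centrality (assumed reading of "connection centrality"):
# number of distinct neighbours divided by (n - 1).
neighbours = {}
for a, b in cooccurrence:
    neighbours.setdefault(a, set()).add(b)
    neighbours.setdefault(b, set()).add(a)
n = len(neighbours)
degree_centrality = {w: len(nb) / (n - 1) for w, nb in neighbours.items()}

# The pair that would be drawn with the thickest line.
strongest_pair, strongest_weight = cooccurrence.most_common(1)[0]
print(strongest_pair, strongest_weight)
print(sorted(degree_centrality.items(), key=lambda kv: -kv[1]))
```

In this toy input the thickest edge joins 'covid' and 'blue', which mirrors the kind of result the abstract reports, but only because the example data were chosen that way.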

The Design Improvement Plan of Seoul Forest Visitor Centers for Little Children (서울시 유아숲체험장의 공간 개선 방안)

  • Kim, Minjung;Jeong, Wookju
    • Journal of the Korean Institute of Landscape Architecture / v.49 no.6 / pp.49-63 / 2021
  • A Forest Visitor Center for Little Children (that is, for preschoolers) is an educational facility that promotes holistic growth through forest experience. It should not be considered complete once specific facilities are installed in the forest environment; it should be a space where preschoolers can play freely in the forest environment itself. This study comprehensively evaluated the current status of the Seoul Forest Visitor Centers for Little Children and suggested space improvement measures to enhance the effectiveness of forest experience. Through a theoretical review, seven spatial elements that enhance the effect of forest experience and six areas that make up outdoor play areas were derived to prepare an analysis table for evaluating the current status, and field surveys were conducted at 24 centers in Seoul. Through expert interviews, the physical status was examined from the perspective of early childhood education and the experiences of users were summarized. The results show that the Seoul Forest Visitor Centers for Little Children can be classified into six types according to location characteristics and spatial structure, each with its own characteristics. The effectiveness of forest experience can be enhanced by identifying and revealing the environmental strengths of individual centers. In the outdoor experiential learning zones, the proportion of exercise play areas was very large. Organizing the forest experience space evenly across the areas would make it possible to provide more diverse experiences to preschoolers. However, the uniform, facility-oriented state of the centers cannot simply be regarded as a factor that lowers the effect of forest experience. The key to increasing the effect of forest experience by inducing creative activities is a spatial composition that considers the surrounding natural environment. Facilities should be a medium that helps preschoolers' interest move into the forest.
This study prepared data for understanding the average physical status of the Seoul Forest Visitor Centers for Little Children and suggested space improvement measures to increase the effectiveness of forest experience. These can be used as basic data for research to improve the quality of the centers, now that about 10 years have passed since the project was implemented.

Knowledge graph-based knowledge map for efficient expression and inference of associated knowledge (연관지식의 효율적인 표현 및 추론이 가능한 지식그래프 기반 지식지도)

  • Yoo, Keedong
    • Journal of Intelligence and Information Systems / v.27 no.4 / pp.49-71 / 2021
  • Users who intend to utilize knowledge to actively solve given problems carry out their work by exploring, both across and in sequence, associated knowledge items that are related to each other by certain criteria, such as content relevance. A knowledge map is a diagram or taxonomy that gives an overview of the knowledge currently managed in a knowledge base, and it supports users' knowledge exploration based on certain relationships between knowledge items. A knowledge map must therefore be expressed in a networked form by linking related knowledge based on certain types of relationships, and should be implemented with technologies or tools specialized in defining and inferring those relationships. To this end, this study suggests a methodology for developing a knowledge graph-based knowledge map using a Graph DB, which is known to be well suited to expressing and inferring relationships between the entities stored in a knowledge base. The procedures of the proposed methodology are modeling graph data; creating nodes, properties, and relationships; and composing knowledge networks by combining identified links between knowledge. Among various Graph DBs, Neo4j is used in this study for its high credibility and applicability, demonstrated through a wide variety of application cases. To examine the validity of the proposed methodology, a knowledge graph-based knowledge map is implemented with the Graph DB, and a performance comparison test is performed by applying a previous study's data to check whether this study's knowledge map can yield the same level of performance as the previous one. The previous study built a process-based knowledge map using ontology technology, which identifies links between related knowledge based on the sequences of tasks that produce or are activated by knowledge.
In other words, since a task is not only activated by knowledge as an input but also produces knowledge as an output, input and output knowledge are linked as a flow by the task. Also, since a business process is composed of related tasks that fulfill the purpose of the process, the knowledge networks within a business process can be derived from the sequences of the tasks composing the process. Therefore, using Neo4j, the processes, tasks, and knowledge under consideration, as well as the relationships among them, are defined as nodes and relationships so that knowledge links can be identified based on the sequences of tasks. The knowledge network obtained by aggregating the identified knowledge links is a knowledge map that functions as a knowledge graph, and its performance therefore needs to be tested against the previous study's validation results. The performance test examines two aspects, the correctness of knowledge links and the possibility of inferring new types of knowledge: the former is examined using 7 questions, and the latter is checked by extracting two new types of knowledge. As a result, the knowledge map constructed through the proposed methodology showed the same level of performance as the previous one, and it handled knowledge definition and knowledge relationship inference more efficiently. Furthermore, compared to the previous study's ontology-based approach, this study's Graph DB-based approach also showed more beneficial functionality in intensively managing only the knowledge of interest, dynamically defining knowledge and relationships to reflect various meanings ranging from situations to purposes, agilely inferring knowledge and relationships through Cypher-based queries, and easily creating new relationships by aggregating existing ones.
This study's artifacts can be applied to implement a user-friendly knowledge exploration function that reflects the user's cognitive process toward associated knowledge, and can further underpin the development of an intelligent knowledge base that expands autonomously by discovering new knowledge and relationships through inference. Moreover, this study has an immediate effect on implementing the networked knowledge map that is essential for contemporary users who are eagerly looking for ways to find the proper knowledge to use.
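The idea that a task links its input knowledge to its output knowledge, and that chains of such links can be followed to infer related knowledge, can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use Neo4j or Cypher, and the process, task names, and knowledge items are hypothetical.

```python
# Hypothetical business process: each task consumes and produces knowledge.
tasks = [
    {"name": "collect_requirements",
     "inputs": ["customer_request"], "outputs": ["requirement_spec"]},
    {"name": "design_solution",
     "inputs": ["requirement_spec"], "outputs": ["design_doc"]},
    {"name": "review_design",
     "inputs": ["design_doc", "requirement_spec"], "outputs": ["review_report"]},
]


def knowledge_links(tasks):
    """Link every input knowledge item to every output of the same task."""
    links = set()
    for task in tasks:
        for src in task["inputs"]:
            for dst in task["outputs"]:
                links.add((src, task["name"], dst))
    return links


def reachable(links, start):
    """Infer all knowledge reachable from `start` by following links."""
    graph = {}
    for src, _, dst in links:
        graph.setdefault(src, set()).add(dst)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


links = knowledge_links(tasks)
downstream = reachable(links, "customer_request")
print(sorted(downstream))
```

In a graph database the same traversal would typically be written as a path query; the set-based version here is meant only to make the link-aggregation logic concrete.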

The prediction of the stock price movement after IPO using machine learning and text analysis based on TF-IDF (증권신고서의 TF-IDF 텍스트 분석과 기계학습을 이용한 공모주의 상장 이후 주가 등락 예측)

  • Yang, Suyeon;Lee, Chaerok;Won, Jonggwan;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.237-262 / 2022
  • There has been a growing interest in IPOs (Initial Public Offerings) due to the profitable returns that IPO stocks can offer to investors. However, IPOs can be speculative investments that may involve substantial risk as well because shares tend to be volatile, and the supply of IPO shares is often highly limited. Therefore, it is crucially important that IPO investors are well informed of the issuing firms and the market before deciding whether to invest or not. Unlike institutional investors, individual investors are at a disadvantage since there are few opportunities for individuals to obtain information on the IPOs. In this regard, the purpose of this study is to provide individual investors with the information they may consider when making an IPO investment decision. This study presents a model that uses machine learning and text analysis to predict whether an IPO stock price would move up or down after the first 5 trading days. Our sample includes 691 Korean IPOs from June 2009 to December 2020. The input variables for the prediction are three tone variables created from IPO prospectuses and quantitative variables that are either firm-specific, issue-specific, or market-specific. The three prospectus tone variables indicate the percentage of positive, neutral, and negative sentences in a prospectus, respectively. We considered only the sentences in the Risk Factors section of a prospectus for the tone analysis in this study. All sentences were classified into 'positive', 'neutral', and 'negative' via text analysis using TF-IDF (Term Frequency - Inverse Document Frequency). Measuring the tone of each sentence was conducted by machine learning instead of a lexicon-based approach due to the lack of sentiment dictionaries suitable for Korean text analysis in the context of finance. 
For this reason, the training set was created by randomly selecting 10% of the sentences from each prospectus, and each of these sentences was read and labeled manually. Then, based on the training set, a Support Vector Machine model was used to predict the tone of the sentences in the test set. Finally, the machine learning model calculated the percentages of positive, neutral, and negative sentences in each prospectus. To predict the price movement of an IPO stock, four different machine learning techniques were applied: Logistic Regression, Random Forest, Support Vector Machine, and Artificial Neural Network. According to the results, models that use both the quantitative variables (using technical analysis) and the prospectus tone variables show higher accuracy than models that use only quantitative variables. More specifically, the prediction accuracy improved by 1.45 percentage points in the Random Forest model, 4.34 percentage points in the Artificial Neural Network model, and 5.07 percentage points in the Support Vector Machine model. Among the techniques tested, the Artificial Neural Network model using both quantitative variables and prospectus tone variables had the highest prediction accuracy, at 61.59%. The results indicate that the tone of a prospectus is a significant factor in predicting the price movement of an IPO stock. In addition, the McNemar test was used to verify whether the difference between the models was statistically significant. The model using only quantitative variables was compared with the model using both the quantitative variables and the prospectus tone variables, and it was confirmed that the predictive performance improved significantly at the 1% significance level.
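The McNemar comparison of two classifiers mentioned above can be sketched as follows. This is a minimal illustration, assuming the continuity-corrected chi-square form of the test (the abstract does not say which variant the authors used); the disagreement counts `b` and `c` are hypothetical, not values from the study.

```python
import math


def mcnemar_test(b, c):
    """Continuity-corrected McNemar test for two paired classifiers.

    b: cases model A classified correctly and model B incorrectly
    c: cases model A classified incorrectly and model B correctly
    Returns (chi-square statistic, two-sided p-value with 1 degree of freedom).
    """
    if b + c == 0:
        raise ValueError("The two models never disagree; the test is undefined.")
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical disagreement counts between a quantitative-only model (A)
# and a model that also uses prospectus tone variables (B).
statistic, p_value = mcnemar_test(b=5, c=20)
print(f"chi2 = {statistic:.2f}, p = {p_value:.4f}")
print("significant at the 1% level:", p_value < 0.01)
```

Only the cases on which the two models disagree enter the statistic, which is why the test suits paired predictions on the same test set.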

Analyzing Different Contexts for Energy Terms through Text Mining of Online Science News Articles (온라인 과학 기사 텍스트 마이닝을 통해 분석한 에너지 용어 사용의 맥락)

  • Oh, Chi Yeong;Kang, Nam-Hwa
    • Journal of Science Education / v.45 no.3 / pp.292-303 / 2021
  • This study identifies the terms frequently used together with energy in online science news articles, as well as the topics of those news reports, to find out how the term energy is used in everyday life and to draw implications for science curriculum and instruction about energy. A total of 2,171 online news articles in the science category, published by 11 major newspaper companies in Korea over one year from March 1, 2018, were selected by using energy as a search term. After natural language processing, a total of 51,224 sentences consisting of 507,901 words were compiled for analysis. Using the R program, term frequency analysis, semantic network analysis, and structural topic modeling were performed. The results show that the terms with exceptionally high frequencies were technology, research, and development, which reflects the nature of news articles that report new findings. In addition, the terms that appeared on average more than once every two articles were industry-related terms (industry, product, system, production, market) and terms that would naturally be expected as energy-related terms, such as 'electricity' and 'environment.' Meanwhile, 'sun', 'heat', 'temperature', and 'power generation', which are frequently used in energy-related science classes, were also among the highest-frequency terms. The network analysis revealed two clusters: terms related to industry and technology, and terms related to basic science and research. The analysis of terms paired with energy also showed that terms related to the use of energy, such as 'energy efficiency,' 'energy saving,' and 'energy consumption,' were the most frequently used. From the 16 topics found, four contexts of energy were drawn: 'high-tech industry,' 'industry,' 'basic science,' and 'environment and health.' The results suggest that introducing the concept of energy degradation as a starting point for energy classes can be effective.
They also show the need to introduce high-tech industries or the context of environment and health into energy learning.

Development of Correction Formulas for KMA AAOS Soil Moisture Observation Data (기상청 농업기상관측망 토양수분 관측자료 보정식 개발)

  • Choi, Sung-Won;Park, Juhan;Kang, Minseok;Kim, Jongho;Sohn, Seungwon;Cho, Sungsik;Chun, Hyenchung;Jung, Ki-Yuol
    • Korean Journal of Agricultural and Forest Meteorology / v.24 no.1 / pp.13-34 / 2022
  • Soil moisture data have been collected at 11 agrometeorological stations operated by the Korea Meteorological Administration (KMA). This study aimed to verify the accuracy of the KMA soil moisture data and to develop a correction formula that can be applied to improve their quality. The soil of each observation field was sampled to analyze the physical properties that affect soil water content. Soil texture was classified as sandy loam or loamy sand at most sites. The bulk density of the soil samples was about 1.5 g/cm³ on average. The content of silt and clay was also closely related to bulk density and water holding capacity. The EnviroSCAN model, which was used as a reference sensor, was calibrated using the self-manufactured "reference soil moisture observation system". The calibrated reference sensor and the KMA field sensor were compared at least three times at each of the 11 sites. Overall, the measured values of the two sensors showed similar trends in fluctuation over time. Still, there were sites where the field sensor gave relatively lower soil moisture values than the reference sensor. A linear correction formula was derived for each site and depth using the range and average of the observed data for the given period. This correction formula improved the agreement between sensor values at the Suwon site. In addition, a detailed approach was developed to estimate the correction value for periods in which a correction formula was not calculated. In summary, correcting soil moisture data at regular intervals, e.g., twice a year, is recommended for all observation sites to improve the quality of soil moisture observation data.
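One plausible reading of "a linear correction formula derived from the range and average of the observed data" is sketched below. The abstract does not give the actual formula, so the scaling rule (slope from the ratio of ranges, intercept from the means) and the sample values are assumptions made purely for illustration.

```python
from statistics import mean


def fit_range_mean_correction(field, reference):
    """Return (slope, intercept) of an assumed linear correction.

    slope matches the range of the field sensor to that of the reference
    sensor; intercept then aligns the two averages.
    """
    field_range = max(field) - min(field)
    if field_range == 0:
        raise ValueError("Field sensor values are constant; slope is undefined.")
    slope = (max(reference) - min(reference)) / field_range
    intercept = mean(reference) - slope * mean(field)
    return slope, intercept


# Hypothetical volumetric water content readings (%) for one site and depth.
field = [10.0, 12.0, 14.0, 16.0]
reference = [15.0, 18.0, 21.0, 24.0]

slope, intercept = fit_range_mean_correction(field, reference)
corrected = [slope * x + intercept for x in field]
print(slope, intercept, corrected)
```

A least-squares fit would be an alternative way to obtain the two coefficients; the range-and-mean version is used here only because it follows the wording of the abstract.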

Transcriptomic Analysis of Triticum aestivum under Salt Stress Reveals Change of Gene Expression (RNA sequencing을 이용한 염 스트레스 처리 밀(Triticum aestivum)의 유전자 발현 차이 확인 및 후보 유전자 선발)

  • Jeon, Donghyun;Lim, Yoonho;Kang, Yuna;Park, Chulsoo;Lee, Donghoon;Park, Junchan;Choi, Uchan;Kim, Kyeonghoon;Kim, Changsoo
    • KOREAN JOURNAL OF CROP SCIENCE / v.67 no.1 / pp.41-52 / 2022
  • 'Keumgang', a Korean wheat cultivar, grows fast and can be cultivated stably. Hexaploid wheat (Triticum aestivum) has moderately high salt tolerance compared to tetraploid wheat (Triticum turgidum L.). However, the molecular mechanisms related to the salt tolerance of hexaploid wheat have not yet been elucidated. In this study, candidate genes related to salt tolerance were identified by investigating the genes that are differentially expressed between the Keumgang variety and the salt-tolerant mutant '2020-s1340'. A total of 85,771,537 reads were obtained after quality filtering using NextSeq 500 Illumina sequencing technology. A total of 23,634,438 reads were aligned with the NCBI Campala Lr22a pseudomolecule v5 reference genome (Triticum aestivum). A total of 282 differentially expressed genes (DEGs) were identified between the two Triticum aestivum materials. These DEGs include genes with functions related to salt tolerance traits, such as 'wall-associated receptor kinase-like 8', 'cytochrome P450', and '6-phosphofructokinase 2'. In addition, the identified DEGs were classified into three categories (biological process, molecular function, and cellular component) using gene ontology analysis. These DEGs were significantly enriched for terms such as 'copper ion transport', 'oxidation-reduction process', and 'alternative oxidase activity'. These results, which were obtained using RNA-seq analysis, will improve our understanding of the salt tolerance of wheat. Moreover, this study will be a useful resource for breeding wheat varieties with improved salt tolerance using molecular breeding technology.

A Checklist to Improve the Fairness in AI Financial Service: Focused on the AI-based Credit Scoring Service (인공지능 기반 금융서비스의 공정성 확보를 위한 체크리스트 제안: 인공지능 기반 개인신용평가를 중심으로)

  • Kim, HaYeong;Heo, JeongYun;Kwon, Hochang
    • Journal of Intelligence and Information Systems / v.28 no.3 / pp.259-278 / 2022
  • With the spread of Artificial Intelligence (AI), various AI-based services are expanding in the financial sector, such as service recommendation, automated customer response, fraud detection systems (FDS), and credit scoring services. At the same time, problems related to reliability and unexpected social controversies are also occurring due to the nature of data-based machine learning. Against this background, this study aimed to contribute to improving trust in AI-based financial services by proposing a checklist to secure fairness in AI-based credit scoring services, which directly affect consumers' financial lives. Among the key elements of trustworthy AI, such as transparency, safety, accountability, and fairness, fairness was selected as the subject of the study so that everyone can enjoy the benefits of automated algorithms from the perspective of inclusive finance, without social discrimination. Through literature research, we divided the entire fairness-related operation process into three areas: data, algorithm, and user. For each area, we constructed four detailed considerations for evaluation, resulting in 12 checklist items. The relative importance and priority of the categories were evaluated through the analytic hierarchy process (AHP). We surveyed three groups that together represent the full range of financial stakeholders: financial field workers, artificial intelligence field workers, and general users. The three groups were classified and analyzed according to the importance each stakeholder assigned, and from a practical perspective, specific checks were identified, such as feasibility verification for using learning data and non-financial information and monitoring of newly incoming data. Moreover, financial consumers in general were found to place high importance on the accuracy of result analysis and on bias checks. We expect these results to contribute to the design and operation of fair AI-based financial services.
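How AHP turns pairwise judgments into relative importance weights can be shown with a short sketch. It uses the row geometric mean approximation, which is one common way to derive AHP priorities; the pairwise comparison values for the three areas are hypothetical, and the study's actual matrices and weights are not given in the abstract.

```python
import math

# Hypothetical pairwise comparison matrix for the three areas.
# Entry [i][j] states how much more important area i is than area j.
areas = ["data", "algorithm", "user"]
pairwise = [
    [1.0, 2.0, 4.0],
    [1 / 2, 1.0, 2.0],
    [1 / 4, 1 / 2, 1.0],
]


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric mean method."""
    size = len(matrix)
    geometric_means = [math.prod(row) ** (1 / size) for row in matrix]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


weights = ahp_weights(pairwise)
for area, weight in zip(areas, weights):
    print(f"{area}: {weight:.3f}")
```

In a full AHP study the consistency of each respondent's judgments would also be checked before the weights are aggregated across stakeholder groups; that step is omitted here.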

Characteristics of Flue Gas Using Direct Combustion of VOC and Ammonia (휘발성 유기 화합물 및 암모니아 직접 연소를 통한 배기가스 특성)

  • Kim, JongSu;Choi, SeukCheun;Jeong, SooHwa;Mock, ChinSung;Kim, DooBoem
    • Clean Technology / v.28 no.2 / pp.131-137 / 2022
  • Semiconductor processes currently emit various by-products and unused gases. Emissions containing pollutants are generally classified into categories such as organic, acid, alkali, thermal, and cabinet exhaust. They are discharged after treatment in an air pollution prevention facility suited to each exhaust type. The main components of organic exhaust are volatile organic compounds (VOC), a generic term for oxygen-containing hydrocarbons, sulfur-containing hydrocarbons, and volatile hydrocarbons, while the main components of alkali exhaust include ammonia and tetramethylammonium hydroxide. The purpose of this study was to determine the combustion characteristics and analyze the NOx reduction rate when direct combustion at a maintained temperature is used to treat organic and alkaline exhaust gases simultaneously. Acetone, isopropyl alcohol (IPA), and propylene glycol methyl ether acetate (PGMEA) were used as VOCs, and ammonia was used as the alkali exhaust material. Combustion tests were conducted for each material independently and for VOC-ammonia mixtures. The combustion tests for the VOCs confirmed that complete combustion occurred at an equivalence ratio of 1.4. In the ammonia combustion test, the NOx concentration decreased at a lower equivalence ratio. In the co-combustion of VOC and ammonia, NO was dominant in the NOx emission, while NO₂ was detected at approximately 10 ppm. Overall, the concentration of nitrogen oxides decreased as the reaction temperature increased, due to the activation of the oxidation reaction. On the other hand, the concentration of carbon dioxide increased. Flameless combustion with an electric heat source achieved successful combustion of VOC and ammonia. This technology is expected to have advantages in cost and compactness compared to existing organic and alkaline treatment systems that are applied separately.

Development of a water quality prediction model for mineral springs in the metropolitan area using machine learning (머신러닝을 활용한 수도권 약수터 수질 예측 모델 개발)

  • Yeong-Woo Lim;Ji-Yeon Eom;Kee-Young Kwahk
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.307-325 / 2023
  • Due to the prolonged COVID-19 pandemic, the number of people who, tired of living indoors, visit nearby mountains and national parks to relieve depression and lethargy has surged. The mineral spring is a place where thousands of people who have come out into nature stop walking to catch their breath and rest. Beyond mountains and national parks, about 600 mineral springs can be found in neighborhood parks and along trails in the metropolitan area. However, because water quality tests are irregular and manual, people drink mineral water without knowing the test results in real time. Therefore, in this study, we intend to develop a model that can predict the quality of spring water in real time by exploring the factors that affect it and collecting data scattered across various sources. After limiting the regions to Seoul and Gyeonggi-do due to the limitations of data collection, we obtained water quality test data from 2015 to 2020 for about 300 mineral springs in 18 cities where data management is well performed. A total of 10 factors were finally selected after two rounds of review among various factors considered to affect the suitability of mineral spring water quality. Using AutoML, an automated machine learning technology that has recently been attracting attention, we derived the top 5 models by prediction performance among about 20 machine learning methods. Among them, the CatBoost model had the highest performance, with a classification accuracy of 75.26%. In addition, when the absolute influence of the variables on the prediction was examined through the SHAP method, the most important factor was whether the spring had been judged nonconforming in the previous water quality test.
It was also confirmed that the temperature on the day of the inspection and the altitude of the mineral spring influenced whether the water quality was unsuitable.