• Title/Summary/Keyword: keyword-based learning

Analysis of Business Performance of Local SMEs Based on Various Alternative Information and Corporate SCORE Index

  • HWANG, Sun Hee;KIM, Hee Jae;KWAK, Dong Chul
    • The Journal of Economics, Marketing and Management / v.10 no.3 / pp.21-36 / 2022
  • Purpose: The purpose of this study is to compare and analyze enterprises' score indexes calculated from atypical data and corrected data. Research design, data, and methodology: In this study, news articles, which are non-financial, qualitative data, were collected for 2,432 SMEs extracted by "square proportional stratification" from 18,910 enterprises with fixed data, and each enterprise's score index was compared and analyzed through text mining. Result: The analysis showed that qualitative data can be quantitatively evaluated by region, industry, and period by collecting news about SMEs, and that there are concerns that it could be an element of alternative credit evaluation. Conclusion: News data cannot be collected when a small business is self-employed or has little or no news coverage. Data normalization or standardization should be considered to overcome the differences in scores caused by the amount of reference. Furthermore, since keyword sentiment analysis may yield different results depending on the researcher's point of view, deep learning sentiment analysis conducted at the sentence level should also be considered.
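
The normalization and standardization mentioned in the conclusion above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the sample scores are hypothetical and are not taken from the paper, which does not say which scaling it would use.

```python
from statistics import mean, pstdev


def min_max_normalize(scores):
    """Rescale scores linearly to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are identical, so there is no spread to rescale.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def standardize(scores):
    """Convert scores to z-scores (mean 0, population standard deviation 1)."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


# Hypothetical raw news-based scores for four enterprises (illustrative only).
raw_scores = [2.0, 4.0, 6.0, 8.0]
normalized = min_max_normalize(raw_scores)
z_scores = standardize(raw_scores)
print(normalized)
print(z_scores)
```

Either transform puts enterprises with very different amounts of news coverage on a comparable scale; which one is appropriate depends on how the scores are distributed.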

Geographical Name Denoising by Machine Learning of Event Detection Based on Twitter (트위터 기반 이벤트 탐지에서의 기계학습을 통한 지명 노이즈제거)

  • Woo, Seungmin;Hwang, Byung-Yeon
    • KIPS Transactions on Software and Data Engineering / v.4 no.10 / pp.447-454 / 2015
  • This paper proposes geographical name denoising through machine learning for Twitter-based event detection. Recently, the increasing number of smartphone users has led to growing use of SNS. In particular, the short message format (less than 140 characters) and the follow service give Twitter the power to convey and diffuse information more quickly. These characteristics, together with its mobile-optimized design, give Twitter a fast information-conveying speed, so it can play a role in reporting disasters or events. Related research used individual Twitter users as sensors to detect events that occur in reality. That research employed geographical names as keywords, exploiting the fact that an event occurs in a specific place. However, it ignored the noise arising from the relationship between geographical names and homographs, which became an important factor in lowering the accuracy of event detection. In this paper, we apply a denoising technique using two methods, removing and forecasting. First, after a filtering step based on a noise-related database, we determined the existence of a geographical name using Naive Bayesian classification. Finally, using the experimental data, we obtained the probability value from machine learning. Based on the forecasting technique proposed in this paper, the reliability of the need for the denoising technique turned out to be 89.6%.
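
As a rough illustration of the Naive Bayesian step described above, the sketch below trains a multinomial Naive Bayes classifier with Laplace smoothing to decide whether a homograph is used as a place name. Everything concrete here is an assumption for illustration: the class `NaiveBayes`, the labels `place` and `noise`, and the English toy tweets (using "reading", which is both a town name and an ordinary word) are hypothetical and do not come from the paper's data or implementation.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant
        self.class_counts = Counter()            # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens):
        """Return the unnormalized log posterior of each class."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                if token not in self.vocab:
                    continue  # tokens never seen in training are ignored
                count = self.word_counts[label][token]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical toy tweets: "reading" as a town (place) or as a verb (noise).
train_docs = [
    ["train", "delayed", "at", "reading"],
    ["fire", "in", "reading", "centre"],
    ["reading", "a", "good", "book"],
    ["love", "reading", "novels"],
]
train_labels = ["place", "place", "noise", "noise"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["accident", "near", "reading", "train"]))
print(model.predict(["reading", "a", "book"]))
```

The surrounding words decide the class: the homograph itself is almost equally likely under both labels, so context tokens such as "train" or "book" tip the decision.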

Keyword Retrieval-Based Korean Text Command System Using Morphological Analyzer (형태소 분석기를 이용한 키워드 검색 기반 한국어 텍스트 명령 시스템)

  • Park, Dae-Geun;Lee, Wan-Bok
    • Journal of the Korea Convergence Society / v.10 no.2 / pp.159-165 / 2019
  • Based on deep learning technology, speech recognition has begun to be applied to commercial products, but it is still difficult to use in the area of VR content, since there is no easy and efficient way to process the recognized text after the speech recognition module. In this paper, we propose a Korean Language Command System, which can efficiently recognize and respond to Korean speech commands. The system consists of two components. One is a morphological analyzer that analyzes sentence morphemes, and the other is a retrieval-based model of the kind usually used to develop chatbot systems. Experimental results show that the proposed system requires only 16% of the commands to achieve the same level of performance as the conventional string comparison method. Furthermore, when working with the Google Cloud Speech module, it achieved a success rate of 60.1%. Experimental results show that the proposed system is more efficient than the conventional string comparison method.

The Study on Implementation of Crime Terms Classification System for Crime Issues Response

  • Jeong, Inkyu;Yoon, Cheolhee;Kang, Jang Mook
    • International Journal of Advanced Culture Technology / v.8 no.3 / pp.61-72 / 2020
  • Fear of crime, discussed in the early 1960s in the United States, is a psychological response, such as anxiety or concern about crime, of a potential victim of crime. These anxiety factors burden individuals in securing psychological stability and impose indirect costs of crime on society. Fear of crime is undesirable, and policy should manage it, alongside crime response and resolution, so that it is not exaggerated or distorted. This is because fear of crime causes as much harm as the damage caused by criminal acts. Eric Pawson has argued that the popular impression of violent crime is not formed because of media reports, but by official statistics. Therefore, the police should watch and analyze news related to fear of crime to reduce its social cost and prepare a preemptive response policy before people come to have 'fear of crime'. In this paper, we propose a deep learning-based news classification system that helps police respond efficiently, quickly, and precisely to crimes reported in the media. The goal is to establish a system that can quickly identify changes in rapidly growing security issues by categorizing crime-related news among news articles. To construct the system, crime data was learned so that news could be classified according to the type of crime. Deep learning was applied using Google TensorFlow. In the future, it is necessary to continue research on the importance of keywords for the early detection of issues that are rapidly increasing by crime type and on the power of the press, and it is also necessary to constantly supplement the crime-related corpus.

Legal search method using S-BERT

  • Park, Gil-sik;Kim, Jun-tae
    • Journal of the Korea Society of Computer and Information / v.27 no.11 / pp.57-66 / 2022
  • In this paper, we propose a legal document search method that uses the Sentence-BERT model. Members of the general public who want to use a legal search service have difficulty finding relevant precedents due to a lack of understanding of legal terms and structures. In addition, the existing keyword and text mining-based legal search methods are limited in yielding quality search results for two reasons: they lack information on the context of the judgment, and they fail to discern homonyms and polysemies. As a result, the accuracy of legal document search results is often unsatisfactory or questionable. To this end, this paper aims to improve the efficacy of the general public's legal search in the Supreme Court precedent and Legal Aid Counseling case database. The Sentence-BERT model embeds contextual information on precedents and counseling data, which better preserves the integrity of relevant meaning in phrases or sentences. Our initial research has shown that the Sentence-BERT search method yields higher accuracy than the Doc2Vec or TF-IDF search methods.
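
To make the keyword-based baseline concrete, here is a minimal TF-IDF search sketch with cosine similarity. It is not the paper's Sentence-BERT method or its actual baseline implementation: the helper names, the smoothed IDF variant, and the three toy "case summaries" are all assumptions made for illustration.

```python
import math
from collections import Counter


def to_vector(tokens, idf):
    """Build a sparse TF-IDF vector, skipping tokens with no IDF entry."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}


def build_tfidf(docs):
    """Compute smoothed IDF weights and one TF-IDF vector per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return idf, [to_vector(tokens, idf) for tokens in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, idf, vectors, top_k=1):
    """Return the indices of the top_k documents most similar to the query."""
    q = to_vector(query.lower().split(), idf)
    ranked = sorted(range(len(vectors)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return ranked[:top_k]


# Hypothetical case summaries, already tokenized (illustrative only).
corpus = [
    "tenant deposit return dispute landlord".split(),
    "traffic accident compensation insurance claim".split(),
    "unpaid wages employer labor dispute".split(),
]
idf, vectors = build_tfidf(corpus)

print(search("landlord refuses to return deposit", idf, vectors))

# A paraphrase with no shared keywords matches nothing, which is the kind of
# limitation that motivates context-aware sentence embeddings.
paraphrase = to_vector("renter wants money back".split(), idf)
print([cosine(paraphrase, v) for v in vectors])
```

The second query shows the weakness the abstract points to: without overlapping terms, every similarity is zero even though the intent matches the first case.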

A Study on Automatic Classification of Newspaper Articles Based on Unsupervised Learning by Departments (비지도학습 기반의 행정부서별 신문기사 자동분류 연구)

  • Kim, Hyun-Jong;Ryu, Seung-Eui;Lee, Chul-Ho;Nam, Kwang Woo
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.9 / pp.345-351 / 2020
  • Administrative agencies today are paying keen attention to big data analysis to improve their policy responsiveness. Of all the big data, news articles can be used to understand public opinion regarding policy and policy issues. The amount of news output has increased rapidly because of the emergence of new online media outlets, which calls for the use of automated bots or automatic document classification tools. There are, however, limits to the automatic collection of news articles related to specific agencies or departments based on the existing news article categories and keyword search queries. Thus, this paper proposes a method to process articles using classification glossaries that take into account each agency's different work features. To this end, classification glossaries were developed by extracting the work features of different departments using Word2Vec and topic modeling techniques from news articles related to different agencies. As a result, the automatic classification of newspaper articles for each department yielded approximately 71% accuracy. This study is meaningful in making academic and practical contributions because it presents a method of extracting the work features for each department, and it is an unsupervised learning-based automatic classification method for automatically classifying news articles relevant to each agency.

Korean Part-Of-Speech Tagging by using Head-Tail Tokenization (Head-Tail 토큰화 기법을 이용한 한국어 품사 태깅)

  • Suh, Hyun-Jae;Kim, Jung-Min;Kang, Seung-Shik
    • Smart Media Journal / v.11 no.5 / pp.17-25 / 2022
  • Korean part-of-speech taggers decompose a compound morpheme into unit morphemes and attach part-of-speech tags. As a result, there is a disadvantage that parts of speech for morphemes are over-classified in detail and complex word types are generated depending on the purpose of the tagger. When a part-of-speech tagger is used for keyword extraction in deep learning-based language processing, it is not necessary to decompose compound particles and verb endings. In this study, the part-of-speech tagging problem is simplified by using a Head-Tail tokenization technique that divides a word into only two types of tokens, a lexical morpheme part and a grammatical morpheme part, so that the problem of excessively decomposed morphemes is solved. Part-of-speech tagging was attempted with a statistical technique and a deep learning model on the Head-Tail tokenized corpus, and the accuracy of each model was evaluated. Part-of-speech tagging was implemented with the TnT tagger, a statistical part-of-speech tagger, and a Bi-LSTM tagger, a deep learning-based part-of-speech tagger. Both taggers were trained on the Head-Tail tokenized corpus to measure part-of-speech tagging accuracy. The results showed that the Bi-LSTM tagger performs part-of-speech tagging with a high accuracy of 99.52%, compared to 97.00% for the TnT tagger.

Analysis of Smart Factory Research Trends Based on Big Data Analysis (빅데이터 분석을 활용한 스마트팩토리 연구 동향 분석)

  • Lee, Eun-Ji;Cho, Chul-Ho
    • Journal of Korean Society for Quality Management / v.49 no.4 / pp.551-567 / 2021
  • Purpose: The purpose of this paper is to present implications by analyzing research trends on smart factories through text analysis and visual analysis (comprehensive, field-based, and year-based), which are big data analyses, using data collected from previous studies on smart factories. Methods: For the collection of analysis data, deep learning was used in the integrated search on the Academic Research Information Service (www.riss.kr) to search for "SMART FACTORY" and "Smart Factory" as search terms, and the titles and Korean abstracts were scraped from the extracted papers and organized in Excel. For the final step, the 739 papers derived were analyzed using R x64 4.0.2 and RStudio with text mining, one of the big data analysis techniques, and Word Cloud for visualization. Results: The results of this study are as follows. Smart factory research slowed down from 2005 to 2014 but increased rapidly until 2019. According to the analysis by field, smart factories were studied in the order of engineering, social science, and complex science. There were many 'engineering' studies in the early stages of smart factories, and research then expanded to 'social science'. In particular, since 2015, smart factories have been studied in various disciplines such as 'complex studies'. Overall, in keyword analysis, keywords such as 'technology', 'data', and 'analysis' appeared most often, and there were some differences by field and year. Conclusion: Government support and expert support for smart factories should be activated, and research on technology-based strategies is needed. In the future, it is necessary to take various approaches to smart factories. If research is conducted in consideration of the environment or energy, it is judged that bigger implications can be presented.

An Development of Image Retrieval Model based on Image2Vec using GAN (Generative Adversarial Network를 활용한 Image2Vec기반 이미지 검색 모델 개발)

  • Jo, Jaechoon;Lee, Chanhee;Lee, Dongyub;Lim, Heuiseok
    • Journal of Digital Convergence / v.16 no.12 / pp.301-307 / 2018
  • Most IR research focuses on methods for searching documents, so keyword-based IR systems are not able to reflect the feature information of images. To overcome these limitations, we have developed a system that can search for similar images based on the vector information of images, including searches based on sketches. The proposed system uses a GAN to upsample the sketch to the image level, converts the image to a vector through a CNN, and then retrieves similar images using the vector space model. The model was trained on fashion images, and the image retrieval system was developed. The results showed meaningful performance.

Understanding of Generative Artificial Intelligence Based on Textual Data and Discussion for Its Application in Science Education (텍스트 기반 생성형 인공지능의 이해와 과학교육에서의 활용에 대한 논의)

  • Hunkoog Jho
    • Journal of The Korean Association For Science Education / v.43 no.3 / pp.307-319 / 2023
  • This study aims to explain the key concepts and principles of text-based generative artificial intelligence (AI) that has been receiving increasing interest and utilization, focusing on its application in science education. It also highlights the potential and limitations of utilizing generative AI in science education, providing insights for its implementation and research aspects. Recent advancements in generative AI, predominantly based on transformer models consisting of encoders and decoders, have shown remarkable progress through optimization of reinforcement learning and reward models using human feedback, as well as understanding context. Particularly, it can perform various functions such as writing, summarizing, keyword extraction, evaluation, and feedback based on the ability to understand various user questions and intents. It also offers practical utility in diagnosing learners and structuring educational content based on provided examples by educators. However, it is necessary to examine the concerns regarding the limitations of generative AI, including the potential for conveying inaccurate facts or knowledge, bias resulting from overconfidence, and uncertainties regarding its impact on user attitudes or emotions. Moreover, the responses provided by generative AI are probabilistic based on response data from many individuals, which raises concerns about limiting insightful and innovative thinking that may offer different perspectives or ideas. In light of these considerations, this study provides practical suggestions for the positive utilization of AI in science education.