• Title/Summary/Keyword: automatic information extraction (자동정보 추출)

Search Result 1,995, Processing Time 0.036 seconds

A Study on the Comparison of Learning Performance in Capsule Endoscopy by Generating of PSR-Weigted Image (폴립 가중치 영상 생성을 통한 캡슐내시경 영상의 학습 성능 비교 연구)

  • Lim, Changnam;Park, Ye-Seul;Lee, Jung-Won
    • KIPS Transactions on Software and Data Engineering / v.8 no.6 / pp.251-256 / 2019
  • A capsule endoscope is a medical device that can image the entire digestive tract, from the esophagus to the anus, in a single pass. A single examination produces a vast amount of video: about 8 to 12 hours in length and more than 50,000 frames. Because endoscopic images are still analyzed manually by a medical imaging specialist, demand is growing for automated analysis that assists diagnosis of disease in the images. This study focuses on the automatic detection of polyp images. A polyp is a protruding lesion that can be found in the gastrointestinal tract. We propose a weighted-image generation method that enhances polyp image learning through multi-scale analysis: the suspicious polyp region is extracted by multi-scale analysis and combined with the original image to generate a weighted image. We experimented with SVM and RF, two machine learning methods, on 452 collected data items. The F1-score for polyp detection with only the original images was 89.3%; when the weighted images generated by the proposed method were added, it improved to about 93.1%.

Automatic Classification and Vocabulary Analysis of Political Bias in News Articles by Using Subword Tokenization (부분 단어 토큰화 기법을 이용한 뉴스 기사 정치적 편향성 자동 분류 및 어휘 분석)

  • Cho, Dan Bi;Lee, Hyun Young;Jung, Won Sup;Kang, Seung Shik
    • KIPS Transactions on Software and Data Engineering / v.10 no.1 / pp.1-8 / 2021
  • Political news articles tend to be polarized along lines such as conservative and liberal; this is called political bias. We constructed a keyword-based dataset to classify the bias of news articles. Most embedding research represents a sentence as a sequence of morphemes. In our work, we expect the number of unknown tokens to be reduced if sentences are composed of subwords segmented by a language model. We propose a document embedding model with subword tokenization and apply it to an SVM and a feedforward neural network to classify political bias. Compared with the document embedding model based on morphological analysis, the document embedding model with subwords showed the highest accuracy, at 78.22%. We confirmed that subword tokenization reduced the number of unknown tokens. Using the best-performing embedding model in our bias classification task, we extracted keywords related to politicians. The bias of the keywords was verified by their average similarity with the vectors of politicians from each political tendency.

CKFont2: An Improved Few-Shot Hangul Font Generation Model Based on Hangul Composability (CKFont2: 한글 구성요소를 이용한 개선된 퓨샷 한글 폰트 생성 모델)

  • Jangkyoung, Park;Ammar, Ul Hassan;Jaeyoung, Choi
    • KIPS Transactions on Software and Data Engineering / v.11 no.12 / pp.499-508 / 2022
  • Much research has been carried out on Hangul generation models using deep learning, and recent work examines how to minimize the number of input characters needed to generate one set of Hangul (few-shot learning). In this paper, we propose the CKFont2 model, which uses only 14 characters, by analyzing and improving the CKFont model (hereafter CKFont1), which uses 28 characters. CKFont2 generates all Hangul from only 14 characters containing 24 components (14 consonants and 10 vowels), whereas CKFont1 generates all Hangul by extracting 51 Hangul components from 28 characters. This is the smallest number of input characters among currently known models. From the basic consonants and vowels of Hangul, 27 components (5 double consonants, 11 compound consonants, and 11 compound vowels) are learned and generated by deep learning, and the 27 generated components are combined with the 24 basic consonants and vowels. All Hangul characters are then automatically generated from the combined 51 components. The model's superior performance was verified by comparative analysis with the results of the zi2zi, CKFont1, and MX-Font models. It is an efficient and effective model that has a simple structure and saves time and resources, and it can be extended to Chinese, Thai, and Japanese.
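
The component counts in this abstract can be checked against the standard Unicode composition rule for Hangul syllables. The sketch below is only that arithmetic, not the CKFont2 model; the names `compose`, `COMPONENTS`, and `SYLLABLE_COUNT` are illustrative and do not come from the paper.

```python
# Standard Unicode arithmetic for composing Hangul syllables from jamo.
# Illustrates the component counts only; this is NOT the CKFont2 model.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 vowels
# "No final consonant" plus the 27 possible final consonants.
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def compose(initial, medial, final=""):
    """Return the precomposed syllable for an initial, a vowel, and an optional final."""
    i = INITIALS.index(initial)
    m = MEDIALS.index(medial)
    f = FINALS.index(final)
    # Hangul syllables start at U+AC00 and are laid out initial-major.
    return chr(0xAC00 + (i * len(MEDIALS) + m) * len(FINALS) + f)


# Distinct components across all three positions (51 in total).
COMPONENTS = set(INITIALS) | set(MEDIALS) | (set(FINALS) - {""})
# Number of syllables that can be composed (19 * 21 * 28).
SYLLABLE_COUNT = len(INITIALS) * len(MEDIALS) * len(FINALS)
```

For example, `compose("ㅎ", "ㅏ", "ㄴ")` returns "한", and `len(COMPONENTS)` is 51, which matches the 24 basic plus 27 derived components described in the abstract.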

An Efficient Estimation of Place Brand Image Power Based on Text Mining Technology (텍스트마이닝 기반의 효율적인 장소 브랜드 이미지 강도 측정 방법)

  • Choi, Sukjae;Jeon, Jongshik;Subrata, Biswas;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.113-129 / 2015
  • Place branding is an important income-generating activity: it gives special meaning to a specific location and produces identity and communal value. Many other fields, such as marketing, architecture, and city construction, influence the creation of an impressive brand image. A place brand that is well recognized by both Koreans and foreigners creates significant economic effects. There has been research on building strategic and detailed place brand images; the representative study was carried out by Anholt, who surveyed two million people from 50 different countries. However, such investigations, including survey research, require a great deal of labor and significant expense. There is therefore a need for more affordable, objective, and effective research methods. The purpose of this paper is to find a way to measure the intensity of a brand image objectively and at low cost through text mining. The proposed method extracts keywords and the factors that construct the place brand image from related web documents, and in this way measures the brand image intensity of a specific location. The performance of the proposed methodology was verified by comparison with Anholt's city image consistency index ranking of 50 cities around the world. Four methods are applied in the test. First, the RANDOM method artificially ranks the cities included in the experiment. Second, the HUMAN method prepares a questionnaire and selects 9 volunteers who are well acquainted with both brand management and the cities to be evaluated; they are asked to rank the cities, and their rankings are compared with Anholt's evaluation results. Third, the TM method applies the proposed method to evaluate the cities with all evaluation criteria. Fourth, TM-LEARN, an extension of TM, selects significant evaluation items from the items in every criterion and then evaluates the cities with the selected items. RMSE is used as the metric to compare the evaluation results. The experimental results are as follows. First, the proposed method was more accurate than the evaluation method that relies on ordinary people. Second, compared with the traditional survey method, the time and cost are much lower because the method is automated. Third, the methodology is timely because the evaluation can be repeated whenever needed. Fourth, whereas Anholt's method evaluates only pre-specified cities, the proposed methodology is applicable to any location. Finally, the methodology has relatively high objectivity because it is based on open source data. Our text mining approach to city image evaluation is thus valid in terms of accuracy, cost-effectiveness, timeliness, scalability, and reliability. The proposed method provides managers in the public and private sectors with clear guidelines for brand management. In the public sector, local officials could use it to formulate strategies and enhance the image of their places efficiently. Rather than conducting burdensome questionnaires, local officials could quickly monitor the current place image in advance and then decide to proceed to a formal place image test only if the results from the proposed method are out of the ordinary, whether they indicate an opportunity or a threat to the place. Moreover, by combining the proposed method with morphological analysis, extraction of meaningful facets of the place brand from text, sentiment analysis, and more, marketing strategy planners or civil engineering professionals may obtain deeper and richer insights for better place brand images. In the future, a prototype system will be implemented to show the feasibility of the idea proposed in this paper.
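
The abstract names RMSE as the metric for comparing each method's city ranking with the reference ranking. A minimal sketch of that comparison follows; the function name `rmse` and the five-city rankings are made up for illustration and are not data from the paper.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equally long sequences of ranks."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("rankings must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical ranks for five cities (illustrative only).
reference_ranks = [1, 2, 3, 4, 5]
method_ranks = [2, 1, 3, 5, 4]
error = rmse(method_ranks, reference_ranks)  # sqrt(4 / 5), about 0.894
```

A lower value means the method's ranking is closer to the reference ranking; identical rankings give 0.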

Feasibility of Automated Detection of Inter-fractional Deviation in Patient Positioning Using Structural Similarity Index: Preliminary Results (Structural Similarity Index 인자를 이용한 방사선 분할 조사간 환자 체위 변화의 자동화 검출능 평가: 초기 보고)

  • Youn, Hanbean;Jeon, Hosang;Lee, Jayeong;Lee, Juhye;Nam, Jiho;Park, Dahl;Kim, Wontaek;Ki, Yongkan;Kim, Donghyun
    • Progress in Medical Physics / v.26 no.4 / pp.258-266 / 2015
  • Modern radiotherapy techniques, which deliver large doses to patients, require more accurate confirmation of patient or tumor positions using high-definition X-ray projection images. However, the rapid increase in patient exposure and image information from CT image acquisition may place an additional burden on the patient. In this study, we introduce the structural similarity (SSIM) index, which can effectively extract the structural information of an image, and analyze the differences between daily acquired X-ray images of a patient to verify the accuracy of patient positioning. First, to simulate a moving target, spherical computational phantoms of varying size and position were created to acquire projected images. Differences between the images were automatically detected and analyzed by extracting their SSIM values. In addition, as a clinical test, differences between X-ray images of a patient acquired daily for 12 days were detected in the same way. As a result, we confirmed that the SSIM index varied in the range of 0.85~1 (0.006~1 when a region of interest (ROI) was applied) as the size or position of the phantom changed. The SSIM was more sensitive to changes in the phantom when the ROI was limited to the phantom itself. In the clinical test, the daily change in patient position corresponded to SSIM values of 0.799~0.853, which described the differences among images well. Therefore, we expect that the SSIM index can provide an objective and quantitative technique to verify patient position using simple X-ray images instead of time- and cost-intensive three-dimensional X-ray images.
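
For readers unfamiliar with the SSIM index, the sketch below computes it over a single global window using the usual formula with constants K1 = 0.01 and K2 = 0.03. This is a simplification: common SSIM implementations average the index over local windows, and the abstract does not say which variant or parameters the authors used. The function name `global_ssim` and the tiny test images are assumptions for illustration.

```python
def _mean(values):
    return sum(values) / len(values)


def _covariance(x, y, mean_x, mean_y):
    # Population covariance (divides by N); passing x twice gives the variance.
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)


def global_ssim(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """SSIM over one global window for two flattened grayscale images."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and of equal size")
    c1 = (k1 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    mean_x, mean_y = _mean(x), _mean(y)
    var_x = _covariance(x, x, mean_x, mean_x)
    var_y = _covariance(y, y, mean_y, mean_y)
    cov_xy = _covariance(x, y, mean_x, mean_y)
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical 4x4 images (flattened): a bright 2x2 block, then the same block shifted.
reference = [0, 0, 0, 0,
             0, 200, 200, 0,
             0, 200, 200, 0,
             0, 0, 0, 0]
shifted = [0, 0, 0, 0,
           0, 0, 200, 200,
           0, 0, 200, 200,
           0, 0, 0, 0]
```

Here `global_ssim(reference, reference)` is 1 (up to floating-point rounding), while `global_ssim(reference, shifted)` is lower, which is the kind of drop used in the abstract to flag a change in position.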

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

  • Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.1-23 / 2013
  • To discover significant social issues that urgently need to be solved in modern society, such as unemployment, economic crisis, and social welfare, researchers in the existing approach usually collect opinions from professional experts and scholars through online or offline surveys. However, this method is not always effective. Because of the expense, a large number of survey replies is seldom gathered, and in some cases it is hard to find professionals who deal with a specific social issue. Thus, the sample set is often small and may be biased. Furthermore, several experts may reach totally different conclusions about a social issue because each has a subjective point of view and a different background. In this case, it is considerably hard to figure out what the current social issues are and which are really important. To overcome the shortcomings of the current approach, we develop a prototype system that semi-automatically detects keywords representing social issues and problems from about 1.3 million news articles issued by about 10 major domestic presses in Korea from June 2009 to July 2012. Our proposed system consists of (1) collecting news articles and extracting their texts, (2) identifying only the news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models, whose goal is to best match paragraphs to each topic. Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA produces a set of topic clusters, and each topic cluster is then labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and a human annotator labels Topic1 "Unemployment Problem". In this example, it is non-trivial to understand what happened to the unemployment problem in our society; in other words, by looking only at social keywords, we have no idea of the detailed events occurring in our society. To tackle this, we develop a matching algorithm that computes the probability of a paragraph given a topic, relying on (i) the topic terms and (ii) their probability values. Given a set of text documents, we segment each document into paragraphs and, in parallel, use LDA to extract a set of topics from the documents. Based on our matching process, each paragraph is assigned to the topic it best matches, so that each topic finally has several best-matched paragraphs. For instance, suppose there is a topic (e.g., Unemployment Problem) and its best-matched paragraph (e.g., Up to 300 workers lost their jobs in XXX company at Seoul). In this case, we can grasp the detailed information behind the social keyword, such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly. Through this prototype system, we detected various social issues appearing in our society and showed the effectiveness of our proposed methods in our experiments. Our proof-of-concept system is available at http://dslab.snu.ac.kr/demo.html.
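
The matching step, scoring a paragraph against a topic from the topic's terms and their probabilities, can be sketched as follows. This is one plausible reading (an average log-probability with a small floor for terms outside the topic), not the authors' algorithm, whose details the abstract does not give. Topic1 is taken from the abstract; the second topic, the example paragraph, the `floor` value, and all function names are hypothetical.

```python
import math
import re


def tokenize(paragraph):
    """Lowercase a paragraph and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", paragraph.lower())


def paragraph_score(tokens, topic, floor=1e-6):
    """Average log-probability of the tokens under a topic's term distribution.

    Terms that are not in the topic get the small probability `floor`
    (an assumed smoothing choice) so that the logarithm stays defined.
    """
    if not tokens:
        raise ValueError("paragraph has no tokens")
    return sum(math.log(topic.get(t, floor)) for t in tokens) / len(tokens)


def best_topic(paragraph, topics):
    """Return the label of the topic with the highest score for the paragraph."""
    tokens = tokenize(paragraph)
    return max(topics, key=lambda label: paragraph_score(tokens, topics[label]))


topics = {
    # Topic1 from the abstract, with its human-assigned label.
    "Unemployment Problem": {"unemployment": 0.4, "layoff": 0.3, "business": 0.3},
    # Hypothetical second topic, for contrast only.
    "Social Welfare": {"welfare": 0.5, "pension": 0.5},
}
paragraph = "The layoff raised unemployment in the business district."
label = best_topic(paragraph, topics)
```

With these inputs `label` is "Unemployment Problem", because three of the paragraph's tokens are terms of that topic and none are terms of the other.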

Systematic Approach to The Extraction of Effective Region for Tongue Diagnosis (설진 유효 영역 추출의 시스템적 접근 방법)

  • Kim, Keun-Ho;Do, Jun-Hyeong;Ryu, Hyun-Hee;Kim, Jong-Yeol
    • Journal of the Institute of Electronics Engineers of Korea SC / v.45 no.6 / pp.123-131 / 2008
  • In Oriental medicine, the state of the tongue is an important indicator of a person's health, reflecting physiological and clinicopathological changes in the internal organs. Tongue diagnosis is both convenient and non-invasive and is therefore widely used in Oriental medicine. However, tongue diagnosis is strongly affected by examination conditions such as the light source, the patient's posture, and the doctor's condition. To develop an automatic tongue diagnosis system for objective and standardized diagnosis, segmenting the tongue region from a captured facial image and classifying the tongue coating are essential but difficult, since the colors of the tongue, lips, and skin around the mouth are similar. The proposed method consists of preprocessing, over-segmenting, detecting the edge with a local minimum over a shading area from the structure of the tongue, correcting local minima or detecting the edge with the greatest color difference, selecting one edge that corresponds to the tongue shape, and smoothing edges, which produces the segmented tongue region. Preprocessing consists of down-sampling to reduce computation time, histogram equalization, and edge enhancement. Finally, the systematic procedure separated only the tongue region from a face image containing a tongue, obtained from a digital tongue diagnosis system. Evaluation of the results by Oriental medical doctors showed that the segmented region, which excludes non-tongue regions, provides important information for accurate diagnosis. The proposed method can be used for objective and standardized diagnosis and for a u-Healthcare system.
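
Two of the preprocessing steps named in the abstract, down-sampling and histogram equalization, can be illustrated on a small grayscale image. This is a generic textbook sketch on nested lists, not the authors' implementation; edge enhancement and the segmentation steps are omitted, and the function names and the 2x2 test image are assumptions.

```python
def downsample(image, factor):
    """Keep every `factor`-th pixel in both directions to cut computation."""
    return [row[::factor] for row in image[::factor]]


def equalize_histogram(image, levels=256):
    """Textbook histogram equalization for an integer grayscale image."""
    pixels = [value for row in image for value in row]
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1
    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in histogram:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    denominator = len(pixels) - cdf_min
    if denominator == 0:
        # Constant image: nothing to stretch, return an unchanged copy.
        return [row[:] for row in image]
    return [
        [round((cdf[value] - cdf_min) * (levels - 1) / denominator) for value in row]
        for row in image
    ]


# Hypothetical low-contrast 2x2 image.
tiny = [[52, 55],
        [61, 59]]
stretched = equalize_histogram(tiny)  # [[0, 85], [255, 170]]
```

Equalization spreads the four nearly equal gray levels across the full 0 to 255 range, which makes subtle boundaries such as those between tongue, lips, and skin easier to separate in later steps.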

Technical Trends of Rare Metal Recycling in the Next Generation Automobile (차세대 자동차용 희소금속 리싸이클링 기술동향)

  • Hwang, Young-Gil;Kil, Sang-Cheol;Kim, Jong-Heon
    • Resources Recycling / v.23 no.2 / pp.3-16 / 2014
  • Reducing vehicle $CO_2$ emissions and improving fuel efficiency through lighter vehicles are major challenges of the current era. Development of high-performance Nd magnets, Li-ion secondary batteries, and PGM catalysts for exhaust gas purification, all used in lightweight EVs and HEVs, is active. The rare metals needed to make cars lighter and more functional are unevenly distributed and largely imported from China, and because of export controls their prices have soared and supply and demand are unstable. Rare resources therefore need to be recycled from the waste generated by next-generation automobiles. This study surveys and analyzes recycling technology and its development and provides the information to researchers, with the aim of contributing to the development of the national car industry. The findings are as follows: recycling technologies for waste PGM exhaust gas purification catalysts, Li-ion batteries, and Nd magnets from high-performance vehicles can be classified into pre-processing and post-processing technologies. As pre-processing, urban-mining mechanical separation and screening techniques are under development, while as post-processing, pyrometallurgical and hydrometallurgical smelting technologies are established. For economically efficient waste recycling, intensive study of mechanical separation and screening techniques is needed.

Development of BIM-based Work Process Model in Construction Phase (시공단계의 BIM기반 건설사업관리 업무절차 모델 개발)

  • Yu, Yongsin;Jeong, Jiseong;Jung, Insu;Yoon, Hobin;Lee, Chansik
    • Korean Journal of Construction Engineering and Management / v.14 no.1 / pp.133-143 / 2013
  • BIM can be used in various ways in construction management (CM) because it helps to manage construction information comprehensively and to make reliable decisions, but its adoption in the CM area remains insufficient. The purpose of this study is to develop work process models and accompanying guides for using BIM effectively in CM work at the construction stage. Through a literature search, this study defined the BIM functions as 'BIM converting design', 'Model review', 'Data extraction', 'Automatic estimate', '4D simulation', 'Drawing creation', and 'Engineering sector linkage analysis', and derived the CM works to which BIM is applicable by analyzing CM work and processes. It then developed BIM-based CM work process models by reconstructing the existing work process in connection with the BIM functions, based on an analysis of the relationship between BIM functions and CM work, and by redefining the role of each project participant. To improve the usefulness of the developed models, guides describing the BIM work of project participants were prepared through interviews and case studies. To validate the models, a comparative analysis with the BIM processes of precedent studies was made and an expert survey was conducted. This study can contribute to increasing the use of BIM in the CM area and can help CM companies develop in-house BIM guides. In the future, it will be necessary to assess the models from a business perspective through case applications and to keep updating the BIM-based CM work process model as the application of BIM expands CM work.

Automatic Recommendation of Nearby Tourist Attractions related to Events (이벤트와 관련된 주변 관광지 자동 추천 알고리즘 개발)

  • Ahn, Jinhyun;Im, Dong-Hyuk
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.3 / pp.407-413 / 2020
  • Participating in exhibitions is one of the major activities for tourists. When selecting their next travel destination after participating in an event, they use map services and social network services, such as blogs, to obtain information about tourist attractions. The map services are location-based recommendations, because they can easily retrieve information regarding nearby places. Blogs contain informative content about tourist attractions, thereby providing content-based recommendations. However, few services consider both location and content. In location-based recommendations, tourist attractions that are not related to the content of the event attended might be recommended. Content-based recommendation has a disadvantage in that events located at a distance might get recommended. We propose an algorithm that considers both location and content, based on information from the Korea Tourism Organization's Linked Open Data (LOD), Wikipedia, and a Korean dictionary. By extracting nouns from the description of a tourist attraction and then comparing them with nouns about other attractions, a content-based relationship is determined. The distance to the event is calculated based on the latitude and longitude of each tourist attraction. A weight selected by the user is used for linear combination with the content-based relationship to determine the preference order of the recommendations.
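
The scoring idea in this abstract, a user-weighted linear combination of a content-based relationship and distance, can be sketched as below. Several details are assumptions, since the abstract does not specify them: noun overlap is measured with Jaccard similarity, distance is converted to a proximity score with 1 / (1 + km), and the event, attractions, and coordinates are invented for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres from latitude/longitude in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def noun_overlap(nouns_a, nouns_b):
    """Jaccard similarity of two noun collections (assumed content measure)."""
    a, b = set(nouns_a), set(nouns_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(event, attractions, weight):
    """Order attractions by weight * content + (1 - weight) * proximity."""
    def score(attraction):
        content = noun_overlap(event["nouns"], attraction["nouns"])
        distance = haversine_km(event["lat"], event["lon"],
                                attraction["lat"], attraction["lon"])
        proximity = 1.0 / (1.0 + distance)  # assumed mapping: closer means higher
        return weight * content + (1.0 - weight) * proximity
    return sorted(attractions, key=score, reverse=True)


# Hypothetical event and attractions (illustrative only).
event = {"nouns": ["art", "exhibition", "painting"], "lat": 37.5, "lon": 127.0}
attractions = [
    {"name": "Far art museum", "nouns": ["art", "museum", "painting"], "lat": 38.0, "lon": 127.0},
    {"name": "Nearby food market", "nouns": ["market", "food"], "lat": 37.501, "lon": 127.0},
]
by_content = [a["name"] for a in recommend(event, attractions, weight=1.0)]
by_distance = [a["name"] for a in recommend(event, attractions, weight=0.0)]
```

With `weight=1.0` the related but distant museum ranks first; with `weight=0.0` the unrelated but nearby market ranks first. Intermediate weights trade the two off, which is the user-controlled behaviour the abstract describes.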