• Title/Summary/Keyword: extracting methods

Search Results: 948

Story-based Information Retrieval (스토리 기반의 정보 검색 연구)

  • You, Eun-Soon;Park, Seung-Bo
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.4
    • /
    • pp.81-96
    • /
    • 2013
  • Video information retrieval has become a very important issue because of the explosive increase in video data from the development of Web content. Meanwhile, content-based video analysis using visual features has been the main source for video information retrieval and browsing. Content in video can be represented with content-based analysis techniques, which can extract various features from audio-visual data such as frames, shots, colors, texture, or shape. Moreover, similarity between videos can be measured through content-based analysis. However, a movie, one of the typical types of video data, is organized by story as well as by audio-visual data. When content-based video analysis using only low-level audio-visual data is applied to movie information retrieval, this causes a semantic gap between the significant information recognized by people and the information resulting from content-based analysis. The reason for this semantic gap is that the story line of a movie is high-level information, with relationships in the content that change as the movie progresses. Information retrieval related to the story line of a movie cannot be performed by content-based analysis techniques alone. A formal model is needed that can determine relationships among movie contents, or track meaning changes, in order to accurately retrieve story information. Recently, story-based video analysis techniques using a social network concept have emerged for story information retrieval. These approaches represent a story by using the relationships between characters in a movie, but they have problems. First, they do not express dynamic changes in the relationships between characters as the story develops. Second, they miss profound information, such as the emotions indicating the identities and psychological states of the characters. Emotion is essential to understanding a character's motivation, conflict, and resolution. Third, they do not take into account the events and background that contribute to the story. Accordingly, this paper reviews the importance and weaknesses of previous video analysis methods, ranging from content-based approaches to story analysis based on social networks. We also suggest the necessary elements, namely character, background, and events, based on narrative structures introduced in the literature. First, we extract characters' emotional words from the script of the movie Pretty Woman by using the hierarchical structure of WordNet, an extensive English thesaurus that encodes relationships between words (e.g., synonyms, hypernyms, hyponyms, antonyms), and we present a method to visualize the emotional pattern of a character over time. Second, a character's inner nature must be predetermined in order to model a character arc that can depict the character's growth and development. To this end, we analyze the amount of each character's dialogue in the script and track the character's inner nature using social network concepts such as in-degree (incoming links) and out-degree (outgoing links). Additionally, we propose a method that can track a character's inner nature by tracing indices such as the degree, in-degree, and out-degree of the character network as the movie progresses. Finally, the spatial background where characters meet and where events take place is an important element of the story. We take advantage of the movie script to extract significant spatial backgrounds and suggest a scene map describing spatial arrangements and distances in the movie. Important places where main characters first meet, or where they stay for long periods of time, can be extracted through this scene map. In view of the aforementioned three elements (character, event, background), we extract a variety of story-related information and evaluate the performance of the proposed method. We can track the extracted story information over time and detect changes in a character's emotion or inner nature, spatial movements, and the conflicts and resolutions in the story.
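
As a concrete illustration of the WordNet-based extraction described in this abstract, here is a minimal Python sketch (assuming NLTK with the WordNet corpus installed; this is not the authors' implementation): a script word is treated as an emotional word when one of its noun senses has emotion.n.01 or feeling.n.01 among its hypernyms.

```python
# Minimal sketch of hierarchical emotion-word lookup in WordNet (assumes NLTK
# with the WordNet corpus downloaded; illustrative only, not the paper's code).
from nltk.corpus import wordnet as wn

EMOTION_ROOTS = {wn.synset("emotion.n.01"), wn.synset("feeling.n.01")}

def is_emotional_word(word: str) -> bool:
    """Return True if any noun sense of `word` sits under an emotion root."""
    for synset in wn.synsets(word, pos=wn.NOUN):
        # hypernym_paths() walks every chain from the synset up to the root.
        for path in synset.hypernym_paths():
            if EMOTION_ROOTS & set(path):
                return True
    return False

print(is_emotional_word("anger"))     # True: anger.n.01 is a hyponym of emotion.n.01
print(is_emotional_word("sidewalk"))  # False
```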

Comparison of Phylogenetic Characteristics of Viable but Non-Culturable (VBNC) Bacterial Populations in the Pine and Quercus Forest Soil by 16S rDNA-ARDRA (16S rDNA-ARDRA법을 이용한 소나무림과 상수리나무림 토양 내 VBNC 세균군집의 계통학적 특성 비교)

  • Han Song-Ih;Kim Youn-Ji;Whang Kyung-Sook
    • Korean Journal of Microbiology
    • /
    • v.42 no.2
    • /
    • pp.116-124
    • /
    • 2006
  • This study was performed to quantitatively analyze the number of viable but non-culturable (VBNC) bacteria in Pine and Quercus forest soils using improved direct viable count (DVC) and plate count (PC) methods. The number of living bacteria in the Pine and Quercus forest soils obtained by the PC method was less than 1% of that obtained by the DVC method, showing that VBNC bacteria exist in forest soil at a high percentage. The diversity and structure of the VBNC bacterial populations in forest soil were analyzed by direct DNA extraction and 16S rDNA-ARDRA; 111 clones and 108 clones were obtained from the Pine and Quercus forest soils, respectively. Thirty different RFLP types were detected in the Pine forest soil and twenty-six in the Quercus forest soil using HaeIII. From the ARDRA groups, dominant clones were selected to determine their phylogenetic characteristics based on 16S rDNA sequences. Based on the 16S rDNA sequences, the dominant clones from the ARDRA groups of the Pine forest soil were classified into 7 major phylogenetic groups: α-proteobacteria (12 clones), γ-proteobacteria (3 clones), δ-proteobacteria (1 clone), Flexibacter/Cytophaga (1 clone), Actinobacteria (4 clones), Acidobacteria (4 clones), and Planctomycetes (5 clones). The dominant clones from the ARDRA groups of the Quercus forest soil were classified into 6 major phylogenetic groups: α-proteobacteria (4 clones), γ-proteobacteria (2 clones), Actinobacteria (10 clones), Acidobacteria (8 clones), Planctomycetes (1 clone), and Verrucomicrobia (1 clone). Phylogenetic analysis of the microbial communities from the Pine and Quercus forest soils showed that most clones corresponded to uncultured or unidentified bacteria, confirming that the VBNC bacteria, which make up over 99% of the bacteria present in forest soil, comprise a variable composition of unknown microorganisms.

An Oceanic Current Map of the East Sea for Science Textbooks Based on Scientific Knowledge Acquired from Oceanic Measurements (해양관측을 통해 획득된 과학적 지식에 기반한 과학교과서 동해 해류도)

  • Park, Kyung-Ae;Park, Ji-Eun;Choi, Byoung-Ju;Byun, Do-Seong;Lee, Eun-Il
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY
    • /
    • v.18 no.4
    • /
    • pp.234-265
    • /
    • 2013
  • Oceanic current maps in secondary school science and earth science textbooks have played an important role in piquing students' inquisitiveness and interest in the ocean. Such maps can provide students with important opportunities to learn about oceanic currents relevant to abrupt climate change and global energy balance issues. Nevertheless, serious and diverse errors in these secondary school oceanic current maps have been discovered upon comparison with up-to-date scientific knowledge concerning oceanic currents. This study presents fundamental methods and strategies for constructing such maps error-free, through the unification of the diverse current maps currently in the textbooks. To do so, we analyzed the maps found in 27 different textbooks, compared them with up-to-date maps found in scientific journals, and developed a mapping technique for extracting digitized quantitative information on the warm and cold currents of the East Sea. We devised analysis items for current visualization in relation to the branching features of the Tsushima Warm Current (TWC) in the Korea Strait. These analysis items include: its nearshore and offshore branches, the northern limit and distance from the coast of the East Korea Warm Current, outflow features of the TWC near the Tsugaru and Soya Straits and their returning currents, and the flow patterns of the Liman Cold Current and the North Korea Cold Current. The first draft of the current map was constructed based upon the scientific knowledge and input of oceanographers grounded in oceanic in-situ measurements, and was corrected with the help of a questionnaire survey of the members of an oceanographic society. In addition, diverse comments were collected from a special session of the 2013 spring meeting of the Korean Oceanographic Society to assist in the construction of an accurate current map of the East Sea, which was corrected repeatedly through in-depth discussions with oceanographers. Finally, we obtained constructive comments and evaluations of the interim version of the current map from several well-known ocean current experts and incorporated their input to complete the map's final version. To avoid errors in the production of oceanic current maps in future textbooks, we provide the geolocation information (latitude and longitude) of the currents by digitizing the map. This study is expected to be a first step towards the completion of an oceanographic current map suitable for secondary school textbooks, and to encourage oceanographers to take more interest in oceanic education.
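
The digitization step described in this abstract can be pictured with a small sketch: a current is stored as an ordered list of longitude/latitude points and exported as GeoJSON, so illustrators can redraw it without positional errors. The coordinates and file name below are invented placeholders, not the study's data.

```python
# Hedged sketch: a digitized current path as a GeoJSON LineString.
# All coordinate values here are placeholders, not the study's measurements.
import json

east_korea_warm_current = {
    "type": "Feature",
    "properties": {"name": "East Korea Warm Current", "kind": "warm"},
    "geometry": {
        "type": "LineString",
        # GeoJSON coordinate order is [longitude, latitude].
        "coordinates": [[129.4, 35.5], [129.5, 36.5], [129.8, 37.5], [130.4, 38.2]],
    },
}

with open("east_korea_warm_current.geojson", "w") as f:
    json.dump({"type": "FeatureCollection",
               "features": [east_korea_warm_current]}, f)
```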

A Study on Training Dataset Configuration for Deep Learning Based Image Matching of Multi-sensor VHR Satellite Images (다중센서 고해상도 위성영상의 딥러닝 기반 영상매칭을 위한 학습자료 구성에 관한 연구)

  • Kang, Wonbin;Jung, Minyoung;Kim, Yongil
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_1
    • /
    • pp.1505-1514
    • /
    • 2022
  • Image matching is a crucial preprocessing step for the effective utilization of multi-temporal and multi-sensor very high resolution (VHR) satellite images. Deep learning (DL), which is attracting widespread interest, has proven to be an efficient approach for measuring the similarity between image pairs quickly and accurately by extracting complex and detailed features from satellite images. However, image matching of VHR satellite images remains challenging due to the limitations of DL models, whose results depend on the quantity and quality of the training dataset, as well as the difficulty of creating training datasets from VHR satellite images. Therefore, this study examines the feasibility of a DL-based method for matching pair extraction, which is the most time-consuming process during image registration. This paper also aims to analyze the factors that affect accuracy, based on the configuration of the training dataset, when developing a training dataset for DL-based image matching from an existing, biased multi-sensor VHR image database. For this purpose, the generated training dataset was composed of correct and incorrect matching pairs by assigning true and false labels to image pairs extracted using a grid-based Scale Invariant Feature Transform (SIFT) algorithm applied to a total of 12 multi-temporal and multi-sensor VHR images. The Siamese convolutional neural network (SCNN), proposed for matching pair extraction and trained on the constructed dataset, measures similarity by passing the two images in parallel through two identical convolutional neural network branches. The results of this study confirm that data acquired from a VHR satellite image database can be used as a DL training dataset and indicate the potential to improve the efficiency of the matching process through appropriate configuration of multi-sensor images. Given its stable performance, DL-based image matching using multi-sensor VHR satellite images is expected to replace existing manual feature extraction methods and to develop further into an integrated DL-based image registration framework.
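
A minimal PyTorch sketch of the Siamese arrangement this abstract describes, not the authors' implementation: two patches pass through one shared convolutional branch, and the distance between their embeddings is trained with a contrastive loss against the true/false pair labels. The patch size, channel counts, and margin are illustrative assumptions.

```python
# Sketch of a Siamese CNN for matching-pair classification (illustrative only).
import torch
import torch.nn as nn

class SiameseCNN(nn.Module):
    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        # One shared branch: both patches are encoded with the same weights.
        self.branch = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embedding_dim),
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Euclidean distance between embeddings: small for true pairs, large for false.
        return torch.norm(self.branch(a) - self.branch(b), dim=1)

model = SiameseCNN()
a = torch.randn(4, 1, 64, 64)           # hypothetical 64x64 single-band patches
b = torch.randn(4, 1, 64, 64)
dist = model(a, b)
label = torch.tensor([1., 0., 1., 0.])  # 1 = correct matching pair (e.g., from SIFT)
margin = 1.0
# Contrastive loss: pull true pairs together, push false pairs beyond the margin.
loss = (label * dist.pow(2)
        + (1 - label) * torch.clamp(margin - dist, min=0).pow(2)).mean()
```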

Application of Amplitude Demodulation to Acquire High-sampling Data of Total Flux Leakage for Tendon Nondestructive Estimation (덴던 비파괴평가를 위한 Total Flux Leakage에서 높은 측정빈도의 데이터를 획득하기 위한 진폭복조의 응용)

  • Joo-Hyung Lee;Imjong Kwahk;Changbin Joh;Ji-Young Choi;Kwang-Yeun Park
    • Journal of the Korea institute for structural maintenance and inspection
    • /
    • v.27 no.2
    • /
    • pp.17-24
    • /
    • 2023
  • A post-processing technique for the measurement signal of a solenoid-type sensor is introduced. The solenoid-type sensor nondestructively evaluates an external tendon of prestressed concrete using the total flux leakage (TFL) method. The TFL solenoid sensor consists of primary and secondary coils. AC electricity, with the shape of a sinusoidal function, is input in the primary coil. The signal proportional to the differential of the input is induced in the secondary coil. Because the amplitude of the induced signal is proportional to the cross-sectional area of the tendon, sectional loss of the tendon caused by ruptures or corrosion can be identified by the induced signal. Therefore, it is important to extract amplitude information from the measurement signal of the TFL sensor. Previously, the amplitude was extracted using local maxima, which is the simplest way to obtain amplitude information. However, because the sampling rate is dramatically decreased by amplitude extraction using the local maxima, the previous method places many restrictions on the direction of TFL sensor development, such as applying additional signal processing and/or artificial intelligence. Meanwhile, the proposed method uses amplitude demodulation to obtain the signal amplitude from the TFL sensor, and the sampling rate of the amplitude information is same to the raw TFL sensor data. The proposed method using amplitude demodulation provides ample freedom for development by eliminating restrictions on the first coil input frequency of the TFL sensor and the speed of applying the sensor to external tension. It also maintains a high measurement sampling rate, providing advantages for utilizing additional signal processing or artificial intelligence. The proposed method was validated through experiments, and the advantages were verified through comparison with the previous method. For example, in this study the amplitudes extracted by amplitude demodulation provided a sampling rate 100 times greater than those of the previous method. There may be differences depending on the given situation and specific equipment settings; however, in most cases, extracting amplitude information using amplitude demodulation yields more satisfactory results than previous methods.
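
A hedged sketch of the idea on synthetic data (not TFL measurements): the analytic-signal envelope recovers the amplitude at every sample, whereas local-maxima extraction yields only a couple of points per carrier period. The sampling rate and excitation frequency below are assumptions.

```python
# Amplitude demodulation via the analytic signal (synthetic data, illustrative only).
import numpy as np
from scipy.signal import hilbert

fs = 10_000    # sampling rate [Hz], assumed
f_carrier = 60 # primary-coil excitation frequency [Hz], assumed
t = np.arange(0, 1.0, 1 / fs)

# A slowly varying amplitude stands in for a cross-section change along the tendon.
amplitude = 1.0 - 0.3 * np.exp(-((t - 0.5) ** 2) / 0.005)
signal = amplitude * np.sin(2 * np.pi * f_carrier * t)

# Analytic-signal envelope: |x + j*H(x)| gives the amplitude at every sample,
# i.e. at the full sampling rate fs.
envelope = np.abs(hilbert(signal))

# Local-maxima extraction, by contrast, yields only ~2*f_carrier points per
# second here (one per half carrier period), a far coarser amplitude record.
```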

Analysis of Micro-Sedimentary Structure Characteristics Using Ultra-High Resolution UAV Imagery: Hwangdo Tidal Flat, South Korea (초고해상도 무인항공기 영상을 이용한 한국 황도 갯벌의 미세 퇴적 구조 특성 분석)

  • Minju Kim;Won-Kyung Baek;Hoi Soo Jung;Joo-Hyung Ryu
    • Korean Journal of Remote Sensing
    • /
    • v.40 no.3
    • /
    • pp.295-305
    • /
    • 2024
  • This study aims to analyze the micro-sedimentary structures of the Hwangdo tidal flats using ultra-high resolution unmanned aerial vehicle (UAV) data. Tidal flats, located in the transitional area between land and sea, constantly change due to tidal activity and provide a unique environment important for understanding sedimentary processes and environmental conditions. Traditional field observation methods are limited in spatial and temporal coverage, and existing satellite imagery does not provide sufficient resolution to study micro-sedimentary structures. To overcome these limitations, high-resolution images of the Hwangdo tidal flats in Chungcheongnam-do were acquired using UAVs. This area has experienced significant changes in its sedimentary environment due to coastal development projects such as sea wall construction. From May 17 to 18, 2022, sediment samples were collected from 91 points during field surveys, and 25 in-situ points were intensively analyzed. UAV data with a spatial resolution of approximately 0.9 mm made it possible to identify and extract parameters related to micro-sedimentary structures. For mud cracks, the length of the major axis of the polygons was extracted; for ripple marks, the wavelength and ripple symmetry index were extracted. The results showed that in areas with mud content above 80%, mud cracks formed with an average major-axis length of 37.3 cm, while in regions with sand content above 60%, ripples with an average wavelength of 8 cm and a ripple symmetry index of 2.0 formed. This study demonstrated that the micro-sedimentary structures of tidal flats can be effectively analyzed using ultra-high resolution UAV data without field surveys. This highlights the potential of UAV technology as an important tool in environmental monitoring and coastal management and shows its usefulness in the study of sedimentary structures. In addition, the results of this study are expected to serve as baseline data for more accurate sedimentary facies classification.
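
As an illustration of one extracted parameter, the following sketch (not the authors' pipeline) fits ellipses to binarized mud-crack polygons with OpenCV and converts the major-axis length to millimetres using the study's ~0.9 mm ground sampling distance.

```python
# Sketch: major-axis length of mud-crack polygons from a binarized UAV patch.
# Illustrative only; the binarization step itself is assumed to have been done.
import cv2
import numpy as np

GSD_MM = 0.9  # ground sampling distance [mm/pixel], from the study

def major_axis_lengths_mm(binary_mask: np.ndarray) -> list[float]:
    """Fit an ellipse to each polygon (uint8 mask) and return major axes in mm."""
    contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    lengths = []
    for contour in contours:
        if len(contour) >= 5:  # cv2.fitEllipse needs at least 5 points
            (_, _), (ax1, ax2), _ = cv2.fitEllipse(contour)
            lengths.append(max(ax1, ax2) * GSD_MM)
    return lengths
```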

Region of Interest Extraction and Bilinear Interpolation Application for Preprocessing of Lipreading Systems (입 모양 인식 시스템 전처리를 위한 관심 영역 추출과 이중 선형 보간법 적용)

  • Jae Hyeok Han;Yong Ki Kim;Mi Hye Kim
    • The Transactions of the Korea Information Processing Society
    • /
    • v.13 no.4
    • /
    • pp.189-198
    • /
    • 2024
  • Lipreading is an important part of speech recognition, and several studies have been conducted to improve the lipreading performance of lipreading systems. Recent studies have improved recognition performance by modifying the model architecture of the lipreading system; unlike such research, we aim to improve recognition performance without any changes to the model architecture. To this end, we refer to the cues used in human lipreading and set other regions, such as the chin and cheeks, as regions of interest alongside the lip region, the existing region of interest of lipreading systems, and compare the recognition rate of each region of interest to identify the highest-performing one. In addition, assuming that the differences in normalization results caused by the choice of interpolation method when normalizing the size of the region of interest affect recognition performance, we interpolate the same region of interest using nearest-neighbor interpolation, bilinear interpolation, and bicubic interpolation, and compare the recognition rate of each interpolation method to identify the best-performing one. Each region of interest was detected by training an object detection neural network, and dynamic time warping templates were generated by normalizing each region of interest, extracting and combining features, and mapping the combined features into a low-dimensional space via dimensionality reduction. The recognition rate was evaluated by comparing the distance between the generated dynamic time warping templates and the data mapped to the low-dimensional space. In the comparison of regions of interest, the region of interest containing only the lip region showed an average recognition rate of 97.36%, which is 3.44 percentage points higher than the average recognition rate of 93.92% in the previous study. In the comparison of interpolation methods, bilinear interpolation achieved 97.36%, which is 14.65 percentage points higher than nearest-neighbor interpolation and 5.55 percentage points higher than bicubic interpolation. The code used in this study can be found at https://github.com/haraisi2/Lipreading-Systems.
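
The interpolation comparison at the heart of this study can be shown in a few lines of OpenCV; the file name and target size below are illustrative assumptions, not the study's configuration.

```python
# Sketch: resizing the same ROI with the three interpolation methods compared
# in the paper. File name and target size are hypothetical.
import cv2

roi = cv2.imread("mouth_roi.png")  # hypothetical cropped region of interest
target = (96, 96)                  # assumed normalized ROI size

resized = {
    "nearest":  cv2.resize(roi, target, interpolation=cv2.INTER_NEAREST),
    "bilinear": cv2.resize(roi, target, interpolation=cv2.INTER_LINEAR),
    "bicubic":  cv2.resize(roi, target, interpolation=cv2.INTER_CUBIC),
}
# Bilinear interpolation averages the 2x2 neighborhood around each mapped
# coordinate, smoothing the blocky artifacts nearest-neighbor leaves; the
# paper reports it as the best-performing choice for its lipreading features.
```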

Visualizing the Results of Opinion Mining from Social Media Contents: Case Study of a Noodle Company (소셜미디어 콘텐츠의 오피니언 마이닝결과 시각화: N라면 사례 분석 연구)

  • Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.89-105
    • /
    • 2014
  • After the emergence of the Internet, social media with highly interactive Web 2.0 applications has provided a very user-friendly means for consumers and companies to communicate with each other. Users routinely publish content involving their opinions and interests in social media such as blogs, forums, chat rooms, and discussion boards, and this content is released in real time on the Internet. For that reason, many researchers and marketers regard social media content as a source of information for business analytics to develop business insights, and many studies have reported results on mining business intelligence from social media content. In particular, opinion mining and sentiment analysis, as techniques to extract, classify, understand, and assess the opinions implicit in text content, are frequently applied to social media content analysis because they emphasize determining sentiment polarity and extracting authors' opinions. A number of frameworks, methods, techniques, and tools have been presented by these researchers. However, we have found some weaknesses in their methods, which are often technically complicated and not sufficiently user-friendly for supporting business decisions and planning. In this study, we attempted to formulate a more comprehensive and practical approach to conducting opinion mining with visual deliverables. First, we describe the entire cycle of practical opinion mining using social media content, from the initial data gathering stage to the final presentation session. Our proposed approach consists of four phases: collecting, qualifying, analyzing, and visualizing. In the first phase, analysts choose the target social media; each target medium requires a different means of access, such as open APIs, search tools, DB2DB interfaces, or content purchasing. The second phase is pre-processing, which generates useful material for meaningful analysis. If garbage data are not removed, the results of social media analysis will not provide meaningful and useful business insights; to clean social media data, natural language processing techniques should be applied. The next step is the opinion mining phase, where the cleansed social media content set is analyzed. The qualified dataset includes not only user-generated content but also content identification information such as creation date, author name, user id, content id, hit counts, review or reply, favorites, etc. Depending on the purpose of the analysis, researchers or data analysts can select a suitable mining tool. Topic extraction and buzz analysis are usually related to market trend analysis, while sentiment analysis is utilized for reputation analysis. There are also various applications, such as stock prediction, product recommendation, and sales forecasting. The last phase is the visualization and presentation of the analysis results. The major focus of this phase is to explain the results and help users comprehend their meaning; therefore, to the extent possible, deliverables from this phase should be simple, clear, and easy to understand, rather than complex and flashy. To illustrate our approach, we conducted a case study on a leading Korean instant noodle company, NS Food, which holds 66.5% of the market share and has kept the No. 1 position in the Korean "Ramen" business for several decades. We collected a total of 11,869 pieces of content, including blogs, forum posts, and news articles. After collecting the social media content data, we generated language resources specific to the instant noodle business for data manipulation and analysis using natural language processing. In addition, we classified the content into more detailed categories, such as marketing features, environment, and reputation. In these phases, we used free software such as the TM, KoNLP, ggplot2, and plyr packages from the R project. As a result, we presented several useful visualization outputs, such as domain-specific lexicons, volume and sentiment graphs, topic word clouds, heat maps, and valence tree maps, providing vivid, full-color examples created with open-source R packages. Business actors can detect at a glance which areas are weak, strong, positive, negative, quiet, or loud. A heat map can show the movement of sentiment or volume as a category-by-time matrix in which color density indicates the level in each time period. The valence tree map, one of the most comprehensive and holistic visualization models, should be very helpful for analysts and decision makers in quickly understanding the "big picture" business situation, since a tree map can present buzz volume and sentiment hierarchically for a given period. This case study offers real-world business insights from market sensing and demonstrates to practical-minded business users how they can use these types of results for timely decision making in response to ongoing changes in the market. We believe our approach can provide a practical and reliable guide to opinion mining with visualized results that are immediately useful, not just in the food industry but in other industries as well.
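
The case study itself was carried out with R packages; purely as an illustration of the lexicon-based scoring at the core of the analyzing phase, here is a toy Python sketch with invented lexicon entries standing in for the domain-specific resources.

```python
# Toy sketch of lexicon-based polarity scoring (invented lexicon, not the
# study's instant-noodle-specific language resources).
POSITIVE = {"tasty", "rich", "favorite", "best"}
NEGATIVE = {"bland", "salty", "stale", "worst"}

def polarity(tokens: list[str]) -> int:
    """Positive hits minus negative hits; the sign gives the post's polarity."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

posts = [
    ["this", "ramen", "is", "tasty", "and", "rich"],
    ["broth", "was", "bland", "and", "salty"],
]
for p in posts:
    print(polarity(p))  # 2, then -2
```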

Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

  • Lee, Minchul;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.183-203
    • /
    • 2018
  • News articles are the most suitable medium for examining events occurring at home and abroad. In particular, as the development of information and communication technology has brought various kinds of online news media, news about events occurring in society has increased greatly. Automatically summarizing key events from massive amounts of news data will therefore help users view many events at a glance. In addition, building and providing an event network based on the relevance of events can greatly help readers understand current events. In this study, we propose a method for extracting event networks from large news text data. To this end, we first collected Korean political and social articles from March 2016 to March 2017 and integrated synonyms, leaving only meaningful words, through preprocessing using NPMI and Word2Vec. Latent Dirichlet allocation (LDA) topic modeling was used to calculate the topic distribution by date and to detect events from peaks in those distributions. A total of 32 topics were extracted from the topic modeling, and the time of occurrence of each event was deduced from the point at which its topic distribution surged. As a result, a total of 85 events were detected, of which a final set of 16 events was filtered and presented using a Gaussian smoothing technique. We then calculated relevance scores between the detected events to construct the event network: using the cosine coefficient between co-occurring events, we calculated the relevance between events and connected related events. Finally, we set up the event network by assigning each event to a vertex and each relevance score to the edge connecting the corresponding vertices. The event network constructed with our method helped us sort the major events in the political and social fields in Korea over the past year in chronological order and, at the same time, identify which events are related to which. Our approach differs from existing event detection methods in that LDA topic modeling makes it possible to easily analyze large amounts of data and to identify relevance between events that was difficult to detect with existing event detection. We also applied various text mining techniques and the Word2vec technique in text preprocessing to improve the accuracy of extracting proper nouns and compound nouns, which has been difficult in analyzing Korean texts. The event detection and network construction techniques in this study have the following advantages in practical application. First, LDA topic modeling, which is unsupervised learning, can easily extract subjects, topic words, and distributions from huge amounts of data, and by using the date information of the collected news articles, the distribution by topic can be expressed as a time series. Second, the connections between events can be found in a summarized form by calculating relevance scores and constructing an event network from the co-occurrence of topics, which is difficult to grasp with existing event detection; this is supported by the fact that the inter-event relevance-based event network proposed in this study was actually constructed in order of occurrence time. It is also possible to identify, through the event network, which event served as the starting point of a series of events. 
A limitation of this study is that LDA topic modeling produces different results depending on the initial parameters and the number of topics, and the topic and event names in the analysis results must be assigned by the researcher's subjective judgment. Also, since each topic is assumed to be exclusive and independent, the relevance between topics is not taken into account. Subsequent studies need to calculate the relevance between events that are not covered in this study or that belong to the same topic.
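
A hedged sketch of the final steps on toy inputs: Gaussian-smoothed daily topic weights give event peaks, and events whose topics co-occur are linked by cosine similarity in a networkx graph. The array shapes, thresholds, and smoothing width are illustrative, not the study's settings, and the peak rule is one plausible reading of the surge detection described above.

```python
# Sketch: event detection from smoothed topic-by-date weights, then an event
# network linked by cosine similarity (toy random data, illustrative only).
import numpy as np
import networkx as nx
from scipy.ndimage import gaussian_filter1d
from scipy.signal import find_peaks

rng = np.random.default_rng(0)
daily_topic_weight = rng.random((365, 32))  # 365 days x 32 topics, toy stand-in

events = []                                 # (topic_id, day) pairs
for topic in range(daily_topic_weight.shape[1]):
    smoothed = gaussian_filter1d(daily_topic_weight[:, topic], sigma=3)
    peaks, _ = find_peaks(smoothed, height=smoothed.mean() + 2 * smoothed.std())
    events += [(topic, int(day)) for day in peaks]

# Link events whose topics co-occur: cosine similarity of daily weight profiles.
G = nx.Graph()
G.add_nodes_from(range(len(events)))
for i, (ti, _) in enumerate(events):
    for j, (tj, _) in enumerate(events[i + 1:], start=i + 1):
        a, b = daily_topic_weight[:, ti], daily_topic_weight[:, tj]
        sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        if sim > 0.8:  # illustrative relevance threshold
            G.add_edge(i, j, weight=sim)
```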

An Intelligence Support System Research on KTX Rolling Stock Failure Using Case-based Reasoning and Text Mining (사례기반추론과 텍스트마이닝 기법을 활용한 KTX 차량고장 지능형 조치지원시스템 연구)

  • Lee, Hyung Il;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.47-73
    • /
    • 2020
  • A KTX rolling stock is a system consisting of several machines, electrical devices, and components, and its maintenance requires considerable expertise and experience from maintenance workers. In the event of a rolling stock failure, the maintainer's knowledge and experience determine how quickly and how well the problem is solved, so the resulting availability of the vehicle varies. Although problem solving is generally based on fault manuals, experienced and skilled professionals can quickly diagnose and take action by applying personal know-how. Since this knowledge exists in a tacit form, it is difficult to pass on completely to a successor, and previous studies have developed case-based rolling stock expert systems to turn it into a data-driven form. Nonetheless, research on the KTX rolling stock most commonly used on the main line, and on systems that extract textual meaning and search for similar cases, is still lacking. Therefore, this study proposes an intelligent support system that provides an action guide for emerging failures by using the know-how of rolling stock maintenance experts as examples of problem solving. For this purpose, a case base was constructed by collecting rolling stock failure data generated from 2015 to 2017, and an integrated dictionary was built from the case base to cover the essential terminology and failure codes of the specialized railway rolling stock sector. Based on the deployed case base, a new failure is compared with past cases and the top three most similar failure cases are retrieved, so that the actual actions taken in those cases can be proposed as a diagnostic guide. To compensate for the limitation of keyword-matching case retrieval in previous case-based rolling stock failure expert system studies, various dimensionality reduction measures that take into account the semantic relationships among failure descriptions were applied to calculate similarity, and their usefulness was verified through experiments. Three dimensionality reduction algorithms, Non-negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), and Doc2Vec, were applied to extract the characteristics of each failure, and similar cases were retrieved by measuring the cosine distance between the resulting vectors. Precision, recall, and the F-measure were used to assess the performance of the proposed actions. To compare performance, these three algorithms were evaluated together with two baselines: an algorithm that randomly retrieves failure cases with identical failure codes and an algorithm that applies cosine similarity directly to word-based vectors; analysis of variance confirmed that the performance differences among the five algorithms were statistically significant. In addition, optimal techniques were derived for practical application by verifying differences in performance depending on the number of dimensions used for dimensionality reduction. The analysis showed that direct word-based cosine similarity performed better than the reduced dimensions obtained with Non-negative Matrix Factorization (NMF) and Latent Semantic Analysis (LSA), and that the algorithm using Doc2Vec performed best. 
Furthermore, for the dimensionality reduction techniques, performance improved as the number of dimensions increased up to an appropriate level. Through this study, we confirmed the usefulness of effective methods for extracting data characteristics and converting unstructured data when applying case-based reasoning in the specialized field of KTX rolling stock, where most attributes are text. Text mining is being studied for use in many areas, but studies using such text data are still lacking in environments with many specialized terms and limited access to data, such as the one examined in this study. In this regard, it is significant that this study first presented an intelligent diagnostic system that suggests actions by retrieving cases with text mining techniques applied to extract failure characteristics, complementing keyword-based case searches. This is expected to provide implications as a basic study for developing diagnostic systems that can be used immediately on site.
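
A small sketch of the retrieval step the study evaluates, using a toy corpus rather than KTX maintenance records: TF-IDF features reduced with TruncatedSVD (LSA) and ranked by cosine similarity to return the top three most similar past cases.

```python
# Sketch: top-3 similar-case retrieval via TF-IDF + LSA + cosine similarity.
# The case texts are invented stand-ins for rolling stock failure records.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

case_base = [
    "pantograph contact strip worn, replaced strip",
    "traction motor overheating, cooling fan inspected",
    "door sensor fault, limit switch adjusted",
    "brake pressure drop, air leak at coupling sealed",
]
new_failure = "motor temperature high during acceleration"

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(case_base + [new_failure])

svd = TruncatedSVD(n_components=3, random_state=0)  # LSA; toy dimension count
Z = svd.fit_transform(X)

# Rank past cases by cosine similarity to the new failure's LSA vector.
sims = cosine_similarity(Z[-1:], Z[:-1])[0]
for rank, idx in enumerate(sims.argsort()[::-1][:3], start=1):
    print(rank, case_base[idx], round(float(sims[idx]), 3))
```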