• Title/Summary/Keyword: source text


Inferring Undiscovered Public Knowledge by Using Text Mining-driven Graph Model (텍스트 마이닝 기반의 그래프 모델을 이용한 미발견 공공 지식 추론)

  • Heo, Go Eun;Song, Min
    • Journal of the Korean Society for Information Management / v.31 no.1 / pp.231-250 / 2014
  • Due to the recent development of Information and Communication Technologies (ICT), the number of research publications has increased exponentially. In response to this rapid growth, the demand for automated text processing methods that can handle massive amounts of text data has risen. Biomedical text mining, which discovers hidden biological meanings and treatments in the biomedical literature, has become a pivotal methodology that helps medical disciplines reduce time and cost. Many researchers have conducted literature-based discovery studies to generate new hypotheses. However, existing approaches require either intensive manual processing or a semi-automatic procedure to find and select biomedical entities. In addition, they are limited to showing a single dimension, namely the cause-and-effect relationship between two concepts. Thus, this study proposes a novel approach that discovers various relationships among source concepts, target concepts, and their intermediate concepts by expanding the intermediate concepts to multiple levels. The study offers a distinct perspective on literature-based discovery: it not only discovers meaningful relationships among concepts in the biomedical literature through graph-based path inference but also generates feasible new hypotheses.
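
The idea of linking a source concept to a target concept through several levels of intermediate concepts can be sketched as a bounded path search over a concept graph. This is only an illustration, not the authors' implementation: the function names `build_graph` and `find_paths`, the `max_intermediates` parameter, and the placeholder concept labels are all assumptions, and the abstract does not say how edges are extracted or how paths are ranked.

```python
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected graph from (concept, concept) co-occurrence pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def find_paths(graph, source, target, max_intermediates=2):
    """Return simple paths source -> ... -> target that pass through
    between 1 and max_intermediates intermediate concepts.

    A direct source-target edge is skipped, because it represents a
    relationship that is already known rather than an inferred one.
    """
    paths = []
    stack = [[source]]
    while stack:
        path = stack.pop()
        for nxt in graph.get(path[-1], ()):
            if nxt in path:
                continue  # keep paths simple (no repeated concepts)
            if nxt == target:
                if len(path) >= 2:  # at least one intermediate concept
                    paths.append(path + [target])
            elif len(path) <= max_intermediates:
                stack.append(path + [nxt])
    # Shorter paths first, then alphabetical, for a stable result.
    return sorted(paths, key=lambda p: (len(p), p))


# Hypothetical toy data: S is the source, T the target, B* are intermediates.
toy_pairs = [("S", "B1"), ("B1", "T"), ("S", "B2"), ("B2", "B3"), ("B3", "T")]
toy_graph = build_graph(toy_pairs)
one_level = find_paths(toy_graph, "S", "T", max_intermediates=1)
two_level = find_paths(toy_graph, "S", "T", max_intermediates=2)
```

With one intermediate level the search finds only the path through `B1`; raising `max_intermediates` to 2 also surfaces the longer chain through `B2` and `B3`, which is the kind of multi-level relationship a two-concept model would miss.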

Equivalence in Translation and its Components (등가를 통한 번역의 이론과 구성 요소 분석)

  • PARK, Jung-Joon
    • Cross-Cultural Studies / v.19 / pp.251-270 / 2010
  • This paper examines the validity of the translation theory put forward by the ESIT (École Supérieure d'Interprètes et de Traducteurs, Université Paris III) and how it differs from other translation theories. First, the paper analyzes the theory by examining the equivalence that can be discerned between the French and Korean translations of an original English text. Employing equivalence in translation may shed new light on the interminable debate between literal and free translation. In contrast to formal equivalence, Nida's dynamic equivalence holds that a message should retain the same meaning for the reader whether it appears in the original or in a translated text. In short, the aim of dynamic equivalence is to identify the closest equivalent to the source-language message. The concepts of correspondence and equivalence defined by translation theoreticians fall within the domain of dynamic equivalence suggested by Nida. In translation theory, the domain of language usage and that of discourse are treated separately. Usage denotes translation through the symbols that make up the language itself. Discourse, in contrast, concerns newly created expressions, which may be described as creative equivalences that embody the original message for the particular situation at hand. The translator will, however, find themselves combining the two opposing approaches when translating. Translation operates at the level of text and not of language, so one cannot regulate or foresee every special circumstance that may arise in translating discourse; translation that reflects this condition should always be delimited. All other translation should be subject to translation by equivalence.
The interpretive theory of translation (of ESIT) relates to both the empirical and the philosophical approach and suggests a new perspective on translation. In conclusion, this theory differs from skopos theory and polysystem theory in that it takes into account only the elements closely related to the original text, and in that it was developed for educational purposes, opening new perspectives in the domain of translation theories.

A Study on the Use of Stopword Corpus for Cleansing Unstructured Text Data (비정형 텍스트 데이터 정제를 위한 불용어 코퍼스의 활용에 관한 연구)

  • Lee, Won-Jo
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.891-897 / 2022
  • In big data analysis, raw text data mostly exists in various unstructured forms, and it becomes structured data that can be analyzed only after heuristic pre-processing and computer-based post-processing cleansing. In this study, unnecessary elements are removed from the collected raw data during pre-processing so that the wordcloud function of the R program, one of the text data analysis techniques, can be applied, and stopwords are removed during post-processing. A case study of wordcloud analysis was then conducted, which calculates the frequency of occurrence of words and presents high-frequency words as key issues. To address the problems of the "nested stopword source code" method, the existing way of handling stopwords with the word cloud technique in R, we propose using a "general stopword corpus" and a "user-defined stopword corpus" and conduct a case analysis. The advantages and disadvantages of the proposed "unstructured data cleansing process model" are compared and verified, and a practical application of word cloud visualization analysis using the proposed "external corpus cleansing technique" is presented.
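
The post-processing step described above, removing stopwords from external corpora and then counting word frequencies, can be sketched as follows. The study itself works in R; this Python version is only an illustration, and the two stopword sets, the tokenization rule, and the function name `word_frequencies` are hypothetical stand-ins rather than the author's corpora or code.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the two external stopword corpora.
GENERAL_STOPWORDS = {"the", "a", "of", "and", "in", "is", "to"}
USER_STOPWORDS = {"study", "data"}


def word_frequencies(text, *stopword_sets):
    """Count word frequencies after removing every word found in any
    of the supplied stopword sets."""
    tokens = re.findall(r"[a-z]+", text.lower())  # simple ASCII tokenizer
    stopwords = set().union(*stopword_sets)
    return Counter(t for t in tokens if t not in stopwords)


sample = "The study of text data and the cleansing of text"
freqs = word_frequencies(sample, GENERAL_STOPWORDS, USER_STOPWORDS)
top_words = freqs.most_common(2)
```

Keeping the stopword lists as separate sets, instead of nesting removal calls in the analysis code, lets the general list be reused across projects while the user-defined list is adjusted per dataset.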

Research on the Financial Data Fraud Detection of Chinese Listed Enterprises by Integrating Audit Opinions

  • Leiruo Zhou;Yunlong Duan;Wei Wei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.12 / pp.3218-3241 / 2023
  • Financial fraud undermines the sustainable development of financial markets. Financial statements can be regarded as the key source of information on the operating conditions of listed companies. Current research focuses more on mining numerical financial data than on text data. However, text data can reveal emotional information, which is an important basis for detecting financial fraud. In particular, the audit opinion on a financial statement is the fair opinion of a certified public accountant on the quality of an enterprise's financial reports. This research therefore uses the data features of the annual financial reports and audit opinion texts of 4,153 listed companies over the past six years, and proposes a financial fraud detection model that integrates audit opinions. First, a financial data index database and an audit opinion text database were built. Second, the deep learning BERT model was employed to digitize the audit opinions. Finally, both the extracted numerical audit features and the numerical financial indicators were used as training data for the LightGBM model. The imbalanced distribution of sample labels is also a focus of financial fraud research. To address this problem, data enhancement and the Focal Loss function were used in data processing and model training, respectively. The experimental results show that, compared with conventional financial fraud detection models, the performance of the proposed model improves greatly, with Area Under the Curve (AUC) and Accuracy reaching 81.42% and 78.15%, respectively.
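
To show why Focal Loss helps with imbalanced labels, here is a minimal sketch of the commonly used binary form, FL = -alpha_t * (1 - p_t)^gamma * log(p_t). It is an assumption that the paper uses this form; the abstract gives neither the hyperparameters nor how the loss is wired into LightGBM, so the defaults `gamma=2.0` and `alpha=0.25` and the function name are illustrative only.

```python
import math


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one sample.

    y_true: 1 for the positive (e.g. fraud) class, 0 otherwise.
    p_pred: predicted probability of the positive class.
    gamma:  focusing parameter; larger values down-weight easy samples more.
    alpha:  weight on the positive class (1 - alpha on the negative class).
    """
    p = min(max(p_pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = p if y_true == 1 else 1.0 - p   # probability of the true class
    alpha_t = alpha if y_true == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct prediction contributes far less than a badly wrong one.
easy = binary_focal_loss(1, 0.9)
hard = binary_focal_loss(1, 0.1)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary cross-entropy, which makes the role of the `(1 - p_t)^gamma` factor easy to see: it shrinks the contribution of well-classified majority samples so that rare, hard samples dominate training.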

Layout Analysis for Calculation of Web Page Similarity as Image

  • Mitsuhashi, Noriaki;Yamaguchi, Toru;Takama, Yasufumi
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.142-145 / 2003
  • When we search for information on the Web using search engines, they analyze only the text collected from the source files of Web pages. However, there is a limit to how well the layout of a Web page can be analyzed from its source file alone, even though page design is the most important factor when a user evaluates a page. In particular, pages with similar designs often offer similar information. We propose a method that analyzes layout in order to compare the design of pages by treating the displayed page as an image.


Text/Voice Recognition & Translation Application Development Using Open-Source (오픈소스를 이용한 문자/음성 인식 및 번역 앱 개발)

  • Yun, Tae-Jin;Seo, Hyo-Jong;Kim, Do-Heon
    • Proceedings of the Korean Society of Computer Information Conference / 2017.07a / pp.425-426 / 2017
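  • This paper proposes a text/voice recognition and translation app built on Tesseract-OCR, an open-source engine supported by Google. Recently, a variety of smartphone apps offering recognition and translation of foreign languages, including Korean, have been developed and have become travel essentials. The app processes images captured with the smartphone camera to raise the recognition rate, provides a crop function for partial recognition, supplements the Tesseract-OCR training data to further improve recognition, and uses the Google speech recognition API so that the user can select among similar recognized sentences, which are then translated and displayed. The translation function lets the user choose the source and target languages, and by default supports translation into English, Korean, Japanese, and Chinese. These functions can be used to develop apps for various applications, such as license plate recognition and search based on text contained in photos.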


Static Analysis Tools Against Cross-site Scripting Vulnerabilities in Web Applications : An Analysis

  • Talib, Nurul Atiqah Abu;Doh, Kyung-Goo
    • Journal of Software Assessment and Valuation / v.17 no.2 / pp.125-142 / 2021
  • Reports of rampant cross-site scripting (XSS) vulnerabilities raise growing concerns about the effectiveness of current Static Analysis Security Testing (SAST) tools as an internet security measure. In response to these concerns, this study examines seven open-source SAST tools to assess their capability to detect XSS vulnerabilities in PHP applications and to determine their performance in terms of effectiveness and analysis runtime. The representative tools, categorized as either text-based or graph-based analysis tools, were all test-run on real-world PHP applications with known XSS vulnerabilities. The vulnerability detection reports collected from each tool were analyzed with the aid of PhpStorm's data flow analyzer. The detection rates of the tools, calculated against the total number of vulnerabilities in the applications, were as high as 0.968 and as low as 0.006. Furthermore, the tools took less than a minute on average to complete an analysis. Notably, their runtime is independent of their analysis type.

Low polygon game character modeling and Character Primitives manufacture (로우폴리곤 게임 캐릭터 모델링 및 Character Primitives 제작)

  • Kang, Sung-Jung;Kim, Sang-Jin;Lee, Seung-Hyun
    • Journal of the Korea Computer Industry Society / v.7 no.5 / pp.573-582 / 2006
  • A game progresses according to its story through text, graphics, animation, video, music, and so on, and its outcome varies with the strategy and tactics of the player. With regard to game development, this paper describes the tasks of the game planner, the game programmer, and the game graphic designer. Game graphic designers are classified into four roles: art director, original picture designer, 2D designer, and 3D designer. Among these, the 3D designer creates 3D game characters using 3D tools. This paper presents a method by which 3D designers and beginners can develop 3D characters easily and quickly. It also shows how to prepare a SourceModel consisting of 150 polygons. SourceModels are provided in body proportions ranging from five to eight heads tall. In addition, a Character Primitives Interface is built so that the SourceModel can be used in MaxScript. As a result, 3D designers can use the SourceModel freely and save time.


Effect of Information Source, Sales Promotion Type, and Impulse Buying Tendency Characteristics on Fashion Live Commerce Purchase Intention (정보원 특성, 판매촉진유형, 충동구매성향이 패션 라이브커머스 구매의도에 미치는 영향)

  • Choi, Hyun;Hwang, Sun Jin
    • Journal of Fashion Business / v.26 no.4 / pp.52-63 / 2022
  • Live commerce, a mobile sales platform based on real-time content and text, is drawing attention as a new marketing channel. The fashion industry in particular is also using live commerce as a new distribution channel and needs marketing strategies to use it efficiently. This study attempted to verify the effect of information source characteristics, sales promotion type, and impulse buying tendency on fashion live commerce purchase intention. The experimental design was a 2 (characteristics of information source: expertise vs attractiveness) × 2 (sales promotion type: value-added vs price discount) × 2 (impulse buying tendency: high vs low) three-way mixed analysis of variance (ANOVA). A convenience sample of 264 women in their 20s and 50s living in Seoul and the Gyeonggi area who had purchased products through live commerce was collected. For the final analysis, 240 questionnaires were used. Data were analyzed with the SPSS 26 program using three-way ANOVA, and simple main effects analysis was conducted. The results are as follows. First, there were statistically significant differences in purchase intention according to consumers' impulse buying tendencies and sales promotions. Second, information source and sales promotion showed a statistically significant interaction effect on purchase intention. Lastly, information source, sales promotion, and impulse buying tendency showed a significant three-way interaction effect on fashion live commerce purchase intention. Therefore, conducting appropriate marketing analysis can result in positive attitudes toward live commerce products and substantive increases in sales.