• Title/Abstract/Keywords: text input

355 results found (processing time: 0.018 s)

Lightweight Named Entity Extraction for Korean Short Message Service Text

  • Seon, Choong-Nyoung;Yoo, Jin-Hwan;Kim, Hark-Soo;Kim, Ji-Hwan;Seo, Jung-Yun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 5, No. 3
    • /
    • pp.560-574
    • /
    • 2011
  • In this paper, we propose a hybrid method that combines a Machine Learning (ML) algorithm with a rule-based algorithm to implement a lightweight Named Entity (NE) extraction system for Korean SMS text. NE extraction from Korean SMS text is challenging because of the limited resources of a mobile phone, corruption in the input text, the need to extend the system with personal information stored on the phone, and the sparsity of training data. The proposed hybrid method retains the advantages of both statistical ML and rule-based algorithms, and provides fully automated procedures for combining ML approaches with their correction rules using a threshold-based soft decision function. The method is applied to Korean SMS texts to extract person names and location names, which are key information in a personal appointment management system. The proposed system achieved an F-measure of 80.53% in this domain, outperforming conventional ML approaches.
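
The abstract does not spell out the soft decision function, so the following is only a minimal sketch of one way a threshold-based combination of an ML tagger and correction rules could look. The function name `soft_decision`, the threshold value, the label strings, and the toy contact list (standing in for personal information stored on the phone) are all assumptions made for illustration.

```python
from typing import Callable, List, Optional

# Hypothetical rule type: a rule inspects a token and returns a corrected
# label, or None when it does not apply.
Rule = Callable[[str], Optional[str]]


def soft_decision(token: str, ml_label: str, ml_confidence: float,
                  rules: List[Rule], threshold: float = 0.7) -> str:
    """Keep the ML label when it is confident; otherwise try the rules."""
    if ml_confidence >= threshold:
        return ml_label
    for rule in rules:
        corrected = rule(token)
        if corrected is not None:
            return corrected
    return ml_label  # no rule fired, so fall back to the ML output


# Toy contact list (assumed), standing in for the phone's address book.
CONTACTS = {"Jinhwan", "Harksoo"}


def contact_rule(token: str) -> Optional[str]:
    """Label a token as a person name if it appears in the contact list."""
    return "PERSON" if token in CONTACTS else None


if __name__ == "__main__":
    # Low ML confidence: the contact rule is allowed to correct the label.
    print(soft_decision("Jinhwan", "O", 0.55, [contact_rule]))
    # High ML confidence: the ML label is kept as-is.
    print(soft_decision("Gangnam", "LOCATION", 0.92, [contact_rule]))
```

The threshold controls how often rules may override the statistical model; in a real system it would presumably be tuned on held-out data.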

비정형 데이터 분석을 통한 금융소비자 유형화 및 그에 따른 금융상품 추천 방법 (Financial Instruments Recommendation based on Classification Financial Consumer by Text Mining Techniques)

  • 이재웅;김영식;권오병
    • 한국IT서비스학회지
    • /
    • Vol. 15, No. 4
    • /
    • pp.1-24
    • /
    • 2016
  • With innovation in information technology, non-face-to-face robo-advisors with high accessibility and convenience are spreading. Current robo-advisors recommend appropriate investment products after inferring investment propensity from structured data entered directly or indirectly by individuals. However, asking financial consumers to report or input their own subjective investment propensity is inconvenient and obtrusive. Hence, this study proposes a way to deduce investment propensity from unstructured data that customers voluntarily expose during consultation or online. Since prediction performance on unstructured documents differs according to the characteristics of the text, this study selects a classification algorithm suited to the text left by financial consumers by evaluating the prediction performance of various learning-based classification algorithms, and proposes an intelligent method that automatically recommends investment products. User tests were given to MBA students: after being shown the recommended investment and a list of investment products, they were asked about their satisfaction. Financial consumers' satisfaction was measured separately for investment propensity and for recommended products. The results suggest that users were highly satisfied with the investment products recommended by the proposed method, and that the method can be applied to non-face-to-face robo-advisors.

Syntactic Structured Framework for Resolving Reflexive Anaphora in Urdu Discourse Using Multilingual NLP

  • Nasir, Jamal A.;Din, Zia Ud.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 15, No. 4
    • /
    • pp.1409-1425
    • /
    • 2021
  • In a wide-ranging information society, fast and easy access to information in the language of one's choice is indispensable, and it can be provided by various multilingual Natural Language Processing (NLP) applications. Natural language text contains references among different language elements, called anaphoric links. Resolving anaphoric links is a key problem in NLP and an essential part of NLP applications, because these links must be interpreted properly for natural language to be understood clearly. A mechanism is therefore needed to identify and resolve these naturally occurring anaphoric links. In this paper, a framework based on Hobbs' syntactic approach and a system developed by Lappin & Leass is proposed for resolving reflexive anaphoric links in Urdu text documents. In general, anaphora resolution takes three main steps: identifying the anaphor, locating the candidate antecedent(s), and selecting the appropriate antecedent. The proposed framework explores the syntactic structure of reflexive anaphors to find features for constructing heuristic rules, which are then used to develop an algorithm for resolving these anaphoric references. The system takes Urdu text containing reflexive anaphors as input and outputs Urdu text with the reflexive anaphoric links resolved. Despite the scarcity of Urdu resources, our results are encouraging. The proposed framework can be used in multilingual NLP (m-NLP) applications.

DCT와 정보 화소 밀도를 이용한 PDA로 획득한 명함 영상에서의 영역 해석 (Region Analysis of Business Card Images Acquired in PDA Using DCT and Information Pixel Density)

  • 김종흔;장익훈;김남철
    • 한국통신학회논문지
    • /
    • Vol. 29, No. 8C
    • /
    • pp.1159-1174
    • /
    • 2004
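  • This paper proposes an efficient region analysis algorithm for business card images acquired with a camera mounted on a PDA. The proposed method consists of three main steps: region segmentation, information region classification, and character region classification. In region segmentation, the input business card image is divided into blocks of $8 \times 8$ pixels, each block is classified as an information block or a background block using the energy of the normalized DCT coefficients in the low-frequency band, and the image is then segmented into information regions and background regions by region labeling of the blocks. In information region classification, each information region is classified as a character region or a picture region using the ratio of the energy of the horizontal and vertical edge components of the block signal to the energy of the DCT coefficients in the low-frequency band, together with the density of information pixels, which are the black pixels in the binarized information region. In character region classification, the character regions are further classified into large-character and small-character regions using the information pixel density and the average run length. Experimental results on test images of various kinds of business cards, acquired with a PDA under various surrounding conditions, show that the proposed method segments information and background regions well, classifies information regions into character and picture regions well, and classifies character regions into large-character and small-character regions well. In addition, the proposed region segmentation and information region classification methods improve the error rate by about 2.2-10.1% and 7.7%, respectively, compared with conventional methods.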
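
As a rough illustration of the first step (block classification by low-frequency DCT energy), the sketch below labels each 8 × 8 block as information or background. It is not the authors' implementation: the size of the low-frequency band (`low`), the normalization by block area and the 8-bit range, the exclusion of the DC term, and the threshold are all assumptions, and the later region labeling step is omitted.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II matrix C; the 2-D DCT of a block X is C @ X @ C.T."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    i = np.arange(n).reshape(1, -1)   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)        # DC row has a different scale factor
    return c


def classify_blocks(image: np.ndarray, block: int = 8, low: int = 4,
                    threshold: float = 0.05) -> np.ndarray:
    """Return a boolean grid: True for information blocks, False for background.

    `low` and `threshold` are illustrative values, not taken from the paper.
    """
    c = dct_matrix(block)
    rows, cols = image.shape[0] // block, image.shape[1] // block
    labels = np.zeros((rows, cols), dtype=bool)
    for r in range(rows):
        for q in range(cols):
            x = image[r * block:(r + 1) * block,
                      q * block:(q + 1) * block].astype(float)
            coeffs = c @ x @ c.T
            low_band = coeffs[:low, :low].copy()
            low_band[0, 0] = 0.0      # ignore the DC term (mean brightness)
            # Assumed normalization: block area times the squared 8-bit range.
            energy = np.sum(low_band ** 2) / (block * block * 255.0 ** 2)
            labels[r, q] = energy > threshold
    return labels


if __name__ == "__main__":
    img = np.full((16, 16), 200, dtype=np.uint8)  # flat background
    img[0:8, 0:4] = 0                             # one block with a sharp edge
    img[0:8, 4:8] = 255
    print(classify_blocks(img))
```

A flat block has no AC energy and stays background, while a block containing an ink edge concentrates energy in the low-frequency coefficients and is marked as information.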

A Three-Set Type Korean Keyboard Model, 38K, with High Compatibility to the KS Computer Keyboard

  • Kim, Kuk
    • 대한인간공학회지
    • /
    • Vol. 33, No. 5
    • /
    • pp.355-363
    • /
    • 2014
  • Objective: The purpose of this study is to design a three-set type (Sebulsik) keyboard that inputs Korean text without shifted keys and is compatible with the standard Korean computer keyboard and the ANSI keyboard. Background: The KS computer keyboard is a two-set type (Dubulsik). Existing and previously proposed three-set designs are not compatible with the KS or ANSI keyboard and are complex, with many redundant letters. Method: The number of Korean letters needed for a three-set type is analyzed. Korean letters are then arranged regularly and with spatial compatibility to the KS Korean keyboard, and symbols are arranged in the same positions as on the ANSI keyboard. Results: Fourteen initial consonants and six vowels are placed in exactly the same positions as on the KS keyboard, and the other vowels are arranged with spatial compatibility. Symbols occupy the same positions as on the ANSI keyboard, and the 10 digits are confirmed to be compatible with the international standard. Conclusion: A 38-key model, 38K, is designed that requires a minimal number of keys to input Korean text without shifted keys and increases compatibility with the KS Korean computer keyboard. Application: The proposed 38-key model, 38K, can be considered for keyboards in industrial production. It is applicable to users of three-set type Korean keyboards, who should find it easier to use than earlier keyboards.

DSP 임베디드 숫자-점자 변환 영상처리 알고리즘의 구현 (Implementation of DSP Embedded Number-Braille Conversion Algorithm based on Image Processing)

  • 채진영;우다라;김원호
    • 한국위성정보통신학회논문지
    • /
    • Vol. 11, No. 2
    • /
    • pp.14-17
    • /
    • 2016
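  • This paper describes the design and implementation of an image-processing-based automatic number-to-Braille converter for visually impaired people. The image-processing-based number-to-Braille conversion algorithm binarizes the input image acquired by a camera, applies dilation and labeling operations to the character regions, computes the cross-correlation with stored character pattern images, and converts each character into the corresponding Braille. In computer simulations of the proposed algorithm, a conversion success rate of 91.8% was obtained for digits (0-9) printed on A4 paper, and tests of a prototype implemented on a DSP image processing board confirmed a conversion performance of 90%, demonstrating the practical feasibility of the implemented automatic number-to-Braille converter.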
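
The matching and conversion steps can be sketched as below. This is a simplified illustration under stated assumptions, not the DSP implementation described in the abstract: it assumes each digit has already been segmented into a glyph of the same size as the stored templates (so dilation and labeling are omitted), uses a zero-mean normalized cross-correlation as the similarity score, and emits Unicode Braille cells with the standard number sign followed by the letter patterns a-j. All function names are hypothetical.

```python
import numpy as np

# Braille dot numbers for each digit (letters a-j, preceded by a number sign).
DIGIT_DOTS = {1: (1,), 2: (1, 2), 3: (1, 4), 4: (1, 4, 5), 5: (1, 5),
              6: (1, 2, 4), 7: (1, 2, 4, 5), 8: (1, 2, 5), 9: (2, 4),
              0: (2, 4, 5)}
NUMBER_SIGN_DOTS = (3, 4, 5, 6)


def braille_cell(dots) -> str:
    """Map dot numbers 1-6 to a character in the Unicode Braille block."""
    return chr(0x2800 + sum(1 << (d - 1) for d in dots))


def binarize(gray: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Dark ink becomes 1.0, light paper becomes 0.0 (threshold is assumed)."""
    return (gray < threshold).astype(float)


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two equally sized images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def recognize_digit(glyph: np.ndarray, templates: dict) -> int:
    """Return the digit whose stored template correlates best with the glyph."""
    return max(templates, key=lambda d: ncc(glyph, templates[d]))


def to_braille(digits) -> str:
    """Number sign followed by one Braille cell per recognized digit."""
    cells = [braille_cell(DIGIT_DOTS[d]) for d in digits]
    return braille_cell(NUMBER_SIGN_DOTS) + "".join(cells)
```

For example, `to_braille([1, 0])` yields the number sign followed by the cells for 1 and 0. A practical system would also need to normalize glyph size and handle skew before correlation.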

A graphical user interface for stand-alone and mixed-type modelling of reinforced concrete structures

  • Sadeghian, Vahid;Vecchio, Frank
    • Computers and Concrete
    • /
    • Vol. 16, No. 2
    • /
    • pp.287-309
    • /
    • 2015
  • FormWorks-Plus is a generalized public domain user-friendly preprocessor developed to facilitate the process of creating finite element models for structural analysis programs. The lack of a graphical user interface in most academic analysis programs forces users to input the structural model information into the standard text files, which is a time-consuming and error-prone process. FormWorks-Plus enables engineers to conveniently set up the finite element model in a graphical environment, eliminating the problems associated with conventional input text files and improving the user's perception of the application. In this paper, a brief overview of the FormWorks-Plus structure is presented, followed by a detailed explanation of the main features of the program. In addition, demonstration is made of the application of FormWorks-Plus in combination with VecTor programs, advanced nonlinear analysis tools for reinforced concrete structures. Finally, aspects relating to the modelling and analysis of three case studies are discussed: a reinforced concrete beam-column joint, a steel-concrete composite shear wall, and a SFRC shear panel. The unique mixed-type frame-membrane modelling procedure implemented in FormWorks-Plus can address the limitations associated with most frame type analyses.

A practical application of cluster analysis using SPSS

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 20, No. 6
    • /
    • pp.1207-1212
    • /
    • 2009
  • The basic objective of cluster analysis is to discover natural groupings of items or variables. In general, clustering is conducted on a similarity (or dissimilarity) matrix or on the original input text data. Various measures of similarity (or dissimilarity) between objects (or variables) have been developed. We introduce a real application of the clustering procedure in SPSS when only the distance matrix of the objects (or variables) is given as input data. This is particularly helpful for cluster analysis of huge data sets, where the size of the proximity matrix exceeds 1000. The SPSS syntax commands for matrix input data for clustering are given with numerical examples.
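
The abstract refers to SPSS syntax that is not reproduced here, so the sketch below illustrates the same idea in Python instead: clustering when only a distance matrix is available. Single linkage is chosen purely as an example of an agglomerative method; the function name and the toy matrix are assumptions.

```python
from typing import List, Sequence


def single_linkage(dist: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Agglomerative single-linkage clustering from a distance matrix.

    `dist` is a symmetric matrix of pairwise distances and `k` (at least 1)
    is the number of clusters to stop at. Returns clusters as index lists.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        # Find the pair of clusters with the smallest minimum distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    # Toy distance matrix: objects 0 and 1 are close, as are objects 2 and 3.
    d = [[0.0, 1.0, 8.0, 9.0],
         [1.0, 0.0, 8.5, 9.5],
         [8.0, 8.5, 0.0, 1.5],
         [9.0, 9.5, 1.5, 0.0]]
    print(single_linkage(d, 2))
```

The point is that no raw observations are needed: the algorithm only ever reads entries of the proximity matrix.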

새로운 입력장치 개발을 위한 숙련자의 타이핑 동작에 관한 실험적 연구 (Experimental Investigation on Skilled Human's Typing Pattern for Development of New Input Device)

  • 김진영;최혁렬;이호길
    • 제어로봇시스템학회논문지
    • /
    • Vol. 9, No. 9
    • /
    • pp.720-726
    • /
    • 2003
  • A virtual keyboard may be an efficient new mobile input device supporting the QWERTY keyboard layout. As a preliminary study for developing a virtual keyboard, the typing pattern of skilled typists is investigated. In the study, the touch positions of the fingers are measured with a touchscreen while five skilled typists type long sentences. From these measurements, it can be observed that the touch positions form groups corresponding to alphabetic characters. Although there are some mismatches, we find constant distances between the groups that allow them to be discriminated. Based on this analysis, a prediction algorithm for the constant distance is proposed and evaluated, which is useful for realizing a portable virtual keyboard.

Matching Algorithm for Hangul Recognition Based on PDA

  • Kim Hyeong-Gyun;Choi Gwang-Mi
    • Journal of information and communication convergence engineering
    • /
    • Vol. 2, No. 3
    • /
    • pp.161-166
    • /
    • 2004
  • Electronic ink is data stored in the form of handwritten text or script, without conversion into ASCII by handwriting recognition, on pen-based computers and Personal Digital Assistants (PDAs), to support natural and convenient data input. One of the most important issues is how to search electronic ink in order to use it. We propose and implement a script matching algorithm for electronic ink. The proposed algorithm separates each input stroke into a set of primitive strokes using the curvature of the stroke curve. After determining the type of each separated stroke, it produces a stroke feature vector. It then calculates the distance between the stroke feature vector of the input strokes and that of strokes in the database using dynamic programming.
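
The abstract does not say which dynamic programming recurrence is used, so the sketch below uses dynamic time warping (DTW) as one plausible choice for comparing two sequences of stroke feature vectors. The feature layout (a pair such as direction and length per primitive stroke), the function names, and the toy database are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[float, float]  # hypothetical: (direction, length) per primitive stroke


def dtw_distance(seq_a: Sequence[Feature], seq_b: Sequence[Feature]) -> float:
    """Dynamic time warping distance between two feature-vector sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j]: best alignment cost of the first i items of A and j of B.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat B's current item
                                 cost[i][j - 1],      # repeat A's current item
                                 cost[i - 1][j - 1])  # advance both sequences
    return cost[n][m]


def best_match(query: Sequence[Feature],
               database: Dict[str, List[Feature]]) -> str:
    """Return the key of the stored script closest to the query."""
    return min(database, key=lambda name: dtw_distance(query, database[name]))


if __name__ == "__main__":
    db = {"a": [(0.0, 1.0), (1.5, 2.0)], "b": [(3.0, 1.0), (3.0, 1.0)]}
    query = [(0.0, 1.0), (0.0, 1.0), (1.5, 2.0)]  # "a" with a repeated stroke
    print(best_match(query, db))
```

Warping lets a query with an extra or split primitive stroke still align with the stored script, which matters when the same handwriting is segmented slightly differently on each attempt.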