• Title/Summary/Keyword: Text Input

Lightweight Named Entity Extraction for Korean Short Message Service Text

  • Seon, Choong-Nyoung;Yoo, Jin-Hwan;Kim, Hark-Soo;Kim, Ji-Hwan;Seo, Jung-Yun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.5 no.3 / pp.560-574 / 2011
  • In this paper, we propose a hybrid method that combines a machine learning (ML) algorithm with a rule-based algorithm to implement a lightweight named entity (NE) extraction system for Korean SMS text. NE extraction from Korean SMS text is challenging because of the limited resources of a mobile phone, corruptions in the input text, the need to extend the system with personal information stored on the phone, and the sparsity of training data. The proposed hybrid method retains the advantages of both statistical ML and rule-based algorithms, and provides fully automated procedures for combining ML approaches with their correction rules using a threshold-based soft decision function. The method is applied to Korean SMS texts to extract person names and location names, which are key information in a personal appointment management system. The proposed system achieved an F-measure of 80.53% in this domain, outperforming conventional ML approaches.
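
The abstract does not give the form of the threshold-based soft decision function, so the following is only a minimal sketch of one plausible reading: keep the ML label when the classifier is confident, and otherwise let a correction rule override it. The function name `combine`, the label strings, and the threshold value 0.8 are all hypothetical.

```python
def combine(ml_label, ml_confidence, rule_label, threshold=0.8):
    """Soft decision between an ML prediction and a correction rule.

    Assumption (not from the paper): the ML label is kept when its
    confidence reaches `threshold`; below that, the rule's label is used
    if a rule fired (rule_label is not None).
    """
    if ml_confidence >= threshold or rule_label is None:
        return ml_label
    return rule_label


# A low-confidence ML decision is corrected by the rule.
print(combine("O", 0.55, "PERSON"))
# A high-confidence ML decision is kept even though a rule fired.
print(combine("LOCATION", 0.93, "O"))
```

Tuning the threshold trades off how often the rules are allowed to override the statistical model.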

Financial Instruments Recommendation based on Classification Financial Consumer by Text Mining Techniques (비정형 데이터 분석을 통한 금융소비자 유형화 및 그에 따른 금융상품 추천 방법)

  • Lee, Jaewoong;Kim, Young-Sik;Kwon, Ohbyung
    • Journal of Information Technology Services / v.15 no.4 / pp.1-24 / 2016
  • With innovation in information technology, non-face-to-face robo-advisors that offer high accessibility and convenience are spreading. Current robo-advisors recommend appropriate investment products after inferring the investment propensity from structured data entered directly or indirectly by individuals. However, asking financial consumers to state or enter their own subjective investment propensity is inconvenient and obtrusive. Hence, this study proposes a way to deduce the investment propensity from unstructured data that customers voluntarily disclose during consultation or online. Since prediction performance on unstructured documents differs according to the characteristics of the text, this study selects the classification algorithm best suited to the text left by financial consumers by evaluating the prediction performance of various learning algorithms, and proposes an intelligent method that automatically recommends investment products. User tests were conducted with MBA students. After the inferred investment propensity and the list of recommended investment products were shown, participants were asked about their satisfaction, which was measured separately for the investment propensity and for the recommended products. The results suggest that users were highly satisfied with the investment products recommended by the proposed method, and that the method can be applied to non-face-to-face robo-advisors.

Syntactic Structured Framework for Resolving Reflexive Anaphora in Urdu Discourse Using Multilingual NLP

  • Nasir, Jamal A.;Din, Zia Ud.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.4 / pp.1409-1425 / 2021
  • In a wide-ranging information society, fast and easy access to information in the language of one's choice is indispensable, and it may be provided by various multilingual Natural Language Processing (NLP) applications. Natural language text contains references among different language elements, called anaphoric links. Resolving anaphoric links is a key problem in NLP and an essential part of NLP applications, because the links must be properly interpreted for a clear understanding of natural language. A mechanism is therefore needed for identifying and resolving these naturally occurring anaphoric links. In this paper, a framework based on Hobbs' syntactic approach and a system developed by Lappin & Leass is proposed for resolving reflexive anaphoric links in Urdu text documents. Generally, the anaphora resolution process takes three main steps: identification of the anaphor, location of the candidate antecedent(s), and selection of the appropriate antecedent. The proposed framework explores the syntactic structure of reflexive anaphors to find features for constructing heuristic rules, which are used to develop an algorithm for resolving these anaphoric references. The system takes Urdu text containing reflexive anaphors as input and outputs Urdu text with resolved reflexive anaphoric links. Despite the scarcity of Urdu resources, our results are encouraging. The proposed framework can be utilized in multilingual NLP (m-NLP) applications.

Region Analysis of Business Card Images Acquired in PDA Using DCT and Information Pixel Density (DCT와 정보 화소 밀도를 이용한 PDA로 획득한 명함 영상에서의 영역 해석)

  • 김종흔;장익훈;김남철
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.8C / pp.1159-1174 / 2004
  • In this paper, we present an efficient algorithm for region analysis of business card images acquired with a PDA, using the DCT and information pixel density. The proposed method consists of three parts: region segmentation, information region classification, and text region classification. In region segmentation, an input business card image is partitioned into 8×8 blocks, and the blocks are classified into information and background blocks using the normalized DCT energy in their low-frequency bands. The input image is then segmented into information and background regions by region labeling on the classified blocks. In information region classification, each information region is classified as a picture region or a text region using the ratio of the DCT energy of horizontal and vertical edge components to that in the low-frequency band, together with the density of information pixels, that is, black pixels in its binarized region. In text region classification, each text region is classified as a large character region or a small character region using the density of information pixels and the averaged horizontal and vertical run-lengths of information pixels. Experimental results show that the proposed method performs well in region segmentation, information region classification, and text region classification for test images of several types of business cards acquired by a PDA under various surrounding conditions. In addition, the error rates of the proposed region segmentation are about 2.2-10.1% lower than those of conventional region segmentation methods. The error rate of the proposed information region classification is about 1.7% lower than that of the conventional information region classification method.
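
The block classification step of the region segmentation can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the low-frequency band is taken to be the AC coefficients with u + v <= 3, the energy is normalized by the number of coefficients in that band, and the threshold of 10.0 is arbitrary. The names `dct2`, `low_freq_energy`, and `is_information_block` are hypothetical.

```python
import math

N = 8  # block size used in the abstract (8x8 blocks)


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def low_freq_energy(block, band=3):
    """Mean absolute value of the low-frequency AC coefficients.

    Assumption: "low frequency" means 0 < u + v <= band; the DC term is
    excluded so that uniform background does not count as information.
    """
    coeffs = dct2(block)
    ac = [abs(coeffs[u][v])
          for u in range(N) for v in range(N)
          if 0 < u + v <= band]
    return sum(ac) / len(ac)


def is_information_block(block, threshold=10.0):
    """Label a block as information (True) or background (False)."""
    return low_freq_energy(block) > threshold


flat = [[200] * N for _ in range(N)]            # uniform background
edge = [[0] * 4 + [255] * 4 for _ in range(N)]  # vertical stroke edge
print(is_information_block(flat), is_information_block(edge))
```

Region labeling would then group adjacent information blocks into connected regions; that step is omitted here.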

A Three-Set Type Korean Keyboard Model, 38K, with High Compatibility to the KS Computer Keyboard

  • Kim, Kuk
    • Journal of the Ergonomics Society of Korea / v.33 no.5 / pp.355-363 / 2014
  • Objective: The purpose of this study is to design a three-set type (Sebulsik) keyboard that inputs Korean text without shifted keys and is compatible with the standard Korean computer keyboard and the ANSI keyboard. Background: The KS computer keyboard is a two-set type (Dubulsik). The three-set type designs that exist or were proposed in past studies are not compatible with the KS or ANSI keyboard and are complex, with many redundant letters. Method: The number of Korean letters needed for the three-set type is analyzed. The Korean letters are then arranged with normality and with spatial compatibility to the KS Korean keyboard, and symbols are arranged in the same positions as on the ANSI keyboard. Results: Fourteen initial consonants and six vowels are placed in exactly the same positions as on the KS keyboard, and the other vowels are arranged with spatial compatibility. Symbols are placed in the same positions as on the ANSI keyboard, and the 10 digits are confirmed to be compatible with the international standard. Conclusion: A 38-key model, 38K, is designed to require a minimal number of keys to input Korean text without shifted keys, with increased compatibility with the KS Korean computer keyboard. Application: The proposed 38-key model, 38K, can be considered for keyboards in industrial production. It can be used by the three-set type Korean keyboard user group more easily than past keyboards.

Implementation of DSP Embedded Number-Braille Conversion Algorithm based on Image Processing (DSP 임베디드 숫자-점자 변환 영상처리 알고리즘의 구현)

  • Chae, Jin-Young;Darshana, Panamulle Arachchige Udara;Kim, Won-Ho
    • Journal of Satellite, Information and Communications / v.11 no.2 / pp.14-17 / 2016
  • This paper describes the implementation of an automatic number-to-braille converter based on image processing for blind people. The algorithm consists of four main steps. The first step is binary conversion of the input image obtained by the camera. The second step is segmentation of the characters by means of dilation and labelling. The next step is calculation of the cross-correlation between the segmented text image and pre-defined text-pattern images. The final step is generation of the braille output corresponding to the input image. Computer simulation showed a 91.8% correct conversion rate for Arabic numerals printed on A4 sheets, and practical feasibility was confirmed using the converter implemented on a DSP image processing board.
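
The third step, matching a segmented character against pre-defined patterns by cross-correlation, might look like the sketch below. The abstract does not say which correlation variant is used; zero-mean normalized cross-correlation is assumed here, and the tiny 3x3 templates and the names `ncc` and `recognize` are made up for illustration.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-size images."""
    fa = [p for row in a for p in row]
    fb = [p for row in b for p in row]
    ma = sum(fa) / len(fa)
    mb = sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - ma) ** 2 for x in fa)
                    * sum((y - mb) ** 2 for y in fb))
    return num / den if den else 0.0


def recognize(segment, templates):
    """Return the template key with the highest correlation score."""
    return max(templates, key=lambda digit: ncc(segment, templates[digit]))


# Hypothetical 3x3 binary templates (real patterns would be much larger).
templates = {
    "1": [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]],
    "7": [[1, 1, 1],
          [0, 0, 1],
          [0, 0, 1]],
}

# A segmented "1" with one noisy pixel in the bottom-right corner.
segment = [[0, 1, 0],
           [0, 1, 0],
           [0, 1, 1]]
print(recognize(segment, templates))
```

The recognized digit would then be mapped to its braille cell in the final step.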

A graphical user interface for stand-alone and mixed-type modelling of reinforced concrete structures

  • Sadeghian, Vahid;Vecchio, Frank
    • Computers and Concrete / v.16 no.2 / pp.287-309 / 2015
  • FormWorks-Plus is a generalized public domain user-friendly preprocessor developed to facilitate the process of creating finite element models for structural analysis programs. The lack of a graphical user interface in most academic analysis programs forces users to input the structural model information into the standard text files, which is a time-consuming and error-prone process. FormWorks-Plus enables engineers to conveniently set up the finite element model in a graphical environment, eliminating the problems associated with conventional input text files and improving the user's perception of the application. In this paper, a brief overview of the FormWorks-Plus structure is presented, followed by a detailed explanation of the main features of the program. In addition, demonstration is made of the application of FormWorks-Plus in combination with VecTor programs, advanced nonlinear analysis tools for reinforced concrete structures. Finally, aspects relating to the modelling and analysis of three case studies are discussed: a reinforced concrete beam-column joint, a steel-concrete composite shear wall, and a SFRC shear panel. The unique mixed-type frame-membrane modelling procedure implemented in FormWorks-Plus can address the limitations associated with most frame type analyses.

A practical application of cluster analysis using SPSS

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society / v.20 no.6 / pp.1207-1212 / 2009
  • The basic objective of cluster analysis is to discover natural groupings of items or variables. In general, clustering is conducted on a similarity (or dissimilarity) matrix or on the original input text data. Various measures of similarity (or dissimilarity) between objects (or variables) have been developed. We introduce a real application of the clustering procedure in SPSS for the case where only the distance matrix of the objects (or variables) is given as input data. This is particularly helpful for cluster analysis of huge data sets whose proximity matrix size exceeds 1000. The syntax commands for matrix input data in SPSS clustering are given with numerical examples.
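
To illustrate what clustering from a distance matrix alone involves, here is a small sketch in Python rather than SPSS syntax. It assumes single-linkage agglomerative clustering, which the abstract does not specify; the function name `single_linkage` and the example matrix are hypothetical.

```python
def single_linkage(dist, k):
    """Agglomerative single-linkage clustering on a precomputed matrix.

    dist: symmetric matrix (list of lists) of pairwise distances.
    k:    number of clusters to stop at.
    Only the distances are needed; the original observations are not.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Hypothetical 4-object distance matrix with two obvious groups.
dist = [
    [0.0, 1.0, 9.0, 8.0],
    [1.0, 0.0, 10.0, 9.0],
    [9.0, 10.0, 0.0, 1.5],
    [8.0, 9.0, 1.5, 0.0],
]
print(single_linkage(dist, 2))
```

This naive version rescans all cluster pairs at every merge, so it is only suitable for small matrices.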

Experimental Investigation on Skilled Human′s Typing Pattern for Development of New Input Device (새로운 입력장치 개발을 위한 숙련자의 타이핑 동작에 관한 실험적 연구)

  • 김진영;최혁렬;이호길
    • Journal of Institute of Control, Robotics and Systems / v.9 no.9 / pp.720-726 / 2003
  • A virtual keyboard may be an efficient new mobile input device that supports the QWERTY keyboard layout. As a preliminary study for developing a virtual keyboard, the typing pattern of skilled typists is investigated. In the study, the touch positions of the fingers are measured with a touchscreen while five skilled typists type long sentences. These measurements show that the touch positions form groups that correspond to alphabetic characters. Although there are some mismatches, constant distances that allow the groups to be discriminated can be found. Based on the analysis, a prediction algorithm for the constant distance is proposed and evaluated, which is useful for realizing a portable virtual keyboard.

Matching Algorithm for Hangul Recognition Based on PDA

  • Kim Hyeong-Gyun;Choi Gwang-Mi
    • Journal of information and communication convergence engineering / v.2 no.3 / pp.161-166 / 2004
  • Electronic ink is data stored in the form of handwritten text or script, without conversion to ASCII by handwriting recognition, on pen-based computers and Personal Digital Assistants (PDAs) to support natural and convenient data input. One of the most important issues is how to search electronic ink in order to use it. We propose and implement a script matching algorithm for electronic ink. The proposed algorithm separates the input stroke into a set of primitive strokes using the curvature of the stroke curve. After determining the type of each separated stroke, it produces a stroke feature vector. It then calculates the distance between the stroke feature vector of the input strokes and that of strokes in the database using a dynamic programming technique.
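
The final matching step can be sketched as a dynamic-programming alignment between two sequences of primitive-stroke feature vectors. The abstract names only "the dynamic programming technique"; dynamic time warping with Euclidean local cost is assumed here, and the 2-D feature vectors and the name `dtw_distance` are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two feature-vector sequences.

    a, b: lists of equal-dimension tuples, one per primitive stroke.
    The local cost is the Euclidean distance between feature vectors.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical feature vectors for three scripts.
query = [(0.0, 0.0), (1.0, 0.0)]
same_shape = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0)]  # one stroke split in two
different = [(0.0, 0.0), (1.0, 1.0)]
print(dtw_distance(query, same_shape), dtw_distance(query, different))
```

A database search would compute this distance against every stored script and return the closest entries.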