• Title/Summary/Keyword: text information

Search Result 4,380, Processing Time 0.026 seconds

Web Site Creation Method by Using Text2Image Technology (웹 기반 Text2Image 기술을 이용한 웹 사이트 제작 기법)

  • Ban, Tae-Hak;Kim, Kun-Sub;Min, Kyoung-Ju;Jung, Hoe-Kyung
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2011.05a
    • /
    • pp.227-229
    • /
    • 2011
  • To develop an effective web site, Ajax, jQuery and various techniques have been developed. Administrator interface to powerful, where the administrator can easily create a variety of functions from the menu, but this is not leaving the form of text or code, modify it and then when is low. To solve these problems, in this paper we are making conjunction with CSS which provide the database with open font and CSS features in variety of images, converted into text using the open font, these existing production methods with a Web site that provides a differentiated ways.

  • PDF

Text Detection and Binarization using Color Variance and an Improved K-means Color Clustering in Camera-captured Images (카메라 획득 영상에서의 색 분산 및 개선된 K-means 색 병합을 이용한 텍스트 영역 추출 및 이진화)

  • Song Young-Ja;Choi Yeong-Woo
    • The KIPS Transactions:PartB
    • /
    • v.13B no.3 s.106
    • /
    • pp.205-214
    • /
    • 2006
  • Texts in images have significant and detailed information about the scenes, and if we can automatically detect and recognize those texts in real-time, it can be used in various applications. In this paper, we propose a new text detection method that can find texts from the various camera-captured images and propose a text segmentation method from the detected text regions. The detection method proposes color variance as a detection feature in RGB color space, and the segmentation method suggests an improved K-means color clustering in RGB color space. We have tested the proposed methods using various kinds of document style and natural scene images captured by digital cameras and mobile-phone camera, and we also tested the method with a portion of ICDAR[1] contest images.

Construction of Full-text Database by SGML (문서기술언어 SGML에 의한 전문 데이터베이스의 구축)

  • Kim, Chang-Bong
    • Journal of Information Management
    • /
    • v.27 no.4
    • /
    • pp.35-56
    • /
    • 1996
  • SGML(Standard Generalized Markup Language) and its application to full-text database including a table, a figure and a picture are explained. A structure of SGML based full-text database Is defined by DTD(document type definition) written in SGML, and full-text itself is described with generalized markup depending on DTD. This article explains how to represent a document structure : a hierarchical structure like a chapter, a section, or a paragraph, or non-hierarchical(referencial) structure like a note, a table, a figure or a picture. Merits of SGML, electronic publishing, a retrieval system or hypertext and SGML tools are also described.

  • PDF

Prosodic Annotation in a Thai Text-to-speech System

  • Potisuk, Siripong
    • Proceedings of the Korean Society for Language and Information Conference
    • /
    • 2007.11a
    • /
    • pp.405-414
    • /
    • 2007
  • This paper describes a preliminary work on prosody modeling aspect of a text-to-speech system for Thai. Specifically, the model is designed to predict symbolic markers from text (i.e., prosodic phrase boundaries, accent, and intonation boundaries), and then using these markers to generate pitch, intensity, and durational patterns for the synthesis module of the system. In this paper, a novel method for annotating the prosodic structure of Thai sentences based on dependency representation of syntax is presented. The goal of the annotation process is to predict from text the rhythm of the input sentence when spoken according to its intended meaning. The encoding of the prosodic structure is established by minimizing speech disrhythmy while maintaining the congruency with syntax. That is, each word in the sentence is assigned a prosodic feature called strength dynamic which is based on the dependency representation of syntax. The strength dynamics assigned are then used to obtain rhythmic groupings in terms of a phonological unit called foot. Finally, the foot structure is used to predict the durational pattern of the input sentence. The aforementioned process has been tested on a set of ambiguous sentences, which represents various structural ambiguities involving five types of compounds in Thai.

  • PDF

Construction of Full-Text Database and Implementation of Service Environment for Electronic Theses and Dissertations (학위논문 전문데이터베이스 구축 및 서비스환경 구현)

  • Lee, Kyi-Ho;Kim, Jin-Suk;Yoon, Wha-Muk
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.1
    • /
    • pp.41-49
    • /
    • 2000
  • Form the middle of 199os, most universities in Korea have requested their students to submit not only the original text books but also their Electronic Theses and Dissertations(ETD) for masters degree and doctorates degree. The ETD submitted by the students are usually developed by various kinds of word processors such as MS-Word, LaTex, and HWP. Since there is no standard format for ETD to merge various different formats yet, it is difficult to construct the integrated database that provides full-tex service. In this paper, we transform three different ETD formats into a unified one, construct a full-text database, and implement the full-text retrieval system for effective search in the Internet environment.

  • PDF

A Hierarchical Text Rating System for Objectionable Documents

  • Jeong, Chi-Yoon;Han, Seung-Wan;Nam, Taek-Yong
    • Journal of Information Processing Systems
    • /
    • v.1 no.1 s.1
    • /
    • pp.22-26
    • /
    • 2005
  • In this paper, we classified the objectionable texts into four rates according to their harmfulness and proposed the hierarchical text rating system for objectionable documents. Since the documents in the same category have similarities in used words, expressions and structure of the document, the text rating system, which uses a single classification model, has low accuracy. To solve this problem, we separate objectionable documents into several subsets by using their properties, and then classify the subsets hierarchically. The proposed system consists of three layers. In each layer, we select features using the chi-square statistics, and then the weight of the features, which is calculated by using the TF-IDF weighting scheme, is used as an input of the non-linear SVM classifier. By means of a hierarchical scheme using the different features and the different number of features in each layer, we can characterize the objectionability of documents more effectively and expect to improve the performance of the rating system. We compared the performance of the proposed system and performance of several text rating systems and experimental results show that the proposed system can archive an excellent classification performance.

A method for Character Segmentation using Frequence Characteristics and Back Propagation Neural Network (주파수 특성과 역전파 신경망 알고리즘을 이용한 문자 영역 분할 방법)

  • Chun Byung-Tae;Song Chee-Yang
    • Journal of the Korea Society of Computer and Information
    • /
    • v.11 no.4 s.42
    • /
    • pp.55-60
    • /
    • 2006
  • The proposed method uses FFT(Fast Fourier Transform) and neural networks in order to extract texts in real time. In general, text areas are found in the higher frequency domain, thus, can be characterized using FFT. The neural network are learned by character region(high frequency) and non character region(low frequency). The candidate text areas can be thus found by applying the higher frequency characteristics to neural network. Therefore, the final text area is extracted by verifying the candidate areas. Experimental results show a perfect candidate extraction rate and about 95% text extraction rate. The strength of the proposed algorithm is its simplicity, real-time processing by not processing the entire image.

  • PDF

EDGE: An Enticing Deceptive-content GEnerator as Defensive Deception

  • Li, Huanruo;Guo, Yunfei;Huo, Shumin;Ding, Yuehang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.5
    • /
    • pp.1891-1908
    • /
    • 2021
  • Cyber deception defense mitigates Advanced Persistent Threats (APTs) with deploying deceptive entities, such as the Honeyfile. The Honeyfile distracts attackers from valuable digital documents and attracts unauthorized access by deliberately exposing fake content. The effectiveness of distraction and trap lies in the enticement of fake content. However, existing studies on the Honeyfile focus less on this perspective. In this work, we seek to improve the enticement of fake text content through enhancing its readability, indistinguishability, and believability. Hence, an enticing deceptive-content generator, EDGE, is presented. The EDGE is constructed with three steps: extracting key concepts with a semantics-aware K-means clustering algorithm, searching for candidate deceptive concepts within the Word2Vec model, and generating deceptive text content under the Integrated Readability Index (IR). Furthermore, the readability and believability performance analyses are undertaken. The experimental results show that EDGE generates indistinguishable deceptive text content without decreasing readability. In all, EDGE proves effective to generate enticing deceptive text content as deception defense against APTs.

Analysis of Factors Affecting Surge in Container Shipping Rates in the Era of Covid19 Using Text Analysis (코로나19 판데믹 이후 컨테이너선 운임 상승 요인분석: 텍스트 분석을 중심으로)

  • Rha, Jin Sung
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.27 no.1
    • /
    • pp.111-123
    • /
    • 2022
  • In the era of the Covid19, container shipping rates are surging up. Many studies have attempted to investigate the factors affecting a surge in container shipping rates. However, there is limited literature using text mining techniques for analyzing the underlying causes of the surge. This study aims to identify the factors behind the unprecedented surge in shipping rates using network text analysis and LDA topic modeling. For the analysis, we collected the data and keywords from articles in Lloyd's List during past two years(2020-2021). The results of the text analysis showed that the current surge is mainly due to "US-China trade war", "rising blanking sailings", "port congestion", "container shortage", and "unexpected events such as the Suez canal blockage".

Analysis of Social Media Utilization based on Big Data-Focusing on the Chinese Government Weibo

  • Li, Xiang;Guo, Xiaoqin;Kim, Soo Kyun;Lee, Hyukku
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.8
    • /
    • pp.2571-2586
    • /
    • 2022
  • The rapid popularity of government social media has generated huge amounts of text data, and the analysis of these data has gradually become the focus of digital government research. This study uses Python language to analyze the big data of the Chinese provincial government Weibo. First, this study uses a web crawler approach to collect and statistically describe over 360,000 data from 31 provincial government microblogs in China, covering the period from January 2018 to April 2022. Second, a word separation engine is constructed and these text data are analyzed using word cloud word frequencies as well as semantic relationships. Finally, the text data were analyzed for sentiment using natural language processing methods, and the text topics were studied using LDA algorithm. The results of this study show that, first, the number and scale of posts on the Chinese government Weibo have grown rapidly. Second, government Weibo has certain social attributes, and the epidemics, people's livelihood, and services have become the focus of government Weibo. Third, the contents of government Weibo account for more than 30% of negative sentiments. The classified topics show that the epidemics and epidemic prevention and control overshadowed the other topics, which inhibits the diversification of government Weibo.