• Title/Abstract/Keywords: text information

Effective Cross-Lingual Text Retrieval using a Fuzzy Knowledge Base (퍼지 지식베이스를 이용한 효과적인 다언어 문서 검색)

  • Choi, Myeong-Bok
    • The Journal of the Institute of Internet, Broadcasting and Communication / Vol. 8, No. 1 / pp. 53-62 / 2008
  • Cross-lingual text retrieval (CLTR) is information retrieval in which a user searches a set of documents written in one language using a query in another language. This paper proposes a CLTR system based on a fuzzy multilingual thesaurus to handle partial matching between terms of two different languages. The proposed system uses a fuzzy term matrix, defined in the paper, to perform retrieval effectively. In this matrix, all relation degrees between terms are inferred using the transitive closure algorithm, so that all implicit links between terms are reflected in retrieval processing. With this framework, the proposed CLTR system improves retrieval effectiveness because it can emulate a human expert's decision making in CLTR well.
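
The inference of implicit term relations described in this abstract can be illustrated with a small sketch. It assumes max-min composition, one common way to define the transitive closure of a fuzzy relation; the abstract does not say which composition the paper uses. The three terms and their relation degrees are made up for illustration.

```python
def max_min_compose(a, b):
    """Max-min composition of two square fuzzy relations."""
    n = len(a)
    return [
        [max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(relation):
    """Repeat R := max(R, R o R) until the matrix stops changing."""
    n = len(relation)
    closure = [row[:] for row in relation]
    while True:
        composed = max_min_compose(closure, closure)
        updated = [
            [max(closure[i][j], composed[i][j]) for j in range(n)]
            for i in range(n)
        ]
        if updated == closure:
            return closure
        closure = updated


# Hypothetical terms and degrees (not taken from the paper).
TERMS = ["car", "automobile", "자동차"]
RELATION = [
    [1.0, 0.9, 0.0],  # "car" has no explicit link to "자동차"
    [0.9, 1.0, 0.8],
    [0.0, 0.8, 1.0],
]

CLOSURE = transitive_closure(RELATION)
# The implicit link car -> 자동차 is inferred through "automobile":
# min(0.9, 0.8) = 0.8
print(CLOSURE[0][2])
```

With such a closure, a query term in one language can match a document term in the other language even when the thesaurus only links them indirectly.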

Analysis and Localization of freeWAIS-sf (FreeWAIS-sf의 분석 및 한글화)

  • O, Jeong-Seok;Kim, Ji-Seung;Lee, Jun-Ho;Lee, Sang-Ho
    • Journal of KIISE: Computing Practices and Letters / Vol. 5, No. 5 / pp. 611-618 / 1999
  • Efficient and effective access to needed information has become an important factor in the modern information society. Many information retrieval (IR) systems have been developed to retrieve needed information from a large amount of data at a given time. However, most freely available IR systems have been developed for English text rather than for Korean text. In this research, we analyzed the IR system freeWAIS-sf and localized it with the Korean morphological analyzer HAM. The localized freeWAIS-sf can handle English text and Korean text simultaneously. We also modified the weighting scheme of freeWAIS-sf. The experimental results show that the modified weighting scheme outperforms the original one in terms of retrieval effectiveness.

Text Categorization Using TextRank Algorithm (TextRank 알고리즘을 이용한 문서 범주화)

  • Bae, Won-Sik;Cha, Jeong-Won
    • Journal of KIISE: Computing Practices and Letters / Vol. 16, No. 1 / pp. 110-114 / 2010
  • We describe a new method for text categorization using the TextRank algorithm. Text categorization is the problem of assigning one or more predefined categories to a text document. TextRank is a graph-based ranking algorithm. If each word is treated as a vertex and the co-occurrence of two adjacent words as an edge, a graph can be built from a document. We then find important words in the graph using TextRank and build features that are pairs of words, each consisting of an important word and a word adjacent to it. We use the following classifiers: SVM, Naïve Bayesian classifier, Maximum Entropy Model, and k-NN classifier. We use the non-cross-posted version of the 20 Newsgroups data set. As a result, performance improved for all classifiers, which suggests that the TextRank algorithm is promising for text categorization.
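
The graph construction and feature extraction described in this abstract can be sketched as follows. This is a minimal illustration, assuming an unweighted, undirected word graph, a damping factor of 0.85, and a fixed number of iterations; the function names and the `top_k` parameter are hypothetical and not taken from the paper.

```python
def build_graph(words):
    """Each word is a vertex; adjacent words are joined by an undirected edge."""
    graph = {}
    for left, right in zip(words, words[1:]):
        if left != right:
            graph.setdefault(left, set()).add(right)
            graph.setdefault(right, set()).add(left)
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """PageRank-style score propagation over the word graph."""
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        updated = {}
        for word, neighbours in graph.items():
            incoming = sum(scores[n] / len(graph[n]) for n in neighbours)
            updated[word] = (1 - damping) + damping * incoming
        scores = updated
    return scores


def important_word_pairs(words, top_k=1):
    """Adjacent word pairs in which at least one word is among the top-k words."""
    scores = textrank(build_graph(words))
    top = set(sorted(scores, key=scores.get, reverse=True)[:top_k])
    return {
        (left, right)
        for left, right in zip(words, words[1:])
        if left in top or right in top
    }


# Toy document in which "a" is adjacent to every other word.
WORDS = "a b a c a d".split()
SCORES = textrank(build_graph(WORDS))
PAIRS = important_word_pairs(WORDS, top_k=1)
```

The resulting pairs would then be used as features for a classifier such as SVM or k-NN; that step is omitted here.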

The Forming Mechanism of Brain Text and Brain Concept in the Theory of Ethical Literary Criticism (뇌텍스트(Brain Text) 및 뇌개념(Brain Concept)의 형성원리와 문학윤리학비평)

  • Nie, Zhenzhao;Yoon, Seokmin
    • Journal of Popular Narrative / Vol. 25, No. 1 / pp. 193-215 / 2019
  • According to ethical literary criticism, every type of literature has its text. Oral literature was originally defined as literature disseminated orally. Before dissemination, the text of oral literature is stored in the human brain and is termed "brain text". Brain text is the textual form used before writing symbols were formed and applied to recording information, and it still exists after the creation of writing symbols. Other types of texts are written text and electronic text. Brain text consists of brain concepts, which, according to their sources, can be divided into objective concepts and abstractive concepts. Brain concepts are tools for thinking, while thought comes from thinking through the understanding and application of brain concepts. Brain text is the carrier of thought. The termination of the synthesis of brain concepts signifies the completion of thinking, which produces thoughts that form brain text. Brain text determines thinking and behavioral patterns that not only communicate and spread information, but also decide our ideas, thoughts, judgments, choices, actions and emotions. Brain text is also a deciding factor for our lifestyle and moral behaviors. The nature of a person's brain text determines his thoughts and actions, and most importantly determines who he is.

Text Detection based on Edge Enhanced Contrast Extremal Region and Tensor Voting in Natural Scene Images

  • Pham, Van Khien;Kim, Soo-Hyung;Yang, Hyung-Jeong;Lee, Guee-Sang
    • Smart Media Journal / Vol. 6, No. 4 / pp. 32-40 / 2017
  • In this paper, a robust text detection method based on edge-enhanced contrast extremal regions (CER) is proposed, using the stroke width transform (SWT) and tensor voting. First, the edge-enhanced CER step extracts a number of covariant regions, which are stable connected components, from the input images. Next, the SWT is computed from the distance map and used to eliminate non-text regions. Then, the candidate text regions are verified with tensor voting, which uses the center points from the previous step as input to compute curve saliency values. Finally, connected component grouping is applied to cluster components that are close to characters. The proposed method is evaluated on the ICDAR2003 and ICDAR2013 text detection competition datasets, and the experimental results show high accuracy compared to previous methods.

A Real-Time Concept-Based Text Categorization System using the Thesaurus Tool (시소러스 도구를 이용한 실시간 개념 기반 문서 분류 시스템)

  • 강원석;강현규
    • Journal of KIISE: Software and Applications / Vol. 26, No. 1 / pp. 167-167 / 1999
  • The majority of text categorization systems use a term-based classification method. However, because there are too many terms, this method is not effective for classifying documents in a real-time environment. This paper presents a real-time concept-based text categorization system, which classifies texts using a thesaurus. The system consists of a Korean morphological analyzer, a thesaurus tool, and a probability-vector similarity measurer. The thesaurus tool acquires the meanings of input terms and represents the text as a concept vector rather than a term vector. Because the concept vector consists of a small number of semantic units, it enables the system to analyze text in real time. Because it represents the meanings of the text, the vector supports concept-based classification. The probability-vector similarity measurer decides the subject of the text by calculating the vector similarity between the input text and each subject. The experimental results show that the proposed system can effectively analyze texts in real time and perform concept-based classification. Moreover, the experiment indicates that the thesaurus tool must be expanded to improve the system.
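
The pipeline in this abstract (terms mapped to concepts, then a vector similarity against each subject) can be sketched as below. The thesaurus entries, subject profiles, and function names are hypothetical, and cosine similarity is an assumption; the abstract does not specify the similarity measure. Morphological analysis is assumed to have already produced the input terms.

```python
import math
from collections import Counter

# Hypothetical thesaurus: term -> concept.
THESAURUS = {
    "striker": "player", "goalkeeper": "player", "goal": "football",
    "loan": "money", "stock": "money", "bank": "institution",
}

# Hypothetical subject profiles expressed as concept probability vectors.
SUBJECTS = {
    "sports": {"football": 0.6, "player": 0.4},
    "economy": {"money": 0.7, "institution": 0.3},
}


def concept_vector(terms, thesaurus):
    """Map terms to concepts and normalise the counts to probabilities."""
    counts = Counter(thesaurus[t] for t in terms if t in thesaurus)
    total = sum(counts.values())
    return {concept: n / total for concept, n in counts.items()} if total else {}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(terms, subjects=SUBJECTS, thesaurus=THESAURUS):
    """Return the subject whose profile is most similar to the text."""
    vector = concept_vector(terms, thesaurus)
    return max(subjects, key=lambda name: cosine(vector, subjects[name]))
```

Because the vector has one entry per concept rather than per term, its size stays small even for long texts, which is the property the abstract credits for real-time behavior.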

Skewed Angle Detection in Text Images Using Orthogonal Angle View

  • Chin, Seong-Ah;Choo, Moon-Won
    • Proceedings of the IEEK Conference / The Institute of Electronics Engineers of Korea, 2000 ITC-CSCC -1 / pp. 62-65 / 2000
  • In this paper we propose skewed angle detection methods for images that contain text that is not aligned horizontally. In most images, text areas are aligned along the horizontal axis; however, there are many occasions when the text may be at a skewed angle (denoted by $0 < \theta \leq \pi$). In the work described, we adapt the Hough transform, Shadow, and Threshold Projection methods to detect the skewed angle of text in an input image using the orthogonal angle view property. The result of this method is a primary text skew angle, which allows us to rotate the original input image into an image with horizontally aligned text. This is useful for document image processing prior to the recognition stage.
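
As a rough illustration of Hough-style skew detection, the sketch below votes over candidate line directions and returns the one along which the most foreground pixels line up. It is a generic simplification, not the paper's adapted method; the orthogonal angle view property and the Shadow and Threshold Projection methods are not modeled. The function names, the 1-degree step, and the half-open range of 0 to 179 degrees are assumptions.

```python
import math


def estimate_skew(points, angle_step=1):
    """Return the line direction in degrees (0 <= angle < 180) with the most votes."""
    best_angle, best_votes = 0, 0
    for angle in range(0, 180, angle_step):
        theta = math.radians(angle)
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        bins = {}
        for x, y in points:
            # Signed distance from a line through the origin with this
            # direction; points on one such line fall into the same bin.
            offset = math.floor(-x * sin_t + y * cos_t + 0.5)
            bins[offset] = bins.get(offset, 0) + 1
        votes = max(bins.values(), default=0)
        if votes > best_votes:
            best_angle, best_votes = angle, votes
    return best_angle


def synthetic_text_line(angle_deg, length=100):
    """Pixel coordinates of a straight baseline drawn at the given angle."""
    slope = math.tan(math.radians(angle_deg))
    return [(x, round(x * slope)) for x in range(length)]


SKEW = estimate_skew(synthetic_text_line(10))
print(SKEW)
```

Rotating the image by the negative of the returned angle would bring the detected baseline back to horizontal. Note that the sign convention depends on whether the y axis of the image points up or down.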

Caption Extraction in News Video Sequence using Frequency Characteristic

  • Youglae Bae;Chun, Byung-Tae;Seyoon Jeong
    • Proceedings of the IEEK Conference / The Institute of Electronics Engineers of Korea, 2000 ITC-CSCC -2 / pp. 835-838 / 2000
  • Popular methods for extracting a text region from video images are generally based on analysis of the whole image, such as the merge-and-split method or the comparison of two frames. They therefore take a long computing time. This paper suggests a faster method of extracting a text region without processing the whole image. The proposed method uses line sampling, the FFT, and neural networks in order to extract text in real time. In general, text areas are found in the higher frequency domain and can thus be characterized using the FFT. The candidate text areas can then be found by applying the higher frequency characteristics to a neural network. The final text area is extracted by verifying the candidate areas. Experimental results show a perfect candidate extraction rate and a text extraction rate of about 92%. The strengths of the proposed algorithm are its simplicity, real-time processing achieved by not processing the entire image, and fast skipping of images that do not contain text.
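
The idea of sampling scan lines and looking at their high-frequency content can be sketched as below. This is only an illustration: a plain DFT stands in for the FFT, and a fixed threshold stands in for the neural network that the paper uses to select candidates. The function names, the cutoff at the upper half of the spectrum, and the threshold value are all assumptions.

```python
import cmath


def dft_magnitudes(samples):
    """Magnitudes of the DFT bins from DC up to the Nyquist frequency."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def high_frequency_ratio(samples):
    """Share of the non-DC spectral energy that lies in the upper half of the bins."""
    energy = [m * m for m in dft_magnitudes(samples)[1:]]  # drop the DC bin
    total = sum(energy)
    if total <= 1e-6:  # flat line: nothing but numerical noise
        return 0.0
    return sum(energy[len(energy) // 2:]) / total


def text_candidate_rows(image, step=2, threshold=0.5):
    """Sample every `step`-th row and keep the rows dominated by high frequencies."""
    return [
        row for row in range(0, len(image), step)
        if high_frequency_ratio(image[row]) > threshold
    ]


# Synthetic frame: smooth gradient rows plus one row of sharp stripes.
RAMP = [8 * i for i in range(32)]
STRIPES = [0, 255] * 16
IMAGE = [RAMP] * 4 + [STRIPES] + [RAMP] * 3
CANDIDATES = text_candidate_rows(IMAGE, step=2)
```

Only the sampled rows are transformed, so the cost grows with the number of sampled lines rather than with the full image size.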

Detecting and Segmenting Text from Images for a Mobile Translator System

  • Chalidabhongse, Thanarat H.;Jeeraboon, Poonsak
    • Institute of Control, Robotics and Systems: Conference Proceedings / Institute of Control, Robotics and Systems, 2004 ICCAS / pp. 875-878 / 2004
  • Research on text detection and segmentation has long been conducted in the OCR area. However, there are other areas in which text detection and segmentation from images can be very useful. In this report, we first propose the design of a mobile translator system that helps non-native speakers understand a foreign language using a ubiquitous mobile network and camera mobile phones. The main focus of the paper is the algorithm for detecting and segmenting text embedded in natural scenes in captured images. The image, which is captured by a camera mobile phone, is transmitted to a translator server. It first passes through preprocessing steps that smooth the image and suppress noise. A threshold is applied to binarize the image. Afterward, an edge detection algorithm and connected component analysis are performed on the filtered image to find edges and segment the components in the image. Finally, predefined layout relation constraints are used to decide which components are likely to be text. In a preliminary experiment, the system yielded a recognition rate of 94.44% on a set of 36 varied natural scene images that contain text.
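
Two of the steps named in this abstract, thresholding and connected component analysis, can be sketched as follows. The sketch assumes dark text on a light background, a fixed threshold of 128, and 4-connectivity; smoothing, edge detection, and the layout relation constraints are omitted. All names and values are illustrative rather than taken from the paper.

```python
from collections import deque


def binarize(image, threshold=128):
    """Mark pixels darker than the threshold as foreground (1)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def connected_components(binary):
    """Return bounding boxes (left, top, right, bottom) of 4-connected regions."""
    height = len(binary)
    width = len(binary[0]) if binary else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from this unvisited pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            left, top, right, bottom = x, y, x, y
            while queue:
                cy, cx = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width \
                            and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return boxes


# Tiny grayscale image with two dark blobs on a light background.
IMAGE = [
    [255, 255, 255, 255, 255, 255, 255],
    [255,   0,   0, 255, 255,   0, 255],
    [255,   0,   0, 255, 255,   0, 255],
    [255, 255, 255, 255, 255,   0, 255],
    [255, 255, 255, 255, 255, 255, 255],
]
BOXES = connected_components(binarize(IMAGE))
```

The bounding boxes are the kind of component description on which size, alignment, or spacing constraints could then be applied to keep only text-like components.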

Examining the Intellectual Structure of Housing Studies in Korea with Text Mining and Factor Analysis (저자 프로파일링과 요인분석을 이용한 국내 주거학 분야의 지적 구조 분석)

  • Lee, Jae-Yun;Kim, Hee-Jeon;Ryoo, Jong-Duk
    • Journal of the Korean Society for Library and Information Science / Vol. 44, No. 2 / pp. 285-308 / 2010
  • This study analyzes the intellectual structure of domestic research in the housing field using text mining techniques. Unlike existing research, which mainly uses text clustering in statistical analyses to identify subject specialties, core authors, and relationships between research areas, this study applies author profiling and factor analysis. To supplement the analysis of the intellectual structure generated by text mining, and to evaluate the structure itself, two professionals in the housing field were interviewed. The intellectual structure generated through text mining was judged to divide the field into valid research areas, although the division differs slightly from the traditional intellectual structure of the housing field.