• Title, Summary, Keyword: Sub-text

Automatic Text Categorization Using Passage-based Weight Function and Passage Type (문단 단위 가중치 함수와 문단 타입을 이용한 문서 범주화)

  • Joo, Won-Kyun;Kim, Jin-Suk;Choi, Ki-Seok
    • The KIPS Transactions:PartB / v.12B no.6 / pp.703-714 / 2005
  • Research in text categorization has been confined to whole-document-level classification, probably because full-text test collections have been lacking. However, the full-length documents now available in large quantities have renewed interest in text classification. A document is usually written in an organized structure to present its main topic(s). This structure can be expressed as a sequence of sub-topic text blocks, or passages. To reflect the sub-topic structure of a document, we propose a new passage-level (passage-based) text categorization model, which segments a test document into several passages, assigns categories to each passage, and merges the passage categories into document categories. Compared with traditional document-level categorization, this model requires two additional steps: passage splitting and category merging. Using four subsets of the Reuters text categorization test collection and a full-text test collection whose documents range from tens to hundreds of kilobytes, we evaluated the proposed model, in particular the effectiveness of various passage types and the importance of passage location in category merging. Our results show that simple windows are the best passage type for all test collections used in these experiments. We also found that passages contribute to the main topic(s) to different degrees, depending on their location in the test document.
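The three stages named in this abstract (passage splitting, per-passage categorization, location-weighted category merging) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the keyword-overlap classifier, the `location_weight` rule that doubles the first and last passages, and the window size are hypothetical and are not taken from the paper.

```python
from collections import Counter


def split_into_windows(tokens, size):
    """Split a token list into consecutive fixed-size windows (simple-window passages)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def classify_passage(passage, category_keywords):
    """Toy classifier (assumption): score each category by keyword occurrences."""
    counts = Counter(passage)
    return {cat: sum(counts[w] for w in words)
            for cat, words in category_keywords.items()}


def location_weight(index, total):
    """Hypothetical weight function: first and last passages count double."""
    return 2.0 if index in (0, total - 1) else 1.0


def categorize_document(tokens, category_keywords, size=4):
    """Split, classify each passage, then merge weighted passage scores."""
    passages = split_into_windows(tokens, size)
    merged = Counter()
    for i, passage in enumerate(passages):
        weight = location_weight(i, len(passages))
        for cat, score in classify_passage(passage, category_keywords).items():
            merged[cat] += weight * score
    return max(merged, key=merged.get) if merged else None


doc = ("wheat corn harvest grain stocks bond market rally "
       "wheat export grain price").split()
keywords = {"grain": {"wheat", "corn", "grain"},
            "finance": {"stocks", "bond", "market"}}
print(categorize_document(doc, keywords))
```

Swapping `location_weight` for another function is the point where a different passage-location hypothesis would be tested.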

QUOTIENTS OF THETA SERIES AS RATIONAL FUNCTIONS OF $j_{1,8}$

  • Hong, Kuk-Jin;Koo, Ja-Kyung
    • Journal of the Korean Mathematical Society / v.38 no.3 / pp.595-611 / 2001
  • Let $Q(n,1)$ be the set of even unimodular positive definite integral quadratic forms in $n$ variables. Then $n$ is divisible by 8. For $A[X]$ in $Q(n,1)$, the theta series $\theta_A(z) = \sum_{X \in \mathbb{Z}^n} e^{\pi i z A[X]}$ ($z \in \mathfrak{h}$, the complex upper half plane) is a modular form of weight $n/2$ for the congruence group $\Gamma_1(8) = \{\delta \in SL_2(\mathbb{Z}) \mid \delta \equiv (\ ) \bmod 8\}$ (※Equation, See Full-text). If $n \geq 24$ and $A[X]$, $B[X]$ are two quadratic forms in $Q(n,1)$, the quotient $\theta_A(z)/\theta_B(z)$ is a modular function for $\Gamma_1(8)$. Since we identify the field of modular functions for $\Gamma_1(8)$ with the function field $K(X_1(8))$ of the modular curve $X_1(8) = \Gamma_1(8)\backslash\mathfrak{h}^*$ ($\mathfrak{h}^*$ the extended plane of $\mathfrak{h}$), which has genus 0, we can express the quotient as a rational function over $\mathbb{C}$ of $j_{1,8}$, a field generator of $K(X_1(8))$ defined by $j_{1,8}(z) = \theta_3(2z)/\theta_3(4z)$. Here, $\theta_3$ is the classical Jacobi theta series.

An Image-based CAPTCHA System with Correction of Sub-images (서브 이미지의 교정을 통한 이미지 기반의 CAPTCHA 시스템)

  • Chung, Woo-Keun;Ji, Seung-Hyun;Cho, Hwan-Gue
    • Journal of KIISE:Computing Practices and Letters / v.16 no.8 / pp.873-877 / 2010
  • CAPTCHA is a security tool that prevents automatic sign-up by spam programs or robots. CAPTCHAs usually depend on the reading ability of humans. However, the common, plain text-based CAPTCHA is not difficult for intelligent web bots and machine learning tools to solve. In this paper, we propose a new sub-image-based CAPTCHA system that is entirely different from text-based systems. Our system presents a set of sub-images cropped from a whole digital picture and asks the user to identify the correct orientation. Although some good machine learning tools exist for this task, our experiment clearly revealed that they are useless on cropped sub-images. The experiment showed that our sub-image-based CAPTCHA is easy for human solvers but very hard for all kinds of machine learning or AI tools. In addition, our CAPTCHA can easily be generated automatically without any human intervention.
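The challenge-generation step described in this abstract (crop a sub-image, present it in a changed orientation, ask for the correct one) might look roughly like the sketch below. It assumes images are lists of pixel rows and restricts orientations to quarter turns; the abstract does not say which rotation angles the system uses, and the names `make_challenge` and `rotate90` are hypothetical.

```python
import random


def crop(img, top, left, size):
    """Cut a size x size sub-image out of a 2D list of pixels."""
    return [row[left:left + size] for row in img[top:top + size]]


def rotate90(img, k):
    """Rotate a square image clockwise by k quarter turns (k may be negative)."""
    for _ in range(k % 4):
        img = [list(row) for row in zip(*img[::-1])]
    return img


def make_challenge(img, size, rng):
    """Return (rotated sub-image, number of quarter turns applied).

    Quarter-turn rotation is an assumption made to keep the sketch small.
    """
    top = rng.randrange(len(img) - size + 1)
    left = rng.randrange(len(img[0]) - size + 1)
    k = rng.randrange(4)
    return rotate90(crop(img, top, left, size), k), k


# Toy "picture": each pixel value encodes its own position (row * 4 + column).
picture = [[r * 4 + c for c in range(4)] for r in range(4)]
challenge, answer = make_challenge(picture, 2, random.Random(0))
restored = rotate90(challenge, -answer)  # undoing the rotation recovers the crop
```

Because generation needs only a random crop position and a random rotation, no human labeling is involved, which matches the abstract's claim of automatic generation.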

Text Verification Based on Sub-Image Matching (부분 영상 매칭에 기반한 텍스트 검증)

  • Son Hwa Jeong;Jeong Seon Hwa;Kim Soo Hyung
    • The KIPS Transactions:PartB / v.12B no.2 / pp.115-122 / 2005
  • The sub-image matching problem, in which one image contains some part of another image, has mostly been investigated on natural images. In this paper, we propose two sub-image matching techniques, a mesh-based method and a correlation-based method, that can be used efficiently to match text images. The mesh-based method consists of two stages: box alignment, and similarity measurement using mesh features extracted from the two images. The correlation-based method determines the similarity from the correlation of the two images, computed with the FFT. We applied the two methods to text verification in a postal automation system and observed that the accuracy of the correlation-based method is 92.7%, while that of the mesh-based method is 90.1%.
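As an illustration of FFT-based correlation in general (not of the authors' specific procedure), the sketch below computes the circular cross-correlation of two equally sized images through the correlation theorem, then normalizes the peak into a similarity score. The normalization and the function names are assumptions; both image dimensions must be powers of two for this simple radix-2 FFT.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2(img, inverse=False):
    """2D FFT: transform every row, then every column."""
    rows = [fft(r, inverse) for r in img]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def cross_correlation(a, b):
    """Circular cross-correlation: IFFT(FFT(a) * conj(FFT(b)))."""
    fa, fb = fft2(a), fft2(b)
    prod = [[x * y.conjugate() for x, y in zip(ra, rb)]
            for ra, rb in zip(fa, fb)]
    n = len(a) * len(a[0])
    return [[v.real / n for v in row] for row in fft2(prod, inverse=True)]


def best_shift(a, b):
    """Return ((row_shift, col_shift), peak) maximizing the correlation."""
    corr = cross_correlation(a, b)
    peak, r, c = max((v, r, c) for r, row in enumerate(corr)
                     for c, v in enumerate(row))
    return (r, c), peak


def similarity(a, b):
    """Peak correlation normalized by the image energies (assumed measure)."""
    _, peak = best_shift(a, b)
    norm = math.sqrt(sum(v * v for r in a for v in r) *
                     sum(v * v for r in b for v in r))
    return peak / norm if norm else 0.0


# b is a small pattern; a is b shifted circularly by 1 row and 2 columns.
b = [[0] * 4 for _ in range(4)]
b[0][0], b[0][1], b[1][0] = 3, 1, 2
a = [[b[(r - 1) % 4][(c - 2) % 4] for c in range(4)] for r in range(4)]
shift, peak = best_shift(a, b)
```

Working in the frequency domain replaces an exhaustive search over all shifts with a few transforms, which is why the correlation peak is a cheap similarity measure for matching.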

Conceptual Graph Matching Method for Reading Comprehension Tests

  • Zhang, Zhi-Chang;Zhang, Yu;Liu, Ting;Li, Sheng
    • Journal of information and communication convergence engineering / v.7 no.4 / pp.419-430 / 2009
  • Reading comprehension (RC) systems aim to understand a given text and return answers to questions about the text. Many previous studies extract the sentences most similar to the question as answers. However, texts for RC tests are generally short, and facts about an event or entity are often expressed across multiple sentences. The answers to some questions might be presented indirectly in sentences that share few words with the question. This paper proposes a conceptual graph matching method for RC tests to extract answer strings. The method first represents the text and the questions as conceptual graphs, and then extracts a sub-graph for every candidate answer concept from the text graph. All candidate answer concepts are scored and ranked according to the matching similarity between their sub-graphs and the question graph. The top-ranked concept is returned as an answer seed to form a concise answer string. Since the sub-graphs for candidate answer concepts are not restricted to covering a single sentence, our approach improved the performance of answer extraction on the Remedia test data.

A Watermarking for Text Document Images using Edge Direction Histograms (에지 방향 히스토그램을 이용한 텍스트 문서 영상의 워터마킹)

  • 김영원;오일석
    • Journal of KIISE:Software and Applications / v.31 no.2 / pp.203-212 / 2004
  • Watermarking is a method for protecting the copyright of multimedia content. Among the various media, text documents show very peculiar properties: block/line/word patterning and a clear separation between foreground and background areas. Algorithms specific to text documents that take these properties into account are therefore required. This paper proposes a novel watermarking algorithm for grayscale text document images. The algorithm inserts the watermark signals through the edge direction histograms. We develop the concept of sub-image consistency, which states that the sub-images of a document have similar shapes in terms of their edge direction histograms. Using Korean, Chinese, and English document images, the concept is evaluated and shown to be valid over a wide range of document images. To insert watermark signals, the edge direction histogram is modified slightly. Experiments were performed on various document images, and the algorithm was evaluated in terms of imperceptibility and robustness.
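To make the notion of an edge direction histogram concrete, here is a minimal sketch that estimates gradients with a Sobel operator and counts gradient orientations in a few bins. The Sobel operator, the four-bin quantization, the magnitude threshold, and the L1 `histogram_distance` used to compare sub-images are all assumptions chosen for illustration; the abstract does not specify them.

```python
import math


def edge_direction_histogram(img, bins=4, threshold=1.0):
    """Normalized histogram of gradient orientations in [0, pi).

    img is a 2D list of grayscale values; border pixels are skipped.
    Bins are centered on 0, pi/bins, 2*pi/bins, ... (assumed quantization).
    """
    h, w = len(img), len(img[0])
    hist = [0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Sobel responses: horizontal (gx) and vertical (gy) gradients.
            gx = ((img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]))
            gy = ((img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]))
            if math.hypot(gx, gy) < threshold:
                continue  # flat region: no edge here
            angle = math.atan2(gy, gx) % math.pi
            hist[int(angle / math.pi * bins + 0.5) % bins] += 1
    total = sum(hist)
    return [c / total for c in hist] if total else [0.0] * bins


def histogram_distance(h1, h2):
    """L1 distance between two histograms (assumed consistency measure)."""
    return sum(abs(p - q) for p, q in zip(h1, h2))


# A vertical step edge and a horizontal step edge fall into different bins.
vertical = [[0, 0, 255, 255] for _ in range(4)]
horizontal = [[0] * 4, [0] * 4, [255] * 4, [255] * 4]
hist_v = edge_direction_histogram(vertical)
hist_h = edge_direction_histogram(horizontal)
```

Under the sub-image consistency idea, histograms computed on different blocks of the same text page would be expected to lie close together under a distance like this one.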

Keywords Refinement using TextRank Algorithm (TextRank를 이용한 키워드 정련 -TextRank를 이용한 집단 지성에서 생성된 콘텐츠의 키워드 정련-)

  • Lee, Hyun-Woo;Han, Yo-Sub;Kim, Lae-Hyun;Cha, Jeong-Won
    • 한국HCI학회:학술대회논문집 / pp.285-289 / 2009
  • Tags are important for retrieving and classifying content. However, some users attach many unrelated tags to their content to obtain a high ranking. In this work, we propose a tag refinement algorithm using TextRank. We calculate the importance of the keywords occurring in the title, description, tags, and comments. We then refine the tags by removing unrelated keywords from the user-generated tags. The experimental results show that the proposed method is useful for refining tags.
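A compact sketch of the idea follows: run TextRank (PageRank over a word co-occurrence graph) on the words of the title, description, and comments, then keep only the user tags that rank highly. The window size, damping factor, iteration count, and the `keep_ratio` cut-off are assumptions for illustration, not values reported in the paper.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score words by PageRank on an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(
                scores[v] / len(neighbors[v]) for v in neighbors[w])
            for w in neighbors
        }
    return scores


def refine_tags(tags, tokens, keep_ratio=0.5):
    """Keep only tags among the top-ranked words (keep_ratio is hypothetical)."""
    scores = textrank(tokens)
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = set(ranked[:max(1, int(len(ranked) * keep_ratio))])
    return [t for t in tags if t in top]


# Words gathered from a title, description, and comments (toy data).
content = "cat video funny cat jumps cat video cute cat".split()
user_tags = ["cat", "video", "free", "iphone"]
print(refine_tags(user_tags, content))
```

Tags that never occur in the surrounding text receive no score at all, so spam tags added only to chase ranking are dropped first.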

COHOMOLOGY GROUPS OF CIRCULAR UNITS

  • Kim, Jae-Moon;Oh, Seung-Ik
    • Journal of the Korean Mathematical Society / v.38 no.3 / pp.623-631 / 2001
  • Let $\kappa$ be a real abelian field of conductor $f$ and let $\kappa_\infty = \bigcup_{n \geq 0} \kappa_n$ be its $\mathbb{Z}_p$-extension for an odd prime $p$ such that $p \nmid f\phi(f)$. The aim of this paper is to compute the cohomology groups of circular units. For $m > n \geq 0$, let $G_{m,n}$ be the Galois group $\mathrm{Gal}(\kappa_m/\kappa_n)$ and $C_m$ be the group of circular units of $\kappa_m$. Let $l$ be the number of prime ideals of $\kappa$ above $p$. Then, for $m > n \geq 0$, we have (1) $C_m^{G_{m,n}} = C_n$, (2) $H^i(G_{m,n}, C_m) = (\mathbb{Z}/p^{m-n}\mathbb{Z})^{l-1}$ if $i$ is even, (3) $H^i(G_{m,n}, C_m) = (\mathbb{Z}/p^{m-n}\mathbb{Z})^{l}$ if $i$ is odd (※Equations, See Full-text).

Correction of Signboard Distortion by Vertical Stroke Estimation

  • Lim, Jun Sik;Na, In Seop;Kim, Soo Hyung
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.9 / pp.2312-2325 / 2013
  • In this paper, we propose a method that corrects the distortion of the text area in Korean signboard images as a preprocessing step to improve character recognition. Perspective distortion in Korean signboard text can cause a low recognition rate. The proposed method consists of four main steps and eight sub-steps. The main steps are potential vertical component detection, vertical component detection, text-boundary estimation, and distortion correction. First, potential vertical line component detection consists of four sub-steps: edge detection for each connected component, pixel distance normalization in the edge, dominant-point detection in the edge, and removal of horizontal components. Second, vertical line component detection consists of removal of diagonal components and extraction of vertical line components. Third, the text-boundary (outline) estimation step consists of left and right boundary line detection. Finally, the distortion of the text image is corrected by a bilinear transformation based on the estimated outline. We compared OCR recognition rates before and after applying the proposed algorithm. The recognition rate on the distortion-corrected signboard images is 29.63% higher at the character level and 21.9% higher at the text level than on the original images.
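The final correction step, a bilinear transformation driven by the estimated outline, can be sketched as follows. The sketch assumes the outline has already been reduced to four corner points and uses nearest-neighbor sampling; the corner ordering, the function names, and the sampling choice are illustrative assumptions rather than details from the paper.

```python
def bilinear_point(corners, u, v):
    """Map (u, v) in the unit square into the quadrilateral.

    corners = (top_left, top_right, bottom_right, bottom_left), each (x, y).
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = corners
    x = (1 - u) * (1 - v) * x0 + u * (1 - v) * x1 + u * v * x2 + (1 - u) * v * x3
    y = (1 - u) * (1 - v) * y0 + u * (1 - v) * y1 + u * v * y2 + (1 - u) * v * y3
    return x, y


def rectify(img, corners, out_w, out_h):
    """Resample the quadrilateral region of img into an out_w x out_h rectangle."""
    out = []
    for r in range(out_h):
        v = r / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for c in range(out_w):
            u = c / (out_w - 1) if out_w > 1 else 0.0
            x, y = bilinear_point(corners, u, v)
            # Nearest-neighbor sampling, clamped to the image bounds.
            xi = min(max(int(round(x)), 0), len(img[0]) - 1)
            yi = min(max(int(round(y)), 0), len(img) - 1)
            row.append(img[yi][xi])
        out.append(row)
    return out


# Center of a skewed quadrilateral (hypothetical outline corners).
quad = ((0, 0), (4, 1), (4, 5), (0, 4))
center = bilinear_point(quad, 0.5, 0.5)

# With the full image as the "outline", rectification is the identity.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
same = rectify(image, ((0, 0), (2, 0), (2, 2), (0, 2)), 3, 3)
```

Each output pixel is pulled from the source image through the inverse mapping, so the output rectangle has no holes regardless of how skewed the estimated outline is.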

Text Region Verification in Natural Scene Images using Multi-resolution Wavelet Transform and Support Vector Machine (다해상도 웨이블릿 변환과 써포트 벡터 머신을 이용한 자연영상에서의 문자 영역 검증)

  • Bae Kyungsook;Choi Youngwoo
    • The KIPS Transactions:PartB / v.11B no.6 / pp.667-674 / 2004
  • Extracting text from images is a fundamental and important problem in image understanding. This paper suggests a text region verification method that uses statistics of the stroke features of characters. The method extracts 36-dimensional features, which express the stroke and direction of characters, from $16\times16$ text and non-text images using the wavelet transform, and selects the 12 sub-features out of the 36 that yield adequate separation between the classes. An SVM is then trained on the selected features. To verify a text region, each $16\times16$ image block is scanned and classified as text or non-text. The region as a whole is then finally decided to be a text region or a non-text region. The proposed method is able to verify text regions that can hardly be distinguished.