• Title/Summary/Keyword: Text features

Compilation of the Yonsei English Learner Corpus (YELC) 2011 and Its Use for Understanding Current Usage of English by Korean Pre-university Students (한국 예비 대학생의 영어 사용 특성 파악을 위한 대규모 공개 영어 학습자 코퍼스 구축 및 분석)

  • Rhee, Seok-Chae;Jung, Chae Kwan
    • The Journal of the Korea Contents Association / v.14 no.11 / pp.1019-1029 / 2014
  • In recent years, researchers have become increasingly interested in the creation and pedagogical use of English learner corpora. Many studies have shown that learner corpora can not only make a significant contribution to second language acquisition research but also contribute to the construction and evaluation of language tests by advancing our understanding of English learners. So far, however, little attention has been paid to corpora of Korean EFL (English as a foreign language) learners. The Yonsei English Learner Corpus (YELC 2011) is a specialized, monolingual, and synchronic Korean EFL learner corpus that was developed by Yonsei University from 2011 to 2012. Over 3,000 Korean high school graduates (or equivalents) who had been accepted by Yonsei University for further study participated in this project. The corpus consists of 6,572 written texts (1,085,828 words) at nine different English proficiency levels. In this paper, we describe its compilation and, more specifically, how we turned a text archive into a corpus ("corpusization"). After introducing the process of corpusization, we report notable insights into the specific linguistic features that characterize Korean learners of English at different proficiency levels. This study also discusses the potential uses of YELC 2011, which is now freely available for research purposes.

A Study on the Inter-constructive Design Dictionary through the Internet. (인터넷을 통한 상호구축적 디자인 용어사전의 연구)

  • 김태균
    • Archives of design research / v.14 no.4 / pp.25-33 / 2001
  • With increasing access to the internet, the number of designers who rely on the internet for design information is on the rise. A common dictionary of design terminology therefore needs to be formed and shared among designers, and the internet is a very useful medium for doing so. However, because related terminology grows rapidly through interaction among designers, setting up and providing such information unilaterally on the internet falls far short of taking full advantage of the internet's features. This indicates that providing data on the internet, rather than via traditional books, requires in-depth study of how the database structure is established and of appropriate interface design. This study therefore presents a design-term database model that harnesses the internet's capacity for information to be built up spontaneously through user interaction, departing from a model that conveys information unilaterally. The report summarizes and analyzes various models and suggests a classification system in accordance with users' learning cognition. Problems with existing dictionaries of design terminology were identified, and new methods addressing those problems were explored. In short, this report proposes a user-oriented, inter-constructive database model that emphasizes a high level of openness and interactivity by enabling changes to text in cyberspace and encouraging users to participate in making the design dictionary.

Email Extraction and Utilization for Author Disambiguation (저자 식별을 위한 전자메일의 추출 및 활용)

  • Kang, In-Su
    • The Journal of the Korea Contents Association / v.8 no.6 / pp.261-268 / 2008
  • The author of a paper is represented by his or her personal name in a bibliographic record. However, using names to indicate authors may degrade the recall and precision of paper and/or author search, since the same name can be shared by many different individuals and a person can write his or her name in different forms. To solve this problem, identical author names must be disambiguated into distinct persons. As features for author resolution, previous studies have exploited bibliographic attributes such as co-authors, titles, and publication information. This study attempts to apply authors' email addresses to disambiguate author names. To this end, we first handle the extraction of email addresses from full-text papers, and then evaluate and analyze the effect of email addresses on author resolution using a large-scale test set.
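
The two steps named in the abstract above (pulling email addresses out of full text, then using them as an author-resolution feature) might look roughly like the following. This is only a minimal sketch, not the paper's method: the regular expression is a simplified pattern, and `extract_emails`, `same_author`, and the record layout are hypothetical names introduced here for illustration.

```python
import re

# Simplified pattern for illustration; it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(full_text):
    """Return the distinct, lower-cased email addresses found in a paper's
    full text, in order of first appearance."""
    seen = []
    for match in EMAIL_RE.findall(full_text):
        address = match.lower()
        if address not in seen:
            seen.append(address)
    return seen


def same_author(record_a, record_b):
    """Hypothetical rule: two records with the same name are treated as one
    person when they share at least one email address."""
    shared = set(record_a["emails"]) & set(record_b["emails"])
    return record_a["name"] == record_b["name"] and bool(shared)


paper_text = "In-Su Kang (Kang@Example.ac.kr). Contact: kang@example.ac.kr, other@lab.org"
print(extract_emails(paper_text))
```

A shared address is strong evidence that two records belong to one person, but the absence of a shared address says little, so a rule like this would normally be combined with the bibliographic attributes the abstract mentions.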

Documentation of the History of Ok-Cheon Catholic Church by standardized 2D CAD and 3D Digital Modeling (표준화된 2D CAD와 3D Digital Modeling을 이용한 옥천천주교회의 연혁 기록)

  • Kim, Myung-Sun;Choi, Soon-Yong
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.1 / pp.523-528 / 2011
  • Ok-Cheon Catholic Church has been altered four times since its first construction in 1955. The first three changes were small ones to windows, doors, roof finish, etc., but the last alteration extended its plan from a 一 shape to a long cross shape, and with it the size, structure, and form of the church changed. This history has been recorded not in drawings but only in text, with indistinct features left undocumented. This study produces new 2D CAD files, using layers matched to the changes, and 3D digital models; these contain not only present information but also information on the changes to the church. They are useful data for its effective management, conservation, restoration, or possible reuse.

Automatic Construction of Class Hierarchies and Named Entity Dictionaries using Korean Wikipedia (한국어 위키피디아를 이용한 분류체계 생성과 개체명 사전 자동 구축)

  • Bae, Sang-Joon;Ko, Young-Joong
    • Journal of KIISE:Computing Practices and Letters / v.16 no.4 / pp.492-496 / 2010
  • Wikipedia, as an open encyclopedia, contains immense human knowledge written by thousands of volunteer editors, and its reliability is also high. In this paper, we propose to automatically construct a Korean named entity dictionary using several features of Wikipedia. First, we generate class hierarchies using the class information from each Wikipedia article. Second, the title of each article is mapped to our class hierarchies, and we then calculate the entropy value of the root node of each class hierarchy. Finally, we construct a high-performance named entity dictionary by removing the class hierarchies whose entropy value is higher than a threshold. Our experiments achieved an overall F1-measure of 81.12% (precision: 83.94%, recall: 78.48%).
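
The entropy filter and the reported F1-measure in the abstract above can be illustrated with a short sketch. The abstract does not say how the root-node entropy is computed, so this assumes Shannon entropy over the named-entity labels of the titles mapped under each root; `entropy`, `filter_hierarchies`, the example labels, and the threshold value are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of named-entity labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def filter_hierarchies(hierarchies, threshold):
    """Keep only the hierarchies whose root entropy does not exceed the
    threshold, i.e. whose mapped titles mostly share one entity type."""
    return {
        root: labels
        for root, labels in hierarchies.items()
        if entropy(labels) <= threshold
    }


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical hierarchies: root name -> labels of the titles mapped under it.
hierarchies = {
    "people": ["PER", "PER", "PER", "PER"],
    "misc": ["PER", "LOC", "ORG", "PER"],
}
print(sorted(filter_hierarchies(hierarchies, threshold=1.0)))
print(round(f1_score(83.94, 78.48), 2))
```

With the precision and recall quoted in the abstract, the harmonic mean comes to about 81.12, which agrees with the reported F1-measure.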

Eliminating Exceptional Subject-Verb Agreement rules in English Quantificational structure (양화사 구문에서의 예외적 주어-동사 수 일치 규칙 소거)

  • Yi, Jae Il
    • Journal of Digital Convergence / v.12 no.12 / pp.529-535 / 2014
  • This study aims to establish consistency in subject-verb (S-V) agreement in quantifier phrases. A lack of consistency in English grammar is critical to grammaticality. We focus on the grammar, specifically the S-V agreement rule in quantifier phrases. We believe that exceptional rules in the quantifier S-V structure are unnecessary, since the basic grammar rule on S-V agreement is sufficient and adding exceptional rules only makes it more difficult and confusing. We argue that specific features inherent in each quantifier are linked when quantifiers are used pronominally and that the ±feature plays an important role in the quantifier S-V agreement structure. This study presents a solution that eliminates the ungrammaticality found in typical English textbooks by simplifying quantifier S-V agreement to make it solid and systematic.

Design of enhanced communications transport service for multimedia applications (멀티미디어 응용을 위한 향상된 트랜스포트 서비스 설계)

  • 강신각;김대영
    • The Journal of Korean Institute of Communications and Information Sciences / v.22 no.10 / pp.2150-2164 / 1997
  • Two distinct features required of a transport service and protocol to meet the requirements of new multimedia applications are multicast transmission capability and an enhanced QoS facility. In this paper, we present the major functions and facilities of an enhanced communications transport service definition that supports these requirements. First, the major fundamental issues that must be handled in designing a multi-peer communication service and protocol are resolved and applied to the design of the service definition. Second, we propose several new functions for the enhanced transport service, such as heterogeneous TC establishment, invitation capability, TC-ownership transfer, and restriction of transmit concurrency. The services of the proposed enhanced transport service are TC creation, TC invitation, TC join, multicast data transfer, pause of data transfer, resume of data transfer, report of specific status, TC leave, TC termination, TC-ownership transfer, and token transfer. The proposed transport service was submitted to, and accepted as draft text for a committee draft of an international standard by, international standardization bodies such as JTC1/SC6 and ITU-T SG 7.

Feature Selection for Anomaly Detection Based on Genetic Algorithm (유전 알고리즘 기반의 비정상 행위 탐지를 위한 특징선택)

  • Seo, Jae-Hyun
    • Journal of the Korea Convergence Society / v.9 no.7 / pp.1-7 / 2018
  • Feature selection, a data preprocessing technique, is a major research area in many applications dealing with large datasets. It has been used in pattern recognition, machine learning, and data mining, and is now widely applied in a variety of fields such as text classification, image retrieval, intrusion detection, and genome analysis. The proposed method is based on a genetic algorithm, which is a meta-heuristic algorithm. There are two methods of finding feature subsets: the filter method and the wrapper method. In this study, we use the wrapper method, which evaluates feature subsets using a real classifier, to find an optimal feature subset. The training dataset used in the experiment has a severe class imbalance, which makes it difficult to improve classification performance for rare classes. After preprocessing the training dataset with SMOTE, we select features and evaluate them with various machine learning algorithms.
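
A wrapper-style genetic algorithm of the kind the abstract above describes can be sketched in a few dozen lines. This is a toy illustration under stated assumptions, not the paper's setup: the fitness function is the leave-one-out accuracy of a simple nearest-centroid classifier standing in for the "real classifier", SMOTE preprocessing is omitted, and `loo_accuracy`, `ga_select`, and every parameter value are hypothetical. The sketch also assumes at least two features.

```python
import random


def loo_accuracy(X, y, mask):
    """Wrapper fitness: leave-one-out accuracy of a nearest-centroid
    classifier that sees only the features switched on in `mask`."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for k in range(len(X)):
            if k == i:
                continue  # hold out sample i
            total = sums.setdefault(y[k], [0.0] * len(cols))
            for c, j in enumerate(cols):
                total[c] += X[k][j]
            counts[y[k]] = counts.get(y[k], 0) + 1

        def dist(label):
            return sum(
                (X[i][j] - sums[label][c] / counts[label]) ** 2
                for c, j in enumerate(cols)
            )

        if min(sums, key=dist) == y[i]:
            correct += 1
    return correct / len(X)


def ga_select(X, y, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Evolve boolean feature masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    n = len(X[0])
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    population[0] = [True] * n  # include the full feature set as a baseline

    def fitness(mask):
        return loo_accuracy(X, y, mask)

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)


# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [[0, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 0],
     [10, 1, 0], [10, 0, 1], [10, 1, 1], [10, 0, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
mask, score = ga_select(X, y)
print(mask, score)
```

Because the better half of each generation is carried over unchanged and the full feature set is seeded into the initial population, the returned subset never scores below the all-features baseline under this fitness function. A filter method, by contrast, would rank features by a statistic without training a classifier at all.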

Semantic Video Retrieval Based On User Preference (사용자 선호도를 고려한 의미기반 비디오 검색)

  • Jung, Min-Young;Park, Sung-Han
    • Journal of the Institute of Electronics Engineers of Korea CI / v.46 no.4 / pp.127-133 / 2009
  • To ensure access to rapidly growing video collections, video indexing is becoming more and more essential. A video database should be built for fast searching and for extracting accurate features from video information with increasingly complex characteristics. Moreover, the video indexing structure should support efficient retrieval of interesting content that reflects user preferences. In this paper, we propose a semantic video retrieval method based on user preference. Previous methods do not consider user preferences. Furthermore, conventional methods return results by simple text matching against the user's query, which does not support semantic search. To overcome these limitations, we develop a method for user preference analysis and present a method of video ontology construction for semantic retrieval. The simulation results show that the proposed algorithm performs better than previous methods in terms of semantic video retrieval based on user preferences.

A Study of Development and Implementation of Teaching-Learning Materials for Integrated Education of Mathematics and Biological Science - Focused on Probability in Calculus and Basic Statistics Curriculum - (수학과 생명과학 통합 교수-학습 자료 개발 및 적용 -미적분과 통계 기본의 확률의 뜻과 활용 단원을 중심으로-)

  • Shin, Bo Mi;Ju, Eun Hwa
    • Journal of the Korean School Mathematics Society / v.17 no.4 / pp.629-656 / 2014
  • This study developed teaching and learning materials for an integrated education program on probability and genetics in light of the connections between mathematics and biological science. It also analysed the characteristics of high school students' mathematical activities that appeared while the students took part in lessons taught with the developed materials. To this end, the study first specified five details for the development of the materials, based on the results of previous research, and extracted the probability and genetics content that could be taught in the integrated education program by examining textbooks. After producing the teaching materials according to the five details and the extracted content, the researchers implemented 10 lessons using the materials. The study draws out implications for future integrated education in mathematics and biological science from an analysis of the features of the students' mathematical understanding and attitudes that emerged in the lessons.
