• Title/Summary/Keyword: Corpus-based

Enhancing Performance of Bilingual Lexicon Extraction through Refinement of Pivot-Context Vectors (중간언어 문맥벡터의 정제를 통한 이중언어 사전 구축의 성능개선)

  • Kwon, Hong-Seok;Seo, Hyung-Won;Kim, Jae-Hoon
    • Journal of KIISE:Software and Applications, v.41 no.7, pp.492-500, 2014
  • This paper presents a performance enhancement of automatic bilingual lexicon extraction through refinement of pivot-context vectors under the standard pivot-based approach, which is a very effective method for low-resource language pairs. We improve performance incrementally through two different refinements of the pivot-context vectors: the first filters out unhelpful elements of the vectors and revises their values using bidirectional translation probabilities estimated by Anymalign; the second removes non-noun elements from the original vectors. Experiments were conducted on two language pairs in both directions: Korean-Spanish and Korean-French. The experimental results show that our method achieves at least 48.5% at top 1 and up to 88.5% at top 20 for high-frequency words, and at least 43.3% at top 1 and up to 48.9% at top 20 for low-frequency words.
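A minimal sketch of the ranking step in a pivot-based approach, assuming each source word and each target-language candidate already has a context vector expressed in the pivot language (a dict of pivot word to weight). The two refinements are approximated by a hypothetical `refine` function that drops non-noun elements and elements whose bidirectional translation score (`bi_prob`, assumed to be precomputed, for example from Anymalign output) falls below a threshold, then reweights the rest. Candidates are ranked by cosine similarity. This is an illustration, not the authors' code; all words and numbers in the usage are invented.

```python
import math
from typing import Dict, List, Set, Tuple

# A pivot-context vector: pivot-language word -> weight (assumed precomputed).
Vector = Dict[str, float]


def refine(vec: Vector, bi_prob: Dict[str, float], nouns: Set[str],
           threshold: float = 0.01) -> Vector:
    """Hypothetical refinement: drop non-noun elements and elements whose
    bidirectional translation score is below `threshold`, then reweight the
    surviving elements by that score."""
    refined: Vector = {}
    for word, weight in vec.items():
        score = bi_prob.get(word, 0.0)
        if word in nouns and score >= threshold:
            refined[word] = weight * score
    return refined


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(source: Vector, targets: Dict[str, Vector], k: int = 20) -> List[Tuple[str, float]]:
    """Rank target-language words by pivot-context similarity to the source word."""
    ranked = sorted(((t, cosine(source, v)) for t, v in targets.items()),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy vectors; all words and numbers are invented for illustration.
    source = {"house": 2.0, "big": 1.0, "door": 1.0}
    targets = {"casa": {"house": 1.5, "door": 0.5}, "perro": {"dog": 2.0}}
    refined = refine(source, {"house": 0.5, "door": 0.2, "big": 0.9}, {"house", "door"})
    print(top_k(refined, targets, k=1))
```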

Differentiation of Human Adult Adipose Derived Stem Cell in vitro and Immunohistochemical Study of Adipose Derived Stem Cell after Intracerebral Transplantation in Rats

  • Ko, Kwang-Seok;Lee, Il-Woo;Joo, Won-Il;Lee, Kyung-Jun;Park, Hae-Kwan;Rha, Hyung-Keun
    • Journal of Korean Neurosurgical Society, v.42 no.2, pp.118-124, 2007
  • Objective : Adipose tissue is derived from the embryonic mesoderm and contains a heterogeneous stromal cell population. The authors sought to verify the stem-cell characteristics of adipose-derived stromal cells (ADSCs) and to examine immunohistochemical findings after transplantation of ADSCs into the rat brain in order to evaluate the survival, migration and differentiation of the transplanted stromal cells. Methods : ADSCs were first isolated from human adipose tissue and induced to undergo adipogenic, osteogenic and neuronal differentiation under appropriate culture conditions in vitro, and the phenotype profile of undifferentiated human ADSCs was examined using flow cytometry and immunohistochemistry. Human ADSCs were transplanted into the healthy rat brain, and their survival, migration and differentiation were investigated after 4 weeks. Results : Adipose stem cells were harvested from human adipose tissue and subcultured several times. Under conditioned media, the cultured ADSCs differentiated into adipocytes, osteocytes and neuron-like cells. Flow cytometric analysis of undifferentiated ADSCs revealed that they were positive for CD29 and CD44 and negative for CD34, CD45, CD117 and HLA-DR. Transplanted human ADSCs were found mainly in the cortex adjacent to the injection site and had migrated at least 1 mm from it along the cortex and corpus callosum. A few transplanted cells had differentiated into neurons and astrocytes. Conclusion : ADSCs differentiated into multilineage cell lines through transdifferentiation. ADSCs survived and migrated in a xenograft without immunosuppression. Based on these data, ADSCs may be a potential source of stem cells for many human diseases, including neurologic disorders.

The Study on Implementation of Crime Terms Classification System for Crime Issues Response

  • Jeong, Inkyu;Yoon, Cheolhee;Kang, Jang Mook
    • International Journal of Advanced Culture Technology, v.8 no.3, pp.61-72, 2020
  • The fear of crime, discussed in the United States in the early 1960s, is a psychological response, such as anxiety or concern, to the prospect of becoming a victim of crime. Such anxiety burdens individuals in maintaining their psychological stability and imposes indirect costs of crime on society. Fear of crime is undesirable, and along with crime prevention and resolution, policy must manage it so that it is not exaggerated or distorted, because fear of crime can do as much harm as the damage caused by criminal acts. Eric Pawson has argued that popular impressions of violent crime are formed not by official statistics but by media reports. Therefore, to reduce the social cost of fear of crime, the police should monitor and analyze crime-related news and prepare preemptive response policies before the public develops a 'fear of crime'. In this paper, we propose a deep-learning-based news classification system that helps the police respond quickly, efficiently and precisely to crimes reported in the media. The goal is a system that categorizes crime-related articles among news articles so that rapidly increasing security issues can be identified quickly. To build the system, a model was trained on crime data so that news could be classified by crime type, and deep learning was applied using Google's TensorFlow. Future work should continue research on keyword importance for the early detection of rapidly increasing issues by crime type and on the influence of the press, and the crime-related corpus should be supplemented continually.
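The paper's classifier is a TensorFlow deep learning model whose architecture is not described here, so the sketch below substitutes a much simpler technique, a multinomial Naive Bayes classifier with add-one smoothing, only to show the shape of the task: train on news text labeled by crime type, then predict the type of a new article. Whitespace tokenization is an assumption that suits the English toy data; Korean news would need a morphological analyzer. The labels and training snippets are invented.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


class NaiveBayesNewsClassifier:
    """Multinomial Naive Bayes with add-one smoothing over whitespace tokens.
    A lightweight stand-in for the paper's TensorFlow deep learning model."""

    def fit(self, docs: List[Tuple[str, str]]) -> "NaiveBayesNewsClassifier":
        self.class_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()
        for text, label in docs:
            tokens = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> Optional[str]:
        # Tokens never seen in training are ignored.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(count / n_docs)  # log prior of the crime type
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / (total + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    train = [
        ("man arrested for burglary of apartment", "theft"),
        ("phone stolen from store", "theft"),
        ("suspect stabbed victim in fight", "violence"),
        ("assault outside bar leaves two injured", "violence"),
    ]
    clf = NaiveBayesNewsClassifier().fit(train)
    print(clf.predict("laptop stolen in burglary"))  # prints: theft
```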

Development of a Sizing System of Women's Fitness Wear for the Senior Population in South Korea (한국 노인 여성을 위한 피트니스 압박웨어 치수 개발)

  • Jeon, Eun-Jin;Lee, Won-sup;Park, Jang-Woon;You, Hee-Cheon
    • Fashion & Textile Research Journal, v.20 no.4, pp.464-473, 2018
  • The objective of this study is to develop a sizing system for fitness clothing that properly accommodates the various body sizes of Korean senior women. Sizing systems for upper and lower fitness clothing were developed by selecting key variables, identifying candidate size categories, and determining an optimal sizing system. First, key anthropometric dimensions (stature and bust circumference for upper clothing; stature and waist circumference for lower clothing) were identified by factor analysis of the direct body measurements (n = 272) and 3D whole-body scan data (n = 271) of Korean senior women in Size Korea. Second, candidate sizing systems based on the key dimensions of upper and lower clothing were explored using a grid method and an optimization method. Lastly, optimal sizing systems for upper and lower clothing were selected from among the candidates in terms of accommodation rate. Five size categories (short/small, short/medium, tall/small, tall/medium, and tall/large) were selected as the optimal sizing systems for upper and lower clothing, with accommodation rates of 89% and 78%, respectively, for Korean senior women. The anthropometric characteristics of representative individuals of the optimal size categories can be used in designing fitness compression wear for better fit, exercise effectiveness, and health of Korean senior women.
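Accommodation rate, the criterion used to pick the optimal sizing system, can be read as the share of the sample whose key dimensions fall inside at least one size category. A minimal sketch, assuming each category is a rectangle of stature by circumference ranges (half-open intervals); the category bounds and body measurements below are hypothetical, not values from Size Korea or the paper. A grid or optimization method would search over such bounds to maximize this rate.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SizeCategory:
    name: str
    stature: Tuple[float, float]        # [min, max) in cm
    circumference: Tuple[float, float]  # bust or waist, [min, max) in cm

    def fits(self, stature: float, circ: float) -> bool:
        return (self.stature[0] <= stature < self.stature[1]
                and self.circumference[0] <= circ < self.circumference[1])


def accommodation_rate(people: List[Tuple[float, float]],
                       sizes: List[SizeCategory]) -> float:
    """Fraction of (stature, circumference) records covered by at least one size."""
    if not people:
        return 0.0
    covered = sum(1 for s, c in people if any(size.fits(s, c) for size in sizes))
    return covered / len(people)


if __name__ == "__main__":
    # Hypothetical size bounds and measurements, for illustration only.
    sizes = [
        SizeCategory("short/small", (140, 152), (78, 90)),
        SizeCategory("short/medium", (140, 152), (90, 102)),
        SizeCategory("tall/small", (152, 164), (78, 90)),
    ]
    people = [(148, 85), (150, 95), (158, 88), (170, 100)]
    print(accommodation_rate(people, sizes))  # prints: 0.75
```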

Definition and Extraction of Causal Relations for Question-Answering on Fault-Diagnosis of Electronic Devices (전자장비 고장진단 질의응답을 위한 인과관계 정의 및 추출)

  • Lee, Sheen-Mok;Shin, Ji-Ae
    • Journal of KIISE:Software and Applications, v.35 no.5, pp.335-346, 2008
  • Causal relations in an ontology should be defined based on the inference types needed to solve problems specific to both the application and the domain. In this paper, we present a model to define and extract causal relations for an application ontology for question answering (QA) on the fault diagnosis of electronic devices. Causal categories are defined by analyzing generic patterns of the QA application, and the relations between concepts in the corpus that belong to the causal categories are defined as causal relations. Instances of causal relations are extracted using lexical patterns in the domain's concept definitions and are extended incrementally with information from a thesaurus. In an evaluation by domain specialists, our model achieved a precision of 92.3% in classifying relations and 80.7% in identifying causal relations in the extraction phase.
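A minimal sketch of pattern-based extraction followed by thesaurus expansion. The regular expressions, the English example definition, and the thesaurus entries are all hypothetical; the paper's actual lexical patterns, applied to its own domain definitions, are not given in the abstract.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical English lexical patterns; each names a cause and an effect group.
CAUSAL_PATTERNS = [
    re.compile(r"(?P<effect>[\w\s]+?) (?:is|are) caused by (?P<cause>[\w\s]+)"),
    re.compile(r"(?P<cause>[\w\s]+?) (?:causes|results in|leads to) (?P<effect>[\w\s]+)"),
]


def extract_causal(definition: str) -> List[Tuple[str, str]]:
    """Return (cause, effect) pairs found in a concept definition."""
    pairs = []
    for sentence in re.split(r"[.;]\s*", definition):
        for pattern in CAUSAL_PATTERNS:
            match = pattern.search(sentence)
            if match:
                pairs.append((match.group("cause").strip(), match.group("effect").strip()))
    return pairs


def expand_with_thesaurus(pairs: List[Tuple[str, str]],
                          thesaurus: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Add variants of each pair in which the cause or effect is replaced by a synonym."""
    expanded = set(pairs)
    for cause, effect in pairs:
        expanded.update((syn, effect) for syn in thesaurus.get(cause, []))
        expanded.update((cause, syn) for syn in thesaurus.get(effect, []))
    return expanded


if __name__ == "__main__":
    definition = ("Overheating is caused by a blocked cooling fan. "
                  "A short circuit leads to a blown fuse.")
    pairs = extract_causal(definition)
    print(pairs)
    print(expand_with_thesaurus(pairs, {"a blown fuse": ["a burnt fuse"]}))
```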

Intra-Sentence Segmentation using Maximum Entropy Model for Efficient Parsing of English Sentences (효율적인 영어 구문 분석을 위한 최대 엔트로피 모델에 의한 문장 분할)

  • Kim Sung-Dong
    • Journal of KIISE:Software and Applications, v.32 no.5, pp.385-395, 2005
  • Long-sentence analysis has been a critical problem in machine translation because of its high complexity. Intra-sentence segmentation methods have been proposed to reduce parsing complexity. This paper presents an intra-sentence segmentation method based on a maximum entropy probability model to increase the coverage and accuracy of segmentation. We construct rules for choosing candidate segmentation positions with a learning method that uses the lexical context of words tagged as segmentation positions. We also build a model that assigns a probability value to each candidate segmentation position. The lexical contexts are extracted from a corpus tagged with segmentation positions and incorporated into the probability model. We construct training data from Wall Street Journal sentences and test intra-sentence segmentation on sentences from four different domains. The experiments show about 88% accuracy and about 98% coverage of segmentation. The proposed method also improves parsing efficiency by a factor of 4.8 in speed and 3.6 in space.
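For a binary decision such as "segment here or not", a maximum entropy model over indicator features of the lexical context reduces to logistic regression. The sketch below assumes three hypothetical feature templates (previous word, next word, and the pair) and trains the weights with stochastic gradient ascent on a single toy sentence; the paper's feature set and training procedure are not described in the abstract.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def features(words: List[str], i: int) -> List[str]:
    """Hypothetical lexical-context features for a boundary after words[i]."""
    prev_w = words[i]
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return [f"prev={prev_w}", f"next={next_w}", f"pair={prev_w}|{next_w}"]


def prob(weights: Dict[str, float], feats: List[str]) -> float:
    """Binary maxent: P(boundary | context) = 1 / (1 + exp(-sum of active weights))."""
    z = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-z))


def train(samples: List[Tuple[List[str], int]], epochs: int = 100,
          lr: float = 0.5) -> Dict[str, float]:
    """Stochastic gradient ascent on the conditional log-likelihood."""
    weights: Dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for feats, label in samples:
            error = label - prob(weights, feats)  # observed minus expected
            for f in feats:
                weights[f] += lr * error
    return dict(weights)


if __name__ == "__main__":
    # Toy training sentence: the only boundary is after the comma.
    sent = "the plan failed , and we left".split()
    samples = [(features(sent, i), int(sent[i] == ",")) for i in range(len(sent) - 1)]
    w = train(samples)
    new = "the car broke , and she walked".split()
    for i in range(len(new) - 1):
        print(new[i], round(prob(w, features(new, i)), 2))
```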

Automatic Construction of a Negative/positive Corpus and Emotional Classification using the Internet Emotional Sign (인터넷 감정기호를 이용한 긍정/부정 말뭉치 구축 및 감정분류 자동화)

  • Jang, Kyoungae;Park, Sanghyun;Kim, Woo-Je
    • Journal of KIISE, v.42 no.4, pp.512-521, 2015
  • Internet users purchase goods online and express positive or negative emotions about them in product reviews. Product reviews have become critical data for both potential consumers and enterprise decision-making, so opinion mining techniques, which derive opinions by analyzing meaningful data from large numbers of Internet reviews, have become important. Most existing studies were based on comments written in English, and analysis of Korean has not been actively pursued. Unlike English, Korean has complex adjectives and suffixes, and existing studies did not consider the characteristics of Internet language. This study proposes an emotional classification method that increases classification accuracy by analyzing the characteristics of Internet language that connote feelings. Positive and negative product comments can be classified automatically using Internet emoticons. High precision, recall and coverage in the evaluation support the validity of the proposed algorithm.
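A minimal sketch of the corpus-building idea: label a comment positive or negative by the Internet emotional signs it contains, skip comments with no sign or a tie, and strip the signs so a downstream classifier has to learn from the words. The sign inventories are hypothetical examples of common Korean signs, not the paper's list, and the comments are invented.

```python
from typing import List, Optional, Tuple

# Hypothetical sign inventories (common Korean Internet emotional signs).
POSITIVE_SIGNS = ["^^", "^_^", "ㅎㅎ", ":)"]
NEGATIVE_SIGNS = ["ㅠㅠ", "ㅜㅜ", "ㅡㅡ", ":("]


def count_signs(text: str, signs: List[str]) -> int:
    """Count non-overlapping occurrences of any sign in the text."""
    return sum(text.count(s) for s in signs)


def label_comment(text: str) -> Optional[str]:
    pos = count_signs(text, POSITIVE_SIGNS)
    neg = count_signs(text, NEGATIVE_SIGNS)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None  # no sign, or a tie: leave the comment out of the corpus


def build_corpus(comments: List[str]) -> List[Tuple[str, str]]:
    """Label comments by their signs, then strip the signs from the text."""
    corpus = []
    for comment in comments:
        label = label_comment(comment)
        if label is None:
            continue
        clean = comment
        for sign in POSITIVE_SIGNS + NEGATIVE_SIGNS:
            clean = clean.replace(sign, "")
        corpus.append((clean.strip(), label))
    return corpus


if __name__ == "__main__":
    # "Fast delivery ^^", "Not great ㅠㅠ", "Just okay" (invented comments)
    print(build_corpus(["배송이 빨라요 ^^", "별로예요 ㅠㅠ", "그냥 그래요"]))
```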

Knowledge-poor Term Translation using Common Base Axis with application to Korean-English Cross-Language Information Retrieval (과도한 지식을 요구하지 않는 공통기반축에 의한 용어 번역과 한영 교차정보검색에의 응용)

  • 최용석;최기선
    • Korean Journal of Cognitive Science, v.14 no.1, pp.29-40, 2003
  • Cross-Language Information Retrieval (CLIR) retrieves documents in various languages using a query in a single language: a user of one language can retrieve documents in another language through a CLIR system. In CLIR, query translation is known to be the more efficient approach. Better query translation requires resources such as dictionaries, ontologies, and parallel or comparable corpora, which are usually not available. This paper proposes a new concept, the Common Base Axis, adapted to Korean-English query translation, and a new weighting method for dictionary-based query translation. The essential idea is that Korean and English words can be expressed in a single vector space through the Common Base Axis, which is then used to calculate sense distance for query weighting. The experiments show that the Common Base Axis gives good performance without an ontology and is especially good for one-word query translation.
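A minimal sketch of weighting dictionary translations once Korean and English words share one vector space. It assumes every word is already a vector over the same common axes (how the paper builds those axes is not shown here) and uses cosine similarity between the Korean term and each English candidate as a stand-in for the paper's sense distance, normalized into weights. Because the score needs only the term itself, it also applies to one-word queries. The axes and numbers are hypothetical; the Korean word 배 is used because it can mean pear, ship, or belly.

```python
import math
from typing import Dict, List

Vector = List[float]  # coordinates on the shared (common base) axes; assumed given


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def weight_translations(source: Vector, candidates: Dict[str, Vector]) -> Dict[str, float]:
    """Weight each dictionary translation by its similarity to the source term
    (negatives clipped to 0), normalized to sum to 1; uniform if all are 0."""
    if not candidates:
        return {}
    sims = {word: max(0.0, cosine(source, vec)) for word, vec in candidates.items()}
    total = sum(sims.values())
    if total == 0:
        return {word: 1.0 / len(candidates) for word in candidates}
    return {word: s / total for word, s in sims.items()}


if __name__ == "__main__":
    # Hypothetical axes: [fruit, sea, body].
    bae = [0.9, 0.1, 0.2]
    english = {"pear": [1.0, 0.0, 0.1], "ship": [0.0, 1.0, 0.0], "belly": [0.1, 0.0, 1.0]}
    print(weight_translations(bae, english))
```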

Term Clustering and Duplicate Distribution for Efficient Parallel Information Retrieval (효율적인 병렬정보검색을 위한 색인어 군집화 및 분산저장 기법)

  • 강재호;양재완;정성원;류광렬;권혁철;정상화
    • Journal of KIISE:Software and Applications, v.30 no.1_2, pp.129-139, 2003
  • The PC cluster architecture is considered a cost-effective alternative to existing supercomputers for realizing a high-performance information retrieval (IR) system. To implement an efficient IR system on a PC cluster, it is essential to achieve maximum parallelism by distributing the data across the PCs' local hard disks so that disk I/O and the subsequent computation are spread as evenly as possible over all the PCs. If the terms in the inverted index file can be classified into closely related clusters, parallelism can be maximized by distributing them to the PCs in an interleaved manner. One goal of this research is to develop methods for automatically clustering terms based on the likelihood that they co-occur in the same query. We also propose a method for duplicate distribution of inverted index records among the PCs to achieve fault tolerance as well as dynamic load balancing. Experiments with a large corpus showed the efficiency and effectiveness of our method.
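A minimal sketch of the two ideas, under stated assumptions: terms are clustered by connecting any pair that co-occurs in at least `min_cooc` logged queries (union-find connected components, a stand-in for the paper's clustering method), and each cluster's inverted-index records are then placed round-robin across PCs so that terms likely to appear in the same query sit on different disks. The duplicate-on-the-next-PC policy is a hypothetical replication scheme for fault tolerance, and the query log is invented.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def cluster_terms(query_log: List[List[str]], min_cooc: int = 2) -> List[List[str]]:
    """Group terms that co-occur in at least `min_cooc` queries (connected
    components via union-find; a stand-in for the paper's clustering)."""
    cooc: Counter = Counter()
    terms: Set[str] = set()
    for query in query_log:
        uniq = sorted(set(query))
        terms.update(uniq)
        cooc.update(combinations(uniq, 2))

    parent = {t: t for t in terms}

    def find(t: str) -> str:
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for (a, b), n in cooc.items():
        if n >= min_cooc:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for t in sorted(terms):
        groups.setdefault(find(t), []).append(t)
    return list(groups.values())


def distribute(clusters: List[List[str]], n_pcs: int) -> Dict[str, Tuple[int, int]]:
    """Map each term to (primary PC, duplicate PC), interleaving cluster
    members across PCs and placing the duplicate on the next PC."""
    placement: Dict[str, Tuple[int, int]] = {}
    slot = 0
    for cluster in clusters:
        for term in cluster:
            primary = slot % n_pcs
            placement[term] = (primary, (primary + 1) % n_pcs)
            slot += 1
    return placement


if __name__ == "__main__":
    log = [["korea", "economy"], ["korea", "economy"], ["korea", "economy", "growth"],
           ["weather", "seoul"], ["weather", "seoul"]]
    clusters = cluster_terms(log)
    print(clusters)
    print(distribute(clusters, n_pcs=2))
```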

An Effective Estimation method for Lexical Probabilities in Korean Lexical Disambiguation (한국어 어휘 중의성 해소에서 어휘 확률에 대한 효과적인 평가 방법)

  • Lee, Ha-Gyu
    • The Transactions of the Korea Information Processing Society, v.3 no.6, pp.1588-1597, 1996
  • This paper describes an estimation method for lexical probabilities in Korean lexical disambiguation. In the stochastic approach to lexical disambiguation, lexical and contextual probabilities are generally estimated from statistical data extracted from corpora. For Korean, it is desirable to apply lexical probabilities at the level of word phrases, because Korean sentences are spaced in units of word phrases. However, Korean word phrases are so varied in form that, even with fairly large corpora, lexical probabilities often cannot be estimated directly for word phrases. To overcome this problem, this research defines similarity between word phrases from the viewpoint of lexical analysis and proposes a method for estimating Korean lexical probabilities based on this similarity. When the lexical probability of a word phrase cannot be estimated directly, it is estimated indirectly through a word phrase similar to the given one. Experimental results show that the proposed approach is effective for Korean lexical disambiguation.
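A minimal sketch of direct estimation with back-off to a similar word phrase. The paper defines similarity from a lexical-analysis viewpoint; this sketch substitutes a hypothetical measure, the number of trailing syllables two word phrases share (Korean word phrases often end in the same particles or endings). The analysis labels and counts are invented toy data; 보고 is used because it can be read as a noun (report) or as a verb plus ending (see + and).

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def common_suffix_len(a: str, b: str) -> int:
    """Number of trailing characters (Hangul syllables) shared by a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


class LexicalProbability:
    """P(analysis | word phrase) with back-off to a similar observed phrase."""

    def __init__(self, tagged: List[Tuple[str, str]]):
        self.counts: Dict[str, Counter] = defaultdict(Counter)
        for phrase, analysis in tagged:
            self.counts[phrase][analysis] += 1

    def _distribution(self, phrase: str) -> Dict[str, float]:
        counts = self.counts[phrase]
        total = sum(counts.values())
        return {a: n / total for a, n in counts.items()}

    def estimate(self, phrase: str) -> Dict[str, float]:
        if phrase in self.counts:  # direct estimate from the corpus
            return self._distribution(phrase)
        # Indirect estimate via the most similar observed phrase
        # (hypothetical similarity: longest shared ending).
        best = max(self.counts, key=lambda p: common_suffix_len(p, phrase), default=None)
        if best is None or common_suffix_len(best, phrase) == 0:
            return {}  # nothing similar enough
        return self._distribution(best)


if __name__ == "__main__":
    lp = LexicalProbability([("학교에", "noun+particle"), ("보고", "verb+ending"),
                             ("보고", "verb+ending"), ("보고", "noun")])
    print(lp.estimate("먹고"))  # unseen: backs off to 보고
    print(lp.estimate("집에"))  # unseen: backs off to 학교에
```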
