• Title/Summary/Keyword: Chinese Corpus

Search Results: 47

Pharmacological Effects of Extract of Bufonis Venenum (섬수(Bufonis Venenum) 추출물의 약리작용)

  • 김영훈;정성학;김종학;최재묵;지준환;강재구;박종구;김제학;조희재
    • Biomolecules & Therapeutics
    • /
    • v.9 no.1
    • /
    • pp.51-54
    • /
    • 2001
  • Bufonis Venenum is a toad venom whose main components are bufadienolides, namely resibufogenin, bufalin and cinobufagin. The desensitizing effect of Bufonis Venenum is useful for the treatment of premature ejaculation in Chinese medicine. However, minor components of Bufonis Venenum cause problems such as topical burning, pain, and erectile dysfunction. To clarify and eliminate the components responsible for these side effects, we prepared two extracts of Bufonis Venenum, with either 70% ethanol or ethyl acetate, and tested their pharmacological effects. The 70% ethanol extract of Bufonis Venenum produced a pain response in the rat hind paw and induced contraction of rabbit corpus cavernosal muscle in vitro. On the other hand, the ethyl acetate extract caused neither pain nor smooth muscle contraction. The desensitizing effect of the ethyl acetate extract was similar to that of the 70% ethanol extract. In conclusion, these results show that the ethyl acetate extract of Bufonis Venenum lacks the components causing side effects and deserves further study for its therapeutic potential in premature ejaculation in men.


Method for Detecting Errors of Korean-Chinese MT Using Parallel Corpus (병렬 코퍼스를 이용한 한중 기계번역 오류 탐지 방법)

  • Jin, Yun;Kim, Young-Kil
    • Annual Conference on Human and Language Technology
    • /
    • 2008.10a
    • /
    • pp.113-117
    • /
    • 2008
  • This paper proposes a method for automatically detecting errors using a parallel corpus, in order to efficiently improve the translation performance of a pattern-based machine translation system. Most errors in a translation system can be broadly divided into knowledge errors and engine errors. Conventionally, such errors are detected and analyzed by trained bilingual linguists who read large volumes of machine-translated output sentences and then revise or extend the translation knowledge, or improve the engine. This work, however, requires a great deal of time and effort. This paper therefore proposes a method that automatically detects the knowledge and engine errors present in a translation system by comparing, in various ways, the target-language sentences of the parallel corpus (i.e., the reference sentences) with the machine-translated output sentences. The proposed method was applied to a Korean-Chinese machine translation system, its precision and recall were measured, and it was demonstrated that errors can be detected and extracted automatically.

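The comparison step described above can be sketched in a few lines. This is an illustrative simplification, not the paper's actual detection method: it flags machine-translated sentences whose token overlap with the reference translation falls below a threshold, marking them as candidates containing knowledge or engine errors.

```python
# Hypothetical sketch: flag MT output sentences with low token overlap
# against the parallel-corpus reference as candidate errors.

def token_overlap(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens that also appear in the reference."""
    cand, ref = candidate.split(), set(reference.split())
    if not cand:
        return 0.0
    return sum(1 for tok in cand if tok in ref) / len(cand)

def detect_suspect_translations(mt_output, references, threshold=0.5):
    """Return indices of sentence pairs whose overlap is below the threshold."""
    return [i for i, (mt, ref) in enumerate(zip(mt_output, references))
            if token_overlap(mt, ref) < threshold]

mt = ["the cat sat on mat", "completely wrong output here"]
ref = ["the cat sat on the mat", "he went to the market"]
print(detect_suspect_translations(mt, ref))  # [1]
```

A real system would use stronger comparisons (alignment-aware or edit-distance based), but the overall pipeline, comparing MT output against reference sentences and extracting low-scoring pairs, is the same.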

Corpus-Based Unrestricted Vocabulary Mandarin TTS (코퍼스 기반 무제한 단어 중국어 TTS)

  • Yu Zheng;Ha Ju-Hong;Kim Byeongchang;Lee Gary Geunbae
    • Proceedings of the KSPS conference
    • /
    • 2003.10a
    • /
    • pp.175-179
    • /
    • 2003
  • In order to produce high-quality (intelligible and natural) synthesized speech, it is very important to obtain accurate grapheme-to-phoneme conversion and an accurate prosody model. In this paper, we analyze Chinese texts using segmentation, POS tagging, and unknown-word recognition. We present grapheme-to-phoneme conversion using dictionary-based and rule-based methods. We construct a prosody model using a probabilistic method and a decision-tree-based error correction method. Based on the results of this analysis, we can successfully select and concatenate the exact synthesis units of syllables from the Chinese synthesis DB.

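The dictionary-based grapheme-to-phoneme step mentioned above can be illustrated with a minimal lookup sketch. The tiny character-to-pinyin lexicon and the fallback symbol here are assumptions for illustration, not the authors' actual resources; a real TTS front end would combine a full lexicon with rules for polyphonic characters.

```python
# Illustrative sketch of dictionary-based G2P with a fallback for unknown
# characters; the lexicon below is a hypothetical toy example.

LEXICON = {
    "你": "ni3",
    "好": "hao3",
    "中": "zhong1",
    "国": "guo2",
}

def g2p(text: str, fallback: str = "<unk>") -> list[str]:
    """Map each character to its phoneme string; unknown characters fall back."""
    return [LEXICON.get(ch, fallback) for ch in text]

print(g2p("你好"))  # ['ni3', 'hao3']
```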

Affinity and Variety between Words in the Framework of Hypernetwork (하이퍼네트워크에서 본 단어간 긴밀성과 다양성)

  • Kim, Joon-Shik;Park, Chan-Hoon;Lee, Eun-Seok;Zhang, Byoung-Tak
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.4
    • /
    • pp.166-171
    • /
    • 2008
  • We studied the variety and affinity between successive words in a text document. A number of groups were defined by the frequency of a following word in the whole text (corpus). In previous studies, Zipf's power law was explained by the Chinese restaurant process, and hub nodes were identified by examining the edge-number profile in a scale-free network. We observed both a power law and a hub profile at the same time by studying the conditional frequency and degeneracy of a group. A symmetry between the affinity and the variety between words was found during the data analysis, and this phenomenon can be explained from the viewpoint of "exploitation and exploration." We also remark on a small symmetry-breaking phenomenon in the TIPSTER data.

A Study on the English Translations of Shanghanlun (Treatise on Cold Damage) and the Cold Pathogen Chapter of Donguibogam (『상한론(傷寒論)』 영역본과 『동의보감(東醫寶鑑)』 영역본 잡병편 '한(寒)'문의 비교 연구)

  • Kim, Do-Hoon;Kim, Dong-Ryul;Jung, Ji-Hun
    • The Journal of Korean Medical History
    • /
    • v.30 no.1
    • /
    • pp.33-41
    • /
    • 2017
  • This study used a corpus-based analysis to compare the Cold Pathogen chapter of the English version of "Donguibogam" with the English version of the "Shanghanlun" translated by 罗希文 (Luo Xiwen). The results of the linguistic analysis indicate that the TTR, the ratio of the number of types to the number of tokens, in the English version of "Shanghanlun" was 5.92%, while the TTR in the Cold Pathogen chapter of the English version of "Donguibogam" was 6.01%. It was also noted that the word types appearing frequently in the two publications were the scientific names of medicinal herbs; terms for the method of producing herbal prescriptions (including terminology for weights and measures); and Chinese descriptions of concepts considered important in both Korean and Chinese medicinal practice. Finally, it was possible to find points of comparison in the naming of symptoms, diagnoses, prescriptions, and the respective names of the six meridians. Though the difference is small, the vocabulary found in the Cold Pathogen chapter of "Donguibogam" was more diverse than in Luo's translation of "Shanghanlun". In general, literal translation in keeping with the sense of the original text was better achieved in Luo's translation of the "Shanghanlun", whereas the English version of the Cold Pathogen chapter in the "Donguibogam" was more of a free translation.

An Analysis on the Vocabulary in the English-Translation Version of Donguibogam Using the Corpus-based Analysis (코퍼스 분석방법을 이용한 『동의보감(東醫寶鑑)』 영역본의 어휘 분석)

  • Jung, Ji-Hun;Kim, Dong-Ryul;Kim, Do-Hoon
    • The Journal of Korean Medical History
    • /
    • v.28 no.2
    • /
    • pp.37-45
    • /
    • 2015
  • Objectives: A quantitative analysis of the vocabulary in the English translation of Donguibogam. Methods: This study quantitatively analyzed the English-translated texts of Donguibogam using corpus-based analysis and compared the results with a corresponding analysis of the original texts of Donguibogam. Results: The corpus analysis of the English translation of Donguibogam found that the total number of words (tokens) was about 1,207,376, the number of word types was about 20,495, and the TTR (type/token ratio) was 1.69%. The cumulative rate of the top 1,000 words was 83.54%, and the cumulative rate of the top 2,000 words was 90.82%. Among the most frequent words, function words such as 'the, and, of, is' predominated; among content words, words such as 'radix, qi, rhizoma and water' appeared with high frequency. Compared with the corpus analysis of the original version of Donguibogam, the TTR was higher in the English translation. The compositions of the high-frequency function words and content words were similar between the English translation and the original version of Donguibogam. The two versions were also similar in that their 'Remedies' and 'Acupuncture' parts showed a higher proportion of content words than function words. Conclusions: The vocabulary of the English translation of Donguibogam shows that the book both preserves complete sentence forms and remains a Korean medical text. Meanwhile, the English translation of Donguibogam has some problems, such as a lack of vocabulary unification caused by the use of several translators and the incomplete delivery of word meanings from the Chinese-character cultural sphere to the English-speaking one; these problems should be taken into account in future work translating old Korean medical books into English.
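The TTR figure used in the two Donguibogam studies above is straightforward to compute. The sketch below assumes simple whitespace tokenization and lowercasing; the studies' actual tokenization may differ.

```python
# Minimal type/token ratio (TTR) computation: types are distinct lowercased
# tokens, tokens are whitespace-separated words; returned as a percentage.

def type_token_ratio(text: str) -> float:
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens) * 100

sample = "qi moves and qi stagnates and qi returns"
print(round(type_token_ratio(sample), 2))  # 62.5
```

As a sanity check on the reported figures, 20,495 types over 1,207,376 tokens gives roughly 1.70%, consistent with the stated TTR of 1.69%.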

Acoustic analysis of English lexical stress produced by Korean, Japanese and Taiwanese-Chinese speakers

  • Jung, Ye-Jee;Rhee, Seok-Chae
    • Phonetics and Speech Sciences
    • /
    • v.10 no.1
    • /
    • pp.15-22
    • /
    • 2018
  • Stressed vowels in English are usually produced using longer duration, higher pitch, and greater intensity than unstressed vowels. However, many English as a foreign language (EFL) learners have difficulty producing English lexical stress because their mother tongues do not have such features. In order to investigate if certain non-native English speakers (Korean, Japanese, and Taiwanese-Chinese native speakers) are able to produce English lexical stress in a native-like manner, speech samples were extracted from the L2 learners' corpus known as AESOP (the Asian English Speech cOrpus Project). Sixteen disyllabic words were analyzed in terms of the ratio of duration, pitch, and intensity. The results demonstrate that non-native English speakers are able to produce English stress in a similar way to native English speakers, and all speakers (both native and non-native) show a tendency to use duration as the strongest cue in producing stress. The results also show that the duration ratio of native English speakers was significantly higher than that of non-native speakers, indicating that native speakers produce a bigger difference in duration between stressed and unstressed vowels.

CNN-based Skip-Gram Method for Improving Classification Accuracy of Chinese Text

  • Xu, Wenhua;Huang, Hao;Zhang, Jie;Gu, Hao;Yang, Jie;Gui, Guan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.12
    • /
    • pp.6080-6096
    • /
    • 2019
  • Text classification is one of the fundamental techniques in natural language processing. Numerous studies build on text classification, such as news subject classification, question-answering system classification, and movie review classification. Traditional text classification methods extract features and then classify them; however, these methods are complex to operate and their accuracy is not sufficiently high. Recently, a convolutional neural network (CNN)-based one-hot method was proposed for text classification to address this problem. In this paper, we propose an improved CNN-based skip-gram method for Chinese text classification and evaluate it on the Sogou news corpus. Experimental results indicate that the CNN with the skip-gram model performs more efficiently than the CNN-based one-hot method.
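The skip-gram representation mentioned above is built from (center, context) word pairs extracted with a sliding window. The sketch below shows only that pair-extraction step, with an illustrative window size and pre-tokenized input; in the paper, the resulting embeddings feed a CNN classifier.

```python
# Illustrative skip-gram training-pair extraction over a token sequence.

def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs for every token within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["新闻", "分类", "任务"], window=1))
# [('新闻', '分类'), ('分类', '新闻'), ('分类', '任务'), ('任务', '分类')]
```

Libraries such as gensim perform this step internally when training a skip-gram Word2Vec model, so in practice one would rarely implement it by hand.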

Identification of Chinese Event Types Based on Local Feature Selection and Explicit Positive & Negative Feature Combination

  • Tan, Hongye;Zhao, Tiejun;Wang, Haochang;Hong, Wan-Pyo
    • Journal of information and communication convergence engineering
    • /
    • v.5 no.3
    • /
    • pp.233-238
    • /
    • 2007
  • An approach to identifying Chinese event types is proposed in this paper, combining a good feature selection policy with a Maximum Entropy (ME) model. The approach not only effectively alleviates the problem that the classifier performs poorly on small and difficult types, but also improves overall performance. Experiments on the ACE2005 corpus show satisfying performance, with an 83.5% macro-average F-measure. The main characteristics and ideas of the approach are: (1) An optimal feature set is built for each type through local feature selection, which fully ensures the performance of each type. (2) Positive and negative features are explicitly discriminated and combined using one-sided metrics, which exploits the advantages of both kinds of features. (3) Wrapper methods are used to search for new features and to evaluate the various feature subsets in order to obtain the optimal feature subset.
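Local (per-type) feature selection with a one-sided metric can be illustrated as follows. The scoring function here, the difference between a feature's frequency inside and outside the target class, is an illustrative choice, not the paper's exact metric: positive scores indicate positive features for the class, negative scores indicate negative features.

```python
# Hypothetical sketch of per-class (local) one-sided feature scoring:
# score = P(feature | target class) - P(feature | other classes).

from collections import Counter

def one_sided_scores(docs, labels, target):
    """Score each feature for one class; sign separates positive/negative features."""
    in_class, out_class = Counter(), Counter()
    n_in = sum(1 for y in labels if y == target) or 1
    n_out = (len(labels) - n_in) or 1
    for doc, y in zip(docs, labels):
        (in_class if y == target else out_class).update(set(doc))
    feats = set(in_class) | set(out_class)
    return {f: in_class[f] / n_in - out_class[f] / n_out for f in feats}

docs = [["attack", "bomb"], ["attack", "meet"], ["meet", "talk"]]
labels = ["conflict", "conflict", "contact"]
scores = one_sided_scores(docs, labels, "conflict")
print(scores["attack"], scores["meet"])  # attack: 1.0 (positive), meet: -0.5 (negative)
```

Running this selection once per event type, rather than globally, is what lets small types keep the features that matter for them.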

The Use of MSVM and HMM for Sentence Alignment

  • Fattah, Mohamed Abdel
    • Journal of Information Processing Systems
    • /
    • v.8 no.2
    • /
    • pp.301-314
    • /
    • 2012
  • In this paper, two new approaches to aligning English-Arabic sentences in bilingual parallel corpora, based on the Multi-Class Support Vector Machine (MSVM) and Hidden Markov Model (HMM) classifiers, are presented. A feature vector is extracted from the text pair under consideration; this vector contains text features such as length, punctuation score, and cognate score values. A set of manually prepared training data was used to train the MSVM and the HMM, and another set of data was used for testing. The results of the MSVM and HMM outperform those of the length-based approach. Moreover, these new approaches are valid for any language pair and are quite flexible, since the feature vector may contain fewer, more, or different features than the ones used in the current research, such as a lexical matching feature or Hanzi characters in Japanese-Chinese texts.
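A feature vector of the kind described above might be assembled as follows. The concrete formulas (length ratio, matched-punctuation ratio, and shared-token "cognate" score) are illustrative assumptions, not the paper's definitions.

```python
# Illustrative feature extraction for a candidate sentence pair:
# [length ratio, punctuation score, cognate score].

import string

def alignment_features(src: str, tgt: str) -> list[float]:
    len_ratio = len(src) / max(len(tgt), 1)
    src_punct = sum(c in string.punctuation for c in src)
    tgt_punct = sum(c in string.punctuation for c in tgt)
    punct_score = min(src_punct, tgt_punct) / max(src_punct, tgt_punct, 1)
    # crude cognate score: tokens kept verbatim across languages
    # (numbers, names, acronyms) divided by the total distinct tokens
    src_toks, tgt_toks = set(src.split()), set(tgt.split())
    cognate = len(src_toks & tgt_toks) / max(len(src_toks | tgt_toks), 1)
    return [len_ratio, punct_score, cognate]

print(alignment_features("UNESCO was founded in 1945.", "تأسست UNESCO عام 1945."))
```

The resulting vectors would then be fed to the MSVM or HMM classifier; swapping in different features only changes this extraction step, which is what makes the approach language-pair independent.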