• Title/Abstract/Keywords: Letters and Words

109 search results, processing time 0.027 s

Automatic Extraction of Alternative Words using Parallel Corpus

  • 백종범;이수원
    • 한국정보과학회논문지:컴퓨팅의 실제 및 레터
    • /
    • Vol. 16, No. 12
    • /
    • pp.1254-1258
    • /
    • 2010
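  • In information retrieval, describing the same entity with varied notations degrades system performance. To address this problem, this study proposes a method that uses the Korean and English titles of patent records as a parallel corpus to extract sets of translation equivalents, and then uses these sets as the features of each word to extract lists of alternative words automatically. Because such lists also contain many related words that are not true alternatives, the study further proposes filtering the related words out of each list using sets of related words extracted from the Korean titles. According to the evaluation, the proposed method outperforms existing alternative-word extraction methods.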
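
The two steps in this abstract (translation equivalents as word features, then filtering by related words from the Korean titles) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy title pairs, the Jaccard similarity, the 0.7 threshold, and the definition of "related" as co-occurrence in the same Korean title are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus: (Korean title tokens, English title tokens).
PARALLEL_TITLES = [
    (["휴대폰", "케이스"], ["mobile", "phone", "case"]),
    (["핸드폰", "케이스"], ["mobile", "phone", "case"]),
    (["휴대폰", "충전기"], ["mobile", "phone", "charger"]),
    (["핸드폰", "충전기"], ["mobile", "phone", "charger"]),
]


def build_tables(pairs):
    """Collect English-side features and Korean-side related words per Korean word."""
    features = defaultdict(set)  # word -> English words seen in aligned titles
    related = defaultdict(set)   # word -> Korean words seen in the same title
    for ko_tokens, en_tokens in pairs:
        for word in ko_tokens:
            features[word].update(en_tokens)
            related[word].update(t for t in ko_tokens if t != word)
    return features, related


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure, not from the source)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidates(word, features, threshold=0.7):
    """Words whose English feature set is similar to that of `word`."""
    target = features.get(word, set())
    return {
        other
        for other in features
        if other != word and jaccard(target, features[other]) >= threshold
    }


def alternatives(word, features, related, threshold=0.7):
    """Candidates minus words that merely co-occur with `word` in Korean titles."""
    return candidates(word, features, threshold) - related.get(word, set())


if __name__ == "__main__":
    features, related = build_tables(PARALLEL_TITLES)
    print("candidates:", candidates("휴대폰", features))
    print("after filtering:", alternatives("휴대폰", features, related))
```

In this toy data, words that share many English features only because they appear in the same titles are removed by the second step, which is the role the abstract assigns to the related-word filter.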

Literacy-Related Communication and Information Types in Social Pretend Play

  • 조은진;배재정
    • 아동학회지
    • /
    • Vol. 20, No. 4
    • /
    • pp.247-263
    • /
    • 1999
  • Literacy-related communication and information types naturally occurring in the dramatic play area were observed during free play over a 4-week period. Participants were 21 boys and 16 girls enrolled in a kindergarten class in Taegu. The types of literacy-related communication used frequently during social pretend play were Description, Suggestion, Question, and Answer. Negative types of literacy-related communication, such as Threat, Protest, and Warning, were rare. The frequently occurring types of literacy information concerned letters and words, and literacy functions. These findings are discussed with respect to their curricular implications for the classroom.

A Study on Processing of Speech Recognition Korean Words

  • 남기훈
    • 문화기술의 융합
    • /
    • Vol. 5, No. 4
    • /
    • pp.407-412
    • /
    • 2019
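  • This paper proposes a speech recognition technique that operates on Korean words. Speech recognition converts acoustic signals obtained through a sensor such as a microphone into words or sentences. Most foreign languages pose comparatively few difficulties for speech recognition. Korean, by contrast, is built from vowels and final consonants (batchim), so the text obtained from a speech synthesis system is unsuitable for use as-is. Words can be recognized more accurately only if the existing speech recognition structure is improved. To solve this problem, a new algorithm was added to the existing speech recognition structure to raise the recognition rate. The input word is first preprocessed and the result is tokenized. The values produced by the Levenshtein distance algorithm and a hashing algorithm are combined and then passed through a consonant comparison algorithm to output a standard word. The final word is compared against a standardization table: if it is present it is output, and if it is not it is registered in the table. A smartphone application was developed as the experimental environment. Compared with the existing method, the proposed structure improved the recognition rate by about 2% for standard language and about 7% for dialect.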
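
The post-processing chain named in this abstract (hash lookup, Levenshtein distance, consonant comparison, standardization table) might look roughly like the sketch below. The abstract does not say how the intermediate results are combined, so the order of the steps, the use of initial consonants (choseong) for the consonant comparison, the `max_distance` parameter, and the sample table are assumptions for illustration only.

```python
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants in Unicode order


def initials(word):
    """Return the initial consonant of every precomposed Hangul syllable in `word`."""
    out = []
    for ch in word:
        offset = ord(ch) - 0xAC00
        if 0 <= offset < 11172:
            # 588 = 21 medial vowels * 28 final-consonant slots per initial consonant
            out.append(CHOSEONG[offset // 588])
        else:
            out.append(ch)  # non-Hangul characters are kept unchanged
    return "".join(out)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def standardize(word, table, max_distance=1):
    """Map a recognized word to a standard word and record the result in `table`."""
    word = word.strip()                      # stand-in for the preprocessing step
    if word in table:                        # hash lookup in the standardization table
        return table[word]
    best = None
    for std in sorted(set(table.values())):  # sorted for deterministic tie-breaking
        if initials(std) != initials(word):  # consonant comparison (assumed form)
            continue
        dist = levenshtein(word, std)
        if dist <= max_distance and (best is None or dist < best[0]):
            best = (dist, std)
    result = best[1] if best else word
    table[word] = result                     # register the word if it was not present
    return result


if __name__ == "__main__":
    # Hypothetical table: recognized form -> standard form.
    table = {"안녕하세요": "안녕하세요", "감사합니다": "감사합니다"}
    print(standardize("안녕하세여", table))
    print(standardize("반갑습니다", table))
```

Restricting the edit-distance search to words with the same initial consonants keeps near-matches that differ mainly in vowels, which is one plausible reading of the consonant comparison step.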

A Study on the Keyboard of Jawi Script (Arabic-Malay Script)

  • 강경석
    • 수완나부미
    • /
    • Vol. 3, No. 1
    • /
    • pp.47-66
    • /
    • 2011
  • Malay society is rooted in Islamic concepts. Islam influenced every corner of Malay society, which had once lain at the edge of the civilizations of the Indus and Ganges. The script of the Hindu religion, namely Sanskrit, was once adopted by Malay society in order to put the Malay language, Bahasa Melayu, into practical writing, but in vain. Sanskrit was too complicated for Malay society to imitate and use in everyday life, because it was a totally different type of script with many similar allographs for a single sound. In the end Malay society gave it up and simply used the Malay language without any script of its own. A few centuries later Islam entered Malay society, bringing Arabic letters with it. As Islam spread widely, it influenced not only Malay culture but also religious life. Finally, Arabic letters became the very means by which the Malay language was written. That is, letters formerly used for the Arabic language became a similar script for a new language, Malay. Arabic letters pose no problems for the Arabic language, whereas they pose some for Malay. Naturally, Arabic letters were designed for Arabic itself and for no other language. On account of this, a few problems arose in writing Malay consonants such as p, ng, g, c, ny and v. These 6 letters could not be written in Arabic letters, and their sounds were unknown to Arabic speakers. Malay society therefore had to create modified letter forms for these 6 sounds, which occur frequently in Malay. As a result, pa was derived from fa, nga from ain, ga from kaf, ca from jim, nya from tha or ba, and va from wau itself.
Where should these 6 newly modified letters be placed on the Arabic keyboard? This is the very core of this working paper. As a matter of course, the 6 letters were placed on the positions of 6 Arabic signs that are scarcely written in Malay. Those 6 signs are reached only with the shift key. The 6 newly designed letters took over the original positions of fatha, kasra, damma, sukun, tanween and so on. The main difference between the 2 sets of 6 is this: the 6 on the original Arabic keyboard are only signs attached to Arabic letters, whereas the 6 Malay ones are real letters. In other words, on the Malay keyboard the 6 newly modified Malay letters were substituted for 6 Arabic signs that Malay does not use. This type of newly designed Malay Jawi Script keyboard is still used in Malaysia, Brunei and some other Malay countries. However, this keyboard also needs to move toward another keyboard system that follows the alphabetically ordered keyboard layout. That is, alif would be typed with the A key, and zai with the Z key. This keyboard system is called the 'Malay Jawi-English Rumi matching keyboard system', even though it would probably be inconvenient for Malay Jawi experts who are used to the Arabic 'alif-ba-ta' order.
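
A matching keyboard of the kind described here can be modeled as a simple key-to-letter table. The sketch below includes only the two key pairings the abstract names (A to alif, Z to zai); the Unicode code points, the `type_keys` helper, and the table names are my own assumptions and do not come from the paper, and the abstract does not say which key each of the six added letters occupies.

```python
# Letter-key layer of a hypothetical Rumi-matching layout.
# Only the two pairings named in the abstract are filled in.
RUMI_TO_JAWI = {
    "a": "\u0627",  # alif
    "z": "\u0632",  # zai
}

# The six letters added for Malay sounds, with the base letter the abstract names.
# Code points are the usual Unicode encodings of these Jawi letters (assumption).
JAWI_ADDITIONS = {
    "pa": ("\u06A4", "fa"),
    "nga": ("\u06A0", "ain"),
    "ga": ("\u0762", "kaf"),
    "ca": ("\u0686", "jim"),
    "nya": ("\u06BD", "tha or ba"),
    "va": ("\u06CF", "wau"),
}


def type_keys(keys, layout=RUMI_TO_JAWI):
    """Translate a sequence of Latin key presses; unmapped keys pass through."""
    return "".join(layout.get(key.lower(), key) for key in keys)


if __name__ == "__main__":
    print(type_keys("AZ"))
    for name, (letter, base) in JAWI_ADDITIONS.items():
        print(f"{name}: {letter} (derived from {base})")
```

A full layout would need an entry for every key, which the abstract does not provide.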

Watermarking System That Inserts Copyright Holder's Logo

  • 남상엽;이천우;김형배;이상원;박인정
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2003 Summer Conference Proceedings Ⅲ
    • /
    • pp.1487-1490
    • /
    • 2003
  • This paper presents a watermarking system that inserts a copyright holder's logo into a music file. In other words, a sound file can carry image information such as a logo or letters. The watermarking system converts the sound file into an image using a spectrogram. In the spectrogram domain, the logo is inserted using spread spectrum. The proposed technique verifies copyright better than the method using a PN-Sequence.
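
For readers unfamiliar with spread-spectrum embedding, the sketch below shows the generic idea on a plain list of numbers standing in for spectrogram coefficients. It is not the system from the paper: no spectrogram is computed, the logo is reduced to a short bit list, and the chip count, the strength `alpha`, and the seed-as-key scheme are assumptions chosen for illustration.

```python
import random


def pn_sequence(seed, length):
    """Pseudo-random +/-1 chip sequence; the seed acts as the secret key."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def embed(coeffs, bits, seed, chips=256, alpha=0.5):
    """Spread each logo bit over `chips` coefficients and add it to the host."""
    pn = pn_sequence(seed, chips * len(bits))
    if len(coeffs) < len(pn):
        raise ValueError("host signal too short for the requested payload")
    marked = list(coeffs)
    for i, bit in enumerate(bits):
        sign = 1 if bit else -1
        for k in range(i * chips, (i + 1) * chips):
            marked[k] += alpha * sign * pn[k]
    return marked


def extract(coeffs, n_bits, seed, chips=256):
    """Recover bits by correlating each block with the same chip sequence."""
    pn = pn_sequence(seed, chips * n_bits)
    bits = []
    for i in range(n_bits):
        corr = sum(coeffs[k] * pn[k] for k in range(i * chips, (i + 1) * chips))
        bits.append(1 if corr > 0 else 0)
    return bits


if __name__ == "__main__":
    logo_bits = [1, 0, 1, 1, 0, 0, 1, 0]     # hypothetical flattened logo pixels
    host_rng = random.Random(0)
    host = [host_rng.gauss(0.0, 1.0) for _ in range(256 * len(logo_bits))]
    marked = embed(host, logo_bits, seed=42)
    print(extract(marked, len(logo_bits), seed=42) == logo_bits)
```

Because each bit is spread over many coefficients, the per-coefficient change stays small while the correlation at the detector adds up, which is the property spread-spectrum watermarking relies on.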

Using Roots and Patterns to Detect Arabic Verbs without Affixes Removal

  • Abdulmonem Ahmed;Aybaba Hancrliogullari;Ali Riza Tosun
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 23, No. 4
    • /
    • pp.1-6
    • /
    • 2023
  • Morphological analysis, a branch of natural language processing, is now a rapidly growing field. The fundamental tenet of morphological analysis is that it can establish the roots or stems of words and enable comparison to the original term. Arabic is a highly inflected and derivational language with a strong structure. Because of the non-concatenative nature of Arabic morphology, each root or stem can have a large number of affixes attached to it, which increases the number of possible inflected words. Accurate verb recognition and extraction are necessary for nearly all problems in well-known research areas, including Web Search, Information Retrieval, Machine Translation, Question Answering and so forth. In this work we designed and implemented an algorithm to detect and recognize Arabic verbs in Arabic text. The suggested technique was built with "Python" and the "pyqt5" visual package, allowing for quick modification and easy addition of new patterns. We employed 17 alternative patterns to represent all verbs in terms of singular, plural, masculine, and feminine pronouns as well as past, present, and imperative verb tenses. All of the verbs that matched these patterns were used when a verb has a root, and the outcomes were reliable. The approach is able to recognize all verbs with the same structure without requiring any alterations to the code or design. The verbs that are not recognized by our method have no antecedents in the Arabic roots. According to our work, the strategy can rapidly and precisely identify verbs with roots, but it cannot be used to identify verbs that are not in the Arabic language. As a result, we advise employing a hybrid approach that combines several principles.
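
One way to match verbs against roots and patterns without stripping affixes is to instantiate each pattern with each known root and look words up in the resulting table. The sketch below illustrates that idea only; the two roots, the four unvowelled patterns, and their labels are hypothetical examples and are not the 17 patterns used in the paper.

```python
# Hypothetical root list (triliteral, written without diacritics).
ROOTS = {"كتب", "درس"}

# Hypothetical patterns: {0}, {1}, {2} are the three root consonants.
PATTERNS = {
    "{0}{1}{2}": "past, masculine singular",
    "ي{0}{1}{2}": "present, masculine singular",
    "ا{0}{1}{2}": "imperative, masculine singular",
    "{0}{1}{2}وا": "past, masculine plural",
}


def build_index(roots, patterns):
    """Instantiate every pattern with every root: surface form -> (root, label)."""
    index = {}
    for root in roots:
        for template, label in patterns.items():
            index[template.format(*root)] = (root, label)
    return index


def detect_verbs(text, index):
    """Return (word, root, label) for each whitespace-separated word in the index."""
    return [(word, *index[word]) for word in text.split() if word in index]


if __name__ == "__main__":
    index = build_index(ROOTS, PATTERNS)
    for word, root, label in detect_verbs("الولد يكتب الدرس", index):
        print(word, root, label)
```

Adding a pattern means adding one entry to `PATTERNS`, which mirrors the abstract's point that new patterns can be added without changing the code. Words whose root is missing from `ROOTS` are never matched, consistent with the limitation the abstract reports.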

The Processing Unit in Korean Words

  • 이준석;김경린
    • 인지과학
    • /
    • Vol. 1, No. 2
    • /
    • pp.221-239
    • /
    • 1989
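  • Three experiments were conducted to verify the processing unit of Korean words. The preliminary experiment and Experiment 1 examined the processing unit in one-syllable characters, and Experiment 2 examined it in characters of two or more syllables. In the preliminary experiment, the effect of consonant type was not statistically significant, but the effect of word position was. A Newman-Keuls test showed that the difference between the initial (choseong) and medial (jungseong) conditions was not significant, whereas the difference between the medial and final (jongseong) conditions was. In Experiment 1, reaction time increased as the number of letters increased. The word position effect was the same as in the preliminary experiment. In Experiment 2, reaction time increased with the number of syllables regardless of whether a final consonant was present. The implications of this study are as follows: (1) in one-syllable characters, information is processed in units of syllables composed only of an initial and a final sound, whereas (2) in characters of two or more syllables, information is processed in units of syllables that include the final consonant.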

The Description Rule of Terms and Characters in Databases

  • 김태중;이창한
    • 정보관리연구
    • /
    • Vol. 19, No. 1
    • /
    • pp.95-122
    • /
    • 1988
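  • Until now, the lack of a consistent notation standard for building databases has made information exchange impossible and retrieval difficult. This document presents notation rules for the symbols and terms used when building databases for retrieval. Some of the symbols and characters used in scholarly papers are difficult to enter and search through a computer terminal, so a method was devised for expressing them in a form that can be entered and searched. In addition, the "한글 맞춤법" (Hangul orthography) and "외래어 표기법" (loanword orthography) rules announced by the Ministry of Education (문교부) were reviewed, and wherever two or more spellings are possible, the one that yields higher retrieval efficiency was chosen so that terms are written consistently.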
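
The two rules described here (rewrite hard-to-enter symbols in a searchable form, and pick one spelling when several are possible) amount to a normalization step applied both when indexing and when searching. The sketch below is only an illustration: the symbol table, the spelling table, and the `normalize_term` helper are hypothetical, and the abstract does not list the actual rules.

```python
import unicodedata

# Hypothetical table: symbols that are hard to type -> searchable spelling.
SYMBOL_NAMES = {"α": "alpha", "β": "beta", "γ": "gamma"}

# Hypothetical table: variant spelling -> the single spelling chosen for the database.
PREFERRED_SPELLING = {"데이타": "데이터", "컴퓨타": "컴퓨터"}


def normalize_term(term):
    """Return the form of `term` that would be stored and searched."""
    # NFKC folds compatibility variants such as full-width letters into plain ones.
    text = unicodedata.normalize("NFKC", term.strip())
    for symbol, name in SYMBOL_NAMES.items():
        text = text.replace(symbol, name)
    return PREFERRED_SPELLING.get(text, text)


if __name__ == "__main__":
    for raw in ["α-amylase", "데이타", "ＡＢＣ"]:
        print(raw, "->", normalize_term(raw))
```

Applying the same function to stored terms and to user queries is what lets variant inputs retrieve the same records.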

A Proposal for Distinctive Spelling of Korean Heteronyms: IPA, Roman Alphabet, and Hangul Transcriptions Compared Side by Side

  • 유만근
    • 대한음성학회지:말소리
    • /
    • Nos. 31-32
    • /
    • pp.51-82
    • /
    • 1996
  • The purpose of this paper is to gather pairs of heteronyms in Modern Korean and to propose that all of them should be differentiated in both the Hanngul orthography and Romanization as well as in the IPA transcription. More than a quarter of the whole Korean vocabulary consists of words with a long vowel, and the number of minimal pairs distinguished only by the chroneme reaches nearly ten thousand (i.e. twenty thousand words). It is suggested here that the letter s in Hanngul and the letter 'h' in the Roman alphabet be used to represent the long vowel. Another factor which brings forth many heteronyms in Korean is the lack of sufficient indication of non-automatic reinforcement in the initial consonant of a word (or a morpheme) when it follows another within a phrase (or a word). It is proposed here that the non-automatically reinforced word-initial consonant should be written with the letter h (like ㅺ, ㅼ, ㅽ, ㅾ) and an apostrophe (like 물'새 or 밭'이랑, 물'약) in Hanngul, and with the letter c and an apostrophe (like c'g-, c'd-, c'b-, c'j-) in the Roman alphabet. The morpheme-initial reinforced consonant within a word is written with the letters k, t, p and cz for ㅺ, ㅼ, ㅽ, and ㅾ respectively. The contrasted pronunciations of pairs of heteronyms beginning with the ㅁ/m sound are transcribed here for exemplification in the IPA, Roman alphabet and Hanngul.

Heteronyms in Modern Korean and Their Transcription in the IPA and the Roman Alphabet (사~섬)

  • 유만근
    • 대한음성학회지:말소리
    • /
    • No. 37
    • /
    • pp.49-71
    • /
    • 1999
  • The purpose of this paper is to gather pairs of heteronyms in modern Korean and transcribe them in the IPA and the Roman alphabet in order to propose that all of them should be differentiated in Hanngul orthography. More than a quarter of the whole Korean vocabulary consists of words with a long vowel, and the number of minimal pairs distinguished only by the chroneme reaches nearly ten thousand (i.e. twenty thousand words). The letter h in syllable-final position is used here to represent the long vowel in Romanization, except for the vowel '으' [ɯː], which is transcribed by doubling the letter u (i.e. uu). Another factor bringing forth many heteronyms in Korean is the lack of full indication of the non-automatic reinforcement in the initial consonant of a word (or a morpheme) when preceded by another within a phrase (or a word). These reinforced word-initial consonants are written here with the letter c and an apostrophe (like c'g-, c'd-, c'b-, c's-, c'j-) in Romanization. The reinforced morpheme-initial consonant within a word is written with the letters k, t, p, ss and cz for the ㄲ, ㄸ, ㅃ, ㅆ and ㅉ sounds respectively. The contrasted pronunciations of pairs of heteronyms beginning with the ㅅ /sʰ/ and ㅆ /s/ sounds are transcribed here for exemplification.
