• Title/Summary/Keyword: Language-Independent Model

Search Result 83, Processing Time 0.031 seconds

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.1-13
    • /
    • 2015
  • As opinion mining in big data applications has been highlighted, a lot of research on unstructured data has made. Lots of social media on the Internet generate unstructured or semi-structured data every second and they are often made by natural or human languages we use in daily life. Many words in human languages have multiple meanings or senses. In this result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, resulting in incorrect search results which are far from users' intentions. Even though a lot of progress in enhancing the performance of search engines has made over the last years in order to provide users with appropriate results, there is still so much to improve it. Word sense disambiguation can play a very important role in dealing with natural language processing and is considered as one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-base, supervised corpus-based, and unsupervised corpus-based approaches. This paper presents a method which automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries and avoids expensive sense tagging processes. It experiments the effectiveness of the method based on Naïve Bayes Model, which is one of supervised learning algorithms, by using Korean standard unabridged dictionary and Sejong Corpus. Korean standard unabridged dictionary has approximately 57,000 sentences. Sejong Corpus has about 790,000 sentences tagged with part-of-speech and senses all together. For the experiment of this study, Korean standard unabridged dictionary and Sejong Corpus were experimented as a combination and separate entities using cross validation. Only nouns, target subjects in word sense disambiguation, were selected. 93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were additionally combined in the corpus. Sejong Corpus was easily merged with Korean standard unabridged dictionary because Sejong Corpus was tagged based on sense indices defined by Korean standard unabridged dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating sense vectors were added in the named entity dictionary of Korean morphological analyzer. By using the extended named entity dictionary, term vectors were extracted from the input sentences and then term vectors for the sentences were created. Given the extracted term vector and the sense vector model made during the pre-processing stage, the sense-tagged terms were determined by the vector space model based word sense disambiguation. In addition, this study shows the effectiveness of merged corpus from examples in Korean standard unabridged dictionary and Sejong Corpus. The experiment shows the better results in precision and recall are found with the merged corpus. This study suggests it can practically enhance the performance of internet search engines and help us to understand more accurate meaning of a sentence in natural language processing pertinent to search engines, opinion mining, and text mining. Naïve Bayes classifier used in this study represents a supervised learning algorithm and uses Bayes theorem. Naïve Bayes classifier has an assumption that all senses are independent. Even though the assumption of Naïve Bayes classifier is not realistic and ignores the correlation between attributes, Naïve Bayes classifier is widely used because of its simplicity and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research need to be carried out to consider all possible combinations and/or partial combinations of all senses in a sentence. Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.

Speech Animation Synthesis based on a Korean Co-articulation Model (한국어 동시조음 모델에 기반한 스피치 애니메이션 생성)

  • Jang, Minjung;Jung, Sunjin;Noh, Junyong
    • Journal of the Korea Computer Graphics Society
    • /
    • v.26 no.3
    • /
    • pp.49-59
    • /
    • 2020
  • In this paper, we propose a speech animation synthesis specialized in Korean through a rule-based co-articulation model. Speech animation has been widely used in the cultural industry, such as movies, animations, and games that require natural and realistic motion. Because the technique for audio driven speech animation has been mainly developed for English, however, the animation results for domestic content are often visually very unnatural. For example, dubbing of a voice actor is played with no mouth motion at all or with an unsynchronized looping of simple mouth shapes at best. Although there are language-independent speech animation models, which are not specialized in Korean, they are yet to ensure the quality to be utilized in a domestic content production. Therefore, we propose a natural speech animation synthesis method that reflects the linguistic characteristics of Korean driven by an input audio and text. Reflecting the features that vowels mostly determine the mouth shape in Korean, a coarticulation model separating lips and the tongue has been defined to solve the previous problem of lip distortion and occasional missing of some phoneme characteristics. Our model also reflects the differences in prosodic features for improved dynamics in speech animation. Through user studies, we verify that the proposed model can synthesize natural speech animation.

An On-line Speech and Character Combined Recognition System for Multimodal Interfaces (멀티모달 인터페이스를 위한 음성 및 문자 공용 인식시스템의 구현)

  • 석수영;김민정;김광수;정호열;정현열
    • Journal of Korea Multimedia Society
    • /
    • v.6 no.2
    • /
    • pp.216-223
    • /
    • 2003
  • In this paper, we present SCCRS(Speech and Character Combined Recognition System) for speaker /writer independent. on-line multimodal interfaces. In general, it has been known that the CHMM(Continuous Hidden Markov Mode] ) is very useful method for speech recognition and on-line character recognition, respectively. In the proposed method, the same CHMM is applied to both speech and character recognition, so as to construct a combined system. For such a purpose, 115 CHMM having 3 states and 9 transitions are constructed using MLE(Maximum Likelihood Estimation) algorithm. Different features are extracted for speech and character recognition: MFCC(Mel Frequency Cepstrum Coefficient) Is used for speech in the preprocessing, while position parameter is utilized for cursive character At recognition step, the proposed SCCRS employs OPDP (One Pass Dynamic Programming), so as to be a practical combined recognition system. Experimental results show that the recognition rates for voice phoneme, voice word, cursive character grapheme, and cursive character word are 51.65%, 88.6%, 85.3%, and 85.6%, respectively, when not using any language models. It demonstrates the efficiency of the proposed system.

  • PDF

A Exploratory Study on The Determinants of Youth Facilities Visits (청소년시설이용에 영향을 미치는 요인에 대한 탐색적 연구)

  • Kim, Sin-Young
    • The Journal of the Convergence on Culture Technology
    • /
    • v.9 no.1
    • /
    • pp.129-134
    • /
    • 2023
  • This study purports to investigate potential factors in various levels that affect respondents' use of youth facilities. Those levels include individual, family, and school. The data from 「2021 Youth Survey on Human Right Conditions」 will be analyzed. Hierarchical multiple regression analysis shows several results. First of all, respondents' age and level of human rights related information strongly influence respondents' use of youth facilities. Secondly, the analysis also shows that subjective well -being, abusive language and physical punishment from school faculty, and experience of human rights violation in schools affect the level of respondents' use of youth facilities. The order of effect sizes among significant variables are as follows; respondents' age, level of human rights related information, subjective well -being, abusive language and physical punishment from school faculty, and experience of human rights violation in schools. The independent variables in the model explain roughly 20 percent of whole variation of dependents variable.

Media Literacy Education in the Australian Curriculum: Media Art (호주 국가교육과정 예술과목 'Media Art' 에 나타난 미디어 리터러시 교육)

  • Park, Yoo-Shin
    • Cartoon and Animation Studies
    • /
    • s.48
    • /
    • pp.271-310
    • /
    • 2017
  • This paper examines the composition and the content of media art which is an art education subject in a national curriculum of Australia; and discusses implications for Korean education curriculums. Media covered by Media Art subject in Australia are the multi types of general media including TV, movie, video, newspaper, radio, video game, the internet, and mobile media; and their contents. The purpose of ACARA's media art education curriculum is to improve creative use, knowledge, understanding, and technology of communication techniques for multiple purposes and the audiences. Through the Media Art subject, both the students and the community are able to participate in the actual communications with the rich culture surrounding them and to develop the knowledge and understanding of the 5 core concepts of language, technology, system, audience and re-creation while testing the culture. The implication of this study is as the following. ACARA's media art education curriculum has been developed as an independent educational program and has a special significance within Australian education curriculums. Although ACARA's media art education curriculum is formed as an independent subject, it is suggested within the curriculum to instruct in close connection with other subjects upon execution. Its organization and elaborateness in curriculum composition are very effective in terms of the teacher's teaching-learning design and as well as the evaluation. This seems to show a good model of leading media literacy curriculum. ACARA's media art education curriculum can be a great reference in introducing media literacy to Korean national education curriculums.

광미/광폐석 처리를 위한 고형화 공정 실증 실험

  • Jeon Ji-Hye;Choi Ae-Jeong;Kim In-Su;Lee Min-Hui;Jang Yun-Yeong
    • Proceedings of the Korean Society of Soil and Groundwater Environment Conference
    • /
    • 2006.04a
    • /
    • pp.166-170
    • /
    • 2006
  • 본 연구에서는 폐광산 주변에 산재되어 있는 광미/광폐석을 처리하기 위하여 고형화 실증 실험을 수행하였다. 고형화 공정에서 흔히 사용하는 포틀랜드 시멘트와 MSG-E, MSG-N을 고화제로 사용하였으며 현장 광미 및 광폐석을 대상으로 고화체를 양생하고 고화체의 압축강도 및 중금속 용출 정도를 측정하였다. 고화체의 물리/화학적 특성을 비교하기 위해 광미/고화제 비율, 배합수/고화제 비율 그리고 고화체 양생기간을 실험인자로 설정하였다. 실험 결과 광미/고화제의 비율 1:1 만을 고려하더라도 중금속 용출의 급격한 감소가 이루어지는 것을 확인할 수 있었으며 광미/고화제의 비율을 3:1 이하로 유지시키는 경우, 고화체의 압축강도가 현행 폐기물 관리법(20조 관련)에서 규정하고 있는 차단형 매립시설 내부막의 압축강도 기준인 $0.21kgf/mm^2$ 보다 높은 것으로 나타났다. 다양한 pH를 갖는 수용액에 대하여 시간에 따른 고화체의 중금속 용출률을 측정한 결과, 수용액의 pH가 1과 13인 강산/강염기 용액에서 일부 중금속의 용출 농도가 지하수 생활용수 기준치를 초과하였으나, pH와 3 - 11인 경우에는 중금속 용출률이 급격히 감소하여 모두 기준치 이하를 나타내었다. 또한, pH가 1과 13인 수용액의 경우에도 고화체와 반응하는 시간이 증가할수록 고화체의 buffering 효과에 의해 수용액의 pH가 감소하였다. 이러한 결과는 현장에서 접촉수의 pH가 강산이나 강염기라 하여도, 고화체의 buffering 효과에 의해 시간이 지남에 따라 수용액의 pH가 낮아져 고화체로부터의 중금속 용출은 매우 감소할 것임을 의미한다.ss of an active application defined using the model. The technique is developed in a platform- and language-independent way, and it is algorithmic and can be automated by computer program. We give an example dealing with network auction to illustrate the use of the model and the verification technique.품으로 내부 온도분포를 측정하였으며, 유한차분법 프로그램으로 대류열전달계수를 결정하였다. 대류열전달계수는 792에서 2,107 W/m$^2$로 분석되었다. 대류열전달 계수는 액상식품과의 상대속도가 증가함에 따라서 증가하였고, 점도가 증가함에 따라서는 감소하였다.ce of precision/recall of 90.99%/92.52%, and 93.39%/93.41% respectively. 의한 변성에 부분적으로 보호 작용을 나타 낼 것으로 추정된다.경(製麴72時間頃)의 활성(活性)은 보리쌀국(麴), 밀가루국(麴), 찹쌀국(麴), 고구마국(麴)의 순이었다.험 결과 오전용 사료는 관행적인 산란계 배합사료에서 Ca공급제를 제외한 것을 급여하고, 오후용 사료는 Ca공급제를 3배 첨가한 T2처리로 15:00~16:00시에 교체급여를 하면 사료섭취량 감소와 사료비 절감면에서 바람직할 것으로 사료되며, 고에너지-고단백질-저Ca의 분말사료와 저에너지-저단백질-고Ca의 펠렛사료를 혼합급여하면 산란계의 사료

  • PDF

Design of Translator for generating Secure Java Bytecode from Thread code of Multithreaded Models (다중스레드 모델의 스레드 코드를 안전한 자바 바이트코드로 변환하기 위한 번역기 설계)

  • 김기태;유원희
    • Proceedings of the Korea Society for Industrial Systems Conference
    • /
    • 2002.06a
    • /
    • pp.148-155
    • /
    • 2002
  • Multithreaded models improve the efficiency of parallel systems by combining inner parallelism, asynchronous data availability and the locality of von Neumann model. This model executes thread code which is generated by compiler and of which quality is given by the method of generation. But multithreaded models have the demerit that execution model is restricted to a specific platform. On the contrary, Java has the platform independency, so if we can translate from threads code to Java bytecode, we can use the advantages of multithreaded models in many platforms. Java executes Java bytecode which is intermediate language format for Java virtual machine. Java bytecode plays a role of an intermediate language in translator and Java virtual machine work as back-end in translator. But, Java bytecode which is translated from multithreaded models have the demerit that it is not secure. This paper, multhithread code whose feature of platform independent can execute in java virtual machine. We design and implement translator which translate from thread code of multithreaded code to Java bytecode and which check secure problems from Java bytecode.

  • PDF

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.219-240
    • /
    • 2018
  • Sentiment analysis, which is one of the text mining techniques, is a method for extracting subjective content embedded in text documents. Recently, the sentiment analysis methods have been widely used in many fields. As good examples, data-driven surveys are based on analyzing the subjectivity of text data posted by users and market researches are conducted by analyzing users' review posts to quantify users' reputation on a target product. The basic method of sentiment analysis is to use sentiment dictionary (or lexicon), a list of sentiment vocabularies with positive, neutral, or negative semantics. In general, the meaning of many sentiment words is likely to be different across domains. For example, a sentiment word, 'sad' indicates negative meaning in many fields but a movie. In order to perform accurate sentiment analysis, we need to build the sentiment dictionary for a given domain. However, such a method of building the sentiment lexicon is time-consuming and various sentiment vocabularies are not included without the use of general-purpose sentiment lexicon. In order to address this problem, several studies have been carried out to construct the sentiment lexicon suitable for a specific domain based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer being serviced and SentiWordNet does not work well because of language difference in the process of converting Korean word into English word. There are restrictions on the use of such general-purpose sentiment lexicons as seed data for building the sentiment lexicon for a specific domain. In this article, we construct 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons. The proposed dictionary, which is a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is built to quickly construct the sentiment dictionary for a target domain. Especially, it constructs sentiment vocabularies by analyzing the glosses contained in Standard Korean Language Dictionary (SKLD) by the following procedures: First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each of glosses to either positive or negative meaning. Third, positive words and phrases are extracted from the glosses classified as positive meaning, while negative words and phrases are extracted from the glosses classified as negative meaning. Our experimental results show that the average accuracy of the proposed sentiment classification model is up to 89.45%. In addition, the sentiment dictionary is more extended using various external sources including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information about frequently used coined words and emoticons that are used mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment vocabularies, each of which is one of 1-grams, 2-grams, phrases, and sentence patterns. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend on sentiment analysis is to use deep learning technique without sentiment dictionaries. The importance of developing sentiment dictionaries is declined gradually. However, one of recent studies shows that the words in the sentiment dictionary can be used as features of deep learning models, resulting in the sentiment analysis performed with higher accuracy (Teng, Z., 2016). This result indicates that the sentiment dictionary is used not only for sentiment analysis but also as features of deep learning models for improving accuracy. The proposed dictionary can be used as a basic data for constructing the sentiment lexicon of a particular domain and as features of deep learning models. It is also useful to automatically and quickly build large training sets for deep learning models.

Feasibility of Deep Learning Algorithms for Binary Classification Problems (이진 분류문제에서의 딥러닝 알고리즘의 활용 가능성 평가)

  • Kim, Kitae;Lee, Bomi;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.1
    • /
    • pp.95-108
    • /
    • 2017
  • Recently, AlphaGo which is Bakuk (Go) artificial intelligence program by Google DeepMind, had a huge victory against Lee Sedol. Many people thought that machines would not be able to win a man in Go games because the number of paths to make a one move is more than the number of atoms in the universe unlike chess, but the result was the opposite to what people predicted. After the match, artificial intelligence technology was focused as a core technology of the fourth industrial revolution and attracted attentions from various application domains. Especially, deep learning technique have been attracted as a core artificial intelligence technology used in the AlphaGo algorithm. The deep learning technique is already being applied to many problems. Especially, it shows good performance in image recognition field. In addition, it shows good performance in high dimensional data area such as voice, image and natural language, which was difficult to get good performance using existing machine learning techniques. However, in contrast, it is difficult to find deep leaning researches on traditional business data and structured data analysis. In this study, we tried to find out whether the deep learning techniques have been studied so far can be used not only for the recognition of high dimensional data but also for the binary classification problem of traditional business data analysis such as customer churn analysis, marketing response prediction, and default prediction. And we compare the performance of the deep learning techniques with that of traditional artificial neural network models. The experimental data in the paper is the telemarketing response data of a bank in Portugal. It has input variables such as age, occupation, loan status, and the number of previous telemarketing and has a binary target variable that records whether the customer intends to open an account or not. In this study, to evaluate the possibility of utilization of deep learning algorithms and techniques in binary classification problem, we compared the performance of various models using CNN, LSTM algorithm and dropout, which are widely used algorithms and techniques in deep learning, with that of MLP models which is a traditional artificial neural network model. However, since all the network design alternatives can not be tested due to the nature of the artificial neural network, the experiment was conducted based on restricted settings on the number of hidden layers, the number of neurons in the hidden layer, the number of output data (filters), and the application conditions of the dropout technique. The F1 Score was used to evaluate the performance of models to show how well the models work to classify the interesting class instead of the overall accuracy. The detail methods for applying each deep learning technique in the experiment is as follows. The CNN algorithm is a method that reads adjacent values from a specific value and recognizes the features, but it does not matter how close the distance of each business data field is because each field is usually independent. In this experiment, we set the filter size of the CNN algorithm as the number of fields to learn the whole characteristics of the data at once, and added a hidden layer to make decision based on the additional features. For the model having two LSTM layers, the input direction of the second layer is put in reversed position with first layer in order to reduce the influence from the position of each field. In the case of the dropout technique, we set the neurons to disappear with a probability of 0.5 for each hidden layer. The experimental results show that the predicted model with the highest F1 score was the CNN model using the dropout technique, and the next best model was the MLP model with two hidden layers using the dropout technique. In this study, we were able to get some findings as the experiment had proceeded. First, models using dropout techniques have a slightly more conservative prediction than those without dropout techniques, and it generally shows better performance in classification. Second, CNN models show better classification performance than MLP models. This is interesting because it has shown good performance in binary classification problems which it rarely have been applied to, as well as in the fields where it's effectiveness has been proven. Third, the LSTM algorithm seems to be unsuitable for binary classification problems because the training time is too long compared to the performance improvement. From these results, we can confirm that some of the deep learning algorithms can be applied to solve business binary classification problems.

Understanding the Mismatch between ERP and Organizational Information Needs and Its Responses: A Study based on Organizational Memory Theory (조직의 정보 니즈와 ERP 기능과의 불일치 및 그 대응책에 대한 이해: 조직 메모리 이론을 바탕으로)

  • Jeong, Seung-Ryul;Bae, Uk-Ho
    • Asia pacific journal of information systems
    • /
    • v.22 no.2
    • /
    • pp.21-38
    • /
    • 2012
  • Until recently, successful implementation of ERP systems has been a popular topic among ERP researchers, who have attempted to identify its various contributing factors. None of these efforts, however, explicitly recognize the need to identify disparities that can exist between organizational information requirements and ERP systems. Since ERP systems are in fact "packages" -that is, software programs developed by independent software vendors for sale to organizations that use them-they are designed to meet the general needs of numerous organizations, rather than the unique needs of a particular organization, as is the case with custom-developed software. By adopting standard packages, organizations can substantially reduce many of the potential implementation risks commonly associated with custom-developed software. However, it is also true that the nature of the package itself could be a risk factor as the features and functions of the ERP systems may not completely comply with a particular organization's informational requirements. In this study, based on the organizational memory mismatch perspective that was derived from organizational memory theory and cognitive dissonance theory, we define the nature of disparities, which we call "mismatches," and propose that the mismatch between organizational information requirements and ERP systems is one of the primary determinants in the successful implementation of ERP systems. Furthermore, we suggest that customization efforts as a coping strategy for mismatches can play a significant role in increasing the possibilities of success. In order to examine the contention we propose in this study, we employed a survey-based field study of ERP project team members, resulting in a total of 77 responses. The results of this study show that, as anticipated from the organizational memory mismatch perspective, the mismatch between organizational information requirements and ERP systems makes a significantly negative impact on the implementation success of ERP systems. This finding confirms our hypothesis that the more mismatch there is, the more difficult successful ERP implementation is, and thus requires more attention to be drawn to mismatch as a major failure source in ERP implementation. This study also found that as a coping strategy on mismatch, the effects of customization are significant. In other words, utilizing the appropriate customization method could lead to the implementation success of ERP systems. This is somewhat interesting because it runs counter to the argument of some literature and ERP vendors that minimized customization (or even the lack thereof) is required for successful ERP implementation. In many ERP projects, there is a tendency among ERP developers to adopt default ERP functions without any customization, adhering to the slogan of "the introduction of best practices." However, this study asserts that we cannot expect successful implementation if we don't attempt to customize ERP systems when mismatches exist. For a more detailed analysis, we identified three types of mismatches-Non-ERP, Non-Procedure, and Hybrid. Among these, only Non-ERP mismatches (a situation in which ERP systems cannot support the existing information needs that are currently fulfilled) were found to have a direct influence on the implementation of ERP systems. Neither Non-Procedure nor Hybrid mismatches were found to have significant impact in the ERP context. These findings provide meaningful insights since they could serve as the basis for discussing how the ERP implementation process should be defined and what activities should be included in the implementation process. They show that ERP developers may not want to include organizational (or business processes) changes in the implementation process, suggesting that doing so could lead to failed implementation. And in fact, this suggestion eventually turned out to be true when we found that the application of process customization led to higher possibilities of failure. From these discussions, we are convinced that Non-ERP is the only type of mismatch we need to focus on during the implementation process, implying that organizational changes must be made before, rather than during, the implementation process. Finally, this study found that among the various customization approaches, bolt-on development methods in particular seemed to have significantly positive effects. Interestingly again, this finding is not in the same line of thought as that of the vendors in the ERP industry. The vendors' recommendations are to apply as many best practices as possible, thereby resulting in the minimization of customization and utilization of bolt-on development methods. They particularly advise against changing the source code and rather recommend employing, when necessary, the method of programming additional software code using the computer language of the vendor. As previously stated, however, our study found active customization, especially bolt-on development methods, to have positive effects on ERP, and found source code changes in particular to have the most significant effects. Moreover, our study found programming additional software to be ineffective, suggesting there is much difference between ERP developers and vendors in viewpoints and strategies toward ERP customization. In summary, mismatches are inherent in the ERP implementation context and play an important role in determining its success. Considering the significance of mismatches, this study proposes a new model for successful ERP implementation, developed from the organizational memory mismatch perspective, and provides many insights by empirically confirming the model's usefulness.

  • PDF