• Title/Summary/Keyword: Corpus-based


Semi-supervised domain adaptation using unlabeled data for end-to-end speech recognition (라벨이 없는 데이터를 사용한 종단간 음성인식기의 준교사 방식 도메인 적응)

  • Jeong, Hyeonjae;Goo, Jahyun;Kim, Hoirin
    • Phonetics and Speech Sciences
    • /
    • v.12 no.2
    • /
    • pp.29-37
    • /
    • 2020
  • Recently, neural network-based deep learning algorithms have dramatically improved performance compared to the classical Gaussian mixture model based hidden Markov model (GMM-HMM) automatic speech recognition (ASR) systems. In addition, research on end-to-end (E2E) speech recognition systems, which integrate the language modeling and decoding processes, has been actively conducted to better exploit the advantages of deep learning techniques. In general, E2E ASR systems consist of multiple encoder-decoder layers with attention, and therefore require a large amount of paired speech-text data to achieve good performance. Obtaining paired speech-text data demands much human labor and time, and is a high barrier to building an E2E ASR system. Previous studies have tried to improve E2E ASR performance with relatively small amounts of paired data, but most used only speech-only data or only text-only data. In this study, we propose a semi-supervised training method that enables an E2E ASR system to perform well on corpora from different domains by using both speech-only and text-only data. The proposed method adapts effectively to different domains, showing good performance in the target domain without degrading much in the source domain.
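The core idea of the abstract, combining a supervised loss on paired source-domain data with a loss computed from the model's own predictions on unlabeled target-domain data, can be sketched with a toy classifier. This is not the paper's E2E model; the linear "recognizer", the synthetic features, the pseudo-label scheme, and the weight `lam` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Synthetic "features": source domain comes paired with labels,
# target domain is a shifted distribution with no labels.
X_src = rng.normal(size=(200, 8))
y_src = (X_src[:, 0] > 0).astype(int)
X_tgt = rng.normal(loc=0.5, size=(200, 8))

W = np.zeros((8, 2))
lam = 0.3  # weight of the unsupervised term (an assumed hyperparameter)
for step in range(300):
    # supervised gradient on paired source-domain data
    p_src = softmax(X_src @ W)
    grad = X_src.T @ (p_src - np.eye(2)[y_src]) / len(X_src)
    # semi-supervised gradient: self-generated pseudo-labels on target data
    p_tgt = softmax(X_tgt @ W)
    pseudo = p_tgt.argmax(axis=1)
    grad += lam * X_tgt.T @ (p_tgt - np.eye(2)[pseudo]) / len(X_tgt)
    W -= 0.5 * grad

acc_src = (softmax(X_src @ W).argmax(1) == y_src).mean()
print(f"source-domain accuracy after adaptation: {acc_src:.2f}")
```

The pseudo-label term sharpens the model's predictions on the unlabeled target domain while the supervised term anchors it to the source domain, which is the trade-off the abstract describes.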

An Intelligent Marking System based on Semantic Kernel and Korean WordNet (의미커널과 한글 워드넷에 기반한 지능형 채점 시스템)

  • Cho Woojin;Oh Jungseok;Lee Jaeyoung;Kim Yu-Seop
    • The KIPS Transactions:PartA
    • /
    • v.12A no.6 s.96
    • /
    • pp.539-546
    • /
    • 2005
  • Recently, as the number of Internet users has grown explosively, e-learning has spread widely, along with remote evaluation of intellectual ability. However, because of the difficulty of natural language processing, only multiple-choice and objective tests have been applied in e-learning. For rapid and fair intelligent marking of short-essay-type answer papers, this work utilizes heterogeneous linguistic knowledge. First, we construct a semantic kernel from an untagged corpus. The answer papers of students and instructors are then transformed into vector form. Finally, we evaluate the similarity between the papers using the semantic kernel and decide, based on the similarity values, whether an answer paper is correct. For the construction of the semantic kernel, we used latent semantic analysis based on the vector space model, and we further reduced the problem of information shortage by integrating the Korean WordNet. To build the semantic kernel we collected 38,727 newspaper articles and extracted 75,175 index terms. In the experiment, a correlation coefficient of about 0.894 was obtained between the marking results of this system and those of human instructors.
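The pipeline described above (latent semantic analysis over a term-document matrix, vectorized answers, similarity-based marking) can be sketched minimally. The four-sentence toy corpus, the rank `k`, and the example answers are invented stand-ins for the 38,727 articles and the real marking data.

```python
import numpy as np

docs = [
    "the cell is the unit of life",
    "the cell divides by mitosis",
    "planets orbit the sun",
    "the sun is a star",
]
vocab = sorted({w for d in docs for w in d.split()})
idx = {w: i for i, w in enumerate(vocab)}

# term-document matrix from the (toy) untagged corpus
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[idx[w], j] += 1

# latent semantic space via truncated SVD; word_vec plays the
# role of the semantic kernel mapping words to latent vectors
U, S, _ = np.linalg.svd(A, full_matrices=False)
k = 3
word_vec = U[:, :k] * S[:k]

def answer_vector(text):
    # an answer paper becomes the sum of its word vectors (unknown words skipped)
    vecs = [word_vec[idx[w]] for w in text.split() if w in idx]
    return np.sum(vecs, axis=0) if vecs else np.zeros(k)

def similarity(a, b):
    va, vb = answer_vector(a), answer_vector(b)
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb / denom) if denom else 0.0

model_answer = "the cell divides by mitosis"
sim_good = similarity("a cell divides by mitosis", model_answer)
sim_bad = similarity("planets orbit the sun", model_answer)
print(f"on-topic similarity {sim_good:.2f} vs off-topic {sim_bad:.2f}")
```

A marking threshold on the similarity value would then decide correct versus incorrect, as in the abstract.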

A Study on Building Knowledge Base for Intelligent Battlefield Awareness Service

  • Jo, Se-Hyeon;Kim, Hack-Jun;Jin, So-Yeon;Lee, Woo-Sin
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.4
    • /
    • pp.11-17
    • /
    • 2020
  • In this paper, we propose a method of building a knowledge base based on natural language processing for an intelligent battlefield awareness service. The current command and control system manages and utilizes collected battlefield information and tactical data only at a basic level, such as registration, storage, and sharing; information fusion and situation analysis are performed by human analysts. Because of analysts' time constraints and cognitive limitations, generally only one interpretation is drawn, and biased thinking can be reflected. It is therefore essential for the command and control system to be aware of the battlefield situation and to establish an intelligent decision support system. To do this, a knowledge base specialized for the command and control system must be built, and intelligent battlefield awareness services developed on top of it. In this paper, among the entity names suggested in the Exobrain corpus, a civilian dataset, the top 250 meaningful entity types were applied, and a weapon-system entity type was additionally defined to properly represent battlefield information. Based on this, we propose a way to build a battlefield-awareness knowledge base through mention extraction, coreference resolution, and relation extraction.
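The three-stage pipeline named in the abstract (mention extraction, coreference resolution, relation extraction) can be sketched schematically. The entity lexicon, the alias table, the "deployed to" pattern, and the example sentences are all hypothetical illustrations, not the actual corpus or the 250 entity types used in the paper.

```python
# toy lexicon with a weapon-system entity type, as the abstract suggests
ENTITY_LEXICON = {
    "K2 tank": "WEAPON_SYSTEM",
    "3rd Brigade": "UNIT",
}
ALIASES = {"the tank": "K2 tank"}  # toy coreference table

def extract_mentions(text):
    # stage 1: mention extraction by lexicon lookup
    return [(name, typ) for name, typ in ENTITY_LEXICON.items() if name in text]

def resolve_coreference(sentence):
    # stage 2: replace aliases with their canonical entity name
    for alias, canonical in ALIASES.items():
        sentence = sentence.replace(alias, canonical)
    return sentence

def extract_relations(sentence):
    # stage 3: naive pattern "<A> deployed to <B>" -> (A, deployed_to, B)
    triples = []
    if " deployed to " in sentence:
        left, right = sentence.split(" deployed to ", 1)
        subj = next((n for n, _ in extract_mentions(left)), None)
        obj = next((n for n, _ in extract_mentions(right)), None)
        if subj and obj:
            triples.append((subj, "deployed_to", obj))
    return triples

text = "The K2 tank arrived. Later the tank deployed to 3rd Brigade."
knowledge_base = []
for sent in text.split(". "):
    sent = resolve_coreference(sent)
    knowledge_base.extend(extract_relations(sent))
print(knowledge_base)
```

A real system would replace each rule with a trained model, but the triple-shaped output feeding a knowledge base is the same.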

Research on Development of Support Tools for Local Government Business Transaction Operation Using Big Data Analysis Methodology (빅데이터 분석 방법론을 활용한 지방자치단체 단위과제 운영 지원도구 개발 연구)

  • Kim, Dabeen;Lee, Eunjung;Ryu, Hanjo
    • The Korean Journal of Archival Studies
    • /
    • no.70
    • /
    • pp.85-117
    • /
    • 2021
  • The purpose of this study is to investigate and analyze the current status of the unit tasks used by local governments, their operation, and the resulting records management problems, and to present improvement measures using text-based big data technology based on the implications derived from that process. Records management in local governments is in a serious state: misclassification of unit tasks causes errors in preservation periods, common and institutional affairs cannot be identified by type, and unit-task names contain errors, with no referenceable standards or tools. Moreover, with about 720,000 unit tasks, the sheer quantity cannot be controlled effectively, so strict and controllable tools and standards are needed. To solve these problems, this study developed a system that applies text-based big data analysis techniques, such as corpus construction and tokenization, to the names and terms constituting the records management standard. These unit-task operation support tools can support standardized operations such as uniform preservation periods, identification of delegated-office records, control of duplicate and similar unit-task creation, and handling of common tasks, and are thus expected to contribute significantly to records management. If this big data analysis methodology can be linked to BRM and RMS in the future, the quality of records management standard work is expected to increase.
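One support-tool function suggested above, controlling duplicate and similar unit-task creation, can be sketched as tokenization plus set similarity over task names. The task names and the similarity threshold are illustrative assumptions, not the study's actual 720,000-task data or tuned cutoff.

```python
def tokenize(name):
    # crude whitespace tokenization of a unit-task name
    return set(name.lower().split())

def jaccard(a, b):
    # set-overlap similarity between two token sets
    union = len(a | b)
    return len(a & b) / union if union else 0.0

unit_tasks = [
    "management of civil petition records",
    "civil petition records management",
    "road maintenance planning",
]

THRESHOLD = 0.8  # assumed cutoff for flagging a pair as near-duplicate
tokens = [tokenize(t) for t in unit_tasks]
duplicates = []
for i in range(len(unit_tasks)):
    for j in range(i + 1, len(unit_tasks)):
        if jaccard(tokens[i], tokens[j]) >= THRESHOLD:
            duplicates.append((unit_tasks[i], unit_tasks[j]))
print(duplicates)
```

Flagged pairs would then be reviewed before a new unit task is registered, which is the kind of control the abstract calls for.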

A Study on Knowledge Entity Extraction Method for Individual Stocks Based on Neural Tensor Network (뉴럴 텐서 네트워크 기반 주식 개별종목 지식개체명 추출 방법에 관한 연구)

  • Yang, Yunseok;Lee, Hyun Jun;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.25-38
    • /
    • 2019
  • Selecting high-quality information that meets the interests and needs of users from the overflow of content is becoming ever more important as information continues to be generated. In this flood of information, efforts are being made to better reflect the user's intention in search results, rather than treating an information request as a simple string. Large IT companies such as Google and Microsoft also focus on developing knowledge-based technologies, including search engines, that provide users with satisfaction and convenience. Finance in particular is one of the fields expected to benefit from text data analysis, because it constantly generates new information, and the earlier the information, the more valuable it is. Automatic knowledge extraction can be effective in areas such as the financial sector, where the information flow is vast and new information continues to emerge. However, automatic knowledge extraction faces several practical difficulties. First, it is difficult to build corpora from different fields with the same algorithm, and it is difficult to extract good-quality triples. Second, producing labeled text data manually becomes harder as the extent and scope of knowledge increase and patterns are constantly updated. Third, performance evaluation is difficult because of the characteristics of unsupervised learning. Finally, defining the problem of automatic knowledge extraction is not easy because of the ambiguous conceptual characteristics of knowledge. To overcome these limits and improve the semantic performance of stock-related information search, this study extracts knowledge entities using a neural tensor network and evaluates their performance. Unlike previous work, the purpose of this study is to extract knowledge entities related to individual stock items.
Various but relatively simple data processing methods are applied in the presented model to solve the problems of previous research and to enhance the model's effectiveness. From these processes, this study has three significances. First, it presents a practical and simple automatic knowledge extraction method. Second, it demonstrates the possibility of performance evaluation through a simple problem definition. Finally, it increases the expressiveness of the knowledge by generating input data on a sentence basis without complex morphological analysis. The empirical analysis and an objective performance evaluation method are also presented. For the empirical study, experts' reports about 30 individual stocks, the top 30 items by publication frequency from May 30, 2017 to May 21, 2018, are used. The total number of reports is 5,600; 3,074 reports, about 55% of the total, are designated as the training set, and the remaining 45% as the testing set. Before constructing the model, all reports in the training set are classified by stock, and their entities are extracted using the KKMA named-entity recognition tool. For each stock, the top 100 entities by appearance frequency are selected and vectorized using one-hot encoding. Then, using the neural tensor network, one score function per stock is trained. When a new entity from the testing set appears, its score is calculated with every score function, and the stock whose function yields the highest score is predicted as the item related to the entity. To evaluate the presented model, we measure prediction power and determine whether the score functions are well constructed by calculating the hit ratio over all reports in the testing set.
As a result of the empirical study, the presented model shows 69.3% hit accuracy on a testing set of 2,526 reports. This hit ratio is meaningfully high despite some constraints on the research. Looking at the prediction performance for each stock, only three stocks, LG ELECTRONICS, KiaMtr, and Mando, show markedly lower performance than average, perhaps because of interference from other similar items and the generation of new knowledge. In this paper, we propose a methodology to find the key entities, or combinations of them, needed to search for related information in accordance with the user's investment intention. Graph data are generated using only the named-entity recognition tool and applied to the neural tensor network, without learning a field-specific corpus or word vectors. The empirical test confirms the effectiveness of the presented model as described above. However, some limits remain: notably, the especially poor performance on a few stocks shows the need for further research. Finally, through the empirical study, we confirmed that the learning method presented here can be used to semantically match new text information with the related stocks.
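The scoring step described above, one neural-tensor-network-style score function per stock applied to a one-hot entity vector, with the highest-scoring stock predicted as the related item, can be sketched as follows. The weights here are random and untrained, and the entity vocabulary and stock names are invented; the sketch only shows the mechanics, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(42)
n_entities, k = 6, 3  # toy entity vocabulary size, number of tensor slices

def make_score_fn():
    # NTN-style score: s(e) = u . tanh(e^T W e + V e + b)
    W = rng.normal(size=(k, n_entities, n_entities))  # bilinear tensor slices
    V = rng.normal(size=(k, n_entities))              # linear term
    b = rng.normal(size=k)
    u = rng.normal(size=k)
    def score(e):
        bilinear = np.array([e @ W[i] @ e for i in range(k)])
        return float(u @ np.tanh(bilinear + V @ e + b))
    return score

stocks = ["STOCK_A", "STOCK_B", "STOCK_C"]  # hypothetical stock items
score_fns = {s: make_score_fn() for s in stocks}

entity = np.eye(n_entities)[2]  # one-hot vector for entity #2
scores = {s: fn(entity) for s, fn in score_fns.items()}
predicted = max(scores, key=scores.get)
print(f"entity #2 is matched to {predicted}")
```

In the paper's setup, each stock's function is trained on that stock's top-100 entities, and the hit ratio counts how often the argmax matches the report's true stock.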

Evaluation of White Matter Abnormality in Mild Alzheimer Disease and Mild Cognitive Impairment Using Diffusion Tensor Imaging: A Comparison of Tract-Based Spatial Statistics with Voxel-Based Morphometry (확산텐서영상을 이용한 경도의 알츠하이머병 환자와 경도인지장애 환자의 뇌 백질의 이상평가: Tract-Based Spatial Statistics와 화소기반 형태분석 방법의 비교)

  • Lim, Hyun-Kyung;Kim, Sang-Joon;Choi, Choong-Gon;Lee, Jae-Hong;Kim, Seong-Yoon;Kim, Heng-Jun J.;Kim, Nam-Kug;Jahng, Geon-Ho
    • Investigative Magnetic Resonance Imaging
    • /
    • v.16 no.2
    • /
    • pp.115-123
    • /
    • 2012
  • Purpose: To evaluate white matter abnormalities on diffusion tensor imaging (DTI) in patients with mild Alzheimer disease (AD) and mild cognitive impairment (MCI), using tract-based spatial statistics (TBSS) and voxel-based morphometry (VBM). Materials and Methods: DTI was performed in 21 patients with mild AD, 13 with MCI, and 16 old healthy subjects. A fractional anisotropy (FA) map was generated for each participant and processed for voxel-based comparisons among the three groups using TBSS. For comparison, the DTI data were also processed using the VBM method. Results: TBSS showed that FA was significantly lower in the AD group than in the old healthy group in the bilateral anterior and right posterior corona radiata, the posterior thalamic radiation, the right superior longitudinal fasciculus, the body of the corpus callosum, and the right precuneus gyrus. VBM identified additional areas of reduced FA, including both uncinate fasciculi, the left parahippocampal white matter, and the right cingulum. There were no significant differences in FA between the AD and MCI groups, or between the MCI and old healthy groups. Conclusion: TBSS showed multifocal abnormalities in white matter integrity in patients with AD compared with the old healthy group. VBM could detect more white matter lesions than TBSS, but with increased artifacts.
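The voxel-wise quantity compared in both TBSS and VBM above, fractional anisotropy, is a standard function of the three eigenvalues of the diffusion tensor: FA = sqrt(3/2) · ||λ − mean(λ)|| / ||λ||. A direct implementation of that formula:

```python
import numpy as np

def fractional_anisotropy(l1, l2, l3):
    # FA = sqrt(3/2) * ||lambda - mean(lambda)|| / ||lambda||
    lam = np.array([l1, l2, l3], dtype=float)
    num = np.sqrt(((lam - lam.mean()) ** 2).sum())
    den = np.sqrt((lam ** 2).sum())
    return float(np.sqrt(1.5) * num / den) if den else 0.0

# isotropic diffusion (e.g. CSF): equal eigenvalues give FA = 0
print(fractional_anisotropy(1.0, 1.0, 1.0))
# strongly directional diffusion (e.g. an intact white-matter tract): FA near 1
print(round(fractional_anisotropy(1.7, 0.2, 0.2), 3))
```

FA ranges from 0 (isotropic) to 1 (diffusion along a single axis), which is why reduced FA along tracts is read as a loss of white matter integrity.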

The Significance of Registration Convention and its Future Challenges in Space Law (등록협약의 우주법상 의의와 미래과제에 관한 연구)

  • Kim, Han-Taek
    • The Korean Journal of Air & Space Law and Policy
    • /
    • v.35 no.2
    • /
    • pp.375-402
    • /
    • 2020
  • The adoption and entry into force of the Registration Convention was another achievement in expanding and strengthening the corpus iuris spatialis. It was the fourth treaty negotiated by the member states of UNCOPUOS, and it further elaborates Articles 5 and 8 of the Outer Space Treaty (OST). The Registration Convention also complements and strengthens Article 11 of the OST, which obliges state parties to inform the UN Secretary-General of the nature, conduct, locations, and results of their space activities in order to promote international cooperation. The prevailing purpose of the Registration Convention is the clarification of "jurisdiction and control" as a comprehensive concept mentioned in Article 8 of the OST. Beyond this overriding objective, the Registration Convention also contributes to promoting the exploration and use of outer space for peaceful purposes. Establishing and maintaining a public register reduces the possibility of unidentified space objects and thereby lowers risks such as, for example, secretly putting weapons of mass destruction into orbit; furthermore, it could serve better space traffic management. The Registration Convention is a treaty established to implement Article 5 of the OST, on the rescue and return of astronauts, in more detail. In this respect, if the OST is a general law, the Registration Convention is a special law; if the two laws conflict, the principle of lex specialis applies. Countries that have not joined the Registration Convention must follow the registration rules of paragraph 7 of the Declaration adopted by UN General Assembly Resolution 1721 (XVI) of 1961.
UN Resolution 1721 (XVI) is essentially non-binding, but appears to have evolved into a norm of customary international law requiring all states launching space objects into orbit or beyond to promptly provide information about their launchings for registration to the United Nations. However, the nature and scope of the information to be supplied is left to the discretion of the notifying state. The Registration Convention was created for compulsory registration of space objects by nations, but in reality it does not deviate from existing practice because it is based on voluntary registration. Given the new problems raised by the commercialization and privatization of the space market, issues related to the definition of a 'space object', including which state should act as the state of registry for space objects purchased by another state and the space debris caused by defunct space objects launched by the registry state, should be considered when amendments, additional protocols, or a new Registration Convention are established. The registration of commercial flight vehicles that travel briefly through sub-orbital space should also be considered.

Target Word Selection Disambiguation using Untagged Text Data in English-Korean Machine Translation (영한 기계 번역에서 미가공 텍스트 데이터를 이용한 대역어 선택 중의성 해소)

  • Kim Yu-Seop;Chang Jeong-Ho
    • The KIPS Transactions:PartB
    • /
    • v.11B no.6
    • /
    • pp.749-758
    • /
    • 2004
  • In this paper, we propose a new method that utilizes only a raw corpus, without additional human effort, to disambiguate target word selection in English-Korean machine translation. We use two data-driven techniques: Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA). These techniques can represent complex semantic structures in given contexts, such as text passages. We construct linguistic semantic knowledge using the two techniques and apply it to target word selection in English-Korean machine translation, utilizing grammatical relationships stored in a dictionary. To resolve the data sparseness problem in target word selection, we use the k-nearest neighbor learning algorithm, estimating the distance between instances based on these models. In experiments, we use the TREC AP news data to construct the latent semantic space and the Wall Street Journal corpus to evaluate target word selection. With the latent semantic analysis methods, the accuracy of target word selection improved by over 10%, and PLSA showed better accuracy than LSA. Finally, using correlation analysis, we showed how the accuracy relates to two important factors: the dimensionality of the latent space and the k value of k-NN learning.
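The selection step described above can be sketched as k-nearest-neighbor voting in a latent semantic space. The fixed 2-d "LSA" coordinates, the ambiguous word "bank" with its two Korean target words, and the stored examples are all invented for illustration, not the paper's TREC/WSJ data.

```python
import numpy as np

# pretend latent-space coordinates for context words (stand-ins for LSA output)
latent = {
    "money": np.array([1.0, 0.1]), "loan": np.array([0.9, 0.2]),
    "deposit": np.array([0.95, 0.15]),
    "river": np.array([0.1, 1.0]), "water": np.array([0.2, 0.9]),
    "fishing": np.array([0.15, 0.95]),
}
# stored instances for the ambiguous English word "bank":
# context vector -> correct Korean target word for that sense
examples = [
    (latent["money"], "은행"), (latent["loan"], "은행"), (latent["deposit"], "은행"),
    (latent["river"], "둑"), (latent["water"], "둑"), (latent["fishing"], "둑"),
]

def select_target(context_words, k=3):
    # average the context words' latent vectors, then k-NN majority vote
    ctx = np.mean([latent[w] for w in context_words if w in latent], axis=0)
    nearest = sorted(examples, key=lambda ex: np.linalg.norm(ex[0] - ctx))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

print(select_target(["money", "deposit"]))  # financial context
print(select_target(["river", "fishing"]))  # riverside context
```

Sparse contexts that match no stored instance exactly still land near one sense cluster in the latent space, which is how the k-NN step addresses data sparseness.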

Comparison of vowel lengths of articles and monosyllabic nouns in Korean EFL learners' noun phrase production in relation to their English proficiency (한국인 영어학습자의 명사구 발화에서 영어 능숙도에 따른 관사와 단음절 명사 모음 길이 비교)

  • Park, Woojim;Mo, Ranm;Rhee, Seok-Chae
    • Phonetics and Speech Sciences
    • /
    • v.12 no.3
    • /
    • pp.33-40
    • /
    • 2020
  • The purpose of this research was to find the relation between Korean learners' English proficiency and the ratio of the length of the stressed vowel in a monosyllabic noun to that of the unstressed vowel in the article of a noun phrase (e.g., "a cup", "the bus", etc.). Generally, the vowels in monosyllabic content words are phonetically more prominent than those in monosyllabic function words, as the former carry phrasal stress, making the vowels in content words longer, higher in pitch, and louder in amplitude. This study, based on speech samples from the Korean-Spoken English Corpus (K-SEC) and the Rated Korean-Spoken English Corpus (Rated K-SEC), examined 879 English noun phrases, each composed of an article and a monosyllabic noun, from sentences rated at four levels of proficiency. The lengths of the vowels in these 879 target NPs were measured, and the ratio of the vowel lengths in nouns to those in articles was calculated. It turned out that the higher the proficiency level, the greater the mean ratio of the vowels in nouns to the vowels in articles, confirming the research hypothesis. This research thus concluded that the higher Korean English learners' proficiency, the more conspicuous the length differences they produce between stressed and unstressed vowels.
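The measurement compared above, the per-phrase ratio of the stressed noun vowel's duration to the unstressed article vowel's duration, averaged within each proficiency level, reduces to a short computation. The durations below are invented illustrative values, not the K-SEC measurements.

```python
# (proficiency_level, article_vowel_ms, noun_vowel_ms) per noun phrase
phrases = [
    (1, 80, 100), (1, 90, 105),   # low-proficiency speakers: small contrast
    (4, 60, 150), (4, 55, 160),   # high-proficiency speakers: large contrast
]

def mean_ratio(level):
    # ratio of noun-vowel duration to article-vowel duration, averaged per level
    ratios = [noun / art for lv, art, noun in phrases if lv == level]
    return sum(ratios) / len(ratios)

low, high = mean_ratio(1), mean_ratio(4)
print(f"level 1 mean ratio {low:.2f}, level 4 mean ratio {high:.2f}")
```

The study's finding corresponds to this ratio growing monotonically with proficiency level across the 879 measured noun phrases.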

The Implementation and limits of Involuntary Detention of the Tuberculosis Prevention Act (결핵예방법의 격리명령의 실행과 한계에 관하여)

  • Kim, Jang Han
    • The Korean Society of Law and Medicine
    • /
    • v.16 no.2
    • /
    • pp.55-84
    • /
    • 2015
  • Tuberculosis is an infectious disease. An active tuberculosis patient can typically infect about ten people per year through daily activities such as casual conversation and singing together. The infectivity of tuberculosis can continue for a lifetime, and infected persons remain at risk of developing active tuberculosis. To control this contagious disease, along with active tuberculosis patients, non-infectious but non-compliant patients, who can become infectious if their immune systems become impaired, must also be managed. To control non-compliant patients, medical treatment orders must be combined with public orders. Because tuberculosis is a risk to community health, human rights such as liberty and freedom of movement can be restricted for the public welfare under Article 37(2) of the Constitution. Even when such restrictions are imposed, no essential aspect of the freedom or right may be violated. The degree of restriction on citizens' rights differs depending on which methods are applied to non-compliant patients. For example, under a directly observed therapy program, the patient and medical staff make an appointment and meet to confirm drug intake according to schedule, which is medical treatment combined with the mildest public order. If a patient breaks the appointments or has a history of disobedience, involuntary detention can obtain a legitimate cause. The Tuberculosis Prevention Act has a two-step program of involuntary detention: the admission order (Article 15) is issued when the patient is infectious, and the quarantine order (Article 15-2) is issued when the patient is infectious and non-compliant. The legal criteria for involuntary detention have been discussed and published through international conventions and covenants; for example, the World Health Organization has issued guidance on human rights and involuntary detention for tuberculosis control.
The restrictions should be carried out in accordance with our law and in pursuit of a legitimate public-interest objective, and they should be based on scientific evidence and not imposed in an unreasonable or discriminatory manner. We define and adopt these international criteria under our constitution and legal system. The least-restrictive-alternative principle, the proportionality principle, and individual evaluation methods are explained through reviews of United States court decisions. The Habeas Corpus Act is reviewed and adopted as the procedural due process protecting patients' rights as citizens. Finally, the conditions and facilities needed to carry out a quarantine order are discussed.
