• Title/Summary/Keyword: Data-Driven Method


Target Word Selection Disambiguation using Untagged Text Data in English-Korean Machine Translation (영한 기계 번역에서 미가공 텍스트 데이터를 이용한 대역어 선택 중의성 해소)

  • Kim Yu-Seop;Chang Jeong-Ho
    • The KIPS Transactions:PartB
    • /
    • v.11B no.6
    • /
    • pp.749-758
    • /
    • 2004
  • In this paper, we propose a new method that utilizes only a raw corpus, without additional human effort, for disambiguation of target word selection in English-Korean machine translation. We use two data-driven techniques: one is Latent Semantic Analysis (LSA) and the other is Probabilistic Latent Semantic Analysis (PLSA). These two techniques can represent complex semantic structures in given contexts, such as text passages. We construct linguistic semantic knowledge using the two techniques and apply that knowledge to target word selection in English-Korean machine translation. For target word selection, we utilize grammatical relationships stored in a dictionary. We use the k-nearest neighbor learning algorithm to resolve the data sparseness problem in target word selection, estimating the distance between instances based on these models. In experiments, we use TREC AP news data to construct the latent semantic space and the Wall Street Journal corpus to evaluate target word selection. With the latent semantic analysis methods, the accuracy of target word selection improved by over 10%, and PLSA showed better accuracy than LSA. Finally, we show the relationship between accuracy and two important factors, the dimensionality of the latent space and the k value in k-NN learning, using correlation analysis.
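
The LSA-plus-k-NN idea described in the abstract can be sketched roughly as follows. This is a minimal illustration with hypothetical toy data and names, not the authors' implementation (the paper builds the space from the TREC AP corpus and compares dictionary-derived grammatical contexts):

```python
import numpy as np

def lsa_space(term_doc, k=2):
    """Project a term-document count matrix into a k-dimensional
    latent semantic space via truncated SVD (the core step of LSA)."""
    U, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k] * s[:k]              # one k-dim vector per term

def knn_select(context_vec, candidates, k=3):
    """Choose the target word whose labeled example vectors are, averaged
    over the k nearest, most similar (cosine) to the context vector."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    best, best_score = None, -np.inf
    for label, vecs in candidates.items():
        sims = sorted((cos(context_vec, v) for v in vecs), reverse=True)
        score = sum(sims[:k]) / min(k, len(sims))
        if score > best_score:
            best, best_score = label, score
    return best
```

Measuring distances in the latent space rather than over raw co-occurrence counts is what lets k-NN work despite data sparseness.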

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.219-240
    • /
    • 2018
  • Sentiment analysis, one of the text mining techniques, is a method for extracting the subjective content embedded in text documents. Recently, sentiment analysis methods have been widely used in many fields. As good examples, data-driven surveys are based on analyzing the subjectivity of text data posted by users, and market research is conducted by analyzing users' review posts to quantify their evaluation of a target product. The basic approach to sentiment analysis is to use a sentiment dictionary (or lexicon), a list of sentiment vocabulary items with positive, neutral, or negative semantics. In general, the meaning of many sentiment words is likely to differ across domains. For example, the sentiment word 'sad' carries a negative meaning in most domains, but not necessarily in the movie domain. To perform accurate sentiment analysis, we need to build a sentiment dictionary for each given domain. However, building such a sentiment lexicon is time-consuming, and without a general-purpose sentiment lexicon as a starting point, many sentiment vocabulary items are missed. To address this problem, several studies have constructed sentiment lexicons for specific domains based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer in service, and SentiWordNet does not work well because of losses incurred when converting Korean words into English. These restrictions limit the use of such general-purpose sentiment lexicons as seed data for building a domain-specific sentiment lexicon. In this article, we construct the 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons.
The proposed dictionary, a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is built so that a sentiment dictionary for a target domain can be constructed quickly. In particular, it builds its sentiment vocabulary by analyzing the glosses contained in the Standard Korean Language Dictionary (SKLD) through the following procedure: First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each gloss as having either positive or negative meaning. Third, positive words and phrases are extracted from the glosses classified as positive, while negative words and phrases are extracted from the glosses classified as negative. Our experimental results show that the average accuracy of the proposed sentiment classification model reaches 89.45%. In addition, the sentiment dictionary is further extended using various external sources, including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information for frequently used coined words and emoticons that appear mainly on the Web. KNU-KSL contains a total of 14,843 sentiment vocabulary items, each of which is a 1-gram, a 2-gram, a phrase, or a sentence pattern. Unlike existing sentiment dictionaries, it is composed of words that are not tied to particular domains. The recent trend in sentiment analysis is to use deep learning techniques without sentiment dictionaries, and the perceived importance of developing sentiment dictionaries has gradually declined. However, a recent study shows that the words in a sentiment dictionary can be used as features of deep learning models, allowing sentiment analysis to be performed with higher accuracy (Teng, Z., 2016). This result indicates that a sentiment dictionary is useful not only for sentiment analysis itself but also as a source of features for improving the accuracy of deep learning models.
The proposed dictionary can be used as basic data for constructing the sentiment lexicon of a particular domain and as a source of features for deep learning models. It is also useful for automatically and quickly building large training sets for deep learning models.
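
Steps two and three of the procedure above (classify each gloss, then harvest headwords by polarity) can be sketched as below. The keyword-based classifier is a trivial stand-in for the paper's Bi-LSTM model, and all names are illustrative:

```python
def build_lexicon(glosses, classify):
    """Classify each dictionary gloss as 'positive' or 'negative' and
    collect the headwords accordingly (steps 2-3 of the procedure)."""
    lexicon = {"positive": [], "negative": []}
    for headword, gloss in glosses.items():
        lexicon[classify(gloss)].append(headword)
    return lexicon

def toy_classifier(gloss):
    """Trivial keyword stand-in for the Bi-LSTM gloss classifier."""
    positive_cues = ("grateful", "joy", "worthy", "pleased")
    return "positive" if any(cue in gloss for cue in positive_cues) else "negative"
```

In the paper, the classifier is trained on labeled glosses, so the same pipeline also serves to generate large training sets automatically.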

A Study on Land Acquisition Priority for Establishing Riparian Buffer Zones in Korea (수변녹지 조성을 위한 토지매수 우선순위 산정 방안 연구)

  • Hong, Jin-Pyo;Lee, Jae-Won;Choi, Ok-Hyun;Son, Ju-Dong;Cho, Dong-Gil;Ahn, Tong-Mahn
    • Journal of the Korean Society of Environmental Restoration Technology
    • /
    • v.17 no.4
    • /
    • pp.29-41
    • /
    • 2014
  • The Korean government has purchased land parcels alongside significant water bodies before setting up buffers to secure water quality. Since annual budgets are limited, however, there has always been the issue of which land parcels ought to be given priority. Therefore, this study aims to develop an efficient mechanism for setting land acquisition priorities in stream corridors that would ultimately be vegetated as riparian buffer zones. The criteria for land acquisition priority were derived through a literature review along with experts' advice. The relative weights and priorities for each criterion were computed using the Analytic Hierarchy Process (AHP) method. Major findings of the study are as follows: 1. The decision-making structural model for land acquisition priority focuses mainly on the reduction of non-point source pollutants (NSPs). This is highly associated with natural and physical conditions and the land use types of surrounding areas. The criteria were classified into two categories: NSP runoff areas and potential NSP runoff areas. 2. The land acquisition priority weights derived for NSP runoff areas and potential NSP runoff areas were 0.862 and 0.138, respectively, implying that much higher priority should be given to land parcels in NSP runoff areas. 3. The weights and priorities of the sub-criteria suggested by this study are: proximity to streams (0.460), land cover (0.189), soil permeability (0.117), topographical slope (0.096), proximity to roads (0.058), land-use type (0.036), visibility of the streams (0.032), and land price (0.012). This order of importance suggests, as one would expect, that it is better to purchase land parcels adjacent to the streams. 4. A standard scoring system including the criteria and weights for land acquisition priority was developed, which should allow expedited decision-making and easy quantification in priority evaluation thanks to the use of measurable spatial data. Further studies focusing on both point and non-point pollutants and on GIS-based spatial analysis and mapping of land acquisition priority are needed.
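
The AHP weighting used above can be sketched generically: priority weights are the normalized principal eigenvector of a pairwise comparison matrix, with a consistency ratio as a sanity check. The matrix below is a toy example, not the study's expert judgments:

```python
import numpy as np

def ahp_weights(pairwise):
    """AHP priority weights: the normalized principal eigenvector of
    the pairwise comparison matrix."""
    A = np.asarray(pairwise, dtype=float)
    vals, vecs = np.linalg.eig(A)
    w = np.abs(vecs[:, np.argmax(vals.real)].real)
    return w / w.sum()

def consistency_ratio(pairwise):
    """CR = CI / RI; a CR below 0.1 is conventionally acceptable."""
    A = np.asarray(pairwise, dtype=float)
    n = A.shape[0]
    lam_max = np.max(np.linalg.eigvals(A).real)
    ci = (lam_max - n) / (n - 1)                  # consistency index
    random_index = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41}
    return ci / random_index[n]
```

For a perfectly consistent matrix the weights reproduce the judged ratios exactly and the consistency ratio is zero.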

Temporal and Spatial Characteristics of Visual and Somatosensory Integration in Normal Adult Brain (정상성인의 시각 및 촉각 통합 작용 시 뇌신경세포의 전기생리적활동의 시간 및 공간적 특성: 예비실험)

  • Ju, Yu-Mi;Kim, Ji-Hyun
    • The Journal of Korean Academy of Sensory Integration
    • /
    • v.8 no.1
    • /
    • pp.41-49
    • /
    • 2010
  • Objective : Multisensory integration (MSI) is an essential process for using diverse sensory information in cognitive tasks or the execution of motor actions. In particular, visual and somatosensory integration is critical for motor behavior and coordination. This study was designed to characterize the spatial and temporal properties of visual and somatosensory integration using a neurophysiological method that identifies the time course and brain location of the MSI process. Methods : Electroencephalography (EEG) and event-related potentials (ERPs) were used to observe neural activity during the integration of visual and tactile input. We calculated the linear summation (SUM) of visual evoked potentials (VEPs) and somatosensory evoked potentials (SEPs), and compared the SUM with the ERPs from simultaneously presented visual-tactile stimuli (SIM). Results : There were significant differences between the SIM and the SUM in later time epochs (about 200-300 ms) at the contralateral somatosensory area (C4) and the occipital cortices (O1 and O2). The amplitude of the SIM was larger than that of the summed signals, implying that the integration produced additional neural activity. Conclusion : This study provides empirical neural evidence that multisensory integration is more than just the combination of two unisensory inputs in the brain, and that ERP data reveal a neural signature of the multisensory integrative process. Since this is a preliminary pilot study, a larger sample and stricter criteria are needed to establish significance. Further studies should consider issues including the effect of internally driven attention and the laterality of the interaction in order to solidify the evidence provided here.
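
The SUM-versus-SIM comparison can be sketched numerically: sum the two unimodal ERPs and compare mean amplitudes in the 200-300 ms window. This is a minimal sketch on synthetic arrays, not the study's EEG pipeline:

```python
import numpy as np

def superadditivity(vep, sep, sim, t, window=(0.2, 0.3)):
    """Mean |SIM| minus mean |SUM| (SUM = VEP + SEP) within a time
    window; a positive value indicates neural activity beyond the
    linear summation of the unimodal responses."""
    mask = (t >= window[0]) & (t <= window[1])
    sum_resp = vep + sep
    return float(np.mean(np.abs(sim[mask])) - np.mean(np.abs(sum_resp[mask])))
```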


Mega Flood Simulation Assuming Successive Extreme Rainfall Events (연속적인 극한호우사상의 발생을 가정한 거대홍수모의)

  • Choi, Changhyun;Han, Daegun;Kim, Jungwook;Jung, Jaewon;Kim, Duckhwan;Kim, Hung Soo
    • Journal of Wetlands Research
    • /
    • v.18 no.1
    • /
    • pp.76-83
    • /
    • 2016
  • Recently, a series of extreme storm events caused by successive typhoons resulted in severe flood damage, including loss of life and destruction of property. In this study, we use the term Mega flood for the extreme flood produced by such successive storm events, and we generate a hypothetical Mega flood by assuming that an extreme event can occur again after a certain time interval. The Inter-Event Time Definition (IETD) method was used to determine the time interval between consecutive events in order to simulate a Mega flood. Consecutive extreme rainfall events are therefore determined with the IETD, and a Mega flood is simulated from the consecutive events: (1) the consecutive occurrence of two historical extreme events, and (2) the consecutive occurrence of two design events obtained by frequency analysis based on the historical data. We show that Mega floods from consecutive extreme rainfall events were 6-17% larger than typical floods from a single event. We can expect the flood damage caused by a Mega flood to be much greater than the damage driven by a single rainfall event: the increase in the second flood peak is modest compared to the first, but consecutive heavy rainfall brings flood damage twice. Therefore, the flood damage caused by the hypothetical Mega flood is judged to be very large. The hypothetical rainfall events used here, which can produce Mega floods, could be used to prepare for unexpected flood disasters by simulating Mega floods as defined in this study.
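
The construction of the consecutive event can be sketched simply: two extreme hyetographs joined by a dry gap of one IETD. This is an illustrative sketch of the input construction only, not the study's rainfall-runoff simulation:

```python
import numpy as np

def mega_event(rain_a, rain_b, ietd_hours, dt_hours=1.0):
    """Concatenate two extreme rainfall hyetographs separated by a dry
    gap of one IETD, giving the input for a hypothetical Mega flood."""
    gap = np.zeros(int(round(ietd_hours / dt_hours)))
    return np.concatenate([np.asarray(rain_a, float), gap,
                           np.asarray(rain_b, float)])
```

The combined series then drives an ordinary rainfall-runoff model, and the resulting peak is compared with that of the single event.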

Submarket Identification in Property Markets: Focusing on a Hedonic Price Model Improvement (부동산 하부시장 구획: 헤도닉 모형의 개선을 중심으로)

  • Lee, Chang Ro;Eum, Young Seob;Park, Key Ho
    • Journal of the Korean Geographical Society
    • /
    • v.49 no.3
    • /
    • pp.405-422
    • /
    • 2014
  • Two important issues in hedonic modeling are specifying an accurate model and delineating submarkets. While the former has seen much improvement over recent decades, the latter has received relatively little attention. However, the accuracy of estimates from a hedonic model is necessarily reduced when the analysis does not adequately address market segmentation, which captures the spatial scale of the price formation process in real estate. Placing emphasis on improving the performance of the hedonic model, this paper segments the real estate markets of Gangnam-gu and Jungrang-gu, the most heterogeneous and the most homogeneous, respectively, of the 25 autonomous districts of Seoul. First, we calculated variable coefficients from a mixed geographically weighted regression model (mixed GWR model) as input for clustering, since the coefficients of a hedonic model can be interpreted as the shadow prices of the attributes constituting real estate. We then developed a spatially constrained, data-driven methodology that preserves spatial contiguity by utilizing the SKATER algorithm, which is based on a minimum spanning tree. Finally, the performance of this method was verified by applying a multi-level model. We conclude that no submarket exists in Jungrang-gu, while five submarkets centered on arterial roads are reasonable for Gangnam-gu. Urban infrastructure such as arterial roads has not so far been considered an important factor for delineating submarkets, but we found empirically that it plays a key role in market segmentation.
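
The SKATER idea (contiguity-constrained clustering by cutting a minimum spanning tree) can be sketched as follows. The coefficients and contiguity edges here are toy stand-ins for the mixed-GWR shadow prices and parcel adjacency used in the paper, and the cut rule (remove the heaviest edge) is a simplification of SKATER's heterogeneity-based cut:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

def skater_like(coef, edges, n_clusters=2):
    """SKATER-style sketch: build a minimum spanning tree over the
    contiguity graph, weighted by dissimilarity of hedonic coefficients,
    then cut the heaviest MST edges to obtain contiguous clusters."""
    n = len(coef)
    w = np.zeros((n, n))
    for i, j in edges:                        # contiguity edges only
        d = np.linalg.norm(np.asarray(coef[i]) - np.asarray(coef[j]))
        w[i, j] = w[j, i] = d + 1e-9          # keep zero-distance edges in the graph
    mst = minimum_spanning_tree(csr_matrix(w)).toarray()
    for _ in range(n_clusters - 1):           # each cut adds one component
        i, j = np.unravel_index(np.argmax(mst), mst.shape)
        mst[i, j] = 0.0
    return connected_components(csr_matrix(mst), directed=False)[1]
```

Because only contiguity edges carry weights, every resulting cluster is spatially connected by construction, which is the property that makes the method suitable for submarket delineation.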


Performance of a Closed-Loop Power Control Using a Variable Step-size Control Scheme in a DS/CDMA LEO Mobile Satellite System (DS/CDMA 저궤도 이동 위성 시스템에서 가변 스텝사이즈 조절 방식 폐루프 전력제어의 성능분석)

  • 전동근;이연우;홍선표
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.1
    • /
    • pp.16-24
    • /
    • 2000
  • In this paper, the performance of a closed-loop power control scheme using a variable step-size decision method is evaluated for DS/CDMA-based low earth orbit (LEO) mobile satellite systems, in which the long round-trip delay is a dominant performance degradation factor. Because there are fundamental differences between the characteristics of the LEO mobile satellite channel and the terrestrial mobile channel, such as the long round-trip delay and the varying elevation angle, these factors are considered in channel modeling based on European Space Agency (ESA) measurement data. Since the round-trip delay (from the mobile terminal to the gateway station via satellite) is typically 10∼20 ms in low-altitude satellite channels, closed-loop power control is much less effective than it is on a terrestrial channel. Thus, an adaptive power control scheme using variable step-size control is essential for overcoming the long round-trip delay and the fading due to the elevation angle. It is shown that the standard deviation of the signal-to-interference ratio (SIR) under a variable step-size closed-loop power control scheme is much smaller than that under fixed step-size closed-loop power control. Furthermore, we conclude that the optimal measurement interval for power control commands is twice the round-trip delay.
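
The effect of delayed commands and a variable step size can be illustrated with a toy simulation. The adaptation rule here (double the step while consecutive commands agree, capped at four times the base step) is a common variable step-size heuristic standing in for the paper's scheme, and the channel is a simple deterministic ramp rather than the ESA-based fading model:

```python
import numpy as np

def track_sir(fading_db, target_db=0.0, step_db=1.0, variable=False, delay=2):
    """Closed-loop SIR tracking where each up/down command takes effect
    `delay` slots later (the round-trip delay). With variable=True the
    step doubles while consecutive commands agree, capped at 4x base."""
    tx, sir, cmds, cur = 0.0, [], [], step_db
    for k, fade in enumerate(fading_db):
        rx = tx + fade                       # received SIR in dB
        sir.append(rx)
        cmds.append(1.0 if rx < target_db else -1.0)
        if k >= delay:                       # apply the delayed command
            if variable and k > delay and cmds[k - delay] == cmds[k - delay - 1]:
                cur = min(cur * 2.0, 4.0 * step_db)
            else:
                cur = step_db
            tx += cmds[k - delay] * cur
    return float(np.std(sir))
```

On a channel that degrades faster than the fixed step can follow, the fixed-step loop falls steadily behind while the variable-step loop keeps the SIR oscillating near the target, which is the behavior the SIR standard deviation captures.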


Axial Load Capacity Prediction of Single Piles in Clay and Sand Layers Using Nonlinear Load Transfer Curves (비선형 하중전이법에 의한 점토 및 모래층에서 파일의 지지력 예측)

  • Kim, Hyeongjoo;Mission, Joseleo;Song, Youngsun;Ban, Jaehong;Baeg, Pilsoon
    • Journal of the Korean GEO-environmental Society
    • /
    • v.9 no.5
    • /
    • pp.45-52
    • /
    • 2008
  • The present study has extended OpenSees, an open-source software framework for developing applications that idealize geotechnical and structural problems, from a command-line (DOS) program into an MS Windows application for the static analysis of the axial load capacity and settlement of single piles. The Windows version of OpenSees improved in this study enhances the original from a general-purpose software program to a special-purpose program for driven and bored pile analysis, with additional pre-processing and post-processing features and a user-friendly graphical interface. The load capacity analysis uses a numerical method based on load transfer functions combined with finite elements. The use of empirical nonlinear T-z and Q-z load transfer curves to model soil-pile interaction in skin friction and end bearing, respectively, has been shown to capture the nonlinear soil-pile response to settlement under load. Validation studies show that the static load capacity and settlement predictions implemented in this study are in fair agreement with reference data from static loading tests.
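
The load transfer idea can be sketched with a hyperbolic T-z curve, a common empirical form used here purely as an illustration (not the specific curves implemented in the study): mobilized skin friction approaches its limit t_max as local displacement grows.

```python
def tz_hyperbolic(z, t_max, z_ref):
    """Hyperbolic T-z load transfer: resistance mobilized at local
    displacement z, where t_max is the ultimate value and z_ref is the
    displacement at which half of t_max is mobilized."""
    return t_max * z / (z_ref + z)

def pile_head_load(settlement, shaft_segments, q_tip_max, z_tip_ref):
    """Rigid-pile sketch: total axial load mobilized at a given
    settlement = shaft resistance summed over segments (T-z curves)
    plus end bearing (a Q-z curve of the same hyperbolic form)."""
    shaft = sum(tz_hyperbolic(settlement, t_max, z_ref) * area
                for t_max, z_ref, area in shaft_segments)
    return shaft + tz_hyperbolic(settlement, q_tip_max, z_tip_ref)
```

In the actual method the pile is discretized into finite elements, so each node has its own displacement rather than a single rigid-body settlement.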


Accuracy Analysis of Velocity and Water Depth Measurement in the Straight Channel using ADCP (ADCP를 이용한 직선 하천의 유속 및 수심 측정 정확도 분석)

  • Kim, Jongmin;Kim, Dongsu;Son, Geunsoo;Kim, Seojun
    • Journal of Korea Water Resources Association
    • /
    • v.48 no.5
    • /
    • pp.367-377
    • /
    • 2015
  • ADCPs have attracted attention for measuring streamflow discharge because of their high order of accuracy, relatively low cost, and the small number of field operators required, owing to their easy in-situ operation. While ADCPs have become increasingly dominant in the hydrometric field, their actual accuracy for velocity and bathymetry measurement has not been sufficiently validated due to the lack of reliable benchmark data, and consequently there are still many uncertain aspects of using ADCPs in the field. This research analyzed inter-comparison results between ADCP measurements and detailed ADV measurements in a specified field environment. Overall, 184 ADV points were collected on a densely designed grid over a cross-section 6 m wide and 1 m deep with an average mean flow velocity of 0.7 m/s. Concurrently, fixed-point ADCP measurements were conducted at 0.2 m horizontal and 0.02 m vertical spacing. The inter-comparison indicated that the ADCP matched the ADV velocity very accurately for relative depths (y/h) of 0.4-0.8, but noticeable deviations occurred near the surface and bottom. In evaluating the bathymetry-measuring capability of ADCPs, bottom-tracking bathymetry based on the oblique beams performed better than the vertical-beam approach, with similar results for the fixed and moving-boat methods. This error analysis of ADCP velocity and bathymetry measurements can potentially be utilized for a more detailed uncertainty analysis of ADCP discharge measurement.

Empirical Mode Decomposition using the Second Derivative (이차 미분을 이용한 경험적 모드분해법)

  • Park, Min-Su;Kim, Donghoh;Oh, Hee-Seok
    • The Korean Journal of Applied Statistics
    • /
    • v.26 no.2
    • /
    • pp.335-347
    • /
    • 2013
  • There are various types of real-world signals. For example, an electrocardiogram (ECG) represents myocardial activity (contraction and relaxation) according to the beating of the heart, and can be expressed as the fluctuation of amperage over time. A signal is a composite of various component signals: an orchestra consists of a variety of instruments, each with its own frequency, and their sounds combine to form a harmony. Various studies on how to decompose mixed stationary signals have been conducted, but for non-stationary signals there are limits to using methodologies designed for stationary signals. Huang et al. (1998) proposed empirical mode decomposition (EMD) to deal with non-stationarity. EMD provides a data-driven approach to decompose a signal into intrinsic mode functions according to local oscillation, through the identification of local extrema. However, due to the repeated process of constructing envelopes, the EMD algorithm is inefficient and not robust to noise, and its computational complexity tends to increase as the size of the signal grows. In this research, we propose a new method to extract the local oscillation embedded in a signal by utilizing the second derivative.
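
One sifting step of standard EMD (the envelope-construction loop whose repetition the authors aim to avoid) can be sketched as below. This is a textbook-style sketch of the baseline method, not the proposed second-derivative approach:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x, t):
    """One EMD sifting step: subtract the mean of the cubic-spline
    envelopes drawn through the local maxima and minima of x."""
    hi = argrelextrema(x, np.greater)[0]
    lo = argrelextrema(x, np.less)[0]
    if len(hi) < 2 or len(lo) < 2:
        return x                 # too few extrema to build envelopes
    upper = CubicSpline(t[hi], x[hi])(t)
    lower = CubicSpline(t[lo], x[lo])(t)
    return x - (upper + lower) / 2.0
```

Full EMD repeats this step until the result satisfies the intrinsic-mode-function conditions, then subtracts the extracted mode and starts again, which is exactly the repetition that makes the algorithm costly and noise-sensitive.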