• Title/Summary/Keyword: Models, statistical

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.39-70 / 2017
  • Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to study preferences for a recently launched product, enterprises need a way to collect and analyze various types of online data and then accurately determine customer preferences. In existing data-based survey methods, domain experts first construct the sentiment lexicon for a particular domain by judging the positive, neutral, or negative meanings of words that occur frequently in the collected text documents. To study the preference for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after pre-processing steps such as stemming and stop-word removal; (3) classifies the polarity (positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) that carry positive or negative meaning toward the product. As a motivating example, given a product such as the Sonata made by Hyundai Motors, customers often want to see a summary note that includes the positive points in the 'car design' aspect as well as the negative points in the same aspect. They also want more useful information regarding other aspects such as 'car quality', 'car performance', and 'car service.' Such information will enable customers to make good choices when they purchase brand-new vehicles.
In addition, automobile makers will be able to determine the preference and the positive and negative points for new models on the market. In the near future, the weak points of the models can be improved through sentiment analysis. For this, the existing approach computes the sentiment score of each sentence (or phrase) and then selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings that limit its use in real applications. Its main disadvantages are as follows: (1) The main aspects (e.g., car design, quality, performance, and service) of a product (e.g., Hyundai Sonata) are not considered. Because the sentiment analysis ignores aspects, customers and car makers receive only a summary note containing the positive and negative ratios of the product and the top-k sentences (or phrases) with the highest sentiment scores in the entire corpus. This is not enough; the main aspects of the target product need to be considered in the sentiment analysis. (2) In general, since the same word has different meanings across domains, a sentiment lexicon appropriate to each domain needs to be constructed. An efficient way to construct the sentiment lexicon per domain is required because lexicon construction is labor intensive and time consuming. To address these problems, in this article we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using the aspects; and (4) presents a digest in which a few important sentences with positive and negative meanings are listed for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly.
Furthermore, by reinforcing topic semantics, we can improve the accuracy of product reputation mining beyond that of the existing approach. In the experiments, we collected a large set of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing method.
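
The lexicon-based baseline described in this abstract (score each sentence, compute positive and negative ratios, pick the top-k sentences) might be sketched roughly as follows. This is only an illustration: the toy lexicon, the scoring rule (summing word scores), and the function names are assumptions made here, not the authors' implementation, and the topic-based method the paper proposes is not shown.

```python
# Hypothetical toy sentiment lexicon: word -> polarity score (an assumption).
LEXICON = {"great": 1.0, "comfortable": 0.5, "noisy": -0.5, "poor": -1.0}


def sentence_score(sentence, lexicon=LEXICON):
    """Sum the lexicon scores of the words in a sentence (0.0 if none match)."""
    return sum(lexicon.get(word, 0.0) for word in sentence.lower().split())


def polarity_ratios(sentences, lexicon=LEXICON):
    """Return (positive_ratio, negative_ratio) over all sentences."""
    scores = [sentence_score(s, lexicon) for s in sentences]
    total = len(scores)
    if total == 0:
        return 0.0, 0.0
    positive = sum(1 for s in scores if s > 0)
    negative = sum(1 for s in scores if s < 0)
    return positive / total, negative / total


def top_k(sentences, k, lexicon=LEXICON):
    """Return the k highest-scoring and the k lowest-scoring sentences.

    No sign filter is applied, so with a large k the lists may include
    neutral sentences.
    """
    ranked = sorted(sentences, key=lambda s: sentence_score(s, lexicon))
    return ranked[::-1][:k], ranked[:k]


# Made-up review sentences, for illustration only.
reviews = [
    "great design and comfortable seats",
    "engine is noisy",
    "poor service",
    "delivered on time",
]
pos_ratio, neg_ratio = polarity_ratios(reviews)
best, worst = top_k(reviews, 1)
```

Because the ratios are taken over the whole corpus, this baseline cannot say which aspect (design, quality, performance, service) a sentence belongs to, which is the first shortcoming the abstract points out.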

Calculation of Surface Broadband Emissivity by Multiple Linear Regression Model (다중선형회귀모형에 의한 지표면 광대역 방출율 산출)

  • Jo, Eun-Su;Lee, Kyu-Tae;Jung, Hyun-Seok;Kim, Bu-Yo;Zo, Il-Sung
    • Journal of the Korean earth science society / v.38 no.4 / pp.269-282 / 2017
  • In this study, the surface broadband emissivity ($3.0-14.0{\mu}m$) was calculated using a multiple linear regression model with narrow-band (channels 29, 30, and 31) emissivity data of the Moderate Resolution Imaging Spectroradiometer (MODIS) on the Earth Observing System Terra satellite. A total of 307 types of spectral emissivity data (123 soil types, 32 vegetation types, 19 types of water bodies, 43 manmade materials, and 90 rock types) from the MODIS University of California Santa Barbara emissivity library and the Advanced Spaceborne Thermal Emission & Reflection Radiometer spectral library were used for the derivation and verification of the multiple linear regression model. The coefficient of determination ($R^2$) of the multiple linear regression model was high, at 0.95 (p<0.001), and the root mean square error between the model-calculated and theoretical broadband emissivities was 0.0070. The surface broadband emissivity from our multiple linear regression model was comparable with that of Wang et al. (2005). The root mean square error between the surface broadband emissivities calculated by the model in this study and by Wang et al. (2005) during January was 0.0054 in the Asia, Africa, and Oceania regions. The minimum and maximum differences in surface broadband emissivity between the two model results were 0.0027 and 0.0067, respectively. Similar statistical results were also derived for August. The surface broadband emissivities from our multiple linear regression model could thus be considered acceptable. However, separate regression models for different land covers need to be applied for more accurate calculation of the surface broadband emissivities.
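
As a rough illustration of the kind of fit described above, the sketch below regresses a broadband value on three narrow-band values by ordinary least squares and reports $R^2$ and the root mean square error. All numbers are synthetic: the generating coefficients, noise level, and variable names are assumptions chosen for the example and do not reproduce the paper's data or coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic narrow-band emissivities for three channels (illustrative only).
n_samples = 307
bands = rng.uniform(0.90, 0.99, size=(n_samples, 3))

# Hypothetical relationship used only to generate toy broadband targets.
true_coef = np.array([0.2, 0.3, 0.4])
true_intercept = 0.1
broadband = bands @ true_coef + true_intercept + rng.normal(0, 0.002, n_samples)

# Fit broadband = a0 + a1*e29 + a2*e30 + a3*e31 by ordinary least squares.
design = np.column_stack([np.ones(n_samples), bands])
coef, *_ = np.linalg.lstsq(design, broadband, rcond=None)

# Goodness of fit: root mean square error and coefficient of determination.
predicted = design @ coef
residual = broadband - predicted
rmse = float(np.sqrt(np.mean(residual ** 2)))
ss_total = np.sum((broadband - broadband.mean()) ** 2)
r_squared = float(1.0 - np.sum(residual ** 2) / ss_total)
```

Fitting one such model per land-cover class, as the abstract recommends, would amount to repeating the same least-squares step on each subset of samples.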

Application of LCA Methodology on Lettuce Cropping Systems in Protected Cultivation (시설재배 상추에 대한 전과정평가 (LCA) 방법론 적용)

  • Ryu, Jong-Hee;Kim, Kye-Hoon
    • Korean Journal of Soil Science and Fertilizer / v.43 no.5 / pp.705-715 / 2010
  • The carbon footprint system is being adopted mainly in developed countries as a long-term response to tightened regulations and standards on carbon emissions in the agricultural sector. The Korean Ministry of Environment excluded primary agricultural products from the carbon footprint system because of the lack of an LCI (life cycle inventory) database for agriculture. Therefore, research on and establishment of an LCI database for agriculture are urgently needed for adoption of the carbon footprint system. Development of an LCA (life cycle assessment) methodology for application to the agricultural environment in Korea is also very important, and such application is still at an early stage. This study was therefore carried out to determine the effect of lettuce cultivation on the agricultural environment by establishing an LCA methodology. Data on agricultural inputs and outputs for establishing the LCI were gathered from statistical data and documents on income from agricultural and livestock products prepared by the RDA. The LCA methodology for agriculture was reviewed by investigating LCA methodologies and LCA applications in foreign countries. Results based on the production of 1 kg of lettuce showed that inputs including N, P, organic fertilizers, compound fertilizers, and crop protectants were the main sources of the major emission factors during the lettuce cropping process. The amounts of active ingredients in the inputs had to be considered in order to estimate the actual quantities of inputs used. Major emissions due to agricultural activities were $N_2O$ (emission to air) and ${NO_3}^-$/${PO_4}^{3-}$ (emission to water) from fertilizers, organic compounds from pesticides, and air pollutants from fossil fuel combustion in agricultural machinery.
The software packages for LCIA (life cycle impact assessment) and LCA used in Korea are 'PASS' and 'TOTAL', which were developed by the Ministry of Knowledge Economy and the Ministry of Environment. However, the models used in these packages were developed in foreign countries. In the future, models need to be developed, and factors for characterization, normalization, and weighting need to be optimized, to suit the Korean agricultural environment for more precise LCA analysis in agriculture.

Marginal and internal fitness of three-unit zirconia cores fabricated using several CAD/CAM systems (다양한 CAD/CAM 시스템으로 제작된 3 본 고정성 가공의치 지르코니아 코어의 변연 및 내면 적합도 평가)

  • Huh, Jung-Bo;Kim, U-Sic;Kim, Ha-Young;Kim, Jong-Eun;Lee, Jeong-Yeol;Kim, Young-Su;Jeon, Young-Chan;Shin, Sang-Wan
    • The Journal of Korean Academy of Prosthodontics / v.49 no.3 / pp.236-244 / 2011
  • Purpose: This study aimed to compare the marginal and internal fitness of 3-unit zirconia bridge cores fabricated by several CAD/CAM systems using the replica technique. Materials and methods: Three-unit bridge models were fabricated in which the upper canine and upper second premolar served as abutments and the upper first premolar was missing. Forty models were classified into 4 groups (Cerasys$^{(R)}$ (Group C), Dentaim$^{(R)}$ (Group D), KaVo Everest$^{(R)}$ (Group K), $Lava^{TM}$ (Group L)), and zirconia cores were fabricated by each company. Sixteen points were measured on each abutment by the replica technique. Statistical analysis was performed with two-way ANOVA and Dunnett T3 (${\alpha}$=.05). Results: In most systems, the gap was larger at the inner margin than at the outer margin. In Group K, overall fitness was excellent, but the incisal gap was very large. In Group C, the marginal gap was significantly larger than in Group K, but the overall internal gap was uniform (P<.05). The axial gap was under $100\;{\mu}m$ in all systems. The difference between the internal and external gaps was small in Groups L and C. However, the internal gap was significantly larger than the external gap in Group D (P<.05). Among the abutments, the fitness of the canine was better than that of the second premolar (P<.05). Conclusion: The marginal and internal gaps were within the clinically allowed range in all of the systems. The gap was larger on the second premolar than on the canine at both the internal and marginal surfaces. In most systems, the gap was larger on the occlusal surface than on the axial surface.

Tibial Torsion in Children of the Jeju Area (제주지역 소아의 경골 염전)

  • Song, Dong Ho;Eun, Baik-Lin;Park, Sang Hee;Lee, Joon Young;Tockgo, Young Chang
    • Clinical and Experimental Pediatrics / v.48 no.1 / pp.75-80 / 2005
  • Purpose : Internal tibial torsion is prevalent in East Asian countries such as Korea and Japan, where sitting on the floor is common. Internal tibial torsion or excessive lateral tibial torsion may cause esthetic, functional, or psychological problems and may also induce degenerative arthritis in older age. The purpose of this study is to measure tibial torsion in children of the Jeju area. Methods : Tibial torsion was measured in 1,042 lower extremities of 521 children from one to 12 years of age. The transmalleolar angles were analyzed for each age group, divided at 6-month intervals. Quadratic and linear regression models were used to fit the patterns of change in the mean transmalleolar angle. Age seven, which gave the highest coefficient of determination in the quadratic regression analysis, was used as the cut-off point for fitting different statistical models. Results : The mean transmalleolar angle was $0.10{\pm}5.79^{\circ}$ in all children, $0.90{\pm}5.49^{\circ}$ in males, and $-0.80{\pm}5.97^{\circ}$ in females. The value was $4.25{\pm}4.04^{\circ}$ at one year of age, gradually decreased to its lowest level of $-1.98^{\circ}$ at four years and seven months of age, increased again with age until it reached $0.67{\pm}1.10^{\circ}$ at seven years of age, and stayed at that level thereafter. Conclusion : Internal tibial torsion in infancy is known to correct spontaneously during normal development. In this study, however, the mean transmalleolar angle in children of the Jeju area decreased after one year of age to its lowest value at four years and seven months, increased gradually again until the age of seven, and then persisted at that level, about $10^{\circ}$ less than in Western children, without further correction. These findings suggest that tibial torsion might be caused by lifestyle, especially sitting on the feet.
To prevent abnormalities of joints and gait, early diagnosis of tibial torsion in childhood, and posture correction or early treatment when needed, seem to be necessary.
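
The quadratic regression step mentioned in the Methods could look something like the sketch below, which fits a parabola to mean angle versus age and reads off the coefficient of determination and the age at which the fitted curve is lowest. The ages and angles here are synthetic, generated from an arbitrary parabola so that the fit can be checked; they are not the study's measurements.

```python
import numpy as np

# Synthetic ages (years) and mean angles (degrees), illustrative values only,
# generated from a known parabola with its minimum placed arbitrarily at 4.5.
ages = np.arange(1.0, 7.5, 0.5)
angles = 0.5 * (ages - 4.5) ** 2 - 2.0

# Fit angle = a*age^2 + b*age + c; np.polyfit returns the highest power first.
a, b, c = np.polyfit(ages, angles, deg=2)
fitted = np.polyval([a, b, c], ages)

# Coefficient of determination of the quadratic fit.
ss_res = np.sum((angles - fitted) ** 2)
ss_tot = np.sum((angles - angles.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Age at which the fitted parabola reaches its minimum (vertex of the parabola).
age_at_minimum = -b / (2.0 * a)
```

Repeating the fit with different upper age limits and comparing `r_squared` is one plausible way to pick a cut-off age like the one the abstract reports, though the abstract does not spell out the exact procedure.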

Development of Neural Network Based Cycle Length Design Model Minimizing Delay for Traffic Responsive Control (실시간 신호제어를 위한 신경망 적용 지체최소화 주기길이 설계모형 개발)

  • Lee, Jung-Youn;Kim, Jin-Tae;Chang, Myung-Soon
    • Journal of Korean Society of Transportation / v.22 no.3 s.74 / pp.145-157 / 2004
  • The cycle length design model of the Korean traffic responsive signal control system is devised to vary the cycle length in real time in response to changes in traffic demand, using parameters specified by a system operator and field information such as the degrees of saturation of through phases. Since no explicit guideline is provided to the system operator, system optimization involves ambiguity. In addition, the cycle lengths produced by the existing model have not yet been verified against the ones that minimize delay. This paper presents studies conducted (1) to find shortcomings in the existing model by comparing the cycle lengths it produces against the ones minimizing delay and (2) to propose a new approach to designing a cycle length that minimizes delay and excludes such operator-oriented parameters. The study found that the cycle lengths from the existing model fail to minimize delay and lead to unsatisfactory intersection operating conditions when traffic volume is low, owing to the changing target operational volume-to-capacity ratio embedded in the model. Sixty-four different neural network based cycle length design models were developed from simulation data used as a surrogate for field data. The CORSIM optimal cycle lengths minimizing delay were found with the COST software developed for the study. COST searches for the CORSIM optimal cycle length minimizing delay with a heuristic search method, a hybrid genetic algorithm. Among the 64 models, the best one, producing cycle lengths close enough to the optimal ones, was selected through statistical tests. The verification test found that the best model designs cycle lengths in a pattern similar to the ones minimizing delay. The cycle lengths from the proposed model are comparable to the ones from TRANSYT-7F.

An Empirical Study on the Dual Burden of Married Working Women : Testifying the Adaptive Partnership, Dual Burden and Lagged Adaptation Hypotheses (근로기혼여성의 이중노동부담에 관한 실증연구: 가사노동분담에 관한 협조적 적응, 이중노동부담, 적응지체 가설의 검증)

  • Kim, Jin-Wook
    • Korean Journal of Social Welfare / v.57 no.3 / pp.51-72 / 2005
  • The purpose of this article is to empirically test three hypotheses on the relation between married women's employment and the allocation of unpaid domestic work within households: the adaptive partnership (AP), dual burden (DB), and lagged adaptation (LA) models. The AP hypothesis assumes that, when wives are employed, husbands spend more time doing housework to compensate for their wives' increased responsibility. The DB model, by contrast, holds that even if married women are employed, their burden of domestic work does not decrease, so a dual burden on married women can be expected. Between these two opposing views, a third, alternative hypothesis has recently been suggested. The LA model argues that household behaviours adapt to changing environments, but over a period of many years and even across generations. The article analysed total work time as well as unpaid domestic work time to test these three hypotheses, utilising the 1999 Time Use Survey data of the National Statistical Office. The research results can be summarised as follows. First, married working women worked 100 minutes more than their male spouses. Second, the average domestic work time of married men, 23-25 minutes per day, was no more than 5-10% of that of women. Third, the effects of age and women's employment were not statistically significant in the multiple regression models, which means that the DB hypothesis explains the situation of married working women in Korea. Based on these findings, the article suggests expanding the public social service system to mitigate the dual burden of married working women, introducing compensatory credit for caring work, and directions for further empirical research using the time use survey data.

Assessment of Climate Change Impact on Storage Behavior of Chungju and the Regulation Dams Using SWAT Model (SWAT을 이용한 기후변화가 충주댐 및 조정지댐 저수량에 미치는 영향 평가)

  • Jeong, Hyeon Gyo;Kim, Seong-Joon;Ha, Rim
    • Journal of Korea Water Resources Association / v.46 no.12 / pp.1235-1247 / 2013
  • This study evaluates the impact of climate change on the future storage behavior of Chungju dam ($2,750{\times}10^6m^3$) and its regulation dam ($30{\times}10^6m^3$) using the SWAT (Soil and Water Assessment Tool) model. Using 9 years of data (2002~2010), SWAT was calibrated and validated for streamflow at three locations, with an average Nash-Sutcliffe model efficiency (NSE) of 0.73, and for two reservoir water levels, with an NSE of 0.86. For the future evaluation, data from HadCM3, one of the GCMs (General Circulation Models), under the SRES (Special Report on Emission Scenarios) A2 and B1 scenarios of the IPCC (Intergovernmental Panel on Climate Change) were adopted. The monthly temperature and precipitation data (2007~2099) were spatially corrected through bias correction using 30 years (1977~2006, baseline period) of ground-measured data, and temporally downscaled by the Change Factor (CF) statistical method. For the two periods, the 2040s (2031~2050) and the 2080s (2071~2099), the future annual temperature was predicted to change by $+0.9^{\circ}C$ in the 2040s and $+4.0^{\circ}C$ in the 2080s, and annual precipitation to increase by 9.6% in the 2040s and 20.7% in the 2080s. Future watershed evapotranspiration increased by up to 15.3%, and soil moisture decreased by at most 2.8%, compared with the baseline (2002~2010) condition. Under a future dam release condition equal to the 9-year average (2002~2010) for each dam, the yearly dam inflow increased by up to 21.1%, in most periods except autumn. Because dam inflow decreases in future autumns, the future dam storage could not recover to the full water level by the end of the year under the present dam release pattern. For future flood and drought years, the temporal variation of dam storage became more unstable, requiring careful downward and upward management of dam storage, respectively. Thus it is necessary to adjust the dam release pattern for climate change adaptation.
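
Two quantities named in this abstract lend themselves to a short sketch: the Nash-Sutcliffe efficiency used for calibration, and change-factor downscaling. The NSE formula below is the standard definition. The change-factor functions follow the commonly used additive (temperature) and multiplicative (precipitation) forms, which is an assumption here, since the abstract does not state the exact formulation; all input values and function names are made up for illustration.

```python
import numpy as np


def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((observed - simulated) ** 2) / np.sum(
        (observed - observed.mean()) ** 2
    )


def change_factor_temperature(obs_baseline, gcm_baseline_mean, gcm_future_mean):
    """Additive change factor (assumed form): shift observations by the GCM change."""
    return np.asarray(obs_baseline, dtype=float) + (gcm_future_mean - gcm_baseline_mean)


def change_factor_precipitation(obs_baseline, gcm_baseline_mean, gcm_future_mean):
    """Multiplicative change factor (assumed form): scale observations by the GCM ratio."""
    return np.asarray(obs_baseline, dtype=float) * (gcm_future_mean / gcm_baseline_mean)


# Made-up observed and simulated streamflow values.
obs = [10.0, 20.0, 30.0, 40.0]
sim = [12.0, 18.0, 33.0, 39.0]
nse = nash_sutcliffe(obs, sim)

# Made-up baseline observations and GCM monthly means.
future_temp = change_factor_temperature([5.0, 15.0], 10.0, 10.9)
future_precip = change_factor_precipitation([100.0, 200.0], 50.0, 55.0)
```

An NSE of 1 means a perfect match, and 0 means the simulation is no better than the mean of the observations, which gives some context for the reported values of 0.73 and 0.86.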

Developing Korean Forest Fire Occurrence Probability Model Reflecting Climate Change in the Spring of 2000s (2000년대 기후변화를 반영한 봄철 산불발생확률모형 개발)

  • Won, Myoungsoo;Yoon, Sukhee;Jang, Keunchang
    • Korean Journal of Agricultural and Forest Meteorology / v.18 no.4 / pp.199-207 / 2016
  • This study was conducted to develop a forest fire occurrence model based on meteorological characteristics, for practical forecasting of the forest fire danger rate, reflecting the climate change of the 2000s. Forest fire in South Korea is highly influenced by humidity, wind speed, temperature, and precipitation. To forecast forest fire occurrence effectively, we developed a forest fire danger rating model using weather factors associated with forest fires in the 2000s. Forest fire occurrence patterns were investigated statistically to develop a forest fire danger rating index using time series weather data sets collected from 76 meteorological observation centers. The data sets covered the 11 years from 2000 to 2010. The national forest fire occurrence probability model was developed by logistic regression analysis of forest fire occurrence data and meteorological variables. Nine probability models, one for each of nine provinces including Jeju Island, were developed. The results of the statistical analysis show that the logistic models (p<0.05) depend strongly on effective and relative humidity, temperature, wind speed, and rainfall. In verification, the predicted probability for randomly selected fires ranged from 0.687 to 0.981, which represents a relatively high accuracy of the developed model. These findings may be beneficial to policy makers in South Korea for the prevention of forest fires.
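
A logistic regression of fire occurrence on weather variables, of the general kind this abstract describes, might be sketched as below. Everything here is illustrative: the two standardized features (humidity and wind speed), the generating coefficients, and the plain gradient-descent fit are assumptions for the example, not the paper's variables, coefficients, or fitting software.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(42)

# Synthetic standardized features per day: [relative humidity, wind speed].
n_days = 500
features = rng.normal(size=(n_days, 2))

# Hypothetical generating model: drier and windier days are more fire-prone.
true_weights = np.array([-2.0, 1.0])
true_bias = -0.5
fire = (rng.uniform(size=n_days) < sigmoid(features @ true_weights + true_bias)).astype(float)

# Fit the logistic model by gradient descent on the mean log-loss.
weights = np.zeros(2)
bias = 0.0
learning_rate = 0.5
for _ in range(2000):
    predicted = sigmoid(features @ weights + bias)
    error = predicted - fire
    weights -= learning_rate * (features.T @ error) / n_days
    bias -= learning_rate * error.mean()

# Predicted occurrence probability for two hypothetical days.
p_dry_windy = float(sigmoid(np.array([-1.0, 1.0]) @ weights + bias))
p_humid_calm = float(sigmoid(np.array([1.0, -1.0]) @ weights + bias))
```

Fitting a separate set of weights per province, as the abstract reports, would mean running the same procedure on each province's subset of days.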

Analysis of Urban Heat Island (UHI) Alleviating Effect of Urban Parks and Green Space in Seoul Using Deep Neural Network (DNN) Model (심층신경망 모형을 이용한 서울시 도시공원 및 녹지공간의 열섬저감효과 분석)

  • Kim, Byeong-chan;Kang, Jae-woo;Park, Chan;Kim, Hyun-jin
    • Journal of the Korean Institute of Landscape Architecture / v.48 no.4 / pp.19-28 / 2020
  • The Urban Heat Island (UHI) effect has intensified due to urbanization, and heat management at the urban level is treated as an important issue. Green space improvement projects and environmental policies are being implemented as ways to alleviate urban heat islands. Several studies have analyzed the correlation between urban green areas and heat with linear regression models. However, linear regression models are limited in explaining the correlation between heat and the multitude of variables, because heat results from a combination of non-linear factors. This study evaluated the heat island alleviating effects in Seoul during the summer using a deep neural network model, which has strengths where data are difficult to analyze with existing statistical methods because of the variety of factors and the large amount of data. Wide-area data were acquired using Landsat 8. Seoul was divided into a grid (30 m × 30 m), and the heat island reduction variables were entered in each grid cell to create the data structure needed for constructing a deep neural network, using ArcGIS 10.7 and Python 3.7 with Keras. This deep neural network was used to analyze the correlation between land surface temperature and the variables. We confirmed that the deep neural network model has high explanatory accuracy. The cooling effect of NDVI was found to be the greatest, and cooling effects due to park size and green space proximity were also shown. Previous studies reported that the cooling effect related to park size was 2℃-3℃, and that the proximity effect lowered the temperature by 0.3℃-2.3℃. The results of previous studies may be overestimates. The results of this study can provide objective information for the justification and more effective formation of new urban green areas to alleviate the Urban Heat Island phenomenon in the future.
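
NDVI, the variable this study found to have the greatest cooling effect, is conventionally computed per grid cell from near-infrared and red reflectance as (NIR - Red) / (NIR + Red); for Landsat 8 these are bands 5 and 4. The sketch below shows that calculation on a tiny made-up grid. The reflectance values and the choice to return 0 where both bands are 0 are assumptions for illustration, and the study's deep neural network itself is not reproduced here.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    Cells where NIR + Red is zero are returned as 0.0 to avoid division by zero.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denominator = nir + red
    result = np.zeros_like(denominator)
    np.divide(nir - red, denominator, out=result, where=denominator != 0)
    return result


# Made-up 2 x 2 grid of reflectance values (each cell stands for one 30 m cell).
nir_band = np.array([[0.5, 0.3], [0.0, 0.4]])
red_band = np.array([[0.1, 0.3], [0.0, 0.2]])
grid_ndvi = ndvi(nir_band, red_band)
```

A grid like `grid_ndvi`, flattened to one row per cell alongside other variables such as park size and green space proximity, is the sort of input table a neural network regression on land surface temperature would need.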