Search | Korea Science

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
- Journal of Intelligence and Information Systems, v.24 no.3, pp.21-44, 2018
In recent years, the rapid development of internet technology and the popularization of smart devices have produced massive amounts of text data. These data are produced and distributed through various media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization, which has drawn the interest of many researchers in managing it. The problem also calls for professionals capable of classifying relevant information, and hence text classification was introduced. Text classification is a challenging task in modern data analysis: it assigns a text document to one or more predefined categories or classes. Many techniques are available in the text classification field, such as K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, when dealing with huge amounts of text data, model performance and accuracy become a challenge. The performance of a text classification model can vary according to the type of words used in the corpus and the type of features created for classification. Most previous attempts have proposed a new algorithm or modified an existing one, and this kind of research can be said to have reached its limits for further improvement. In this study, rather than proposing a new algorithm or modifying an existing one, we focus on finding a way to modify the use of data. It is widely known that classifier performance is influenced by the quality of the training data upon which the classifier is built. Real-world datasets usually contain noise, and this noisy data can affect the decisions made by classifiers built from them.
In this study, we consider that data from different domains, that is, heterogeneous data, might have noise-like characteristics that can be utilized in the classification process. Machine learning algorithms build a classifier on the assumption that the characteristics of the training data and the target data are the same or very similar. However, in the case of unstructured data such as text, the features are determined by the vocabulary included in the documents. If the viewpoints of the training data and the target data differ, the features may appear different between the two. In this study, we attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Data coming from various kinds of sources are likely to be formatted differently. This causes difficulties for traditional machine learning algorithms, because they were not developed to recognize different types of data representation at one time and to combine them in the same generalization. Therefore, in order to utilize heterogeneous data in the learning process of the document classifier, we apply semi-supervised learning. However, unlabeled data may degrade the performance of the document classifier. Therefore, we further propose a method called Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA) to select only the documents that contribute to improving the accuracy of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision making.
In this paper, three different types of real-world data sources were used: news, Twitter, and blogs.
https://doi.org/10.13088/jiis.2018.24.3.021
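The abstract does not say how RSESLA selects documents or rules, so the following is only a loose illustration of the general idea of keeping confidently classified unlabeled documents, not the authors' algorithm. The function name `select_confident`, the agreement rule, the 0.9 threshold, and all document IDs and scores are hypothetical.

```python
def select_confident(predictions, threshold=0.9):
    """Pick unlabeled documents that every classifier view agrees on.

    predictions maps a document ID to a list of (label, confidence) pairs,
    one pair per classifier view. A document is kept only if all views
    predict the same label and the lowest confidence reaches the threshold.
    """
    selected = {}
    for doc_id, views in predictions.items():
        labels = {label for label, _ in views}
        lowest = min(conf for _, conf in views)
        if len(labels) == 1 and lowest >= threshold:
            selected[doc_id] = views[0][0]
    return selected


# Hypothetical outputs of two classifier views on unlabeled documents.
predictions = {
    "tweet_1": [("sports", 0.95), ("sports", 0.92)],     # agree, confident
    "blog_7": [("politics", 0.97), ("economy", 0.91)],   # views disagree
    "tweet_4": [("economy", 0.93), ("economy", 0.62)],   # one view unsure
}
pseudo_labeled = select_confident(predictions)
```

Under these assumptions only `tweet_1` would be added to the training set with a pseudo-label; the other two documents are left out because they could degrade the classifier.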

Mobility and Safety Evaluation Methodology for the Locations of Hi-PASS Lanes Using a Microscopic Traffic Simulation Tool (미시교통시뮬레이션모형을 이용한 하이패스 차로 위치별 이동성 및 안전성 평가방법 연구)

Yun, Ilsoo;Han, Eum;Lee, Cheol-Ki;Rho, Jeong Hyun;Lee, Soojin;Kim, Sang Byum
- The Journal of The Korea Institute of Intelligent Transport Systems, v.12 no.1, pp.98-108, 2013
The number of Hi-Pass lanes reached 793 at 316 expressway tollgates in 2011 due to the increase in Hi-Pass use. Despite this increase, potential risks have grown at tollgates where vehicles using a Hi-Pass lane must weave with vehicles using a TCS lane. Therefore, there is a need to study safety at tollgates. To this end, this study aims to develop a methodology for evaluating the performance measures of diverse location countermeasures for Hi-Pass lanes in an efficient and systematic way. This study measured the mobility, safety, and convenience of installation and operation of Hi-Pass lanes using a microscopic traffic simulation tool, the surrogate safety assessment model, and a survey. In addition, this study aggregated the three performance indexes using weight factors estimated with the AHP technique. The Dongsuwon interchange was selected as the test site. After building the microscopic traffic simulation model for the test site, the location countermeasures applicable to the site were compared in terms of mobility, safety, and convenience of installation and operation. As a result, there was no apparent difference in the mobility index based on delays. However, the countermeasures in which Hi-Pass lanes are located in inside lanes generally showed better safety performance based on the number of conflicts. In addition, countermeasures with neighboring Hi-Pass lanes were favorable in terms of safety and convenience of installation and operation. The proposed methodology was found to be useful for supporting decision making by providing critical and quantitative information regarding mobility, safety, and convenience of installation and operation.
https://doi.org/10.12815/kits.2013.12.1.098
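The aggregation step described in the abstract above, combining three indexes with AHP (analytic hierarchy process) weights, can be sketched as follows. This is a minimal illustration under stated assumptions: the pairwise judgments, the normalized scores, and the layout names are all hypothetical, and the row geometric-mean method is used as a common approximation of AHP priorities; the abstract does not say which variant the authors used.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def aggregate(scores, weights):
    """Weighted sum of normalized index scores (0 to 1, higher is better)."""
    return sum(s * w for s, w in zip(scores, weights))


# Hypothetical pairwise judgments, order: mobility, safety, convenience.
# pairwise[i][j] states how much more important criterion i is than j.
pairwise = [
    [1.0, 1 / 3, 2.0],
    [3.0, 1.0, 4.0],
    [1 / 2, 1 / 4, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical normalized scores for two lane layouts, same criterion order.
alternatives = {
    "inside_lanes": [0.70, 0.90, 0.60],
    "outside_lanes": [0.72, 0.55, 0.65],
}
ranking = sorted(
    alternatives,
    key=lambda name: aggregate(alternatives[name], weights),
    reverse=True,
)
```

With these made-up judgments, safety receives the largest weight, so a layout with similar delay but fewer conflicts ranks first, which mirrors the kind of trade-off the abstract describes.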

Development of a Failure Probability Model based on Operation Data of Thermal Piping Network in District Heating System (지역난방 열배관망 운영데이터 기반의 파손확률 모델 개발)

Kim, Hyoung Seok;Kim, Gye Beom;Kim, Lae Hyun
- Korean Chemical Engineering Research, v.55 no.3, pp.322-331, 2017
District heating was first introduced in Korea in 1985. As the service life of the underground thermal piping network has exceeded 30 years, maintenance of the underground thermal pipes has become an important issue. A variety of complex technologies are required for the periodic inspection and operation management of the aged thermal piping network. In particular, a model is needed that can support decision making in the field, so that the optimal maintenance and replacement points can be derived from an economic viewpoint. In this study, the analysis was based on the repair history and accident data from the operation of the thermal pipe networks of five districts of the Korea District Heating Corporation. A failure probability model was developed using qualitative analysis and binomial logistic regression analysis. The qualitative analysis of maintenance history and accident data showed that the most important causes of pipeline damage, namely construction erosion, pipe corrosion, and bad material, accounted for about 82%. In the statistical model analysis, setting the classification cut-off point to 0.25 improved the accuracy of classifying thermal pipe breakage and non-breakage to 73.5%. To establish the failure probability model, the fit of the model was verified through the Hosmer and Lemeshow test, the test of independence of the independent variables, and the Chi-Square test of the model. According to the risk analysis of thermal pipe network damage, the highest probability of failure was found for thermal pipelines constructed by construction company F, in reducer pipes of less than 250 mm, more than 10 years old, located on motorways in the Seoul area, in winter. The results of this study can be used to prioritize maintenance, preventive inspection, and replacement of thermal piping systems.
In addition, establishing accident prevention plans in advance, such as inspection and maintenance, and acting on them will make it possible to reduce the frequency of thermal pipeline damage and to manage the thermal piping network more actively.
https://doi.org/10.9713/kcer.2017.55.3.322
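To make the role of the 0.25 cut-off concrete, here is a minimal sketch of how a binomial logistic model turns features into a failure probability and how the cut-off changes which pipes are flagged. The intercept, coefficients, features, and pipe IDs are hypothetical and are not taken from the paper; only the 0.25 cut-off comes from the abstract.

```python
import math


def failure_probability(intercept, coefs, features):
    """Binomial logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))


def is_flagged(prob, cutoff):
    """Classify a pipe as 'breakage' when its probability reaches the cutoff."""
    return prob >= cutoff


# Hypothetical model: features are pipe age in years and a 0/1 reducer flag.
intercept = -2.0
coefs = [0.08, 0.9]
pipes = {"P1": [5, 0], "P2": [12, 1], "P3": [20, 1]}

probs = {name: failure_probability(intercept, coefs, x) for name, x in pipes.items()}
flagged_at_025 = [name for name, p in probs.items() if is_flagged(p, 0.25)]
flagged_at_050 = [name for name, p in probs.items() if is_flagged(p, 0.50)]
```

In this toy setup the lower cut-off flags one more pipe than the conventional 0.5 would. Lowering the cut-off is a common choice when failures are rare and missing one is costly, though the abstract does not state the authors' reasoning.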

A Value Analysis of Ecological Restoration Construction Considering Life Cycle Cost and Performance - Focusing on the Wet Media for Slope Revegetation - (생애주기비용과 성능을 고려한 생태복원 공법 가치분석 - 습식 비탈면 기반재를 사례로 -)

Li, Lan;Kim, Sung Hee;Kim, Bo Heui;Lim, Su Hyun;Kim, Sung Il;Koo, Bon Hak
- Journal of the Korean Institute of Landscape Architecture, v.42 no.5, pp.101-109, 2014
In order to save costs and enhance quality in construction without damaging the environment, the VE/LCC analysis method is increasingly used. This study conducted a value analysis for the ecological restoration of slopes, considering life cycle cost and performance. The construction conditions were classified into three types (A, B, C) according to the condition of each base. Three construction methods for slope ecological restoration were selected for each condition. A value analysis was then conducted for a total of nine conditions by analyzing life cycle cost and performance. The slope gradient and base of Condition 1 were below 1:1.2 and general soil, while Conditions 2 and 3 were below 1:1.0 (ripping rock) and below 1:0.7 (soft rock, blasted rock), respectively. The value analysis was based on values estimated through life cycle cost and performance analysis. The results showed that Construction Method B had the highest value in Condition 1 at 108.4, while A and C showed 90.3 and 45.8, respectively. In Condition 2, Construction Method A showed the highest value at 89.1 (B: 47.5, C: 47.0). In Condition 3, Construction Method A (89.1) was the highest, while B and C showed 55.4 and 40.2, respectively. Based on these results, to make a reasonable decision that enhances quality and reduces costs in slope ecological restoration, the restoration method must be reviewed in consideration of life cycle cost and performance.
https://doi.org/10.9715/KILA.2014.42.5.101

Success Factors of the Supdari(A Wooden Bridge) Restoration in Jeonju-River through Citizens' Initiative (적극적 주민참여를 통한 전통문화시설 복원 성공요인 분석 - 전주천 섶다리 놓기 사업을 중심으로 -)

Kim, Sang-Wook;Kim, Gil-Joong
- Journal of the Korean Institute of Traditional Landscape Architecture, v.28 no.1, pp.93-101, 2010
This paper aims to analyze success factors for the construction of Supdari (a traditional wooden bridge that temporarily connects the banks of small streams), a citizens' initiative project to revitalize the local community along the Jeonju-River in Jeonju City. Recently, Supdari have been restored as accessories to local festivals. But the Jeonju-River Supdari was designed and built to unite local citizens and connect villages divided by the river. This project shows how investing in social capital like Supdari revitalizes the community through citizens' active participation. As a citizen-led project, it had several critical factors for success. First, there were notable ways of encouraging local citizens' participation both online and offline. Online, the Supdari internet cafe introduced what a Supdari is, how to make one, and where it would be built, using various media such as UCC and photos. Offline, a small-scale model of the Supdari was made and exhibited at the entrance of the village, and several related seminars were hosted to discuss how to construct the Supdari together with citizens, local assembly members, and public officials. Second, the movement to restore traditional and cultural resources for community recovery triggered support from local councils and many civic groups. Civic groups provided ecological and structural expertise to guarantee environmentally friendly and stable construction, and local councils mediated between the opinions of citizens and the administrative office. Third, flexible administrative management helped citizens' ideas to be realized. Officials extended the installation period of the Supdari on the condition of civic-controlled safety management.

Oxidative Degradation of PCE/TCE Using $KMnO_4$ in Aqueous Solutions under Steady Flow Conditions (유동조건에서 $KMnO_4$도입에 따른 수용액중 PCE/TCE의 산화분해)

Kim, Heon-Ki;Kim, Tae-Yun
- Economic and Environmental Geology, v.41 no.6, pp.685-693, 2008
The rates of oxidative degradation of perchloroethene (PCE) and trichloroethene (TCE) using $KMnO_4$ solution were evaluated under flow conditions using a bench-scale transport experimental setup. The parameters tested for their effect on the reaction rates were the contact time (or retention time) and the concentration of the oxidizing agent. A glass column packed with coarse sand was used to simulate aquifer conditions. Contact time between reactants was controlled by changing the flow rate of the solution through the column. The inflow concentrations of PCE and TCE were held constant within the ranges of $0.11{\sim}0.21\;mM$ and $1.3{\sim}1.5\;mM$, respectively, and the contact time was $14{\sim}125$ min for PCE and $15{\sim}36$ min for TCE. The $KMnO_4$ concentration was held constant during each experiment in the range of $0.6{\sim}2.5\;mM$. It was found that the reductions of PCE and TCE concentrations were inversely proportional to the contact time. The exact reaction order for the PCE and TCE degradation reactions could not be determined under the experimental conditions used in this study. However, the reaction rate constants estimated assuming a pseudo-first-order reaction agree with those reported from batch studies. The TCE degradation rate was proportional to the $KMnO_4$ concentration. This was considered to be the result of using high inflow concentrations of reactant, which might be the case in the vicinity of source zones in an aquifer. The results of this study, performed using a dynamic flow system, are expected to provide useful information for designing and implementing a field-scale oxidative removal process for PCE/TCE-contaminated sites.
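As a rough illustration of what estimating a rate constant under the pseudo-first-order assumption involves, the sketch below uses the textbook relation C_out = C_in * exp(-k_obs * t), treating the column as ideal plug flow. The inlet value is chosen inside the reported TCE range, but the outlet concentration, the contact time pairing, and the derived constants are hypothetical and are not results from the paper.

```python
import math


def pseudo_first_order_k(c_in, c_out, contact_time_min):
    """Observed rate constant (1/min) from C_out = C_in * exp(-k_obs * t)."""
    return math.log(c_in / c_out) / contact_time_min


def outlet_concentration(c_in, k_obs, contact_time_min):
    """Predicted outlet concentration for a given contact time."""
    return c_in * math.exp(-k_obs * contact_time_min)


# Hypothetical TCE run: inlet within the reported 1.3 to 1.5 mM range.
c_in = 1.4      # mM
c_out = 0.7     # mM, made-up outlet value (half of the inlet)
t = 15.0        # min, shortest reported TCE contact time

k_obs = pseudo_first_order_k(c_in, c_out, t)

# If the rate is proportional to the oxidant concentration, a second-order
# constant follows as k2 = k_obs / [KMnO4]; 2.5 mM is the upper reported value.
kmno4 = 2.5     # mM
k2 = k_obs / kmno4
```

Here the made-up outlet is half the inlet, so `k_obs` equals ln(2)/15 per minute, and doubling the contact time to 30 min would predict a quarter of the inlet concentration.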

Methodology for Issue-related R&D Keywords Packaging Using Text Mining (텍스트 마이닝 기반의 이슈 관련 R&D 키워드 패키징 방법론)

Hyun, Yoonjin;Shun, William Wong Xiu;Kim, Namgyu
- Journal of Internet Computing and Services, v.16 no.2, pp.57-66, 2015
Considerable research efforts are being directed towards analyzing unstructured data such as text files and log files using commercial and noncommercial analytical tools. In particular, researchers are trying to extract meaningful knowledge through text mining not only in business but also in many other areas such as politics, economics, and cultural studies. For instance, several studies have examined national pending issues by analyzing large volumes of text on various social issues. However, it is difficult to provide successful information services that can identify R&D documents on specific national pending issues. While users may specify certain keywords relating to national pending issues, they usually fail to retrieve appropriate R&D information, primarily due to discrepancies between these terms and the corresponding terms actually used in the R&D documents. Thus, we need an intermediate logic to overcome these discrepancies and to identify and package appropriate R&D information on specific national pending issues. To address this requirement, three methodologies are proposed in this study: a hybrid methodology for extracting and integrating keywords pertaining to national pending issues, a methodology for packaging R&D information that corresponds to national pending issues, and a methodology for constructing an associative issue network based on relevant R&D information. Data analysis techniques such as text mining, social network analysis, and association rules mining are utilized for establishing these methodologies. In the experiment, the keyword enhancement rate achieved by the proposed integration methodology was about 42.8%. For the second objective, three key analyses were conducted and a number of association rules between national pending issue keywords and R&D keywords were derived.
The experiment for the third objective, issue clustering based on R&D keywords, is still in progress and is expected to give tangible results in the future.
https://doi.org/10.7472/jksii.2015.16.2.57
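An association rule between an issue keyword and an R&D keyword is usually scored by support, confidence, and lift. The sketch below computes these standard measures on a tiny hypothetical keyword set per document; the keywords, the documents, and the function name `rule_metrics` are made up for illustration and do not come from the paper.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent.

    transactions is a list of keyword sets; antecedent and consequent are sets.
    """
    n = len(transactions)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_cons = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n
    confidence = n_both / n_ante if n_ante else 0.0
    lift = confidence / (n_cons / n) if n_cons else 0.0
    return support, confidence, lift


# Hypothetical documents, each reduced to issue and R&D keywords.
docs = [
    {"fine dust", "air filter", "sensor"},
    {"fine dust", "air filter"},
    {"fine dust", "emission control"},
    {"earthquake", "sensor"},
    {"earthquake", "seismic design"},
]
support, confidence, lift = rule_metrics(docs, {"fine dust"}, {"air filter"})
```

A lift above 1 suggests the R&D keyword appears with the issue keyword more often than chance would predict, which is the kind of link that could bridge the vocabulary gap the abstract describes.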

Dynamic Traffic Assignment Using Genetic Algorithm (유전자 알고리즘을 이용한 동적통행배정에 관한 연구)

Park, Kyung-Chul;Park, Chang-Ho;Chon, Kyung-Soo;Rhee, Sung-Mo
- Journal of Korean Society for Geospatial Information Science, v.8 no.1 s.15, pp.51-63, 2000
Dynamic traffic assignment (DTA) has been a topic of substantial research during the past decade. While DTA is gradually maturing, many aspects of DTA still need improvement, especially regarding its formulation and solution algorithm. Recently, with its promise for ITS (Intelligent Transportation System) and GIS (Geographic Information System) applications, DTA has received increasing attention. This potential also implies higher requirements for DTA modeling, especially regarding its solution efficiency for real-time implementation. However, DTA poses many mathematical difficulties in the search process due to the complexity of its spatial and temporal variables. Although many solution algorithms have been studied, conventional methods cannot find the solution when the objective function or the constraints are not convex. In this paper, a genetic algorithm is applied to find the solution of DTA, and the Merchant-Nemhauser model is used as the DTA model because it has a nonconvex constraint set. To handle the nonconvex constraint set, the GENOCOP III system, a kind of genetic algorithm, is used. Results for the sample network were compared with the results of a conventional method.

Revisiting the cause of unemployment problem in Korea's labor market: The job seeker's interests-based topic analysis (취업준비생 토픽 분석을 통한 취업난 원인의 재탐색)

Kim, Jung-Su;Lee, Suk-Jun
- Management & Information Systems Review, v.35 no.1, pp.85-116, 2016
The present study aims to explore the causes of employment difficulty on the basis of job applicants' interests from a P-E (person-environment) fit perspective. Our approach relied on a textual analytic method to reveal insights from their situational interests in a job search during changes in the labor market. To investigate the types of major interests and psychological responses, user-generated texts were collected by crawling an online community for job seeking and for sharing information and opinions, covering January 1, 2013 through December 31, 2015. The results of the topic analysis indicated that users' primary interests fell into four types: perception of vocational expectations, employment pre-preparation behaviors, perception of the labor market, and job-seeking stress. Specifically, job applicants were mainly concerned with monetary reward and the form of employment rather than with their work values or career exploration, and young job applicants expressed their psychological responses in contextualized language (e.g., slang, vulgarisms) that projected their unstable state under uncertainty in response to environmental changes. Additionally, they perceived a restricted set of preparation activities (e.g., certification, English exams) as determinant factors for success in employment, and they suffered from job-seeking stress. On the basis of these findings, current unemployment problems are entirely attributed to the absence of pursuing the value of vocation and job among individuals, organizations, and society. Concretely, job seekers are preoccupied with the social prestige of occupations and have undecided vocational values. On the other hand, most companies do not perceive the importance of human resources and have overlooked the need to develop a proper work environment that stimulates individual motivation.
This study's attempt to reinterpret the effect of the environment by classifying job applicants' interests with reference to linguistic and psychological theories not only supports a more comprehensive understanding of social issues, but also suggests new directions for future research on job applicants' psychological factors (e.g., attitudes, motivation) using topic analysis.

The Development and Features of Discussion about Community Design (커뮤니티디자인의 전개와 논의의 특징)

Kim, Yun-Geum;Reigh, Young-Bum
- Journal of the Korean Institute of Landscape Architecture, v.40 no.3, pp.22-31, 2012
This study was prompted by the recognition that the term "community design" has recently been used in diverse practical fields without prior discussion of its underpinnings, a potentially problematic state of affairs. In response, this study examined the distinctive qualities of the concept of community design. Community design can be discussed from two perspectives. The first views community design as design that concerns the community, an inhabited area populated with people who have common interests, at least in part because of geographic proximity to each other. The second sees community design as a movement that started in the 1960s and places great importance on democratic decision making, communication, and collaboration. This study focuses on the latter. This branch of community design encompasses an advocacy planning approach, in which design professionals represent deprived communities in their resistance against comprehensive redevelopment. This was associated with the wider social protest movements of the mid and late 1960s. In the 1970s, this branch of community design developed alongside community design centers, which provided local-level technical assistance to communities on a number of issues, such as design and planning. The discussion about community design started in earnest in the early 1980s. A review of the literature on community design reveals several characteristics. First, community design deals with the relationship between the physical environment and several aspects of a region, including the social and cultural. Second, it involves community participation, which many scholars believe is the core of community design. Specifically, community design has been characterized by increased participation and democratic debate and decision making. The third concerns communication methods. Since the 1960s, diverse methods have been developed to promote effective communication.
Finally, community design must consider the relationship between designers, who typically value aesthetics and efficiency of form, and the needs of the community with which they are working. Indeed, some scholars believe that this relationship is generally contentious, although the designer can also be thought of as the facilitator of the community's needs. As community design practice becomes more prevalent, a review of its institutional and policy foundations and of the role of experts is also needed. The community design movement has been theorized ex post facto through diverse discussions that have sought to ascribe meaning and direction to its practice. In other words, the relationship between theory and practice is cyclical. Therefore, this study can contribute to that virtuous circle.
https://doi.org/10.9715/KILA.2012.40.3.022

