

A Study on Ontology and Topic Modeling-based Multi-dimensional Knowledge Map Services (온톨로지와 토픽모델링 기반 다차원 연계 지식맵 서비스 연구)

  • Jeong, Hanjo
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.79-92 / 2015
  • Knowledge maps are widely used to represent knowledge in many domains. This paper presents a method of integrating the national R&D data and of assisting users in navigating the integrated data through a knowledge map service. The knowledge map service is built using a lightweight ontology and a topic modeling method. The national R&D data are integrated with the research project at the center; that is, other R&D data such as research papers, patents, and reports are connected to the research project as its outputs. The lightweight ontology represents the simple relationships among the integrated data, such as project-output, document-author, and document-topic relationships. The knowledge map enables us to infer further relationships, such as co-author and co-topic relationships. To extract the relationships among the integrated data, a Relational Data-to-Triples transformer is implemented, and a topic modeling approach is introduced to extract the document-topic relationships. A triple store is used to manage and process the ontology data while preserving the network characteristics of the knowledge map service. Knowledge maps can be divided into two types: one is used in knowledge management to store, manage, and process an organization's data as knowledge; the other is used to analyze and represent knowledge extracted from science and technology documents. This research focuses on the latter. In this research, a knowledge map service is introduced for integrating the national R&D data obtained from the National Digital Science Library (NDSL) and the National Science & Technology Information Service (NTIS), the two major repositories and services of national R&D data in Korea. A lightweight ontology is used to design and build the knowledge map. 
Using the lightweight ontology enables us to represent and process knowledge as a simple network, which fits the knowledge navigation and visualization characteristics of the knowledge map. The lightweight ontology represents the entities and their relationships in the knowledge maps, and an ontology repository is created to store and process the ontology. In the ontologies, researchers are implicitly connected through the national R&D data by authorship and performer relationships. A knowledge map displaying the researchers' network is created; the network is built from the co-authoring relationships of the national R&D documents and the co-participation relationships of the national R&D projects. To sum up, a knowledge map service system based on topic modeling and ontology is introduced for processing knowledge about national R&D data such as research projects, papers, patents, project reports, and Global Trends Briefing (GTB) data. The system has three goals: 1) to integrate the national R&D data obtained from NDSL and NTIS, 2) to provide semantic and topic-based information search over the integrated data, and 3) to provide knowledge map services based on semantic analysis and knowledge processing. S&T information such as research papers, research reports, patents, and GTB data is updated daily from NDSL, and R&D project information, including participants and outputs, is updated from NTIS. The S&T information and the national R&D information are collected and integrated into the integrated database. The knowledge base is constructed by transforming the relational data into triples that reference the R&D ontology. In addition, a topic modeling method is employed to extract the relationships between the S&T documents and the topic keywords representing them. 
The topic modeling approach enables us to extract the relationships and topic keywords based on semantics rather than on simple keyword matching. Lastly, we present an experiment on constructing the integrated knowledge base using the lightweight ontology and topic modeling, and we introduce the knowledge map services built on that knowledge base.
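As a rough illustration of the Relational Data-to-Triples step and of how co-author links can be inferred from document-author triples, here is a minimal Python sketch. The table rows, identifiers, and predicate names (`hasOutput`, `hasAuthor`, `hasTopic`, `coAuthor`) are hypothetical and are not taken from the NDSL/NTIS schemas or the paper's R&D ontology; a real system would load the triples into a triple store rather than a Python set.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical relational rows (not the actual NDSL/NTIS schema).
PROJECT_OUTPUTS = [("P1", "D1"), ("P1", "D2")]              # project -> output document
DOCUMENT_AUTHORS = [("D1", "Kim"), ("D1", "Lee"),
                    ("D2", "Lee"), ("D2", "Park")]          # document -> author
DOCUMENT_TOPICS = [("D1", "ontology"), ("D2", "ontology")]  # document -> topic keyword


def rows_to_triples(rows, predicate):
    """Turn (subject, object) rows into (subject, predicate, object) triples."""
    return {(subj, predicate, obj) for subj, obj in rows}


def infer_co_authors(triples):
    """Infer symmetric coAuthor triples for authors who share a document."""
    authors_by_doc = defaultdict(set)
    for subj, pred, obj in triples:
        if pred == "hasAuthor":
            authors_by_doc[subj].add(obj)
    inferred = set()
    for authors in authors_by_doc.values():
        for a, b in combinations(sorted(authors), 2):
            inferred.add((a, "coAuthor", b))
            inferred.add((b, "coAuthor", a))
    return inferred


triples = (rows_to_triples(PROJECT_OUTPUTS, "hasOutput")
           | rows_to_triples(DOCUMENT_AUTHORS, "hasAuthor")
           | rows_to_triples(DOCUMENT_TOPICS, "hasTopic"))
co_authors = infer_co_authors(triples)
print(sorted(co_authors))
```

Kim and Lee become co-authors through D1 and Lee and Park through D2, while Kim and Park stay unlinked. Co-topic links could be inferred the same way by grouping documents on a shared `hasTopic` object instead of grouping authors on a shared document.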

The Study of Establishing the Multi-pass Eurasian Railroads (유라시아 철도의 다중경로 구축에 관한 연구)

  • Hahm, Beom-Hee;Huh, Nam-Kyun;Hurr, Hee-Young
    • Korean Business Review / v.21 no.2 / pp.137-170 / 2008
  • This study presents a logistics strategy, for the international logistics market in which the north-east Asian countries both compete and cooperate, aimed at establishing multi-path Eurasian railroads. The countries of north-east Eurasia, such as China, Japan, Russia, and Korea, bear higher transportation and communication costs and inconvenience because of repeated conflicts and confrontations arising from political problems. Except for special merchandise that must travel by air because of its cargo characteristics, most logistics between Europe and Asia use surface transportation, and the Silk Road, the main logistics passage of this area since BC times, survives only as a vestige. So far, the Trans-Siberian Railway (TSR) has been used mostly by Russia as the northern Eurasian transport route because of service difficulties. The Trans-China Railway (TCR), built in 1992, has not established itself as an international logistics passage. Countries with no access to the sea, such as Uzbekistan (a doubly landlocked country), Mongolia, and Azerbaijan, are expected to need long lead times because of the nature of resource development and their poor logistics infrastructure, even though they have ambitious plans for economic growth through the development of their own resources. The Shanghai Cooperation Organization (SCO), officially launched in 2001, consists of six regular members (China, Russia, Tajikistan, Kyrgyzstan, Kazakhstan, and Uzbekistan) and five observers (Mongolia, India, Pakistan, Afghanistan, and Iran). It started as a military alliance against terrorism but has since expanded to cooperation in traffic, transportation, trade, and energy sharing. Russia is doing its best to activate the TSR as a government objective for the balanced development of its northern regions and the economic development of the Siberian Far East. 
In addition, through Russia's aggressive efforts, Russia, North Korea, and South Korea have provisionally agreed to improve and repair the railroad between Najin and Khasan to connect the TSR with the TKR (Trans-Korea Railroad). The development plan for this area overlaps with the Greater Tumen Initiative (GTI) promoted by the UNDP, a cooperative project of five countries (South Korea, Mongolia, China, Russia, and North Korea) that reviews cooperation in energy, tourism, and the environment, a railroad connection between Mongolia and China, and the establishment of a ferry route in north-east Asia. Japan, meanwhile, is watching Russia and China closely; it has supplied large-scale infrastructure to Mongolia free of charge, aiming at an East Asia main railroad connecting Mongolia with Zarubino in Russia. If the program for the denuclearization of North Korea makes progress, connecting the TKR with the TSR and the TKR with the TCR will accelerate, possibly with some form of U.S. participation, together with the development programs promoted by UN ESCAP. As a result, the Korean peninsula will continue to play a central role in the competition and cooperation of north-east Asia, in the past, present, and future, in both geo-economic and geopolitical terms, whether or not neighboring countries want it.


Records Management and Archives in Korea : Its Development and Prospects (한국 기록관리행정의 변천과 전망)

  • Nam, Hyo-Chai
    • Journal of Korean Society of Archives and Records Management / v.1 no.1 / pp.19-35 / 2001
  • After almost a century of discontinuity in the archival tradition of the Chosun dynasty, Korea entered a new age of records and archives management by legislating and enforcing a basic law (the Records and Archives Management of Public Agencies Act of 1999). The Annals of the Chosun Dynasty recorded the major historical facts of five hundred years of national affairs. The Annals are a major accomplishment in human history and rare in the world. They were possible because they were composed of records collected, selected, and compiled from primary sources written by generations of historians. Because important public records need to be preserved in their original form in modern archives, we had to develop and establish a modern archival system to appraise and select important national records for archival preservation. However, the colonization of Korea deprived us of the opportunity to do so, and our fine archival tradition was not continued. A centralized archival system began to develop with the establishment of GARS under the Ministry of Government Administration in 1969. GARS built a modern repository in Pusan in 1984, succeeding the tradition of the History Archives of the Chosun dynasty. In 1998, GARS moved its headquarters to the Taejon Government Complex and acquired state-of-the-art audiovisual archive preservation facilities. From 1996, GARS introduced an automated archival management system to replace manual registration and management, complementing preservation microfilming. Digitization of the holdings was the key project for providing digital images of the archives to users. To do this, GARS purchased new computer/server systems and developed application software. In parallel, GARS drastically renovated its staff composition toward a high level of professionalization by recruiting more archivists with history and library science backgrounds. Conservators and computer system operators were also recruited. 
The new archival laws have been in effect since January 1, 2000, and they brought the following changes to records and archives administration in Korea. First, the laws regulate the records and archives of all public agencies, including the Legislature, the Judiciary, the Administration, the constitutional institutions, the Army, Navy, and Air Force, and the National Intelligence Service, so a nationwide unified records and archives management system became available. Second, public archives and records centers are to be established according to the level of the agency: a central archives at the national level, special archives for the National Assembly and the Judiciary, local government archives for metropolitan cities and provinces, and records centers or special records centers for administrative agencies. A records manager will be responsible for records management in each administrative division. Third, records in public agencies are registered in the computer system as they are produced, so they are traceable and can be searched and retrieved easily through the Internet or a computer network. Fourth, qualified records managers and archivists professionally trained in records management and archival science must be assigned, to guarantee the professional management of records and archives. Fifth, the illegal handling of public records and archives constitutes a punishable crime. In the future, public records and archives management will develop along with the Korean government's 'Electronic Government Project,' and the following changes are in prospect. First, public agencies will digitize paper records, audiovisual records, and publications as well as electronic documents, thus promoting administrative efficiency and productivity. Second, the National Assembly has already established its Special Archives, and the Judiciary and the National Intelligence Service will follow. 
More archives will be established at the city and provincial levels. Third, the further our society develops into a knowledge-based information society, the more records management will become one of the important functions of national government. As more universities, academic associations, and civil society groups participate in promoting archival awareness and establishing archival science, and as more people recognize the importance of records and archives management, to the level of a national public campaign, records and archives management in Korea will develop well beyond present practice.

Extension Method of Association Rules Using Social Network Analysis (사회연결망 분석을 활용한 연관규칙 확장기법)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems / v.23 no.4 / pp.111-126 / 2017
  • Recommender systems based on association rule mining contribute significantly to sellers' sales by reducing the time consumers spend searching for the products they want. Recommendations based on the frequency of transactions such as orders can effectively pick out, among many products, those that are statistically marketable. However, a product with high sales potential can be omitted from recommendations if it records too few transactions at the beginning of the sale. Products missing from associated recommendations may lose exposure to consumers, which leads to a decline in transactions; in turn, fewer transactions create a vicious circle of lost opportunities to be recommended. Thus, initial sales are likely to remain stagnant for a certain period. Products susceptible to fashion or seasonality, such as clothing, may be greatly affected. This study aimed to expand association rules so that the recommendation list includes products whose initial transaction frequency is low despite high sales potential. In particular, the study predicts the strength of a future direct connection between two unconnected items from the properties of the paths between them. An association between two items revealed in transactions can be interpreted as an interaction between them, which can be expressed as a link in a social network whose nodes are items. The first step calculates the centralities of the intermediate nodes on the paths that indirectly connect two nodes without a direct connection. The next step identifies the number of such paths and the length of the shortest among them. These features are used as independent variables in a regression analysis to predict the future connection strength between the nodes. 
The connection strength between the two nodes, which the model defines by the number of nodes between them, is measured after a certain period. The regression results confirm that the number of paths between two products, the length of the shortest path, and the number of neighboring items connected to the products are significantly related to their potential connection strength. This study used actual order transaction data collected from an online commerce company over three months, from February to April 2016. To limit analytical complexity as the network grows, the analysis was performed only on miscellaneous goods. Two consecutively purchased items were taken from each customer's transactions as an antecedent-consequent pair, which forms a link in the social network; the direction of the link follows the order in which the goods were purchased. The social network of associated items, from which the independent variables were extracted, was built from all data except the last ten days of the collection period. The model then predicts, from the explanatory variables, the number of links to be connected in those ten days. Of the 5,711 previously unconnected links, 611 were newly connected during the last ten days. In the experiments, the proposed model performed well: of the 571 links it predicted, 269 were confirmed to have been connected. This is about 4.4 times the 61 hits obtainable without any prediction model, which matches picking 571 of the 5,711 candidates at random (571 × 611 / 5,711 ≈ 61). This study is expected to be useful in industries where new products launch quickly with short life cycles, since their exposure time is critical. It could also help detect diseases that are rarely found in the early stages of medical treatment because of their low incidence. 
Because the complexity of social network analysis is sensitive to the number of nodes and links in the network, this study was restricted to the single category of miscellaneous goods. Future research should consider that this restriction may limit the opportunity to detect unexpected associations between products in different categories.
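The path-based features described above can be sketched as follows. This is a minimal Python illustration, assuming a directed item graph built from consecutive purchase pairs; the function names, the `max_len` cap on path length, and the toy purchase sequences are hypothetical, and the study's centrality features and regression step are not reproduced.

```python
from collections import defaultdict, deque


def build_graph(purchase_sequences):
    """Directed item graph: edge a -> b for each consecutive purchase pair."""
    graph = defaultdict(set)
    for seq in purchase_sequences:
        for a, b in zip(seq, seq[1:]):
            if a != b:
                graph[a].add(b)
    return graph


def shortest_distance(graph, src, dst):
    """Breadth-first hop count from src to dst; None if dst is unreachable."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def count_paths(graph, src, dst, max_len):
    """Count simple directed paths from src to dst with at most max_len edges."""
    def dfs(node, visited, depth):
        if node == dst:
            return 1
        if depth == max_len:
            return 0
        return sum(dfs(nxt, visited | {nxt}, depth + 1)
                   for nxt in graph.get(node, ()) if nxt not in visited)
    return dfs(src, {src}, 0)


def neighbor_count(graph, node):
    """Number of distinct items linked to node in either direction."""
    predecessors = {a for a, succ in graph.items() if node in succ}
    return len(set(graph.get(node, ())) | predecessors)


def pair_features(graph, u, v, max_len=3):
    """Candidate regressors for an unconnected pair (hypothetical feature set)."""
    return {
        "paths": count_paths(graph, u, v, max_len),
        "shortest": shortest_distance(graph, u, v),
        "neighbors_u": neighbor_count(graph, u),
        "neighbors_v": neighbor_count(graph, v),
    }


# Toy purchase sequences (hypothetical); A and C are never bought consecutively.
graph = build_graph([["A", "B", "C"], ["A", "D", "C"], ["C", "E"]])
features = pair_features(graph, "A", "C")
print(features)
```

Here A and C have no direct link but two indirect paths (via B and via D), the shortest of length 2. Counting paths is exponential in the worst case, which is one reason a length cap such as `max_len` is useful as the network grows.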

Antecedents of Manufacturer's Private Label Program Engagement : A Focus on Strategic Market Management Perspective (제조업체 Private Labels 도입의 선행요인 : 전략적 시장관리 관점을 중심으로)

  • Lim, Chae-Un;Yi, Ho-Taek
    • Journal of Distribution Research / v.17 no.1 / pp.65-86 / 2012
  • The 20th century was the era of manufacturer brands, which built high brand equity with consumers. Consumers moved from the generic products of inconsistent quality made by local factories in the 19th century to branded products from global manufacturers, and manufacturer brands reached consumers through distributors and retailers. Retailers were relatively small compared with their largest suppliers. However, sometime in the 1970s, things began to change slowly as retailers developed their own national chains and began international expansion, and the consolidation of the retail industry from mom-and-pop stores to global players was well under way (Kumar and Steenkamp 2007, p.2). In South Korea, the growth of retailers since the mid-1990s has shifted the balance of power between manufacturers and retailers. Retailer private labels, also referred to as own labels, store brands, distributor own labels, home brands, or own-label brands, have been performing strongly in every local market (Bushman 1993; De Wulf et al. 2005). Private labels now account for one out of every five items sold every day in U.S. supermarkets, drug chains, and mass merchandisers (Kumar and Steenkamp 2007), and their market share in Western Europe is even larger (Euromonitor 2007). In the UK, the grocery market share of private labels grew from 39% of sales in 2008 to 41% in 2010 (Marian 2010). Planet Retail (2007, p.1) recently concluded that "[PLs] are set for accelerated growth, with the majority of the world's leading grocers increasing their own label penetration." Private labels have gained wide attention in both the academic literature and the popular business press, and there is a growing body of academic research from the perspectives of both manufacturers and retailers. 
Empirical research on private labels has mainly studied the factors explaining private label market shares across product categories and/or retail chains (Dhar and Hoch 1997; Hoch and Banerji, 1993), the factors influencing consumers' private label proneness (Baltas and Doyle 1998; Burton et al. 1998; Richardson et al. 1996), and how brand manufacturers react to PLs (Dunne and Narasimhan 1999; Hoch 1996; Quelch and Harding 1996; Verhoef et al. 2000). Nevertheless, empirical research on the factors influencing private label production in the manufacturer-retailer relationship is anecdotal rather than theory-based. The objective of this paper is to bridge the gap between these two types of research and to explore the factors that influence manufacturers' private label production based on two competing theories: the S-C-P (Structure-Conduct-Performance) paradigm and resource-based theory. To do so, the authors conducted in-depth interviews with marketing managers, reviewed the retail press and research, and present a conceptual framework that integrates the major determinants of private label production. From a manufacturer's perspective, supplying private labels often starts on a strategic basis. When a manufacturer engages in private labels, it does not have to spend on advertising or retailer promotions or maintain a dedicated sales force. Moreover, a manufacturer with weak marketing capabilities can use the retailer's marketing capability to produce private labels, lowering its marketing costs and increasing its profit margin. Figure 1 presents the theoretical framework based on a strategic market management perspective, which integrates the S-C-P paradigm and resource-based theory. The model includes one mediating variable, marketing capabilities, and one moderating variable, competitive intensity. 
Manufacturers' national brand reputation, marketing investment, and product portfolio are hypothesized to positively affect manufacturers' marketing capabilities, and marketing capabilities are hypothesized to negatively affect private label production. Competitive intensity is hypothesized to moderate the relationship between marketing capabilities and private label production. To verify the proposed research model and hypotheses, data were collected from 192 manufacturers (212 responses) producing private labels in South Korea. Cronbach's alpha, exploratory/confirmatory factor analysis, and correlation analysis were employed to validate the hypotheses, and the following results were drawn using structural equation modeling; all hypotheses are supported. Findings indicate that manufacturers' private label production is strongly related to their marketing capabilities. Consumer marketing capabilities, in turn, are directly connected with the three strategic factors (marketing investment, national brand reputation, and product portfolio). The relationship between marketing capabilities and private label production is moderated by competitive intensity. In conclusion, this research may be the first study to investigate why manufacturers engage in private labels based on two competing theoretical views, the S-C-P paradigm and resource-based theory. The private label phenomenon has received growing attention from marketing scholars. In many industries, private labels represent formidable competition to manufacturer brands, and manufacturers face the dilemma of both selling to and competing with their retailers. The current study suggests key factors for manufacturers to consider when engaging in private label production.
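Cronbach's alpha, one of the reliability checks named above, follows the standard formula α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜ), where k is the number of items, σ²ᵢ the variance of item i, and σ²ₜ the variance of respondents' summed scores. A minimal Python sketch with made-up response data (not the study's survey data):

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sum of per-item sample variances (columns of the response matrix).
    item_variances = sum(variance(item) for item in zip(*responses))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - item_variances / total_variance)


# Hypothetical 5-point Likert answers: 4 respondents x 3 items of one construct.
answers = [[4, 5, 4], [2, 2, 3], [5, 5, 5], [3, 4, 3]]
print(round(cronbach_alpha(answers), 3))
```

Sample and population variances give the same α as long as one is used consistently, because the (n − 1) or n divisor cancels in the ratio.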


Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People now create a tremendous amount of data on Social Network Services (SNS). In particular, the integration of SNS into mobile devices has produced massive amounts of data, greatly influencing society. This phenomenon is unmatched in history; we now live in the Age of Big Data. SNS data qualify as Big Data because they satisfy the conditions of volume (amount of data), velocity (data input and output speed), and variety (range of data types). Because SNS Big Data covers the whole of society, the trend of an issue discovered in it can be an important new source of value creation. In this study, a Twitter Issue Tracking System (TITS) is designed and established to meet the need to analyze SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides four functions: (1) it provides the topic keyword set corresponding to the daily ranking; (2) it visualizes the daily time-series graph of a topic over a month; (3) it shows the importance of a topic through a treemap based on the score system and frequency; and (4) it visualizes the daily time-series graph of keywords found by keyword search. The present study analyzes SNS Big Data in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process various unrefined forms of unstructured data. Such analysis also requires the latest big data technologies for rapidly processing large amounts of real-time data, such as the Hadoop distributed system or NoSQL, an alternative to relational databases. We built TITS on Hadoop to optimize big data processing because Hadoop is designed to scale up from a single node to thousands of machines. 
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no schemas or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine such data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for creating Data-Driven Documents that bind arbitrary data to the Document Object Model (DOM); it makes interaction with the data easy and is useful for managing real-time data streams with smooth animation. In addition, TITS uses Bootstrap, a set of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS Graphical User Interface (GUI) is designed using these libraries and can detect issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique used in various research areas, including Library and Information Science (LIS), and on this basis confirm the utility of storytelling and time-series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The experiments were conducted on nearly 150 million tweets in Korea during March 2013.
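The daily keyword ranking in function (1) can be approximated, for illustration only, by counting keywords per day after stop-word removal. This minimal Python sketch is a deliberately simplified stand-in: it uses plain frequency counts instead of the study's topic modeling, whitespace tokenization instead of Korean noun extraction, and a hypothetical English stop-word list and sample tweets; it does not use Hadoop or MongoDB.

```python
from collections import Counter, defaultdict

# Hypothetical English stop words; the real system used Korean NLP (noun extraction).
STOP_WORDS = {"the", "a", "is", "on", "in", "of", "and", "rt"}


def daily_top_keywords(tweets, k=3):
    """Rank keywords per day by frequency (a simplified stand-in for topic modeling)."""
    counts = defaultdict(Counter)
    for day, text in tweets:
        # Lowercase and strip punctuation, hashtag, and mention markers.
        tokens = (t.strip(".,!?#@").lower() for t in text.split())
        counts[day].update(t for t in tokens if t and t not in STOP_WORDS)
    return {day: [word for word, _ in c.most_common(k)] for day, c in counts.items()}


tweets = [  # invented sample tweets: (date, text)
    ("2013-03-01", "Election debate on the economy"),
    ("2013-03-01", "RT economy is the issue"),
    ("2013-03-01", "#economy and election"),
    ("2013-03-02", "Snow in Seoul"),
]
print(daily_top_keywords(tweets, k=2))
```

A topic model would instead group co-occurring words into topics and rank those; the per-day grouping and top-k selection shown here would stay the same.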

The micro-tensile bond strength of two-step self-etch adhesive to ground enamel with and without prior acid-etching (산부식 전처리에 따른 2단계 자가부식 접착제의 연마 법랑질에 대한 미세인장결합강도)

  • Kim, You-Lee;Kim, Jee-Hwan;Shim, June-Sung;Kim, Kwang-Mahn;Lee, Keun-Woo
    • The Journal of Korean Academy of Prosthodontics / v.46 no.2 / pp.148-156 / 2008
  • Statement of problems: Self-etch adhesives offer clinical benefits such as ease of manipulation and reduced technique sensitivity. Nevertheless, some concern remains about the bonding effectiveness of self-etch adhesives to enamel, particularly when so-called 'mild' self-etch adhesives are used. This study compared the microtensile bond strengths to ground enamel of the two-step self-etch adhesive Clearfil SE Bond (Kuraray), the three-step etch-and-rinse adhesive Scotchbond Multi-Purpose (3M ESPE), and the one-step self-etch adhesive iBond (Heraeus Kulzer). Purpose: The purpose of this study was to determine the effect of a preceding phosphoric acid conditioning step on the bonding effectiveness of a two-step self-etch adhesive to ground enamel. Material and methods: The experimental groups were the two-step self-etch adhesive Clearfil SE Bond without etching (non-etch group), Clearfil SE Bond with prior 35% phosphoric acid etching (etch group), and the one-step self-etch adhesive iBond. The three-step etch-and-rinse adhesive Scotchbond Multi-Purpose served as the control group. The facial surface of each bovine incisor was divided cruciformly into four equal parts, which were randomly assigned to the groups. The facial surface of each incisor was ground with 800-grit silicon carbide paper. Each adhesive was applied to the ground enamel according to the manufacturer's instructions, after which the surface was built up with Light-Core (Bisco). After storage in distilled water at 37°C for 1 week, the restored teeth were sectioned into enamel beams approximately 0.8 × 0.8 mm in cross section using a low-speed precision diamond saw (TOPMET Metsaw-LS). After storage in distilled water at 37°C for 1 month or 3 months, the microspecimens were tested for microtensile bond strength. The microtensile bond strength (MPa) was calculated by dividing the force (N) at fracture by the bond area (mm²). 
The failure mode at the interface was determined with a binocular microscope (Nikon). The microtensile bond strength data were statistically analyzed using one-way ANOVA followed by the Least Significant Difference post hoc test at a significance level of 5%. Results: After 1 month of storage, the mean microtensile bond strengths showed no statistically significant differences among the adhesive groups (P>0.05). After 3 months of storage, the bond strength of iBond to ground enamel was not significantly different from that of Clearfil SE Bond etch (P>0.05), while Clearfil SE Bond non-etch and Scotchbond Multi-Purpose showed significantly lower bond strengths (P<0.05), with no significant difference between these two adhesives. Conclusion: In this study, the microtensile bond strength to ground enamel of the two-step self-etch adhesive Clearfil SE Bond was not significantly different from that of the three-step etch-and-rinse adhesive Scotchbond Multi-Purpose, and prior etching with 35% phosphoric acid significantly increased the bonding effectiveness of Clearfil SE Bond to enamel at 3 months.
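The bond strength calculation in the Material and methods is a simple ratio, and since 1 N/mm² equals 1 MPa no unit conversion is needed. A minimal Python sketch, assuming a rectangular beam of the nominal 0.8 × 0.8 mm cross section and a hypothetical fracture force of 16 N (not a value reported in the study):

```python
def microtensile_bond_strength(force_n, width_mm, depth_mm):
    """Bond strength in MPa: fracture force (N) divided by bonded area (mm^2).

    1 N/mm^2 equals 1 MPa, so no unit conversion is needed.
    """
    area_mm2 = width_mm * depth_mm
    if area_mm2 <= 0:
        raise ValueError("bonded area must be positive")
    return force_n / area_mm2


# Nominal 0.8 x 0.8 mm beam with a hypothetical 16 N fracture force.
strength = microtensile_bond_strength(16.0, 0.8, 0.8)
print(f"{strength:.1f} MPa")
```

For this hypothetical input the bonded area is 0.64 mm² and the result is about 25 MPa. In practice each beam's actual cross section is measured, since small deviations from 0.8 mm change the area noticeably.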

Export Control System based on Case Based Reasoning: Design and Evaluation (사례 기반 지능형 수출통제 시스템 : 설계와 평가)

  • Hong, Woneui;Kim, Uihyun;Cho, Sinhee;Kim, Sansung;Yi, Mun Yong;Shin, Donghoon
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.109-131 / 2014
  • As demand for nuclear power plant equipment grows worldwide, the handling of nuclear strategic materials is becoming more important. While the number of export applications for nuclear-power commodities and technologies is increasing dramatically, pre-adjudication (or, simply, prescreening) of strategic materials has so far been done by experts with long experience and extensive field knowledge. However, there is a severe shortage of experts in this domain, and it takes a long time to develop one. Because human experts must manually evaluate all documents submitted for export permission, the current practice for nuclear material exports is neither time-efficient nor cost-effective. To reduce this reliance on costly human experts, our research proposes a new system designed to help field experts make decisions more effectively and efficiently. The proposed system is built on case-based reasoning, which extracts key features from existing cases, compares them with the features of a new case, and derives a solution for the new case by referencing similar cases and their solutions. Our research proposes a framework for case-based reasoning systems, designs a case-based reasoning system for the control of nuclear material exports, and evaluates alternative keyword extraction methods (fully automatic, fully manual, and semi-automatic). The keyword extraction method is an essential component of the case-based reasoning system because it extracts the key features of the cases. The fully automatic method used TF-IDF, a widely used de facto standard for representative keyword extraction in text mining. 
TF (Term Frequency) is based on how often a term occurs within a document and indicates how important the term is within that document, while IDF (Inverse Document Frequency) is based on how rarely the term occurs across the document set and indicates how uniquely the term represents a document. The results show that the semi-automatic approach, which relies on collaboration between machine and human, is the most effective solution regardless of whether the human is a field expert or a student majoring in nuclear engineering. Moreover, we propose a new approach to computing nuclear document similarity, together with a new framework for document analysis. The proposed nuclear document similarity algorithm considers both document-to-document similarity (${\alpha}$) and document-to-nuclear-system similarity (${\beta}$) to derive a final score (${\gamma}$) used to decide whether the presented case involves strategic material. The final score (${\gamma}$) represents the similarity between past cases and the new case. It is derived not only from conventional TF-IDF but also from a nuclear system similarity score, which takes the context of the nuclear system domain into account. Finally, the system retrieves the top-3 documents in the case base that are most similar to the new case and presents them with a credibility score. With the final score and the credibility score, a user can more easily see which documents in the case base are worth consulting and can therefore make a proper decision at relatively lower cost. The system was evaluated by developing a prototype and testing it with field data, and its workflows and outcomes were verified by field experts. 
This research is expected to contribute to the growth of the knowledge service industry by proposing a new system that can effectively reduce the burden of relying on costly human experts for the export control of nuclear materials, and that serves as a meaningful example of a knowledge service application.
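The abstract describes TF-IDF keyword weighting and a final score γ built from α and β, but specifies neither the exact TF-IDF variant nor how α and β are combined. The sketch below is a hedged illustration under stated assumptions: it uses one common TF-IDF variant (TF = count / document length, IDF = log(N / df)), takes α as the cosine similarity of TF-IDF vectors, treats β as a precomputed input, and combines them with a hypothetical weighted sum `weight * alpha + (1 - weight) * beta`. The function names are illustrative, not the authors' code.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Dict[str, float]


def tfidf_vectors(docs: Sequence[Sequence[str]]) -> List[Vector]:
    """One common TF-IDF variant: TF = count / doc length, IDF = log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors: List[Vector] = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def top_keywords(vec: Vector, k: int) -> List[str]:
    """Fully automatic keyword extraction: the k terms with the highest TF-IDF weight."""
    return sorted(vec, key=vec.get, reverse=True)[:k]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_cases(new_vec: Vector, case_vecs: Sequence[Vector],
               system_sims: Sequence[float], weight: float = 0.5,
               k: int = 3) -> List[Tuple[int, float]]:
    """Return the top-k (case index, gamma) pairs.

    alpha = TF-IDF cosine similarity; beta = precomputed document-to-nuclear-system
    similarity (assumed given). The weighted sum below is a HYPOTHETICAL choice.
    """
    scored = []
    for idx, (vec, beta) in enumerate(zip(case_vecs, system_sims)):
        alpha = cosine(new_vec, vec)
        gamma = weight * alpha + (1 - weight) * beta
        scored.append((idx, gamma))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

To use it, vectorize the case base together with the new case (so all vectors share one IDF), pass the last vector as `new_vec`, and supply one β value per stored case; the top-3 list then plays the role of the retrieved similar cases.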

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.219-240
    • /
    • 2018
  • Sentiment analysis, one of the text mining techniques, is a method for extracting subjective content embedded in text documents. Sentiment analysis methods have recently been used widely in many fields. For example, data-driven surveys analyze the subjectivity of text posted by users, and market research analyzes users' review posts to quantify a target product's reputation. The basic method of sentiment analysis uses a sentiment dictionary (or lexicon), a list of sentiment words with positive, neutral, or negative semantics. In general, the meaning of many sentiment words differs across domains. For example, the word 'sad' carries a negative meaning in most fields but not in the movie domain. To perform accurate sentiment analysis, we therefore need to build a sentiment dictionary for the given domain. However, building a domain lexicon this way is time-consuming, and many sentiment words are missed unless a general-purpose sentiment lexicon is used. To address this problem, several studies have constructed domain-specific sentiment lexicons based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer in service, and SentiWordNet does not work well because of language differences when Korean words are converted into English words. These restrictions limit the use of such general-purpose sentiment lexicons as seed data for building a sentiment lexicon for a specific domain. In this article, we construct the 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons. 
The proposed dictionary, a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is designed for quickly constructing the sentiment dictionary for a target domain. In particular, its sentiment vocabulary is constructed by analyzing the glosses in the Standard Korean Language Dictionary (SKLD) through the following procedure. First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each gloss as having either a positive or a negative meaning. Third, positive words and phrases are extracted from the glosses classified as positive, and negative words and phrases from the glosses classified as negative. Our experimental results show that the proposed sentiment classification model achieves an average accuracy of up to 89.45%. In addition, the sentiment dictionary is further extended using various external sources, including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information for frequently used coined words and emoticons that appear mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment entries, each of which is a 1-gram, a 2-gram, a phrase, or a sentence pattern. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend in sentiment analysis is to use deep learning techniques without sentiment dictionaries, and the importance of developing sentiment dictionaries has gradually declined. However, a recent study shows that the words in a sentiment dictionary can be used as features of deep learning models, yielding more accurate sentiment analysis (Teng, Z., 2016). This result indicates that a sentiment dictionary is useful not only for sentiment analysis itself but also as a source of features that improve the accuracy of deep learning models. 
The proposed dictionary can be used as basic data for constructing the sentiment lexicon of a particular domain and as features for deep learning models. It is also useful for automatically and quickly building large training sets for deep learning models.
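The basic dictionary-based method described above can be illustrated with a small scorer that sums the polarity of matching lexicon entries, preferring longer n-grams so that a 2-gram entry is not double-counted as its 1-gram parts. This is a minimal sketch under loud assumptions: `TOY_LEXICON` and its scores are hypothetical (the words echo the abstract's examples, but the values are not from KNU-KSL), and whitespace tokenization stands in for the morphological analysis Korean text would normally need.

```python
from typing import Dict, List

# HYPOTHETICAL toy entries: words echo the abstract's examples, but the
# polarity scores are illustrative and are NOT taken from KNU-KSL.
TOY_LEXICON: Dict[str, int] = {
    "thank you": 2,
    "impressed": 2,
    "worthy": 1,
    "not worthy": -1,
    "sad": -1,
}


def lexicon_score(text: str, lexicon: Dict[str, int], max_n: int = 2) -> int:
    """Sum polarities of lexicon entries in `text`, matching longer n-grams first."""
    # Simplification: whitespace tokenization; Korean would need a morphological analyzer.
    tokens: List[str] = text.lower().split()
    total = 0
    i = 0
    while i < len(tokens):
        for n in range(max_n, 0, -1):  # try the longest n-gram first
            window = tokens[i:i + n]
            if len(window) < n:
                continue
            gram = " ".join(window)
            if gram in lexicon:
                total += lexicon[gram]
                i += n  # consume the matched tokens
                break
        else:
            i += 1  # no entry starts at this token
    return total
```

Longest-match-first is what lets a phrase entry such as "not worthy" override the polarity of its 1-gram component; since KNU-KSL also contains phrases and sentence patterns, a real matcher would need a larger `max_n` or pattern matching.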

Color-related Query Processing for Intelligent E-Commerce Search (지능형 검색엔진을 위한 색상 질의 처리 방안)

  • Hong, Jung A;Koo, Kyo Jung;Cha, Ji Won;Seo, Ah Jeong;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.109-125
    • /
    • 2019
  • As interest in intelligent search engines grows, various studies have been conducted on intelligently extracting and using product-related features. In particular, when users search for goods in e-commerce search engines, the 'color' of a product is an important feature that describes it. Therefore, synonyms of color terms must be handled in order to return accurate results for users' color-related queries. Previous studies have suggested a dictionary-based approach to processing synonyms for color features. However, the dictionary-based approach cannot handle color terms in user queries that are not registered in the dictionary. To overcome this limitation of conventional methods, this research proposes a model that extracts RGB values from an internet search engine in real time and outputs similar color names based on designated color information. First, for basic color search, a color term dictionary was constructed containing color names and the R, G, B values of each color, taken from the Korean color standard digital palette program and the Wikipedia color list. The dictionary was made more robust by adding 138 color names transliterated from English color names into Korean loanwords, together with their RGB values. As a result, the final color dictionary contains a total of 671 color names and their RGB values. The proposed method starts from the specific color term the user searched for. It then checks whether the searched color exists in the built-in color dictionary. If it does, the RGB values stored in the dictionary are used as the reference values of the searched color. If it does not, the top-5 Google image search results for the searched color are crawled, and average RGB values are extracted from a central region of each image. 
Several approaches to extracting RGB values from images were tried, because simply averaging the RGB values of the central region of an image has limitations. Clustering the RGB values within a region of the image and using the average of the cluster with the highest density as the reference values performed best. The reference RGB values of the searched color are then compared with the RGB values of all colors in the previously constructed color dictionary, and a list is created of the colors whose R, G, and B values each lie within ${\pm}50$ of the reference. Finally, based on the Euclidean distance between these candidates and the reference RGB values of the searched color, the most similar colors, up to five, are returned as the final outcome. An experiment was performed to evaluate the usefulness of the proposed method. In the experiment, 300 color names and their corresponding RGB values were obtained through questionnaires and used to compare the RGB values produced by four different methods, including the proposed method. The average Euclidean distance in CIE-Lab space for our method was about 13.85, which is relatively low compared to 30.88 for the method using only a synonym dictionary and 30.38 for the method using the dictionary together with the Korean synonym website WordNet. Without the clustering step, the proposed method showed an average Euclidean distance of 13.88, which implies that the DBSCAN clustering in the proposed method reduces the Euclidean distance. This research suggests a new RGB-based color synonym processing method that combines the dictionary method with real-time synonym processing for new color names. This method overcomes the limitation of the conventional dictionary-based synonym processing approach. 
This research can contribute to improving the intelligence of e-commerce search systems, especially their color search feature.
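The lookup, ±50 per-channel filter, and Euclidean ranking described in this abstract can be sketched as follows. This is an illustration, not the authors' implementation: `COLOR_DICT` uses a few CSS/X11 named-color values instead of the paper's 671-entry dictionary, and `fetch_reference` is a hypothetical callable standing in for the image-crawling step used when a color term is not in the dictionary.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

RGB = Tuple[int, int, int]

# Illustrative entries (CSS/X11 named-color values), NOT the paper's dictionary.
COLOR_DICT: Dict[str, RGB] = {
    "red": (255, 0, 0),
    "crimson": (220, 20, 60),
    "firebrick": (178, 34, 34),
    "maroon": (128, 0, 0),
    "tomato": (255, 99, 71),
    "pink": (255, 192, 203),
    "navy": (0, 0, 128),
}


def similar_colors(reference: RGB, dictionary: Dict[str, RGB],
                   tolerance: int = 50, top_n: int = 5) -> List[Tuple[str, float]]:
    # Step 1: keep colors whose R, G and B each lie within +/- tolerance of the reference.
    candidates = [
        (name, rgb) for name, rgb in dictionary.items()
        if all(abs(c - r) <= tolerance for c, r in zip(rgb, reference))
    ]
    # Step 2: rank candidates by Euclidean distance in RGB space (smaller = more similar).
    ranked = sorted(
        ((name, math.dist(rgb, reference)) for name, rgb in candidates),
        key=lambda item: item[1],
    )
    return ranked[:top_n]


def resolve_color(query: str, dictionary: Dict[str, RGB],
                  fetch_reference: Callable[[str], Optional[RGB]]) -> List[Tuple[str, float]]:
    reference = dictionary.get(query)  # dictionary hit: use the stored RGB values
    if reference is None:
        # HYPOTHETICAL hook for the real-time step (crawl images, cluster, average).
        reference = fetch_reference(query)
    if reference is None:
        return []
    return similar_colors(reference, dictionary)
```

The per-channel box filter is cheap and prunes most of the dictionary before the distance computation; note that a dictionary hit also returns the query color itself at distance 0.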
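The abstract reports that clustering pixel colors with DBSCAN and averaging the densest cluster gave the best reference RGB values. A minimal pure-Python sketch of that idea follows. Several details are assumptions not stated in the source: the pixels are taken as already cropped from the central image region, "highest density" is interpreted as the cluster with the most members, distances are Euclidean in RGB space, and `eps=20.0` and `min_pts=3` are hypothetical parameter values.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

RGB = Tuple[int, int, int]


def dbscan(points: Sequence[RGB], eps: float, min_pts: int) -> List[int]:
    """Plain O(n^2) DBSCAN; returns a cluster id per point, -1 for noise."""
    labels: List[Optional[int]] = [None] * len(points)  # None = unvisited

    def region(i: int) -> List[int]:
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return [label if label is not None else -1 for label in labels]


def reference_rgb(pixels: Sequence[RGB], eps: float = 20.0,
                  min_pts: int = 3) -> Optional[RGB]:
    """Average RGB of the largest DBSCAN cluster (assumed meaning of 'densest')."""
    labels = dbscan(pixels, eps, min_pts)
    sizes = Counter(label for label in labels if label != -1)
    if not sizes:
        return None  # every pixel was noise
    densest, _ = sizes.most_common(1)[0]
    members = [p for p, label in zip(pixels, labels) if label == densest]
    r, g, b = (round(sum(p[k] for p in members) / len(members)) for k in range(3))
    return (r, g, b)
```

Compared with a plain average, outlier pixels (a white background, a differently colored accessory) end up as noise or in smaller clusters and no longer pull the reference color away from the product's dominant color.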
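The evaluation compares methods by the average Euclidean distance in CIE-Lab space (the CIE76 color difference). The sketch below shows that metric using the standard sRGB to CIE-Lab conversion with a D65 white point; the source does not state which white point or RGB profile the authors used, so those are assumptions, and the function names are illustrative.

```python
import math
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]
Lab = Tuple[float, float, float]


def srgb_to_lab(rgb: RGB) -> Lab:
    """8-bit sRGB -> CIE-Lab, assuming the sRGB profile and a D65 white point."""
    def to_linear(c8: int) -> float:
        c = c8 / 255.0  # undo the sRGB transfer curve
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (to_linear(c) for c in rgb)
    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white
    delta = 6 / 29

    def f(t: float) -> float:
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def lab_distance(rgb1: RGB, rgb2: RGB) -> float:
    """Euclidean distance in CIE-Lab (CIE76 delta E)."""
    return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))


def mean_lab_distance(predicted: Sequence[RGB], expected: Sequence[RGB]) -> float:
    """Average delta E between a method's RGB outputs and the surveyed RGB values."""
    return sum(lab_distance(p, e) for p, e in zip(predicted, expected)) / len(expected)
```

Measuring in Lab rather than raw RGB is a reasonable choice for this comparison because Lab distances track perceived color differences more closely than RGB distances do.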