• Title/Summary/Keyword: record system (기록시스템)

Search Results: 2,324, Processing Time: 0.028 seconds

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services
    • /
    • v.14 no.6
    • /
    • pp.71-84
    • /
    • 2013
  • Log data, which record the multitude of information created when operating computer systems, are used in many processes, from computer system inspection and process optimization to customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, in existing computing environments it is difficult to realize flexible storage expansion for a massive amount of unstructured log data and to execute the many functions needed to categorize and analyze the stored data. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for unstructured log data that are difficult to process with the analysis tools and management systems of the existing computing infrastructure. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources, including the ability to flexibly expand resources such as storage space and memory when storage must be extended or log data increase rapidly. Moreover, to overcome the processing limits of existing analysis tools when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data.
Furthermore, because HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions that let the system continue operating after it recovers from a malfunction. Finally, by establishing a distributed database using the NoSQL database MongoDB, the proposed system provides methods of effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data, and their strict schemas make it hard to expand nodes when rapidly growing data must be distributed across various nodes. NoSQL does not provide the complex computations that relational databases may provide, but it can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with a structure appropriate for processing unstructured data. NoSQL data models are usually classified into Key-Value, column-oriented, and document-oriented types. Of these, the representative document-oriented data model, MongoDB, which has a free schema structure, is used in the proposed system. MongoDB is introduced because its flexible schema structure makes it easy to process unstructured log data, it facilitates flexible node expansion when the amount of data is rapidly increasing, and it provides an Auto-Sharding function that automatically expands storage. The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module.
When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies the data according to log type and distributes them to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis of the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. The aggregated log data per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are processed in a parallel-distributed manner by the Hadoop-based analysis module. A comparative evaluation against a log data processing system that uses only MySQL, covering log data insertion and query performance, demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through the log data insert performance evaluation of MongoDB for various chunk sizes.
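The routing step of the log collector module described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the log type names are hypothetical, and the real system would forward the batches to the actual MongoDB and MySQL stores rather than in-memory lists.

```python
# Minimal sketch of the log collector's routing step: records whose type
# requires real-time analysis go to the MySQL path (real-time graphing),
# while per-unit-time aggregated records go to the MongoDB path for later
# Hadoop-based analysis. The type names below are hypothetical examples;
# the paper does not enumerate the banks' log types.

REALTIME_TYPES = {"auth_failure", "transaction_error"}   # assumed examples

def route_log(record):
    """Return the destination store for a single log record."""
    if record.get("type") in REALTIME_TYPES:
        return "mysql"       # real-time analysis / graphing path
    return "mongodb"         # bulk storage and Hadoop analysis path

def collect(records):
    """Partition a batch of records by destination, as the collector would."""
    batches = {"mysql": [], "mongodb": []}
    for rec in records:
        batches[route_log(rec)].append(rec)
    return batches
```

In the proposed system the `mongodb` batch would additionally be sharded automatically by MongoDB's Auto-Sharding, which is why the paper evaluates insert performance across chunk sizes.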

A study on optical coherence tomography system using optical fiber (광섬유를 이용한 광영상 단층촬영기에 관한연구)

  • 양승국;박양하;장원석;오상기;김현덕;김기문
    • Proceedings of the Korean Institute of Navigation and Port Research Conference
    • /
    • 2004.04a
    • /
    • pp.5-9
    • /
    • 2004
  • In this paper, we studied the OCT (Optical Coherence Tomography) system, which has been extensively studied because of its advantages such as high-resolution cross-sectional images, low cost, and small size. The basic principle of an OCT system is the Michelson interferometer. The characteristics of the light source determine the resolution and the transmission depth. As the light source, a commercial SLD with a central wavelength of 1,285 nm and an FWHM (Full Width at Half Maximum) of 35.3 nm was used. The optical delay line is needed to match the optical path length with that of the light scattered or reflected from the sample. In order to equalize the optical path lengths, the stage to which the reference mirror is attached is moved linearly by a step motor. The interferometer is configured as a Michelson interferometer using single-mode fiber, and the scanner can focus on the sample through the reference arm. The 2-dimensional cross-sectional images were measured by scanning the transverse direction of the sample with the step motor: after detecting the internal signal in the lateral direction at a point of the sample, the scanner is moved by the step motor to obtain the 2-dimensional cross-sectional image. A photodiode was used which has high detection sensitivity, excellent noise characteristics, and a dynamic range from 800 nm to 1,700 nm. The small interference signal, mixed with high-frequency noise, is detected; after filtering and amplifying this signal, only the envelope of the interference signal is extracted. The cross-sectional image is then produced by digitizing this signal with an A/D converter. The resolution of the OCT system is about 30 μm, which corresponds to the theoretical resolution. Also, a cross-sectional image of a ping-pong ball was measured.
The OCT system configured with a Michelson interferometer has low contrast because the power of the fed-back interference light is reduced. This problem can be overcome by using an improved interferometer. Also, in order to obtain a cross-sectional image within a short time, the measurement time of the optical delay line needs to be reduced.

The status, classification and data characteristics of Seonsaengan(先生案, The predecessor's lists) in Jangseogak(藏書閣, Joseon dynasty royal library) (장서각 소장 선생안(先生案)의 현황과 사료적 가치)

  • Yi, Nam-ok
    • (The)Study of the Eastern Classic
    • /
    • no.69
    • /
    • pp.9-44
    • /
    • 2017
  • Seonsaengan (先生案) are the predecessors' lists. Each list includes the names of the predecessors, the date of appointment, the date of return, the previous post, and the next post. Accordingly, previous studies used them to research local recruitment and the Jungin (中人), topics that cannot be traced in the general personnel records of the Joseon dynasty. However, a survey and classification of the lists has not yet been carried out, so this study aims to clarify their status, classification, and data characteristics. A total of 176 books of Joseon-dynasty predecessor lists remain to this day: in Jangseogak (47 cases), Kyujanggak (80 cases), the National Library of Korea (24 cases), and other collections (25 cases). Jangseogak holds lists of royal government officials, Kyujanggak holds lists of central government officials, and the National Library of Korea and other collections hold lists of local government officials. This paper, however, focuses on the accessible 47 Jangseogak lists. As mentioned earlier, the Jangseogak lists generally relate to royal government officials; by office, they comprise 18 for central government officials, 5 for local government officials, and 24 for royal government officials. Classified by contents, they comprise 6 for ritual and diplomatic offices, 12 for royal government offices, 5 for local government offices, 14 for royal tomb offices, and 10 for royal education offices. From the information on the lists, the following six characteristics can be summarized. First, the basic personal information of the recorded persons can be found. Second, the period in office and the reasons for leaving office can be known. Third, changes in the office system can be confirmed. Fourth, one aspect of the personnel administration system of the Joseon dynasty can be examined through the previous and next posts.
Fifth, it is possible to know the days that were particularly important for each office. Sixth, the contents of work evaluations can be confirmed. These show the reality of the Joseon dynasty, which differs from what is recorded in the Code, and through this the personnel administration system of the Joseon dynasty can be examined. However, a precise review requires building a database of all 176 lists. In addition, if the data are analyzed in connection with existing genealogy data, it will be possible to establish a basis for understanding the personnel administration system of the Joseon dynasty.

A Study on Developing a VKOSPI Forecasting Model via GARCH Class Models for Intelligent Volatility Trading Systems (지능형 변동성트레이딩시스템개발을 위한 GARCH 모형을 통한 VKOSPI 예측모형 개발에 관한 연구)

  • Kim, Sun-Woong
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.2
    • /
    • pp.19-32
    • /
    • 2010
  • Volatility plays a central role in both academic and practical applications, especially in pricing financial derivative products and trading volatility strategies. This study presents a novel mechanism based on generalized autoregressive conditional heteroskedasticity (GARCH) models that is able to enhance the performance of intelligent volatility trading systems by predicting Korean stock market volatility more accurately. In particular, we embedded the concept of volatility asymmetry, documented widely in the literature, into our model. The newly developed Korean stock market volatility index of the KOSPI 200, the VKOSPI, is used as a volatility proxy. It is the price of a linear portfolio of KOSPI 200 index options and measures the effect of the expectations of dealers and option traders on stock market volatility over 30 calendar days. The KOSPI 200 index options market started in 1997 and has become the most actively traded market in the world; its trading volume of more than 10 million contracts a day is the highest of all stock index options markets. Therefore, analyzing the VKOSPI has great importance in understanding the volatility inherent in option prices and can afford trading ideas for futures and options dealers. Using the VKOSPI as a volatility proxy avoids the statistical estimation problems associated with other measures of volatility, since the VKOSPI is the model-free expected volatility of market participants calculated directly from transacted option prices. This study estimates symmetric and asymmetric GARCH models for the KOSPI 200 index from January 2003 to December 2006 by the maximum likelihood procedure. The asymmetric GARCH models include the GJR-GARCH model of Glosten, Jagannathan and Runkle, the exponential GARCH model of Nelson, and the power autoregressive conditional heteroskedasticity (ARCH) model of Ding, Granger and Engle. The symmetric model is the basic GARCH(1,1).
Tomorrow's forecasted value and change direction of stock market volatility are obtained by recursive GARCH specifications from January 2007 to December 2009 and are compared with the VKOSPI. Empirical results indicate that negative unanticipated returns increase volatility more than positive return shocks of equal magnitude decrease it, indicating the existence of volatility asymmetry in the Korean stock market. The point value and change direction of tomorrow's VKOSPI are estimated and forecasted by the GARCH models. A volatility trading system is developed using the forecasted change direction of the VKOSPI: if tomorrow's VKOSPI is expected to rise, a long straddle or strangle position is established, and a short straddle or strangle position is taken if the VKOSPI is expected to fall. Total profit is calculated as the cumulative sum of the VKOSPI percentage changes: if the forecasted direction is correct, the absolute value of the VKOSPI percentage change is added to the trading profit; otherwise it is subtracted. For the in-sample period, the power ARCH model fits best on the statistical metric of Mean Squared Prediction Error (MSPE), and the exponential GARCH model shows the highest Mean Correct Prediction (MCP). The power ARCH model also fits best for the out-of-sample period and provides the highest probability of predicting tomorrow's VKOSPI change direction. Generally, the power ARCH model shows the best fit for the VKOSPI. All the GARCH models generate trading profits for the volatility trading system, and the exponential GARCH model shows the best performance, an annual profit of 197.56%, during the in-sample period. The GARCH models also generate trading profits during the out-of-sample period except for the exponential GARCH model; there, the power ARCH model shows the largest annual trading profit of 38%.
The volatility clustering and asymmetry found in this research reflect volatility non-linearity. This further suggests that combining the asymmetric GARCH models with artificial neural networks could significantly enhance the performance of the suggested volatility trading system, since artificial neural networks have been shown to model nonlinear relationships effectively.
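The baseline model and trading rule described in this abstract can be sketched briefly. This is an illustrative toy, not the paper's estimated model: the GARCH(1,1) parameter values below are placeholders, not the authors' maximum-likelihood estimates, and the real system forecasts the VKOSPI rather than a raw return variance.

```python
# Sketch of the symmetric GARCH(1,1) variance recursion used as the
# baseline, and the straddle/strangle direction rule the trading system
# applies to the volatility forecast. Parameters are illustrative only.

def garch11_forecast(returns, omega=1e-5, alpha=0.08, beta=0.90):
    """One-step-ahead conditional variance:
    s2_{t+1} = omega + alpha * r_t**2 + beta * s2_t."""
    s2 = omega / (1.0 - alpha - beta)        # start at unconditional variance
    for r in returns:
        s2 = omega + alpha * r * r + beta * s2
    return s2

def straddle_signal(vol_forecast, vol_today):
    """Long straddle/strangle if volatility is expected to rise, else short."""
    return "long" if vol_forecast > vol_today else "short"
```

The asymmetric variants in the study (GJR-GARCH, EGARCH, power ARCH) replace the `alpha * r * r` term with specifications that let negative shocks raise variance more than positive shocks of equal size.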

Index-based Searching on Timestamped Event Sequences (타임스탬프를 갖는 이벤트 시퀀스의 인덱스 기반 검색)

  • 박상현;원정임;윤지희;김상욱
    • Journal of KIISE:Databases
    • /
    • v.31 no.5
    • /
    • pp.468-478
    • /
    • 2004
  • It is essential in various application areas such as data mining and bioinformatics to effectively retrieve the occurrences of interesting patterns from sequence databases. For example, consider a network event management system that records the types and timestamp values of events occurring in a specific network component (e.g., a router). A typical query to find temporal causal relationships among network events is as follows: 'Find all occurrences of CiscoDCDLinkUp that are followed by MLMStatusUP and subsequently by TCPConnectionClose, under the constraint that the interval between the first two events is not larger than 20 seconds and the interval between the first and third events is not larger than 40 seconds.' This paper proposes an indexing method that enables such queries to be answered efficiently. Unlike previous methods that rely on inefficient sequential scans or on data structures not easily supported by DBMSs, the proposed method uses a multi-dimensional spatial index, proven to be efficient in both storage and search, to find the answers quickly without false dismissals. Given a sliding window W, the input to the multi-dimensional spatial index is an n-dimensional vector whose i-th element is the interval between the first event of W and the first occurrence of the event type Ei in W. Here, n is the number of event types that can occur in the system of interest. The problem of the 'dimensionality curse' may arise when n is large; therefore, we use dimension selection or event type grouping to avoid it. The experimental results reveal that the proposed technique can be a few orders of magnitude faster than the sequential scan and ISO-Depth index methods.
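The vector construction the abstract describes can be sketched directly. This is a minimal sketch under one stated assumption: the paper does not say how event types absent from the window are encoded, so a `None` sentinel is used here as a stand-in.

```python
# Sketch of the index input: for a sliding window W of (event_type, ts)
# pairs sorted by timestamp, the i-th coordinate is the offset of the first
# occurrence of type E_i from the first event of W. Types absent from the
# window get None; the paper's exact encoding of absent types is not
# stated, so that detail is an assumption.

def window_vector(window, event_types):
    """window: list of (etype, ts) sorted by ts; returns offsets per type."""
    t0 = window[0][1]
    firsts = {}
    for etype, ts in window:
        if etype not in firsts:
            firsts[etype] = ts - t0   # offset of first occurrence only
    return [firsts.get(e) for e in event_types]
```

A query such as the one above then becomes a range search over these vectors: e.g., the coordinate for MLMStatusUP must lie in [0, 20] and the coordinate for TCPConnectionClose in [0, 40].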

Extraction of Primary Factors Influencing Dam Operation Using Factor Analysis (요인분석 통계기법을 이용한 댐 운영에 대한 영향 요인 추출)

  • Kang, Min-Goo;Jung, Chan-Yong;Lee, Gwang-Man
    • Journal of Korea Water Resources Association
    • /
    • v.40 no.10
    • /
    • pp.769-781
    • /
    • 2007
  • Factor analysis is usually employed to reduce the quantity of data and summarize information about a system or phenomenon. In this methodology, variables are grouped into several factors by consideration of their statistical characteristics, and the results are used to drop variables that carry less weight than others. In this study, factor analysis was applied to extract the primary factors influencing multi-dam system operation in the Han River basin, where two multi-purpose dams, Soyanggang Dam and Chungju Dam, supply water in an integrated manner during the water use season. To carry out the factor analysis, the variables related to the operation of the two dams were first gathered and divided into five groups (Soyanggang Dam: inflow, hydropower production, storage management, storage, and past operation results; Chungju Dam: inflow, hydropower production, water demand, storage, and past operation results). Then, considering their statistical properties, some of the gathered variables were chosen and grouped into five factors: hydrological condition, past dam operation, dam operation in the normal season, water demand, and downstream dam operation. To check the appropriateness and applicability of the factors, a multiple regression equation was constructed using the factors as explanatory variables, and the factors were compared with the terms of the objective function used in optimally operating water resources in a river basin. The results of these two checks showed that the suggested approach provides satisfactory results, and the extracted primary factors are expected to be useful for making dam operation schedules that consider future situations as well as previous results.
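The grouping idea behind the abstract can be illustrated crudely. Real factor analysis extracts loadings by eigendecomposition of the correlation matrix; the greedy correlation grouping below is only a hypothetical stand-in to show how strongly correlated dam-operation variables collapse into one candidate factor, and the variable names are invented for illustration.

```python
# Crude stand-in for the variable-grouping step: compute pairwise Pearson
# correlations among operation variables and group variables whose
# correlation magnitude with a group's representative exceeds a threshold.
# This is NOT factor analysis proper, just an illustration of the idea.
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def group_variables(data, threshold=0.8):
    """data: {name: series}; returns a list of groups (candidate factors)."""
    groups = []
    for name, series in data.items():
        for g in groups:
            rep = data[g[0]]          # compare with the group's first member
            if abs(pearson(series, rep)) >= threshold:
                g.append(name)
                break
        else:
            groups.append([name])
    return groups
```

The magnitude test (`abs(...)`) mirrors the fact that factor loadings of either sign indicate association with the same factor.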

Development trend of the mushroom industry (버섯 산업의 발달 동향)

  • Yoo, Young Bok;Oh, Min Ji;Oh, Youn Lee;Shin, Pyung Gyun;Jang, Kab Yeul;Kong, Won Sik
    • Journal of Mushroom
    • /
    • v.14 no.4
    • /
    • pp.142-154
    • /
    • 2016
  • Worldwide production of mushrooms has been increasing by 10-20% every year. Recently, Pleurotus eryngii and P. nebrodensis have become popular mushroom species for cultivation. In particular, China's output exceeded 8.7 million tons in 2002, which accounted for 71.5% of total world output. A similar trend was also observed in Korea. Two kinds of mushrooms, Gumji (金芝; Ganoderma) and Seoji, are described in the ancient book 'Samguksagi' (History of the Three Kingdoms, covering 57 B.C.~A.D. 668, written by Bu Sik Kim in 1145 during the Goryeo dynasty). Many kinds of mushrooms are also described in more than 17 ancient books from the Joseon dynasty (1392~1910) in Korea. Approximately 200 commercial strains of 38 species of mushrooms were developed and distributed to cultivators. The somatic hybrid variety of oyster mushroom, 'Wonhyeong-neutari,' was developed by protoplast fusion and distributed to growers in 1989. The production of mushrooms as food was 199,829 metric tons in 2015, valued at 850 billion Korean Won (one trillion Won if mushroom factory products are included). In Korea, the major cultivated species are P. ostreatus, P. eryngii, Flammulina velutipes, Lentinula edodes, Agaricus bisporus, and Ganoderma lucidum, which account for 90% of the total production. Since mushroom export began in 1960, the export and import of mushrooms have increased in Korea. Technology was developed for liquid spawn production, and automatic cultivation systems reduced production costs, resulting in an increase in mushroom exports. However, some species are still imported owing to the high cost of cultivating them effectively at home. In academia, RDA scientists have conducted mushroom genome projects since 1997. One of the main outcomes is the whole-genome sequencing of Flammulina velutipes for molecular breeding. With regard to medicinal mushrooms, we have been conducting genome research on Cordyceps and its related species for developing functional foods.
There are various kinds of beneficial substances in mushrooms, and mushroom products, including pharmaceuticals, tonics, healthy beverages, functional biotransformants, and processed foods, have also become available on the market. In addition, compost and feed can likewise be made from mushroom substrates after harvest.

The Comparison of the Solar Radiation and the Mean Radiant Temperature (MRT) under the Shade of Landscaping Trees in Summertime (하절기 조경용 녹음수 수관 하부의 일사와 평균복사온도 비교)

  • Lee, Chun-Seok;Ryu, Nam-Hyung
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.42 no.5
    • /
    • pp.22-30
    • /
    • 2014
  • The purpose of this study was to compare the Solar Radiation(SR) and the Mean Radiant Temperature(MRT) under the shades of the three landscaping trees in clear summer daytimes. The trees were Lagerstroemia indica, Quercus palustris and Ulmus parvifolia. The solar radiation, the globe temperature and the air temperature were recorded every minute from the $1^{st}$ of April to the $30^{th}$ of September 2013 at a height of 1.1m above on the four monitoring stations, with four same measuring system consisting of a solar radiation sensor, two resistance temperature detectors(Pt-100), a black brass globe (${\phi}50mm$) and data acquisition systems. At the same time, the sky view photos were taken automatically hourly by three scouting cameras(lens angle: $60^{\circ}$) fixed at each monitoring station. Based on the 258 daily sky view photos and 6,640 records of middays(10 A.M.~2 P.M.) from the $1^{st}$ of June to the $30^{th}$ of August, the time serial differences of SR and MRT under the trees were analysed and compared with those of open sky, The major findings were as follows; 1. The average ratio of sky views screened by the canopies of Quercus palustris, Lagerstroemia indica and Ulmus parvifolia were 99%, 98% and 97%, and the SR were $106W/m^2$, $163W/m^2$ and $202W/m^2$ respectively, while the SR of open sky was $823W/m^2$. Which shows the canopies blocked at least 70% of natural SR. 2. The average MRT under the canopies of Quercus palustris, Lagerstroemia indica and Ulmus parvifolia were $30.34^{\circ}C$, $33.34^{\circ}C$ and $34.77^{\circ}C$ respectively, while that of open sky was $46.0^{\circ}C$. Therefore, it can be said that the tree canopies can reduce the MRT around $10{\sim}16^{\circ}C$. 3. The regression test showed significant linear relationship between the SR and MRT. In summary, the performances of the landscaping shade trees were very good at screening the SR and reducing the MRT at the outdoor of summer middays. 
Therefore, it can be apparently said that the more shade trees or forest at the outdoor, the more effective in conditioning the outdoor space reducing the MRT and the useless SR for human activities in summertime.
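The study derives MRT from globe and air temperature measurements. A common way to do this for a small globe is the ISO 7726 forced-convection formula below; the abstract does not spell out the authors' exact computation, so treat this as the standard textbook calculation rather than their implementation, with the air speed, emissivity, and globe diameter defaults as assumptions.

```python
# Standard ISO 7726 forced-convection estimate of mean radiant temperature
# from a globe thermometer: the globe's radiative balance is corrected for
# convective exchange with the surrounding air. Defaults (v, emissivity,
# 50 mm diameter matching the study's globe) are assumptions.
import math

def mrt_from_globe(tg, ta, v=1.0, emissivity=0.95, diameter=0.05):
    """Mean radiant temperature (deg C) from globe temperature tg and air
    temperature ta (deg C), air speed v (m/s), globe emissivity, and globe
    diameter (m)."""
    term = (tg + 273) ** 4 \
        + (1.1e8 * v ** 0.6) / (emissivity * diameter ** 0.4) * (tg - ta)
    return term ** 0.25 - 273
```

When the globe and air temperatures are equal there is no convective correction and MRT equals the globe temperature; a globe warmer than the air implies an MRT above the globe reading, which is why open-sky MRT (46.0°C) far exceeds the air temperature at midday.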

Analysis of the Annual Earnings used as the Sire Evaluation Criteria in Home-produced Thoroughbred Racehorses (국내산 더러브렛 경주마의 씨수말 평가 기준으로 이용되는 연간수득상금 분석)

  • Lee, Do-Hyeong;Kong, Hong-Sik;Lee, Hak-Kyo;Park, Kyung-Do;Cho, Byung-Wook;Choy, Yun-Ho;Jeon, Byeong-Soon;Cho, Kwang-Hyun;Sin, Young-Soo
    • Journal of Animal Science and Technology
    • /
    • v.53 no.4
    • /
    • pp.319-324
    • /
    • 2011
  • This study was conducted to analyze the demerits of the sire evaluation system based on annual earnings and to examine the relationship between annual earnings and finish time in home-produced Thoroughbred racehorses. The average number of progenies and number of starts per sire were 34 head and 221 starts, respectively; for 2-year-olds, however, the averages were 9 head and 25 starts. The earnings of 2-year-old horses accounted for 8.3% of annual earnings. The simple correlation coefficients of annual earnings with the number of progenies and with the number of starts were 0.922 and 0.934, respectively, and the correlation between the number of progenies and the number of starts was very high (0.985). The number of progenies and starts of sires in the first year of their test career were very low (6 head and 17 starts), and there was a very close relationship between the number of progenies and annual earnings by year of test career. The number of progenies exceeded 40 head during the first 4 years of the test career, and the average earning index increased as the number of progenies increased. The average earning index of sires with fewer than 30 progenies was lower than 1.00, and when the number of progenies was less than 10, the average earning index was in the range of 0.06~0.13, indicating that the number of progenies strongly affects the ranking of sires. The correlation coefficient between the breeding value for finish time and annual earnings per start was very high (-0.524~-0.633) compared with other traits.

User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

  • Kim, Jieun;Kim, Namgyu;Cho, Yoonho
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.93-107
    • /
    • 2014
  • In this paper, we report what we have observed with regard to user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of data collection by companies about customer needs. Most companies have failed to uncover such needs for products or services properly in terms of demographic data such as age, income levels, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with appropriate business information for current business circumstances. However, part of the problem is the increasing regulation of personal data gathering and privacy. This makes demographic or transaction data collection more difficult, and is a significant hurdle for traditional recommendation approaches because these systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper to academia is our strong belief, and evidence, that most customers' requirements for products can be effectively and efficiently analyzed from unstructured textual data such as Internet news text. In order to derive users' requirements from textual data obtained online, the proposed approach in this paper attempts to construct double two-mode networks, such as a user-news network and news-issue network, and to integrate these into one quasi-network as the input for issue clustering. One of the contributions of this research is the development of a methodology utilizing enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis. In order to build multi-layered two-mode networks of news logs, we need some tools such as text mining and topic analysis. We used not only SAS Enterprise Miner 12.1, which provides a text miner module and cluster module for textual data analysis, but also NetMiner 4 for network visualization and analysis. 
Our approach to user-perspective issue clustering is composed of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites with a crawler. After gathering the unstructured news article data, the topic analysis phase extracts issues from each news article in order to build an article-issue network. For simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed from access patterns derived from web transaction logs. The double two-mode networks are then merged into a user-issue quasi-network. Finally, in the user-oriented issue-clustering phase, we classify issues through structural equivalence and compare the result with the clustering results from statistical tools and network analysis. An experiment with a large dataset was performed to build the multi-layer two-mode network, after which we compared the issue-clustering results from SAS with those of the network analysis. The experimental dataset came from a web site ranking service and the biggest portal site in Korea; the sample contains 150 million transaction logs and 13,652 news articles for 5,000 panels over one year. User-article and article-issue networks are constructed and merged into a user-issue quasi-network using NetMiner. Our issue clustering applied the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), and its results are consistent with the results of SAS clustering. In spite of extensive efforts to provide user information through recommendation systems, most projects succeed only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support for decision-making in companies because it derives user-related data from unstructured textual data.
To overcome the insufficient-data problem of traditional approaches, our methodology infers customers' real interests from web transaction logs. In addition, we suggest topic analysis and issue clustering as a practical means of issue identification.
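The PAM step the abstract names can be sketched compactly. This is a minimal greedy k-medoids sketch, not the SAS or NetMiner implementation: it takes plain points and a distance function, whereas the paper clusters issues by structural equivalence in the user-issue quasi-network, which is not reproduced here.

```python
# Minimal sketch of Partitioning Around Medoids (PAM): repeatedly try
# swapping a medoid for a non-medoid point and keep the swap whenever it
# lowers the total distance of all points to their nearest medoid.

def total_cost(points, medoids, dist):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam(points, k, dist, max_iter=100):
    medoids = list(points[:k])                 # naive initialisation
    best = total_cost(points, medoids, dist)
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for p in points:
                if p in medoids:
                    continue
                trial = medoids[:i] + [p] + medoids[i + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:                # keep any improving swap
                    medoids, best, improved = trial, cost, True
        if not improved:
            break
    # assign each point to its nearest medoid
    return {p: min(medoids, key=lambda m: dist(p, m)) for p in points}
```

PAM is preferred over k-means here because it only needs pairwise distances (such as structural-equivalence distances between issues in a network), not coordinate averages.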