• Title/Summary/Keyword: 데이터 처리시스템 (data processing systems)


A Study on Automatic Classification Model of Documents Based on Korean Standard Industrial Classification (한국표준산업분류를 기준으로 한 문서의 자동 분류 모델에 관한 연구)

  • Lee, Jae-Seong;Jun, Seung-Pyo;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.221-241 / 2018
  • As we enter the knowledge society, the importance of information as a new form of capital is being emphasized, and the importance of information classification is growing for efficient management of the digital information produced at an exponential rate. In this study, we sought to automatically classify documents and provide tailored information that can help companies make technology commercialization decisions. We propose a method to classify information based on the Korean Standard Industrial Classification (KSIC), which indicates the business characteristics of enterprises. Classification of information or documents has largely relied on machine learning, but there is not enough training data categorized on the basis of KSIC, so this study applied a method of calculating similarity between documents. Specifically, we propose a method and a model for presenting the most appropriate KSIC code by collecting the explanatory text of each KSIC code and computing its similarity to the document being classified using the vector space model. To verify the methodology, patent data carrying IPC codes were collected, classified by KSIC, and compared against the KSIC-IPC concordance table provided by the Korean Intellectual Property Office. The highest agreement was obtained when the LT method, a variant of the TF-IDF weighting formula, was applied: the top-ranked KSIC code matched in 53% of cases, and the cumulative match rate within the top five ranks was 76%. This confirms that the technology, industry, and market information SMEs need can be classified by KSIC quantitatively and objectively. The methods and results of this study can also serve as basic data to support the qualitative judgment of experts when building concordance tables between heterogeneous classification systems.
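A minimal sketch of the ranking step described above, assuming pre-tokenized inputs: KSIC code descriptions and the target document are turned into TF-IDF vectors using an LT-style weighting (logarithmic term frequency times inverse document frequency), and the codes are ranked by cosine similarity. The function and variable names are illustrative, not from the paper.

```python
import math
from collections import Counter

def lt_vector(tokens, idf):
    """LT weighting: (1 + log tf) * idf for each term in the document."""
    tf = Counter(tokens)
    return {t: (1 + math.log(f)) * idf.get(t, 0.0) for t, f in tf.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_ksic(target_tokens, ksic_tokens_by_code):
    """Rank KSIC codes by similarity of their explanatory text to the target."""
    n = len(ksic_tokens_by_code)
    df = Counter(t for toks in ksic_tokens_by_code.values() for t in set(toks))
    idf = {t: math.log(n / f) for t, f in df.items()}
    target = lt_vector(target_tokens, idf)
    scores = {code: cosine(target, lt_vector(toks, idf))
              for code, toks in ksic_tokens_by_code.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Usage (hypothetical tokenized inputs):
# top5 = rank_ksic(tokenize(doc), {c: tokenize(d) for c, d in ksic_texts.items()})[:5]
```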

An Analysis into the Characteristics of the High-pass Transportation Data and Information Processing Measures on Urban Roads (도시부도로에서의 하이패스 교통자료 특성분석 및 정보가공방안)

  • Jung, Min-Chul;Kim, Young-Chan;Kim, Dong-Hyo
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.10 no.6 / pp.74-83 / 2011
  • The high-pass transportation information system collects section information directly from probe cars and can therefore offer more reliable information to drivers. However, the running conditions and features of the probe cars and the statistical processing methods affect the reliability of the information, and because section travel time on urban roads depends heavily on whether vehicles are delayed by signals, the individual probe records can deviate widely from one another. Research from multiple directions is therefore needed to improve the credibility of the section information. Previous studies on high-pass information provision, however, have dealt with highway sections carrying continuous flow, and their results are of limited use on urban roads, which carry interrupted flow. This research therefore analyzes the characteristics of high-pass transportation data on urban roads and seeks a suitable processing method. When the high-pass data collected from roadside equipment (RSE) on urban roads were analyzed with a time-space diagram, they showed a regular pattern, governed by arriving vehicles' signal waits, with a period equal to the signal cycle of the downstream node. The number and duration of signal stops produced deviation among the collected records, and this deviation was larger under congestion, although the analysis showed that the increased number of signal stops under congestion partially offset it. The analysis indicates that the mean of the collected high-pass data is an appropriate representative value on urban roads, since it reflects the traffic characteristics imposed by signal waits, and that the criteria for judging delay and congestion should be adjusted to the characteristics of the signals and roads. The results of this research are expected to serve as a foundation for improving the reliability of high-pass information on urban roads.
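As a rough illustration of the processing method the study favors, the sketch below groups individual high-pass probe travel times into collection intervals and takes the mean of each interval as the representative section travel time, so that the deviation caused by signal waits is retained rather than trimmed away. The record layout is an assumption, not the RSE data format.

```python
from statistics import mean

def representative_travel_time(records, interval_sec=300):
    """records: iterable of (epoch_seconds, travel_time_sec) probe tuples.

    Returns {interval_start: mean_travel_time} per collection interval.
    """
    buckets = {}
    for ts, tt in records:
        buckets.setdefault(ts // interval_sec, []).append(tt)
    # The mean per interval keeps the signal-delay deviation in the estimate.
    return {k * interval_sec: mean(v) for k, v in sorted(buckets.items())}
```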

Analysis of Optimal Pathways for Terrestrial LiDAR Scanning for the Establishment of Digital Inventory of Forest Resources (디지털 산림자원정보 구축을 위한 최적의 지상LiDAR 스캔 경로 분석)

  • Ko, Chi-Ung;Yim, Jong-Su;Kim, Dong-Geun;Kang, Jin-Taek
    • Korean Journal of Remote Sensing / v.37 no.2 / pp.245-256 / 2021
  • This study was conducted to identify the applicability of a LiDAR sensor to forest resource inventories by comparing tree position, height, and DBH data obtained by the sensor with those from existing forest inventory methods, for stands of Cryptomeria japonica in the Jeolmul forest in Jeju, South Korea. To this end, a backpack personal LiDAR (Greenvalley International, Model D50) was employed. To streamline data collection, the scanning pathways were divided into seven patterns, considering sample-plot density and work efficiency, and the accuracy of estimating the variables of each tree was then assessed. The amount of time spent acquiring and processing the data by each method was compared to evaluate efficiency. The findings showed that the LiDAR detected standing trees at a rate of 100%. High statistical accuracy relative to the conventional inventory was also observed for both Pattern 5 (DBH: RMSE 1.07 cm, bias -0.79 cm; height: RMSE 0.95 m, bias -3.2 m) and Pattern 7 (DBH: RMSE 1.18 cm, bias -0.82 cm; height: RMSE 1.13 m, bias -2.62 m). In terms of time, 115 to 135 minutes per hectare were needed to acquire and process the data with the LiDAR, versus 375 to 1,115 minutes with the existing method, demonstrating the device's higher efficiency. It can thus be concluded that a backpack personal LiDAR helps increase the efficiency of a forest resources inventory in a planted coniferous forest with understory vegetation, and further research in a variety of forests is warranted.
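For reference, accuracy figures like those quoted above are conventionally computed as the RMSE and bias of the LiDAR-derived estimates against paired field measurements, as in this small sketch (the pairing of inputs is assumed):

```python
import math

def rmse(estimates, references):
    """Root-mean-square error of paired estimates vs. reference values."""
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimates, references))
                     / len(estimates))

def bias(estimates, references):
    """Mean signed error; negative means the sensor underestimates."""
    return sum(e - r for e, r in zip(estimates, references)) / len(estimates)

# e.g., rmse(lidar_dbh_cm, field_dbh_cm) would yield ~1.07 for Pattern 5
```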

Observation of Methane Flux in Rice Paddies Using a Portable Gas Analyzer and an Automatic Opening/Closing Chamber (휴대용 기체분석기와 자동 개폐 챔버를 활용한 벼논에서의 메탄 플럭스 관측)

  • Sung-Won Choi;Minseok Kang;Jongho Kim;Seungwon Sohn;Sungsik Cho;Juhan Park
    • Korean Journal of Agricultural and Forest Meteorology / v.25 no.4 / pp.436-445 / 2023
  • Methane (CH4) emissions from rice paddies are mainly observed using the closed chamber method or the eddy covariance method. In this study, a new observation technique combining a portable gas analyzer (Model LI-7810, LI-COR, Inc., USA) and an automatic opening/closing chamber (Model Smart Chamber, LI-COR, Inc., USA) was introduced based on the strengths and weaknesses of the existing measurement methods. A cylindrical collar was manufactured according to the maximum growth height of rice and used as an auxiliary measurement tool. All types of measured data can be monitored in real time, and CH4 flux is also calculated simultaneously during the measurement. After the measurement is completed, all the related data can be checked using the software called 'SoilFluxPro'. The biggest advantage of the new observation technique is that time-series changes in greenhouse gas concentrations can be immediately confirmed in the field. It can also be applied to small areas with various treatment conditions, and it is simpler to use and requires less effort for installation and maintenance than the eddy covariance system. However, there are also disadvantages in that the observation system is still expensive, requires specialized knowledge to operate, and requires a lot of manpower to install multiple collars in various observation areas and travel around them to take measurements. It is expected that the new observation technique can make a significant contribution to understanding the CH4 emission pathways from rice paddies and quantifying the emissions from those pathways.
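A minimal sketch of the conventional closed-chamber flux calculation underlying measurements like these: fit a line to the CH4 concentration time series recorded while the chamber is closed, then scale the slope by the chamber geometry and the molar density of air from the ideal gas law. This is the textbook method, not necessarily the exact algorithm used by the LI-COR instruments or the SoilFluxPro software.

```python
R = 8.314  # ideal gas constant, J mol^-1 K^-1

def chamber_flux(times_s, ch4_ppb, volume_m3, area_m2, pressure_pa, temp_k):
    """CH4 flux in nmol m^-2 s^-1 from a closed-chamber concentration series."""
    n = len(times_s)
    mt, mc = sum(times_s) / n, sum(ch4_ppb) / n
    # Ordinary least-squares slope of concentration vs. time, in ppb s^-1.
    slope = (sum((t - mt) * (c - mc) for t, c in zip(times_s, ch4_ppb))
             / sum((t - mt) ** 2 for t in times_s))
    air_molar_density = pressure_pa / (R * temp_k)  # mol m^-3
    # ppb = nmol/mol, so the result comes out in nmol m^-2 s^-1.
    return slope * air_molar_density * volume_m3 / area_m2
```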

Customer Behavior Prediction of Binary Classification Model Using Unstructured Information and Convolution Neural Network: The Case of Online Storefront (비정형 정보와 CNN 기법을 활용한 이진 분류 모델의 고객 행태 예측: 전자상거래 사례를 중심으로)

  • Kim, Seungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.221-241 / 2018
  • Deep learning has been getting attention recently. The deep learning technique applied in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and in AlphaGo is the convolutional neural network (CNN). A CNN divides the input image into small sections, recognizes their local features, and combines them to recognize the image as a whole. Deep learning technologies are expected to bring many changes to our lives, but until now their applications have been concentrated in image recognition and natural language processing, and the use of deep learning for business problems is still at an early research stage. If their performance is proved, they can be applied to traditional business problems such as marketing response prediction, fraud detection, and bankruptcy prediction. It is therefore a meaningful experiment to diagnose the possibility of solving business problems with deep learning, based on the case of online shopping companies, which hold big data, allow customer behavior to be identified relatively easily, and offer high utilization value. In online shopping companies especially, the competitive environment is changing rapidly and becoming more intense, so analyzing customer behavior to maximize profit is increasingly important. In this study, we propose a heterogeneous information integration CNN model to improve the prediction of customer behavior in online shopping enterprises. The model combines structured and unstructured information and learns through a convolutional neural network topped by a multi-layer perceptron. To optimize performance, we design and evaluate three architectural components, namely heterogeneous information integration, unstructured information vector conversion, and multi-layer perceptron design, and confirm the proposed model on the basis of the results. The target variables for predicting customer behavior are defined as six binary classification problems: re-purchase, churn, frequent shopping, frequent refunds, high spending, and high discount usage. To verify the usefulness of the proposed model, we conducted experiments using actual transaction, customer, and VOC (voice of customer) data from a specific online shopping company in Korea. The extraction criteria covered 47,947 customers who registered at least one VOC in January 2011 (one month); their customer profiles, 19 months of transaction data from September 2010 to March 2012, and the VOCs posted during that month were used. The experiment proceeds in two stages. In the first stage, we evaluate the three architectural components that affect the performance of the proposed model and select optimal parameters; in the second, we evaluate the performance of the proposed model itself. Experimental results show that the proposed model, which combines both structured and unstructured information, is superior to naïve Bayes classification (NBC), support vector machines (SVM), and artificial neural networks (ANN). It is thus significant that unstructured information contributes to predicting customer behavior, and that CNNs can be applied to business problems as well as to image recognition and natural language processing.
The experiments also confirm that a CNN is effective at understanding and interpreting contextual meaning in textual VOC data, and that empirical research on the actual data of an e-commerce company can extract highly meaningful information for behavior prediction from VOC text written directly by customers. Finally, the various experiments provide useful guidance for future research on parameter selection and model performance.
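A sketch in the spirit of the proposed heterogeneous information integration model, assuming PyTorch: a 1-D convolutional branch reads the vectorized VOC text, an MLP branch reads the structured customer features, and the two are concatenated for a binary prediction such as churn. All dimensions and layer choices are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class HeteroCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, struct_dim=30):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Text branch: 1-D convolution over token embeddings, max-pooled.
        self.conv = nn.Sequential(
            nn.Conv1d(embed_dim, 128, kernel_size=3), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1))
        # Structured branch: small MLP over customer/transaction features.
        self.mlp = nn.Sequential(nn.Linear(struct_dim, 64), nn.ReLU())
        # Integration head: concatenated branches -> one binary logit.
        self.head = nn.Sequential(
            nn.Linear(128 + 64, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, voc_tokens, struct_feats):
        x = self.embed(voc_tokens).transpose(1, 2)  # (B, E, L) for Conv1d
        text = self.conv(x).squeeze(-1)             # (B, 128)
        mixed = torch.cat([text, self.mlp(struct_feats)], dim=1)
        return self.head(mixed)                     # logit per customer

# Trained with nn.BCEWithLogitsLoss() on a label such as churn / re-purchase.
```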

Hybrid Scheme of Data Cache Design for Reducing Energy Consumption in High Performance Embedded Processor (고성능 내장형 프로세서의 에너지 소비 감소를 위한 데이타 캐쉬 통합 설계 방법)

  • Shim, Sung-Hoon;Kim, Cheol-Hong;Jhang, Seong-Tae;Jhon, Chu-Shik
    • Journal of KIISE: Computer Systems and Theory / v.33 no.3 / pp.166-177 / 2006
  • Cache sizes in embedded processors tend to grow as technology scales to smaller transistors and lower supply voltages, but a larger cache demands more energy, so the cache's share of total processor energy consumption is growing. Many schemes have been proposed to reduce cache energy consumption, but each previous scheme addresses only one side of the problem: dynamic cache energy alone, or static cache energy alone. In this paper, we propose a hybrid scheme that reduces dynamic and static cache energy simultaneously. The hybrid scheme adopts two existing techniques: the drowsy cache technique to reduce static cache energy, and the way-prediction technique to reduce dynamic cache energy. Additionally, we propose an early wake-up technique based on the program counter to reduce the penalty introduced by the drowsy cache technique. We focus on the level-1 data cache. The hybrid scheme reduces static and dynamic cache energy consumption simultaneously, and our early wake-up scheme further reduces the extra program execution cycles the hybrid scheme would otherwise cause.
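A behavioral sketch of the two techniques the hybrid scheme combines, with hypothetical class and method names: a way-predicted read probes only the MRU way first (saving dynamic energy when the prediction is correct), while lines periodically dropped into a low-leakage drowsy state cost one extra cycle to wake on access. This illustrates the mechanisms, not the paper's simulator.

```python
class WayPredictedSet:
    """One set of a set-associative cache with way prediction + drowsy lines."""

    def __init__(self, ways=4):
        self.tags = [None] * ways
        self.drowsy = [False] * ways
        self.mru = 0  # predicted way: most recently used

    def read(self, tag):
        # Probe only the predicted way first; fall back to a full-set probe.
        probes = 1 if self.tags[self.mru] == tag else len(self.tags)
        for w, t in enumerate(self.tags):
            if t == tag:
                wake_penalty = 1 if self.drowsy[w] else 0  # drowsy wake-up cost
                self.drowsy[w] = False
                self.mru = w
                return ("hit", probes, wake_penalty)
        return ("miss", probes, 0)

    def decay(self):
        # Called periodically: drop all lines into low-Vdd drowsy mode.
        self.drowsy = [True] * len(self.drowsy)
```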

The Efficient Merge Operation in Log Buffer-Based Flash Translation Layer for Enhanced Random Writing (임의쓰기 성능향상을 위한 로그블록 기반 FTL의 효율적인 합병연산)

  • Lee, Jun-Hyuk;Roh, Hong-Chan;Park, Sang-Hyun
    • The KIPS Transactions: Part D / v.19D no.2 / pp.161-186 / 2012
  • Flash memory capacity keeps increasing while its price keeps falling, which has made mass-storage SSDs (solid state drives) popular. Flash memory, however, has several shortcomings, and a special layer, the FTL (flash translation layer), is needed to compensate for them. The FTL translates the logical sector numbers used by file systems into physical sector numbers on the flash memory, handling the hardware's restrictions efficiently. Poor performance is attributed in particular to the erase-before-write restriction, and although many log-block-based schemes have been studied, problems remain in operating mass-storage flash memory. In FAST, a log-block-based FTL, random writes with wide locality frequently trigger merge operations even though some sectors in the data block are never used; this ineffective block thrashing degrades the flash memory's performance. When the log block absorbs overwrites, it behaves like a cache, and this contributes to improving flash memory performance. To improve random-write performance, this study operates the log block not only as a cache but across the entire flash memory, reducing merge and erase operations by maintaining a distinct mapping table called the offset mapping table. We define the new FTL as XAST (extensively-associative sector translation). XAST manages the offset mapping table efficiently on the basis of spatial and temporal locality.
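An illustrative sketch of the idea behind an offset mapping table, simplified from the description above: overwrites of a logical sector are redirected into log-block pages, reads consult the offset map before falling back to the data block, and merges are deferred until the log space is exhausted. This is a toy model, not the exact XAST algorithm.

```python
class LogBlockFTL:
    """Toy log-block FTL with an offset mapping table for overwrites."""

    def __init__(self, log_pages=64):
        self.offset_map = {}           # logical sector -> log page index
        self.log = [None] * log_pages  # log-block pages: (lsn, data)
        self.next_free = 0

    def write(self, lsn, data):
        if self.next_free == len(self.log):
            self.merge()               # log full: fold pages into data blocks
        self.log[self.next_free] = (lsn, data)
        self.offset_map[lsn] = self.next_free  # latest copy now in the log
        self.next_free += 1

    def read(self, lsn, data_block):
        if lsn in self.offset_map:     # overwritten sector lives in the log
            return self.log[self.offset_map[lsn]][1]
        return data_block.get(lsn)     # otherwise read the original data block

    def merge(self):
        # Placeholder: copy valid log pages back into data blocks, then erase.
        self.offset_map.clear()
        self.log = [None] * len(self.log)
        self.next_free = 0
```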

Micro-CT System for Small Animal Imaging (소동물영상을 위한 마이크로 컴퓨터단층촬영장치)

  • Nam, Ki-Yong;Kim, Kyong-Woo;Kim, Jae-Hee;Son, Hyun-Hwa;Ryu, Jeong-Hyun;Kang, Seoung-Hoon;Chon, Kwon-Su;Park, Seong-Hoon;Yoon, Kwon-Ha
    • Progress in Medical Physics / v.19 no.2 / pp.102-112 / 2008
  • We developed a high-resolution micro-CT system based on a rotational gantry and a flat-panel detector for live mouse imaging. The system consists primarily of an x-ray source with a micro-focal spot, a CMOS (complementary metal oxide semiconductor) flat-panel detector coupled with a CsI(Tl) (thallium-doped cesium iodide) scintillator, a linearly moving couch, a rotational gantry coupled with a positioning encoder, and a parallel processing system for image data. The gantry-rotation design has several advantages for obtaining CT images of live mice: it is relatively easy to minimize motion artifacts, and respiratory anesthesia can be administered during scanning. We evaluated the spatial resolution, image contrast, and uniformity of the CT system using CT phantoms. The spatial resolution of the system was approximately 11.3 cycles/mm at 10% of the MTF curve, and the radiation dose to the mice was 81.5 mGy. The minimal resolving contrast was less than 46 CT numbers in a low-contrast phantom imaging test, and the image non-uniformity was approximately 70 CT numbers at a voxel size of ${\sim}55{\times}55{\times}100\;{\mu}m^3$. We present imaging results for the skull, lungs, and body of live mice.
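For reference, a figure such as "11.3 cycles/mm at 10% of the MTF curve" is conventionally read off a measured MTF by linear interpolation at the 0.1 level, as in this small sketch (the input samples are hypothetical):

```python
def freq_at_mtf(freqs, mtf, level=0.10):
    """Spatial frequency at which a sampled MTF first falls to `level`.

    freqs: ascending spatial frequencies (cycles/mm); mtf: values in [0, 1].
    """
    for (f0, m0), (f1, m1) in zip(zip(freqs, mtf), zip(freqs[1:], mtf[1:])):
        if m0 >= level > m1:
            # Linear interpolation between the two bracketing samples.
            return f0 + (m0 - level) * (f1 - f0) / (m0 - m1)
    return None  # curve never crosses the requested level
```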


Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

  • Lee, Minchul;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.183-203 / 2018
  • News articles are the most suitable medium for examining events occurring at home and abroad. As the development of information and communication technology has brought various kinds of online news media, news about events occurring in society has increased greatly, so automatically summarizing key events from massive news data helps users survey many events at a glance. Moreover, an event network built on the relevance between events can greatly help readers understand current affairs. In this study, we propose a method for extracting event networks from large news text corpora. We first collected Korean political and social articles from March 2016 to March 2017 and, in preprocessing with NPMI and Word2Vec, kept only meaningful words and integrated synonyms. Latent Dirichlet allocation (LDA) topic modeling was used to calculate the topic distribution by date, find the peaks of each distribution, and detect events. A total of 32 topics were extracted, and the occurrence time of each event was deduced from the point at which its topic distribution surged. As a result, 85 events were detected in total, and after filtering with a Gaussian smoothing technique, the final 16 events were presented. We then calculated relevance scores between the detected events to construct the event network: using the cosine coefficient between co-occurring events, we computed the relevance between events and connected them, assigning each event to a vertex and the relevance score between events to the edge connecting them. The event network constructed with our method sorts the major political and social events in Korea over the past year in chronological order and at the same time identifies which events are related to which. Our approach differs from existing event detection methods in that LDA topic modeling makes it easy to analyze large amounts of data and to identify relevance between events that existing event detection finds difficult to capture. In preprocessing, we applied various text mining techniques together with Word2Vec to improve the accuracy of extracting proper nouns and compound nouns, which have been difficult to handle in Korean text analysis. The event detection and network-construction techniques in this study offer the following practical advantages. First, LDA topic modeling, being unsupervised, can easily extract topics, topic words, and their distributions from huge amounts of data, and the date information of the collected articles allows topic distributions to be expressed as time series. Second, by calculating relevance scores from the co-occurrence of topics and constructing the event network, we can present the connections between events in summarized form, connections that are difficult to grasp with existing event detection. This is supported by the fact that the proposed relevance-based event network was in fact constructed in order of occurrence time. The network also makes it possible to identify which event served as the starting point of a series of events.
A limitation of this study is that LDA topic modeling yields different results depending on the initial parameters and the number of topics, and the topic and event names must be assigned by the researcher's subjective judgment. Also, because each topic is assumed to be exclusive and independent, relevance between topics is not taken into account. Subsequent studies need to calculate the relevance between events not covered here, or between events belonging to the same topic.
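A compact sketch of the detection and linking pipeline described above, assuming a precomputed day-by-topic proportion matrix from LDA: events are taken as peaks in each topic's time series, and events with similar topic vectors are linked to form the network. The peak rule, the thresholds, and the use of cosine similarity between day-level topic vectors are simplifications of the paper's co-occurrence-based relevance score.

```python
import numpy as np

def detect_events(day_topic, z=2.0):
    """day_topic: (days, topics) array of LDA topic proportions per day.

    Returns (peak_day, topic_id) pairs where a topic surges above its mean.
    """
    mu, sd = day_topic.mean(0), day_topic.std(0)
    events = []
    for k in range(day_topic.shape[1]):
        for d in range(1, day_topic.shape[0] - 1):
            v = day_topic[d, k]
            if (v > mu[k] + z * sd[k]
                    and v >= day_topic[d - 1, k] and v >= day_topic[d + 1, k]):
                events.append((d, k))
    return events

def event_edges(events, day_topic, threshold=0.7):
    """Link events whose day-level topic vectors are cosine-similar."""
    edges = []
    for i, (d1, k1) in enumerate(events):
        for d2, k2 in events[i + 1:]:
            a, b = day_topic[d1], day_topic[d2]
            cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
            if cos >= threshold:
                edges.append(((d1, k1), (d2, k2), cos))
    return edges
```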

Measuring the Public Service Quality Using Process Mining: Focusing on N City's Building Licensing Complaint Service (프로세스 마이닝을 이용한 공공서비스의 품질 측정: N시의 건축 인허가 민원 서비스를 중심으로)

  • Lee, Jung Seung
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.35-52 / 2019
  • As public services are provided in various forms, including e-government, public demand regarding the quality of public services is rising. Continuous measurement and improvement of service quality are needed, but traditional surveys are costly, time-consuming, and limited. An analytical technique is therefore needed that can measure public service quality quickly and accurately, at any time, from the data the services generate. In this study, we analyzed the quality of a public service from data, using process mining techniques, for the building licensing complaint service of N city, chosen because it can supply the data necessary for analysis and because the approach can spread to other institutions through public service quality management. We conducted process mining on a total of 3,678 building licensing complaints filed in N city over the two years from January 2014, and identified the process maps and the departments with high frequency and long processing times. The analysis showed that some departments were overloaded at certain points in time while others had relatively little work, and there were reasonable grounds to suspect that an increase in the number of complaints lengthens the time required to complete them. The time required to complete a complaint ranged from the same day to one year and 146 days. The cumulative frequency of the top four departments (the Sewage Treatment Division, the Waterworks Division, the Urban Design Division, and the Green Growth Division) exceeded 50%, and that of the top nine departments exceeded 70%; consultations were concentrated in a few departments, and the load among departments was highly unbalanced. Most complaints followed a variety of distinct process patterns. The analysis shows that the number of 'supplement' decisions has the greatest impact on the duration of a complaint, because a 'supplement' decision requires a physical period in which the complainant revises and resubmits documents, lengthening the time until the whole complaint is completed. To address this, the overall processing time can be drastically reduced if complainants prepare thoroughly before or while filing. Clarifying and disclosing the causes of 'supplement' decisions and their remedies, which are among the important data in the system, helps complainants prepare in advance and gives them confidence that documents prepared from the disclosed information will pass, making the handling of complaints predictable and transparent. Documents prepared from pre-disclosed information are likely to be processed without problems, which not only shortens the processing period but also improves work efficiency by sparing processors repeated reviews and rework. The results of this study can be used to find departments that carry a heavy complaint burden at certain points in time and to manage workforce allocation between departments flexibly. In addition, by analyzing which departments participate in consultation according to the characteristics of a complaint, the results can be used to automate or recommend the assignment of consultation departments.
Furthermore, machine learning techniques applied to the various data generated during complaint processing can uncover process patterns, and embedding such algorithms in the system can automate and add intelligence to civil complaint processing. This study is expected to inform future improvement of public service quality through process mining analysis of civil services.
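A minimal sketch of the kind of discovery step used in such an analysis: from a complaint event log, count the directly-follows transitions between departments, which are the raw material for a process map. The log layout (case id, department, timestamp) is an assumption, not the actual N city data format.

```python
from collections import Counter, defaultdict

def directly_follows(log):
    """log: list of (case_id, dept, timestamp) events.

    Returns Counter of (dept_a, dept_b) -> how often b directly follows a,
    the edge weights of a discovered process map.
    """
    by_case = defaultdict(list)
    for case, dept, ts in log:
        by_case[case].append((ts, dept))
    dfg = Counter()
    for events in by_case.values():
        events.sort()  # order each case's events by timestamp
        for (_, a), (_, b) in zip(events, events[1:]):
            dfg[(a, b)] += 1
    return dfg

# Usage (hypothetical log):
# dfg = directly_follows([("c1", "Waterworks", 1), ("c1", "Urban Design", 2)])
```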