• Title/Summary/Keyword: Big data Processing

Using Ontologies for Semantic Text Mining (시맨틱 텍스트 마이닝을 위한 온톨로지 활용 방안)

  • Yu, Eun-Ji; Kim, Jung-Chul; Lee, Choon-Youl; Kim, Nam-Gyu
    • The Journal of Information Systems / v.21 no.3 / pp.137-161 / 2012
  • The increasing interest in big data analysis using various data mining techniques indicates that many commercial data mining tools now need to be equipped with fundamental text analysis modules. The most essential prerequisite for accurate analysis of text documents is an understanding of the exact semantics of each term in a document. The difficulties in understanding the exact semantics of terms are mainly attributable to homonym and synonym problems, which are long-standing problems in the natural language processing field. Some major text mining tools provide a thesaurus to solve these problems, but a thesaurus cannot resolve complex synonym problems. Furthermore, a thesaurus does not address homonym problems at all. In this paper, we propose a semantic text mining methodology that uses ontologies to improve the quality of text mining results by resolving the semantic ambiguity caused by homonym and synonym problems. We evaluate the practical applicability of the proposed methodology by performing a classification analysis to predict customer churn using real transactional data and Q&A articles from the "S" online shopping mall in Korea. The experiments revealed that the prediction model produced by our proposed semantic text mining method outperformed the model produced by traditional text mining in terms of prediction accuracy measures such as response, captured response, and lift.

Design of Falling Recognition Application System using Deep Learning

  • Kwon, TaeWoo; Lee, Jong-Yong; Jung, Kye-Dong
    • International Journal of Internet, Broadcasting and Communication / v.12 no.2 / pp.120-126 / 2020
  • Studies are being conducted on falling recognition using smartphone sensors to recognize falls in daily life. These studies use a number of sensors, mostly acceleration sensors, gyro sensors, and motion sensors. A falling recognition system processes the sensor values with a falling recognition algorithm and classifies behavior based on thresholds. If the threshold is ambiguous, accuracy is reduced. To solve this problem, deep learning has been introduced into behavior recognition systems. Deep learning is a machine learning technique in which the computer itself learns to process and categorize input data, rather than relying on hand-crafted algorithms. Thus, in this paper, we propose a smartphone-based falling recognition application system using deep learning. The proposed system runs as an app on smartphones. It consists of three layers and uses Database as a Service (DBaaS) to handle big data and address data heterogeneity. Because the proposed system uses deep learning to recognize the user's behavior, higher accuracy can be expected than with a general rule-based system.
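
The threshold-based classification that this abstract contrasts with deep learning can be sketched as follows. This is a minimal illustration, not the authors' system: the function name `is_fall`, the sample readings, and the 2.5 g threshold are all assumptions chosen for the example, since the abstract gives no values.

```python
import math

# Hypothetical threshold in units of g; the abstract does not state one.
FALL_THRESHOLD_G = 2.5


def is_fall(samples, threshold=FALL_THRESHOLD_G):
    """Return True if any accelerometer sample's magnitude exceeds the threshold.

    samples: iterable of (ax, ay, az) tuples in units of g.
    """
    return any(
        math.sqrt(ax * ax + ay * ay + az * az) > threshold
        for ax, ay, az in samples
    )


# Made-up readings for illustration only.
walking = [(0.1, 0.2, 1.0), (0.3, 0.1, 1.1)]
falling = [(0.1, 0.2, 1.0), (1.9, 1.5, 2.0)]

print(is_fall(walking), is_fall(falling))
```

A harmless but vigorous movement whose magnitude sits near the chosen threshold would be classified differently depending on small changes to that value, which is the ambiguity the abstract points to.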

Concurrency Control for Updating a Large Spatial Object (큰 공간 객체의 변경을 위한 동시성 제어)

  • Seo Young Duk; Kim DongHyun; Hong Bong Hee
    • Journal of KIISE:Databases / v.32 no.1 / pp.100-110 / 2005
  • Update transactions in spatial databases are known to be interactive and long in duration. To improve the parallelism of concurrent updates, multiple transactions need to be able to concurrently update a large spatial object whose spatial extent is larger than the workspace of a client. However, under existing locking protocols, a large spatial object cannot be updated concurrently because of write-lock conflicts. This paper proposes a partial locking scheme that enables a transaction to set locks on parts of a big object. The partial locking scheme, an exclusive locking scheme set by the user, acquires locks on a part of the big object so that the unit of concurrency control is restricted to a partial object of the big object. The scheme improves the concurrency of update jobs on a large object because it makes the lock granularity finer.
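
The idea of locking only part of a large object can be sketched as below. This is an illustrative sketch under stated assumptions, not the protocol from the paper: the class `PartialLockManager`, the use of axis-aligned rectangles as lockable parts, and the one-region-per-transaction limit are all hypothetical simplifications.

```python
class PartialLockManager:
    """Exclusive locks on rectangular parts of a single large spatial object.

    Regions are (xmin, ymin, xmax, ymax) tuples. For simplicity, each
    transaction holds at most one region (an assumption of this sketch).
    """

    def __init__(self):
        self._locks = {}  # transaction id -> locked region

    @staticmethod
    def _overlaps(a, b):
        # Two rectangles overlap if they intersect on both axes.
        return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

    def acquire(self, txn, region):
        """Grant the lock unless another transaction holds an overlapping part."""
        for holder, held in self._locks.items():
            if holder != txn and self._overlaps(region, held):
                return False
        self._locks[txn] = region
        return True

    def release(self, txn):
        self._locks.pop(txn, None)


mgr = PartialLockManager()
t1 = mgr.acquire("T1", (0, 0, 10, 10))   # granted
t2 = mgr.acquire("T2", (20, 0, 30, 10))  # granted: disjoint from T1's part
t3 = mgr.acquire("T3", (5, 5, 25, 8))    # denied: overlaps both held parts
print(t1, t2, t3)
```

With a single write lock on the whole object, T2 would have to wait for T1. Here the two transactions proceed concurrently because their parts do not overlap, and only T3 is blocked.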

Development of online drone control management information platform (온라인 드론방제 관리 정보 플랫폼 개발)

  • Lim, Jin-Taek; Lee, Sang-Beom
    • Journal of the Institute of Convergence Signal Processing / v.22 no.4 / pp.193-198 / 2021
  • Recently, interest in the 4th industry has increased both the demand for pest control among rice farmers and the interest in and use of agricultural pest control drones. As a result, the diversification of agricultural pest control drones that spray high-concentration pesticides and the growing number of agricultural exterminators who have acquired national drone certifications are rapidly expanding the agricultural sector of the drone industry. In addition, an effective platform is required to build and process the large-scale big data generated by detailed tasks such as pesticide management, exterminator management, precise spraying, pest control work volume classification, settlement, soil management, and prediction and monitoring of pest damage. However, studies in South Korea and other countries on the development of models and programs to integrate and process such big data, including data analysis algorithms, image analysis algorithms, growth management algorithms, and AI algorithms, are insufficient. This paper proposes an online drone pest control management information platform that meets the needs of managers and farmers in the agricultural field and realizes precise AI pest control based on the agricultural drone pest control processor, and presents a foundation for developing a comprehensive management system through empirical experiments.

Big Data Analysis for Strategic Use of Urban Brands: Case Study Seoul city brand "I SEOUL U" (도시 브랜드의 전략적 활용을 위한 빅데이터 분석 : 서울시 도시 브랜드 "I SEOUL U" 사례)

  • Lim, Haewen
    • The Journal of the Korea Contents Association / v.22 no.1 / pp.197-213 / 2022
  • In this study, text mining analysis was performed on online big data to examine the recognition and assessment of the urban brand I SEOUL U. To this end, TEXTOM, a program for data acquisition and analysis, was used, and 'I SEOUL U' was selected as the analysis keyword. Keyword analysis shows the keywords associated with I SEOUL U to be as follows. First, as business and marketing terms, keywords include pop-up store, gallery, co-branding, (festival, etc.), commodities, private companies, and online. Second, as event-related terms, keywords include Han River, tree-planting day, tree planting, Hongdae, Christmas, Mapo, Jung-gu, Sejong University, and festival. Third, as promotional terms, keywords include robotics engineer Dr. Dennis Hong, Government, Art, and Korea. In the N-gram analysis, I SEOUL U, as the city brand of Seoul and in the public interest, was found to contribute to the commercial activities of private companies. In the connection-oriented analysis, business and marketing, events, and promotions were derived as categories. In the matrix analysis, pop-up store products were found to be the main products developed, and co-branded products were also being developed. In the topic modeling, a total of 10 topics were extracted, most of which concerned needs for commercial utilization and information on events and festivals.

The Bigdata Processing Environment Building for the Learning System (학습 시스템을 위한 빅데이터 처리 환경 구축)

  • Kim, Young-Geun; Kim, Seung-Hyun; Jo, Min-Hui; Kim, Won-Jung
    • The Journal of the Korea institute of electronic communication sciences / v.9 no.7 / pp.791-797 / 2014
  • To create an Apache Hadoop environment for parallel distributed processing of big data, it is necessary to build a cloud computing environment, either by connecting multiple computers as nodes or by configuring virtual nodes on a single computer. However, building such systems for educational use in practice faces many constraints in terms of cost and complex system configuration. There is therefore an urgent need to develop an inexpensive, practical learning system that educational institutions and beginners in the field of big data processing can use for training. In this study, we design and implement a Raspberry Pi-based learning system for parallel distributed processing of big data that supports training in and analysis of big data processing technologies such as Hadoop and NoSQL. The implemented system is expected to be useful for beginners who want to start working with big data and for big data education.

A Hybrid Mechanism of Particle Swarm Optimization and Differential Evolution Algorithms based on Spark

  • Fan, Debin; Lee, Jaewan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.12 / pp.5972-5989 / 2019
  • With the onset of the big data age, data is growing exponentially, and the issue of how to optimize large-scale data processing is especially significant. Large-scale global optimization (LSGO) is a research topic of great interest in academia and industry. Spark is a popular cloud computing framework that can cluster large-scale data, and it can effectively support iterative computation through resilient distributed datasets (RDD). In this paper, we propose a hybrid mechanism of particle swarm optimization (PSO) and differential evolution (DE) algorithms based on Spark (SparkPSODE). The SparkPSODE algorithm is a parallel algorithm in which the RDD and island models are employed. The island model is used to divide the global population into several subpopulations, which correspond to RDD partitions in order to reduce computation time. To preserve population diversity and avoid premature convergence, the evolutionary strategy of DE is integrated into SparkPSODE. Finally, SparkPSODE is evaluated on a set of LSGO benchmark problems, and the experimental results show that, in comparison with several algorithms, the proposed SparkPSODE algorithm obtains better optimization performance.
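
The combination of an island model with PSO and DE described in this abstract can be sketched in plain Python. This is not the SparkPSODE algorithm itself: there is no Spark here (a sequential loop over islands stands in for RDD partitions), and the function names, the parameter values, the way the DE step is applied to personal bests, and the migration rule are all assumptions made for illustration.

```python
import random


def sphere(x):
    """Simple benchmark function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def evolve_island(pop, vel, pbest, f, rng, iters=30,
                  w=0.7, c1=1.5, c2=1.5, F=0.5, CR=0.9):
    """Run PSO updates plus a DE/rand/1/bin step on one subpopulation."""
    n, dim = len(pop), len(pop[0])
    for _ in range(iters):
        gbest = min(pbest, key=f)
        for i in range(n):
            # PSO velocity and position update.
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pop[i][d])
                             + c2 * rng.random() * (gbest[d] - pop[i][d]))
                pop[i][d] += vel[i][d]
            if f(pop[i]) < f(pbest[i]):
                pbest[i] = pop[i][:]
            # DE mutation and binomial crossover on personal bests (assumed design).
            others = [p for j, p in enumerate(pbest) if j != i]
            a, b, c = rng.sample(others, 3)
            jrand = rng.randrange(dim)
            trial = [
                a[d] + F * (b[d] - c[d])
                if (rng.random() < CR or d == jrand) else pbest[i][d]
                for d in range(dim)
            ]
            if f(trial) < f(pbest[i]):
                pbest[i] = trial
    return min(pbest, key=f)


def island_pso_de(f, dim=5, islands=4, island_size=8, rounds=5, seed=0):
    """Return (best initial fitness, best final fitness)."""
    rng = random.Random(seed)
    pops = [[[rng.uniform(-5, 5) for _ in range(dim)]
             for _ in range(island_size)] for _ in range(islands)]
    vels = [[[0.0] * dim for _ in range(island_size)] for _ in range(islands)]
    pbests = [[p[:] for p in pop] for pop in pops]
    initial_best = min(f(p) for pop in pops for p in pop)
    for _ in range(rounds):
        # In Spark, each island would map to one RDD partition.
        bests = [evolve_island(pops[k], vels[k], pbests[k], f, rng)
                 for k in range(islands)]
        # Assumed migration rule: the overall best replaces each island's worst.
        best = min(bests, key=f)
        for k in range(islands):
            worst = max(range(island_size), key=lambda i: f(pbests[k][i]))
            pbests[k][worst] = best[:]
    final_best = min(f(p) for pb in pbests for p in pb)
    return initial_best, final_best


initial, final = island_pso_de(sphere)
print(initial, final)
```

Because personal bests are only ever replaced by better candidates, the final fitness cannot be worse than the initial one. How much it improves depends on the parameters and the random seed.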

Design and Implementation of Kernel-Level Split and Merge Operations for Efficient File Transfer in Cyber-Physical System (사이버 물리 시스템에서 효율적인 파일 전송을 위한 커널 레벨 분할 및 결합 연산의 설계와 구현)

  • Park, Hyunchan; Jang, Jun-Hee; Lee, Junseok
    • IEMEK Journal of Embedded Systems and Applications / v.14 no.5 / pp.249-258 / 2019
  • In a cyber-physical system, big data collected from numerous sensors and IoT devices is transferred to the Cloud for processing and analysis. When transferring data to the Cloud, merging the data into a single file is more efficient than sending it as split files. However, current merging and splitting operations are performed at the user level and require many I/O requests to memory and storage devices, which is very inefficient and time-consuming. To solve this problem, this paper proposes kernel-level split and merge operations. At the kernel level, splitting and merging files can be done with very little overhead by modifying the file system metadata. We have designed the proposed algorithm in detail and implemented it in the Linux Ext4 file system. In our experiments with a real Cloud storage system, our technique reduced the transfer time to as little as 17% of that required for transferring split files. We also confirmed that the time required can be reduced by up to 0.5% compared to the existing user-level method.
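
The user-level baseline that this abstract criticizes can be sketched as follows. This only illustrates why user-level split and merge are I/O heavy (every byte is read and written again); it is not the paper's kernel-level Ext4 implementation, and the function names, file names, and sizes are hypothetical.

```python
import os
import tempfile


def split_file(path, chunk_size):
    """User-level split: every byte is read from the source and rewritten."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            part = f"{path}.part{index}"
            with open(part, "wb") as dst:
                dst.write(chunk)
            parts.append(part)
            index += 1
    return parts


def merge_files(parts, out_path):
    """User-level merge: every byte of each part is read and written again."""
    with open(out_path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                dst.write(src.read())


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "sensor.bin")
    with open(original, "wb") as fh:
        fh.write(bytes(range(256)) * 40)  # 10,240 bytes of sample data
    parts = split_file(original, 4096)
    merged = os.path.join(tmp, "merged.bin")
    merge_files(parts, merged)
    with open(original, "rb") as a, open(merged, "rb") as b:
        round_trip_ok = a.read() == b.read()
    part_count = len(parts)

print(part_count, round_trip_ok)
```

Both functions copy the full file contents through user space, so their cost grows with file size. A metadata-only operation inside the file system, as the abstract proposes, avoids that copying.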

Development of Multi-Sensor Convergence Monitoring and Diagnosis Device based on Edge AI for the Modular Main Circuit Breaker of Korean High-Speed Rolling Stock

  • Byeong Ju, Yun; Jhong Il, Kim; Jae Young, Yoon; Jeong Jin, Kang; You Sik, Hong
    • International Journal of Advanced Culture Technology / v.10 no.4 / pp.569-575 / 2022
  • This paper reports the development of a monitoring and diagnosis device that prevents the risk of accidents by monitoring and diagnosing a modular Main Circuit Breaker (MCB) that uses a Vacuum Interrupter (VI) for Korean high-speed rolling stock. Comprehensive MCB monitoring and diagnosis was performed by converging vacuum level diagnosis of the interrupter, operating coil monitoring of the MCB, and environmental temperature/humidity monitoring of the modular box. In addition, to develop an algorithm expected to involve similar data processing before the actual field test of the MCB monitoring and diagnosis device in 2023, cluster analysis and factor analysis were performed with the WEKA data mining tool on big data from Korean railroad transformers, which was previously researched by Tae Hee Evolution with KORAIL.

An Analytical Approach Using Topic Mining for Improving the Service Quality of Hotels (호텔 산업의 서비스 품질 향상을 위한 토픽 마이닝 기반 분석 방법)

  • Moon, Hyun Sil; Sung, David; Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.21-41 / 2019
  • Thanks to the rapid development of information technologies, the data available on the Internet has grown rapidly. In this era of big data, many studies have attempted to offer insights and demonstrate the effects of data analysis. In the tourism and hospitality industry, many firms and studies have paid attention to online reviews on social media because of their large influence over customers. As tourism is an information-intensive industry, the effect of these information networks on social media platforms is more pronounced than that of any other type of media. However, there are some limitations to the improvements in service quality that can be made based on opinions on social media platforms. Users on social media platforms express their opinions as text, images, and so on. Raw data sets from these reviews are unstructured. Moreover, these data sets are too large for humans to extract new information and hidden knowledge from manually. To use them for business intelligence and analytics applications, proper big data techniques such as Natural Language Processing and data mining are needed. This study suggests an analytical approach to directly yield insights from these reviews to improve the service quality of hotels. Our proposed approach consists of topic mining to extract topics contained in the reviews and decision tree modeling to explain the relationship between topics and ratings. Topic mining refers to a method for finding a group of words from a collection of documents that represents a document. Among several topic mining methods, we adopted the Latent Dirichlet Allocation (LDA) algorithm, which is considered the most universal algorithm. However, LDA alone is not enough to find insights that can improve service quality because it cannot find the relationship between topics and ratings. To overcome this limitation, we also use the Classification and Regression Tree (CART) method, which is a kind of decision tree technique. Through the CART method, we can find which topics are related to positive or negative ratings of a hotel and visualize the results. Therefore, this study aims to present an analytical approach for improving hotel service quality from unstructured review data sets. Through experiments on four hotels in Hong Kong, we can find the strengths and weaknesses of services for each hotel and suggest improvements to aid customer satisfaction. In particular, from positive reviews, we find what these hotels should maintain for service quality. For example, compared with the other hotels, one hotel has a good location and room condition, as extracted from its positive reviews. In contrast, we also find what they should modify in their services from negative reviews. For example, one hotel should improve its room condition with respect to soundproofing. These results mean that our approach is useful in finding insights for the service quality of hotels. That is, from the enormous volume of review data, our approach can provide practical suggestions for hotel managers to improve their service quality. In the past, studies for improving service quality relied on surveys or interviews of customers. However, these methods are often costly and time-consuming, and the results may be skewed by biased sampling or untrustworthy answers. The proposed approach directly obtains honest feedback from customers' online reviews and draws insights through a type of big data analysis. It is therefore a more useful tool for overcoming the limitations of surveys or interviews. Moreover, our approach easily obtains the service quality information of other hotels or services in the tourism industry because it needs only open online reviews and ratings as input data. Furthermore, the performance of our approach will be better if other structured and unstructured data sources are added.