• Title/Summary/Keyword: optimize

Search Result 5,346, Processing Time 0.032 seconds

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.109-122
    • /
    • 2014
  • People are nowadays creating a tremendous amount of data on Social Network Service (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, thereby greatly influencing society. This is an unmatched phenomenon in history, and now we live in the Age of Big Data. SNS Data is defined as a condition of Big Data where the amount of data (volume), data input and output speeds (velocity), and the variety of data types (variety) are satisfied. If someone intends to discover the trend of an issue in SNS Big Data, this information can be used as a new important source for the creation of new values because this information covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and established to meet the needs of analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) Provide the topic keyword set that corresponds to daily ranking; (2) Visualize the daily time series graph of a topic for the duration of a month; (3) Provide the importance of a topic through a treemap based on the score system and frequency; (4) Visualize the daily time-series graph of keywords by searching the keyword; The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including the removal of stop words, and noun extraction for processing various unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to process rapidly a large amount of real-time data, such as the Hadoop distributed system or NoSQL, which is an alternative to relational database. We built TITS based on Hadoop to optimize the processing of big data because Hadoop is designed to scale up from single node computing to thousands of machines. Furthermore, we use MongoDB, which is classified as a NoSQL database. In addition, MongoDB is an open source platform, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational database, there are no schema or tables with MongoDB, and its most important goal is that of data accessibility and data processing performance. In the Age of Big Data, the visualization of Big Data is more attractive to the Big Data community because it helps analysts to examine such data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for the purpose of creating Data Driven Documents that bind document object model (DOM) and any data; the interaction between data is easy and useful for managing real-time data stream with smooth animation. In addition, TITS uses a bootstrap made of pre-configured plug-in style sheets and JavaScript libraries to build a web system. The TITS Graphical User Interface (GUI) is designed using these libraries, and it is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS). Based on this, we can confirm the utility of storytelling and time series analysis. Third, we develop a web-based system, and make the system available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.

Development of a Stock Trading System Using M & W Wave Patterns and Genetic Algorithms (M&W 파동 패턴과 유전자 알고리즘을 이용한 주식 매매 시스템 개발)

  • Yang, Hoonseok;Kim, Sunwoong;Choi, Heung Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.63-83
    • /
    • 2019
  • Investors prefer to look for trading points based on the graph shown in the chart rather than complex analysis, such as corporate intrinsic value analysis and technical auxiliary index analysis. However, the pattern analysis technique is difficult and computerized less than the needs of users. In recent years, there have been many cases of studying stock price patterns using various machine learning techniques including neural networks in the field of artificial intelligence(AI). In particular, the development of IT technology has made it easier to analyze a huge number of chart data to find patterns that can predict stock prices. Although short-term forecasting power of prices has increased in terms of performance so far, long-term forecasting power is limited and is used in short-term trading rather than long-term investment. Other studies have focused on mechanically and accurately identifying patterns that were not recognized by past technology, but it can be vulnerable in practical areas because it is a separate matter whether the patterns found are suitable for trading. When they find a meaningful pattern, they find a point that matches the pattern. They then measure their performance after n days, assuming that they have bought at that point in time. Since this approach is to calculate virtual revenues, there can be many disparities with reality. The existing research method tries to find a pattern with stock price prediction power, but this study proposes to define the patterns first and to trade when the pattern with high success probability appears. The M & W wave pattern published by Merrill(1980) is simple because we can distinguish it by five turning points. Despite the report that some patterns have price predictability, there were no performance reports used in the actual market. The simplicity of a pattern consisting of five turning points has the advantage of reducing the cost of increasing pattern recognition accuracy. In this study, 16 patterns of up conversion and 16 patterns of down conversion are reclassified into ten groups so that they can be easily implemented by the system. Only one pattern with high success rate per group is selected for trading. Patterns that had a high probability of success in the past are likely to succeed in the future. So we trade when such a pattern occurs. It is a real situation because it is measured assuming that both the buy and sell have been executed. We tested three ways to calculate the turning point. The first method, the minimum change rate zig-zag method, removes price movements below a certain percentage and calculates the vertex. In the second method, high-low line zig-zag, the high price that meets the n-day high price line is calculated at the peak price, and the low price that meets the n-day low price line is calculated at the valley price. In the third method, the swing wave method, the high price in the center higher than n high prices on the left and right is calculated as the peak price. If the central low price is lower than the n low price on the left and right, it is calculated as valley price. The swing wave method was superior to the other methods in the test results. It is interpreted that the transaction after checking the completion of the pattern is more effective than the transaction in the unfinished state of the pattern. Genetic algorithms(GA) were the most suitable solution, although it was virtually impossible to find patterns with high success rates because the number of cases was too large in this simulation. We also performed the simulation using the Walk-forward Analysis(WFA) method, which tests the test section and the application section separately. So we were able to respond appropriately to market changes. In this study, we optimize the stock portfolio because there is a risk of over-optimized if we implement the variable optimality for each individual stock. Therefore, we selected the number of constituent stocks as 20 to increase the effect of diversified investment while avoiding optimization. We tested the KOSPI market by dividing it into six categories. In the results, the portfolio of small cap stock was the most successful and the high vol stock portfolio was the second best. This shows that patterns need to have some price volatility in order for patterns to be shaped, but volatility is not the best.

Analysis of shopping website visit types and shopping pattern (쇼핑 웹사이트 탐색 유형과 방문 패턴 분석)

  • Choi, Kyungbin;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.85-107
    • /
    • 2019
  • Online consumers browse products belonging to a particular product line or brand for purchase, or simply leave a wide range of navigation without making purchase. The research on the behavior and purchase of online consumers has been steadily progressed, and related services and applications based on behavior data of consumers have been developed in practice. In recent years, customization strategies and recommendation systems of consumers have been utilized due to the development of big data technology, and attempts are being made to optimize users' shopping experience. However, even in such an attempt, it is very unlikely that online consumers will actually be able to visit the website and switch to the purchase stage. This is because online consumers do not just visit the website to purchase products but use and browse the websites differently according to their shopping motives and purposes. Therefore, it is important to analyze various types of visits as well as visits to purchase, which is important for understanding the behaviors of online consumers. In this study, we explored the clustering analysis of session based on click stream data of e-commerce company in order to explain diversity and complexity of search behavior of online consumers and typified search behavior. For the analysis, we converted data points of more than 8 million pages units into visit units' sessions, resulting in a total of over 500,000 website visit sessions. For each visit session, 12 characteristics such as page view, duration, search diversity, and page type concentration were extracted for clustering analysis. Considering the size of the data set, we performed the analysis using the Mini-Batch K-means algorithm, which has advantages in terms of learning speed and efficiency while maintaining the clustering performance similar to that of the clustering algorithm K-means. The most optimized number of clusters was derived from four, and the differences in session unit characteristics and purchasing rates were identified for each cluster. The online consumer visits the website several times and learns about the product and decides the purchase. In order to analyze the purchasing process over several visits of the online consumer, we constructed the visiting sequence data of the consumer based on the navigation patterns in the web site derived clustering analysis. The visit sequence data includes a series of visiting sequences until one purchase is made, and the items constituting one sequence become cluster labels derived from the foregoing. We have separately established a sequence data for consumers who have made purchases and data on visits for consumers who have only explored products without making purchases during the same period of time. And then sequential pattern mining was applied to extract frequent patterns from each sequence data. The minimum support is set to 10%, and frequent patterns consist of a sequence of cluster labels. While there are common derived patterns in both sequence data, there are also frequent patterns derived only from one side of sequence data. We found that the consumers who made purchases through the comparative analysis of the extracted frequent patterns showed the visiting pattern to decide to purchase the product repeatedly while searching for the specific product. The implication of this study is that we analyze the search type of online consumers by using large - scale click stream data and analyze the patterns of them to explain the behavior of purchasing process with data-driven point. Most studies that typology of online consumers have focused on the characteristics of the type and what factors are key in distinguishing that type. In this study, we carried out an analysis to type the behavior of online consumers, and further analyzed what order the types could be organized into one another and become a series of search patterns. In addition, online retailers will be able to try to improve their purchasing conversion through marketing strategies and recommendations for various types of visit and will be able to evaluate the effect of the strategy through changes in consumers' visit patterns.

Optimization of Cultivational Conditions of Rice(Oryza sativa L.) by a Central Composite Design Applied to an Early Cultivar in Southern Region (중심합성계획법에 의한 남부 조생벼 재배요인의 최적조건 구명)

  • Shon, Gil-Man;Kim, Jeung-Kyo;Choe, Zhin-Ryong;Lee, Yu-Sik;Park, Joong-Yang
    • KOREAN JOURNAL OF CROP SCIENCE
    • /
    • v.34 no.1
    • /
    • pp.60-73
    • /
    • 1989
  • Two field experiments were carried out to assess the applicability of a central composite design (CCD) in determining optimum culture condition of an early rice cultivar, Unbongbyeo in southern Korea. A central composite design with two replicates was applied to five levels of five factors such as the number of hills per 3.3m2, the number of seedlings per hill, the levels of nitrogen, the transplanting date and the seedling age (Experiment 1). The levels of planting density were ranged from 30 hills to 150 hills per 3.3m2 ; the number of seedlings per hill from 1 seedling to 9 seedlings per hill; the levels of nitrogen application from 1 kg/l0a to 21 kg/l0a; the transplanting date from June 15 to July 5; the seedling age from 25 days to 45 days. A fractional factorial design was applied to three levels of five factors tested in CCD (Experiment 2). Yield per hill and per unit area were examined and the results obtained from both experiments were compared. The benefits from the central composite design were discussed. Maximum yield of brown rice per unit area was obtained at the combination of the central levels of one of five factors when the other four factors were fixed at central point. Furthermore, brown rice yield per unit area affected by interaction of two factors was maximized at the central point when the remain three factors being fixed at the central level. The responses of five factors to brown rice yield per hill and unit area were found to be a saddle point in both designs. Actual values of the stationary points were 107 hills per 3.3 m2, 4 seedlings per hill, 10 kg nitrogen per l0a, transplanting date of rice on June 26 and 33 days of seedling age in the central composite design. Brown rice yield per unit area at the stationary points were estimated 439 kg/l0a in the central composite design and 442 kg/l0a in the fractional factorial design. Considering the number of experimental treatment combinations, the central composite design was rather convenient in reducing the number of treatment combinations for similar information. It was more convenient for an experimenter to present the results from the central composite design than those from the fractional factorial design. Considering the optimum yields of brown rice per unit area at the stationary points being verified as saddle points in both designs. inter-heterogeneity of each of the factors should be avoided in setting up factors in pursuit of inducing unidirectional response of the factors to yield. Even though both the lower and higher levels in the central composite design being beyond the region of an experimenter's interest. they were considered highly valued in interpretation of the results. Conclusively. the central composite design was found to be more beneficial to optimize culture condition of paddy rice even with several levels of various factors were involved.

  • PDF

Pipetting Stability and Improvement Test of the Robotic Liquid Handling System Depending on Types of Liquid (용액에 따른 자동분주기의 분주능력 평가와 분주력 향상 실험)

  • Back, Hyangmi;Kim, Youngsan;Yun, Sunhee;Heo, Uisung;Kim, Hosin;Ryu, Hyeonggi;Lee, Guiwon
    • The Korean Journal of Nuclear Medicine Technology
    • /
    • v.20 no.2
    • /
    • pp.62-68
    • /
    • 2016
  • Purpose In a cyclosporine experiment using a robotic liquid handing system has found a deviation of its standard curve and low reproducibility of patients's results. The difference of the test is that methanol is mixed with samples and the extractions are used for the test. Therefore, we assumed that the abnormal test results came from using methanol and conducted this test. In a manual of a robotic liquid handling system mentions that we can choose several setting parameters depending on the viscosity of the liquids being used, the size of the sampling tips and the motor speeds that you elect to use but there's no exact order. This study was undertaken to confirm pipetting ability depending on types of liquids and investigate proper setting parameters for the optimum dispensing ability. Materials and Methods 4types of liquids(water, serum, methanol, PEG 6000(25%)) and $TSH^{125}I$ tracer(515 kBq) are used to confirm pipetting ability. 29 specimens for Cyclosporine test are used to compare results. Prepare 8 plastic tubes for each of the liquids and with multi pipette $400{\mu}l$ of each liquid is dispensed to 8 tubes and $100{\mu}l$ of $TSH^{125}I$ tracer are dispensed to all of the tubes. From the prepared samples, $100{\mu}l$ of liquids are dispensed using a robotic liquid handing system, counted and calculated its CV(%) depending on types of liquids. And then by adjusting several setting parameters(air gap, dispense time, delay time) the change of the CV(%)are calcutated and finds optimum setting parameters. 29 specimens are tested with 3 methods. The first(A) is manual method and the second(B) is used robotic liquid handling system with existing parameters. The third(C) is used robotic liquid handling system with adjusted parameters. Pipetting ability depending on types of liquids is assessed with CV(%). On the basis of (A), patients's test results are compared (A)and(B), (A)and(C) and they are assessed with %RE(%Relative error) and %Diff(%Difference). Results The CV(%) of the CPM depending on liquid types were water 0.88, serum 0.95, methanol 10.22 and PEG 0.68. As expected dispensing of methanol using a liquid handling system was the problem and others were good. The methanol's dispensing were conducted by adjusting several setting parameters. When transport air gap 0 was adjusted to 2 and 5, CV(%) were 20.16, 12.54 and when system air gap 0 was adjusted to 2 and 5, CV(%) were 8.94, 1.36. When adjusted to system air gap 2, transport air gap 2 was 12.96 and adjusted to system air gap 5, Transport air gap 5 was 1.33. When dispense speed was adjusted 300 to 100, CV(%) was 13.32 and when dispense delay was adjusted 200 to 100 was 13.55. When compared (B) to (A), the result increased 99.44% and %RE was 93.59%. When compared (C-system air gap was adjusted 0 to 5) to (A), the result increased 6.75% and %RE was 5.10%. Conclusion Adjusting speed and delay time of aspiration and dispense was meaningless but changing system air gap was effective. By adjusting several parameters proper value was found and it affected the practical result of the experiment. To optimize the system active efforts are needed through the test and in case of dispensing new types of liquids proper test is required to check the liquid is suitable for using the equipment.

  • PDF

The Research on Recommender for New Customers Using Collaborative Filtering and Social Network Analysis (협력필터링과 사회연결망을 이용한 신규고객 추천방법에 대한 연구)

  • Shin, Chang-Hoon;Lee, Ji-Won;Yang, Han-Na;Choi, Il Young
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.4
    • /
    • pp.19-42
    • /
    • 2012
  • Consumer consumption patterns are shifting rapidly as buyers migrate from offline markets to e-commerce routes, such as shopping channels on TV and internet shopping malls. In the offline markets consumers go shopping, see the shopping items, and choose from them. Recently consumers tend towards buying at shopping sites free from time and place. However, as e-commerce markets continue to expand, customers are complaining that it is becoming a bigger hassle to shop online. In the online shopping, shoppers have very limited information on the products. The delivered products can be different from what they have wanted. This case results to purchase cancellation. Because these things happen frequently, they are likely to refer to the consumer reviews and companies should be concerned about consumer's voice. E-commerce is a very important marketing tool for suppliers. It can recommend products to customers and connect them directly with suppliers with just a click of a button. The recommender system is being studied in various ways. Some of the more prominent ones include recommendation based on best-seller and demographics, contents filtering, and collaborative filtering. However, these systems all share two weaknesses : they cannot recommend products to consumers on a personal level, and they cannot recommend products to new consumers with no buying history. To fix these problems, we can use the information which has been collected from the questionnaires about their demographics and preference ratings. But, consumers feel these questionnaires are a burden and are unlikely to provide correct information. This study investigates combining collaborative filtering with the centrality of social network analysis. This centrality measure provides the information to infer the preference of new consumers from the shopping history of existing and previous ones. While the past researches had focused on the existing consumers with similar shopping patterns, this study tried to improve the accuracy of recommendation with all shopping information, which included not only similar shopping patterns but also dissimilar ones. Data used in this study, Movie Lens' data, was made by Group Lens research Project Team at University of Minnesota to recommend movies with a collaborative filtering technique. This data was built from the questionnaires of 943 respondents which gave the information on the preference ratings on 1,684 movies. Total data of 100,000 was organized by time, with initial data of 50,000 being existing customers and the latter 50,000 being new customers. The proposed recommender system consists of three systems : [+] group recommender system, [-] group recommender system, and integrated recommender system. [+] group recommender system looks at customers with similar buying patterns as 'neighbors', whereas [-] group recommender system looks at customers with opposite buying patterns as 'contraries'. Integrated recommender system uses both of the aforementioned recommender systems to recommend movies that both recommender systems pick. The study of three systems allows us to find the most suitable recommender system that will optimize accuracy and customer satisfaction. Our analysis showed that integrated recommender system is the best solution among the three systems studied, followed by [-] group recommended system and [+] group recommender system. This result conforms to the intuition that the accuracy of recommendation can be improved using all the relevant information. We provided contour maps and graphs to easily compare the accuracy of each recommender system. Although we saw improvement on accuracy with the integrated recommender system, we must remember that this research is based on static data with no live customers. In other words, consumers did not see the movies actually recommended from the system. Also, this recommendation system may not work well with products other than movies. Thus, it is important to note that recommendation systems need particular calibration for specific product/customer types.