• Title/Summary/Keyword: Web data


A Collaborative Filtering System Combined with Users' Review Mining : Application to the Recommendation of Smartphone Apps (사용자 리뷰 마이닝을 결합한 협업 필터링 시스템: 스마트폰 앱 추천에의 응용)

  • Jeon, ByeoungKug;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.1-18
    • /
    • 2015
  • Collaborative filtering (CF) algorithms have been widely used in recommender systems, in both academic and practical applications. A general CF system compares users by how similar they are, and creates recommendations from the items favored by other people with similar tastes. It is therefore very important for CF to measure the similarities between users accurately, because recommendation quality depends on them. In most cases, only users' explicit numeric ratings of items (i.e., quantitative information) have been used to calculate similarities between users in CF. However, several studies have indicated that qualitative information, such as users' reviews of items, may help measure these similarities more accurately. Considering that, with the advent of Web 2.0, many people share honest opinions on items they have recently purchased, users' reviews can be regarded as an informative source for accurately identifying user preferences. Against this background, this study proposes a new hybrid recommender system that combines CF with users' review mining. Our proposed system is based on conventional memory-based CF, but it is designed to use both users' numeric ratings and their text reviews when calculating similarities between users. Specifically, our system creates not only a user-item rating matrix but also a user-item review term matrix. It then calculates a rating similarity and a review similarity from each matrix, and computes the final user-to-user similarity from these two similarities. For calculating review similarity between users, we propose two alternatives: one uses the frequency of the terms that users' reviews share, and the other uses the sum of the importance weights of those shared terms. For the importance weights of terms, we propose using average TF-IDF (term frequency-inverse document frequency) weights. To validate the applicability of the proposed system, we applied it to a recommender system for smartphone applications (hereafter, apps). At present, over a million apps are offered in each of the app stores operated by Google and Apple. Due to this information overload, users have difficulty selecting the apps they really want. Furthermore, app store operators such as Google and Apple have by now accumulated a huge amount of user reviews of apps. We therefore chose smartphone app stores as the application domain of our system. To collect the experimental data set, we built and operated a Web-based data collection system for about two weeks, obtaining 1,246 valid responses (ratings and reviews) from 78 users. The experimental system was implemented using Microsoft Visual Basic for Applications (VBA) and SAS Text Miner, and, to avoid distortion due to human intervention, no manual refinement was applied during the review mining process. To examine the effectiveness of the proposed system, we compared its performance with that of a conventional CF system, evaluating recommender performance by average MAE (mean absolute error). The experimental results showed that our proposed system (MAE = 0.7867-0.7881) slightly outperformed the conventional CF system (MAE = 0.7939). They also showed that calculating review similarity between users from TF-IDF weights (MAE = 0.7867) led to better recommendation accuracy than calculating it from the frequency of commonly used terms in reviews (MAE = 0.7881). A paired-samples t-test indicated that the proposed system with review similarity calculated from the frequency of commonly used terms outperformed the conventional CF system at the 10% significance level. Our study sheds light on the use of review information to facilitate electronic commerce by recommending appropriate items to users.
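As a minimal sketch of the hybrid similarity idea described in this abstract (the toy data, the cosine measure, and the equal 50/50 weighting are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: numeric ratings and free-text reviews per user.
ratings = np.array([            # user-item rating matrix (0 = unrated)
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 5, 4],
])
reviews = [                     # one concatenated review document per user
    "fast clean interface great battery saver",
    "great interface fast and simple",
    "laggy ads everywhere battery drain",
]

# Rating similarity from the user-item rating matrix.
rating_sim = cosine_similarity(ratings)

# Review similarity from a TF-IDF-weighted user-item review term matrix.
review_sim = cosine_similarity(TfidfVectorizer().fit_transform(reviews))

# Final user-to-user similarity: a simple equal-weight combination (assumed).
alpha = 0.5
final_sim = alpha * rating_sim + (1 - alpha) * review_sim
print(np.round(final_sim, 3))
```

Neighbors for a target user would then be ranked by `final_sim` before the usual rating prediction step of memory-based CF.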

Semantic Process Retrieval with Similarity Algorithms (유사도 알고리즘을 활용한 시맨틱 프로세스 검색방안)

  • Lee, Hong-Joo;Klein, Mark
    • Asia pacific journal of information systems
    • /
    • v.18 no.1
    • /
    • pp.79-96
    • /
    • 2008
  • One of the roles of Semantic Web services is to execute dynamic intra-organizational services, including the integration and interoperation of business processes. Since different organizations design their processes differently, retrieving similar semantic business processes is necessary to support inter-organizational collaboration. Most approaches for finding services that have certain features and support certain business processes have relied on some type of logical reasoning and exact matching. This paper presents our approach of using imprecise matching to expand the results of an exact matching engine when querying the OWL (Web Ontology Language) MIT Process Handbook, an electronic repository of best-practice business processes. The Handbook is intended to help people (1) redesign existing organizational processes, (2) invent new processes, and (3) share ideas about organizational practices. To use the MIT Process Handbook for process retrieval experiments, we had to export it into an OWL-based format: we modeled the Process Handbook meta-model in OWL and exported the processes in the Handbook as instances of that meta-model. Next, we needed a sizable number of queries and their corresponding correct answers from the Process Handbook. Many previous studies devised artificial datasets composed of randomly generated numbers without real meaning, and relied on subjective ratings for the correct answers and for similarity values between processes. To generate a semantics-preserving test data set, we created, using mutation operators, 20 variants for each target process that are syntactically different but semantically equivalent; these variants represent the correct answers for the target process. We devised diverse similarity algorithms based on the values of process attributes and on the structure of business processes. We used simple text-retrieval similarity measures such as TF-IDF and Levenshtein edit distance, and employed a tree edit distance measure because semantic processes have a graph-like structure. We also designed similarity algorithms that consider the similarity of process structure, such as part processes, goals, and exceptions. Since relationships between a semantic process and its subcomponents can be identified, this information can be used to calculate similarities between processes; Dice's coefficient and the Jaccard similarity measure are used to quantify the overlap between processes in diverse ways. We performed retrieval experiments to compare the performance of the devised similarity algorithms, measuring retrieval performance in terms of precision, recall, and the F-measure, the harmonic mean of precision and recall. The tree edit distance showed the poorest performance on all measures. TF-IDF and the method combining TF-IDF with Levenshtein edit distance performed better than the other devised methods; these two measures focus on the similarity of process names and descriptions. In addition, we calculated the rank correlation coefficient, Kendall's tau-b, between the number of process mutations and the ranking of similarity values within each mutation set. In this experiment, similarity measures based on process structure, such as Dice's coefficient, Jaccard, and their derivatives, showed greater coefficients than measures based on the values of process attributes. However, the Lev-TFIDF-JaccardAll measure, which considers process structure and attribute values together, showed reasonably good performance in both experiments. For retrieving semantic processes, it therefore seems better to consider diverse aspects of process similarity, such as process structure and the values of process attributes. In summary, we generated semantic process data and a retrieval-experiment dataset from the MIT Process Handbook repository, suggested imprecise query algorithms that expand the retrieval results of an exact matching engine such as SPARQL, and compared the retrieval performance of the similarity algorithms. As for limitations and future work, experiments with datasets from other domains are needed; and since diverse measures yield many similarity values, better ways of identifying relevant processes may be found by applying these values simultaneously.
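As a brief, self-contained sketch of the basic similarity building blocks named in this abstract (Levenshtein edit distance, Dice's coefficient, and Jaccard similarity; applying them to word tokens of process names is an illustrative assumption):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def dice(x: set, y: set) -> float:
    """Dice's coefficient: 2|X ∩ Y| / (|X| + |Y|)."""
    return 2 * len(x & y) / (len(x) + len(y))

def jaccard(x: set, y: set) -> float:
    """Jaccard similarity: |X ∩ Y| / |X ∪ Y|."""
    return len(x & y) / len(x | y)

# Hypothetical process names from a handbook-like repository.
p1 = set("approve purchase order".split())
p2 = set("approve purchase request".split())
print(levenshtein("purchase order", "purchase request"))  # 6
print(dice(p1, p2), jaccard(p1, p2))                      # ~0.667 and 0.5
```

A combined measure like the paper's Lev-TFIDF-JaccardAll would blend such scores computed over names, descriptions, and structural overlap.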

The Recognition and Utilization of Middle School Technology·Home Economics Teacher's Guidebook (중학교 "기술·가정" 교과 교사용 지도서에 대한 가정 교사의 인식 및 활용)

  • Kang, Eun-Yeong;Shin, Hye-Won
    • Journal of Korean Home Economics Education Association
    • /
    • v.19 no.2
    • /
    • pp.1-12
    • /
    • 2007
  • This study analyzed teachers' recognition and utilization of the teacher's guidebook for middle school technology·home economics classes under the 7th national curriculum. The data were collected via e-mail from teachers teaching home economics in middle schools, whose addresses were acquired from middle school web pages registered with the Educational Board. The 355 responses were analyzed using SPSS. The results were as follows. First, teachers highly recognized the necessity of the teacher's guidebook; however, as the actual guidebook was not adequately helpful, the overall degree of satisfaction was relatively low. Teachers who used the guidebook had a more positive perception of it than those who did not, and teachers who majored in technology education found it more helpful than those who majored in home economics education. Second, teachers consulted the guidebook mostly for field practice guidance. Third, teachers who did not use the guidebook turned to other reference materials, such as Internet web sites and audiovisual materials, mainly because their contents were ample and easy to access. Fourth, the following improvements were suggested: providing learning content that can be practically used in class, offering various examples of teaching-learning methods, specifying methods and criteria for planning performance assessment, adequately supplementing textbook content, and improving the guidebook's layout.


Feeding Behavior of Crustaceans (Cladocera, Copepoda and Ostracoda): Food Selection Measured by Stable Isotope Analysis Using R Package SIAR in Mesocosm Experiment (메소코즘을 이용한 지각류, 요각류 및 패충류의 섭식 성향 분석; 탄소, 질소 안정동위원소비의 믹싱모델 (R package SIAR)을 이용한 정량 분석)

  • Chang, Kwang-Hyeon;Seo, Dong-Il;Go, Soon-Mi;Sakamoto, Masaki;Nam, Gui-Sook;Choi, Jong-Yun;Kim, Min-Seob;Jeong, Kwang-Seok;La, Geung-Hwan;Kim, Hyun-Woo
    • Korean Journal of Ecology and Environment
    • /
    • v.49 no.4
    • /
    • pp.279-288
    • /
    • 2016
  • Stable isotope analysis (SIA) of carbon and nitrogen is a useful tool for understanding the functional roles of target organisms in food-web interactions. Recently, mixing models based on SIA have frequently been used to determine which potential food sources are predominantly assimilated by consumers; however, applying such models is often difficult for non-expert software users. In the present study, we provide an accessible walkthrough of the R software and the SIAR package, with example data on the selective feeding of the crustaceans that dominated a freshwater zooplankton community. We collected SIA data from experimental mesocosms set up in the littoral area of the eutrophic Chodae Reservoir, and used the mixing model to analyze the main food sources of the dominant crustacean species among small-sized particulate organic matter (POM, <50 μm), large-sized POM (>50 μm), and attached POM. According to the SIAR model results, Daphnia galeata and Ostracoda mainly consumed small-sized POM, while Simocephalus vetulus consumed both small- and large-sized POM. Copepods collected from the reservoir showed no preference among the various food items, but in the mesocosm tanks their main food source was attached POM rather than planktonic prey, including rotifers. The results suggest that these species play different grazing roles in the food webs of eutrophic reservoirs, and that S. vetulus is a more efficient grazer on a wide range of food items, such as large phytoplankton colonies and cyanobacteria, during bloom periods.
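SIAR fits source contributions in a Bayesian framework; the sketch below shows only the underlying linear mass-balance logic for two isotopes and three sources, with all delta values and enrichment factors being hypothetical numbers:

```python
import numpy as np

# Hypothetical signatures (per mil): rows are isotopes (d13C, d15N),
# columns are three candidate sources (small POM, large POM, attached POM).
sources = np.array([
    [-30.0, -26.0, -20.0],   # d13C of each source
    [  4.0,   7.0,  10.0],   # d15N of each source
])
consumer = np.array([-27.0, 9.0])   # mixture (consumer) signature
tef = np.array([0.4, 3.4])          # assumed trophic enrichment factors

# Linear mixing model: sources @ f = consumer - tef, proportions summing to 1.
A = np.vstack([sources, np.ones(3)])   # append the sum-to-one constraint
b = np.append(consumer - tef, 1.0)
fractions = np.linalg.solve(A, b)      # exactly determined: 3 equations, 3 unknowns

for name, frac in zip(["small POM", "large POM", "attached POM"], fractions):
    print(f"{name}: {frac:.1%}")
```

Unlike this deterministic solve, SIAR returns posterior distributions for the source fractions, which is what makes it usable when sources overlap or the system is underdetermined.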

A Comparative Study Between Food-Borne Outbreaks Two or More Persons and Individual Cases by Using Statistics of Japan (일본의 식중독 현황 통계 분석으로 살펴본 1인 식중독과 집단 식중독 비교)

  • Lee, Jong-Kyung
    • Journal of Food Hygiene and Safety
    • /
    • v.26 no.3
    • /
    • pp.248-253
    • /
    • 2011
  • Since 2002, the KFDA has compiled statistics on food poisoning outbreaks affecting two or more persons in Korea and released them to the public on the web. There is a gap between the actual number of outbreaks and the reported number; adding sporadic individual cases of food poisoning may be one way to reduce it. This study used statistics from Japan, where food consumption patterns are similar to Korea's, to compare the ratio and pattern of outbreaks affecting two or more persons against individual cases. By doing so, the Japanese data on outbreaks affecting two or more persons become comparable to the Korean data. The Japanese data for 2002 and 2003 showed that sporadic individual cases accounted for 43.3% of all food poisoning cases. Individual cases occurred mostly in unknown places (90-92.3%) and at home (6.2-8.5%), whereas outbreaks affecting two or more persons occurred mostly at restaurants (46.6-50.1%) and inns (9.2-9.8%). The food-borne pathogens attributed to individual cases were C. jejuni (51.9%), Salmonella spp. (35.3%), and V. parahaemolyticus (9.8%), while those attributed to outbreaks affecting two or more persons were norovirus (31.3%), Salmonella spp. (20.8%), and C. jejuni (15.5%). The 2002-2009 data showed that the outbreak report rate between Korea and Japan was 1:1.5 when adjusted for total population.
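The population-adjusted report-rate comparison reduces to simple arithmetic; the counts below are invented placeholders chosen only to illustrate how a 1:1.5 ratio arises:

```python
# Hypothetical annual outbreak counts and populations (not the study's data).
korea = {"outbreaks": 2_800, "population": 48_000_000}
japan = {"outbreaks": 11_000, "population": 127_000_000}

rate_kr = korea["outbreaks"] / korea["population"]   # reports per capita
rate_jp = japan["outbreaks"] / japan["population"]

print(f"Korea: {rate_kr:.2e} reports per capita")
print(f"Japan: {rate_jp:.2e} reports per capita")
print(f"Korea : Japan = 1 : {rate_jp / rate_kr:.1f}")   # ~1 : 1.5
```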

Development of Extreme Event Analysis Tool Base on Spatial Information Using Climate Change Scenarios (기후변화 시나리오를 활용한 공간정보 기반 극단적 기후사상 분석 도구(EEAT) 개발)

  • Han, Kuk-Jin;Lee, Moung-Jin
    • Korean Journal of Remote Sensing
    • /
    • v.36 no.3
    • /
    • pp.475-486
    • /
    • 2020
  • Climate change scenarios are the basis of research on coping with climate change and consist of large-scale spatio-temporal data. From a data perspective, a single scenario occupies about 83 gigabytes or more, and the data format is semi-structured, making the data difficult to use through means such as search, extraction, archiving, and analysis. In this study, a tool for analyzing extreme climate events based on spatial information is developed to improve the usability of large-scale, multi-period climate change scenarios. In addition, by applying the developed tool to the RCP 8.5 climate change scenario, a pilot analysis is conducted of when and where heavy-rain thresholds observed in the past could occur in the future. As a result, days with a cumulative rainfall of more than 587.6 mm over three days would number about 76 in the 2080s, accompanied by localized heavy rains. The developed analysis tool was designed to support the entire process, from initial settings through to deriving analysis results, on a single platform, and enables the results to be produced in various formats without specific commercial software: web document (HTML), image (PNG), climate change scenario (ESR), and statistics (XLS). Therefore, this analysis tool is considered useful for assessing future climate change prospects and vulnerability, and it is expected to inform analysis tools for climate change scenarios based on forthcoming climate change reports.
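The core of such a threshold analysis is a rolling cumulative sum over daily precipitation; the sketch below runs on a synthetic series for a single grid cell (the gamma-distributed rainfall is an assumption standing in for real scenario data, so it will not reproduce the study's 76-day figure):

```python
import numpy as np
import pandas as pd

# Synthetic daily precipitation (mm/day) standing in for one RCP 8.5 grid cell.
rng = np.random.default_rng(0)
days = pd.date_range("2080-01-01", "2089-12-31", freq="D")
precip = pd.Series(rng.gamma(shape=0.5, scale=12.0, size=len(days)), index=days)

THRESHOLD_MM = 587.6                        # past heavy-rain threshold from the study
rolling3 = precip.rolling(window=3).sum()   # 3-day cumulative rainfall
exceed_days = int((rolling3 > THRESHOLD_MM).sum())

print(f"days whose 3-day cumulative rainfall exceeds {THRESHOLD_MM} mm: {exceed_days}")
```

Applied cell by cell over the scenario grid, the same operation yields the space-time map of threshold exceedance the abstract describes.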

A Study of Big data-based Machine Learning Techniques for Wheel and Bearing Fault Diagnosis (차륜 및 차축베어링 고장진단을 위한 빅데이터 기반 머신러닝 기법 연구)

  • Jung, Hoon;Park, Moonsung
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.19 no.1
    • /
    • pp.75-84
    • /
    • 2018
  • Increasing the operation rate of components and stabilizing operation through timely management of core parts are crucial for improving the efficiency of the railroad maintenance industry. Demand has grown for diagnosis technology that assesses the condition of rolling stock components through history management and automated big data analysis, both to increase reliability and to reduce the maintenance cost of core components amid the trend toward rapid maintenance. This study developed a big data platform-based system that manages rolling stock component condition by acquiring, processing, and analyzing, in real time, the big data generated by onboard and wayside devices of railroad cars. The system can monitor the condition of railroad car components and of system resources in real time. The study also proposed a machine learning technique that enables distributed and parallel processing of the acquired big data and automatic component fault diagnosis. Tests using the virtual instance generation system of Amazon Web Services showed that the algorithm applying distributed and parallel technology decreased the runtime, and confirmed that the fault diagnosis model using random forest machine learning predicted the condition of bearing and wheel parts with 83% accuracy.
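A minimal sketch of a random forest fault classifier like the one described above, trained on synthetic condition-monitoring features (the feature layout, labels, and hyperparameters are illustrative assumptions, not the study's configuration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical feature table: rows are monitoring windows, columns are
# statistics such as vibration RMS, kurtosis, peak value, and temperature.
rng = np.random.default_rng(42)
X = rng.normal(size=(5_000, 8))
# Synthetic label rule: a noisy function of two features (1 = fault).
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.8, size=5_000) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {accuracy_score(y_test, clf.predict(X_test)):.2f}")
```

In the study's setting, the training step would additionally be distributed across cluster nodes; scikit-learn's `n_jobs=-1` only parallelizes tree building on a single machine.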

A Study on the necessity of Open Source Software Intermediaries in the Software Distribution Channel (소프트웨어 유통에 있어 공개소프트웨어 중개자의필요성에 대한 연구)

  • Lee, Seung-Chang;Suh, Eung-Kyo;Ahn, Sung-Hyuck;Park, Hoon-Sung
    • Journal of Distribution Science
    • /
    • v.11 no.2
    • /
    • pp.45-55
    • /
    • 2013
  • Purpose - The development and implementation of open source software (OSS) led to a dramatic change in corporate IT infrastructure, from system servers to smartphones, because the performance, reliability, and security of OSS are comparable to those of commercial software. Today, OSS has become an indispensable tool for coping with a competitive business environment and a constantly evolving IT environment. However, the use of OSS remains insufficient in small and medium-sized companies and software houses. This study examines the need for OSS intermediaries in the software distribution channel. Although one might expect the role of the OSS intermediary to shrink as the distribution process improves, the purpose of this research is to show that OSS intermediaries increase the efficiency of the software distribution market. Research design, data, and methodology - This study analyzes data gathered online to determine the extent of the intermediaries' impact on the OSS market. Data were collected by building a personal search robot (web crawler); the collection period lasted 9 days, during which a total of 233,021 data points were gathered from sourceforge.net and Apple's App Store, two of the most popular software intermediaries in the world. The collected data were analyzed using Google's Motion Chart. Results - The study found that, beginning in 2006, OSS production on Sourceforge.net increased rapidly across the board, but dropped sharply in the second half of 2009. Many events could explain this pattern, but we found an apt one: during the same period, monthly software production in the App Store was increasing quickly, a trend contrasting with Sourceforge.net's. Our follow-up analysis suggests that appropriate intermediaries like the App Store can enlarge the OSS market, and that the change was driven by the appearance of B2C software intermediaries of this kind. The results imply that OSS intermediaries can accelerate OSS distribution, while the development of a better online market is critical for corporate users. Conclusion - Our analysis of 233,021 data points on the online software marketplace at Sourceforge.net indicates that OSS intermediaries are needed to keep the software distribution market vital, and that they should satisfy certain qualifications to play a key role as market makers. This study has several interesting implications. First, the OSS intermediary should strive to create a complementary relationship between OSS and proprietary software. Second, the OSS intermediary must have a business model that shares the benefits among all participants (developers, intermediary, and users). Third, the intermediary should provide high-quality OSS comparable in complexity to proprietary software. It is thus worthwhile to examine this study, which argues that open source software intermediaries are essential in the software distribution channel.


Statistical Metadata for Users: A Case Study on the Level of Metadata Provision on Statistical Agency Websites (웹 이용자를 위한 통계 메타데이터: 통계정보 제공사이트의 메타데이터 제공 수준 평가 사례 연구)

  • Oh, Jung-Sun
    • Journal of the Korean Society for information Management
    • /
    • v.24 no.2
    • /
    • pp.161-179
    • /
    • 2007
  • As increasingly diverse kinds of information materials become available on the Internet, defining an adequate level of metadata provision for each type of material becomes a challenge in the context of digital libraries. This study explores issues of metadata provision for a particular type of material: statistical tables. Statistical data always involve numbers and numeric values that should be interpreted with an understanding of the underlying concepts and constructs. Because of these unique data characteristics, metadata in the statistical domain is essential not only for finding and discovering relevant data, but also for understanding and using the data found. However, statistical metadata research has emphasized the question of what metadata is necessary for processing the data, and paid less attention to what metadata should be presented to users. In this study, a case study was conducted to gauge the status of metadata provision for statistical tables on the Internet. The websites of two federal statistical agencies in the United States were selected and examined using content analysis. The results, showing insufficient and inconsistent provision of metadata, demonstrate the need for more discussion of statistical metadata from the ordinary web user's perspective.

Challenges in Construction of Omics data integration, and its standardization (농생명 오믹스데이터 통합 및 표준화)

  • Kim, Do-Wan;Lee, Tae-Ho;Kim, Chang-Kug;Seol, Young-Joo;Lee, Dong-Jun;Oh, Jae-Hyeon;Beak, Jung-Ho;Kim, Juna;Lee, Hong-Ro
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2015.05a
    • /
    • pp.768-770
    • /
    • 2015
  • We performed integration and standardization of omics data related to agriculture. Doing so requires advanced computational methods and bioinformatics infrastructure for integration, standardization, mining, and analysis, which make biological knowledge easier to find. We enable the registration of raw and processed data in NABIC (National Agricultural Biotechnology Information Center), and the processed analysis results are offered to related researchers. We also provide various analysis pipelines: NGS analysis (reference assembly, RNA-seq), GWAS, and microbial community analysis. In addition, the system was designed and built with quality assurance for managing the omics information system, and the infrastructure for utilizing the omics analysis system was constructed. We carried out major quality improvements to the omics information system: first, improving the registration categories for omics-based information; second, building a data processing and web UI development platform for related omics data; third, developing proprietary management information for the omics registration database; fourth, managing and developing the statistics modules for omics data; and last, improving the standard upload/download module for large omics registration information.
