• Title/Summary/Keyword: Average Value


Performance analysis of Frequent Itemset Mining Technique based on Transaction Weight Constraints (트랜잭션 가중치 기반의 빈발 아이템셋 마이닝 기법의 성능분석)

  • Yun, Unil;Pyun, Gwangbum
    • Journal of Internet Computing and Services / v.16 no.1 / pp.67-74 / 2015
  • In recent years, frequent itemset mining that considers the importance of each item has been intensively studied as one of the important issues in the data mining field. According to how they use item importance, itemset mining approaches are classified as follows: weighted frequent itemset mining, frequent itemset mining using transactional weights, and utility itemset mining. In this paper, we perform an empirical analysis of frequent itemset mining algorithms based on transactional weights. These algorithms compute transactional weights from the weight of each item in large databases, and they discover weighted frequent itemsets on the basis of item frequency and the weight of each transaction. Consequently, database analysis reveals the importance of a given transaction, because a transaction's weight is higher if it contains many items with high values. We analyze the advantages and disadvantages of the best-known frequent itemset mining algorithms based on transactional weights and compare their performance. As a representative of this family, WIS introduces the concept and strategies of transactional weights. In addition, there are several state-of-the-art algorithms for extracting itemsets with weight information: WIT-FWIs, WIT-FWIs-MODIFY, and WIT-FWIs-DIFF. To mine weighted frequent itemsets efficiently, these three algorithms use a special lattice-like data structure called the WIT-tree. They do not need an additional database scan after the WIT-tree has been constructed, since each node of the WIT-tree holds item information such as item and transaction IDs. In particular, the traditional algorithms scan the database many times to mine weighted itemsets, whereas the WIT-tree-based algorithms avoid this overhead by reading the database only once. Additionally, the algorithms generate each new itemset of length N+1 from two different itemsets of length N. To discover new weighted itemsets, WIT-FWIs performs the itemset combination by using the information of the transactions that contain all the itemsets. WIT-FWIs-MODIFY adds a feature that reduces the operations needed to calculate the frequency of the new itemset. WIT-FWIs-DIFF uses the difference of two itemsets. To compare and analyze the performance of the algorithms in various environments, we use two types of real datasets (dense and sparse) and measure runtime and maximum memory usage. Moreover, a scalability test is conducted to evaluate the stability of each algorithm as the size of the database changes. As a result, WIT-FWIs and WIT-FWIs-MODIFY show the best performance on the dense dataset, and on the sparse dataset WIT-FWIs-DIFF mines more efficiently than the other algorithms. Compared to the algorithms using the WIT-tree, WIS, which is based on the Apriori technique, has the worst efficiency because on average it requires far more computations than the others.
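
The single-scan, transaction-ID-based mining described in this abstract can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not spell out: a transaction's weight is taken to be the mean weight of its items, weighted support is the summed weight of the transactions containing an itemset divided by the summed weight of all transactions, and a plain dictionary of transaction-ID sets stands in for the WIT-tree. All names and the toy data are hypothetical.

```python
from itertools import combinations


def transaction_weights(db, item_weights):
    """Assumed definition: weight of a transaction = mean weight of its items."""
    return {tid: sum(item_weights[i] for i in items) / len(items)
            for tid, items in db.items()}


def weighted_support(tidset, tw):
    """Summed weight of the supporting transactions over the total weight."""
    return sum(tw[t] for t in tidset) / sum(tw.values())


def mine_weighted_itemsets(db, item_weights, min_ws):
    """Level-wise mining over transaction-ID sets after a single database scan."""
    tw = transaction_weights(db, item_weights)
    # Single scan: record which transactions contain each item.
    tidsets = {}
    for tid, items in db.items():
        for item in items:
            tidsets.setdefault((item,), set()).add(tid)
    level = {k: v for k, v in tidsets.items()
             if weighted_support(v, tw) >= min_ws}
    result = {}
    while level:
        for itemset, tids in level.items():
            result[itemset] = weighted_support(tids, tw)
        next_level = {}
        # Join two length-N itemsets sharing a prefix into one of length N+1;
        # its transaction set is the intersection, so no rescan is needed.
        for a, b in combinations(sorted(level), 2):
            if a[:-1] == b[:-1]:
                tids = level[a] & level[b]
                if tids and weighted_support(tids, tw) >= min_ws:
                    next_level[a + (b[-1],)] = tids
        level = next_level
    return result


# Hypothetical toy database and item weights.
example_db = {1: {"a", "b"}, 2: {"a", "c"}, 3: {"a", "b", "c"}, 4: {"b"}}
example_weights = {"a": 0.6, "b": 0.2, "c": 0.4}
frequent = mine_weighted_itemsets(example_db, example_weights, min_ws=0.5)
for itemset in sorted(frequent):
    print(itemset, round(frequent[itemset], 3))
```

The intersection step is the reason no further database scan is required: every candidate's support is computed from the transaction IDs already held for its two parents.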

Inhomogeneity correction in on-line dosimetry using transmission dose (투과선량을 이용한 온라인 선량측정에서 불균질조직에 대한 선량 보정)

  • Wu, Hong-Gyun;Huh, Soon-Nyung;Lee, Hyoung-Koo;Ha, Sung-Whan
    • Journal of Radiation Protection and Research / v.23 no.3 / pp.139-147 / 1998
  • Purpose: Tissue inhomogeneity such as lung affects tumor dose as well as transmission dose in the new concept of on-line dosimetry, which estimates tumor dose from transmission dose using a new algorithm. This study was carried out to confirm the accuracy of tissue-density correction in tumor dose estimation from transmission dose. Methods: A cork phantom (CP, density $0.202\;g/cm^3$), with a density similar to lung parenchyma, and a polystyrene phantom (PP, density $1.040\;g/cm^3$), with a density similar to soft tissue, were used. Dose measurement was carried out under conditions simulating the human chest. To simulate AP-PA irradiation, PPs of 3 cm thickness were placed above and below a CP with a thickness of 5, 10, or 20 cm. To simulate lateral irradiation, a 6 cm thick PP was placed between two 10 cm thick CPs, and an additional 3 cm thick PP was placed on each lateral side. 4, 6, and 10 MV x-rays were used. Field size ranged from $3{\times}3$ cm to $20{\times}20$ cm, and phantom-chamber distance (PCD) was 10 to 50 cm. The results were compared with another set of data obtained with an equivalent thickness of PP corrected by density. Results: When the transmission dose of PP was compared with that of the density-corrected equivalent thickness of CP, the average error was 0.18 (${\pm}0.27$) % for 4 MV, 0.10 (${\pm}0.43$) % for 6 MV, and 0.33 (${\pm}0.30$) % for 10 MV with a 5 cm thick CP. When the CP was 10 cm thick, the error was 0.23 (${\pm}0.73$) %, 0.05 (${\pm}0.57$) %, and 0.04 (${\pm}0.40$) %, while for 20 cm the error was 0.55 (${\pm}0.36$) %, 0.34 (${\pm}0.27$) %, and 0.34 (${\pm}0.18$) % for the corresponding energies. With the lateral irradiation model, the difference was 1.15 (${\pm}1.86$) %, 0.90 (${\pm}1.43$) %, and 0.86 (${\pm}1.01$) % for the corresponding energies. A relatively large difference was found when the PCD was 10 cm. Omitting the 10 cm PCD, the difference was reduced to 0.47 (${\pm}1.17$) %, 0.42 (${\pm}0.96$) %, and 0.55 (${\pm}0.77$) % for the corresponding energies. Conclusion: When tissue inhomogeneity such as lung lies in the path of the x-ray beam, tumor dose can be calculated from transmission dose after correction using tissue density.
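
The density correction mentioned in this abstract can be illustrated with a short Python sketch. It assumes the simplest possible correction, a linear scaling of thickness by the ratio of densities; the abstract does not state the exact formula the authors used, so the function below is a hypothetical illustration rather than their algorithm. The densities and thicknesses are the ones quoted in the abstract.

```python
CORK_DENSITY = 0.202         # g/cm^3, cork phantom (from the abstract)
POLYSTYRENE_DENSITY = 1.040  # g/cm^3, polystyrene phantom (from the abstract)


def equivalent_thickness(thickness_cm, density, reference_density=POLYSTYRENE_DENSITY):
    """Thickness of reference material with the same areal density.

    Assumption: a plain linear density scaling.
    """
    return thickness_cm * density / reference_density


# AP-PA geometry from the abstract: 3 cm polystyrene above and below the cork.
for cork_cm in (5, 10, 20):
    cork_as_pp = equivalent_thickness(cork_cm, CORK_DENSITY)
    total_pp = 3 + cork_as_pp + 3
    print(f"{cork_cm} cm cork ~ {cork_as_pp:.2f} cm PP, stack ~ {total_pp:.2f} cm PP")
```

Under this assumption a cork slab is equivalent to roughly one fifth of its thickness in polystyrene, which is why the low-density region changes the transmission dose so strongly.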


Studies on the Microflora and Enzymes Influencing on Korea Native Kochuzang (Red Pepper Soybean Paste) Aging (재래식(在來式) 고추장 숙성(熟成)에 미치는 미생물(微生物) 및 그 효소(酵素)에 관(關)한 연구(硏究))

  • Lee, Ke-Ho;Lee, Myo-Sook;Park, Sung-O
    • Applied Biological Chemistry / v.19 no.2 / pp.82-92 / 1976
  • This study was carried out to investigate the changes in various chemical components and in the microflora during the aging period of Korean native Kochuzang (red pepper soybean paste). Korean native maeju loaves were separated into surface and inner parts. Three kinds of Korean native Kochuzang were prepared: from the surface part, from the inner part, and from ordinary (whole) maeju. The selection and identification of high enzyme-producing strains from the microflora, and the characteristics of their enzymes, were also studied. I. Changes in the chemical components during the aging period of Kochuzang. 1) The pH of the 3 kinds of Kochuzang decreased rapidly for the first 10 days after preparation, decreased gradually until 60 days, and increased slightly over the next 30 days. The pH of the surface-part Kochuzang was lower than that of the inner-part or ordinary Kochuzang. 2) The total acid content in the 3 kinds of Kochuzang increased gradually until 60 days and slowly decreased thereafter. 3) The total nitrogen content in the 3 kinds of Kochuzang increased gradually up to 60 days and decreased slightly thereafter. 4) Trichloroacetic acid soluble nitrogen in the 3 kinds of Kochuzang increased remarkably for the first 10 days and gradually thereafter. 5) The increase in amino nitrogen content in the 3 kinds of Kochuzang was remarkable for the first 30 days and less remarkable thereafter. 6) The reducing sugar content in the 3 kinds of Kochuzang increased remarkably for the first 50 days and slowly decreased thereafter. II. Changes in the microflora during the aging period of Kochuzang. 1) Aerobic bacteria, anaerobic bacteria, and molds in the 3 kinds of Kochuzang increased for the first 30 to 40 days and decreased thereafter. 2) No yeast appeared in the three kinds of Kochuzang during the first 20 days. Yeasts were found to grow once the pH had fallen below 5.4 after 30 days. Yeasts in the surface-part and ordinary Kochuzang gradually increased, while those in the inner-part Kochuzang decreased, as aging proceeded. III. Selection and identification of high amylase- and protease-producing strains from the microflora during the aging period of Kochuzang. 1) The high amylase- and protease-producing strains from the microflora were identified as Bacillus subtilis-P, Bacillus subtilis-G, Bacillus licheniformis-K, and Aspergillus oryzae-B. 2) The amylase activity of Aspergillus oryzae-B was the highest among the strains, followed in descending order by Bacillus subtilis-P, Bacillus licheniformis-K, and Bacillus subtilis-G. The protease activities of Aspergillus oryzae-B and Bacillus subtilis-P were about the same, followed in descending order by Bacillus licheniformis-K and Bacillus subtilis-G. 3) NaCl inhibited amylase activity more than protease activity. At 15 percent NaCl, the average NaCl concentration in Kochuzang, amylase activity was inhibited by 45 to 65 percent and protease activity by 40 to 46 percent.


Requirement and Perception of Parents on the Subject of Home Economics in Middle School (중학교 가정교과에 대한 학부모의 인식 및 요구도)

  • Shin Hyo-Shick;Park Mi-Soog
    • Journal of Korean Home Economics Education Association / v.18 no.3 s.41 / pp.1-22 / 2006
  • The purpose of this study is to identify desirable directions for home economics by examining the requirements and perceptions of parents of high school students who have completed the home economics course. About 600 parents were surveyed in Seoul and Pusan, Ganwon and Ghynggi provinces, Choongcheong and Gyungsang provinces, and Cheonla and Jeju provinces; of the 600 responses, 560 were judged suitable and used for analysis. The questionnaire included 61 items on requirements for home economics content, one item on the content that should be retained whenever possible, one item on the perception of the aims of home economics, and 11 items on the perception of the home economics curriculum and its management. The collected data were analyzed for frequency, percentage, mean, and standard deviation, and with t-tests, using the SAS program. The results are summarized as follows. 1. Perception was higher for the statements that boys and girls learn together about the idea of healthy living, desirable character formation, and knowledge. 2. Among the teaching purposes of home economics, the item on scientific principles and knowledge for improving home life shows 15.7% below the average value. 3. Regarding the nature of home economics, it is perceived as highly related to real life; regarding the system, parents perceive a lack of class periods and content in the home economics field; and regarding instructional content, practical and applied qualities are rated higher regardless of sex. 4. The most important area to emphasize in the subject of home economics is the family area. 5. Among the first-grade content requirements at the middle-unit level, "growth and development" is highest at 4.11, whereas "basic principles and actuality" is lowest at 3.70. 6. Among the second-grade content requirements at the middle-unit level, "youth and consumer life" is highest at 4.09. 7. Among the third-grade content requirements at the middle-unit level, "choice of career direction and job ethics" is highest at 4.16, and "preparing meals and evaluation" is lowest at 3.50.


Resolving the 'Gray sheep' Problem Using Social Network Analysis (SNA) in Collaborative Filtering (CF) Recommender Systems (소셜 네트워크 분석 기법을 활용한 협업필터링의 특이취향 사용자(Gray Sheep) 문제 해결)

  • Kim, Minsung;Im, Il
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.137-148 / 2014
  • Recommender systems have become one of the most important technologies in e-commerce. For many consumers, the ultimate reason to shop online is to reduce the effort of information search and purchase, and recommender systems are a key technology for serving this need. Many past studies on recommender systems have been devoted to developing and improving recommendation algorithms, and collaborative filtering (CF) is known to be the most successful one. Despite its success, however, CF has several shortcomings, such as the cold-start, sparsity, and gray sheep problems. To generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users. Therefore, CF cannot come up with recommendations for new users who do not have any evaluations or preference information (cold-start problem). As the numbers of products and customers increase, the scale of the data increases exponentially and most of the data cells are empty. This sparse dataset makes computation for recommendation extremely hard (sparsity problem). Since CF is based on the assumption that there are groups of users sharing common preferences or tastes, CF becomes inaccurate if there are many users with rare and unique tastes (gray sheep problem). This study proposes a new algorithm that uses Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We use 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality in SNA refers to the number of direct links to and from a node. In a network of users who are connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and are isolated from them. Therefore, gray sheep can be identified by calculating the degree centrality of each node. We divide the dataset into two, gray sheep and others, based on the degree centrality of the users. Then, different similarity measures and recommendation methods are applied to these two datasets. The detailed algorithm is as follows. Step 1: Convert the initial data, which is a two-mode network (user to item), into a one-mode network (user to user). Step 2: Calculate the degree centrality of each node and separate the nodes whose degree centrality is lower than a pre-set threshold. The threshold value is determined by simulations such that the accuracy of CF for the remaining dataset is maximized. Step 3: Apply an ordinary CF algorithm to the remaining dataset. Step 4: Since the separated dataset consists of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them, so a 'popular item' method is used to generate recommendations for these users. The F measures of the two datasets are weighted by the numbers of nodes and summed to be used as the final performance metric. To test the performance improvement from this new algorithm, an empirical study was conducted using a publicly available dataset, the MovieLens data by the GroupLens research team. We used 100,000 evaluations by 943 users on 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm using the 'Best-N-neighbors' and 'Cosine' similarity methods. The empirical results show that the F measure improved by about 11% on average when the proposed algorithm was used. Past studies to improve CF performance typically used additional information other than users' evaluations, such as demographic data. Some studies applied SNA techniques as a new similarity metric. This study is novel in that it uses SNA to separate the dataset. It shows that the performance of CF can be improved, without any additional information, when SNA techniques are used as proposed. This study has several theoretical and practical implications. It empirically shows that the characteristics of a dataset can affect the performance of CF recommender systems, which helps researchers understand the factors affecting CF performance. It also opens a door for future studies that apply SNA to CF to analyze dataset characteristics. In practice, this study provides guidelines for improving the performance of CF recommender systems with a simple modification.
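
Steps 1, 2, and 4 of the algorithm in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: two users are linked when they share at least `min_common` items (the abstract does not define the linking rule), the threshold is passed in rather than tuned by simulation, and the ordinary CF step (Step 3) is omitted. The user and item names are hypothetical.

```python
from itertools import combinations


def user_network(ratings, min_common=1):
    """Step 1: project the user-item data onto a user-user network.

    Assumption: two users are linked if they share at least min_common items.
    """
    links = {user: set() for user in ratings}
    for u, v in combinations(ratings, 2):
        if len(ratings[u] & ratings[v]) >= min_common:
            links[u].add(v)
            links[v].add(u)
    return links


def split_gray_sheep(ratings, threshold, min_common=1):
    """Step 2: separate users whose degree centrality is below the threshold."""
    links = user_network(ratings, min_common)
    gray = {user for user, neighbours in links.items()
            if len(neighbours) < threshold}
    return gray, set(ratings) - gray


def popular_items(ratings, k):
    """Step 4: recommend the k most frequently rated items (ties broken by name)."""
    counts = {}
    for items in ratings.values():
        for item in items:
            counts[item] = counts.get(item, 0) + 1
    return sorted(counts, key=lambda item: (-counts[item], item))[:k]


# Hypothetical toy data: user -> set of rated items.
ratings = {
    "alice": {"m1", "m2", "m3"},
    "bob": {"m1", "m2"},
    "carol": {"m2", "m3"},
    "dave": {"m9"},
}
gray, others = split_gray_sheep(ratings, threshold=1)
print("gray sheep:", sorted(gray))
print("fallback recommendations:", popular_items(ratings, k=2))
```

In this toy network the user with no shared items has degree zero and is the only one routed to the popular-item fallback; the remaining users would be passed to an ordinary CF algorithm.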

Determination of Optimal Concentration of LPE (Lysophosphatidylethanolamine) for Postharvest Stability and Quality of Strawberry Fruit (딸기 수확 후 저장기간 연장 및 품질 개선을 위한 LPE (Lysophosphatidylethanolamine) 적정 처리농도 구명)

    • Choi, Ki-Young;Kim, Il-Seop;Yun, Young-Sik;Choi, Eun-Young
      • Journal of Bio-Environment Control / v.25 no.3 / pp.153-161 / 2016
    • This study aims to determine the optimal maturity of strawberry fruits as affected by the application of lysophosphatidylethanolamine (LPE) and the optimal LPE concentration for postharvest stability and quality. Prior to the application of treatments, fruits classified by level of maturity (0%, 50%, 70% and 100%) were air-dried for 40 minutes and stored in a refrigerator at $4^{\circ}C$ for 12 days. Fruits at 70% maturity were dipped into 0, 10, 50 and $100mg{\cdot}L^{-1}$ LPE solutions for 1 minute. A lower range of concentrations (0, 2.5, 5, 10 and $25mg{\cdot}L^{-1}$) was applied to fruits at different maturity levels. Data on fresh weight, hardness at vertical and horizontal loading positions, color index and sugar content during storage were collected. For fruits at 70% maturity dipped in the LPE solutions, no significant differences were found in fresh weight, color index or sugar content. However, the application of $10mg{\cdot}L^{-1}$ LPE gave the highest hardness at the vertical loading position, while $100mg{\cdot}L^{-1}$ gave the lowest average. At the lower range of LPE concentrations, fresh weight was not significantly affected by LPE application or maturity level. Fruit hardness depended mainly on the maturity of the fruits. Increased hardness was observed in fruits at 70% maturity dipped into the $5mg{\cdot}L^{-1}$ LPE solution. The hardness and Hunter's $L^*$ and $b^*$ values of 100% mature fruits were the lowest 12 days after storage, despite the application of $25mg{\cdot}L^{-1}$ LPE.

Measuring Consumer-Brand Relationship Quality (소비자-브랜드 관계 품질 측정에 관한 연구)

    • Kang, Myung-Soo;Kim, Byoung-Jai;Shin, Jong-Chil
      • Journal of Global Scholars of Marketing Science / v.17 no.2 / pp.111-131 / 2007
    • As brands become a core asset in creating corporate value, brand marketing has become one of the core strategies that corporations pursue. Recently, in customer relationship management, the possession and consumption of goods have come to be managed around the brand, and management practices have developed accordingly. Interest in the relationship between the brand and the consumer has increased mainly because of the need to acquire individual consumers and develop relationships with them. By developing these relationships, a corporation is able to establish long-term relationships, which become a competitive advantage and a strategic asset for the corporation. The growing importance of and interest in brands has also become a major academic issue. Brand equity, brand extension, brand identity, brand relationship, and brand community are research topics derived from this interest. More specifically, in marketing, the study of brands has led to the study of the factors involved in building powerful brands and of the brand-building process. Recently, studies have concentrated primarily on the consumer-brand relationship, because brand loyalty cannot explain the dynamic quality aspects of loyalty, the consumer-brand relationship building process, and especially the interactions between brands and consumers. In studies of the consumer-brand relationship, a brand is not limited to an object of possession or consumption but is conceptualized as a partner. Most past studies concentrated on qualitative analysis of the consumer-brand relationship to show the depth and breadth of its performance. Studies in Korea have been the same. Recently, studies of the consumer-brand relationship have started to concentrate on quantitative rather than qualitative analysis, or to go further and use quantitative analysis to identify the factors affecting the consumer-brand relationship. These new quantitative approaches show the possibility of using the results as a new way of viewing the consumer-brand relationship and of applying these new concepts in marketing. Quantitative studies of the consumer-brand relationship already exist, but none of them provides theoretical justification for how the sub-dimensions of the consumer-brand relationship are measured. In other words, most studies sum or average the sub-dimensions of the consumer-brand relationship. However, doing so presupposes that the sub-dimensions belong to an identical construct. Most past studies do not establish that the sub-dimensions form a one-dimensional construct, so we question their validity and note their limits. The main purpose of this paper is to overcome the limits of past studies by drawing on previous studies that treat sub-dimensions as a one-dimensional construct (Narver & Slater, 1990; Cronin & Taylor, 1992; Chang & Chen, 1998). In this study, two arbitrary groups were formed to evaluate the reliability of the measurements, and reliability analyses were conducted on each group. For convergent validity, correlations, Cronbach's alpha, and one-factor exploratory analysis were used. For discriminant validity, the correlation of the consumer-brand relationship was compared with that of involvement, a concept similar to the consumer-brand relationship. The test of dependent correlations by Cohen and Cohen (1975, p.35) was also applied, and the results showed that involvement is a construct distinct from the 6 sub-dimensions of the consumer-brand relationship. From these results, we were able to conclude that the sub-dimensions of the consumer-brand relationship can be viewed as a one-dimensional construct. This means that the one-dimensional construct of the consumer-brand relationship can be viewed with reliability and validity. The result of this research is theoretically meaningful in that it treats the consumer-brand relationship as a one-dimensional construct and provides a basis for the methodologies used in previous work. This research also opens the possibility of new research on the consumer-brand relationship, because it supports the manipulation of a one-dimensional construct of the consumer-brand relationship. In previous research, the consumer-brand relationship was classified into several types on the basis of its components, and a number of studies were performed with priority given to those types. However, because this research makes it possible to manipulate a one-dimensional construct, we expect that various studies will apply the level or strength of the consumer-brand relationship as a construct, rather than focusing on separate types of consumer-brand relationship. Additionally, we now have a theoretical basis for manipulating the consumer-brand relationship as a one-dimensional construct. We anticipate studies that use this construct as a dependent variable, parameter, mediator, and so on.


The Effects of Environmental Dynamism on Supply Chain Commitment in the High-tech Industry: The Roles of Flexibility and Dependence (첨단산업의 환경동태성이 공급체인의 결속에 미치는 영향: 유연성과 의존성의 역할)

    • Kim, Sang-Deok;Ji, Seong-Goo
      • Journal of Global Scholars of Marketing Science / v.17 no.2 / pp.31-54 / 2007
    • Exchange between buyers and sellers in the industrial market is shifting from short-term to long-term relationships. Long-term relationships are governed mainly by formal contracts or informal agreements, but many scholars now assert that controlling a relationship through formal contracts is inappropriate under environmental dynamism. In this case, partners depend on each other's flexibility or interdependence. The former, flexibility, provides a general frame of reference, order, and standards against which to guide and assess appropriate behavior in dynamic and ambiguous situations, thus motivating the value-oriented performance goals shared between partners. It is based on social sacrifices, which can potentially minimize opportunistic behavior. The latter, interdependence, means that each firm has a high level of dependence in a dynamic channel relationship. When interdependence is high in magnitude and symmetric, each firm enjoys a high level of power and the bonds between the firms should be reasonably strong. Strong shared power is likely to promote commitment because of the common interests, attention, and support found in such channel relationships. This study deals with environmental dynamism in the high-tech industry. Firms in the high-tech industry regard coping successfully with environmental changes as a key success factor. However, because few studies deal with environmental dynamism and supply chain commitment in the high-tech industry, it is very difficult to find effective strategies for coping with them. This paper presents the results of an empirical study on the relationship between environmental dynamism and supply chain commitment in the high-tech industry. We examined the effects of consumer, competitor, and technological dynamism on supply chain commitment. Additionally, we examined the moderating effects of the flexibility and dependence of supply chains. This study was confined to the type of high-tech industry characterized by rapid technological change and short product lifecycles. Flexibility among firms in this industry, which is characterized by hard and fast growth, is more important than in any other industry. Thus, a variety of environmental dynamism can affect a supply chain relationship. The targeted industries were the electronic parts, metal product, computer, electric machine, automobile, and medical precision manufacturing industries. Data were collected as follows. During the survey, the researchers obtained the list of parts suppliers of 2 companies, N and L, which are internationally competitive in the mobile phone manufacturing industry, and of the suppliers in a business relationship with S company, a semiconductor manufacturer. They were asked to respond to the survey via telephone and e-mail. During the two-month period of February-April 2006, we were able to collect data from 44 companies. Respondents were restricted to persons with direct dealing authority and staff of subcontractor (supplier) companies with at least three months of experience dealing with a manufacturer (an industrial material buyer). The measurement validation procedures covered scale reliability as well as discriminant and convergent validity. The reliability measures traditionally employed, such as Cronbach's alpha, were also used. All the reliabilities were greater than .70. A series of exploratory factor analyses was conducted. We conducted confirmatory factor analyses to assess the validity of our measurements. A series of chi-square difference tests was conducted to ensure discriminant validity. For each pair, we estimated two models (an unconstrained model and a constrained model) and compared the two model fits. All these tests supported discriminant validity. Also, all items loaded significantly on their respective constructs, providing support for convergent validity. We then examined composite reliability and average variance extracted (AVE). The composite reliability of each construct was greater than .70. The AVE of each construct was greater than .50. According to the multiple regression analysis, customer dynamism had a negative effect and competitor dynamism had a positive effect on a supplier's commitment. In addition, flexibility and dependence had significant moderating effects on customer and competitor dynamism. On the other hand, none of the hypotheses about technological dynamism was supported. In other words, technological dynamism had no direct effect on supplier's commitment and was not moderated by the flexibility and dependence of the supply chain. This study contributes as a rare study of environmental dynamism and supply chain commitment in the high-tech industry. In particular, it verified the effects of three sectors of environmental dynamism on supplier's commitment. It also empirically tested how these effects were moderated by flexibility and dependence. The results showed that flexibility and interdependence strengthen supplier's commitment under environmental dynamism in the high-tech industry. Thus relationship managers in the high-tech industry should make supply chain relationships flexible and interdependent. The limitations of the study are as follows. First, regarding the research setting, the study was conducted in the high-tech industry, in which the direction of change in the power balance of supply chain dyads is usually determined by manufacturers, so the results are difficult to generalize. The power structure between partners needs to be controlled in a future study. Second, we treated flexibility as positive throughout the paper, but it can also be negative, e.g., violating an agreement or moving in the wrong direction. Therefore the multi-dimensionality of flexibility needs to be investigated in future research.
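
The composite reliability and AVE thresholds quoted in this abstract (.70 and .50) can be made concrete with the commonly used formulas for standardized factor loadings. The Python sketch below assumes those standard formulas, with each item's error variance taken as one minus its squared loading; the abstract does not report the loadings, so the values used here are purely hypothetical.

```python
def composite_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances).

    Assumption: standardized loadings, so each error variance is 1 - loading^2.
    """
    total = sum(loadings)
    error = sum(1 - loading ** 2 for loading in loadings)
    return total ** 2 / (total ** 2 + error)


def average_variance_extracted(loadings):
    """Mean of the squared standardized loadings."""
    return sum(loading ** 2 for loading in loadings) / len(loadings)


# Hypothetical standardized loadings for one three-item construct.
loadings = [0.80, 0.70, 0.75]
cr = composite_reliability(loadings)
ave = average_variance_extracted(loadings)
print(f"CR = {cr:.3f} (threshold .70), AVE = {ave:.3f} (threshold .50)")
```

A construct passes the checks described in the abstract when its composite reliability exceeds .70 and its AVE exceeds .50, as these hypothetical loadings do.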


Pareto Ratio and Inequality Level of Knowledge Sharing in Virtual Knowledge Collaboration: Analysis of Behaviors on Wikipedia (지식 공유의 파레토 비율 및 불평등 정도와 가상 지식 협업: 위키피디아 행위 데이터 분석)

    • Park, Hyun-Jung;Shin, Kyung-Shik
      • Journal of Intelligence and Information Systems / v.20 no.3 / pp.19-43 / 2014
    • The Pareto principle, also known as the 80-20 rule, states that roughly 80% of the effects come from 20% of the causes for many events including natural phenomena. It has been recognized as a golden rule in business with a wide application of such discovery like 20 percent of customers resulting in 80 percent of total sales. On the other hand, the Long Tail theory, pointing out that "the trivial many" produces more value than "the vital few," has gained popularity in recent times with a tremendous reduction of distribution and inventory costs through the development of ICT(Information and Communication Technology). This study started with a view to illuminating how these two primary business paradigms-Pareto principle and Long Tail theory-relates to the success of virtual knowledge collaboration. The importance of virtual knowledge collaboration is soaring in this era of globalization and virtualization transcending geographical and temporal constraints. Many previous studies on knowledge sharing have focused on the factors to affect knowledge sharing, seeking to boost individual knowledge sharing and resolve the social dilemma caused from the fact that rational individuals are likely to rather consume than contribute knowledge. Knowledge collaboration can be defined as the creation of knowledge by not only sharing knowledge, but also by transforming and integrating such knowledge. In this perspective of knowledge collaboration, the relative distribution of knowledge sharing among participants can count as much as the absolute amounts of individual knowledge sharing. In particular, whether the more contribution of the upper 20 percent of participants in knowledge sharing will enhance the efficiency of overall knowledge collaboration is an issue of interest. This study deals with the effect of this sort of knowledge sharing distribution on the efficiency of knowledge collaboration and is extended to reflect the work characteristics. 
All analyses were conducted on actual data rather than self-reported questionnaire surveys. More specifically, we analyzed the collaborative behavior of the editors of 2,978 English Wikipedia featured articles, the highest quality grade of articles in English Wikipedia. To examine the effect of the Pareto principle, we adopted the Pareto ratio: the ratio of the number of knowledge contributions made by the upper 20 percent of participants to the total number of knowledge contributions made by all participants of an article group. In addition, the Gini coefficient, which represents the inequality of income among a group of people, was applied to reveal the effect of inequality in knowledge contribution. Hypotheses were set up on the assumption that a higher ratio of knowledge contribution by more highly motivated participants leads to higher collaboration efficiency, but that if the ratio gets too high, collaboration efficiency deteriorates because overall informational diversity is threatened and the knowledge contribution of less motivated participants is discouraged. Cox regression models were formulated for each of the focal variables, the Pareto ratio and the Gini coefficient, with seven control variables such as the number of editors involved in an article, the average length of time between successive edits of an article, and the number of sections a featured article has. The dependent variable of the Cox models is the time from article initiation to promotion to featured article status, which indicates the efficiency of knowledge collaboration. To examine whether the effects of the focal variables vary with the characteristics of a group task, we classified the 2,978 featured articles into two categories: Academic and Non-academic. Academic articles are those that reference at least one paper published in an SCI, SSCI, A&HCI, or SCIE journal.
We assumed that academic articles are more complex, entail more information processing and problem solving, and thus require more skill variety and expertise. The analysis results indicate the following. First, the Pareto ratio and the inequality of knowledge sharing relate in a curvilinear fashion to collaboration efficiency in an online community, promoting it up to an optimal point and undermining it thereafter. Second, the curvilinear effect of the Pareto ratio and the inequality of knowledge sharing on collaboration efficiency is more pronounced for more academic tasks in an online community.
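
The two focal measures described in this abstract can be illustrated with a minimal Python sketch. It assumes the input is a list of per-editor contribution counts for one article; the function names, the rounding rule for the size of the "upper 20 percent" group, and the handling of all-zero input are assumptions made for illustration and are not taken from the paper.

```python
import math


def pareto_ratio(contributions, top_share=0.2):
    """Share of all contributions made by the top `top_share` of participants."""
    if not contributions:
        raise ValueError("contributions must be non-empty")
    total = sum(contributions)
    if total == 0:
        return 0.0  # assumption: no contributions means no concentration
    ranked = sorted(contributions, reverse=True)
    # Assumption: round the group size up, with at least one participant.
    # round() guards against floating-point noise such as 0.2 * 15.
    k = max(1, math.ceil(round(top_share * len(ranked), 9)))
    return sum(ranked[:k]) / total


def gini(contributions):
    """Gini coefficient of the contribution counts (0 = perfectly equal)."""
    n = len(contributions)
    if n == 0:
        raise ValueError("contributions must be non-empty")
    total = sum(contributions)
    if total == 0:
        return 0.0  # assumption: treat all-zero input as perfectly equal
    ranked = sorted(contributions)
    # Standard rank formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # with x sorted ascending and i running from 1 to n.
    weighted = sum(i * x for i, x in enumerate(ranked, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    edits_per_editor = [1, 1, 1, 1, 6]  # hypothetical edit counts
    print(pareto_ratio(edits_per_editor))
    print(gini(edits_per_editor))
```

Both values could then enter a regression as focal variables; the sketch covers only the descriptive measures, not the Cox models used in the study.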

    Stock-Index Invest Model Using News Big Data Opinion Mining (뉴스와 주가 : 빅데이터 감성분석을 통한 지능형 투자의사결정모형)

    • Kim, Yoo-Sin;Kim, Nam-Gyu;Jeong, Seung-Ryul
      • Journal of Intelligence and Information Systems
      • /
      • v.18 no.2
      • /
      • pp.143-156
      • /
      • 2012
    • It is widely believed that news and the stock index are closely related: people think that obtaining news before anyone else helps them forecast stock prices and earn large profits, or at least seize an investment opportunity. However, it is no easy feat to determine to what extent the two are related, to reach an investment decision based on news, or to verify whether such investment information is valid. If the significance of news and its impact on the stock market are analyzed, it becomes possible to extract information that can assist investment decisions. In reality, however, the world is inundated with a massive wave of news in real time, and news is not structured text. This study proposes a stock-index investment model based on "News Big Data" opinion mining that systematically collects, categorizes, and analyzes news and generates investment information. To verify the validity of the model, the relationship between the results of news opinion mining and the stock index was analyzed empirically using statistical methods. The mining process that converts news into information for investment decision making consists of the following steps. First, news is received from a news provider that collects it in real time, and its information is indexed. Not only the content of the news but also attributes such as media outlet, time, and news type are collected and classified, and then reworked into variables from which investment decisions can be inferred. Second, the text of the news content is separated into morphemes to derive words from which polarity can be judged, and each word is tagged with positive or negative polarity by comparing it with a sentiment dictionary. Third, the positive/negative polarity of the news is judged using the indexed classification information and a scoring rule, and the final investment decision information is derived according to daily scoring criteria.
For this study, the KOSPI index and its fluctuation range were collected for the 63 days on which the Korea Exchange stock market was open during the three months from July to September 2011, and news data were collected by parsing 766 articles from economic news outlet M company among the articles carried under stock information > news > main news on the portal site Naver.com. Over the three months, the stock price index rose on 33 days and fell on 30 days, and the news data comprised 197 articles published before the market opened, 385 during the session, and 184 after the market closed. Mining the collected news and comparing the results with stock prices showed that the positive/negative opinion of news content was significantly related to stock prices, and that changes in the stock price index were better explained when news opinion was expressed as a positive/negative ratio rather than as a simple binary judgment of positive or negative. To check whether news affected stock price fluctuations, or at least preceded them, changes in stock prices were also compared only with news published before the market opened, and this relationship was statistically significant as well. In addition, because news covers various types and topics, such as social, economic, and overseas news, corporate earnings, industry conditions, market outlook, and current market conditions, the influence on the stock market and the significance of the relationship were expected to differ by news type. Each type of news was therefore compared with stock price fluctuations, and the results showed that market condition, outlook, and overseas news were the most useful for explaining fluctuations in stock prices.
In contrast, news about individual companies was not statistically significant, and its opinion mining values tended to move opposite to stock prices; a possible reason is the appearance of promotional and planned news intended to prevent stock prices from falling. Finally, multiple regression analysis and logistic regression analysis were carried out to derive an investment decision function from the relationship between the positive/negative opinion of news and stock prices. The results showed that the regression equation using market condition, outlook, and overseas news published before the market opened was statistically significant, and that the classification accuracy of the logistic regression was 70.0% for rises in the stock price, 78.8% for falls, and 74.6% on average. This study is the first to analyze the relationship between news and stock prices by analyzing and quantifying the sentiment of unstructured news content using opinion mining, one of the big data analysis techniques, and it further proposes and verifies an intelligent investment decision making model that can systematically carry out opinion mining and derive and support investment information. The findings show that news can be used as a variable to predict the stock price index for investment, and the model is expected to be usable as a real investment support system if it is implemented as a system and verified in the future.
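
The word-level polarity tagging and the positive/negative ratio described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the tiny English lexicon, the whitespace tokenizer, and the function names are hypothetical, whereas the study itself works on Korean news with morpheme analysis and its own sentiment dictionary and scoring rule, none of which are given in the abstract.

```python
# Hypothetical sentiment dictionary: +1 marks a positive word, -1 a negative one.
LEXICON = {
    "surge": 1, "gain": 1, "rally": 1,
    "fall": -1, "loss": -1, "slump": -1,
}


def polarity_counts(text, lexicon=LEXICON):
    """Count positive and negative words in one article.

    Assumption: lowercased whitespace tokens stand in for morpheme analysis.
    """
    tokens = text.lower().split()
    positive = sum(1 for t in tokens if lexicon.get(t, 0) > 0)
    negative = sum(1 for t in tokens if lexicon.get(t, 0) < 0)
    return positive, negative


def positive_ratio(articles, lexicon=LEXICON):
    """Positive share of all opinion words across one day's articles.

    Returns None when no article contains a word from the lexicon.
    """
    positive = negative = 0
    for text in articles:
        p, n = polarity_counts(text, lexicon)
        positive += p
        negative += n
    if positive + negative == 0:
        return None
    return positive / (positive + negative)


if __name__ == "__main__":
    pre_open_news = ["stocks rally on export gain", "chip makers slump"]
    print(positive_ratio(pre_open_news))
```

A daily ratio of this kind is a continuous value rather than a binary positive/negative label, which matches the abstract's observation that a ratio explains index changes better than a simple two-way judgment.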

