• Title/Summary/Keyword: Weight Mining


Discovering Association Rules using Item Clustering on Frequent Pattern Network (빈발 패턴 네트워크에서 아이템 클러스터링을 통한 연관규칙 발견)

  • Oh, Kyeong-Jin;Jung, Jin-Guk;Ha, In-Ay;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.14 no.1 / pp.1-17 / 2008
  • Data mining is defined as the process of discovering meaningful and useful patterns in large volumes of data. In particular, finding association rules between items in a database of customer transactions has become important. Since the Apriori algorithm, several data structures and algorithms have been proposed for storing meaningful information compressed from an original database in order to find frequent itemsets. Although existing methods find all association rules, analyzing them requires considerable effort because there are too many rules. In this paper, we propose a new data structure, called a Frequent Pattern Network (FPN), which represents items as vertices and 2-itemsets as edges of the network. To utilize the FPN, we construct it using item frequencies. We then use a clustering method to group the vertices of the network into clusters so that intracluster similarity is maximized and intercluster similarity is minimized, and we generate association rules based on these clusters. Our experiments measured the accuracy of clustering items on the network using confidence, correlation, and edge weight similarity methods. We also generated association rules from the clusters and compared the traditional method with ours. The results show that confidence similarity had a stronger influence than the other measures on the frequent pattern network, and that the FPN is flexible with respect to the minimum support value.
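The FPN pipeline described above (items as vertices, frequent 2-itemsets as edges, clustering, then rules per cluster) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the edge similarity is assumed to be the larger of the two directional confidences, and because the abstract does not specify the clustering algorithm, a simple threshold-and-connected-components grouping stands in for it. The function names `build_fpn`, `cluster_vertices`, and `rules_from_clusters` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_fpn(transactions, min_support):
    """Build a frequent pattern network (FPN).

    Vertices are frequent items; edges are frequent 2-itemsets. Each edge is
    weighted by a confidence-based similarity (assumption: the larger of
    conf(a -> b) and conf(b -> a)).
    """
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = sorted(set(t))                     # dedupe and fix edge orientation
        item_count.update(items)
        pair_count.update(combinations(items, 2))  # keys are sorted (a, b) tuples
    vertices = {i for i, c in item_count.items() if c / n >= min_support}
    edges = {}
    for (a, b), c in pair_count.items():
        if a in vertices and b in vertices and c / n >= min_support:
            edges[(a, b)] = max(c / item_count[a], c / item_count[b])
    return vertices, edges, item_count, pair_count


def cluster_vertices(vertices, edges, threshold):
    """Simplified clustering (assumption): connected components of the network
    after dropping edges whose similarity is below `threshold`."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for (a, b), weight in edges.items():
        if weight >= threshold:
            parent[find(a)] = find(b)
    clusters = {}
    for v in vertices:
        clusters.setdefault(find(v), set()).add(v)
    return list(clusters.values())


def rules_from_clusters(clusters, edges, item_count, pair_count, min_conf):
    """Generate rules x -> y only between items of the same cluster that are
    joined by an FPN edge, keeping those with confidence >= min_conf."""
    rules = []
    for cluster in clusters:
        for a, b in combinations(sorted(cluster), 2):
            if (a, b) not in edges:
                continue
            for x, y in ((a, b), (b, a)):
                conf = pair_count[(a, b)] / item_count[x]
                if conf >= min_conf:
                    rules.append((x, y, conf))
    return rules


if __name__ == "__main__":
    baskets = [["bread", "milk"], ["bread", "milk", "eggs"], ["bread", "milk"],
               ["beer", "chips"], ["beer", "chips"], ["eggs"]]
    v, e, items, pairs = build_fpn(baskets, min_support=0.3)
    groups = cluster_vertices(v, e, threshold=0.5)
    print(rules_from_clusters(groups, e, items, pairs, min_conf=0.8))
```

Restricting rule generation to pairs inside a cluster is what shrinks the rule set relative to enumerating every frequent pair; raising `threshold` splits the network into smaller, tighter clusters and therefore yields fewer rules.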


Insights into the genetic diversity of indigenous goats and their conservation priorities

  • Liu, Gang;Zhao, Qianjun;Lu, Jian;Sun, Feizhou;Han, Xu;Zhao, Junjin;Feng, Haiyong;Wang, Kejun;Liu, Chousheng
    • Asian-Australasian Journal of Animal Sciences / v.32 no.10 / pp.1501-1510 / 2019
  • Objective: An experiment was conducted to evaluate the genetic diversity of 26 Chinese indigenous goat breeds using 30 microsatellite markers, and then to define conservation priorities for protection programs according to the weight given to within- and between-breed genetic diversity. Methods: Twenty-six representative populations of Chinese indigenous goats, 1,351 animals in total, were sampled from different geographic regions of China. Within-breed genetic diversity and marker polymorphism were estimated by calculating the mean number of alleles, observed heterozygosity, expected heterozygosity, fixation index, effective number of alleles, and allelic richness. Conservation priorities were analyzed by statistical methods. Results: A relatively high level of genetic diversity was found in twenty-four populations; the exceptions were the Daiyun and Fuqing goat populations. Within-breed kinship coefficient matrices identified seven highly inbred breeds that should be of concern. Of these, six breeds made a negative contribution to heterozygosity when the method based on proportional contribution to heterozygosity was used. Based on the Weitzman or the Piyasatian and Kinghorn methods, breeds distant from the others, i.e., Inner Mongolia Cashmere goat, Chengdu Brown goat, and Leizhou goat, ranked highly. The methods of Caballero and Toro and of Fabuel et al. prioritized Jining Gray goat, Liaoning Cashmere goat, and Inner Mongolia Cashmere goat, which agrees with the results from kinship-based methods. Conclusion: Conservation priorities were determined according to multiple methods. Our results suggest that Inner Mongolia Cashmere goat (most methods), Jining Gray goat, and Liaoning Cashmere goat (high contribution to heterozygosity and total diversity) should be prioritized. Furthermore, Daiyun goat and Shannan White goat should also be prioritized in consideration of their effective population size. However, if a breed can continue to survive under changing conditions, the most straightforward approach would be to increase its utilization and attractiveness for production by exploiting the characteristics of its germplasm.
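The within-breed statistics listed in the Methods follow standard population-genetics definitions, which a short Python sketch can make concrete. This is an illustration under assumptions, not the authors' software: expected heterozygosity is the simple form 1 - Σp² (without Nei's small-sample correction), the fixation index is F = 1 - Ho/He, allelic richness (which needs rarefaction) and the conservation-priority methods are not shown, and `locus_diversity` and `mean_diversity` are hypothetical names.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Within-breed diversity at one microsatellite locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid animal.
    """
    alleles = Counter(a for pair in genotypes for a in pair)
    total = sum(alleles.values())
    sum_sq = sum((c / total) ** 2 for c in alleles.values())
    he = 1.0 - sum_sq                                        # expected heterozygosity
    ho = sum(a != b for a, b in genotypes) / len(genotypes)  # observed heterozygosity
    return {
        "na": len(alleles),                      # number of alleles
        "ho": ho,
        "he": he,
        "ne": 1.0 / sum_sq,                      # effective number of alleles
        "f": 1.0 - ho / he if he > 0 else 0.0,   # fixation index
    }


def mean_diversity(loci):
    """Average each statistic over all loci (e.g. the 30 markers)."""
    per_locus = [locus_diversity(g) for g in loci]
    return {k: sum(s[k] for s in per_locus) / len(per_locus) for k in per_locus[0]}
```

A positive `f` means fewer heterozygotes than expected under random mating, which is one signal of the inbreeding flagged for some breeds above.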

Reformability evaluation of blasting-enhanced permeability in in situ leaching mining of low-permeability sandstone-type uranium deposits

  • Wei Wang;Xuanyu Liang;Qinghe Niu;Qizhi Wang;Jinyi Zhuo;Xuebin Su;Genmao Zhou;Lixin Zhao;Wei Yuan;Jiangfang Chang;Yongxiang Zheng;Jienan Pan;Zhenzhi Wang;Zhongmin Ji
    • Nuclear Engineering and Technology / v.55 no.8 / pp.2773-2784 / 2023
  • It is essential to evaluate the feasibility of blasting-enhanced permeability (BEP) for low-permeability sandstone-type uranium deposits. In this work, the mineral composition, reservoir physical properties, and rock mechanical properties of samples from sandstone-type uranium deposits were first measured. A reformability evaluation method was then established using the analytic hierarchy process-entropy weight method (AHP-EWM) and the fuzzy mathematics method. Finally, the evaluation results were verified by a split Hopkinson pressure bar (SHPB) experiment and a permeability test. The results show that medium sandstone, argillaceous sandstone, and siltstone exhibit excellent reformability, followed by coarse sandstone and fine sandstone, whereas the reformability of sandy mudstone is poor and it cannot accept BEP reservoir stimulation. The permeability improvement and the distribution of damage fractures before and after the SHPB experiment confirm the correctness of the evaluation results. This research provides a reformability evaluation method for BEP of low-permeability sandstone-type uranium deposits, which helps select the appropriate region and stratigraphic horizon for BEP and for enhanced in situ leaching (ISL) of such deposits.
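The AHP-EWM weighting and fuzzy evaluation named above are standard techniques, sketched minimally in Python below. Several points are assumptions, since the abstract does not state which variants the authors used: indicator values are assumed to be pre-scaled so that each column sum is positive, the AHP and EWM weights are combined by a normalized product, and the fuzzy step uses the weighted-average operator B = W · R. All function names are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method (EWM).

    matrix[i][j]: value of indicator j for sample i (non-negative, with a
    positive sum in every column). Returns one weight per indicator.
    """
    m, n = len(matrix), len(matrix[0])
    k = 1.0 / math.log(m)
    divergence = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        p = [x / total for x in column]
        entropy = -k * sum(pi * math.log(pi) for pi in p if pi > 0)
        divergence.append(1.0 - entropy)  # low entropy -> more discriminating indicator
    s = sum(divergence)                   # zero only if every column is constant
    return [d / s for d in divergence]


def combine_weights(ahp_weights, ewm_weights):
    """Assumed combination rule: normalized product of AHP and EWM weights."""
    product = [a * e for a, e in zip(ahp_weights, ewm_weights)]
    s = sum(product)
    return [x / s for x in product]


def fuzzy_evaluate(weights, membership):
    """Weighted-average fuzzy comprehensive evaluation B = W . R.

    membership[j][k]: degree to which indicator j belongs to grade k
    (e.g. excellent / moderate / poor reformability).
    """
    grades = len(membership[0])
    return [sum(w * membership[j][k] for j, w in enumerate(weights))
            for k in range(grades)]
```

An indicator that takes the same value for every rock sample has maximal entropy and therefore zero EWM weight, which is the intended behavior: it cannot discriminate between samples, however important the AHP judgment says it is.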

On the Milled Wood Lignins Isolated from Hardwood by Progressive Milling (단계적(段階的) 분쇄법(粉碎法)에 의해 조제(調製)된 활엽수(闊葉樹) MWL에 관한 연구(硏究))

  • Cho, Nam Seok
    • Journal of Korean Society of Forest Science / v.45 no.1 / pp.62-67 / 1979
  • Ultraviolet microscopy of ultrathin sections of wood has proved to be a useful means of determining the lignin distribution in the various regions of the cell wall. Spectral and quantitative analyses of an isolated compound middle lamella fraction from birch xylem have also revealed that the lignin associated with the vessel secondary wall and middle lamella is composed predominantly of guaiacylpropane units, whereas lignin deposited in the fiber and ray parenchyma secondary walls is composed mostly of syringylpropane units. The middle lamella lignin around fibers and ray cells contains both guaiacyl- and syringylpropane units. On the basis of these results, this research was carried out to clarify the origin of milled wood lignin (MWL) by analyzing the chemical characteristics of ML MWLs extracted at various milling stages. The amounts of phenolic hydroxyl, ${\alpha}$-carbonyl, and methoxyl groups in the MWLs increase with milling time, and progressive milling increases the ratio of syringaldehyde to vanillin (S/V ratio) after nitrobenzene oxidation of the MWL. Accordingly, it can be concluded that MWL extracted at the initial milling stage derives from the compound middle lamella region of the cell wall, whereas with progressive milling, lignin from the fiber secondary wall is gradually incorporated into the MWL. These results suggest that hardwood lignins have a heterogeneous chemical structure. Although gel-filtration curves suggest that MWL from the initial milling stage has a lower molecular weight than MWL extracted at the final milling stage, the molecular weight distribution of MWL requires further study.


Exposure and Risk Assessments of Multimedia of Arsenic in the Environment (환경 중 비소의 매체통합 노출평가 및 위해성평가 연구)

  • Sim, Ki-Tae;Kim, Dong-Hoon;Lee, Jaewoo;Lee, Chae-Hong;Park, Soyeon;Seok, Kwang-Seol;Kim, Younghee
    • Journal of Environmental Impact Assessment / v.28 no.2 / pp.152-168 / 2019
  • The element arsenic, which is abundant in the Earth's crust, is used for various industrial purposes, including materials for disease treatment and household goods. Various human activities, such as the disposal of solid waste, metal mining and smelting, and the combustion of fossil fuels, have polluted the environment with arsenic. Recently, the Korean Ministry of Food and Drug Safety adopted guidelines for arsenic in rice to prevent health risks from rice consumption. Because humans are exposed to arsenic, which accumulates in the body, through various channels such as air inhalation, skin contact, ingestion of drinking water, and food consumption, an integrated multimedia risk assessment is required to adopt appropriate risk management policies. Therefore, in this study an integrated human health risk assessment was carried out using an integrated exposure assessment based on multimedia (e.g., air, water, and soil) and multi-route (e.g., oral, inhalation, and dermal) scenarios. The results show that oral uptake via drinking water is the dominant pathway of arsenic into the human body, accounting for 57%-96% of total arsenic exposure. Among age groups, the highest exposures were observed in infants, because infants have a low body weight and a large body surface area relative to that weight. Based on the exposure assessment, cancer and non-cancer risks were calculated. The cancer risk for the central tendency exposure (CTE) scenario ranges from 2.3E-05 to 6.7E-05 and is thus negligible, because it does not exceed the cancer probability of 1.0E-04 for any age group. On the other hand, the cancer risk for the reasonable maximum exposure (RME) scenario ranges from 6.4E-05 to 1.8E-04 and from 1.3E-04 to 1.8E-04 for infants and preschool children, exceeding the excess cancer risk of 1.0E-04. The non-cancer risks range from 5.4E-02 to 1.9E-01 and from 1.5E-01 to 6.8E-01 for CTE and RME, respectively, and do not exceed a hazard index of 1 for any scenario or age group.
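The cancer and non-cancer risks above are typically computed from an average daily dose, as in the US EPA-style intake equation sketched below for the drinking-water route. This is a minimal sketch, not the study's model: the toxicity values are commonly cited US EPA IRIS values for inorganic arsenic and may differ from those the authors used, only oral ingestion of water is shown, and all function names are hypothetical. The 1.0E-04 cancer threshold and hazard index of 1 come from the abstract.

```python
# Assumed toxicity values for inorganic arsenic (commonly cited US EPA IRIS
# values); the study itself may have used different ones.
ORAL_RFD = 3e-4             # reference dose, mg/kg-day
ORAL_SLOPE_FACTOR = 1.5     # cancer slope factor, (mg/kg-day)^-1

# Decision thresholds stated in the abstract.
CANCER_RISK_THRESHOLD = 1e-4
HAZARD_INDEX_THRESHOLD = 1.0


def average_daily_dose(conc, intake_rate, exposure_freq, exposure_dur,
                       body_weight, averaging_time):
    """Ingestion dose in mg/kg-day.

    conc: mg/L, intake_rate: L/day, exposure_freq: days/year,
    exposure_dur: years, body_weight: kg, averaging_time: days.
    Use averaging_time = exposure_dur * 365 for non-cancer effects and a
    lifetime (e.g. 70 * 365 days) for the lifetime average daily dose (LADD).
    """
    return (conc * intake_rate * exposure_freq * exposure_dur
            / (body_weight * averaging_time))


def hazard_quotient(add, rfd=ORAL_RFD):
    """Non-cancer risk for one route: dose divided by the reference dose."""
    return add / rfd


def hazard_index(quotients):
    """Hazard index: sum of hazard quotients across routes and media."""
    return sum(quotients)


def cancer_risk(ladd, slope_factor=ORAL_SLOPE_FACTOR):
    """Excess lifetime cancer risk: LADD times the slope factor."""
    return ladd * slope_factor


def exceeds_thresholds(risk, hi):
    """Return (cancer risk exceeded?, hazard index exceeded?)."""
    return risk > CANCER_RISK_THRESHOLD, hi > HAZARD_INDEX_THRESHOLD
```

Because body weight sits in the denominator, the same water concentration yields a larger dose per kilogram for a light infant than for an adult, which is consistent with infants showing the highest exposures.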

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems / v.21 no.1 / pp.103-122 / 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine every complete document to determine whether it might be useful. For this reason, some online documents are accompanied by a list of keywords specified by the authors to guide users by facilitating the filtering process. In this way, a set of keywords is often considered a condensed version of the whole document and therefore plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents, including Web pages, email messages, news reports, magazine articles, and business papers, do not benefit from keywords. Although the potential benefit is large, implementation is the obstacle: manually assigning keywords to all documents is a daunting or even impractical task, because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to this aim: the keyword assignment approach and the keyword extraction approach. Both use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its terms to the texts. In other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document.
Although this approach is domain-dependent and not easy to transfer and extend, it can generate implicit keywords that do not appear in a document. In the latter approach, on the other hand, the aim is to extract keywords according to their relevance in the text without a prior vocabulary. Here, automatic keyword generation is treated as a classification task, and keywords are commonly extracted using supervised learning techniques; keyword extraction algorithms thus classify candidate keywords in a document as positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as its keywords; as a result, keyword extraction is limited to terms that appear in the document and cannot generate implicit keywords that are not included in it. According to Turney's experimental results, about 64% to 90% of the keywords assigned by authors can be found in the full text of an article. Conversely, this means that 10% to 36% of author-assigned keywords do not appear in the article and cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of author-assigned keywords are not included in the full text. This is why we decided to adopt the keyword assignment approach. In this paper, we propose a new approach to automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets.
The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented using IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. First, the IVSM system was implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers and has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between IVSM-generated keywords and author-assigned keywords. In our experiment, the precision of IVSM applied to the Web-based community service and to academic journals was 0.75 and 0.71, respectively. Both systems perform much better than baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to Extractor, a representative keyword extraction system developed by Turney. As the number of electronic documents grows, we expect that IVSM can be applied to many electronic documents in Web-based communities and digital libraries.
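Steps (1)-(5) of the IVSM assignment process can be sketched in Python as cosine similarity between weighted keyword-set vectors and a term-frequency vector of the target document. This is a minimal sketch under assumptions, not the authors' system: the keyword sets and their weights are supplied directly (how IVSM learns them is not described in the abstract), tokenization is a plain regex with a tiny hypothetical stopword list, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller list and a
# language-appropriate tokenizer or morphological analyzer.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "in", "to", "is", "at"})


def vector_length(weights):
    """Steps (1) and (3): Euclidean length of a sparse {term: weight} vector."""
    return math.sqrt(sum(w * w for w in weights.values()))


def preprocess(text):
    """Step (2): tokenize the target document into term frequencies."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(keyword_set, doc_tf):
    """Step (4): cosine similarity between a keyword set and the document."""
    dot = sum(w * doc_tf.get(term, 0) for term, w in keyword_set.items())
    norm = vector_length(keyword_set) * vector_length(doc_tf)
    return dot / norm if norm else 0.0


def assign_keywords(keyword_sets, text, top_sets=1):
    """Step (5): return the keywords of the most similar keyword set(s),
    ordered by keyword weight."""
    doc_tf = preprocess(text)
    ranked = sorted(keyword_sets.values(),
                    key=lambda ks: cosine(ks, doc_tf), reverse=True)
    keywords = {}
    for ks in ranked[:top_sets]:
        for term, w in ks.items():
            keywords[term] = max(w, keywords.get(term, 0.0))
    return sorted(keywords, key=keywords.get, reverse=True)


if __name__ == "__main__":
    sets = {
        "logistics": {"port": 2.0, "shipping": 1.5, "logistics": 1.0, "maritime": 0.5},
        "fashion": {"fashion": 2.0, "trend": 1.0},
    }
    doc = "Port congestion affects shipping and logistics costs at the port."
    print(assign_keywords(sets, doc))
```

In the usage example, "maritime" never occurs in the document but is still assigned because it belongs to the best-matching keyword set; this is the kind of implicit keyword that the assignment approach can produce and extraction cannot.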

Multi-Dimensional Analysis Method of Product Reviews for Market Insight (마켓 인사이트를 위한 상품 리뷰의 다차원 분석 방안)

  • Park, Jeong Hyun;Lee, Seo Ho;Lim, Gyu Jin;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.57-78 / 2020
  • With the development of the Internet, consumers can easily check product information through E-Commerce. Product reviews used in the purchasing process are based on user experience, allowing consumers to act as producers as well as users of information. This can increase the efficiency of purchasing decisions from the consumer's perspective, and from the seller's point of view, it can help develop products and strengthen their competitiveness. However, it takes a great deal of time and effort to grasp, from the vast number of product reviews offered by E-Commerce sites, the overall assessment of the products a consumer wants to compare and their assessment on the dimensions the consumer considers important. This is because product reviews are unstructured information, and it is difficult to read their sentiment and assessment dimensions immediately. For example, consumers who want to purchase a laptop would like to check the assessment of comparable products on each dimension, such as performance, weight, delivery, speed, and design. Therefore, in this paper, we propose a method to automatically generate multi-dimensional product assessment scores from the reviews of products to be compared. The proposed method consists of two phases: a pre-preparation phase and an individual product scoring phase. In the pre-preparation phase, a dimensioned classification model and a sentiment analysis model are built from reviews of a large product category. By combining word embedding and association analysis, the dimensioned classification model addresses a limitation of existing studies, in which word embedding methods for finding the relevance between dimensions and words consider only the distance between words in sentences.
The sentiment analysis model is a CNN trained on learning data tagged as positive or negative at the phrase level, for accurate polarity detection. In the individual product scoring phase, the pre-prepared models are then applied to reviews split into phrases. Multi-dimensional assessment scores are obtained by grouping the phrases judged to describe each dimension and aggregating their sentiment by assessment dimension according to their proportions. In the experiment, approximately 260,000 reviews of the large product category were collected to build the dimensioned classification model and the sentiment analysis model. In addition, reviews of laptops from companies S and L sold through E-Commerce were collected and used as experimental data. The dimensioned classification model classified individual product reviews, broken down into phrases, into six assessment dimensions, combining the existing word embedding method with an association analysis indicating the frequency between words and dimensions. Combining word embedding and association analysis increased the accuracy of the model by 13.7%. The sentiment analysis model assessed reviews more closely when trained on phrases rather than sentences: its accuracy was 29.4% higher than that of the sentence-based model. Through this study, both sellers and consumers can expect more efficient decision-making in purchasing and product development, since they can compare products on multiple dimensions. In addition, text reviews, which are unstructured data, were transformed into objective values such as frequencies and morphemes and analyzed together using word embedding and association analysis, improving the objectivity of a more precise multi-dimensional analysis.
This will be an attractive analysis model, not only enabling more effective service deployment in the evolving and fiercely competitive E-Commerce market, but also satisfying both sellers and consumers.
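The scoring phase (split reviews into phrases, assign each phrase a dimension and a polarity, then aggregate per dimension) can be sketched in Python. This is an illustration under assumptions, not the authors' pipeline: tiny hypothetical lexicons stand in for the embedding-plus-association dimension classifier and the phrase-level CNN, phrases are split naively on commas, "and", and "but", and each dimension's score is assumed to be the share of its phrases judged positive. All names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins for the trained models: small lexicons instead of the
# embedding + association dimension classifier and the phrase-level CNN.
DIMENSION_LEXICON = {"battery": "performance", "speed": "performance",
                     "light": "weight", "heavy": "weight", "weight": "weight",
                     "delivery": "delivery", "design": "design"}
POSITIVE = {"long", "light", "fast", "good", "pretty"}
NEGATIVE = {"dies", "slow", "heavy", "bad"}


def tokens(phrase):
    return re.findall(r"[a-z]+", phrase.lower())


def split_phrases(review):
    """Naive phrase splitter (assumption): break on commas, 'and', 'but'."""
    parts = re.split(r",|\band\b|\bbut\b", review)
    return [p.strip() for p in parts if p.strip()]


def classify_dimension(phrase):
    """Stand-in dimension model: first token found in the lexicon, else None."""
    for tok in tokens(phrase):
        if tok in DIMENSION_LEXICON:
            return DIMENSION_LEXICON[tok]
    return None


def classify_sentiment(phrase):
    """Stand-in polarity model: positive minus negative lexicon hits."""
    toks = tokens(phrase)
    return sum(t in POSITIVE for t in toks) - sum(t in NEGATIVE for t in toks)


def dimension_scores(phrases, dim_model=classify_dimension,
                     sent_model=classify_sentiment):
    """Per-dimension score = share of that dimension's phrases judged positive."""
    positive = defaultdict(int)
    total = defaultdict(int)
    for phrase in phrases:
        dim = dim_model(phrase)
        if dim is None:
            continue  # phrase does not describe any known dimension
        total[dim] += 1
        if sent_model(phrase) > 0:
            positive[dim] += 1
    return {dim: positive[dim] / total[dim] for dim in total}
```

Because `dim_model` and `sent_model` are parameters, trained classifiers can replace the lexicon stand-ins without changing the aggregation step.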