• Title/Summary/Keyword: Jaccard Similarity


Exploration of Hierarchical Techniques for Clustering Korean Author Names (한글 저자명 군집화를 위한 계층적 기법 비교)

  • Kang, In-Su
    • Journal of Information Management / v.40 no.2 / pp.95-115 / 2009
  • Author resolution disambiguates same-name author occurrences into real individuals. To do this, pairwise similarities are computed between author name entities, and clustering is then performed. Many studies have employed hierarchical clustering techniques for author disambiguation, but the various hierarchical clustering methods have not been sufficiently investigated. This study presents an empirical evaluation and analysis of hierarchical clustering applied to Korean author resolution, using multiple distance functions such as the Dice coefficient, cosine similarity, Euclidean distance, the Jaccard coefficient, and the Pearson correlation coefficient.
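
The pipeline this abstract describes (pairwise similarity, then hierarchical clustering) can be illustrated with a minimal sketch. The record names, the feature tokens, the single-link merge rule, and the 0.5 threshold are all assumptions chosen for illustration; the abstract does not state which linkage or features the study used.

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def dice(a: set, b: set) -> float:
    """2|A ∩ B| / (|A| + |B|); defined here as 0.0 when both sets are empty."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def single_link_clusters(features: dict, sim, threshold: float) -> list:
    """Agglomerative single-link clustering: repeatedly merge two clusters
    that contain at least one cross pair with similarity >= threshold."""
    clusters = [{name} for name in features]
    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            if any(sim(features[x], features[y]) >= threshold
                   for x in c1 for y in c2):
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 | c2)
                merged = True
                break  # restart the scan with the updated cluster list
    return sorted(sorted(c) for c in clusters)

# Hypothetical records that share one author name, each described by
# coauthor and affiliation tokens (illustrative data only).
records = {
    "rec1": {"lee", "park", "jim"},
    "rec2": {"lee", "park", "kisti"},
    "rec3": {"choi", "ecology"},
}

print(jaccard(records["rec1"], records["rec2"]))    # 2 shared / 4 total
print(single_link_clusters(records, jaccard, 0.5))
```

Swapping `jaccard` for `dice` in the last call changes only the similarity scale, which is one reason a clustering threshold has to be tuned per distance function.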

Brain Tumor Detection Based on Amended Convolution Neural Network Using MRI Images

  • Mohanasundari M;Chandrasekaran V;Anitha S
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.10 / pp.2788-2808 / 2023
  • Brain tumors are one of the most threatening malignancies for humans. Misdiagnosis of brain tumors can result in false medical intervention, which ultimately reduces a patient's chance of survival. Manual identification and segmentation of brain tumors from Magnetic Resonance Imaging (MRI) scans can be difficult and error-prone because of the great range of tumor tissues that exist in various individuals and the similarity of normal tissues. To overcome this limitation, the Amended Convolutional Neural Network (ACNN) model has been introduced, a unique combination of three techniques that have not been previously explored for brain tumor detection. The three techniques integrated into the ACNN model are image tissue preprocessing using the Kalman Bucy Smoothing Filter to remove noisy pixels from the input, image tissue segmentation using the Isotonic Regressive Image Tissue Segmentation Process, and feature extraction using the Marr Wavelet Transformation. The extracted features are compared with the testing features using a sigmoid activation function in the output layer. The experimental findings show that the suggested model outperforms existing techniques concerning accuracy, precision, sensitivity, dice score, Jaccard index, specificity, Positive Predictive Value, Hausdorff distance, recall, and F1 score. The proposed ACNN model achieved a maximum accuracy of 98.8%, which is higher than other existing models, according to the experimental results.

Changes Over Time in the Community Structure and Spatial Distribution of Forest Vegetation on Mt. Yeompo, Ulsan City, South Korea (염포산 산림식생의 군락 구조 및 공간 분포의 경시적 변화)

  • Oh, Jeong-Hak;Kim, Jun-Soo;Cho, Hyun-Je
    • Journal of Korean Society of Forest Science / v.109 no.2 / pp.145-156 / 2020
  • In 2000 and 2018, phytosociological surveys were carried out in the forest vegetation of Mt. Yeompo, a representative isolated urban forest in Ulsan city. The trends of change in forest structure, composition, and spatial distribution were compared between years. Total percent coverage per 100 square meters of forest vegetation was similar, but natural vegetation showed a 9% increase. The importance of constituent species changed slightly. Specifically, Lindera erythrocarpa and Styrax japonicus showed very high growth rates of 835% and 269%, respectively. Species richness (S) and diversity (H') decreased by about 22% and 8%, respectively. Both S and H' showed slightly higher rates of decrease in artificial compared with natural vegetation. The life form spectrum of the constituent species was the same in 2000 and 2018, 'MM-R5-D4-e'. The similarity (Jaccard coefficient) in the species composition of the forest vegetation was almost homogeneous at approximately 75%. The number of indicator species decreased from 16 species in 2000 to 7 species in 2018. This decrease was mostly due to a decline in herbaceous plants, such as hemicryptophytes, geophytes, and therophytes, which are sensitive to disturbances. The spatial distribution of forest vegetation did not change significantly. The number of forest landscape elements (patches) increased by approximately 25% from 537 in 2000 to 721 in 2018, while the average size decreased by about 20% from 1.28 ha in 2000 to 1.03 ha in 2018.

Personalized Bookmark Recommendation System Using Tag Network (태그 네트워크를 이용한 개인화 북마크 추천시스템)

  • Eom, Tae-Young;Kim, Woo-Ju;Park, Sang-Un
    • The Journal of Society for e-Business Studies / v.15 no.4 / pp.181-195 / 2010
  • Participation and sharing among individual users are the driving force of Web 2.0, and are easily found in blogs, social networks, collective intelligence, social bookmarking, and tagging. Among those applications, social bookmarking lets Internet users store bookmarks online and share them, and provides various services based on the shared bookmarks that people consider important. Delicious.com is the representative social bookmarking site, and provides a bookmark search service using the tags that users attach to bookmarks. Our paper suggests a method that re-ranks the results from Delicious.com based on user tags in order to provide personalized bookmark recommendations. Moreover, we suggest a method that uses a tag network based on the Jaccard similarity coefficient to consider bookmarks whose tags are not directly related to the user's query keywords. The performance of the suggested system is verified with experiments that compare the ranks from Delicious.com with the new ranks from our system.
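
A tag network of the kind mentioned in this abstract can be sketched by treating each tag as the set of bookmarks that carry it and weighting each tag pair by the Jaccard coefficient of those sets. This is only an illustrative construction: the function name `tag_network`, the toy bookmarks, and the choice to measure overlap over bookmark sets are assumptions, not details taken from the paper.

```python
from collections import defaultdict
from itertools import combinations

def tag_network(bookmarks: dict, min_sim: float = 0.0) -> dict:
    """Return {(tag_a, tag_b): jaccard} where each tag is represented by
    the set of bookmark URLs it is attached to. Pairs whose similarity is
    not above min_sim are left out of the network."""
    urls_by_tag = defaultdict(set)
    for url, tags in bookmarks.items():
        for tag in tags:
            urls_by_tag[tag].add(url)

    edges = {}
    for a, b in combinations(sorted(urls_by_tag), 2):
        inter = len(urls_by_tag[a] & urls_by_tag[b])
        union = len(urls_by_tag[a] | urls_by_tag[b])  # never 0: each tag has a URL
        sim = inter / union
        if sim > min_sim:
            edges[(a, b)] = sim
    return edges

# Hypothetical bookmarks and their user-assigned tags.
bookmarks = {
    "url1": {"python", "tutorial"},
    "url2": {"python", "pandas"},
    "url3": {"pandas", "tutorial", "python"},
}

edges = tag_network(bookmarks)
for pair, sim in sorted(edges.items()):
    print(pair, round(sim, 3))
```

With such a network, a query for one tag could be expanded to its strongest neighbours, which is one way bookmarks without the literal query keyword might enter the candidate list.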

Isolation, Characterization and Numerical Taxonomy of Novel Oxalate-oxidizing Bacteria

  • Sahin, Nurettin;Gokler, Isa;Tamer, Abdurrahman
    • Journal of Microbiology / v.40 no.2 / pp.109-118 / 2002
  • The present work aims to provide additional new pure cultures of oxalate-utilizing bacteria and their preliminary characterization for further work in the field of oxalate metabolism and taxonomic studies. The taxonomy of 14 mesophilic, aerobic oxalotrophic bacteria isolated by an enrichment culture technique from soils, rhizospheres, and the juice of the petiole/stem tissue of plants was investigated. Isolates were characterized with 95 morphological, biochemical and physiological tests. Cellular lipid components and carotenoids of isolates were also studied as an aid to taxonomic characterization. All isolates were Gram-negative, oxidase and catalase positive, and required no growth factors. In addition to oxalates, some of the strains grow on methanol and/or formate. The taxonomic similarities among isolates, reference strains, and previously reported oxalotrophic bacteria were analysed using the Simple Matching ($S_{SM}$) and Jaccard ($S_J$) coefficients. Clustering was performed using the unweighted pair group method with arithmetic averages (UPGMA) algorithm. The oxalotrophic strains formed five major and two single-member clusters at the 70-86% similarity level. Based on the numerical taxonomy, isolates were separated into three phenotypic groups. Pink-pigmented strains belonged to Methylobacterium extorquens, yellow-pigmented strains were most similar to Pseudomonas sp. YOx and Xanthobacter autotrophicus, and heterogeneous non-pigmented strains were closely related to the genera Azospirillum, Ancylobacter, Burkholderia and Pseudomonas. New strains belonging to the genera Pseudomonas, Azospirillum and Ancylobacter that differ taxonomically from other known oxalate oxidizers were obtained. Numerical analysis indicated that some strains of the yellow-pigmented and non-pigmented clusters might represent new species.
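
The two coefficients named in this abstract differ only in how they treat shared negative results. With $a$ shared positives, $d$ shared negatives, and $b + c$ mismatches, the standard definitions are $S_{SM} = (a + d)/(a + b + c + d)$ and $S_J = a/(a + b + c)$. The sketch below computes both for two made-up test profiles; the strain vectors are hypothetical and much shorter than the 95 tests used in the study.

```python
def match_counts(x, y):
    """Count shared positives (a), shared negatives (d), and mismatches
    (b + c) between two equal-length binary test profiles."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    mismatches = len(x) - a - d
    return a, d, mismatches

def simple_matching(x, y):
    """S_SM: fraction of tests on which the two strains agree."""
    a, d, m = match_counts(x, y)
    return (a + d) / (a + d + m)

def jaccard_binary(x, y):
    """S_J: like S_SM but ignoring tests that are negative in both strains."""
    a, _, m = match_counts(x, y)
    return a / (a + m) if (a + m) else 0.0

# Hypothetical results of eight phenotypic tests (1 = positive).
strain_1 = [1, 1, 0, 0, 1, 0, 0, 0]
strain_2 = [1, 0, 0, 0, 1, 0, 0, 1]

print(simple_matching(strain_1, strain_2))  # counts the four shared negatives
print(jaccard_binary(strain_1, strain_2))   # ignores them
```

Because shared negatives raise $S_{SM}$ but not $S_J$, the two coefficients can rank strain pairs differently when most tests are negative.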

An effective automated ontology construction based on the agriculture domain

  • Deepa, Rajendran;Vigneshwari, Srinivasan
    • ETRI Journal / v.44 no.4 / pp.573-587 / 2022
  • The agricultural sector differs from other sectors in that it relies entirely on various natural and climatic factors. Climate changes have many effects, including lack of annual rainfall and pests, heat waves, changes in sea level, and global ozone/atmospheric CO2 fluctuation, on land and agriculture in similar ways. Climate change also affects the environment. Based on these factors, farmers choose their crops to increase productivity in their fields. Many existing agricultural ontologies are either domain-specific or have been created with minimal vocabulary, and no proper evaluation framework has been implemented. A new agricultural ontology focused on subdomains is designed to assist farmers using the Jaccard relative extractor (JRE) and the Naïve Bayes algorithm. The JRE is used to find the similarity between two sentences and words in the agricultural documents, and the relationship between two terms is identified via the Naïve Bayes algorithm. In the proposed method, the data are preprocessed with natural language processing techniques, and the dimension-reduced tags are subjected to rule-based formal concept analysis and mapping. The subdomain ontologies of weather, pest, and soil are built separately, and the overall agricultural ontology is built around them. The gold standard for the lexical layer is used to evaluate the proposed technique, and its performance is analyzed by comparing it with different state-of-the-art systems. Precision, recall, F-measure, Matthews correlation coefficient, receiver operating characteristic curve area, and precision-recall curve area are the metrics used to analyze the performance. The proposed methodology gives a precision score of 94.40%, compared with the decision tree (83.94%) and K-nearest neighbor algorithm (86.89%), for agricultural ontology construction.

Deep Learning-Based Lumen and Vessel Segmentation of Intravascular Ultrasound Images in Coronary Artery Disease

  • Gyu-Jun Jeong;Gaeun Lee;June-Goo Lee;Soo-Jin Kang
    • Korean Circulation Journal / v.54 no.1 / pp.30-39 / 2024
  • Background and Objectives: Intravascular ultrasound (IVUS) evaluation of coronary artery morphology is based on the lumen and vessel segmentation. This study aimed to develop an automatic segmentation algorithm and validate its performance for measuring quantitative IVUS parameters. Methods: A total of 1,063 patients were randomly assigned, with a ratio of 4:1, to the training and test sets. The independent data set of 111 IVUS pullbacks was obtained to assess the vessel-level performance. The lumen and external elastic membrane (EEM) boundaries were labeled manually in every IVUS frame with a 0.2-mm interval. The Efficient-UNet was utilized for the automatic segmentation of IVUS images. Results: At the frame-level, Efficient-UNet showed a high dice similarity coefficient (DSC, 0.93±0.05) and Jaccard index (JI, 0.87±0.08) for lumen segmentation, and demonstrated a high DSC (0.97±0.03) and JI (0.94±0.04) for EEM segmentation. At the vessel-level, there were close correlations between model-derived vs. expert-measured IVUS parameters; minimal lumen image area (r=0.92), EEM area (r=0.88), lumen volume (r=0.99) and plaque volume (r=0.95). The agreement between model-derived vs. expert-measured minimal lumen area was similarly excellent compared to the experts' agreement. The model-based lumen and EEM segmentation for a 20-mm lesion segment required 13.2 seconds, whereas manual segmentation with a 0.2-mm interval by an expert took 187.5 minutes on average. Conclusions: The deep learning models can accurately and quickly delineate vascular geometry. The artificial intelligence-based methodology may support clinicians' decision-making by real-time application in the catheterization laboratory.
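
The two overlap metrics reported here are closely related: for a single pair of masks, the Jaccard index $J$ and the Dice coefficient $D$ satisfy $J = D/(2 - D)$. The sketch below computes both on tiny hypothetical binary masks; it is not the authors' evaluation code, and the returned value of 1.0 for two empty masks is a convention assumed here. Because the identity holds per frame rather than for averages, the reported means (0.93 and 0.87, 0.97 and 0.94) can only be expected to agree with it approximately.

```python
def dice_and_jaccard(pred, truth):
    """Compute (Dice, Jaccard) for two equally sized binary masks given as
    nested lists of 0/1 pixels. Two empty masks score 1.0 by convention."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    inter = sum(1 for a, b in zip(p, t) if a and b)
    size_p, size_t = sum(p), sum(t)
    union = size_p + size_t - inter
    dice = 2 * inter / (size_p + size_t) if (size_p + size_t) else 1.0
    jaccard = inter / union if union else 1.0
    return dice, jaccard

# Hypothetical 3x3 masks: model prediction vs. manual annotation.
pred = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
truth = [[0, 1, 1],
         [0, 1, 0],
         [0, 1, 0]]

d, j = dice_and_jaccard(pred, truth)
print(d, j)
print(abs(j - d / (2 - d)) < 1e-12)  # checks J = D / (2 - D) for this pair
```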

Electrophoretic Variation of Seed Proteins in Black Locust (Robinia pseudo-acacia) (아까시나무(Robinia pseudo-acacia)종자 단백질의 전기 영동 변이)

  • 김창호;이호준;김용옥
    • The Korean Journal of Ecology / v.16 no.4 / pp.515-526 / 1993
  • In order to study the ecotypic variation of Robinia pseudo-acacia L. distributed in the southern area of the Korean peninsula, 15 local populations (Daejin, Sokcho, Kangneung, Mt. Surak, Hongcheon, Kwangneung, Namhansanseong, Chungju, Yesan, Andong, Jeonju, Dalseong, Changweon, Mokpo and Wando), located from $34^{\circ}18'$N to $38^{\circ}36'$N, were selected based on latitude and geographical distance. Seeds of these populations were collected, and the protein contents of the seeds and their band patterns were investigated. The seed proteins of all populations were electrophoresed on SDS-polyacrylamide gel. The total number of protein bands was 35, with molecular weights ranging from 17,258 daltons to 142,232 daltons. The number of seed protein bands was 23 in Dalseong and Hongcheon and 32 in Daejin and Sokcho, showing a tendency for the number of bands to increase with latitude. The local populations were classified into 3 local types based on protein analysis: the middle north east coastal type (Daejin, Sokcho, Kangneung), the central type (Mt. Surak, Hongcheon, Kwangneung, Namhansanseong, Chungju) and the southern type (Yesan, Andong, Jeonju, Dalseong, Changweon, Mokpo, Wando). According to the results of cluster analysis by UPGMA based on the similarity index (coefficient of Jaccard) of the patterns, the 3 local types were subdivided further into 6 types: the middle north east coastal type (Sokcho, Kangneung), the north central type I (Mt. Surak, Hongcheon), the north central type II (Namhansanseong, Chungju, Daejin), the north central type III (Kwangneung), the south central type (Yesan, Dalseong, Jeonju) and the southern type (Andong, Changweon, Mokpo, Dalseong, Wando). The No. 12 band of the separated seed proteins showed the highest colored density in the preparations from all the populations. The No. 11~13 and No. 23~28 bands also showed high densities. 
As a whole, the southern type populations (Changweon, Mokpo, Wando) showed high protein contents and high colored density. Total protein contents of the seeds in each population varied from 9.68 mg/g (Mt. Surak) to 17.30 mg/g (Jeonju), showing an increasing trend toward low latitudes.


Research on the Evaluation and Utilization of Constitutional Diagnosis by Korean Doctors using AI-based Evaluation Tool (인공지능 기반 평가 도구를 이용한 한의사의 체질 진단 평가 및 활용 방안에 대한 연구)

  • Park, Musun;Hwang, Minwoo;Lee, Jeongyun;Kim, Chang-Eop;Kwon, Young-Kyu
    • Journal of Physiology & Pathology in Korean Medicine / v.36 no.2 / pp.73-78 / 2022
  • Because Traditional Korean medicine (TKM) doctors use various knowledge systems during treatment, diagnosis results may differ between doctors. However, it is difficult to explain all the reasons for a diagnosis because TKM doctors use both explicit and implicit knowledge. In this study, an upgraded random forest (RF)-based evaluation tool was proposed to extract the clinical knowledge of TKM doctors. The tool was also used to assess how much of a professor's clinical knowledge was delivered to trainees. The data used to construct the evaluation tool came from 106 people who visited the Sasang Constitutional Department at Kyung Hee University Korean Medicine Hospital at Gangdong. For explicit knowledge extraction, four TKM doctors were asked to express the importance of symptoms as scores. For implicit knowledge extraction, importance scores were obtained from the RF model trained on the patients' symptoms and the TKM doctors' constitutional determination results. To confirm the delivery of clinical knowledge, the similarity between the symptoms that professors and trainees consider important when discriminating constitution was calculated using the Jaccard coefficient. The proposed tool successfully evaluated the clinical knowledge of TKM doctors, and it was confirmed that the professor's clinical knowledge was delivered to the trainee. The tool can be used in various fields, such as providing feedback on treatment, educating trainee TKM doctors, and developing AI in TKM.

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.1-21 / 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, they have been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been a continuous demand in various fields for market information at the level of specific products. However, the information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific and proper information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than those previously offered. We applied the Word2Vec algorithm, a neural network based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into a form suitable for applying the Word2Vec model. Next, the preprocessed data is embedded into vector space by Word2Vec, and the product groups are then derived by extracting similar product names based on cosine similarity. Finally, the sales data on the extracted products is summed to estimate the market size of the product groups. As experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped into multidimensional vector space by Word2Vec training. 
We performed parameter optimization for training and then applied a vector dimension of 300 and a window size of 15 as the optimized parameters for further experiments. We employed index words of the Korean Standard Industry Classification (KSIC) as a product name dataset to cluster product groups more efficiently. The product names similar to KSIC indexes were extracted based on cosine similarity. The market size of the extracted products as one product category was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of some items. The Pearson's correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied for the first time to market size estimation, overcoming the limitations of traditional methods that are sampling-based or require multiple assumptions. In addition, the level of market category can be easily and efficiently adjusted according to the purpose of information use by changing the cosine similarity threshold. Furthermore, it has high potential for practical applications since it can resolve unmet needs for detailed market size information in the public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic-based word embedding module can be advanced by giving a proper order in the preprocessed dataset or by combining another algorithm such as Jaccard similarity with Word2Vec. 
Also, the method of product group clustering can be changed to other types of unsupervised machine learning algorithms. Our group is currently working on subsequent studies, and we expect that they can further improve the performance of the basic model conceptually proposed in this study.
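
The grouping and summation steps described in this abstract can be sketched as follows. The two-dimensional vectors stand in for trained Word2Vec embeddings (the study reports 300 dimensions), and the product names, sales figures, thresholds, and function names are all hypothetical values chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def product_group(index_vec, product_vecs, threshold):
    """Names of products whose embedding is at least `threshold` similar
    to the embedding of a classification index word."""
    return sorted(name for name, vec in product_vecs.items()
                  if cosine(index_vec, vec) >= threshold)

def market_size(group, sales):
    """Bottom-up estimate: sum the sales of every product in the group."""
    return sum(sales[name] for name in group)

# Hypothetical stand-ins for embeddings and company sales data.
index_vec = [1.0, 0.0]
product_vecs = {
    "laptop": [0.9, 0.1],
    "notebook pc": [0.8, 0.6],
    "apple juice": [0.0, 1.0],
}
sales = {"laptop": 100, "notebook pc": 50, "apple juice": 30}

narrow = product_group(index_vec, product_vecs, threshold=0.9)
broad = product_group(index_vec, product_vecs, threshold=0.7)
print(narrow, market_size(narrow, sales))
print(broad, market_size(broad, sales))
```

Lowering the threshold widens the product group and raises the estimate, which is the adjustment of market category level that the abstract attributes to the cosine similarity threshold.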