• Title/Abstract/Keywords: Online mining

Search Result 398, Processing Time 0.03 seconds

Construction of a Protein-Protein Interaction Network for Chronic Myelocytic Leukemia and Pathway Prediction of Molecular Complexes

  • Zhou, Chao;Teng, Wen-Jing;Yang, Jing;Hu, Zhen-Bo;Wang, Cong-Cong;Qin, Bao-Ning;Lv, Qing-Liang;Liu, Ze-Wang;Sun, Chang-Gang
    • Asian Pacific Journal of Cancer Prevention / v.15 no.13 / pp.5325-5330 / 2014
  • Background: Chronic myelocytic leukemia is a disease that threatens both adults and children. Great progress has been achieved in treatment, but the protein-protein interaction networks underlying chronic myelocytic leukemia are less well known. Objective: To develop a protein-protein interaction network for chronic myelocytic leukemia based on gene expression and to predict biological pathways underlying molecular complexes in the network. Materials and Methods: Genes involved in chronic myelocytic leukemia were selected from the OMIM database. Literature mining was performed with the Agilent Literature Search plugin, and a protein-protein interaction network of chronic myelocytic leukemia was established in Cytoscape. The molecular complexes in the network were detected with the Clusterviz plugin, and pathway enrichment of the molecular complexes was performed with DAVID online. Results and Discussion: There are seventy-nine chronic myelocytic leukemia genes in the Mendelian Inheritance in Man database. The protein-protein interaction network of chronic myelocytic leukemia contained 638 nodes, 1830 edges and perhaps 5 molecular complexes. Among them, complex 1 is involved in pathways related to cytokine secretion, cytokine-receptor binding, and cytokine receptor signaling, while complex 3 is related to the biological behavior of tumors. These results can provide a bioinformatic foundation for further understanding the mechanisms of chronic myelocytic leukemia.

Inferring Disease-related Genes using Title and Body in Biomedical Text (생물학 문헌 데이터의 제목과 본문을 이용한 질병 관련 유전자 추론 방법)

  • Kim, Jeongwoo;Kim, Hyunjin;Yeo, Yunku;Shin, Mincheol;Park, Sanghyun
    • KIISE Transactions on Computing Practices / v.23 no.1 / pp.28-36 / 2017
  • Since the genome projects of the 1990s, a vast number of gene studies have been stored in online databases. By using these databases, several biological relationships can be inferred. In this study, we propose a method to infer disease-gene relationships using the title and body of biomedical texts. The title is used to extract hub genes from each document, whereas the body is used to extract sub genes that are related to the hub genes. Through these steps, we were able to construct a local gene-network for each document in the literature. By integrating the local gene-networks, we then constructed a global gene-network. Subsequent analyses of the global gene-network allowed disease-related genes to be inferred and ranked. We validated the proposed method by comparing it with previous methods. The results indicate that the proposed method is a meaningful approach to inferring disease-related genes.
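
    The title/body pipeline described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the edge weighting (number of supporting documents), and the ranking by weighted degree are all assumptions, and the gene lists are made-up inputs that stand in for the output of a gene-name extractor.

```python
from collections import Counter
from itertools import product


def local_network(title_genes, body_genes):
    """Link each hub gene (from the title) to each sub gene (from the body)."""
    hubs = set(title_genes)
    subs = set(body_genes) - hubs  # a gene already used as hub is not a sub gene
    return set(product(hubs, subs))


def global_network(papers):
    """Merge local networks; edge weight = number of documents with that edge."""
    weights = Counter()
    for title_genes, body_genes in papers:
        weights.update(local_network(title_genes, body_genes))
    return weights


def rank_genes(weights):
    """Rank genes by weighted degree (an assumed scoring rule, ties by name)."""
    score = Counter()
    for (hub, sub), w in weights.items():
        score[hub] += w
        score[sub] += w
    return [g for g, _ in sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical (title genes, body genes) pairs for three documents.
papers = [
    (["BRCA1"], ["TP53", "ATM"]),
    (["TP53"], ["BRCA1", "MDM2"]),
    (["BRCA1"], ["TP53"]),
]
weights = global_network(papers)
ranking = rank_genes(weights)
print(ranking)
```

    Counting documents per edge is only one plausible way to integrate the local networks; the abstract does not state which scoring the authors used.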

A Study of Extraction of Variables Affecting the Adolescents' Computer Use Type with Decision Tree (의사결정트리 기반의 분석을 통한 청소년의 컴퓨터 사용 유형별 관련 변수 추출)

  • Lee, Hye-Joo;Jung, Eui-Hyun
    • The Journal of Korean Association of Computer Education / v.15 no.2 / pp.9-18 / 2012
  • This study used a decision tree to extract the variables related to each type of adolescent computer use, with a sample from the KYPS data (3,409 students in the second grade of junior high school; 1,704 boys and 1,705 girls). The results of the decision tree model revealed that: (1) Gender, computer use time, misdeed friends, parent supervision, other agreement of misdeed, parent study expectation, self-control, teacher attachment, and sibling relation were significant for the entertainment type. (2) Gender, cyberclub, computer use time, self-belief, and online misdeed were significant for the relation type. (3) Study enthusiasm, personal study time, optimistic disposition, study and spare time, cyberclub, self-belief, and other people criticism were significant for the information type. These results suggest that adolescents' diverse conditions should be considered in helping them use computers more efficiently.

Application of Sentiment Analysis and Topic Modeling on Rural Solar PV Issues : Comparison of News Articles and Blog Posts (감성분석과 토픽모델링을 활용한 농촌태양광 관련 이슈 연구 : 언론 기사와 블로그 포스트 비교)

  • Ki, Jaehong;Ahn, Seunghyeok
    • Journal of Digital Convergence / v.18 no.9 / pp.17-27 / 2020
  • News articles and blog posts influence social agenda setting, and this study applied text mining to coverage of rural solar PV in those media. Texts were collected by web scraping from online news articles and blog posts, using rural solar PV as a keyword, and were analysed with sentiment analysis and topic modeling. Sentiment analysis shows that the proportion of negative texts is significantly lower in blog posts than in news articles. Topic modeling shows that topics related to government policy have the largest loading in positive articles, whereas various topics are relatively evenly distributed in negative articles. For blog posts, topics related to rural area installation and environmental damage have the largest loading in positive and negative texts, respectively. This research reveals issues related to rural solar PV by combining sentiment analysis and topic modeling, which were applied separately in previous studies.

A Study on the Application of SNS Big Data to the Industry in the Fourth Industrial Revolution (제4차 산업혁명에서 SNS 빅데이터의 외식산업 활용 방안에 대한 연구)

  • Han, Soon-lim;Kim, Tae-ho;Lee, Jong-ho;Kim, Hak-Seon
    • Culinary science and hospitality research / v.23 no.7 / pp.1-10 / 2017
  • This study proposes an SNS big data analysis method for the food service industry in the fourth industrial revolution. It analyzed the keyword "fourth industrial revolution" using Google Trends, and analyzed data posted on SNS from January 1, 2016 to September 5, 2017 (1 year and 8 months) using "Social Metrics". Through the social insights, words related to cooking were analyzed and visualized in terms of attributes, products, hobbies and leisure. The analysis found keywords such as cooking, entrepreneurship, franchise, restaurant, job search, Twitter, family, friends, menu, reaction, and video. As a theoretical implication, this study proposes how to utilize big data produced from various online materials for research on the restaurant business, how to interpret unstructured data as meaningful data, and a basic direction for field application. To support the positioning of restaurant companies' customers in the future, this study suggests more detailed and in-depth consumer sentiment as a basic resource for marketing data development through various menu development and changes in customers' perception. In addition, this study provides marketing implications for the foodservice industry and shows how to use big data for the cooking industry in preparation for the fourth industrial revolution.

Clustering Normal User Behavior for Anomaly Intrusion Detection (비정상행위 탐지를 위한 사용자 정상행위 클러스터링 기법)

  • Oh, Sang-Hyun;Lee, Won-Suk
    • The KIPS Transactions:PartC / v.10C no.7 / pp.857-866 / 2003
  • To detect an intrusion based on anomalies in a user's activities, previous work has concentrated on statistical techniques for analyzing an audit data set. However, since these techniques mainly analyze the average behavior of a user's activities, some anomalies can be detected inaccurately. In this paper, a new clustering algorithm for modeling the normal pattern of a user's activities is proposed. Since clustering can identify an arbitrary number of dense ranges in an analysis domain, it can eliminate the inaccuracy caused by statistical analysis. Also, clustering can be used to model common knowledge occurring frequently in a set of transactions. Consequently, the common activities of a user can be found more accurately. The common knowledge is represented by the occurrence frequency of similar data objects per transaction as well as the common repetitive ratio of similar data objects in each transaction. Furthermore, the proposed method also addresses how to maintain the identified common knowledge as a concise profile. As a result, the profile can be used to detect any anomalous behavior in an online transaction.

Examining Factors Affecting the Binge-Watching Behaviors of OTT Services (OTT(Over-the-Top) 서비스의 몰아보기 시청행위 영향 요인 탐색)

  • Hwang, Kyung-Ho;Kim, Kyung-Ae
    • Journal of the Korea Convergence Society / v.11 no.3 / pp.181-186 / 2020
  • The purpose of this study is to empirically examine the factors affecting the binge-watching behaviors of OTT service users by using a multi-layer perceptron (MLP) artificial neural network. All samples (n=1,000) were collected from 'A survey on user awareness in OTT service' published by the Media Research Center of the Korea Press Foundation in 2018. Our research model includes one dependent variable, binge-watching behavior on OTT services, and five independent variables: gender, age, frequency of service usage, users' satisfaction with the content recommendation algorithm, and content types mainly consumed. Our findings demonstrate that age, frequency of service usage, users' satisfaction with content recommendation algorithms, and certain types of content (e.g., Korean dramas, Korean films, and foreign dramas) are highly related to binge-watching behavior on OTT services.

Rethinking of Self-Organizing Maps for Market Segmentation in Customer Relationship Management (고객관계관리의 시장 세분화를 위한 Self-Organizing Maps 재고찰)

  • Bang, Joung-Hae;Hamel, Lutz;Ioerger, Brian
    • Journal of Intelligence and Information Systems / v.13 no.4 / pp.17-34 / 2007
  • Organizations have realized the importance of CRM. To obtain the maximum possible lifetime value from a customer base, it is critical that customer data is analyzed to understand patterns of customer response. As customer databases assume gigantic proportions due to Internet and e-commerce activity, data-mining-based market segmentation becomes crucial for understanding customers. Here we raise a question and some issues about using a single SOM approach for clustering, and propose a multiple self-organizing maps approach. This methodology exploits additional themes on the attributes that characterize customers in a typical CRM system. Since these additional themes are usually ignored by traditional market segmentation techniques, we suggest careful application of SOM for market segmentation.

Safeguarding Korean Export Trade through Social Media-Driven Risk Identification and Characterization

  • Sithipolvanichgul, Juthamon;Abrahams, Alan S.;Goldberg, David M.;Zaman, Nohel;Baghersad, Milad;Nasri, Leila;Ractham, Peter
    • Journal of Korea Trade / v.24 no.8 / pp.39-62 / 2020
  • Purpose - Korean exports account for a vast proportion of Korean GDP, and large volumes of Korean products are sold in the United States. Identifying and characterizing actual and potential product hazards related to Korean products is critical to safeguard Korean export trade, as severe quality issues can impair Korea's reputation and reduce global consumer confidence in Korean products. In this study, we develop country-of-origin-based product risk analysis methods for social media with a specific focus on Korean-labeled products, for the purpose of safeguarding Korean export trade. Design/methodology - We employed two social media datasets containing consumer-generated product reviews. Sentiment analysis is a popular text mining technique used to quantify the type and amount of emotion that is expressed in the text. It is a useful tool for gathering customer opinions regarding products. Findings - We document and discuss the specific potential risks found in Korean-labeled products and explain their implications for safeguarding Korean export trade. Finally, we analyze the false positive matches that arise from the established dictionaries that were used for risk discovery and utilize these classification errors to suggest opportunities for the future refinement of the associated automated text analytic methods. Originality/value - Various studies have used online feedback from social media to analyze product defects. However, none of them links their findings to trade promotion and the protection of a specific country's exports. Therefore, it is important to fill this research gap, which could help to safeguard export trade in Korea.

Consumer behavior prediction using Airbnb web log data (에어비앤비(Airbnb) 웹 로그 데이터를 이용한 고객 행동 예측)

  • An, Hyoin;Choi, Yuri;Oh, Raeeun;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.32 no.3 / pp.391-404 / 2019
  • Customers' fixed characteristics have often been used to predict customer behavior. As customer activities move from offline to online, it has recently become possible to track customer web logs and to collect large amounts of web log data; however, researchers have focused only on organizing the log data or describing its technical characteristics. In this study, we predict the decision-making time until each customer makes the first reservation, using Airbnb customer data provided by the Kaggle website. This data set includes basic customer information such as gender and age, as well as web logs. We use various methodologies to find the optimal model and compare prediction errors for cases with web log data and without it. We consider six models, including Lasso, SVM, Random Forest, and XGBoost, to explore the effectiveness of the web log data. As a result, we choose Random Forest as our optimal model with a misclassification rate of about 20%. In addition, we confirm that using web log data in our study doubles the prediction accuracy in predicting customer behavior compared to not using it.