• Title/Summary/Keyword: Knowledge Mining


Information Security Job Skills Requirements: Text-mining to Compare Job Posting and NCS (정보보호 직무 수행을 위해 필요한 지식 및 기술: 텍스트 마이닝을 이용한 구인광고와 NCS의 비교)

  • Hyo-Jung Jun;Byeong-Jo Park;Tae-Sung Kim
    • Information Systems Review, v.25 no.3, pp.179-197, 2023
  • Because a sufficient workforce underpins the industry's growth, workforce training has been carried out as part of the industry promotion policy. However, the market still suffers from a shortage of skilled mid-level workers. The information security disclosure system requires organizations to secure personnel responsible for information security work, yet the division between information technology duties and information security job areas remains unclear, and the pay does not match the responsibility. This paper compares job keywords in advertisements for the information security workforce in 2014, 2019, and 2022. The keywords describing the duties of information security personnel, such as implementation, operation, technical support, network, and security solutions, do not differ across the three years. To identify the actual needs of companies, we also analyzed the contents of job advertisements posted on online recruitment sites and compared them with the information security knowledge and skills defined by the National Competency Standards (NCS) used for comprehensive vocational training. We found that technical skills such as technology development, networks, and operating systems are preferred in the actual workplace, whereas managerial knowledge such as the legal and certification systems is prioritized in vocational training.
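The year-over-year keyword comparison described above can be sketched as a simple term-frequency analysis. A minimal sketch in Python, assuming tokenized postings; the snippets below are invented illustrations, not the paper's data:

```python
from collections import Counter

def top_keywords(postings, k=5):
    """Count keyword occurrences across tokenized job postings, return top k."""
    counts = Counter(token for post in postings for token in post)
    return [word for word, _ in counts.most_common(k)]

# Hypothetical postings for two survey years (tokenized, lowercased).
postings_2014 = [["security", "network", "operation", "implementation"],
                 ["network", "technical", "support", "security"]]
postings_2022 = [["security", "solution", "operation", "cloud"],
                 ["network", "security", "solution", "implementation"]]

# Keywords shared by both years' top lists, mirroring the paper's finding
# that core duty keywords stay stable over time.
common = set(top_keywords(postings_2014)) & set(top_keywords(postings_2022))
print(sorted(common))
```

A real analysis would tokenize Korean ad text with a morphological analyzer before counting, but the set-intersection comparison itself is this simple.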

A Design and Practical Use of Spatial Data Warehouse for Spatial Decision Making (공간적 의사결정을 위한 공간 데이터 웨어하우스 설계 및 활용)

  • Park Ji-Man;Hwang Chul-sue
    • Spatial Information Research, v.13 no.3 s.34, pp.239-252, 2005
  • The major reason that spatial data warehousing has attracted a great deal of attention in business GIS in recent years is the wide availability of huge amounts of spatial data and the imminent need for turning such data into useful geographic information. This research therefore focuses on designing and implementing a pilot system for spatial decision making. The system aims to predict target marketing areas by segmenting customers using both transaction volume and the number of credit-card customers in a department store. Moreover, the pilot system provides OLAP tools for interactive analysis of multidimensional data at geographically varied granularities, which facilitates effective spatial data mining. Focusing on the analysis methodology, the case study uses GIS and clustering for knowledge discovery. In particular, the significance of this study lies in its use of a snowflake schema model within the GIS framework.
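The customer-segmentation step the abstract mentions can be sketched with a minimal k-means clustering over (transaction amount, customer count) pairs. The data points and cluster count below are toy assumptions, not the paper's department-store data:

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means for 2-D points, e.g. (transaction amount, visit count)."""
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[i].append(p)
        # Recompute centers as cluster means (keep old center if cluster empties).
        centers = [tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical customers: (monthly spend, visits). Two obvious groups.
customers = [(120, 3), (130, 4), (900, 20), (950, 22), (110, 2)]
centers, clusters = kmeans(customers, 2)
print(centers)
```

In the paper's setting the cluster summaries would then be rolled up through the snowflake-schema dimensions (e.g. by district) for OLAP-style drill-down.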


Signed Hellinger measure for directional association (연관성 방향을 고려한 부호 헬링거 측도의 제안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society, v.27 no.2, pp.353-362, 2016
  • According to Wikipedia, data mining is the process of discovering patterns in large data sets using methods at the intersection of association rules, decision trees, clustering, artificial intelligence, machine learning, and database systems. Association rule mining discovers interesting relations between items in large transaction sets by means of interestingness measures. These measures play a major role in the knowledge discovery process in databases and have been developed by many researchers. Among them, the Hellinger measure is a good association threshold that considers both the information content and the generality of a rule, but it has the drawback that it cannot determine the direction of the association. In this paper, we propose a signed Hellinger measure that can be interpreted operationally, and we verify it against three conditions for an association threshold. We also investigate its behavior through a few examples. The results show that the signed Hellinger measure is better than the unsigned one because it can estimate the correct direction of association.
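The idea of attaching a direction to the Hellinger measure can be sketched as follows. This is one plausible formulation, not necessarily the paper's exact definition: take the Hellinger distance between the conditional distribution of B given A and the marginal distribution of B, then attach the sign of P(B|A) − P(B):

```python
import math

def signed_hellinger(n_ab, n_a_notb, n_nota_b, n_nota_notb):
    """Signed Hellinger measure from a 2x2 contingency table.

    Magnitude: Hellinger distance between (P(B|A), P(~B|A)) and (P(B), P(~B)).
    Sign: positive when A raises the probability of B, negative when it lowers it.
    """
    n = n_ab + n_a_notb + n_nota_b + n_nota_notb
    p_b = (n_ab + n_nota_b) / n
    p_b_given_a = n_ab / (n_ab + n_a_notb)
    h = math.sqrt(0.5 * ((math.sqrt(p_b_given_a) - math.sqrt(p_b)) ** 2
                         + (math.sqrt(1 - p_b_given_a) - math.sqrt(1 - p_b)) ** 2))
    return math.copysign(h, p_b_given_a - p_b) if h else 0.0

print(signed_hellinger(40, 10, 10, 40))  # positive association: A favors B
print(signed_hellinger(10, 40, 40, 10))  # negative association: A disfavors B
```

The unsigned measure would report the same value for both tables; the sign term is what recovers the direction, which is the paper's point.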

The World as Seen from Venice (1205-1533) as a Case Study of Scalable Web-Based Automatic Narratives for Interactive Global Histories

  • NANETTI, Andrea;CHEONG, Siew Ann
    • Asian Review of World Histories, v.4 no.1, pp.3-34, 2016
  • This introduction is both a statement of a research problem and an account of the first research results toward its solution. As more historical databases come online and overlap in coverage, we need to discuss the two main issues that have so far prevented 'big' results from emerging. Firstly, historical data are seen by computer scientists as unstructured: historical records cannot be easily decomposed into unambiguous fields, as population (birth and death) records and taxation data can. Secondly, machine-learning tools developed for structured data cannot be applied as-is to historical research. We propose a complex-network, narrative-driven approach to mining historical databases. In such a time-integrated network, obtained by overlaying records from historical databases, the nodes are actors, while the links are actions. In the case study that we present (the world as seen from Venice, 1205-1533), the actors are governments, while the actions are limited to war, trade, and treaty to keep the case study tractable. We then identify key periods, key events, and hence key actors and key locations through a time-resolved examination of the actions. This tool allows historians to deal with historical data issues (e.g., source provenance identification, event validation, and trade-conflict-diplomacy relationships). On a higher level, this automatic extraction of key narratives from a historical database allows historians to formulate hypotheses on the courses of history and to test these hypotheses against other actions or additional data sets. Our vision is that this narrative-driven analysis of historical data can lead to the development of multiple-scale agent-based models, which can be simulated on a computer to generate ensembles of counterfactual histories that would deepen our understanding of how our actual history developed the way it did. The generation of such narratives, automatically and in a scalable way, will revolutionize the practice of history as a discipline, because historical knowledge, that is, the treasure of human experiences (the heritage of the world), will become something that machine learning algorithms can inherit and use in smart cities to highlight and explain present ties and to illustrate potential future scenarios.
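The time-integrated network the authors describe (nodes = actors, links = dated actions) can be sketched in a few lines. The records below are invented toy entries, not the Venetian corpus:

```python
from collections import defaultdict

# Each record: (year, actor_a, action, actor_b). Toy data for illustration only.
records = [
    (1204, "Venice", "treaty", "Byzantium"),
    (1294, "Venice", "war", "Genoa"),
    (1299, "Venice", "treaty", "Genoa"),
    (1350, "Venice", "war", "Genoa"),
    (1381, "Venice", "treaty", "Genoa"),
]

# Time-integrated network: one link per actor pair, carrying all dated actions.
edges = defaultdict(list)
for year, a, action, b in records:
    edges[frozenset((a, b))].append((year, action))

# Key actors by activity; key periods by actions per half-century bucket.
activity = defaultdict(int)
periods = defaultdict(int)
for pair, actions in edges.items():
    for actor in pair:
        activity[actor] += len(actions)
    for year, _ in actions:
        periods[50 * (year // 50)] += 1

print(max(activity, key=activity.get))  # most connected actor in the toy data
```

A time-resolved view (here, the `periods` buckets) is what lets spikes in war/treaty activity surface as candidate key periods for the historian to examine.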

Collaboration Framework based on Social Semantic Web for Cloud Systems (클라우드 시스템에서 소셜 시멘틱 웹 기반 협력 프레임 워크)

  • Mateo, Romeo Mark A.;Yang, Hyun-Ho;Lee, Jae-Wan
    • Journal of Internet Computing and Services, v.13 no.1, pp.65-74, 2012
  • Cloud services are used to improve business. Moreover, customer relationship management (CRM) approaches use social networking as a tool to enhance customer service. However, most cloud systems do not support semantic structures, so vital information from social network sites remains hard to process and use for business strategy. This paper proposes a collaboration framework based on the social semantic web for cloud systems. The proposed framework consists of components that support the social semantic web to provide an efficient collaboration system for cloud consumers and service providers. The knowledge acquisition module extracts rules from data gathered by social agents, and these rules are used for collaboration and business strategy. The paper presents implementations of social network site data processing in the proposed semantic model and of pattern extraction, which was used for the virtual grouping of cloud service providers for efficient collaboration.

A Study on the Development and Maintenance of Embedded SQL based Information Systems (임베디드 SQL 기반 정보시스템의 개발 및 관리 방법에 대한 연구)

  • Song, Yong-Uk
    • The Journal of Information Systems, v.19 no.4, pp.25-49, 2010
  • As companies have introduced ERP (Enterprise Resource Planning) systems since the mid-1990s, corporate databases have become centralized and gigantic. Companies now develop data-mining applications on these databases for knowledge management. Most of them use Pro*C/C++, an embedded SQL programming language, because it is platform-independent and fast. However, development and maintenance are difficult owing to the characteristics of corporate databases, which intrinsically have large numbers of tables and fields. The purpose of this research is to design and implement a methodology that makes it easier to develop and maintain embedded SQL applications based on relational databases. This article first analyzes the syntax of Pro*C/C++ and identifies the repetition and duplication that cause difficulties in the development and maintenance of corporate information systems. It then suggests a source-code and database management architecture in which a preprocessor generates Pro*C/C++ source code by referring to a DB table specification, solving the problem of repetition and duplication. The article also suggests a DB administration architecture in which the same preprocessor generates DB administration commands from the same table specification. The preprocessor, named PrePro*C, has been developed for the UNIX command-line environment to preprocess Pro*C/C++ source code and SQL administration commands, and is being updated for use with other DB interface environments such as ODBC and JDBC.
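The preprocessor idea, expanding a table specification into embedded-SQL boilerplate so the same fields are never typed twice, can be sketched as below. The spec format and output shape are illustrative assumptions, not PrePro*C's actual input or output:

```python
# Hypothetical table specification; PrePro*C's real format may differ.
spec = {
    "table": "EMPLOYEE",
    "fields": [("emp_id", "int"), ("name", "char name[41]"), ("salary", "double")],
}

def gen_select(spec):
    """Generate a Pro*C-style host-variable block and SELECT ... INTO statement."""
    # Array types already embed the variable name; scalar types get it appended.
    decls = "\n".join(f"    {ctype};" if "[" in ctype else f"    {ctype} {col};"
                      for col, ctype in spec["fields"])
    cols = ", ".join(col for col, _ in spec["fields"])
    intos = ", ".join(f":{col}" for col, _ in spec["fields"])
    return (f"EXEC SQL BEGIN DECLARE SECTION;\n{decls}\n"
            f"EXEC SQL END DECLARE SECTION;\n"
            f"EXEC SQL SELECT {cols} INTO {intos} FROM {spec['table']};")

print(gen_select(spec))
```

The point of the architecture is that when a field is added to the table spec, the host-variable declarations, column list, and INTO list all regenerate together instead of being edited by hand in several places.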

GEDA: New Knowledge Base of Gene Expression in Drug Addiction

  • Suh, Young-Ju;Yang, Moon-Hee;Yoon, Suk-Joon;Park, Jong-Hoon
    • BMB Reports, v.39 no.4, pp.441-447, 2006
  • Abuse of drugs can elicit compulsive drug-seeking behaviors upon repeated administration and ultimately leads to addiction. We developed a procedure for standardizing microarray gene expression data from rat brain in drug addiction and stored the data in a single integrated database system, focusing on more effective data processing and interpretation. Another characteristic of the database is its systematic flexibility for statistical analysis and for linking with other databases. We adopt an intelligent SQL querying system as the foundation of our DB in order to set up an interactive module that automatically reads raw gene expression data in the standardized format. We maximize the usability of the DB by helping users study significant gene expression and identify the biological function of genes through integrated, up-to-date gene information such as GO annotations and metabolic pathways. To collect the latest information on genes selected from the database, we also set up a local BLAST search engine and a non-redundant sequence database updated daily from the NCBI server. We find that the database is a useful query interface and data-mining tool, specifically for finding genes related to drug addiction, and we apply the system to the identification and characterization of methamphetamine-induced gene behavior in rat brain.

KONG-DB: Korean Novel Geo-name DB & Search and Visualization System Using Dictionary from the Web (KONG-DB: 웹 상의 어휘 사전을 활용한 한국 소설 지명 DB, 검색 및 시각화 시스템)

  • Park, Sung Hee
    • Journal of the Korean Society for Information Management, v.33 no.3, pp.321-343, 2016
  • This study designed a semi-automatic web-based pilot system to 1) build a database of geo-names in Korean novels, 2) update the database through automatic geo-name extraction so that it is scalable, and 3) retrieve and visualize the usage of old geo-names on a map. In particular, extracting geo-names from novels is difficult because many are now obsolete and obtaining a corpus for training data is burdensome. To build such a corpus, an admin tool with an HTML crawler and parser written in Python collected geo-names and usage examples from a vocabulary dictionary of the Korean New Novel, enough to train a named entity tagger that can extract even geo-names not present in the training corpus. With the training corpus and the automatic extraction tool, the geo-name database was made scalable. In addition, the system can visualize geo-names on the map. The study also designed and implemented the prototype and empirically verified the validity of the pilot system. Lastly, items to be improved are addressed.
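The dictionary-lookup stage that bootstraps the tagger's training data can be sketched as a token-level lookup that emits BIO-style labels. The dictionary entries and sentence below are toy examples, not the system's actual lexicon:

```python
# Toy dictionary of old Korean geo-names; the real system crawls these
# from a vocabulary dictionary for the Korean New Novel.
geo_dict = {"한양", "평양", "송도"}

def tag_geonames(tokens, dictionary):
    """Label each token B-LOC if it appears in the geo-name dictionary, else O."""
    return [(tok, "B-LOC" if tok in dictionary else "O") for tok in tokens]

tokens = ["그는", "한양", "으로", "떠났다"]
print(tag_geonames(tokens, geo_dict))
```

Sentences auto-labeled this way form a training corpus for a statistical named entity tagger, which can then generalize to geo-names absent from the dictionary, the scalability step the abstract emphasizes.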

Comparison of Readability between Documents in the Community Question-Answering (질의응답 커뮤니티에서 문서 간 이독성 비교)

  • Mun, Gil-Seong
    • The Journal of the Korea Contents Association, v.20 no.10, pp.25-34, 2020
  • Community question-and-answering services are among the main sources of information and knowledge on the Web. The quality of information in question-and-answer documents is determined by the clarity of the question and the relevance of the answers, and the readability of a document is a key factor in evaluating that quality. This study measures the quality of documents used in a community question-and-answering service. For this purpose, we compare the frequency of occurrence by vocabulary level in community documents and measure the readability index of documents by the author's institution. To measure the readability index, we use the Dale-Chall formula, which is calculated from vocabulary level and sentence length. The results show that the vocabulary used in answers is more difficult than that in questions and that the sentences are longer. A gap in readability between questions and answers is also found across writing institutions. The results of this study can be used as basic data for improving online counseling services.
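The Dale-Chall index combines the share of "difficult" words (those outside an easy-word list) with average sentence length. A minimal sketch using the standard English constants; the paper works with Korean vocabulary levels, so the easy-word list below is a toy stand-in:

```python
def dale_chall(words, n_sentences, easy_words):
    """New Dale-Chall readability: difficult-word ratio plus sentence length."""
    n = len(words)
    difficult = sum(1 for w in words if w.lower() not in easy_words)
    pct_difficult = 100.0 * difficult / n
    score = 0.1579 * pct_difficult + 0.0496 * (n / n_sentences)
    # Adjustment constant applies once the difficult-word share exceeds 5%.
    if pct_difficult > 5.0:
        score += 3.6365
    return score

easy = {"the", "cat", "sat", "on", "mat", "a"}  # toy easy-word list
print(dale_chall("the cat sat on the mat".split(), 1, easy))
```

Higher scores mean harder text, so the paper's finding reads as: answers score higher than questions because they use rarer vocabulary and longer sentences.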

Automatic Construction of Class Hierarchies and Named Entity Dictionaries using Korean Wikipedia (한국어 위키피디아를 이용한 분류체계 생성과 개체명 사전 자동 구축)

  • Bae, Sang-Joon;Ko, Young-Joong
    • Journal of KIISE: Computing Practices and Letters, v.16 no.4, pp.492-496, 2010
  • Wikipedia, as an open encyclopedia, contains immense human knowledge written by thousands of volunteer editors, and its reliability is also high. In this paper, we propose to automatically construct a Korean named entity dictionary using several features of Wikipedia. First, we generate class hierarchies using the class information from each Wikipedia article. Second, the title of each article is mapped to our class hierarchies, and we calculate the entropy value of the root node of each class hierarchy. Finally, we construct a high-performance named entity dictionary by removing the class hierarchies whose entropy value exceeds a threshold. Our experiments achieved an overall F1-measure of 81.12% (precision: 83.94%, recall: 78.48%).
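The entropy-based pruning step can be sketched as follows: a root node whose mapped titles concentrate in one subclass has low entropy (a coherent entity class, keep it), while a uniform spread has high entropy (noisy, prune it). The hierarchies and threshold below are invented illustrations, not the paper's data:

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a class-count distribution at a root node."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

# Hypothetical root nodes: name -> counts of article titles per subclass.
hierarchies = {
    "Person": [90, 5, 5],            # dominated by one subclass: low entropy
    "Miscellany": [25, 25, 25, 25],  # uniform spread: high entropy
}

threshold = 1.5  # illustrative cutoff, not the paper's tuned value
kept = {name for name, c in hierarchies.items() if entropy(c) <= threshold}
print(kept)
```

Only titles under the surviving hierarchies enter the final named entity dictionary, which is how the filtering trades a little recall for precision.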