• Title/Summary/Keyword: Search algorithms

Search Result 1,328, Processing Time 0.037 seconds

Relational Database SQL Test Auto-scoring System

  • Hur, Tai-Sung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.11
    • /
    • pp.127-133
    • /
    • 2019
  • SQL is the most common language in data processing. Therefore, most of the colleges offer SQL in their curriculum. In this research, an auto scoring SQL test is proposed for the efficient results of SQL education. The system was treated with algorithms instead of using expensive DBMS(Data Base Management System) for automatic scoring, and satisfactory results were produced. For this system, the test question bank was established out of 'personnel management' and 'academic management'. It provides users with different sets of test each time. Scoring was done by dividing tables into two sections. The one that does not change the table(select) and the other that actually changes the table(update, insert, delete). In the case of a search, the answer and response were executed at first and then the results were compared and processed, the user's answers are evaluated by comparing the table with the correct answer. Modification, insertion, and deletion of table actually changes the data table, so data was restored by using ROLLBACK command. This system was implemented and tested 772 times on the 88 students in Computer Information Division of our college. The results of the implementation show that the average scoring time for a test consisting of 10 questions is 0.052 seconds, and the performance of this system is distinguished considering that multiple responses cannot be processed at the same time by a human grader, we want to develop a problem system that takes into account the difficulty of the problem into account near future.

Implementation of Specific Target Detection and Tracking Technique using Re-identification Technology based on public Multi-CCTV (공공 다중CCTV 기반에서 재식별 기술을 활용한 특정대상 탐지 및 추적기법 구현)

  • Hwang, Joo-Sung;Nguyen, Thanh Hai;Kang, Soo-Kyung;Kim, Young-Kyu;Kim, Joo-Yong;Chung, Myoung-Sug;Lee, Jooyeoun
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.22 no.4
    • /
    • pp.49-57
    • /
    • 2022
  • The government is making great efforts to prevent crimes such as missing children by using public CCTVs. However, there is a shortage of operating manpower, weakening of concentration due to long-term concentration, and difficulty in tracking. In addition, applying real-time object search, re-identification, and tracking through a deep learning algorithm showed a phenomenon of increased parameters and insufficient memory for speed reduction due to complex network analysis. In this paper, we designed the network to improve speed and save memory through the application of Yolo v4, which can recognize real-time objects, and the application of Batch and TensorRT technology. In this thesis, based on the research on these advanced algorithms, OSNet re-ranking and K-reciprocal nearest neighbor for re-identification, Jaccard distance dissimilarity measurement algorithm for correlation, etc. are developed and used in the solution of CCTV national safety identification and tracking system. As a result, we propose a solution that can track objects by recognizing and re-identification objects in real-time within situation of a Korean public multi-CCTV environment through a set of algorithm combinations.

Trends in the Use of Artificial Intelligence in Medical Image Analysis (의료영상 분석에서 인공지능 이용 동향)

  • Lee, Gil-Jae;Lee, Tae-Soo
    • Journal of the Korean Society of Radiology
    • /
    • v.16 no.4
    • /
    • pp.453-462
    • /
    • 2022
  • In this paper, the artificial intelligence (AI) technology used in the medical image analysis field was analyzed through a literature review. Literature searches were conducted on PubMed, ResearchGate, Google and Cochrane Review using the key word. Through literature search, 114 abstracts were searched, and 98 abstracts were reviewed, excluding 16 duplicates. In the reviewed literature, AI is applied in classification, localization, disease detection, disease segmentation, and fit degree of registration images. In machine learning (ML), prior feature extraction and inputting the extracted feature values into the neural network have disappeared. Instead, it appears that the neural network is changing to a deep learning (DL) method with multiple hidden layers. The reason is thought to be that feature extraction is processed in the DL process due to the increase in the amount of memory of the computer, the improvement of the calculation speed, and the construction of big data. In order to apply the analysis of medical images using AI to medical care, the role of physicians is important. Physicians must be able to interpret and analyze the predictions of AI algorithms. Additional medical education and professional development for existing physicians is needed to understand AI. Also, it seems that a revised curriculum for learners in medical school is needed.

Machine- and Deep Learning Modelling Trends for Predicting Harmful Cyanobacterial Cells and Associated Metabolites Concentration in Inland Freshwaters: Comparison of Algorithms, Input Variables, and Learning Data Number (담수 유해남조 세포수·대사물질 농도 예측을 위한 머신러닝과 딥러닝 모델링 연구동향: 알고리즘, 입력변수 및 학습 데이터 수 비교)

  • Yongeun Park;Jin Hwi Kim;Hankyu Lee;Seohyun Byeon;Soon-Jin Hwang;Jae-Ki Shin
    • Korean Journal of Ecology and Environment
    • /
    • v.56 no.3
    • /
    • pp.268-279
    • /
    • 2023
  • Nowadays, artificial intelligence model approaches such as machine and deep learning have been widely used to predict variations of water quality in various freshwater bodies. In particular, many researchers have tried to predict the occurrence of cyanobacterial blooms in inland water, which pose a threat to human health and aquatic ecosystems. Therefore, the objective of this study were to: 1) review studies on the application of machine learning models for predicting the occurrence of cyanobacterial blooms and its metabolites and 2) prospect for future study on the prediction of cyanobacteria by machine learning models including deep learning. In this study, a systematic literature search and review were conducted using SCOPUS, which is Elsevier's abstract and citation database. The key results showed that deep learning models were usually used to predict cyanobacterial cells, while machine learning models focused on predicting cyanobacterial metabolites such as concentrations of microcystin, geosmin, and 2-methylisoborneol (2-MIB) in reservoirs. There was a distinct difference in the use of input variables to predict cyanobacterial cells and metabolites. The application of deep learning models through the construction of big data may be encouraged to build accurate models to predict cyanobacterial metabolites.

Optimal Sensor Placement for Improved Prediction Accuracy of Structural Responses in Model Test of Multi-Linked Floating Offshore Systems Using Genetic Algorithms (다중연결 해양부유체의 모형시험 구조응답 예측정확도 향상을 위한 유전알고리즘을 이용한 센서배치 최적화)

  • Kichan Sim;Kangsu Lee
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.37 no.3
    • /
    • pp.163-171
    • /
    • 2024
  • Structural health monitoring for ships and offshore structures is important in various aspects. Ships and offshore structures are continuously exposed to various environmental conditions, such as waves, wind, and currents. In the event of an accident, immense economic losses, environmental pollution, and safety problems can occur, so it is necessary to detect structural damage or defects early. In this study, structural response data of multi-linked floating offshore structures under various wave load conditions was calculated by performing fluid-structure coupled analysis. Furthermore, the order reduction method with distortion base mode was applied to the structures for predicting the structural response by using the results of numerical analysis. The distortion base mode order reduction method can predict the structural response of a desired area with high accuracy, but prediction performance is affected by sensor arrangement. Optimization based on a genetic algorithm was performed to search for optimal sensor arrangement and improve the prediction performance of the distortion base mode-based reduced-order model. Consequently, a sensor arrangement that predicted the structural response with an error of about 84.0% less than the initial sensor arrangement was derived based on the root mean squared error, which is a prediction performance evaluation index. The computational cost was reduced by about 8 times compared to evaluating the prediction performance of reduced-order models for a total of 43,758 sensor arrangement combinations. and the expected performance was overturned to approximately 84.0% based on sensor placement, including the largest square root error.

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.105-122
    • /
    • 2019
  • Dimensionality reduction is one of the methods to handle big data in text mining. For dimensionality reduction, we should consider the density of data, which has a significant influence on the performance of sentence classification. It requires lots of computations for data of higher dimensions. Eventually, it can cause lots of computational cost and overfitting in the model. Thus, the dimension reduction process is necessary to improve the performance of the model. Diverse methods have been proposed from only lessening the noise of data like misspelling or informal text to including semantic and syntactic information. On top of it, the expression and selection of the text features have impacts on the performance of the classifier for sentence classification, which is one of the fields of Natural Language Processing. The common goal of dimension reduction is to find latent space that is representative of raw data from observation space. Existing methods utilize various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, learning low-dimensional vector space representations of words, that can capture semantic and syntactic information from data are also utilized. For improving performance, recent studies have suggested methods that the word dictionary is modified according to the positive and negative score of pre-defined words. The basic idea of this study is that similar words have similar vector representations. Once the feature selection algorithm selects the words that are not important, we thought the words that are similar to the selected words also have no impacts on sentence classification. This study proposes two ways to achieve more accurate classification that conduct selective word elimination under specific regulations and construct word embedding based on Word2Vec embedding. To select words having low importance from the text, we use information gain algorithm to measure the importance and cosine similarity to search for similar words. First, we eliminate words that have comparatively low information gain values from the raw text and form word embedding. Second, we select words additionally that are similar to the words that have a low level of information gain values and make word embedding. In the end, these filtered text and word embedding apply to the deep learning models; Convolutional Neural Network and Attention-Based Bidirectional LSTM. This study uses customer reviews on Kindle in Amazon.com, IMDB, and Yelp as datasets, and classify each data using the deep learning models. The reviews got more than five helpful votes, and the ratio of helpful votes was over 70% classified as helpful reviews. Also, Yelp only shows the number of helpful votes. We extracted 100,000 reviews which got more than five helpful votes using a random sampling method among 750,000 reviews. The minimal preprocessing was executed to each dataset, such as removing numbers and special characters from text data. To evaluate the proposed methods, we compared the performances of Word2Vec and GloVe word embeddings, which used all the words. We showed that one of the proposed methods is better than the embeddings with all the words. By removing unimportant words, we can get better performance. However, if we removed too many words, it showed that the performance was lowered. For future research, it is required to consider diverse ways of preprocessing and the in-depth analysis for the co-occurrence of words to measure similarity values among words. Also, we only applied the proposed method with Word2Vec. Other embedding methods such as GloVe, fastText, ELMo can be applied with the proposed methods, and it is possible to identify the possible combinations between word embedding methods and elimination methods.

Development and Performance Evaluation of Multi-sensor Module for Use in Disaster Sites of Mobile Robot (조사로봇의 재난현장 활용을 위한 다중센서모듈 개발 및 성능평가에 관한 연구)

  • Jung, Yonghan;Hong, Junwooh;Han, Soohee;Shin, Dongyoon;Lim, Eontaek;Kim, Seongsam
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_3
    • /
    • pp.1827-1836
    • /
    • 2022
  • Disasters that occur unexpectedly are difficult to predict. In addition, the scale and damage are increasing compared to the past. Sometimes one disaster can develop into another disaster. Among the four stages of disaster management, search and rescue are carried out in the response stage when an emergency occurs. Therefore, personnel such as firefighters who are put into the scene are put in at a lot of risk. In this respect, in the initial response process at the disaster site, robots are a technology with high potential to reduce damage to human life and property. In addition, Light Detection And Ranging (LiDAR) can acquire a relatively wide range of 3D information using a laser. Due to its high accuracy and precision, it is a very useful sensor when considering the characteristics of a disaster site. Therefore, in this study, development and experiments were conducted so that the robot could perform real-time monitoring at the disaster site. Multi-sensor module was developed by combining LiDAR, Inertial Measurement Unit (IMU) sensor, and computing board. Then, this module was mounted on the robot, and a customized Simultaneous Localization and Mapping (SLAM) algorithm was developed. A method for stably mounting a multi-sensor module to a robot to maintain optimal accuracy at disaster sites was studied. And to check the performance of the module, SLAM was tested inside the disaster building, and various SLAM algorithms and distance comparisons were performed. As a result, PackSLAM developed in this study showed lower error compared to other algorithms, showing the possibility of application in disaster sites. In the future, in order to further enhance usability at disaster sites, various experiments will be conducted by establishing a rough terrain environment with many obstacles.

Multi-day Trip Planning System with Collaborative Recommendation (협업적 추천 기반의 여행 계획 시스템)

  • Aprilia, Priska;Oh, Kyeong-Jin;Hong, Myung-Duk;Ga, Myeong-Hyeon;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.159-185
    • /
    • 2016
  • Planning a multi-day trip is a complex, yet time-consuming task. It usually starts with selecting a list of points of interest (POIs) worth visiting and then arranging them into an itinerary, taking into consideration various constraints and preferences. When choosing POIs to visit, one might ask friends to suggest them, search for information on the Web, or seek advice from travel agents; however, those options have their limitations. First, the knowledge of friends is limited to the places they have visited. Second, the tourism information on the internet may be vast, but at the same time, might cause one to invest a lot of time reading and filtering the information. Lastly, travel agents might be biased towards providers of certain travel products when suggesting itineraries. In recent years, many researchers have tried to deal with the huge amount of tourism information available on the internet. They explored the wisdom of the crowd through overwhelming images shared by people on social media sites. Furthermore, trip planning problems are usually formulated as 'Tourist Trip Design Problems', and are solved using various search algorithms with heuristics. Various recommendation systems with various techniques have been set up to cope with the overwhelming tourism information available on the internet. Prediction models of recommendation systems are typically built using a large dataset. However, sometimes such a dataset is not always available. For other models, especially those that require input from people, human computation has emerged as a powerful and inexpensive approach. This study proposes CYTRIP (Crowdsource Your TRIP), a multi-day trip itinerary planning system that draws on the collective intelligence of contributors in recommending POIs. In order to enable the crowd to collaboratively recommend POIs to users, CYTRIP provides a shared workspace. In the shared workspace, the crowd can recommend as many POIs to as many requesters as they can, and they can also vote on the POIs recommended by other people when they find them interesting. In CYTRIP, anyone can make a contribution by recommending POIs to requesters based on requesters' specified preferences. CYTRIP takes input on the recommended POIs to build a multi-day trip itinerary taking into account the user's preferences, the various time constraints, and the locations. The input then becomes a multi-day trip planning problem that is formulated in Planning Domain Definition Language 3 (PDDL3). A sequence of actions formulated in a domain file is used to achieve the goals in the planning problem, which are the recommended POIs to be visited. The multi-day trip planning problem is a highly constrained problem. Sometimes, it is not feasible to visit all the recommended POIs with the limited resources available, such as the time the user can spend. In order to cope with an unachievable goal that can result in no solution for the other goals, CYTRIP selects a set of feasible POIs prior to the planning process. The planning problem is created for the selected POIs and fed into the planner. The solution returned by the planner is then parsed into a multi-day trip itinerary and displayed to the user on a map. The proposed system is implemented as a web-based application built using PHP on a CodeIgniter Web Framework. In order to evaluate the proposed system, an online experiment was conducted. From the online experiment, results show that with the help of the contributors, CYTRIP can plan and generate a multi-day trip itinerary that is tailored to the users' preferences and bound by their constraints, such as location or time constraints. The contributors also find that CYTRIP is a useful tool for collecting POIs from the crowd and planning a multi-day trip.

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems
    • /
    • v.21 no.1
    • /
    • pp.103-122
    • /
    • 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful for them. For this reason, some on-line documents are accompanied by a list of keywords specified by the authors in an effort to guide the users by facilitating the filtering process. In this way, a set of keywords is often considered a condensed version of the whole document and therefore plays an important role for document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask the authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could not benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle; manually assigning keywords to all documents is a daunting task, or even impractical in that it is extremely tedious and time-consuming requiring a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are mainly two approaches to achieving this aim: keyword assignment approach and keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given set of vocabulary, and the aim is to match them to the texts. In other words, the keywords assignment approach seeks to select the words from a controlled vocabulary that best describes a document. Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. On the other hand, in the latter approach, the aim is to extract keywords with respect to their relevance in the text without prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted based on supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems such as Extractor and Kea were developed using keyword extraction approach. Most indicative words in a document are selected as keywords for that document and as a result, keywords extraction is limited to terms that appear in the document. Therefore, keywords extraction cannot generate implicit keywords that are not included in a document. According to the experiment results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Inversely, it also means that 10% to 36% of the keywords assigned by the authors do not appear in the article, which cannot be generated through keyword extraction algorithms. Our preliminary experiment result also shows that 37% of keywords assigned by the authors are not included in the full text. This is the reason why we have decided to adopt the keyword assignment approach. In this paper, we propose a new approach for automatic keyword assignment namely IVSM(Inverse Vector Space Model). The model is based on a vector space model. which is a conventional information retrieval model that represents documents and queries by vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: IVSM system for Web-based community service and stand-alone IVSM system. Firstly, the IVSM system is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers, and, indeed, it has been tested through a number of academic papers including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precisions of IVSM applied to Web-based community service and academic journals were 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. Also, IVSM shows comparable performance to Extractor that is a representative system of keyword extraction approach developed by Turney. As electronic documents increase, we expect that IVSM proposed in this paper can be applied to many electronic documents in Web-based community and digital library.

Resolving the 'Gray sheep' Problem Using Social Network Analysis (SNA) in Collaborative Filtering (CF) Recommender Systems (소셜 네트워크 분석 기법을 활용한 협업필터링의 특이취향 사용자(Gray Sheep) 문제 해결)

  • Kim, Minsung;Im, Il
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.137-148
    • /
    • 2014
  • Recommender system has become one of the most important technologies in e-commerce in these days. The ultimate reason to shop online, for many consumers, is to reduce the efforts for information search and purchase. Recommender system is a key technology to serve these needs. Many of the past studies about recommender systems have been devoted to developing and improving recommendation algorithms and collaborative filtering (CF) is known to be the most successful one. Despite its success, however, CF has several shortcomings such as cold-start, sparsity, gray sheep problems. In order to be able to generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users. For new users who do not have any evaluations or preference information, therefore, CF cannot come up with recommendations (Cold-star problem). As the numbers of products and customers increase, the scale of the data increases exponentially and most of the data cells are empty. This sparse dataset makes computation for recommendation extremely hard (Sparsity problem). Since CF is based on the assumption that there are groups of users sharing common preferences or tastes, CF becomes inaccurate if there are many users with rare and unique tastes (Gray sheep problem). This study proposes a new algorithm that utilizes Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We utilize 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality in SNA refers to the number of direct links to and from a node. In a network of users who are connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and they are isolated from other users. Therefore, gray sheep can be identified by calculating degree centrality of each node. We divide the dataset into two, gray sheep and others, based on the degree centrality of the users. Then, different similarity measures and recommendation methods are applied to these two datasets. More detail algorithm is as follows: Step 1: Convert the initial data which is a two-mode network (user to item) into an one-mode network (user to user). Step 2: Calculate degree centrality of each node and separate those nodes having degree centrality values lower than the pre-set threshold. The threshold value is determined by simulations such that the accuracy of CF for the remaining dataset is maximized. Step 3: Ordinary CF algorithm is applied to the remaining dataset. Step 4: Since the separated dataset consist of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them. A 'popular item' method is used to generate recommendations for these users. The F measures of the two datasets are weighted by the numbers of nodes and summed to be used as the final performance metric. In order to test performance improvement by this new algorithm, an empirical study was conducted using a publically available dataset - the MovieLens data by GroupLens research team. We used 100,000 evaluations by 943 users on 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm utilizing 'Best-N-neighbors' and 'Cosine' similarity method. The empirical results show that F measure was improved about 11% on average when the proposed algorithm was used

    . Past studies to improve CF performance typically used additional information other than users' evaluations such as demographic data. Some studies applied SNA techniques as a new similarity metric. This study is novel in that it used SNA to separate dataset. This study shows that performance of CF can be improved, without any additional information, when SNA techniques are used as proposed. This study has several theoretical and practical implications. This study empirically shows that the characteristics of dataset can affect the performance of CF recommender systems. This helps researchers understand factors affecting performance of CF. This study also opens a door for future studies in the area of applying SNA to CF to analyze characteristics of dataset. In practice, this study provides guidelines to improve performance of CF recommender systems with a simple modification.


  • (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.