• Title/Summary/Keyword: Target-Specific Dataset

Search Results: 18

Specialized Dataset Extraction Method for Developing Optimal Pedestrian Detection Model (Korean title: Specialized Dataset Extraction Method for Developing an Optimal Object Detection Model)

  • Chun-Su Park
    • Journal of the Semiconductor & Display Technology / v.23 no.3 / pp.135-139 / 2024
  • Public datasets, which are freely available and often labeled, play a crucial role in training object detection models in computer vision. While public datasets are effective for developing general object detection models, they may not be ideal for specialized tasks. For a specific detection need, it is more beneficial to create and use a dataset tailored to the target object. This paper proposes a method for extracting a target-specific dataset from public datasets in order to develop object detection models that perform better on the target object. The approach not only improves detection accuracy but also reduces the amount of training data required and the training complexity. We evaluate the proposed method using YOLOv10, the latest object detection model at the time of publication.
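
The abstract does not spell out the extraction criteria, so the following is only a minimal sketch of the general idea, assuming annotations in the COCO JSON layout (`images`, `annotations`, `categories`). The helper `extract_target_subset` and the toy data are hypothetical: it keeps only images that contain at least one target-class object and only the annotations for the target classes.

```python
from typing import Any


def extract_target_subset(coco: dict[str, Any], target_names: set[str]) -> dict[str, Any]:
    """Hypothetical helper: build a target-specific subset of a COCO-style dataset."""
    # Map target category names (e.g. "person") to their numeric IDs.
    target_ids = {c["id"] for c in coco["categories"] if c["name"] in target_names}
    # Keep only annotations whose category is a target class.
    annotations = [a for a in coco["annotations"] if a["category_id"] in target_ids]
    # Keep only images referenced by at least one kept annotation.
    image_ids = {a["image_id"] for a in annotations}
    images = [img for img in coco["images"] if img["id"] in image_ids]
    categories = [c for c in coco["categories"] if c["id"] in target_ids]
    return {"images": images, "annotations": annotations, "categories": categories}


# Toy dataset (hypothetical): image 2 contains only a car, so it is dropped.
toy = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},  # person
        {"id": 11, "image_id": 1, "category_id": 3},  # car
        {"id": 12, "image_id": 2, "category_id": 3},  # car
        {"id": 13, "image_id": 3, "category_id": 1},  # person
    ],
    "categories": [{"id": 1, "name": "person"}, {"id": 3, "name": "car"}],
}
subset = extract_target_subset(toy, {"person"})
print([img["id"] for img in subset["images"]])  # [1, 3]
print(len(subset["annotations"]))  # 2
```

One design consequence of this sketch: non-target objects in the kept images (the car in image 1) lose their labels and are treated as background during training.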

WebSHArk 1.0: A Benchmark Collection for Malicious Web Shell Detection

  • Kim, Jinsuk;Yoo, Dong-Hoon;Jang, Heejin;Jeong, Kimoon
    • Journal of Information Processing Systems / v.11 no.2 / pp.229-238 / 2015
  • Web shells are programs written for a specific purpose in web scripting languages such as PHP, ASP, ASP.NET, JSP, and PERL-CGI. They provide a means of communicating with the server's operating system through the interpreter of the web scripting language, so they can execute OS-specific commands over HTTP. Web attacks by malicious users are usually carried out by uploading one of these web shells to compromise the target web server. Although several approaches to detecting malicious web shells have been proposed, no standard dataset has been built for comparing web shell detection techniques. In this paper, we present WebSHArk 1.0, a collection of web shell files, as a standard dataset for current and future studies of malicious web shell detection. To provide baseline results for future studies and for the improvement of current tools, we also present benchmark results obtained by scanning the WebSHArk dataset directory with three publicly available web shell scanning tools. Due to security and legal concerns, the WebSHArk 1.0 dataset is available only on request by email to one of the authors.
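
The benchmark reports how many files in a directory of known web shells each scanner flags. The sketch below is a deliberately naive, hypothetical signature scanner (not one of the three tools used in the paper) that walks a directory and computes that detection rate; the regex signatures and sample files are invented for illustration.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical signatures for illustration only; real scanners use far richer rules.
SIGNATURES = [
    re.compile(rb"eval\s*\(\s*base64_decode"),
    re.compile(rb"\b(system|shell_exec|passthru)\s*\("),
]


def scan_file(path: Path) -> bool:
    """Return True if any signature matches the file's raw bytes."""
    data = path.read_bytes()
    return any(sig.search(data) for sig in SIGNATURES)


def detection_rate(directory: Path) -> float:
    """Fraction of files under `directory` flagged by the scanner.
    If every file is a known web shell, this is the scanner's detection rate."""
    files = [p for p in directory.rglob("*") if p.is_file()]
    if not files:
        return 0.0
    return sum(scan_file(p) for p in files) / len(files)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Three toy "web shells"; the third hides its call behind string concatenation.
    (root / "a.php").write_text("<?php eval(base64_decode($_POST['x'])); ?>")
    (root / "b.php").write_text("<?php system($_GET['cmd']); ?>")
    (root / "c.php").write_text("<?php $f = 'sys' . 'tem'; $f($_GET['c']); ?>")
    rate = detection_rate(root)
    print(rate)  # 0.666... (2 of 3 shells flagged)
```

The obfuscated third file slips past both signatures, which illustrates why a shared dataset is useful for comparing scanners.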

Data mining approach to predicting user's past location

  • Lee, Eun Min;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information / v.22 no.11 / pp.97-104 / 2017
  • Location prediction has been used successfully in many applications to provide customers with high-quality location-based services. Conventional location prediction forecasts a user's future locations from their past movement history. However, as location prediction is applied to increasingly complex cases, it often becomes necessary to infer the locations a target user visited in the past. Typical cases include identifying locations that carriers of an infectious disease may have visited, or that crime suspects may have stopped by on a certain day within a specific time band. The primary goal of this study is therefore to predict the locations that users visited in the past. The information used for this purpose includes users' demographic information and movement histories. Data mining classifiers such as a Bayesian network, a neural network, a support vector machine, and a decision tree were applied to a contextual dataset of 6,868 records, and their performance was compared. The results show that the general Bayesian network is the most robust classifier.
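
The study found a general Bayesian network most robust; learning such a network's structure is more involved than fits here. As a simpler stand-in, the sketch below uses categorical naive Bayes, a Bayesian network with a fixed structure in which the features are assumed conditionally independent given the location, to predict a past location from hypothetical demographic and time-band features. The feature names, locations, and records are invented.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count label frequencies and per-feature value frequencies for each label."""
    label_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> count
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[label][i][v] += 1
            values[i].add(v)
    return label_counts, feature_counts, values


def predict(model, row):
    """Return the label maximizing P(label) * prod P(value | label), Laplace-smoothed."""
    label_counts, feature_counts, values = model
    total = sum(label_counts.values())
    best_label, best_score = None, -1.0
    for label, n in label_counts.items():
        score = n / total
        for i, v in enumerate(row):
            score *= (feature_counts[label][i][v] + 1) / (n + len(values[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical records: (age group, day type, time band) -> visited location.
rows = [
    ("20s", "weekday", "morning"),
    ("20s", "weekday", "evening"),
    ("40s", "weekend", "afternoon"),
    ("40s", "weekday", "morning"),
    ("20s", "weekend", "evening"),
]
labels = ["campus", "gym", "market", "office", "gym"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("20s", "weekend", "evening")))  # gym
```

A general Bayesian network would additionally learn dependencies among the features themselves (for example, between age group and time band), which naive Bayes ignores by construction.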

An Effective Orientation-based Method and Parameter Space Discretization for Defined Object Segmentation

  • Nguyen, Huy Hoang;Lee, GueeSang;Kim, SooHyung;Yang, HyungJeong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.12 / pp.3180-3199 / 2013
  • While non-predefined object segmentation (NDOS) distinguishes an arbitrary self-assumed object from its background, predefined object segmentation (DOS) specifies the target object in advance. This paper presents a novel method for segmenting predefined objects by globally optimizing, in a discretized parameter space, an orientation-based objective function that measures the fitness of the object boundary. A specific object is explicitly described by normalized discrete sets of boundary points and the corresponding normal vectors of its planar shape. The orientation term makes target objects robustly distinguishable. By considering the order of the transformation elements and their dependency on the derived over-segmentation outcome, the domain of translations and scales is discretized efficiently. A branch-and-bound algorithm is used to determine the transformation parameters of the shape model that correspond to the target object in an image. Results on the PASCAL dataset show considerable success on images with complex backgrounds and unclear boundaries.
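
To make the orientation-based objective concrete, the following is a simplified sketch under stated assumptions, not the paper's formulation: the shape model is a unit circle given as boundary points with outward normals, the "image" is a synthetic circular edge with an analytic gradient field, and the score of a transformation (scale, translation) is the mean edge-weighted agreement between model normals and image gradient directions. In place of the paper's branch-and-bound search, it uses a plain exhaustive search over a small discretized parameter grid. All function names and parameters are hypothetical.

```python
import math


def model_shape(n=16):
    """Unit-circle shape model: boundary points and their outward unit normals."""
    pts = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]
    return pts, pts  # on a unit circle, the outward normal equals the point itself


def image_gradient(x, y, cx=5.0, cy=5.0, r=3.0):
    """Synthetic image with a circular edge of radius r centred at (cx, cy).
    Returns (unit gradient direction, edge strength in [0, 1])."""
    dx, dy = x - cx, y - cy
    d = math.hypot(dx, dy)
    if d == 0:
        return (0.0, 0.0), 0.0
    return (dx / d, dy / d), math.exp(-((d - r) ** 2))


def orientation_score(pts, normals, s, tx, ty):
    """Mean agreement between model normals and image gradient directions
    at the boundary points transformed by scale s and translation (tx, ty)."""
    total = 0.0
    for (px, py), (nx, ny) in zip(pts, normals):
        (gx, gy), strength = image_gradient(s * px + tx, s * py + ty)
        total += strength * max(0.0, nx * gx + ny * gy)
    return total / len(pts)


def grid_search(scales, translations):
    """Exhaustive search over a discretized (scale, tx, ty) space
    (a simple stand-in for branch and bound)."""
    pts, normals = model_shape()
    return max(
        ((s, tx, ty) for s in scales for tx in translations for ty in translations),
        key=lambda p: orientation_score(pts, normals, *p),
    )


best = grid_search(scales=[1, 2, 3, 4], translations=range(0, 11))
print(best)  # (3, 5, 5)
```

Branch and bound reaches the same optimum while pruning regions of the parameter space whose upper bound on the score cannot beat the best candidate found so far, which matters once the grid is realistically fine.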

A Study on Data Quality Evaluation of Administrative Information Dataset (Korean title: A Study on Data Quality Evaluation of Administrative Information Datasets)

  • Song, Chiho;Yim, Jinhee
    • The Korean Journal of Archival Studies / no.71 / pp.237-272 / 2022
  • In 2019, a pilot project to establish a records management system for administrative information datasets began in earnest under the leadership of the National Archives of Korea. Based on the results of the three-year project, which ran through 2021, an improved management plan for administrative information datasets will be reflected in the laws and guidelines on public records, so that administrative information datasets become subject to full-scale public records management. Although public records have moved to electronic documents, and even the datasets of administrative information systems are now included in full-scale public records management, research on the quality requirements of the raw data that constitute records is still lacking. If data quality is not guaranteed, all four properties of records are threatened in the dataset, which is both a structure of data and an aggregate of records. Moreover, administrative information systems are built to reflect the varied needs of an institution's operating departments without considering the standards of the standard records management system; if the quality of their data is not sufficiently reliable, the reliability of the public records themselves cannot be secured. This study builds on the administrative information dataset management plan presented in the "Administrative Information Dataset Recorded Information Service and Utilization Model Study" conducted by the National Archives of Korea in 2021. Drawing on various materials, especially the public data policies and guides being promoted across the government, we derive quality evaluation requirements from a records management perspective and present specific indicators. The results are expected to support the records management of administrative information datasets as it gets fully under way.

In-Silico approach and validation of JNK1 Inhibitors for Colon Rectal Cancer Target

  • Bavya, Chandrasekhar;Thirumurthy, Madhavan
    • Journal of Integrative Natural Science / v.15 no.4 / pp.145-152 / 2022
  • Colorectal cancer is one of the most frequently diagnosed cancers worldwide. Drug discovery for colon cancer has recently proven challenging because of the cancer's rapid metastasis and high patient mortality. The c-Jun N-terminal kinase (JNK) signaling pathway controls the cell cycle, survival, and apoptosis. Evidence has shown that JNK1 promotes tumor progression in various cancers, including colon, breast, and lung cancer, and recent studies have identified inhibition of the JNK1 pathway as an important cascade in drug discovery. One recent approach in drug discovery is drug repurposing. Following this approach, we virtually screened the ChEMBL dataset against the JNK1 protein and studied the interactions through molecular docking. Cross-docking was then performed on the top compounds, comparing their affinity for JNK1 with that for JNK2 and JNK3, to select compounds more specific to JNK1. The drugs that exhibited higher binding were subjected to conceptual density functional theory (conceptual DFT) analysis. The results showed that, in particular, Entrectinib and Exatecan bound better to the target.
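
The abstract does not state how the cross-docking results were turned into a selection. As a minimal sketch, assuming docking scores in kcal/mol where more negative means stronger predicted binding, and a hypothetical selectivity margin of 1.0 kcal/mol, one could keep compounds whose JNK1 score beats their best off-target (JNK2 or JNK3) score by that margin. The compound names and scores are invented.

```python
# Hypothetical docking scores in kcal/mol (more negative = stronger predicted binding).
scores = {
    "cmpd_A": {"JNK1": -10.2, "JNK2": -8.1, "JNK3": -8.4},
    "cmpd_B": {"JNK1": -9.8, "JNK2": -9.9, "JNK3": -9.1},
    "cmpd_C": {"JNK1": -9.5, "JNK2": -7.9, "JNK3": -8.2},
}


def jnk1_selective(scores, margin=1.0):
    """Return compounds whose JNK1 score beats both JNK2 and JNK3 by at least
    `margin` kcal/mol, strongest JNK1 binders first."""
    selected = []
    for name, s in scores.items():
        # Most negative off-target score = strongest off-target binding.
        best_off_target = min(s["JNK2"], s["JNK3"])
        if s["JNK1"] <= best_off_target - margin:
            selected.append(name)
    return sorted(selected, key=lambda n: scores[n]["JNK1"])


print(jnk1_selective(scores))  # ['cmpd_A', 'cmpd_C']
```

Here cmpd_B has a strong JNK1 score but binds JNK2 even more strongly, so a selectivity filter of this kind would discard it.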

Estimating Interest Levels based on Visitor Behavior Recognition Towards a Guide Robot (Korean title: Estimating Visitors' Interest Toward a Guide Robot Based on Behavior Recognition)

  • Ye Jun Lee;Juhyun Kim;Eui-Jung Jung;Min-Gyu Kim
    • The Journal of Korea Robotics Society / v.18 no.4 / pp.463-471 / 2023
  • This paper proposes a method for estimating visitors' level of interest in a specific target, a guide robot, in spaces such as exhibition halls and museums, where many visitors may take an interest in a particular subject. To this end, we apply deep learning-based behavior recognition and multi-visitor object tracking, and from their outputs we analyze visitor behavior and derive each visitor's level of interest. To implement this, we created a custom dataset tailored to the characteristics of exhibition hall and museum environments and built a deep learning model on it. We classified four scenarios that visitors may exhibit, obtained predicted and experimental values for them, and thereby validated the proposed interest estimation method.

QSPR model for the boiling point of diverse organic compounds with applicability domain (Korean title: A QSPR Model for Predicting the Boiling Points of Diverse Organic Compounds and Its Applicability Domain)

  • Shin, Seong Eun;Cha, Ji Young;Kim, Kwang-Yon;No, Kyoung Tai
    • Analytical Science and Technology / v.28 no.4 / pp.270-277 / 2015
  • Boiling point (BP) is one of the most fundamental physicochemical properties of organic compounds, used to characterize and identify the thermal behavior of target compounds. However, previously developed QSPR equations remained limited for specific compounds, such as high-energy molecules, mainly because of scarce experimental data and limited coverage. After careful pre-filtering of experimental data from different sources, this study assembled a large BP dataset of 5,923 solid organic compounds, covering not only common organic molecules but also some special-purpose molecules, and used it to build a new BP prediction model. Various machine learning methods were applied to the newly collected data using a meaningful set of 2D descriptors. Combined validation checks showed acceptable validity and robustness of the models, and consensus approaches combining the individual models were also evaluated. The applicability domain of the BP prediction model was defined based on the descriptors of the training set.
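
The abstract says the applicability domain (AD) was defined from the training-set descriptors but not which method was used. One common, simple choice is a descriptor-range (bounding-box) AD; leverage-based and distance-based definitions are other common options. The sketch below assumes the range approach, with hypothetical descriptor values.

```python
def fit_domain(train):
    """Per-descriptor (min, max) bounds over the training set."""
    columns = list(zip(*train))
    return [(min(c), max(c)) for c in columns]


def in_domain(bounds, x):
    """A compound is inside the domain if every descriptor lies within the training bounds."""
    return all(lo <= v <= hi for (lo, hi), v in zip(bounds, x))


# Hypothetical 2D descriptors per compound: (molecular weight, logP, number of rings).
train = [(78.1, 2.1, 1), (128.2, 3.3, 2), (46.1, -0.3, 0), (154.2, 4.0, 2)]
bounds = fit_domain(train)
print(in_domain(bounds, (100.0, 1.5, 1)))  # True
print(in_domain(bounds, (300.0, 5.2, 4)))  # False
```

Predictions for compounds outside the domain are extrapolations and would typically be flagged as less reliable rather than discarded.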

Export Prediction Using Separated Learning Method and Recommendation of Potential Export Countries (Korean title: Export Value Prediction Using a Separated Learning Model and Recommendation of Promising Export Countries)

  • Jang, Yeongjin;Won, Jongkwan;Lee, Chaerok
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.69-88 / 2022
  • One of the characteristics of South Korea's economic structure is its heavy dependence on exports. As a result, many businesses are closely tied to the global economy and the diplomatic situation. In addition, small and medium-sized enterprises (SMEs) specializing in exports are struggling due to the spread of COVID-19. This study therefore aimed to develop a model that forecasts the following year's exports to support SMEs' export strategies and decision making. It also proposes a strategy for recommending promising export countries for each item based on the forecasting model. We reviewed important variables used in previous studies, such as country-specific, item-specific, and macroeconomic variables, and collected them to train our prediction model. Exploratory data analysis (EDA) then showed that exports, the target variable, have a highly skewed distribution. To deal with this issue and improve predictive performance, we propose a separated learning method, in which the whole dataset is divided into homogeneous subgroups and a prediction algorithm is applied to each group. The characteristics of each group can thus be learned more precisely, using different input variables and algorithms. In this study, we divided the dataset into five subgroups based on export value to reduce the skewness of the target variable. After the separation, we found that the groups differ in their countries and goods. For example, in Group 1, most of the export destinations are developing countries, and most exported goods are low-value products such as glass and prints. In contrast, South Korea's major export destinations, such as China, the USA, and Vietnam, fall into Groups 4 and 5, and most exported goods in these groups are high-value products. We then used LightGBM (LGBM) and Exponential Moving Average (EMA) for prediction. Considering the characteristics of each group, we built models using LGBM for Groups 1 to 4 and EMA for Group 5. To evaluate the model, we compared different model structures and algorithms; the separated learning model performed best. After building the model, we also reported the variable importance for each group using SHAP values to make the model more explainable. Based on the prediction model, we proposed a two-stage recommendation strategy for potential export countries. In the first stage, a BCG matrix was used to find Star and Question Mark markets, which are expected to grow rapidly. In the second stage, we calculated a score for each country and made recommendations according to the ranking. Using this recommendation framework, we selected potential export countries and presented information about those countries for each item. This study has several implications. First, most preceding studies focused on a specific situation or country, whereas this study uses various variables and develops a machine learning model for a wide range of countries and items. Second, to our knowledge, it is the first attempt to adopt a separated learning method for export prediction. By separating the dataset into five homogeneous subgroups, we could enhance the model's predictive performance, and SHAP values provide a more detailed explanation of the model for each group. Lastly, this study has practical implications. Several platforms, including KOTRA, provide trade information, but most of them are based on past data, so it is not easy for companies to predict future trends. With the model and recommendation strategy from this research, trade-related services on each platform can be improved so that companies, including SMEs, can make full use of them when making export strategies and decisions.
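
A few pieces of the pipeline described above can be sketched in plain Python under stated assumptions. The abstract says only that the dataset was split into five groups "based on the exports"; the sketch assumes equal-size groups by rank of export value. The per-group LightGBM models are omitted; shown instead are the split, an EMA one-step forecast of the kind used for the top group (smoothing factor `alpha=0.5` is a hypothetical choice), and the standard BCG-matrix quadrant rule behind the Star and Question Mark markets. All function names, cutoffs, and values are hypothetical.

```python
def split_by_rank(values, k):
    """Assign each record index to one of k equal-size groups by the rank of
    its target value, so each group covers a narrower range of the target."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [[] for _ in range(k)]
    for rank, i in enumerate(order):
        groups[rank * k // len(values)].append(i)
    return groups


def ema_forecast(series, alpha=0.5):
    """Next-period forecast: exponential moving average of past values."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def bcg_quadrant(growth, share, growth_cut, share_cut):
    """Classify a market into a BCG-matrix quadrant."""
    if growth >= growth_cut:
        return "Star" if share >= share_cut else "Question Mark"
    return "Cash Cow" if share >= share_cut else "Dog"


# Hypothetical export values for ten (country, item) records.
exports = [5, 1000, 20, 3, 80000, 150, 7, 2500, 60, 12]
groups = split_by_rank(exports, k=5)
print([[exports[i] for i in g] for g in groups])
# [[3, 5], [7, 12], [20, 60], [150, 1000], [2500, 80000]]
print(ema_forecast([100, 120, 110, 130]))  # 120.0
print(bcg_quadrant(growth=0.12, share=0.05, growth_cut=0.05, share_cut=0.10))  # Question Mark
```

Splitting by rank confines each group to a narrow slice of the target's range, which is the skew-reduction effect the separated learning method relies on; each group can then get its own model.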

Secondary Literature Analysis: The Marketing Practice to Attract Potential Customers into Leisure and Sports Industry

  • Eungoo KANG;Ji-Hye KIM
    • The Journal of Industrial Distribution & Business / v.14 no.6 / pp.1-8 / 2023
  • Purpose: Marketing in the leisure and sports industry is a complex process that requires a thorough understanding of the audience and its needs and motivations. This niche market focuses on specific products, services, or experiences. Based on a textual dataset drawn from the literature, this research explores and suggests meaningful strategies for attracting consumers in this sector. Research design, data and methodology: We conducted a secondary literature analysis, reviewing and summarizing findings from relevant prior studies. As a result, we obtained a total of 15 significant textual resources, all from peer-reviewed journal articles, each of which used high-quality instruments to support its results. Results: The findings indicate that marketers in the leisure sports sector need to communicate through the following methods: (1) Understanding the Customers' Needs and Wants, (2) Social Media, (3) Advertising, (4) Promoting Brand Affinity, (5) Offering Discounts, and (6) Providing Value-Added Services. Conclusions: This research concludes that marketing in the leisure and sports industry should use various channels. In addition, marketing practitioners should check whether tailored marketing messages are compatible with the products, services, and events that relate to their target audience's interests.