• Title/Summary/Keyword: imbalanced data processing (불균형데이터 처리)

Art transaction using big data Artist analysis system implementation (미술품 거래 빅데이터를 이용한 작가 분석 시스템 구현)

  • SeungKyung Lee;JongTae Lim
    • Journal of Service Research and Studies / v.11 no.2 / pp.79-93 / 2021
  • The domestic art market grew 21.9% over the five years to 2018, reaching KRW 448.2 billion, and the number of works traded rose 31.6% to 39,367, the fifth consecutive year of growth. Art distribution platforms are diversifying from offline galleries and auctions to online auctions. The art market consists of three areas: production (creation), distribution (trade), and consumption (purchase) of works. As awareness of art's economic value spreads alongside its artistic value, interest in art as a means of investment is also growing. Consumers who buy works as investments increasingly need objective information about those works. However, collecting and analyzing objective, reliable statistics is difficult because information in the art distribution area is closed and imbalanced. This paper builds an objective, reliable picture of the state of art distribution by collecting big data on the art distribution market and analyzing both structured and unstructured data. Through this, we aim to implement a system that objectively provides analyses of artists active in the current market. The study collected artist information from art distribution sites. It then collected and analyzed articles about each artist from the daily newspaper Maeil Business and calculated the frequency of associated words for each artist. The goal is to provide consumers with objective and reliable information.
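
The abstract does not say exactly how "associated words" were counted. Below is a minimal sketch that assumes a word is associated with an artist when it appears in an article that mentions that artist. It also assumes single-token artist names and naive regex tokenization; Korean text would normally go through a morphological analyzer first. The helper `associated_word_counts` and the sample articles are hypothetical.

```python
import re
from collections import Counter


def associated_word_counts(articles, artist, stopwords=frozenset()):
    """Count words that co-occur with `artist` in the articles mentioning them.

    Hypothetical helper: tokenization is a naive regex split on word
    characters, standing in for a real (e.g. Korean morphological) tokenizer.
    Assumes the artist's name is a single token.
    """
    artist_token = artist.lower()
    counts = Counter()
    for text in articles:
        tokens = re.findall(r"\w+", text.lower())
        if artist_token not in tokens:
            continue  # only articles that mention the artist contribute
        for token in tokens:
            if token != artist_token and token not in stopwords:
                counts[token] += 1
    return counts


# Hypothetical sample articles.
articles = [
    "Kim exhibition draws record auction bids",
    "Auction season opens with Kim and Park",
    "Park retrospective at the museum",
]
counts = associated_word_counts(articles, "Kim", stopwords={"with", "and", "the"})
print(counts.most_common(3))
```

Ranking an artist's associated words is then a matter of calling `most_common(k)`. Here "auction" comes first because it appears in both articles that mention the artist.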

Ensemble Learning-Based Prediction of Good Sellers in Overseas Sales of Domestic Books and Keyword Analysis of Reviews of the Good Sellers (앙상블 학습 기반 국내 도서의 해외 판매 굿셀러 예측 및 굿셀러 리뷰 키워드 분석)

  • Do Young Kim;Na Yeon Kim;Hyon Hee Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.4 / pp.173-178 / 2023
  • As Korean literature spreads around the world, its position in the overseas publishing market has become important. With demand in the overseas publishing market continuing to grow, it is essential to predict future book sales and to analyze the characteristics of books that overseas readers have favored in the past. In this study, we propose an ensemble-learning-based prediction model. We also analyze the characteristics of good sellers, defined as books published overseas in the past five years with cumulative sales of more than 5,000 copies. We applied five ensemble learning models (XGBoost, Gradient Boosting, AdaBoost, LightGBM, and Random Forest) and compared them with other machine learning algorithms (Support Vector Machine, Logistic Regression, and deep learning). Our experimental results show that the ensemble algorithms outperform the other approaches in handling the imbalanced data. In particular, LightGBM achieved the best prediction performance, with an AUC of 99.86%. Among the features used for prediction, the most important is the author's number of overseas publications, and the second most important is publication in the countries with the largest publishing markets. The number of evaluation participants is also an important feature. In addition, text mining was performed on reviews of the four best-selling good sellers. Many reviews focused on the story, characters, and author. The keyword "translation" appears frequently in low-rated reviews, which suggests that support for translation is needed.
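
AUC, rather than accuracy, is the natural headline metric for imbalanced data like this. A constant "not a good seller" predictor can look accurate while ranking nothing. Below is a minimal pairwise ROC AUC computation (the Mann-Whitney formulation). It is a generic sketch, not the authors' evaluation code, and the toy labels and scores are invented for illustration.

```python
def roc_auc(labels, scores):
    """ROC AUC = probability that a random positive is scored above a random negative.

    Ties count as half. O(P*N) pairwise loop: fine for illustration, too slow
    for large datasets (a rank-based version is used there).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy imbalanced set: 1 "good seller" among 5 books (invented data).
labels = [0, 0, 0, 0, 1]
constant = [0.0] * 5                  # always predicts "not a good seller"
ranked = [0.1, 0.2, 0.3, 0.4, 0.9]    # ranks the positive highest

accuracy_constant = sum((s >= 0.5) == bool(y) for y, s in zip(labels, constant)) / len(labels)
print(accuracy_constant)             # 0.8
print(roc_auc(labels, constant))      # 0.5
print(roc_auc(labels, ranked))        # 1.0
```

The constant predictor reaches 80% accuracy purely from the class ratio, yet its AUC is 0.5 (chance level). This is why an AUC near 1 is meaningful on skewed good-seller data.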

Intelligent Clustering Mechanism for Efficient Energy Management in Sensor Network (센서 네트워크에서의 효율적 에너지 관리를 위한 지능형 클러스터링 기법)

  • Seo, Sung-Yun;Jung, Won-Soo;Oh, Young-Hwan
    • Journal of the Institute of Electronics Engineers of Korea TC / v.44 no.4 / pp.40-48 / 2007
  • A MANET (mobile ad hoc network) forms a free, independent network among sensor nodes without any infrastructure. However, when the sensor nodes that make up the network move, the topology changes, which makes efficient data processing and control difficult. In particular, because every sensor node must account for mobility, energy consumption becomes a problem. To address this, mechanisms were proposed that organize nodes into clusters with a hierarchical structure between a cluster head and its members. However, because each sensor node's sensing power level is fixed, these mechanisms consume energy inefficiently, causing energy imbalance across the sensor network and shortening its lifetime. In this paper, I propose an intelligent clustering mechanism for efficient energy management that solves these problems of existing clustering mechanisms. The proposed mechanism responds quickly to topology changes caused by sensor-node movement. Compared with existing mechanisms, it extends sensor-node lifetime in scenarios that require continuous sensing.
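
The abstract refers to existing clustering mechanisms built on a cluster-head/member hierarchy without naming one. As a point of reference only, here is a sketch of the classic LEACH cluster-head election rule, one well-known mechanism of that kind. It is not the intelligent mechanism proposed in this paper. In LEACH, a node that has not yet served as head in the current epoch elects itself when a uniform random draw falls below T(n) = p / (1 - p * (r mod 1/p)). Here p is the desired fraction of cluster heads and r is the round number. The function names below are illustrative.

```python
import random


def leach_threshold(p, r):
    """LEACH threshold T(n) for a node still eligible (not yet a head this epoch).

    p: desired fraction of cluster heads per round (e.g. 0.05)
    r: current round number, starting at 0
    """
    epoch = round(1 / p)  # rounds per epoch; the threshold resets each epoch
    return p / (1 - p * (r % epoch))


def elect_cluster_heads(eligible_nodes, p, r, rng=random):
    """Each eligible node becomes a cluster head if its uniform draw is below T(n)."""
    t = leach_threshold(p, r)
    return [node for node in eligible_nodes if rng.random() < t]


# Usage: with p = 0.25 the threshold rises each round until the last round
# of the epoch, where it reaches 1 and every still-eligible node is elected.
print([leach_threshold(0.25, r) for r in range(4)])
heads = elect_cluster_heads(["n1", "n2", "n3"], p=0.25, r=3, rng=random.Random(0))
print(heads)  # ['n1', 'n2', 'n3']
```

Note that this rule ignores each node's residual energy and uses a fixed sensing power. That is the kind of limitation the abstract attributes to existing mechanisms.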

Directions for Building CALS/EC in the Automotive Sector (자동차 분야의 CALS/EC 구축 방향)

  • 김관영
    • Proceedings of the CALSEC Conference / 1998.10b / pp.585-594 / 1998
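  • Electronic commerce (EC) is already emerging as a new, borderless trading market (cyber market) that overcomes temporal and spatial constraints. The global automotive industry is competitively building cooperative, symbiotic systems through strategic alliances, for example by jointly developing and procuring standard parts to avoid duplicate investment and shorten new-vehicle development. The domestic automotive industry, by contrast, tends to handle every area (product development, parts procurement, sales, and after-sales service) on its own, which works against improving competitiveness. Moreover, unlike advanced automotive nations, Korea makes little practical use of CALS/EC information infrastructure technologies to strengthen international competitiveness. To improve this situation, the Korea Automobile Manufacturers Association (KAMA) and the three automakers Hyundai, Daewoo, and Kia have recently begun building a CALS implementation model for the automotive industry (Autopia). The initiative covers the industry's entire life cycle, from product planning through design, production, purchasing/procurement, and customer support. It divides this into three areas: the new-vehicle development process, the purchasing and procurement process, and customer support services.

The new-vehicle development process aims to shorten the product development cycle through next-generation PDM, and it rests on a general-purpose design-information exchange system based on STEP. In addition, workflow management based on concurrent engineering should be introduced. It would eliminate workload imbalances caused by opaque work flows and allow efficient responses to design changes. To manage large volumes of data, including CAD data, an Integrated Data Environment should be built that systematically manages the separate information of each process and increases data-processing efficiency across processes. Digital Mockup and Virtual Reality are core technologies for new-vehicle development but are still at an early stage of practical adoption. To apply them, 3D modeling should be adopted as the basic design method and used to manage assemblies and part structures.

In the purchasing and procurement process, the industry's international competitiveness can be strengthened in two ways: by building a common industry EDI/EC network that provides economical infrastructure, and by simplifying the parts procurement system. An open purchasing system can remove the vertical, captive procurement structure tied to each automaker and the closed information sharing between companies. It also makes it easier to find high-quality suppliers through open competition. Ultimately, this will realize a global vendor network. Once an integrated logistics system is in place, sales will remain competitive while logistics is shared, which is expected to cut national logistics costs enormously. A service-parts EDI/EC system for the roughly 1,000 dealerships and 7,000 repair shops nationwide would improve customer-service efficiency. It would also reduce repair shops' logistics and inventory costs, speed up procurement, and simplify procurement tasks. Customer support services should be built as systems whose benefits the general public can feel directly, through a maintenance information system, an industry information system, a shopping mall system, a registration agency system, and so on.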


Improvement of Altitude Measurement Algorithm Based on Accelerometer for Holding Drone's Altitude (드론의 고도 유지를 위한 가속도센서 기반 고도 측정 알고리즘 개선)

  • Kim, Deok Yeop;Yun, Bo Ram;Lee, Sunghee;Lee, Woo Jin
    • KIPS Transactions on Software and Data Engineering / v.6 no.10 / pp.473-478 / 2017
  • Drones require altitude holding to achieve their flight objectives. Altitude holding consists of repeatedly raising or lowering the drone according to altitude measured in real time. Even while the drone is holding altitude, its altitude keeps changing because of external factors such as wind or thrust imbalance caused by differences in motor speed. Therefore, to maintain altitude, the drone's continuously changing altitude must be measured accurately. An accelerometer is generally used to measure drone altitude. With this method, however, integration error accumulates in the measured value, and the drone's vibration is misread as a change in altitude. To overcome this difficulty, commercial drones and existing studies add other sensors and use them together with the accelerometer. However, most additional sensors have a limited measurement range. Combining sensors also increases the computation needed to process sensor values, which slows altitude measurement. It is therefore necessary to measure the drone's altitude accurately without additional sensors or devices. In this paper, we propose an algorithm that improves the general accelerometer-based altitude measurement method. We show that applying it improves the accuracy of both altitude measurement and altitude holding.
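
The accumulating integration error described above can be reproduced with a plain double integration of vertical acceleration. The sketch below is not the proposed algorithm, which the abstract does not give. It only illustrates the baseline weakness, under three assumptions: semi-implicit Euler integration, gravity already removed from the readings, and a constant sensor bias.

```python
def integrate_altitude(accels, dt, v0=0.0, h0=0.0):
    """Dead-reckon altitude from vertical acceleration (m/s^2, gravity removed).

    Semi-implicit Euler: acceleration -> velocity -> altitude, once per sample.
    """
    v, h = v0, h0
    heights = []
    for a in accels:
        v += a * dt  # integrate acceleration into vertical velocity
        h += v * dt  # integrate velocity into altitude
        heights.append(h)
    return heights


# A hovering drone (true acceleration 0) read by a sensor with a small bias.
dt = 0.01                      # 100 Hz sampling (assumed)
bias = 0.05                    # m/s^2, assumed constant sensor offset
samples = [0.0 + bias] * 1000  # 10 seconds of readings
drift = integrate_altitude(samples, dt)[-1]
print(f"apparent altitude change after 10 s: {drift:.2f} m")
```

The drone never moves, yet the estimate drifts by roughly 0.5 * bias * t^2, about 2.5 m after 10 s. Doubling the time roughly quadruples the error, which is why bias and vibration must be handled before or during integration.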

Research on ITB Contract Terms Classification Model for Risk Management in EPC Projects: Deep Learning-Based PLM Ensemble Techniques (EPC 프로젝트의 위험 관리를 위한 ITB 문서 조항 분류 모델 연구: 딥러닝 기반 PLM 앙상블 기법 활용)

  • Hyunsang Lee;Wonseok Lee;Bogeun Jo;Heejun Lee;Sangjin Oh;Sangwoo You;Maru Nam;Hyunsik Lee
    • KIPS Transactions on Software and Data Engineering / v.12 no.11 / pp.471-480 / 2023
  • South Korea's construction order volume grew significantly, from 91.3 trillion won in public orders in 2013 to a total of 212 trillion won in 2021, with particularly strong growth in the private sector. As the domestic and overseas markets grew, EPC (Engineering, Procurement, Construction) projects became larger and more complex. As a result, risk management in project management and in ITB (Invitation to Bid) documents became a critical issue. Construction companies are given only limited time in the EPC bidding process. Manpower and cost constraints also make it extremely difficult to review every risk term in an ITB document. Previous research attempted to categorize risk terms in EPC contract documents and detect them with AI. However, practical use was limited by data problems such as the limited availability of labeled data and class imbalance. This study therefore aims to develop an AI model that classifies contract terms in detail according to the FIDIC Yellow 2017 (Federation Internationale Des Ingenieurs-Conseils) contract standard, rather than defining and classifying risk terms as previous research did. Multi-text classification is needed because the contract terms that require detailed review vary with the scale and type of the project. To improve the multi-text classification model, we developed an ELECTRA-based PLM (pre-trained language model) that learns the context of the text data efficiently from the pre-training stage. We then validated the model's performance through a four-step experiment. An ensemble of the self-developed ITB-ELECTRA model and Legal-BERT achieved the best performance, with a weighted-average F1 score of 76% in classifying 57 contract terms.
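
The headline result is a weighted-average F1 score over 57 classes. Here is a minimal sketch of how that average is usually defined: F1 is computed per class and weighted by each class's support (its number of true instances), so frequent and rare contract-term classes contribute in proportion to how often they occur. This mirrors the common `average="weighted"` convention, as in scikit-learn's `f1_score`. The function name and labels below are invented for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 (classes taken from y_true)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total  # weight each class's F1 by its share of true labels
    return score


# Invented example: two contract-term classes, one misclassified clause.
y_true = ["payment", "payment", "payment", "liability"]
y_pred = ["payment", "payment", "liability", "liability"]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.7667
```

Here "payment" has F1 = 0.8 with weight 3/4 and "liability" has F1 = 2/3 with weight 1/4, giving 0.6 + 1/6, which is about 0.767.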

Energy-Efficient Routing Protocol based on Interference Awareness for Transmission of Delay-Sensitive Data in Multi-Hop RF Energy Harvesting Networks (다중 홉 RF 에너지 하베스팅 네트워크에서 지연에 민감한 데이터 전송을 위한 간섭 인지 기반 에너지 효율적인 라우팅 프로토콜)

  • Kim, Hyun-Tae;Ra, In-Ho
    • The Journal of the Korea Contents Association / v.18 no.3 / pp.611-625 / 2018
  • With innovative advances in wireless communication technology, many studies have been actively conducted on extending network lifetime as far as possible through energy harvesting. They cover areas such as network resource optimization, QoS-guaranteed transmission, and energy-intelligent routing. It is well known that end-to-end delay is very hard to guarantee in multi-hop RF (radio frequency) energy harvesting wireless networks because the amount of harvested energy is uncertain. To minimize end-to-end delay in such networks, this paper proposes an interference-aware, energy-efficient routing metric and protocol. They take into account the various delays at a relay node caused by co-channel interference, energy-harvesting time, and queuing. The proposed method maximizes end-to-end throughput in three ways: it avoids the packet congestion that causes load imbalance, reduces waiting time caused by energy depletion, and limits delay from co-channel interference. Finally, ns-3 simulation results show that the proposed method outperforms existing methods in throughput, end-to-end delay, and energy consumption.
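
The abstract names the delay components the metric accounts for (co-channel interference, energy-harvesting time, and queuing at each relay) but not how they are combined. Below is a minimal sketch that assumes the components simply add up into a per-relay delay and that routes are chosen by Dijkstra's algorithm over that cost. The field names, topology, and numbers are hypothetical, and this is not the paper's metric or protocol.

```python
import heapq


def node_delay(node):
    """Hypothetical per-relay delay (s): queuing + harvesting wait + interference."""
    return node["queue"] + node["harvest_wait"] + node["interference"]


def min_delay_path(graph, nodes, src, dst):
    """Dijkstra where entering node v costs node_delay(nodes[v]).

    graph: adjacency dict {u: [v, ...]}; nodes: {v: delay-component dict}.
    Returns (path, total_delay), or (None, inf) if dst is unreachable.
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v in graph.get(u, []):
            nd = d + node_delay(nodes[v])
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


# Hypothetical topology: relay A is energy-starved (long harvesting wait).
graph = {"S": ["A", "B"], "A": ["D"], "B": ["D"]}
nodes = {
    "A": {"queue": 0.01, "harvest_wait": 0.20, "interference": 0.05},
    "B": {"queue": 0.05, "harvest_wait": 0.02, "interference": 0.01},
    "D": {"queue": 0.0, "harvest_wait": 0.0, "interference": 0.0},
}
path, delay = min_delay_path(graph, nodes, "S", "D")
print(path, round(delay, 3))  # ['S', 'B', 'D'] 0.08
```

Relay A has the shortest queue, but its long harvesting wait makes it the slower choice. Folding harvesting time into the cost captures this, whereas a hop-count or queue-only metric would miss it.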

A Customized Healthy Menu Recommendation Method Using Content-Based and Food Substitution Table (내용 기반 및 식품 교환 표를 이용한 맞춤형 건강식단 추천 기법)

  • Oh, Yoori;Kim, Yoonhee
    • KIPS Transactions on Software and Data Engineering / v.6 no.3 / pp.161-166 / 2017
  • Recently, many people suffer from nutritional imbalance, that is, too little or too much intake of specific nutrients, despite the variety of available foods. Accordingly, interest in health and diet has increased, leading to the emergence of various mobile applications. However, most of these applications only record the user's diet history and show simple statistics, and they usually provide only general information about healthy eating. Users interested in healthy eating need recommendation services that reflect their food interests and provide customized information. We therefore propose a menu recommendation method that calculates the recommended calorie intake from the user's physical and activity profile and assigns substitution units to each food group accordingly. Our method also analyzes the user's food preferences from their food intake history. It then satisfies the recommended intake units for each food group by exchanging foods for ones the user prefers. We demonstrate the effectiveness of the proposed algorithm using precision, recall, a health index, and the harmonic mean of these three measures. We compare it with another method that considers the user's interests and the recommended substitution units. The proposed method provides menu recommendations that reflect personal interests and health status, helping users improve and maintain healthy dietary habits.
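
The evaluation combines precision, recall, and a health index through their harmonic mean. Below is a minimal sketch of that combination. The abstract does not describe how the health index itself is computed, so it is treated here as a given score in [0, 1] (assumption). The function name `harmonic_mean3` is illustrative.

```python
def harmonic_mean3(precision, recall, health_index):
    """Harmonic mean of three scores in [0, 1]; returns 0 if any score is 0.

    Like F1 (the two-term harmonic mean of precision and recall), it is pulled
    toward the weakest score, so a menu cannot score well on health alone
    while ignoring the user's preferences.
    """
    values = (precision, recall, health_index)
    if any(v <= 0 for v in values):
        return 0.0
    return len(values) / sum(1 / v for v in values)


print(harmonic_mean3(0.5, 0.5, 1.0))  # 0.6
```

For comparison, the arithmetic mean of the same three scores would be about 0.667. The harmonic mean penalizes the weaker precision and recall more strongly.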

Negative Transition of Smart Device Utility: Empirical Study on IT-enabled Work Flexibility, After Hours Work Connectivity, and Work-Life Conflict (스마트기기 효용의 부정적 전이: IT기반 업무 유연성, 근무시간 외 업무 연결성, 일-삶 갈등에 관한 실증 연구)

  • Kim, Hyung-Jin;Lee, Yoon-ji;Lee, Ho-Geun
    • Informatization Policy / v.26 no.4 / pp.36-61 / 2019
  • Smart devices can improve work efficiency and productivity by reducing time and space constraints and enabling tasks to be processed quickly. However, side effects can arise in the form of imbalance between work and personal life. As smart devices are increasingly used in work environments, it is more necessary than ever to understand the related phenomena, identify the causes of their negative effects, and search for appropriate solutions. This study develops and verifies a theoretical model showing how technical characteristics regarded as the utility of smart devices turn into negative outcomes such as work-life conflict. The study analyzed data collected from employees. It offers significant implications for researchers, practitioners, and policy makers regarding the relationships among IT-enabled work flexibility, after-hours work connectivity, and work-life conflict. It also provides new knowledge about the important role of segmentation supplies provided by the organization.

Evaluation of Preference by Bukhansan Dulegil Course Using Sentiment Analysis of Blog Data (블로그 데이터 감성분석을 통한 북한산둘레길 구간별 선호도 평가)

  • Lee, Sung-Hee;Son, Yong-Hoon
    • Journal of the Korean Institute of Landscape Architecture / v.49 no.3 / pp.1-10 / 2021
  • This study aimed to evaluate preferences for the Bukhansan dulegil courses using sentiment analysis, a natural language processing technique, and to derive preferred and non-preferred factors. To this end, we collected blog posts written in 2019. We then calculated sentiment scores for the 21 dulegil courses by identifying positive and negative words in the texts. Next, content analysis was conducted to determine which factors led visitors to like or dislike each course. In blog posts about Bukhansan dulegil, positive words appeared in approximately 73% of the content, and for every course the share of positive documents was significantly higher than that of negative documents. This indicates that visitors generally had positive sentiments toward Bukhansan dulegil. Nevertheless, the sentiment-score analysis divided the 21 courses into preferred and non-preferred courses. Visitors preferred less difficult courses that they could walk without strain and in which various landscape elements (visual, auditory, olfactory, etc.) were harmonious yet distinct. They also preferred courses with varied landscapes and landscape sequences. Visitors also valued viewpoints such as observation decks as a significant factor. They preferred courses with good accessibility and information provision, such as information boards. Conversely, dissatisfaction stemmed from noise from adjacent roads, excessively urban surroundings, and uneven or difficult course conditions, mainly attributable to insufficient information about the landscape or course sections. The results of this study can guide the repair and improvement of dulegil in national parks and the management of nearby forest green areas.
Further, the sentiment analysis used in this study is meaningful because it can continuously monitor actual users' responses to natural areas. However, because the evaluation relies on a predefined sentiment dictionary, the dictionary needs continuous updating. In addition, social media users tend to share positive rather than negative views. The results should therefore be compared with other analyses, such as on-site surveys.
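
The dictionary-based scoring step can be sketched as follows: count positive and negative dictionary words per post, then compare positive and negative documents per course. The tiny English lexicon and the score definition (positive count minus negative count) are hypothetical stand-ins for the predefined Korean sentiment dictionary the study used. The function names and sample posts are also illustrative.

```python
import re

# Hypothetical lexicon; the study used a predefined Korean sentiment dictionary.
POSITIVE = {"beautiful", "quiet", "easy", "view"}
NEGATIVE = {"noisy", "steep", "crowded", "confusing"}


def doc_score(text, positive=POSITIVE, negative=NEGATIVE):
    """Sentiment score of one post: positive word count minus negative word count."""
    tokens = re.findall(r"\w+", text.lower())
    pos = sum(t in positive for t in tokens)
    neg = sum(t in negative for t in tokens)
    return pos - neg


def course_summary(posts):
    """Mean score and shares of positive / negative documents for one course."""
    scores = [doc_score(p) for p in posts]
    n = len(scores)
    return {
        "mean_score": sum(scores) / n,
        "positive_share": sum(s > 0 for s in scores) / n,
        "negative_share": sum(s < 0 for s in scores) / n,
    }


# Hypothetical blog posts about one course.
posts = [
    "Beautiful quiet trail with a great view",
    "Steep and noisy near the road",
    "Easy walk",
]
print(course_summary(posts))
```

Courses can then be ranked by `mean_score` or `positive_share`. As the abstract cautions, the output is only as good as the dictionary, so unseen slang or domain terms score as neutral.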