Conditional Generative Adversarial Network based Collaborative Filtering Recommendation System (Conditional Generative Adversarial Network(CGAN) 기반 협업 필터링 추천 시스템)

  • Kang, Soyi; Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.157-173 / 2021
  • With the development of information technology, the amount of available information increases daily. However, having access to so much information makes it difficult for users to easily find the information they seek. Users want a visualized system that reduces information retrieval and learning time, saving them from personally reading and judging all available information. As a result, recommendation systems are increasingly important technologies that are essential to business. Collaborative filtering is used in various fields with excellent performance because recommendations are made based on similar user interests and preferences. However, limitations do exist. Sparsity occurs when user-item preference information is insufficient, and is the main limitation of collaborative filtering. The evaluation value of the user-item matrix may be distorted by the data depending on the popularity of the product, or there may be new users who have not yet evaluated the value. The lack of historical data to identify consumer preferences is referred to as data sparsity, and various methods have been studied to address these problems. However, most attempts to solve the sparsity problem are not optimal because they can only be applied when additional data such as users' personal information, social networks, or characteristics of items are included. Another problem is that real-world score data are mostly biased to high scores, resulting in severe imbalances. One cause of this imbalanced distribution is the purchasing bias, in which only users with high product ratings purchase products, so those with low ratings are less likely to purchase products and thus do not leave negative product reviews. Due to these characteristics, unlike most users' actual preferences, reviews by users who purchase products are more likely to be positive.
Therefore, the actual rating data is over-learned in many classes with high incidence due to its biased characteristics, distorting the market. Applying collaborative filtering to these imbalanced data leads to poor recommendation performance due to excessive learning of biased classes. Traditional oversampling techniques to address this problem are likely to cause overfitting because they repeat the same data, which acts as noise in learning, reducing recommendation performance. In addition, pre-processing methods for most existing data imbalance problems are designed and used for binary classes. Binary class imbalance techniques are difficult to apply to multi-class problems because they cannot model multi-class problems, such as objects at cross-class boundaries or objects overlapping multiple classes. To solve this problem, research has been conducted to convert and apply multi-class problems to binary class problems. However, simplification of multi-class problems can cause potential classification errors when combined with the results of classifiers learned from other sub-problems, resulting in loss of important information about relationships beyond the selected items. Therefore, it is necessary to develop more effective methods to address multi-class imbalance problems. We propose a collaborative filtering model using CGAN to generate realistic virtual data to populate the empty user-item matrix. The conditional vector y is used to identify the distributions of minority classes and to generate data reflecting their characteristics. Collaborative filtering then maximizes the performance of the recommendation system via hyperparameter tuning. This process should improve the accuracy of the model by addressing the sparsity problem of collaborative filtering implementations while mitigating data imbalances arising from real data. Our model has superior recommendation performance over existing oversampling techniques and existing real-world data with data sparsity.
SMOTE, Borderline SMOTE, SVM-SMOTE, ADASYN, and GAN were used as comparative models and we demonstrate the highest prediction accuracy on the RMSE and MAE evaluation scales. Through this study, oversampling based on deep learning will be able to further refine the performance of recommendation systems using actual data and be used to build business recommendation systems.
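The abstract above compares models on RMSE and MAE. As a minimal sketch of how these two error measures are computed for rating predictions, the following uses plain Python; the function names and the sample ratings are hypothetical illustrations, not data or code from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length rating lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical ratings on a 1-5 scale (assumed values, for illustration only).
actual = [5, 4, 1, 3]
predicted = [4.5, 4.0, 2.0, 3.5]

print("RMSE:", rmse(actual, predicted))
print("MAE:", mae(actual, predicted))
```

Because RMSE squares each error before averaging, it penalizes the single large miss on the low rating more heavily than MAE does, which is one reason the two measures are usually reported together.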

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring financial risk of companies and for determining the investment returns of investors. As a result, it has been a popular research topic for researchers to predict companies' credit ratings by applying statistical and machine learning techniques. The statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have been traditionally used in bond rating. However, one major drawback is that they rely on strict assumptions. Such strict assumptions include linearity, normality, independence among predictor variables and pre-existing functional forms relating the criterion variables and the predictor variables. Those strict assumptions of traditional statistics have limited their application to the real world. Machine learning techniques also used in bond rating prediction models include decision trees (DT), neural networks (NN), and Support Vector Machine (SVM). In particular, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that can maximize the margin between two categories. SVM is simple enough to be analyzed mathematically, and leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum and thus, overfitting is unlikely to occur with SVM. In addition, SVM does not require many data samples for training since it builds prediction models by using only a few representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can be potential causes for degrading SVM's performance.
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification such as One-Against-One and One-Against-All have been proposed, but they do not improve the performance in multi-class classification problems as much as SVM does for binary-class classification. Second, approximation algorithms (e.g. decomposition methods, sequential minimal optimization algorithm) could be used for effective multi-class computation to reduce computation time, but they could deteriorate classification performance. Third, a difficulty in multi-class prediction problems is the data imbalance problem, which can occur when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often cause a default classifier to be built due to a skewed boundary, and thus reduce the classification accuracy of such a classifier. SVM ensemble learning is one of the machine learning methods to cope with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing weight on the misclassified observations through iterations. The observations that are incorrectly predicted by previous classifiers are chosen more often than examples that are correctly predicted. Thus Boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem.
Since MGM-Boost introduces the notion of geometric mean into AdaBoost, it can perform the learning process considering the geometric mean-based accuracy and errors of multiple classes. This study applies MGM-Boost to the real-world bond rating case for Korean companies to examine the feasibility of MGM-Boost. 10-fold cross-validation is performed three times with different random seeds in order to ensure that the comparison among three different classifiers does not happen by chance. For each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we have obtained the results for classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds is significantly different. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
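The abstract above reports both arithmetic mean-based and geometric mean-based prediction accuracy. The sketch below illustrates why the two can diverge on imbalanced classes. It assumes that geometric mean-based accuracy means the geometric mean of per-class recall, which is a common definition but is not confirmed by the abstract; the function names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def arithmetic_accuracy(y_true, y_pred):
    """Share of all observations that are predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall for each class that appears in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition)."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1 / len(recalls))


# Toy imbalanced data: 8 observations of class "A", 2 of class "B".
y_true = ["A"] * 8 + ["B"] * 2

# A default classifier that always predicts the majority class.
all_majority = ["A"] * 10

# A classifier that recovers one of the two minority observations.
one_minority_hit = ["A"] * 8 + ["B", "A"]

for name, y_pred in [("all_majority", all_majority),
                     ("one_minority_hit", one_minority_hit)]:
    print(name,
          arithmetic_accuracy(y_true, y_pred),
          geometric_mean_accuracy(y_true, y_pred))
```

Under this assumed definition, a classifier that ignores a class entirely scores zero on the geometric mean no matter how high its overall accuracy is, which is consistent with the much lower geometric mean-based figures reported for all three classifiers.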

Burqanism from the Origin of the Pastoral Nomadic Koryo Region and the Vision of Korean Livestock Farming (고려의 원시영역 유목초지, 그 부르칸(불함)이즘과 한국축산의 비전)

  • Chu Chae Hyok
    • Journal of The Korean Society of Grassland and Forage Science / v.25 no.1 / pp.71-82 / 2005
  • Khori(高麗) refers to the Chaabog(reindeer) that live on lichens(蘚) on Mt. Soyon(鮮), whose pastures are the cold and dry plateau of North Eurasia. Thus, the region of origin of the Khori or Koguryo, who are the ancestors of the reindeer-herding pastoral nomads(馴鹿 遊牧民), can be said to be the Steppe-Taiga-Tundra pastoral areas of North Eurasia and North America. When the pastoral nomads moved on to the great mountain(大山) zone of the Jangbaek(長白) to the Baekdu(白頭) Mountains, they could have been in contact with pastoral farmers or agricultural farmers living there and they became the farmers remaining on agricultural farms. They were the Koryo people, the ancestors of Korea. Staying in one place, they gradually forgot the origin of their reindeer-herding pastoral nomadic history in the Northwest area of Mt. Soyon, the small mountain(小山) zone of the Steppe-Taiga-Tundra pastoral areas. In other words, they lost their identity as reindeer-herding pastoral nomads when they entered the agricultural area after leaving the pastoral area. However, since their basic genes had already formed when they lived on the cold and dry plateau of North Eurasia, it is possible to study their pastoral nomadic history focusing on 'the minority living in the broad area(廣域少數)', by utilizing highly advanced biotechnological science and focusing on genes and information technology innovation, and removing various past hindrances in research. Therefore, it is not so difficult to restore the reindeer-herding pastoral nomadic history of the Koguryo(高句麗) people and secure their pastoral nomadic identity, of which the first steps have already been taken into their historical stages. The Eurasian continent and the Korean peninsula, especially the cold and dry plateau of North Eurasia and the Korean peninsula have been closely related to each other ecologically and historically. They can never be a separate space at all.
The Eurasian continent lies horizontally east to west and thus, the continent forms an isothermal zone. Also, since the time of producing their own foods, it was relatively easy for people with their technology to move to other places owing to the pastoral nomadic characteristic of mobility. Unlike the Chungyen(中原) region, western Asia and the regions covering the Siberia-Manchu-Korean peninsula where food production revolution was first made were connected to the Mongolian lichens route(蘚苔之路: Ni, ukinii jam) and steppe roads. Although the ecological conditions of nature have changed a bit throughout a long history, it was natural for the many tribes in North Asia living on the largest Steppe-Taiga-Tundra area in the world to have believed 'the legends related to animals in relation to their founders and ancestors(獸祖傳說)'. Assuming that Siberian tigers and the tigers living on Mt. Baekdu were connected ecologically and genetically because of the ecological characteristics of the animals, and their migration from plateau to plateau, we would suspect that the Chosun(朝鮮) tribe living on Mt. Baekdu were ethnically and culturally more closely connected to the farther removed Ural-Altai tribes that lived on the cold and dry plateau region than to the Han(漢) tribe who lived in Chungyen(中原) that was close to Mt. Baekdu. More evidence is the structure of the Korean language which has the form of 'Subject + Object + Verb', which is assumed to have originated from the speedy lifestyle of the reindeer-herding pastoral nomads. The structure is quite different from that of the Han(漢) language, which is based on agricultural life. Also, it is natural for reindeer-riding reindeer-herding pastoral nomads or horse-riding sheep-herding pastoral nomads(騎馬, 羊遊牧民) to have held military and political power over the region and eventually to have established an ancient pastoral nomadic empire in the process of their conquest of agricultural regions.
The stages for founding global empires in the history of mankind may be largely divided into two, in terms of ecological conditions and occupations. They are the steppes and the oceans. Of course, the steppe-based empires were established based on the skills to deal with horses and the ability to shoot arrows while riding horses, along with the use of iron ware in the 8th century BC. The steppe-based empires became the foundation for an oceanic empire, which could have been established by the use of warships and warship guns since the 15th century. Based on those facts, we know that Chosun, Puyo(夫餘), and Koguryo are the products of a developmental process of pastoral nomadic empires on the steppes. We may find the pastoral nomadic identity of the Koguryo more easily than we expected when we trace the origins and history of the Korean tribe living in the pastures located in the northwest area of Mt. Jangbaek by focusing on pastoral nomadic mobility and organization, just as we have investigated the historic origins of Anglo-Saxons in America by focusing on the times before the 15th century. In the process, we should keep in mind that English culture originated from the Industrial Revolution and was directly delivered to the American continent, although America was far from England and was not an intermediate point on long sojourns either. Further, American culture came back to England in a more advanced form later. The most important thing currently to be resolved is to cause Koreans to look back on their own history in a freer way of thinking and with diverse, profound, and sharp insight, taking away the old and existing conventional recognition that is entangled with complicated interests with Korean people and other countries. The meanings of Chosun, Khori, and Solongos have been interpreted arbitrarily without any historic evidence by the scholars who followed the conventional tradition of fixed-minded aristocrats in an agricultural society.
If the Siberian cultural properties of the stone age, the earthenware age, the bronze age, and the iron age are analyzed in such a way, archaeological discovery will never be able to contribute to the restoration of the Koguryo's pastoral nomadic identity. One should transcend the errors that tend to interpret the cultural properties discovered in the pastoral nomadic regions as not being differentiated from those of agricultural regions and just interpret them altogether from the agricultural point of view. More careful attention is required in the interpretation of cultural properties of ancient Korean empires that seem to have been formed due to mutual interactions of pastoral nomadic and agricultural cultures. Also, it is required that the conventional recognition chain of 'reverse-genes' be severed, which has placed more weight on agricultural properties than pastoral nomadic ones, since their settlement on agricultural farms was made after the establishment of their ancient pastoral nomadic empires. There is no reason at all to place higher priority on stoneware, earthenware, bronze ware, and iron ware than on wooden ware(木器) and other ware which were made of animal skins(皮器), bones and horns(骨角器), in analyzing the history in the regions of reindeer or sheep pastures. Reading ancient Korean history from the perspective of pastoral nomadic history, one feels strongly the instinctive emotions to return to the natural 'mother place'. The reindeer-herding pastoral nomadic identity of the Koguryo people that has been accumulated in volumes in their genes, hidden deep inside, and has interacted organically could be reborn with Burqanism(Burqan refers to 不咸 in Chinese), which was their religion by birth and symbolized as the red willow(紅柳=不咸).
The mother place of the Koguryo people is the endless vast green pastures of North Eurasia and North America, where we anticipate the development of Korean livestock farming following the inherent properties in the genes of the reindeer-herding pastoral nomads with Korean ancestors. We anticipate that the place would be the core resource that could contribute to the development of life of living creatures following the inherent properties of their genes and biotechnological factors. In other words, biotechnology used for a search for clues on the well-being of humans could be the fruit brought by Burqanism of the Koguryo people and the fruit of the globalization of Korean livestock farming. It was the Chosun farmers in China, who came from the vast nomadic reindeer pastures of North Eurasia, that resolved the food problem of a billion Chinese people with lowland paddy rice seeds (水稻) by transforming Heilongjiang Province(黑龍江省) into an oceanic lowland paddy rice field(水田). Even Mao Tse-tung(毛澤東) could not resolve the food problem by his revolution campaigns for tens of years. Today is the very time that requires the development of special livestock farming following the inherent properties of the ancient Korean reindeer-herding pastoral nomads that respected the dignity of life on the cold and dry plateau of North Eurasia and the American continent. I suggest that research should be started from the pastures of the Dariganga Steppe in East Mongolia that was the homeland of Hanwoo(韓牛) and the central horse-herding steppe place(牧馬場) of Chingis Khan's Mongolia. The Dariganga Steppe is awash with an affluent natural environment for pastoral nomadic living; however, the quality of life of the pastoral nomads there is still low.
I suggest we Koreans, the descendants of the Koguryo, should take our first steps for our livestock farming business project and develop the Northern nomadic pastures, here at the pastures of the Dariganga Steppe, which is the Mongolian core place of state-of-the-art technology for military weapons.