• Title/Summary/Keyword: Learning to Rank

Horse race rank prediction using learning-to-rank approaches (Learning-to-rank 기법을 활용한 서울 경마경기 순위 예측)

  • Junhyoung Chung;Donguk Shin;Seyong Hwang;Gunwoong Park
    • The Korean Journal of Applied Statistics / v.37 no.2 / pp.239-253 / 2024
  • This research applies both point-wise and pair-wise learning strategies within the learning-to-rank (LTR) framework to predict horse race rankings in Seoul. For point-wise learning, we employ a linear model and a random forest. For pair-wise learning, we use RankNet and LambdaMART (XGBoost Ranker, LightGBM Ranker, and CatBoost Ranker). To improve predictions, race records are standardized by race distance, and we integrate several datasets, including race information, jockey information, horse training records, and trainer information. Our results empirically demonstrate that pair-wise learning approaches, which can reflect the order information between items, generally outperform point-wise learning approaches. Notably, CatBoost Ranker is the top performer. Through Shapley value analysis, we find that the important variables for CatBoost Ranker include the performance of a horse, its previous race records, the count of its starting trainings, the total number of starting trainings, and the instances of disease diagnoses for the horse.
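The difference between point-wise and pair-wise learning described in this abstract can be illustrated with a small sketch. The function below is a generic RankNet-style pairwise logistic loss written for illustration; it is not the authors' code, and the function name, the toy scores, and the convention that a higher score means an earlier finish are all assumptions.

```python
import math

def pairwise_logistic_loss(scores, finish_positions):
    """Mean RankNet-style loss over all ordered pairs within one race.

    scores: model scores; higher means predicted to finish earlier (assumed convention).
    finish_positions: true finishing positions, where 1 is the winner.
    """
    total, n_pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            # Only count pairs where horse i truly finished ahead of horse j.
            if finish_positions[i] < finish_positions[j]:
                diff = scores[i] - scores[j]
                # log(1 + exp(-diff)) is small when i is scored above j.
                total += math.log1p(math.exp(-diff))
                n_pairs += 1
    return total / n_pairs if n_pairs else 0.0

# Hypothetical three-horse race: scores that agree with the true order
# should give a lower loss than scores that reverse it.
good = pairwise_logistic_loss([3.0, 2.0, 1.0], [1, 2, 3])
bad = pairwise_logistic_loss([1.0, 2.0, 3.0], [1, 2, 3])
```

A point-wise model would instead fit each horse's target independently; the pairwise loss depends only on score differences within the same race, which is how it captures order information between items.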

Recommendations Based on Listwise Learning-to-Rank by Incorporating Social Information

  • Fang, Chen;Zhang, Hengwei;Zhang, Ming;Wang, Jindong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.1 / pp.109-134 / 2018
  • Collaborative Filtering (CF) is widely used in the recommendation field and can be divided into rating-based CF and learning-to-rank based CF. Although many methods have been proposed based on these two kinds of CF, there is still room for improvement. Firstly, data sparsity remains a big challenge for CF algorithms. Secondly, malicious ratings given by some illegal users may affect recommendation accuracy. Existing CF algorithms seldom take both of these observations into consideration. In this paper, we propose a recommendation method based on listwise learning-to-rank that incorporates users' social information. By taking both ratings and the order of items into consideration, the Plackett-Luce model is used to find similar users more accurately. To alleviate the data sparsity problem, an improved matrix factorization model that integrates the influence of similar users is proposed to predict ratings. On the basis of exploring the trust relationships between users according to their social information, a listwise learning-to-rank algorithm is proposed to learn an optimal ranking model, which can output a recommendation list more consistent with user preferences. Comprehensive experiments conducted on two public real-world datasets show that our approach not only achieves high recommendation accuracy in a relatively short runtime, but also reduces the impact of malicious ratings.
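For readers unfamiliar with the Plackett-Luce model mentioned in this abstract, the following is a minimal sketch of its log-likelihood for one ranked list. It shows the textbook form of the model only, not the paper's similarity measure or training procedure; the function name and the use of exp(score) as an item's worth are assumptions made for illustration.

```python
import math

def plackett_luce_log_likelihood(scores, ranking):
    """Log-probability of `ranking` (item indices, best first) under Plackett-Luce.

    scores: real-valued item scores; exp(score) acts as the item's worth.
    """
    log_lik = 0.0
    remaining = list(ranking)
    while remaining:
        # Log of the normalizer over items not yet placed (log-sum-exp for stability).
        m = max(scores[k] for k in remaining)
        log_norm = m + math.log(sum(math.exp(scores[k] - m) for k in remaining))
        # The next item in the ranking is chosen from the remaining items.
        chosen = remaining.pop(0)
        log_lik += scores[chosen] - log_norm
    return log_lik

# Hypothetical scores: a ranking that agrees with the scores is more likely
# than the reversed ranking.
agree = plackett_luce_log_likelihood([2.0, 1.0, 0.0], [0, 1, 2])
reverse = plackett_luce_log_likelihood([2.0, 1.0, 0.0], [2, 1, 0])
```

Because the likelihood is defined over the whole list rather than over single items or pairs, maximizing it is one common way to build a listwise learning-to-rank objective.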

Efficient Retrieval of Short Opinion Documents Using Learning to Rank (기계학습을 이용한 단문 오피니언 문서의 효율적 검색 기법)

  • Chang, Jae-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.4 / pp.117-126 / 2013
  • Recently, as Social Network Services (SNS) such as Twitter and Facebook have become more popular, much research has been done on opinion mining. However, existing research mostly focuses on sentiment classification or feature selection, and there have been few studies on opinion document retrieval. In this paper, we propose a new retrieval method for short opinion documents. The proposed method builds on previous sentiment classification methodology and applies several document features to evaluate the quality of opinion documents. To generate the retrieval model, we adopt a learning-to-rank technique and integrate a sentiment classification model into it. Experimental results show that the proposed method can be applied successfully to opinion search.

A Federated Multi-Task Learning Model Based on Adaptive Distributed Data Latent Correlation Analysis

  • Wu, Shengbin;Wang, Yibai
    • Journal of Information Processing Systems / v.17 no.3 / pp.441-452 / 2021
  • Federated learning provides an efficient integrated model for distributed data, allowing the local training of different data. Meanwhile, the goal of multi-task learning is to simultaneously establish models for multiple related tasks and to obtain the underlying main structure. However, traditional federated multi-task learning models not only have strict requirements for the data distribution, but also demand large amounts of calculation and converge slowly, which has hindered their adoption in many fields. In our work, we apply a rank constraint on the weight vectors of the multi-task learning model to adaptively adjust the task similarity learning according to the distribution of the federated node data. The proposed model has a general framework for solving optimal solutions, which can be used to deal with various data types. Experiments show that our model achieves the best results on different datasets. Notably, our model can still obtain stable results on datasets with large distribution differences. In addition, compared with traditional federated multi-task learning models, our algorithm is able to converge on a local optimal solution within limited training iterations.

'Hot Search Keyword' Rank-Change Prediction (인기 검색어의 순위 변화 예측)

  • Kim, Dohyeong;Kang, Byeong Ho;Lee, Sungyoung
    • Journal of KIISE / v.44 no.8 / pp.782-790 / 2017
  • The 'Hot Search Keywords' service provides a list of the hottest search terms on web services such as Naver or Daum. The service bases the changes in rank of a specific search keyword on changes in its users' interest. This paper introduces a temporal modelling framework for predicting the rank change of hot search keywords using past rank data and machine learning. Past rank data shows that more than 70% of hot search keywords tend to disappear and reappear later. The authors processed missing rank values using deletion, dummy variables, mean substitution, and expectation maximization. It is, however, crucial to calculate the optimal window size of the past rank data. We proposed an optimal window size selection approach based on the minimum amount of time a topic within the same or a differing context disappeared. The experiments were conducted with four different machine-learning techniques using the Naver, Daum, and Nate 'Hot Search Keywords' datasets, which were collected for 2 years.
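Two of the steps named in this abstract, mean substitution for missing ranks and a fixed window over past rank data, can be sketched as follows. This is an illustrative assumption about how such preprocessing might look, not the authors' pipeline: the function names, the use of `None` for a keyword that dropped off the list, and the window size of 3 are all hypothetical.

```python
def mean_substitute(ranks):
    """Replace None entries with the mean of the observed ranks."""
    observed = [r for r in ranks if r is not None]
    if not observed:
        return list(ranks)
    mean_rank = sum(observed) / len(observed)
    return [mean_rank if r is None else r for r in ranks]

def sliding_windows(series, window_size):
    """Return (past window, next value) pairs for supervised training."""
    return [(series[i:i + window_size], series[i + window_size])
            for i in range(len(series) - window_size)]

# Hypothetical rank history of one keyword; None marks a time step
# where the keyword was absent from the hot-keyword list.
history = [3, None, 5, 4, None, 2]
filled = mean_substitute(history)
pairs = sliding_windows(filled, 3)
```

The window size controls how much past rank data each training example sees, which is why the abstract treats its selection as a separate problem.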

Relationships between Teaching Professional Rank, Course Taking, Teaching Experience and Knowledge of Algebra for Teaching

  • Huang, Rongjin;Li, Yeping;Kulm, Gerald;Willson, Victor
    • Research in Mathematical Education / v.18 no.2 / pp.129-148 / 2014
  • In this study, we examined the relationships among years of teaching experience, professional rank, number of courses taken, and knowledge of algebra for teaching (KAT). 338 in-service and 376 pre-service secondary mathematics teachers in China completed a KAT questionnaire. Various statistical techniques were employed to examine these relationships. The pre-service teachers performed statistically significantly higher in advanced mathematics knowledge than their in-service counterparts. Among the in-service teachers, senior teachers scored higher in school mathematics and teaching mathematics than junior teachers. Yet participants' advanced mathematics knowledge decreased as their professional rank advanced or their teaching experience increased. The number of courses taken has a significant positive correlation with school mathematics knowledge and advanced mathematics knowledge. The implications of these findings for mathematics teacher education are discussed.

Graph Construction Based on Fast Low-Rank Representation in Graph-Based Semi-Supervised Learning (그래프 기반 준지도 학습에서 빠른 낮은 계수 표현 기반 그래프 구축)

  • Oh, Byonghwa;Yang, Jihoon
    • Journal of KIISE / v.45 no.1 / pp.15-21 / 2018
  • Low-Rank Representation (LRR) based methods are widely used in many practical applications, such as face clustering and object detection, because they can guarantee high prediction accuracy when used to construct graphs in graph-based semi-supervised learning. However, solving the LRR problem requires performing singular value decomposition on a square matrix whose size is the number of data points at each iteration of the algorithm; hence the calculation is inefficient. To solve this problem, we propose an improved and faster LRR method based on the recently published Fast LRR (FaLRR) and suggest ways to introduce and optimize additional constraints on the underlying optimization goals, in order to address the fact that FaLRR is fast but performs poorly in classification problems. Our experiments confirm that the proposed method finds a better solution than LRR does. We also propose Fast MLRR (FaMLRR), which shows better results when the goal of minimizing is added.

Automatic and objective gradation of 114 183 terrorist attacks using a machine learning approach

  • Chi, Wanle;Du, Yihong
    • ETRI Journal / v.43 no.4 / pp.694-701 / 2021
  • Catastrophic events cause casualties, damage property, and lead to huge social impacts. To build common standards and facilitate international communications regarding disasters, the relevant authorities in social management rank them in subjectively imposed terms such as direct economic losses and loss of life. Terrorist attacks involving uncertain human factors, which are roughly graded based on the rule of property damage, are even more difficult to interpret and assess. In this paper, we collected 114 183 open-source records of terrorist attacks and used a machine learning method to grade them synthetically in an automatic and objective way. No subjective claims or personal preferences were involved in the grading, and each derived common factor contains the comprehensive and rich information of many variables. Our work presents a new automatic ranking approach and is suitable for a broad range of gradation problems. Furthermore, we can use this model to grade all such attacks globally and visualize them to provide new insights.

KR-WordRank : An Unsupervised Korean Word Extraction Method Based on WordRank (KR-WordRank : WordRank를 개선한 비지도학습 기반 한국어 단어 추출 방법)

  • Kim, Hyun-Joong;Cho, Sungzoon;Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers / v.40 no.1 / pp.18-33 / 2014
  • A word is the smallest unit for text analysis, and the premise behind most text-mining algorithms is that the words in given documents can be perfectly recognized. However, newly coined words, spelling and spacing errors, and domain adaptation problems make it difficult to recognize words correctly. To make matters worse, obtaining a sufficient amount of training data that can be used in any situation is not only unrealistic but also inefficient. Therefore, an automatic word extraction method that does not require a training process is badly needed. WordRank, the most widely used unsupervised word extraction algorithm for Chinese and Japanese, shows poor word extraction performance in Korean due to different language structures. In this paper, we first discuss why WordRank performs poorly in Korean, and then propose a customized WordRank algorithm for Korean, named KR-WordRank, that considers the language's linguistic characteristics and improves robustness to noise in text documents. Experimental results show that the performance of KR-WordRank is significantly better than that of the original WordRank in Korean. In addition, we find that our proposed algorithm can not only extract proper words but also identify candidate keywords for effective document summarization.

Supervised Rank Normalization for Support Vector Machines (SVM을 위한 교사 랭크 정규화)

  • Lee, Soojong;Heo, Gyeongyong
    • Journal of the Korea Society of Computer and Information / v.18 no.11 / pp.31-38 / 2013
  • Feature normalization as a pre-processing step has been widely used in classification problems to reduce the effect of differing scales across feature dimensions and, as a result, the error. Most existing methods, however, assume some distribution function for the features. Even worse, existing methods do not use the labels of data points and, as a result, do not guarantee the optimality of the normalization results in classification. In this paper, we propose a supervised rank normalization method, which combines rank normalization with a supervised learning technique. Like rank normalization, the proposed method does not assume any feature distribution, and it uses the class labels of nearest neighbors in classification to reduce error. Because SVM, in particular, tries to draw a decision boundary in the middle of the class-overlapping zone, reducing the data density in that area helps SVM find a decision boundary that reduces generalization error. These claims are verified through experimental results.
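As background for this abstract, the sketch below shows plain (unsupervised) rank normalization of a single feature, the baseline that the proposed supervised variant builds on. It does not implement the paper's supervised method; the function name, the scaling into [0, 1], and the average-rank treatment of ties are assumptions chosen for illustration.

```python
def rank_normalize(values):
    """Map each value to its average rank scaled into [0, 1]; ties share a rank."""
    n = len(values)
    if n == 1:
        return [0.5]
    # Indices of the values in ascending order.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        # Extend `end` over a run of tied values.
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2.0
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return [r / (n - 1) for r in ranks]

# Hypothetical feature column with very different magnitudes:
# only the ordering of the values survives normalization.
normalized = rank_normalize([10, 1000, 20])
```

Since only the ordering of values is used, no distribution is assumed for the feature, which matches the property the abstract attributes to rank normalization.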