Funding: Natural Science Foundation Project of Chongqing (No. CSTB2023NSCQ-MSX0343); Science and Technology Research Programme of Chongqing Municipal Education Commission (No. KJZD-K202101105); Humanities and Social Sciences Research Programme of Chongqing Municipal Education Commission (No. 22SKGH302); Chongqing Municipal Entrepreneurship and Innovation Support Project for Returned Overseas (No. cx2021087); National Natural Science Foundation of China (No. 61702063).
Abstract: Exploiting the hierarchical dependence behind user behaviour is critical for click-through rate (CTR) prediction in recommender systems. Existing methods apply attention mechanisms to obtain the weights of items; however, the authors argue that deterministic attention mechanisms cannot capture the hierarchical dependence between user behaviours: they treat each behaviour as an independent individual and therefore cannot accurately express users' flexible and changeable interests. To tackle this issue, the authors introduce Bayesian attention into the CTR prediction model, treating attention weights as data-dependent local random variables and learning their distribution by approximating their posterior. Specifically, prior knowledge is encoded into the attention weight distribution, and posterior inference is then utilised to capture implicit and flexible user intentions. Extensive experiments on public datasets demonstrate that the proposed algorithm outperforms state-of-the-art algorithms, and empirical evidence shows that stochastic attention weights predict user intentions better than deterministic ones.
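The core idea, attention weights as data-dependent local random variables trained by approximate posterior inference against a prior, can be sketched in a few lines. The sketch below is illustrative only: it assumes a PyTorch setting, a Lognormal variational posterior over per-item weights with a fixed Lognormal prior, and the reparameterisation trick. The module and parameter names are hypothetical and not the authors' implementation, whose exact distributions and objective may differ.

```python
# Minimal sketch of Bayesian attention over user-behaviour items
# (assumed design: Lognormal posterior/prior, reparameterised sampling).

import torch
import torch.nn as nn


class BayesianAttention(nn.Module):
    def __init__(self, dim: int, prior_sigma: float = 1.0):
        super().__init__()
        # Maps (behaviour, target) pairs to the posterior parameters
        # (mu, log_sigma) of each item's attention weight.
        self.score = nn.Linear(2 * dim, 2)
        self.prior_sigma = prior_sigma

    def forward(self, target: torch.Tensor, behaviours: torch.Tensor):
        # target: (B, D) candidate item; behaviours: (B, T, D) user history.
        tgt = target.unsqueeze(1).expand_as(behaviours)
        params = self.score(torch.cat([behaviours, tgt], dim=-1))
        mu, log_sigma = params[..., 0], params[..., 1].clamp(-5.0, 2.0)
        sigma = log_sigma.exp()

        # Data-dependent local random variables: sample the weights via the
        # reparameterisation trick in training, use the posterior mean at
        # inference time.
        posterior = torch.distributions.LogNormal(mu, sigma)
        w = posterior.rsample() if self.training else posterior.mean
        w = w / w.sum(dim=1, keepdim=True).clamp_min(1e-8)  # normalise

        # KL(q || p) between two Lognormals equals the KL between their
        # underlying Normals; this term regularises the posterior towards
        # the prior.
        kl = torch.distributions.kl_divergence(
            torch.distributions.Normal(mu, sigma),
            torch.distributions.Normal(torch.zeros_like(mu), self.prior_sigma),
        ).mean()

        interest = (w.unsqueeze(-1) * behaviours).sum(dim=1)  # (B, D)
        return interest, kl
```

In such a setup, the returned kl term would be added, suitably weighted, to the click-prediction loss, yielding the variational objective that the abstract's "approximating their posterior distribution" alludes to.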