[ASK] Confusion between top_k and by_threshold relevancy_method in python_evaluation.py #2140

mnhqut · 2024-08-10T15:55:17Z

Description

What is the different between k and threshold value here if they all get assigned to top_k?
Isn't threshold supposed to be a rating value that the predictions should exceed instead of being the number of items in the top_k list ?
Thanks.

def merge_ranking_true_pred(
    rating_true,
    rating_pred,
    col_user,
    col_item,
    col_prediction,
    relevancy_method,
    k=DEFAULT_K,
    threshold=DEFAULT_THRESHOLD,
    **_,
):
    if relevancy_method == "top_k":
        top_k = k
    elif relevancy_method == "by_threshold":
        top_k = threshold

Other Comments

The text was updated successfully, but these errors were encountered:

daviddavo · 2024-08-12T09:58:52Z

I don't really understand it neither. It calls get_top_k_items, which does this:

recommenders/recommenders/evaluation/python_evaluation.py

Lines 859 to 871 in 4f86e47

    
           # Sort dataframe by col_user and (top k) col_rating 
        
           if k is None: 
        
               top_k_items = dataframe 
        
           else: 
        
               top_k_items = ( 
        
                   dataframe.sort_values([col_user, col_rating], ascending=[True, False]) 
        
                   .groupby(col_user, as_index=False) 
        
                   .head(k) 
        
                   .reset_index(drop=True) 
        
               ) 
        
           # Add ranks 
        
           top_k_items["rank"] = top_k_items.groupby(col_user, sort=False).cumcount() + 1 
        
           return top_k_items

With k being top_k (threshold). It then calls .head(k), but according to pandas docs, k should be an integer.

The DEFAULT_THRESHOLD variable is just 10, an integer. Have you tried using a float? What happens then? Does it work? Isn't this exactly the same as using k?

I think a user expects by_threshold to just return just items that have a rating above that threshold.

mnhqut · 2024-08-15T16:17:35Z

Yes it is indeed very confusing how 'by_threshold' works.

Doing a quick test, I passed "relevancy_method = "top_k", k =20 ", and got the exact same result as passing "relevancy_method = "by_threshold", threshold = 20, k = whatever ". So the threshold here does not seem to represent a (float) rating value.

As for the type of 'top_k', if I passed 'threshold' or 'k' as a float, it raised no error and worked as if the number got rounded up, for example 2.1 as 3 or 7.6 as 8.

mnhqut added the help wanted Need help from developers label Aug 10, 2024

daviddavo mentioned this issue Aug 29, 2024

[BUG] Ranking Evaluation Metrics Exceed 1 with "by_threshold" Relevancy Method #2154

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ASK] Confusion between top_k and by_threshold relevancy_method in python_evaluation.py #2140

[ASK] Confusion between top_k and by_threshold relevancy_method in python_evaluation.py #2140

mnhqut commented Aug 10, 2024

daviddavo commented Aug 12, 2024

mnhqut commented Aug 15, 2024