Skip to content

Introduce a product classifier #6680

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 5, 2025

Conversation

akatsoulas
Copy link
Collaborator

No description provided.

@akatsoulas akatsoulas marked this pull request as draft May 27, 2025 13:48
@akatsoulas akatsoulas force-pushed the product-classifier branch from 8c88f36 to 082d72b Compare May 27, 2025 13:54
Copy link
Contributor

@escattone escattone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I like this approach. I'd suggest one refinement. I think we should only run the product re-classification when we know that the only reason the question was marked as spam was because of its irrelevance to its original product. In other words, it was a legitimate support request, but not relevant to its original product classification. The spam classification step can provide that extra information in its result without changing its current scope.

I created a PR based on your work here that shows what I'm thinking. I tested it using two examples that Emil provided and it worked great.

Copy link
Contributor

@escattone escattone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a nice step forward.

r+wc

Comment on lines 192 to 193
and hasattr(question, "product")
and getattr(question.product, "title", None) != new_product_title
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. The hasattr and getattr aren't necessary, and I think using:

        and question.product
        and (question.product.title != new_product_title)

is more readable too.

and hasattr(question, "product")
and getattr(question.product, "title", None) != new_product_title
):
from kitsune.products.models import Product
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed. Product is already imported above.

try:
obj = (manager or model.objects).get(title=title)
except (model.DoesNotExist, getattr(model, "MultipleObjectsReturned", Exception)) as exc:
log.warning(f"{log_msg} Exception: {exc}", exc_info=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to send this exception to Sentry, since it indicates that either the LLM has selected a bad product or topic title, or something's wrong with the products or topics that we've provided to the LLM in the prompt.

if new_product := get_object_by_title(
Product,
new_product_title,
f"LLM suggested product {new_product_title} does not exist. Skipping product update.",
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should also be restricted to active products, so I'd recommend adding manager=Product.active.

Comment on lines 219 to 222
if topic := update_fields.get("topic"):
question.clear_cached_tags()
question.tags.clear()
question.auto_tag()
Copy link
Contributor

@escattone escattone Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that we'd also need to re-do the question's tags for the case when the question's product has been updated (since the product slug is a tag), but the topic wasn't -- which I guess means the question was moved to a different product, but was also considered spam under that product? Maybe impossible, but maybe not? I'm thinking that it would be better to do something like the following instead:

        # Always update the tags, because we've updated product and/or topic fields.
        question.clear_cached_tags()
        question.tags.clear()
        question.auto_tag()
        # Only flag the question if we've updated the topic.
        if topic := update_fields.get("topic"):
            flag_question(
                ...
            )

@akatsoulas akatsoulas force-pushed the product-classifier branch from 678b098 to 603bda6 Compare June 5, 2025 09:20
@akatsoulas akatsoulas merged commit cc440ea into mozilla:main Jun 5, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants