|
| 1 | +from typing import Any |
| 2 | + |
| 3 | +from langchain.prompts import ChatPromptTemplate |
| 4 | +from langchain.schema import AIMessage |
| 5 | + |
| 6 | +from kitsune.llm.l10n.config import L10N_PROTECTED_TERMS |
| 7 | + |
| 8 | + |
| 9 | +TRANSLATION_INSTRUCTIONS = """ |
| 10 | +# Role and task |
| 11 | +- You are an expert at translating technical documents written in Wiki syntax about Mozilla's products from {{ source_language }} to {{ target_language }}. |
| 12 | +- Your task is to translate the given {{ source_language }} text into {{ target_language }}, **strictly obeying** the task instructions. |
| 13 | +- You may be given a "prior translation" as well. If so, the task instructions will describe how to use the "prior translation" to complete your task. |
| 14 | +
|
| 15 | +# Definitions |
| 16 | +Remember the following definitions. You will use them to complete your task. |
| 17 | +
|
| 18 | +## Definition of `wiki-hook` |
| 19 | +A `wiki-hook` is a string that case-sensitively matches the regular expression pattern that follows: |
| 20 | +
|
| 21 | +```python |
| 22 | +r"\[\[(Image|Video|V|Button|UI|Include|I|Template|T):.*?\]\]" |
| 23 | +``` |
| 24 | +
|
| 25 | +## Definition of `wiki-article-link` |
| 26 | +A `wiki-article-link` is a string that case-sensitively matches the regular expression pattern that follows: |
| 27 | +
|
| 28 | +```python |
| 29 | +r"\[\[(?!Image:|Video:|V:|Button:|UI:|Include:|I:|Template:|T:)[^|]+?(?:\|(?P<description>.+?))?\]\]" |
| 30 | +``` |
| 31 | +
|
| 32 | +## Definition of `wiki-external-link` |
| 33 | +A `wiki-external-link` is a string that case-sensitively matches the regular expression pattern that follows: |
| 34 | +
|
| 35 | +```python |
| 36 | +r"\[((mailto:|git://|irc://|https?://|ftp://|/)[^<>\]\[\x00-\x20\x7f]*)\s*(?P<description>.*?)\]" |
| 37 | +``` |
| 38 | +
|
| 39 | +## Definition of `prior-translation-wiki-map` |
| 40 | +- The `prior-translation-wiki-map` is a Python `dict` built from the "prior translation", if it is provided. |
| 41 | +- A Python `dict` maps keys to their values. |
| 42 | +- Each `wiki-hook`, `wiki-article-link`, and `wiki-external-link` in the {{ source_language }} text of the "prior translation" becomes a key in the `prior-translation-wiki-map`, and each key's value is its corresponding translation found in the {{ target_language }} text of the "prior translation". |
| 43 | +- If no "prior translation" is provided, the `prior-translation-wiki-map` should be set to an empty `dict`. |
| 44 | +
|
| 45 | +# Rules for translating special strings |
| 46 | +- Each of the following rules describes how to translate special strings that may be present within the {{ source_language }} text that you are translating. |
| 47 | +- **Strictly obey** all of these rules when freshly translating {{ source_language }} text. |
| 48 | +
|
| 49 | +1. **Preserve unchanged** each of the following strings (each is wrapped with single backticks): |
| 50 | +{%- for term in protected_terms %} |
| 51 | + - `{{ term }}` |
| 52 | +{%- endfor %} |
| 53 | +
|
| 54 | +2. **Preserve unchanged** each string that case-sensitively matches the following regular expression: |
| 55 | +
|
| 56 | + ```python |
| 57 | + r"\{(for|key|filepath) .*?\}" |
| 58 | + ``` |
| 59 | +
|
| 60 | +3. For each string that case-sensitively matches the following regular expression: |
| 61 | +
|
| 62 | + ```python |
| 63 | + r"\{(?:button|menu|pref) (?P<description>.*?)\}" |
| 64 | + ``` |
| 65 | +
|
| 66 | + - Translate only the text matched by the named group `description` (**remember to obey rule #1 above**), and **preserve the rest of the string unchanged**. |
| 67 | +
|
| 68 | +4. For each `wiki-hook`, perform the following steps: |
| 69 | + - First, check if the `wiki-hook` is a key within the `prior-translation-wiki-map`. |
| 70 | + - If it is a key within the `prior-translation-wiki-map`, use its value from the `prior-translation-wiki-map` as its translation. |
| 71 | + - If it is **not** a key within the `prior-translation-wiki-map`, **preserve it unchanged**. |
| 72 | +
|
| 73 | +5. For each `wiki-article-link`, perform the following steps: |
| 74 | + - First, check if the `wiki-article-link` is a key within the `prior-translation-wiki-map`. |
| 75 | + - If it is a key within the `prior-translation-wiki-map`, use its value from the `prior-translation-wiki-map` as its translation. |
| 76 | + - If it is **not** a key within the `prior-translation-wiki-map`, translate only the text matched by the named group `description` (**remember to obey rule #1 above**), and **preserve the rest unchanged**. |
| 77 | +
|
| 78 | +6. For each `wiki-external-link`, perform the following steps: |
| 79 | + - First, check if the `wiki-external-link` is a key within the `prior-translation-wiki-map`. |
| 80 | + - If it is a key within the `prior-translation-wiki-map`, use its value from the `prior-translation-wiki-map` as its translation. |
| 81 | + - If it is **not** a key within the `prior-translation-wiki-map`, translate only the text matched by the named group `description` (**remember to obey rule #1 above**), and **preserve the rest unchanged**. |
| 82 | +
|
| 83 | +# Task Instructions |
| 84 | +1. **Build the `prior-translation-wiki-map`**. If no "prior translation" is provided, set the `prior-translation-wiki-map` to an empty `dict`. |
| 85 | +2. **Compare** the {{ source_language }} text you've been asked to translate with the {{ source_language }} text of the "prior translation", if provided, and **determine which parts are the same and which parts are different**. If no "prior translation" is provided, consider the entire {{ source_language }} text you've been asked to translate as different. |
| 86 | +3. For each part that is the same, **copy** its corresponding translation from the {{ target_language }} text of the "prior translation". |
| 87 | +4. For each part that is different, **freshly translate** that part. **Remember to obey the `Rules for translating special strings`**. |
| 88 | +5. **Combine** the copied parts and the freshly translated parts into a final translation. |
| 89 | +6. In your response, include your final translation and an explanation describing what you did for each step. |
| 90 | +
|
| 91 | +# Response Format |
| 92 | +Use the following template to format your response: |
| 93 | +<<begin-translation>> |
| 94 | +{ translation } |
| 95 | +<<end-translation>> |
| 96 | +<<begin-explanation>> |
| 97 | +{ explanation } |
| 98 | +<<end-explanation>> |
| 99 | +""" |
| 100 | + |
| 101 | +SOURCE_ARTICLE = """ |
| 102 | +# Prior translation |
| 103 | +
|
| 104 | +{% if prior_translation -%} |
| 105 | +## The {{ source_language }} text of the prior translation |
| 106 | +
|
| 107 | +```wiki |
| 108 | +{{ prior_translation.source_text|safe }} |
| 109 | +``` |
| 110 | +
|
| 111 | +## The {{ target_language }} text of the prior translation |
| 112 | +
|
| 113 | +```wiki |
| 114 | +{{ prior_translation.target_text|safe }} |
| 115 | +``` |
| 116 | +{%- else -%} |
| 117 | +There is no prior translation. |
| 118 | +{%- endif %} |
| 119 | +
|
| 120 | +# The {{ source_language }} text to translate |
| 121 | +
|
| 122 | +```wiki |
| 123 | +{{ source_text|safe }} |
| 124 | +``` |
| 125 | +""" |
| 126 | + |
| 127 | + |
| 128 | +def translation_parser(message: AIMessage) -> dict[str, Any]: |
| 129 | + """ |
| 130 | + Parses the LLM's response to a translation request and returns a dictionary with |
| 131 | + the translation and the explanation. The response is parsed by its begin/end markers |
| 132 | + rather than with the StructuredOutputParser, because special characters in the |
| 133 | + translation and the explanation often caused JSON decode errors with that parser. |
| 134 | + """ |
| 135 | + result = {} |
| 136 | + content = message.content |
| 137 | + for name in ("translation", "explanation"): |
| 138 | + result[name] = content.split(f"<<begin-{name}>>")[-1].split(f"<<end-{name}>>")[0].strip() |
| 139 | + return result |
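The marker-based parsing above can be tried without LangChain. The following is a minimal sketch, not the module's code: `parse_marked_sections` is a hypothetical stand-alone helper that applies the same split logic to a plain string, and the sample response is invented for illustration.

```python
def parse_marked_sections(content: str) -> dict[str, str]:
    """Extract the text between <<begin-NAME>> and <<end-NAME>> markers."""
    result = {}
    for name in ("translation", "explanation"):
        # Same split-based extraction as translation_parser in the diff:
        # keep what follows the last begin marker, then cut at the end marker.
        after_begin = content.split(f"<<begin-{name}>>")[-1]
        result[name] = after_begin.split(f"<<end-{name}>>")[0].strip()
    return result


# Invented sample response containing braces and quotes, the kind of
# characters the docstring says were troublesome for JSON-based parsing.
sample = """
<<begin-translation>>
Haga clic en {button Guardar} para "continuar".
<<end-translation>>
<<begin-explanation>>
No prior translation was provided, so everything was freshly translated.
<<end-explanation>>
"""

parsed = parse_marked_sections(sample)
print(parsed["translation"])
print(parsed["explanation"])
```

One property worth noting: if a marker is missing, `str.split` returns the whole string as its only element, so this approach falls back to returning the entire content for that key rather than raising an error. Callers may want to check for that case.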
| 140 | + |
| 141 | + |
| 142 | +translation_prompt = ChatPromptTemplate( |
| 143 | + ( |
| 144 | + ("system", TRANSLATION_INSTRUCTIONS), |
| 145 | + ("human", SOURCE_ARTICLE), |
| 146 | + ), |
| 147 | + template_format="jinja2", |
| 148 | +).partial(protected_terms=L10N_PROTECTED_TERMS) |
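The three patterns defined in `TRANSLATION_INSTRUCTIONS` can also be exercised directly with Python's `re` module, which may help when checking what the model is being asked to recognize. This is a rough sketch under stated assumptions: the patterns are copied from the prompt, while `find_special_strings` and the sample wiki text are hypothetical and not part of this change.

```python
import re

# Patterns copied verbatim from the definitions in TRANSLATION_INSTRUCTIONS.
WIKI_HOOK = re.compile(r"\[\[(Image|Video|V|Button|UI|Include|I|Template|T):.*?\]\]")
WIKI_ARTICLE_LINK = re.compile(
    r"\[\[(?!Image:|Video:|V:|Button:|UI:|Include:|I:|Template:|T:)"
    r"[^|]+?(?:\|(?P<description>.+?))?\]\]"
)
WIKI_EXTERNAL_LINK = re.compile(
    r"\[((mailto:|git://|irc://|https?://|ftp://|/)[^<>\]\[\x00-\x20\x7f]*)"
    r"\s*(?P<description>.*?)\]"
)


def find_special_strings(text: str) -> dict[str, list[str]]:
    """Hypothetical helper: list the full matches for each pattern."""
    patterns = {
        "wiki-hook": WIKI_HOOK,
        "wiki-article-link": WIKI_ARTICLE_LINK,
        "wiki-external-link": WIKI_EXTERNAL_LINK,
    }
    return {
        name: [match.group(0) for match in pattern.finditer(text)]
        for name, pattern in patterns.items()
    }


# Invented sample text containing one string of each kind.
sample_text = (
    "See [[Image:menu.png]] and [[Clear cookies|how to clear cookies]] "
    "or [https://example.com/help the help page]."
)

found = find_special_strings(sample_text)
for name, matches in found.items():
    print(name, matches)

# The named group isolates the only part that rules 5 and 6 allow to be translated.
article = WIKI_ARTICLE_LINK.search(sample_text)
external = WIKI_EXTERNAL_LINK.search(sample_text)
print(article.group("description"))
print(external.group("description"))
```

The negative lookahead in the article-link pattern is what keeps `[[Image:menu.png]]` from being treated as an article link, so each sample string falls into exactly one category here.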