-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
[Intl] Apply localeDisplayPattern and fix locale generation #31337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
} | ||
|
||
// Some languages are translated together with their region, | ||
// i.e. "en_GB" is translated as "British English" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not applicable; we get the language name based on Locale::getPrimaryLanguage
, which is en
for en_GB
already.
Thus we translate the expected language name already.
|
||
// Some languages are not translated | ||
// Example: "az" (Azerbaijani) has no translation in "af" (Afrikaans) | ||
if (null === ($name = $this->languageDataProvider->getName($lang, $displayLocale))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
null API is not applicable, it throws instead. This is try/catched on the calling side already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
besides, this case is translated in fact :D
// i.e. "en_GB" is translated as "British English" | ||
// we don't include these languages though because they mess up | ||
// the name sorting | ||
// $name = $this->langBundle->getLanguageName($displayLocale, $lang, $region); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dead code ($this->langBundle
)
} | ||
|
||
// "as" (Assamese) has no "Variants" block | ||
//if (!$langBundle->get('Variants')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dead code ($langBundle
)
// For example, in German, zh_Hans is "Chinesisch (vereinfacht)". | ||
// The latter is the script part which is already included in the | ||
// extras and will be appended again with the other extras. | ||
if (preg_match('/^(.+)\s+\([^\)]+\)$/', $name, $matches)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this causes the false positive, as some primary language names matches
@@ -409,8 +409,8 @@ | |||
"my": "бірманская", | |||
"my_MM": "бірманская (М’янма (Бірма))", | |||
"nb": "нарвежская (букмол)", | |||
"nb_NO": "нарвежская (Нарвегія)", | |||
"nb_SJ": "нарвежская (Шпіцберген і Ян-Маен)", | |||
"nb_NO": "нарвежская (букмол) (Нарвегія)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we look at the primary language, e.g. nb
we see it already includes (букмол)
(this seems consistent across the changeset).
Other locales use different/no interpunction here, e.g. en
"nb": "Norwegian Bokmål",
"nb_NO": "Norwegian Bokmål (Norway)",
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Google confirms букмол
means Bokmål
:)
@@ -37,6 +37,7 @@ | |||
"ks": "kashmiri", | |||
"ks_IN": "kashmiri (Inde)", | |||
"lu": "luba-katanga", | |||
"lu_CD": "luba-katanga (Congo-Kinshasa)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now included, because it differs from the parent fr
(see https://github.com/unicode-org/icu/blob/master/icu4c/source/data/lang/fr.txt#L349 vs. https://github.com/unicode-org/icu/blob/075cefb2e21f57f4cac1bc2868e93dd1b8c077cc/icu4c/source/data/lang/fr_CA.txt#L33)
"zh_MO": "кытай (Макао Махсус Идарәле Төбәге)", | ||
"zh_SG": "кытай (Сингапур)", | ||
"zh_TW": "кытай (Тайвань)" | ||
"zh_CN": "кытай (тәрҗемә киңәше: аерым алганда, мандарин кытайчасы) (Кытай)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks scary, but seems the legit translation of zh
primary language :| (see https://github.com/unicode-org/icu/blob/master/icu4c/source/data/lang/tt.txt)
playing a bit with google translate i get Chinese (Interpretation of Interpretation: Mandarin Chinese)
, but im not sure if anyone speaks tt
(Tatar?)
im still convinced the current regex wasnt anticipating it explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as we already used this translation for zh
directly (see above), im not sure we should care for now.
From here on we should rely on native speakers to add a possible "per case fix" IMHO.
to obtain the To do so, i've refactored the LocaleDataGenerator to extend from AbstractDataGenerator, just like the other data generators. It's makes it a bit easier to follow IMHO, and a decent amount of logic was duplicated. cc @webmozart |
I think this is up to ICU to follow yes/no, they don't :) As such we can replace the |
Ready for me 👍 , i'll send a follow up PR for master, to patch the timezone generator. |
This diff should be re-applied on 4.2: https://www.diffchecker.com/m77BMN8j Sorry if this is causing inconvenience :) The important change here is we expose I realized i've effectively rebuild this in master when adding timezones:
This should leverage the BundleEntryReaderInterface instead. |
@ro0NL Can you create a PR? |
See #31402 |
Thank you @ro0NL. |
…ion (ro0NL) This PR was squashed before being merged into the 3.4 branch (closes #31337). Discussion ---------- [Intl] Apply localeDisplayPattern and fix locale generation | Q | A | ------------- | --- | Branch? | 3.4 | Bug fix? | yes | New feature? | no | BC breaks? | no (few data changes) | Deprecations? | no | Tests pass? | yes (including intl-data group) | Fixed tickets | #... <!-- #-prefixed issue number(s), if any --> | License | MIT | Doc PR | symfony/symfony-docs#... <!-- required for new features --> See https://github.com/unicode-org/icu/blob/e2d85306162d3a0691b070b4f0a73e4012433444/icu4c/source/data/lang/en.txt#L1281-L1285 Technically, this should be applied here: https://github.com/symfony/symfony/blob/2b923a7c03e1ed1540c9dba224242defd9aa75cd/src/Symfony/Component/Intl/Data/Generator/LocaleDataGenerator.php#L211 This PR aims to implement it, but before it got to this point i noticed a false positive during generation (AFAIK). The current state solves this issue first. While at it, i cleaned up dead-code in `update-data.php` to reduce the noise. Commits ------- a20a6cc recompile 29e8aba [Intl] Apply localeDisplayPattern and fix locale generation
…neration (ro0NL) This PR was squashed before being merged into the 4.2 branch (closes #31402). Discussion ---------- [Intl][4.2] Apply localeDisplayPattern and fix locale generation | Q | A | ------------- | --- | Branch? | 4.2 | Bug fix? | yes | New feature? | no | BC breaks? | no <!-- see https://symfony.com/bc --> | Deprecations? | no | Tests pass? | yes <!-- please add some, will be required by reviewers --> | Fixed tickets | #... <!-- #-prefixed issue number(s), if any --> | License | MIT | Doc PR | symfony/symfony-docs#... <!-- required for new features --> Follow up of #31337 for 4.2 Commits ------- c902c8a recompile a3164de re-apply translator parents.json generation 294ae7a [Intl] Apply localeDisplayPattern and fix locale generation
…neration (ro0NL) This PR was squashed before being merged into the 4.3-dev branch (closes #31403). Discussion ---------- [Intl][4.3] Apply localeDisplayPattern and fix locale generation | Q | A | ------------- | --- | Branch? | master | Bug fix? | yes | New feature? | no | BC breaks? | no <!-- see https://symfony.com/bc --> | Deprecations? | no | Tests pass? | yes <!-- please add some, will be required by reviewers --> | Fixed tickets | #... <!-- #-prefixed issue number(s), if any --> | License | MIT | Doc PR | symfony/symfony-docs#... <!-- required for new features --> Follow up of #31337 & #31402 for 4.3 / master Commits ------- ed04f24 recompile 504db27 re-apply api changes for TimezoneDataGenerator c443e78 re-apply translator parents.json generation bf50b61 [Intl] Apply localeDisplayPattern and fix locale generation
See https://github.com/unicode-org/icu/blob/e2d85306162d3a0691b070b4f0a73e4012433444/icu4c/source/data/lang/en.txt#L1281-L1285
Technically, this should be applied here:
symfony/src/Symfony/Component/Intl/Data/Generator/LocaleDataGenerator.php
Line 211 in 2b923a7
This PR aims to implement it, but before it got to this point i noticed a false positive during generation (AFAIK). The current state solves this issue first.
While at it, i cleaned up dead-code in
update-data.php
to reduce the noise.