Skip to content

ICU-23137 ICU tests fail if data is updated to current CLDR #3526

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mihnita
Copy link
Contributor

@mihnita mihnita commented Jun 17, 2025

DO NOT MERGE!!!

Checklist

  • Required: Issue filed: ICU-23137
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

@grhoten
Copy link
Member

grhoten commented Jun 18, 2025

Are your tools and data out of sync? The latest cldr-to-icu has the required changes, and they worked for me. I tested my changes with my CLDR changes to get the RBNF data to be correctly derived. I saw no changes, except for the new Polish rules, which is expected.

What is the tag that you're using for release-48-m2? I'd like to confirm that it has the latest CLDR structure for RBNF, and these differences hint at a mismatch.

@mihnita
Copy link
Contributor Author

mihnita commented Jun 19, 2025

Are your tools and data out of sync? The latest cldr-to-icu has the required changes, and they worked for me. I tested my changes with my CLDR changes to get the RBNF data to be correctly derived. I saw no changes, except for the new Polish rules, which is expected.

I used the HEAD for the ICU repo.
For cldr and cldr-release I tried both at HEAD and at release-48-m2

What is the tag that you're using for release-48-m2? I'd like to confirm that it has the latest CLDR structure for RBNF, and these differences hint at a mismatch.

That is the tag, release-48-m2.
You can do a git tag | grep 48 to see it, in both cldr and cldr-release

@grhoten
Copy link
Member

grhoten commented Jun 19, 2025

I used the HEAD for ICU and CLDR, and I could not reproduce this issue. That includes the tools. I followed these cldr-to-icu instructions. I got a completely different set of failures, but the RBNF data looks fine.

On a related note, release-48-m2 has the incompatible old data structure. It's too old. I didn't think it would be helpful to support the old syntax with the new tool. I'd figure that you want to use the new data with the new tool. If you really want to use the old data structure, you may want to consider using the previous version of RbnfMapper.java in cldr-to-icu tool to get the old behavior.

@mihnita
Copy link
Contributor Author

mihnita commented Jun 20, 2025

I just tried:


Minimal steps, ready to copy-paste (in CLI or a shell script)

Set variables

# WARNING: if these are in a script the script should be sourced, not called normally
export BASE_DIR=$PWD
export ICU_DIR=$BASE_DIR/icu
export CLDR_DIR=$BASE_DIR/cldr
export CLDR_DATA_DIR=$BASE_DIR/cldr-staging/production

CLDR 2 icu4c .txt files

cd "$ICU_DIR"
mvn clean install -q -f icu4j -DskipTests -DskipITs
cd "$CLDR_DIR"
mvn clean install -q -pl :cldr-all,:cldr-code -DskipTests -DskipITs
cd "$ICU_DIR/tools/cldr/cldr-to-icu/"
mvn clean package -q -DskipTests -DskipITs
java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar

icu4c .txt to icu4j .res and other binary files

cd "$ICU_DIR/icu4c/source"
ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ./runConfigureICU macOS/gcc --disable-layoutex
make clean
make -j24
make JAR=jar ICU4J_ROOT=$ICU_DIR/icu4j -j24 icu4j-data-install
cd "$ICU_DIR/icu4j"
./extract-data-files.sh

Test icu4c

cd "$ICU_DIR/icu4c/source"
./runConfigureICU macOS/gcc
make clean
make -j24
make check

Test icu4j

cd "$ICU_DIR/icu4j"
mvn package

Done on macOS (hence the runConfigureICU macOS/gcc), but I first bumped into this on Linux.

$ mvn -version

Apache Maven 3.9.6 (bc0240f3c744dd6b6ec2920b3cd08dcc295161ae)
Maven home: /Users/mnita/apps/maven
Java version: 11.0.27, vendor: Eclipse Adoptium, runtime: /Users/mnita/apps/jdk-11.0.27+6/Contents/Home
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "15.5", arch: "aarch64", family: "mac"

@grhoten
Copy link
Member

grhoten commented Jun 20, 2025

I believe that this is what I ran (minus some aimless wandering). Perhaps you're missing some steps, like copying the dtd files?

cd ~/Development/icu/
export ICU_ROOT=`pwd`
export SRC_ROOT=$ICU_ROOT/icu/icu4c/source
export CLDR_ROOT=$ICU_ROOT/cldr
export TOOLS_ROOT=$ICU_ROOT/icu/tools
export ANT_OPTS="-Xmx8192m"
export CLDR_ROOT=~/Development/cldr
export CLDR_DIR=~/Development/cldr
export ICU4C_DIR=$HOME/icu/icu4c
export ICU4J_ROOT=$HOME/icu/icu4j
export TOOLS_ROOT=$HOME/icu/tools
export ICU4C_DIR=/Users/grhoten/Development/icu/icu4c
export ICU4J_ROOT=/Users/grhoten/Development/icu/icu4j
export TOOLS_ROOT=/Users/grhoten/Development/icu/tools
export CLDR_TMP_DIR=~/Development/cldr-staging
export ICU_DIR=~/Development/icu
export CLDR_DATA_DIR=~/Development/cldr-staging/production
cd $ICU4J_ROOT
/usr/local/apache-maven/bin/mvn clean
cp -p $CLDR_DIR/common/dtd/ldml.dtd $ICU4C_DIR/source/data/dtd/cldr/common/dtd/
cp -p $CLDR_DIR/common/dtd/ldmlICU.dtd $ICU4C_DIR/source/data/dtd/cldr/common/dtd/
/usr/local/apache-maven/bin/mvn verify
cd "$CLDR_DIR"
/usr/local/apache-maven/bin/mvn clean install -pl :cldr-all,:cldr-code -DskipTests -DskipITs
cd $ICU4J_ROOT
/usr/local/apache-maven/bin/mvn install
cd "$CLDR_DIR"
/usr/local/apache-maven/bin/mvn clean install -pl :cldr-all,:cldr-code -DskipTests -DskipITs
cd $ICU4C_DIR/source/data
/usr/local/apache-ant/bin/ant cleanprod
/usr/local/apache-ant/bin/ant setup
/usr/local/apache-ant/bin/ant proddata
cd $TOOLS_ROOT/cldr/cldr-to-icu
/usr/local/apache-maven/bin/mvn clean package -DskipTests -DskipITs
java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar --cldrDataDir="$CLDR_TMP_DIR/production"

At the last step, I got the results in #3528. I didn't go further because it's obvious that we're not getting the same results at this point. My pull request got the correct results.

Your first 2 pull requests look to be as expected. Your last one is confusing as to why you're getting different results.

@richgillam richgillam added the incomplete Needs work; do not approve/merge as is. label Jun 26, 2025
@mihnita mihnita marked this pull request as draft June 26, 2025 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
incomplete Needs work; do not approve/merge as is.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants