Difference between revisions of "Transform Framework"

From Lingoport Wiki
m (Olibouban moved page Other File Types to Transform Framework)
 
(22 intermediate revisions by the same user not shown)
Line 1: Line 1:
LRM supports a number of file types out of the box (See [[Supported Resource Bundles]]). However, other file types may represent user facing strings to be translated. In that case, some customization is required to on-board those projects.
+
Localyzer supports a number of file types out of the box (See [[Supported Resource Bundles]]). However, other file types may represent user facing strings to be translated. In that case, some customization is required to on-board those projects.
 
The '''bash script transform framework''' facilitates the customization.
 
The '''bash script transform framework''' facilitates the customization.
   
 
= Analyze the file types =
 
= Analyze the file types =
If the file types fall into a category not supported by LRM out of the box, the first thing to do is to see what is the closest file types supported by LRM.
+
If the file types fall into a category not supported by Localyzer out of the box, the first thing to do is to see what is the closest file types supported by Localyzer and TMSs/LSPs/MTs.
   
 
= Use the transform framework =
 
= Use the transform framework =
The transform framework needs '''three scripts''' in order to fit in with LRM. The three scripts need to be under the <code>$JENKINS_HOME/lingoport/transform/<nameoftransform>/</code> directory.
+
The transform framework needs '''three scripts'''. The three scripts need to be in a repository, under a directory whose name will be used to identify the transform.
 
 
The <nameoftransform> must be indicative of the type of transformation to apply. For instance, it could be <code>loc</code> to handle .loc files (see below). In that case, three scripts will need to be under <code>/var/lib/jenkins/lingoport/transform/loc</code> for a typical installation where the <code>jenkins</code> user is under <code>/var/lib/jenkins</code>.
 
 
   
 
The three scripts to write are:
 
The three scripts to write are:
Line 17: Line 13:
 
* '''transform_files_list.sh''': How to transform the file names from the LRM supported file naming into the repository file naming
 
* '''transform_files_list.sh''': How to transform the file names from the LRM supported file naming into the repository file naming
   
  +
Each script can in turn call other scripts as necessary.
From Jenkins, to specify which transform directory to use, navigate to the project configuration advanced 'transform' text field and enter the directory name, for instance <code>loc</code> :
 
[[File:Transform Jenkins.JPG]]
 
   
  +
For example, under
  +
* https://github.com/Lingoport/CommandCenterConfig, the main branch
  +
a number of transform directories indicate what transform are available, in other words, what directories have those three scripts.
   
  +
For example, for the <code>txt2prop</code> transform, the directory is located :
In this example, the transformation scripts need to be located under <code>$JENKINS_HOME/lingoport/transform/loc</code>.
 
  +
* https://github.com/Lingoport/CommandCenterConfig/tree/main/transforms/txt2prop
   
Note: A configuration file on disk will be generated when running the Automation Jenkins job for that project:
 
 
<code>$JENKINS_HOME/Lingoport_Data/L10nStreamlining/<GROUP>/projects/<PROJECT>/config/transform.properties</code>
 
<pre>
 
l10n.transform=loc
 
</pre>
 
   
 
== Bash Variables ==
 
== Bash Variables ==
Line 34: Line 27:
 
They are set before calling the transform framework.
 
They are set before calling the transform framework.
   
  +
* '''TRANSFORM_DIR''' : The transform scripts directory where the scripts are executed from.
* '''CLIENT_SOURCE_DIR''' : For an LRM project such as CET.json, the CLIENT_SOURCE_DIR would typically be ~jenkins/jobs/CET.json/workspace. Note: This is not necessarily the WORKSPACE of the running Jenkins job from which the transform is called (Dashboard Update for instance).
 
  +
* '''PROJECT_TMP_DIR''' : Where temporary files can be stashed for this project during the script execution
* '''LRM_GROUP_NAME''' : The name of the LRM Group Name (e.g. 'CET' )
 
* '''LRM_PROJECT_NAME''' : The name of the LRM Project Name (e.g. 'json' )
+
* '''FULL_LIST_PATH''' : The list of all the files to be transformed from the translated files back to the repository
* '''TRANSFORM_DIR''' : The transform scripts directory (e.g. 'loc' )
 
   
== Example: .loc files ==
+
== Example: txt2prop files ==
  +
Say the repository contains resource files like the following <code>hmUiMessage.loc</code> file:
 
  +
  +
For a simple example, say the repository contains resource files like the following <code>strings.txt</code> file:
 
<pre>
 
<pre>
  +
# Strings for use in GUI
;hmUiMessage.loc
 
  +
#################################################
;*********************************************************************
 
  +
PushIfCommits,Push if commits
#include hmUiMain.loc
 
  +
CloseButtonText,Close
;*********************************************************************
 
  +
ApplyButtonText,Apply
message1 The first message
 
  +
CancelButtonText,Cancel
message2 The second message
 
message3 The third message
 
message4 The fourth message
 
 
</pre>
 
</pre>
   
The file may not be in ASCII or UTF-8 format; For instance this file is in UTF-16BE
+
The file may not be in ASCII, UTF format, etc.; For instance this file is in UTF-8
   
A supported file format that is close to this one is <code>properties</code>.
+
A supported file format that is close to this one is <code>properties</code>. So the transform script will change the file name, extension, and content, so as to be a standard resource file:
   
  +
<code>strings.properties</code>
== transform_from_repo.sh ==
 
  +
<pre>
  +
# Strings for use in GUI
  +
#################################################
  +
PushIfCommits=Push if commits
  +
CloseButtonText=Close
  +
ApplyButtonText=Apply
  +
CancelButtonText=Cancel
  +
</pre>
  +
  +
=== transform_from_repo.sh ===
 
An <i>example</i> snippet of bash code for this type of file may be something like:
 
An <i>example</i> snippet of bash code for this type of file may be something like:
 
<pre>
 
<pre>
 
#!/bin/bash
 
#!/bin/bash
  +
find . -name "string*\.txt" -type f > "${PROJECT_TMP_DIR}/input_files.txt"
   
  +
cat "${PROJECT_TMP_DIR}/input_files.txt" | while read -r FILEPATH
# Find all the files ending in 'loc'
 
find $CLIENT_SOURCE_DIR -name "*loc" > ~/tmp/input_files.txt
 
 
# Transform each .loc file into a .properties file
 
cat ~/tmp/input_files.txt | while read -r FILEPATH
 
 
do
 
do
 
FILENAME=`basename $FILEPATH`
 
FILENAME=`basename $FILEPATH`
 
DIRNAME=`dirname $FILEPATH`
 
DIRNAME=`dirname $FILEPATH`
  +
SUFFIX=".txt"
file "$FILEPATH"
 
  +
ROOTNAME=${FILENAME%$SUFFIX}
SUFFIX=".loc"
 
  +
TARGET_NAME="${ROOTNAME//-/_}.properties"
ROOTNAME=${FILEPATH%$SUFFIX}
 
  +
TARGET_PATH="${DIRNAME}/${TARGET_NAME}"
TARGET="${ROOTNAME}.properties"
 
iconv -f UTF-16 -t UTF-8 -c "$FILEPATH" > "$TARGET"
+
echo " Transform [$FILENAME] -> [$TARGET_NAME]"
  +
sed -i 's/^#/# #/' "$TARGET"
 
  +
rm $TARGET_PATH 2> /dev/null
sed -i 's/^;/# ;/' "$TARGET"
 
  +
touch $TARGET_PATH
sed -i -e "s/[[:space:]]\+/=/" "$TARGET"
 
  +
sed -i -e "s/^=$//" "$TARGET"
 
  +
sed -i 's/,,/, ,/' $FILEPATH
done
 
  +
cat $FILEPATH | while read -r LINE
  +
do
  +
IFS=',' tokens=( $LINE )
  +
  +
if [ -z "${tokens[1]}" ]
  +
then
  +
echo "${LINE}" >> $TARGET_PATH
  +
else
  +
KEY=${tokens[0]}
  +
VALUE=${LINE#"${KEY},"}
  +
echo "${KEY}=${VALUE}" >> $TARGET_PATH
  +
fi
  +
done
  +
IFS=' '
  +
  +
done
 
</pre>
 
</pre>
   
== transform_to_repo.sh ==
+
=== transform_to_repo.sh ===
 
An <i>example</i> snippet of bash code for this type of file may be something like:
 
An <i>example</i> snippet of bash code for this type of file may be something like:
 
<pre>
 
<pre>
 
#!/bin/bash
 
#!/bin/bash
  +
find . -name "string*\.txt" -type f > "${PROJECT_TMP_DIR}/input_files.txt"
   
  +
cat "${PROJECT_TMP_DIR}/input_files.txt" | while read -r FILEPATH
# Find all the files ending in .properties
 
find $CLIENT_SOURCE_DIR -name "*.properties" > ~/tmp/input_files.txt
 
 
#
 
# Transform each .properties into a .loc
 
#
 
cat ~/tmp/input_files.txt | while read -r FILEPATH
 
 
do
 
do
 
FILENAME=`basename $FILEPATH`
 
FILENAME=`basename $FILEPATH`
 
DIRNAME=`dirname $FILEPATH`
 
DIRNAME=`dirname $FILEPATH`
  +
SUFFIX=".txt"
ls -l "$FILEPATH"
 
  +
ROOTNAME=${FILENAME%$SUFFIX}
SUFFIX=".properties"
 
  +
TARGET_NAME="${ROOTNAME//-/_}.properties"
ROOTNAME=${FILEPATH%$SUFFIX}
 
  +
TARGET_PATH="${DIRNAME}/${TARGET_NAME}"
TARGET="${ROOTNAME}.loc"
 
  +
echo " Transform [$FILENAME] -> [$TARGET_NAME]"
cp "$FILEPATH" "$TARGET"
 
  +
sed -i 's/^#=#/#/' "$TARGET"
 
  +
rm $TARGET_PATH 2> /dev/null
sed -i 's/^#=;/;/' "$TARGET"
 
  +
touch $TARGET_PATH
sed -i -e "s/^#\([[:alnum:]]*\)/;\1/" "$TARGET"
 
  +
sed -i -e "s/\([[:alnum:]]*\)=/\1\t/" "$TARGET"
 
  +
sed -i 's/,,/, ,/' $FILEPATH
iconv -f UTF-8 -t UTF-16 -c "$TARGET" > tmp.tmp
 
  +
cat $FILEPATH | while read -r LINE
mv tmp.tmp "$TARGET"
 
  +
do
done
 
  +
IFS=',' tokens=( $LINE )
  +
  +
if [ -z "${tokens[1]}" ]
  +
then
  +
echo "${LINE}" >> $TARGET_PATH
  +
else
  +
KEY=${tokens[0]}
  +
VALUE=${LINE#"${KEY},"}
  +
echo "${KEY}=${VALUE}" >> $TARGET_PATH
  +
fi
  +
done
  +
IFS=' '
  +
  +
done
 
</pre>
 
</pre>
   
== transform_files_list.sh ==
+
=== transform_files_list.sh ===
 
An <i>example</i> snippet of bash code for this type of file may be something like:
 
An <i>example</i> snippet of bash code for this type of file may be something like:
 
<pre>
 
<pre>
 
#!/bin/bash
 
#!/bin/bash
# Check if there is a parameter
 
 
if [ -z "$1" ]
 
if [ -z "$1" ]
 
then
 
then
Line 121: Line 144:
 
fi
 
fi
   
# If the file exists then do something, otherwise exit
 
 
if [ -f "$1" ]; then
 
if [ -f "$1" ]; then
 
echo " File to rewrite: $1"
 
echo " File to rewrite: $1"
Line 129: Line 151:
 
fi
 
fi
   
  +
# Rename .properties to .loc files inside the list of files passed as a parameter
 
  +
echo " "
sed -i 's/\.properties/.loc/' "$1"
 
  +
echo " --------------------------------------------"
  +
echo " Files to Modify: $1"
  +
echo " for repository formatted files, not LRM OOTB ones"
  +
  +
# strings<locale>.properties -> strings<locale>.txt
  +
echo " >> strings<locale>.properties to strings<locale>.txt"
  +
sed -i 's/\.properties/.txt/' "$1"
  +
sed -i 's/strings_/strings-/' "$1"
  +
sed -i 's/strings-zh_Hans/strings-zh-Hans/' "$1"
  +
sed -i 's/strings-zh_Hant/strings-zh-Hant/' "$1"
  +
sed -i 's/_/-/g' "$1"
  +
  +
echo " "
  +
ls -l "$1"
  +
cat "$1"
  +
echo " --------------------------------------------"
 
</pre>
 
</pre>
  +
  +
== Command Center Settings ==
  +
  +
=== Transform System Files ===
  +
The transform is retrieved like any system file from the repository and set as a system file.
  +
  +
* Put Command Center in Maintenance Mode
  +
* Go to Settings
  +
* Go to System Files
  +
* Add the Transform from the repository
  +
  +
For example:
  +
  +
[[File:Transform System File.jpg|center|600px]]
  +
  +
When the repository is updated and the transform scripts are modified, update from the Edit button above to get the latest. The commit hash indicates which commit was last.
  +
  +
=== Set the Transform for a Project ===
  +
Now that the Transform is ready to be used, a project can be configured to run the transform. To do so, either during the creation or the edit of the project.
  +
1. Make sure the project resource format is that of the transformed file, not the files in the repository. Here, that's .properties files.
  +
2. Click the 'Use transform script' checkbox and select the transform of your choice, here 'txt2prop'
  +
  +
For example:
  +
[[File:Transform Project Configuration.jpg|center|600px]]
  +
  +
  +
Now whenever the project is analyzed or files are sent to translation, the repository is cloned or pulled, the files identified with .txt are transformed into .properties, then analyzed or sent to translation as properties.
  +
  +
Files are received from translation as properties and are analyzed for correctness, etc. as properties. If the validations pass, the files are transformed back into .txt files and pushed to the repository.

Latest revision as of 20:56, 27 June 2024

Localyzer supports a number of file types out of the box (see Supported Resource Bundles). However, other file types may also contain user-facing strings to be translated. In that case, some customization is required to onboard those projects. The bash script transform framework facilitates that customization.

Analyze the file types

If the file types fall into a category not supported by Localyzer out of the box, the first step is to identify the closest file type supported by Localyzer and by the TMSs/LSPs/MTs.

Use the transform framework

The transform framework needs three scripts. The three scripts need to be in a repository, under a directory whose name will be used to identify the transform.

The three scripts to write are:

  • transform_from_repo.sh: How to transform the files from the repository so they fit into a Localyzer-supported file type
  • transform_to_repo.sh: How to transform translated/pseudo-localized files from a Localyzer-supported file type back into the repository file type
  • transform_files_list.sh: How to transform the file names from the Localyzer-supported file naming into the repository file naming

Each script can in turn call other scripts as necessary.
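
As a quick sanity check before registering a transform, the presence of the three scripts can be verified. The following is a minimal sketch only: the function name check_transform_dir is hypothetical and not part of the framework, and it assumes nothing beyond the three script names listed above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the framework): verify that a transform
# directory holds the three scripts named above.
# Prints one line per missing script; returns 0 only when all are present.
check_transform_dir() {
  local dir="$1" script missing=0
  for script in transform_from_repo.sh transform_to_repo.sh transform_files_list.sh
  do
    if [ ! -f "${dir}/${script}" ]; then
      echo "Missing: ${dir}/${script}"
      missing=1
    fi
  done
  return "$missing"
}
```

For instance, check_transform_dir transforms/txt2prop could be run from a local clone of the repository that holds the transforms.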

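For example, the main branch of

  • https://github.com/Lingoport/CommandCenterConfig

contains a number of transform directories. Each of them holds those three scripts and so indicates which transforms are available.

For example, the txt2prop transform is located at:

  • https://github.com/Lingoport/CommandCenterConfig/tree/main/transforms/txt2prop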


Bash Variables

A few Bash variables are available when called from the Lingoport Jenkins jobs that use the transform framework. They are set before calling the transform framework.

  • TRANSFORM_DIR : The transform scripts directory where the scripts are executed from.
  • PROJECT_TMP_DIR : Where temporary files can be stashed for this project during the script execution
  • FULL_LIST_PATH : The list of all the files to be transformed from the translated files back to the repository
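
When a script is run by hand, outside the Jenkins jobs, these variables are not set. A small guard at the top of a script can catch that early. This is a sketch under assumptions: the function name require_transform_vars is made up for illustration, and it assumes all three variables listed above are expected by the script at hand.

```shell
#!/bin/bash
# Hypothetical guard (not part of the framework): report every framework
# variable that is unset or empty. Returns 0 only when all are set.
require_transform_vars() {
  local name missing=0
  for name in TRANSFORM_DIR PROJECT_TMP_DIR FULL_LIST_PATH
  do
    # ${!name:-} reads the variable whose name is stored in $name
    if [ -z "${!name:-}" ]; then
      echo "Error: ${name} is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A script could then start with require_transform_vars || exit 1.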

Example: txt2prop files

For a simple example, say the repository contains resource files like the following strings.txt file:

# Strings for use in GUI
#################################################
PushIfCommits,Push if commits
CloseButtonText,Close
ApplyButtonText,Apply
CancelButtonText,Cancel

The encoding of such files varies from project to project (ASCII, one of the UTF formats, etc.); this file, for instance, is in UTF-8.
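
If the repository files were in another encoding, the transform could first write a UTF-8 copy with iconv. The sketch below is illustrative only: the function name to_utf8 is hypothetical, and the source encoding must be known in advance.

```shell
#!/bin/bash
# Hypothetical helper (not part of the framework): write a UTF-8 copy of a
# file stored in another encoding.
# Usage: to_utf8 <source-encoding> <input-file> <output-file>
to_utf8() {
  local from="$1" input="$2" output="$3"
  iconv -f "$from" -t UTF-8 "$input" > "$output"
}
```

For example, to_utf8 UTF-16BE strings.txt strings.utf8.txt would produce a UTF-8 copy that the rest of the transform can process line by line.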

A supported file format that is close to this one is properties. The transform script therefore changes the file name, extension, and content to produce a standard resource file:

strings.properties

# Strings for use in GUI
#################################################
PushIfCommits=Push if commits
CloseButtonText=Close
ApplyButtonText=Apply
CancelButtonText=Cancel

transform_from_repo.sh

An example snippet of bash code for this type of file may be something like:

#!/bin/bash
# Convert each string*.txt file (KEY,VALUE lines) into a .properties file (KEY=VALUE lines).
find . -name "string*.txt" -type f > "${PROJECT_TMP_DIR}/input_files.txt"

while IFS= read -r FILEPATH
do
  FILENAME=$(basename "$FILEPATH")
  DIRNAME=$(dirname "$FILEPATH")
  SUFFIX=".txt"
  ROOTNAME=${FILENAME%"$SUFFIX"}
  # Locale separators: strings-zh-Hans.txt -> strings_zh_Hans.properties
  TARGET_NAME="${ROOTNAME//-/_}.properties"
  TARGET_PATH="${DIRNAME}/${TARGET_NAME}"
  echo "    Transform [$FILENAME] -> [$TARGET_NAME]"

  # Start from an empty target file
  : > "$TARGET_PATH"

  while IFS= read -r LINE || [ -n "$LINE" ]
  do
    if [[ "$LINE" == \#* || "$LINE" != *,* ]]
    then
      # Comments, blank lines, and lines without a comma are copied as-is
      printf '%s\n' "$LINE" >> "$TARGET_PATH"
    else
      # Split on the first comma only, so values may contain commas
      KEY=${LINE%%,*}
      VALUE=${LINE#*,}
      printf '%s=%s\n' "$KEY" "$VALUE" >> "$TARGET_PATH"
    fi
  done < "$FILEPATH"
done < "${PROJECT_TMP_DIR}/input_files.txt"

transform_to_repo.sh

An example snippet of bash code for this type of file may be something like:

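#!/bin/bash
# Reverse of transform_from_repo.sh: convert each string*.properties file
# (KEY=VALUE lines) back into the repository .txt format (KEY,VALUE lines).
find . -name "string*.properties" -type f > "${PROJECT_TMP_DIR}/input_files.txt"

while IFS= read -r FILEPATH
do
  FILENAME=$(basename "$FILEPATH")
  DIRNAME=$(dirname "$FILEPATH")
  SUFFIX=".properties"
  ROOTNAME=${FILENAME%"$SUFFIX"}
  # Locale separators: strings_zh_Hans.properties -> strings-zh-Hans.txt
  TARGET_NAME="${ROOTNAME//_/-}.txt"
  TARGET_PATH="${DIRNAME}/${TARGET_NAME}"
  echo "    Transform [$FILENAME] -> [$TARGET_NAME]"

  # Start from an empty target file
  : > "$TARGET_PATH"

  while IFS= read -r LINE || [ -n "$LINE" ]
  do
    if [[ "$LINE" == \#* || "$LINE" != *=* ]]
    then
      # Comments, blank lines, and lines without '=' are copied as-is
      printf '%s\n' "$LINE" >> "$TARGET_PATH"
    else
      # Split on the first '=' only, so values may contain '='
      KEY=${LINE%%=*}
      VALUE=${LINE#*=}
      printf '%s,%s\n' "$KEY" "$VALUE" >> "$TARGET_PATH"
    fi
  done < "$FILEPATH"
done < "${PROJECT_TMP_DIR}/input_files.txt"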

transform_files_list.sh

An example snippet of bash code for this type of file may be something like:

#!/bin/bash
if [ -z "$1" ]
  then
    echo "Error: Missing the argument like /<path>/pseudo_files.txt"
    exit 1
fi

if [ -f "$1" ]; then
    echo " File to rewrite: $1"
else
    echo " $1 not found"
    exit 1
fi


echo " "
echo " --------------------------------------------"
echo " Files to Modify:  $1"
echo " for repository formatted files, not LRM OOTB ones"

# strings<locale>.properties -> strings<locale>.txt
echo "   >>  strings<locale>.properties to strings<locale>.txt"
# Swap the extension
sed -i 's/\.properties/.txt/' "$1"
# Replace every underscore with a hyphen (strings_zh_Hans -> strings-zh-Hans).
# Note: this applies to the whole line, directory names included.
sed -i 's/_/-/g' "$1"

echo " " 
ls -l "$1"
cat "$1"
echo " --------------------------------------------"
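
The global underscore replacement above also rewrites underscores in directory names. Where that matters, the renaming can be restricted to the file name. The following is a sketch under assumptions, not the framework's own code: the function names to_repo_path and rewrite_list are hypothetical, and the list file is assumed to hold one path per line.

```shell
#!/bin/bash
# Hypothetical helper: map one .properties path to its repository .txt name,
# changing only the file name and leaving directory names untouched.
to_repo_path() {
  local path="$1" name
  name="${path##*/}"
  # Paths that do not end in .properties are passed through unchanged
  if [[ "$name" != *.properties ]]; then
    printf '%s\n' "$path"
    return
  fi
  name="${name%.properties}.txt"
  name="${name//_/-}"
  if [[ "$path" == */* ]]; then
    printf '%s/%s\n' "${path%/*}" "$name"
  else
    printf '%s\n' "$name"
  fi
}

# Hypothetical helper: rewrite a list file in place, one path per line.
rewrite_list() {
  local list="$1" tmp line
  tmp="$(mktemp)"
  while IFS= read -r line || [ -n "$line" ]; do
    to_repo_path "$line"
  done < "$list" > "$tmp"
  mv "$tmp" "$list"
}
```

In transform_files_list.sh, a call such as rewrite_list "$1" could then stand in for the sed commands.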

Command Center Settings

Transform System Files

The transform is retrieved like any system file from the repository and set as a system file.

  • Put Command Center in Maintenance Mode
  • Go to Settings
  • Go to System Files
  • Add the Transform from the repository

For example:

Transform System File.jpg

When the transform scripts are modified in the repository, use the Edit button shown above to get the latest version. The commit hash indicates which commit was last retrieved.

Set the Transform for a Project

Now that the transform is ready to be used, a project can be configured to run it, either when creating the project or when editing it:

  1. Make sure the project resource format is that of the transformed files, not the files in the repository. Here, that is .properties files.
  2. Click the 'Use transform script' checkbox and select the transform of your choice, here 'txt2prop'.

For example:

Transform Project Configuration.jpg


Now whenever the project is analyzed or files are sent to translation, the repository is cloned or pulled and the .txt files are transformed into .properties files, which are then analyzed or sent to translation as properties.

Files are received from translation as properties and are analyzed for correctness, etc. as properties. If the validations pass, the files are transformed back into .txt files and pushed to the repository.
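
Because files travel in both directions, it can be worth checking that a repository file survives the round trip unchanged before relying on a transform. The sketch below uses simplified, hypothetical stand-ins for the content conversion (txt_to_properties, properties_to_txt, roundtrip_ok); it assumes one KEY,VALUE pair per line and comment lines starting with #.

```shell
#!/bin/bash
# Hypothetical stand-ins for the content conversion done by the two scripts.
# Both read stdin and write stdout; comment lines (starting with #) are left alone.
txt_to_properties() {
  # Replace the first comma of each non-comment line with '='
  sed -e '/^#/!s/,/=/'
}

properties_to_txt() {
  # Replace the first '=' of each non-comment line with ','
  sed -e '/^#/!s/=/,/'
}

# Returns 0 when converting a .txt file to properties and back
# reproduces the original file byte for byte.
roundtrip_ok() {
  local file="$1"
  txt_to_properties < "$file" | properties_to_txt | cmp -s - "$file"
}
```

A file such as strings.txt from the example passes this check, while a line whose key contains an '=' sign would not, which is the kind of case worth catching before files are pushed back to the repository.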