Git to AWS S3 System

From Lingoport Wiki
Latest revision as of 15:32, 12 December 2023

Introduction

Customers may want to isolate the actual repositories from Lingoport's products, especially for Localyzer. One option to do so is to push files to AWS S3 from the repositories and let Localyzer access only S3.

If customers decide on this option, we recommend automating the process from Git to S3 to Localyzer to the TMS and back. The key is to automate the transfer of the desired files (typically resource files such as .properties or .json) from Git to S3 and from S3 to Git, then to on-board the Localyzer project using S3 as the data source for the resource files.

To do so, the following will be needed:

  • Git/S3 System: one Linux system will host the bash scripts to automate the transfer from Git to S3 and back, most likely inside the customer's network.
  • Lingoport System: The system hosting Jenkins, Dashboard, etc., connected to S3 and the TMS
  • A dedicated S3 bucket: The S3 bucket will have two main top directories:
    • a to_localyzer top level directory: Under this directory will be a directory per Git repository and branch. This is where the configured files (.properties, .resx, .json, etc.) coming from each Git repository will be retrieved by Localyzer for analysis and sending to translation.
    • a from_localyzer top level directory: Under this directory will be a directory per Git repository and branch, created by Localyzer, with translated files. These files will be picked up by the Git/S3 System scripts and pushed to the repositories.
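
The per-repository directory naming can be sketched as a small helper. This is only an illustration: the function name s3_repo_prefix and the bucket name are hypothetical, and the <RepositoryName>_<BranchName> pattern is assumed from the verification examples on this page rather than taken from the delivered scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of git_to_s3.zip): build the S3 prefix
# for one repository/branch pair, assuming <RepositoryName>_<BranchName> naming.
s3_repo_prefix() {
    local base="$1"      # e.g. s3://my-bucket/to_localyzer/
    local git_url="$2"   # e.g. https://github.com/org/repo
    local branch="$3"
    local repo
    # Strip an optional .git suffix, then keep the last path element
    repo="$(basename "${git_url%.git}")"
    printf '%s/%s_%s/\n' "${base%/}" "$repo" "$branch"
}

# Example with a made-up bucket name:
s3_repo_prefix "s3://my-bucket/to_localyzer/" \
    "https://github.com/lingoport-public/Rebel-Outfitters" "Payments"
```

The same helper would apply to the from_localyzer side by passing that base URL instead.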


AWS Installation and Configuration

This section applies to both:

  • Git/S3 System. Note: Size the disk according to the total volume of files in the repositories to be on-boarded.
  • Lingoport System

On both systems, the AWS S3 client needs to be installed with the proper credentials.

On the Git/S3 System, the scripts need to be downloaded and set up with Cron with a frequency to be decided by the customer.

On the Lingoport System, the project will be on-boarded using S3 as the VCS method.


Install AWS Client V2

On the Linux system, install the AWS Client (Version 2) by following the AWS installation documentation.

Or, as a quick reference on Linux:

$ whoami
# should be root, or a user with 'sudo' access

$ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
$ unzip awscliv2.zip
$ sudo ./aws/install

AWS User

The system authenticates to S3 by providing an AWS aws_access_key_id and the associated aws_secret_access_key.

The most common approach is to create a service account to provide these credentials.

See the associated AWS documentation for details.

The provided account must have permissions to read, download from, and write to, the associated AWS S3 bucket.

Storing the AWS Credentials

As the 'jenkins' user on the target system, create /var/lib/jenkins/.aws (~/.aws as 'jenkins'), along with a .aws/config and .aws/credentials.

Examples:

$ whoami
jenkins

$ mkdir -p ~/.aws

$ : #Substitute your region for us-east-1 as needed:

$ cat <<EOF >> ~/.aws/config
[default]
region=us-east-1
output=json
EOF

$ : # Fill in the aws_access_key_id and aws_secret_access_key per your organization's AWS service account:

$ cat <<EOF >> ~/.aws/credentials
[default]
aws_access_key_id=<access key id associated with read+write access to the target S3 bucket per your Org>
aws_secret_access_key=<secret access key associated with the aws_access_key_id above>
notes="S3 Read+Write access for <your Org>"
EOF

Test

Make sure you can read, download from, and write to the target S3 bucket. From the system, try running:

echo "Testing view access:"
aws s3 ls s3://<your bucket>/<optional path>

echo "Testing write access:"
echo "Write me." > test.txt 
# If encryption is required and your org uses the default AES256 encryption, append: --sse AES256
# (or replace with other settings as needed).
aws s3 cp test.txt s3://<your bucket>/<optional path>/test.txt

echo "Testing download access:" 
rm test.txt  # remove it so that you have to get it back from s3
aws s3 cp s3://<your bucket>/<optional path>/test.txt .
ls 
# You should see test.txt 

Git to S3 Installation and Configuration

In order to set up the automation from Git to S3 and back, make sure you have the git_to_s3.zip file. If you do not have it, please contact support (at) lingoport dot com.

Git Access

In addition to the access to AWS S3 set up in the previous section, this system needs to be able to clone, pull, add, commit, and push files to the Git repositories of interest. There are many ways to do so.

For example, see the Lingoport Wiki Git page.

Verification

Make sure all is set up correctly by simply cloning a project of interest for Localyzer. For instance:

   git clone https://github.com/mycompany/myrepo

In addition, make sure the branch of interest is writable.

Scripts Installation

  • Unzip the git_to_s3.zip file in a directory accessible by Cron jobs.

This should result in the following set of files:

  • git_to_s3/scripts: where the bash scripts reside to select and transfer files to and from git/S3. Make sure the .sh files are executable. If not, run chmod +x *.sh.
  • git_to_s3/config: this is where the git repository, branches, file types, and optionally directories are set up

Configuration

Follow the README.md instructions.


s3config.properties

  • Set the S3 bucket and to_localyzer/from_localyzer directories
   S3_TO_LOCALYZER=s3://<S3 URL>/to_localyzer/
   S3_FROM_LOCALYZER=s3://<S3 URL>/from_localyzer/
  • Set the DATA_DIR location for the working directories. The directory must exist and be readable/writable.
   DATA_DIR="/full/path/git_s3_data"
  • Set the commit message for the translated files going back to the repository:
   COMMIT_MESSAGE="Localyzer Git-S3"
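
Before the first run, these settings can be sanity-checked with a short script. The following is a minimal sketch, not part of the delivered scripts: the function name check_s3config is hypothetical, and it only checks the URL scheme and the DATA_DIR requirement stated above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the values set in s3config.properties.
# Returns 0 when all checks pass; otherwise prints each problem to stderr.
check_s3config() {
    local to_url="$1" from_url="$2" data_dir="$3"
    local status=0
    case "$to_url" in
        s3://?*) ;;
        *) echo "S3_TO_LOCALYZER should start with s3://" >&2; status=1 ;;
    esac
    case "$from_url" in
        s3://?*) ;;
        *) echo "S3_FROM_LOCALYZER should start with s3://" >&2; status=1 ;;
    esac
    # DATA_DIR must already exist and be readable/writable
    if [ ! -d "$data_dir" ] || [ ! -r "$data_dir" ] || [ ! -w "$data_dir" ]; then
        echo "DATA_DIR must exist and be readable/writable: $data_dir" >&2
        status=1
    fi
    return "$status"
}
```

Assuming the properties file uses plain KEY=value lines as shown above, it could be sourced first and the three variables passed in, for example: check_s3config "$S3_TO_LOCALYZER" "$S3_FROM_LOCALYZER" "$DATA_DIR".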

repositories.txt

  • Set the Git URL, branch, and optionally the directories to include, one per line, in the following format:
   https://<giturl>/<organization>/<repository> <branch> <optionally, comma separated list of include dirs>

For instance:

   https://github.com/lingoport-public/Rebel-Outfitters Payments
   https://github.com/lingoport-public/Rebel-Outfitters Miis content

Where Payments and Miis are branches for that repository, and content is the only directory in the Miis branch with files to be translated.
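
To make the line format concrete, here is a minimal sketch of how one such line splits into its fields. The function name parse_repo_line and its output format are hypothetical; the delivered scripts may parse the file differently.

```shell
#!/usr/bin/env bash
# Hypothetical parser for one repositories.txt line:
#   <git url> <branch> [comma-separated include dirs]
parse_repo_line() {
    local url branch dirs dir
    # Split the line on whitespace into URL, branch, and the optional dir list
    read -r url branch dirs <<EOF
$1
EOF
    # URL and branch are required
    if [ -z "$url" ] || [ -z "$branch" ]; then
        return 1
    fi
    echo "url=$url"
    echo "branch=$branch"
    # Split the optional include dirs on commas, one per output line
    local IFS=','
    for dir in $dirs; do
        echo "include=$dir"
    done
}

parse_repo_line "https://github.com/lingoport-public/Rebel-Outfitters Miis content"
```

For the Miis line this prints the URL, the branch Miis, and a single include entry for content; a line without a third field yields no include entries, meaning the whole branch is considered.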


fileSuffixes.txt

  • Set the resource file extensions so only those files are copied to the S3 bucket. For instance:
   .properties
   .json
   .resx
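
The suffix-based selection can be sketched as follows. This is an illustration under assumptions: the function name has_listed_suffix is hypothetical, and the delivered scripts may select files in another way.

```shell
#!/usr/bin/env bash
# Hypothetical filter: succeed when a file name ends with one of the
# suffixes listed, one per line, in a fileSuffixes.txt-style file.
has_listed_suffix() {
    local file="$1" suffix_file="$2" suffix
    # The "|| [ -n ... ]" part also handles a last line without a newline
    while IFS= read -r suffix || [ -n "$suffix" ]; do
        [ -z "$suffix" ] && continue      # skip blank lines
        case "$file" in
            *"$suffix") return 0 ;;
        esac
    done < "$suffix_file"
    return 1
}
```

With the list above, has_listed_suffix "messages_fr.properties" fileSuffixes.txt succeeds, while a file such as README.md is rejected and would not be copied.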

Verification

To check that the Git to S3 system is set up and configured properly:

  • Make sure you have at least one repository configured in the repositories.txt file
  • Run the wrapper.sh script
  • Look in the S3 bucket and verify the files of interest are there with aws s3 ls commands
   aws s3 ls  s3://lingoport-s3-roundtrip/to_localyzer/
   aws s3 ls  s3://lingoport-s3-roundtrip/to_localyzer/<RepositoryName_BranchName>/
   etc.

Cron

In order for the scripts to run automatically, we recommend setting up a cronjob with a frequency of your choice.

For instance, see: https://www.digitalocean.com/community/tutorials/how-to-use-cron-to-automate-tasks-centos-8

The crontab may look like the following to run the Git to S3 scripts on an hourly basis:

   0 * * * * /path/to/wrapper.sh >/dev/null 2>&1

With this schedule, a change can wait up to one hour before it is transferred, in addition to the delay from running the Localyzer projects.

Cron is typically set up as a Linux service. You may want to stop and start the cron service and check its status with:

   sudo systemctl stop crond
   sudo systemctl start crond
   sudo systemctl status crond

Logging

A log file is not mandatory for a functional system, but it is recommended: it can be invaluable when issues need to be investigated. To maintain a log of the S3 syncing activities, two items must be configured:

  • Add a redirect of standard out and standard error to a log file.
  • Automatically rotate the log(s) that are created.

To create a new log, first create a new directory /var/log/s3 and set its ownership to the user that executes the S3 wrapper.sh. If SELinux is enabled on the system, it will likely be necessary to put this directory under /var/log, because SELinux can prevent crontab jobs from writing to other directories.

Once the /var/log/s3 directory has been created, modify the crontab as noted below to create a date entry in the log file and send both standard out and standard error to the log file every time the script executes. In this case the script will run hourly.

   DATEVAR=date +%c
   0 */1 * * * /bin/echo $(${DATEVAR}) >> /var/log/s3/s3_wrapper.log; /path/to/wrapper.sh >> /var/log/s3/s3_wrapper.log 2>&1

To permit automated log rotation of the newly created log by the operating system, create a new file in the /etc/logrotate.d/ directory. In this case it was named s3wrapper. Below is an example of its contents. Be sure to change the <USER> <GROUP> entries to match the user and group of the user running wrapper.sh (without the "< >").

In this case it will create a total of seven rotations of the log file, if the log is >50KB in size. It will also compress the log as it rotates out. Feel free to adjust the number of rotations and size to fit the requirements of the system. These values are simply suggestions.

   /var/log/s3/s3_wrapper.log {
       missingok
       notifempty
       compress
       copytruncate
       rotate 7
       size 50k
       create 0660 <USER> <GROUP>
   }

At this point the system should populate the log file automatically whenever cron executes, and rotate the log once it exceeds the configured size (logrotate itself typically runs daily).

Debugging

If something is not working right in the system, run the wrapper.sh script outside of cron with the -x flag. It will output debug information to the console.

   bash -x /path/to/wrapper.sh 

bash -x will echo each line before executing it; in particular, the variable assignments will be visible.

Command Center AWS S3 Data Source Configuration

See: https://wiki.lingoport.com/index.php?title=AWS_S3_Data_Source_Credential

Jenkins/Dashboard Localyzer Project Configuration (Unsupported)

This section used to apply to Jenkins and Dashboard, i.e. versions up to Japan of our product line. This is not supported any longer.

  • On-Board your Project, (but set the VCS details to 'None').
  • Run the associated Jenkins job once (this pre-populates several directories). Note: The Jenkins Job is expected to fail as the setup is not yet complete.
  • Edit the file:
   /var/lib/jenkins/Lingoport_Data/L10nStreamlining/<your group>/projects/<your project>/config/config_vcs.properties
  • The configuration requires three parameters to be set. Make sure the <S3 URL>, to_localyzer and from_localyzer are the same as set up on the Git to S3 System.
   VCS_TYPE=s3
   S3_TO_LOCALYZER=s3://<S3 URL>/to_localyzer/<RepoName>_<branchName>
   S3_FROM_LOCALYZER=s3://<S3 URL>/from_localyzer/<RepoName>_<branchName>


  • If your bucket uses AES256 encryption, add the following at the end:
   S3_OPTS=--sse AES256
    • Otherwise, leave blank:
   S3_OPTS=
  • Rerun the automation project and make sure the files are indeed correctly read in from S3, either by going to the workspace or checking the Code page on the Dashboard.