Difference between revisions of "Machine Learning"

From Lingoport Wiki

Revision as of 22:53, 8 March 2018

Machine Learning Overview

Machine Learning prediction is a Globalyzer Workbench and Globalyzer Lite feature that helps users handle false positive issues. We suggest applying machine learning as a follow-up step to scanning with Rule Sets. It helps determine which candidate issues found by the Rule Sets are indeed i18n issues.

Installation

Prerequisites: Python 3.6.x and H2O.ai 3.x

1. Download Python 3.6 or later from https://www.python.org/downloads/

2. Install Python and add it to the PATH environment variable.

3. Go to http://h2o-release.s3.amazonaws.com/h2o/rel-wheeler/4/index.html and open the "INSTALL IN PYTHON" tab.

 Install dependencies (prepending with `sudo` if needed):
 pip install requests
 pip install tabulate
 pip install scikit-learn
 pip install colorama
 pip install future

At the command line, copy and paste these commands one line at a time:

 pip uninstall h2o
 pip install http://h2o-release.s3.amazonaws.com/h2o/rel-wheeler/4/Python/h2o-3.16.0.4-py2.py3-none-any.whl

The installation succeeded if the output includes "Successfully installed h2o-3.16.0.4".

Test 1: Open a command prompt and type "python -V". The test succeeds if it prints a Python version such as "Python 3.6.2".

Test 2: On the command line, start Python. In Python:

> import h2o
> h2o.init()

This should complete without errors.
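
The two tests can also be combined into one script. The following is a minimal sketch, not part of Globalyzer: the function name `check_environment` is hypothetical, and it only checks that the `h2o` package can be found, without starting H2O the way `h2o.init()` does.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 6)):
    """Report whether the interpreter and the h2o package meet the prerequisites."""
    return {
        # Same information as "python -V"
        "python_version": "%d.%d.%d" % sys.version_info[:3],
        "python_ok": sys.version_info[:2] >= min_python,
        # True if "import h2o" would find the package; does not start H2O
        "h2o_installed": importlib.util.find_spec("h2o") is not None,
    }


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(name, "=", value)
```

If `h2o_installed` is reported as False, repeat the `pip install` step above before running Test 2.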

Work Flow

First, create a Globalyzer project with scans in the Globalyzer client. In the Scan Results view, right-click an issue that you have determined to be a false positive and choose "Mark prediction as false positive(F)" from the menu. Mark at least several issues as false positives before running "Find more false positives" from the Machine Learning menu.

After marking some issues as false positives, click the "Find more false positives" button under the "Machine Learning" menu and wait for the prediction process to finish. Possible values for the predictions are:

  • T: Marked by a user as True Positive to train Machine Learning
  • F: Marked by a user as False Positive to train Machine Learning
  • Negative: Marked by Workbench if the issue is filtered to train Machine Learning
  • ML False: Machine Learning prediction that the issue is in fact a false positive
  • ML NULL: Machine Learning prediction that the issue is not a false positive, i.e. the issue should be refactored.
  • ML True: Machine Learning prediction that the issue is a true positive, i.e. the issue should be refactored.
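
For scripts that post-process scan results, the values above can be captured in a small lookup table. This is only a sketch: the names `Prediction`, `PREDICTIONS`, and `training_labels` are hypothetical and are not part of any Globalyzer API. The table restates the list above and separates the values that train Machine Learning from the values that Machine Learning predicts.

```python
from collections import namedtuple

# set_by: who assigns the value; used_for_training: whether it trains Machine Learning
Prediction = namedtuple("Prediction", ["set_by", "used_for_training", "meaning"])

PREDICTIONS = {
    "T": Prediction("user", True, "true positive"),
    "F": Prediction("user", True, "false positive"),
    "Negative": Prediction("Workbench", True, "filtered issue"),
    "ML False": Prediction("machine learning", False, "predicted false positive"),
    "ML NULL": Prediction("machine learning", False, "predicted not a false positive"),
    "ML True": Prediction("machine learning", False, "predicted true positive"),
}


def training_labels():
    """Return the prediction values that are used to train Machine Learning."""
    return {label for label, p in PREDICTIONS.items() if p.used_for_training}
```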


If you find that issues predicted as "ML False" are in fact real issues, right-click the issue and select "Mark prediction as true positive(T)". The next time you run "Find more false positives", machine learning will learn from your correction. If you are not satisfied with the prediction results, continue marking more issues as "F" or "T" and rerun "Find more false positives".

Tips:

  • Viewing all issues, including filtered issues: One way to understand some of the Machine Learning results is to show all issues, including filtered ones. When an issue is predicted as "ML False", it is easier to see why when it is surrounded by filtered issues with the same type of patterns.
  • Scan > Search in Scan Results:
    • Search on the Prediction column for issues which are "ML False". From the Search panel, you can right click on the items to change the prediction with "Globalyzer > Mark Prediction as True" (or False).
    • Search on the Prediction column for issues which are "ML NULL" and "ML True": This will help you see which issues are predicted as True Positives.

Machine Learning FAQ

1. If I change an issue's status, will machine learning still work?

Yes, it will work. When you change an issue's status, the prediction for that issue changes by default. For example, if you move issues to the "Todo" status, the prediction is marked as True; if you move issues to the "Ignore" or "Invalid" status, the prediction is marked as False. However, you can still manually mark a "Todo" issue as "False".
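
The default behavior described in this answer can be summarized as a small mapping. This is an illustrative sketch, not Globalyzer's implementation: the function name is hypothetical, and it assumes that "True" and "False" correspond to the prediction values "T" and "F". Statuses not mentioned above return None because their default is not documented here.

```python
# Assumed mapping from issue status to default prediction value
DEFAULT_PREDICTION_BY_STATUS = {
    "Todo": "T",
    "Ignore": "F",
    "Invalid": "F",
}


def prediction_after_status_change(status, manual_mark=None):
    """Return the prediction for an issue after its status changes.

    A manual mark by the user takes precedence over the status default.
    """
    if manual_mark is not None:
        return manual_mark
    return DEFAULT_PREDICTION_BY_STATUS.get(status)
```

For example, `prediction_after_status_change("Todo", manual_mark="F")` models a "Todo" issue that the user has manually marked as a false positive.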

2. How does machine learning work?

We use h2o.ai to analyze each issue, its code line, and its issue reason. Based on the filtered issues and the issues you marked as false positives, machine learning looks for similar issues and sets their prediction to "ML False". Predictions therefore differ per project and per scan, and you will not get exactly the same result every time; machine learning only provides a prediction. In addition, machine learning needs a category of examples to learn from: if you mark only one issue as a false positive, it is highly likely that machine learning cannot find other issues similar to it.
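
To give a feel for what "finding similar issues" means, here is a toy sketch in plain Python. It is not how H2O or Globalyzer computes predictions: it swaps in a simple token-overlap (Jaccard) comparison of code lines, and the function names, the threshold of 0.5, and the sample code lines are all assumptions made for illustration.

```python
import re


def tokens(code_line):
    """Split a source line into a set of lowercase word tokens."""
    return set(re.findall(r"[A-Za-z_]+", code_line.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict(labeled, candidate, threshold=0.5):
    """Return "ML False" if the candidate resembles a marked false positive
    more closely than any marked true positive, otherwise "ML NULL"."""
    cand = tokens(candidate)
    best_false = max(
        (jaccard(cand, tokens(line)) for line, label in labeled if label == "F"),
        default=0.0,
    )
    best_true = max(
        (jaccard(cand, tokens(line)) for line, label in labeled if label == "T"),
        default=0.0,
    )
    if best_false >= threshold and best_false > best_true:
        return "ML False"
    return "ML NULL"


# Hypothetical issues marked by a user: two false positives, one true positive
labeled = [
    ('log.debug("loading config")', "F"),
    ('log.debug("saving config")', "F"),
    ('label.setText("Welcome back")', "T"),
]

print(predict(labeled, 'log.debug("closing config")'))  # ML False
print(predict(labeled, 'button.setText("Welcome")'))    # ML NULL
```

The sketch also shows why a single marked issue is rarely enough: with only one example, few candidates share enough tokens with it to pass the threshold.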