Reducing false positives by customizing rule sets.

The default Globalyzer Rule Sets come pre-configured to be generally applicable to all applications written in the relevant programming language. This limits their accuracy, because applications vary widely in form and coding style. Rule sets are designed to detect issues regardless of coding style, and as a consequence they may also report a number of false positives. These false positives can be greatly reduced by tuning a rule set to match an application's coding style.

Rule Set Tuning Basics

Rule sets contain 4 categories of detection types:

Embedded Strings
- Any hard coded string in the application that will need to be translated.
Locale Sensitive Methods
- For example, Date/Time, Encoding, or String Concatenation methods.
General Patterns
- For example, hard coded fonts, encodings, or date formats: 'ASCII', 'ARIAL', 'mm/dd/yy'.
Static File References
- Application references to static files, some of which may need to be localized.

For each category, regex-based rules are used to detect and filter issues. Rule sets start with pre-configured rules, but are fully modifiable after creation.

Rule Types

For each category, there are detection and filtering rules. You can mostly leave the detection rules alone - they are already well configured. This article will focus on creating, modifying, and testing filtering rules.

Filtering Rules

For each rule, the given regex is matched against a part of the issue's context. The context could be the line the issue is on, the text of the issue itself, a method call, or a variable. If the regex pattern matches some or all of the context (depending on the filter type), the issue is filtered.

Filtering rules come in the following categories:

Literal Filters (Embedded Strings only)

String literal filters match against the string contents. The string is filtered if part of its content matches.

Example:

String example = "This is a string.";

Matching filter examples:

a string
This
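
As a rough illustration of the partial-match behavior described above, the following sketch uses Python's `re` module as a stand-in for the regex engine. The function name `literal_filter_matches` is hypothetical and is not part of Globalyzer. The sketch assumes the filter is applied to the string contents only, without the surrounding quotes.

```python
import re

def literal_filter_matches(pattern: str, string_contents: str) -> bool:
    """Return True if the pattern matches any part of the string contents."""
    # re.search succeeds on a partial match anywhere in the contents.
    return re.search(pattern, string_contents) is not None

# Contents of the literal from the example above (quotes excluded).
contents = "This is a string."

print(literal_filter_matches(r"a string", contents))
print(literal_filter_matches(r"This", contents))
# The variable name is not part of the string contents, so it does not match.
print(literal_filter_matches(r"example", contents))
```

The last call shows why a literal filter cannot be used to filter by variable name: only the text between the quotes is considered.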

Line Filters (All types)

Line filters match against the code line containing the issue. If the match succeeds for the code line, then the issue will be filtered.

Example:

String example = "This is a string with the general pattern 'mm/dd/yy'";

Matching filter examples:

example
String example
This is a string
mm/dd/yy
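
The difference between matching the whole code line and matching only the string contents can be sketched as follows. This is an approximation that uses Python's `re` module. The names `line_filter_matches`, `CODE_LINE`, and `LITERAL` are illustrative assumptions, not Globalyzer identifiers.

```python
import re

def line_filter_matches(pattern: str, code_line: str) -> bool:
    """Return True if the pattern matches any part of the code line."""
    return re.search(pattern, code_line) is not None

# The full code line from the example above, and the literal it contains.
CODE_LINE = "String example = \"This is a string with the general pattern 'mm/dd/yy'\";"
LITERAL = "This is a string with the general pattern 'mm/dd/yy'"

# Every filter from the list above matches somewhere on the line.
for pattern in ["example", "String example", "This is a string", "mm/dd/yy"]:
    print(pattern, "->", line_filter_matches(pattern, CODE_LINE))

# The declaration text exists only on the line, not inside the literal,
# so the same pattern would not work as a literal filter.
print(re.search("String example", LITERAL) is not None)
```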

Method Filters (Embedded Strings only)

String method filters match against a method call that takes one or more embedded strings as arguments. All strings passed to the method are filtered.

Example:

someObject.doSomething("Input one", "Input two");

Matching filter examples:

doSomething
someObject\.doSomething
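
A minimal sketch of this behavior, assuming the method filter is matched against the call target (for example `someObject.doSomething`) and that a match filters every string argument. How Globalyzer extracts the call internally is not described here, so the `CALL` and `STRING` patterns and the `filtered_strings` helper are simplified, hypothetical stand-ins written with Python's `re` module.

```python
import re

# Simplified assumptions: a call looks like target(args), and string
# arguments are double-quoted literals that may contain escaped characters.
CALL = re.compile(r'([\w.]+)\s*\((.*)\)')
STRING = re.compile(r'"((?:[^"\\]|\\.)*)"')

def filtered_strings(method_pattern: str, code_line: str) -> list:
    """Return the string arguments that a matching method filter would filter."""
    call = CALL.search(code_line)
    if call is None:
        return []
    target, args = call.group(1), call.group(2)
    # The filter only needs to match part of the call target.
    if re.search(method_pattern, target) is None:
        return []
    return STRING.findall(args)

line = 'someObject.doSomething("Input one", "Input two");'
print(filtered_strings(r"doSomething", line))
print(filtered_strings(r"someObject\.doSomething", line))
print(filtered_strings(r"otherMethod", line))
```

The first two calls return both string arguments, and the third returns an empty list because the filter does not match the call target.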

Operand Filters (Embedded Strings only)

String operand filters match against a variable that is compared to the string or assigned the string via an operator.

Examples:

String example = "This is a string.";
if (example != "This is not a string") { ...

Matching filter example:

example

Note: '=', '==' and '!=' are nearly universal operators, but many other operators may also be matched against, depending on the language.
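
The operand matching described above can be approximated as follows. The sketch only recognizes the three operators named in the note and assumes the variable appears directly to the left of the operator. Both are simplifications, and `OPERAND` and `operand_filter_matches` are hypothetical names rather than Globalyzer identifiers.

```python
import re

# Assumption: the operand is the identifier immediately left of =, == or !=,
# and the string literal starts immediately to the right.
OPERAND = re.compile(r'(\w+)\s*(==|!=|=)\s*"')

def operand_filter_matches(pattern: str, code_line: str) -> bool:
    """Return True if the pattern matches the variable paired with the string."""
    found = OPERAND.search(code_line)
    if found is None:
        return False
    return re.search(pattern, found.group(1)) is not None

assignment = 'String example = "This is a string.";'
comparison = 'if (example != "This is not a string") { ...'

print(operand_filter_matches(r"example", assignment))
print(operand_filter_matches(r"example", comparison))
print(operand_filter_matches(r"other", assignment))
```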

General Guidelines

To

FP Filtering

   * Look for patterns. It is best to start with rules that filter large
     numbers of issues before creating rules that filter only a few.
   * Some patterns are obvious, such as similar method or variable names.
     Others are more obscure; see the notes below.
   * Create one filter that catches multiple methods, e.g. myMethod and
     myMethodLong.
 * Advanced Filters
   * The Embedded Strings category supports all filter types, so it is used
     for the examples here.
   * Method
     * log
     * log\.debug
     * log[\w\._]*
     * (CharConversion|Unimplemented|I?IO)Exception[\w\._]*
     * myMethod(Long)?
     * (possible)?Example\w+
   * Literal
     * \A[A-Z\s]+\Z
     * \A(ON|OFF)\Z
     * \A(word1|word2|word3)\Z
     * html: \A(\s*\b(a|b|body|button|colgroup|dd|div|dt|em|file|font|footer|form|geo_name|h1|h2|h3|h4|h5|head|header|html|i|iframe|li|ol|p|pane|pre|seq|span|tab|tbody|td|text|textarea|th|tr|tt|type|ul|wrapper)\b[,\s]*)+\Z
     * \A[A-Z\W]*\b(GET|POST)\b[A-Z\W]*\Z  # [A-Z\W] = UPPERCASE or non-word
   * Operand
     * (error)?[Mm]essage
     * [^\s]+Mode                # [^\s] at start won't work for method filters
     * [A-Z]+_[A-Z]+(_|[A-Z]+)*
   * Line
     * String stringName
     * \A[A-Z]+_[A-Z](_[A-Z]|[A-Z]+)+\Z   # Line constant, e.g. CONSTANT_NAME
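
Before adding filters like these to a rule set, it can help to try them against sample contexts. The sketch below does this with Python's `re` module, which is only an approximation of the regex dialect Globalyzer uses. The sample contexts (`logger.warn`, `debugMode`, and so on) are invented for illustration, and `run_checks` is a hypothetical helper.

```python
import re

# (filter type, pattern from the list above, sample context, expected result)
CHECKS = [
    ("method",  r"log[\w\._]*", "logger.warn", True),
    ("method",  r"(CharConversion|Unimplemented|I?IO)Exception[\w\._]*", "IIOException", True),
    ("method",  r"myMethod(Long)?", "myMethodLong", True),
    ("literal", r"\A(ON|OFF)\Z", "ON", True),
    ("literal", r"\A(ON|OFF)\Z", "ONLINE", False),     # anchors force a full match
    ("literal", r"\A[A-Z\s]+\Z", "NOT FOUND", True),
    ("literal", r"\A[A-Z\s]+\Z", "Not found", False),  # lowercase letters present
    ("operand", r"(error)?[Mm]essage", "errorMessage", True),
    ("operand", r"[^\s]+Mode", "debugMode", True),
]

def run_checks(checks):
    """Return a description of every check whose result differs from the expectation."""
    failures = []
    for kind, pattern, context, expected in checks:
        actual = re.search(pattern, context) is not None
        if actual != expected:
            failures.append(f"{kind} filter {pattern!r} on {context!r}: got {actual}")
    return failures

print(run_checks(CHECKS) or "all checks passed")
```

The anchored literal filters show why `\A` and `\Z` matter: without them, `(ON|OFF)` would also filter a string such as "ONLINE".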

* Notes:

   * Avoid a leading .* in a filter. It is computationally expensive and may
     not work.
   * Monster Literal:
     ((\\[bwst.]|\\[AZSP]|\.[*+]|\^\{0,6}\]).*?){3,}
     * Filters out strings that contain regular expressions.
       Looks for 3 or more of the following:
       "\b \w \s \t \. .* .+ [q-y,$etc] ^ $ \A \Z \S \P"

       Components:
       \\[bwst.]      # \b \w \s \t \.
       \\[AZSP]       # \A \Z \S \P
       \.[*+]         # .* .+
       \^\{0,6}\]     # [a-z,etc]

       The repetition works the same way as in this simpler pattern,
       which needs three or more occurrences of "a":
       ((a).*?){3,}
       apples          # no match
       apples fall     # no match
       apples fall far # match

       Same principle:
       ((\\[bwst.]|\\[AZSP]|\.[*+]|\^\{0,6}\]).*?){3,}
   * Less obvious patterns:
     * Similar string content
* Sort the issues by different categories, then scroll through looking for
  patterns.
   * Code Line and Issue are the most useful categories to sort by. Priority
     and File can also be useful.
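
The "three or more occurrences" idiom used by the Monster Literal can be tried out in isolation. The sketch below builds the same `((token).*?){n,}` shape with Python's `re` module. The helper `has_at_least` is hypothetical, and `REGEX_TOKENS` deliberately uses only the first three components of the Monster Literal, so it is a reduced illustration rather than the full filter.

```python
import re

def has_at_least(n: int, token_pattern: str, text: str) -> bool:
    """Return True if token_pattern occurs at least n times in text."""
    # Each repetition consumes one token plus as little following text as possible.
    counting = r"((" + token_pattern + r").*?){" + str(n) + r",}"
    return re.search(counting, text) is not None

# The simple case from the notes above.
for sample in ["apples", "apples fall", "apples fall far"]:
    print(sample, "->", has_at_least(3, "a", sample))

# Assumption: a reduced token set (\b \w \s \t \. \A \Z \S \P .* .+).
REGEX_TOKENS = r"\\[bwst.]|\\[AZSP]|\.[*+]"

print(has_at_least(3, REGEX_TOKENS, r"\A\w+\s\Z"))    # regex-like string
print(has_at_least(3, REGEX_TOKENS, r"C:\temp\bin"))  # only two tokens
print(has_at_least(3, REGEX_TOKENS, "Hello world."))  # no tokens
```

Requiring three tokens rather than one keeps strings such as Windows paths, which contain the occasional backslash sequence, from being treated as regular expressions.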

False Positive Filtering through Globalyzer Rule Sets
