False Positive Filtering through Globalyzer Rule Sets

From Lingoport Wiki

Revision as of 22:19, 6 August 2018

Reducing false positives by customizing rule sets.

The default Globalyzer Rule Sets come pre-configured to be generally applicable to all applications written in the relevant programming language. This places some limits on their accuracy, as applications can vary wildly in form and coding style. Rule sets are designed to detect issues regardless of the application coding style. However, as a consequence of this, they may also detect a number of false positives. These false positives can be greatly reduced via tuning a rule set to match an application's coding style.

Rule Set Tuning Basics

Rule sets contain 4 categories of detection types:

  • Embedded Strings
    • Any hard coded string in the application that will need to be translated.
  • Locale Sensitive Methods
    • For example, Date/Time, Encoding, or String Concatenation methods.
  • General Patterns
    • For example, hard coded fonts, encodings, or date formats: 'ASCII', 'ARIAL', 'mm/dd/yy'.
  • Static File References
    • Application references to static files, some of which may need to be localized.

For each category, there are regex based rules used to detect and filter issues. Rule sets start with pre-configured rules, but are fully modifiable after creation.

Rule Types

For each category, there are detection and filtering rules. You can mostly leave the detection rules alone - they are already well configured. This article will focus on creating, modifying, and testing filtering rules.

Filtering Rules

For each rule, the given regex is matched against one part of the issue's context. The context could be the line the issue is on, the text of the issue itself, a method call, or a variable. If the regex pattern matches some or all of that context (how much must match depends on the filter type), then the issue is filtered.

Filtering rules come in the following categories:

Content Filters (Embedded String Only)

String content filters match against the string contents. The string will be filtered if part of its content matches.

Example:

String example = "This is a string.";

Matching filter examples:

  • a string
  • This

Line Filters (Embedded Strings, Locale Sensitive Methods, General Patterns, Static File References)

Line filters match against the code line containing the issue. If the match succeeds for the code line, then the issue will be filtered.

Example:

String example = "This is a string with the general pattern 'mm/dd/yy'";

Matching filter examples:

  • example
  • String example
  • This is a string
  • mm/dd/yy

Method Filters (Embedded Strings only)

String method filters match against the name of a method that is passed one or more embedded strings. When the filter matches, all strings passed to that method are filtered.

Example:

someObject.doSomething("Input one", "Input two");

Matching filter examples:

  • doSomething
  • someObject\.doSomething

Note: \. is regex for a literal period. . is otherwise a wildcard.
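
The difference between the escaped and the unescaped period can be checked with a few lines of Java. This is a minimal sketch that uses java.util.regex directly; the class name, the helper, and the look-alike method name are made up for illustration. The doubled backslashes are only Java string-literal escaping, so the pattern itself is the one shown above.

```java
import java.util.regex.Pattern;

// Illustrative only: compares an escaped and an unescaped period in java.util.regex.
final class EscapedDotDemo {

    // True if the pattern matches any part of the text (partial match).
    static boolean matchesPart(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String intended = "someObject.doSomething";
        String lookalike = "someObjectXdoSomething"; // hypothetical, unrelated name

        // Escaped period: only a literal '.' is accepted between the two names.
        System.out.println(matchesPart("someObject\\.doSomething", intended));  // true
        System.out.println(matchesPart("someObject\\.doSomething", lookalike)); // false

        // Unescaped period: any single character is accepted, so the look-alike matches too.
        System.out.println(matchesPart("someObject.doSomething", lookalike));   // true
    }
}
```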

Variable Filters (Embedded Strings only)

String variable filters match against a variable that the string is assigned to or compared with via an operator.

Examples:

String example = "This is a string.";
if (example != "This is not a string") { ...

Matching filter example:

  • example

Note: =, == and != are nearly universal operators, but many other operators may also be matched against, depending on the language.
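
As a rough mental model, the four filter types can be pictured as the same regex test applied to different parts of an issue's context. The sketch below is not how Globalyzer is implemented: the Issue class, its fields, and the FilterType enum are hypothetical, and it assumes that a partial match is enough for every filter type, which the Workbench may not do.

```java
import java.util.regex.Pattern;

// Illustrative sketch only; the class, field, and enum names are hypothetical.
final class FilterContextSketch {

    enum FilterType { CONTENT, LINE, METHOD, VARIABLE }

    // Hypothetical stand-in for one embedded-string issue and its context.
    static final class Issue {
        final String codeLine;  // the full line of code containing the string
        final String content;   // the text of the string itself
        final String method;    // method the string is passed to, or null
        final String variable;  // variable assigned to or compared with the string, or null

        Issue(String codeLine, String content, String method, String variable) {
            this.codeLine = codeLine;
            this.content = content;
            this.method = method;
            this.variable = variable;
        }
    }

    // Selects the part of the context that a filter of the given type looks at.
    static String contextFor(FilterType type, Issue issue) {
        switch (type) {
            case CONTENT: return issue.content;
            case LINE:    return issue.codeLine;
            case METHOD:  return issue.method;
            default:      return issue.variable;
        }
    }

    // Assumption: a partial match is enough to filter the issue.
    static boolean isFiltered(FilterType type, String regex, Issue issue) {
        String context = contextFor(type, issue);
        return context != null && Pattern.compile(regex).matcher(context).find();
    }

    public static void main(String[] args) {
        Issue assigned = new Issue("String example = \"This is a string.\";",
                "This is a string.", null, "example");
        Issue passed = new Issue("someObject.doSomething(\"Input one\", \"Input two\");",
                "Input one", "someObject.doSomething", null);

        System.out.println(isFiltered(FilterType.CONTENT, "a string", assigned));              // true
        System.out.println(isFiltered(FilterType.LINE, "String example", assigned));           // true
        System.out.println(isFiltered(FilterType.VARIABLE, "example", assigned));              // true
        System.out.println(isFiltered(FilterType.METHOD, "someObject\\.doSomething", passed)); // true
        System.out.println(isFiltered(FilterType.METHOD, "doSomething", assigned));            // false
    }
}
```

The last call returns false because the assigned string has no method context at all, which is why method and variable filters only affect issues of the matching shape.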

Filtering Tactics

Pre-Setup

Rule sets include an inheritance feature. You may wish to have an overarching rule set where you place filters for common application-wide issues, and then multiple sub rule sets for specific sub projects. Rules are distinguished as either coming from a parent rule set or belonging to the current rule set. This makes it easier to see the changes that you have made to a given rule set.
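
One way to picture inheritance is as a chain of rule sets in which each rule set contributes its own filters on top of its parent's. The sketch below is purely conceptual: the RuleSetSketch class, its methods, and the rule set names are invented for illustration and say nothing about how the Globalyzer Server stores rule sets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of rule set inheritance; not Globalyzer's actual data structures.
final class RuleSetSketch {
    final String name;
    final RuleSetSketch parent; // null for a top-level rule set
    final List<String> ownFilters = new ArrayList<String>();

    RuleSetSketch(String name, RuleSetSketch parent) {
        this.name = name;
        this.parent = parent;
    }

    // Adds a filter that belongs to this rule set itself.
    RuleSetSketch addFilter(String regex) {
        ownFilters.add(regex);
        return this;
    }

    // Returns every filter that applies, inherited ones first, each labeled with its origin.
    List<String> effectiveFilters() {
        List<String> result = (parent == null)
                ? new ArrayList<String>()
                : parent.effectiveFilters();
        for (String regex : ownFilters) {
            result.add(name + ": " + regex);
        }
        return result;
    }

    public static void main(String[] args) {
        RuleSetSketch common = new RuleSetSketch("common", null).addFilter("log[\\w\\._]*");
        RuleSetSketch webApp = new RuleSetSketch("webApp", common).addFilter("\\A(ON|OFF)\\Z");
        for (String line : webApp.effectiveFilters()) {
            System.out.println(line);
        }
    }
}
```

Labeling each filter with its origin mirrors the distinction between parent rules and current rules described above.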

Tools

Globalyzer Workbench

The Globalyzer Workbench is the ideal platform for rule set tuning. It allows you to test your filters against your code as you create them. Before you begin, you should be familiar with the Workbench. Knowledge of the following is essential:

  1. Workbench usage basics (create a project, manage scans, scan a project)
  2. Searching through scan results (right-click an issue, then select: Find in Results).
    • Be aware that the search can only look through issues currently displayed in the scan results. This is a maximum of 5,000 issues at a time.
  3. The 'Add filters/detections' dialog (From the menu: Scan -> Add Rule Set Filters/Detections...).
  4. The scan views
    • View "All Scan Issues" while searching for issues to filter.
    • Search through issues in the "Filtered" view to check that your rule filtered the correct issues and did not filter out real problems.

Globalyzer Server

You should mostly be testing new rules from the Globalyzer Workbench interface. However, you will need to visit the Globalyzer Server website in order to either create inheriting rule sets or to modify already existing rules.

Regex Testing Websites

There are multiple websites that allow you to test regex patterns. These provide a fast way to test rule patterns before trying them in the Workbench. Two sites to consider are regexpal.com and regex101.com. Each site supports multiple regex variants. Use the 'PCRE' variant, which is the closest to the form that the Workbench's Java rule set engine uses. Another variant, such as JavaScript, will not always give accurate results.
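
Because the rule set engine is described as Java-based, a pattern can also be tried locally with java.util.regex. This is a minimal sketch, assuming the engine behaves like java.util.regex with default flags, which this article does not confirm. The PatternCheck class and the sample lines are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Small local harness for trying a pattern against a few sample strings.
final class PatternCheck {

    // Returns the first matched substring, or null when nothing matches.
    static String firstMatch(String regex, String sample) {
        Matcher m = Pattern.compile(regex).matcher(sample);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Pass a pattern as the first argument, or fall back to an example from this article.
        String regex = args.length > 0 ? args[0] : "log\\.debug";
        String[] samples = {
            "log.debug(\"Some text.\");",
            "logXdebug(\"Some text.\");"
        };
        for (String sample : samples) {
            System.out.println(sample + "  ->  " + firstMatch(regex, sample));
        }
    }
}
```

A pattern that passes here should still be confirmed in the Workbench against real scan results.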

General Tactics

If scanning a large project in the Workbench, create two scans using the same rule set. Set one scan to apply to the entire project, but have the other only scan a small portion of the code. The partial scan will be quick, so use it as you iterate over your rules. Once you are happy with a given set of rules, try them with the full scan. Then submit them to the server.

When filtering a fresh project, there will be some application-specific patterns that are present in a large number of false positives. Look for these patterns first. You will need to do less work overall if you take the time to filter the most widely present patterns before anything else.

To find patterns in the results, sort the issues by different categories. Then scroll through, looking for patterns. Code Line and Issue are the most useful categories to sort by. Priority and File can also be useful.

The most obvious patterns include common method and variable names. Sometimes you will find several method/variable names that are similar to each other. You can usually create a single filter that will match all of them.

Regex Patterns

A great deal of creativity is possible when creating regex-based rules. By being creative, you can create rules that match large numbers of false positives. Sometimes the matching pattern may initially be unintuitive or obscure. This section provides a tutorial to help you learn to find these patterns and to filter them.

This section will discuss the embedded string issue category. This category has the widest variety of filter types, so it makes the best example. However, the tips shown here are applicable to all rule set categories.

Different filtering types will be discussed in sequence. It is best to read through them sequentially, as each section may build on the others.


Method Filters

String method filters are the simplest to start with. A basic method filter, such as log, will filter all strings passed to that method. More detailed filters are shown below.

  • log
    • basic filter - filters e.g. log("some text.");

  • log\.debug
    • Filter a specific method of the 'log' variable/class. e.g. log.debug("Some text.");
    • Note that the '.' is escaped, to match a literal period.
    • Do not use this form in Java rule sets, which let you specify the class/variable type separately.

  • log[\w\._]*
    • Matches any method or methods following log. E.g. log.method_a.write("something");

  • myMethod(LongVersion)?
    • Matches both myMethod("Embedded string."); and myMethodLongVersion("Embedded String.");

  • (somePrefix)?[Mm]yMethod
    • Matches both myMethod("Embedded string."); and somePrefixMyMethod("Embedded String.");
    • Note that [Mm] is necessary in camel cased languages, as the capitalization of 'My' changes.

  • (CharConversion|Unimplemented|IO)Exception
    • Matches against any of these three exception types.

  • (CharConversion|Unimplemented|IO)Exception[\w\._]*
    • Matches any of these three exception types, and any method called on them.

  • .*Exception[\w\._]*
    • Match any exception.
    • Use .* sparingly at the beginning of method patterns. It can slow down the scan.
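
The method patterns above can be tried against sample method names with java.util.regex. This article does not state whether a method filter must match the whole method name or only part of it, so this sketch reports both results side by side rather than assuming one. The class name, helper names, and sample names are illustrative.

```java
import java.util.regex.Pattern;

// Tries method filter patterns from this section against sample method names.
final class MethodFilterSketch {

    // True if the pattern matches somewhere inside the name.
    static boolean matchesPart(String regex, String name) {
        return Pattern.compile(regex).matcher(name).find();
    }

    // True only if the pattern matches the entire name.
    static boolean matchesWhole(String regex, String name) {
        return Pattern.matches(regex, name);
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"log", "log"},
            {"log", "log.debug"},
            {"log\\.debug", "log.debug"},
            {"log[\\w\\._]*", "log.method_a.write"},
            {"myMethod(LongVersion)?", "myMethodLongVersion"},
            {"(somePrefix)?[Mm]yMethod", "somePrefixMyMethod"},
            {"(CharConversion|Unimplemented|IO)Exception", "RuntimeException"},
            {".*Exception[\\w\\._]*", "RuntimeException"}
        };
        for (String[] c : cases) {
            System.out.printf("%-46s %-20s part=%-5b whole=%b%n",
                    c[0], c[1], matchesPart(c[0], c[1]), matchesWhole(c[0], c[1]));
        }
    }
}
```

The two modes differ for short patterns. Under partial matching, log also matches an unrelated name such as dialog.show. Under whole-name matching, log does not match log.debug. Check the "Filtered" view in the Workbench to see which behavior applies to your rule set.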

Content Filters

  • example string
    • Matches String example = "This is an example string"

  • \Aexample string\Z
    • Use \A to denote the beginning of the content (string). \Z for the end.
    • Avoid ^ and $. These refer to the beginning and ending of lines, which is not equivalent.
    • Matches String example = "example string";
    • Does not match String example = "This is an example string";

  • \A(ON|OFF)\Z
    • Matches String example = "ON";
    • Matches String example = "OFF";
    • Does not match String example = "ON OFF";

  • \A(word1|word2|word3)\Z
    • Matches String example = "word1";
    • Matches String example = "word2";
    • Matches String example = "word3";

  • \A(buffalo[,\s]*)+\Z
    • Matches any number of "buffalo", separated by commas or spaces.
    • The separator is optional (* rather than +) so that the last "buffalo" can end the string without a trailing comma or space.
    • Matches String sentence = "buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo";

  • \A((buffalo|yaks)[,\s]*)+\Z
    • Matches any number of "buffalo" or "yaks", separated by commas or spaces.
    • Matches String sentence = "buffalo yaks buffalo yaks buffalo buffalo buffalo yaks";

  • \A(a|b|body|br|button|colgroup|dd|div|dt|em|file|font|footer|form|h1|h2|h3|h4|h5|head|header|html|i|iframe|li|ol|p|pane|pre|seq|span|tab|tbody|td|text|textarea|th|tr|tt|type|ul|wrapper)\Z
    • Matches a string entirely composed of an html item.
    • Matches String htmlTag = "footer";

  • \A((a|b|body|br|button|colgroup|dd|div|dt|em|file|font|footer|form|h1|h2|h3|h4|h5|head|header|html|i|iframe|li|ol|p|pane|pre|seq|span|tab|tbody|td|text|textarea|th|tr|tt|type|ul|wrapper)[,\s]*)+\Z
    • Matches a string entirely composed of html items, separated by commas or spaces
    • Matches String htmlTags = "header, footer, div, br";

  • \A[A-Z\s]+\Z
    • Match a string of only spaces and UPPERCASE characters.
    • Matches String example = "EXAMPLE STRING";

  • \A[A-Z\W]*\b(GET|POST)\b[A-Z\W]*\Z
    • \W = non-word. [A-Z\W] = UPPERCASE or non-word
    • \b = word boundary.
    • Matches String nonsenseQuery = "QUERY,---GET---,EXECUTE();";
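
The effect of the \A and \Z anchors is easy to see when a content filter is applied as a partial match, as described earlier for content filters. This is a minimal sketch using java.util.regex. The class name, the helper, and the non-matching sample strings are illustrative, and the Workbench may apply flags that this sketch does not.

```java
import java.util.regex.Pattern;

// Tries content filter patterns from this section against sample string contents.
final class ContentFilterSketch {

    // Assumption: a content filter is applied as a partial match against the string's content.
    static boolean filters(String regex, String content) {
        return Pattern.compile(regex).matcher(content).find();
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"example string", "This is an example string"},
            {"\\Aexample string\\Z", "example string"},
            {"\\Aexample string\\Z", "This is an example string"},
            {"\\A(ON|OFF)\\Z", "ON"},
            {"\\A(ON|OFF)\\Z", "ON OFF"},
            {"\\A[A-Z\\s]+\\Z", "EXAMPLE STRING"},
            {"\\A[A-Z\\s]+\\Z", "Example string"},
            {"\\A[A-Z\\W]*\\b(GET|POST)\\b[A-Z\\W]*\\Z", "QUERY,---GET---,EXECUTE();"},
            {"\\A[A-Z\\W]*\\b(GET|POST)\\b[A-Z\\W]*\\Z", "QUERY,---GETTER---"}
        };
        for (String[] c : cases) {
            System.out.printf("%-45s %-30s filtered=%b%n", c[0], c[1], filters(c[0], c[1]));
        }
    }
}
```

Without the anchors, a pattern only has to occur somewhere in the content. With them, the whole content has to fit the pattern, which keeps short filters such as ON or GET from hiding longer, translatable strings.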

Variable Filters

  • (error)?[Mm]essage
    • Matches String errorMessage = "Something went wrong";
    • Matches String message = "Something went wrong";

  • (log|error)[Mm]sg
    • Matches String errorMsg = "Something went wrong";
    • Matches String logMsg = "Something happened";

  • (log|err|error)[Mm](sg|essage)
    • Matches String errMsg = "Something went wrong";
    • Matches String errorMessage = "Something went wrong";
    • Matches String logMessage = "Something happened";

  • [^\s]+[Mm]ode
    • Matches one or more non-space characters followed by 'Mode' or 'mode'
      • The [^\s] allows including special characters such as - and _.
    • Matches String someMode = "Number 37";
    • Matches String some_mode = "Number 38";

  • [A-Z]+_[A-Z]+(_|[A-Z]+)*
    • Matches UPPERCASE variables with _ word separators.
    • Requires at least 2 words.
    • Allows any number of words.
    • Matches public static final String SOME_VARIABLE = "Constant stored text";
    • Matches public static final String THIS_IS_A_RATHER_LONG_VARIABLE = "Containing a short sentence.";
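
The variable patterns above can be checked against candidate variable names in the same way. This sketch assumes that the pattern has to match the whole variable name, which is an inference from the examples and not something this article states. Under partial matching the same patterns would also match longer names that contain them. The class name and the non-matching sample names are illustrative.

```java
import java.util.regex.Pattern;

// Tries variable filter patterns from this section against sample variable names.
final class VariableFilterSketch {

    // Assumption: the filter must match the entire variable name.
    static boolean filters(String regex, String variableName) {
        return Pattern.matches(regex, variableName);
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"(error)?[Mm]essage", "errorMessage"},
            {"(error)?[Mm]essage", "message"},
            {"(log|error)[Mm]sg", "errMsg"},
            {"(log|err|error)[Mm](sg|essage)", "errMsg"},
            {"(log|err|error)[Mm](sg|essage)", "logMessage"},
            {"[^\\s]+[Mm]ode", "some_mode"},
            {"[^\\s]+[Mm]ode", "mode"},
            {"[A-Z]+_[A-Z]+(_|[A-Z]+)*", "THIS_IS_A_RATHER_LONG_VARIABLE"},
            {"[A-Z]+_[A-Z]+(_|[A-Z]+)*", "CONSTANT"}
        };
        for (String[] c : cases) {
            System.out.printf("%-34s %-32s filtered=%b%n", c[0], c[1], filters(c[0], c[1]));
        }
    }
}
```

The cases that come back false show the limits of each pattern: errMsg needs the err alternative, mode needs at least one character before it, and CONSTANT has no underscore and so is not a two-word name.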

Line Filters

Any of the above filters will work as line filters. See the tricks below to avoid picking up extra issues.

  • methodName\(
    • \( ensures that only methods are detected.
    • Matches methodName("Some string");
    • Does not match String methodName = "someName";

  • String varName
    • Catch variables only as they are initially assigned.
    • Matches String varName = "someName";
    • Does not match varName = "someName";
    • Does not match if (varName == "someName") { ...

  • varName\s*(<|<=|=|==|!=|>|>=)
    • Imitates a variable filter.
    • Matches varName = "someName";
    • Matches if (varName == "someName") { ...
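
The three line filter tricks can be compared by running each pattern against the same set of code lines. This is a minimal sketch using java.util.regex with a partial match against the whole line, as described earlier for line filters. The class name, the helper, and the last sample line are illustrative.

```java
import java.util.regex.Pattern;

// Tries the line filter patterns from this section against sample code lines.
final class LineFilterSketch {

    // A line filter succeeds if the pattern matches any part of the code line.
    static boolean filters(String regex, String codeLine) {
        return Pattern.compile(regex).matcher(codeLine).find();
    }

    public static void main(String[] args) {
        String[] patterns = {
            "methodName\\(",
            "String varName",
            "varName\\s*(<|<=|=|==|!=|>|>=)"
        };
        String[] lines = {
            "methodName(\"Some string\");",
            "String methodName = \"someName\";",
            "String varName = \"someName\";",
            "varName = \"someName\";",
            "if (varName == \"someName\") { ...",
            "print(varName, \"someName\");"
        };
        for (String pattern : patterns) {
            System.out.println(pattern);
            for (String line : lines) {
                System.out.printf("    %-36s filtered=%b%n", line, filters(pattern, line));
            }
        }
    }
}
```

The last sample line shows why the operator suffix matters: the variable name appears on the line, but it is not followed by an operator, so the third pattern leaves that issue alone.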

More Examples

Default rule sets come with many pre-configured rules. You can review these rules for more examples and inspiration.