Developing Plug Ins
Developing plug-ins requires experience in Python 3 programming. Assuming that most plug-ins to be developed will operate on parsed PeopleCode, two additional requirements apply:
- Experience with parsers defined and generated using ANTLR 4; and
- Experience with programming applications based on listeners or visitors generated by ANTLR 4.
The PeopleCode lexer and parser grammars used to generate the `peoplecodeparser` package can be found in its GitHub repository. In particular, it is important to understand how the rules in the parser grammar nest inside one another, in order to traverse the parse tree and look for specific tokens.
The Definitive ANTLR 4 Reference by Terrence Parr is recommended reading to understand the ANTLR 4 parser generator architecture, how to interpret grammars, and how to code using listeners and visitors. Since the book is based on parsers generated for Java, additional resources are recommended below for Python applications and general examples:
- JSZheng's GitHub repository where the book's examples are ported to Python 3;
- The ANTLR Mega Tutorial by Federico Tomassetti;
- The ANTLR website and GitHub documentation page.
For simple rules, however, the easiest approach is to use the `SQLExecRule` class as a template for the new plug-in, complemented by the empty method signatures in the `PeopleCodeParserListener` class.
Once the plug-ins have been developed, they should be installed into the system packages so that the Static Code Analyzer engine can locate them when referenced from the configuration file.
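How the engine resolves the class names given in the configuration file is not described here. As a minimal sketch, assuming rules are referenced by fully qualified dotted paths (the helper `load_rule_class` is hypothetical), the standard `importlib` module shows why a plug-in has to be importable, i.e. installed, before any engine can load it by name:

```python
import importlib


def load_rule_class(dotted_path: str) -> type:
    """Resolve a 'package.module.ClassName' string to the class object.

    Hypothetical helper for illustration only; it assumes rules are
    referenced by fully qualified dotted paths.
    """
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"not a dotted path: {dotted_path!r}")
    # import_module only succeeds if the package is on sys.path,
    # e.g. because it was installed into the environment with pip
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Standard-library example; a plug-in would use its own installed package path
print(load_rule_class("collections.OrderedDict"))
```

If the plug-in package is not installed, `import_module` raises `ModuleNotFoundError`, which is the kind of failure that installing the plug-in into the system packages avoids.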
This section explains how the delivered `SQLExecRule` class works, with a focus on how its design relates to the PeopleCode parser.
First, to understand how the parser interprets a `SQLExec` function call, we look at the `simpleFunctionCall` parser rule:
```antlr
simpleFunctionCall
    : genericID LPAREN functionCallArguments? RPAREN
    ;
```
This rule tells us that a simple function call consists of a `genericID` (i.e., the function's name) followed by an optional list of arguments enclosed in required parentheses. That list of arguments is defined as follows:
```antlr
functionCallArguments
    : expression (COMMA expression)*
    ;
```
The last thing we need to look at is the `expression` parser rule:
```antlr
expression
    : LPAREN expression RPAREN                           #ParenthesizedExpr
    | AT expression                                      #AtExpr
    | objectCreate                                       #ObjectCreateExpr
    | expression AS (appClassPath | genericID)           #ClassCastExpr
    | expression LBRACKET expressionList RBRACKET        #ArrayIndexExpr
    | simpleFunctionCall                                 #FunctionCallExpr
    | expression dotAccess+                              #DotAccessExpr
    | expression DOT StringLiteral                       #StringObjectReferenceExpr
    | expression LPAREN expression RPAREN                #ImplicitSubindexExpr
    | SUBTR expression                                   #NegationExpr
    | <assoc=right> expression EXP expression            #ExponentialExpr
    | expression op=(STAR | DIV) expression              #MultDivExpr
    | expression op=(ADD | SUBTR) expression             #AddSubtrExpr
    | NOT expression                                     #NotExpr
    | expression NOT? op=(LE | GE | LT | GT) expression  #ComparisonExpr
    | expression NOT? op=(NEQ | EQ) expression           #EqualityExpr
    | expression op=(AND | OR) expression                #AndOrExpr
    | expression PIPE expression                         #ConcatenationExpr
    | literal                                            #LiteralExpr
    | ident                                              #IdentifierExpr
    | appClassPath                                       #MetadataExpr
    ;
```
This is one of the most important rules in the parser: practically anything can be an `expression`, and the rule is defined recursively, which adds complexity. For the purposes of the `SQLExecRule` class, however, we are only interested in detecting cases where the first argument is either a string literal or a concatenation expression. Below is an annotated excerpt of that class, showing how to probe for the presence of specific parser and lexer elements when a rule has several alternatives:
```python
# Enter a parse tree produced by PeopleCodeParser#simpleFunctionCall.
def enterSimpleFunctionCall(self, ctx: PeopleCodeParser.SimpleFunctionCallContext):
    """Event triggered when a simple function call is found."""
    # Get the line where the simpleFunctionCall starts in the source
    line = ctx.start.line
    if self.is_position_in_intervals(line):
        # We know that the rule will contain a genericID, and it may or may not
        # contain an allowableFunctionName
        function_name = ctx.genericID().allowableFunctionName()
        # If allowableFunctionName is present, only act if it's "SQLExec"
        if function_name and function_name.getText().upper() == 'SQLEXEC':
            # Obtain the list of functionCallArguments
            args = ctx.functionCallArguments()
            # We should never have a SQLExec call with zero arguments,
            # but best to check anyway
            if args:
                # Isolate the first argument
                expr = args.expression(i=0)
                # If the expression is a literal (#LiteralExpr) or a
                # concatenation using "|" (#ConcatenationExpr), create
                # an error report
                if hasattr(expr, 'literal'):
                    message = 'SQLExec with literal first argument'
                elif hasattr(expr, 'PIPE'):
                    message = 'SQLExec with concatenated first argument'
                else:
                    message = None
                if message:
                    # Create an error report
                    report = Report(
                        self.code, message,
                        line=line, column=(ctx.start.column + 1),
                        text=ctx.getText(),
                        detail=('The first argument to SQLExec should be '
                                'either a SQL object reference or a '
                                'variable with dynamically generated '
                                'SQL.'))
                    self.reports.append(report)
[...]
```
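The `hasattr` checks work because ANTLR 4 generates a separate context subclass for each labeled alternative of a rule (`#LiteralExpr`, `#ConcatenationExpr`, and so on), and each subclass only has accessor methods for the elements of its own alternative. The stand-in classes below are hypothetical: they merely imitate the shape of the generated contexts so the idea can be run without the parser package. With the real parser, an `isinstance` test against the generated class (presumably `PeopleCodeParser.LiteralExprContext`) would be an equivalent alternative.

```python
class ExpressionContext:
    """Hypothetical stand-in for the generated base context of `expression`."""


class LiteralExprContext(ExpressionContext):
    # Only the #LiteralExpr alternative exposes a literal() accessor
    def literal(self):
        return "literal-subtree"


class ConcatenationExprContext(ExpressionContext):
    # Only the #ConcatenationExpr alternative exposes a PIPE() accessor
    def PIPE(self):
        return "|"


class IdentifierExprContext(ExpressionContext):
    # e.g. a variable holding dynamically generated SQL
    def ident(self):
        return "&sql"


def classify_first_argument(expr):
    """Mirror the checks SQLExecRule makes on the first argument."""
    if hasattr(expr, "literal"):
        return "SQLExec with literal first argument"
    if hasattr(expr, "PIPE"):
        return "SQLExec with concatenated first argument"
    return None


for ctx in (LiteralExprContext(), ConcatenationExprContext(), IdentifierExprContext()):
    print(type(ctx).__name__, "->", classify_first_argument(ctx))
```

Note that these checks only inspect the top-level alternative: a literal wrapped in parentheses produces a `#ParenthesizedExpr` context, which has neither accessor and is therefore not flagged.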
This is an absurd but fully functional example of a custom plug-in that ensures that all locally scoped variables start with the prefix `&yo`. The example is implemented as part of the repository's tests.
First, we consider the different ways that locally-scoped variables can be declared:
- They can be declared individually: `Local string &var1;`
- They can be declared collectively: `Local number &var2, &var3, &var4;`
- They can be declared and assigned an initial value: `Local string &var5 = "Some value";` or `Local number &var6 = (&var2 + &var3) * &var4;`
There are two parser rules which cover these three possibilities:
```antlr
// Local variable declaration (individual or collective) without assignment
localVariableDefinition
    : LOCAL typeT USER_VARIABLE (COMMA USER_VARIABLE)* COMMA?  // trailing comma is allowed
    ;

// Local variable declaration with assignment
localVariableDeclAssignment
    : LOCAL typeT USER_VARIABLE EQ expression
    ;
```
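To make the three declaration forms concrete, here is a rough approximation of what the two rules match, written with a regular expression instead of the parser. This is a swapped-in technique for illustration only, not how the plug-in works: it assumes single-word types such as `string` or `number` (so `array of string` is missed), and it would be fooled by comments or string literals that look like declarations. The names `declared_locals` and `noncompliant_locals` are hypothetical.

```python
import re
from typing import List

# Approximates: LOCAL typeT USER_VARIABLE (COMMA USER_VARIABLE)* COMMA? ";"
#          and: LOCAL typeT USER_VARIABLE EQ expression
LOCAL_DECL = re.compile(
    r"\bLocal\s+\w+\s+(&\w+(?:\s*,\s*&\w+)*)\s*,?\s*(?=[=;])",
    re.IGNORECASE,  # PeopleCode keywords are case-insensitive
)


def declared_locals(source: str) -> List[str]:
    """Return the names of locally declared variables, in source order."""
    names = []
    for match in LOCAL_DECL.finditer(source):
        names.extend(name.strip() for name in match.group(1).split(","))
    return names


def noncompliant_locals(source: str, prefix: str) -> List[str]:
    """Return declared names that do not start with the required prefix."""
    return [name for name in declared_locals(source) if not name.startswith(prefix)]


print(noncompliant_locals("Local string &yoName;\nLocal number &var2, &var3;", "&yo"))
# ['&var2', '&var3']
```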
The `LocalVariableNamingRule` class starts off as follows:
```python
from pscodeanalyzer.engine import Report
from pscodeanalyzer.rules.peoplecode import PeopleCodeParserListenerRule
from peoplecodeparser.PeopleCodeParser import PeopleCodeParser


class LocalVariableNamingRule(PeopleCodeParserListenerRule):
    """Rule to enforce locally-defined variable naming convention.

    The following configuration options apply:
    - "variable_prefix": the prefix with which all locally-defined
      variables must begin
    """

    def __init__(self, config):
        """Initialize the rule."""
        super(LocalVariableNamingRule, self).__init__(config)
        self.variable_prefix = config.get('variable_prefix')
        if not self.variable_prefix:
            raise ValueError('empty variable_prefix is not allowed')
```
Here we can see that the class defines a new required property, `variable_prefix`, for the settings file. Plug-ins are not obliged to define new properties, but doing so helps generalize and future-proof the solution.
Next, we add the two methods that are triggered whenever either of the two aforementioned parser rules is encountered:
```python
# Enter a parse tree produced by PeopleCodeParser#localVariableDefinition.
def enterLocalVariableDefinition(self, ctx: PeopleCodeParser.LocalVariableDefinitionContext):
    """Event triggered when a local variable definition is found.

    Local variable definitions are of the following forms:
        Local string &var1;
        Local number &var2, &var3, &var4;
    """
    self._verify_user_variables(ctx)

# Enter a parse tree produced by PeopleCodeParser#localVariableDeclAssignment.
def enterLocalVariableDeclAssignment(self, ctx: PeopleCodeParser.LocalVariableDeclAssignmentContext):
    """Event triggered for local variable definition-assignments.

    These are of the form:
        Local string &var5 = "Some value";
        Local number &var6 = (&var2 + &var3) * &var4;
    """
    self._verify_user_variables(ctx)
```
Because both rules must be processed the same way, the actual work is delegated to a pair of private methods. The first, `_verify_user_variables`, handles the initial stages of processing:
```python
def _verify_user_variables(self, ctx):
    """Verify if variable names are compliant.

    The local variable definition parser rule contains a list of
    USER_VARIABLE tokens, whereas the local variable declaration and
    assignment parser rule contains a single one.
    """
    # Determine if the line where the declaration occurs is within
    # the (optional) intervals under analysis
    line = ctx.start.line
    if self.is_position_in_intervals(line):
        # Obtain the USER_VARIABLE token(s) from the context
        user_variable = ctx.USER_VARIABLE()
        # The localVariableDefinition parser rule allows for collective
        # variable declaration, so the USER_VARIABLE method will return
        # a list of tokens, regardless of how many variables were
        # declared. In contrast, the localVariableDeclAssignment rule
        # only allows for a single variable to be declared and assigned,
        # so the USER_VARIABLE method will always return a single token.
        if isinstance(user_variable, list):
            for uv in user_variable:
                self._process_single_variable(uv)
        else:
            self._process_single_variable(user_variable)
```
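The list-or-single-token distinction comes from how ANTLR 4's Python target generates token accessors: a token referenced more than once in a rule gets an accessor with an optional index that returns all matching tokens as a list when no index is given, while a token referenced once gets an accessor that returns a single token (or `None`). The classes below are hypothetical stand-ins that only imitate those two shapes, to sketch a small helper that normalizes both into a list as an alternative to the branch above:

```python
from typing import List, Optional


class FakeToken:
    """Hypothetical stand-in for an ANTLR terminal node."""

    def __init__(self, text: str):
        self.text = text

    def getText(self) -> str:
        return self.text


class MultiDeclContext:
    """Imitates localVariableDefinition: USER_VARIABLE() returns a list."""

    def __init__(self, names):
        self._tokens = [FakeToken(name) for name in names]

    def USER_VARIABLE(self, i: Optional[int] = None):
        return self._tokens if i is None else self._tokens[i]


class SingleDeclContext:
    """Imitates localVariableDeclAssignment: USER_VARIABLE() returns one token."""

    def __init__(self, name):
        self._token = FakeToken(name)

    def USER_VARIABLE(self):
        return self._token


def user_variables(ctx) -> List[FakeToken]:
    """Return the USER_VARIABLE token(s) of either context as a list."""
    result = ctx.USER_VARIABLE()
    if result is None:
        return []
    return result if isinstance(result, list) else [result]


print([t.getText() for t in user_variables(MultiDeclContext(["&var2", "&var3"]))])
print([t.getText() for t in user_variables(SingleDeclContext("&var5"))])
```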
The actual naming convention validation is delegated to the `_process_single_variable` method:
```python
def _process_single_variable(self, user_variable):
    """Verify if an individual variable name is compliant."""
    if user_variable:
        # The getText method returns the token's text as found
        # in the source file
        var_name = user_variable.getText()
        # We check if the token starts with the prefix provided
        # in the settings, and act if it doesn't
        if not var_name.startswith(self.variable_prefix):
            # This is equivalent to the first statement in the
            # _verify_user_variables method: line = ctx.start.line
            line = user_variable.parentCtx.start.line
            # The getSymbol method returns an object with various
            # attributes about the token, including the column on
            # which it begins in the source line (note that the
            # column is zero-indexed, so we add 1 for readability)
            column = user_variable.getSymbol().column + 1
            # Finally, we assemble the report and append it to the
            # Rule's list of reports
            message = (f'Variable name "{var_name}" does not start with '
                       f'"{self.variable_prefix}"')
            report = Report(
                self.code, message, line=line, column=column,
                text=var_name,
                detail=('The variable name does not begin with the prefix '
                        f'"{self.variable_prefix}".'))
            self.reports.append(report)
```
This example shows how to use only native Python features (i.e., without the PeopleCode parser) to enforce an opinionated rule: a maximum line length of 79 characters. The example is implemented as part of the repository's tests.
The `LineLengthRule` class starts off as follows:
```python
from pscodeanalyzer.engine import Report, Rule


class LineLengthRule(Rule):
    """Rule to enforce maximum line lengths.

    The following configuration options apply:
    - "max_length": an integer indicating the maximum acceptable line
      length
    """

    def __init__(self, config):
        """Construct a rule with the given configuration."""
        super(LineLengthRule, self).__init__(config)
        self.max_length = int(config.get('max_length'))
        if self.max_length <= 0:
            raise ValueError('max_length must be a positive integer')
```
As in the previous example, the class defines a new required property for the settings file, in this case `max_length`, which must be a positive integer.
Next, we override the `evaluate_file` method to process the source file line by line:
```python
def evaluate_file(self, source_file):
    """Evaluate the rule against the provided file-like object.

    The file must already be open.
    Returns a list of Report objects.
    """
    # Start with a clean slate
    reports = []
    # Process each line of the source file in turn
    for line, text in enumerate(source_file, start=1):
        # Determine if the current line is within the (optional)
        # intervals under analysis
        if self.is_position_in_intervals(line):
            # Iterating over a file keeps each line's terminator, so
            # strip it before comparing against the configured maximum
            line_length = len(text.rstrip('\r\n'))
            if line_length > self.max_length:
                # Finally, we assemble the report and add it to the
                # list returned by this method
                report = Report(
                    self.code, self.default_message,
                    report_type=self.default_report_type, line=line,
                    text=text,
                    detail=f'Line {line} has a length of {line_length}.')
                reports.append(report)
    return reports
```
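The core check can be tried without the engine. The function below is a hypothetical standalone sketch of the same logic, without the `Rule` base class, interval filtering, or `Report` objects, using `io.StringIO` as the open file-like object. It also shows why the line terminator must be stripped: without `rstrip`, the first 79-character line would be measured as 80 and wrongly reported.

```python
import io
from typing import List, Tuple


def long_lines(source_file, max_length: int) -> List[Tuple[int, int]]:
    """Return (line number, length) pairs for lines longer than max_length.

    Hypothetical standalone version of LineLengthRule.evaluate_file.
    """
    offenders = []
    for line, text in enumerate(source_file, start=1):
        # Lines read from a file keep their trailing newline,
        # which should not count towards the line length
        line_length = len(text.rstrip("\r\n"))
        if line_length > max_length:
            offenders.append((line, line_length))
    return offenders


sample = io.StringIO("x" * 79 + "\n" + "y" * 80 + "\n" + "short\n")
print(long_lines(sample, 79))  # [(2, 80)]
```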