BARF: Reflex

Throughout this document, the term "primary source file" refers to the .reflex source file that is written by the user and parsed directly by reflex.

Using Reflex From The Command Line

$ reflex -I path/to/targets/directory input.reflex -o desired/output/directory/

Format Of The Reflex Source File

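Everything at the top of the file, above the %% delimiter, is the preamble. The preamble is where meta-properties of the scanner, such as its targets and the scanner mode it starts in, are declared via directives (for example %targets, %macro and %start_in_scanner_mode in the example below).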

Note that the preambles of reflex and trison are nearly identical in format, the exception being tool-specific directives such as %macro on the reflex side and XXX (TODO: fill in later) on the trison side.

Below the %% delimiter is the scanner mode specification. It consists of one or more scanner modes, each containing zero or more regex rules. Each scanner mode is a distinct mode of operation that dictates which regex rules can be accepted at any given time: when the scanner is in a particular scanner mode, only the regex rules specified within that mode can be accepted. The generated scanner code provides facilities for switching the current scanner mode at runtime.

Scanner modes are useful for separating modes of operation, such as scanning the body of a string literal or ignoring everything except the closing delimiter of a block-style comment.
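To make the idea of mode-gated rules concrete, here is a minimal C++ sketch of a scanner that keeps a separate rule list per mode and, on each step, tries only the rules of the current mode. It is a hypothetical illustration, not the code reflex generates: ModalScanner, Rule and Step are invented names, and preferring the longest match (with ties going to the earlier rule) is an assumption of the sketch. The Mode enumerators and the ScannerMode method merely echo the handlers in the example below.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of mode-gated scanning; not reflex-generated code.
enum class Mode { MAIN, BLOCK_COMMENT };

// A rule reports how many characters it matches at a position (0 = no match)
// and carries the handler to run when it is chosen.
struct Rule {
    std::function<std::size_t(const std::string&, std::size_t)> match;
    std::function<void()> handler;
};

class ModalScanner {
public:
    void AddRule(Mode mode, Rule rule) { rules_[mode].push_back(std::move(rule)); }
    void ScannerMode(Mode mode) { mode_ = mode; }  // callable from handlers
    Mode CurrentMode() const { return mode_; }

    // Tries only the current mode's rules, runs the handler of the longest
    // match (earlier rule wins ties), and returns the characters consumed.
    std::size_t Step(const std::string& input, std::size_t pos) {
        const Rule* best = nullptr;
        std::size_t best_len = 0;
        for (const Rule& rule : rules_[mode_]) {
            std::size_t len = rule.match(input, pos);
            if (len > best_len) {
                best = &rule;
                best_len = len;
            }
        }
        if (best != nullptr) best->handler();  // the handler may switch modes
        return best_len;
    }

private:
    Mode mode_ = Mode::MAIN;
    std::map<Mode, std::vector<Rule>> rules_;
};
```

A handler switches modes simply by calling ScannerMode, so a rule for /* in MAIN can hand control to a BLOCK_COMMENT rule set in which */ is the only thing of interest; while in BLOCK_COMMENT, the MAIN rules are never even tried.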

Each regex rule in each scanner mode must specify, for each target declared in the preamble, a segment of code known as a rule handler. This code is executed when the generated scanner successfully matches the corresponding regex. The following complete source file illustrates the overall format.

// ///////////////////////////////////////////////////////////////////////////
// This particular example will recognize tokens from the regular language
// consisting of integers (sequences of decimal digits), whitespace (tabs, 
// newlines and spaces), operators (plusses and asterisks), and C-style block
// comments.  There are two scanner modes, one (%scanner_mode MAIN) for 
// recognizing the first three language elements mentioned, and the second
// (%scanner_mode BLOCK_COMMENT) for recognizing the last.
// ///////////////////////////////////////////////////////////////////////////

%targets cpp

%target.cpp.class_name AwesomeScanner
%target.cpp.header_filename "awesome.hpp"
%target.cpp.implementation_filename "awesome.cpp"
%target.cpp.bottom_of_implementation_file %{
int main (int argc, char **argv)
{
    AwesomeScanner scanner;
    while (!scanner.IsAtEndOfInput())
        scanner.Scan();
    return 0;
}
%}

%macro DIGIT ([0-9])
%macro INTEGER ({DIGIT}+)

%start_in_scanner_mode MAIN

%%

%scanner_mode MAIN
:
    ({INTEGER}) %target.cpp { std::cout << "integer" << std::endl; }
|
    ([\t\n ]) %target.cpp { std::cout << "whitespace" << std::endl; }
|
    ([*+]) %target.cpp { std::cout << "operator" << std::endl; }
|
    (\z) %target.cpp { std::cout << "EOF" << std::endl; return; }
|
    (/[*]) %target.cpp { ScannerMode(Mode::BLOCK_COMMENT); }
;

%scanner_mode BLOCK_COMMENT
:
    ([*]/) %target.cpp { std::cout << "block comment" << std::endl; ScannerMode(Mode::MAIN); }
|
    ([^*]+|[*]) %target.cpp { }
|
    (\z) %target.cpp { std::cout << "unterminated block comment" << std::endl; ScannerMode(Mode::MAIN); }
;
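To clarify what the rules above specify, the following hand-written C++ function mirrors them directly, returning the line each handler would print instead of writing to std::cout. It is a sketch for illustration only, not reflex's generated scanner: ScanExample is a hypothetical name, and the treatment of characters that no rule covers (reported as "unmatched" and skipped) is an assumption, since the example defines no rule for them. In BLOCK_COMMENT the sketch tests for */ before a lone *, which is the choice a longest-match scanner would make.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hand-written approximation of the example scanner's rules (not reflex
// output). Returns the lines the rule handlers would print, in order.
std::vector<std::string> ScanExample(const std::string& in) {
    enum class Mode { MAIN, BLOCK_COMMENT };
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };  // [0-9]
    Mode mode = Mode::MAIN;
    std::vector<std::string> out;
    std::size_t pos = 0;
    for (;;) {
        if (mode == Mode::MAIN) {
            if (pos == in.size()) {                          // (\z)
                out.push_back("EOF");
                return out;                                  // the handler's "return;"
            }
            char c = in[pos];
            if (is_digit(c)) {                               // ({INTEGER})
                while (pos < in.size() && is_digit(in[pos])) ++pos;
                out.push_back("integer");
            } else if (c == '\t' || c == '\n' || c == ' ') { // ([\t\n ])
                ++pos;
                out.push_back("whitespace");
            } else if (c == '*' || c == '+') {               // ([*+])
                ++pos;
                out.push_back("operator");
            } else if (in.compare(pos, 2, "/*") == 0) {      // (/[*])
                pos += 2;
                mode = Mode::BLOCK_COMMENT;
            } else {
                // Assumption: no rule covers this character; report and skip it.
                ++pos;
                out.push_back("unmatched");
            }
        } else {  // Mode::BLOCK_COMMENT
            if (pos == in.size()) {                          // (\z)
                out.push_back("unterminated block comment");
                mode = Mode::MAIN;
            } else if (in.compare(pos, 2, "*/") == 0) {      // ([*]/)
                pos += 2;
                out.push_back("block comment");
                mode = Mode::MAIN;
            } else if (in[pos] == '*') {                     // [*] (lone asterisk)
                ++pos;
            } else {                                         // [^*]+
                while (pos < in.size() && in[pos] != '*') ++pos;
            }
        }
    }
}
```

For instance, ScanExample("12 + 3") yields integer, whitespace, operator, whitespace, integer, EOF. Because the BLOCK_COMMENT end-of-input rule switches back to MAIN, an unterminated comment is followed by EOF in this sketch.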

Using Targets

Using a particular target in a reflex scanner is straightforward: add its identifier (e.g. cpp) to the %targets directive at the top of the source file. This causes reflex to look in the targets search path (TODO: make ref) for the targetspec file corresponding to that identifier, whose filename has the canonical form reflex.XXX.targetspec, where XXX is the target's identifier. The targetspec file defines the target-specific interaction between the primary source file and the generated code, which is produced using the codespec files it indicates.
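The lookup described above can be sketched in C++ as follows. This is a hypothetical illustration: the helper names, the idea that the search path is an ordered list of directories (presumably populated by options such as -I in the command-line example), and the first-match-wins order are assumptions, not documented reflex behavior.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: canonical targetspec filename for a target identifier,
// e.g. "cpp" -> "reflex.cpp.targetspec".
std::string TargetspecFilename(const std::string& target_id) {
    return "reflex." + target_id + ".targetspec";
}

// Hypothetical helper: returns the first readable "<dir>/reflex.<id>.targetspec"
// in search-path order, or an empty string if no directory contains one.
// The search order and path joining are assumptions of this sketch.
std::string FindTargetspec(const std::vector<std::string>& search_path,
                           const std::string& target_id) {
    for (const std::string& dir : search_path) {
        std::string candidate = dir;
        if (!candidate.empty() && candidate.back() != '/') candidate += '/';
        candidate += TargetspecFilename(target_id);
        if (std::ifstream(candidate).good()) return candidate;  // file opened
    }
    return "";
}
```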

Each target listed in the %targets directive requires that certain directives, as specified by its targetspec, be defined in the primary source file. The targetspec file mainly consists of specifications for "required" and "optional" directives. All "required" directives established by the targetspec must be present in the primary source file; otherwise, descriptive error messages are emitted.
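A hypothetical C++ sketch of the required-directive check follows. The function name, the message wording, and the list of required cpp directives used in the usage note (taken from the directives in the example preamble) are assumptions for illustration; the real cpp targetspec may require a different set.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical check: for one target, report every "required" directive that
// the primary source file's preamble does not define. Message wording is an
// illustrative assumption, not reflex's actual diagnostic text.
std::vector<std::string> MissingRequiredDirectives(
        const std::string& target,
        const std::vector<std::string>& required,
        const std::set<std::string>& defined) {
    std::vector<std::string> errors;
    for (const std::string& name : required) {
        if (defined.count(name) == 0)
            errors.push_back("missing required directive %target." + target + "." + name);
    }
    return errors;
}
```

With required set to class_name, header_filename and implementation_filename, and only the first two defined, the function returns a single message naming %target.cpp.implementation_filename.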

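The targetspec will also require that the filename(s) of the output file(s) be specified; the example above does this for the cpp target with %target.cpp.header_filename and %target.cpp.implementation_filename. Target-specific directives take the following form: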

%target.target_name.directive_name directive_value

Here, target_name is the name of the target (e.g. cpp), directive_name is the targetspec-specified identifier for the particular directive, and directive_value is the value of the directive, whose type is specified by the targetspec (see The Targetspec And Codespec Paradigm for details on these types). These directives go in the preamble of the primary source file, so they are sensitive to newlines: there must not be a newline anywhere between the opening % and the beginning of directive_value.
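The newline rule can be illustrated with a small hypothetical C++ parser for the head of such a directive. It splits off target_name and directive_name and rejects the text if a newline appears before directive_value begins. TargetDirective and ParseTargetDirective are invented names, and the sketch does not interpret the value itself (a %{ ... %} code block, for example, is returned verbatim).

```cpp
#include <cstddef>
#include <string>

// Hypothetical result of splitting "%target.target_name.directive_name value".
struct TargetDirective {
    std::string target_name;
    std::string directive_name;
    std::string value;  // everything from the first character of the value on
};

// Returns false unless text has the directive form and no newline occurs
// between the opening % and the first character of directive_value.
bool ParseTargetDirective(const std::string& text, TargetDirective& out) {
    const std::string prefix = "%target.";
    if (text.compare(0, prefix.size(), prefix) != 0) return false;

    std::size_t dot = text.find('.', prefix.size());  // ends target_name
    if (dot == std::string::npos || dot == prefix.size()) return false;
    std::size_t ws = text.find_first_of(" \t\n", prefix.size());
    if (ws != std::string::npos && ws < dot) return false;  // whitespace in target_name

    std::size_t name_begin = dot + 1;
    std::size_t name_end = text.find_first_of(" \t\n", name_begin);
    if (name_end == std::string::npos || name_end == name_begin) return false;

    // Skip spaces and tabs only; reaching a newline (or the end) first is an error.
    std::size_t value_begin = text.find_first_not_of(" \t", name_end);
    if (value_begin == std::string::npos || text[value_begin] == '\n') return false;

    out.target_name = text.substr(prefix.size(), dot - prefix.size());
    out.directive_name = text.substr(name_begin, name_end - name_begin);
    out.value = text.substr(value_begin);
    return true;
}
```

Applied to the example's %target.cpp.header_filename "awesome.hpp", this yields target cpp, directive header_filename and the quoted value "awesome.hpp"; moving the value onto the next line makes it return false.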

In addition to supplying the directive values required by the targetspec, you must also add rule handlers for all regex rules, using the following form:

    (whatever regex) %target.XXX { code goes here }

These declarations are in the body of the primary source file, so they are not newline-sensitive. If any regex rule lacks a rule handler for any declared target, error messages will be emitted.
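The coverage requirement amounts to a check over every (rule, target) pair. The following C++ sketch models a parsed rule as its regex text plus a map from target identifier to handler code; RegexRule, MissingHandlers and the error wording are hypothetical and do not describe reflex's internals.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a parsed regex rule: the regex text plus one handler
// body per target that the rule supplies, keyed by target identifier.
struct RegexRule {
    std::string regex;
    std::map<std::string, std::string> handlers;
};

// Reports every (rule, target) pair lacking a handler. Message wording is an
// illustrative assumption, not reflex's actual diagnostic text.
std::vector<std::string> MissingHandlers(const std::vector<std::string>& targets,
                                         const std::vector<RegexRule>& rules) {
    std::vector<std::string> errors;
    for (const RegexRule& rule : rules)
        for (const std::string& target : targets)
            if (rule.handlers.count(target) == 0)
                errors.push_back("rule " + rule.regex + " has no %target." + target + " handler");
    return errors;
}
```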

At this point, you should have enough working knowledge to use reflex to implement a scanner using the target of your choice.