Reflex

TODO: detailed description of what reflex is

Throughout this document, the term "primary source file" refers to the .reflex source file that the user writes and that reflex parses directly.

Using Reflex From The Commandline

To print the help message, issue the following command (the dollar sign indicates a Unix-style command prompt and is not part of the command).
$ reflex -h
The help message describes all commandline options in detail. The simplest way to use reflex to compile a .reflex source file is to run it as follows.
$ reflex -I path/to/targets/directory input.reflex -o desired/output/directory/
On operating systems that use backslashes as path separators, replace the slashes accordingly. The "targets" directory is the directory containing the reflex.*.targetspec and reflex.*.codespec files. In the BARF tarball, this is the barf/targets directory. Instead of using the -I option, you can also set the environment variable BARF_TARGETS_SEARCH_PATH to the absolute path of the targets directory.

Format Of The Reflex Source File

A primary source file is divided into two parts -- the preamble and the scanner specification.

Everything at the top of the file, above the %% delimiter, is the preamble. The preamble is where certain meta-properties are declared via directives.

The preamble is newline-sensitive, so each directive must be on its own line (though there can be an arbitrary number of newlines between directive lines).

Note that the preambles for reflex and trison are nearly identical in format -- the exception being the tool-specific directives such as %macro on the reflex side, and XXX (TODO: fill in later) on the trison side.

Below the %% delimiter is the scanner mode specification. It consists of one or more scanner modes, each containing zero or more regex rules. Each scanner mode is a distinct mode of operation which dictates which regex rules can be accepted at any given time. When the scanner is in a particular scanner mode, only the regex rules specified within that mode can be accepted. The generated scanner code provides facilities for switching the current scanner mode at runtime.

Scanner modes are useful in separating modes of operation, such as scanning the body of a string literal, or ignoring everything except the closing delimiter of a block-style comment.

Each regex rule in each scanner mode must specify, for each target declared in the preamble, a segment of code known as a rule handler. This code will be executed when the corresponding regex has been successfully matched by the generated scanner.

The following is an example primary source file which uses the cpp target.

// ///////////////////////////////////////////////////////////////////////////
// This particular example will recognize tokens from the regular language
// consisting of integers (sequences of decimal digits), whitespace (tabs, 
// newlines and spaces), operators (plusses and asterisks), and C-style block
// comments.  There are two scanner modes, one (%scanner_mode MAIN) for 
// recognizing the first three language elements mentioned, and the second
// (%scanner_mode BLOCK_COMMENT) for recognizing the last.
// ///////////////////////////////////////////////////////////////////////////

%targets cpp

%target.cpp.class_name AwesomeScanner
%target.cpp.header_filename "awesome.hpp"
%target.cpp.implementation_filename "awesome.cpp"
%target.cpp.bottom_of_implementation_file %{
int main (int argc, char **argv)
{
    AwesomeScanner scanner;
    while (!scanner.IsAtEndOfInput())
        scanner.Scan();
    return 0;
}
%}

%macro DIGIT ([0-9])
%macro INTEGER ({DIGIT}+)

%start_in_scanner_mode MAIN

%%

%scanner_mode MAIN
:
    ({INTEGER}) %target.cpp { std::cout << "integer" << std::endl; }
|
    ([\t\n ]) %target.cpp { std::cout << "whitespace" << std::endl; }
|
    ([*+]) %target.cpp { std::cout << "operator" << std::endl; }
|
    (\z) %target.cpp { std::cout << "EOF" << std::endl; return; }
|
    (/[*]) %target.cpp { ScannerMode(Mode::BLOCK_COMMENT); }
;

%scanner_mode BLOCK_COMMENT
:
    ([*]/) %target.cpp { std::cout << "block comment" << std::endl; ScannerMode(Mode::MAIN); }
|
    ([^*]+|[*]) %target.cpp { }
|
    (\z) %target.cpp { std::cout << "unterminated block comment" << std::endl; ScannerMode(Mode::MAIN); }
;
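
To make the mode switching in the example above more concrete, the following hand-written C++ sketch approximates how a two-mode scanner of this kind behaves. It is not the code that reflex generates and it does not use the generated scanner's API; the names Mode, ScanAll and the "unrecognized" token are assumptions made purely for illustration, and the end-of-input rule of the MAIN mode is left out for brevity. Each branch stands in for one regex rule, and the statements inside it stand in for that rule's handler.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names throughout; this is an illustration, not reflex output.
enum class Mode { MAIN, BLOCK_COMMENT };

// Scans `input` and returns the names of the recognized tokens, in order.
std::vector<std::string> ScanAll (std::string const &input)
{
    std::vector<std::string> tokens;
    Mode mode = Mode::MAIN;
    std::size_t i = 0;
    while (i < input.size())
    {
        if (mode == Mode::MAIN)
        {
            unsigned char c = input[i];
            if (std::isdigit(c))
            {
                // integer: consume the longest run of decimal digits
                while (i < input.size() && std::isdigit(static_cast<unsigned char>(input[i])))
                    ++i;
                tokens.push_back("integer");
            }
            else if (c == '\t' || c == '\n' || c == ' ')
            {
                ++i;
                tokens.push_back("whitespace");
            }
            else if (input.compare(i, 2, "/*") == 0)
            {
                // opening delimiter: only the BLOCK_COMMENT rules apply from here on
                i += 2;
                mode = Mode::BLOCK_COMMENT;
            }
            else if (c == '*' || c == '+')
            {
                ++i;
                tokens.push_back("operator");
            }
            else
            {
                // no MAIN rule matches this character
                ++i;
                tokens.push_back("unrecognized");
            }
        }
        else // Mode::BLOCK_COMMENT
        {
            if (input.compare(i, 2, "*/") == 0)
            {
                // closing delimiter: report the comment and return to MAIN
                i += 2;
                tokens.push_back("block comment");
                mode = Mode::MAIN;
            }
            else
                ++i; // ignore everything else inside the comment
        }
    }
    // input ended while still inside a comment
    if (mode == Mode::BLOCK_COMMENT)
        tokens.push_back("unterminated block comment");
    return tokens;
}
```

For instance, ScanAll("12 +/* 3 * */4") yields integer, whitespace, operator, block comment, integer: the digit and asterisk inside the comment are never reported, because the MAIN rules are not active while the scanner is in BLOCK_COMMENT mode.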

TODO: overview of how a scanner functions

Using Targets

See The Targetspec And Codespec Paradigm.

Using a particular target in a reflex scanner is straightforward. Add its identifier (e.g. cpp) to the %targets directive at the top of the source file. This will cause reflex to look in the targets search path (TODO: make ref) for the targetspec file corresponding to the specified identifier; its filename has the canonical form reflex.XXX.targetspec, where XXX is the target's identifier. The targetspec file defines the target-specific interaction between the primary source file and the generated code, which is produced using the codespec files it indicates.
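
The naming convention can be expressed as a one-line helper. This is only a sketch of the reflex.XXX.targetspec form described above; the function name TargetspecFilename is hypothetical and is not part of reflex.

```cpp
#include <string>

// Builds the canonical targetspec filename for a target identifier,
// following the reflex.XXX.targetspec convention.
// Hypothetical helper for illustration; not part of reflex itself.
std::string TargetspecFilename (std::string const &target_id)
{
    return "reflex." + target_id + ".targetspec";
}
```

With the cpp target, TargetspecFilename("cpp") gives reflex.cpp.targetspec, which is the file reflex would look for in the targets directory.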

Adding another target to the %targets directive will cause reflex to require that certain directives, as specified by the corresponding targetspec, are defined in the primary source file. The targetspec file mainly consists of specifications for "required" and "optional" directives. As expected, all "required" directives established by the targetspec must be present in the primary source file; otherwise, descriptive error messages will be emitted.

The targetspec file will also require that the filename(s) of the output file(s) be specified.

A targetspec directive value is specified in the following manner.

%target.target_name.directive_name directive_value

Here, target_name is the name of the target (e.g. cpp), directive_name is the targetspec-specified identifier for the particular directive, and directive_value is the value for the directive, of a type specified by the targetspec (see The Targetspec And Codespec Paradigm for details on these types). These directives go in the preamble of the primary source file, so they are sensitive to newlines; there must not be a newline anywhere between the opening % and the beginning of directive_value.

In addition to adding the directive values required by the targetspec, you must also add rule handlers for all regex rules, using the following form.

    (whatever regex) %target.XXX { code goes here }

These declarations are in the body of the primary source file, so they are not newline-sensitive. If rule handler code isn't present for each target for each regex, error messages will be emitted.

At this point, you should have enough working knowledge to use reflex to implement a scanner using the target of your choice.


Generated on Mon Jan 7 22:58:01 2008 for BARF by doxygen 1.5.1