Throughout this document, the term "primary source file" will be used to refer to the .reflex source file which is written by the user and is directly parsed by reflex.
For a summary of reflex's command-line options, run
$ reflex -h
The basic way to process a .reflex source file is to run reflex on it as follows.
$ reflex -I path/to/targets/directory input.reflex -o desired/output/directory/
Here, path/to/targets/directory is the directory containing the reflex.*.targetspec and reflex.*.codespec files. In the BARF tarball, this is the barf/targets directory. You can also set the environment variable BARF_TARGETS_SEARCH_PATH to the (absolute) path instead of using the -I option.
Everything at the top of the file, above the %% delimiter, is the preamble. The preamble is where certain meta-properties are declared via directives:
%targets lists the targets for which source code will be generated. If no targets are listed, reflex will process the specified scanner and internally generate the corresponding state machine, emitting warnings and errors if necessary, but will not generate any source code. For example,
%targets cpp
Each target has a corresponding reflex.*.targetspec file (e.g. reflex.cpp.targetspec for the cpp target), which defines which language-specific directives are available, which of them are required to be supplied in the .reflex source, and the parameter type of each. For example, the cpp target only requires that the class_name directive be specified; this would be done as
%target.cpp.class_name AwesomeScanner
%macro defines a named regex macro, which can then be referenced by name (e.g. {DIGIT}) in other %macro directives or in scanner regexes. For example,
%macro DIGIT ([0-9])
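%start_in_scanner_mode specifies the scanner mode in which the generated scanner starts. For example,
%start_in_scanner_mode MAIN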
It should be noted that the preambles for reflex and trison are nearly identical in format; the exception is the tool-specific directives, such as %macro on the reflex side and XXX (TODO: fill in later) on the trison side.
Below the %% delimiter is the scanner mode specification. It consists of one or more scanner modes, each containing zero or more regex rules. Each scanner mode is a distinct mode of operation which dictates which regex rules can be accepted at any given time: when the scanner is in a particular scanner mode, only the regex rules specified within that mode can be accepted. The generated scanner code provides facilities for switching the current scanner mode at runtime.
Scanner modes are useful in separating modes of operation, such as scanning the body of a string literal, or ignoring everything except the closing delimiter of a block-style comment.
Each regex rule in each scanner mode must specify, for each target declared in the preamble, a segment of code known as a rule handler. This code will be executed when the corresponding regex has been successfully matched by the generated scanner.
The following is an example primary source file which uses the cpp target.
// ///////////////////////////////////////////////////////////////////////////
// This particular example will recognize tokens from the regular language
// consisting of integers (sequences of decimal digits), whitespace (tabs,
// newlines and spaces), operators (plusses and asterisks), and C-style block
// comments. There are two scanner modes, one (%scanner_mode MAIN) for
// recognizing the first three language elements mentioned, and the second
// (%scanner_mode BLOCK_COMMENT) for recognizing the last.
// ///////////////////////////////////////////////////////////////////////////

%targets cpp

%target.cpp.class_name AwesomeScanner
%target.cpp.header_filename "awesome.hpp"
%target.cpp.implementation_filename "awesome.cpp"
%target.cpp.bottom_of_implementation_file %{
int main (int argc, char **argv)
{
    AwesomeScanner scanner;
    while (!scanner.IsAtEndOfInput())
        scanner.Scan();
    return 0;
}
%}

%macro DIGIT ([0-9])
%macro INTEGER ({DIGIT}+)

%start_in_scanner_mode MAIN

%%

%scanner_mode MAIN
    :
    ({INTEGER})
    %target.cpp {
        std::cout << "integer" << std::endl;
    }
    |
    ([\t\n ])
    %target.cpp {
        std::cout << "whitespace" << std::endl;
    }
    |
    ([*+])
    %target.cpp {
        std::cout << "operator" << std::endl;
    }
    |
    (\z)
    %target.cpp {
        std::cout << "EOF" << std::endl;
        return;
    }
    |
    (/[*])
    %target.cpp {
        ScannerMode(Mode::BLOCK_COMMENT);
    }
    ;

%scanner_mode BLOCK_COMMENT
    :
    ([*]/)
    %target.cpp {
        std::cout << "block comment" << std::endl;
        ScannerMode(Mode::MAIN);
    }
    |
    ([^*]+|[*])
    %target.cpp {
    }
    |
    (\z)
    %target.cpp {
        std::cout << "unterminated block comment" << std::endl;
        ScannerMode(Mode::MAIN);
    }
    ;
TODO: overview of how a scanner functions
Using a particular target in a reflex scanner is straightforward. Add its identifier (e.g. cpp) to the %targets directive at the top of the source file. This will cause reflex to look in the targets search path (TODO: make ref) for the targetspec file corresponding to that identifier, whose name has the canonical form reflex.XXX.targetspec, where XXX is the target's identifier. The targetspec file defines the target-specific interaction between the primary source file and the generated code, which is produced using the codespec files it indicates.
Adding another target to the %targets directive will cause reflex to require that certain directives, as specified by the corresponding targetspec, be defined in the primary source file. The targetspec file mainly consists of specifications for "required" and "optional" directives. All "required" directives established by the targetspec must be present in the primary source file; otherwise, descriptive error messages will be emitted.
The targetspec file will also require that the filename(s) of the output file(s) be specified.
A targetspec directive value is specified in the following manner.
%target.target_name.directive_name directive_value
where target_name is the name of the target (e.g. cpp), directive_name is the targetspec-specified identifier for the particular directive, and directive_value is the value for the directive, of a type specified by the targetspec (see The Targetspec And Codespec Paradigm for details on these types). These directives go in the preamble of the primary source, so they are sensitive to newlines; there must not be a newline anywhere between the opening % and the beginning of directive_value.
In addition to adding the directive values required by the targetspec, you must also add a rule handler for each target to every regex rule, using the following form.
(whatever regex) %target.XXX { code goes here }
These declarations are in the body of the primary source, so they are not newline-sensitive. If any regex rule lacks a rule handler for any declared target, error messages will be emitted.
At this point, you should have enough working knowledge to use reflex to implement a scanner using the target of your choice.