<html>
<head>
<title>SWIG C Scanner</title>
</head>

<body>
<center>
<h1>SWIG C/C++ Scanning</h1>

<p>
David M. Beazley <br>
dave-swig@dabeaz.com<br>
January 11, 2007<br>

</center>

<h2>Introduction</h2>

This document describes functions that can be used to tokenize C/C++
input text. These functions are relatively low-level and are meant to
be used in the implementation of scanners that can be plugged into yacc or used for
other purposes. For instance, the preprocessor uses these functions to evaluate and test
constant expressions.

<p>
All of these functions are declared in <tt>Source/Swig/swigscan.h</tt>. This API is considered to be stable.

<h2>Creation and Deletion of Scanners</h2>

The following functions are used to create and destroy a scanner object. More than one scanner object can be created and used
as necessary.

<p>
<b><tt>Scanner *NewScanner()</tt></b>

<blockquote>
Creates a new scanner object. The scanner initially contains no text. To feed text to the scanner, use <tt>Scanner_push()</tt>.
</blockquote>

<p>
<b><tt>void DelScanner(Scanner *s)</tt></b>

<blockquote>
Deletes a scanner object.
</blockquote>

<h2>Scanner Functions</h2>

<p>
<b><tt>void Scanner_clear(Scanner *s)</tt></b>
<blockquote>
Clears all text from the scanner. This can be used to reset a scanner to its initial state, ready to receive new input text.
</blockquote>

<p>
<b><tt>void Scanner_push(Scanner *s, String *text)</tt></b>
<blockquote>
Pushes an input string into the scanner. Subsequent tokens will be
returned from the new string. If the scanner is already processing a
string, the pushed string takes precedence--in effect, interrupting
the scanning of the previous string. This behavior is used to
implement certain SWIG features such as the <tt>%inline</tt>
directive. Once the pushed string has been completely scanned, the
scanner will return to scanning the previous string (if any). The
scanning of text relies upon the DOH file interface to strings
(<tt>Getc()</tt>, <tt>Ungetc()</tt>, etc.). Prior to calling this
function, the input string should be set so that its file pointer is
in the location where you want scanning to begin. You may have to
use <tt>Seek()</tt> to set the file pointer back to the beginning of a
string prior to using this function.
</blockquote>
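
<p>
The interrupt-and-resume behavior described for <tt>Scanner_push()</tt> can be pictured as a stack of input strings. The following is a minimal, self-contained sketch under that assumption. <tt>InputStack</tt>, <tt>input_init()</tt>, <tt>input_push()</tt> and <tt>input_getc()</tt> are hypothetical names used only for illustration. They are not part of the SWIG API and do not reflect how the scanner is actually implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a scanner's input stack; not SWIG's actual
   data structure. Each entry is a string plus a read position. */
#define MAX_INPUTS 8

typedef struct {
  const char *text[MAX_INPUTS];
  size_t pos[MAX_INPUTS];
  int depth;                      /* number of active inputs */
} InputStack;

static void input_init(InputStack *s) {
  s->depth = 0;
}

/* Push a new string; it is read before anything already on the stack. */
static void input_push(InputStack *s, const char *text) {
  assert(s->depth < MAX_INPUTS);
  s->text[s->depth] = text;
  s->pos[s->depth] = 0;
  s->depth++;
}

/* Return the next character, resuming the previous string when the
   current one is exhausted. Returns -1 when no input remains. */
static int input_getc(InputStack *s) {
  while (s->depth > 0) {
    int top = s->depth - 1;
    char c = s->text[top][s->pos[top]];
    if (c != '\0') {
      s->pos[top]++;
      return (unsigned char) c;
    }
    s->depth--;                   /* finished: fall back to previous input */
  }
  return -1;
}
```

<p>
In this sketch, pushing <tt>"XY"</tt> after reading the first character of <tt>"ab"</tt> yields the characters <tt>a</tt>, <tt>X</tt>, <tt>Y</tt>, <tt>b</tt> in that order.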

<p>
<b><tt>void Scanner_pushtoken(Scanner *s, int tokvalue, String_or_char *val)</tt></b>
<blockquote>
Pushes a token into the scanner. This exact token will be returned by the next call to <tt>Scanner_token()</tt>.
<tt>tokvalue</tt> is the integer token value to return and <tt>val</tt> is the token text to return. This
function is only used to handle very special parsing cases, for instance when you need the scanner to
return a fictitious token in order to enter a special parsing case.
</blockquote>
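
<p>
The effect of pushing a token can be illustrated with a one-slot push-back buffer. The sketch below is a toy scanner that splits on spaces. <tt>MiniScanner</tt>, <tt>mini_init()</tt>, <tt>mini_pushtoken()</tt>, <tt>mini_token()</tt> and the <tt>TOK_*</tt> codes are all hypothetical and exist only to show the ordering guarantee. The real scanner may store pushed tokens differently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical miniature scanner: it only shows the push-back slot, not
   SWIG's real tokenizer. Token codes here are made up for the example. */
enum { TOK_END = 0, TOK_WORD = 1, TOK_FAKE = 99 };

typedef struct {
  const char *input;       /* remaining input text */
  int pushed_code;         /* pending pushed token code, 0 if none */
  char text[64];           /* text of the last token returned */
} MiniScanner;

static void mini_init(MiniScanner *s, const char *input) {
  s->input = input;
  s->pushed_code = 0;
  s->text[0] = '\0';
}

/* Queue a token so that the next call to mini_token() returns it. */
static void mini_pushtoken(MiniScanner *s, int code, const char *val) {
  assert(code > 0 && strlen(val) < sizeof(s->text));
  s->pushed_code = code;
  strcpy(s->text, val);
}

/* Return the pushed token if there is one, else the next
   space-separated word, else TOK_END. */
static int mini_token(MiniScanner *s) {
  size_t n = 0;
  if (s->pushed_code) {
    int code = s->pushed_code;
    s->pushed_code = 0;
    return code;           /* s->text already holds the pushed value */
  }
  while (*s->input == ' ') s->input++;
  if (*s->input == '\0') return TOK_END;
  while (*s->input && *s->input != ' ' && n < sizeof(s->text) - 1)
    s->text[n++] = *s->input++;
  s->text[n] = '\0';
  return TOK_WORD;
}
```

<p>
A pushed token is returned exactly once. After that, scanning continues with the remaining input.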

<p>
<b><tt>int Scanner_token(Scanner *s)</tt></b>

<blockquote>
Returns the next token. An integer token code is returned (see table below) on success. If no more input text is
available, 0 is returned. If a scanning error occurred, -1 is returned. In this case, error information can be
obtained using <tt>Scanner_errmsg()</tt> and <tt>Scanner_errline()</tt>.
</blockquote>

<p>
<b><tt>String *Scanner_text(Scanner *s)</tt></b>
<blockquote>
Returns the scanned text corresponding to the last token returned by <tt>Scanner_token()</tt>. The returned string
is only valid until the next call to <tt>Scanner_token()</tt>. If you need to save it, make a copy.
</blockquote>

<p>
<b><tt>void Scanner_skip_line(Scanner *s)</tt></b>
<blockquote>
Skips to the end of the current line. The text skipped can be obtained using <tt>Scanner_text()</tt> afterwards.
</blockquote>

<p>
<b><tt>int Scanner_skip_balanced(Scanner *s, int startchar, int endchar)</tt></b>
<blockquote>
Skips to the end of a block of text delimited by starting and ending characters, for example <tt>{</tt> and <tt>}</tt>. The
function is smart about how it skips text: delimiters inside string literals and comments are ignored, and nesting is taken into account. The
skipped text can be obtained using <tt>Scanner_text()</tt> afterwards. Returns 0 on success, -1 if no matching <tt>endchar</tt> could be found.
</blockquote>
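
<p>
The skipping rules described above (nesting, with delimiters inside string literals and comments ignored) can be sketched on a plain C string. This is a simplified stand-in and not SWIG's implementation. The function name <tt>skip_balanced()</tt>, its signature, and its return value (an index rather than 0) are assumptions made for the example, and character literals are not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for balanced skipping; not SWIG's implementation.
   text[0] is expected to be startchar. Returns the index just past the
   matching endchar, or -1 if no match is found. */
static long skip_balanced(const char *text, char startchar, char endchar) {
  size_t i = 0;
  int depth = 0;
  assert(text[0] == startchar);
  while (text[i] != '\0') {
    char c = text[i];
    if (c == '"') {                               /* string literal */
      i++;
      while (text[i] != '\0' && text[i] != '"') {
        if (text[i] == '\\' && text[i + 1] != '\0') i++;  /* escaped char */
        i++;
      }
      if (text[i] == '\0') return -1;             /* unterminated string */
    } else if (c == '/' && text[i + 1] == '*') {  /* block comment */
      i += 2;
      while (text[i] != '\0' && !(text[i] == '*' && text[i + 1] == '/')) i++;
      if (text[i] == '\0') return -1;             /* unterminated comment */
      i++;                                        /* now on the closing '/' */
    } else if (c == '/' && text[i + 1] == '/') {  /* line comment */
      while (text[i] != '\0' && text[i] != '\n') i++;
      if (text[i] == '\0') return -1;
    } else if (c == startchar) {
      depth++;                                    /* entering a nested block */
    } else if (c == endchar) {
      depth--;
      if (depth == 0) return (long) (i + 1);      /* matched the outer block */
    }
    i++;
  }
  return -1;
}
```

<p>
The closing brace inside the string literal and the one inside the comment do not end the block. Only the brace that brings the nesting depth back to zero does.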

<p>
<b><tt>void Scanner_set_location(Scanner *s, String *file, int line)</tt></b>
<blockquote>
Changes the current filename and line number of the scanner.
</blockquote>

<p>
<b><tt>String *Scanner_file(Scanner *s)</tt></b>
<blockquote>
Gets the current filename associated with text in the scanner.
</blockquote>

<p>
<b><tt>int Scanner_line(Scanner *s)</tt></b>
<blockquote>
Gets the current line number associated with text in the scanner.
</blockquote>

<p>
<b><tt>int Scanner_start_line(Scanner *s)</tt></b>
<blockquote>
Gets the starting line number of the last token returned by the scanner.
</blockquote>

<p>
<b><tt>void Scanner_idstart(Scanner *s, char *idchar)</tt></b>
<blockquote>
Sets additional characters (other than the C default) that may be used to start C identifiers. <tt>idchar</tt> is a string
containing the characters (e.g., "%@"). The purpose of this function is to allow special keywords such as "%module" or "@directive" to be scanned as
simple identifiers.
</blockquote>
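
<p>
The effect of the extra start characters can be sketched as a small predicate. <tt>is_id_start()</tt> is a hypothetical helper written for this example and is not part of the SWIG API. It assumes the C default is a letter or an underscore.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if c may start an identifier, given a
   string of extra start characters such as "%@", else 0. */
static int is_id_start(int c, const char *idchar) {
  assert(idchar != NULL);
  if (c == '_' || isalpha((unsigned char) c)) return 1;   /* C default */
  return c != '\0' && strchr(idchar, c) != NULL;          /* extra chars */
}
```

<p>
With <tt>idchar</tt> set to "%@", the text <tt>%module</tt> starts with an acceptable identifier character. With an empty <tt>idchar</tt>, it does not.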

<p>
<b><tt>String *Scanner_errmsg(Scanner *s)</tt></b>
<blockquote>
Returns the error message associated with the last scanner error (if any). This will only return a meaningful result
if <tt>Scanner_token()</tt> returned -1.
</blockquote>

<p>
<b><tt>int Scanner_errline(Scanner *s)</tt></b>
<blockquote>
Returns the line number associated with the last scanner error (if any). This will only return a meaningful result
if <tt>Scanner_token()</tt> returned -1. The line number usually corresponds to the starting line number of a particular
token (e.g., for unterminated strings, comments, etc.).
</blockquote>

<p>
<b><tt>int Scanner_isoperator(int tokval)</tt></b>
<blockquote>
A convenience function that returns 0 or 1 depending on whether <tt>tokval</tt> is a valid C/C++ operator (i.e., a candidate for
operator overloading).
</blockquote>

<p>
<b><tt>void Scanner_freeze_line(int val)</tt></b>
<blockquote>
Freezes the current line number when <tt>val</tt> is 1 and unfreezes it when <tt>val</tt> is 0. While the line number is frozen, newline characters do not
update the line number. This is sometimes useful in tracking line numbers through complicated macro expansions.
</blockquote>
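
<p>
The freezing behavior can be sketched with a line counter that ignores newlines while a flag is set. <tt>LineTracker</tt>, <tt>tracker_freeze()</tt> and <tt>tracker_feed()</tt> are hypothetical names for this example only and do not reflect SWIG's internal representation.

```c
#include <assert.h>

/* Hypothetical line tracker illustrating the freeze flag; not SWIG's
   internal representation. */
typedef struct {
  int line;     /* current line number */
  int frozen;   /* nonzero while the line number is frozen */
} LineTracker;

/* val = 1 freezes the line number, val = 0 unfreezes it. */
static void tracker_freeze(LineTracker *t, int val) {
  t->frozen = val;
}

/* Consume text, counting newlines only while not frozen. */
static void tracker_feed(LineTracker *t, const char *text) {
  assert(text != 0);
  for (; *text; text++) {
    if (*text == '\n' && !t->frozen) t->line++;
  }
}
```

<p>
Text fed while the tracker is frozen, such as the body of an expanded macro, leaves the reported line number unchanged.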

<h2>Token Codes</h2>

The following table shows token codes returned by the scanner. These are integer codes returned by
the <tt>Scanner_token()</tt> function.

<blockquote>
<pre>
Token code                  C Token
-------------------------   -------------
SWIG_TOKEN_LPAREN           (
SWIG_TOKEN_RPAREN           )
SWIG_TOKEN_SEMI             ;
SWIG_TOKEN_COMMA            ,
SWIG_TOKEN_STAR             *
SWIG_TOKEN_TIMES            *
SWIG_TOKEN_LBRACE           {
SWIG_TOKEN_RBRACE           }
SWIG_TOKEN_EQUAL            =
SWIG_TOKEN_EQUALTO          ==
SWIG_TOKEN_NOTEQUAL         !=
SWIG_TOKEN_PLUS             +
SWIG_TOKEN_MINUS            -
SWIG_TOKEN_AND              &
SWIG_TOKEN_LAND             &&
SWIG_TOKEN_OR               |
SWIG_TOKEN_LOR              ||
SWIG_TOKEN_XOR              ^
SWIG_TOKEN_LESSTHAN         <
SWIG_TOKEN_GREATERTHAN      >
SWIG_TOKEN_LTEQUAL          <=
SWIG_TOKEN_GTEQUAL          >=
SWIG_TOKEN_LTEQUALGT        <=>
SWIG_TOKEN_NOT              ~
SWIG_TOKEN_LNOT             !
SWIG_TOKEN_LBRACKET         [
SWIG_TOKEN_RBRACKET         ]
SWIG_TOKEN_SLASH            /
SWIG_TOKEN_DIVIDE           /
SWIG_TOKEN_BACKSLASH        \
SWIG_TOKEN_POUND            #
SWIG_TOKEN_PERCENT          %
SWIG_TOKEN_MODULO           %
SWIG_TOKEN_COLON            :
SWIG_TOKEN_DCOLON           ::
SWIG_TOKEN_DCOLONSTAR       ::*
SWIG_TOKEN_LSHIFT           <<
SWIG_TOKEN_RSHIFT           >>
SWIG_TOKEN_QUESTION         ?
SWIG_TOKEN_PLUSPLUS         ++
SWIG_TOKEN_MINUSMINUS       --
SWIG_TOKEN_PLUSEQUAL        +=
SWIG_TOKEN_MINUSEQUAL       -=
SWIG_TOKEN_TIMESEQUAL       *=
SWIG_TOKEN_DIVEQUAL         /=
SWIG_TOKEN_ANDEQUAL         &=
SWIG_TOKEN_OREQUAL          |=
SWIG_TOKEN_XOREQUAL         ^=
SWIG_TOKEN_LSEQUAL          <<=
SWIG_TOKEN_RSEQUAL          >>=
SWIG_TOKEN_MODEQUAL         %=
SWIG_TOKEN_ARROW            ->
SWIG_TOKEN_ARROWSTAR        ->*
SWIG_TOKEN_PERIOD           .
SWIG_TOKEN_AT               @
SWIG_TOKEN_DOLLAR           $
SWIG_TOKEN_ENDLINE          Literal newline
SWIG_TOKEN_ID               identifier
SWIG_TOKEN_FLOAT            Floating point with F suffix (e.g., 3.1415F)
SWIG_TOKEN_DOUBLE           Floating point (e.g., 3.1415)
SWIG_TOKEN_INT              Integer (e.g., 314)
SWIG_TOKEN_UINT             Unsigned integer (e.g., 314U)
SWIG_TOKEN_LONG             Long integer (e.g., 314L)
SWIG_TOKEN_ULONG            Unsigned long integer (e.g., 314UL)
SWIG_TOKEN_LONGLONG         Long long integer (e.g., 314LL)
SWIG_TOKEN_ULONGLONG        Unsigned long long integer (e.g., 314ULL)
SWIG_TOKEN_CHAR             Character literal in single quotes ('c')
SWIG_TOKEN_STRING           String literal in double quotes ("str")
SWIG_TOKEN_RSTRING          Reverse quote string (`str`)
SWIG_TOKEN_CODEBLOCK        SWIG code literal block %{ ... %}
SWIG_TOKEN_COMMENT          C or C++ comment (// or /* ... */)
SWIG_TOKEN_ILLEGAL          Illegal character
</pre>
</blockquote>

<b>Notes</b>

<ul>
<li>When more than one token code exists for the same token text, those codes are identical (e.g., <tt>SWIG_TOKEN_STAR</tt> and <tt>SWIG_TOKEN_TIMES</tt>).

<p>
<li>
String literals are returned as the exact string value, with escape codes (if any) already interpreted.

<p>
<li>
All C identifiers and keywords are simply returned as <tt>SWIG_TOKEN_ID</tt>. To check for specific keywords, you will need to
add extra checking on the returned text.

<p>
<li>C and C++ comments include the comment starting and ending text (e.g., "//", "/*").

<p>
<li>The maximum token integer value is found in the constant <tt>SWIG_MAXTOKENS</tt>. This can be used to create
an array or table for remapping tokens to a different set of codes, for instance when
using these functions to write a yacc-compatible lexer.
</ul>
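
<p>
The last two notes suggest how a yacc-compatible lexer might be layered on top of the scanner: a table indexed by scanner code, plus a text comparison to promote keywords. The sketch below is self-contained and therefore uses placeholder constants. The <tt>DEMO_*</tt> and <tt>YY_*</tt> names and all numeric values are assumptions made for this example. Real code would include <tt>swigscan.h</tt> and use <tt>SWIG_TOKEN_*</tt> and <tt>SWIG_MAXTOKENS</tt> together with the codes generated by yacc.

```c
#include <assert.h>
#include <string.h>

/* Placeholder values for illustration only. In real code, include
   swigscan.h and use SWIG_TOKEN_* and SWIG_MAXTOKENS instead. */
enum { DEMO_TOKEN_LPAREN = 1, DEMO_TOKEN_RPAREN = 2, DEMO_TOKEN_ID = 3,
       DEMO_MAXTOKENS = 3 };      /* maximum token value in this demo */

/* Hypothetical parser-side codes, as a yacc grammar might define them. */
enum { YY_LPAREN = 258, YY_RPAREN, YY_ID, YY_CLASS, YY_UNKNOWN };

/* One slot per scanner code; +1 because the constant is treated here
   as the largest valid index. */
static int remap[DEMO_MAXTOKENS + 1];

static void remap_init(void) {
  int i;
  for (i = 0; i <= DEMO_MAXTOKENS; i++) remap[i] = YY_UNKNOWN;
  remap[DEMO_TOKEN_LPAREN] = YY_LPAREN;
  remap[DEMO_TOKEN_RPAREN] = YY_RPAREN;
  remap[DEMO_TOKEN_ID] = YY_ID;
}

/* Translate a scanner code to a parser code. Keywords arrive as ID
   tokens, so they are recognized by comparing the token text. */
static int to_parser_code(int scancode, const char *text) {
  assert(scancode > 0 && scancode <= DEMO_MAXTOKENS);
  if (scancode == DEMO_TOKEN_ID && strcmp(text, "class") == 0)
    return YY_CLASS;
  return remap[scancode];
}
```

<p>
A lexer built this way would call the scanner for the next token, fetch its text, and pass both to a function like <tt>to_parser_code()</tt> before returning the result to the parser.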

</body>
</html>