TABLE OF CONTENTS
 (Make a search on the tag in the right
 column to jump to the associated section.
 Vim users can simply type * on the tag;
 Emacs users do that with C-s C-w (I think);
 Other editors: I don't know!)
=======================================
          Introduction  intro_tag
           Input files  input_tag
            Paragraphs  paragraphs_tag
    Declaring patterns  patterns_tag
               Classes  classes_tag
      Protecting input  protect_tag
       Technical stuff  technical_tag
 An example: i-doc.lua  example_tag
=======================================


================================== intro_tag
=== Introduction =================
==================================

Interpreter preprocesses input files before their contents are fed to TeX. It
is meant for writing documents with whatever markup one wishes to define while
using normal TeX macros in the background. As a simple example, suppose you
have a macro "\bold" to put text in boldface; then Interpreter lets you map
"*text*", or "<strong>text</strong>", or simply "!text", or anything else, to
"\bold{text}". Interpreter doesn't perform any trickery with active
characters; instead, it manipulates the strings representing the lines of
a file and searches them for patterns.

There are two main advantages: first, TeX documents can be typeset with
a completely non-TeX syntax; second, if one uses some lightweight markup
language, the source file is much easier to read and might even be more useful
than the typeset PDF file, e.g. for some technical documentation you want to
read directly in your text editor while writing code (powerful editors
generally have their own documentation in such a format, for a good reason).
A third advantage, not explored in this documentation, is that while feeding
modified lines to TeX you can also translate the original lines into, say,
HTML, and write them to an external file, thus creating both PDF and HTML
output at once.


================================== input_tag
=== Input files ==================
==================================

Once Interpreter is loaded with 

          \input interpreter

in plain TeX or

          \usepackage{interpreter}

in LaTeX, files to be processed are input as follows:

          \interpretfile{<language>}{<file>}

There should exist a file "i-<language>.lua" containing the language used in
<file>. For instance, the source of this documentation is
"interpreter-doc.txt", input in the master file "interpreter-doc.tex" with

          \interpretfile{doc}{interpreter-doc.txt}

and the interpretation to be used is defined in "i-doc.lua". The contents of
such an interpretation file are the subject of the rest of this documentation.


================================== paragraphs_tag
=== Paragraphs ===================
==================================

Interpreter doesn't process lines one by one. Instead, it gathers an entire
paragraph and then processes the lines. This is important because you can
manipulate an entire paragraph when a given pattern is detected, and modify
several lines according to what happens in only one. A paragraph in
Interpreter has nothing to do with what TeX considers a paragraph; instead, it
is defined by the following string.

> interpreter.paragraph [Default: blank line with spaces ignored]
  A string to be interpreted as a paragraph boundary when Interpreter collects
  lines before processing them. The string actually represents a pattern, so
  magic characters are obeyed. The default is "%s*", i.e. a blank line is
  considered a paragraph boundary, spaces notwithstanding. Of course, the end
  of the file itself is a paragraph boundary.
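
  As a rough illustration, and not Interpreter's actual code, the gathering
  of lines can be sketched in plain Lua as follows. The name
  "collect_paragraphs" and the anchoring of the pattern to the whole line are
  assumptions made for this sketch only.

```lua
-- Hypothetical helper, for illustration only: groups lines into paragraphs,
-- each one ending with the boundary line that closed it (if any).
local function collect_paragraphs (lines, boundary)
  boundary = "^" .. (boundary or "%s*") .. "$"
  local paragraphs, current = {}, {}
  for _, line in ipairs(lines) do
    current[#current + 1] = line
    if line:find(boundary) then
      paragraphs[#paragraphs + 1] = current
      current = {}
    end
  end
  -- The end of the file closes the last paragraph, without a boundary line.
  if #current > 0 then
    paragraphs[#paragraphs + 1] = current
  end
  return paragraphs
end

local paragraphs = collect_paragraphs{
  "first line", "second line", "  ", "last line"
}
```

  Here the first paragraph holds three lines, the last one being the
  boundary, and the second paragraph holds only "last line".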
  

================================== patterns_tag
=== Declaring patterns ===========
==================================

Once the lines of a paragraph have been collected, Interpreter searches them
trying to match declared patterns, but it doesn't do so indiscriminately:
patterns are searched in a given order, as explained below.

Patterns are searched for in each line only, i.e. no match can occur across
lines. However, since you can manipulate entire paragraphs based on a match in
one line, the limitation easily vanishes.

>  interpreter.add_pattern(<table>)
   This is the basic function used to define patterns. The <table> may
   contain the following entries, along with other entries Interpreter won't
   use but which can be useful to you, especially with "call" below. The
   function returns a table.

>> class [Default: "interpreter.default_class"]
   The class of the pattern. See the section on classes.

>> pattern
   The pattern to match. Lua's magic characters are in force and should be
   escaped with "%" if necessary, unless "nomagic" is "true" (or the pattern
   itself is the result of "interpreter.nomagic").

>> nomagic [Default: "false"]
   A boolean deciding whether the pattern should be transformed with
   "interpreter.nomagic".

>> replace
   The replacement for the pattern, applied only if there is no "call" entry.
   This may be a string, a table or a function. Interpreter simply executes
   something similar to "string.gsub()", hence the replacement follows this
   function's ordinary syntax. More precisely, if "replace" is a string, the
   pattern is replaced with it; in this string, "%n" may be used to denote the
   _n_th capture in the pattern. If "replace" is a table, the first capture or
   the entire match (if there is no capture) is used as the key, and the
   associated value is used as the replacement. If "replace" is a function, it
   is called with the captures passed as arguments, or the entire match if
   there is no capture.  For instance, the following pattern will replace all
   "*text*" with "\bold{text}":

          interpreter.add_pattern{
            pattern = "%*(.-)%*",
            replace = [[\bold{%1}]]
          }
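
   Since "replace" follows the conventions of "string.gsub()", the three
   kinds of replacement can be tried in plain Lua, without Interpreter. The
   sample line and the names below are made up for the illustration.

```lua
local line = "Some *bold* text and a $name."

-- String replacement: "%1" stands for the first capture.
local with_string = line:gsub("%*(.-)%*", [[\bold{%1}]])

-- Table replacement: the first capture is used as the key.
local with_table = line:gsub("%$(%a+)", { name = "Interpreter" })

-- Function replacement: the function receives the captures as arguments.
local with_function = line:gsub("%*(.-)%*", function (text)
  return [[\bold{]] .. text:upper() .. "}"
end)
```

   The three results are "Some \bold{bold} text and a $name.", "Some *bold*
   text and a Interpreter." and "Some \bold{BOLD} text and a $name."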

>> offset [Default: 0]
   The number of positions Interpreter should shift to the right after a match
   has occurred. Normally, Interpreter starts searching for another occurrence
   of the current pattern at the same position where it found the last one.
   However, loops might easily occur: the replacement for a pattern may very
   well contain another match for the same pattern, so Interpreter will get
   stuck. Suppose for instance you want to replace "TeX" with "\TeX". The
   first match will do that, but then Interpreter will start searching again
   at the backslash, producing "\\TeX", then "\\\TeX", etc. In this case, if
   you set "offset" to 2 in the pattern, then search will start again at the
   "e" and no new match will occur.

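   The effect of "offset" can be imitated with a small search loop in plain
   Lua. This is only a sketch of the behaviour described above, not
   Interpreter's own code: the function "replace_from" is hypothetical, it
   inserts the replacement literally, and it gives up after 100 rounds so
   that the looping case can be observed safely.

```lua
-- Hypothetical helper: replaces "pattern" with the literal string "repl",
-- resuming each search at the previous match position plus "offset".
local function replace_from (line, pattern, repl, offset)
  local init, rounds = 1, 0
  while rounds < 100 do
    local s, e = line:find(pattern, init)
    if not s then break end
    line = line:sub(1, s - 1) .. repl .. line:sub(e + 1)
    init = s + offset
    rounds = rounds + 1
  end
  return line, rounds
end

local good, good_rounds = replace_from("TeX is fun", "TeX", "\\TeX", 2)
local bad,  bad_rounds  = replace_from("TeX is fun", "TeX", "\\TeX", 0)
```

   With an offset of 2 a single replacement is made; with an offset of 0 the
   loop only stops because of the guard, having added 100 backslashes.
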
>> call
   This entry shall contain a function to be called if there is a match (if
   this entry exists, "replace" isn't applied). It is meant to perform complex
   tasks that aren't amenable to simple string replacement. The function will
   be executed as follows:

          function (paragraph, line, index, pattern)

   "paragraph" is a table representing the current paragraph; lines are stored
   at successive indices. The last line of this paragraph is always the
   paragraph boundary (see "interpreter.paragraph"), unless the paragraph
   stopped at the end of the file. The second argument, "line", is a number
   representing the index in "paragraph" containing the line where the pattern
   was found; "index" is the position in this line where the match occurred.
   Finally, "pattern" is the entire table declared with
   "interpreter.add_pattern" and containing all the entries discussed here.

   The function may return zero, one, or two numbers. If it returns none, the
   search for the next occurrence of the pattern will start again on the same
   line (rather, on the line with the same position in the paragraph), at
   "index". If it returns one number, the search will resume at the same line
   but at position _n_, with _n_ the returned number. Finally, if two numbers
   are returned, the search will resume at line _m_ at position _n_, _m_ and
   _n_ being the returned values. Specifying which line should be examined
   when the search resumes might be necessary if the function adds new lines
   in the paragraph _before_ the current line, since Interpreter only keeps
   count of line numbers.
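
   As an illustration of the two-number case, here is a sketch of a "call"
   function that adds a line before the current one. The function "heading",
   the "# Title" markup and the macros it writes are invented for the
   example; the function is called by hand on a plain table here, whereas
   Interpreter would call it on a match.

```lua
-- Hypothetical "call" function: turns "# Title" into "\section{Title}" and
-- inserts a "\bigskip" line before it.
local function heading (paragraph, line, index, pattern)
  local title = paragraph[line]:match("^# (.*)$")
  paragraph[line] = "\\section{" .. title .. "}"
  table.insert(paragraph, line, "\\bigskip")
  -- The heading now sits at line + 1: resume there, at the same position.
  return line + 1, index
end

local paragraph = { "# Introduction", "Some text.", "" }
local m, n = heading(paragraph, 1, 1)
```

   The paragraph now has four lines, and the returned values (2 and 1) point
   to the line holding "\section{Introduction}".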

   The entire paragraph can thus be modified if necessary. For instance,
   suppose you want to declare comments in your source file with only
   "!Comment" in the first line, i.e. TeX should ignore a paragraph such as:

          !Comment
          This should be ignored
          by TeX

   Then the following pattern will do (where the function requires only the
   first argument):

          local function comment (paragraph)
            for n, l in ipairs(paragraph) do
              paragraph[n] = "%" .. l
            end
          end
          interpreter.add_pattern{
            pattern = "^!Comment",
            call    = comment
          }

> interpreter.nomagic (string)
  A function which reverses the usual Lua magic for patterns: ordinary magic
  characters are normal characters here, unless they are prefixed with "%", in
  which case they are magic again. For instance, a pattern like ".+" is
  normally interpreted as ``one or more characters''. If passed to this
  function, a pattern is returned meaning ``a dot followed by a plus sign''.
  On the contrary, "%.%+" normally has the second interpretation, while with
  "interpreter.nomagic" it has the first one. The function makes another
  transformation: "..." is used to denote a capture "(.-)". Thus
  "interpreter.nomagic('*...*')" returns a pattern matching any number of
  characters surrounded by stars and capturing those characters; this would be
  expressed in ordinary Lua magic as "%*(.-)%*".
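
  To make the transformation concrete, here is a minimal sketch of a
  function behaving as described. It is not the code used by Interpreter;
  the name "nomagic_sketch" is hypothetical, and edge cases (such as a
  doubled "%") are ignored.

```lua
-- Lua's magic characters, as a character class.
local magic = "[%^%$%(%)%%%.%[%]%*%+%-%?]"

-- Hypothetical re-implementation, for illustration only.
local function nomagic_sketch (s)
  local out, i = {}, 1
  while i <= #s do
    local c, next_c = s:sub(i, i), s:sub(i + 1, i + 1)
    if s:sub(i, i + 2) == "..." then
      out[#out + 1] = "(.-)"          -- "..." denotes a capture
      i = i + 3
    elseif c == "%" and next_c:find(magic) then
      out[#out + 1] = next_c          -- "%" makes the character magic again
      i = i + 2
    elseif c == "%" and next_c ~= "" then
      out[#out + 1] = c .. next_c     -- classes such as "%s" are kept
      i = i + 2
    elseif c:find(magic) then
      out[#out + 1] = "%" .. c        -- a bare magic character is escaped
      i = i + 1
    else
      out[#out + 1] = c
      i = i + 1
    end
  end
  return table.concat(out)
end

local stars   = nomagic_sketch("*...*")
local literal = nomagic_sketch(".+")
local magical = nomagic_sketch("%.%+")
```

  The three results are "%*(.-)%*", "%.%+" and ".+" respectively.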


================================== classes_tag
=== Classes ======================
==================================

As already alluded to, the search for patterns isn't done at random. Instead,
patterns are organized in classes, which are applied one after the other. More
precisely, the process is as follows: Interpreter searches the entire
paragraph for the first pattern in class~1, then for the second pattern in the
same class, then for the third, etc., then when there is no pattern left in
class~1 it does the same with class~2, up to class~_n_, where _n_ is the
highest class number such that there exists a class _n - 1_ (in other words,
classes should be numbered consecutively). Finally, the same goes for the
patterns in class~0 (which always exists, even if it contains no pattern).
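
The order in which classes are visited can be sketched as follows. The
function "class_order" is hypothetical and only mirrors the description
above; it takes a table whose keys are the class numbers in use.

```lua
-- Hypothetical helper: returns the class numbers in the order in which
-- they would be applied.
local function class_order (classes)
  local order, n = {}, 1
  while classes[n] do      -- classes 1, 2, ... as long as they are consecutive
    order[#order + 1] = n
    n = n + 1
  end
  order[#order + 1] = 0    -- class 0 always comes last
  return order
end

local order = class_order{ [0] = {}, [1] = {}, [2] = {}, [4] = {} }
```

Here the order is 1, 2, 0: class 4 is never reached, because there is no
class 3.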

Inside a class, patterns are ordered by length from long to short, or
alphabetically if two patterns have the same length. This means that if you
use e.g. "/text/" for italics and "//text//" for bold, you don't need to put
the second pattern in a class before the first to avoid "//text//" being
interpreted as two empty arguments in italics surrounding a text in roman.
Since the way the bold-pattern will be declared, e.g. "//(.-)//", is probably
longer than for the italic-pattern, e.g. "/(.-)/", it will always match first.

That said, the sorting isn't very clever and simply relies on the number of
symbols, no matter what they mean; in the patterns above, the parentheses
denote a capture but they still count in the pattern's length as understood by
Interpreter. Similarly, while ".*" denotes ``zero or more characters'' and
"%+" means ``a plus sign'' ("+" being magic, you have to escape it to refer to
it), in Interpreter's eyes the two patterns have the same length: two. Finally,
one should be aware that patterns declared with a "nomagic" entry set to
"true" are sorted after they've been transformed (so that their real length
might not be obvious). So classes are needed when patterns need a proper
ordering no matter their lengths. For instance, some patterns should always be
declared first, as they protect input from Interpreter (see next section),
while others might need to be declared last, as they rely on what previous
patterns might have done.  Besides, classes are metatables for the patterns
they contain.
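
The ordering inside a class can be reproduced with "table.sort()". The
comparison function below is an assumption based on the description above
(longest first, then plain string comparison); it is not taken from
Interpreter's source.

```lua
local patterns = { "/(.-)/", ".*", "//(.-)//", "%+" }

table.sort(patterns, function (a, b)
  if #a ~= #b then
    return #a > #b   -- longer patterns come first
  end
  return a < b       -- same length: fall back to string comparison
end)

-- The bold pattern comes first, so it gets a chance to match before the
-- italic one.
local bold_first = ("//text//"):gsub(patterns[1], [[\bold{%1}]])
```

The sorted list starts with "//(.-)//", followed by "/(.-)/"; the two
patterns of length two come last.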

> interpreter.default_class [Default: 1]
  All patterns belong to a class, even though you may omit the "class" entry
  when declaring one. In this case, the pattern is assigned to the class
  denoted by this number.

> interpreter.set_class(number, table)
  Defines class "number" as "table". Classes don't need to be defined
  beforehand for patterns to be added to them (rather, Interpreter defines
  them implicitly when needed). However, classes are also metatables for the
  patterns, so that if an entry is missing from a pattern's table, the class's
  entry is used if it exists. The function returns a table.
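
  The fallback from a pattern to its class is the usual Lua metatable
  mechanism, which can be tried on ordinary tables. Whether Interpreter sets
  the "__index" field exactly this way is an assumption; the tables below are
  made up for the illustration.

```lua
-- A class holding an entry shared by all its patterns.
local class = { offset = 2 }
class.__index = class

-- A pattern table with no "offset" entry of its own.
local pattern = setmetatable({ pattern = "TeX", replace = "\\TeX" }, class)

local own       = rawget(pattern, "offset")  -- nil: not stored in the pattern
local inherited = pattern.offset             -- 2: found in the class
```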


================================== protect_tag
=== Protecting input =============
==================================

Sometimes you want Interpreter to refrain from interpreting; that is most
useful for verbatim code, for instance. There are various ways to do that.

> interpreter.active [Default: true]
  A boolean switching Interpreter on and off. Beware, the switching applies
  only starting at the next paragraph.

> interpreter.protect([line])
  A function protecting all or part of the current paragraph. If "line" is
  given, it should be a number _n_, and line _n_ in the current paragraph will
  be protected; without "line", the entire paragraph is protected. Protecting
  means that the patterns not yet searched for will be ignored. For instance,
  if you want material to be read verbatim when surrounded with "<code>" and
  "</code>", you can declare a pattern as follows:

          local function verbatim (buffer)
            -- The last line is the paragraph boundary, so "</code>" is
            -- expected on the line before it.
            buffer[1] = "\\verbatim"
            buffer[#buffer - 1] = "\\endverbatim"
            interpreter.protect()
          end
          interpreter.add_pattern{
            pattern = "^%s*<code>%s*$",
            call    = verbatim,
            class   = 1
          }

  This code is extremely simplified: it assumes that "<code>" and "</code>"
  start and end the paragraph and that "</code>" isn't the last line of the
  file (otherwise it'd also be the last line in the paragraph, whereas here
  the last one is the paragraph boundary). An important point is that the
  pattern belongs to the first class, so it is called before all other
  patterns (provided there is no longer pattern in class~1) and prevents them
  from doing anything, since the entire paragraph is protected. (Typesetting
  the material as verbatim material obviously depends on the "\verbatim"
  macro, not on Interpreter.)

> interpreter.escape
  A character which prevents patterns from being replaced if immediately
  preceded by it. As an example, if "interpreter.escape = '_'", and "*text*"
  denotes italic, then "*text*" will produce _text_ while "_*text*" will
  produce *text*. Once a paragraph has been processed, Interpreter removes all
  escape characters. Only one character can be an escape character.

> interpreter.protector(left[, right]) ["right" defaults to "left"]
  Defines two characters to protect what they surround. In other words,
  Interpreter replaces patterns only if the match isn't found between "left"
  and "right". Unlike the escape character, you can define as many protectors
  as you wish; and unlike the escape character again, Interpreter _doesn't_
  remove them once the paragraph has been processed, so you must take care of
  them. For instance:

          interpreter.protector('"')
          interpreter.add_pattern{
            pattern = '"(.-)"',
            replace = '\\verb`%1`',
            class   = 0
          }

  Anything between double quotes will be left untouched; then, when the
  paragraph has been processed for all other classes, a pattern in class~0
  calls the "\verb" command to take care of the argument. Note that the
  protectors should enclose what they protect without coinciding with it; this
  is not the case here, which is why the pattern is applied.

> interpreter.direct [Default: two percent signs then "I" and at least one space]
  A string, actually a pattern, signalling that the line which it begins
  should be processed as Lua code. The default is "%%%%I%s+", i.e. "%%I"
  followed by at least one space. The pattern shouldn't declare itself as
  attached to the beginning of the line (as in "^%%%%I%s+") because it will
  be matched at the beginning of the line only anyway. The line is processed
  with the "loadstring" function, and then turned into an empty line. For
  instance:

          %%I interpreter.active = false
          This won't be interpreted...
          %%I interpreter.active = true

  As this example shows, lines flagged with "interpreter.direct" don't obey
  "interpreter.active" and are always processed as described above.

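  The handling of such lines can be sketched in plain Lua. The function
  "process_direct" is hypothetical and merely follows the description above;
  since "loadstring" may be missing from recent Lua versions, the sketch
  falls back to "load".

```lua
local direct = "%%%%I%s+"        -- the documented default
local run = loadstring or load   -- whichever function is available

-- Hypothetical helper: runs a direct line as Lua code and returns the
-- (possibly emptied) line.
local function process_direct (line)
  local code = line:match("^" .. direct .. "(.*)$")
  if not code then
    return line                  -- not a direct line: left untouched
  end
  local chunk = run(code)
  if chunk then chunk() end
  return ""                      -- the line is turned into an empty line
end

local first  = process_direct("%%I direct_flag = true")
local second = process_direct("Ordinary text")
```

  After this, "first" is an empty string, the global "direct_flag" is true,
  and "second" is unchanged.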

================================== technical_tag
=== Technical stuff ==============
==================================

You don't have to bother with this section if you don't care how Interpreter
does its job; actually you won't learn much anyway.

> interpreter.reset()
  A function which resets everything to default and deletes classes. It is used
  when calling "\interpretfile" so that new interpretations start from zero.

> interpreter.register(function)
  A function called to put Interpreter's main function into the
  "post_linebreak_filter" callback; you can redefine it at will. If it is
  undefined, "callback.register()" is used, unless "luatexbase.add_to_callback()"
  is detected. (The detection takes place at the first call to
  "\interpretfile", so there is no need to load Interpreter after
  "luatexbase".)

> interpreter.unregister(function)
  A function called to remove Interpreter's main function from the
  "post_linebreak_filter" callback. It works similarly to the previous one.


================================== example_tag
=== An example: i-doc.lua ========
==================================

\interpretfile{doc}{i-doc.lua}