TABLE OF CONTENTS
(Search for the tag in the right column to jump to the associated section.
Vim users can simply type * on the tag; Emacs users do that with C-s C-w
(I think); other editors: I don't know!)

=======================================
Introduction                 intro_tag
Input files                  input_tag
Paragraphs                   paragraphs_tag
Declaring patterns           patterns_tag
Classes                      classes_tag
Protecting input             protect_tag
Technical stuff              technical_tag
An example: i-doc.lua        example_tag
=======================================


================================== intro_tag
=== Introduction =================
==================================

Interpreter preprocesses input files before their contents are fed to TeX. It
is meant for writing documents with whatever markup one wishes to define while
using normal TeX macros in the background. As a simple example, suppose you
have a macro "\bold" to put text in boldface; then Interpreter lets you map
"*text*", or "text", or simply "!text", or anything else, to "\bold{text}".

Interpreter doesn't perform any trickery with active characters; instead, it
manipulates the strings representing the lines of a file and searches them for
patterns.

There are two main advantages: first, TeX documents can be typeset with a
completely non-TeX syntax; second, if one uses some lightweight markup
language, the source file is much easier to read and might even be more useful
than the typeset PDF file, e.g. for some technical documentation you want to
read directly in your text editor while writing code (powerful editors
generally have their own documentation in such a format, for a good reason).
A third advantage, not explored in this documentation, is that while feeding
modified lines to TeX you can also translate the original lines into, say,
HTML, and write them to an external file, thus creating both PDF and HTML
output at once.

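
The "*text*" to "\bold{text}" mapping mentioned above boils down to an
ordinary Lua string substitution. The following is only a standalone sketch
in plain Lua (it does not load Interpreter, and the sample line is made up);
it shows the kind of replacement performed on a line before TeX sees it.

```lua
-- Standalone sketch: plain Lua only, Interpreter itself is not involved.
-- The sample line is hypothetical.
local line = "Some *important* text."

-- "%*" is a literal star, "(.-)" captures as few characters as possible,
-- and "%1" in the replacement stands for that capture.
local result = line:gsub("%*(.-)%*", "\\bold{%1}")

print(result)
```

With Interpreter, the same pattern and replacement are declared once in the
interpretation file instead of being applied by hand.
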
================================== input_tag
=== Input files ==================
==================================

Once Interpreter is loaded with

    \input interpreter

in plain TeX or

    \usepackage{interpreter}

in LaTeX, files to be processed are input as follows:

    \interpretfile{<language>}{<file>}

There should exist a file "i-<language>.lua" containing the language used in
<file>. For instance, the source of this documentation is
"interpreter-doc.txt", input in the master file "interpreter-doc.tex" with

    \interpretfile{doc}{interpreter-doc.txt}

and the interpretation to be used is defined in "i-doc.lua". The contents of
such an interpretation file are the subject of the rest of this documentation.


================================== paragraphs_tag
=== Paragraphs ===================
==================================

Interpreter doesn't process lines one by one. Instead, it gathers an entire
paragraph and then processes its lines. This matters because you can
manipulate an entire paragraph when a given pattern is detected, and modify
several lines according to what happens in only one. A paragraph in
Interpreter has nothing to do with what TeX considers a paragraph; instead, it
is defined by the following string.

> interpreter.paragraph        [Default: blank line with spaces ignored]

A string to be interpreted as a paragraph boundary when Interpreter collects
lines before processing them. The string actually represents a pattern, so
magic characters are obeyed. The default is "%s*", i.e. a blank line is
considered a paragraph boundary, spaces notwithstanding. Of course, the end of
the file itself is a paragraph boundary.


================================== patterns_tag
=== Declaring patterns ===========
==================================

Once the lines of a paragraph have been collected, Interpreter searches them
for matches of the declared patterns, but it doesn't do so indiscriminately:
patterns are searched for in a given order, as explained below. Patterns are
searched for in each line only, i.e.
no match can occur across lines. However, since you can manipulate entire
paragraphs based on a match in one line, the limitation easily vanishes.

> interpreter.add_pattern(table)

This is the basic function used to define patterns. The table
may contain the following entries, along with other entries Interpreter won't
use but which can be useful to you, especially with "call" below. The function
returns a table.

>> class        [Default: "interpreter.default_class"]

The class of the pattern. See the section on classes.

>> pattern

The pattern to match. Lua's magic characters are in force and should be
escaped with "%" if necessary, unless "nomagic" is "true" (or the pattern
itself is the result of "interpreter.nomagic").

>> nomagic        [Default: "false"]

A boolean deciding whether the pattern should be transformed with
"interpreter.nomagic".

>> replace

The replacement for the pattern, applied only if there is no "call" entry.
This may be a string, a table or a function. Interpreter simply executes
something similar to "string.gsub()", hence the replacement follows this
function's ordinary syntax. More precisely:

- if "replace" is a string, the pattern is replaced with it; in this string,
  "%n" may be used to denote the _n_th capture in the pattern;
- if "replace" is a table, the first capture or the entire match (if there is
  no capture) is used as the key, and the associated value is used as the
  replacement;
- if "replace" is a function, it is called with the captures passed as
  arguments, or the entire match if there is no capture.

For instance, the following pattern will replace all "*text*" with
"\bold{text}":

    interpreter.add_pattern{
      pattern = "%*(.-)%*",
      replace = [[\bold{%1}]]
    }

>> offset        [Default: 0]

The number of positions Interpreter should shift to the right after a match
has occurred. Normally, Interpreter starts searching for another occurrence of
the current pattern at the same position where it found the last one. However,
loops might easily occur: the replacement for a pattern may very well contain
another match for the same pattern, so Interpreter will get stuck. Suppose for
instance you want to replace "TeX" with "\TeX".
The first match will do that, but then Interpreter will start searching again
at the backslash, producing "\\TeX", then "\\\TeX", etc. In this case, if you
set "offset" to 2 in the pattern, then the search will start again at the "e"
and no new match will occur.

>> call

This entry shall contain a function to be called if there is a match (if this
entry exists, "replace" isn't applied). It is meant to perform complex tasks
that aren't amenable to simple string replacement. The function will be
executed as follows:

    function (paragraph, line, index, pattern)

"paragraph" is a table representing the current paragraph; lines are stored at
successive indices. The last line of this paragraph is always the paragraph
boundary (see "interpreter.paragraph"), unless the paragraph stopped at the
end of the file. The second argument, "line", is a number representing the
index in "paragraph" containing the line where the pattern was found; "index"
is the position in this line where the match occurred. Finally, "pattern" is
the entire table declared with "interpreter.add_pattern" and containing all
the entries discussed here.

The function may return zero, one, or two numbers. If it returns none, the
search for the next occurrence of the pattern will start again on the same
line (rather, on the line with the same position in the paragraph), at
"index". If it returns one number, the search will resume on the same line but
at position _n_, with _n_ the returned number. Finally, if two numbers are
returned, the search will resume at line _m_ at position _n_, _m_ and _n_
being the returned values. Specifying which line should be examined when the
search resumes might be necessary if the function adds new lines in the
paragraph _before_ the current line, since Interpreter only keeps count of
line numbers.

The entire paragraph can thus be modified if necessary. For instance, suppose
you want to declare comments in your source file with only "!Comment" in the
first line, i.e.
TeX should ignore a paragraph such as:

    !Comment
    This should be ignored by TeX

Then the following pattern will do (where the function requires only the first
argument):

    -- Prefix every line of the paragraph with TeX's comment character.
    local function comment (paragraph)
      for n, l in ipairs(paragraph) do
        paragraph[n] = "%" .. l
      end
    end
    interpreter.add_pattern{
      pattern = "^!Comment",
      call    = comment
    }

> interpreter.nomagic (string)

A function which reverses the usual Lua magic for patterns: ordinary magic
characters are normal characters here, unless they are prefixed with "%", in
which case they are magic again. For instance, a pattern like ".+" is normally
interpreted as ``one or more characters''. If passed to this function, a
pattern is returned meaning ``a dot followed by a plus sign''. Conversely,
"%.%+" normally has the second interpretation, while with
"interpreter.nomagic" it has the first one.

The function makes another transformation: "..." is used to denote a capture
"(.-)". Thus "interpreter.nomagic('*...*')" returns a pattern matching any
number of characters surrounded by stars and capturing those characters; this
would be expressed in ordinary Lua magic as "%*(.-)%*".


================================== classes_tag
=== Classes ======================
==================================

As already alluded to, the search for patterns isn't done at random. Instead,
patterns are organized in classes, which are applied one after the other. More
precisely, the process is as follows: Interpreter searches the entire
paragraph for the first pattern in class~1, then for the second pattern in the
same class, then for the third, etc., then when there is no pattern left in
class~1 it does the same with class~2, up to class~_n_, where _n_ is the
highest class number such that there exists a class _n - 1_ (in other words,
classes should be numbered consecutively). Finally, the same goes for the
patterns in class~0 (which always exists, even if it contains no pattern).
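
The order in which classes are visited can be pictured with a few lines of
plain Lua. This is a sketch only: "class_order" is a hypothetical helper, not
part of Interpreter, and it assumes that a gap in the numbering stops the walk
through the positive classes, which is how the description above reads.

```lua
-- Hypothetical helper, not part of Interpreter: returns the class numbers
-- in the order they would be visited, given a table of defined classes.
local function class_order(classes)
  local order = {}
  local n = 1
  while classes[n] do          -- consecutive classes 1, 2, ..., n
    order[#order + 1] = n
    n = n + 1
  end
  order[#order + 1] = 0        -- class 0 is always visited last
  return order
end

-- Classes 1, 2 and 4 are defined; there is no class 3.
local order = class_order{ [1] = {}, [2] = {}, [4] = {} }
print(table.concat(order, " "))
```

Under that assumption class~4 is never reached, which is why classes should be
numbered consecutively.
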

Inside a class, patterns are ordered by length from long to short, or
alphabetically if two patterns have the same length. This means that if you
use e.g. "/text/" for italics and "//text//" for bold, you don't need to put
the second pattern in a class before the first to avoid "//text//" being
interpreted as two empty arguments in italics surrounding a text in roman.
Since the way the bold-pattern will be declared, e.g. "//(.-)//", is probably
longer than for the italic-pattern, e.g. "/(.-)/", it will always match first.

That said, the sorting isn't very clever and simply relies on the number of
symbols, no matter what they mean; in the patterns above, the parentheses
denote a capture but they still count in the pattern's length as understood by
Interpreter. Likewise, while ".*" denotes ``zero or more characters'' and "%+"
means ``a plus sign'' ("+" being magic, you have to escape it to refer to it),
in Interpreter's eyes the two patterns have the same length: two. Finally, one
should be aware that patterns declared with a "nomagic" entry set to "true"
are sorted after they've been transformed (so that their real length might not
be obvious).

So classes are needed when patterns need a proper ordering no matter their
lengths. For instance, some patterns should always be declared first, as they
protect input from Interpreter (see next section), while others might need to
be declared last, as they rely on what previous patterns might have done.
Besides, classes are metatables for the patterns they contain.

> interpreter.default_class        [Default: 1]

All patterns belong to a class, even though you may omit the "class" entry
when declaring one. In this case, the pattern is assigned to the class denoted
by this number.

> interpreter.set_class(number, table)

Defines class "number" as "table". Classes don't need to be defined beforehand
for patterns to be added to them (rather, Interpreter defines them implicitly
when needed).
However, classes are also metatables for the patterns, so that if an entry is
missing from a pattern's table, the class's entry is used if it exists. The
function returns a table.


================================== protect_tag
=== Protecting input =============
==================================

Sometimes you want Interpreter to refrain from interpreting; that is most
useful for verbatim code, for instance. There are various ways to do that.

> interpreter.active        [Default: true]

A boolean switching Interpreter on and off. Beware, the switching applies only
starting at the next paragraph.

> interpreter.protect([line])

A function protecting all or part of the current paragraph. If "line" is
given, it should be a number _n_, and line _n_ in the current paragraph will
be protected; without "line", the entire paragraph is protected. Protecting
means that the patterns not yet searched for will be ignored. For instance, if
you want material to be read verbatim when surrounded with "" and "", you can
declare a pattern as follows:

    local function verbatim (buffer)
      buffer[1] = "\\verbatim"
      -- The last entry is the paragraph boundary, hence #buffer - 1.
      buffer[#buffer - 1] = "\\endverbatim"
      interpreter.protect()
    end
    interpreter.add_pattern{
      pattern = "^%s*%*s$",
      call    = verbatim,
      class   = 1
    }

This code is extremely simplified: it assumes that "" and "" start and end the
paragraph and that "" isn't the last line of the file (otherwise it'd also be
the last line in the paragraph, whereas here the last one is the paragraph
boundary). An important point is that the pattern belongs to the first class,
so it is called before all other patterns (provided there is no longer pattern
in class~1, since longer patterns are searched for first) and prevents them
from doing anything, since the entire paragraph is protected. (Typesetting the
material as verbatim material obviously depends on the "\verbatim" macro, not
on Interpreter.)

> interpreter.escape

A character which prevents a pattern from being replaced if the match is
immediately preceded by it.
As an example, if "interpreter.escape = '_'", and "*text*" denotes italic,
then "*text*" will produce _text_ while "_*text*" will produce *text*. Once a
paragraph has been processed, Interpreter removes all escape characters. Only
one character can be an escape character.

> interpreter.protector(left[, right])        ["right" defaults to "left"]

Defines two characters to protect what they surround. In other words,
Interpreter replaces patterns only if the match isn't found between "left" and
"right". Unlike the escape character, you can define as many protectors as you
wish; and unlike the escape character again, Interpreter _doesn't_ remove them
once the paragraph has been processed, so you must take care of them. For
instance:

    interpreter.protector('"')
    interpreter.add_pattern{
      pattern = '"(.-)"',
      replace = '\\verb`%1`',
      class   = 0
    }

Anything between double quotes will be left untouched; then, when the
paragraph has been processed for all other classes, a pattern in class~0 calls
the "\verb" command to take care of the argument. Note that the protectors
should enclose what they protect without coinciding with it; this is not the
case here, which is why the pattern is applied.

> interpreter.direct        [Default: two percent signs then "I" and at least one space]

A string, actually a pattern, signalling that the line which it begins should
be processed as Lua code. The default is "%%%%I%s+", i.e. "%%I" followed by at
least one space. The pattern shouldn't declare itself as attached to the
beginning of the line (as in "^%%%%I%s+") because it is matched at the
beginning of the line only anyway. The line is processed with the "loadstring"
function, and then turned into an empty line. For instance:

    %%I interpreter.active = false
    This won't be interpreted...
    %%I interpreter.active = true

As this example shows, lines flagged with "interpreter.direct" don't obey
"interpreter.active" and are always processed as described above.

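
The handling of lines flagged with "interpreter.direct" can be sketched in
plain Lua as follows. This is only an illustration of the behaviour described
above, not Interpreter's actual code: the "process" function and the global
"interpreter" table are stand-ins defined here, and only the default value of
the pattern comes from this documentation.

```lua
-- Standalone sketch; only the value of `direct` is taken from the
-- documentation, everything else is a hypothetical stand-in.
local direct  = "%%%%I%s+"
local compile = loadstring or load   -- loadstring in Lua 5.1, load later

-- Global stand-in for the real table, so the loaded code can reach it.
interpreter = { active = true }

local function process(line)
  -- The pattern is anchored here, so it need not contain "^" itself.
  local _, stop = line:find("^" .. direct)
  if not stop then
    return line                      -- ordinary line, left alone
  end
  assert(compile(line:sub(stop + 1)))()  -- run the rest as Lua code
  return ""                          -- the line becomes an empty line
end

local out = process("%%I interpreter.active = false")
print(out == "", interpreter.active)
```

An ordinary line passes through "process" unchanged, whereas a flagged line
runs its Lua code and comes back empty.
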
================================== technical_tag
=== Technical stuff ==============
==================================

You don't have to bother with this section if you don't care how Interpreter
does its job; actually you won't learn much anyway.

> interpreter.reset()

A function which resets everything to default and deletes classes. It is used
when calling "\interpretfile" so that new interpretations start from zero.

> interpreter.register(function)

A function called to put Interpreter's main function into the
"process_input_buffer" callback; you can redefine it at will. If it is
undefined, "callback.register()" is used, unless
"luatexbase.add_to_callback()" is detected. (The detection takes place at the
first call to "\interpretfile", so there is no need to load Interpreter after
"luatexbase".)

> interpreter.unregister(function)

A function called to remove Interpreter's main function from the
"process_input_buffer" callback. It works similarly to the previous one.


================================== example_tag
=== An example: i-doc.lua ========
==================================

\interpretfile{doc}{i-doc.lua}