Text Mode Formula Formatting


Enhance mmj2 by improving the rendering of Metamath formulas when using plain ASCII text.

Although the Metamath program, metamath.exe, provides a mechanism to generate Latex and either .gif  or Unicode HTML output files that display the modern math equivalents of the user-defined ASCII symbols, this typesetting mechanism is not available within mmj2 and its Proof Assistant. It will be very helpful if the mmj2 Proof Assistant displays ASCII text Metamath formulas using improvements in formula layout similar to those shown here: mdsymlem5Example.html


Background Items:

Note: "Pretty Printing" Metamath formulas is not completely dissimilar to the problem faced by computer programmers wishing to format complicated "if" statements. Computer programmers often use conventions such as indenting nested "if"s and aligning and grouping various symbols in order to facilitate comprehension by readers; instead of writing one long, continuous stream of program code, programmers often try to break down a statement into manageable chunks that exhibit the logical structure of the code. We can attempt the same thing with Metamath formulas, though with variations that take into account the differences between a Metamath formula and say, Java code.

The Problem: user difficulties in reading and comprehending Metamath formulas --
  1. User unfamiliarity with a .mm file's Syntax Axioms and symbol schemes.
  2. Metamath formulas are linear, one-dimensional objects -- contrary to standard mathematical typesetting, which employs two dimensional shapes (i.e. powers and roots), as well as varying font sizes, font families and text styles. families
  3. Absence of "dummy variables" (or named sub-expressions) to represent repeated sub-expressions within a formula or array of formulas; these dummy variables are common in texts to manage notational complexity, and even though the metamath.exe program employs them in its Proof Assistant, once a proof is completed the dummy variables used during proof creation disappear.
  4. A profusion of parentheses
  5. Lengthy, complex sub-expressions that often mask the hierarchical syntactical structure of a formula.
  6. Unintelligent line breaks and inter-symbol spacing (note: an author of a .mm file may write the "source" file however she pleases, but the author's line breaks and spacing are discarded during output generation -- this holds also for proof step formulas, which lose all formatting during the round trip into Metamath RPN -- Reverse Polish Notation -- and back.)
Difficulties 1 through 3 are beyond the scope of this enhancement. Although much can be said about the first 3 difficulties listed above (and the discussion is interesting), we drop those items from further discussion in this document.

Difficulties 4 through 6 are within the scope of this enhancement, and certain minor improvements in the user experience will result from the new code. Again, please have a look at: mdsymlem5Example.html

The Solutions:

No Free Lunch

The "No Free Lunch" saying (or "TANSTAAFL") holds here too. The only possible way to perfectly render every possible notation scheme for every person reading formulas is to allow customization of formatting at the level of the individual syntax axiom by the user -- and provide other parameters and program "smarts" to handle all of the special cases. But perfection in this matter is not likely unattainable because the average user is unwilling to spend the hours needed to customize processing at the level of detail that will provide a perfect personal viewing experience. It just won't happen. So what we need in code is something that is just good enough without lots of tweaking and customization -- something that isn't terrible -- but that provides for the possibility of some customization and extension by .mm file authors and users.

Here is a basic outline of the solution:

Sample RunParms for TMFF processing:

1. No change to existing processing -- i.e. no special formula formatting -- using defaults:
* blah blah - final RunParm to run the Proof Assistant GUI:

2. Default TMFF parameters coded explicitly to specify the default values -- except that by default TMFF is Off/Disabled (see RunParm "TMFFUseFormat"):
* Define TMFF Schemes:
* Method names are hardcoded and fixed: "AlignColumn" and "Flat";
* "Unformatted" is RESERVED for internal use.
* Scheme Names are assigned by the user; must be unique and non-blank.
* RunParmName,SchemeName,MethodName,

* Define TMFF Formats:
* "Format 0" is RESERVED for internal use - uses Method "Unformatted".
* RunParmName,FormatNbr,SchemeName

* To turn on/enable TMFF, use Format Nbr >= 1 (off/disabled = Format 0):
* RunParmName,FormatNbr

* Proof Assistant RunParms that affect TMFF formatting:

* NOTE: formulas are now output using TMFF because TMFF is enabled!
* NOTE: TMFF relies upon grammatical parsing of .mm statements! Input
*       the "LoadFile" and "Parse" RunParms before the TMFF RunParms
*       for best results.

* ... doit!

TMFF Method Processing Overview:

The basic idea in TMFF Method processing is to use a formula's parse tree to determine where to insert line breaks. Take, for example, a formula, "X" = "|- ( ( ph -> ( ps -> ch ) ) -> ( ( ph -> ps ) -> ( ph -> ch ) ) )".

The parse tree for X looks like this (in set.mm notation):
                  *     ****    *               
             *                        *                    
      ****                                ****
      *wi*                                *wi*
      ****                                ****              
     *      *                          *        *           
   ****       ****                ****            ****
   *ph*       *wi*                *wi*            *wi*
   ****       ****                ****            ****   
             *    *              *    *          *    *      
           ****  ****          ****  ****      ****  ****
           *ps*  *ch*          *ph*  *ps*      *ph*  *ch*
           ****  ****          ****  ****      ****  ****
( ( ph -> ( ps -> ch ) ) -> ( ( ph -> ps ) -> ( ph -> ch ) ) )

Conceptually, the TMFF Method begins at the root node of formula X's parse tree. But a formula's parse tree does not provide the first symbol of the formula, which is always a constant (i.e. "|-" or "wff" or other user-defined constant symbol), so the first symbol of the formula must be input with the parse tree, and is output as given (updating current row/column in the output area).

Each node of the parse tree, starting with the root node is formatted in turn as a sub-expression using a function we will call "formatSubExpr". The current node's sub-expression is checked against the input Method option/parameters, such as maxDepth, current column number, etc. If the sub-expression's attributes exceed any of the given values then the sub-expression must be broken up using a line break and space characters. In this case, the constants and the variables of the sub-expression are formatted and output in turn, from left to right; each variable corresponds to a child node of the current node, and is formatted and output using a recursive call to formatSubExpr.

The main difference between TMFF Methods (as initially foreseen) is the way that processing deals with line breaks inside sub-expressions. The AlignColumn method arranges the variable or constant portions of a sub-expression in a single column based on the input option/parameters.


Here is how formula X would be formatted with the Format 1 above (using Scheme "AlignVarDepth3" w/ maxDepth=3):
         1         2         3         4         5         6         7         8
                   |- ( ( ph -> ( ps -> ch ) ) ->
                        ( ( ph -> ps ) -> ( ph -> ch ) ) )

Here is X again but with Format 3 above (using Scheme "AlignVarDepth1", which uses maxDepth=1):

         1         2         3         4         5         6         7         8
                   |- ( ( ph ->
                          ( ps ->
                            ch ) ) ->
                        ( ( ph ->
                            ps ) ->
                          ( ph ->
                            ch ) ) )

A few tricky points to mention about Method processing and the above examples:
          operatorVar op1