(see also BasicsOfSyntaxAxiomsAndTypes.html)

Q. The Metamath specification book, Metamath.pdf does not say much about Syntax Axioms. What rules and conventions should be followed in creating a formal language syntax in Metamath?

A. Metamath is by design, as flexible and general as possible while maintaining maximum simplicity.

(Note: I think that Metamath can implement Chomsky Type 2, 3 and 4 grammars -- AKA "Context-Free", "Regular" and "Finite-Choice" grammars, respectively, but cannot implement Chomsky Type 0 and 1 -- AKA Phrase-Structure and Context-Sensitive grammars..)

The user is given very wide latitude and very few restrictions in constructing a formal language with which to write logic and math statements and proofs. Metamath does not require that the user write only Boolean logic or any other specific logic theory. It does not validate the user's Logical Axioms to make sure that they are free from inconsistencies. Nor does Metamath care what notation scheme is employed, so long as the notation is in accordance with the basic Metamath file format, command syntax and relational validity edits. The user may employ infix, polish, reverse polish notation or a combination of these notation schemes. If the user creates a syntax that is ambiguous, Metamath.exe will not complain. Metamath simply validates the input file commands, checks proof validity according to well-defined rules, helps construct proofs via the "Proof Assistant" feature and creates output html pages, reports, etc. Contradictions and ambiguity are the user's problem -- in MMJ2 we will attempt to provide some help with the question of Ambiguity.

Now, in spite of the fact that the Metamath specification does not lay out rules regarding user-defined Syntax, per se, a close examination of the classic Metamath database, as well as others such as reveals that Syntax Axioms do indeed conform to certain unwritten rules.

The (Previously) Unwritten Rules for Syntax Axioms
[These appear to already be "conventions" in Metamath's file]

1. No Variable should appear more than once in a single Syntax Axiom. For example:

notvalidsyntaxaxiom $a wff ( ph -> ph ^ ps ) $.

is incorrect. Variable names are irrelevant in Syntax Axioms, except to define the order of hypotheses on the Proof Work Stack -- which is the reason each variable should have a different name even if the Type codes are identical.

2. A Syntax Axiom should not have an associated Logical Hypothesis or any Disjoint Variable Restrictions. For example:

    loghypX $e |- ( ph <-> ps ^ ch # th ) $.
            $d ph th ch $.
    saxiomX $a wff ph ^ ch # th ) $.

Syntax Axioms define the allowable symbol patterns for identifying a "thing" to be discussed. Logical constraints and disjoint variable restrictions should be placed on the definition axioms -- which themselves will rely upon Syntax Axioms to formulate the valid symbol patterns that identify the things being defined.

As an example, in English adjective-verb pairs are defined by the grammar of English: "red ball" and "pink elephant" are gramatically correct and are allowed by the grammar even though logically, pink elephants are impossibities that do not exist. The grammar allows us to formulate symbol patterns that identify impossible, illogical and false things! And it has to, otherwise, how would we be able to even talk about them to communicate the fact or their impossibility or falsity?

[NOTE: MMJ2-Specific "Rules" Regarding Unique Readability Follow

Please assume that we have a function/subroutine available,
N = GrammaticalParse(E),  that accepts as input any Expression, E, and returns as output a number N = 0, 1 or 2 where:
...and...further assume that by "Parse" we mean use a set of Grammar Production Rules derived from the Syntax Axioms and attempt to convert an Expression's symbol sequence into a tree structure (a "Parse Tree") that has a single root node corresponding to a Grammar Production Rule, with nodes below corresponding to variables in that Rule and/or other nodes corresponding to other Rules, etc. and so on... We'll get to the topic of HowTo Derive The Grammar Production Rules From The User's Syntax Axioms shortly...]

3. A Syntax Axiom's Type code should not be the same as the Type code assigned to any Theorem, Logical Hypothesis, Logical Axiom or Logical Definition statement (see $p, $e and $a Metamath statements.)

This may seem redundant given that we have defined what is a Syntax Axiom in terms of Type codes, but Rule 3 envisions the possibility of a computer program that allows the user to explicitly designate the labels of Syntax Axioms.

4. No Variable appearing in a Syntax Axiom should have a Type code that is the same as the Type code assigned to any Theorem, Logical Hypothesis, Logical Axiom or Logical Definition statement (see $p, $e and $a Metamath statements.)

For example, continuing with's Type codes such as "|-" meaning "provable" and "wff" meaning Well-Formed Formula,
and "set" meaning "set":

  $v formulaF formulaG formulaH
F $f |- formulaF $.
invalidsyntaxaxiomY $a wff ( ph <-> F ) $.

Note: the purpose of  rules 4 and 5 is to facilitate and guarantee the conversion of Metamath Syntax Axioms into Grammars, specifically Context-Free Grammars (AKA "Chomsky Type 3" grammars).

5. A Type code used in any statement's formula -- Theorem, Logical Axiom, Syntax Axiom, Variable Hypothesis, Logical Hypothesis, or Definition ($a, $e, $f, or $p statements) -- should not be used as a Constant in the Expression portion of any Formula. (This rule pertains the generating the Grammar Production Rules for a Context Free Grammar (aka "Chomsky Type 3" -- see "Parsing Techniques -- A Practical Guide" at ).

For example:

invalidsyntaxaxiomZ $a wff ( ph <-> wff ) $.

6. Each Syntax Axiom should be unique and should not be a composite of other Syntax Axioms. In other words, it should not be possible to parse a Syntax Axiom using other Syntax Axioms; in mathspeak, the ranges are disjoint.

Explanation: Let AE be the Expression portion of a Syntax Axiom, A, that is intended for inclusion in the existing set, R of Syntax Axiom's for a Metamath file.
    Switch (GrammaticalParse(AE)) {
        case  0: Syntax Axiom is Valid;
        case  1: Syntax Axiom not unique or is composite;
        case  2: Syntax is ambiguous;
        default: Shoot the programmer;

7. The Expression portion of every syntactically valid non-Syntax Axiom Formula should have a unique parse tree consisting of Syntax Axiom and Variable Hypothesis nodes. In other words, the syntax is unambiguous (aka "Unique Readability").

Explanation: Let FE be the Expression portion of a Formula, F, on a statement that is not a non-Syntax Axiom statement. Then
    Switch (GrammaticalParse(FE)) {
        case  0: Expression has syntax error;
        case  1: Expression has valid syntax;
        case  2: Syntax is ambiguous;
        default: Shoot the programmer;

NOTE: Rule 7 provides a test for a single Formula, and if GrammaticalParse(FE) returns a 2 then we know that the user's Syntax is Ambiguous. And because there are many algorithms for parsing Context-Free Grammars, it may be assumed that developing such a parser for Metamath is doable. But, Rule 7 cannot be used to prove that the user's Syntax is Unambiguous, i.e. has Unique Readability. To prove Unique Readability for an arbitrary Context-Free Metamath Grammar we will need to analyze punctuation and notations -- in depth, with all of their bewildering permutations.

8. Type Conversion Axioms may be coded into arbitrarily complex hierarchies but loops (aka "cycles", "synonyms") are not permitted.

Explanation: a loop of the form "XXX is a YYY", "YYY is a ZZZ", "ZZZ is a XXX" is not permitted. In this example we say "every subset is a set" which is fine, but then create a loop by saying "every set is a subset", which crosses the line; if there is a need to handle synonyms, the symbols should be converted prior to grammatical parsing and syntax verification.

       $c set class superclass subset $.  $( declare Constant symbols                       $)
       $v x A AA s                    $.  $( declare Variable symbols                       $)
vx     $f set x                       $.  $( Variable x has Type set                        $)
vA     $f class A                     $.  $( Variable A has Type class                      $)
vAA    $f superclass AA               $.  $( Variable AA has Type superclass                $)
vs     $f subset s                    $.  $( Variable s has Type subset                     $)
cs     $a set s                       $. 
$( OK, every subset is a set                      $)
cv     $a class x                     $.  $( OK, every set is a class                       $)
cA     $a superclass A                $.  $( OK, every class is a superclass                $)
cs2    $a class s                     $.  $( redundant but OK, every subset is a class      $)
cwrong $a subset x                    $.  $( ERROR: 'every set is a subset' creates a loop! $)

Note: A Type Conversion Axiom's Formula has length = 2, by definition and consists of a Type code followed by a Variable. This definition relates to the interesting question of just what precisely constitutes a "loop". For example, is the following pair of axioms constitute a "loop", where 'x' is a set Variable and 'A' is a class Variable?

cv           $a class x       $.
syntaxaxiomA $a set <@< A >@> $.
The (somewhat tenative) answer is No. The axiom, "syntaxaxiomA" could be interpreted as saying that certain restrictions can be applied to a class, "A" to construct a set. It is not saying that every class is a set.
9.  A Syntax Axiom containing one of more Variables whose Type code(s) have a Nulls-Permitted Syntax Axiom shall be treated as 'n' separate Syntax Axioms for the purpose of performing the validation checks specified by these "Unwritten Rules", where:

'n' = 1 plus the number of Expression variations generated by excising the Nulls-Permitted Variables from the Expression, both individually and in combination.

All other "Unwritten Rules" should be read as being intended to validate all 'n' possible Syntax Axioms!


Suppose we have:
vA      $f A a   $.
vB      $f B b   $.
nullBOk $a B     $.
ABOk    $a A b a $.

Then, Syntax Axiom "ABOk" has these 2 variations:
ABOk    $a A b a $.
ABOk.2  $a A a   $.

And, by Rule 8, ABOk.2 constitutes a loop, which is an error!

Note that the presence of Nulls-Permitted Syntax Axioms can implicitly generate other types of Syntax Axioms, such as Type Conversions, Named Typed Constants and Named Typed Strings! Thus, indirect loops are possible, as wells as other pathologies leading to ambiguities!!!

10. Deferred ... pending Grammar Production Rule Derivations.