CreatingGrammarRulesFromSyntaxAxioms.html

Note: Duplicate Grammar Rule Expressions are not stored and do not become part of the user's grammar (thus, one source of Ambiguity is eliminated.) Because Syntax Axioms are processed in order of their appearance in the user's Metamath database, a natural order of precedence is established (i.e. in set.mm weq occurs prior to wceq therefore "set = set" is parsed to weq rather than the type-converted form of wceq.)

Multiple Grammar Rules may be generated from a Syntax Axiom if any of the Variable Type codes in the Expression are the object of Type Conversion Syntax Axioms or Nulls Permitted Syntax Axioms. By "object of" it is meant that the Type code of a Type Conversion or Nulls Permitted Syntax Axiom is equal to the Type code of the Variable in the Syntax Axiom being processed; directly or indirect relationships are possible, involving chains of Type Conversion Axioms and Nulls Permitted Syntax Axioms.)

In theory (however rarely in practice), a Nulls Permitted Syntax Axiom can effectively "generate" Nulls Permitted and Type Conversion "rules" (see Examples in 2 below). Therefore, in the discussion below, we will refer to "Nulls Permitted Rules" and "Type Conversion Rules" instead of referring to Nulls Permitted Syntax Axioms and Type Conversion Syntax Axioms.

By convention, all of the processing below is performed in order of database appearance. Thus:

variables are converted/nulled in order of a variable's appearance in the database;
if there are multiple chains of Type Conversion Rules for a Variable's Type code, the chains are processed in the order that they were created; and
since a variable can only be "nulled out" once, regardless of how many Type Conversion Rules and Nulls Permitted Rules can be brought to bear on a single Type code, the first Nulls Permitted Rule for a Type code -- direct or indirect -- is used and the rest are ignored.

There are 4 possible cases here related to the presence or absence of Type Conversion and/or Nulls Permitted Rules in the user's Metamath database/file: 1) No Type Conversions or Nulls Permitted Rules present; 2) Nulls Permitted Rules present; 3) Type Conversion Rules present; and 4) both present.

However, to simplify the explanation that follows, we will describe a 3 step process whereby in Step 1, a Syntax Axiom is converted into a GrammarRule without using any Nulls Permitted or Type Conversion Syntax Axioms. Then, application of Nulls Permitted Rules and Type Conversion Rules are described separately -- and are defined in terms of operating on existing GrammarRules. Thus, the GrammarRule from Step 1 is considered input to Step 2, and the GrammarRules from Steps 1 and 2 are considered input to Step 3. In each case, if the generated GrammarRule duplicates an already existing GrammarRule it is dropped from further consideration and processing. (The reason for describing the process this way is that addition of another Nulls Permitted Rule or Type Conversion Rule to an existing set of Syntax Axioms and Grammar Rules requires re-processing of all existing GrammarRules to possibly generate additional GrammarRules!)

1) No Type Conversion Rules or Nulls Permitted Rules involved.

A Syntax Axiom Formula is converted to a GrammarRule formula by converting each Symbol in the Formula to a Constant where the first Symbol is the replacement Type and the remaining Symbols form the GrammarRule's pattern:



    For each Symbol(i) in Syntax Axiom Formula:

        If Symbol(i) is a Constant, 

            Grammar Rule's Constant(i) = Symbol(i);

        Else (the Symbol must be a Variable),

            Grammar Rule's Constant(i) = Symbol(i)'s Type Code;

End-if;

The Expression portion of the resulting GrammarRule formula is stored in a Notation Tree whose root is connected to the Constant of the Syntax Axiom's Type code -- unless the GrammarRule Expression is a duplicate. Then, a GrammarRule object is attached at the end, as a leaf node, and is itself "hooked" up with a reference back to the original Syntax Axiom. There are no Type Conversions or Nulls Permitted to worry about, so the GrammarRule has no stored transformation rules for the variables of the formula (see ExampleMetamathGrammar.html ).

If the GrammarRule Expression generated by this process is a duplicate then no subsequent GrammarRules are generated by the Nulls Permitted Rule and Type Conversion Rule processes described below. The reason is that all possible GrammarRules resulting from existing Nulls Permitted Rules and Type Conversion Rules will have already been generated -- no additional variations are possible from a duplicate (it's in there already...)

2) Nulls Permitted Rules present

A Syntax Axiom generates 0 or 1 unique, non-duplicate GrammarRule Expressions as described above in #1. If the resulting GrammarRule -- we'll call it "Rx" for the moment -- is unique and non-duplicative of any previously generated GrammarRule Expressions, then if Rx contains "M" Type code Constants for which Nulls are Permitted, then 0 to N additional unique, non-duplicate GrammarRules are generated -- we'll call these "Rxn1", "Rxn2", etc. Then:

N = (2 ** M) - 1.

Each of Rxn1, Rxn2, etc. is generated by applying a Nulls Permitted Rule to one or more of the Type codes in the original GrammarRule Expression, and there are N possible variations.

Example A) given GrammarRule Expression "set * set" where "set" is a Type code and "*" is a Constant, if Type code "set" has a Nulls Permitted Rule, then N = (2 ** 2) - 1 = 3, meaning that application of Nulls Permitted Rule to the GrammarRule "set * set" will generate 3 additional GrammarRules (which may or may not be duplicates of existing GrammarRules):

"set * set" -- original GrammarRule;
"set *" -- derived GrammarRule 1, with the last set null;
"* set" -- derived GrammarRule 2, with the first set null;
"*" -- derived GrammarRule 3, with both sets null (note that this Expression is a Constant! Nulls Permitted Rules can also generate, surprisingly, Type Conversion Rules and other Nulls Permitted Rules...)

Example B) given GrammarRule Expression "A B" derived from a Syntax Axiom with Type code A, if Type B has Nulls Permitted, then N = (2 ** 1) - 1 = 1. The generated GrammarRule Expression is just "A" -- which is invalid because the original Syntax Axiom has Type code A and this constitutes a loop!

Example C) given GrammarRule Expression "A B" derived from a Syntax Axiom with Type code A, if Type A has Nulls Permitted, then

N = (2 **
1) - 1 = 1

. The generated GrammarRule Expression is just "B" -- and thus, the generated GrammarRule defines a Type Conversion from B to A!

Example D) given GrammarRule Expression "B" derived from a Syntax Axiom with Type code A (which is in effect, a Type Conversion...), if Type B has Nulls Permitted, then

N = (2 **
1) - 1 = 1

. The generated GrammarRule Expression is null -- and thus, the generated GrammarRule defines Nulls Permitted for Type code A!

3) Type Conversion Rules Present

Processing 3) is basically same as 2) -- we generate new GrammarRule Expressions from a given GrammarRule Expression based on the Type codes in the given Grammar Rule Expression, and we delete any newly generated Grammar Rules that are duplicates of already existing Grammar Rules. However, instead of checking to see if a given Type code has a corresponding Nulls Permitted Rule we check to see if it has one or more corresponding Type Conversion Rules that specify "convert Type code X to the given Type code". If so, then a Grammar Rule Expression is generated for each matching Type Conversion Rule, and if the given (input) GrammarRule Expression has multiple occurrences of Type code with matching Type Conversion Rules, then all possible variations (permutations) of GrammarRules are generated.

Example A) given GrammarRule Expression "X Y Z" where "X", "Y" and "Z" are Type codes, and Type Conversion Rules specifying that Type Codes "A1" and "A2" are convertible to "X", "B1" and "B2" to "Y", and "C1" and "C2" to "Z", then the following GrammarRule Expressions are generated (the bullet point numbers are not part of the generated rules...):

X Y C1
X Y C2
X B1 Z
X B1 C1
X B1 C2
X B2 Z
X B2 C1
X B2 C2
A1 Y Z
A1 Y C1
A1 Y C2
A1 B1 Z
A1 B1 C1
A1 B1 C2
A1 B2 Z
A1 B2 C1
A1 B2 C2
A2 Y Z
A2 Y C1
A2 Y C2
A2 B1 Z
A2 B1 C1
A2 B1 C2
A2 B2 Z
A2 B2 C1
A2 B2 C2

Since each original Type Code has 2 matching Type Conversions, there are 3 possibilities for each of the 3 parameters, yielding 3 ** 3 possibilities, minus the original GrammarRule Expression = (3 ** 3) - 1 = 26.

Comment: Obviously, in a user-defined Grammar with many Type Conversions and Syntax Axioms, the combinatorial explosion could well exceed the size of available memory in the computer! This is a classic tradeoff between size and speed -- and we're not too sure how much speed this is buying us either. The time savings of pre-computing all possible grammatical variations would depend in part on the length of the formulas being parsed. Since most statements are not excessively long, the savings yielded by this pre-computation may be meager. On the other hand, it is likely that many languages use few Type Conversions or Nulls Permitted; for example, set.mm uses only one Type Conversion Rule and does not permit Nulls at all. (Note: there is nothing to prevent substitution of another grammatical parsing algorithm, the code will employ an GrammaticalParsing Interface and so on and so forth...) The main benefit of this pre-computation approach is simplicity, especially in the intended Bottom Up algorithm implementation (a parse expert may be inclined to hammer me for this crude approach...alternative code is welcome!)

ALGORITHM NOTES:

The discussion above is perhaps more complicated than the necessary code! The "trick" is that while adding "base" Grammar Rules -- the original, un-modified Grammar Rules obtained from the Syntax Axioms -- each new Grammar Rule that is added is also written to a Priority Queue which is ordered by the Grammar Rule's "maxSeqNbr" followed by its GrammarRule.ruleNbr (together these two numbers provide a "total order" on the set of Grammar Rules.) Then, immediately after storing the base Grammar Rule, the Priority Queue is read and the "derived" Grammar Rules, if any, are generated.

"maxSeqNbr" is the largest mObj.seq chosen from a Grammar Rule's baseSyntaxAxiom and the set of GrammarRules used to modify the Syntax Axiom's hypotheses. For example, if the Grammar Rule's baseSyntaxAxiom.seq = 100 and it has a single Type Conversion Rule applied, which has seq = 110, then GrammarRule.maxSeqNbr = 110. GrammarRule.maxSeqNbr can be used in parsing to guarantee that a grammatical parsing of an Expression contains no "forward references" to statements in the database that follow the Statement whose Expression is being parsed. In addition, the combination of maxSeqNbr and GrammarRule.ruleNbr provide a total order on the set of Grammar Rules which is then used to sequence the Priority Queue so that the earliest Grammar Rules are stored in the final grammar.

Each new Grammar Rule read back from the Priority Queue is processed against earlier Grammar Rules to generate new derived Grammar Rules. If the new rule is a Nulls Permitted Rule, then it is applied to all previous Type Conversion and Notation Rules. If the new rule is a Type Conversion Rule then it is applied to *all* previous rules, including Type Conversions. And if the new rule is a Notation Rule then it is applied to all previous Nulls Permitted and Type Conversion Rules.

The only real complication is how to generate all possible permutations of Type Conversions and Nulls Permitteds. Fortunately, there is a trick, which also involves the Priority Queue. Instead of trying to generate all possible permutations at once, each derivation involving each hypothesis separately is performed; the resulting Grammar Rule is added (if not a duplicate), and written to the Priority Queue -- which results in *more* Grammar Rules! Let's take a simple example, a Type Conversion from B to C (b2c $a c b $.)

Now suppose that there already exists a Type Conversion from C to D and another from D to E. When the B to C conversion comes through the priority queue, it is matched with the C to D conversion, thus generating a B to D conversion, which is then written to the Priority Queue. When the new B to D rule is read back, it is matched with the D to E conversion and a new rule, a B to E conversion is created and written to the Priority Queue. When the B to E conversion is read back, there are no more applicable conversions and the chain reaction comes to an end. A similar chain reaction would occur for a Notation Rule involving multiple hypotheses with Type Conversions and Nulls Permitteds.