CreatingGrammarRulesFromSyntaxAxioms.html
Note: Duplicate Grammar Rule Expressions are not stored and do not
become part of the user's grammar (thus, one source of Ambiguity is
eliminated.) Because Syntax Axioms are processed in order of their appearance
in the user's Metamath database, a natural order of precedence is established (i.e. in set.mm
weq
occurs prior to wceq
therefore "set = set
" is parsed to weq
rather than the type-converted form of wceq
.)
Multiple Grammar Rules may be generated from a Syntax Axiom if any of
the Variable Type codes in the Expression are the object of Type
Conversion Syntax Axioms or Nulls Permitted Syntax Axioms. By "object
of" it is meant that the Type code of a Type Conversion or Nulls
Permitted Syntax Axiom is equal to the Type code of the Variable in the
Syntax Axiom being processed; directly or indirect relationships are possible, involving chains of Type Conversion
Axioms and Nulls Permitted Syntax Axioms.)
In theory (however rarely in practice), a Nulls Permitted Syntax Axiom
can effectively "generate" Nulls Permitted and Type Conversion "rules"
(see Examples in 2 below). Therefore, in the discussion below, we will
refer to "Nulls Permitted Rules" and "Type Conversion Rules" instead of
referring to Nulls Permitted Syntax Axioms and Type Conversion Syntax
Axioms.
By convention, all of the processing below is performed in order of database appearance. Thus:
- variables are converted/nulled in order of a variable's appearance in the database;
- if there are multiple chains of Type Conversion Rules for a Variable's
Type code, the chains are processed in the order that they were
created; and
- since a variable can only be "nulled out" once, regardless of how
many Type Conversion Rules and Nulls Permitted Rules can be brought
to bear on a single Type code, the first Nulls Permitted Rule for a Type
code -- direct or indirect -- is used and the rest
are ignored.
There are 4 possible cases here related to the presence or absence of
Type Conversion and/or Nulls Permitted Rules in the user's
Metamath database/file: 1) No Type Conversions or Nulls
Permitted Rules present; 2) Nulls Permitted Rules present;
3) Type Conversion Rules present; and 4) both present.
However, to simplify the explanation that follows, we will describe a 3
step process whereby in Step 1, a Syntax Axiom is converted into a
GrammarRule without using any Nulls Permitted or Type Conversion Syntax
Axioms. Then, application of Nulls Permitted Rules and Type Conversion Rules are
described separately -- and are defined in terms of operating on
existing GrammarRules. Thus, the GrammarRule from Step 1 is considered
input to Step 2, and the GrammarRules from Steps 1 and 2 are considered
input to Step 3. In each case, if the generated GrammarRule duplicates
an already existing GrammarRule it is dropped from further
consideration and processing. (The reason for describing the process
this way is that addition of another Nulls Permitted Rule or Type Conversion
Rule to an existing set of Syntax Axioms and Grammar Rules
requires re-processing of all existing GrammarRules to possibly
generate additional GrammarRules!)
1) No Type Conversion Rules or Nulls Permitted Rules involved.
A
Syntax Axiom Formula is converted to a GrammarRule formula by converting each Symbol in the Formula to a
Constant where the first Symbol is the replacement Type and the remaining Symbols form the GrammarRule's pattern:
For each Symbol(i) in Syntax Axiom Formula:
If Symbol(i) is a Constant,
Grammar Rule's Constant(i) = Symbol(i);
Else (the Symbol must be a Variable),
Grammar Rule's Constant(i) = Symbol(i)'s Type Code;
End-if;
The Expression portion of the resulting GrammarRule formula is stored
in a Notation Tree whose root is connected to the Constant of the
Syntax Axiom's Type code -- unless the GrammarRule Expression is a
duplicate. Then, a GrammarRule object is attached at the end, as a leaf
node, and is itself "hooked" up with a reference back to the original
Syntax Axiom. There are no Type Conversions or Nulls Permitted to worry
about, so the GrammarRule has no stored transformation rules for the
variables of the formula (see ExampleMetamathGrammar.html ).
If the GrammarRule Expression generated by this process is a duplicate
then no subsequent GrammarRules are generated by the Nulls Permitted Rule
and Type Conversion Rule processes described below. The reason is that all
possible GrammarRules resulting from existing Nulls Permitted Rules and Type
Conversion Rules will have already been generated -- no
additional variations are possible from a duplicate (it's in there
already...)
2) Nulls Permitted Rules present
A Syntax Axiom generates 0 or 1 unique, non-duplicate GrammarRule
Expressions as described above in #1. If the resulting GrammarRule --
we'll call it "Rx
" for the moment -- is unique and non-duplicative of any previously generated GrammarRule Expressions, then if Rx
contains "M
" Type code Constants for which Nulls are Permitted, then 0
to N
additional unique, non-duplicate GrammarRules are generated -- we'll call these "Rxn1
", "Rxn2
", etc. Then:
Each of Rxn1
, Rxn2
, etc. is generated by
applying a Nulls Permitted Rule to one or more of the Type codes in
the original GrammarRule Expression, and there are N
possible variations.
Example A) given GrammarRule Expression "set * set
" where "set
" is a Type code and "*
" is a Constant, if Type code "set
" has a Nulls Permitted Rule, then N = (2 ** 2) - 1 = 3
, meaning that application of Nulls Permitted Rule to the GrammarRule "set * set
" will generate 3 additional GrammarRules (which may or may not be duplicates of existing GrammarRules):
- "
set * set
" -- original GrammarRule;
- "
set *
" -- derived GrammarRule 1, with the last set
null;
- "
* set
" -- derived GrammarRule 2, with the first set
null;
- "
*
" -- derived GrammarRule 3, with both set
s
null (note that this Expression is a Constant! Nulls Permitted Rules
can also generate, surprisingly, Type Conversion Rules and other
Nulls Permitted Rules...)
Example B) given GrammarRule Expression "A B
" derived from a Syntax Axiom with Type code A
, if Type B
has Nulls Permitted, then N = (2 ** 1) - 1 = 1
. The generated GrammarRule Expression is just "A
" -- which is invalid because the original Syntax Axiom has Type code A
and this constitutes a loop!
Example C) given GrammarRule Expression "A B
" derived from a Syntax
Axiom with Type code A
, if Type A
has Nulls Permitted, then N = (2 **
1) - 1 = 1
. The generated GrammarRule Expression is just "B
" --
and thus, the generated GrammarRule defines a Type Conversion from B
to A
!
Example D) given GrammarRule Expression "B
" derived from a Syntax
Axiom with Type code A
(which is in effect, a Type Conversion...), if Type B
has Nulls Permitted, then N = (2 **
1) - 1 = 1
. The generated GrammarRule Expression is null --
and thus, the generated GrammarRule defines Nulls Permitted for Type code A
!
3) Type Conversion Rules Present
Processing 3) is basically same as 2) -- we generate new GrammarRule
Expressions from a given GrammarRule Expression based on the Type codes
in the given Grammar Rule Expression, and we delete any newly generated
Grammar Rules that are duplicates of already existing Grammar Rules.
However, instead of checking to see if a given Type code has a
corresponding Nulls Permitted Rule we check to see if it has one or
more corresponding Type Conversion Rules that specify "convert Type
code X to the given Type code". If so, then a Grammar Rule Expression
is generated for each matching Type Conversion Rule, and if the given
(input) GrammarRule Expression has multiple occurrences of Type code
with matching Type Conversion Rules, then all possible variations
(permutations) of GrammarRules are generated.
Example A) given GrammarRule Expression "X Y Z
" where "X
", "Y
" and "Z
" are Type codes, and Type Conversion Rules specifying that Type Codes "A1
" and "A2
" are convertible to "X"
, "B1
" and "B2
" to "Y
", and "C1
" and "C2
" to "Z
", then the following GrammarRule Expressions are generated (the bullet point numbers are not part of the generated rules...):
X Y C1
X Y C2
X B1 Z
X B1 C1
X B1 C2
X B2 Z
X B2 C1
X B2 C2
A1 Y Z
A1 Y C1
A1 Y C2
A1 B1 Z
A1 B1 C1
A1 B1 C2
A1 B2 Z
A1 B2 C1
A1 B2 C2
A2 Y Z
A2 Y C1
A2 Y C2
A2 B1 Z
A2 B1 C1
A2 B1 C2
A2 B2 Z
A2 B2 C1
A2 B2 C2
Since each original Type Code has 2 matching Type Conversions, there
are 3 possibilities for each of the 3 parameters, yielding 3 ** 3
possibilities, minus the original GrammarRule Expression = (3 ** 3) - 1
= 26.
Comment: Obviously, in a user-defined
Grammar with many Type Conversions and Syntax Axioms, the combinatorial
explosion could well exceed the size of available memory in the
computer! This is a classic tradeoff between size and speed -- and
we're not too sure how much speed this is buying us either. The time
savings of pre-computing all possible grammatical variations would
depend in part on the length of the formulas being parsed. Since most
statements are not excessively long, the savings yielded by this
pre-computation may be meager. On the other hand, it is likely that
many languages use few Type Conversions or Nulls Permitted; for
example, set.mm uses only one Type Conversion Rule and does not permit
Nulls at all. (Note: there is nothing to prevent substitution of
another grammatical parsing algorithm, the code will employ an
GrammaticalParsing Interface and so on and so forth...) The main
benefit of this pre-computation approach is simplicity, especially in
the intended Bottom Up algorithm implementation (a parse expert may be
inclined to hammer me for this crude approach...alternative code is
welcome!)
ALGORITHM NOTES:
The discussion above is perhaps more complicated than the necessary
code! The "trick" is that while adding "base" Grammar Rules -- the
original, un-modified Grammar Rules obtained from the Syntax Axioms --
each new Grammar Rule that is added is also written to a Priority Queue
which is ordered by the Grammar Rule's "maxSeqNbr" followed by its
GrammarRule.ruleNbr (together these two numbers provide a "total order"
on the set of Grammar Rules.) Then, immediately after storing the base
Grammar Rule, the Priority Queue is read and the "derived" Grammar
Rules, if any, are generated.
"maxSeqNbr" is the largest mObj.seq chosen from a Grammar Rule's
baseSyntaxAxiom and the set of GrammarRules used to modify the Syntax
Axiom's hypotheses. For example, if the Grammar Rule's
baseSyntaxAxiom.seq = 100 and it has a single Type Conversion Rule
applied, which has seq = 110, then GrammarRule.maxSeqNbr = 110.
GrammarRule.maxSeqNbr can be used in parsing to guarantee that a
grammatical parsing of an Expression contains no "forward references"
to statements in the database that follow the Statement whose
Expression is being parsed. In addition, the combination of maxSeqNbr
and GrammarRule.ruleNbr provide a total order on the set of Grammar
Rules which is then used to sequence the Priority Queue so that the
earliest Grammar Rules are stored in the final grammar.
Each new Grammar Rule read back from the Priority Queue is processed
against earlier Grammar Rules to generate new derived Grammar Rules. If
the new rule is a Nulls Permitted Rule, then it is applied to all
previous Type Conversion and Notation Rules. If the new rule is a Type
Conversion Rule then it is applied to *all* previous rules, including
Type Conversions. And if the new rule is a Notation Rule then it is
applied to all previous Nulls Permitted and Type Conversion Rules.
The only real complication is how to generate all possible permutations
of Type Conversions and Nulls Permitteds. Fortunately, there is a
trick, which also involves the Priority Queue. Instead of trying to
generate all possible permutations at once, each derivation involving
each hypothesis separately is performed; the resulting Grammar Rule is
added (if not a duplicate), and written to the Priority Queue -- which
results in *more* Grammar Rules! Let's take a simple example, a Type
Conversion from B to C (b2c $a c b $.)
Now suppose that there already exists a Type Conversion from C to D and
another from D to E. When the B to C conversion comes through the
priority queue, it is matched with the C to D conversion, thus
generating a B to D conversion, which is then written to the Priority
Queue. When the new B to D rule is read back, it is matched with the D
to E conversion and a new rule, a B to E conversion is created and
written to the Priority Queue. When the B to E conversion is read back,
there are no more applicable conversions and the chain reaction comes
to an end. A similar chain reaction would occur for a Notation Rule
involving multiple hypotheses with Type Conversions and Nulls
Permitteds.