Step Unifier

Synopsis:
This is a very detailed description of the proposed mmj2 proof step unification algorithm for proof steps involving work variables. The algorithm is an adaptation of Robinson's unification algorithm, customized for retrofitting into mmj2. The proposed algorithm requires "occurs in" checking only when substitution values are assigned to work variables. Storage areas for source and target variables are mentioned but the implementation is not described here; the mechanism may be assumed to provide direct access lookups to source and work variables and their assigned substitution values.





Contents:


I. Introduction:
Example I:
Example II:
II. Parse Trees, Proof Trees and Assertion Representations:
III. "One Way" vs. "Two Way" Unification:
Unification Example 1:
Unification Example 2:
IV. Algorithm for "mmj2 Proof Step Unification with Work Variables":

A. Assumptions and Context:
B. Substitution Types: Target Formula Updates and Source Formula Updates:
C. Partial Unification Results, Heuristics and "Raw" Substitutions:
D. "Sub-Unifications":
E. "Occurs In" Checking:
F. Putting It All Together:




I. Introduction:

The 01-Aug-2007 release of the mmj2 software focuses on adding "Work Variables" to the mmj2 Proof Assistant. Accomplishing this requires significant changes to proof unification processing in the Proof  Assistant, now performed in ProofUnifier.java. Unfortunately, this section of the code is complex and specialized; it was designed originally to do what is essentially the inverse of this project's main feature. ProofUnifier.java's original primary function was to derive proof step Ref labels (of assertions) using input formulas and hypothesis step numbers. Later it was modified to derive formulas and hypotheses using proof step Refs, but the functionality is incomplete: in cases where the substituted-for variables in the Ref  assertion are under-specified (i.e. not every variable has a substitution), the code now outputs "dummy variables" which are not treated as "real" variables by mmj2. In fact, when mmj2's proof assistant reads in a dummy variable that it itself had previously output, it generates an "invalid token" or undefined symbol message! This is obviously not what should happen, and in fact, metamath.exe does not treat its users so rudely.

It would be possible to modify ProofUnifier.java to accomodate Work Variables and still perform all existing mmj2 Proof Assistant functions. Instead, a separate module, StepUnifier.java, will be created to perform unification of a single derivation proof step when either a) Work Variables are involved in the step's formula or hypotheses, or b) the Derive Formula or Derive Hypotheses features are to be invoked for the step.

Here are two examples to illustrate "proof unification" to make this discussion concrete.

Example I. When the Work Variables enhancements are completed in the next release, mmj2 will be able to reconstruct every proof using only the logical assertion labels in Metamath proofs -- plus the theorem conclusion and hypothesis formulas. That is one of the criteria for success of the project, which will be verified using this mmj2 RunParm command:

ProofAsstBatchTest,*,,Unified,NotRandomized,NoPrint,DeriveFormulas,NoCompareDJs,NoUpdateDJs

What that command will do is export each input .mm file theorem to a Proof Worksheet stored in memory, and then process it using the mmj2 Proof Assistant to complete the proof and generate a Metamath RPN format proof. Here is how a Proof Worksheet exported with the "DeriveFormulas" and "Unified" RunParms options would look prior to unification with the Proof Assistant (see also, RunParm "ProofAsstExportToFile"):


$( <MM> <PROOF_ASST> THEOREM=syl  LOC_AFTER=?

h1::syl.1          |- ( ph -> ps )
h2::syl.2          |- ( ps -> ch )
3:2:a1i
4:3:a2i
qed:1,4:ax-mp      |- ( ph -> ch )

$)


And here is the result after unification:


$( <MM> <PROOF_ASST> THEOREM=syl  LOC_AFTER=?

h1::syl.1          |- ( ph -> ps )
h2::syl.2          |- ( ps -> ch )
3:2:a1i            |- ( ph -> ( ps -> ch ) )
4:3:a2i            |- ( ( ph -> ps ) -> ( ph -> ch ) )
qed:1,4:ax-mp      |- ( ph -> ch )

$=  wph wps wi wph wch wi syl.1 wph wps wch wps wch wi wph syl.2 a1i
    a2i ax-mp $.
$)


Remarkably, the new release of mmj2 will show that, in theory Metamath proofs could be abbreviated by eliminating syntax labels -- saving space but requiring more computation. Theorem syl's proof could be stored as follows and then expanded using mmj2's new proof unification logic:

$=  syl.1 syl.2 a1i a2i ax-mp $.

Instead of like this (uncompressed Metamath RPN-format proof:

$=  wph wps wi wph wch wi syl.1 wph wps wch wps wch wi wph syl.2 a1i a2i ax-mp $.


Example II. Here is another example showing how the user will be able to input just Ref labels and create a proof step by step in reverse order:

1) Alt-F + New (File/New) and enter "syl":

$( <MM> <PROOF_ASST> THEOREM=syl  LOC_AFTER=?
h1::syl.1          |- ( ph -> ps )
h2::syl.2          |- ( ps -> ch )
3:?:               |- ?
qed:?:             |- ( ph -> ch )
$)

2) Delete step 3, enter qed step Ref label, "ax-mp", and press Ctrl-U (Unify):

$( <MM> <PROOF_ASST> THEOREM=syl  LOC_AFTER=?
h1::syl.1          |- ( ph -> ps )
h2::syl.2          |- ( ps -> ch )
1002:?:            |- &w1
2002:?:            |- ( &w1 -> ( ph -> ch ) )
qed:1002,2002:ax-mp |- ( ph -> ch )
$)

3) enter step 2002 Ref label, "a2i", and press Ctrl-U (Unify):
(Notice that step 1002's &w1 Work Variable is automatically assigned "( ph -> ps )", and then determined to be equal to step h1 and automatically removed because it has an incomplete hyp ("?") -- and that step qed's reference to 1002 is automatically changed to 1!)


$( <MM> <PROOF_ASST> THEOREM=syl  LOC_AFTER=?
h1::syl.1          |- ( ph -> ps )
h2::syl.2          |- ( ps -> ch )
3002:?:            |- ( ph -> ( ps -> ch ) )
2002:3002:a2i      |- ( ( ph -> ps ) -> ( ph -> ch ) )
qed:1,2002:ax-mp    |- ( ph -> ch )
$)

4) enter step 3002 Ref label, "a1i", and press Ctrl-U (Unify) -- and voila!:

$( <MM> <PROOF_ASST> THEOREM=syl  LOC_AFTER=?
h1::syl.1          |- ( ph -> ps )
h2::syl.2          |- ( ps -> ch )
3002:2:a1i         |- ( ph -> ( ps -> ch ) )
2002:3002:a2i      |- ( ( ph -> ps ) -> ( ph -> ch ) )
qed:1,2002:ax-mp    |- ( ph -> ch )
$=  wph wps wi wph wch wi syl.1 wph wps wch wps wch wi wph syl.2 a1i
    a2i ax-mp $.
$)





II. Parse Trees, Proof Trees and Assertion Representations:


mmj2's Proof Assistant and unification algorithms rely upon the existence of  an "unambiguous grammar" for the input Metamath .mm file. What is meant by an "unambiguous grammar" is that for each valid formula there exists exactly one (syntactic) parse tree. With unique parse trees it is possible to convert a formula to a parse tree and back again to the original formula, without loss of information.

For example, here is the parse tree for the formula "A" equal to "( ( x -> y ) -> ( x -> z ) )" where "->" is shown using set.mm's statement label ("wi") for implication ("->"):
                wi
. .
wi wi
. . . .
x y x z
Note that a "proof tree" has the same logical structure as a parse tree. The children of a Proof Tree root node correspond to the Mandatory Hypotheses of the final proof step's referenced assertion, not just the mandatory variable hypotheses -- this means that if "ax-mp" is the root node of a proof tree then it will have 4 children, corresponding in order to its mandatory hypotheses, "ph, "ps", "min" and "maj".  Furthermore, each sub-tree in a valid proof tree is also a valid proof tree (it is a recursive structure); the syntax sub-trees in a proof tree are actually -- also -- parse (sub)trees, and correspond to the substitutions that will be made into the referenced assertion's mandatory variables.

Because set.mm's grammar is unambiguous, we know that valid simultaneous substitutions to variables in A will result in new formulas that are also syntactically valid. A's parse tree makes it clear why that is true. For example, if  every "x" sub-tree in A's parse tree is substituted with the sub-tree for "( z -> y )", a new valid parse tree results, representing the formula

    A' = "( ( ( z -> y ) -> y ) -> ( ( z -> y ) -> z ) )":
                wi
. .
wi wi
. . . .
wi y wi z
. . . .
z y z y


It is possible to represent formula A using functional notation borrowed from mathematics:

A("x","y","z") = ( ( x -> y ) -> ( x -> z ) )

Then, the substitution of "( z -> y )" for x can be represented like this:

A("( z -> y )","y","z") = ( ( ( z -> y ) -> y ) -> ( ( z -> y ) -> z ) )

The new Work Variables enhancement takes into account omitted functional parameters of A, as follows:

A(,,) = ( ( &w1 -> &w2 ) -> ( &w1 -> &w3 ) )

where "&w1", "&w2" and "&w3" are Work Variables standing in for the "mandatory variables" of A (Metamath uses "mandatory variables" to refer to variables used in an assertion or its logical hypotheses, thus distinguishing them from "optional" or "dummy" variables that may be referenced in a proof of that assertion.)

This same functional notation can be used to refer to a set of formulas comprised of an assertion and its logical hypotheses. For example, the Modus Ponens Axiom, "ax-mp" in set.mm consists of two logical hypotheses and an assertion:

  ${
    min $e |- ph $.
    maj $e |- ( ph -> ps ) $.
    ax-mp $a |- ps $.
  $}

Using functional notation we could write ax-mp(ch,th) = the set of three formulas:

    "ch"
    "( ch -> th )"
    "th"

And, with omitted variables in the ax-mp function call, ax-mp(,) =

    {"&w1",
     "( &w1 -> &w2 )",
     "&w2"}

Finally...

The idea of mmj2's original proof step "unification" was to reverse the functional notation used above and pass formulas as function arguments to generate the variable substitutions!

So, for example instead of writing

A("( z -> y )","y","z") = ( ( ( z -> y ) -> y ) -> ( ( z -> y ) -> z ) )

we write:

A("( ( ( z -> y ) -> y ) -> ( ( z -> y ) -> z ) )") = ("( z -> y )","y","z")

The right hand side of the equation contains the variables that must be substituted into A to generate the formula passed as an argument!

In the case of an assertion which has logical hypotheses, such as ax-mp, there must be multiple formulas passed as function arguments. For example, the final proof step of theorem syl in set.mm could be represented like this:

ax-mp("( ph -> ps )", "( ( ph -> ps ) -> ( ph -> ch ) )", "( ph -> ch )") =
     ("( ph -> ps )", "( ph -> ch )")

In point of fact, the problem of unification, and searching for an assertion to justify a proof step is like solving a set of equations to eliminate unknowns. Let variable R stand for a Ref label, F for a set of hypothesis formulas plus a proof step formula, and S stand for a set of variable substitutions into R's mandatory variables, then we can write for a given proof step:

R(S) = F

where, given one or more unknowns we can attempt to solve for the others. In the case where incomplete information is provided, and R's variable substitutions are "under-specified", we will employ Work Variables as placeholders.




III. "One Way" vs. "Two Way" Unification:


In mmj2's Proof Assistant "unification" not involving formulas containing work variable may be termed "one-way unification". It is simply a mechanism for a) determining that the proof step is an instance of the logical assertion (and thus is true/valid), and b) computing the substitutions that must be made to the referenced logical assertion and its logical hypotheses (if any), which, when applied, will transform the referenced assertion and make it identical to the proof step and its logical hypotheses.

In the case where work variables are involved, a (so called) "two-way unification" process is required which not only computes substitutions for a referenced logical assertion, but which may generate substitutions to the work variables in a proof step formula and/or its hypotheses. In a finished proof step the unknowns in R(S) = F are completely specified and all work variables are replaced in the proof by non-work variables.  In incomplete proof steps still containing unknowns in R(S) = F the result of unification may be to modify the original work variables -- so, for example, "&w1" might be updated and redisplayed by the program after unification as "( &w2 -> &w3 )".

The term "unification" is somewhat overused and imprecise, like the term "parse". Most often discussion of "unification" concerns computation of a "MGU", the Most General Unifier for a set of formulas. Given, say, two formulas, the MGU computation produces a set of substitutions that when applied to the two input formulas makes each of them equal to a third formula. That is not what mmj2 is doing, but the basic functions of the algorithm are the same and mmj2 is adapting the basics of MGU unification.

Here is an excellent introductory explanation of Unification by Michael Beeson: http://michaelbeeson.com/research/otter-lambda/index.php?include=Unification

Note these basic differences between what Mr. Beeson is discussing and what mmj2 is doing in its unification processing:

Unification Example 1: "S" is a proof step formula, we'll call it the "source formula", and its variables are the "source variables". "T" is the formula of an assertion, an axiom or theorem used to justify S; we'll call it the "target formula" and its variables the "target variables" (we are primarily interested in making substitutions to T in order to justify S...)

               
Let T = "( ( x -> y ) -> ( x -> z ) )" and
S = "( WFF1 -> WFF2 )".

S unifies with T to yield:

S" = "( ( WFF3 -> WFF4 ) -> (WFF3 -> WFF5 ) )"


Then T's syntax tree: T's assrtSubst
=============== |-------|---y---|---z---|
| x | y | z |
-> |-------|-------|-------|
. . | WFF3 | WFF4 | WFF5 |
-> -> ------------------------
. . . .
x y x z


S"'s syntax tree: S"'s Work Variable Subst:
================= |---------|---------|------|------|------|
| WFF1 | WFF2 | WFF3 | WFF4 | WFF5 |
-> |---------|---------|------|------|------|
. . | -> | -> | x | y | z |
WFF1 WFF2 | x y | x z | | | |
-----------------------------------------

Updated S" syntax tree:
=======================

->
. .
-> ->
. . . .
WFF3 WFF4 WFF3 WFF5



Unification Example 2:


Let T = "( ( x -> y ) -> ( x -> z ) )" and
S = "( ( F -> G ) -> WFF1 ) )".

S unifies with T to yield:

S" = "( ( F -> G ) -> ( F -> WFF2 ) )"


Then T's syntax tree: T's assrtSubst
=============== |-------|---y---|---z---|
| x | y | z |
-> |-------|-------|-------|
. . | F | G | WFF2 |
-> -> ------------------------
. . . .
x y x z


S"'s syntax tree: S"'s Work Variable Subst:
================= |---------|---------|
| WFF1 | WFF2 |
-> |---------|---------|
-> . | -> | z |
. . WFF1 | x z | |
F G ---------------------

Updated S" syntax tree:
=======================

->
. .
-> ->
. . . .
F G F WFF2




IV. Algorithm for "mmj2 Proof Step Unification with Work Variables":


A. Assumptions and Context:

1. In mmj2's ProofUnifier.java code numerous heuristics are used to "fast fail" a candidate assertion "T" as being unifiable with proof step "S". For example, if T's parse tree's height is greater than S's, then S cannot be unified with T. That heuristic is invalid if S contains work variables because each work variable, &w1, etc., represents an unknown sub-expression, which, if validly substituted with a given sub-expression might result in S's parse tree height becoming greater than or equal to T's.

Removing these heuristics, or customizing them to accomodate work variables would probably reduce the efficiency of the unification search process in ProofUnifier.java -- and speed is important because the unification search process scans the entire input Metamath .mm file looking for unifiable assertions for proof steps that contain a blank Ref label. Therefore, StepUnifier.java's use will be restricted to cases where the proof step's Ref label is provided (except for "Hint" acquisition processing.). Thus, the speed of the unification algorithm in StepUnifier.java is less critical than in ProofUnifier.java -- it only needs to be performed once for each proof step when the user presses Ctrl-U, not 10,000 times!

2. The basic "unit of work" for unifications involving work variables is a single proof step. This has several ramifications:

a) If unification fails, the proof step -- and related parts of the Proof Worksheet -- are left untouched.

b) Prior to unification of a proof step involving work variables, it shall be the case that every work variable used in the proof steps has a null value (because it will be programmed that way.) By that, it is meant that if during processing a work variable is rewritten with an assigned value, then every instance of that work variable in every proof step is then rewritten to reflect the new value -- and thus, the work variable is either resolved with a non-work variable assignment, or it is renamed to be a different work variable, or it is decomposed into a sub-expression involving other variables (i.e. "&w1" is decomposed and replaced everywhere by "( ph -> &w2 )"). What b) really means is that when proof step unification begins, the value is each work variable is really "unknown" -- not yet determined. Also, in the future, if a mechanism is developed allowing the user to assign a sub-expression to a work variable then b) assumes that that assignment is complete and is reflected in the formulas and parse trees in the Proof Worksheet before unification begins.

c) A given work variable may occur in more than one proof step (work variables can be manually entered by the users, and can be generated by the Derive Hypothesis and Derive Formula features.) Because of this fact, unification of proof step M may result in modification of proof step N -- in fact, a work variable used in step N may be resolved and assigned with a non-work variable as the result of step M's unification.


B. Substitution Types: Target Formula Updates and Source Formula Updates:

1. Target Formula Updates are substitutions resulting from a target formula's VarHyp parse tree node being matched to any source formula parse tree node having the same Type Code
Examples:
1a) Source Variable into Target Variable Substitutions:

T (target) S (source) Target Formula Updates
=============== =============== ======================
-> ->
. . . .
x y <======== F &W1 [x:F][y:&W1]

1b) Source Sub-expression into Target Variable Substitutions:

T (target) S (source) Target Formula Updates
=============== =============== ======================
x <============ -> [x:F->&W1]
 . .
F &W1

2. Source Formula Updates are substitutions resulting from a target formula non-VarHyp (assertion) parse tree node being matched to a Work VarHyp parse node in the source formula.
Example: (notice that target variables are shown in lower case and source variables are upper case, but in real life a variable named "ph" could occur in both the source and target formulas and yet they would be different variables!)

2a) Target Sub-expression into Source Work Variable Substitutions:

T (target) S (source) Source Formula Updates
============= =============== ======================
-> =============> &W1 [&W1:x->y] (raw/partial substitutions...
. . [x:&W2] (and renames/source updates!)
x y [y:&W3]
[&W1:&W2->&W3]
[&W2:]
[&W3:]




C. Partial Unification Results, Heuristics and "Raw" Substitutions:


Unification of a proof step and its hypotheses with a Ref assertion and
its hypotheses may require N! formula unifications involving the proof
step's hypotheses, where N is the number of hypotheses in the Ref'd
assertion -- that is the worst case scenario. In the best case,
the user's input is correct and only one pass through the proof step's
hypotheses is necessary. If the initial attempt fails however, then to
reduce the number of hypothesis unification attempts (pairing each
proof step hypothesis with a candidate Ref assertion hypothesis) mmj2
uses heuristics to guide the trial-and-error search process. 

The primary proof step unification heuristic is to unify the proof step's formula first, which establishes the majority of source/target variable substitution mappings.

Secondly, after the proof step formula is successfully unified, and a unification error is encountered with the user's hypothesis input, mmj2 pre-sorts the proof step and Ref assertion hypotheses as follows: 1) longest formulas are unified first; and 2) if two formulas have the same length, then if one has variables in common with the conclusion formula, unify it first. This heuristic is based on the observation that longer formulas are most likely to contain variables in common with the proof step/assertion formula; unifying the longest formulas first reduces the number of incorrect source/target variable substitution mapping attempts. For example, it is very common for an assertion to contain multiple hypotheses of this nature: "A e. V", "B e. V", etc. These hypotheses should be unified last, once the correct target-to-source variable mappings of A and B are determined.

Unfortunately, some trial and error hypothesis unification attempts are still required. To minimize the number of computations needed, mmj2 uses "Partial Unification Results" when unifying proof step hypotheses. What this means is generating and saving the set of "raw" substitutions resulting from unifying a target hypothesis formula parse tree with a source hypothesis formula parse tree without regard for the consistency of these substitutions.

Example of "Raw" Substitutions (prior to accumulation/merge and possible renaming of target variables):
    T (target)          S (source)          Raw Substitutions (unmerged)
============= =============== ======================
-> -> [y:F][x:G][&W1:z->y]
. . . &W1
-> -> ->
. . . . . .
y x z y F G


Example: where there are no "raw" substitutions for a pair of parse trees. If the trees are incongruous, then no unification is possible (in this example corresponding nodes of "<->" and "->" do not match and cause unification failure):

    T (target)          S (source)          Raw Substitutions (unmerged)
============= =============== ======================
<-> -> Error! Unification Impossible!
. . . &W1
-> -> <->
. . . . . .
y x z y F G

mmj2's unification of hypotheses, thus, involves a two-step process: 1)
generate and save the "raw" substitutions generated by mapping the
source parse tree onto the target parse tree; and 2) accumulate and
merge the "raw" substitutions into the existing set of substitutions
derived from the proof step formula and any previously unified
hypotheses. 

There are three possible results to record for each possible hypothesis unification: 1) unification not yet attempted; 2) unification impossible -- parse trees incongruent; 3) unification possible, non-empty set of "raw" substitutions exists.

These results are stored in an N by N "Partial Unification Results" matrix, where N is the number of hypotheses. mmj2's trial and error process attempts to successfully unify all N hypotheses and uses the Partial Unification Results matrix to systematically work through all of the possibilities (while doing the least amount of work.)

Example of Partial Unification Results Matrixs and Two Step Hypothesis Unification Process: assume there are 3 hypotheses, that the rows of the matrix correspond to the Ref'd assertion's hypotheses and that the columns correspond to the proof step hypotheses; and that inside each cell is either "null" (not attempted), "E" (unification impossible), or R, the List of raw substitutions generated from mapping the pair of hypothesis parse trees.  In the matrix below, mmj2 has determined that the correct proof step hypothesis order is "1,3,2".

             Proof Step Hyps, sorted
               3     1      2
            ===================
 Ref      2 |  R  |     |     |             
 Hyps,      ===================                       
 sorted   3 |     |  E  |  R  |               
            ===================                       
          1 |     |  R  |     |
            ===================                       

Note: the tricky part about carrying out this trial-and-error process is that if the accumulate and merge step fails for an hypothesis, a "backout" of the partially merged results is needed. The details of the backout process are not presented in this document as they are primarily technical in nature (see mmj/src/pa/StepUnifier.java).



D. "Sub-Unifications":

Multiple assignments of substitution values to either a single target formula variable or to a single source formula work variable must be consistent. In the case where no work variables are present, two substitutions to the same variable must be equal in order for consistency to be maintained. But equality is not a requirement for consistency when work variables are part of the substitutions -- because, after all, a work variable represents an "unknown" sub-expression.

Consistency checking of multiple substitutions to a single variable boils down to "Sub-Unification" of the two substitutions.

Example: Given these two raw substitutions, [x:F->(G->&W1)] and [x:&W2->(G->&W3)], are they "consistent"?

     Initial                                   Updated
Storage Storage After
     Accum'd Raw Substitution Accum/Merge of
Substitutions for Accum/Merge Raw Substitution
     ====== ================= ===============
[x:F->(G->&W1)] [x:&W2->(G->&W3)] [x:F->(G->&W1)]
 [&W1:] [&W1:&W3]
[&W2:F]
[&W3:]
Target Source
=============== =================
  -> ->
. . . .
F -> &W2 ->
. . . .
G &W1 G &W3



E. "Occurs In" Checking: "Occurs In" checking is needed only for assignments to Work Variables. There are two basic situations: 1) assignment of a Work Variable (the sub-tree depth = 1) to another Work Variable; and 2) other assignments of sub-expressions to a Work Variable.

In the first case, say [&W1 := &W2] , the assignment simply represents a Renaming of &W1 to &W2. Multiple assignments in a chain of Renames do not create an invalid "occurs in" situation even if a given Work Variable appears more than once in the chain -- just so long as the program handles the Renames correctly and does not actually create in storage a loop of assignment. So,  {[&W1 := &W2], [&W2 := &W3], [&W3 := &W1]} is just saying that &W1, &W2 and &W3 are the same thing, and as long as not subsequent substitution assignments contradict that, there is no problem.

In the second case, "occurs in" checking reveals the error situation in which a sub-tree of depth > 1 containing &W1 is assigned to &W1. This is always an error because two trees of different depth cannot be equal.




F. Putting It All Together:

Here is a high level narrative exposition describing proof step unification with work variables.

1. Initialize, as needed, the storage areas for substitution values assigned to source variables and target (work) variables.

2. Unify the proof step formula with the Ref'd assertion formula (its conclusion).
3. Unify the proof step hypotheses with the Ref'd assertion's hypotheses in the order given using the "HypothesisUnification" process described below. If any errors are encountered, proceed to step 4, otherwise, proceed to step 9.

4. Sort the proof step hypotheses and Ref'd assertion's hypotheses as described above and create the Partial Unification Results Matrix described above (along with the necessary storage for Backout of partial results, not described in this paper.)

5. (Brief synopsis...) Use the "HypothesisUnification" process described below to unify the first pair of hypotheses according to the sorted results from step 4. This pair corresponds to element [0,0] of the Partial Unification Results Matrix. If  unification is "Impossible" (the parse trees are incongruous), try pair [0,1], then [0,2], etc. until a set of raw substitutions is obtained for the pair of hypotheses. If the end of row 0 is reached, report unification "Failure". Once a set of raw substitutions is obtained for row 0, store them in the matrix and attempt accumulation/merge into the source/target storage areas. If accumulation/merge fails, backout the updates to the source/target storage areas and try the next column in row 0. Once successful accumulation/merge is complete for row 0, proceed to the next row and perform similar processing. Note that successful accumulation/merge may be achieved all the way to row N, and then fail, requiring backout and re-processing of earlier rows. When successful accumulation/merge is achieved for row N (the final hypothesis), proceed to step 9.

9. Unification is successful! Hooray... Now, allocate and load the "assrtSubst" array, which is a parallel array to the Ref'd assertions Mandatory Hypothesis array in mmj2 -- see mmj.lang.MandFrame.hypArray -- containing one substitution parse tree for each mandatory variable. To do this, proceed through the source variable storage area and for each source variable generate a clone of the assigned substitution value (a parse node sub-tree root node, technically), resolving any references to work variables as follows: if a work variable has been not been assigned a value, insert a work VarHyp node in the cloned sub-tree, otherwise, generate a clone of the work variables assigned substitution value, recursively resolving any references to work variables that it may contain.) Return the completed assrtSubst array. Note: the remaining functional requirements such as actually doing DeriveFormula or DeriveHypotheses, or updating formulas in the Proof Worksheet with updates to their work variables, etc., is tedious to describe and not essential to this document (most of these functions are already present in mmj2 but will require some modifications.)

HypothesisUnification: