GrammarRuleTreeNotes.html

Red-Black Tree sample java code, public domain:
douglea\RBTreeDocIndex.html
douglea\RBCell.java
douglea\RBTree.java

An examination of RBCell.java reveals that it can be customized and specialized to replace C:\astevel3\java\src\mmj\lang\SynNode.java.

Red-Black trees yield well-balanced binary trees at minimal insert cost. In the initial coding effort for MMJ, SynNode.java was used in a similar scenario, except that a Cnst did not point to the root of its own syntax sub-tree -- instead, a SynTree.java was used, meaning that a lookup was needed just to find the node for the initial Constant of an expression. Not only can that initial lookup be eliminated, but the SynTree insert process involved a bulk add of pre-sorted objects designed to achieve a balanced tree -- which also precluded dynamic adds later! So, with Red-Black binary trees we can avoid sorting the Grammar Rules and we can load them incrementally, if desired.
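A minimal sketch of the intended shape, with hypothetical names (CnstSketch, addRule, etc. are stand-ins, not the actual mmj classes). java.util.TreeMap is used here only because it happens to be a red-black tree under the covers; per the notes below, the real code would use a trimmed-down RBCell-style node instead. The point illustrated is just that each constant owns the root of its own rule tree and that rules can be inserted one at a time, with rebalancing handled on each insert rather than by a sorted bulk load:

import java.util.TreeMap;

/**
 * Sketch only: each constant symbol owns the root of its own
 * grammar-rule tree, so no separate lookup is needed to find
 * "the tree for this Cnst", and rules may be inserted one at a
 * time -- red-black rebalancing keeps the tree shallow without
 * a sorted bulk load.
 */
class CnstSketch {
    private final String symbol;

    // Keyed by the rest of the rule's expression after the leading Cnst
    // (symbols joined by spaces, for simplicity); the value is whatever
    // payload a Grammar Rule carries (a Stmt, say).
    private final TreeMap<String, Object> ruleTree = new TreeMap<>();

    CnstSketch(String symbol) {
        this.symbol = symbol;
    }

    String getSymbol() {
        return symbol;
    }

    /** Incremental add -- no pre-sorting or bulk load required. */
    void addRule(String exprTail, Object payload) {
        ruleTree.put(exprTail, payload);
    }

    Object findRule(String exprTail) {
        return ruleTree.get(exprTail);
    }
}

With this shape, loading Grammar Rules in whatever order they are encountered is just repeated calls to addRule -- no up-front sort, and the tree stays balanced as it grows.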

Another important change will be made involving the Grammar Rule trees: no duplicates will be allowed! The original coding effort allowed duplicate nodes pointing to Stmt objects with duplicate Expressions. In the new scheme of things, duplicate Grammar Rules will be purposely and skillfully dropped -- with our new theoretical understanding of Unique Readability and Ambiguity, the elimination of duplicates is actually a key objective!
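A small sketch of the intended drop-on-duplicate insert policy, again with hypothetical names and TreeMap standing in for the eventual RBCell-based tree. The idea is simply that an insert which collides with an existing rule having an identical Expression is not added; the existing entry is handed back so the caller can report or verify the duplicate:

import java.util.TreeMap;

/** Sketch of the no-duplicates insert policy (hypothetical names). */
class GrammarRuleTreeSketch {
    private final TreeMap<String, Object> rulesByExpr = new TreeMap<>();

    /**
     * Insert a rule keyed by its expression text.  If a rule with an
     * identical expression is already present, the new rule is dropped
     * and the existing payload is returned so the caller can log it.
     */
    Object addRuleDroppingDuplicates(String expr, Object payload) {
        Object existing = rulesByExpr.putIfAbsent(expr, payload);
        return existing;   // null => added; non-null => duplicate dropped
    }
}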

As mentioned, RBCell.java will be "stolen" from the public domain. All of the unnecessary functions and code will be stripped out, including code involved with deletions (not allowed at this time). The RBCell code was developed by Doug Lea as part of a pre-Sun Java Collections framework, and it is very sophisticated. In principle, the Sun Collections framework could be used instead, but rather than building hierarchies of trees using only RBCell-type nodes, each level of trees would be an explicitly separate Collection, with its own root, and so on; and a reference to a "payload" element would be stored in each node, meaning at least one extra layer of abstraction/inefficiency at every level, not counting the use of Comparators. With this "off the shelf" approach, the backing representation wouldn't necessarily have to be a tree -- a hashmap might work just as well. There are benefits to the off-the-shelf approach, especially in an online update environment with concurrent processing underway.

But for now, because I am stubborn and a jerk, I will go for the custom approach, striving for speed. (The idea of just putting all Expression sequences into one big flat Collection would be feasible, except that we need to be able to search for overlaps in initial segments of sequences, and a tree structure is ideal for that purpose -- i.e. we could have "A E G H I", "A E G" and "A E" as (dubious) Rules, and searching the "A" tree followed by a search of the "E" subtree seems simpler and faster.)
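A runnable sketch of that initial-segment search, using the "A E", "A E G", "A E G H I" example above. RuleForestSketch and its methods are hypothetical names, and nested TreeMaps stand in for the RBCell-style node at each level; the structure is what matters -- descend the "A" tree, then the "E" subtree, collecting every rule whose expression is a prefix of the input:

import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Sketch of the nested-tree search for overlapping initial segments.
 * Each level is keyed by one symbol; a non-null "rule" marks that a
 * Grammar Rule's expression ends at this point.
 */
class RuleForestSketch {
    private final TreeMap<String, RuleForestSketch> children = new TreeMap<>();
    private Object rule;   // non-null if a rule's expression ends here

    void addRule(List<String> expr, Object payload) {
        RuleForestSketch node = this;
        for (String sym : expr) {
            node = node.children.computeIfAbsent(sym, k -> new RuleForestSketch());
        }
        node.rule = payload;   // a duplicate expression would land on the same node
    }

    /** Collect every rule whose expression is an initial segment of expr. */
    List<Object> matchInitialSegments(List<String> expr) {
        List<Object> hits = new ArrayList<>();
        RuleForestSketch node = this;
        for (String sym : expr) {
            node = node.children.get(sym);
            if (node == null) break;
            if (node.rule != null) hits.add(node.rule);
        }
        return hits;
    }

    public static void main(String[] args) {
        RuleForestSketch forest = new RuleForestSketch();
        forest.addRule(List.of("A", "E"), "rule-AE");
        forest.addRule(List.of("A", "E", "G"), "rule-AEG");
        forest.addRule(List.of("A", "E", "G", "H", "I"), "rule-AEGHI");

        // Searching "A E G X" descends the "A" tree, then the "E" subtree,
        // and so on, picking up every rule that ends along the way.
        System.out.println(forest.matchInitialSegments(
                List.of("A", "E", "G", "X")));   // prints [rule-AE, rule-AEG]
    }
}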