Tree Model
Last Modified: 7/19/99 11:59 AM
Contents:
1.
Model DTD and tag description.
2. Sample exported XML.
3. Scoring algorithm.
1. Model DTD and tag description.
Note: the model below assumes that dictionary tags are defined elsewhere. Variables are referred to by name.
<!ELEMENT
tree-model (node)>
<!ATTLIST tree-model
model-id CDATA #REQUIRED>
<!ELEMENT node ((%predicates;)?, info*, node*, score-distribution*)>
<!ATTLIST node
score CDATA #IMPLIED>
<!ELEMENT score-distribution EMPTY>
<!ATTLIST score-distribution
label CDATA #REQUIRED
value CDATA #REQUIRED>
<!ELEMENT info EMPTY>
<!ATTLIST info
name CDATA #REQUIRED
value CDATA #REQUIRED>
<!ELEMENT compound-predicate (%predicates;, (%predicates;)+)>
<!ATTLIST compound-predicate
bool-op (or | and | xor | cascade) #REQUIRED>
<!ELEMENT predicate EMPTY>
<!ATTLIST predicate
attribute CDATA #REQUIRED
op (eq | ne | lt | le | gt | ge) #REQUIRED
value CDATA #REQUIRED>
<!ELEMENT true EMPTY>
<!ELEMENT false EMPTY>
tree-model
- marks the beginning of a tree model.
node - encapsulization of all information concerning a split. Every node contains a predicate specifying when it should be chosen instead of one of its siblings.
info - a name-value pair associated with a node. Vendors may include named statistics about a node in a list of info elements.
score-distribution - a name-value pair that represents part of the score that a node predicts. When a score is scalar value, it is stored in the score attribute of node; when it is a vector, it is store as a list of score-distribution.
predicate - a simple boolean expression consisting of a field name, a binary comparison operator, and a value to which to compare the field value. Predicates are used to indicate the condition under which a child node should be chosen
compound-predicate - a container for combining simple predicates using logical and, or, or xor. Also, provides a framework for expressing CART's surrogates using the cascade operator.
true - a predicate that has the constant value of true.
false - a predicate that has the constant value of false.
2. Sample exported XML.
3. Scoring algorithm.
We will use the above example to illustrate the steps that should be followed in the scoring process. Say the following case (observation) must be scored:
indep29=25.1 indep16=0.2 indep02=0.3 indep28=0.4 indep12=0.5 indep27=0.6
Do model file parsing.
Go to the root node and select it because its predicate is the constant true.
Look at each of the child nodes of the root node, and observe which one in top to bottom order is the first to evaluate to true. In this case, since (25.1 <= 43.375) is true, the first child node (which we call A) is selected.
Look at each of the child nodes of A. Observe which one in top to bottom order is the first to evaluate to true. In this case, since (25.1 > 22.0469) is true, the second child node (which we call B) is selected.
Look at each of the child nodes of B. Observe which one in top to bottom order is the first to evaluate to true. In this case, since (0.3 != 42.0625) is true, the second child node (which we call C) is selected.
Observe that C has no children nodes. Therefore, the score contained in C is the score for this case, namely, 3.89063.