Run Config
RunConfig is the top-level configuration object passed into evaluate(). It tells the orchestrator which evaluator to run and how fields should be compared.
RunConfig fields
| Field | Type | Default | Required | Notes |
|---|---|---|---|---|
| task_type | TaskType | required | yes | Chooses the evaluator |
| feature_rules | list[FeatureRule] | required | yes | The fields to compare |
| index_key_name | str \| None | None | indexed tasks only | Required for SINGLE_FEATURE and SINGLE_ENTITY |
| grouping_key_names | list[str] \| None | None | no | Present in the model, not currently used in the runtime path |
| log_directory_path | str | "./logs" | no | Used by RunLogger |
| matching_config | MatchingConfig \| None | None | multi-entity only | Defaults are applied if omitted |
| classification_config | ClassificationConfig \| None | None | no | Optional label-reporting behavior |
MatchingConfig
MatchingConfig is used by MULTI_ENTITY tasks.
| Field | Type | Default | Notes |
|---|---|---|---|
| matching_mode | str | "weighted" | Current implementation expects "weighted" or "exact" |
| minimum_similarity_threshold | float | 0.5 | Candidate pairs below this score are discarded |
| maximum_candidate_pairs | int \| None | None | If set, only the top-scoring candidate pairs are kept before greedy matching |
| random_tie_breaker_seed | int | 13 | Used to make equal-score tie-breaking deterministic |
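
To make the interaction of these fields concrete, here is a minimal sketch of threshold filtering, top-k truncation, and greedy one-to-one matching. It is not the library's implementation. The function name `greedy_match`, the `(predicted_index, gold_index, score)` tuple layout, and the seeded-shuffle tie-break are all assumptions made for illustration.

```python
import random


def greedy_match(candidate_pairs, minimum_similarity_threshold=0.5,
                 maximum_candidate_pairs=None, random_tie_breaker_seed=13):
    """Hypothetical sketch: threshold, then top-k, then greedy one-to-one matching."""
    # 1. Discard candidate pairs that score below the threshold.
    kept = [p for p in candidate_pairs if p[2] >= minimum_similarity_threshold]

    # 2. Seeded shuffle followed by a stable sort on descending score:
    #    equal-score pairs land in an order fixed by the seed (assumed tie-break).
    random.Random(random_tie_breaker_seed).shuffle(kept)
    kept.sort(key=lambda p: p[2], reverse=True)

    # 3. Optionally keep only the top-scoring pairs.
    if maximum_candidate_pairs is not None:
        kept = kept[:maximum_candidate_pairs]

    # 4. Greedy pass: accept a pair only if both sides are still unmatched.
    used_predicted, used_gold, matches = set(), set(), []
    for predicted_index, gold_index, score in kept:
        if predicted_index in used_predicted or gold_index in used_gold:
            continue
        used_predicted.add(predicted_index)
        used_gold.add(gold_index)
        matches.append((predicted_index, gold_index, score))
    return matches


pairs = [(0, 0, 0.9), (0, 1, 0.7), (1, 1, 0.6), (1, 0, 0.4)]
matches = greedy_match(pairs)
```

With the default threshold of 0.5, the pair scoring 0.4 is dropped before matching, and the pair `(0, 1, 0.7)` is skipped because predicted row 0 is already taken by the higher-scoring pair.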
Validation note
Unlike FeatureRule.feature_type, MatchingConfig fields are not currently validated beyond normal Pydantic type conversion. Invalid matching_mode values do not raise at model creation time; they fall through to the weighted-similarity path.
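
Because an invalid matching_mode falls through silently, callers may want to check the value themselves before building the config. The guard below is a minimal sketch: `checked_matching_mode` is a hypothetical helper, not part of the library, and the allowed set is taken from the two values the MatchingConfig table lists.

```python
# The two values the current implementation is documented to expect.
ALLOWED_MATCHING_MODES = {"weighted", "exact"}


def checked_matching_mode(matching_mode: str) -> str:
    """Hypothetical guard: reject values that would silently fall through."""
    if matching_mode not in ALLOWED_MATCHING_MODES:
        raise ValueError(
            f"matching_mode must be one of {sorted(ALLOWED_MATCHING_MODES)}, "
            f"got {matching_mode!r}"
        )
    return matching_mode
```

The returned value can then be passed on unchanged, for example as `MatchingConfig(matching_mode=checked_matching_mode("exact"))`.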
ClassificationConfig
ClassificationConfig affects how feature-level label metrics are reported.
| Field | Type | Default | Notes |
|---|---|---|---|
| positive_label | str \| None | None | If set and observed, the metric function returns one-vs-rest metrics for that label instead of macro metrics |
| average_strategy | str | "macro" | Present in the model but not currently consumed by the metric computation path |
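
The switch between macro and one-vs-rest reporting can be sketched as follows. This is an illustration under stated assumptions, not the library's metric function: `label_metrics` is a hypothetical name, "observed" is taken to mean that the label appears among the true or predicted labels, undefined ratios are reported as 0.0, and specificity is left out for brevity.

```python
def label_metrics(true_labels, predicted_labels, positive_label=None):
    """Hypothetical sketch: one-vs-rest for an observed positive_label, else macro."""
    pairs = list(zip(true_labels, predicted_labels))

    def one_vs_rest(label):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when undefined (assumption)
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        return {"precision": precision, "recall": recall, "f1": f1}

    observed = sorted(set(true_labels) | set(predicted_labels))
    if positive_label is not None and positive_label in observed:
        return one_vs_rest(positive_label)

    # Macro: unweighted mean of the per-label one-vs-rest scores.
    per_label = [one_vs_rest(label) for label in observed]
    if not per_label:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    return {
        key: sum(m[key] for m in per_label) / len(per_label)
        for key in ("precision", "recall", "f1")
    }


true_labels = ["yes", "yes", "no", "no"]
predicted_labels = ["yes", "no", "no", "no"]
macro = label_metrics(true_labels, predicted_labels)
yes_only = label_metrics(true_labels, predicted_labels, positive_label="yes")
```

In this sketch, a positive_label that never appears in the data falls back to macro metrics, which mirrors the "if set and observed" condition in the table.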
Task-specific requirements
SINGLE_FEATURE
Use:
- task_type=TaskType.SINGLE_FEATURE
- exactly one FeatureRule
- index_key_name set
Current runtime failures:
- missing index_key_name raises ValueError
- more than one feature rule raises ValueError
SINGLE_ENTITY
Use:
- task_type=TaskType.SINGLE_ENTITY
- one or more FeatureRule objects
- index_key_name set
Current runtime failure:
- missing index_key_name raises ValueError
MULTI_ENTITY
Use:
- task_type=TaskType.MULTI_ENTITY
- one or more FeatureRule objects
- optional matching_config
index_key_name is not required because rows are matched by similarity rather than by an explicit join key.
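
Since these failures surface only at runtime, a pre-flight check can catch them earlier. The sketch below mirrors the documented rules. It is not part of the library: `check_task_requirements` is a hypothetical helper, the `TaskType` enum is a stand-in so the block runs on its own (its member values are assumptions), and rejecting zero feature rules for SINGLE_FEATURE follows the "exactly one" requirement rather than a documented runtime failure.

```python
from enum import Enum
from typing import Optional


class TaskType(Enum):
    """Stand-in for the library's TaskType; the member values are assumed."""
    SINGLE_FEATURE = "single_feature"
    SINGLE_ENTITY = "single_entity"
    MULTI_ENTITY = "multi_entity"


def check_task_requirements(task_type: TaskType, feature_rule_count: int,
                            index_key_name: Optional[str]) -> None:
    """Hypothetical pre-flight check mirroring the documented requirements."""
    indexed = task_type in (TaskType.SINGLE_FEATURE, TaskType.SINGLE_ENTITY)
    # Indexed tasks join rows on an explicit key, so the key must be set.
    if indexed and not index_key_name:
        raise ValueError(f"{task_type.name} requires index_key_name")
    # SINGLE_FEATURE compares exactly one field.
    if task_type is TaskType.SINGLE_FEATURE and feature_rule_count != 1:
        raise ValueError("SINGLE_FEATURE expects exactly one feature rule")
```

A caller would pass `len(feature_rules)` as the count and invoke the check just before constructing the RunConfig.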
Configuration interactions
- task_type decides which evaluator class the orchestrator instantiates.
- feature_rules affect both comparison semantics and, for multi-entity tasks, similarity scoring.
- matching_config only matters for MULTI_ENTITY.
- classification_config.positive_label changes per-feature precision/recall/F1/specificity output from macro to one-vs-rest for that label when the label is present.
- log_directory_path does nothing by itself; it is only used if you instantiate RunLogger.
Common invalid or misleading configurations
- Setting matching_config for an indexed task. It is harmless but unused.
- Omitting index_key_name for SINGLE_FEATURE or SINGLE_ENTITY. The evaluator will fail at runtime.
- Passing several feature rules to SINGLE_FEATURE. The evaluator expects exactly one feature.
- Assuming grouping_key_names changes result aggregation. It is currently not used in the execution path.
- Assuming average_strategy switches between macro and micro reporting. The current metric path does not consume that field.
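
The harmless-but-misleading cases in this list raise no error, so they are easy to miss. One way to surface them is a small lint that returns warnings instead of failing. The helper below is a hypothetical sketch, not part of the library: its name, its plain-argument signature, and the warning wording are all assumptions.

```python
def unused_setting_warnings(task_type_name, matching_config_set=False,
                            grouping_key_names=None, average_strategy="macro"):
    """Hypothetical lint: list settings that are accepted but currently have no effect."""
    warnings = []
    # matching_config only matters for MULTI_ENTITY tasks.
    if matching_config_set and task_type_name != "MULTI_ENTITY":
        warnings.append("matching_config is unused for indexed tasks")
    # grouping_key_names is present in the model but not read at runtime.
    if grouping_key_names:
        warnings.append("grouping_key_names is not used in the execution path")
    # A non-default average_strategy suggests the caller expects it to apply.
    if average_strategy != "macro":
        warnings.append("average_strategy is not consumed by the metric path")
    return warnings
```

Returning a list rather than raising keeps the lint advisory, which fits settings that are valid today and may start to matter if the runtime path changes.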
Example
from extraction_testing import FeatureRule, MatchingConfig, RunConfig, TaskType

run_config = RunConfig(
    task_type=TaskType.MULTI_ENTITY,
    feature_rules=[
        FeatureRule(feature_name="contract_title", feature_type="text", weight_for_matching=2.0),
        FeatureRule(feature_name="contract_amount", feature_type="number", numeric_absolute_tolerance=100.0),
    ],
    matching_config=MatchingConfig(
        matching_mode="weighted",
        minimum_similarity_threshold=0.6,
    ),
    log_directory_path="./logs",
)