
config

This module defines the task enum and the configuration models used to control evaluation.

Key symbols

TaskType

Allowed enum values:

  • SINGLE_FEATURE
  • SINGLE_ENTITY
  • MULTI_ENTITY

FeatureRule

Constructor highlights:

  • required: feature_name, feature_type
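  • allowed feature_type values (enforced by a validator): text, number, date, category
  • side effect: none
  • error conditions: an invalid feature_type raises ValueError inside the validator; pydantic reports it to the caller as a ValidationError, which is itself a ValueError subclass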

MatchingConfig

Constructor highlights:

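  • defaults: matching_mode is "weighted" and minimum_similarity_threshold is 0.5
  • used only by MULTI_ENTITY
  • side effect: none
  • current caveat: matching_mode is a plain str with no validator, so any string is accepted at construction time even though only "exact" and "weighted" are documented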
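Because of the caveat above, a caller may want to check matching_mode before building the config. The following is a minimal sketch of such a guard, not part of the library: the helper name check_matching_mode and the constant ALLOWED_MATCHING_MODES are hypothetical, and the two allowed values are taken from the comment in the MatchingConfig source.

```python
# Hypothetical caller-side guard; not part of extraction_testing.
ALLOWED_MATCHING_MODES = frozenset({"exact", "weighted"})


def check_matching_mode(matching_mode: str) -> str:
    """Return matching_mode unchanged, or raise ValueError if it is not documented."""
    if matching_mode not in ALLOWED_MATCHING_MODES:
        allowed = ", ".join(sorted(ALLOWED_MATCHING_MODES))
        raise ValueError(
            f"matching_mode must be one of: {allowed}; got {matching_mode!r}"
        )
    return matching_mode


# A documented value passes through untouched.
mode = check_matching_mode("weighted")
```

Running the guard first turns a silent typo such as "weigthed" into an immediate error instead of an unexpected matching result later in the run.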

ClassificationConfig

Constructor highlights:

  • positive_label switches the metric function into one-vs-rest mode for that label when present
  • average_strategy (default "macro") exists in the model but is not currently consumed in the metric path
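To illustrate what one-vs-rest mode means in general, here is a small sketch that treats positive_label as the positive class and every other label as negative. It shows the general technique only and is not the library's metric function; one_vs_rest_counts and the sample labels are hypothetical.

```python
from typing import Dict, Sequence


def one_vs_rest_counts(
    expected_labels: Sequence[str],
    predicted_labels: Sequence[str],
    positive_label: str,
) -> Dict[str, int]:
    """Count binary outcomes after collapsing all non-positive labels into 'rest'."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for expected, predicted in zip(expected_labels, predicted_labels):
        expected_positive = expected == positive_label
        predicted_positive = predicted == positive_label
        if expected_positive and predicted_positive:
            counts["tp"] += 1
        elif predicted_positive:
            counts["fp"] += 1
        elif expected_positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Illustrative data only.
expected = ["cat", "dog", "bird", "cat"]
predicted = ["cat", "cat", "bird", "dog"]
counts = one_vs_rest_counts(expected, predicted, positive_label="cat")

# Precision and recall for the positive label, guarding against empty denominators.
predicted_positive_total = counts["tp"] + counts["fp"]
expected_positive_total = counts["tp"] + counts["fn"]
precision = counts["tp"] / predicted_positive_total if predicted_positive_total else 0.0
recall = counts["tp"] / expected_positive_total if expected_positive_total else 0.0
```

Note that the dog/bird distinction disappears entirely in this mode: a "bird" predicted as "dog" would count as a true negative for "cat".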

RunConfig

Constructor highlights:

  • required: task_type, feature_rules
  • index_key_name defaults to None in the model, so the model itself does not enforce it, but it is required at runtime for indexed task types
  • matching_config is relevant only for MULTI_ENTITY
  • log_directory_path controls where RunLogger writes files
  • grouping_key_names exists but is not currently used in the runtime path
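As a rough illustration of how these fields fit together, the sketch below writes a run configuration as a plain dictionary and runs a few checks that mirror the rules listed on this page. The field names come from the RunConfig source; the values ("company_name", "document_id", and so on) and the helper find_problems are hypothetical, and the import path in the final comment is inferred from the source file location rather than confirmed.

```python
from typing import Any, Dict, List

# Values documented on this page.
ALLOWED_TASK_TYPES = {"SINGLE_FEATURE", "SINGLE_ENTITY", "MULTI_ENTITY"}
ALLOWED_FEATURE_TYPES = {"text", "number", "date", "category"}

# Illustrative field values; only the keys are taken from the RunConfig model.
run_config_fields: Dict[str, Any] = {
    "task_type": "MULTI_ENTITY",
    "feature_rules": [
        {"feature_name": "company_name", "feature_type": "text"},
        {"feature_name": "founded_on", "feature_type": "date", "date_tolerance_days": 1},
    ],
    "index_key_name": "document_id",
    "matching_config": {"matching_mode": "weighted", "minimum_similarity_threshold": 0.5},
    "log_directory_path": "./logs",
}


def find_problems(fields: Dict[str, Any]) -> List[str]:
    """Hypothetical pre-check: list obvious problems before building the model."""
    problems: List[str] = []
    for key in ("task_type", "feature_rules"):
        if key not in fields:
            problems.append(f"missing required field: {key}")
    if "task_type" in fields and fields["task_type"] not in ALLOWED_TASK_TYPES:
        problems.append(f"unknown task_type: {fields['task_type']!r}")
    for rule in fields.get("feature_rules", []):
        if rule.get("feature_type") not in ALLOWED_FEATURE_TYPES:
            problems.append(f"unknown feature_type: {rule.get('feature_type')!r}")
    return problems


problems = find_problems(run_config_fields)

# With the package installed, the dictionary could presumably be passed on as:
#   from extraction_testing.config import RunConfig  # import path is an assumption
#   run_config = RunConfig(**run_config_fields)
```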

Generated API details

TaskType

Bases: str, Enum

Enumeration of supported task types.

Source code in src/extraction_testing/config.py
class TaskType(str, Enum):
    """Enumeration of supported task types."""

    SINGLE_FEATURE = "SINGLE_FEATURE"
    SINGLE_ENTITY = "SINGLE_ENTITY"
    MULTI_ENTITY = "MULTI_ENTITY"
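Since TaskType mixes in str, its members can be looked up by value and compared directly with plain strings. The short sketch below repeats the enum so that it runs on its own; parse_task_type is a hypothetical convenience helper, not something this module provides.

```python
from enum import Enum


class TaskType(str, Enum):
    """Copy of the enum shown above, repeated so the snippet is self-contained."""

    SINGLE_FEATURE = "SINGLE_FEATURE"
    SINGLE_ENTITY = "SINGLE_ENTITY"
    MULTI_ENTITY = "MULTI_ENTITY"


def parse_task_type(raw: str) -> TaskType:
    """Hypothetical helper: look up a member by value and list the options on failure."""
    try:
        return TaskType(raw)
    except ValueError:
        allowed = ", ".join(member.value for member in TaskType)
        raise ValueError(
            f"unknown task type {raw!r}; expected one of: {allowed}"
        ) from None


task = parse_task_type("MULTI_ENTITY")
is_same_member = task is TaskType.MULTI_ENTITY
# The str mixin makes a member compare equal to its plain string value.
equals_plain_string = task == "MULTI_ENTITY"
```

Lookup by value is case-sensitive, so "multi_entity" is rejected.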

FeatureRule

Bases: BaseModel

Configuration for how to compare a single feature.

Source code in src/extraction_testing/config.py
class FeatureRule(BaseModel):
    """Configuration for how to compare a single feature."""

    feature_name: str
    feature_type: str  # "text", "number", "date", "category"
    is_mandatory_for_matching: bool = True
    weight_for_matching: float = 1.0

    casefold_text: bool = True
    strip_text: bool = True
    remove_punctuation: bool = True

    alias_map: Optional[Dict[str, str]] = None

    numeric_rounding_digits: Optional[int] = None
    numeric_absolute_tolerance: Optional[float] = None
    numeric_relative_tolerance: Optional[float] = None

    date_tolerance_days: Optional[int] = None

    @field_validator("feature_type")
    def validate_feature_type(cls, value: str) -> str:
        """Validate feature_type."""
        allowed = {"text", "number", "date", "category"}
        if value not in allowed:
            raise ValueError(f"feature_type must be one of {allowed}")
        return value
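The three numeric_* fields suggest a tolerant comparison for number features. How the library actually combines them is not shown on this page, so the following is only one plausible reading: round first if numeric_rounding_digits is set, then compare with math.isclose using whichever tolerances are present. The function numbers_match is hypothetical.

```python
import math
from typing import Optional


def numbers_match(
    predicted: float,
    expected: float,
    numeric_rounding_digits: Optional[int] = None,
    numeric_absolute_tolerance: Optional[float] = None,
    numeric_relative_tolerance: Optional[float] = None,
) -> bool:
    """Hypothetical comparison; the combination of fields is an assumption."""
    if numeric_rounding_digits is not None:
        predicted = round(predicted, numeric_rounding_digits)
        expected = round(expected, numeric_rounding_digits)
    # With both tolerances at 0.0, math.isclose reduces to exact equality.
    return math.isclose(
        predicted,
        expected,
        rel_tol=numeric_relative_tolerance or 0.0,
        abs_tol=numeric_absolute_tolerance or 0.0,
    )


within_absolute = numbers_match(10.0, 10.4, numeric_absolute_tolerance=0.5)
within_relative = numbers_match(100.0, 104.0, numeric_relative_tolerance=0.05)
exact_only = numbers_match(1.0, 1.1)
after_rounding = numbers_match(3.14159, 3.14162, numeric_rounding_digits=3)
```

In this reading a value passes if it satisfies either tolerance, which is how math.isclose behaves; a stricter design could require both.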

validate_feature_type(value)

Validate feature_type.

Source code in src/extraction_testing/config.py
@field_validator("feature_type")
def validate_feature_type(cls, value: str) -> str:
    """Validate feature_type."""
    allowed = {"text", "number", "date", "category"}
    if value not in allowed:
        raise ValueError(f"feature_type must be one of {allowed}")
    return value
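The text-related fields of FeatureRule (casefold_text, strip_text, remove_punctuation, alias_map) describe a normalization step applied before comparison. The sketch below shows one way such a step could look. The function normalize_text, the order of operations, and the assumption that alias_map keys are written in already-normalized form are all illustrative choices, not the library's documented behavior.

```python
import string
from typing import Dict, Optional


def normalize_text(
    value: str,
    casefold_text: bool = True,
    strip_text: bool = True,
    remove_punctuation: bool = True,
    alias_map: Optional[Dict[str, str]] = None,
) -> str:
    """Hypothetical normalizer; step order is an assumption."""
    result = value
    if strip_text:
        result = result.strip()
    if casefold_text:
        result = result.casefold()
    if remove_punctuation:
        # Drops ASCII punctuation only; Unicode punctuation is left untouched.
        result = result.translate(str.maketrans("", "", string.punctuation))
    if alias_map:
        # Assumes alias keys are stored in their normalized form.
        result = alias_map.get(result, result)
    return result


predicted = normalize_text("  U.S.A. ", alias_map={"usa": "united states"})
expected = normalize_text("United States")
values_agree = predicted == expected
```

Applying the alias lookup last means a single alias entry covers every spelling that normalizes to the same key.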

MatchingConfig

Bases: BaseModel

Configuration for entity matching.

Source code in src/extraction_testing/config.py
class MatchingConfig(BaseModel):
    """Configuration for entity matching."""

    matching_mode: str = "weighted"  # "exact" or "weighted"
    minimum_similarity_threshold: float = 0.5
    maximum_candidate_pairs: Optional[int] = None
    random_tie_breaker_seed: int = 13
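To make the role of minimum_similarity_threshold concrete, here is a simplified sketch of weighted matching: each feature contributes its weight_for_matching when the two values agree, and a candidate pair is accepted when the weighted share of agreeing features reaches the threshold. The scoring formula, the use of >= for the cut-off, and the names weighted_similarity and is_match are assumptions for illustration; the library's actual matching logic is not shown on this page.

```python
from typing import Any, Mapping


def weighted_similarity(
    predicted: Mapping[str, Any],
    expected: Mapping[str, Any],
    weights: Mapping[str, float],
) -> float:
    """Share of total weight carried by features whose values agree (hypothetical)."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    matched_weight = sum(
        weight
        for name, weight in weights.items()
        if name in predicted and name in expected and predicted[name] == expected[name]
    )
    return matched_weight / total_weight


def is_match(score: float, minimum_similarity_threshold: float = 0.5) -> bool:
    """Accept a pair when the score reaches the threshold (inclusive, by assumption)."""
    return score >= minimum_similarity_threshold


# Illustrative entities and weights.
weights = {"name": 2.0, "city": 1.0, "year": 1.0}
predicted_entity = {"name": "acme", "city": "paris", "year": 1999}
expected_entity = {"name": "acme", "city": "lyon", "year": 1999}

score = weighted_similarity(predicted_entity, expected_entity, weights)
accepted = is_match(score)
```

Here the city mismatch costs one quarter of the total weight, so the pair still clears the default threshold of 0.5.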

ClassificationConfig

Bases: BaseModel

Configuration for classification reporting.

Source code in src/extraction_testing/config.py
class ClassificationConfig(BaseModel):
    """Configuration for classification reporting."""

    positive_label: Optional[str] = None
    average_strategy: str = "macro"  # "macro" or "micro" (exposed for future use)

RunConfig

Bases: BaseModel

Top-level run configuration.

Source code in src/extraction_testing/config.py
class RunConfig(BaseModel):
    """Top-level run configuration."""

    task_type: TaskType
    feature_rules: List[FeatureRule]
    index_key_name: Optional[str] = None
    grouping_key_names: Optional[List[str]] = None
    log_directory_path: str = "./logs"
    matching_config: Optional[MatchingConfig] = None
    classification_config: Optional[ClassificationConfig] = None