`config`¶

This module defines the task enum and the configuration models used to control evaluation.

Key symbols¶

`TaskType`¶

Allowed enum values:

SINGLE_FEATURE
SINGLE_ENTITY
MULTI_ENTITY
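
Because `TaskType` subclasses both `str` and `Enum` (see the generated source further down this page), a member can be looked up from its string value and compares equal to that string. The sketch below re-declares the enum locally so that it runs without the package installed; the variable names are illustrative only.

```python
from enum import Enum


class TaskType(str, Enum):
    """Local copy of the enum, mirroring src/extraction_testing/config.py."""

    SINGLE_FEATURE = "SINGLE_FEATURE"
    SINGLE_ENTITY = "SINGLE_ENTITY"
    MULTI_ENTITY = "MULTI_ENTITY"


# Look a member up by its string value, e.g. when reading a config file.
task_type = TaskType("MULTI_ENTITY")

# The str mixin makes members compare equal to their plain string value.
matches_plain_string = task_type == "MULTI_ENTITY"

# Lookup is case-sensitive: unknown values are rejected with ValueError.
try:
    TaskType("multi_entity")
    unknown_value_rejected = False
except ValueError:
    unknown_value_rejected = True
```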

`FeatureRule`¶

Constructor highlights:

required: feature_name, feature_type
allowed feature_type values (validated): text, number, date, category
side effect: none
error conditions: an invalid feature_type raises ValueError inside the validator, which Pydantic reports to the caller as a ValidationError
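
The validation behaviour can be tried out with a trimmed local copy of the model. This is a sketch assuming Pydantic v2 (the source uses `field_validator`); only a few of the real fields are reproduced, and the rule name "price" is a made-up example. The `@classmethod` decorator is added here following the usual Pydantic v2 idiom.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, field_validator


class FeatureRule(BaseModel):
    """Trimmed local copy of FeatureRule, for illustration only."""

    feature_name: str
    feature_type: str  # "text", "number", "date", "category"
    numeric_absolute_tolerance: Optional[float] = None

    @field_validator("feature_type")
    @classmethod
    def validate_feature_type(cls, value: str) -> str:
        allowed = {"text", "number", "date", "category"}
        if value not in allowed:
            raise ValueError(f"feature_type must be one of {allowed}")
        return value


# A valid rule: numeric comparison with a small absolute tolerance.
price_rule = FeatureRule(
    feature_name="price",
    feature_type="number",
    numeric_absolute_tolerance=0.01,
)

# An unsupported feature_type is rejected at construction time.
try:
    FeatureRule(feature_name="price", feature_type="float")
    invalid_type_rejected = False
    error_is_value_error = False
except ValidationError as error:
    invalid_type_rejected = True
    # Pydantic's ValidationError is itself a subclass of ValueError.
    error_is_value_error = isinstance(error, ValueError)
```

Since `ValidationError` subclasses `ValueError`, callers that catch `ValueError` will also catch the failure.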

`MatchingConfig`¶

Constructor highlights:

defaults: matching_mode "weighted", minimum_similarity_threshold 0.5
used only by MULTI_ENTITY
side effect: none
current caveat: matching_mode is a plain str field, so any string is accepted; the values "exact" and "weighted" named in the source comment are not enforced
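
Since a misspelled `matching_mode` passes model construction, a caller-side guard is one way to catch it early. The helper `check_matching_mode` below is hypothetical and not part of the package; the two allowed values are taken from the comment on the field in the source.

```python
# Allowed values, as named in the source comment on matching_mode.
ALLOWED_MATCHING_MODES = ("exact", "weighted")


def check_matching_mode(matching_mode: str) -> str:
    """Return matching_mode unchanged, or raise ValueError if it is not a documented mode."""
    if matching_mode not in ALLOWED_MATCHING_MODES:
        raise ValueError(
            f"matching_mode must be one of {ALLOWED_MATCHING_MODES}, got {matching_mode!r}"
        )
    return matching_mode


# A documented mode passes through untouched.
checked_mode = check_matching_mode("weighted")

# A typo is caught before it reaches the matching step.
try:
    check_matching_mode("weigthed")
    typo_rejected = False
except ValueError:
    typo_rejected = True
```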

`ClassificationConfig`¶

Constructor highlights:

positive_label switches the metric function into one-vs-rest mode for that label when present
average_strategy exists in the model but is not currently consumed in the metric path
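
To make "one-vs-rest mode" concrete: the chosen label is treated as the positive class and every other label as negative. The function below is a generic sketch of that idea, not the package's metric function; `one_vs_rest_scores`, the choice of precision and recall, and the sample labels are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def one_vs_rest_scores(
    expected: Sequence[str],
    predicted: Sequence[str],
    positive_label: str,
) -> Dict[str, float]:
    """Precision and recall for positive_label, treating all other labels as negative."""
    pairs = list(zip(expected, predicted))
    true_positives = sum(1 for e, p in pairs if e == positive_label and p == positive_label)
    false_positives = sum(1 for e, p in pairs if e != positive_label and p == positive_label)
    false_negatives = sum(1 for e, p in pairs if e == positive_label and p != positive_label)

    predicted_positives = true_positives + false_positives
    actual_positives = true_positives + false_negatives
    return {
        # Fall back to 0.0 when a denominator is empty to avoid division by zero.
        "precision": true_positives / predicted_positives if predicted_positives else 0.0,
        "recall": true_positives / actual_positives if actual_positives else 0.0,
    }


# Hypothetical labels: one hit, one false alarm, one miss for "invoice".
expected_labels = ["invoice", "receipt", "invoice", "contract"]
predicted_labels = ["invoice", "invoice", "receipt", "contract"]
scores = one_vs_rest_scores(expected_labels, predicted_labels, positive_label="invoice")
```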

`RunConfig`¶

Constructor highlights:

required: task_type, feature_rules
index_key_name is declared Optional (default None), so the model does not enforce it; it is required at runtime for indexed task types
matching_config is relevant only for MULTI_ENTITY
log_directory_path controls where RunLogger writes files
grouping_key_names exists but is not currently used in the runtime path
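
Putting the pieces together, a run configuration might be assembled as follows. This is a sketch assuming Pydantic v2, with trimmed local copies of the models so that it runs on its own; the feature name "company_name" and the key "document_id" are made-up values, not names the package expects.

```python
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel


class TaskType(str, Enum):
    SINGLE_FEATURE = "SINGLE_FEATURE"
    SINGLE_ENTITY = "SINGLE_ENTITY"
    MULTI_ENTITY = "MULTI_ENTITY"


class FeatureRule(BaseModel):
    # Trimmed copy: only the two required fields.
    feature_name: str
    feature_type: str


class MatchingConfig(BaseModel):
    matching_mode: str = "weighted"
    minimum_similarity_threshold: float = 0.5


class RunConfig(BaseModel):
    # Trimmed copy of the top-level model.
    task_type: TaskType
    feature_rules: List[FeatureRule]
    index_key_name: Optional[str] = None
    log_directory_path: str = "./logs"
    matching_config: Optional[MatchingConfig] = None


run_config = RunConfig(
    task_type="MULTI_ENTITY",  # the string is coerced to TaskType.MULTI_ENTITY
    feature_rules=[{"feature_name": "company_name", "feature_type": "text"}],
    index_key_name="document_id",  # hypothetical key name
    matching_config=MatchingConfig(minimum_similarity_threshold=0.7),
)

# The model accepts a missing index_key_name; any runtime requirement
# has to be checked elsewhere.
bare_config = RunConfig(task_type=TaskType.SINGLE_ENTITY, feature_rules=[])
```

Note that `bare_config` validates even though `index_key_name` is left unset, which is why the runtime requirement is worth checking before starting a run.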

Generated API details¶

`TaskType` ¶

Bases: str, Enum

Enumeration of supported task types.

Source code in src/extraction_testing/config.py

class TaskType(str, Enum):
    """Enumeration of supported task types."""

    SINGLE_FEATURE = "SINGLE_FEATURE"
    SINGLE_ENTITY = "SINGLE_ENTITY"
    MULTI_ENTITY = "MULTI_ENTITY"

`FeatureRule` ¶

Bases: BaseModel

Configuration for how to compare a single feature.

Source code in src/extraction_testing/config.py

class FeatureRule(BaseModel):
    """Configuration for how to compare a single feature."""

    feature_name: str
    feature_type: str  # "text", "number", "date", "category"
    is_mandatory_for_matching: bool = True
    weight_for_matching: float = 1.0

    casefold_text: bool = True
    strip_text: bool = True
    remove_punctuation: bool = True

    alias_map: Optional[Dict[str, str]] = None

    numeric_rounding_digits: Optional[int] = None
    numeric_absolute_tolerance: Optional[float] = None
    numeric_relative_tolerance: Optional[float] = None

    date_tolerance_days: Optional[int] = None

    @field_validator("feature_type")
    def validate_feature_type(cls, value: str) -> str:
        """Validate feature_type."""
        allowed = {"text", "number", "date", "category"}
        if value not in allowed:
            raise ValueError(f"feature_type must be one of {allowed}")
        return value

`validate_feature_type(value)` ¶

Validate feature_type.

Source code in src/extraction_testing/config.py

@field_validator("feature_type")
def validate_feature_type(cls, value: str) -> str:
    """Validate feature_type."""
    allowed = {"text", "number", "date", "category"}
    if value not in allowed:
        raise ValueError(f"feature_type must be one of {allowed}")
    return value

`MatchingConfig` ¶

Bases: BaseModel

Configuration for entity matching.

Source code in src/extraction_testing/config.py

class MatchingConfig(BaseModel):
    """Configuration for entity matching."""

    matching_mode: str = "weighted"  # "exact" or "weighted"
    minimum_similarity_threshold: float = 0.5
    maximum_candidate_pairs: Optional[int] = None
    random_tie_breaker_seed: int = 13

`ClassificationConfig` ¶

Bases: BaseModel

Configuration for classification reporting.

Source code in src/extraction_testing/config.py

class ClassificationConfig(BaseModel):
    """Configuration for classification reporting."""

    positive_label: Optional[str] = None
    average_strategy: str = "macro"  # "macro" or "micro" (exposed for future use)

`RunConfig` ¶

Bases: BaseModel

Top-level run configuration.

Source code in src/extraction_testing/config.py

class RunConfig(BaseModel):
    """Top-level run configuration."""

    task_type: TaskType
    feature_rules: List[FeatureRule]
    index_key_name: Optional[str] = None
    grouping_key_names: Optional[List[str]] = None
    log_directory_path: str = "./logs"
    matching_config: Optional[MatchingConfig] = None
    classification_config: Optional[ClassificationConfig] = None