
orchestrator

This module contains the main user-facing entry points for running evaluations.

evaluate(predicted_records, gold_records, run_config)

Purpose:

  • select the evaluator class matching run_config.task_type
  • run the evaluation
  • return a ResultBundle

Parameters:

  • predicted_records: list of Pydantic models representing predictions
  • gold_records: list of Pydantic models representing gold data
  • run_config: RunConfig controlling task type and comparison behavior

Returns:

  • ResultBundle

Error conditions:

  • unsupported task_type raises ValueError
  • indexed task types raise ValueError if index_key_name is missing
  • single-feature evaluation raises ValueError if more than one feature rule is supplied

Side effects:

  • none
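The dispatch that evaluate() performs can be sketched with stdlib stand-ins. The stub TaskType enum and the mapping below are illustrative only (the real evaluator classes live in extraction_testing); the sketch shows the selection logic and the ValueError raised for an unsupported task type:

```python
from enum import Enum

# Stand-in for extraction_testing's TaskType; member names mirror the source.
class TaskType(Enum):
    MULTI_ENTITY = "multi_entity"
    SINGLE_ENTITY = "single_entity"
    SINGLE_FEATURE = "single_feature"

# Hypothetical mapping from task type to evaluator class name, standing in
# for the if/elif chain in evaluate().
EVALUATOR_BY_TASK_TYPE = {
    TaskType.MULTI_ENTITY: "MultiEntityExtractionTest",
    TaskType.SINGLE_ENTITY: "SingleEntityExtractionTest",
    TaskType.SINGLE_FEATURE: "SingleFeatureExtractionTest",
}

def select_evaluator_name(task_type):
    """Mirror evaluate()'s dispatch: unknown task types raise ValueError."""
    try:
        return EVALUATOR_BY_TASK_TYPE[task_type]
    except KeyError:
        raise ValueError(f"Unsupported task type: {task_type}")
```

In the real function the selected class is instantiated with run_config and its test() method produces the ResultBundle.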

build_run_context(run_config)

Purpose:

  • create a RunContext for logging and traceability

Returns:

  • RunContext containing a run identifier, start timestamp, and configuration hash

Side effects:

  • none
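A minimal self-contained sketch of what the returned RunContext carries, assuming plausible stand-ins for the project's timestamp_string and hash_configuration helpers (their exact formats are assumptions, not the library's API):

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime

@dataclass
class RunContext:
    run_identifier: str
    started_at: str
    configuration_hash: str

def timestamp_string():
    # Assumption: a compact, sortable timestamp serves as the run identifier.
    return datetime.now().strftime("%Y%m%d-%H%M%S")

def hash_configuration(config):
    # Hash a canonical JSON rendering so identical configs hash identically.
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def build_run_context(config):
    """Sketch of build_run_context over a plain dict instead of RunConfig."""
    return RunContext(
        run_identifier=timestamp_string(),
        started_at=datetime.now().isoformat(timespec="seconds"),
        configuration_hash=hash_configuration(config),
    )
```

Hashing a canonically serialized config makes the hash stable across runs, which is what allows it to be used for traceability.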

Generated API details

build_run_context(run_config)

Create a run context with identifier, timestamp, and config hash.

Source code in src/extraction_testing/orchestrator.py
def build_run_context(run_config: RunConfig) -> RunContext:
    """Create a run context with identifier, timestamp, and config hash."""
    run_identifier_value = timestamp_string()
    started_at_timestamp_value = datetime.now().isoformat(timespec="seconds")
    configuration_hash_value = hash_configuration(model_to_dict(run_config))
    return RunContext(run_identifier_value, started_at_timestamp_value, configuration_hash_value)

evaluate(predicted_records, gold_records, run_config)

Convenience entry point to evaluate based on task type.

Source code in src/extraction_testing/orchestrator.py
def evaluate(predicted_records: List[BaseModel], gold_records: List[BaseModel], run_config: RunConfig) -> ResultBundle:
    """Convenience entry point to evaluate based on task type."""
    if run_config.task_type == TaskType.MULTI_ENTITY:
        tester = MultiEntityExtractionTest(run_config)
    elif run_config.task_type == TaskType.SINGLE_ENTITY:
        tester = SingleEntityExtractionTest(run_config)
    elif run_config.task_type == TaskType.SINGLE_FEATURE:
        tester = SingleFeatureExtractionTest(run_config)
    else:
        raise ValueError(f"Unsupported task type: {run_config.task_type}")
    return tester.test(predicted_records, gold_records)