Choosing a Task Type¶
Choose the task type based on the shape of your records and how alignment should happen before scoring.
Decision table¶
| If your data looks like this | Use this task type | Alignment strategy | index_key_name required |
Entity presence metrics |
|---|---|---|---|---|
| One indexed record, one extracted field to compare | SINGLE_FEATURE |
Exact join on the index key | Yes | No |
| One indexed record, multiple extracted fields to compare | SINGLE_ENTITY |
Exact join on the index key | Yes | No |
| Lists of predicted and gold entities with no stable row key | MULTI_ENTITY |
Entity matching, then feature scoring | No | Yes |
Input shape examples¶
SINGLE_FEATURE¶
Use this when each record contributes one label-like value.
class ArticleLabel(BaseModel):
row_identifier: int
topic_label: str
Typical use cases:
- document classification labels
- one extracted category per record
- one scalar field where you want one-feature reporting
SINGLE_ENTITY¶
Use this when each indexed record represents one entity with multiple fields.
from typing import Optional
from pydantic import BaseModel
class ArticleFeatures(BaseModel):
row_identifier: int
headline_text: str
author_name: str
publish_date: Optional[str]
Typical use cases:
- one form or document produces one structured object
- you want per-field metrics for a single known entity per record
MULTI_ENTITY¶
Use this when each dataset is a list of entities and the evaluator must decide which predicted entity matches which gold entity.
from typing import Optional
from pydantic import BaseModel
class ContractRecord(BaseModel):
contract_title: str
contract_amount: Optional[float]
contract_date: Optional[str]
Typical use cases:
- extracting many entities from one source document set
- invoice line items, contracts, people, products, or events
- no stable shared key exists across predicted and gold outputs
Common mistakes¶
-
Using
SINGLE_FEATUREwith multipleFeatureRuleobjects. The single-feature evaluator expects exactly one feature rule. -
Forgetting
index_key_nameforSINGLE_FEATUREorSINGLE_ENTITY. Both indexed task types require a join key and raise an error without one. -
Using
SINGLE_ENTITYwhen the real problem is entity matching. If the evaluator needs to discover which entity matches which, useMULTI_ENTITY. -
Treating
MULTI_ENTITYas if row order were the identity. Multi-entity evaluation matches entities by similarity, not by list position.
Practical rule of thumb¶
Start with the simplest task type that matches your data shape:
- one field per keyed record:
SINGLE_FEATURE - several fields per keyed record:
SINGLE_ENTITY - many entities that must be matched first:
MULTI_ENTITY