Quality

Evaluate and monitor the quality of entity and relationship extractions across sources. The Quality API provides scoring, analysis, and comparison tools to identify high- and low-performing extractions and track quality trends across domains.

Base path: /api/v1/quality

Score Source

GET /api/v1/quality/sources/{source_id}

Score a single source's extraction quality. Returns quality metrics including entity and relationship contributions, connectivity, density, and pollution indicators. Uses cached scores when available for performance.

Path Parameters

Parameter	Type	Required	Description
`source_id`	string	Yes	ID of the source to score

Query Parameters

Parameter	Type	Required	Default	Description
`force_recalculate`	bool	No	`false`	Bypass cache and recalculate fresh scores

Example

# Score a source (uses cache if available)
curl "http://localhost/api/v1/quality/sources/src-abc123"

# Force recalculation
curl "http://localhost/api/v1/quality/sources/src-abc123?force_recalculate=true"

Response

200 OK

{
  "source_id": "src-abc123",
  "source_title": "Research Paper on Neural Networks",
  "domain": "science",
  "entity_count": 45,
  "relationship_count": 62,
  "entity_contribution": 2847.5,
  "relationship_contribution": 3102.0,
  "connectivity_bonus": 150.0,
  "total_score": 6099.5,
  "avg_entity_quality": 63.28,
  "avg_relationship_quality": 50.03,
  "connectivity_ratio": 0.82,
  "quality_grade": 72.5,
  "quality_label": "Good",
  "low_quality_entity_count": 3,
  "low_quality_relationship_count": 8,
  "density_ratio": 1.38,
  "density_score": 68.9,
  "topology_score": 75.4,
  "pollution_penalty": 2.5,
  "structural_penalty": 0.0,
  "hub_skew": 1.4,
  "reciprocal_rate": 0.05,
  "coverage_score": 61.3
}

SourceQualityScoreResponse

Field	Type	Description
`source_id`	string	ID of the source
`source_title`	string\|null	Title of the source
`domain`	string\|null	Extraction domain used
`entity_count`	int	Number of entities
`relationship_count`	int	Number of relationships
`entity_contribution`	float	Sum of quality-weighted entity scores
`relationship_contribution`	float	Sum of quality-weighted relationship scores
`connectivity_bonus`	float	Bonus for connected entities
`total_score`	float	Richness score (unbounded, quantity-driven)
`avg_entity_quality`	float	Average quality per entity (0--100)
`avg_relationship_quality`	float	Average quality per relationship (0--100)
`connectivity_ratio`	float	Ratio of connected entities (0--1)
`quality_grade`	float	Quality rating 0--100 (independent of volume)
`quality_label`	string	Quality label: Outstanding, Excellent, Good, Fair, or Low
`low_quality_entity_count`	int	Entities with score below 40 (inflation indicator)
`low_quality_relationship_count`	int	Relationships with score below 40
`density_ratio`	float	Relationships per entity ratio
`density_score`	float	Density score (bell-shaped around target, 0--100; over-dense graphs are penalized)
`topology_score`	float	Combined connectivity + density score (0--100)
`pollution_penalty`	float	Penalty for low-quality items (0--15)
`structural_penalty`	float	Penalty for graph-shape noise: hub skew + reciprocal rate (0--15)
`hub_skew`	float	max_entity_degree ÷ median_entity_degree (≥1.0; high = one entity over-connected)
`reciprocal_rate`	float	Fraction of edges with a same-type reciprocal partner (0--1)
`coverage_score`	float	Entities per chunk normalized to 0--100

Errors

Status	Description
`404`	Source not found

Score Source Details

GET /api/v1/quality/sources/{source_id}/details

Score a source with detailed entity and relationship breakdowns. Returns the same top-level metrics as the score endpoint plus individual score breakdowns for every entity and relationship.

Cache behavior

Detail view always requires calculation as individual breakdowns are not cached.

Path Parameters

Parameter	Type	Required	Description
`source_id`	string	Yes	ID of the source to score

Query Parameters

Parameter	Type	Required	Default	Description
`force_recalculate`	bool	No	`false`	Bypass cache and recalculate fresh scores

Example

curl "http://localhost/api/v1/quality/sources/src-abc123/details"

Response

200 OK

Returns a SourceQualityScoreResponse with two additional fields — entity_scores and relationship_scores:

{
  "source_id": "src-abc123",
  "source_title": "Research Paper on Neural Networks",
  "...": "... same fields as SourceQualityScoreResponse ...",
  "entity_scores": [
    {
      "entity_name": "Convolutional Neural Network",
      "entity_type": "Concept",
      "description_score": 18.0,
      "confidence_score": 13.5,
      "cross_chunk_score": 12.0,
      "properties_score": 10.5,
      "aliases_score": 7.0,
      "type_value_score": 25.0,
      "total_score": 86.0
    }
  ],
  "relationship_scores": [
    {
      "relationship_type": "trained_on",
      "source_entity": "Convolutional Neural Network",
      "target_entity": "ImageNet",
      "justification_score": 30.0,
      "confidence_score": 22.5,
      "specificity_score": 20.0,
      "valid_refs_score": 15.0,
      "total_score": 87.5
    }
  ]
}

SourceQualityDetailResponse

Extends SourceQualityScoreResponse with two additional fields:

Field	Type	Description
`entity_scores`	`EntityQualityScoreResponse[]`	Individual entity score breakdowns
`relationship_scores`	`RelationshipQualityScoreResponse[]`	Individual relationship score breakdowns

EntityQualityScoreResponse

Field	Type	Description
`entity_name`	string	Name of the entity
`entity_type`	string	Type of the entity
`description_score`	float	Score for description richness (0--20)
`confidence_score`	float	Score for extraction confidence (0--15)
`cross_chunk_score`	float	Score for cross-chunk mentions (0--15)
`properties_score`	float	Score for property richness (0--15)
`aliases_score`	float	Score for alias count (0--10)
`type_value_score`	float	Score based on entity type tier (0--25)
`total_score`	float	Sum of all component scores (0--100)

RelationshipQualityScoreResponse

Field	Type	Description
`relationship_type`	string	Type of the relationship
`source_entity`	string	Name of source entity
`target_entity`	string	Name of target entity
`justification_score`	float	Score for justification richness (0--35)
`confidence_score`	float	Score for extraction confidence (0--25)
`specificity_score`	float	Score based on relationship type tier (0--25)
`valid_refs_score`	float	Score for valid entity references (0--15)
`total_score`	float	Sum of all component scores (0--100)

Errors

Status	Description
`404`	Source not found

Recalculate Scores

POST /api/v1/quality/recalculate

Recalculate and cache quality scores for all sources, or for sources in a specific domain.

Use Cases

After updating scoring configuration (domain quality_scoring settings)
After upgrading to a new scoring algorithm version
Initial migration of existing data

Request Body

Field	Type	Required	Default	Description
`domain`	string\|null	No	`null`	Only recalculate sources in this extraction domain

Example

# Recalculate all sources
curl -X POST "http://localhost/api/v1/quality/recalculate" \
  -H "Content-Type: application/json" \
  -d '{}'

# Recalculate only science domain
curl -X POST "http://localhost/api/v1/quality/recalculate" \
  -H "Content-Type: application/json" \
  -d '{"domain": "science"}'

Response

200 OK

{
  "recalculated_count": 42,
  "errors": []
}

With errors:

{
  "recalculated_count": 40,
  "errors": [
    {
      "source_id": "src-broken1",
      "error": "No graph data found for source"
    },
    {
      "source_id": "src-broken2",
      "error": "Failed to read extraction results"
    }
  ]
}

RecalculateResponse

Field	Type	Description
`recalculated_count`	int	Number of sources successfully recalculated
`errors`	dict[]	List of errors encountered during recalculation

Outdated Sources

GET /api/v1/quality/outdated

Get sources with outdated or missing cached quality scores. Returns sources that need recalculation due to missing cached scores (never calculated) or an outdated scoring version (algorithm changed since caching).

Example

curl "http://localhost/api/v1/quality/outdated"

Response

200 OK

{
  "outdated_count": 5,
  "sources": [
    {
      "id": "src-abc123",
      "title": "Research Paper on Neural Networks",
      "cached_scores_version": 1,
      "current_version": 3
    }
  ]
}

A cached_scores_version of null indicates the source has never been scored.

OutdatedSourcesResponse

Field	Type	Description
`outdated_count`	int	Number of sources with outdated scores
`sources`	`OutdatedSourceResponse[]`	List of sources needing recalculation

OutdatedSourceResponse

Field	Type	Description
`id`	string	Source ID
`title`	string\|null	Source title
`cached_scores_version`	int\|null	Version of cached scores (null if never calculated)
`current_version`	int	Current scoring algorithm version

Batch Analysis

POST /api/v1/quality/analyze

Analyze quality across multiple sources with optional filters. Returns all matching sources with aggregated average metrics.

Request Body

Field	Type	Required	Default	Description
`source_ids`	string[]\|null	No	`null`	Specific source IDs to analyze (null = all)
`domain`	string\|null	No	`null`	Filter by extraction domain
`min_entities`	int	No	`0`	Minimum entity count to include

Example

# Analyze all sources
curl -X POST "http://localhost/api/v1/quality/analyze" \
  -H "Content-Type: application/json" \
  -d '{}'

# Analyze specific sources
curl -X POST "http://localhost/api/v1/quality/analyze" \
  -H "Content-Type: application/json" \
  -d '{"source_ids": ["src-abc123", "src-def456"]}'

# Filter by domain with minimum entity count
curl -X POST "http://localhost/api/v1/quality/analyze" \
  -H "Content-Type: application/json" \
  -d '{"domain": "science", "min_entities": 10}'

Response

200 OK

Each item in sources is a SourceQualityScoreResponse.

{
  "sources": [
    {
      "source_id": "src-abc123",
      "source_title": "Research Paper on Neural Networks",
      "...": "... same schema as SourceQualityScoreResponse ..."
    }
  ],
  "total_sources": 2,
  "avg_score": 4699.75,
  "avg_entity_quality": 59.14,
  "avg_relationship_quality": 49.02
}

QualityAnalysisResponse

Field	Type	Description
`sources`	`SourceQualityScoreResponse[]`	Quality scores for each source
`total_sources`	int	Total sources analyzed
`avg_score`	float	Average total score across sources
`avg_entity_quality`	float	Average entity quality across sources
`avg_relationship_quality`	float	Average relationship quality across sources

Paginated Analysis

GET /api/v1/quality/analyze

Analyze quality across sources with pagination, sorting, and filtering. Returns a single page of results with pagination metadata and aggregated averages computed across all matching sources.

Query Parameters

Parameter	Type	Required	Default	Description
`domain`	string\|null	No	`null`	Filter by extraction domain
`min_entities`	int	No	`0`	Minimum entity count to include (min: 0)
`page`	int	No	`1`	Page number (min: 1)
`page_size`	int\|null	No	server default	Items per page (min: 1, capped at server max)
`sort_by`	string	No	`total_score`	Sort field: `total_score`, `avg_entity_quality`, `avg_relationship_quality`, or `entity_count`
`sort_order`	string	No	`desc`	Sort order: `asc` or `desc`

Page size defaults

When page_size is not provided, the server default page size is used. Values exceeding the server maximum are clamped automatically.

Example

# Default paginated analysis (sorted by total_score descending)
curl "http://localhost/api/v1/quality/analyze"

# Filter by domain with custom pagination
curl "http://localhost/api/v1/quality/analyze?domain=science&page=2&page_size=10"

# Sort by entity quality ascending
curl "http://localhost/api/v1/quality/analyze?sort_by=avg_entity_quality&sort_order=asc"

# Filter sources with at least 5 entities
curl "http://localhost/api/v1/quality/analyze?min_entities=5"

Response

200 OK

Each item in sources is a SourceQualityScoreResponse.

{
  "sources": [
    {
      "source_id": "src-abc123",
      "source_title": "Research Paper on Neural Networks",
      "...": "... same schema as SourceQualityScoreResponse ..."
    }
  ],
  "total_sources": 42,
  "avg_score": 4200.3,
  "avg_entity_quality": 57.5,
  "avg_relationship_quality": 46.8,
  "pagination": {
    "page": 1,
    "page_size": 50,
    "total": 42,
    "total_pages": 1,
    "has_next": false,
    "has_prev": false
  }
}

QualityAnalysisPaginatedResponse

Field	Type	Description
`sources`	`SourceQualityScoreResponse[]`	Quality scores for the current page
`total_sources`	int	Total sources analyzed
`avg_score`	float	Average total score across all sources
`avg_entity_quality`	float	Average entity quality across all sources
`avg_relationship_quality`	float	Average relationship quality across all sources
`pagination`	`PaginationInfo`	Pagination metadata

PaginationInfo

Field	Type	Description
`page`	int	Current page number
`page_size`	int	Items per page
`total`	int	Total items
`total_pages`	int	Total number of pages
`has_next`	bool	Whether there is a next page
`has_prev`	bool	Whether there is a previous page

Domain Comparison

GET /api/v1/quality/domains

Compare quality performance across extraction domains. Returns aggregated quality metrics for each domain, sorted by average total score descending.

Example

curl "http://localhost/api/v1/quality/domains"

Response

200 OK

One entry per domain, sorted by avg_total_score descending:

{
  "domains": [
    {
      "domain": "science",
      "source_count": 15,
      "avg_total_score": 5200.8,
      "avg_entity_quality": 62.4,
      "avg_relationship_quality": 51.3,
      "avg_connectivity_ratio": 0.78,
      "total_entities": 680,
      "total_relationships": 920
    }
  ]
}

DomainComparisonResponse

Field	Type	Description
`domains`	`DomainPerformanceResponse[]`	Performance metrics per domain

DomainPerformanceResponse

Field	Type	Description
`domain`	string	Domain name
`source_count`	int	Number of sources in this domain
`avg_total_score`	float	Average total score
`avg_entity_quality`	float	Average entity quality
`avg_relationship_quality`	float	Average relationship quality
`avg_connectivity_ratio`	float	Average connectivity ratio
`total_entities`	int	Total entities across all sources
`total_relationships`	int	Total relationships across all sources

Database Summary

GET /api/v1/quality/summary

Get an overall quality summary for the entire database. Provides high-level statistics and identifies the top 5 and bottom 5 sources by total score.

Example

curl "http://localhost/api/v1/quality/summary"

Response

200 OK

{
  "total_sources": 42,
  "total_entities": 1490,
  "total_relationships": 1875,
  "avg_total_score": 4150.6,
  "avg_entity_quality": 55.3,
  "avg_relationship_quality": 45.9,
  "avg_quality_grade": 62.1,
  "avg_connectivity_ratio": 0.66,
  "top_sources": [
    {
      "source_id": "src-top1",
      "source_title": "Comprehensive Biology Textbook",
      "...": "... same schema as SourceQualityScoreResponse ..."
    }
  ],
  "bottom_sources": [
    {
      "source_id": "src-bottom1",
      "source_title": "Brief Meeting Notes",
      "...": "... same schema as SourceQualityScoreResponse ..."
    }
  ]
}

QualitySummaryResponse

Field	Type	Description
`total_sources`	int	Total sources with extractions
`total_entities`	int	Total entities extracted
`total_relationships`	int	Total relationships extracted
`avg_total_score`	float	Average total score
`avg_entity_quality`	float	Average entity quality
`avg_relationship_quality`	float	Average relationship quality
`avg_quality_grade`	float	Average quality grade (0--100)
`avg_connectivity_ratio`	float	Average connectivity ratio
`top_sources`	`SourceQualityScoreResponse[]`	Top 5 sources by total score
`bottom_sources`	`SourceQualityScoreResponse[]`	Bottom 5 sources by total score

Score Source​

Path Parameters​

Query Parameters​

Example​

Response​

SourceQualityScoreResponse​

Errors​

Score Source Details​

Path Parameters​

Query Parameters​

Example​

Response​

SourceQualityDetailResponse​

EntityQualityScoreResponse​

RelationshipQualityScoreResponse​

Errors​

Recalculate Scores​

Use Cases​

Request Body​

Example​

Response​

RecalculateResponse​

Outdated Sources​

Example​

Response​

OutdatedSourcesResponse​

OutdatedSourceResponse​

Batch Analysis​

Request Body​

Example​

Response​

QualityAnalysisResponse​

Paginated Analysis​

Query Parameters​

Example​

Response​

QualityAnalysisPaginatedResponse​

PaginationInfo​

Domain Comparison​

Example​

Response​

DomainComparisonResponse​

DomainPerformanceResponse​

Database Summary​

Example​

Response​

QualitySummaryResponse​

Score Source

Path Parameters

Query Parameters

Example

Response

SourceQualityScoreResponse

Errors

Score Source Details

Path Parameters

Query Parameters

Example

Response

SourceQualityDetailResponse

EntityQualityScoreResponse

RelationshipQualityScoreResponse

Errors

Recalculate Scores

Use Cases

Request Body

Example

Response

RecalculateResponse

Outdated Sources

Example

Response

OutdatedSourcesResponse

OutdatedSourceResponse

Batch Analysis

Request Body

Example

Response

QualityAnalysisResponse

Paginated Analysis

Query Parameters

Example

Response

QualityAnalysisPaginatedResponse

PaginationInfo

Domain Comparison

Example

Response

DomainComparisonResponse

DomainPerformanceResponse

Database Summary

Example

Response

QualitySummaryResponse