Quality

Evaluate and monitor the quality of entity and relationship extractions across sources. The Quality API provides scoring, analysis, and comparison tools to identify high- and low-performing extractions and track quality trends across domains.

Base path: /api/v1/quality


Score Source

GET /api/v1/quality/sources/{source_id}

Score a single source's extraction quality. Returns quality metrics including entity and relationship contributions, connectivity, density, and pollution indicators. Uses cached scores when available for performance.

Path Parameters

Parameter  Type    Required  Description
source_id  string  Yes       ID of the source to score

Query Parameters

Parameter          Type  Required  Default  Description
force_recalculate  bool  No        false    Bypass cache and recalculate fresh scores

Example

# Score a source (uses cache if available)
curl "http://localhost:8080/api/v1/quality/sources/src-abc123"

# Force recalculation
curl "http://localhost:8080/api/v1/quality/sources/src-abc123?force_recalculate=true"

Response

200 OK

{
  "source_id": "src-abc123",
  "source_title": "Research Paper on Neural Networks",
  "domain": "science",
  "entity_count": 45,
  "relationship_count": 62,
  "entity_contribution": 2847.5,
  "relationship_contribution": 3102.0,
  "connectivity_bonus": 150.0,
  "total_score": 6099.5,
  "avg_entity_quality": 63.28,
  "avg_relationship_quality": 50.03,
  "connectivity_ratio": 0.82,
  "quality_grade": 72.5,
  "quality_label": "Good",
  "low_quality_entity_count": 3,
  "low_quality_relationship_count": 8,
  "density_ratio": 1.38,
  "density_score": 68.9,
  "topology_score": 75.4,
  "pollution_penalty": 2.5,
  "structural_penalty": 0.0,
  "hub_skew": 1.4,
  "reciprocal_rate": 0.05,
  "coverage_score": 61.3
}
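
The numbers in the example response above happen to fit a few simple relationships: total_score equals the sum of the two contributions and the connectivity bonus, the two averages equal contribution divided by count, and density_ratio equals relationships divided by entities. The sketch below recomputes them. These formulas are inferred from this one example and are an assumption, not a statement of how the scoring engine works; derived_metrics is a hypothetical helper name.

```python
# Values copied from the example response above.
example = {
    "entity_count": 45,
    "relationship_count": 62,
    "entity_contribution": 2847.5,
    "relationship_contribution": 3102.0,
    "connectivity_bonus": 150.0,
    "total_score": 6099.5,
    "avg_entity_quality": 63.28,
    "avg_relationship_quality": 50.03,
    "density_ratio": 1.38,
}


def derived_metrics(score: dict) -> dict:
    """Recompute volume-driven fields from counts and contributions.

    Assumption: formulas inferred from the documented example only.
    """
    entities = score["entity_count"]
    relationships = score["relationship_count"]
    return {
        "total_score": (
            score["entity_contribution"]
            + score["relationship_contribution"]
            + score["connectivity_bonus"]
        ),
        # Guard against empty sources to avoid division by zero.
        "avg_entity_quality": (
            round(score["entity_contribution"] / entities, 2) if entities else 0.0
        ),
        "avg_relationship_quality": (
            round(score["relationship_contribution"] / relationships, 2)
            if relationships
            else 0.0
        ),
        "density_ratio": round(relationships / entities, 2) if entities else 0.0,
    }


recomputed = derived_metrics(example)
mismatches = [key for key, value in recomputed.items() if example[key] != value]
```

For the documented example, mismatches is empty, which is why total_score is described as quantity-driven while quality_grade is reported separately.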

SourceQualityScoreResponse

Field                           Type         Description
source_id                       string       ID of the source
source_title                    string|null  Title of the source
domain                          string|null  Extraction domain used
entity_count                    int          Number of entities
relationship_count              int          Number of relationships
entity_contribution             float        Sum of quality-weighted entity scores
relationship_contribution       float        Sum of quality-weighted relationship scores
connectivity_bonus              float        Bonus for connected entities
total_score                     float        Richness score (unbounded, quantity-driven)
avg_entity_quality              float        Average quality per entity (0–100)
avg_relationship_quality        float        Average quality per relationship (0–100)
connectivity_ratio              float        Ratio of connected entities (0–1)
quality_grade                   float        Quality rating 0–100 (independent of volume)
quality_label                   string       Quality label: Outstanding, Excellent, Good, Fair, or Low
low_quality_entity_count        int          Entities with score below 40 (inflation indicator)
low_quality_relationship_count  int          Relationships with score below 40
density_ratio                   float        Relationships per entity ratio
density_score                   float        Density score (bell-shaped around target, 0–100; over-dense graphs are penalized)
topology_score                  float        Combined connectivity + density score (0–100)
pollution_penalty               float        Penalty for low-quality items (0–15)
structural_penalty              float        Penalty for graph-shape noise: hub skew + reciprocal rate (0–15)
hub_skew                        float        max_entity_degree ÷ median_entity_degree (≥1.0; high = one entity over-connected)
reciprocal_rate                 float        Fraction of edges with a same-type reciprocal partner (0–1)
coverage_score                  float        Entities per chunk normalized to 0–100
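
The hub_skew and reciprocal_rate definitions in the table can be illustrated on a small edge list. This is a minimal sketch under stated assumptions: degree counts both incoming and outgoing edges, entities with no edges are ignored, and an edge (A, type, B) counts as reciprocal when (B, type, A) also exists. The server's exact rules may differ, and the function names are hypothetical.

```python
from collections import Counter
from statistics import median

# (source_entity, relationship_type, target_entity)
Edge = tuple[str, str, str]


def hub_skew(edges: list[Edge]) -> float:
    """max_entity_degree / median_entity_degree over connected entities."""
    degree: Counter = Counter()
    for source, _rel_type, target in edges:
        degree[source] += 1
        degree[target] += 1
    if not degree:
        return 1.0  # assumption: an empty graph has no skew
    return max(degree.values()) / median(degree.values())


def reciprocal_rate(edges: list[Edge]) -> float:
    """Fraction of edges that have a same-type partner in the other direction."""
    if not edges:
        return 0.0
    edge_set = set(edges)
    reciprocal = sum(
        1
        for source, rel_type, target in edges
        if source != target and (target, rel_type, source) in edge_set
    )
    return reciprocal / len(edges)


# Hypothetical toy graph: A is a hub, and A/B point at each other.
toy_edges: list[Edge] = [
    ("A", "related_to", "B"),
    ("B", "related_to", "A"),
    ("A", "related_to", "C"),
    ("A", "uses", "D"),
]
skew = hub_skew(toy_edges)          # degrees: A=4, B=2, C=1, D=1
rate = reciprocal_rate(toy_edges)   # the first two edges are reciprocal
```

In the toy graph, two of the four edges mirror each other, so the rate is 0.5; a high value like this is the kind of graph-shape noise that structural_penalty is described as capturing.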

Errors

Status  Description
404     Source not found

Score Source Details

GET /api/v1/quality/sources/{source_id}/details

Score a source with detailed entity and relationship breakdowns. Returns the same top-level metrics as the score endpoint plus individual score breakdowns for every entity and relationship.

Cache behavior

The detail view is always computed on request, because individual entity and relationship breakdowns are not cached.

Path Parameters

Parameter  Type    Required  Description
source_id  string  Yes       ID of the source to score

Query Parameters

Parameter          Type  Required  Default  Description
force_recalculate  bool  No        false    Bypass cache and recalculate fresh scores

Example

curl "http://localhost:8080/api/v1/quality/sources/src-abc123/details"

Response

200 OK

Returns a SourceQualityScoreResponse with two additional fields — entity_scores and relationship_scores:

{
  "source_id": "src-abc123",
  "source_title": "Research Paper on Neural Networks",
  "...": "... same fields as SourceQualityScoreResponse ...",
  "entity_scores": [
    {
      "entity_name": "Convolutional Neural Network",
      "entity_type": "Concept",
      "description_score": 18.0,
      "confidence_score": 13.5,
      "cross_chunk_score": 12.0,
      "properties_score": 10.5,
      "aliases_score": 7.0,
      "type_value_score": 25.0,
      "total_score": 86.0
    }
  ],
  "relationship_scores": [
    {
      "relationship_type": "trained_on",
      "source_entity": "Convolutional Neural Network",
      "target_entity": "ImageNet",
      "justification_score": 30.0,
      "confidence_score": 22.5,
      "specificity_score": 20.0,
      "valid_refs_score": 15.0,
      "total_score": 87.5
    }
  ]
}

SourceQualityDetailResponse

Extends SourceQualityScoreResponse with two additional fields:

Field                Type                                Description
entity_scores        EntityQualityScoreResponse[]        Individual entity score breakdowns
relationship_scores  RelationshipQualityScoreResponse[]  Individual relationship score breakdowns

EntityQualityScoreResponse

Field              Type    Description
entity_name        string  Name of the entity
entity_type        string  Type of the entity
description_score  float   Score for description richness (0–20)
confidence_score   float   Score for extraction confidence (0–15)
cross_chunk_score  float   Score for cross-chunk mentions (0–15)
properties_score   float   Score for property richness (0–15)
aliases_score      float   Score for alias count (0–10)
type_value_score   float   Score based on entity type tier (0–25)
total_score        float   Sum of all component scores (0–100)

RelationshipQualityScoreResponse

Field                Type    Description
relationship_type    string  Type of the relationship
source_entity        string  Name of source entity
target_entity        string  Name of target entity
justification_score  float   Score for justification richness (0–35)
confidence_score     float   Score for extraction confidence (0–25)
specificity_score    float   Score based on relationship type tier (0–25)
valid_refs_score     float   Score for valid entity references (0–15)
total_score          float   Sum of all component scores (0–100)
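
Because each breakdown documents a range per component and a total_score that is the sum of its components, a client can sanity-check a detail response. The sketch below is one way to do that; the maxima are taken from the two tables above, while check_breakdown and the tolerance value are hypothetical choices for illustration.

```python
# Component maxima as documented in the two breakdown tables.
ENTITY_COMPONENT_MAX = {
    "description_score": 20,
    "confidence_score": 15,
    "cross_chunk_score": 15,
    "properties_score": 15,
    "aliases_score": 10,
    "type_value_score": 25,
}
RELATIONSHIP_COMPONENT_MAX = {
    "justification_score": 35,
    "confidence_score": 25,
    "specificity_score": 25,
    "valid_refs_score": 15,
}


def check_breakdown(
    breakdown: dict, component_max: dict, tolerance: float = 1e-6
) -> list[str]:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    for name, upper in component_max.items():
        value = breakdown[name]
        if not 0 <= value <= upper:
            problems.append(f"{name}={value} is outside 0..{upper}")
    expected_total = sum(breakdown[name] for name in component_max)
    if abs(expected_total - breakdown["total_score"]) > tolerance:
        problems.append(
            f"total_score={breakdown['total_score']} but components sum to {expected_total}"
        )
    return problems


# Breakdowns copied from the example detail response.
entity_example = {
    "description_score": 18.0, "confidence_score": 13.5, "cross_chunk_score": 12.0,
    "properties_score": 10.5, "aliases_score": 7.0, "type_value_score": 25.0,
    "total_score": 86.0,
}
relationship_example = {
    "justification_score": 30.0, "confidence_score": 22.5,
    "specificity_score": 20.0, "valid_refs_score": 15.0, "total_score": 87.5,
}
entity_problems = check_breakdown(entity_example, ENTITY_COMPONENT_MAX)
relationship_problems = check_breakdown(relationship_example, RELATIONSHIP_COMPONENT_MAX)
```

Both documented examples pass: the entity components sum to 86.0 and the relationship components to 87.5, and each set of maxima adds up to 100.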

Errors

Status  Description
404     Source not found

Recalculate Scores

POST /api/v1/quality/recalculate

Recalculate and cache quality scores for all sources, or for sources in a specific domain.

Use Cases

  • After updating scoring configuration (domain quality_scoring settings)
  • After upgrading to a new scoring algorithm version
  • Initial migration of existing data

Request Body

Field   Type         Required  Default  Description
domain  string|null  No        null     Only recalculate sources in this extraction domain

Example

# Recalculate all sources
curl -X POST "http://localhost:8080/api/v1/quality/recalculate" \
  -H "Content-Type: application/json" \
  -d '{}'

# Recalculate only science domain
curl -X POST "http://localhost:8080/api/v1/quality/recalculate" \
  -H "Content-Type: application/json" \
  -d '{"domain": "science"}'

Response

200 OK

{
  "recalculated_count": 42,
  "errors": []
}

With errors:

{
  "recalculated_count": 40,
  "errors": [
    {
      "source_id": "src-broken1",
      "error": "No graph data found for source"
    },
    {
      "source_id": "src-broken2",
      "error": "Failed to read extraction results"
    }
  ]
}

RecalculateResponse

Field               Type    Description
recalculated_count  int     Number of sources successfully recalculated
errors              dict[]  List of errors encountered during recalculation
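
Since the endpoint returns 200 OK even when some sources fail, callers need to inspect errors rather than rely on the status code. A minimal sketch of that check follows. It assumes each error entry carries the source_id and error keys shown in the example (the schema only says dict[], so the keys are read defensively), and summarize_recalculation is a hypothetical helper name.

```python
def summarize_recalculation(response: dict) -> dict:
    """Condense a RecalculateResponse into a success flag and failure details."""
    errors = response.get("errors", [])
    failures = {
        # Keys follow the documented example; fall back if they are absent.
        entry.get("source_id", "<unknown>"): entry.get("error", "<no message>")
        for entry in errors
    }
    return {
        "succeeded": response["recalculated_count"],
        "all_ok": not errors,
        "failures": failures,
    }


# The "With errors" example response from above.
partial = {
    "recalculated_count": 40,
    "errors": [
        {"source_id": "src-broken1", "error": "No graph data found for source"},
        {"source_id": "src-broken2", "error": "Failed to read extraction results"},
    ],
}
summary = summarize_recalculation(partial)
```

A caller could log summary["failures"] and retry only those source IDs, for example by requesting each one with force_recalculate=true.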

Outdated Sources

GET /api/v1/quality/outdated

Get sources with outdated or missing cached quality scores. Returns sources that need recalculation due to missing cached scores (never calculated) or an outdated scoring version (algorithm changed since caching).

Example

curl "http://localhost:8080/api/v1/quality/outdated"

Response

200 OK

{
  "outdated_count": 5,
  "sources": [
    {
      "id": "src-abc123",
      "title": "Research Paper on Neural Networks",
      "cached_scores_version": 1,
      "current_version": 3
    }
  ]
}

A cached_scores_version of null indicates the source has never been scored.

OutdatedSourcesResponse

Field           Type                      Description
outdated_count  int                       Number of sources with outdated scores
sources         OutdatedSourceResponse[]  List of sources needing recalculation

OutdatedSourceResponse

Field                  Type         Description
id                     string       Source ID
title                  string|null  Source title
cached_scores_version  int|null     Version of cached scores (null if never calculated)
current_version        int          Current scoring algorithm version
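
The two reasons a source appears in this list (never scored versus scored with an older algorithm version) can be told apart from cached_scores_version alone. The sketch below splits a response accordingly; split_outdated is a hypothetical helper, and the second entry in the sample payload is made up for illustration.

```python
def split_outdated(response: dict) -> tuple[list[str], list[str]]:
    """Return (never_scored_ids, stale_ids) from an OutdatedSourcesResponse."""
    never_scored, stale = [], []
    for source in response["sources"]:
        if source["cached_scores_version"] is None:
            # JSON null: no cached scores exist for this source.
            never_scored.append(source["id"])
        else:
            # A cached version exists but is listed as outdated.
            stale.append(source["id"])
    return never_scored, stale


sample = {
    "outdated_count": 2,
    "sources": [
        {"id": "src-abc123", "title": "Research Paper on Neural Networks",
         "cached_scores_version": 1, "current_version": 3},
        # Hypothetical entry showing the never-scored case.
        {"id": "src-new001", "title": None,
         "cached_scores_version": None, "current_version": 3},
    ],
}
never_scored_ids, stale_ids = split_outdated(sample)
```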

Batch Analysis

POST /api/v1/quality/analyze

Analyze quality across multiple sources with optional filters. Returns all matching sources with aggregated average metrics.

Request Body

Field         Type           Required  Default  Description
source_ids    string[]|null  No        null     Specific source IDs to analyze (null = all)
domain        string|null    No        null     Filter by extraction domain
min_entities  int            No        0        Minimum entity count to include

Example

# Analyze all sources
curl -X POST "http://localhost:8080/api/v1/quality/analyze" \
  -H "Content-Type: application/json" \
  -d '{}'

# Analyze specific sources
curl -X POST "http://localhost:8080/api/v1/quality/analyze" \
  -H "Content-Type: application/json" \
  -d '{"source_ids": ["src-abc123", "src-def456"]}'

# Filter by domain with minimum entity count
curl -X POST "http://localhost:8080/api/v1/quality/analyze" \
  -H "Content-Type: application/json" \
  -d '{"domain": "science", "min_entities": 10}'

Response

200 OK

Each item in sources is a SourceQualityScoreResponse.

{
  "sources": [
    {
      "source_id": "src-abc123",
      "source_title": "Research Paper on Neural Networks",
      "...": "... same schema as SourceQualityScoreResponse ..."
    }
  ],
  "total_sources": 2,
  "avg_score": 4699.75,
  "avg_entity_quality": 59.14,
  "avg_relationship_quality": 49.02
}

QualityAnalysisResponse

Field                     Type                          Description
sources                   SourceQualityScoreResponse[]  Quality scores for each source
total_sources             int                           Total sources analyzed
avg_score                 float                         Average total score across sources
avg_entity_quality        float                         Average entity quality across sources
avg_relationship_quality  float                         Average relationship quality across sources
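
As an illustration of the aggregate fields, the sketch below computes them as plain unweighted means over the per-source values. That choice is an assumption: the API might instead weight by entity or relationship count. The second source is hypothetical and was picked only so that the means line up with the example aggregates; aggregate is a hypothetical helper name.

```python
from statistics import fmean


def aggregate(sources: list[dict]) -> dict:
    """Unweighted means over per-source scores (assumed, not confirmed)."""
    if not sources:
        # Assumption: an empty selection yields zeroed averages.
        return {"total_sources": 0, "avg_score": 0.0,
                "avg_entity_quality": 0.0, "avg_relationship_quality": 0.0}
    return {
        "total_sources": len(sources),
        "avg_score": round(fmean([s["total_score"] for s in sources]), 2),
        "avg_entity_quality": round(
            fmean([s["avg_entity_quality"] for s in sources]), 2
        ),
        "avg_relationship_quality": round(
            fmean([s["avg_relationship_quality"] for s in sources]), 2
        ),
    }


sources = [
    # Values from the documented src-abc123 example.
    {"source_id": "src-abc123", "total_score": 6099.5,
     "avg_entity_quality": 63.28, "avg_relationship_quality": 50.03},
    # Hypothetical second source, not taken from the API.
    {"source_id": "src-def456", "total_score": 3300.0,
     "avg_entity_quality": 55.0, "avg_relationship_quality": 48.01},
]
summary = aggregate(sources)
```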

Paginated Analysis

GET /api/v1/quality/analyze

Analyze quality across sources with pagination, sorting, and filtering. Returns a single page of results with pagination metadata and aggregated averages computed across all matching sources.

Query Parameters

Parameter     Type         Required  Default         Description
domain        string|null  No        null            Filter by extraction domain
min_entities  int          No        0               Minimum entity count to include (min: 0)
page          int          No        1               Page number (min: 1)
page_size     int|null     No        server default  Items per page (min: 1, capped at server max)
sort_by       string       No        total_score     Sort field: total_score, avg_entity_quality, avg_relationship_quality, or entity_count
sort_order    string       No        desc            Sort order: asc or desc

Page size defaults

When page_size is not provided, the server default page size is used. Values exceeding the server maximum are clamped automatically.

Example

# Default paginated analysis (sorted by total_score descending)
curl "http://localhost:8080/api/v1/quality/analyze"

# Filter by domain with custom pagination
curl "http://localhost:8080/api/v1/quality/analyze?domain=science&page=2&page_size=10"

# Sort by entity quality ascending
curl "http://localhost:8080/api/v1/quality/analyze?sort_by=avg_entity_quality&sort_order=asc"

# Filter sources with at least 5 entities
curl "http://localhost:8080/api/v1/quality/analyze?min_entities=5"
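
For programmatic callers, the same requests can be assembled with client-side validation of the documented parameter constraints. The following is a sketch only: build_analyze_url is a hypothetical helper, and omitting parameters that are left at their documented defaults is a design choice made here, not something the API requires.

```python
from urllib.parse import urlencode

# Allowed values as listed in the Query Parameters table.
ALLOWED_SORT_FIELDS = {
    "total_score", "avg_entity_quality", "avg_relationship_quality", "entity_count",
}
ALLOWED_SORT_ORDERS = {"asc", "desc"}


def build_analyze_url(base_url, *, domain=None, min_entities=0, page=1,
                      page_size=None, sort_by="total_score", sort_order="desc"):
    """Build a GET /api/v1/quality/analyze URL, rejecting invalid values early."""
    if min_entities < 0:
        raise ValueError("min_entities must be >= 0")
    if page < 1:
        raise ValueError("page must be >= 1")
    if page_size is not None and page_size < 1:
        raise ValueError("page_size must be >= 1")
    if sort_by not in ALLOWED_SORT_FIELDS:
        raise ValueError(f"unsupported sort_by: {sort_by}")
    if sort_order not in ALLOWED_SORT_ORDERS:
        raise ValueError(f"unsupported sort_order: {sort_order}")

    params = {}
    if domain is not None:
        params["domain"] = domain
    if min_entities != 0:
        params["min_entities"] = min_entities
    if page != 1:
        params["page"] = page
    if page_size is not None:
        params["page_size"] = page_size
    if sort_by != "total_score":
        params["sort_by"] = sort_by
    if sort_order != "desc":
        params["sort_order"] = sort_order

    query = urlencode(params)
    url = f"{base_url}/api/v1/quality/analyze"
    return f"{url}?{query}" if query else url


url = build_analyze_url("http://localhost:8080", domain="science", page=2, page_size=10)
```

With these arguments the helper produces the same URL as the second curl example above.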

Response

200 OK

Each item in sources is a SourceQualityScoreResponse.

{
  "sources": [
    {
      "source_id": "src-abc123",
      "source_title": "Research Paper on Neural Networks",
      "...": "... same schema as SourceQualityScoreResponse ..."
    }
  ],
  "total_sources": 42,
  "avg_score": 4200.3,
  "avg_entity_quality": 57.5,
  "avg_relationship_quality": 46.8,
  "pagination": {
    "page": 1,
    "page_size": 50,
    "total": 42,
    "total_pages": 1,
    "has_next": false,
    "has_prev": false
  }
}

QualityAnalysisPaginatedResponse

Field                     Type                          Description
sources                   SourceQualityScoreResponse[]  Quality scores for the current page
total_sources             int                           Total sources analyzed
avg_score                 float                         Average total score across all sources
avg_entity_quality        float                         Average entity quality across all sources
avg_relationship_quality  float                         Average relationship quality across all sources
pagination                PaginationInfo                Pagination metadata

PaginationInfo

Field        Type  Description
page         int   Current page number
page_size    int   Items per page
total        int   Total items
total_pages  int   Total number of pages
has_next     bool  Whether there is a next page
has_prev     bool  Whether there is a previous page
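
The pagination fields relate to each other in the usual way, and the clamping rule for page_size can be sketched alongside them. The constants below are placeholders: this page does not state the server default or maximum, so DEFAULT_PAGE_SIZE and MAX_PAGE_SIZE are assumed values (50 matches the example response, 200 is arbitrary). How the server reports total_pages for an empty result is likewise an assumption.

```python
import math

DEFAULT_PAGE_SIZE = 50   # assumption: matches the example response only
MAX_PAGE_SIZE = 200      # assumption: the real server maximum is not documented here


def pagination_info(total: int, page: int = 1, page_size=None) -> dict:
    """Compute PaginationInfo-style metadata for a result set of `total` items."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if page_size is None:
        size = DEFAULT_PAGE_SIZE
    elif page_size < 1:
        raise ValueError("page_size must be >= 1")
    else:
        size = min(page_size, MAX_PAGE_SIZE)  # oversized values are clamped
    total_pages = math.ceil(total / size)     # 0 when total == 0 (assumed)
    return {
        "page": page,
        "page_size": size,
        "total": total,
        "total_pages": total_pages,
        "has_next": page < total_pages,
        "has_prev": page > 1,
    }


first_page = pagination_info(42)
second_of_five = pagination_info(42, page=2, page_size=10)
```

With 42 items and the assumed default of 50, first_page reproduces the pagination object from the example response.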

Domain Comparison

GET /api/v1/quality/domains

Compare quality performance across extraction domains. Returns aggregated quality metrics for each domain, sorted by average total score descending.

Example

curl "http://localhost:8080/api/v1/quality/domains"

Response

200 OK

One entry per domain, sorted by avg_total_score descending:

{
  "domains": [
    {
      "domain": "science",
      "source_count": 15,
      "avg_total_score": 5200.8,
      "avg_entity_quality": 62.4,
      "avg_relationship_quality": 51.3,
      "avg_connectivity_ratio": 0.78,
      "total_entities": 680,
      "total_relationships": 920
    }
  ]
}

DomainComparisonResponse

Field    Type                         Description
domains  DomainPerformanceResponse[]  Performance metrics per domain

DomainPerformanceResponse

Field                     Type    Description
domain                    string  Domain name
source_count              int     Number of sources in this domain
avg_total_score           float   Average total score
avg_entity_quality        float   Average entity quality
avg_relationship_quality  float   Average relationship quality
avg_connectivity_ratio    float   Average connectivity ratio
total_entities            int     Total entities across all sources
total_relationships       int     Total relationships across all sources
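
To make the per-domain rollup concrete, here is a sketch that groups per-source scores by domain and orders the result by avg_total_score descending, as described above. It assumes unweighted means per domain and skips sources whose domain is null; both are guesses about server behavior. The input rows are hypothetical, and compare_domains is an illustrative name.

```python
from collections import defaultdict


def _mean(values: list) -> float:
    return sum(values) / len(values)


def compare_domains(sources: list[dict]) -> list[dict]:
    """Roll per-source scores up into DomainPerformanceResponse-style rows."""
    by_domain = defaultdict(list)
    for source in sources:
        if source.get("domain") is not None:  # assumption: null domains are skipped
            by_domain[source["domain"]].append(source)

    rows = []
    for domain, group in by_domain.items():
        rows.append({
            "domain": domain,
            "source_count": len(group),
            "avg_total_score": _mean([s["total_score"] for s in group]),
            "avg_entity_quality": _mean([s["avg_entity_quality"] for s in group]),
            "avg_relationship_quality": _mean(
                [s["avg_relationship_quality"] for s in group]
            ),
            "avg_connectivity_ratio": _mean([s["connectivity_ratio"] for s in group]),
            "total_entities": sum(s["entity_count"] for s in group),
            "total_relationships": sum(s["relationship_count"] for s in group),
        })
    # Documented ordering: highest average total score first.
    rows.sort(key=lambda row: row["avg_total_score"], reverse=True)
    return rows


# Hypothetical per-source scores, trimmed to the fields used here.
sample_sources = [
    {"domain": "news", "total_score": 3000.0, "avg_entity_quality": 50.0,
     "avg_relationship_quality": 40.0, "connectivity_ratio": 0.5,
     "entity_count": 20, "relationship_count": 25},
    {"domain": "science", "total_score": 6000.0, "avg_entity_quality": 64.0,
     "avg_relationship_quality": 52.0, "connectivity_ratio": 0.75,
     "entity_count": 40, "relationship_count": 60},
    {"domain": "science", "total_score": 4000.0, "avg_entity_quality": 60.0,
     "avg_relationship_quality": 48.0, "connectivity_ratio": 0.25,
     "entity_count": 30, "relationship_count": 35},
]
ranking = compare_domains(sample_sources)
```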

Database Summary

GET /api/v1/quality/summary

Get an overall quality summary for the entire database. Provides high-level statistics and identifies the top 5 and bottom 5 sources by total score.

Example

curl "http://localhost:8080/api/v1/quality/summary"

Response

200 OK

{
  "total_sources": 42,
  "total_entities": 1490,
  "total_relationships": 1875,
  "avg_total_score": 4150.6,
  "avg_entity_quality": 55.3,
  "avg_relationship_quality": 45.9,
  "avg_quality_grade": 62.1,
  "avg_connectivity_ratio": 0.66,
  "top_sources": [
    {
      "source_id": "src-top1",
      "source_title": "Comprehensive Biology Textbook",
      "...": "... same schema as SourceQualityScoreResponse ..."
    }
  ],
  "bottom_sources": [
    {
      "source_id": "src-bottom1",
      "source_title": "Brief Meeting Notes",
      "...": "... same schema as SourceQualityScoreResponse ..."
    }
  ]
}

QualitySummaryResponse

Field                     Type                          Description
total_sources             int                           Total sources with extractions
total_entities            int                           Total entities extracted
total_relationships       int                           Total relationships extracted
avg_total_score           float                         Average total score
avg_entity_quality        float                         Average entity quality
avg_relationship_quality  float                         Average relationship quality
avg_quality_grade         float                         Average quality grade (0–100)
avg_connectivity_ratio    float                         Average connectivity ratio
top_sources               SourceQualityScoreResponse[]  Top 5 sources by total score
bottom_sources            SourceQualityScoreResponse[]  Bottom 5 sources by total score
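
The top and bottom lists amount to ranking sources by total_score and taking five from each end. A minimal sketch follows. The ordering inside bottom_sources (lowest first here) and the behavior when fewer than ten sources exist (the lists overlap here) are assumptions, and top_and_bottom is a hypothetical helper.

```python
def top_and_bottom(sources: list[dict], n: int = 5) -> tuple[list[dict], list[dict]]:
    """Return (top_n, bottom_n) ranked by total_score."""
    ranked = sorted(sources, key=lambda s: s["total_score"], reverse=True)
    top = ranked[:n]
    # Take the n lowest and flip them so the weakest source comes first (assumed order).
    bottom = ranked[-n:][::-1] if n > 0 else []
    return top, bottom


# Hypothetical scores for seven sources.
scored = [{"source_id": f"src-{i}", "total_score": float(i * 1000)} for i in range(1, 8)]
top_sources, bottom_sources = top_and_bottom(scored)
top_ids = [s["source_id"] for s in top_sources]
bottom_ids = [s["source_id"] for s in bottom_sources]
```

With only seven sources, src-3 through src-5 appear in both lists, which is worth keeping in mind when reading the summary for a small database. Because total_score is quantity-driven, these lists favor large sources; avg_quality_grade is the volume-independent figure.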