
LLM Monitoring

Monitor and manage the LLM task queue -- view statistics, inspect running tasks, cancel operations, and diagnose stuck queues.

All endpoints are prefixed with /api/v1/llm.


Statistics

Get LLM Queue Stats

GET /api/v1/llm/stats

Returns current LLM queue statistics including queue depth, token usage, cost tracking, estimated completion times, and semaphore state.

```bash
curl http://localhost:8080/api/v1/llm/stats
```

Response 200 OK -- LLMStatsResponse

```json
{
  "data": {
    "queues": [
      {
        "queue": "llm",
        "queued": 3,
        "running": 1,
        "workers": 1,
        "max_depth": 100,
        "depth_percent": 4.0,
        "total_input_tokens": 15200,
        "total_output_tokens": 4800,
        "total_cost_usd": 0.032
      }
    ],
    "total_queued": 3,
    "total_cost_usd": 0.032,
    "total_input_tokens": 15200,
    "total_output_tokens": 4800,
    "total_tokens": 20000,
    "estimated_completion_time_seconds": 45,
    "estimated_completion_time_human": "45s",
    "estimated_completion_times_human": {
      "llm": "45s",
      "operations": "0s"
    },
    "semaphore_stats": {
      "max_concurrent": 1,
      "reserved_high_priority": 0,
      "active_count": 1,
      "active_high_priority": 0,
      "active_low_priority": 1,
      "waiting_high_priority": 0,
      "waiting_low_priority": 2,
      "total_high_priority": 42,
      "total_low_priority": 118,
      "avg_wait_time_high": 0.05,
      "avg_wait_time_low": 12.3
    }
  }
}
```

Errors:

| Status | Description |
|---|---|
| 503 | LLM queue service unavailable |
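The stats payload is easiest to act on when it is turned into a few short findings. The following is a minimal sketch, assuming the server runs at http://localhost:8080 as in the curl example. The function names fetch_stats and summarize_stats and the 80% depth threshold are hypothetical, chosen for illustration, and are not part of the API. The "waiters but no active slots" check is a heuristic only.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://localhost:8080/api/v1/llm"  # assumed deployment, as in the curl example


def fetch_stats(base_url: str = BASE_URL) -> dict[str, Any]:
    """GET /stats and return the object under the top-level "data" key."""
    with urllib.request.urlopen(f"{base_url}/stats", timeout=10) as resp:
        return json.load(resp)["data"]


def summarize_stats(stats: dict[str, Any], depth_alert_percent: float = 80.0) -> list[str]:
    """Turn an LLMStatsResponse "data" object into short findings.

    depth_alert_percent is an illustrative threshold, not a server setting.
    """
    findings = []
    for q in stats.get("queues", []):
        if q["depth_percent"] >= depth_alert_percent:
            findings.append(f"queue {q['queue']} is {q['depth_percent']:.0f}% full")

    # Heuristic: tasks waiting on the semaphore while no slot is active
    # may point at orphaned waiters (see Diagnostics).
    sem = stats.get("semaphore_stats", {})
    waiting = sem.get("waiting_high_priority", 0) + sem.get("waiting_low_priority", 0)
    if waiting and sem.get("active_count", 0) == 0:
        findings.append(f"{waiting} waiter(s) but no active semaphore slots")

    # Blended cost across input and output tokens.
    total_tokens = stats.get("total_tokens", 0)
    if total_tokens:
        per_1k = stats["total_cost_usd"] / total_tokens * 1000
        findings.append(f"blended cost per 1k tokens: ${per_1k:.4f}")
    return findings
```

Running summarize_stats(fetch_stats()) against the example response above reports only the blended cost, about $0.0016 per 1k tokens. The queue is 4% full, and one semaphore slot is active.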

Clear Stats

DELETE /api/v1/llm/stats

Clear all LLM queue statistics and remove old completed tasks from Valkey. Also clears workflow stats if available.

```bash
# Clear tasks older than 48 hours
curl -X DELETE "http://localhost:8080/api/v1/llm/stats?older_than_hours=48"

# Clear with default (24 hours)
curl -X DELETE http://localhost:8080/api/v1/llm/stats
```
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| older_than_hours | integer | No | 24 | Remove completed tasks older than this many hours. Min: 0, Max: 8760. |

Response 204 No Content -- empty body.

Errors:

| Status | Description |
|---|---|
| 422 | Validation error (e.g., older_than_hours out of range) |
| 503 | LLM queue service unavailable |
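A client can check older_than_hours against the documented 0 to 8760 range before it sends the request. Out-of-range values then fail locally and never reach the server as a 422. The following is a minimal sketch that assumes the localhost:8080 base URL from the examples. The helper names clear_stats_url and clear_stats are hypothetical.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080/api/v1/llm"  # assumed deployment, as in the curl example


def clear_stats_url(older_than_hours: int = 24, base_url: str = BASE_URL) -> str:
    """Build the DELETE /stats URL, checking the documented 0..8760 range first."""
    # Failing early here avoids a round trip that would end in a 422.
    if not 0 <= older_than_hours <= 8760:
        raise ValueError("older_than_hours must be between 0 and 8760")
    query = urllib.parse.urlencode({"older_than_hours": older_than_hours})
    return f"{base_url}/stats?{query}"


def clear_stats(older_than_hours: int = 24, base_url: str = BASE_URL) -> None:
    """Send the DELETE; urlopen raises HTTPError for 422 or 503 responses."""
    req = urllib.request.Request(clear_stats_url(older_than_hours, base_url), method="DELETE")
    with urllib.request.urlopen(req, timeout=10):
        pass  # 204 No Content: there is no body to read
```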

Tasks

List LLM Tasks

GET /api/v1/llm/tasks

List currently queued and running LLM tasks. Completed and failed tasks are excluded.

```bash
curl http://localhost:8080/api/v1/llm/tasks
```

Response 200 OK -- LLMTasksResponse

```json
{
  "data": [
    {
      "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "queue": "llm",
      "operation": "chat_completion",
      "status": "running",
      "priority": "10",
      "created_at": "2026-03-09T14:30:00.000000+00:00",
      "started_at": "2026-03-09T14:30:01.500000+00:00",
      "metadata": "{\"source\": \"interactive_chat\"}",
      "attempts": "1"
    },
    {
      "task_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
      "queue": "llm",
      "operation": "generate_embedding",
      "status": "queued",
      "priority": "50",
      "created_at": "2026-03-09T14:30:05.000000+00:00",
      "metadata": "{}",
      "attempts": "0"
    }
  ]
}
```

Errors:

| Status | Description |
|---|---|
| 503 | LLM queue service unavailable |
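In the example response, priority and attempts are returned as strings and metadata is a JSON-encoded string, so clients usually need to convert them. The following is a minimal sketch, based only on the example payload, that parses each entry into typed fields. It then flags running tasks that started more than a chosen number of seconds ago, which is one way to spot a stuck queue. The LLMTask class, the parse_task and long_running helpers, and the threshold are all hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class LLMTask:
    task_id: str
    queue: str
    operation: str
    status: str
    priority: int
    attempts: int
    created_at: datetime
    started_at: Optional[datetime]
    metadata: dict[str, Any]


def parse_task(raw: dict[str, str]) -> LLMTask:
    """Convert one entry of the "data" array into typed fields.

    In the example payload, priority and attempts are strings, metadata is a
    JSON-encoded string, and started_at is absent for queued tasks.
    """
    started = raw.get("started_at")
    return LLMTask(
        task_id=raw["task_id"],
        queue=raw["queue"],
        operation=raw["operation"],
        status=raw["status"],
        priority=int(raw["priority"]),
        attempts=int(raw["attempts"]),
        created_at=datetime.fromisoformat(raw["created_at"]),
        started_at=datetime.fromisoformat(started) if started else None,
        metadata=json.loads(raw.get("metadata") or "{}"),
    )


def long_running(tasks: list[LLMTask], threshold_s: float,
                 now: Optional[datetime] = None) -> list[str]:
    """IDs of running tasks that started more than threshold_s seconds ago."""
    now = now or datetime.now(timezone.utc)
    return [
        t.task_id
        for t in tasks
        if t.status == "running"
        and t.started_at is not None
        and (now - t.started_at).total_seconds() > threshold_s
    ]
```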

Get Task Status

GET /api/v1/llm/tasks/{task_id}

Get the status of a specific LLM task by its ID.

```bash
curl http://localhost:8080/api/v1/llm/tasks/a1b2c3d4-e5f6-7890-abcd-ef1234567890
```

| Parameter | Type | Required | Description |
|---|---|---|---|
| task_id | string (path) | Yes | UUID of the task to inspect |

Response 200 OK -- LLMTaskStatusResponse

```json
{
  "data": {
    "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "queue": "llm",
    "operation": "chat_completion",
    "status": "running",
    "priority": "10",
    "created_at": "2026-03-09T14:30:00.000000+00:00",
    "started_at": "2026-03-09T14:30:01.500000+00:00",
    "data": "{\"messages\": [...], \"task_type\": \"CHAT\"}",
    "metadata": "{\"source\": \"interactive_chat\"}",
    "result_ttl": "3600",
    "attempts": "1"
  }
}
```

Task statuses: queued, running, completed, failed, cancelled

Errors:

| Status | Description |
|---|---|
| 404 | Task not found |
| 503 | LLM queue service unavailable |
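A common use of this endpoint is to poll until a task reaches one of the terminal statuses listed above (completed, failed, or cancelled). The following is a minimal sketch, assuming the localhost:8080 base URL. The get_task and wait_for_task helpers, the 2-second interval, and the 5-minute timeout are hypothetical choices. A 404 is reported as None instead of being guessed at. The fetch and sleep parameters exist so the loop can be exercised without a server.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

BASE_URL = "http://localhost:8080/api/v1/llm"  # assumed deployment, as in the curl example
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def get_task(task_id: str, base_url: str = BASE_URL) -> Optional[dict]:
    """GET /tasks/{task_id}; returns None on 404 and re-raises other HTTP errors."""
    try:
        with urllib.request.urlopen(f"{base_url}/tasks/{task_id}", timeout=10) as resp:
            return json.load(resp)["data"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def wait_for_task(
    task_id: str,
    fetch: Callable[[str], Optional[dict]] = get_task,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until the task reaches a terminal status and return that status.

    Returns None if the task is not found; raises TimeoutError after timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch(task_id)
        if task is None:
            return None
        if task["status"] in TERMINAL_STATUSES:
            return task["status"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {task['status']} after {timeout_s}s")
        sleep(interval_s)
```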

Cancel Task

DELETE /api/v1/llm/tasks/{task_id}

Cancel a specific queued or running LLM task.

```bash
curl -X DELETE http://localhost:8080/api/v1/llm/tasks/a1b2c3d4-e5f6-7890-abcd-ef1234567890
```

| Parameter | Type | Required | Description |
|---|---|---|---|
| task_id | string (path) | Yes | UUID of the task to cancel |

Response 204 No Content -- empty body.

Errors:

| Status | Description |
|---|---|
| 400 | Task could not be cancelled (not found or already completed) |
| 503 | LLM queue service unavailable |
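This endpoint returns 400 rather than 404 for an unknown task, so a client that only wants to know whether the cancel took effect can map 204 to True and 400 to False. Other errors, such as 503, should still surface. The following is a minimal sketch, assuming the localhost:8080 base URL. The cancel_task name and its opener parameter are hypothetical. The opener parameter defaults to urllib's urlopen and is there so the function can be tested without a server.

```python
import urllib.error
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8080/api/v1/llm"  # assumed deployment, as in the curl example


def cancel_task(task_id: str, base_url: str = BASE_URL,
                opener: Callable = urllib.request.urlopen) -> bool:
    """DELETE /tasks/{task_id}.

    Returns True on success (204) and False on 400, which the API uses for
    "not found or already completed". Other errors such as 503 propagate.
    """
    req = urllib.request.Request(f"{base_url}/tasks/{task_id}", method="DELETE")
    try:
        # urlopen raises HTTPError for 4xx/5xx, so reaching the body means success.
        with opener(req, timeout=10):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 400:
            return False
        raise
```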

Cancel All Tasks

DELETE /api/v1/llm/tasks

Bulk cancel all queued and running LLM tasks.

```bash
curl -X DELETE http://localhost:8080/api/v1/llm/tasks
```

Response 200 OK -- CancelAllTasksResponse

```json
{
  "data": {
    "cancelled": 5,
    "message": "Task cancellation requested for LLM queue"
  }
}
```

Errors:

| Status | Description |
|---|---|
| 503 | LLM queue service unavailable |
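The response message says cancellation was "requested," which suggests that running tasks may not disappear immediately. One cautious pattern is to issue the bulk cancel and then poll GET /api/v1/llm/tasks until it returns an empty list or a timeout expires. The following is a minimal sketch of that pattern. The polling step, the cancel_all_and_wait and _call helpers, and the 30-second timeout are assumptions, not documented behavior. The opener and sleep parameters allow testing without a server.

```python
import json
import time
import urllib.request
from typing import Any, Callable

BASE_URL = "http://localhost:8080/api/v1/llm"  # assumed deployment, as in the curl example


def _call(method: str, path: str, opener: Callable, base_url: str = BASE_URL) -> Any:
    """Send a request and return the "data" member of the JSON envelope."""
    req = urllib.request.Request(f"{base_url}{path}", method=method)
    with opener(req, timeout=10) as resp:
        return json.load(resp)["data"]


def cancel_all_and_wait(
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    opener: Callable = urllib.request.urlopen,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, list]:
    """DELETE /tasks, then poll GET /tasks until it is empty or timeout_s passes.

    Returns (cancelled, still_active); a non-empty still_active list means
    some tasks were still queued or running when the timeout expired.
    """
    cancelled = _call("DELETE", "/tasks", opener)["cancelled"]
    deadline = time.monotonic() + timeout_s
    still_active = _call("GET", "/tasks", opener)
    while still_active and time.monotonic() < deadline:
        sleep(interval_s)
        still_active = _call("GET", "/tasks", opener)
    return cancelled, still_active
```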

Diagnostics

Clear Semaphore

DELETE /api/v1/llm/semaphore

Clear all waiting tasks from the LLM priority semaphore queues. Use this when Valkey queues have been cleared but the semaphore still has orphaned waiters that will never complete.

Warning: This can cause a deadlock if workers are actively waiting. Use it only when the Valkey queues have been cleared and no workers are actively processing tasks. If unsure, restart the backend instead.

```bash
curl -X DELETE http://localhost:8080/api/v1/llm/semaphore
```

Response 200 OK -- ClearSemaphoreResponse

```json
{
  "data": {
    "high_priority_cleared": 0,
    "low_priority_cleared": 3,
    "total_cleared": 3
  }
}
```

Errors:

| Status | Description |
|---|---|
| 503 | LLM queue service unavailable |
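Before calling this endpoint, you can use GET /api/v1/llm/stats to check whether the situation matches the one described above. That situation is nothing queued, nothing running, and no active semaphore slot, yet tasks are still waiting on the semaphore. The following is a minimal, heuristic sketch of that check. The orphaned_waiters name and the exact conditions are assumptions derived from the warning, not a server-side rule.

```python
from typing import Any


def orphaned_waiters(stats: dict[str, Any]) -> int:
    """Return how many semaphore waiters look orphaned, or 0 if clearing looks unsafe.

    `stats` is the "data" object from GET /stats. Heuristic: nothing queued,
    nothing running, and no active semaphore slot, yet tasks still waiting.
    """
    sem = stats["semaphore_stats"]
    waiting = sem["waiting_high_priority"] + sem["waiting_low_priority"]
    running = sum(q["running"] for q in stats["queues"])
    if stats["total_queued"] == 0 and running == 0 and sem["active_count"] == 0:
        return waiting
    return 0
```

The stats are a point-in-time snapshot, so a non-zero result only suggests that clearing the semaphore may be safe. The warning above still applies.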

Response Models

LLMStatsResponse

| Field | Type | Description |
|---|---|---|
| data.queues | array | Per-queue statistics (queue name, depth, workers, token usage, cost) |
| data.total_queued | integer | Total tasks waiting across all LLM queues |
| data.total_cost_usd | float | Cumulative estimated cost in USD |
| data.total_input_tokens | integer | Cumulative input tokens processed |
| data.total_output_tokens | integer | Cumulative output tokens generated |
| data.total_tokens | integer | Sum of input and output tokens |
| data.estimated_completion_time_seconds | integer | Estimated seconds to drain the LLM queue |
| data.estimated_completion_time_human | string | Human-readable estimate (e.g., "2m 30s") |
| data.estimated_completion_times_human | object | Per-queue-type estimates (llm, operations) |
| data.semaphore_stats | object | Real-time semaphore state (active slots, waiting counts, cumulative totals, and average wait times per priority) |

LLMTasksResponse

| Field | Type | Description |
|---|---|---|
| data | array | List of active task objects (queued and running only) |

LLMTaskStatusResponse

| Field | Type | Description |
|---|---|---|
| data | object | Full task metadata including task_id, queue, operation, status, priority, created_at, started_at, data, metadata, result_ttl, and attempts |

CancelAllTasksResponse

| Field | Type | Description |
|---|---|---|
| data.cancelled | integer | Number of tasks that were cancelled |
| data.message | string | Human-readable result message |

ClearSemaphoreResponse

| Field | Type | Description |
|---|---|---|
| data.high_priority_cleared | integer | Number of high-priority waiters cleared |
| data.low_priority_cleared | integer | Number of low-priority waiters cleared |
| data.total_cleared | integer | Total waiters cleared |
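Typed clients often mirror these models as small classes. The following is a minimal sketch using Python dataclasses for the two simplest response models. The parse helper and the class layout are hypothetical. The total_cleared check assumes, from the field descriptions, that the total equals the high-priority count plus the low-priority count. Passing unknown keys to a dataclass raises TypeError, which surfaces schema changes early.

```python
from dataclasses import dataclass
from typing import Any, Type, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class CancelAllTasksResponse:
    cancelled: int
    message: str


@dataclass(frozen=True)
class ClearSemaphoreResponse:
    high_priority_cleared: int
    low_priority_cleared: int
    total_cleared: int

    def __post_init__(self) -> None:
        # Assumed invariant: the total is the sum of the two per-priority counts.
        if self.total_cleared != self.high_priority_cleared + self.low_priority_cleared:
            raise ValueError("total_cleared != high_priority_cleared + low_priority_cleared")


def parse(model: Type[T], envelope: dict[str, Any]) -> T:
    """Build a response model from the object under the envelope's "data" key.

    Unknown keys raise TypeError from the generated __init__.
    """
    return model(**envelope["data"])
```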