LLM Monitoring

Monitor and manage the LLM task queue -- view statistics, inspect running tasks, cancel operations, and diagnose stuck queues.

All endpoints are prefixed with /api/v1/llm.

Statistics

Get LLM Queue Stats

GET /api/v1/llm/stats

Returns current LLM queue statistics including queue depth, token usage, cost tracking, estimated completion times, and semaphore state.

curl http://localhost/api/v1/llm/stats

Response 200 OK -- LLMStatsResponse

{
  "data": {
    "queues": [
      {
        "queue": "llm",
        "queued": 3,
        "running": 1,
        "workers": 1,
        "max_depth": 100,
        "depth_percent": 4.0,
        "total_input_tokens": 15200,
        "total_output_tokens": 4800,
        "total_cost_usd": 0.032
      }
    ],
    "total_queued": 3,
    "total_cost_usd": 0.032,
    "total_input_tokens": 15200,
    "total_output_tokens": 4800,
    "total_tokens": 20000,
    "estimated_completion_time_seconds": 45,
    "estimated_completion_time_human": "45s",
    "estimated_completion_times_human": {
      "llm": "45s",
      "operations": "0s"
    },
    "semaphore_stats": {
      "max_concurrent": 1,
      "reserved_high_priority": 0,
      "active_count": 1,
      "active_high_priority": 0,
      "active_low_priority": 1,
      "waiting_high_priority": 0,
      "waiting_low_priority": 2,
      "total_high_priority": 42,
      "total_low_priority": 118,
      "avg_wait_time_high": 0.05,
      "avg_wait_time_low": 12.3
    }
  }
}

Errors:

Status	Description
`503`	LLM queue service unavailable

Clear Stats

DELETE /api/v1/llm/stats

Clear all LLM queue statistics and remove old completed tasks from Valkey. Also clears workflow stats if available.

# Clear tasks older than 48 hours
curl -X DELETE "http://localhost/api/v1/llm/stats?older_than_hours=48"

# Clear with default (24 hours)
curl -X DELETE http://localhost/api/v1/llm/stats

Parameter	Type	Required	Default	Description
`older_than_hours`	integer	No	`24`	Remove completed tasks older than this many hours. Min: `0`, Max: `8760`.

Response 204 No Content -- empty body.

Errors:

Status	Description
`422`	Validation error (e.g., `older_than_hours` out of range)
`503`	LLM queue service unavailable

Tasks

List LLM Tasks

GET /api/v1/llm/tasks

List currently queued and running LLM tasks. Completed and failed tasks are excluded.

curl http://localhost/api/v1/llm/tasks

Response 200 OK -- LLMTasksResponse

{
  "data": [
    {
      "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "queue": "llm",
      "operation": "chat_completion",
      "status": "running",
      "priority": "10",
      "created_at": "2026-03-09T14:30:00.000000+00:00",
      "started_at": "2026-03-09T14:30:01.500000+00:00",
      "metadata": "{\"source\": \"interactive_chat\"}",
      "attempts": "1"
    },
    {
      "task_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
      "queue": "llm",
      "operation": "generate_embedding",
      "status": "queued",
      "priority": "50",
      "created_at": "2026-03-09T14:30:05.000000+00:00",
      "metadata": "{}",
      "attempts": "0"
    }
  ]
}

Errors:

Status	Description
`503`	LLM queue service unavailable

Get Task Status

GET /api/v1/llm/tasks/{task_id}

Get the status of a specific LLM task by its ID.

curl http://localhost/api/v1/llm/tasks/a1b2c3d4-e5f6-7890-abcd-ef1234567890

Parameter	Type	Required	Description
`task_id`	string (path)	Yes	UUID of the task to inspect

Response 200 OK -- LLMTaskStatusResponse

{
  "data": {
    "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "queue": "llm",
    "operation": "chat_completion",
    "status": "running",
    "priority": "10",
    "created_at": "2026-03-09T14:30:00.000000+00:00",
    "started_at": "2026-03-09T14:30:01.500000+00:00",
    "data": "{\"messages\": [...], \"task_type\": \"CHAT\"}",
    "metadata": "{\"source\": \"interactive_chat\"}",
    "result_ttl": "3600",
    "attempts": "1"
  }
}

Task statuses: queued, running, completed, failed, cancelled

Errors:

Status	Description
`404`	Task not found
`503`	LLM queue service unavailable

Cancel Task

DELETE /api/v1/llm/tasks/{task_id}

Cancel a specific queued or running LLM task.

curl -X DELETE http://localhost/api/v1/llm/tasks/a1b2c3d4-e5f6-7890-abcd-ef1234567890

Parameter	Type	Required	Description
`task_id`	string (path)	Yes	UUID of the task to cancel

Response 204 No Content -- empty body.

Errors:

Status	Description
`400`	Task could not be cancelled (not found or already completed)
`503`	LLM queue service unavailable

Cancel All Tasks

DELETE /api/v1/llm/tasks

Bulk cancel all queued and running LLM tasks.

curl -X DELETE http://localhost/api/v1/llm/tasks

Response 200 OK -- CancelAllTasksResponse

{
  "data": {
    "cancelled": 5,
    "message": "Task cancellation requested for LLM queue"
  }
}

Errors:

Status	Description
`503`	LLM queue service unavailable

Diagnostics

Clear Semaphore

DELETE /api/v1/llm/semaphore

Clear all waiting tasks from the LLM priority semaphore queues. Use this when Valkey queues have been cleared but the semaphore still has orphaned waiters that will never complete.

warning

This can cause deadlock if workers are actively waiting. Only use when Valkey queues have been cleared and no workers are actively processing. If unsure, restart the backend instead.

curl -X DELETE http://localhost/api/v1/llm/semaphore

Response 200 OK -- ClearSemaphoreResponse

{
  "data": {
    "high_priority_cleared": 0,
    "low_priority_cleared": 3,
    "total_cleared": 3
  }
}

Errors:

Status	Description
`503`	LLM queue service unavailable

Response Models

LLMStatsResponse

Field	Type	Description
`data.queues`	array	Per-queue statistics (queue name, depth, workers, token usage, cost)
`data.total_queued`	integer	Total tasks waiting across all LLM queues
`data.total_cost_usd`	float	Cumulative estimated cost in USD
`data.total_input_tokens`	integer	Cumulative input tokens processed
`data.total_output_tokens`	integer	Cumulative output tokens generated
`data.total_tokens`	integer	Sum of input and output tokens
`data.estimated_completion_time_seconds`	integer	Estimated seconds to drain the LLM queue
`data.estimated_completion_time_human`	string	Human-readable estimate (e.g., `"2m 30s"`)
`data.estimated_completion_times_human`	object	Per-queue-type estimates (`llm`, `operations`)
`data.semaphore_stats`	object	Real-time semaphore state (active slots, waiting counts, averages)

LLMTasksResponse

Field	Type	Description
`data`	array	List of active task objects (queued and running only)

LLMTaskStatusResponse

Field	Type	Description
`data`	object	Full task metadata including `task_id`, `queue`, `operation`, `status`, `priority`, `created_at`, `started_at`, `data`, `metadata`, `result_ttl`, and `attempts`

CancelAllTasksResponse

Field	Type	Description
`data.cancelled`	integer	Number of tasks that were cancelled
`data.message`	string	Human-readable result message

ClearSemaphoreResponse

Field	Type	Description
`data.high_priority_cleared`	integer	Number of high-priority waiters cleared
`data.low_priority_cleared`	integer	Number of low-priority waiters cleared
`data.total_cleared`	integer	Total waiters cleared

Statistics​

Get LLM Queue Stats​

Clear Stats​

Tasks​

List LLM Tasks​

Get Task Status​

Cancel Task​

Cancel All Tasks​

Diagnostics​

Clear Semaphore​

Response Models​

LLMStatsResponse​

LLMTasksResponse​

LLMTaskStatusResponse​

CancelAllTasksResponse​

ClearSemaphoreResponse​

Statistics

Get LLM Queue Stats

Clear Stats

Tasks

List LLM Tasks

Get Task Status

Cancel Task

Cancel All Tasks

Diagnostics

Clear Semaphore

Response Models

LLMStatsResponse

LLMTasksResponse

LLMTaskStatusResponse

CancelAllTasksResponse

ClearSemaphoreResponse