LLMs, beginner level

Also known as: JSON schema

JSON Schema

A vocabulary for describing and validating the structure of JSON data, widely used to define the expected output format for LLM function calls.

What Is JSON Schema?

JSON Schema is a declarative vocabulary — itself written in JSON — that describes the structure, types, and constraints of a JSON document. It acts as a contract: any JSON object that conforms to a given schema is guaranteed to have the right shape, types, and required fields.

JSON Schema is defined by an open specification (currently Draft 2020-12) and is supported by validators in virtually every programming language.

A minimal example:

{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "price": { "type": "number", "minimum": 0 },
    "in_stock": { "type": "boolean" },
    "tags": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "price", "in_stock"],
  "additionalProperties": false
}

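This schema describes a product object with three required fields (name, price, in_stock) and one optional array field (tags). Because additionalProperties is false, an object containing any other field fails validation.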

JSON Schema Keywords Reference

Keyword | Purpose | Example
type | Data type | "string", "number", "integer", "boolean", "array", "object", "null"
properties | Object field definitions | {"name": {"type": "string"}}
required | List of mandatory fields | ["name", "price"]
additionalProperties | Whether fields not listed in properties are allowed | false to disallow
enum | Allowed values | ["active", "inactive", "pending"]
minimum / maximum | Number bounds | {"minimum": 0, "maximum": 100}
minLength / maxLength | String length bounds | {"minLength": 1}
items | Array element schema | {"type": "string"}
description | Human-readable field description | LLMs use this for context
$ref | Reference to another schema | Supports schema reuse
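
To make the keywords above concrete, here is a minimal sketch of how a validator might interpret a few of them. It is an illustration only, not how any particular library works: the names Schema, jsonTypeOf, validateSubset, and productSchema are hypothetical, and $ref, description, and most of the specification are left out.

```typescript
// A deliberately small validator for a handful of keywords.
// Production code should use a full validator library instead.
type Schema = {
  type?: "string" | "number" | "integer" | "boolean" | "array" | "object" | "null";
  properties?: { [key: string]: Schema };
  required?: string[];
  additionalProperties?: boolean;
  enum?: unknown[];
  minimum?: number;
  maximum?: number;
  minLength?: number;
  maxLength?: number;
  items?: Schema;
};

// Map a JavaScript value to the JSON type name it would have.
function jsonTypeOf(value: unknown): string {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value;
}

// Returns a list of error messages; an empty list means the value conforms.
function validateSubset(schema: Schema, value: unknown, path: string = "$"): string[] {
  const actual = jsonTypeOf(value);
  if (schema.type !== undefined) {
    const matches =
      schema.type === "integer"
        ? typeof value === "number" && Math.floor(value) === value
        : schema.type === actual;
    // A type mismatch makes the remaining checks meaningless, so stop here.
    if (!matches) return [`${path}: expected ${schema.type}, got ${actual}`];
  }
  const errors: string[] = [];
  // Strict equality only: this sketch supports enums of primitive values.
  if (schema.enum !== undefined && schema.enum.indexOf(value) === -1) {
    errors.push(`${path}: value is not one of the enum values`);
  }
  if (typeof value === "number") {
    if (schema.minimum !== undefined && value < schema.minimum) errors.push(`${path}: below minimum ${schema.minimum}`);
    if (schema.maximum !== undefined && value > schema.maximum) errors.push(`${path}: above maximum ${schema.maximum}`);
  }
  if (typeof value === "string") {
    // Simplification: counts UTF-16 code units rather than Unicode characters.
    if (schema.minLength !== undefined && value.length < schema.minLength) errors.push(`${path}: shorter than minLength ${schema.minLength}`);
    if (schema.maxLength !== undefined && value.length > schema.maxLength) errors.push(`${path}: longer than maxLength ${schema.maxLength}`);
  }
  if (Array.isArray(value)) {
    const itemSchema = schema.items;
    if (itemSchema !== undefined) {
      value.forEach((item, index) => {
        errors.push(...validateSubset(itemSchema, item, `${path}[${index}]`));
      });
    }
  }
  if (actual === "object") {
    const obj = value as { [key: string]: unknown };
    for (const key of schema.required || []) {
      if (!Object.prototype.hasOwnProperty.call(obj, key)) errors.push(`${path}: missing required property "${key}"`);
    }
    for (const key of Object.keys(obj)) {
      const propertySchema = schema.properties ? schema.properties[key] : undefined;
      if (propertySchema !== undefined) {
        errors.push(...validateSubset(propertySchema, obj[key], `${path}.${key}`));
      } else if (schema.additionalProperties === false) {
        errors.push(`${path}: unexpected property "${key}"`);
      }
    }
  }
  return errors;
}

// The product schema from the minimal example earlier on this page.
const productSchema: Schema = {
  type: "object",
  properties: {
    name: { type: "string" },
    price: { type: "number", minimum: 0 },
    in_stock: { type: "boolean" },
    tags: { type: "array", items: { type: "string" } }
  },
  required: ["name", "price", "in_stock"],
  additionalProperties: false
};

// A conforming object produces an empty error list.
const noErrors = validateSubset(productSchema, { name: "Mug", price: 9.5, in_stock: true });
```

Returning a list of messages rather than a boolean mirrors how real validators report every violation with its location, which is what makes validation failures debuggable.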

JSON Schema in LLM Function Calling

When you define a tool or function for an LLM, you describe its parameters using JSON Schema. The LLM uses the schema — including description fields — to understand what each parameter means and how to populate it:

const extractProductTool = {
  type: "function",
  function: {
    name: "extract_product_details",
    description: "Extract structured product information from a web page.",
    parameters: {
      type: "object",
      properties: {
        name: {
          type: "string",
          description: "The full product name as it appears on the page."
        },
        price_usd: {
          type: "number",
          description: "The product price in US dollars. Omit currency symbols."
        },
        availability: {
          type: "string",
          enum: ["in_stock", "out_of_stock", "pre_order"],
          description: "Current availability status."
        },
        features: {
          type: "array",
          items: { type: "string" },
          description: "List of key product features or bullet points."
        }
      },
      required: ["name", "price_usd", "availability"],
      additionalProperties: false
    }
  }
};

The description fields are not just for humans — LLMs read them to understand the semantic meaning of each field, which directly improves extraction accuracy.
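
When the model decides to call the tool, the arguments come back as a JSON string rather than an object; in OpenAI's Chat Completions format they sit in tool_calls[i].function.arguments. The sketch below shows one way to parse and check them by hand. The ToolCall and ProductDetails types, the parseProductArguments helper, and the sample call are all illustrative assumptions, declared locally so the snippet has no dependencies.

```typescript
// Only the fields this sketch reads from an OpenAI-style tool call.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

// Mirrors the parameters schema of extract_product_details.
interface ProductDetails {
  name: string;
  price_usd: number;
  availability: "in_stock" | "out_of_stock" | "pre_order";
  features?: string[];
}

const AVAILABILITY_VALUES = ["in_stock", "out_of_stock", "pre_order"];

function parseProductArguments(call: ToolCall): ProductDetails {
  if (call.function.name !== "extract_product_details") {
    throw new Error(`Unexpected tool: ${call.function.name}`);
  }
  // JSON.parse throws if the model produced malformed JSON.
  const parsed: unknown = JSON.parse(call.function.arguments);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Arguments must be a JSON object");
  }
  const { name, price_usd, availability, features } = parsed as { [key: string]: unknown };

  // Hand-written checks for the required fields; extra properties are not
  // rejected here, which a full schema validator would do.
  if (typeof name !== "string") throw new Error("name must be a string");
  if (typeof price_usd !== "number") throw new Error("price_usd must be a number");
  if (typeof availability !== "string" || AVAILABILITY_VALUES.indexOf(availability) === -1) {
    throw new Error("availability must be one of the enum values");
  }
  let featureList: string[] | undefined;
  if (features !== undefined) {
    if (!Array.isArray(features) || !features.every((f) => typeof f === "string")) {
      throw new Error("features must be an array of strings");
    }
    featureList = features as string[];
  }
  return {
    name,
    price_usd,
    availability: availability as ProductDetails["availability"],
    features: featureList
  };
}

// A hand-built tool call standing in for a real model response.
const sampleCall: ToolCall = {
  id: "call_example",
  type: "function",
  function: {
    name: "extract_product_details",
    arguments: '{"name":"Trail Mug","price_usd":12.5,"availability":"in_stock"}'
  }
};

const product = parseProductArguments(sampleCall);
```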

JSON Schema for Structured Output

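With strict: true, OpenAI's response_format of type json_schema uses constrained decoding so that each generated token keeps the output valid against your schema. Strict mode accepts only a subset of JSON Schema: every property must be listed in required, and each object must set additionalProperties to false: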

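import OpenAI from "openai";

// The client reads the API key from the OPENAI_API_KEY environment variable.
const openai = new OpenAI();

const response = await openai.chat.completions.create({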
  model: "gpt-4o",
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "knowledge_item",
      strict: true,
      schema: {
        type: "object",
        properties: {
          title: { type: "string", description: "Page title" },
          summary: { type: "string", description: "2-3 sentence summary" },
          category: {
            type: "string",
            enum: ["product", "documentation", "blog", "pricing", "other"]
          },
          key_facts: {
            type: "array",
            items: { type: "string" },
            description: "Key extractable facts from the page"
          }
        },
        required: ["title", "summary", "category", "key_facts"],
        additionalProperties: false
      }
    }
  },
  // scrapedContent is the page text to analyze, obtained elsewhere.
  messages: [{ role: "user", content: scrapedContent }]
});
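
The structured result arrives as a JSON string in choices[0].message.content. Even with strict mode the content is not always usable: the model can refuse (reported in the message's refusal field), or generation can stop early, for example at the token limit with finish_reason "length". Below is a minimal sketch of reading the result defensively. KnowledgeItem, CompletionLike, and readKnowledgeItem are illustrative names, and the response type is declared locally with only the fields the sketch reads.

```typescript
// Mirrors the knowledge_item schema from the request above.
interface KnowledgeItem {
  title: string;
  summary: string;
  category: "product" | "documentation" | "blog" | "pricing" | "other";
  key_facts: string[];
}

// Only the fields this sketch reads from a Chat Completions response.
interface CompletionLike {
  choices: Array<{
    finish_reason: string;
    message: { content: string | null; refusal?: string | null };
  }>;
}

function readKnowledgeItem(response: CompletionLike): KnowledgeItem {
  const choice = response.choices[0];
  if (choice === undefined) {
    throw new Error("Response contained no choices");
  }
  // A refusal is reported separately and does not follow the schema.
  if (choice.message.refusal) {
    throw new Error(`Model refused: ${choice.message.refusal}`);
  }
  // Anything other than a normal stop may mean truncated, incomplete JSON.
  if (choice.finish_reason !== "stop") {
    throw new Error(`Output incomplete (finish_reason: ${choice.finish_reason})`);
  }
  if (choice.message.content === null) {
    throw new Error("Response had no content");
  }
  // The cast trusts the constrained decoding; validate as well if you cannot.
  return JSON.parse(choice.message.content) as KnowledgeItem;
}

// A hand-built response standing in for a real API result.
const sampleResponse: CompletionLike = {
  choices: [
    {
      finish_reason: "stop",
      message: {
        content:
          '{"title":"Pricing","summary":"Plans and limits.","category":"pricing","key_facts":["Free tier available"]}',
        refusal: null
      }
    }
  ]
};

const item = readKnowledgeItem(sampleResponse);
```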

JSON Schema in KnowledgeSDK

KnowledgeSDK uses JSON Schema internally to ensure all extraction endpoints return consistently typed, validated responses. When you call /v1/extract, the response object is defined by a strict schema — guaranteeing that title, content, category, and metadata fields always have the correct types regardless of the source page.

The TypeScript SDK exposes these response types as TypeScript interfaces, giving you compile-time type safety on top of the runtime JSON Schema validation:

import KnowledgeSDK, { ExtractResult } from "@knowledgesdk/node";

// sdk is an already-initialized KnowledgeSDK client; its construction is not shown here.
const result: ExtractResult = await sdk.extract("https://example.com");
// result.title — string (guaranteed)
// result.content — string (guaranteed)
// result.category — string (guaranteed)

Validating JSON Schema in Your Application

Always validate LLM output against your schema in application code as a second layer of defense:

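import Ajv from "ajv";

// mySchema is the same JSON Schema object you sent to the LLM;
// llmResponse is the raw JSON string the model returned.
// The default Ajv class targets draft-07; use Ajv2020 from "ajv/dist/2020" for Draft 2020-12 schemas.
const ajv = new Ajv({ strict: true });
const validate = ajv.compile(mySchema);

// JSON.parse throws a SyntaxError on malformed JSON, such as truncated output.
const output = JSON.parse(llmResponse);
if (!validate(output)) {
  console.error(validate.errors);
  throw new Error("LLM output failed schema validation");
}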

Combining LLM-side constrained generation with application-side validation gives you reliable, type-safe structured data extraction at scale.
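
In a pipeline that processes many pages, throwing on the first bad response is often less convenient than returning a result the caller can branch on. Here is a minimal sketch of that idea; CheckResult, checkLlmOutput, and hasTitle are hypothetical names, and hasTitle is a hand-written stand-in for a real schema validator. A type-guard function compiled by a validator library should fit the isValid parameter, though that depends on the library's typings.

```typescript
// Either the validated value, or the reason the raw output was rejected.
type CheckResult<T> =
  | { ok: true; value: T }
  | { ok: false; reason: "invalid_json" | "schema_mismatch"; raw: string };

function checkLlmOutput<T>(
  raw: string,
  isValid: (value: unknown) => value is T
): CheckResult<T> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    // Malformed or truncated JSON never reaches the schema check.
    return { ok: false, reason: "invalid_json", raw };
  }
  if (!isValid(parsed)) {
    return { ok: false, reason: "schema_mismatch", raw };
  }
  return { ok: true, value: parsed };
}

// Stand-in validator: accepts any object with a string title.
function hasTitle(value: unknown): value is { title: string } {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { title?: unknown }).title === "string"
  );
}

const checked = checkLlmOutput('{"title":"Docs"}', hasTitle);
if (checked.ok) {
  // Inside this branch the compiler knows checked.value has a string title.
  const heading: string = checked.value.title;
}
```

Separating the two failure reasons lets the caller decide what to do with each, for example logging the raw string for malformed JSON and re-prompting on a schema mismatch.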

Related Terms

Structured Output (LLMs, intermediate)
LLM responses constrained to a specific format, typically JSON, by using function calling, grammar constraints, or guided generation.

Function Calling (AI Agents, beginner)
A mechanism that lets an LLM emit structured JSON naming a function and its arguments for external execution.

Tool Use (AI Agents, beginner)
The ability of an LLM-powered agent to call external functions, APIs, or services to gather information or take actions.