What Is JSON Schema?
JSON Schema is a declarative vocabulary — itself written in JSON — that describes the structure, types, and constraints of a JSON document. It acts as a contract: any JSON object that conforms to a given schema is guaranteed to have the right shape, types, and required fields.
JSON Schema is defined by an open specification (currently Draft 2020-12) and is supported by validators in virtually every programming language.
A minimal example:
{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "price": { "type": "number", "minimum": 0 },
    "in_stock": { "type": "boolean" },
    "tags": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "price", "in_stock"],
  "additionalProperties": false
}
This schema describes a product object with three required fields and one optional array field.
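To make the contract concrete, here is a minimal sketch of what a validator does with this schema. The `check` function and the `MiniSchema` type are hypothetical helpers written for this illustration; they cover only the keywords used in the example and are not a substitute for a real JSON Schema validator.

```typescript
// Illustrative only: a tiny checker for the few keywords used in the product schema.
type MiniSchema = {
  type?: "object" | "array" | "string" | "number" | "boolean";
  properties?: Record<string, MiniSchema>;
  required?: string[];
  additionalProperties?: boolean;
  items?: MiniSchema;
  minimum?: number;
};

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Maps a JavaScript value to its JSON type name.
function jsonTypeOf(value: unknown): string {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value;
}

// Returns a list of human-readable errors; an empty list means the value conforms.
function check(schema: MiniSchema, value: unknown, path: string = "$"): string[] {
  const actual = jsonTypeOf(value);
  if (schema.type !== undefined && actual !== schema.type) {
    // After a type mismatch the remaining checks are meaningless, so stop here.
    return [`${path}: expected ${schema.type}, got ${actual}`];
  }
  const errors: string[] = [];
  if (typeof value === "number" && schema.minimum !== undefined && value < schema.minimum) {
    errors.push(`${path}: must be >= ${schema.minimum}`);
  }
  if (Array.isArray(value) && schema.items !== undefined) {
    const itemSchema = schema.items;
    value.forEach((item: unknown, i: number) => {
      errors.push(...check(itemSchema, item, `${path}[${i}]`));
    });
  }
  if (actual === "object") {
    const obj = value as Record<string, unknown>;
    const props: Record<string, MiniSchema> = schema.properties ?? {};
    for (const key of schema.required ?? []) {
      if (!hasOwn(obj, key)) errors.push(`${path}.${key}: required field is missing`);
    }
    for (const key of Object.keys(obj)) {
      const childSchema = hasOwn(props, key) ? props[key] : undefined;
      if (childSchema !== undefined) {
        errors.push(...check(childSchema, obj[key], `${path}.${key}`));
      } else if (schema.additionalProperties === false) {
        errors.push(`${path}.${key}: additional property not allowed`);
      }
    }
  }
  return errors;
}

const productSchema: MiniSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    price: { type: "number", minimum: 0 },
    in_stock: { type: "boolean" },
    tags: { type: "array", items: { type: "string" } },
  },
  required: ["name", "price", "in_stock"],
  additionalProperties: false,
};

const validErrors = check(productSchema, { name: "Mug", price: 9.5, in_stock: true });
const invalidErrors = check(productSchema, { name: "Mug", price: -1, color: "red" });
console.log(validErrors);   // []
console.log(invalidErrors); // three errors: in_stock missing, price below minimum, color not allowed
```

The second object fails for three independent reasons, which shows how `required`, `minimum`, and `additionalProperties` each contribute a separate constraint.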
JSON Schema Keywords Reference
| Keyword | Purpose | Example |
|---|---|---|
| type | Data type | "string", "number", "boolean", "array", "object", "null" |
| properties | Object field definitions | {"name": {"type": "string"}} |
| required | List of mandatory fields | ["name", "price"] |
| additionalProperties | Allow extra fields | false to disallow |
| enum | Allowed values | ["active", "inactive", "pending"] |
| minimum / maximum | Number bounds | {"minimum": 0, "maximum": 100} |
| minLength / maxLength | String length bounds | {"minLength": 1} |
| items | Array element schema | {"type": "string"} |
| description | Human-readable field description | LLMs use this for context |
| $ref | Reference to another schema | Supports schema reuse |
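These keywords compose. The sketch below combines several of them in a hypothetical order schema and shows how a `$ref` pointing into `$defs` can be followed. The schema, its field names, and the `resolveLocalRef` helper are assumptions made for illustration; the helper handles only same-document references and is not how any particular validator implements resolution.

```typescript
type JsonObject = { [key: string]: unknown };

// Hypothetical schema: "subtotal" and "total" reuse one shared definition via $ref.
const orderSchema: JsonObject = {
  $defs: {
    money: { type: "number", minimum: 0, description: "Amount in US dollars." },
  },
  type: "object",
  properties: {
    status: { type: "string", enum: ["active", "inactive", "pending"] },
    customer: { type: "string", minLength: 1, maxLength: 100 },
    subtotal: { $ref: "#/$defs/money" },
    total: { $ref: "#/$defs/money" },
  },
  required: ["status", "customer", "total"],
  additionalProperties: false,
};

// Follows a same-document reference such as "#/$defs/money" by walking its path segments.
function resolveLocalRef(root: JsonObject, ref: string): unknown {
  if (ref.slice(0, 2) !== "#/") {
    throw new Error(`Only same-document references are handled in this sketch: ${ref}`);
  }
  let node: unknown = root;
  for (const rawSegment of ref.slice(2).split("/")) {
    // JSON Pointer escapes: "~1" stands for "/" and "~0" stands for "~".
    const segment = rawSegment.replace(/~1/g, "/").replace(/~0/g, "~");
    if (typeof node !== "object" || node === null || !(segment in node)) {
      throw new Error(`Unresolvable reference: ${ref}`);
    }
    node = (node as JsonObject)[segment];
  }
  return node;
}

const moneySchema = resolveLocalRef(orderSchema, "#/$defs/money");
console.log(moneySchema);
```

Defining `money` once means a later change, such as adding a `maximum`, applies to every field that references it.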
JSON Schema in LLM Function Calling
When you define a tool or function for an LLM, you describe its parameters using JSON Schema. The LLM uses the schema — including description fields — to understand what each parameter means and how to populate it:
const extractProductTool = {
  type: "function",
  function: {
    name: "extract_product_details",
    description: "Extract structured product information from a web page.",
    parameters: {
      type: "object",
      properties: {
        name: {
          type: "string",
          description: "The full product name as it appears on the page."
        },
        price_usd: {
          type: "number",
          description: "The product price in US dollars. Omit currency symbols."
        },
        availability: {
          type: "string",
          enum: ["in_stock", "out_of_stock", "pre_order"],
          description: "Current availability status."
        },
        features: {
          type: "array",
          items: { type: "string" },
          description: "List of key product features or bullet points."
        }
      },
      required: ["name", "price_usd", "availability"],
      additionalProperties: false
    }
  }
};
The description fields are not just for humans: LLMs read them to infer the semantic meaning of each field, which typically improves extraction accuracy.
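When the model calls the tool, the arguments typically arrive as a JSON string that your code must parse before use. The following sketch shows one way to turn that string into a typed value while re-checking the constraints declared in the schema above. The `parseProductArguments` function, the `ProductDetails` interface, and the sample string are all hypothetical and exist only for this illustration.

```typescript
type Availability = "in_stock" | "out_of_stock" | "pre_order";

// TypeScript mirror of the parameters schema; "features" is optional there too.
interface ProductDetails {
  name: string;
  price_usd: number;
  availability: Availability;
  features?: string[];
}

const ALLOWED_KEYS: readonly string[] = ["name", "price_usd", "availability", "features"];
const AVAILABILITY_VALUES: readonly string[] = ["in_stock", "out_of_stock", "pre_order"];

// Parses a tool-call arguments string and re-checks types, the enum, and extra keys.
function parseProductArguments(argumentsJson: string): ProductDetails {
  const raw: unknown = JSON.parse(argumentsJson); // throws on malformed JSON
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("arguments must be a JSON object");
  }
  const args = raw as Record<string, unknown>;
  for (const key of Object.keys(args)) {
    if (ALLOWED_KEYS.indexOf(key) === -1) throw new Error(`unexpected field: ${key}`);
  }
  const { name, price_usd, availability, features } = args;
  if (typeof name !== "string") throw new Error("name must be a string");
  if (typeof price_usd !== "number") throw new Error("price_usd must be a number");
  if (typeof availability !== "string" || AVAILABILITY_VALUES.indexOf(availability) === -1) {
    throw new Error("availability must be one of: " + AVAILABILITY_VALUES.join(", "));
  }
  if (
    features !== undefined &&
    !(Array.isArray(features) && features.every((f: unknown) => typeof f === "string"))
  ) {
    throw new Error("features must be an array of strings");
  }
  const details: ProductDetails = {
    name,
    price_usd,
    availability: availability as Availability,
  };
  if (features !== undefined) details.features = features as string[];
  return details;
}

// Hypothetical arguments string, shaped like what a model might return for this tool.
const sampleArguments = '{"name":"Trail Mug","price_usd":14.99,"availability":"in_stock"}';
console.log(parseProductArguments(sampleArguments).availability); // "in_stock"
```

Hand-written checks like these become tedious as schemas grow, which is the main argument for compiling the schema with a general-purpose validator instead.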
JSON Schema for Structured Output
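With strict: true, OpenAI's response_format of type json_schema uses constrained decoding, so every generated token keeps the output consistent with your schema. Strict mode accepts only a subset of JSON Schema: every property must be listed in required, and additionalProperties must be false. A response can still be a refusal or be truncated at the token limit, in which case the content is not complete schema-conforming JSON. A request looks like this: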
import OpenAI from "openai";

// The client reads OPENAI_API_KEY from the environment by default.
const openai = new OpenAI();

// scrapedContent is assumed to hold the page text produced earlier in your pipeline.
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "knowledge_item",
      strict: true,
      schema: {
        type: "object",
        properties: {
          title: { type: "string", description: "Page title" },
          summary: { type: "string", description: "2-3 sentence summary" },
          category: {
            type: "string",
            enum: ["product", "documentation", "blog", "pricing", "other"]
          },
          key_facts: {
            type: "array",
            items: { type: "string" },
            description: "Key extractable facts from the page"
          }
        },
        required: ["title", "summary", "category", "key_facts"],
        additionalProperties: false
      }
    }
  },
  messages: [{ role: "user", content: scrapedContent }]
});
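Reading the result takes a little care, because the structured payload arrives as a string and some responses carry no usable payload at all. The sketch below uses local stand-in types for the few response fields it reads (`finish_reason`, `message.content`, `message.refusal`). The `readKnowledgeItem` function and the `KnowledgeItem` interface are hypothetical names introduced for this example.

```typescript
// Minimal local stand-ins for the parts of a chat completion this sketch reads.
interface ChoiceLike {
  finish_reason: string;
  message: { content: string | null; refusal?: string | null };
}
interface CompletionLike {
  choices: ChoiceLike[];
}

// Mirrors the "knowledge_item" schema from the request above.
interface KnowledgeItem {
  title: string;
  summary: string;
  category: string;
  key_facts: string[];
}

// Returns the parsed item, or throws when the response carries no complete payload.
function readKnowledgeItem(completion: CompletionLike): KnowledgeItem {
  const choice = completion.choices[0];
  if (choice === undefined) throw new Error("completion has no choices");
  if (choice.message.refusal) {
    throw new Error("model refused the request: " + choice.message.refusal);
  }
  if (choice.finish_reason === "length") {
    // Output cut off at the token limit is usually incomplete JSON.
    throw new Error("output was truncated before the JSON was complete");
  }
  if (choice.message.content === null) throw new Error("response has no content");
  return JSON.parse(choice.message.content) as KnowledgeItem;
}

// Hand-written stand-in for a successful response, used only to exercise the function.
const fakeCompletion: CompletionLike = {
  choices: [
    {
      finish_reason: "stop",
      message: {
        content: '{"title":"Pricing","summary":"Plans and costs.","category":"pricing","key_facts":["Free tier available"]}',
        refusal: null,
      },
    },
  ],
};
console.log(readKnowledgeItem(fakeCompletion).category); // "pricing"
```

The cast to `KnowledgeItem` only informs the compiler; it performs no runtime check, so it is safe only to the extent that the strict schema or a separate validator enforces the shape.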
JSON Schema in KnowledgeSDK
KnowledgeSDK uses JSON Schema internally to ensure all extraction endpoints return consistently typed, validated responses. When you call /v1/extract, the response object is defined by a strict schema — guaranteeing that title, content, category, and metadata fields always have the correct types regardless of the source page.
The TypeScript SDK exposes these response types as TypeScript interfaces, giving you compile-time type safety on top of the runtime JSON Schema validation:
import KnowledgeSDK, { ExtractResult } from "@knowledgesdk/node";

// `sdk` is assumed to be an already-initialized KnowledgeSDK client; construction is omitted here.
const result: ExtractResult = await sdk.extract("https://example.com");
// result.title: string (guaranteed)
// result.content: string (guaranteed)
// result.category: string (guaranteed)
Validating JSON Schema in Your Application
Always validate LLM output against your schema in application code as a second layer of defense:
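import Ajv from "ajv";

// mySchema is the JSON Schema you sent to the LLM; llmResponse is the raw output string.
const ajv = new Ajv({ strict: true });
const validate = ajv.compile(mySchema);

// JSON.parse throws on malformed JSON, so syntax errors surface before schema validation.
const output = JSON.parse(llmResponse);
if (!validate(output)) {
  // validate.errors lists each failing keyword and the path of the offending value.
  console.error(validate.errors);
  throw new Error("LLM output failed schema validation");
}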
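Raw error arrays are hard to read in logs and even harder to feed back into a retry prompt. As a rough sketch, the errors can be flattened into one line per problem. This assumes the Ajv v8 error shape, where each error has an `instancePath` and an optional `message`; the `describeErrors` helper and the sample error objects are hand-written for illustration.

```typescript
// Local stand-in for the fields of an Ajv v8 error object that this sketch reads.
interface ValidationErrorLike {
  instancePath: string; // JSON Pointer to the failing value; "" means the root
  message?: string;
}

// Flattens validation errors into one readable line per problem.
function describeErrors(errors: ValidationErrorLike[] | null | undefined): string {
  if (!errors || errors.length === 0) return "no validation errors";
  return errors
    .map((e) => `${e.instancePath === "" ? "(root)" : e.instancePath}: ${e.message ?? "invalid"}`)
    .join("\n");
}

// Hand-written sample errors, shaped like what a validator might report.
const feedback = describeErrors([
  { instancePath: "/price", message: "must be >= 0" },
  { instancePath: "", message: "must have required property 'in_stock'" },
]);
console.log(feedback);
// /price: must be >= 0
// (root): must have required property 'in_stock'
```

A summary like this can be logged as is, or appended to a follow-up prompt that asks the model to correct its previous output.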
Combining LLM-side constrained generation with application-side validation gives you reliable, type-safe structured data extraction at scale.