March 20, 2026 · 14 min read

Live Web Data in Google ADK: Private Grounding for AI Agents

Google ADK's built-in google_search only searches the public index. Learn how to add KnowledgeSDK as a custom FunctionTool for private URL grounding and competitor monitoring.

Google's Agent Development Kit (ADK) ships with a built-in google_search grounding tool that connects your agent to Google's public search index. For many tasks, this is sufficient. But there is a category of agent use cases where it fundamentally falls short: when you need to read specific URLs, monitor competitor pages, or ground your agent in content that Google's index may not reflect accurately or completely.

The public search index returns snippets and summaries. It does not give you the full text of a pricing page updated yesterday. It does not let you monitor 20 specific competitor URLs for changes. It does not extract structured data from a page in the format your agent needs.

This tutorial shows you how to register KnowledgeSDK as a custom FunctionTool in Google ADK, build an agent that combines Google Search (for discovery) with KnowledgeSDK (for full content extraction), and deploy a practical competitive intelligence agent.


Google ADK Grounding: What the Built-In Tool Does and Does Not Do

Google ADK's google_search tool (configured via GoogleSearchRetrieval or the google_search tool, depending on model version) grounds your agent in Google Search. When the agent decides to search, it submits a query and receives search result snippets, much like the results you see on a browser search page.

What this enables:

  • Discovery of relevant URLs given a natural language query
  • Access to content from across the public web
  • Up-to-date search index (typically 1–7 days freshness)

What this does not enable:

  • Reading the full content of a specific URL
  • Structured data extraction from a page
  • Monitoring specific pages for changes over time
  • Access to content that Google has not indexed (low-traffic pages, pages requiring JavaScript rendering, recently published content)
  • Guaranteed coverage of specific competitor pages

The solution is a hybrid grounding architecture: Google Search for discovery, KnowledgeSDK for full content extraction from specific URLs.

| Grounding Mode | Discovery | Full Content | Structured Data | Change Monitoring |
|---|---|---|---|---|
| google_search (built-in) | Excellent | No (snippets only) | No | No |
| KnowledgeSDK FunctionTool | No | Yes | Yes | Yes (webhooks) |
| Hybrid (both) | Excellent | Yes | Yes | Yes |

Setup

Install the required packages:

pip install google-adk knowledgesdk

Set your environment variables:

export GOOGLE_API_KEY="your_google_api_key"
export KNOWLEDGESDK_API_KEY="knowledgesdk_live_your_key"
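
If you want to fail fast on a missing key, a small optional guard at the top of your script helps. This is plain Python, not part of either SDK:

import os

# Abort early if either key is absent rather than failing mid-run
for var in ("GOOGLE_API_KEY", "KNOWLEDGESDK_API_KEY"):
    if not os.environ.get(var):
        raise RuntimeError(f"Missing required environment variable: {var}")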

Step 1: Define the KnowledgeSDK FunctionTool

In Google ADK, a custom tool is a plain Python function wrapped in a FunctionTool, which exposes the function's signature and docstring to the model. The model reads that signature and description to decide when and how to call each tool, so write docstrings with the model as your audience.

import os
from typing import Optional
from knowledgesdk import KnowledgeSDK

knowledge_client = KnowledgeSDK(api_key=os.environ["KNOWLEDGESDK_API_KEY"])

def extract_url_content(
    url: str,
    extract_fields: Optional[str] = None,
) -> dict:
    """
    Fetch and extract the full content of a specific URL.
    Use this tool when you have a specific URL you need to read in full,
    including pricing pages, competitor feature lists, and documentation.

    Args:
        url: The full URL to fetch and extract content from.
        extract_fields: Optional comma-separated list of specific fields to extract,
                        e.g. "product name, pricing plans, features list".
                        If not provided, returns full page content as markdown.

    Returns:
        A dict with 'content' (markdown text), 'title', 'url', and optionally 'data'
        (structured extracted fields if extract_fields was provided).
    """
    try:
        if extract_fields:
            result = knowledge_client.extract(
                url=url,
                description=f"Extract the following information: {extract_fields}",
            )
            return {
                "url": url,
                "title": getattr(result, "title", url),
                "data": result.data,
                "content": str(result.data),
                "success": True,
            }
        else:
            result = knowledge_client.scrape(url=url)
            return {
                "url": url,
                "title": getattr(result, "title", url),
                "content": result.markdown,
                "success": True,
            }
    except Exception as e:
        return {
            "url": url,
            "content": "",
            "success": False,
            "error": str(e),
        }


def search_extracted_knowledge(
    query: str,
    limit: int = 5,
) -> dict:
    """
    Search across all previously extracted web content using semantic search.
    Use this when you want to find relevant information from pages you have
    already extracted, without fetching them again.

    Args:
        query: Natural language search query.
        limit: Maximum number of results to return (default 5).

    Returns:
        A dict with 'results' list, each containing 'content', 'url', and 'score'.
    """
    try:
        results = knowledge_client.search(query=query, limit=limit)
        return {
            "results": [
                {
                    "content": item.content,
                    "url": item.url,
                    "score": item.score,
                }
                for item in results.items
            ],
            "total": len(results.items),
            "success": True,
        }
    except Exception as e:
        return {"results": [], "success": False, "error": str(e)}


def screenshot_url(url: str) -> dict:
    """
    Take a screenshot of a URL and return it as a base64 PNG.
    Use this for pages with visual content like charts, dashboards,
    or complex layouts that do not convert well to text.

    Args:
        url: The full URL to screenshot.

    Returns:
        A dict with 'image' (base64 PNG string) and 'url'.
    """
    try:
        result = knowledge_client.screenshot(url=url)
        return {
            "url": url,
            "image": result.image,
            "success": True,
        }
    except Exception as e:
        return {"url": url, "success": False, "error": str(e)}

Step 2: Register the Tools and Build the Agent

from google.adk.agents import Agent
from google.adk.tools import FunctionTool, google_search

# Wrap the KnowledgeSDK functions as ADK FunctionTools
extract_tool = FunctionTool(func=extract_url_content)
search_knowledge_tool = FunctionTool(func=search_extracted_knowledge)
screenshot_tool = FunctionTool(func=screenshot_url)

# Create the agent with both Google Search and KnowledgeSDK tools
competitive_intelligence_agent = Agent(
    name="competitive_intelligence_agent",
    model="gemini-2.0-flash",
    description=(
        "A competitive intelligence agent that monitors competitor websites, "
        "extracts pricing and feature information, and provides analysis."
    ),
    instruction="""You are a competitive intelligence analyst agent.

Your job is to research and monitor competitor websites to extract pricing,
features, and positioning information.

Tool usage guidelines:
- Use google_search when you need to DISCOVER relevant URLs or find general information
- Use extract_url_content when you have a SPECIFIC URL you need to read in full
- Use search_extracted_knowledge when looking for information from pages already extracted
- Use screenshot_url only for pages with charts, graphs, or complex visual layouts

When analyzing competitors:
1. Start with google_search to discover their main pages
2. Use extract_url_content on their pricing page with extract_fields="pricing plans, prices, features per plan"
3. Use extract_url_content on their features/integrations page
4. Synthesize the information into a clear comparison

Always cite the specific URLs you retrieved information from.""",
    tools=[
        google_search,        # Built-in Google Search for discovery
        extract_tool,         # KnowledgeSDK for full content extraction
        search_knowledge_tool, # KnowledgeSDK semantic search over past extractions
        screenshot_tool,      # KnowledgeSDK for visual pages
    ],
)
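
One caveat: some ADK releases restrict Gemini built-in tools such as google_search to agents that carry no other tools. If the configuration above raises an error in your version, the documented workaround is to isolate google_search in its own agent and expose that agent to the main one via AgentTool. A sketch of that variant:

from google.adk.tools.agent_tool import AgentTool

# A dedicated agent that holds only the built-in tool
search_agent = Agent(
    name="web_search_agent",
    model="gemini-2.0-flash",
    description="Searches the public web and returns relevant URLs and snippets.",
    instruction="Answer the query with google_search and list the most relevant URLs.",
    tools=[google_search],
)

# The main agent then calls the search agent like any other tool
competitive_intelligence_agent = Agent(
    name="competitive_intelligence_agent",
    model="gemini-2.0-flash",
    description="Competitive intelligence agent with hybrid grounding.",
    instruction=(
        "You are a competitive intelligence analyst. Use web_search_agent for "
        "discovery, extract_url_content to read specific URLs in full, "
        "search_extracted_knowledge for past extractions, and screenshot_url "
        "for visual pages. Always cite your source URLs."
    ),
    tools=[
        AgentTool(agent=search_agent),  # google_search, one hop removed
        extract_tool,
        search_knowledge_tool,
        screenshot_tool,
    ],
)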

Step 3: Run the Agent

from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai.types import Content, Part

session_service = InMemorySessionService()
runner = Runner(
    agent=competitive_intelligence_agent,
    app_name="competitive_intelligence",
    session_service=session_service,
)

async def run_agent(query: str, session_id: str = "default") -> str:
    """Run the agent and return the final response."""
    # Reuse the session if it already exists so repeated calls keep their history
    session = await session_service.get_session(
        app_name="competitive_intelligence",
        user_id="user_001",
        session_id=session_id,
    )
    if session is None:
        session = await session_service.create_session(
            app_name="competitive_intelligence",
            user_id="user_001",
            session_id=session_id,
        )

    content = Content(role="user", parts=[Part(text=query)])
    final_response = ""

    async for event in runner.run_async(
        user_id="user_001",
        session_id=session_id,
        new_message=content,
    ):
        if event.is_final_response():
            final_response = event.content.parts[0].text

    return final_response

# Example queries
import asyncio

async def main():
    # Discovery + extraction combined
    result = await run_agent(
        "Research the pricing and main features of Linear.app and compare them "
        "to Jira's pricing. I need specific plan names, prices, and key differentiators."
    )
    print(result)

asyncio.run(main())
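
Because run_agent reuses an existing session, follow-up questions under the same session_id build on earlier turns. For example:

async def follow_up():
    # First turn: ground the session in a specific page
    await run_agent(
        "Extract the pricing plans from https://linear.app/pricing",
        session_id="linear_research",
    )
    # Second turn: the agent answers from the same session's context
    answer = await run_agent(
        "Of the plans you just extracted, which ones include SSO?",
        session_id="linear_research",
    )
    print(answer)

asyncio.run(follow_up())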

Step 4: The Competitive Intelligence Agent — Full Use Case

Here is a production-ready agent that monitors 20 competitor pricing pages, stores the extracted data, and detects changes. This is the real-world use case where the combination of Google Search and KnowledgeSDK is most valuable.

import asyncio
import json
import os
from datetime import datetime, timezone
from pathlib import Path
from google.adk.agents import Agent
from google.adk.tools import FunctionTool, google_search
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai.types import Content, Part
from knowledgesdk import KnowledgeSDK

knowledge_client = KnowledgeSDK(api_key=os.environ["KNOWLEDGESDK_API_KEY"])

# List of competitor pricing pages to monitor
COMPETITOR_URLS = [
    "https://linear.app/pricing",
    "https://www.notion.so/pricing",
    "https://monday.com/pricing",
    "https://asana.com/pricing",
    "https://clickup.com/pricing",
    # ... up to 20 URLs
]

PRICING_SCHEMA_DESCRIPTION = (
    "Extract all pricing plans with their names, monthly price, annual price, "
    "currency, key features list, user limits, storage limits, and whether "
    "there is a free tier. Also extract the page title and last updated date if visible."
)

# Storage for extracted pricing data
pricing_data_path = Path("./competitor_pricing.json")

def load_existing_data() -> dict:
    if pricing_data_path.exists():
        return json.loads(pricing_data_path.read_text())
    return {}

def save_pricing_data(data: dict):
    pricing_data_path.write_text(json.dumps(data, indent=2))

def extract_competitor_pricing(url: str) -> dict:
    """Extract pricing from a competitor URL."""
    try:
        result = knowledge_client.extract(
            url=url,
            description=PRICING_SCHEMA_DESCRIPTION,
        )
        return {
            "url": url,
            "data": result.data,
            "extracted_at": datetime.utcnow().isoformat(),
            "success": True,
        }
    except Exception as e:
        return {"url": url, "success": False, "error": str(e)}

def detect_pricing_changes(old_data: dict, new_data: dict) -> list[dict]:
    """Compare old and new pricing data to detect changes."""
    changes = []
    for url in new_data:
        if not new_data[url].get("success"):
            # Skip failed extractions so fetch errors are not flagged as changes
            continue
        if url not in old_data:
            changes.append({"url": url, "type": "new_competitor"})
            continue

        old = json.dumps(old_data[url].get("data", {}), sort_keys=True)
        new = json.dumps(new_data[url].get("data", {}), sort_keys=True)

        if old != new:
            changes.append({
                "url": url,
                "type": "pricing_changed",
                "old_data": old_data[url].get("data"),
                "new_data": new_data[url].get("data"),
            })
    return changes

async def run_monitoring_cycle():
    """Run one full monitoring cycle across all competitor URLs."""
    existing_data = load_existing_data()
    new_data = {}

    print(f"Monitoring {len(COMPETITOR_URLS)} competitor pricing pages...")

    for url in COMPETITOR_URLS:
        print(f"  Extracting: {url}")
        result = extract_competitor_pricing(url)
        new_data[url] = result

    # Detect changes
    changes = detect_pricing_changes(existing_data, new_data)

    if changes:
        print(f"\nDetected {len(changes)} pricing changes:")
        for change in changes:
            print(f"  - {change['url']}: {change['type']}")

        # Save updated data
        save_pricing_data(new_data)

        # Use the agent to generate a change summary
        change_summary = await generate_change_report(changes)
        return change_summary
    else:
        print("No pricing changes detected.")
        save_pricing_data(new_data)
        return "No changes detected in this monitoring cycle."

def generate_change_summary_tool(changes_json: str) -> dict:
    """
    Parse the detected pricing changes and hand them to the agent for analysis.

    Args:
        changes_json: JSON string containing the list of detected pricing changes.

    Returns:
        A dict with the parsed 'changes' list and a 'status' field.
    """
    return {"changes": json.loads(changes_json), "status": "provided for analysis"}

async def generate_change_report(changes: list[dict]) -> str:
    """Use the agent to generate a human-readable change report."""
    session_service = InMemorySessionService()
    summary_tool = FunctionTool(func=generate_change_summary_tool)

    report_agent = Agent(
        name="pricing_change_reporter",
        model="gemini-2.0-flash",
        instruction="""You are a competitive pricing analyst. Given detected pricing changes,
        provide a clear executive summary of what changed, which competitors raised or lowered
        prices, and strategic recommendations for the product team.""",
        tools=[summary_tool],
    )

    runner = Runner(
        agent=report_agent,
        app_name="pricing_monitor",
        session_service=session_service,
    )

    session = await session_service.create_session(
        app_name="pricing_monitor",
        user_id="system",
        session_id="report_session",
    )

    changes_text = json.dumps(changes, indent=2)
    content = Content(
        role="user",
        parts=[Part(text=f"Analyze these pricing changes and provide a report:\n{changes_text}")]
    )

    final_response = ""
    async for event in runner.run_async(
        user_id="system",
        session_id="report_session",
        new_message=content,
    ):
        if event.is_final_response():
            final_response = event.content.parts[0].text

    return final_response

# Run the monitoring cycle
if __name__ == "__main__":
    asyncio.run(run_monitoring_cycle())
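
run_monitoring_cycle is a one-shot. In production you would typically trigger it from cron or a cloud scheduler, or skip polling entirely with the webhooks described in the next section. For a quick self-contained loop, a sketch:

POLL_INTERVAL_SECONDS = 6 * 60 * 60  # illustrative: poll every six hours

async def monitor_forever():
    # Run monitoring cycles back to back, sleeping between them
    while True:
        report = await run_monitoring_cycle()
        print(report)
        await asyncio.sleep(POLL_INTERVAL_SECONDS)

# Replace the one-shot call above with:
# asyncio.run(monitor_forever())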

When to Use Each Grounding Mode

The decision between Google Search grounding and KnowledgeSDK custom grounding comes down to the type of question your agent needs to answer:

| Question Type | Use Google Search | Use KnowledgeSDK |
|---|---|---|
| "What is the latest news about X?" | Yes | No |
| "What are all the integrations on Competitor.com/integrations?" | No | Yes |
| "Find me pricing pages for project management tools" | Yes (discovery) | Yes (extraction) |
| "Has Competitor X changed their pricing since last week?" | No | Yes (with webhook monitoring) |
| "What do users say about X on Reddit?" | Yes | No |
| "What are the exact plan limits on this specific pricing page?" | No | Yes |
| "Find documentation for API authentication" | Yes (discovery) | Yes (read full docs) |

The hybrid pattern — Google Search for discovery, KnowledgeSDK for extraction — handles both categories. Your agent calls Google Search to find the right URLs, then calls KnowledgeSDK to read them in full and extract structured data.


Registering a Webhook for Automatic Change Detection

For production monitoring, set up KnowledgeSDK webhooks so you are notified of changes without running a polling cycle:

import os
from knowledgesdk import KnowledgeSDK

knowledge_client = KnowledgeSDK(api_key=os.environ["KNOWLEDGESDK_API_KEY"])

# Register a webhook for each competitor URL
for url in COMPETITOR_URLS:
    webhook = knowledge_client.webhooks.create(
        url="https://your-app.com/webhooks/pricing-change",
        events=["page.changed"],
        metadata={
            "watch_url": url,
            "monitor_type": "competitor_pricing",
        },
    )
    print(f"Registered webhook {webhook.id} for {url}")

# Your webhook handler (FastAPI example)
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webhooks/pricing-change")
async def handle_pricing_change(request: Request):
    payload = await request.json()
    changed_url = payload["metadata"]["watch_url"]

    # Re-extract the changed page
    result = extract_competitor_pricing(changed_url)
    existing_data = load_existing_data()

    changes = detect_pricing_changes(
        {changed_url: existing_data.get(changed_url, {})},
        {changed_url: result},
    )

    if changes:
        report = await generate_change_report(changes)
        # Send to Slack, email, or your alerting system
        # (notify_team is your own integration, not defined in this tutorial)
        await notify_team(report)

    return {"status": "processed"}

Conclusion

Google ADK's built-in google_search grounding is a powerful discovery tool. But discovery is only half the problem. For any agent that needs to read specific URLs, extract structured data, or monitor content over time, you need a second grounding layer.

KnowledgeSDK fills exactly this gap: register it as a FunctionTool in your ADK agent, and your agent gains the ability to read any URL in full, extract structured JSON, and detect changes via webhooks. Combined with Google Search for discovery, you have a complete grounding stack that handles both "find relevant pages" and "read those pages completely."

The competitive intelligence use case — monitoring 20 competitor pricing pages and generating change reports automatically — is deployable in an afternoon with the code in this tutorial.

Ready to add private grounding to your Google ADK agent? Start for free at knowledgesdk.com — 1,000 extractions per month on the free tier, no credit card required.

