Discovery API

The Discovery API allows you to programmatically crawl web applications, detect user flows, and identify UI patterns and API endpoints.

Endpoints

Start Discovery Session

Begin a web application discovery crawl

POST /discovery/start

Request Body:

{
  "project_id": "proj_abc123",
  "target_url": "https://example.com",
  "max_pages": 100,
  "max_depth": 5,
  "include_patterns": ["/app/*", "/dashboard/*"],
  "exclude_patterns": ["/api/*", "/admin/*"],
  "headless": true,
  "wait_for_navigation": true,
  "enable_ai_analysis": false
}

Parameters:

Parameter            Type     Required  Description
project_id           string   Yes       Project ID
target_url           string   Yes       Starting URL
max_pages            integer  No        Maximum pages to crawl (default: 100)
max_depth            integer  No        Maximum link depth (default: 5)
include_patterns     array    No        URL patterns to include (glob patterns)
exclude_patterns     array    No        URL patterns to exclude
headless             boolean  No        Run browser headless (default: true)
wait_for_navigation  boolean  No        Wait for navigation events (default: true)
enable_ai_analysis   boolean  No        Enable AI pattern analysis (default: false)

Example Request:

curl -X POST https://api.bugbrain.tech/api/v1/discovery/start \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "proj_abc123",
    "target_url": "https://example.com",
    "max_pages": 50,
    "exclude_patterns": ["/admin/*", "/api/*"]
  }'

Response:

{
  "session_id": "disc_abc123xyz789",
  "project_id": "proj_abc123",
  "target_url": "https://example.com",
  "status": "running",
  "started_at": "2025-03-08T10:30:00Z",
  "estimated_completion": "2025-03-08T10:40:00Z"
}
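
As a rough illustration of the parameters listed for this endpoint, the following Python sketch assembles a request body and performs a few client-side checks before anything is sent. The helper name build_start_payload and the checks themselves are assumptions made for this example; the API may validate differently. Optional parameters that are not passed are left out so that the documented server defaults apply.

```python
from urllib.parse import urlparse

# Optional parameters and the Python types they are assumed to map to.
OPTIONAL_PARAMS = {
    "max_pages": int,
    "max_depth": int,
    "include_patterns": list,
    "exclude_patterns": list,
    "headless": bool,
    "wait_for_navigation": bool,
    "enable_ai_analysis": bool,
}


def build_start_payload(project_id: str, target_url: str, **options) -> dict:
    """Build a body for POST /discovery/start (hypothetical helper)."""
    if not project_id:
        raise ValueError("project_id is required")
    if urlparse(target_url).scheme not in ("http", "https"):
        raise ValueError("target_url must be an http(s) URL")

    payload = {"project_id": project_id, "target_url": target_url}
    for name, value in options.items():
        expected = OPTIONAL_PARAMS.get(name)
        if expected is None:
            raise ValueError(f"unknown parameter: {name}")
        # bool is a subclass of int, so reject True/False for integer fields.
        wrong_type = not isinstance(value, expected)
        if wrong_type or (expected is int and isinstance(value, bool)):
            raise TypeError(f"{name} must be of type {expected.__name__}")
        payload[name] = value
    return payload
```

The returned dictionary can be passed as the json argument of requests.post, as in the Python example at the end of this page.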

Get Discovery Progress

Monitor ongoing crawl progress

GET /discovery/{session_id}/progress

Example Request:

curl https://api.bugbrain.tech/api/v1/discovery/disc_abc123xyz789/progress \
  -H "Authorization: Bearer $API_KEY"

Response:

{
  "session_id": "disc_abc123xyz789",
  "status": "running",
  "progress": {
    "percent": 35,
    "pages_discovered": 35,
    "pages_crawled": 35,
    "pages_pending": 65,
    "flows_detected": 3,
    "patterns_found": 12,
    "api_endpoints": 8
  },
  "current_page": "https://example.com/products",
  "elapsed_seconds": 120,
  "estimated_remaining_seconds": 240
}
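
To show how the fields of this response fit together, here is a small Python sketch that turns a progress body into a one-line status message. The function name summarize_progress is made up for this example, and it assumes the response has the shape shown above; estimated_remaining_seconds is treated as optional.

```python
def summarize_progress(body: dict) -> str:
    """Format a /progress response body as a single status line."""
    progress = body["progress"]
    remaining = body.get("estimated_remaining_seconds")
    if remaining is None:
        eta = "ETA unknown"
    else:
        eta = f"~{remaining // 60}m {remaining % 60:02d}s left"
    return (
        f"[{body['status']}] {progress['percent']}% "
        f"({progress['pages_crawled']} crawled, "
        f"{progress['pages_pending']} pending), "
        f"{progress['flows_detected']} flows, {eta}"
    )
```

For the sample response above, this yields "[running] 35% (35 crawled, 65 pending), 3 flows, ~4m 00s left".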

Get Discovery Results

Retrieve completed crawl results

GET /discovery/{session_id}/results

Example Request:

curl https://api.bugbrain.tech/api/v1/discovery/disc_abc123xyz789/results \
  -H "Authorization: Bearer $API_KEY"

Response:

{
  "session_id": "disc_abc123xyz789",
  "status": "completed",
  "summary": {
    "total_pages": 45,
    "unique_flows": 5,
    "ui_patterns": 12,
    "api_endpoints": 8,
    "discovery_confidence": 0.92
  },
  "pages": [
    {
      "url": "https://example.com/",
      "title": "Home",
      "status_code": 200,
      "content_type": "text/html",
      "response_time_ms": 250,
      "links_found": 12,
      "forms_found": 2,
      "api_calls": 3
    }
  ],
  "flows": [
    {
      "flow_id": "flow_1",
      "type": "authentication",
      "pages": [
        "https://example.com/login",
        "https://example.com/dashboard"
      ],
      "steps": [
        {"action": "navigate", "url": "/login"},
        {"action": "fill_form", "selector": "#login-form"},
        {"action": "submit"},
        {"action": "navigate", "url": "/dashboard"}
      ]
    },
    {
      "flow_id": "flow_2",
      "type": "checkout",
      "pages": [
        "https://example.com/products",
        "https://example.com/cart",
        "https://example.com/checkout",
        "https://example.com/confirmation"
      ]
    }
  ],
  "patterns": [
    {
      "pattern_id": "pat_1",
      "type": "form",
      "occurrences": 8,
      "average_fields": 5,
      "examples": ["login-form", "search-form", "contact-form"]
    },
    {
      "pattern_id": "pat_2",
      "type": "navigation",
      "location": "header",
      "links": 6
    }
  ],
  "api_endpoints": [
    {
      "method": "GET",
      "endpoint": "/api/products",
      "response_time_ms": 125,
      "status_code": 200
    },
    {
      "method": "POST",
      "endpoint": "/api/cart/add",
      "response_time_ms": 200,
      "status_code": 201
    }
  ]
}
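
In the sample above, flow_1 carries a steps array while flow_2 lists only pages. The Python sketch below is one way to handle both cases when rendering a flow as readable lines; describe_flow is a hypothetical helper, and the fallback of visiting each page in order is an assumption, not documented API behavior.

```python
def describe_flow(flow: dict) -> list[str]:
    """Render a discovered flow as numbered, human-readable lines."""
    steps = flow.get("steps")
    if steps:
        lines = []
        for number, step in enumerate(steps, 1):
            # A step targets either a URL or a selector, or nothing (e.g. submit).
            target = step.get("url") or step.get("selector") or ""
            lines.append(f"{number}. {step['action']} {target}".rstrip())
        return lines
    # No recorded steps: fall back to visiting the listed pages in order.
    return [
        f"{number}. visit {url}"
        for number, url in enumerate(flow.get("pages", []), 1)
    ]
```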

Pause Discovery

Temporarily pause a running crawl

POST /discovery/{session_id}/pause

Example Request:

curl -X POST https://api.bugbrain.tech/api/v1/discovery/disc_abc123xyz789/pause \
  -H "Authorization: Bearer $API_KEY"

Response:

{
  "session_id": "disc_abc123xyz789",
  "status": "paused",
  "pages_crawled": 30,
  "paused_at": "2025-03-08T10:35:00Z"
}

Resume Discovery

Resume a paused crawl

POST /discovery/{session_id}/resume

Example Request:

curl -X POST https://api.bugbrain.tech/api/v1/discovery/disc_abc123xyz789/resume \
  -H "Authorization: Bearer $API_KEY"

Response:

{
  "session_id": "disc_abc123xyz789",
  "status": "running",
  "resumed_at": "2025-03-08T10:36:00Z"
}

Cancel Discovery

Stop and cancel a crawl session

DELETE /discovery/{session_id}

Example Request:

curl -X DELETE https://api.bugbrain.tech/api/v1/discovery/disc_abc123xyz789 \
  -H "Authorization: Bearer $API_KEY"

Response:

{
  "session_id": "disc_abc123xyz789",
  "status": "cancelled",
  "cancelled_at": "2025-03-08T10:36:00Z",
  "pages_crawled": 30
}
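
The pause, resume, and cancel endpoints differ only in HTTP method and path, so a client can drive them from one lookup table. The following Python sketch builds (but does not send) the request using only the standard library. The function name build_control_request is an assumption for illustration; the methods and paths are the ones documented above.

```python
import urllib.request
from urllib.parse import quote

API_URL = "https://api.bugbrain.tech/api/v1"

# HTTP method and path template for each session control action.
CONTROL_ACTIONS = {
    "pause": ("POST", "/discovery/{session_id}/pause"),
    "resume": ("POST", "/discovery/{session_id}/resume"),
    "cancel": ("DELETE", "/discovery/{session_id}"),
}


def build_control_request(action: str, session_id: str, api_key: str):
    """Return an unsent urllib Request for a pause, resume, or cancel call."""
    try:
        method, template = CONTROL_ACTIONS[action]
    except KeyError:
        raise ValueError(f"unsupported action: {action!r}") from None
    # Percent-encode the session ID so it cannot alter the URL path.
    path = template.format(session_id=quote(session_id, safe=""))
    return urllib.request.Request(
        API_URL + path,
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

Passing the returned object to urllib.request.urlopen sends the request.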

Stream Discovery Events

Real-time event stream (Server-Sent Events)

GET /discovery/{session_id}/stream

Usage:

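// Note: the browser's built-in EventSource does not accept custom headers.
// This snippet assumes an implementation that takes a `headers` option,
// such as version 2.x of the `eventsource` npm package for Node.js.
const EventSource = require('eventsource');
const apiKey = process.env.API_KEY;

const eventSource = new EventSource(
  'https://api.bugbrain.tech/api/v1/discovery/disc_abc123xyz789/stream',
  {
    headers: {
      'Authorization': `Bearer ${apiKey}`
    }
  }
);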
 
eventSource.addEventListener('page_discovered', (event) => {
  const page = JSON.parse(event.data);
  console.log(`Discovered: ${page.url}`);
});
 
eventSource.addEventListener('flow_detected', (event) => {
  const flow = JSON.parse(event.data);
  console.log(`Flow detected: ${flow.type}`);
});
 
eventSource.addEventListener('completion', (event) => {
  console.log('Discovery completed');
  eventSource.close();
});
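
For clients without an EventSource implementation, the stream can be read as plain text. The sketch below is a minimal Python parser for the standard Server-Sent Events wire format (event: and data: fields, a blank line ending each event). It assumes the endpoint follows that format and uses the event names from the example above; the function name iter_sse_events is made up for this illustration.

```python
def iter_sse_events(lines):
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []  # "message" is the SSE default event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line ends the current event; skip events with no data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space after the colon is dropped
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
```

With the requests library, the lines could come from requests.get(url, headers=headers, stream=True).iter_lines(decode_unicode=True), and each data string can then be decoded with json.loads.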

Discovery Configuration

Include/Exclude Patterns

Use glob patterns to control which URLs are crawled. The // comments below are annotations only; JSON does not allow comments, so remove them from real request bodies:

{
  "include_patterns": [
    "/app/*",           // Match /app/anything
    "/dashboard/**",    // Match /dashboard/anything, recursively
    "*.pdf"             // Match any PDF
  ],
  "exclude_patterns": [
    "/api/*",           // Exclude API endpoints
    "/admin/**",        // Exclude the admin section
    "*test*",           // Exclude URLs containing "test"
    "*.zip"             // Exclude zip files
  ]
}
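
The exact matching rules are not spelled out here, so the following Python sketch is only one plausible reading of the patterns above, useful for previewing which URLs a configuration might select. It assumes that patterns are matched against the URL path, that in patterns starting with "/" a single * stays within one path segment while ** crosses segments, that other patterns (such as *.pdf) match anywhere in the path, that exclusions win over inclusions, and that an empty include list admits everything. All function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def _anchored_glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path glob: ** crosses slashes, * stays in one segment."""
    pieces = []
    for part in re.split(r"(\*\*|\*)", pattern):
        if part == "**":
            pieces.append(".*")
        elif part == "*":
            pieces.append("[^/]*")
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))


def pattern_matches(pattern: str, path: str) -> bool:
    if pattern.startswith("/"):
        return _anchored_glob_to_regex(pattern).fullmatch(path) is not None
    # Unanchored patterns: fnmatch lets * match any characters, including "/".
    return fnmatchcase(path, pattern)


def should_crawl(url: str, include_patterns=(), exclude_patterns=()) -> bool:
    """Apply the assumed rules: exclusions first, then inclusions."""
    path = urlparse(url).path or "/"
    if any(pattern_matches(p, path) for p in exclude_patterns):
        return False
    if not include_patterns:
        return True
    return any(pattern_matches(p, path) for p in include_patterns)
```

Under these assumptions, https://example.com/app/settings matches "/app/*" but https://example.com/app/a/b does not, while "/dashboard/**" matches both one-level and nested dashboard paths.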

Rate Limiting

Discovery crawls are rate-limited to avoid overwhelming target servers:

  • Default: one request every 500 ms
  • Configurable: set the delay between requests, in milliseconds, via the request_delay_ms parameter
  • robots.txt: honored automatically
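
To make the effect of the delay concrete, the Python sketch below shows a fixed-delay throttle of the kind described above, plus a lower bound on crawl time. It is an illustration, not the service's actual implementation; the class and function names are made up, and the clock and sleep functions are injectable only so the behavior can be checked without waiting.

```python
import time


class RequestThrottle:
    """Space consecutive requests at least delay_ms milliseconds apart."""

    def __init__(self, delay_ms=500, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay_ms / 1000.0
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request has been made yet

    def wait(self):
        """Block until the next request is allowed, then reserve a slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.delay


def min_crawl_seconds(max_pages: int, delay_ms: int = 500) -> float:
    """Lower bound on crawl time from throttling alone (no page load time)."""
    return max(max_pages - 1, 0) * delay_ms / 1000.0
```

With the default delay, a 100-page crawl spends at least 49.5 seconds waiting between requests, before any page load or analysis time is counted.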

Flow Detection Types

Type             Description                Example Pages
authentication   Login/logout flows         login → dashboard
checkout         Purchase flows             cart → checkout → confirmation
search           Search and filter flows    search → results
form_submission  Form interactions          form page → confirmation
navigation       Navigation patterns        menu → subpages
crud             Create/Read/Update/Delete  list → detail → edit → delete
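
When post-processing results, it can help to bucket flows by the types listed above. The Python sketch below groups flow IDs by type and collects anything unrecognized under "unknown". The helper name and the "unknown" bucket are assumptions for this example; the API may add flow types over time.

```python
from collections import defaultdict

# Flow types documented in the table above.
KNOWN_FLOW_TYPES = {
    "authentication",
    "checkout",
    "search",
    "form_submission",
    "navigation",
    "crud",
}


def group_flows_by_type(flows: list[dict]) -> dict[str, list[str]]:
    """Map each flow type to the IDs of the flows detected with that type."""
    grouped = defaultdict(list)
    for flow in flows:
        flow_type = flow["type"]
        key = flow_type if flow_type in KNOWN_FLOW_TYPES else "unknown"
        grouped[key].append(flow["flow_id"])
    return dict(grouped)
```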

Example: Crawl and Generate Test Cases

Python:

import requests
import time
 
API_KEY = 'bugbrain_sk_prod_...'
API_URL = 'https://api.bugbrain.tech/api/v1'
 
headers = {'Authorization': f'Bearer {API_KEY}'}
 
# Start discovery
response = requests.post(
    f'{API_URL}/discovery/start',
    headers=headers,
    json={
        'project_id': 'proj_abc123',
        'target_url': 'https://example.com',
        'max_pages': 50,
        'enable_ai_analysis': False
    }
)
 
response.raise_for_status()  # fail fast on authentication or validation errors
session_id = response.json()['session_id']
print(f"Started discovery: {session_id}")
 
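# Poll until the session reaches a terminal state
while True:
    status = requests.get(
        f'{API_URL}/discovery/{session_id}/progress',
        headers=headers
    ).json()

    if status['status'] == 'completed':
        break
    if status['status'] == 'cancelled':
        # A cancelled session never completes, so stop instead of polling forever
        raise SystemExit(f"Discovery {session_id} was cancelled")

    progress = status['progress']
    print(f"Progress: {progress['percent']}% - "
          f"{progress['pages_discovered']} pages, "
          f"{progress['flows_detected']} flows")

    time.sleep(5)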
 
# Get results
results = requests.get(
    f'{API_URL}/discovery/{session_id}/results',
    headers=headers
).json()
 
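# Generate test cases from flows.
# convert_flow_to_steps is a placeholder: the step format expected by
# /test-cases is not specified here, so adapt it to your project.
def convert_flow_to_steps(flow):
    # Use the recorded steps when present; otherwise visit each page in order.
    return flow.get('steps') or [
        {'action': 'navigate', 'url': url} for url in flow.get('pages', [])
    ]

for flow in results['flows']:
    test_case = {
        'project_id': 'proj_abc123',
        'name': f"{flow['type'].title()} Flow",
        'description': 'Auto-generated from discovery',
        'steps': convert_flow_to_steps(flow)
    }
    requests.post(
        f'{API_URL}/test-cases',
        headers=headers,
        json=test_case
    ).raise_for_status()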
 
print(f"Created {len(results['flows'])} test cases from discovered flows")

Cost Optimization: Crawls run faster and cost less with AI analysis disabled, which is the default. Set enable_ai_analysis to true only when you need AI-generated flow descriptions.