
katana

Web security reconnaissance and information gathering

TLDR

Crawl a list of URLs

$ katana -list [https://example.com,https://google.com,...]

Crawl a [u]RL in headless mode using Chromium
$ katana -u [https://example.com] [[-hl|-headless]]

Pass requests through a proxy (http/socks5) and use custom headers from a file
$ katana -proxy [http://127.0.0.1:8080] [[-H|-headers]] [path/to/headers.txt] -u [https://example.com]

Specify the crawling strategy, depth of subdirectories to crawl, and rate limiting (requests per second)
$ katana [[-s|-strategy]] [depth-first|breadth-first] [[-d|-depth]] [value] [[-rl|-rate-limit]] [value] -u [https://example.com]

Find subdomains using subfinder, crawl each for a maximum number of seconds, and write results to an output file
$ subfinder [[-dL|-list]] [path/to/domains.txt] | katana [[-ct|-crawl-duration]] [value] [[-o|-output]] [path/to/output.txt]

SYNOPSIS

katana [flags]

PARAMETERS

-u, -list string[]
    Target URL(s) to crawl, given as comma-separated values or a file containing URLs

-o, -output string
    File to write results to

-j, -jsonl
    Write output in JSONL(ines) format (named -json in older releases)

-silent
    Display only results in output, suppress logs

-c, -concurrency int
    Number of concurrent fetchers to use (default 10)

-d, -depth int
    Maximum depth to crawl (default 3)

-rl, -rate-limit int
    Maximum requests to send per second (default 150)

-sc, -system-chrome
    Use the locally installed Chrome browser for headless crawling

-jc, -js-crawl
    Enable endpoint parsing and crawling inside JavaScript files

-ef, -extension-filter string[]
    Filter output for the given extensions (e.g. png, css)

-f, -field string
    Field to display in output (url, path, fqdn, rdn, file, dir, key, value, among others)

-hl, -headless
    Crawl in headless mode using a Chromium browser

-proxy string
    HTTP/SOCKS5 proxy to route requests through

-H, -headers string[]
    Custom headers/cookies to include in every request (header:value, or a file)

-retry int
    Number of times to retry a failed request (default 1)

-timeout int
    Request timeout in seconds (default 10)

DESCRIPTION

Katana is a high-performance, next-generation crawling and spidering framework designed by ProjectDiscovery for web reconnaissance and security testing. It efficiently discovers hidden endpoints, files, directories, and JavaScript-generated paths on web applications.

Key features include scope-based crawling to stay within defined boundaries, headless Chrome rendering to execute JavaScript, rate limiting, customizable headers, and plain-text or JSONL output. Katana handles large-scale crawling with concurrency controls, depth limits, and filtering by extension or output field.
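
A single invocation can combine several of these features. The following is a sketch (flag names as in recent releases; -cs scopes the crawl to URLs matching a regex, -ef filters static assets from the output):

$ katana -u https://example.com -headless -cs "example\.com" -ef png,css,svg -rl 50 -o endpoints.txt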

Aimed at bug bounty hunters, penetration testers, and developers, it integrates cleanly into CI/CD pipelines and toolchains alongside Nuclei and Httpx, reading targets from stdin and writing one URL per line to stdout. It is fast enough for large scopes and, unlike wordlist-based tools such as Dirsearch or Gobuster, discovers endpoints by following links and parsing JavaScript, which suits modern single-page applications and APIs.
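
A typical chain with other ProjectDiscovery tools looks roughly like this (a sketch; all three tools read targets from stdin and emit one result per line):

$ subfinder -d example.com -silent | katana -silent -d 2 | nuclei -silent

Enumerates subdomains, crawls each discovered host to depth 2, and passes the resulting URLs to nuclei for template-based scanning.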

CAVEATS

Third-party tool, not packaged in most Linux distributions; install via Go or prebuilt binaries.
Resource-intensive at high concurrency; keep -c and -rl conservative and respect target rate limits and robots.txt.
May generate significant traffic; only crawl targets you are authorized to test.
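
A conservative starting point might look like the following (a sketch; tune the values to the target's tolerance):

$ katana -u https://example.com -c 5 -rl 20 -d 2 -silent

Limits the crawl to 5 concurrent fetchers, 20 requests per second, and depth 2, printing only discovered URLs.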

INSTALLATION

go install -v github.com/projectdiscovery/katana/cmd/katana@latest
Or download binaries from GitHub releases.
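
To verify the install (assuming the Go bin directory, e.g. $(go env GOPATH)/bin, is on your PATH):

katana -version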

EXAMPLE

katana -u https://example.com -o results.txt -d 4 -c 20
Crawls example.com to depth 4 with 20 concurrent fetchers, writing discovered URLs to results.txt.
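
For machine-readable output, a variant such as the following emits one JSON object per line (a sketch; the flag is -jsonl in recent releases and -json in older ones):

katana -u https://example.com -silent -jsonl -o results.jsonl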

HISTORY

Developed by ProjectDiscovery and released in 2022. Evolved from community needs for faster crawling beyond Hakrawler/Gospider. Actively maintained with v1.0+ focusing on JS support and performance; widely adopted in bug bounty and red team workflows.

SEE ALSO

curl(1), wget(1), lynx(1)
