OSINT

Definition

Open Source Intelligence (OSINT) is the disciplined collection, evaluation, and reporting of information from public or legally accessible sources. It is the practice of turning observable public artifacts (search results, certificates, archives, leaks, registries, images, public profiles) into evidence-backed answers to a specific question.

Why it matters

OSINT turns scattered public clues into usable context. In cybersecurity, it maps companies, domains, technologies, exposed documents, public identities, breach clues, and attack surface without sending packets at the target. It is also the only recon mode that is safe to run before authorization is in place: nothing here causes load, errors, or alerts on the target's infrastructure.

The important distinction: OSINT is not "anything found online." It is evidence-backed analysis with a question, scoped sources, source-quality grading, ethical limits, and a defensible conclusion. A folder of screenshots is collection; a triaged answer with provenance is intelligence.

The same skill is also the strongest defensive primitive a small team has. Running OSINT against your own organization shows what an attacker can already infer for free (leaked subdomains, stale documents, source maps, exposed buckets, employee directories) before any active testing happens.

How it works

OSINT follows 5 stages that should always run in this order. Skipping a stage is the most common cause of bad OSINT.

Question. Define exactly what you are trying to learn. "Map the company's public attack surface" is workable. "Find dirt on this person" is not: it has no scope, no stop condition, and no ethical limit. A good question fits in one sentence and names a deliverable.

Collection. Gather public-source clues against the question. Stay strictly passive: no logins, no port scans, no probes that change target state. Record every source URL and timestamp at the moment of collection, because public data drifts.

Triage.
Separate signal from noise. Move every lead into verified / likely / uncertain / noise / sensitive (see osint-triage). The point of triage is to decide what each lead actually proves and what it does not prove yet.

Corroboration. Confirm important claims with at least one independent source before reporting them. Identity collisions, stale archives, and overtrust in tool output are easy mistakes to make and hard to notice without this step.

Reporting. Preserve evidence, confidence labels, source URLs, timestamps, scope limits, and concrete next actions. A report that cannot be re-walked by a second analyst is not finished.

There is no exploit payload. The core skill is turning public data into defensible conclusions without overclaiming, and the deliverable is a report another analyst can audit.

A small worked example:

Question: Does example.com expose forgotten subdomains?
Stage 1: scoped to apex domain example.com and known sibling brands.
Stage 2: pull crt.sh certificate transparency results, archive.org snapshots, public DNS.
Stage 3: triage 47 names → 12 verified live, 9 likely-stale, 3 collision (sibling brand), 23 noise.
Stage 4: corroborate "stale" with archive.org last-seen dates, plus an HTTP HEAD check only for hosts in owned or authorized scope.
Stage 5: report 9 stale names with provenance, suggest takedown or claim verification.

Techniques / patterns

Each technique pairs with concrete public sources. Use the registry, not random tool lists.

Search and archive. Search engines (Google, Bing, DuckDuckGo, Yandex), advanced operators, Google dorking via GHDB, archive.org Wayback Machine, archive.today, Common Crawl.

Documents and metadata. Public PDFs/DOCs found via filetype: operators, EXIF metadata via image OSINT, package registries (npm, PyPI, Maven), GitHub/GitLab code search, source maps and .well-known paths.

People and accounts.
Company pages, conference bios, LinkedIn job postings, public commit emails, conference speaker lists; covered with ethics framing in social-media-osint and email-and-phone-osint.

Breach and leak signals. Have I Been Pwned, public dump listings, paste sites, and credential-exposure feeds; covered in breach-and-leak-intelligence.

Image, video, location. Reverse image search, EXIF, georeferencing, sun-angle/shadow analysis, Mapillary; covered in image-and-location-osint.

Domain, DNS, certificate, registration. WHOIS/RDAP, certificate transparency (crt.sh), DNS history (SecurityTrails, ViewDNS), ASN/BGP records, DNSSEC posture; covered in company-osint and passive recon.

Variants and bypasses

OSINT has 5 working modes. Choose the mode that matches the question; do not blend them.

1. Cyber OSINT
Focus: assets, technologies, exposure, leaked secrets, attack surface. Inputs are domains, certificates, source maps, package metadata, GitHub leaks, archive snapshots. Output is an evidence-graded asset inventory and an exposure list. The handoff is into external attack surface and active recon.

2. Company OSINT
Focus: brand, ownership, subsidiaries, vendors, products, legal entities, public footprint. Inputs are corporate registries, press releases, job postings, vendor announcements, certificate organization fields. Output is an ownership map that drives scope validation: knowing who owns a domain or asset matters before any test goes live (see scope validation).

3. People OSINT
Focus: public identity clues that connect a person to a role, account, or capability. Inputs are conference bios, public commits, public profiles, breach listings tied to email addresses. Strongest ethical boundaries apply here: clear purpose, legal basis, minimization, retention limit, and no aggregation that creates harm beyond the original question. Default to the lightest-touch evidence that answers the question.

4.
Media and location OSINT
Focus: where, when, and who from images, video, audio, or environmental clues. Inputs are EXIF metadata, reverse image search, landmarks, language/license plate cues, sun position, and street imagery. Output is a corroborated time/place/person claim with confidence and limits, never a single-source assertion.

5. Threat intelligence OSINT
Focus: tracking adversary infrastructure, indicators, and campaigns through public sources. Inputs are vendor blogs, public IOC feeds, MISP/ATT&CK mappings, certificate reuse, passive DNS, and public sandbox results. Output is contextual indicators tied to the organization's exposure, not a generic IOC dump.

Impact

Ordered roughly by severity:

Attack surface discovery. OSINT reveals assets, endpoints, and ownership before any active probe. Unknown assets are often the single largest source of exposure for under-funded teams.

Scope clarity. Company and ownership clues prevent wrong-target testing during pentests and bug bounty work.

Exposure discovery. Public documents, leaks, source maps, and metadata reveal sensitive context (internal hostnames, customer data, credentials) that the organization did not know was public.

Better testing strategy. Stack and route clues from passive recon make later active testing faster and quieter.

Defensive awareness. Teams learn what outsiders can already infer for free, which sharpens hardening priorities and incident response posture.

Detection and defense

OSINT against your own organization is itself a defense. The items below are ordered by what changes the most exposure for the least effort.

Run OSINT against your own organization. Defensive OSINT shows what public sources expose before attackers use it. Repeat it on a cadence (quarterly minimum) because public data drifts: new certs, new repos, new vendors, and new docs all change the picture.

Grade source reliability and confidence. Public clues are uneven.
Mark every claim as verified, likely, uncertain, stale, or noise, and require corroboration before acting on sensitive conclusions. The label is the educational payload: a verified claim and a likely claim drive different decisions.

Minimize collection of personal data. People-focused OSINT must have a clear purpose, legal basis, minimization rule, and retention limit. Default to the lightest-touch evidence that answers the question; do not aggregate beyond scope.

Clean up avoidable public exposure. Retiring stale subdomains and stale docs, stripping EXIF from outbound images, removing source maps, redacting metadata in PDFs, running secret scanning on public repos, and rotating credentials after breach mentions are concrete, bounded fixes.

Turn findings into inventory, training, or remediation. OSINT is only useful when it changes decisions. Tie reports to a tracked inventory item, a training change, or a remediation ticket, not to a Slack screenshot.

What does not work as a primary defense

Assuming "public" means "harmless." Public clues compose; an org chart plus a job post plus a cert SAN is sensitive even if each piece is not.

Assuming old data is useless. Archives and stale records often expose patterns still true today (naming conventions, vendor relationships, internal terminology).

Collecting everything. Unfocused OSINT creates noise, privacy risk, and analyst fatigue. Every collected item should answer the question.

Single-source conclusions. Important claims need at least one independent corroborating source.

Robots.txt and noindex. They reduce indexing pressure, not exposure. The asset is still public.

Practical labs

Use your own name/domain/company, an authorized engagement, or an intentionally chosen public training target. Stay strictly passive: none of these labs should send any packets at non-owned infrastructure.

Define the OSINT question first

Question: "Map example.com's public subdomain footprint and flag stale entries."
Allowed sources: crt.sh, archive.org, public DNS, public WHOIS/RDAP.
Out-of-scope: any HTTP request to non-owned hosts; any login attempt.
Evidence standard: >=2 independent sources for any "live" claim.
Stop condition: all certificate-transparency names triaged into 5 buckets.

A scoped question is the difference between an investigation and a link-pile.

Pull certificate transparency names

    curl -s 'https://crt.sh/?q=%25.example.com&output=json' \
      | jq -r '.[].name_value' | tr ',' '\n' | sort -u

Certificate transparency is the single highest-signal passive source for subdomain discovery: nearly every publicly trusted TLS certificate is logged there. Certificates from private CAs are not, and wildcard certificates hide the individual hostnames they cover.

Capture public DNS without active scanning

    dig +short ns example.com
    dig +short txt example.com
    dig +short mx example.com

Run these through your own recursive resolver. This is a passive lookup, not direct probing of the target's authoritative servers. Query specific record types: many resolvers answer ANY queries minimally or refuse them (RFC 8482).

Inspect archive snapshots for stale assets

    curl -s "https://web.archive.org/cdx/search/cdx?url=example.com/*&output=json&limit=200" \
      | jq -r '.[1:] | .[] | [.[1], .[2]] | @tsv'

Archive.org reveals paths and subdomains that no longer respond. The query above covers paths under example.com; adding matchType=domain widens it to subdomains. Stale assets often outlive the team that built them.

Source-table every claim before reporting

claim | source | timestamp | confidence | corroboration | next action
api-staging.example.com exists | crt.sh cert #98231 | 2026-04-29T18:00Z | likely | archive.org 2024-08 | http-head probe (owned scope)
old-blog.example.com exists | archive.org snapshot | 2026-04-29T18:01Z | stale | none | triage as noise

This is the artifact that turns "I found a thing" into a report another analyst can audit.

Compare passive vs active before each action

Search result reading: passive
Certificate transparency lookup: passive
WHOIS/RDAP query: passive
HTTP request to target host: active
Port scan / banner grab: active
Login attempt or credential use: active and intrusive

Keep OSINT strictly passive; the boundary into active recon is the point at which you need the target's authorization.
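The labs above can be tied together in a small script. What follows is a minimal sketch, not a definitive tool: it assumes crt.sh `name_value` strings have already been fetched, and every name in it (`ACTION_MODE`, `is_passive`, `normalize_ct_names`, `Claim`) is hypothetical and introduced only for illustration. It normalizes certificate-transparency names to the scoped apex domain, gates actions by the passive/active comparison, and renders source-table rows in the pipe-separated layout used above.

```python
from dataclasses import dataclass, field

# Hypothetical action labels mirroring the passive/active comparison in the lab.
ACTION_MODE = {
    "search_result_reading": "passive",
    "ct_lookup": "passive",
    "whois_rdap_query": "passive",
    "http_request_to_target": "active",
    "port_scan": "active",
    "login_attempt": "active",
}


def is_passive(action):
    """Return True only for actions known to be passive; unknown actions count as active."""
    return ACTION_MODE.get(action, "active") == "passive"


def normalize_ct_names(name_values, apex):
    """Flatten crt.sh name_value strings into unique hostnames under the apex domain."""
    names = set()
    for value in name_values:
        # One name_value field can hold several names separated by newlines or commas.
        for raw in value.replace(",", "\n").splitlines():
            name = raw.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # a wildcard entry only proves the parent name
            if name == apex or name.endswith("." + apex):
                names.add(name)
    return sorted(names)


@dataclass
class Claim:
    claim: str
    source: str
    timestamp: str
    confidence: str
    corroboration: list = field(default_factory=list)
    next_action: str = ""

    def meets_evidence_standard(self, minimum=2):
        # Counts distinct source labels only; whether two sources are truly
        # independent is still an analyst judgment this sketch cannot make.
        return len({self.source, *self.corroboration}) >= minimum

    def row(self):
        corroboration = ", ".join(self.corroboration) or "none"
        return " | ".join([self.claim, self.source, self.timestamp,
                           self.confidence, corroboration, self.next_action])


if __name__ == "__main__":
    sample = ["*.example.com\napi-staging.example.com", "evil-example.com"]
    print(normalize_ct_names(sample, "example.com"))
    claim = Claim("api-staging.example.com exists", "crt.sh", "2026-04-29T18:00Z",
                  "likely", ["archive.org"], "http-head probe (owned scope)")
    print(claim.row(), "| meets standard:", claim.meets_evidence_standard())
```

Treating unknown actions as active is a deliberate safe default: anything not explicitly classified as passive should wait for authorization.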
Practical examples

Public certificates reveal forgotten staging or admin subdomains long after the original project ends.
Job postings reveal cloud provider, framework, and tooling choices that narrow active recon.
Public documents (PDF, DOCX) carry author names, internal project labels, and template artifacts in metadata.
Search operators (filetype:, inurl:, intitle:) surface exposed internal documents and old admin pages.
Breach indicators tied to corporate emails suggest credential-rotation and MFA-enforcement priorities.
Public source maps from production frontends reveal route names, API paths, and internal module names.

Related notes

osint-triage
search-engine-operators
google-dorking
breach-and-leak-intelligence
company-osint
Passive Recon
External Attack Surface

Suggested future atomic notes

osint-opsec
source-reliability-grading
historical-internet-artifacts
public-document-metadata
threat-intelligence-osint
osint-legal-and-ethical-framework

References

Foundational: OSINT Framework, https://osintframework.com/
Foundational: Bellingcat Online Investigation Toolkit, https://bellingcat.gitbook.io/toolkit
Foundational: OWASP WSTG information gathering, https://owasp.org/www-project-web-security-testing-guide/latest/