Sightings Timeline
All three charts reflect the current Observatory filter state. Change a filter on any tab and these cards re-tally instantly.

All sightings over time

Quality score over time (median per year)

Movement categories over time (yearly share)

Insights

Emotion & Sentiment Analysis

Sentiment Polarity

Emotion Distribution (7-Class)

GoEmotions Detail (28-Class)

Sentiment Score Distributions

Emotion Profile by Source

NRC Emotion Lexicon

Data Quality & Red Flags

Quality Score Distribution

Narrative Red Flags (keyword heuristic)

Movement & Shape

Movement Taxonomy

Shape × Movement Matrix (Top 10 shapes)

Ask AI about the data
Bring your own API key — chat happens in your browser, the key never touches our server.
Ask anything about the unified UFO database
Try:
  • "What are the most common shapes?"
  • "Show me triangle sightings in California in the 1970s"
  • "How many sightings happened in October 1973?"
  • "Which states report the most sightings?"
You'll need an API key from your provider — open Settings above.
Powered by MCP-compatible tools. Your API key is stored locally in your browser only.

Connect your own AI to the UFOSINT data

Every tool the website's chatbot has access to is also exposed via the Model Context Protocol at a single HTTPS endpoint, so any MCP-compatible AI client can query the unified UFO sightings database with your own model and your own subscription. The endpoint is read-only and free to use.

MCP endpoint

https://ufosint-explorer.azurewebsites.net/mcp

6 tools available: search_sightings, get_sighting, get_stats, get_timeline, find_duplicates_for, count_by.

Claude Code (CLI / Desktop App)

One command to connect from any project directory:

claude mcp add --transport http ufosint https://ufosint-explorer.azurewebsites.net/mcp

Restart your Claude Code session. The 6 UFOSINT tools will be available immediately. Remove later with claude mcp remove ufosint.

Claude Desktop

Open ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows) and add:

{
  "mcpServers": {
    "ufosint": {
      "url": "https://ufosint-explorer.azurewebsites.net/mcp",
      "transport": "http"
    }
  }
}

Restart Claude Desktop. The 6 UFOSINT tools will appear in the tools panel.

Cursor / Cline / Continue / Windsurf

These all support remote MCP servers. Add the same URL to your client's MCP configuration. Each client documents the exact location, but the JSON shape is the same.

Direct API (curl / Python / any HTTP client)

The endpoint is JSON-RPC 2.0 over HTTPS. List the tools:

curl -s https://ufosint-explorer.azurewebsites.net/mcp \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'

Call a tool:

curl -s https://ufosint-explorer.azurewebsites.net/mcp \
  -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc":"2.0",
    "id":2,
    "method":"tools/call",
    "params":{
      "name":"search_sightings",
      "arguments":{"q":"triangle","state":"CA","limit":5}
    }
  }'
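
The same tools/call request can be sent from Python using only the standard library. This is a minimal sketch, not an official client: the helper names build_tool_call and call_tool are hypothetical, and it assumes the endpoint answers a plain POST with a single JSON body, as the curl examples above suggest.

```python
import json
import urllib.request

MCP_URL = "https://ufosint-explorer.azurewebsites.net/mcp"


def build_tool_call(name, arguments, request_id=1):
    """Build the JSON-RPC 2.0 envelope for a tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def call_tool(name, arguments, request_id=1, timeout=30):
    """POST the request and decode the reply (assumes a plain JSON response)."""
    body = json.dumps(build_tool_call(name, arguments, request_id)).encode("utf-8")
    request = urllib.request.Request(
        MCP_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, call_tool("search_sightings", {"q": "triangle", "state": "CA", "limit": 5}) sends the same payload as the second curl command.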

OpenAI / OpenRouter function-calling format

If you're integrating with OpenAI or OpenRouter and want the tool definitions in their native format (instead of going through MCP), fetch:

GET https://ufosint-explorer.azurewebsites.net/api/tools-catalog

And invoke individual tools at:

POST https://ufosint-explorer.azurewebsites.net/api/tool/<tool_name>
Content-Type: application/json

{ "q": "triangle", "state": "CA", "limit": 5 }

Download the database (SQLite)

Want to run your own analysis, train models, or hack on the data offline? The full 508 MB SQLite snapshot is attached to every tagged release on GitHub — 614,505 deduplicated sightings, 502,985 with emotion analysis, and all derived columns.

curl -LO https://github.com/UFOSINT/ufosint-explorer/releases/latest/download/ufo_public.db

sqlite3 ufo_public.db "SELECT COUNT(*) FROM sighting;"
# 614505
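
The snapshot can also be queried from Python with the built-in sqlite3 module. The sketch below relies only on the sighting table named in the command above. Because column names are documented on the Methodology tab rather than here, it discovers them at runtime instead of guessing. The helper names are hypothetical.

```python
import sqlite3


def open_read_only(path="ufo_public.db"):
    """Open the snapshot read-only so analysis code cannot modify it."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def count_sightings(conn):
    """Same query as the sqlite3 command-line example."""
    return conn.execute("SELECT COUNT(*) FROM sighting").fetchone()[0]


def column_names(conn, table="sighting"):
    """List a table's columns via PRAGMA table_info (name is field 1).

    PRAGMA statements do not accept bound parameters, so only pass
    trusted table names here.
    """
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
```

A typical session would be conn = open_read_only(), then count_sightings(conn) and column_names(conn) to see what is available before writing longer queries.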

See the Methodology tab for the full schema, derived-column definitions, and per-source licensing. Browse the releases page for older versions.

AI Discovery

This site exposes standard AI-readiness files so agents and LLMs can discover and understand the UFOSINT tools automatically.

Local stdio MCP server

Prefer to run an MCP server on your own machine? Clone the repo and use mcp_server.py:

git clone https://github.com/UFOSINT/ufosint-explorer
cd ufosint-explorer
pip install fastmcp "psycopg[binary]"
DATABASE_URL="postgresql://..." python mcp_server.py

Then point Claude Desktop at the local script via the command form of the MCP config. (You'll need read-only credentials to a PostgreSQL with the UFOSINT schema.)

All access is read-only. Source data is licensed by UFOSINT; the deduplicated database is built by the ufo-dedup pipeline.

Unified UFO Sightings Database — Methodology

This is not raw data. UFOSINT Explorer presents a processed scientific analysis of six major UFO/UAP databases: 618,316 sighting records deduplicated, cross-referenced, geocode-verified, quality-scored, LLM-enriched, movement-classified, and emotion-analyzed using three transformer models plus the lexicon-based VADER and NRC scorers. Every step of the pipeline is documented below and can be independently replicated from the source data using the open-source ufo-dedup pipeline. The web application source code is at ufosint-explorer.

Download the full database. The 628 MB SQLite snapshot (ufo_public.db) is attached to every tagged release. Download latest · Browse releases
Reproducibility statement. The entire database can be rebuilt from source files with a single command (python rebuild_db.py). All data quality fixes are idempotent and preserve original values. LLM-derived enrichments are cached to CSV files and replayed deterministically on rebuilds — no API key is needed for reproduction. Derived columns (quality scores, movement categories, emotion classifications) are computed deterministically from the raw narratives. No records are deleted — duplicates are flagged, not merged.

Pipeline Architecture

The 17-step pipeline transforms ~2.56 million raw records from 6 sources into a unified, analysis-ready database:

            Raw Data (5 CSVs + 1 JSON + Reddit CSV)
                |
                v
            rebuild_db.py  (17 steps, ~30 min)
                |
                +-- Steps 1-7:    Import 6 sources (618,316 sightings)
                +-- Step 8:       SQL data quality fixes (~30 corrections)
                +-- Step 9:       Geocode pass 1 (GeoNames offline gazetteer)
                +-- Step 10:      Audit: fix bad geocodes + replay LLM location fixes
                +-- Step 11:      Geocode pass 2 (picks up audit-improved locations)
                +-- Steps 12-14:  Enrich, dedup, sentiment analysis
                +-- Step 15:      Derived analysis (shapes, quality, duration, hoax)
                +-- Step 16:      Replay cached enrichments (emotions, LLM extractions)
                +-- Step 17:      Export
                |
                v
            ufo_public.db  (618,316 sightings, 100 columns, 628 MB)
                    

Source Databases

Source | Raw Records | Imported | Skipped | Description
UFOCAT | 320,412 | 197,108 | 123,304 | CUFOS UFOCAT 2023 catalog. Richest metadata: Hynek/Vallee classifications, lat/lon, witness counts, durations. 123K NUFORC-origin records (SOURCE=UFOReportCtr) skipped; metadata transferred via enrichment.
NUFORC | 159,320 | 159,320 | 0 | National UFO Reporting Center. Self-reported sightings with detailed free-text descriptions. Enriched post-import with 102K Hynek and 83K Vallee classifications from UFOCAT.
MUFON | 138,310 | 138,310 | 0 | Mutual UFO Network case reports. Short + long descriptions, investigator summaries.
UPDB | 1,885,757 | 65,016 | 1,820,741 | Unified Phenomena Database (phenomenAInon). 1.82M rows skipped (MUFON/NUFORC already imported from richer originals). Remaining 65K from UFODNA (38K), Blue Book (14K), NICAP (5.8K), etc.
UFO-search | 54,751 | 54,751 | 0 | Majestic Timeline compilation from ufo-search.com. Historical records from 19 source compilations (Hatch, Eberhart, NICAP, Vallee, etc.).
r/UFOs | 3,811 | 3,811 | 0 | Reddit r/UFOs community sighting reports. Three-pass pipeline: API scraping, LLM-structured extraction (Gemini Flash), database import. Includes anomaly assessments and strangeness ratings.

Total raw records across all sources: ~2.56 million. After removing known overlaps at import time: 618,316.

LLM-Powered Data Quality Pipeline

After initial import and geocoding, the pipeline runs an AI-powered data quality audit using Google Gemini Flash. Results are cached for deterministic replay on future rebuilds.

Audit Tier | What It Does | Records | Method
Tier A: Geocode Verification | Detects map pins in the wrong country/hemisphere (e.g., "Nelson, Nebraska" geocoded to New Zealand) | 29,029 fixed | Code: US/CA state bounding-box validation
Tier B: Location Normalization | Cleans 120K messy location strings for re-geocoding (e.g., "Toronto (Canada)" → city=Toronto, state=ON, country=CA) | 49,190 improved | LLM: Gemini Flash, cached to CSV
Field Extraction | Extracts shape, color, duration, witnesses, sound, and direction from narrative descriptions that had missing structured fields | 297,446 enriched | LLM: Gemini Flash, cached to CSV

Impact: +48,503 new correct map pins, +38,347 records above the quality threshold, 574,269 new structured field values extracted from text.
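
The Tier A check can be pictured as a lookup of the reported state's bounding box followed by a containment test. The sketch below is an illustration under stated assumptions, not the pipeline's code: the function name pin_in_state is hypothetical, and the single Nebraska box is a rough approximation included only to mirror the "Nelson, Nebraska" example.

```python
# Approximate bounding boxes as (min_lat, max_lat, min_lon, max_lon).
# Illustrative values only; a real table would cover every US state
# and Canadian province.
STATE_BBOX = {
    "NE": (40.0, 43.0, -104.1, -95.3),
}


def pin_in_state(state, lat, lon):
    """Return True/False if the pin falls inside the state's box.

    Returns None when no box is known, so the caller can skip the
    record instead of flagging it.
    """
    box = STATE_BBOX.get(state)
    if box is None:
        return None
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
```

A pin near (40.2, -98.1) passes for state NE, while a pin at New Zealand coordinates such as (-41.3, 173.3) fails and would be flagged for correction.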

Geocoding

Locations are geocoded offline using the GeoNames gazetteer (cities with population ≥15,000). Three matching strategies with decreasing specificity:

  1. Exact: city + state + country → coordinates (highest confidence)
  2. City + country: ignores state, picks largest city by population
  3. City only: global lookup, prefers matches in the expected country when a US/CA state code is present (v0.14 fix — prevents wrong-continent matches)

Coverage: 418,077 of 618,316 sightings (67.6%) have map coordinates. Two geocoding passes run: before and after the LLM location audit.
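
The three matching strategies can be sketched as a cascade that falls through to the next, less specific lookup only when the previous one finds nothing. This is a simplified illustration: the gazetteer rows, the state-code subsets, and all function names are assumptions made for the example, and populations and coordinates are rounded.

```python
US_STATES = {"IL", "MO", "NE", "CA"}   # illustrative subset
CA_PROVINCES = {"ON", "BC", "QC"}      # illustrative subset

SAMPLE_GAZETTEER = [
    {"name": "Springfield", "state": "IL", "country": "US", "population": 114000, "lat": 39.78, "lon": -89.65},
    {"name": "Springfield", "state": "MO", "country": "US", "population": 169000, "lat": 37.21, "lon": -93.29},
    {"name": "London", "state": "ENG", "country": "GB", "population": 8900000, "lat": 51.51, "lon": -0.13},
    {"name": "London", "state": "ON", "country": "CA", "population": 400000, "lat": 42.98, "lon": -81.25},
]


def expected_country(state):
    """Infer the country from a US state or Canadian province code."""
    if state in US_STATES:
        return "US"
    if state in CA_PROVINCES:
        return "CA"
    return None


def largest(places):
    return max(places, key=lambda p: p["population"])


def geocode(city, state, country, gazetteer):
    """Return (lat, lon, strategy) or None if the city is unknown."""
    name = city.strip().lower()
    matches = [p for p in gazetteer if p["name"].lower() == name]
    if not matches:
        return None
    # Strategy 1: city + state + country.
    exact = [p for p in matches
             if state and country and p["state"] == state and p["country"] == country]
    if exact:
        hit, strategy = largest(exact), "exact"
    else:
        # Strategy 2: city + country, largest city wins.
        in_country = [p for p in matches if country and p["country"] == country]
        if in_country:
            hit, strategy = largest(in_country), "city_country"
        else:
            # Strategy 3: city only, preferring the country implied by the state code.
            hint = expected_country(state)
            preferred = [p for p in matches if p["country"] == hint] or matches
            hit, strategy = largest(preferred), "city_only"
    return hit["lat"], hit["lon"], strategy
```

With the sample rows, geocode("London", "ON", "", SAMPLE_GAZETTEER) resolves to the Ontario city rather than the larger London in Great Britain, which is the wrong-continent case the state-code preference is meant to prevent.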

Quality Score (0–100)

Every sighting receives a quality score based on the richness and completeness of its data. Higher scores indicate more detailed, verifiable reports.

Feature | Points | Notes
Description length | 0 / 5 / 15 / 25 | None / <50 / <200 / 200+ characters
Has media (photo/video) | +15 |
Number of witnesses | 0 / 5 / 10 / 15 | 0 / 1 / 2 / 3+
Movement mentioned | +10 (+5 bonus) | +5 if 2+ movement categories detected
9 structured fields | 3 pts each (max 27) | time, shape, color, duration, sound, direction, elevation, Hynek, Vallee
Coordinates present | +5 |
Specificity bonus | +5 | Time-of-day, compass direction, or altitude in description
Unknown-date cap | min(score, 15) | Relaxed to 35 if 8+ features and has description

Distribution: 160,728 sightings (26%) score ≥60. Average score: 42.3. The structured fields row is where LLM extraction has the biggest impact — each newly extracted field adds 3 points.
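
The scoring table can be read as a simple additive function. The sketch below is one possible reading, not the pipeline's implementation, and it makes three assumptions: the Sighting fields are hypothetical names, the sum is clamped to 100 because the rows add up to 107, and "8+ features" is interpreted as 8 or more populated structured fields.

```python
from dataclasses import dataclass


@dataclass
class Sighting:
    description: str = ""
    has_media: bool = False
    witnesses: int = 0
    movement_categories: int = 0
    structured_fields: int = 0      # how many of the 9 fields are populated
    has_coordinates: bool = False
    has_specificity: bool = False   # time-of-day, compass direction, or altitude
    date_known: bool = True


def quality_score(s):
    length = len(s.description)
    if length == 0:
        score = 0
    elif length < 50:
        score = 5
    elif length < 200:
        score = 15
    else:
        score = 25
    if s.has_media:
        score += 15
    score += (0, 5, 10, 15)[min(max(s.witnesses, 0), 3)]
    if s.movement_categories >= 1:
        score += 10
    if s.movement_categories >= 2:
        score += 5
    score += 3 * min(s.structured_fields, 9)
    if s.has_coordinates:
        score += 5
    if s.has_specificity:
        score += 5
    score = min(score, 100)  # assumption: clamp to the 0-100 range
    if not s.date_known:
        # Assumption: "8+ features" means 8+ populated structured fields.
        relaxed = s.structured_fields >= 8 and length > 0
        score = min(score, 35 if relaxed else 15)
    return score
```

Under this reading, a 250-character report with two witnesses, one movement category, four structured fields, and coordinates scores 25 + 10 + 10 + 12 + 5 = 62, and the same report with an unknown date is capped at 15.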

Shape Normalization

Raw shape strings from 6 sources are normalized to 28 canonical shapes using a three-tier matcher:

  1. Exact match against canonical list (286,826 matches)
  2. Substring/alias match against 70+ extended mappings — e.g., "ovoid" → Oval, "V-shape" → Chevron, "torpedo" → Cigar (52,448 matches)
  3. Fuzzy match via rapidfuzz at ≥85% similarity (41 matches)

28 canonical shapes: Sphere, Disc, Triangle, Cigar, Oval, Circle, Light, Fireball, Cylinder, Diamond, Rectangle, Chevron, Cross, Teardrop, Star, Egg, Cone, Cube, Saucer, Boomerang, Flash, Formation, Changing, Crescent, Cloud, Dome, Unknown, Other.

Coverage: 343,602 sightings (55.6%) have a standardized shape.
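
A three-tier matcher of this kind can be sketched as follows. Note the substitution: the pipeline uses rapidfuzz for tier 3, while this example uses difflib from the Python standard library as a stand-in so it runs without extra packages, and the two libraries do not score similarity identically. The canonical list is a subset, the alias table contains only the three mappings quoted above, and the function name is hypothetical.

```python
from difflib import SequenceMatcher

CANONICAL = ["Sphere", "Disc", "Triangle", "Cigar", "Oval", "Circle", "Chevron"]  # subset
ALIASES = {"ovoid": "Oval", "v-shape": "Chevron", "torpedo": "Cigar"}
FUZZY_THRESHOLD = 0.85  # mirrors the 85% similarity cutoff


def normalize_shape(raw):
    """Return a canonical shape name, or None if nothing matches."""
    text = raw.strip().lower()
    if not text:
        return None
    # Tier 1: exact match against the canonical list.
    for shape in CANONICAL:
        if text == shape.lower():
            return shape
    # Tier 2: substring/alias match.
    for alias, shape in ALIASES.items():
        if alias in text:
            return shape
    # Tier 3: fuzzy match (difflib stand-in for rapidfuzz).
    best_shape, best_ratio = None, 0.0
    for shape in CANONICAL:
        ratio = SequenceMatcher(None, text, shape.lower()).ratio()
        if ratio > best_ratio:
            best_shape, best_ratio = shape, ratio
    return best_shape if best_ratio >= FUZZY_THRESHOLD else None
```

The tier order matters: cheap exact and alias lookups resolve almost all values, so the fuzzy pass only sees the small remainder, which is consistent with the 41 fuzzy matches reported above.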

Duration Parsing

Raw duration strings are parsed to seconds using a multi-strategy parser:

  • Natural language: "5 minutes" → 300s, "about 2 hours" → 7200s
  • UFOCAT codes: "B" (brief) → 3s, "M" (medium) → 120s, "H" (hour) → 3600s
  • Bare numbers: "5" → 300s (UFOCAT convention: decimal minutes)
  • Ranges: "5-10 minutes" → 450s (midpoint)

Parsed durations are bucketed: instant (<5s), seconds (<60s), minutes (<1h), hours (<24h), days (≥24h).
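
The parsing strategies and buckets above can be sketched in a few lines. This is a minimal illustration covering only the listed examples. The regular expressions, unit table, and function names are assumptions, and the real parser presumably handles many more phrasings.

```python
import re

UNIT_SECONDS = {"sec": 1, "second": 1, "min": 60, "minute": 60,
                "hr": 3600, "hour": 3600, "day": 86400}
UFOCAT_CODES = {"B": 3, "M": 120, "H": 3600}

NUMBER = r"(\d+(?:\.\d+)?)"
RANGE_RE = re.compile(NUMBER + r"\s*(?:-|to)\s*" + NUMBER + r"\s*([a-z]+)")
SINGLE_RE = re.compile(NUMBER + r"\s*([a-z]+)")
BARE_RE = re.compile(NUMBER)


def unit_seconds(word):
    """Map 'minutes', 'hrs', etc. to seconds; None for unknown units."""
    return UNIT_SECONDS.get(word.rstrip("s"))


def parse_duration(raw):
    """Return the duration in seconds, or None if unparseable."""
    text = raw.strip()
    if text in UFOCAT_CODES:                      # UFOCAT letter codes
        return UFOCAT_CODES[text]
    lowered = text.lower()
    if BARE_RE.fullmatch(lowered):                # bare number: decimal minutes
        return float(lowered) * 60
    match = RANGE_RE.search(lowered)              # range: take the midpoint
    if match and unit_seconds(match.group(3)):
        low, high = float(match.group(1)), float(match.group(2))
        return (low + high) / 2 * unit_seconds(match.group(3))
    match = SINGLE_RE.search(lowered)             # natural language
    if match and unit_seconds(match.group(2)):
        return float(match.group(1)) * unit_seconds(match.group(2))
    return None


def duration_bucket(seconds):
    if seconds < 5:
        return "instant"
    if seconds < 60:
        return "seconds"
    if seconds < 3600:
        return "minutes"
    if seconds < 86400:
        return "hours"
    return "days"
```

For instance, parse_duration("5-10 minutes") yields 450.0 seconds, which duration_bucket places in "minutes".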

Coverage: 232,438 sightings (37.6%) have parsed durations.

Emotion & Sentiment Analysis

Four models classify the emotional content of sighting narratives:

ModelOutputCoverage
GoEmotions 28-class Dominant emotion label + sentiment group (positive/negative/ambiguous/neutral) 506,788
7-class RoBERTa Emotion label (surprise/fear/neutral/anger/disgust/sadness/joy) + full softmax probabilities 506,788
RoBERTa Sentiment Compound sentiment score (-1.0 to +1.0) 506,788
VADER + NRC Lexicon VADER compound score + 10 NRC emotion word counts (joy, fear, anger, sadness, surprise, disgust, trust, anticipation, positive, negative) 506,788

Nuclear Proximity Analysis

Haversine distance is computed from every geocoded sighting to the nearest of 50 nuclear-relevant facilities (military bases, national labs, nuclear test sites, weapons storage areas).

Results: 35,203 sightings within 50 km of a nuclear site. 69,912 within 100 km.
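
For reference, the haversine great-circle distance and a nearest-facility lookup can be written as below. The formula is standard. The mean Earth radius of 6371 km, the function names, and the (name, lat, lon) facility tuples are assumptions for the example, and no real facility coordinates are included.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_facility(lat, lon, facilities):
    """Return (name, distance_km) for the closest facility.

    `facilities` is an iterable of (name, lat, lon) tuples.
    """
    return min(
        ((name, haversine_km(lat, lon, f_lat, f_lon)) for name, f_lat, f_lon in facilities),
        key=lambda pair: pair[1],
    )
```

A sighting would count toward the 50 km figure when the distance returned by nearest_facility is at most 50.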

Supplemented by 14 crash-retrieval cases and 35 nuclear encounter records from the UAP Gerb research bundle.

Movement & Behavior Classification

Regex-based extraction from narrative text identifies 10 movement categories and 14 behavior tags:

  • Movement: hovering, linear, erratic, accelerating, rotating, ascending, descending, vanished, followed, landed
  • Behavior: hovering, silent, bright, pulsing, rotating, zigzag, vanished, accelerated, split, merged, formation, chased, followed, landed

Coverage: 250,820 sightings (40.6%) have at least one movement category.
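
Regex-based extraction of this kind typically maps each category to one compiled pattern and collects every category whose pattern fires. The patterns below are hypothetical examples for six of the ten movement categories. The pipeline's actual expressions are not shown on this page.

```python
import re

# Hypothetical patterns; the real pipeline's regexes may differ.
MOVEMENT_PATTERNS = {
    "hovering": re.compile(r"\bhover(?:ed|ing|s)?\b", re.IGNORECASE),
    "erratic": re.compile(r"\b(?:erratic(?:ally)?|zig-?zag\w*)\b", re.IGNORECASE),
    "ascending": re.compile(r"\b(?:ascend\w*|climb\w*|rose)\b", re.IGNORECASE),
    "descending": re.compile(r"\b(?:descend\w*|dropped)\b", re.IGNORECASE),
    "vanished": re.compile(r"\b(?:vanish\w*|disappear\w*)\b", re.IGNORECASE),
    "landed": re.compile(r"\bland(?:ed|ing)\b", re.IGNORECASE),
}


def movement_categories(narrative):
    """Return the set of movement categories mentioned in the text."""
    return {name for name, pattern in MOVEMENT_PATTERNS.items()
            if pattern.search(narrative)}
```

A narrative can match several categories at once, which is what the quality score's "2+ movement categories" bonus relies on.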

Cross-Source Deduplication

A three-tier engine flags potential duplicates across sources. No records are deleted — duplicates are flagged for review.

Tier | Method | Pairs Flagged
1 | MUFON ↔ NUFORC: date + city + state exact match | ~48K
2 | All remaining cross-source pairs: date + city + state | ~69K
3 | Date-only matches with description fuzzy similarity ≥60% | ~10K

Total: 126,729 duplicate candidate pairs across 618,316 sightings.
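
The three tiers can be expressed as a single pairwise classifier. The sketch below is an interpretation of the table, not the engine itself: the record keys are hypothetical, the similarity measure is difflib's ratio rather than whatever fuzzy scorer the pipeline uses, and a real implementation would group records by date first instead of comparing every pair.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.60  # mirrors the 60% cutoff for tier 3


def duplicate_tier(a, b):
    """Return 1, 2, or 3 for a candidate duplicate pair, else None.

    Records are dicts with 'source', 'date', 'city', 'state', and
    'description' keys (hypothetical field names).
    """
    if a["source"] == b["source"] or a["date"] != b["date"]:
        return None  # only cross-source pairs on the same date qualify
    same_place = (a["city"].lower(), a["state"].lower()) == \
                 (b["city"].lower(), b["state"].lower())
    if same_place:
        if {a["source"], b["source"]} == {"MUFON", "NUFORC"}:
            return 1
        return 2
    text_a = (a.get("description") or "").lower()
    text_b = (b.get("description") or "").lower()
    if not text_a or not text_b:
        return None  # nothing to compare
    ratio = SequenceMatcher(None, text_a, text_b).ratio()
    return 3 if ratio >= SIMILARITY_THRESHOLD else None
```

The function only reports a tier. Consistent with the policy above, nothing is merged or deleted, and the flagged pair is left for review.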

Content Policy & Privacy

The public database (ufo_public.db) contains only derived, non-copyrighted fields. Raw narrative text from NUFORC, MUFON, UFOCAT, UPDB, and UFO-search is stripped during export. Reddit sighting descriptions are LLM-generated summaries (transformative derivative works), not original Reddit posts.

No personally identifiable information (PII) is published. The witness_names column is NULL in the public export.

How to Reproduce This Database

Full instructions: docs/PIPELINE.md in the ufo-dedup repository.
pip install -r requirements-etl.txt
python geocode.py --download              # one-time gazetteer (~30 MB)
python rebuild_db.py                       # 17-step pipeline (~30 min)
python emotions.py                         # GPU emotion classification (~35 min)
python run_enrich.py --limit 378000        # LLM field extraction (~$8, optional)
python run_enrich.py --apply               # apply extractions to DB
python export_public.py                    # clean public export

Cached results (LLM location normalization, LLM field extraction, and GPU emotion classification) are stored as CSV files in data/output/ and replayed automatically on future rebuilds without re-calling APIs or re-running GPU inference.
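
Replaying a cache deterministically amounts to reading the CSV into a lookup table and filling fields from it without calling any model. The sketch below illustrates that idea under loud assumptions: the sighting_id column, the record layout, and the function names are hypothetical, and the fill-only-if-empty rule is one way to keep the step idempotent while preserving original values.

```python
import csv


def load_cache(fileobj):
    """Read a cached-enrichment CSV into {sighting_id: row}.

    Assumes a 'sighting_id' column (hypothetical name).
    """
    return {row["sighting_id"]: row for row in csv.DictReader(fileobj)}


def replay_cache(records, cache, fields):
    """Fill empty fields from the cache; never overwrite existing values.

    Running it twice gives the same result as running it once.
    """
    for record in records:
        cached = cache.get(str(record["id"]))
        if cached is None:
            continue
        for field in fields:
            if record.get(field) in (None, "") and cached.get(field):
                record[field] = cached[field]
    return records
```

Because the output depends only on the records and the CSV contents, two rebuilds from the same inputs produce identical rows.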
