Connect your own AI to the UFOSINT data
Every tool the website's chatbot has access to is also exposed via the Model Context Protocol at a single HTTPS endpoint, so any MCP-compatible AI client can query the unified UFO sightings database with your own model and your own subscription. The endpoint is read-only and free to use.
MCP endpoint
https://ufosint-explorer.azurewebsites.net/mcp
6 tools available: search_sightings, get_sighting, get_stats, get_timeline, find_duplicates_for, count_by.
Claude Code (CLI / Desktop App)
One command to connect from any project directory:
claude mcp add --transport http ufosint https://ufosint-explorer.azurewebsites.net/mcp
Restart your Claude Code session. The 6 UFOSINT tools will be available immediately.
Remove later with claude mcp remove ufosint.
Claude Desktop
Open ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows) and add:
{
  "mcpServers": {
    "ufosint": {
      "url": "https://ufosint-explorer.azurewebsites.net/mcp",
      "transport": "http"
    }
  }
}
Restart Claude Desktop. The 6 UFOSINT tools will appear in the tools panel.
Cursor / Cline / Continue / Windsurf
These all support remote MCP servers. Add the same URL to your client's MCP configuration. Each client documents the exact location, but the JSON shape is the same.
Direct API (curl / Python / any HTTP client)
The endpoint is JSON-RPC 2.0 over HTTPS. List the tools:
curl -s https://ufosint-explorer.azurewebsites.net/mcp \
-H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
Call a tool:
curl -s https://ufosint-explorer.azurewebsites.net/mcp \
-H 'Content-Type: application/json' \
-d '{
  "jsonrpc":"2.0",
  "id":2,
  "method":"tools/call",
  "params":{
    "name":"search_sightings",
    "arguments":{"q":"triangle","state":"CA","limit":5}
  }
}'
OpenAI / OpenRouter function-calling format
If you're integrating with OpenAI or OpenRouter and want the tool definitions in their native format (instead of going through MCP), fetch:
GET https://ufosint-explorer.azurewebsites.net/api/tools-catalog
And invoke individual tools at:
POST https://ufosint-explorer.azurewebsites.net/api/tool/<tool_name>
Content-Type: application/json
{ "q": "triangle", "state": "CA", "limit": 5 }
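For quick experimentation, the two REST endpoints above can be wrapped in a few lines of standard-library Python. This is an illustrative sketch; the helper names `build_tool_request` and `call_tool` are ours, and the response shape is whatever the server returns.

```python
import json
import urllib.request

BASE = "https://ufosint-explorer.azurewebsites.net"

def build_tool_request(tool_name: str, arguments: dict) -> urllib.request.Request:
    """Build a POST request for the per-tool REST endpoint (/api/tool/<tool_name>)."""
    return urllib.request.Request(
        f"{BASE}/api/tool/{tool_name}",
        data=json.dumps(arguments).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def call_tool(tool_name: str, arguments: dict):
    """Invoke a tool and decode the JSON response."""
    with urllib.request.urlopen(build_tool_request(tool_name, arguments)) as resp:
        return json.load(resp)
```

Usage would look like `call_tool("search_sightings", {"q": "triangle", "state": "CA", "limit": 5})`; the tool catalog itself is a plain GET to `/api/tools-catalog`.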
Download the database (SQLite)
Want to run your own analysis, train models, or hack on the data offline? The full 508 MB SQLite snapshot is attached to every tagged release on GitHub — 614,505 deduplicated sightings, 502,985 with emotion analysis, and all derived columns.
curl -LO https://github.com/UFOSINT/ufosint-explorer/releases/latest/download/ufo_public.db
sqlite3 ufo_public.db "SELECT COUNT(*) FROM sighting;"
# 614505
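Once downloaded, the snapshot is an ordinary SQLite file, so it can be queried from Python directly. A minimal sketch, assuming the `sighting` table and `shape` column described in the schema section (the helper name `top_shapes` is ours):

```python
import sqlite3

def top_shapes(conn: sqlite3.Connection, n: int = 5):
    """Most common reported shapes, NULLs excluded, from the `sighting` table."""
    return conn.execute(
        "SELECT shape, COUNT(*) AS cnt FROM sighting "
        "WHERE shape IS NOT NULL GROUP BY shape ORDER BY cnt DESC LIMIT ?",
        (n,),
    ).fetchall()
```

For example, `top_shapes(sqlite3.connect("ufo_public.db"))` returns the five most frequent shape labels with their counts.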
See the Methodology tab for the full schema, derived-column definitions, and per-source licensing. Browse the releases page for older versions.
AI Discovery
This site exposes standard AI-readiness files so agents and LLMs can discover and understand the UFOSINT tools automatically:
- /llms.txt — Lightweight index of the site, tools, and data for LLMs
- /llms-full.txt — Full tool schemas and documentation in one file
- /.well-known/mcp.json — MCP server discovery manifest
- /robots.txt — All AI crawlers allowed
Local stdio MCP server
Prefer to run an MCP server on your own machine? Clone the repo and use mcp_server.py:
git clone https://github.com/UFOSINT/ufosint-explorer
cd ufosint-explorer
pip install fastmcp psycopg[binary]
DATABASE_URL="postgresql://..." python mcp_server.py
Then point Claude Desktop at the local script via the command form of the MCP config. (You'll need read-only credentials to a PostgreSQL database with the UFOSINT schema.)
All access is read-only. Source data is licensed by UFOSINT; the deduplicated database is built by the ufo-dedup pipeline.
Unified UFO Sightings Database — Methodology
This is not raw data. UFOSINT Explorer presents a processed scientific analysis of five major UFO/UAP databases — 614,505 sighting records deduplicated, cross-referenced, quality-scored, movement-classified, and emotion-analyzed using four sentiment and emotion models (three transformer-based plus the rule-based VADER). Every step of the pipeline is documented below and can be independently replicated from the source data using the open-source ufo-dedup pipeline. The web application source code is at ufosint-explorer.
Source Databases
| Source | Raw Records | Imported | Skipped | Description |
|---|---|---|---|---|
| UFOCAT | 320,412 | 197,108 | 123,304 | CUFOS UFOCAT 2023 catalog. Richest metadata: Hynek/Vallee classifications, lat/lon, witness counts, durations. 123K NUFORC-origin records (SOURCE=UFOReportCtr) skipped; metadata transferred via enrichment. |
| NUFORC | 159,320 | 159,320 | 0 | National UFO Reporting Center. Self-reported sightings with detailed free-text descriptions. Enriched post-import with 102K Hynek and 83K Vallee classifications from UFOCAT. |
| MUFON | 138,310 | 138,310 | 0 | Mutual UFO Network case reports. Short + long descriptions, investigator summaries. |
| UPDB | 1,885,757 | 65,016 | 1,820,741 | Unified Phenomena Database (phenomenAInon). 1.82M rows skipped (MUFON/NUFORC already imported from richer originals). Remaining 65K from UFODNA (38K), Blue Book (14K), NICAP (5.8K), etc. |
| UFO-search | 54,751 | 54,751 | 0 | Majestic Timeline compilation from ufo-search.com. Historical records from 19 source compilations (Hatch, Eberhart, NICAP, Vallee, etc.). |
Total raw records across all sources: ~2.56 million. After removing known overlaps at import time: 614,505.
UFOCAT Sub-Source Landscape
UFOCAT is itself an aggregator. Its SOURCE column identifies where each record originated:
| UFOCAT SOURCE | Records | Overlap With |
|---|---|---|
| UFOReportCtr | 123,304 | NUFORC (skipped, enriched) |
| U (Hatch) | 17,184 | UFO-search Hatch (18K) |
| BlueBook1 | 13,101 | UPDB Blue Book (14K) |
| GEberhart1 | 11,643 | UFO-search Eberhart (7.9K) |
| CanadUFOSurv | 10,785 | — |
| NICAP | 2,315 | UPDB NICAP (5.8K), UFO-search NICAP (5.5K) |
| MUFONJournal + MUFON* | 2,861 | MUFON |
Only UFOReportCtr is skipped at import time. Other overlaps are handled by the deduplication engine.
Import Methodology
Each source has a custom import script. Two aggregator sources skip known-duplicate sub-sources at import time:
- UFOCAT skips `SOURCE=UFOReportCtr` (123K NUFORC-origin records)
- UPDB skips `name=MUFON` and `name=NUFORC` (1.82M records)
Source-Specific Handling
- UFOCAT — 55-column CSV with split date fields (YEAR, MO, DAY, TIME). City stored in ALL CAPS in `raw_text`; copied to `city` post-import. Longitude negated for US/CA. `UFOReportCtr` records saved to enrichment sidecar.
- NUFORC — Multi-line CSV with quoted descriptions. Dates: `1995-02-02 23:00 Local`. Locations: `City, ST, Country`.
- MUFON — 7-column CSV with embedded `\n` in dates. Locations with escaped commas: `Newscandia\, MN\, US`.
- UPDB — 1.9M rows; `name` column identifies sub-source. 1,820,741 MUFON/NUFORC rows skipped. Remaining 65,016 mapped to `source_origin` entries.
- UFO-search — JSON array of 54,751 records from 19 historical compilations. Variable date formats ("Summer 1947", "4/34", "6/24/1947"). Regex-based date parser; free-text location parsing.
Data Quality Fixes
Applied automatically by rebuild_db.py in the apply_data_fixes() pipeline. All fixes are idempotent and preserve original values in date_event_raw and raw_json columns.
Location Fixes
- UFOCAT longitude sign — 30,822 Western Hemisphere locations had positive longitude; negated for US/CA records
- UFOCAT city field — 73,766 locations had city only in `raw_text`; copied to `city` column
- Country code normalization — USA→US, United Kingdom→GB, Canada→CA, Australia→AU
Date Fixes
- MUFON literal `\n` in dates — 136,654 MUFON records contained a literal backslash-n (0x5C6E) in `date_event` (e.g., `2020-01-15\n3:00PM`). Time portion extracted to `time_raw`, date truncated to ISO date
- Year 0000 — Records with year `0000` have `date_event` set to NULL
- Negative years — Records with `date_event` starting with `-` (e.g., `-009-02-10`) have `date_event` set to NULL
- Month 00 — 551 records with `YYYY-00-DD` truncated to `YYYY`
- Day 00 — 3,391 records with `YYYY-MM-00` truncated to `YYYY-MM`
- Impossible calendar dates — 14 records with Feb 30+, Apr/Jun/Sep/Nov 31 truncated to `YYYY-MM`
- UFOCAT century-only `19//` — 692 records with 2-digit raw year `19` (meaning "19xx, year unknown") had `date_event` set to NULL. Audit logged in `date_correction` table
- UFOCAT H-BOMB TEST `195//` — 1 record with 3-digit year 195 and city "H-BOMB TEST" (clearly 1950s) set to NULL
- NUFORC data entry errors — 2 records with wrong century corrected: `0205`→`2005` (Falmouth), `1721`→`2021` (Crescent City, Starlink-era sighting)
- UPDB mangled years — 19 records with broken years from upstream UPDB export: 1 corrected (`0196`→`1962`, confirmed by description), 18 set to NULL (century-round years `0200`–`0900` and unconfirmed modern dates)
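As a rough illustration of the date fixes above (not the actual `rebuild_db.py` code), the MUFON literal backslash-n split and the month-00/day-00 truncations might look like:

```python
import re

def fix_mufon_date(date_event: str):
    """Split a MUFON date like '2020-01-15\\n3:00PM' (literal backslash-n)
    into (iso_date, time_raw). Illustrative of the fix described above."""
    if "\\n" in date_event:
        date_part, _, time_part = date_event.partition("\\n")
        return date_part.strip(), time_part.strip() or None
    return date_event, None

def truncate_invalid(date_event: str) -> str:
    """Truncate YYYY-00-DD to YYYY and YYYY-MM-00 to YYYY-MM."""
    m = re.match(r"^(\d{4})-(\d{2})-(\d{2})", date_event)
    if not m:
        return date_event
    y, mo, d = m.groups()
    if mo == "00":
        return y          # month unknown -> keep year only
    if d == "00":
        return f"{y}-{mo}"  # day unknown -> keep year-month
    return date_event
```

Both helpers are idempotent, matching the pipeline's requirement that fixes can be re-applied on every rebuild.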
Shape / Classification Normalization
- Shape case normalization — 24 case-duplicate groups collapsed via title-case (e.g., `circle`→`Circle`), including hyphenated shapes (e.g., `cigar-shaped`→`Cigar-Shaped`). 352→317 distinct values
- Shape typo correction — 9 misspellings corrected (e.g., `Triangel`→`Triangle`, `Rectagle`→`Rectangle`)
- Junk shapes removed — 3 non-shape values set to NULL (`WITNESS`, `0`, `12:45`)
- Hynek uppercase — 3 case-duplicate Hynek codes normalized (e.g., `nl`→`NL`). 43→40 distinct values
- Vallee uppercase — 2 case-duplicate Vallee codes normalized (e.g., `fb1`→`FB1`). 43→41 distinct values
Description Cleanup
- `[MISSING DATA]` removal — Records with description consisting solely of `[MISSING DATA]` or `[missing data]` set to NULL
- MUFON boilerplate — `Submitted by razor via e-mail` and `Investigator Notes:` boilerplate stripped from descriptions
Historic Date Analysis
The database contains 8,046 sighting records with dates before 1901, spanning from 34 AD (a white round object over China) to 1900. Most are legitimate historical sightings from academic catalogs, but several categories of date errors were identified through systematic analysis.
Extraction & Analysis Method
All pre-1901 records were extracted into a standalone analysis database (temp/historic_pre1901.db) using extract_historic.py. Each record was auto-classified based on its source, raw date format, and year digit count, then flagged for manual review where ambiguous.
Categories Identified
| Category | Source | Records | Status | Description |
|---|---|---|---|---|
| ufocat_ancient | UFOCAT | 4,436 | OK | 4-digit raw years (1001–1900). Legitimately pre-modern sightings. No action needed. |
| ufocat_century_only | UFOCAT | 692 | Fixed | 2-digit raw year 19// = “sometime in the 1900s, year unknown.” ETL zero-padded to 0019. Descriptions confirm modern events (abductions, radar, motion pictures). Resolution: date_event set to NULL (year genuinely unknown). |
| ufocat_3digit_review | UFOCAT | 88 | Fixed | 3-digit raw years (034–999). Mostly legitimate ancient dates. 1 confirmed modern mislabel corrected: 195// “H-BOMB TEST” (1950s) → NULL. 4 ambiguous 188// records left as-is (no descriptions to disambiguate). Remaining 83 are legitimate ancient sightings. |
| other_source_review | UFO-search | 1,984 | OK | Geldreich Majestic Timeline. Historical records from 61 AD to 1900. All appear legitimate. |
| other_source_review | UPDB | 780 | Fixed | ~760 legitimate (1000–1900). 19 records had mangled modern years from upstream data errors. Resolution: 1 corrected (0196→1962, confirmed by description), 18 set to NULL (century-round years 0200–0900 and unconfirmed modern dates). |
| other_source_review | MUFON | 40 | OK | All 1890–1900. Appear legitimate. |
| other_source_review | NUFORC | 26 | Fixed | ~23 legitimate historic reports. 2 data entry errors corrected: 0205→2005 (Falmouth), 1721→2021 (Crescent City). 1 ambiguous record (1071) left as-is (could be 1971 or 2007). |
Root Cause: UFOCAT Variable-Length Year Field
UFOCAT stores dates in separate YEAR, MO, DAY columns. The YEAR field uses variable-length encoding:
- 4 digits (196,295 records): Standard years like `1966`, `2001`
- 3 digits (90 records): Ancient years like `034` (34 AD), `776`, `919`
- 2 digits (715 records): Century-only indicator `19` = "19xx" (20th century, unknown year)
The ETL’s parse_ufocat_date() zero-pads all years to 4 digits (f"{y:04d}"), which correctly handles 3-digit ancient years but misinterprets 2-digit 19 as year 19 AD instead of “19xx.”
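A minimal sketch of the bug and a guarded variant (`pad_year` and `parse_ufocat_year` are illustrative names, not the pipeline's actual functions):

```python
def pad_year(y: int) -> str:
    """The ETL's zero-pad: 34 -> '0034' (correct for ancient years),
    but 19 -> '0019' (wrong: UFOCAT's 2-digit 19 means '19xx, unknown')."""
    return f"{y:04d}"

def parse_ufocat_year(raw_year: str):
    """Guarded variant: treat the 2-digit century-only marker as unknown."""
    if len(raw_year) == 2:           # century-only indicator, e.g. '19'
        return None                  # year genuinely unknown -> NULL
    return f"{int(raw_year):04d}"    # 3- and 4-digit years pad correctly
```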
Applied Resolution
After manual review of the annotated analysis dataset, 714 records were corrected or nulled via Fixes 15–18 in rebuild_db.py. Every correction is logged in the date_correction audit table with the original date, corrected date, correction type, and reason. See GitHub issue #1.
| Fix | Source | Action | Count |
|---|---|---|---|
| Fix 15 | UFOCAT | Century-only 19// → NULL (year unknown) | 692 |
| Fix 16 | UFOCAT | H-BOMB TEST 195// → NULL (1950s, not 195 AD) | 1 |
| Fix 17 | NUFORC | Data entry errors corrected (0205→2005, 1721→2021) | 2 |
| Fix 18 | UPDB | Mangled years corrected/nulled (0196→1962, rest → NULL) | 19 |
Conservative approach: only records with clear evidence were corrected. Ambiguous records (188// in UFOCAT, 1071 in NUFORC) were left unchanged. All fixes are idempotent and re-applied on each database rebuild.
Deduplication Methodology
Deduplication uses a two-phase strategy: known overlaps are eliminated at import time, then a three-tier matching engine flags remaining cross-source duplicates for review. No records are deleted — all 614,505 sightings remain, with 126,730 candidate pairs stored in the duplicate_candidate table.
Phase 1: Import-Time Filtering
Before deduplication runs, two aggregator sources skip sub-sources that would create known duplicates with higher-quality originals already imported:
| Source | Sub-Source Skipped | Records Skipped | Reason |
|---|---|---|---|
| UFOCAT | SOURCE=UFOReportCtr | 123,304 | Copies of NUFORC sightings |
| UPDB | name=MUFON | 131,506 | MUFON imported directly with richer descriptions |
| UPDB | name=NUFORC | 1,689,235 | NUFORC imported directly with richer descriptions |
This eliminates 1,944,045 known duplicates before dedup begins, reducing the working set from ~2.56M to 614,505. The UFOCAT skip triggers enrichment to preserve valuable Hynek/Vallee metadata.
Other overlapping sub-sources (e.g. UFOCAT's Hatch records vs UFO-search's Hatch records) are kept and handled by the dedup engine, since both copies may carry unique metadata.
Phase 1.5: Metadata Enrichment
UFOCAT's 123K skipped UFOReportCtr records carry Hynek and Vallee classifications that NUFORC natively lacks. Rather than lose this data, import_ufocat.py writes skipped records to a sidecar file (ufocat_enrichment.jsonl), and enrich.py transfers the metadata to matching NUFORC sightings post-import.
Matching: Date (YYYY-MM-DD) + normalized UPPER(city) + UPPER(state). City normalization strips parenthetical qualifiers, trailing punctuation, and collapses whitespace.
Transfer rules: Only fills NULL fields — never overwrites existing NUFORC values.
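A sketch of that match-key construction, assuming only the normalization rules stated above (the function names are ours, not `enrich.py`'s):

```python
import re

def normalize_city(city: str) -> str:
    """Strip parenthetical qualifiers and trailing punctuation,
    collapse whitespace, uppercase."""
    city = re.sub(r"\([^)]*\)", "", city)          # drop "(near ...)" qualifiers
    city = re.sub(r"[.,;:]+$", "", city.strip())   # trailing punctuation
    city = re.sub(r"\s+", " ", city)               # collapse whitespace
    return city.strip().upper()

def match_key(date_event: str, city: str, state: str) -> tuple:
    """Enrichment join key: ISO date + normalized city + uppercased state."""
    return (date_event, normalize_city(city), state.upper())
```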
| Field | NUFORC Records Enriched |
|---|---|
| Hynek classification | 102,554 |
| Vallee classification | 83,710 |
| Shape | 1,697 |
| Unmatched (no NUFORC hit) | 19,637 |
Phase 2: Three-Tier Cross-Source Matching
After all imports and enrichment, the dedup engine (dedup.py) compares records across different sources using progressively broader matching strategies. Each tier builds on the previous, skipping pairs already flagged.
Tier 1: MUFON ↔ NUFORC (7,694 pairs)
The highest-overlap pair. Both sources cover modern US sightings with reliable date/location data.
- Match key: Exact date (YYYY-MM-DD) + UPPER(city) + UPPER(state)
- Scoring: Full description similarity with source-specific preprocessing
- Result: 7,694 candidate pairs
Tier 2: All Remaining Cross-Source Pairs (101,879 pairs)
Four sub-tiers cover every remaining source combination, using the match key best suited to each source's location data quality:
| Sub-tier | Sources | Match Key | Why This Key | Pairs |
|---|---|---|---|---|
| 2a | MUFON ↔ UFOCAT | date + city + state | Both have structured state fields | 2,295 |
| 2b | NUFORC ↔ UFOCAT | date + city + state | Both have structured state fields | 4,148 |
| 2c | UPDB ↔ MUFON/NUFORC/UFOCAT | date + city (no state) | UPDB has inconsistent state data | 63,459 |
| 2d | UFO-search ↔ MUFON/NUFORC/UFOCAT | date + city + state | UFO-search locations parsed via regex | 31,977 |
Source-specific notes:
- UFOCAT cities are stored in `raw_text` (ALL CAPS), not `city` — the loader reads `raw_text` instead
- UFO-search locations are free-text strings parsed by regex to extract `(city, state)` pairs; only locations matching the `City, ST` pattern with a valid US/Canadian state code are matchable
- UPDB sub-tier (2c) filters to US records only (`country='US'`) to reduce false positives from city-only matching
- All candidate pairs are normalized so `sighting_id_a < sighting_id_b` to enforce the UNIQUE constraint
Tier 3: Description Fuzzy Matching (17,157 pairs)
Catches duplicates that Tiers 1–2 miss due to location data differences (misspellings, missing state, different geocoding).
- Match key: Date only (no location requirement)
- Scope: Only dates with records from 2+ sources AND ≤20 total records on that date
- Skip: Pairs already found in Tiers 1–2 are excluded
- Two-stage filtering:
- Token Jaccard > 0.25 — Fast set-intersection filter on lowercased word tokens
- SequenceMatcher ≥ 0.5 — Python's `difflib.SequenceMatcher` on the first 1,000 characters
- Result: 17,157 candidates from cross-source pairs sharing a date but not caught by location matching
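The two-stage filter above can be sketched as follows; the thresholds come from the list, while the function name is ours:

```python
from difflib import SequenceMatcher

def is_tier3_candidate(desc_a: str, desc_b: str) -> bool:
    """Two-stage Tier-3 filter: a cheap token-Jaccard gate, then
    difflib.SequenceMatcher alignment on the first 1,000 characters."""
    tok_a, tok_b = set(desc_a.lower().split()), set(desc_b.lower().split())
    if not tok_a or not tok_b:
        return False
    jaccard = len(tok_a & tok_b) / len(tok_a | tok_b)
    if jaccard <= 0.25:                    # stage 1: fast set-intersection filter
        return False
    ratio = SequenceMatcher(None, desc_a[:1000], desc_b[:1000]).ratio()
    return ratio >= 0.5                    # stage 2: sequence alignment
```

The cheap Jaccard gate means the expensive alignment only runs on pairs that already share a quarter of their vocabulary.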
Similarity Scoring
Every candidate pair receives a similarity score (0.0–1.0) computed by compute_similarity():
- Source-specific preprocessing:
  - NUFORC: Strips `NUFORC UFO Sighting NNNNN` prefix
  - MUFON: Strips `Submitted by razor via e-mail` boilerplate, extracts investigator notes
- "Starts with" shortcut: If both descriptions share the same first N characters (N ≥ 20), score = 0.95
- Token Jaccard pre-filter: If token Jaccard < 0.03, return that score immediately
- Full alignment: `difflib.SequenceMatcher` on first 1,000 characters of each description
Pairs with no description on either side receive score = 0.0 (still flagged as candidates based on location matching).
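A simplified sketch of the scoring cascade (the real `compute_similarity()` also applies the source-specific preprocessing, which is omitted here):

```python
from difflib import SequenceMatcher

def compute_similarity(desc_a: str, desc_b: str) -> float:
    """Shortcut, Jaccard pre-filter, then full alignment, per the steps above."""
    if not desc_a or not desc_b:
        return 0.0                                # no description on a side
    if len(desc_a) >= 20 and len(desc_b) >= 20 and desc_a[:20] == desc_b[:20]:
        return 0.95                               # "starts with" shortcut
    tok_a, tok_b = set(desc_a.lower().split()), set(desc_b.lower().split())
    jaccard = len(tok_a & tok_b) / max(len(tok_a | tok_b), 1)
    if jaccard < 0.03:
        return jaccard                            # pre-filter: bail out early
    return SequenceMatcher(None, desc_a[:1000], desc_b[:1000]).ratio()
```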
Results
126,730 duplicate candidate pairs across 127,440 unique sightings (20.7% of all records).
| Confidence | Score Range | Pairs | Interpretation |
|---|---|---|---|
| Certain | 0.9 – 1.0 | 14,260 | Near-identical descriptions; safe to auto-merge |
| Likely | 0.7 – 0.9 | 9,567 | Strong match; minor wording differences |
| Possible | 0.5 – 0.7 | 13,303 | Same event reported differently across sources |
| Weak | 0.3 – 0.5 | 11,144 | Same date+location, descriptions partially overlap; needs review |
| Unlikely | 0.0 – 0.3 | 78,456 | Same date+location but likely different events |
By Match Method
| Method | Pairs | Avg Score |
|---|---|---|
| tier2c_updb_ufocat | 59,620 | 0.225 |
| tier2d_ufosearch_ufocat | 31,439 | 0.240 |
| tier3_desc_fuzzy | 17,157 | 0.768 |
| tier1a_mufon_nuforc | 7,694 | 0.226 |
| tier2b_nuforc_ufocat | 4,148 | 0.129 |
| tier2c_updb_nuforc | 3,519 | 0.234 |
| tier2a_mufon_ufocat | 2,295 | 0.072 |
| tier2d_ufosearch_nuforc | 397 | 0.044 |
| tier2c_updb_mufon | 320 | 0.012 |
| tier2d_ufosearch_mufon | 141 | 0.009 |
Note: The previous build flagged 242K candidates. The current build flags only 126K because the 123K UFOCAT-NUFORC duplicates (UFOReportCtr) are now prevented at import time rather than flagged after the fact.
What Dedup Does NOT Do
- No records are deleted or merged. The `duplicate_candidate` table is advisory. All 614,505 sightings remain queryable.
- No within-source dedup. The engine only flags cross-source pairs (different `source_db_id`). Duplicates within a single source are not flagged.
- No transitive closure. If A↔B and B↔C are both flagged, A↔C is NOT automatically inferred. Each pair is independent.
- Multiple witnesses are preserved. If the same event has genuinely separate witness reports in different sources, both records remain. The similarity score distinguishes true duplicates (high score) from independent reports of the same event (low score).
Database Schema
sighting (614,505 rows, 42 columns)
The main table. Each row is one reported sighting event.
| Category | Fields |
|---|---|
| Provenance | source_db_id, source_record_id, origin_id, origin_record_id |
| Dates | date_event (ISO 8601), date_event_raw, date_end, time_raw, timezone, date_reported, date_posted |
| Location | location_id (FK to location table) |
| Description | summary, description |
| Observation | shape, color, size_estimated, angular_size, distance, duration, duration_seconds, num_objects, num_witnesses, sound, direction, elevation_angle, viewed_from |
| Witness | witness_age, witness_sex, witness_names |
| Classification | hynek, vallee, event_type, svp_rating |
| Resolution | explanation, characteristics |
| Context | weather, terrain, source_ref, page_volume, notes |
| Preservation | raw_json — complete original record as JSON |
Supporting Tables
- `location` — Deduplicated locations with `raw_text`, `city`, `county`, `state`, `country`, `region`, `latitude`, `longitude`
- `source_collection` (3 rows) — Top-level provenance grouping:
  - PUBLIUS — Compiled by Publius from original reporting sites and PhenomAInon downloads (MUFON, NUFORC, UPDB)
  - GELDREICH — Rich Geldreich's Majestic Timeline compilation from 19+ historical sources (UFO-search)
  - UFOCAT — CUFOS UFOCAT catalog, independent academic dataset
- `source_database` (5 rows) — UFOCAT, NUFORC, MUFON, UPDB, UFO-search. Each linked to a collection via `collection_id`
- `source_origin` (31 rows) — Upstream sources within aggregator databases (Blue Book, NICAP, Hatch, etc.)
- `duplicate_candidate` (126,730 rows) — Flagged duplicate pairs with similarity scores
Reproducible Build Pipeline
The entire database can be rebuilt from source files with a single command:
python rebuild_db.py
This runs the full pipeline in order:
1. Create schema (`create_schema.py`)
2. Import all 5 sources (UFOCAT with enrichment sidecar, NUFORC, MUFON, UPDB, UFO-search)
3. Apply data quality fixes — 14 fix categories covering dates, locations, shapes, classifications, and descriptions
4. Geocode locations using GeoNames gazetteer (`geocode.py`)
5. Run enrichment (`enrich.py`)
6. Run three-tier deduplication (`dedup.py`)
7. Copy database to explorer (`ufo-explorer/ufo_unified.db`)
Total build time: ~2 minutes.
Test Suite
The pipeline is validated by 364 automated tests (pytest tests/):
- 115 dedup tests (`test_dedup.py`) — All dedup functions, tier logic, similarity scoring, and edge cases
- 114 ETL tests (`test_etl.py`) — Schema creation, all 5 importers, data fix pipeline
- 135 data quality tests (`test_data_quality.py`) — Shape normalization, date validation, classification cleanup, description fixes, historic date corrections, and audit trail
All tests use an in-memory SQLite database with synthetic data — no production data required.
Geocoding
Only UFOCAT provides latitude/longitude coordinates natively. The other four sources have text-only locations (city, state, country). To enable map visualization for all sources, locations are geocoded using the GeoNames cities15000 gazetteer (33,000+ cities with population > 15,000).
Matching strategy (decreasing specificity):
- Exact: UPPER(city) + state + country → highest confidence
- City + country: No state available (e.g., UPDB) → picks largest matching city by population
- City only: No country available (e.g., UFO-search raw text) → picks largest city globally
- Raw text parsing: Regex extraction of city/state/country from free-text location strings
The geocode_src column on the location table tracks provenance: NULL = coordinates from original source data, geonames_exact / geonames_city_country / geonames_city_only = geocoded via GeoNames.
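The cascade can be sketched as follows. The `gazetteer` record shape is a hypothetical in-memory stand-in for the parsed cities15000 rows, and the function is illustrative rather than `geocode.py`'s actual code:

```python
def geocode(city, state, country, gazetteer):
    """Decreasing-specificity lookup returning (lat, lon, geocode_src)."""
    if not city:
        return None
    rows = [r for r in gazetteer if r["city"].upper() == city.upper()]
    # 1. Exact: city + state + country (highest confidence)
    for r in rows:
        if state and country and r["state"] == state and r["country"] == country:
            return (r["lat"], r["lon"], "geonames_exact")
    # 2. City + country: pick the largest matching city by population
    in_country = [r for r in rows if country and r["country"] == country]
    if in_country:
        best = max(in_country, key=lambda r: r["population"])
        return (best["lat"], best["lon"], "geonames_city_country")
    # 3. City only: pick the largest city globally with that name
    if rows:
        best = max(rows, key=lambda r: r["population"])
        return (best["lat"], best["lon"], "geonames_city_only")
    return None
```

Returning the tier name alongside the coordinates is what lets the `geocode_src` column record provenance for every geocoded row.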
Source Collection Provenance
Each source database belongs to a collection — a top-level grouping that identifies who aggregated or curated the data:
| Collection | Source(s) | Records | Description |
|---|---|---|---|
| PUBLIUS | MUFON, NUFORC, UPDB | 362,646 | Compiled by Publius from original reporting sites (MUFON, NUFORC) and PhenomAInon downloads (UPDB sub-sources: UFODNA, Blue Book, NICAP, UKTNA, CANADAGOV, NIDS, BRAZILGOV, SKINWALKER, PILOTS, BAASS) |
| UFOCAT | UFOCAT | 197,108 | CUFOS academic catalog (2023 release) |
| GELDREICH | UFO-search | 54,751 | Rich Geldreich's Majestic Timeline from 19+ historical compilations |
Collections are filterable in the explorer UI. The three-layer provenance model (source_collection → source_database → source_origin) traces every record back to its ultimate origin.
How Sightings Get Mapped
The Observatory map renders 396,158 markers out of the 614,505 total sightings — roughly 64.5% of the database. The other ~218k sightings exist in the DB but have no coordinates, so they never reach the map.
The sighting ↔ location split
The sighting table and the location table are joined through a foreign key: sighting.location_id → location.id. Multiple sightings can share a single location row — for example, Phoenix, AZ is one row in location, but hundreds of Phoenix Lights sightings all point at it through their location_id. This is the key to understanding why two different numbers show up in the UI:
| Query | Count | Meaning |
|---|---|---|
| `COUNT(*) FROM sighting` | 614,505 | Total sightings in the database |
| `COUNT(*) FROM sighting s JOIN location l ON s.location_id = l.id WHERE l.latitude IS NOT NULL` | 396,158 | Sightings on the map — the "mapped" chip in the stats badge |
| `COUNT(*) FROM location WHERE latitude IS NOT NULL` | 105,854 | Distinct geocoded places (one row per unique coordinate pair) |
The ratio 396,158 / 105,854 ≈ 3.74 sightings per place reflects how skewed the distribution is: a few hundred hotspot cities accumulate dozens to hundreds of sightings each, while the bulk of the location table is rural or historical one-offs.
Why ~35% of sightings have no coordinates
The 218,347 unmapped sightings fall into four rough buckets:
- Pre-GPS historical records — UFOCAT and UFO-search contain thousands of entries dating back to antiquity. The original catalogs often record only a country or region ("Ohio, 1952"), which the ETL can't resolve to a single coordinate pair.
- Free-text locations — "my backyard", "en route to LAX", "over the Bermuda Triangle", "somewhere in the Pacific". The geocoder skips these rather than risk a wrong coordinate.
- Ambiguous city names — "Springfield" without a state qualifier hits dozens of candidates. The ETL prefers silence over a wrong guess.
- Structurally missing data — some source records legitimately have no location field at all (the original witness report didn't include one).
Original coordinates vs GeoNames lookup
The location table's geocode_src column distinguishes two provenance paths:
- Original coordinates (`geocode_src IS NULL`) — latitude/longitude came directly from the source catalog. UFOCAT ships these for most records; MUFON and NUFORC ship them when the witness submitted a specific address.
- GeoNames lookup (`geocode_src = 'geonames_*'`) — the ETL resolved a city/state/country string against the open GeoNames gazetteer. Three granularity levels are tracked: `geonames_exact` (city + state + country matched), `geonames_city_country` (state-level ambiguity resolved by picking the largest city in the country), and `geonames_city_only` (no country at all; picks the globally largest city with that name).
The Observatory map treats both sources identically — once a row has a valid latitude/longitude, it's a marker. The geocoded_original and geocoded_geonames counts in the stats popover let you see the split.
Movement + Quality Classification
v0.8.3b added a set of derived columns on the sighting table that enrich each record with structured analysis extracted from its narrative. These are the columns the Observatory rail filters (Data Quality) and the Timeline/Insights dashboards (Quality Score Distribution, Movement Taxonomy, etc.) read from.
Movement categories (movement_categories, has_movement_mentioned)
A narrative text classifier scans each description for references to 10 movement categories. The classifier emits a JSON array of the categories it found, plus a boolean flag indicating any movement at all. 249,217 sightings (40.5% of the database) carry at least one movement tag.
| Category | Sightings | Example narrative patterns |
|---|---|---|
| vanished | 102,178 | "disappeared", "faded out", "winked out", "vanished into thin air" |
| hovering | 89,964 | "stationary", "stayed in place for several minutes", "hovered silently" |
| followed | 51,499 | "followed the car", "paced the aircraft", "tracked us for miles" |
| descending | 27,148 | "came down", "dropped altitude", "descended toward the field" |
| ascending | 27,067 | "rose straight up", "shot skyward", "climbed at an impossible rate" |
| landed | 26,592 | "touched down", "on the ground", "set down in the clearing" |
| accelerating | 22,627 | "suddenly sped up", "took off at incredible speed", "bolted" |
| linear | 21,106 | "flew straight", "on a direct heading", "steady course" |
| rotating | 19,641 | "spinning", "rotating counter-clockwise", "wobbling as it turned" |
| erratic | 8,877 | "zig-zagging", "erratic motion", "changed direction abruptly" |
Categories are not mutually exclusive — a single sighting can carry multiple tags (e.g. "hovered briefly, then accelerated away" gets both hovering and accelerating). The Observatory's Movement cluster uses OR semantics: a sighting matches if any of the checked categories' bits are set.
Under the hood, the category set is bit-packed into a uint16 on the binary wire format so the client can filter ~396k rows in a few milliseconds without a server round-trip.
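A sketch of that bit-packing and OR-filter logic. The bit order here is an assumption; only the fact that the 10 categories fit a uint16 comes from the text above:

```python
# Bit assignments are illustrative, not the actual wire-format order.
CATEGORIES = ["vanished", "hovering", "followed", "descending", "ascending",
              "landed", "accelerating", "linear", "rotating", "erratic"]
BIT = {name: 1 << i for i, name in enumerate(CATEGORIES)}

def pack(categories) -> int:
    """Pack a sighting's movement tags into a uint16-sized bitmask."""
    mask = 0
    for name in categories:
        mask |= BIT[name]
    return mask

def matches(sighting_mask: int, checked) -> bool:
    """OR semantics: the sighting matches if any checked category bit is set."""
    return (sighting_mask & pack(checked)) != 0
```

Filtering then reduces to one AND per row, which is why the client can scan ~396k rows in milliseconds.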
Quality score (quality_score)
A composite 0–100 integer derived from the richness of structured metadata on each row. Higher means more data you can cross-reference, not necessarily "more credible". The rebalanced v0.8.3b formula weights:
- Date precision (0–25 points) — full ISO date beats year-only beats decade-only
- Location specificity (0–25 points) — city + state + country beats country-only beats region
- Shape classification (0–15 points) — populated `standardized_shape` (one of the 25 canonical shapes) beats raw shape text beats NULL
- Witness count (0–15 points) — multi-witness reports score higher than single-witness
- Source reliability (0–10 points) — investigator-vetted sources (MUFON) score higher than self-reported (NUFORC)
- Narrative presence (0–10 points) — `has_description = 1` adds 10; media attachments add another bump
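As a rough sketch of how such a composite could be computed: the per-bucket point values inside each dimension are assumptions, and the media-attachment bump is omitted; only the maxima (25/25/15/15/10/10) come from the list above.

```python
def quality_score(rec: dict) -> int:
    """Illustrative composite, capped at 100."""
    score = 0
    # Date precision: full ISO > year-only > decade-only (assumed sub-weights)
    score += {"full": 25, "year": 10, "decade": 5}.get(rec.get("date_precision"), 0)
    # Location specificity: city > country > region (assumed sub-weights)
    score += {"city": 25, "country": 10, "region": 5}.get(rec.get("loc_precision"), 0)
    # Shape: canonical standardized_shape > raw shape text > NULL
    if rec.get("standardized_shape"):
        score += 15
    elif rec.get("shape"):
        score += 8
    # Witnesses: multi-witness > single-witness
    witnesses = rec.get("num_witnesses", 0)
    if witnesses > 1:
        score += 15
    elif witnesses == 1:
        score += 8
    # Source reliability: investigator-vetted > self-reported (assumed values)
    score += {"MUFON": 10, "NUFORC": 5}.get(rec.get("source"), 0)
    # Narrative presence
    if rec.get("has_description"):
        score += 10
    return min(score, 100)
```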
The Observatory's "High quality only" toggle filters to quality_score ≥ 60, which corresponds to 118,320 sightings (19.3% of the database). The threshold was calibrated against a hand-reviewed training set — below 60 the records are typically missing at least one major dimension (no date, no location, or no shape). The Insights tab's Quality Score Distribution chart shows the shape of the distribution; the 60+ buckets are highlighted in the accent colour.
Hoax likelihood (hoax_likelihood)
A 0–100 integer estimating the probability a record is a hoax, prank, or misidentification, based on patterns like:
- Obvious hoax language in the narrative ("April Fools", "just kidding", "for Halloween")
- Shapes that correlate strongly with known-hoax submissions (Chinese lanterns tagged as "triangle formation")
- Date/location collisions with known viral hoaxes or film releases
- Source reliability priors (some aggregator sub-sources have high hoax rates)
The "Hide likely hoaxes" toggle in the Quality rail filters to hoax_likelihood ≤ 50. The Insights tab's Hoax Likelihood Curve shows the distribution; the right tail (80-100) is red-shifted so "likely hoax" is visually distinct from "likely genuine". Like the quality score, this is a heuristic filter, not a verdict — no records are ever deleted, just de-emphasised.
Richness score (richness_score)
A companion to quality_score. Where quality asks "how much structured data do we have", richness asks "how much narrative detail do we have". Higher richness means more distinct observation words (colours, durations, sound descriptions, object count, witness count, reaction details). It's the score that tells you how readable a record is going to be, not how verifiable it is. Also a 0–100 integer; no explicit threshold filter in the UI but it's available via the binary wire format for future use.
Primary color
primary_color — one of 22 colours (red, blue, metallic silver, etc.) extracted from the narrative via keyword matching. Populated for 145,209 sightings. Available as a dropdown in the Observatory filter bar.
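A keyword-matching extractor of this kind might look like the sketch below. The colour list here is a small illustrative subset of the 22 canonical colours, and the longest-phrase-first ordering is an assumption about how multi-word colours like "metallic silver" are disambiguated.

```python
# Hedged sketch of narrative keyword matching for primary_color.
# Small illustrative subset of the 22 canonical colours; first-match-wins
# ordering is an assumption. Note the known limitation of naive substring
# matching: e.g. "covered" contains "red".
COLOURS = ["metallic silver", "red", "blue", "orange", "green", "white"]

def extract_primary_color(narrative):
    text = narrative.lower()
    # Multi-word colours are listed first so "metallic silver" wins
    # before a bare "silver" or another colour would match.
    for colour in COLOURS:
        if colour in text:
            return colour
    return None

print(extract_primary_color("A metallic silver disc with red lights"))  # metallic silver
```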
Emotion & Sentiment Analysis (v0.11)
In v0.11, the science team ran four models against all 502,985 sightings with narrative text, replacing the earlier 8-class keyword classifier with transformer-based analysis. All models were run offline on the full private corpus; the results ship as 12 derived columns in the public database.
Models
| Model | Type | Output | Coverage |
|---|---|---|---|
| RoBERTa (cardiffnlp/twitter-roberta-base-sentiment-latest) | 3-class sentiment | positive / negative / neutral + confidence scores | 502,985 |
| RoBERTa (j-hartmann/emotion-english-distilroberta-base) | 7-class emotion | anger, disgust, fear, joy, neutral, sadness, surprise | 502,985 |
| GoEmotions (SamLowe/roberta-base-go_emotions) | 28-class emotion | admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise, neutral | 502,985 |
| VADER (rule-based) | Compound sentiment | Score from -1.0 (most negative) to +1.0 (most positive) | 502,985 |
Why four models?
No single sentiment model captures the nuance of UFO sighting narratives. VADER is fast and rule-based but misses sarcasm and context. RoBERTa sentiment gives a reliable positive/negative/neutral split but lacks granularity. The 7-class RoBERTa emotion model distinguishes fear from sadness from surprise — crucial for UFO reports where fear and awe often coexist. GoEmotions' 28-class taxonomy catches subtler states like "curiosity", "confusion", and "realization" that are common in witness accounts. Running all four lets the Insights tab cross-reference models and surface patterns that no single classifier would catch.
Coverage and neutrality
86.7% of sightings are classified as "neutral" by GoEmotions. This is expected — most reports are factual descriptions ("I saw a light at 10pm heading north"). The Insights tab's GoEmotions card hides neutral by default (toggle available) so the 13.3% with non-neutral emotion are legible.
Derived columns
The 12 new columns on the sighting table:
| Column | Type | Description |
|---|---|---|
| emotion_28_dominant | VARCHAR | Top GoEmotions label (e.g., "fear", "curiosity") |
| emotion_28_group | VARCHAR | Sentiment group derived from GoEmotions (positive/negative/neutral/ambiguous) |
| emotion_28_scores | JSONB | Full 28-class probability vector |
| emotion_7_dominant | VARCHAR | Top 7-class RoBERTa emotion label |
| emotion_7_scores | JSONB | Full 7-class probability vector |
| vader_compound | REAL | VADER compound score (-1 to +1) |
| vader_pos | REAL | VADER positive proportion |
| vader_neg | REAL | VADER negative proportion |
| vader_neu | REAL | VADER neutral proportion |
| roberta_sentiment | VARCHAR | RoBERTa 3-class label (positive/negative/neutral) |
| roberta_positive | REAL | RoBERTa positive confidence |
| roberta_negative | REAL | RoBERTa negative confidence |
All emotion columns are packed into the 40-byte binary bulk buffer for client-side rendering. VADER compound and RoBERTa sentiment scores are scaled from [-1, +1] to [0, 255] uint8 via round((v+1)*127.5).
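The [-1, +1] → [0, 255] packing is a one-liner; a round-trip sketch makes the quantization error visible. The encoder below is the documented round((v+1)*127.5); the decoder is an assumed inverse for illustration.

```python
# Round-trip sketch of the [-1, +1] -> [0, 255] uint8 scaling described above.
# pack_score matches the documented round((v + 1) * 127.5); unpack_score is
# an assumed inverse. Maximum quantization error is 0.5 / 127.5 (about 0.004).
def pack_score(v: float) -> int:
    return round((v + 1) * 127.5)

def unpack_score(b: int) -> float:
    return b / 127.5 - 1

print(pack_score(-1.0), pack_score(0.0), pack_score(1.0))  # 0 128 255
print(abs(unpack_score(pack_score(0.42)) - 0.42) < 1 / 255)  # True
```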
Replicating the analysis
The transformer analysis can be replicated independently:
- Obtain the raw narrative text from the source databases (NUFORC, MUFON, etc.)
- Run the three HuggingFace models against the text (model IDs listed in the table above)
- Run VADER (`vaderSentiment` Python package) against the same text
- Join results to the UFOSINT sighting table by `source_record_id`
The models are deterministic given the same input text and model weights. Minor version differences in the transformer libraries may produce slightly different confidence scores but the dominant labels should be stable.
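The final join step can be sketched with an in-memory SQLite database standing in for the real one. The table and column names other than `source_record_id` are illustrative assumptions, as is the sample data.

```python
# Sketch of joining locally computed scores back to the sighting table on
# source_record_id. Table/column names beyond source_record_id and the sample
# rows are assumptions; an in-memory SQLite DB stands in for the real database.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sighting (source_record_id TEXT, shape TEXT)")
con.execute("CREATE TABLE my_scores (source_record_id TEXT, vader_compound REAL)")
con.execute("INSERT INTO sighting VALUES ('NUFORC-1234', 'triangle')")
con.execute("INSERT INTO my_scores VALUES ('NUFORC-1234', -0.42)")

row = con.execute(
    """SELECT s.source_record_id, s.shape, m.vader_compound
       FROM sighting s JOIN my_scores m USING (source_record_id)"""
).fetchone()
print(row)  # ('NUFORC-1234', 'triangle', -0.42)
```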
Notes on the Current Build
- Raw narrative text is not in the public database. The `description`, `summary`, `notes`, and `raw_json` columns were stripped from the public export for privacy. All derived columns (quality scores, movement categories, emotion classifications) were computed from the private corpus before stripping and ship as structured fields.
- Deduplication is applied at ingest time. Known duplicates (1.94M records from UFOCAT-NUFORC and UPDB-MUFON/NUFORC overlaps) are excluded during import. The historical three-tier dedup pipeline documented above produced 126,730 candidate pairs in earlier builds; these are no longer materialized in the current build.
- This is a scientific analysis, not an editorial product. No records are deleted, ranked, or editorially curated. Quality scores, hoax flags, and emotion labels are algorithmic outputs with known limitations. The coverage strips on the Insights tab show exactly what percentage of the visible dataset each metric covers.