Open Graph / Twitter Card metadata extractor CLI. Fetch a URL (or pipe HTML on stdin) and get back a clean JSON blob of everything you need to render a link preview: title, description, image, site name, Twitter Card fields, canonical URL, favicon.
Every team that builds a Slack bot, a Discord integration, a blog with link cards, or a notifications pipeline eventually writes this code. It's a small, well-defined problem that's easy to get wrong: `og:` takes precedence over `twitter:`, both take precedence over `<title>`, relative image URLs need resolving, and HTML entities need decoding. ogp-fetch does those things in about 80 lines of Python, using only the standard library plus httpx.

```bash
ogp-fetch https://example.com/article
```

```json
{
  "url": "https://example.com/article",
  "title": "Hello World",
  "description": "An example page",
  "image": "https://example.com/static/hero.png",
  "site_name": "Example",
  "type": "article",
  "twitter": { "card": "summary_large_image", "site": "@example", "creator": null },
  "canonical": "https://example.com/article",
  "favicon": "https://example.com/favicon.ico"
}
```

```bash
pip install .
```

Runtime dependency: httpx only.

```bash
# Fetch a URL and emit JSON (default).
ogp-fetch https://example.com/

# Human-readable key: value layout.
ogp-fetch https://example.com/ --format human

# Markdown link-card preview (great for README / Slack).
ogp-fetch https://example.com/ --format markdown

# Pipe HTML from anywhere.
curl -s https://example.com/ | ogp-fetch - --no-resolve

# Pipe with a base URL so relative og:image paths still resolve.
curl -s https://example.com/ | ogp-fetch - --no-resolve --base-url https://example.com/
```

| Flag | Default | Description |
|---|---|---|
| `--format {json,human,markdown}` | `json` | Output format |
| `--user-agent STRING` | `ogp-fetch/0.1.0 (…)` | Sent as the `User-Agent` header |
| `--timeout SECONDS` | `10` | HTTP timeout |
| `--max-size BYTES` | `2097152` (2 MB) | Refuse responses larger than this |
| `--base-url URL` | (fetched URL) | Resolve relative links against this |
| `--no-resolve` | off | Skip HTTP entirely; requires `-` as the URL |

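
One way a `--max-size` style limit could be enforced is to read the body in chunks and stop as soon as the running total passes the limit, rather than trusting `Content-Length`. The sketch below is an assumption about the approach, not ogp-fetch's actual code; `read_capped` and `ResponseTooLarge` are hypothetical names.

```python
from typing import Iterable


class ResponseTooLarge(Exception):
    """Hypothetical error type: the body exceeded the configured byte limit."""


def read_capped(chunks: Iterable[bytes], max_size: int = 2_097_152) -> bytes:
    """Concatenate chunks, aborting as soon as the total exceeds max_size."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) > max_size:
            # Stop early so an oversized response is never fully buffered.
            raise ResponseTooLarge(f"body exceeds {max_size} bytes")
    return bytes(buf)
```

With httpx, the chunks could come from `response.iter_bytes()` inside a `with httpx.stream("GET", url) as response:` block, so the download stops at the limit instead of after it.
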

| Code | Meaning |
|---|---|
| `0` | Metadata found |
| `1` | Fetched/parsed successfully but no OGP / Twitter / `<title>` data |
| `2` | Fetch, parse, or argument error |

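
A caller written in Python might branch on these exit codes roughly as follows. This is a minimal sketch: `interpret` and `fetch_preview` are hypothetical helper names, and treating exit code `1` as "no preview" (returning `None`) is a choice made here, not something the tool prescribes.

```python
import json
import subprocess
from typing import Optional


def interpret(returncode: int, stdout: str) -> Optional[dict]:
    """Map the documented exit codes to a Python result."""
    if returncode == 0:
        # Metadata found: stdout holds the JSON blob (default --format).
        return json.loads(stdout)
    if returncode == 1:
        # Page fetched and parsed, but nothing usable for a preview.
        return None
    # 2 (fetch, parse, or argument error) and anything unexpected.
    raise RuntimeError(f"ogp-fetch failed with exit code {returncode}")


def fetch_preview(url: str) -> Optional[dict]:
    """Run `ogp-fetch URL` and interpret the result."""
    proc = subprocess.run(["ogp-fetch", url], capture_output=True, text=True)
    return interpret(proc.returncode, proc.stdout)
```
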
`ogp-fetch - --no-resolve` reads HTML from stdin and emits the same JSON without any network traffic. That lets you:

- plug it into a curl pipeline without making `ogp-fetch` responsible for TLS or retries;
- test your extraction on a captured HTML fixture;
- run it in a sandbox or offline CI job.

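
For the fixture-testing case, the stdin mode can be driven from Python with `subprocess`. The helper names below (`stdin_argv`, `run_on_fixture`) are hypothetical; only the `-`, `--no-resolve`, and `--base-url` arguments come from the usage shown above.

```python
import subprocess
from typing import List, Optional


def stdin_argv(base_url: Optional[str] = None) -> List[str]:
    """Build the command line for offline extraction from stdin."""
    argv = ["ogp-fetch", "-", "--no-resolve"]
    if base_url is not None:
        # Lets relative og:image paths resolve without any fetched URL.
        argv += ["--base-url", base_url]
    return argv


def run_on_fixture(html: str, base_url: Optional[str] = None) -> subprocess.CompletedProcess:
    """Feed captured HTML to ogp-fetch on stdin; no network traffic involved."""
    return subprocess.run(
        stdin_argv(base_url), input=html, capture_output=True, text=True
    )
```

The returned `CompletedProcess` carries `returncode` and `stdout`, so a test can assert on the exit code and on the parsed JSON.
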
The extractor collects every meta tag it finds; the normalizer picks the winner:

| Field | First checked | Then | Last resort |
|---|---|---|---|
| `title` | `og:title` | `twitter:title` | `<title>` |
| `description` | `og:description` | `twitter:description` | `<meta name="description">` |
| `image` | `og:image` | `og:image:url` → `twitter:image` | (none) |
| `canonical` | `<link rel="canonical">` | `og:url` | (none) |

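
The collect-then-pick split in the table can be sketched with the standard library's `html.parser`. This is an illustration under assumptions, not the project's source: `MetaCollector`, `PRECEDENCE`, `normalize`, and the internal keys `"<title>"` and `"link:canonical"` are all hypothetical names.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class MetaCollector(HTMLParser):
    """Collect <meta>, <title>, and <link rel="canonical"> values."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.found: Dict[str, str] = {}
        self._in_title = False
        self._title_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already decoded entities in attribute values.
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta":
            # OGP uses property=; Twitter Cards and description use name=.
            key = a.get("property") or a.get("name")
            if key and a.get("content"):
                self.found.setdefault(key.lower(), a["content"].strip())
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            if a.get("href"):
                self.found.setdefault("link:canonical", a["href"].strip())
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            text = "".join(self._title_parts).strip()
            if text:
                self.found.setdefault("<title>", text)

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


# Candidate keys per field, in the order given by the table.
PRECEDENCE = {
    "title": ["og:title", "twitter:title", "<title>"],
    "description": ["og:description", "twitter:description", "description"],
    "image": ["og:image", "og:image:url", "twitter:image"],
    "canonical": ["link:canonical", "og:url"],
}


def normalize(found: Dict[str, str]) -> Dict[str, Optional[str]]:
    """For each field, return the first non-empty candidate, else None."""
    return {
        field: next((found[k] for k in keys if found.get(k)), None)
        for field, keys in PRECEDENCE.items()
    }
```

Keeping the collector free of precedence logic means the order lives in one small table, which is easy to test and to change.
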
Relative URLs in og:image, twitter:image, canonical, and favicon are resolved to absolute using urllib.parse.urljoin(base_url, value). Protocol-relative //cdn.example.com/x.png works too.
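
The behaviour of `urljoin` for the cases mentioned above can be checked directly. The `resolve` wrapper is a hypothetical helper added here only to handle missing fields; the base URL is an example value.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve(base_url: str, value: Optional[str]) -> Optional[str]:
    """Return value as an absolute URL, or None when the field is missing."""
    if not value:
        return None
    return urljoin(base_url, value.strip())


base = "https://example.com/blog/post"
print(resolve(base, "/static/hero.png"))         # https://example.com/static/hero.png
print(resolve(base, "img/hero.png"))             # https://example.com/blog/img/hero.png
print(resolve(base, "//cdn.example.com/x.png"))  # https://cdn.example.com/x.png
print(resolve(base, None))                       # None
```

Note that a path without a leading slash resolves against the directory of the base URL, which is why `--base-url` matters when HTML arrives on stdin.
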

```bash
docker build -t ogp-fetch .
docker run --rm ogp-fetch --help

# Pipe HTML in:
cat page.html | docker run --rm -i ogp-fetch - --no-resolve --format markdown
```

Image is multi-stage Alpine, non-root, under 90 MB.

```bash
pip install ".[dev]"
pytest -q
```

All network paths are exercised via `httpx.MockTransport`; the test suite never touches the real network.

MIT