# Minifetch Skill: Technical SEO Page Audit

Run a complete technical SEO audit on a single web page in one API call. Returns a structured report with PASS/WARN/FAIL findings for every signal, plus the rules used to evaluate them. No black-box scoring: every threshold is documented below and applied deterministically.

This is a **composer endpoint** (`/run/seo-page-audit`). Under the hood it runs the preflight check, fetches metadata, extracts links, and assembles the audit. You pay one price for the whole thing. If robots.txt blocks the page, the request fails with 502 and you are not charged.

**Base URL:** https://minifetch.com

---

## What You'll Do

1. Setup: choose payment method and access method
2. Preflight (optional): confirm the URL is fetchable
3. Run the audit: single API call, structured response
4. Interpret findings: every finding has a `status` and an `expected` value
5. Output the report: summarize pass/warn/fail counts and act on failures

---

## Step 1: Setup

### Choose a payment method

There is no account setup fee or monthly fee. Minifetch does not charge for blocked pages or errors.

**Option 1: Credit card & API key**

Sign up and get credits worth 25 free audits automatically: https://minifetch.com/dashboard

No credit card is required to begin. Click the "Sign up" button and verify your email address to create your account. Once you are signed in, use the dashboard to create your API key. Each successful fetch is deducted from your credit balance. Top up for as little as $2 with your credit card. Recommended for most builders.

**Option 2: USDC on Base or Solana**

Just load your wallet with USDC on Base or Solana and you're ready. No "gas token" (ETH or SOL) required. No Minifetch account setup needed. Recommended for agents and agent builders.
For details: https://www.x402.org/

### Prices (pay-as-you-go)

- URL Check (preflight): Free
- SEO Page Audit: $0.01
- URL Metadata: $0.002
- URL Links: $0.002
- URL Preview: $0.001
- URL Content: $0.002

### Choose an access method

**Option A: curl + API key**

```
curl "https://minifetch.com/api/v1/extract/url-metadata?url=https://yoursite.com/your-page" \
  -H "Authorization: Bearer [your_api_key]"
```

**Option B: minifetch-api (recommended for agents & agent builders)**

Handles payment automatically, with no manual auth header or x402 handshake needed. The README Quick Start section details how to initialize the client: https://www.npmjs.com/package/minifetch-api

```
npm install minifetch-api --save
```

**Option C: Coinbase Payments MCP (for AI assistants like Claude)**

Gives AI assistants a built-in wallet, with no private key needed.

```
npx @coinbase/payments-mcp
```

See: https://www.npmjs.com/package/@coinbase/payments-mcp

---

## Step 2 (Free): Preflight Check

Confirm the URL is fetchable before spending credits:

```
curl "https://minifetch.com/api/v1/free/preflight/url-check?url=https://yoursite.com/your-page"
```

Or with minifetch-api (the `checkAndExtract*` methods run this automatically before each paid fetch). You can also call it as a standalone function:

```js
const response = await client.preflightCheck("https://yoursite.com/your-page");
```

If the response includes `allowed: false`, the page is blocked by the site owner. If you own the site and want to allow Minifetch access, see: https://minifetch.com/skills/unblock-minifetch/SKILL.md

The audit endpoint runs preflight internally and returns 502 (no charge) if the URL is blocked, so calling `/url-check` first is optional. It can still be useful when you have a list of candidate URLs and want to filter cheaply before paying.

---

## Step 3: Run the Audit

**Price:** $0.01 per URL (charged only on success).
From your CLI:

```
curl "https://minifetch.com/api/v1/run/seo-page-audit?url=https://yoursite.com/your-page" \
  -H "Authorization: Bearer [your_api_key]"
```

Or with `minifetch-api`: https://www.npmjs.com/package/minifetch-api

```js
const response = await client.checkAndRunSeoPageAudit("https://yoursite.com/your-page");
```

Response shape:

```json
{
  "success": true,
  "results": [
    {
      "data": {
        "summary": { "pass": 12, "warn": 3, "fail": 1 },
        "requestUrl": "https://redirect.to/your-page",
        "url": "https://yoursite.com/your-page",
        "responseStatusCode": {...},
        "responseHeaders": {...},
        "compliance": {
          "robotsTxt": {...},
          "https": {...},
          "mixedContent": {...}
        },
        "metadata": { "title": {...}, "description": {...}, ... },
        "hreflang": {...}, // always present; { note, count: 0 } when none
        "jsonld": {...},
        "headings": {...},
        "content": {...},
        "images": {...},
        "links": {...},
        "social": { "openGraph": {...}, "twitterCard": {...} },
        "minifetchCache": {...}
      }
    }
  ]
}
```

Every audit finding has the same shape:

```json
{
  "status": "pass" | "warn" | "fail",
  "expected": ...,
  ... // additional fields (value, count, length, etc.)
}
```

Pure data fields (counts, dates, dimensions) appear without `status` or `expected`; they are informational pass-throughs. Some findings drop to informational (a `note`, no `status`) when there's nothing to evaluate: `canonical`, `canonicalMatchesSelf`, `openGraphUrlMatchesSelf`, and `hreflang` are always present in the response but carry a `note` instead of a `status` when not applicable. Null-check `status` before relying on it.

---

## Step 4: Audit Rules

Every threshold is documented here. The audit utility applies these deterministically. No scoring model, no AI judgment.

The audit composes `/preflight/url-check` + `/extract/url-metadata` + `/extract/url-links` + the rules below. While iterating, call the primitives directly: they're cheaper and return the same underlying data.
Watch the `minifetchCache.hit` field: back-to-back calls within the cache window (~2 min) skip the network fetch entirely. Run the full audit composer again once your pipeline is stable and the cache `expiresAt` timestamp has passed. All API endpoints share the cache.

### summary

`{ pass, warn, fail }`: counts of findings with each status across the whole report. Pure data fields (counts, dates) are not counted.

### responseStatusCode

**pass** when 200; **fail** otherwise. (3xx redirects are followed before this check.)

*Why this matters:*

- The page should return HTTP 200. Redirects (3xx) are followed before this check, so anything other than 200 here is a real error.

### responseHeaders

| Header | Rule |
|---|---|
| `Date`, `last-modified` | informational only (no status) |
| `X-Robots-Tag` | **fail** if contains `noindex`; **pass** otherwise |
| `Content-Type` | **pass** if matches `text/html`; **fail** otherwise |
| `Cache-Control` | **pass** if present; **warn** if missing |
| `Strict-Transport-Security` | **pass** if present (HSTS); **warn** if missing |

*Why these matter:*

- **X-Robots-Tag**: A server-level robots directive sent as an HTTP header. If it contains `noindex`, search engines will not index the page even if the HTML and robots meta tag say otherwise.
- **Content-Type**: An SEO audit target should be served as `text/html`. PDFs, JSON, or other content types are not crawled the same way.
- **Cache-Control**: Tells browsers and intermediaries how long to cache the response. Without it, clients fall back to heuristic defaults, which can hurt repeat-visit performance.
- **Strict-Transport-Security**: The HSTS header tells browsers to connect to this domain only over HTTPS. It is a security best practice that reinforces HTTPS, which is itself a minor ranking signal.
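The `responseHeaders` rules in the table above are simple enough to sketch in a few lines of JavaScript, which can help when sanity-checking a report locally. This is an illustration only, not Minifetch's actual code: the function name `evaluateHeaders`, the output keys (`xRobotsTag`, `contentType`, and so on), and the substring matching are all assumptions made for the example.

```javascript
// Hypothetical helper mirroring the responseHeaders rule table.
// Names and output shape are assumptions, not the real API response.
function evaluateHeaders(headers) {
  // HTTP header names are case-insensitive, so normalize the keys first.
  const h = {};
  for (const [name, value] of Object.entries(headers)) {
    h[name.toLowerCase()] = String(value);
  }

  const xRobots = (h["x-robots-tag"] || "").toLowerCase();
  const contentType = (h["content-type"] || "").toLowerCase();

  return {
    // fail if the header contains noindex; pass otherwise (including when absent)
    xRobotsTag: { status: xRobots.includes("noindex") ? "fail" : "pass" },
    // pass only when the page is served as text/html
    contentType: { status: contentType.includes("text/html") ? "pass" : "fail" },
    // presence checks: missing headers are a warn, not a fail
    cacheControl: { status: "cache-control" in h ? "pass" : "warn" },
    strictTransportSecurity: {
      status: "strict-transport-security" in h ? "pass" : "warn",
    },
  };
}

// Example input: an HTML page with noindex set and no HSTS header.
const findings = evaluateHeaders({
  "Content-Type": "text/html; charset=utf-8",
  "X-Robots-Tag": "noindex, nofollow",
  "Cache-Control": "max-age=3600",
});
console.log(findings);
```

In this example the `X-Robots-Tag` finding fails and the missing HSTS header warns, while the other two pass. Keeping each rule as a single expression makes it easy to compare against the table row it represents.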
### compliance

| Field | Rule |
|---|---|
| `robotsTxt` | **pass** if site's robots.txt allows our user agent (`minifetch`); **fail** if disallowed (request returns 502, no charge) |
| `https` | **pass** if the audited page is served over HTTPS (checked against the post-redirect URL); **fail** otherwise. `value` is one of `https`, `http`, or `unknown`. |
| `mixedContent` | **pass** if 0 http:// resources; **fail** if any. Scans the HTML for `src`/`href`/`data` attributes with http:// values. `resources` is the first 20 offending URLs; `omitted` is any beyond. On non-HTTPS pages this finding is a no-op pass with a `note` field; the `https` finding above is the real issue there. |

*Why these matter:*

- **robotsTxt**: The site's robots.txt must allow our user agent. If disallowed, the audit returns 502 and is not charged.
- **https**: SEO-relevant pages should be served over HTTPS. Search engines favor HTTPS over HTTP, browsers show 'Not Secure' warnings on HTTP pages, and mixed content protections only activate over HTTPS. This finding looks at the URL we actually fetched (post-redirect), so an audit of http://example.com that redirects to https://example.com passes.
- **mixedContent**: On an HTTPS page, any resource loaded over http:// is mixed content. Browsers block active mixed content (scripts, iframes, stylesheets) and warn on passive (images, video). The `resources` array lists the first 20 offending URLs so you know what to fix; the `count` and `omitted` fields tell you the full total.

### metadata

| Field | Rule |
|---|---|
| `title` | **pass** 30–60 chars; **warn** 1–29 (short, room for keywords) or 61–70 (risks truncation); **fail** empty or >70. |
| `description` | **pass** 70–155 chars; **warn** 1–69 (short, room for USPs/CTA) or 156–200 (risks truncation); **fail** empty or >200. |
| `canonical` | **pass** if present, parseable, and consistent; **fail** if the HTML and Link response header values disagree (`conflictWithLinkHeader: true`) or the canonical is unparseable (`malformed: true`); **info** (no status) if absent. Search engines self-canonicalize a page to its own URL, so a missing canonical only matters when the page has duplicate URLs the audit can't see. Source is one of `html`, `header`, `both`. |
| `canonicalMatchesSelf` | **pass** if the canonical resolves to the *post-redirect* final URL we fetched (`value: true`); **warn** if it points elsewhere (`value: false`); informational (a `note`, no status) when the canonical is absent or unparseable, in which case the `canonical` finding above carries that detail. Pointing elsewhere is often intentional for paginated, filtered, or syndicated pages, so it is a warn, not a fail. The `crossDomain` boolean is informational (does the canonical point off-domain?). Normalization for the comparison ignores www-prefix, http-vs-https, default ports, and trailing slash on root path; everything else (path, query, hash, non-root trailing slashes) is significant. Relative canonicals are resolved against the fetched URL like a browser would. |
| `robots` | **warn** if value contains `noindex`; **pass** otherwise. Defaults to `"index, follow"` (Google's assumed default) when meta tag is absent. Same reasoning as `canonicalMatchesSelf`: `noindex` is often intentional (admin pages, staging, internal search results) so we surface it as a warn, not a fail. |
| `lang` | **pass** if present; **warn** if missing. Attribute on top-level `<html>` tag. |
| `viewport` | **pass** if present; **warn** if missing |

*Why these matter:*

- **title**: The clickable headline shown in Google search results. 30–60 characters is the healthy range, and the upper end (50–60) uses the available width best. Over ~60 characters Google truncates the title and may drop important words.
A short title is not broken (it renders fine), but it leaves room to add keywords or communicate the page value, so we flag it as an opportunity rather than an error.
- **description**: Often used as the snippet under the title in search results. 70–155 characters is the healthy range. Over ~155 characters Google truncates the snippet; under 70 it still renders fine but leaves room to add benefits, USPs, or a call to action, so a short description is flagged as a missed opportunity, not an error. Google may rewrite the description regardless, but a strong one still influences click-through.
- **canonical**: Tells search engines which URL is the master version when duplicate content exists across multiple URLs. Can appear in the HTML or the Link response header. If both are present they must agree (a disagreement is a fail), and the canonical must be a parseable URL (an unparseable one is a fail, flagged by `malformed: true`). If no canonical is declared, search engines self-canonicalize the page to its own URL, which is fine unless the page has duplicate URLs the audit cannot see. So a missing canonical is surfaced as information, not graded.
- **canonicalMatchesSelf**: Whether the declared canonical points back at the URL we fetched: `value: true` means it points at this page, `value: false` means it points elsewhere. When the canonical is missing or unparseable there's nothing to compare, so this is informational (a `note`, no status); the `canonical` finding above carries the detail. If it points elsewhere, Google likely treats this page as a duplicate, consolidates ranking signals to the canonical target, and usually shows that URL instead, though a cross-domain canonical is a strong hint Google can still override. Pointing elsewhere is often deliberate (for example, syndicated content crediting the original), so it's a warn, not a fail. The `crossDomain` field flags an off-domain canonical as informational and doesn't change the status.
- **robots**: The robots meta tag controls whether search engines index this page and follow its links. If absent, Google assumes "index, follow". A `noindex` value keeps the page out of search results. That is often intentional (admin pages, staging environments, internal search results), which is why it surfaces as a warn rather than a fail. Same logic as `canonicalMatchesSelf`: the page is opting out of indexing, so we surface it but don't presume to grade it.
- **lang**: The `lang` attribute on the top-level `<html>` tag tells search engines and screen readers what language the page is in. Helps with accessibility and international SEO.
- **viewport**: The viewport meta tag tells mobile browsers how to size the page. Pages without it are flagged as not mobile-friendly, and mobile-friendliness is a ranking factor.

### hreflang

(always present; informational `note` + `count: 0` when the page has no hreflang tags)

| Field | Rule |
|---|---|
| `count` | informational: number of hreflang entries on the page |
| `xDefault` | informational: `present: true/false` |
| `selfReferencing` | **pass** if at least one entry's href matches the audited URL; **fail** otherwise. Matched against the *post-redirect* final URL, not the URL you requested (if there were redirects) |
| `fullyQualifiedUrls` | **pass** if all hrefs are absolute (`http://` or `https://`); **fail** with the offending hrefs in `invalid` |
| `inHead` | **pass** if all hreflang `<link>` tags appear inside `<head>`; **warn** otherwise |

*Why these matter:*

- **selfReferencing**: A page's hreflang set must include an entry that points back to itself. Without self-reference, Google treats the entire set as invalid.
- **fullyQualifiedUrls**: hreflang hrefs must be absolute URLs (with http:// or https://). Relative URLs are silently ignored by search engines.
- **inHead**: hreflang `<link>` tags must appear inside `<head>`. Tags placed in `<body>` are ignored.

### jsonld

**pass** if at least one typed item is present; **warn** if none.
`itemCount` is the number of top-level items Google evaluates; `@graph` arrays are expanded so each node counts as its own item. `types` lists the distinct top-level item types; `nestedTypes` lists supporting entity types found inside those items (an author Person, a logo ImageObject, breadcrumb ListItems) and is informational. `itemCount` and `types.length` can differ (two Product items = itemCount 2, one type).

*Why this matters:*

- JSON-LD structured data makes pages eligible for rich results in search (recipe cards, product listings, review stars, etc). `itemCount` is the number of top-level items Google can evaluate. `@graph` arrays are expanded so each node counts as its own item, the way Google reads them. `types` lists the distinct top-level item types; `nestedTypes` lists supporting entity types found inside those items (an author Person, a logo ImageObject, breadcrumb ListItems) and is informational, since Google treats them as properties of their parent item rather than standalone items. At least one typed item makes the page eligible.

### headings

| Field | Rule |
|---|---|
| `h1` | **pass** if exactly 1; **fail** otherwise |
| `hierarchy` | **pass** if no level skips in document order (h2 → h4 is a skip); **warn** if any skips found, listed in `skips` |

*Why these matter:*

- **h1**: Search engines treat the h1 as the page's primary topic. Multiple h1s dilute the signal; zero h1s leave the page without a clear subject.
- **hierarchy**: Heading levels should descend without skipping (h2 → h3, not h2 → h4). Skipped levels confuse screen readers and weaken the document's logical structure.

### content

Purely informational, with no pass/warn/fail status. Both fields are derived from the response body with no extra network fetch.

| Field | Description |
|---|---|
| `wordCount` | Visible text word count after stripping HTML tags, `