The Form 424B8 Files Dataset is a collection of every late-cured prospectus filed to EDGAR under Rule 424(b)(8) of the Securities Act of 1933 from January 2006 to the present. One record is one EDGAR submission of Form 424B8, identified by an 18-digit accession number, and packaged as a folder containing a metadata.json header, the registrant-supplied 424B8 prospectus HTML, and — when the filer attached one — an Exhibit 107 EX-FILING FEES inline-XBRL HTML. The filer is always the Securities Act registrant whose effective registration statement covers the offering, most often a structured-note finance subsidiary or a bank holding company operating an automatic shelf. The corpus is delivered as monthly ZIP containers (YYYY/YYYY-MM.zip) through the sec-api.io datasets API and contains HTML, JSON, PDF, and TXT files. Because every record is a prospectus that missed its original (b)(1) through (b)(7) deadline and is being submitted "as soon as practicable" after discovery, the dataset is uniquely suited to compliance benchmarking, late-filing-gap analysis, and structured-product pricing-supplement extraction.
The Form 424B8 Files Dataset packages every EDGAR submission whose form code is 424B8, that is, every prospectus filed pursuant to Rule 424(b)(8) of the Securities Act of 1933. Paragraph (b)(8) is a residual catch-up provision: it applies to "any form of prospectus required to be filed pursuant to" another paragraph of Rule 424(b) that was not transmitted within the time frame the underlying paragraph requires. The underlying paragraph is most commonly (b)(2) (prospectuses reflecting offerings off a shelf), (b)(3) (substantive changes or material additions to a prospectus already on file), (b)(5) (prospectus supplements relating to shelf takedowns), or (b)(7) (prospectuses identifying selling securityholders and the amounts they offer). The form must be filed as soon as practicable after the failure to timely file is discovered. Substantively, a 424B8 re-presents the same prospectus content the original (b)(x) paragraph would have required; only the filing identifier differs. The late-filing posture is visible in two places: the "Filed Pursuant to Rule 424(b)(8)" legend on the prospectus cover and the 424B8 form code of the EDGAR submission.
The 424B8 population is dominated by structured-product and medium-term-note shelf programs. Large issuers — Citigroup Global Markets, JPMorgan Chase Financial, Bank of Montreal, GS Finance, Morgan Stanley Finance, Royal Bank of Canada, UBS AG, Barclays Bank — generate many small pricing supplements per month under universal shelf registrations, and the ones that miss the (b)(2)/(b)(5) window are re-filed under (b)(8). Consequently the "prospectus" inside a typical 424B8 record is a compact, product-specific pricing supplement (autocallable notes, contingent income notes, buffered enhanced-return notes, equity-linked notes, market-linked CDs) of a few dozen pages, not a stand-alone base prospectus.
The dataset covers all Form 424B8 filings submitted to EDGAR from January 2006 to present, packaged as monthly ZIP containers and refreshed on an ongoing basis as new filings arrive. The file types found in the dataset are HTML (the dominant format for both the prospectus body and the fee exhibit), JSON (the per-record metadata), with PDF and TXT also represented across the dataset.
One record in the Form 424B8 Files Dataset is a single EDGAR submission of Form 424B8 — a prospectus filed pursuant to Rule 424(b)(8) of the Securities Act of 1933 — identified by its 18-digit SEC accession number. On disk the record is a folder named with the unhyphenated accession number (for example 000095010325013761), placed inside a per-month directory (YYYY-MM/) that is the sole top-level entry of a monthly ZIP archive (YYYY/YYYY-MM.zip). Inside the accession folder sit a metadata.json describing the EDGAR submission as a whole and the registrant-supplied filing artifacts: the primary 424B8 prospectus document (HTML) and, when the filer furnished one, an Exhibit 107 EX-FILING FEES inline-XBRL HTML document. The unit of observation is the submission, not the issuer, the registration statement, or the underlying offering — every accession is one record, and an issuer that files multiple late prospectus supplements within a month appears as multiple, independent records.
The dataset is delivered as a tree of monthly ZIPs organized by year — YYYY/YYYY-MM.zip — and each ZIP unpacks to a YYYY-MM/ directory containing one accession-numbered subfolder per filing. The accession folder is the record. Inside it appear:
- metadata.json: always present, one per accession, capturing the full EDGAR submission metadata (see below).
- The primary 424B8 prospectus document (HTML): the entry with type == "424B8" in documentFormatFiles[]. The on-disk filename is the registrant's original EDGAR filename (for example dp236412_424b8-us2522940d.htm, ea0262163-01_424b8.htm, bmo4899_424b2-32282.htm); there is no normalized name, so the canonical way to locate it is to look up the entry whose type is 424B8 in metadata.documentFormatFiles[].
- The Exhibit 107 fee exhibit (HTML), when the filer attached one: filenames typically match *exfilingfees*.htm or ex-filingfees.htm and carry type == "EX-FILING FEES" in documentFormatFiles[].
Three classes of artifacts present in the original EDGAR submission are deliberately omitted from the ZIP. GRAPHIC entries (GIF/JPG/PNG images embedded in the prospectus, e.g. image_001.jpg, bg1.jpg) are enumerated in metadata.documentFormatFiles[] but their bytes are not packaged. Likewise the EDGAR "complete submission text file" (the raw .txt wrapper hosted on sec.gov, listed in metadata with empty sequence and type fields) is not bundled; the URL is preserved under linkToTxt. Finally, standalone XBRL instance documents that accompany the EX-FILING FEES exhibit (such as *_exfilingfees_htm.xml) are referenced in metadata.dataFiles[] but, like images, are not shipped inside the ZIP; only the inline-XBRL HTML rendition is.
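The lookup described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and it assumes that the on-disk filename of the primary document equals the last path segment of its documentUrl, which the text above implies but does not state outright.

```python
import json
import zipfile
from pathlib import PurePosixPath


def iter_records(zip_path):
    """Yield (accession_folder, metadata, file_names) for each record in a monthly ZIP."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        for name in names:
            parts = PurePosixPath(name).parts
            # Expected layout: YYYY-MM/<accession>/metadata.json
            if len(parts) == 3 and parts[2] == "metadata.json":
                folder = "/".join(parts[:2]) + "/"
                meta = json.loads(zf.read(name))
                files = {
                    PurePosixPath(n).name
                    for n in names
                    if n.startswith(folder) and not n.endswith("/")
                }
                yield parts[1], meta, files


def primary_document_name(meta):
    """Return the filename of the entry typed 424B8, or None if there is none."""
    for doc in meta.get("documentFormatFiles", []):
        if doc.get("type") == "424B8":
            # Assumption: on-disk name == last segment of documentUrl.
            return doc.get("documentUrl", "").rsplit("/", 1)[-1] or None
    return None
```

Matching on type rather than on filename avoids being misled by legacy names such as bmo4899_424b2-32282.htm.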
metadata.json anatomy
metadata.json mirrors the EDGAR submission header and the document-list table from the filing index. The top-level fields are:
- formType: always the literal string "424B8".
- accessionNo: the canonical hyphenated accession number (e.g. 0000950103-25-013761).
- filedAt: ISO-8601 timestamp with timezone offset (e.g. 2025-10-28T18:08:00-04:00) reflecting EDGAR acceptance time.
- description: short human-readable form description, typically "Form 424B8 - Prospectus filed pursuant to Rule 424(b)(8)".
- linkToFilingDetails: sec.gov URL of the primary 424B8 document.
- linkToHtml: sec.gov URL of the -index.htm landing page for the submission.
- linkToTxt: sec.gov URL of the full submission .txt wrapper.
- linkToXbrl: sec.gov URL of the XBRL viewer; an empty string when no viewer is exposed.
- id: a stable 32-character hex hash uniquely identifying the record.
Three array fields carry the structural detail:
- documentFormatFiles[] enumerates every document in the original EDGAR submission. Each element has sequence (the EDGAR document ordinal as a string), size (bytes, as a string), documentUrl (sec.gov), description (e.g. PRICING SUPPLEMENT, PRELIMINARY PRICING SUPPLEMENT, EX-FILING FEES, GRAPHIC), and type (e.g. 424B8, EX-FILING FEES, GRAPHIC). The trailing element with empty sequence and type represents the complete submission text file.
- entities[] lists the filer and any co-filers/co-registrants. For each entity the metadata carries cik, companyName (with a role suffix such as (Filer) or (Subject)), type (mirroring the form type), act (Securities Act citation, "33" for the 1933 Act), fileNo (the SEC file number, typically the registration statement number such as 333-270327, with -NN suffixes for co-registrant subsidiaries), filmNo, sic (industry code with description, e.g. "6021 National Commercial Banks"), irsNo, stateOfIncorporation (two-letter code, including non-US codes such as A6 for Ontario), fiscalYearEnd (MMDD), and an optional tickers[] array.
- dataFiles[] enumerates structured XBRL or other data attachments associated with the submission, such as EXTRACTED XBRL INSTANCE DOCUMENT of type XML. The array is empty when the filer included no XBRL exhibit.
A fourth array, seriesAndClassesContractsInformation[], is reserved for investment-company series-and-class identifiers; it is empty for the operating-company and finance-subsidiary issuers that account for the vast majority of 424B8 filings.
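Because several of these fields arrive as strings, a consumer will usually normalize them on load. The sketch below shows one way to do that, under the assumption that the field names are exactly those listed above; the function and output key names are hypothetical.

```python
from datetime import datetime


def _to_int(value):
    """Convert a numeric string to int; empty or blank values become None."""
    text = (value or "").strip()
    return int(text) if text.isdigit() else None


def normalize_metadata(meta):
    """Return a typed view of the fields most workflows need."""
    documents = [
        {
            "sequence": _to_int(doc.get("sequence")),
            "size": _to_int(doc.get("size")),
            "type": (doc.get("type") or "").strip(),
            "description": doc.get("description") or "",
            "documentUrl": doc.get("documentUrl") or "",
        }
        for doc in meta.get("documentFormatFiles", [])
    ]
    return {
        "accessionNo": meta["accessionNo"],
        # The record folder uses the unhyphenated accession number.
        "folder": meta["accessionNo"].replace("-", ""),
        # fromisoformat keeps the timezone offset carried by filedAt.
        "filedAt": datetime.fromisoformat(meta["filedAt"]),
        "documents": documents,
        "hasXbrlData": bool(meta.get("dataFiles")),
    }
```

The trailing complete-submission entry, which has empty sequence and type fields, comes out with sequence set to None and can be filtered on that.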
The primary document is wrapped in the standard EDGAR SGML envelope around the registrant-supplied HTML body:
```
<DOCUMENT>
<TYPE>424B8
<SEQUENCE>1
<FILENAME>...
<DESCRIPTION>PRICING SUPPLEMENT
<TEXT>
<HTML>...prospectus body...</HTML>
</TEXT>
</DOCUMENT>
```
The body is a prospectus. Because the (b)(8) catch-up provision is overwhelmingly used by structured-note shelf programs, the typical body is a pricing supplement to a previously filed base prospectus, prospectus supplement, and product supplement, rather than a stand-alone offering document. Common content blocks, in roughly the order they appear, are:
Layout fidelity varies by filer because EDGAR accepts any well-formed HTML. Three patterns recur:
- Semantic HTML: <TABLE> and <P> markup and registrant-branded color/typography (Citigroup-style filings).
- Absolutely positioned print templates: <DIV> layouts using point-sized inline typography to reproduce a print template (Bank of Montreal-style filings).
- PDF-rendered tile HTML: <div class="t ..."> tiles over bg*.jpg background images (JPMorgan-style supplements). The text is fully present but heavily fragmented and hard to reflow. Because the GRAPHIC backgrounds are excluded from the ZIP, opening such files in a browser shows the text without the visual page background.
These differences are stylistic; the substantive prospectus content is the same regardless of the markup style.
When a 424B8 carries fees, the registrant attaches an Exhibit 107 fee table. In this dataset that exhibit appears as a separate HTML file with type == "EX-FILING FEES". The file is authored as inline XBRL: it opens as XHTML with the inline-XBRL namespace xmlns:ix="http://www.xbrl.org/2013/inlineXBRL", declares one or more xbrli:context blocks (referencing the registrant CIK and filing date), and tags the SEC fee-table facts with the ffd: (filing-fees disclosure) and dei: taxonomies — for example ffd:SubmissnTp, ffd:FeeExhibitTp, ffd:RegnFileNb, dei:EntityCentralIndexKey. The visible rendering is the standard SEC fee table: form type, fee exhibit type, security type, security class, fee calculation rule, amount registered, proposed maximum offering price per unit, proposed maximum aggregate offering price, fee rate, and fee due, with a carry-forward block where applicable. The companion XBRL instance document (*_exfilingfees_htm.xml) is enumerated under metadata.dataFiles[] but is not packaged in the ZIP — the inline-XBRL HTML is the canonical artifact for fee data within the dataset.
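As an illustration of reading the tagged facts, the sketch below pulls name/value pairs out of the inline-XBRL HTML using only the Python standard library. It assumes the facts are carried in ix:nonNumeric and ix:nonFraction elements with a name attribute, as in the Inline XBRL specification; it ignores contexts, units, scaling, and sign attributes, so a production pipeline would likely want a full XBRL processor. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# HTMLParser lowercases tag names, so compare against lowercase forms.
FACT_TAGS = {"ix:nonnumeric", "ix:nonfraction"}


class IxFactParser(HTMLParser):
    """Collect (concept name, displayed text) pairs from inline-XBRL facts."""

    def __init__(self):
        super().__init__()
        self.facts = []
        self._open = []  # stack of [name, text parts] for facts not yet closed

    def handle_starttag(self, tag, attrs):
        if tag in FACT_TAGS:
            self._open.append([dict(attrs).get("name", ""), []])

    def handle_data(self, data):
        # Text belongs to every fact element that is currently open.
        for fact in self._open:
            fact[1].append(data)

    def handle_endtag(self, tag):
        if tag in FACT_TAGS and self._open:
            name, parts = self._open.pop()
            self.facts.append((name, "".join(parts).strip()))


def extract_facts(html_text):
    parser = IxFactParser()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return parser.facts
```

Calling dict(extract_facts(text)) gives a quick lookup by concept name, for example the value tagged ffd:RegnFileNb, although repeated concepts (one per fee-table row) would overwrite each other in a plain dict.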
Packaged in each record:
- metadata.json for the EDGAR submission, including filer/co-filer identification, document inventory, and external links.
- The primary 424B8 prospectus document, in its <DOCUMENT>...<TEXT>...</TEXT></DOCUMENT> envelope.
- Other registrant-supplied documents that are listed in documentFormatFiles[] and that are not images or the raw submission wrapper.
Not packaged:
- GRAPHIC image attachments. They are enumerated in metadata.documentFormatFiles[] and remain accessible at their documentUrls on sec.gov but are not packaged in the ZIP. For PDF-rendered HTML supplements this means the page-background images are absent locally.
- The EDGAR complete submission text file (the raw .txt envelope concatenating every document in the submission). Its URL is preserved as linkToTxt.
- Standalone XBRL instance documents. They are referenced in dataFiles[] and remain available at their EDGAR URLs.
Because Rule 424(b)(8) is a procedural catch-up paragraph rather than a content rule, the substantive disclosure required in a 424B8 is whatever the underlying paragraph (most often (b)(2), (b)(3), (b)(5), or (b)(7)) demands. The most material changes over the dataset's coverage period (January 2006 to present) are therefore changes to those underlying paragraphs and to surrounding rules:
Form 424B8 has been an EDGAR-accepted HTML/SGML submission throughout the dataset's coverage period. The principal format developments visible across records are:
- The standard <DOCUMENT>...<TEXT>...</TEXT></DOCUMENT> envelope around the registrant-supplied HTML body, with <TYPE>424B8 driving form classification.
- PDF-rendered <div> tiles and image-backed pages in some supplements: a stylistic change that affects parseability but not substance.
- Exhibit 107 fee exhibits authored as inline XBRL with ffd:/dei: tagging. The 424B8 prospectus body itself is not XBRL-tagged.
Notes for working with the records:
- Read metadata.documentFormatFiles[*].description (e.g. PRICING SUPPLEMENT, PRELIMINARY PRICING SUPPLEMENT, PROSPECTUS SUPPLEMENT): these values reveal which underlying (b)(x) paragraph the filing was originally meant to satisfy.
- Locate the primary document by metadata.documentFormatFiles[*].type == "424B8" rather than by filename. Registrants reuse legacy filenames freely; for example a Bank of Montreal pricing supplement may carry a filename containing 424b2 while being typed 424B8 in metadata.
- Co-registrants appear in entities[], with hyphen-suffixed file numbers (333-XXXXXX-NN) distinguishing each co-registrant on the same registration statement.
- The seriesAndClassesContractsInformation[] array is a placeholder inherited from the EDGAR submission schema and is empty for the operating-company and finance-subsidiary issuers that produce nearly all 424B8 filings; it would only populate for investment-company filers reporting series and class identifiers.
- The linkToTxt, linkToHtml, linkToFilingDetails, and linkToXbrl URLs let a consumer round-trip back to EDGAR for the artifacts not packaged locally (image attachments, the raw submission wrapper, the standalone XBRL instance documents, and the EDGAR XBRL viewer rendering).
- The underlying registration statement is identified by the fileNo carried in entities[*].fileNo.
The filer is always the Securities Act registrant whose effective registration statement covers the offering. That is typically the issuer itself, or, in shelf and structured-product programs, the registrant on whose registration statement a takedown is being conducted (often a finance subsidiary, with a parent guarantor).
The pool of registrants drawn into 424B8 filings is dominated by high-volume shelf issuers, including:
Underwriters, dealers, selling securityholders, and parent guarantors may be named in the prospectus and may carry liability exposure, but they do not file Form 424B8 in their own right. The filing is made under the registrant's CIK.
Form 424B8 is a corrective submission. It exists solely to cure a missed Rule 424(b) deadline.
Rule 424(b)(1) through (b)(7) each prescribe how and when specific categories of prospectuses, supplements, and pricing materials must be filed in connection with an effective registration statement, generally within two business days (occasionally five) of the relevant pricing, sale, or first-use event. When a registrant fails to file within that window, Rule 424(b)(8) requires it to file the prospectus "as soon as practicable after the discovery of the failure to file," designating the EDGAR submission as 424B8 rather than the originally applicable 424B1 through 424B7 type.
The trigger is therefore event-driven and two-step: (1) a missed original (b)(1)–(b)(7) deadline, and (2) the subsequent discovery of that lapse, which starts the "as soon as practicable" clock. There is no fixed numeric deadline for the 424B8 itself, and there is no voluntary or strategic reason to elect 424B8 in a timely-filing scenario. Choosing the 424B8 label is itself an admission that the original deadline was missed.
Filing dates do not follow a periodic schedule. They cluster around:
Dataset coverage begins in January 2006, immediately after the SEC's 2005 Securities Offering Reform restructured the shelf and prospectus-supplement regime and clarified the (b)(8) corrective path. Earlier paper or pre-Reform filings are not included.
Form 424B8 is not a distinct prospectus type. It is a cure-filing label used when a prospectus required under another paragraph of Rule 424(b), (b)(1) through (b)(7), missed its filing deadline and is being submitted late, "as soon as practicable" after discovery. The substantive content of any 424B8 mirrors whichever paragraph the late filing was supposed to satisfy. That single fact governs every comparison below.
424B1: initial prospectus with Rule 430A pricing. The on-time filing for prospectuses adding information omitted from the effective registration under Rule 430A (typically IPO pricing). A 424B1 and a late-cured 424B8 derived from a missed (b)(1) deadline can be content-identical; only the timing posture differs.
424B2: base prospectus plus shelf takedown pricing supplement (Rule 430B). The high-volume workhorse for shelf debt and MTN programs. A 424B8 carrying a missed (b)(2) takedown shows the same coupon, maturity, CUSIPs, and underwriters; the form code signals only the missed window.
424B3: material changes or additions to a previously filed prospectus. Narrative-heavier than pricing supplements (updated risk factors, transaction changes, revised financials). A late (b)(3) becomes a 424B8, distinguishable from a missed (b)(2)/(b)(5) only by reading the document body.
424B4: final priced prospectus where changes exceed Rule 430A scope. The IPO-pricing filing used when material information beyond 430A omissions is added. IPO-pricing studies should treat 424B8 filings whose underlying paragraph is (b)(1) or (b)(4) as part of the IPO population.
424B5: shelf-takedown prospectus supplement. Overlaps heavily with 424B2 but applies to different combinations of base-prospectus reliance and Rule 430B mechanics. A missed (b)(5) deadline likewise produces a 424B8 carrying equivalent supplement content.
424B7: selling-securityholder reoffer prospectus. Used for resales by selling holders (e.g., PIPE shares). Disclosure centers on the selling-holder table and resale mechanics rather than primary-issuer pricing. A late (b)(7) becomes a 424B8 retaining that selling-holder structure.
424A: preliminary prospectus (red herring). Upstream of all 424B paragraphs in the offering timeline. 424B8 has no preliminary-stage analog; it is always a final or supplemental prospectus filed late under one of the (b) paragraphs. The two are not substitutes in any research design.
S-1 / S-3: base registration statements. These authorize the offering; 424B filings (including 424B8) are the as-used prospectuses delivered after effectiveness or pricing. An S-1/S-3 dataset gives the registered universe and full legal disclosure; a 424B8 dataset gives only the late-cured prospectuses. Linking 424B8 records back to their underlying S-1/S-3 (via CIK and registration file number) is often necessary for full offering context.
FWP: free writing prospectus (Rule 433). A separate communications regime for term sheets, road show materials, and pricing communications outside the statutory prospectus. FWPs are not Rule 424 prospectuses, are not subject to the Rule 424(b) paragraph deadlines, and never trigger 424B8 cure filings, even when they accompany the same shelf takedown.
Form 497: mutual fund prospectus. Procedurally analogous as a post-effective prospectus filing, but filed under Securities Act Rule 497 by investment companies registered under the Investment Company Act of 1940. Filer population, content (objectives, fees, share classes), and downstream uses do not overlap with the 424B series.
The dataset is defined by timing failure, not content type. Every record is a prospectus that should have been filed under another 424(b) paragraph and was not filed on time. Three consequences follow:
No other SEC dataset occupies this niche of late-cured Rule 424(b) prospectuses.
Every record in this corpus is a late-filed prospectus cured under Rule 424(b)(8), which makes it useful to a narrow set of professionals who care about filing-timeline discipline, structured-note terms, or peer cure behavior. Most workflows draw on three layers: the metadata.json header (entities[], cik, filedAt, accessionNo, formType), the primary 424B8 HTML prospectus body, and the Exhibit 107 inline XBRL filing-fee table when present.
Issuer- and underwriter-side disclosure counsel use the corpus as a structured ledger of cure filings to scope Section 11 and Section 12(a)(2) exposure. They reconcile the offering or pricing date stated in the prospectus body against filedAt to measure the gap beyond the original 424(b)(1)/(2)/(5)/(7) window, infer which paragraph should have applied, and benchmark how peers word the late-filing event in the cured supplement. Output: liability memos, Rule 172 access-equals-delivery analyses, and precedent banks of cure language.
Compliance teams at issuers, broker-dealers, and underwriters aggregate entities[].companyName, entities[].cik, and filedAt to score their own filing-deadline performance against peers and to flag deal teams that repeatedly route through 424B8. Output: internal SLA dashboards, business cases for filing-automation tooling, and remediation evidence presented to internal audit and to examiners.
Pricing supplements for structured notes, market-linked CDs, and shelf takedowns dominate this channel. Analysts parse the HTML body for payoff formulas, barriers, buffers, participation rates, observation dates, underlying baskets, reference indices, issuer credit terms, and CUSIP/ISIN identifiers, then key the extracted terms to cik and filedAt to build secondary-market reference tables, back-test payoffs against realized index paths, and detect competitor launches that only surface in this channel. Exhibit 107 supplies machine-readable offering size and fee class for issuance dashboards.
Teams building prospectus-summarization, risk-factor classification, and term-extraction models use the corpus as a small, well-labeled training and evaluation slice. Its bounded scope, recurring pricing-supplement content, and clean metadata.json labels make it suitable for supervised fine-tuning, RAG retrieval evaluation, and pre-training on payoff and indicative-terms language. Markup style varies by filer, so the corpus also exercises layout robustness.
Examinations and market-oversight staff group by entities[].cik and filedAt to surface registrants whose 424B8 cadence suggests systemic prospectus-filing-control weakness, then compare the cured supplement against the base prospectus and registration statement to test whether terms changed materially. Output: examination scoping memos, deficiency letters, and referrals.
Syndicate and debt-capital-markets bankers compare the offering or pricing date in the prospectus body against the EDGAR filedAt to identify peers that habitually cure late, informing competitive pitches to issuer clients and post-mortems on missed filing windows in the desk's own deal flow.
Auditors covering registrants with active shelf programs use cik and filedAt to confirm whether a client filed via the 424B8 channel during the audit period, then read the prospectus body to test issuance-fee revenue-recognition timing, the completeness of offerings disclosed in financial-statement footnotes, and the design effectiveness of disclosure-controls procedures for ICFR walk-throughs.
Experts supporting Section 11 and Section 12 cases use accessionNo and filedAt to fix EDGAR receipt time as chain-of-custody anchors, and the HTML body to compare the operative offering terms against the version delivered to investors, supporting reliance and damages opinions in expert reports.
Vendors building structured-note inventories and prospectus libraries normalize entities[] to internal issuer IDs, deduplicate on accessionNo against other 424(b) channels, and parse the HTML and Exhibit 107 fee tables to enrich product-master records with offering size, fee class, and underlying-asset metadata for downstream wealth-management feeds.
Academic and practitioner researchers treat the corpus as a clean sample of self-cured disclosure failures. They link late-filing frequency and issuer characteristics from entities[] to outcomes such as restatements, enforcement actions, or shelf-program continuation, producing empirical work on Rule 424 compliance.
Because every record is a late-cured Rule 424(b) prospectus, the dataset supports a tight set of workflows that combine the metadata.json header with the primary 424B8 HTML body and, when present, the EX-FILING FEES inline-XBRL exhibit.
A securities attorney scoping Section 11 and Section 12(a)(2) exposure for a shelf-program client extracts the trade date, original issue date, and pricing date from the "Key Terms" table in the primary 424B8 HTML body and measures the gap between each date and metadata.filedAt. Grouping by entities[].cik and the registration fileNo produces a per-issuer distribution of cure latencies, a precedent bank of how peers word the late-filing event, and inputs to access-equals-delivery memos under Rule 172.
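The latency arithmetic in this workflow can be sketched as below. The pricing date is assumed to have been extracted from the prospectus body already; the two-business-day default reflects the general window described earlier, and the weekday count ignores federal holidays, so results near holidays would be overstated by a day. The function names are hypothetical, and the date portion of filedAt (an acceptance timestamp) is used as a stand-in for the filing date.

```python
from datetime import date, datetime, timedelta


def business_days_between(start, end):
    """Count weekdays after `start` up to and including `end` (no holiday calendar)."""
    count = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            count += 1
    return count


def cure_latency(pricing_date, filed_at_iso, window_business_days=2):
    """Measure how far the 424B8 acceptance date fell past the original window."""
    filed = datetime.fromisoformat(filed_at_iso).date()
    elapsed = business_days_between(pricing_date, filed)
    return {
        "calendar_days": (filed - pricing_date).days,
        "business_days": elapsed,
        "business_days_past_window": max(0, elapsed - window_business_days),
    }
```

For a hypothetical pricing date of 2025-10-20 and the example filedAt of 2025-10-28T18:08:00-04:00, this reports 8 calendar days, 6 business days, and 4 business days past a two-day window.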
A structured-product desk strategist iterates over every record where documentFormatFiles[*].description contains PRICING SUPPLEMENT, parses the prospectus HTML for CUSIP/ISIN, underlying basket, barrier/buffer levels, contingent-coupon thresholds, autocall dates, and payoff formulas, and joins each row to offering size and fee class extracted from the ffd: tags in the EX-FILING FEES iXBRL document. Output: a secondary-market reference table keyed by cik and filedAt that feeds payoff back-tests and competitor-launch monitors for issuers like JPMorgan Chase Financial, GS Finance, and Citigroup Global Markets Holdings.
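The first step of this workflow, selecting pricing-supplement records, might look like the sketch below. It relies only on the description and type fields described in the metadata anatomy; the substring test deliberately also matches PRELIMINARY PRICING SUPPLEMENT. The function names and the shape of the join key are illustrative assumptions.

```python
def is_pricing_supplement(meta):
    """True when the 424B8-typed document is described as a pricing supplement."""
    for doc in meta.get("documentFormatFiles", []):
        description = (doc.get("description") or "").upper()
        if doc.get("type") == "424B8" and "PRICING SUPPLEMENT" in description:
            return True
    return False


def join_keys(meta):
    """Return one (cik, filedAt) key per entity, for joining extracted terms."""
    return [(ent.get("cik"), meta.get("filedAt")) for ent in meta.get("entities", [])]
```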
A broker-dealer compliance officer aggregates entities[].cik, entities[].companyName, fileNo, and filedAt across the full corpus to compute monthly counts of 424B8 cures per registration statement, then ranks the firm against named peers. The resulting SLA dashboard identifies deal teams that route disproportionately through (b)(8), supports business cases for filing-automation tooling, and produces remediation evidence for examiners.
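A minimal sketch of the monthly aggregation follows. It assumes each record's metadata.json has been loaded into a dict, takes the month from the first seven characters of filedAt, and collapses co-registrant suffixes (333-XXXXXX-NN) onto the base registration number so that one submission counts once per registration statement. The function name is hypothetical.

```python
from collections import Counter


def monthly_cure_counts(records):
    """Count 424B8 submissions per (base registration file number, YYYY-MM)."""
    counts = Counter()
    for meta in records:
        month = meta["filedAt"][:7]  # "YYYY-MM" prefix of the ISO-8601 timestamp
        base_file_numbers = set()
        for entity in meta.get("entities", []):
            file_no = entity.get("fileNo") or ""
            if file_no:
                # "333-270327-01" -> "333-270327"
                base_file_numbers.add("-".join(file_no.split("-")[:2]))
        for base in base_file_numbers:
            counts[(base, month)] += 1
    return counts
```

Ranking against peers then reduces to sorting these counts, for example with counts.most_common().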
A reference-data engineer at a prospectus-data vendor walks entities[] for every record, splitting on the (Filer) versus (Subject) role suffix and on hyphen-suffixed fileNo values (333-XXXXXX-NN) to reconstruct issuer-guarantor pairs such as JPMorgan Chase Financial / JPMorgan Chase & Co. or GS Finance Corp. / The Goldman Sachs Group. The output is a normalized issuer-guarantor crosswalk plus SIC and stateOfIncorporation attributes used to enrich downstream wealth-management product feeds.
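The splitting step can be sketched with two regular expressions. The patterns assume the role suffix is the final parenthetical in companyName and that file numbers follow the 333-XXXXXX or 333-XXXXXX-NN shape described above; the function names and output keys are hypothetical. The sketch only groups co-registrants by base file number and does not decide which entity is the issuer and which the guarantor.

```python
import re
from collections import defaultdict

ROLE_RE = re.compile(r"^(?P<name>.*?)\s*\((?P<role>[^()]+)\)\s*$")
FILE_NO_RE = re.compile(r"^(?P<base>\d+-\d+)(?:-(?P<suffix>\d+))?$")


def parse_entity(entity):
    """Split the role suffix off companyName and the -NN suffix off fileNo."""
    raw_name = entity.get("companyName") or ""
    role_match = ROLE_RE.match(raw_name)
    file_match = FILE_NO_RE.match(entity.get("fileNo") or "")
    return {
        "cik": entity.get("cik"),
        "name": role_match.group("name") if role_match else raw_name,
        "role": role_match.group("role") if role_match else None,
        "baseFileNo": file_match.group("base") if file_match else None,
        "coRegistrantSuffix": file_match.group("suffix") if file_match else None,
    }


def group_by_registration(entities):
    """Group parsed entities that share one base registration file number."""
    groups = defaultdict(list)
    for parsed in map(parse_entity, entities):
        groups[parsed["baseFileNo"]].append(parsed)
    return dict(groups)
```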
An applied NLP team uses the corpus as a bounded fine-tuning slice for term-extraction and risk-factor classification on structured-note language. The HTML bodies cover three distinct layout regimes (semantic HTML, absolutely-positioned <DIV> print templates, and PDF-rendered tile HTML) referenced in the anatomy, and metadata.json supplies clean labels (formType, entities[].sic, documentFormatFiles[*].description) for supervised fine-tuning, RAG retrieval evaluation, and layout-robustness testing.
An external auditor testing issuance-fee revenue-recognition timing for a shelf-program client filters records by entities[].cik and audit-period filedAt, opens the EX-FILING FEES iXBRL exhibit, and reads ffd:AggtSalesPric, ffd:FeeRate, and ffd:FeeAmt against the aggregate offering price and discount described in the "Plan of Distribution" section of the 424B8 HTML body. Discrepancies feed disclosure-controls walk-throughs for ICFR and completeness testing of offerings recorded in financial-statement footnotes.
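The arithmetic behind this check is that the fee for a row generally equals the aggregate offering price multiplied by the fee rate. A minimal sketch, assuming the three values have already been read from the exhibit as strings and using Decimal to avoid floating-point drift, is shown below. The function name and the one-cent tolerance are illustrative choices, and rows affected by fee offsets or carry-forwards would need separate handling.

```python
from decimal import Decimal


def fee_is_consistent(aggregate_price, fee_rate, fee_amount, tolerance="0.01"):
    """Return (matches, expected_fee) for one fee-table row."""
    expected = Decimal(aggregate_price) * Decimal(fee_rate)
    matches = abs(expected - Decimal(fee_amount)) <= Decimal(tolerance)
    return matches, expected
```

With made-up inputs of 1000000 for the aggregate price and 0.0001531 for the rate, the expected fee is 153.10, so a tagged fee of 153.10 passes and 160.00 does not.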
The dataset is distributed through the sec-api.io datasets API. A JSON index endpoint exposes dataset metadata and all container download URLs, while the dataset itself is available either as one consolidated archive or as individual monthly container ZIPs. All download endpoints require a sec-api.io API key, passed either as a ?token=YOUR_API_KEY query parameter or via an Authorization header. The index endpoint itself is public and does not require authentication.
Dataset Index JSON API: https://api.sec-api.io/datasets/form-424b8-files.json
Returns dataset-level metadata (name, description, updatedAt, earliestSampleDate, totalRecords, totalSize, formTypes, containerFormat, fileTypes) and a containers array listing every monthly container with its key, downloadUrl, size, recordsCount, and updatedAt timestamp. Poll this endpoint to detect which containers changed in the latest refresh run and incrementally download only those containers.
```json
{
  "datasetId": "1f13365b-9ae0-696d-8403-2189a750d9c1",
  "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-424b8-files.zip",
  "name": "Form 424B8 Files Dataset",
  "updatedAt": "2026-04-21T02:54:31.354Z",
  "earliestSampleDate": "2006-01-01",
  "totalRecords": 2769,
  "totalSize": 88838022,
  "formTypes": ["424B8"],
  "containerFormat": "ZIP",
  "fileTypes": ["HTML", "JSON", "PDF", "TXT"],
  "containers": [
    {
      "downloadUrl": "https://api.sec-api.io/datasets/form-424b8-files/2026/2026-04.zip",
      "key": "2026/2026-04.zip",
      "size": 1248301,
      "recordsCount": 12,
      "updatedAt": "2026-04-21T02:54:31.354Z"
    }
  ]
}
```
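The incremental-refresh pattern described above can be sketched as follows: fetch the index, compare each container's updatedAt against what was recorded on the previous run, and download only the ones that differ. The function names and the shape of the saved state (a dict from container key to updatedAt) are assumptions for illustration.

```python
import json
import urllib.request

INDEX_URL = "https://api.sec-api.io/datasets/form-424b8-files.json"


def fetch_index(url=INDEX_URL):
    """Download and parse the public dataset index."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def changed_containers(index, last_seen):
    """Return containers that are new or whose updatedAt differs from `last_seen`.

    `last_seen` maps a container key (e.g. "2026/2026-04.zip") to the
    updatedAt value recorded during the previous run.
    """
    return [
        container
        for container in index.get("containers", [])
        if last_seen.get(container["key"]) != container["updatedAt"]
    ]
```

After a successful download, storing each container's key and updatedAt back into the state keeps the next poll limited to containers touched by the latest refresh.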
Download Entire Dataset: https://api.sec-api.io/datasets/form-424b8-files.zip?token=YOUR_API_KEY
A single archive containing every monthly container ZIP from January 2006 to the latest refresh. Use this for an initial bulk load. Requires an API key.
Download Single Container: https://api.sec-api.io/datasets/form-424b8-files/2026/2026-04.zip?token=YOUR_API_KEY
Each container is a monthly ZIP at form-424b8-files/YYYY/YYYY-MM.zip. Inside, one folder per accession number holds a metadata.json file plus the filing's original EDGAR documents in HTML, TXT, JSON, and PDF form (image attachments excluded). Requires an API key.
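Example with curl (an explicit output name keeps the query string out of the saved filename):
```shell
curl -o 2026-04.zip "https://api.sec-api.io/datasets/form-424b8-files/2026/2026-04.zip?token=YOUR_API_KEY"
```
Or with wget:
```shell
wget -O 2026-04.zip "https://api.sec-api.io/datasets/form-424b8-files/2026/2026-04.zip?token=YOUR_API_KEY"
```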
For batch downloads, the helper script scripts/download-sec-api-file.js can be used to fetch one or more container files from the dataset index without manually composing each URL.
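For readers working in Python rather than with the helper script, a rough equivalent using only the standard library is sketched below. It builds the container URL from the documented YYYY/YYYY-MM.zip pattern and passes the key as the token query parameter; the function names are hypothetical, and error handling and retries are left out.

```python
import shutil
import urllib.parse
import urllib.request

BASE_URL = "https://api.sec-api.io/datasets/form-424b8-files"


def container_url(year, month, token):
    """Build the download URL for one monthly container."""
    key = f"{year:04d}/{year:04d}-{month:02d}.zip"
    query = urllib.parse.urlencode({"token": token})
    return f"{BASE_URL}/{key}?{query}"


def download_container(year, month, token, destination):
    """Stream one monthly container to `destination` and return that path."""
    with urllib.request.urlopen(container_url(year, month, token)) as response:
        with open(destination, "wb") as out:
            shutil.copyfileobj(response, out)
    return destination
```

A call such as download_container(2026, 4, "YOUR_API_KEY", "2026-04.zip") would fetch the same file as the curl and wget examples.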
The dataset covers Form 424B8, the EDGAR submission code for prospectuses filed pursuant to Rule 424(b)(8) of the Securities Act of 1933. Rule 424(b)(8) is a residual catch-up paragraph used when a prospectus required under another paragraph of Rule 424(b) — typically (b)(2), (b)(3), (b)(5), or (b)(7) — was not filed within the time frame the underlying paragraph requires.
One record is a single EDGAR submission of Form 424B8, identified by an 18-digit SEC accession number. On disk it is a folder containing a metadata.json describing the EDGAR submission, the primary 424B8 prospectus HTML, and, when the registrant attached one, an Exhibit 107 EX-FILING FEES inline-XBRL HTML document. Each accession is a separate record, even when the same issuer files several late prospectus supplements within the same month.
The filer is always the Securities Act registrant whose effective registration statement covers the offering — typically the issuer or, in shelf and structured-product programs, a finance subsidiary with a parent guarantor. The 424B8 population is dominated by well-known seasoned issuers (WKSIs) on automatic shelves and by bank holding companies and their finance subsidiaries issuing medium-term notes and structured notes. Underwriters, dealers, and selling securityholders may be named in the prospectus but do not file Form 424B8 in their own right.
The dataset includes all Form 424B8 filings submitted to EDGAR from January 2006 to the present, refreshed on an ongoing basis as new filings arrive. Coverage starts in 2006 because the SEC's 2005 Securities Offering Reform restructured the shelf and prospectus-supplement regime and clarified the (b)(8) corrective path; earlier paper or pre-Reform filings are not included.
Forms 424B1 through 424B7 are routine, on-time prospectus filings tied to specific paragraphs of Rule 424(b) (initial pricing, shelf takedowns, material changes, selling-holder reoffers, and so on). Form 424B8 is the corrective label used when one of those deadlines has been missed, filed "as soon as practicable after the discovery of the failure to file." The substantive prospectus content is generally what would have been filed under the original paragraph; only the submission type changes.
The dataset is distributed as monthly ZIP containers named YYYY/YYYY-MM.zip. Each ZIP unpacks to a YYYY-MM/ directory containing one accession-numbered subfolder per filing, and each accession folder contains a metadata.json plus the registrant-supplied EDGAR documents in HTML, TXT, JSON, and PDF form. Image attachments (GRAPHIC entries) are excluded from the ZIP but remain accessible at their original sec.gov URLs.
The dataset is served by the sec-api.io datasets API. The public index endpoint at https://api.sec-api.io/datasets/form-424b8-files.json lists every monthly container with its download URL, size, record count, and update timestamp. Authenticated download endpoints require a sec-api.io API key passed as ?token=YOUR_API_KEY or via an Authorization header, and let you fetch either the consolidated archive (form-424b8-files.zip) or any single monthly container (form-424b8-files/YYYY/YYYY-MM.zip).