The Form S-2 Files Dataset is a closed historical archive of every Form S-2 and Form S-2/A registration statement accepted by EDGAR between January 1994 and December 2005, the month in which the SEC retired the form under Securities Offering Reform. Each record is one EDGAR submission: an accession-numbered folder containing a metadata.json manifest, the primary registration-statement document, and every exhibit and correspondence document filed with it. The underlying form was the middle tier of the SEC's pre-2005 integrated disclosure system: a long-form Securities Act registration available only to seasoned U.S. domestic issuers with at least three years of Exchange Act reporting history who did not qualify for the short-form Form S-3. Filings are grouped into monthly ZIP containers keyed by acceptance month; the document bodies inside are TXT, HTML, and PDF files, and each manifest is JSON. Because Form S-2 was eliminated effective December 1, 2005 and folded into a revised Form S-1, the corpus is permanently bounded and does not grow.
The dataset captures the full EDGAR-era population of Form S-2 (original registration statement) and Form S-2/A (pre-effective or post-effective amendment) submissions. The formType vocabulary is restricted to exactly these two values. Each record is a single accession-numbered submission, not an issuer or an offering: a registrant that filed an original S-2 followed by three S-2/A amendments produces four independent, self-contained records. The accession number, not the registrant, defines the record boundary.
Form S-2 was a long-form registration statement under the Securities Act of 1933 designed for seasoned issuers. It occupied a middle position between the full-disclosure Form S-1 and the short-form Form S-3: it permitted the registrant to incorporate substantial portions of its Exchange Act history (10-K, 10-Q, 8-K, proxy materials) by reference rather than reprinting it, while still requiring a delivered prospectus that included or summarized the financial statements. Because of this hybrid posture, an S-2 record typically carries a prospectus, a description of the securities being registered, risk factors, use of proceeds, plan of distribution, and either embedded or incorporated-by-reference audited financial statements and MD&A, plus the standard slate of Part II exhibits.
Records are grouped into monthly ZIP archives keyed by acceptance month (for example 2005/2005-08.zip). The dataset spans January 1994 through December 2005 inclusive and is a closed corpus: no new S-2 or S-2/A filings exist after the December 2005 sunset. Document bodies inside each record are preserved verbatim in their original formats — predominantly ASCII (.txt) and HTML (.htm, .html), with a minority of late-period PDF exhibits — each still wrapped in its EDGAR SGML document envelope. The metadata.json manifest is included for every record.
Every accession-numbered folder has three physical components:
metadata.json: exactly one per folder, holding filing-level facts (form type, accession number, acceptance timestamp, registrant identifiers, URLs) and a document index for every constituent file.
Primary registration-statement document: its <TYPE> is S-2 or S-2/A, almost always at <SEQUENCE>1. Filenames are filer-coined (e.g. forms2a03725_08152005.htm, ds2.htm, grays2.txt, a05-15449_1s2a.htm, v023069_s2-a.txt, doc1.txt).
Exhibit and correspondence documents: each is identified by its own <TYPE>, <SEQUENCE>, and <FILENAME>. When the filer omitted a filename (most commonly for CORRESP), EDGAR substitutes filenameN.htm or filenameN.txt, where N is the document sequence number.

The folder name is the 18-digit SEC accession number with dashes stripped (NNNNNNNNNNYYNNNNNN). Every document file, whether the inner body is plain ASCII or HTML, opens with an EDGAR SGML document envelope: line-prefixed tags <DOCUMENT>, <TYPE>, <SEQUENCE>, <FILENAME>, an optional <DESCRIPTION>, then <TEXT> followed by the body. For HTML documents the body begins with <HTML> immediately after <TEXT>; for ASCII documents the body begins with <PAGE> form-feed markers and fixed-width text. When <DESCRIPTION> is present, its value matches the description field that the same document carries inside metadata.json.
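The envelope layout described above can be peeled off with a few lines of Python. This is a minimal sketch, not a full SGML parser: it assumes one document per file and header tags that each sit on their own line, and the sample text (including the filename ex5-1.htm) is invented for illustration.

```python
import re

# Header tags named in the text; each value runs to the end of its line.
_HEADER_TAGS = ("TYPE", "SEQUENCE", "FILENAME", "DESCRIPTION")


def split_envelope(raw: str) -> tuple[dict[str, str], str]:
    """Return (header fields, body) for one SGML-wrapped document file."""
    head, sep, rest = raw.partition("<TEXT>")
    if not sep:
        raise ValueError("no <TEXT> tag found; not an SGML-wrapped document")
    header = {}
    for tag in _HEADER_TAGS:
        match = re.search(rf"^<{tag}>(.*)$", head, flags=re.MULTILINE)
        if match:  # <DESCRIPTION> is optional, so a tag may be absent
            header[tag.lower()] = match.group(1).strip()
    # Drop the closing </TEXT> (and anything after it) if present.
    body = rest.rsplit("</TEXT>", 1)[0]
    return header, body.strip("\r\n")


# Invented sample that mimics the envelope layout.
sample = (
    "<DOCUMENT>\n<TYPE>EX-5.1\n<SEQUENCE>2\n<FILENAME>ex5-1.htm\n"
    "<DESCRIPTION>OPINION RE LEGALITY\n<TEXT>\n"
    "<HTML><BODY>Opinion</BODY></HTML>\n"
    "</TEXT>\n</DOCUMENT>\n"
)
header, body = split_envelope(sample)
print(header["type"], header["sequence"], body[:6])
```

Passing the returned body, rather than the raw file, to an HTML parser avoids the envelope lines being treated as document content.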
Cover page and facing sheet. The primary document opens with the EDGAR-mandated facing sheet: form name and registration-number placeholder, exact legal name of the registrant, state or jurisdiction of incorporation, primary standard industrial classification code, IRS Employer Identification Number, principal-executive-office address, agent for service, telephone number, and the calculation-of-registration-fee table. The fee table itemizes each title of securities being registered together with amount to be registered, proposed maximum offering price per unit, proposed maximum aggregate offering price, and the registration fee due.
Prospectus body. The prospectus follows the facing sheet and contains, in roughly this order: a cover page summarizing the offering, a table of contents, a summary of the offering and the company, risk factors, use of proceeds, dilution where applicable, capitalization, selected financial data, management's discussion and analysis (or an incorporation-by-reference pointer to the most recent 10-K and intervening 10-Qs and 8-Ks), description of the business, description of the securities being offered, plan of distribution and underwriting arrangements, legal matters, experts, and a "where you can find additional information" pointer to EDGAR. Because S-2 eligibility presumes Exchange Act seasoning, large portions of the descriptive and historical disclosure are routinely incorporated by reference rather than restated.
Part II — information not required in the prospectus. After the prospectus, Part II contains other expenses of issuance and distribution, indemnification of directors and officers, recent sales of unregistered securities, the exhibit index, undertakings of the registrant, and the signature page. Signatures identify the registrant, the principal executive officer, the principal financial and accounting officer, and a majority of the board of directors.
Exhibits. Each exhibit is a separately wrapped document in its own file. <TYPE> values in the corpus follow the Regulation S-K Item 601 numbering scheme. Common types include:
EX-1: underwriting agreement
EX-3: charter and bylaws (when not incorporated by reference)
EX-4, EX-4.05: instruments defining the rights of securityholders (indentures, warrant agreements, certificates of designation)
EX-5, EX-5.1, EX-5.01: opinion of counsel on legality of the securities being registered
EX-10, EX-10.18, EX-10.(VII): material contracts (employment agreements, credit facilities, postponement agreements)
EX-11: computation of earnings per share
EX-21, EX-21.01: subsidiaries of the registrant
EX-23, EX-23.1, EX-23.2, EX-23.(A), EX-23.01: consents of independent registered public accounting firms and other experts
EX-99, EX-99.1, EX-99.A through EX-99.J: additional exhibits (pricing supplements, form proxies, board resolutions)
CORRESP: correspondence with SEC staff
GRAPHIC: image attachments (enumerated in the manifest but not stored in the dataset; see "Excluded content" below)

The exhibit index inside Part II enumerates these by Item 601 number; the same exhibits appear physically as separate files in the folder, each with its own SGML envelope.
metadata.json manifest

The manifest is a single JSON object that surfaces filing-level facts and a document index. Its fields are:

formType: "S-2" or "S-2/A".
accessionNo: dashed EDGAR accession number ("NNNNNNNNNN-YY-NNNNNN"); the parent folder name is the same value with dashes stripped.
description: human-readable EDGAR form description, e.g. "Form S-2 - Registration of securities" or "Form S-2/A - Registration of securities: [Amend]".
filedAt: ISO-8601 acceptance timestamp in Eastern Time, e.g. "2005-08-29T16:30:03-04:00".
linkToFilingDetails: URL of the primary document on EDGAR.
linkToHtml: URL of the EDGAR filing index page.
linkToTxt: URL of the full-submission text bundle on EDGAR (the single .txt aggregation of every document with SGML wrappers).
linkToXbrl: present but always an empty string; Form S-2 predates XBRL.
dataFiles: present but always an empty array, for the same reason.
id: 32-character hexadecimal identifier assigned by the dataset publisher.
documentFormatFiles: array of document descriptors, one per submitted document plus a trailing complete-submission pseudo-entry.
entities: array of filer/registrant identification objects.

documentFormatFiles entries. Each entry carries:

sequence: numeric string (e.g. "1", "2") for ordinary documents; a single-space string " " for the trailing complete-submission pseudo-entry.
size: byte count expressed as a JSON string, not a number.
documentUrl: EDGAR URL whose final path segment matches the on-disk filename inside the folder.
description: optional uppercase free-text label such as OPINION RE LEGALITY, CONSENT OF AUDITORS, POSTPONEMENT AGREEMENT, FORM S-2.
type: EDGAR document type (S-2, S-2/A, EX-5, EX-23.1, EX-10.18, EX-99.J, CORRESP, GRAPHIC, and so on).

Sequence numbering reflects the original EDGAR submission and may be non-contiguous when the filer reserved or withdrew a sequence slot.
entities entries. Each entity object describes a filer or co-registrant and carries:
companyName: legal name with the EDGAR role appended in parentheses, e.g. "ACME CORP (Filer)".
cik: zero-padded ten-digit CIK.
fileNo: SEC file number, typically 333-NNNNNN for registration statements; stable across the amendment chain.
irsNo: Employer Identification Number, sometimes "000000000" for shell or pre-revenue issuers.
stateOfIncorporation: two-letter code.
sic: SIC code prefixed to its human-readable industry label.
fiscalYearEnd: four-digit MMDD string.
act: always "33" because S-2 is filed under the Securities Act of 1933.
filmNo: EDGAR film/microfiche number.
type: registrant role-specific form code.
tickers: array of public ticker symbols, included only when the issuer has them.

Most S-2 filings list a single entity (the registrant); guarantor co-registrant entries are possible under EDGAR conventions.
Each record includes every textual document EDGAR received for that S-2 or S-2/A submission: the primary registration statement, every exhibit (legality opinions, auditor consents, material contracts, subsidiary lists, EPS computations, additional exhibits), and any CORRESP documents. ASCII (.txt) and HTML (.htm, .html) bodies are preserved verbatim, each still wrapped in its original EDGAR SGML document envelope. A minority of late-period submissions also carry PDF exhibits, which are preserved as-is.
Two categories of submission content are intentionally absent:
GRAPHIC documents (logos, signature images, hand-drawn organizational charts, photographs) are excluded from the ZIP archives. They remain enumerated as GRAPHIC-type entries in documentFormatFiles with valid documentUrl values pointing to EDGAR, so a consumer can fetch them from sec.gov if needed, but they are not written to disk inside the folder.
The complete-submission .txt bundle. EDGAR exposes every filing as a single concatenated text file (<accessionWithDashes>.txt) that re-aggregates every document with its SGML wrapper. This bundle is referenced as a trailing documentFormatFiles entry whose sequence and type are both a single space character, and its URL is also exposed via linkToTxt, but the bundle itself is not stored locally; only the individual decomposed documents are written into the folder.

Material that the registrant incorporated by reference (prior 10-Ks, 10-Qs, 8-Ks, proxy statements) is not duplicated inside the S-2 record. The prospectus names the incorporated filings, but their content lives in the dataset for the relevant form type.
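Given the two exclusions described above, a consumer can work out from the manifest alone which documents should exist on disk and which are reachable only through their EDGAR URLs. The sketch below assumes the documentFormatFiles conventions stated in this section; the function name and the example.invalid URLs are illustrative placeholders.

```python
from posixpath import basename
from urllib.parse import urlparse


def partition_documents(document_format_files: list[dict]) -> tuple[list[str], list[str]]:
    """Split manifest entries into (expected local filenames, remote-only URLs)."""
    local, remote_only = [], []
    for doc in document_format_files:
        doc_type = doc.get("type", "")
        # The complete-submission bundle has blank sequence and type.
        is_bundle = not doc.get("sequence", "").strip() and not doc_type.strip()
        if is_bundle or doc_type == "GRAPHIC":
            remote_only.append(doc["documentUrl"])
        else:
            # The final URL path segment matches the on-disk filename.
            local.append(basename(urlparse(doc["documentUrl"]).path))
    return local, remote_only


# Invented entries that follow the field layout in the text.
entries = [
    {"sequence": "1", "type": "S-2", "documentUrl": "https://example.invalid/a/ds2.htm"},
    {"sequence": "2", "type": "GRAPHIC", "documentUrl": "https://example.invalid/a/logo.jpg"},
    {"sequence": " ", "type": " ", "documentUrl": "https://example.invalid/a/full.txt"},
]
local_files, remote_urls = partition_documents(entries)
print(local_files, len(remote_urls))
```

Comparing the first list against the actual folder contents is a cheap integrity check after unzipping a container.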
Across the 1994–2005 window, Form S-2's eligibility criteria and core disclosure architecture remained substantially stable: a seasoned-issuer registration framework that combined a delivered prospectus with permissive incorporation by reference of Exchange Act materials. The most consequential changes during the dataset's lifetime were external to the form rather than internal to it. Regulation M (1996–1997) altered the plan-of-distribution and stabilization disclosures; Regulation FD (2000) reshaped how forward-looking and selective-disclosure issues were framed in risk factors and MD&A; the Private Securities Litigation Reform Act safe-harbor language migrated into forward-looking-statement legends; and Sarbanes-Oxley (2002) introduced certifications and internal-control disclosures that, while primarily Exchange Act features, propagated into the incorporated-by-reference base that an S-2 relied upon. The form itself was then eliminated in December 2005 as part of Securities Offering Reform, which folded the S-2 use case into a revised Form S-1 and an expanded Form S-3 (including the new well-known seasoned issuer category and automatic shelf registration). The dataset terminates cleanly at that regulatory boundary.
The relevant format evolution within the 1994–2005 window is the EDGAR transition from ASCII-only submissions to HTML submissions. Early filings in the corpus are uniformly plain text — fixed-width prospectus pages, <PAGE> form-feed markers, ASCII tables — wrapped in SGML document envelopes inside .txt files. From the late 1990s onward, filers progressively adopted HTML, and by the form's final years both conventions coexist, often within the same monthly archive: some filers continued to submit everything as wrapped ASCII while others used HTML for the primary document and exhibits alike. PDF exhibits also appear in a minority of late-period filings. The SGML document envelope itself (<DOCUMENT>, <TYPE>, <SEQUENCE>, <FILENAME>, optional <DESCRIPTION>, <TEXT>) is invariant across the entire window and prefixes every document file in the dataset.
An original S-2 and its S-2/A amendments are separate records, linked through entities[].fileNo (333-NNNNNN), which is stable across the amendment chain.
Filenames are filer-coined and do not indicate document type: ds2.htm, grays2.txt, forms2a03725_08152005.htm, a2161212zs-2a.htm, and doc1.txt are all primary registration-statement documents. The authoritative identifier for what a file is comes from the SGML <TYPE> tag inside the file and the matching type field in metadata.json.
When the filer omitted a filename (most commonly for CORRESP), the file appears as filenameN.htm or filenameN.txt, where N is the document sequence number. These are valid content files, not placeholders.
Gaps in documentFormatFiles[i].sequence reflect the original EDGAR submission, not packaging gaps. The complete-submission pseudo-entry always trails the list with sequence and type both set to a single space character, so consumers iterating the array should skip it.
GRAPHIC entries point off-archive. A GRAPHIC entry in documentFormatFiles references a .jpg or similar image file that is not present in the folder. The documentUrl value remains a valid EDGAR URL.
Every document file opens with <DOCUMENT> / <TYPE> / <SEQUENCE> / <FILENAME> / optional <DESCRIPTION> / <TEXT> header lines before the actual content. HTML parsers that expect a leading <HTML> token must skip past the envelope first; the matching closing </TEXT> and </DOCUMENT> tags follow the body.
size and sequence in documentFormatFiles are numeric values expressed as JSON strings, not numbers.
linkToXbrl is always an empty string and dataFiles is always an empty array because Form S-2 predates structured-data requirements; these fields exist for schema consistency with later form-type datasets and should not be treated as gaps.

Each record is a Form S-2 or Form S-2/A registration statement filed on EDGAR by the issuer itself, acting as registrant under Section 5 of the Securities Act of 1933. The filer is the company whose securities are being registered for public sale. Subsidiary guarantors and other co-registrants on registered debt offerings may appear as additional filers on a single submission.
Form S-2 was available only to seasoned U.S. domestic issuers. A registrant generally had to:
Issuers outside this profile used a different form. First-time registrants, recent IPO issuers, companies emerging from reorganization, shell companies, and seasoned issuers that had fallen out of timely Exchange Act compliance filed on Form S-1. Issuers that also met the public-float threshold (typically $75 million of non-affiliate float during the relevant period) and other Form S-3 conditions used Form S-3, which permitted Rule 415 shelf registration and broader forward incorporation by reference. S-2 captured the in-between group: seasoned enough to refer back to their Exchange Act file, but not eligible for the most abbreviated S-3 regime.
Foreign private issuers used the parallel F-series (Form F-1, Form F-2, Form F-3), not S-2. Investment companies used the N-series (Form N-1A, Form N-2, etc.). Asset-backed issuers, business development companies, and other specialized registrants used their own form regimes and do not appear in this dataset.
Filing is event-driven, not periodic. The trigger is the issuer's decision to offer and sell securities to the public in a transaction requiring registration under Section 5 of the Securities Act. There is no recurring deadline.
Offerings commonly registered on Form S-2 included:
Unlike Form S-3, Form S-2 did not support primary shelf takedowns in the same flexible manner; its incorporation-by-reference mechanics were also more limited, generally requiring delivery of the most recent annual report to security holders rather than ongoing forward incorporation of future Exchange Act filings.
Form S-2/A records are pre-effective or post-effective amendments to a previously filed S-2. They arise from:
Because Securities Act review is iterative, a single offering often produced one S-2 and several S-2/A filings before effectiveness. Each amendment is a separately accessioned EDGAR submission and a separate record.
Key dates around an S-2 filing are:
Eligibility was tested as of filing, and the issuer had to remain Exchange Act-compliant throughout review; loss of timely-filer status could force a downgrade to Form S-1.
The SEC adopted Securities Offering Reform in Release No. 33-8591 (July 2005), effective December 1, 2005. The release restructured Securities Act registration around four issuer categories — well-known seasoned issuers (WKSIs), other seasoned issuers, unseasoned reporting issuers, and non-reporting issuers — and eliminated Form S-2 (and Form F-2), folding backward incorporation by reference into a revised Form S-1. After the transition window, formerly S-2-eligible issuers moved either to Form S-3 (if qualified) or to the enhanced Form S-1. No S-2 or S-2/A filings exist after late 2005, which permanently bounds the dataset.
Form S-2 occupied the middle tier of Securities Act registration from 1982 to December 2005, distinguished from its neighbors by partial incorporation by reference paired with mandatory physical delivery of incorporated reports. The most useful comparisons are with the other S-series registration forms, their F-series foreign-issuer counterparts, Rule 424 prospectus filings downstream of effectiveness, and the Form 10-K/Form 10-Q reports incorporated by reference.
The nearest substitute. S-1 is the general-purpose long-form registration with the same core content (prospectus, risk factors, use of proceeds, plan of distribution, financial statements). The decisive split is eligibility and mechanics: S-1 historically required full restatement of business and financial disclosure inside the filing, while S-2 let issuers with three years of Exchange Act reporting incorporate prior 10-K/10-Q content by reference and physically deliver those reports with the prospectus. After Securities Offering Reform (December 1, 2005), S-1 absorbed S-2's backward incorporation-by-reference provisions, making S-1 the only post-2005 substitute. An S-1 corpus is far larger, spans 1994 to present, and skews toward IPOs and first-time registrants; the S-2 corpus is exclusively seasoned-issuer follow-on registrations from 1994 to 2005.
The next tier up. S-3 is reserved for the most-seasoned issuers (public-float threshold plus 12-month reporting history) and is the vehicle for shelf registration under Rule 415. Both S-2 and S-3 rely on incorporation by reference, but S-3 permits full forward incorporation of future Exchange Act filings and produces a thin base prospectus completed later via 424(b) supplements. S-2 incorporated only prior reports, required their physical delivery, and lacked full shelf mechanics. Shelf takedowns, WKSI activity, and large secondary offerings live in S-3; S-2 captures the mid-tier seasoned issuers who did not meet S-3's float or other eligibility tests.
Same registration framework, different trigger. S-4 registers securities issued in mergers, business combinations, and exchange offers, and carries transaction-specific disclosure absent from S-2: deal background, fairness opinions, target financials, pro formas, and voting agreements. S-4 remains active post-2005. The corpora are complementary: the same issuer might file S-4 for an acquisition-related issuance and S-2 for a cash follow-on during the same period.
A specialized track for real estate issuers (REITs, real estate LPs) with property-level financials and REIT tax-qualification disclosure not present in S-2. S-2 was a general-issuer form; reporting real estate issuers occasionally used S-2 for non-real-estate offerings, but S-11 is not a substitute corpus.
The F-series mirrors the S-series for foreign private issuers, with F-2 as the direct foreign analogue to S-2 (also eliminated in December 2005 and folded into F-1). F-series filings substitute Form 20-F for the incorporated 10-K, include home-country GAAP reconciliations (pre-IFRS era), and apply different signature and agent-for-service rules. S-2 explicitly excludes foreign private issuers; cross-border seasoned-issuer coverage for the same window requires the F-2 corpus.
Rule 424(a) and Rule 424(b)(1)-(8) filings are the actual prospectuses and supplements used to offer securities after a registration is declared effective. They sit downstream of S-2 in the same transaction: S-2 establishes the registration shell and base prospectus; the 424 filing delivers final pricing, terms, and any post-effectiveness updates. Studies of actual offering economics (pricing, underwriting spreads, deal terms) require 424 data; studies of the registration disclosure itself can rely on S-2 alone. The two are complementary, not interchangeable.
Not registration filings and not offering-triggered. 10-K and 10-Q are calendar-driven Securities Exchange Act of 1934 reports that S-2 issuers incorporated by reference instead of restating. The relationship is dependency, not substitution: an S-2 filing is deliberately incomplete without the contemporaneous 10-K (business description, MD&A, audited financials) and most recent 10-Q. Reconstructing the full disclosure package presented to investors at the time of an S-2 offering requires joining S-2 to the same issuer's 10-K/10-Q filings on or before the effective date.
The Form S-2 Files dataset is distinct on four dimensions:
Routing guide: post-2005 seasoned-issuer offerings shift to S-1 or S-3; merger-related issuances to S-4; foreign issuers to the F-series; final pricing and terms to Rule 424; full contemporaneous disclosure to joined 10-K/10-Q.
The Form S-2 Files corpus is used as a primary-source archive of seasoned-issuer registrations under the pre-Securities Offering Reform tier system. There is no live monitoring use case; users work with a fixed, completed population.
Researchers treat the corpus as a defined population for studies of seasoned equity offerings, debt issuance, underwriting spreads, and post-issuance returns. The use-of-proceeds, plan of distribution, and financial statements (or Exchange Act cross-references) feed offering-level datasets. The fixed end date in December 2005 makes the corpus a clean pre-period for difference-in-differences work around Securities Offering Reform. Law and accounting scholars examine cover pages, eligibility statements, and incorporation-by-reference language to study how the S-1/S-2/S-3 tier system actually operated.
Securities offering associates and KM lawyers at law firms mine the corpus as a precedent library for prospectus drafting language: risk factor formulations, plan of distribution boilerplate, lock-up provisions, indemnification carve-outs, and selling-stockholder disclosure. S-2/A amendments are read alongside the originating S-2 to recover SEC staff comment patterns and the negotiated wording of contested disclosure items.
Equity syndicate and debt capital markets teams use the dataset to study historical offering structures, underwriter lists, gross spreads, over-allotment options, and stabilization language during the period when seasoned issuers chose among S-1, S-2, and S-3. DCM teams pull historical covenant packages, redemption mechanics, and ranking provisions when refinancing legacy instruments or structuring successor-entity offerings.
In-house compliance and securities administration groups archive registration statements filed by the issuer, its predecessors, and acquired entities. The metadata and full submissions support filing inventories, internal audit responses, and tracking of outstanding securities, indenture obligations, and registration rights tied to pre-2005 offerings.
These teams retrieve registrations filed by predecessor entities, divested subsidiaries, and acquisition targets to confirm terms of legacy securities still outstanding, verify representations made in historical offering documents, and support purchase-accounting and successor-issuer determinations. Investor relations uses the same filings to reconcile share counts, conversion mechanics, and inherited registration rights.
Litigation consultants and testifying experts use S-2 and S-2/A filings as primary evidence in long-tail Sections 11 and 12 claims, fraud-on-the-market matters, and restatement disputes referencing pre-2005 disclosures. Financial statements, MD&A incorporated by reference, risk factors, and certifications anchor expert reports on what was disclosed, omitted, or mischaracterized at the time of the offering.
Policy researchers and regulator staff conducting retrospective reviews of Securities Offering Reform use the corpus to document how the eliminated form actually functioned in its final decade, comparing S-2 volume, issuer characteristics, and disclosure content against S-1 and S-3 to reconstruct the rationale for collapsing the tier system.
Teams building prospectus-aware language models and retrieval systems use the corpus as bounded, form-consistent training data. The pairing of S-2 and S-2/A across the same accession family supports section-boundary detection (risk factors, use of proceeds, plan of distribution, selling stockholders) and diff modeling between initial and amended registrations. HTML and TXT variants support segmentation; JSON metadata provides filing-level labels.
Vendor and buy-side data engineering teams ingest the corpus to backfill historical offering coverage in security masters and corporate actions tables, extracting issuer identifiers, offering size, security type, underwriters, and effective dates and linking them to CUSIP, ticker, and entity histories. The fixed, non-growing nature of the corpus makes it well suited to a one-time deterministic load for point-in-time analytics.
Across these groups, the high-value sections are the cover page and eligibility statement, the prospectus body with risk factors and use of proceeds, the plan of distribution, the financial statements or incorporation-by-reference cross-references, and the S-2 to S-2/A amendment diff.
Reconstructing the S-2 to S-2/A redline history per offering. Group records by entities[].fileNo (the stable 333-NNNNNN registration number) and order by filedAt to recover the original S-2 followed by each S-2/A amendment for the same offering. Diffing the primary registration-statement documents across the chain surfaces the exact disclosure items that moved during SEC staff review (risk factor wording, fee-table revisions, plan-of-distribution edits), and any in-chain CORRESP documents tie those edits to staff comment letters.
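The grouping step described above can be sketched as follows. The function assumes each manifest has already been loaded into a dict with the entities[].fileNo and filedAt fields documented earlier; the function name and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def amendment_chains(manifests: list[dict]) -> dict[str, list[dict]]:
    """Group manifests by registration file number, oldest filing first."""
    chains = defaultdict(list)
    for m in manifests:
        # A submission may list several entities; count each file number once.
        file_nos = {e["fileNo"] for e in m.get("entities", []) if e.get("fileNo")}
        for file_no in file_nos:
            chains[file_no].append(m)
    for records in chains.values():
        # filedAt carries a UTC offset, so the parsed values compare correctly
        # across daylight-saving changes.
        records.sort(key=lambda m: datetime.fromisoformat(m["filedAt"]))
    return dict(chains)


# Invented manifests, deliberately out of order.
sample = [
    {"formType": "S-2/A", "filedAt": "2005-10-03T09:15:00-04:00",
     "entities": [{"fileNo": "333-000001"}]},
    {"formType": "S-2", "filedAt": "2005-08-29T16:30:03-04:00",
     "entities": [{"fileNo": "333-000001"}]},
    {"formType": "S-2/A", "filedAt": "2005-11-14T10:00:00-05:00",
     "entities": [{"fileNo": "333-000001"}]},
]
chains = amendment_chains(sample)
print([m["formType"] for m in chains["333-000001"]])
```

With a chain in hand, consecutive primary documents can be compared with the standard library's difflib.unified_diff once their SGML envelopes have been removed.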
Building a precedent bank of plan-of-distribution and risk-factor language. Extract the prospectus body from each S-2 or S-2/A primary document (skipping the SGML envelope), section-split on "Plan of Distribution," "Risk Factors," "Use of Proceeds," and "Selling Stockholders," and index the resulting passages by SIC, security type from the fee table, and acceptance year. Capital markets associates and KM teams query this index for boilerplate covering lock-ups, stabilization under Regulation M, indemnification carve-outs, and over-allotment mechanics from the pre-Reform tier era.
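As a rough illustration of the section-splitting step, the sketch below cuts an ASCII prospectus body on heading lines that consist solely of one of the four titles named above. It is a heuristic under stated assumptions: headings occupy their own line, HTML bodies have already been reduced to text, and where a title matches more than once (for example in a table of contents) the longest passage is kept. Real filings will need more robust handling.

```python
import re

SECTION_TITLES = (
    "Risk Factors",
    "Use of Proceeds",
    "Plan of Distribution",
    "Selling Stockholders",
)
_CANONICAL = {title.lower(): title for title in SECTION_TITLES}
_HEADING = re.compile(
    r"^[ \t]*(" + "|".join(re.escape(t) for t in SECTION_TITLES) + r")[ \t]*$",
    flags=re.IGNORECASE | re.MULTILINE,
)


def split_sections(text: str) -> dict[str, str]:
    """Map each recognized heading to the text that follows it."""
    matches = list(_HEADING.finditer(text))
    sections: dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        title = _CANONICAL[match.group(1).lower()]
        passage = text[match.end():end].strip()
        # Keep the longest passage when a heading appears more than once.
        if len(passage) > len(sections.get(title, "")):
            sections[title] = passage
    return sections


# Invented prospectus fragment.
prospectus = (
    "SUMMARY\nintro\n\n"
    "RISK FACTORS\nRisk one.\nRisk two.\n\n"
    "USE OF PROCEEDS\nGeneral corporate purposes.\n\n"
    "PLAN OF DISTRIBUTION\nUnderwritten offering.\n"
)
sections = split_sections(prospectus)
print(sorted(sections))
```

Each passage can then be stored alongside the record's SIC code and acceptance year from metadata.json to build the index.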
Constructing a counsel and auditor panel from EX-5 and EX-23 exhibits. Iterate documentFormatFiles for entries with type matching EX-5* (legality opinions) and EX-23* (auditor and expert consents), parse the firm name and signing office from each exhibit body, and join to the registrant CIK, SIC, and filedAt. The result is a longitudinal panel of which law firms and audit firms served which seasoned issuers during the integrated-disclosure period, usable for market-share studies and conflict checks.
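The exhibit-selection step might look like the sketch below, which uses shell-style patterns to pick out EX-5* and EX-23* entries and attaches the registrant CIKs and acceptance timestamp. Parsing firm names out of the exhibit bodies is left out because it depends on each filer's layout. The function name and sample values are illustrative, and the prefix patterns are deliberately loose, so any other type that happens to begin with EX-5 would also match.

```python
from fnmatch import fnmatchcase


def select_exhibits(manifest: dict, patterns=("EX-5*", "EX-23*")) -> list[dict]:
    """Return one row per manifest entry whose type matches any pattern."""
    ciks = [e["cik"] for e in manifest.get("entities", []) if "cik" in e]
    rows = []
    for doc in manifest.get("documentFormatFiles", []):
        doc_type = doc.get("type", "").strip()
        if any(fnmatchcase(doc_type, pattern) for pattern in patterns):
            rows.append({
                "type": doc_type,
                "documentUrl": doc.get("documentUrl"),
                "ciks": ciks,
                "filedAt": manifest.get("filedAt"),
            })
    return rows


# Invented manifest fragment using type values listed in the text.
manifest = {
    "filedAt": "2005-08-29T16:30:03-04:00",
    "entities": [{"cik": "0000000000"}],
    "documentFormatFiles": [
        {"sequence": "1", "type": "S-2"},
        {"sequence": "2", "type": "EX-5.1"},
        {"sequence": "3", "type": "EX-10.18"},
        {"sequence": "4", "type": "EX-23.(A)"},
        {"sequence": " ", "type": " "},
    ],
}
rows = select_exhibits(manifest)
print([row["type"] for row in rows])
```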
A bounded pre-period dataset for Securities Offering Reform difference-in-differences studies. Because the corpus terminates at the December 2005 sunset, the closed S-2 population forms a clean treated-form sample. Researchers pair S-2 records with same-issuer S-1, S-3, or post-Reform filings using CIK to estimate the effect of Reform on offering frequency, time-to-effectiveness (measured across the S-2 / S-2/A chain), and disclosure length for mid-tier seasoned issuers who lost their dedicated form.
Training and evaluating prospectus-section NLP models on form-consistent data. The fixed corpus and stable Item 601 exhibit vocabulary support supervised tasks: section-boundary segmentation on the prospectus body, classification of exhibits by type, and S-2 vs. S-2/A pair-wise edit modeling. ASCII and HTML variants in the same window let teams test format-robust extractors, while metadata.json provides filing-level labels (form type, SIC, fiscal year end, state of incorporation) without external joins.
Litigation and forensic reconstruction of what was disclosed at offering. For Section 11 and Section 12 claims tied to pre-2005 offerings, experts pull the exact S-2 or S-2/A in force at the relevant date, capture the fee table, risk factors, use-of-proceeds language, and the list of filings incorporated by reference from the prospectus, then resolve those cross-references against contemporaneous 10-K and 10-Q records. The accession-numbered folder plus metadata.json provides a self-contained evidentiary unit with EDGAR-authoritative URLs (linkToFilingDetails, linkToHtml) for chain-of-custody references.
Backfilling historical offering attributes into security masters. Data engineering teams parse the calculation-of-registration-fee table from the primary document to capture security title, registered amount, and aggregate offering price, then enrich with entities[].cik, tickers, sic, and filedAt to produce point-in-time offering records keyed to CUSIP and ticker histories. The non-growing nature of the corpus makes this a one-time deterministic load rather than an ongoing pipeline.
The Form S-2 Files dataset is accessible through three endpoints: a metadata index, a full archive download, and per-container downloads. Container files are monthly ZIP archives keyed by year and month, covering 1994-01 through 2005-12.
Dataset Index JSON API: https://api.sec-api.io/datasets/form-s2-files.json
Returns dataset-level metadata (name, description, last update timestamp, earliest sample date, total records, total size, form types covered, container format, and file types) along with the full list of container files. Each container entry includes its key, size, record count, last update timestamp, and direct download URL. This endpoint does not require an API key.
Poll this endpoint to monitor which containers were refreshed in the latest run via their updatedAt timestamps, and download only the containers that changed.
Example response:
{
  "datasetId": "1f13365b-9ae0-6973-a718-7adf2a243360",
  "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-s2-files.zip",
  "name": "Form S-2 Files Dataset",
  "updatedAt": "2026-04-15T08:02:51.286Z",
  "earliestSampleDate": "1994-01-01",
  "totalRecords": 12045,
  "totalSize": 256798968,
  "formTypes": ["S-2", "S-2/A"],
  "containerFormat": "ZIP",
  "fileTypes": ["TXT", "JSON", "HTML", "PDF"],
  "containers": [
    {
      "downloadUrl": "https://api.sec-api.io/datasets/form-s2-files/2005/2005-12.zip",
      "key": "2005/2005-12.zip",
      "size": 1842311,
      "records": 18,
      "updatedAt": "2026-04-15T08:02:51.286Z"
    }
  ]
}
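The polling pattern described earlier could be sketched roughly as follows. The field names (containers, updatedAt, downloadUrl) come from the example response; the helper names fetch_index and containers_updated_since are hypothetical, and the code assumes every timestamp carries a trailing "Z" or an explicit UTC offset.

```python
import json
import urllib.request
from datetime import datetime

INDEX_URL = "https://api.sec-api.io/datasets/form-s2-files.json"


def fetch_index(url: str = INDEX_URL) -> dict:
    """Download and decode the dataset index (no API key needed for this endpoint)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def parse_ts(ts: str) -> datetime:
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def containers_updated_since(index: dict, since: str) -> list[str]:
    """Return download URLs of containers refreshed after the `since` timestamp."""
    cutoff = parse_ts(since)
    return [
        c["downloadUrl"]
        for c in index.get("containers", [])
        if parse_ts(c["updatedAt"]) > cutoff
    ]
```

A caller would store the updatedAt value seen on the previous run, pass it as since, and append the token query parameter to each returned URL before downloading.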
Download Entire Dataset: https://api.sec-api.io/datasets/form-s2-files.zip?token=YOUR_API_KEY
Downloads the complete dataset as a single ZIP archive containing all monthly containers from 1994-01 through 2005-12. This endpoint requires a valid sec-api.io API key passed via the token query parameter.
Download Single Container: https://api.sec-api.io/datasets/form-s2-files/2005/2005-12.zip?token=YOUR_API_KEY
Downloads one monthly container ZIP instead of the full archive. Replace the year and month segments with any valid container key from the dataset index (e.g. 1994/1994-01.zip through 2005/2005-12.zip). This endpoint requires a valid sec-api.io API key passed via the token query parameter.
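The key pattern above lends itself to a small helper. The sketch below enumerates candidate keys for the 1994-01 through 2005-12 window and builds the download URL for one of them; container_keys and container_url are hypothetical names. It assumes nothing about whether every month actually has a container, so the dataset index remains the authoritative list.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.sec-api.io/datasets/form-s2-files"


def container_keys(first=(1994, 1), last=(2005, 12)):
    """Generate candidate keys such as '1994/1994-01.zip', one per month."""
    keys = []
    year, month = first
    while (year, month) <= last:
        keys.append(f"{year}/{year}-{month:02d}.zip")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys


def container_url(key: str, token: str) -> str:
    """Build the download URL for one container, passing the API key via `token`."""
    return f"{BASE_URL}/{key}?{urlencode({'token': token})}"
```

For example, container_url("2005/2005-12.zip", "YOUR_API_KEY") reproduces the single-container URL shown above.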
The Form S-2 Files Dataset covers exactly two form types: S-2 (the original long-form registration statement under the Securities Act of 1933) and S-2/A (pre-effective or post-effective amendments to a previously filed S-2). No other Securities Act or Exchange Act forms appear in the corpus.
One record is a single EDGAR submission identified by its 18-digit SEC accession number (20 characters when written with dashes). Physically, each record is an accession-numbered folder containing a metadata.json manifest, the primary S-2 or S-2/A registration-statement document, and every exhibit and CORRESP document filed with that submission. An issuer that filed an original S-2 and three S-2/A amendments produces four independent records, not one.
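Because the accession number is the record key, it helps to normalize it before joining against other sources. EDGAR accession numbers consist of 18 digits, conventionally written with dashes in a 10-2-6 grouping. The sketch below converts either spelling to the dashed form; to_dashed is a hypothetical helper, and the sample value in the usage note is made up for illustration.

```python
import re


def to_dashed(accession: str) -> str:
    """Normalize an accession number to the dashed 10-2-6 form."""
    digits = re.sub(r"\D", "", accession)  # drop dashes and any other separators
    if len(digits) != 18:
        raise ValueError(f"expected 18 digits, got {len(digits)}: {accession!r}")
    return f"{digits[:10]}-{digits[10:12]}-{digits[12:]}"
```

Under this sketch, to_dashed("000095012305001234") and to_dashed("0000950123-05-001234") both yield the same dashed string, so either spelling can serve as a lookup key.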
Form S-2 was available only to seasoned U.S. domestic issuers — companies organized in the United States that had been subject to Exchange Act reporting for at least 36 consecutive months, were current and timely in their Section 13, 14, and 15(d) filings, and had not defaulted on dividends, sinking fund installments, borrowed money, or material long-term leases since their last audited fiscal year. Foreign private issuers, investment companies, and asset-backed issuers used separate form regimes.
The corpus spans EDGAR-era S-2 and S-2/A filings from January 1994 through December 2005. The SEC adopted Securities Offering Reform in SEC Release No. 33-8591, effective December 1, 2005, which eliminated Form S-2 and folded its use case into a revised Form S-1 and an expanded Form S-3. The corpus contains no S-2 or S-2/A filings after December 2005, which permanently closes the dataset.
Document bodies are predominantly ASCII (.txt) and HTML (.htm, .html), with a minority of late-period PDF exhibits. Every document file — text or HTML — is wrapped in an EDGAR SGML document envelope (<DOCUMENT>, <TYPE>, <SEQUENCE>, <FILENAME>, optional <DESCRIPTION>, <TEXT>) that must be stripped before body parsing. The filing-level manifest is JSON. Containers are monthly ZIP archives.
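A minimal sketch of the envelope stripping described above. It assumes each header tag sits on its own line with its value running to the end of that line, and that the body lies between <TEXT> and the last </TEXT>; parse_envelope is a hypothetical name, and real files may need more defensive handling.

```python
import re

HEADER_TAGS = ("TYPE", "SEQUENCE", "FILENAME", "DESCRIPTION")


def parse_envelope(raw: str) -> dict:
    """Split an EDGAR SGML-wrapped document into header fields and body text."""
    header, sep, rest = raw.partition("<TEXT>")
    if not sep:
        # No envelope found: treat the whole input as the body.
        return {"body": raw}

    fields = {}
    for tag in HEADER_TAGS:
        # Header values run from the tag to the end of the line.
        match = re.search(rf"<{tag}>([^\r\n<]*)", header)
        if match:
            fields[tag.lower()] = match.group(1).strip()

    # Cut at the last closing tag so the trailing </DOCUMENT> is discarded too.
    body = rest.rsplit("</TEXT>", 1)[0]
    fields["body"] = body.strip("\r\n")
    return fields
```

The returned dict keeps the header values next to the body, so a parser can cross-check the type and filename fields against the metadata.json manifest before processing the text.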
Form S-1 historically required full restatement of business and financial disclosure; Form S-3 was the short-form vehicle for the most-seasoned issuers and supported Rule 415 shelf registration with full forward incorporation of future Exchange Act filings. S-2 sat between them: it permitted incorporation of prior 10-K and 10-Q content by reference but required physical delivery of those reports with the prospectus, and it lacked the full shelf mechanics of S-3. The S-2 corpus is therefore exclusively seasoned-issuer follow-on registrations from mid-tier issuers who cleared the three-year reporting bar but failed S-3's float or transactional tests.
GRAPHIC documents (logos, signature images, organizational charts, photographs) are excluded from the ZIP archives but remain enumerated in documentFormatFiles with valid EDGAR documentUrl values for off-archive retrieval. The single concatenated <accessionWithDashes>.txt full-submission bundle that EDGAR exposes is also not stored locally: only the individual decomposed documents are written into each folder, and the bundle URL is referenced via linkToTxt and a trailing pseudo-entry in documentFormatFiles.
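To retrieve the excluded images, a script could collect their URLs from the manifest. The sketch below assumes each documentFormatFiles entry carries a type field holding the EDGAR document type (an assumption not confirmed by the text above) alongside the documentUrl field named above; graphic_urls is a hypothetical helper.

```python
def graphic_urls(metadata: dict) -> list[str]:
    """List EDGAR URLs of GRAPHIC documents that are not stored in the ZIP archive."""
    urls = []
    for entry in metadata.get("documentFormatFiles", []):
        # Assumption: the document type lives under the "type" key.
        doc_type = (entry.get("type") or "").upper()
        if doc_type == "GRAPHIC" and entry.get("documentUrl"):
            urls.append(entry["documentUrl"])
    return urls
```

The returned URLs point at EDGAR rather than at the dataset endpoints, so any download loop built on top of this would need to respect the SEC's fair-access rules for automated requests.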