Form S-2 Files Dataset

The Form S-2 Files Dataset is a closed historical archive of every Form S-2 and Form S-2/A registration statement accepted by EDGAR between January 1994 and November 2005, the last month before the SEC retired the form under Securities Offering Reform. Each record is one EDGAR submission: an accession-numbered folder containing a metadata.json manifest, the primary registration-statement document, and every exhibit and correspondence document filed with it. The underlying form was the middle tier of the SEC's integrated disclosure system as it stood before December 2005: a long-form Securities Act registration available only to seasoned U.S. domestic issuers with at least three years of Exchange Act reporting history who did not qualify for the short-form Form S-3. Filings are grouped into monthly ZIP containers keyed by acceptance month; the files inside are TXT, JSON, HTML, and PDF documents. Because Form S-2 was eliminated effective December 1, 2005 and folded into a revised Form S-1, the corpus is permanently bounded and does not grow.

Update Frequency: Daily
Updated at: 2026-04-15
Earliest Sample Date: 1994-01-01
Total Size: 256.8 MB
Total Records: 12,045
Container Format: ZIP
Content Types: TXT, JSON, HTML, PDF
Form Types: S-2, S-2/A

Dataset APIs

Programmatically retrieve the full list of dataset archive files, download URLs and dataset metadata.

Dataset Index JSON API

Download the entire dataset as a single archive file.

Download Entire Dataset:

Download a single container file (e.g. monthly archive) from the dataset.

Download Single Container:

Dataset Files

143 files · 256.8 MB
2005-11.zip · 975.3 KB · 58 records
2005-10.zip · 1.5 MB · 82 records
2005-09.zip · 1.4 MB · 75 records
2005-08.zip · 755.2 KB · 56 records
2005-07.zip · 1.4 MB · 73 records
2005-06.zip · 2.2 MB · 148 records
2005-05.zip · 3.3 MB · 122 records
2005-04.zip · 2.7 MB · 141 records
2005-03.zip · 1.6 MB · 71 records
2005-02.zip · 1.8 MB · 88 records
2005-01.zip · 2.2 MB · 72 records
2004-12.zip · 1.9 MB · 77 records
2004-11.zip · 2.0 MB · 66 records
2004-10.zip · 1.5 MB · 64 records
2004-09.zip · 579.4 KB · 35 records
2004-08.zip · 1.7 MB · 85 records
2004-07.zip · 1.6 MB · 60 records
2004-06.zip · 3.0 MB · 106 records
2004-05.zip · 2.8 MB · 102 records
2004-04.zip · 2.0 MB · 102 records
2004-03.zip · 2.2 MB · 106 records
2004-02.zip · 2.0 MB · 82 records
2004-01.zip · 2.2 MB · 98 records
2003-12.zip · 2.4 MB · 96 records
2003-11.zip · 3.5 MB · 89 records
2003-10.zip · 2.2 MB · 70 records
2003-09.zip · 2.4 MB · 79 records
2003-08.zip · 1.5 MB · 88 records
2003-07.zip · 2.0 MB · 54 records
2003-06.zip · 1.9 MB · 73 records
2003-05.zip · 1.3 MB · 67 records
2003-04.zip · 1.7 MB · 84 records
2003-03.zip · 1.3 MB · 44 records
2003-02.zip · 984.1 KB · 36 records
2003-01.zip · 2.2 MB · 86 records
2002-12.zip · 1.2 MB · 58 records
2002-11.zip · 1.0 MB · 52 records
2002-10.zip · 978.5 KB · 48 records
2002-09.zip · 1.1 MB · 41 records
2002-08.zip · 758.3 KB · 39 records
2002-07.zip · 928.4 KB · 43 records
2002-06.zip · 4.3 MB · 84 records
2002-05.zip · 4.7 MB · 110 records
2002-04.zip · 1.4 MB · 66 records
2002-03.zip · 1.2 MB · 67 records
2002-02.zip · 2.1 MB · 94 records
2002-01.zip · 2.1 MB · 120 records
2001-12.zip · 791.0 KB · 43 records
2001-11.zip · 2.2 MB · 108 records
2001-10.zip · 3.3 MB · 150 records
2001-09.zip · 1.5 MB · 71 records
2001-08.zip · 1.8 MB · 109 records
2001-07.zip · 1.6 MB · 97 records
2001-06.zip · 2.0 MB · 86 records
2001-05.zip · 1.4 MB · 84 records
2001-04.zip · 1.5 MB · 80 records
2001-03.zip · 599.1 KB · 20 records
2001-02.zip · 1.0 MB · 81 records
2001-01.zip · 836.7 KB · 54 records
2000-12.zip · 834.7 KB · 55 records
2000-11.zip · 784.4 KB · 47 records
2000-10.zip · 2.0 MB · 88 records
2000-09.zip · 2.3 MB · 111 records
2000-08.zip · 1.5 MB · 79 records
2000-07.zip · 1.1 MB · 99 records
2000-06.zip · 2.7 MB · 113 records
2000-05.zip · 876.6 KB · 58 records
2000-04.zip · 879.4 KB · 67 records
2000-03.zip · 517.1 KB · 27 records
2000-02.zip · 673.4 KB · 32 records
2000-01.zip · 1.1 MB · 75 records
1999-12.zip · 208.9 KB · 7 records
1999-11.zip · 832.4 KB · 54 records
1999-10.zip · 1.7 MB · 102 records
1999-09.zip · 1.1 MB · 75 records
1999-08.zip · 720.3 KB · 42 records
1999-07.zip · 1.8 MB · 78 records
1999-06.zip · 1.4 MB · 62 records
1999-05.zip · 1.1 MB · 69 records
1999-04.zip · 1.9 MB · 87 records
1999-03.zip · 635.7 KB · 30 records
1999-02.zip · 1.1 MB · 52 records
1999-01.zip · 1.4 MB · 74 records
1998-12.zip · 1.3 MB · 69 records
1998-11.zip · 901.3 KB · 53 records
1998-10.zip · 1.2 MB · 63 records
1998-09.zip · 1.8 MB · 98 records
1998-08.zip · 1.3 MB · 85 records
1998-07.zip · 3.4 MB · 160 records
1998-06.zip · 1.8 MB · 69 records
1998-05.zip · 2.4 MB · 110 records
1998-04.zip · 2.9 MB · 142 records
1998-03.zip · 2.3 MB · 130 records
1998-02.zip · 1.8 MB · 95 records
1998-01.zip · 1.9 MB · 125 records
1997-12.zip · 3.2 MB · 162 records
1997-11.zip · 4.0 MB · 171 records
1997-10.zip · 4.7 MB · 207 records
1997-09.zip · 4.5 MB · 222 records
1997-08.zip · 2.8 MB · 137 records
1997-07.zip · 1.9 MB · 80 records
1997-06.zip · 3.4 MB · 143 records
1997-05.zip · 4.6 MB · 191 records
1997-04.zip · 4.3 MB · 228 records
1997-03.zip · 1.1 MB · 55 records
1997-02.zip · 2.3 MB · 106 records
1997-01.zip · 4.0 MB · 192 records
1996-12.zip · 3.8 MB · 198 records
1996-11.zip · 3.2 MB · 145 records
1996-10.zip · 2.4 MB · 85 records
1996-09.zip · 1.5 MB · 81 records
1996-08.zip · 1.3 MB · 78 records
1996-07.zip · 2.8 MB · 140 records
1996-06.zip · 2.0 MB · 108 records
1996-05.zip · 3.6 MB · 167 records
1996-04.zip · 2.4 MB · 112 records
1996-03.zip · 1.0 MB · 79 records
1996-02.zip · 1.5 MB · 108 records
1996-01.zip · 1.9 MB · 92 records
1995-12.zip · 1.4 MB · 74 records
1995-11.zip · 1.3 MB · 49 records
1995-10.zip · 2.2 MB · 101 records
1995-09.zip · 1.3 MB · 64 records
1995-08.zip · 1.5 MB · 74 records
1995-07.zip · 1.1 MB · 74 records
1995-06.zip · 2.2 MB · 88 records
1995-05.zip · 2.5 MB · 119 records
1995-04.zip · 1.3 MB · 79 records
1995-03.zip · 1.7 MB · 117 records
1995-02.zip · 823.5 KB · 43 records
1995-01.zip · 1.5 MB · 62 records
1994-12.zip · 454.3 KB · 20 records
1994-11.zip · 391.0 KB · 24 records
1994-10.zip · 580.5 KB · 20 records
1994-09.zip · 568.7 KB · 29 records
1994-08.zip · 200.6 KB · 6 records
1994-07.zip · 1.0 MB · 36 records
1994-06.zip · 2.0 MB · 56 records
1994-05.zip · 555.7 KB · 17 records
1994-04.zip · 358.6 KB · 29 records
1994-03.zip · 767.9 KB · 30 records
1994-02.zip · 2.1 MB · 57 records
1994-01.zip · 1.5 MB · 68 records

What This Dataset Contains

The dataset captures the full EDGAR-era population of Form S-2 (original registration statement) and Form S-2/A (pre-effective or post-effective amendment) submissions. The formType vocabulary is restricted to exactly these two values. Each record is a single accession-numbered submission, not an issuer or an offering: a registrant that filed an original S-2 followed by three S-2/A amendments produces four independent, self-contained records. The accession number, not the registrant, defines the record boundary.

Form S-2 was a long-form registration statement under the Securities Act of 1933 designed for seasoned issuers. It occupied a middle position between the full-disclosure Form S-1 and the short-form Form S-3: it permitted the registrant to incorporate substantial portions of its Exchange Act history (10-K, 10-Q, 8-K, proxy materials) by reference rather than reprinting it, while still requiring a delivered prospectus that included or summarized the financial statements. Because of this hybrid posture, an S-2 record typically carries a prospectus, a description of the securities being registered, risk factors, use of proceeds, plan of distribution, and either embedded or incorporated-by-reference audited financial statements and MD&A, plus the standard slate of Part II exhibits.

Records are grouped into monthly ZIP archives keyed by acceptance month (for example 2005/2005-08.zip). The monthly archives run from January 1994 through November 2005, and the dataset is a closed corpus: Form S-2 was eliminated effective December 1, 2005, so no later S-2 or S-2/A filings exist. Document bodies inside each record are preserved verbatim in their original formats (predominantly ASCII .txt and HTML .htm/.html, with a minority of late-period PDF exhibits), and each text or HTML document is still wrapped in its EDGAR SGML document envelope. Every record includes its metadata.json manifest.
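As a minimal sketch of the naming scheme, the Python snippet below enumerates the monthly archive names and maps a filedAt timestamp to an archive name. The bounds (1994-01 through 2005-11) are read off the file listing above. The function names are illustrative, and the rule that the archive month equals the first seven characters of filedAt (the Eastern Time year and month) is an assumption, not something the dataset documentation confirms.

```python
from datetime import date


def monthly_archive_names(first: date, last: date) -> list[str]:
    """Return 'YYYY-MM.zip' for every month from first to last, inclusive."""
    names = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        names.append(f"{year:04d}-{month:02d}.zip")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def archive_for_timestamp(filed_at: str) -> str:
    """Assumed mapping: archive month = 'YYYY-MM' prefix of the filedAt string."""
    return filed_at[:7] + ".zip"


# Bounds taken from the file listing: 1994-01.zip through 2005-11.zip.
ARCHIVES = monthly_archive_names(date(1994, 1, 1), date(2005, 11, 1))
print(len(ARCHIVES), ARCHIVES[0], ARCHIVES[-1])
print(archive_for_timestamp("2005-08-29T16:30:03-04:00"))
```

The count produced by the enumeration can be compared against the "143 files" figure in the listing as a quick completeness check after a bulk download.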

Content Structure of a Single Form S-2 Record

Physical structure of a record

Every accession-numbered folder has three physical components:

  1. metadata.json — exactly one per folder, holding filing-level facts (form type, accession number, acceptance timestamp, registrant identifiers, URLs) and a document index for every constituent file.
  2. The primary registration-statement document — one document whose EDGAR <TYPE> is S-2 or S-2/A, almost always at <SEQUENCE>1. Filenames are filer-coined (e.g. forms2a03725_08152005.htm, ds2.htm, grays2.txt, a05-15449_1s2a.htm, v023069_s2-a.txt, doc1.txt).
  3. Zero or more exhibit and correspondence documents — each a separately wrapped SGML document with its own <TYPE>, <SEQUENCE>, and <FILENAME>. When the filer omitted a filename (most commonly for CORRESP), EDGAR substitutes filenameN.htm or filenameN.txt, where N is the document sequence number.

The folder name is the 18-character SEC accession number with dashes stripped (NNNNNNNNNNYYNNNNNN). Every document file — whether the inner body is plain ASCII or HTML — opens with an EDGAR SGML document envelope: line-prefixed tags <DOCUMENT>, <TYPE>, <SEQUENCE>, <FILENAME>, an optional <DESCRIPTION>, then <TEXT> followed by the body. For HTML documents the body begins with <HTML> immediately after <TEXT>; for ASCII documents the body begins with <PAGE> form-feed markers and fixed-width text. When <DESCRIPTION> is present, its value matches the description field that the same document carries inside metadata.json.
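The envelope layout described above can be separated from the body with a few lines of Python. This is a sketch under the stated layout (header tags one per line, then <TEXT>, then the body, then closing tags); the function name split_envelope and the sample filename ex5_1.htm are made up for illustration.

```python
import re

HEADER_TAGS = ("TYPE", "SEQUENCE", "FILENAME", "DESCRIPTION")


def split_envelope(raw: str) -> tuple[dict, str]:
    """Split an SGML-wrapped EDGAR document into (header fields, body text)."""
    head, sep, rest = raw.partition("<TEXT>")
    if not sep:
        raise ValueError("no <TEXT> tag found; not an SGML-wrapped document")
    header = {}
    for line in head.splitlines():
        match = re.match(r"<([A-Z]+)>(.*)", line.strip())
        if match and match.group(1) in HEADER_TAGS:
            header[match.group(1).lower()] = match.group(2).strip()
    # Everything after the last </TEXT> is the closing part of the envelope.
    body = rest.rsplit("</TEXT>", 1)[0]
    return header, body.strip("\r\n")


# Hypothetical sample document, shaped like the envelope described above.
sample = """<DOCUMENT>
<TYPE>EX-5.1
<SEQUENCE>2
<FILENAME>ex5_1.htm
<DESCRIPTION>OPINION RE LEGALITY
<TEXT>
<HTML><BODY>Opinion text</BODY></HTML>
</TEXT>
</DOCUMENT>
"""
header, body = split_envelope(sample)
print(header["type"], header["sequence"], body[:6])
```

The returned body can then be handed to an HTML parser or a fixed-width text reader, depending on whether it starts with <HTML>.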

Section-by-section anatomy of the underlying S-2

Cover page and facing sheet. The primary document opens with the EDGAR-mandated facing sheet: form name and registration-number placeholder, exact legal name of the registrant, state or jurisdiction of incorporation, primary standard industrial classification code, IRS Employer Identification Number, principal-executive-office address, agent for service, telephone number, and the calculation-of-registration-fee table. The fee table itemizes each title of securities being registered together with amount to be registered, proposed maximum offering price per unit, proposed maximum aggregate offering price, and the registration fee due.

Prospectus body. The prospectus follows the facing sheet and contains, in roughly this order: a cover page summarizing the offering, a table of contents, a summary of the offering and the company, risk factors, use of proceeds, dilution where applicable, capitalization, selected financial data, management's discussion and analysis (or an incorporation-by-reference pointer to the most recent 10-K and intervening 10-Qs and 8-Ks), description of the business, description of the securities being offered, plan of distribution and underwriting arrangements, legal matters, experts, and a "where you can find additional information" pointer to EDGAR. Because S-2 eligibility presumes Exchange Act seasoning, large portions of the descriptive and historical disclosure are routinely incorporated by reference rather than restated.

Part II — information not required in the prospectus. After the prospectus, Part II contains other expenses of issuance and distribution, indemnification of directors and officers, recent sales of unregistered securities, the exhibit index, undertakings of the registrant, and the signature page. Signatures identify the registrant, the principal executive officer, the principal financial and accounting officer, and a majority of the board of directors.

Exhibits. Each exhibit is a separately wrapped document in its own file. <TYPE> values in the corpus follow the Regulation S-K Item 601 numbering scheme. Common types include:

  • EX-1 — underwriting agreement
  • EX-3 — charter and bylaws (when not incorporated by reference)
  • EX-4, EX-4.05 — instruments defining the rights of securityholders (indentures, warrant agreements, certificates of designation)
  • EX-5, EX-5.1, EX-5.01 — opinion of counsel on legality of the securities being registered
  • EX-10, EX-10.18, EX-10.(VII) — material contracts (employment agreements, credit facilities, postponement agreements)
  • EX-11 — computation of earnings per share
  • EX-21, EX-21.01 — subsidiaries of the registrant
  • EX-23, EX-23.1, EX-23.2, EX-23.(A), EX-23.01 — consents of independent registered public accounting firms and other experts
  • EX-99, EX-99.1, EX-99.A through EX-99.J — additional exhibits (pricing supplements, form proxies, board resolutions)
  • CORRESP — correspondence with SEC staff
  • GRAPHIC — image attachments (enumerated in the manifest but not stored in the dataset; see "Excluded or separate content" below)

The exhibit index inside Part II enumerates these by Item 601 number; the same exhibits appear physically as separate files in the folder, each with its own SGML envelope.
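Because filers append their own sub-numbering (EX-5.01, EX-23.(A), EX-99.J), grouping exhibits by their Item 601 family usually means reducing each <TYPE> value to its leading number. The sketch below shows one way to do that; the helper names are illustrative, and treating everything after the leading digits as filer-specific suffix is an assumption.

```python
import re
from collections import defaultdict


def item_601_number(doc_type: str):
    """Return the leading Item 601 number of an EX-* type, or None otherwise."""
    match = re.match(r"EX-(\d+)", doc_type.strip().upper())
    return int(match.group(1)) if match else None


def group_by_exhibit_family(doc_types):
    """Map each Item 601 number (or None for non-exhibits) to its raw types."""
    families = defaultdict(list)
    for doc_type in doc_types:
        families[item_601_number(doc_type)].append(doc_type)
    return dict(families)


types = ["S-2/A", "EX-5.01", "EX-23.1", "EX-23.(A)", "EX-99.J", "CORRESP"]
families = group_by_exhibit_family(types)
print(families)
```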

The metadata.json manifest

The manifest is a flat JSON object that surfaces filing-level facts and a document index. Its fields are:

  • formType — "S-2" or "S-2/A".
  • accessionNo — dashed EDGAR accession number ("NNNNNNNNNN-YY-NNNNNN"); the parent folder name is the same value with dashes stripped.
  • description — human-readable EDGAR form description, e.g. "Form S-2 - Registration of securities" or "Form S-2/A - Registration of securities: [Amend]".
  • filedAt — ISO-8601 acceptance timestamp in Eastern Time, e.g. "2005-08-29T16:30:03-04:00".
  • linkToFilingDetails — URL of the primary document on EDGAR.
  • linkToHtml — URL of the EDGAR filing index page.
  • linkToTxt — URL of the full-submission text bundle on EDGAR (the single .txt aggregation of every document with SGML wrappers).
  • linkToXbrl — present but always an empty string; Form S-2 predates XBRL.
  • dataFiles — present but always an empty array, for the same reason.
  • id — 32-character hexadecimal identifier assigned by the dataset publisher.
  • documentFormatFiles — array of document descriptors, one per submitted document plus a trailing complete-submission pseudo-entry.
  • entities — array of filer/registrant identification objects.
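A short Python sketch of reading these filing-level fields follows. The manifest literal is a placeholder with a made-up accession number; only the field names and value formats come from the list above, and the helper names are illustrative.

```python
import json
from datetime import datetime


def folder_name(accession_no: str) -> str:
    """Dashed accession number -> 18-digit folder name."""
    return accession_no.replace("-", "")


def dashed_accession(folder: str) -> str:
    """18-digit folder name -> dashed accession number (10-2-6 digits)."""
    if len(folder) != 18 or not folder.isdigit():
        raise ValueError(f"not an 18-digit accession folder name: {folder!r}")
    return f"{folder[:10]}-{folder[10:12]}-{folder[12:]}"


# Placeholder manifest: the accession number is invented for illustration.
raw = """{
  "formType": "S-2/A",
  "accessionNo": "0000000000-05-000000",
  "filedAt": "2005-08-29T16:30:03-04:00",
  "linkToXbrl": "",
  "dataFiles": []
}"""
manifest = json.loads(raw)
filed = datetime.fromisoformat(manifest["filedAt"])  # timezone-aware datetime
is_amendment = manifest["formType"] == "S-2/A"
folder = folder_name(manifest["accessionNo"])
print(folder, filed.date(), is_amendment)
```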

documentFormatFiles entries. Each entry carries:

  • sequence — numeric string (e.g. "1", "2") for ordinary documents; a single-space string " " for the trailing complete-submission pseudo-entry.
  • size — byte count expressed as a JSON string, not a number.
  • documentUrl — EDGAR URL whose final path segment matches the on-disk filename inside the folder.
  • description — optional uppercase free-text label such as OPINION RE LEGALITY, CONSENT OF AUDITORS, POSTPONEMENT AGREEMENT, FORM S-2.
  • type — EDGAR document type (S-2, S-2/A, EX-5, EX-23.1, EX-10.18, EX-99.J, CORRESP, GRAPHIC, and so on).

Sequence numbering reflects the original EDGAR submission and may be non-contiguous when the filer reserved or withdrew a sequence slot.
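The conventions above (string-typed sequence and size, a single-space pseudo-entry, possible gaps) can be handled as in the sketch below. The URLs and filenames in the sample are placeholders, and the function names are illustrative rather than part of any published client.

```python
from posixpath import basename
from urllib.parse import urlparse


def real_documents(document_format_files):
    """Yield normalized entries, skipping the complete-submission pseudo-entry."""
    for entry in document_format_files:
        seq = entry.get("sequence", "").strip()
        if not seq:  # the pseudo-entry carries sequence " "
            continue
        yield {
            "sequence": int(seq),
            "size": int(entry["size"]),
            "type": entry.get("type", "").strip(),
            "filename": basename(urlparse(entry["documentUrl"]).path),
            "description": entry.get("description"),
        }


def sequence_gaps(docs):
    """Return sequence numbers missing between the lowest and highest present."""
    present = {d["sequence"] for d in docs}
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]


BASE = "https://www.sec.gov/Archives/edgar/data/0/000000000005000000/"  # placeholder
sample = [
    {"sequence": "1", "size": "250000", "type": "S-2", "description": "FORM S-2",
     "documentUrl": BASE + "ds2.htm"},
    {"sequence": "2", "size": "9000", "type": "EX-5.1",
     "description": "OPINION RE LEGALITY", "documentUrl": BASE + "ex5_1.htm"},
    {"sequence": "4", "size": "4000", "type": "EX-23.1",
     "documentUrl": BASE + "ex23_1.htm"},
    {"sequence": " ", "size": "270000", "type": " ",
     "documentUrl": BASE + "0000000000-05-000000.txt"},
]
docs = list(real_documents(sample))
print([d["filename"] for d in docs], sequence_gaps(docs))
```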

entities entries. Each entity object describes a filer or co-registrant and carries:

  • companyName — legal name with the EDGAR role appended in parentheses, e.g. "ACME CORP (Filer)".
  • cik — zero-padded ten-digit CIK.
  • fileNo — SEC file number, typically 333-NNNNNN for registration statements; stable across the amendment chain.
  • irsNo — Employer Identification Number, sometimes "000000000" for shell or pre-revenue issuers.
  • stateOfIncorporation — two-letter code.
  • sic — SIC code prefixed to its human-readable industry label.
  • fiscalYearEnd — four-digit MMDD string.
  • act — always "33" because S-2 is filed under the Securities Act of 1933.
  • filmNo — EDGAR film/microfiche number.
  • type — registrant role-specific form code.
  • tickers — array of public ticker symbols, included only when the issuer has them.

Most S-2 filings list a single entity (the registrant); guarantor co-registrant entries are possible under EDGAR conventions.
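Several entity fields pack two pieces of information into one string. The sketch below splits them apart; it assumes the SIC code and its label are separated by a single space and that the role is the final parenthesized group in companyName, neither of which is spelled out above. The sample values are invented.

```python
import re


def parse_entity(entity: dict) -> dict:
    """Split composite entity fields into simpler typed values."""
    name = entity.get("companyName", "")
    match = re.match(r"^(.*?)\s*\(([^()]*)\)\s*$", name)
    legal_name, role = (match.group(1), match.group(2)) if match else (name, None)
    sic_code, _, sic_label = entity.get("sic", "").partition(" ")
    fye = entity.get("fiscalYearEnd", "")
    valid_fye = len(fye) == 4 and fye.isdigit()
    return {
        "legal_name": legal_name,
        "role": role,
        "cik": entity.get("cik", "").zfill(10),
        "file_no": entity.get("fileNo"),
        "sic_code": sic_code or None,
        "sic_label": sic_label.strip() or None,
        # (month, day) tuple taken from the MMDD string
        "fiscal_year_end": (int(fye[:2]), int(fye[2:])) if valid_fye else None,
        "tickers": entity.get("tickers", []),  # key may be absent
    }


sample_entity = {
    "companyName": "ACME CORP (Filer)",
    "cik": "0000000000",
    "fileNo": "333-000000",
    "sic": "6798 Real Estate Investment Trusts",
    "fiscalYearEnd": "1231",
    "act": "33",
}
parsed = parse_entity(sample_entity)
print(parsed)
```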

Included content

Each record includes every textual document EDGAR received for that S-2 or S-2/A submission: the primary registration statement, every exhibit (legality opinions, auditor consents, material contracts, subsidiary lists, EPS computations, additional exhibits), and any CORRESP documents. ASCII (.txt) and HTML (.htm, .html) bodies are preserved verbatim, each still wrapped in its original EDGAR SGML document envelope. A minority of late-period submissions also carry PDF exhibits, which are preserved as-is.

Excluded or separate content

Two categories of submission content are intentionally absent:

  1. Image files. Graphics that the filer submitted as GRAPHIC documents (logos, signature images, hand-drawn organizational charts, photographs) are excluded from the ZIP archives. They remain enumerated as GRAPHIC-type entries in documentFormatFiles with valid documentUrl values pointing to EDGAR, so a consumer can fetch them from sec.gov if needed, but they are not written to disk inside the folder.
  2. The complete-submission .txt bundle. EDGAR exposes every filing as a single concatenated text file (<accessionWithDashes>.txt) that re-aggregates every document with its SGML wrapper. This bundle is referenced as a trailing documentFormatFiles entry whose sequence and type are both a single space character, and its URL is also exposed via linkToTxt, but the bundle itself is not stored locally — only the individual decomposed documents are written into the folder.

Material that the registrant incorporated by reference (prior 10-Ks, 10-Qs, 8-Ks, proxy statements) is not duplicated inside the S-2 record. The prospectus names the incorporated filings, but their content lives in the dataset for the relevant form type.
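The two exclusions listed above make it possible to predict which files a folder should contain from its manifest alone. The following sketch derives that expected set and compares it with a directory listing; the function names and sample URLs are illustrative, and it assumes the on-disk filename always equals the final path segment of documentUrl.

```python
from posixpath import basename
from urllib.parse import urlparse


def expected_local_files(manifest: dict) -> set[str]:
    """Filenames that should be on disk: everything except GRAPHIC and the bundle."""
    expected = {"metadata.json"}
    for entry in manifest.get("documentFormatFiles", []):
        doc_type = entry.get("type", "").strip()
        if not doc_type or doc_type == "GRAPHIC":
            continue  # blank type = complete-submission bundle; GRAPHIC = image
        expected.add(basename(urlparse(entry["documentUrl"]).path))
    return expected


def compare_folder(manifest: dict, names_on_disk) -> dict:
    """Report files the manifest implies but the folder lacks, and vice versa."""
    expected = expected_local_files(manifest)
    present = set(names_on_disk)
    return {"missing": sorted(expected - present),
            "unexpected": sorted(present - expected)}


BASE = "https://www.sec.gov/Archives/edgar/data/0/000000000005000000/"  # placeholder
manifest = {"documentFormatFiles": [
    {"sequence": "1", "type": "S-2", "documentUrl": BASE + "ds2.htm"},
    {"sequence": "2", "type": "GRAPHIC", "documentUrl": BASE + "logo.jpg"},
    {"sequence": " ", "type": " ", "documentUrl": BASE + "0000000000-05-000000.txt"},
]}
report = compare_folder(manifest, ["metadata.json", "ds2.htm"])
print(report)
```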

Changes in required content and structure over time

Across the 1994–2005 window, Form S-2's eligibility criteria and core disclosure architecture remained substantially stable: a seasoned-issuer registration framework that combined a delivered prospectus with permissive incorporation by reference of Exchange Act materials. The most consequential changes during the dataset's lifetime were external to the form rather than internal to it. Regulation M (1996–1997) altered the plan-of-distribution and stabilization disclosures; Regulation FD (2000) reshaped how forward-looking and selective-disclosure issues were framed in risk factors and MD&A; the Private Securities Litigation Reform Act safe-harbor language migrated into forward-looking-statement legends; and Sarbanes-Oxley (2002) introduced certifications and internal-control disclosures that, while primarily Exchange Act features, propagated into the incorporated-by-reference base that an S-2 relied upon. The form itself was then eliminated in December 2005 as part of Securities Offering Reform, which folded the S-2 use case into a revised Form S-1 and an expanded Form S-3 (including the new well-known seasoned issuer category and automatic shelf registration). The dataset terminates cleanly at that regulatory boundary.

Changes in data format over time

The relevant format evolution within the 1994–2005 window is the EDGAR transition from ASCII-only submissions to HTML submissions. Early filings in the corpus are uniformly plain text — fixed-width prospectus pages, <PAGE> form-feed markers, ASCII tables — wrapped in SGML document envelopes inside .txt files. From the late 1990s onward, filers progressively adopted HTML, and by the form's final years both conventions coexist, often within the same monthly archive: some filers continued to submit everything as wrapped ASCII while others used HTML for the primary document and exhibits alike. PDF exhibits also appear in a minority of late-period filings. The SGML document envelope itself (<DOCUMENT>, <TYPE>, <SEQUENCE>, <FILENAME>, optional <DESCRIPTION>, <TEXT>) is invariant across the entire window and prefixes every document file in the dataset.
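One way to observe this format shift is to tally document extensions per monthly archive. The sketch below does that with the standard zipfile module and demonstrates it on a small in-memory archive; the folder names and files in the demo are invented, and the real archives' internal layout is not assumed beyond "files somewhere inside the ZIP".

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def extension_counts(zip_file) -> Counter:
    """Count document files by lowercase extension, ignoring metadata.json."""
    counts = Counter()
    with zipfile.ZipFile(zip_file) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = PurePosixPath(info.filename)
            if path.name == "metadata.json":
                continue
            counts[path.suffix.lower()] += 1
    return counts


# Build a tiny in-memory archive so the example runs without any download.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("000000000094000001/metadata.json", "{}")
    zf.writestr("000000000094000001/doc1.txt", "<DOCUMENT>")
    zf.writestr("000000000005000002/metadata.json", "{}")
    zf.writestr("000000000005000002/ds2.htm", "<DOCUMENT>")
    zf.writestr("000000000005000002/filename2.txt", "<DOCUMENT>")
buf.seek(0)
counts = extension_counts(buf)
print(dict(counts))
```

Running the same function over each downloaded monthly ZIP and plotting the .txt share against the archive month would make the ASCII-to-HTML transition visible.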

Interpretation and extraction notes

  • Amendments are independent records. An S-2/A is a full record in its own right with its own accession number and folder. Tracking the lineage from the original S-2 through successive amendments requires joining on the registrant's CIK and the fileNo (333-NNNNNN), which is stable across the amendment chain.
  • Incorporation by reference is pervasive. An S-2 prospectus routinely refers the reader to specific prior Exchange Act filings rather than reprinting their content. Extractors that expect a self-contained narrative will see truncated business or financial sections; the cross-references explicitly name the incorporated filings.
  • Filenames are filer-coined. There is no naming convention — ds2.htm, grays2.txt, forms2a03725_08152005.htm, a2161212zs-2a.htm, and doc1.txt are all primary S-2 documents. The authoritative identifier for what a file is comes from the SGML <TYPE> tag inside the file and the matching type field in metadata.json.
  • EDGAR fallback filenames are real documents. When the filer omitted a filename (commonly for CORRESP), the file appears as filenameN.htm or filenameN.txt, where N is the document sequence number. These are valid content files, not placeholders.
  • Sequence numbers may be non-contiguous. Skips in documentFormatFiles[i].sequence reflect the original EDGAR submission, not packaging gaps. The complete-submission pseudo-entry always trails the list with sequence and type both set to a single space character — consumers iterating the array should skip it.
  • GRAPHIC entries point off-archive. A GRAPHIC entry in documentFormatFiles references a .jpg or similar image file that is not present in the folder. The documentUrl value remains a valid EDGAR URL.
  • Strip the SGML envelope before body parsing. Every document file — text or HTML — begins with EDGAR's <DOCUMENT> / <TYPE> / <SEQUENCE> / <FILENAME> / optional <DESCRIPTION> / <TEXT> header lines before the actual content. HTML parsers that expect a leading <HTML> token must skip past the envelope first; the matching closing </TEXT> and </DOCUMENT> tags follow the body.
  • Byte sizes and sequences are strings. Both size and sequence in documentFormatFiles are numeric values expressed as JSON strings, not numbers.
  • Empty XBRL fields are structural, not missing data. linkToXbrl is always an empty string and dataFiles is always an empty array because Form S-2 predates structured-data requirements; these fields exist for schema consistency with later form-type datasets and should not be treated as gaps.
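The first note above, on reconstructing amendment lineage, can be sketched as a grouping step over loaded manifests. The code keys each record on the (cik, fileNo) pair of the first entity that carries both and orders each chain by filedAt. The sample manifests are invented, and using the first qualifying entity is a simplifying assumption for multi-registrant filings.

```python
from collections import defaultdict
from datetime import datetime


def amendment_chains(manifests):
    """Group manifests by (cik, fileNo) and sort each group by filedAt."""
    chains = defaultdict(list)
    for manifest in manifests:
        for entity in manifest.get("entities", []):
            key = (entity.get("cik"), entity.get("fileNo"))
            if all(key):
                chains[key].append(manifest)
                break  # simplification: first entity with both identifiers
    for records in chains.values():
        records.sort(key=lambda m: datetime.fromisoformat(m["filedAt"]))
    return dict(chains)


# Invented sample: one issuer with an S-2 and an S-2/A, plus an unrelated filing.
sample = [
    {"formType": "S-2/A", "filedAt": "2005-04-15T09:00:00-04:00",
     "entities": [{"cik": "0000000001", "fileNo": "333-000001"}]},
    {"formType": "S-2", "filedAt": "2005-03-01T10:00:00-05:00",
     "entities": [{"cik": "0000000001", "fileNo": "333-000001"}]},
    {"formType": "S-2", "filedAt": "2005-03-20T10:00:00-05:00",
     "entities": [{"cik": "0000000002", "fileNo": "333-000002"}]},
]
chains = amendment_chains(sample)
for key, records in chains.items():
    print(key, [r["formType"] for r in records])
```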

Who Files or Publishes This Dataset, and When

Who files

Each record is a Form S-2 or Form S-2/A registration statement filed on EDGAR by the issuer itself, acting as registrant under Section 5 of the Securities Act of 1933. The filer is the company whose securities are being registered for public sale. Subsidiary guarantors and other co-registrants on registered debt offerings may appear as additional filers on a single submission.

Filer population: seasoned domestic issuers

Form S-2 was available only to seasoned U.S. domestic issuers. A registrant generally had to:

  • be organized in the United States or a U.S. state or territory, with principal operations in the United States;
  • have a class of securities registered under Section 12 of the Securities Exchange Act of 1934, or be required to file reports under Section 15(d);
  • have been subject to Exchange Act reporting for at least 36 consecutive calendar months before filing;
  • have filed all required Section 13, 14, and 15(d) reports on a timely basis during the 12 months immediately preceding the filing; and
  • not have, since the end of the last fiscal year covered by audited financials in a Securities Act or Exchange Act report, failed to pay a preferred dividend or sinking fund installment or defaulted on borrowed money or a material long-term lease.

Issuers outside this profile used a different form. First-time registrants, recent IPO issuers, companies emerging from reorganization, shell companies, and seasoned issuers that had fallen out of timely Exchange Act compliance filed on Form S-1. Issuers that also met the public-float threshold (typically $75 million of non-affiliate float during the relevant period) and other Form S-3 conditions used Form S-3, which permitted Rule 415 shelf registration and broader forward incorporation by reference. S-2 captured the in-between group: seasoned enough to refer back to their Exchange Act file, but not eligible for the most abbreviated S-3 regime.

Foreign private issuers used the parallel F-series (Form F-1, Form F-2, Form F-3), not S-2. Investment companies used the N-series (Form N-1A, Form N-2, etc.). Asset-backed issuers, business development companies, and other specialized registrants used their own form regimes and do not appear in this dataset.
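The routing described in this section can be summarized as a small decision function. This is a deliberately simplified sketch, not a statement of the rules: the class and field names are hypothetical, Form S-3 qualification is passed in as a flag rather than computed, "U.S. domestic" is reduced to "not a foreign private issuer", and specialized registrants (investment companies, asset-backed issuers) are out of scope.

```python
from dataclasses import dataclass


@dataclass
class IssuerProfile:
    foreign_private_issuer: bool
    exchange_act_registered: bool   # Section 12 class or Section 15(d) reporting
    months_reporting: int           # consecutive months of Exchange Act reporting
    timely_last_12_months: bool     # all required reports filed on time
    no_specified_defaults: bool     # no missed dividends, sinking fund, or debt defaults
    meets_s3_conditions: bool       # supplied by the caller, not derived here


def s2_eligible(p: IssuerProfile) -> bool:
    """Apply the five eligibility bullets listed above, in simplified form."""
    return (not p.foreign_private_issuer
            and p.exchange_act_registered
            and p.months_reporting >= 36
            and p.timely_last_12_months
            and p.no_specified_defaults)


def pre_reform_form(p: IssuerProfile) -> str:
    """Default form family before December 2005 under this simplified model."""
    if p.foreign_private_issuer:
        return "F-series"
    if p.meets_s3_conditions:
        return "S-3"
    if s2_eligible(p):
        return "S-2"
    return "S-1"


seasoned = IssuerProfile(False, True, 40, True, True, False)
recent_ipo = IssuerProfile(False, True, 20, True, True, False)
print(pre_reform_form(seasoned), pre_reform_form(recent_ipo))
```

The function returns a default tier only; an eligible issuer could still choose Form S-1.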

Triggering event: a registered public offering

Filing is event-driven, not periodic. The trigger is the issuer's decision to offer and sell securities to the public in a transaction requiring registration under Section 5 of the Securities Act. There is no recurring deadline.

Offerings commonly registered on Form S-2 included:

  • follow-on common equity offerings;
  • preferred stock offerings;
  • investment-grade and high-yield debt, including notes, debentures, and convertibles;
  • warrants, rights, and units;
  • resale registrations for selling securityholders, including PIPE resales and shares issuable on conversion or exercise;
  • securities issued in acquisitions, employee benefit plans, or dividend reinvestment plans, where the issuer chose S-2 over a special-purpose form.

Unlike Form S-3, Form S-2 did not support primary shelf takedowns in the same flexible manner; its incorporation-by-reference mechanics were also more limited, generally requiring delivery of the most recent annual report to security holders rather than ongoing forward incorporation of future Exchange Act filings.

Form S-2/A amendments

Form S-2/A records are pre-effective or post-effective amendments to a previously filed S-2. They arise from:

  • staff comment responses during SEC review;
  • pricing amendments in offerings not relying on Rule 430A;
  • updates to keep financial statements current under Rule 3-12 of Regulation S-X;
  • changes in shares registered, selling securityholders, or offering terms;
  • post-effective amendments for material developments, deregistration of unsold securities, or fundamental changes to the prospectus.

Because Securities Act review is iterative, a single offering often produced one S-2 and several S-2/A filings before effectiveness. Each amendment is a separately accessioned EDGAR submission and a separate record.

Timing markers in the lifecycle

Key dates around an S-2 filing are:

  • the initial filing date, which triggers the Section 6(b) filing fee and starts SEC staff review;
  • each amendment date during review;
  • the effective date, after which sales may be made under Section 5(a);
  • for offerings priced post-effectiveness, the date of any Rule 424 prospectus supplement (filed as a separate form type, not part of this dataset).

Eligibility was tested as of filing, and the issuer had to remain Exchange Act-compliant throughout review; loss of timely-filer status could force a downgrade to Form S-1.

Important distinctions

  • The legal filer is the issuer. Underwriters and selling securityholders are disclosed in the prospectus but are not registrants. Guarantors and co-issuers are likewise disclosed in the prospectus and are usually not the registrant, although subsidiary guarantors of registered debt may appear as co-registrants on a single submission.
  • Transactions falling within special-purpose Securities Act forms generally did not use S-2: Form S-4 (business combinations), Form S-8 (employee benefit plans), Form S-11 (real estate), Form SB-1 and Form SB-2 (small business issuers), the F-series (foreign private issuers), and the N-series (investment companies).
  • Issuers near the eligibility line sometimes elected Form S-1 even when technically S-2-eligible, particularly when full restatement of disclosure better suited the offering.
  • S-3-eligible issuers almost always preferred S-3, so the S-2 population skews toward mid-cap and smaller seasoned issuers that cleared the three-year reporting bar but failed S-3's float or transactional tests.

Why the dataset ends in 2005

The SEC adopted Securities Offering Reform in Release No. 33-8591 (July 2005), effective December 1, 2005. The release restructured Securities Act registration around four issuer categories — well-known seasoned issuers (WKSIs), other seasoned issuers, unseasoned reporting issuers, and non-reporting issuers — and eliminated Form S-2 (and Form F-2), folding backward incorporation by reference into a revised Form S-1. After the transition window, formerly S-2-eligible issuers moved either to Form S-3 (if qualified) or to the enhanced Form S-1. No S-2 or S-2/A filings exist after late 2005, which permanently bounds the dataset.

How This Dataset Differs From Similar Datasets or Filings

Form S-2 occupied the middle tier of Securities Act registration from 1982 to December 2005, distinguished from its neighbors by partial incorporation by reference paired with mandatory physical delivery of incorporated reports. The most useful comparisons are with the other S-series registration forms, their F-series foreign-issuer counterparts, Rule 424 prospectus filings downstream of effectiveness, and the Form 10-K/Form 10-Q reports incorporated by reference.

Form S-1

The nearest substitute. S-1 is the general-purpose long-form registration with the same core content (prospectus, risk factors, use of proceeds, plan of distribution, financial statements). The decisive split is eligibility and mechanics: S-1 historically required full restatement of business and financial disclosure inside the filing, while S-2 let issuers with three years of Exchange Act reporting incorporate prior 10-K/10-Q content by reference and physically deliver those reports with the prospectus. After Securities Offering Reform (December 1, 2005), S-1 absorbed S-2's backward-incorporation provisions, making S-1 the only post-2005 substitute. An S-1 corpus is far larger, spans 1994 to present, and skews toward IPOs and first-time registrants; the S-2 corpus is exclusively seasoned-issuer follow-on registrations from 1994 to 2005.

Form S-3

The next tier up. S-3 was reserved for the most-seasoned issuers (public-float threshold plus 12-month reporting history) and is the vehicle for shelf registration under Rule 415. Both S-2 and S-3 rely on incorporation by reference, but S-3 permits full forward incorporation of future Exchange Act filings and produces a thin base prospectus completed later via 424(b) supplements. S-2 incorporated only prior reports, required their physical delivery, and lacked full shelf mechanics. Shelf takedowns, WKSI activity, and large secondary offerings live in S-3; S-2 captures the mid-tier seasoned issuers who failed S-3's float or transactional tests.

Form S-4

Same registration framework, different trigger. S-4 registers securities issued in mergers, business combinations, and exchange offers, and carries transaction-specific disclosure absent from S-2: deal background, fairness opinions, target financials, pro formas, and voting agreements. S-4 remains active post-2005. The corpora are complementary: the same issuer might file S-4 for an acquisition-related issuance and S-2 for a cash follow-on during the same period.

Form S-11

A specialized track for real estate issuers (REITs, real estate LPs) with property-level financials and REIT tax-qualification disclosure not present in S-2. S-2 was a general-issuer form; reporting real estate issuers occasionally used S-2 for non-real-estate offerings, but S-11 is not a substitute corpus.

Foreign private issuer counterparts: F-1, F-2, F-3

The F-series mirrors the S-series for foreign private issuers, with F-2 as the direct foreign analogue to S-2 (also eliminated in December 2005 and folded into F-1). F-series filings substitute Form 20-F for the incorporated 10-K, include reconciliations from home-country GAAP to U.S. GAAP (pre-IFRS era), and apply different signature and agent-for-service rules. S-2 explicitly excludes foreign private issuers; cross-border seasoned-issuer coverage for the same window requires the F-2 corpus.

Post-effectiveness prospectus filings: Rule 424

Rule 424(a) and Rule 424(b)(1)-(8) filings are the actual prospectuses and supplements used to offer securities after a registration is declared effective. They sit downstream of S-2 in the same transaction: S-2 establishes the registration shell and base prospectus; the 424 filing delivers final pricing, terms, and any post-effectiveness updates. Studies of actual offering economics (pricing, underwriting spreads, deal terms) require 424 data; studies of the registration disclosure itself can rely on S-2 alone. The two are complementary, not interchangeable.

Incorporated periodic reports: 10-K and 10-Q

Not registration filings and not offering-triggered. 10-K and 10-Q are calendar-driven Securities Exchange Act of 1934 reports that S-2 issuers incorporated by reference instead of restating. The relationship is dependency, not substitution: an S-2 filing is deliberately incomplete without the contemporaneous 10-K (business description, MD&A, audited financials) and most recent 10-Q. Reconstructing the full disclosure package presented to investors at the time of an S-2 offering requires joining S-2 to the same issuer's 10-K/10-Q filings on or before the effective date.
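The join described above is an as-of lookup: for each S-2, pick the issuer's most recent periodic report filed on or before a cutoff date. A minimal sketch follows, assuming the periodic reports have already been loaded from a separate source into plain dictionaries. The field names cik, formType, and filedOn, and the helper latest_on_or_before, are illustrative and not part of this dataset.

```python
from datetime import date
from typing import Optional


def latest_on_or_before(filings: list[dict], cik: str, form_type: str,
                        cutoff: date) -> Optional[dict]:
    """Return the issuer's most recent filing of `form_type` dated on or
    before `cutoff`, or None when no such filing exists."""
    candidates = [
        f for f in filings
        if f["cik"] == cik and f["formType"] == form_type and f["filedOn"] <= cutoff
    ]
    return max(candidates, key=lambda f: f["filedOn"], default=None)


# Hypothetical periodic reports for one issuer (illustrative values only).
periodic = [
    {"cik": "1000", "formType": "10-K", "filedOn": date(2003, 3, 28)},
    {"cik": "1000", "formType": "10-K", "filedOn": date(2004, 3, 30)},
    {"cik": "1000", "formType": "10-Q", "filedOn": date(2004, 5, 14)},
    {"cik": "1000", "formType": "10-Q", "filedOn": date(2004, 8, 13)},
]

effective = date(2004, 6, 1)  # assumed effective date of the S-2
annual = latest_on_or_before(periodic, "1000", "10-K", effective)
quarterly = latest_on_or_before(periodic, "1000", "10-Q", effective)
```

The effective date itself is not stated to be a metadata field, so it would need to come from the filing text or another source.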

Boundary summary

The Form S-2 Files dataset is distinct on four dimensions:

  • Closed and bounded. No new S-2 filings since December 2005; the archive is complete, not updating.
  • Tier-specific. It captures only seasoned issuers with three years of reporting history who did not qualify for S-3 — an eligibility tier that no longer exists as a separate form.
  • Mechanism-specific. It preserves the unique partial-incorporation-with-delivery model of S-2 (and F-2), distinct from pre-2005 S-1's full restatement and S-3's full forward incorporation.
  • Full-submission scope. Each accession includes the metadata file and all documents in the EDGAR submission (excluding images), not an extracted section or structured derivative.

Routing guide: post-2005 seasoned-issuer offerings shift to S-1 or S-3; merger-related issuances to S-4; foreign issuers to the F-series; final pricing and terms to Rule 424; full contemporaneous disclosure to joined 10-K/10-Q.

Who Uses This Dataset

The Form S-2 Files corpus is used as a primary-source archive of seasoned-issuer registrations under the pre-Securities Offering Reform tier system. There is no live monitoring use case; users work with a fixed, completed population.

Empirical finance and securities law researchers

Researchers treat the corpus as a defined population for studies of seasoned equity offerings, debt issuance, underwriting spreads, and post-issuance returns. The use-of-proceeds, plan of distribution, and financial statements (or Exchange Act cross-references) feed offering-level datasets. The fixed end date in December 2005 makes the corpus a clean pre-period for difference-in-differences work around Securities Offering Reform. Law and accounting scholars examine cover pages, eligibility statements, and incorporation-by-reference language to study how the S-1/S-2/S-3 tier system actually operated.

Capital markets attorneys and knowledge management

Securities offering associates and KM lawyers at law firms mine the corpus as a precedent library for prospectus drafting language: risk factor formulations, plan of distribution boilerplate, lock-up provisions, indemnification carve-outs, and selling-stockholder disclosure. S-2/A amendments are read alongside the originating S-2 to recover SEC staff comment patterns and the negotiated wording of contested disclosure items.

Investment bank capital markets and DCM desks

Equity syndicate and debt capital markets teams use the dataset to study historical offering structures, underwriter lists, gross spreads, over-allotment options, and stabilization language during the period when seasoned issuers chose among S-1, S-2, and S-3. DCM teams pull historical covenant packages, redemption mechanics, and ranking provisions when refinancing legacy instruments or structuring successor-entity offerings.

Corporate compliance and securities administration

In-house compliance and securities administration groups archive registration statements filed by the issuer, its predecessors, and acquired entities. The metadata and full submissions support filing inventories, internal audit responses, and tracking of outstanding securities, indenture obligations, and registration rights tied to pre-2005 offerings.

Treasury, corporate development, and M&A integration

These teams retrieve registrations filed by predecessor entities, divested subsidiaries, and acquisition targets to confirm terms of legacy securities still outstanding, verify representations made in historical offering documents, and support purchase-accounting and successor-issuer determinations. Investor relations uses the same filings to reconcile share counts, conversion mechanics, and inherited registration rights.

Forensic accountants and securities litigation support

Litigation consultants and testifying experts use S-2 and S-2/A filings as primary evidence in long-tail Section 11 and Section 12 claims, fraud-on-the-market matters, and restatement disputes referencing pre-2005 disclosures. Financial statements, MD&A incorporated by reference, risk factors, and certifications anchor expert reports on what was disclosed, omitted, or mischaracterized at the time of the offering.

Regulatory historians and policy researchers

Policy researchers and regulator staff conducting retrospective reviews of Securities Offering Reform use the corpus to document how the eliminated form actually functioned in its final decade, comparing S-2 volume, issuer characteristics, and disclosure content against S-1 and S-3 to reconstruct the rationale for collapsing the tier system.

Machine learning and NLP teams

Teams building prospectus-aware language models and retrieval systems use the corpus as bounded, form-consistent training data. The pairing of S-2 and S-2/A across the same accession family supports section-boundary detection (risk factors, use of proceeds, plan of distribution, selling stockholders) and diff modeling between initial and amended registrations. HTML and TXT variants support segmentation; JSON metadata provides filing-level labels.

Financial data engineering

Vendor and buy-side data engineering teams ingest the corpus to backfill historical offering coverage in security masters and corporate actions tables, extracting issuer identifiers, offering size, security type, underwriters, and effective dates and linking them to CUSIP, ticker, and entity histories. The fixed, non-growing nature of the corpus makes it well suited to a one-time deterministic load for point-in-time analytics.

Across these groups, the high-value sections are the cover page and eligibility statement, the prospectus body with risk factors and use of proceeds, the plan of distribution, the financial statements or incorporation-by-reference cross-references, and the S-2 to S-2/A amendment diff.

Specific Use Cases

  • Reconstructing the S-2 to S-2/A redline history per offering. Group records by entities[].fileNo (the stable Securities Act registration file number, typically 333-NNNNNN, with the older 33-NNNNN prefix on the earliest filings) and order by filedAt to recover the original S-2 followed by each S-2/A amendment for the same offering. Diffing the primary registration-statement documents across the chain surfaces the exact disclosure items that moved during SEC staff review (risk factor wording, fee-table revisions, plan-of-distribution edits), and any in-chain CORRESP documents tie those edits to staff comment letters.

  • Building a precedent bank of plan-of-distribution and risk-factor language. Extract the prospectus body from each S-2 or S-2/A primary document (skipping the SGML envelope), section-split on "Plan of Distribution," "Risk Factors," "Use of Proceeds," and "Selling Stockholders," and index the resulting passages by SIC, security type from the fee table, and acceptance year. Capital markets associates and KM teams query this index for boilerplate covering lock-ups, stabilization under Regulation M, indemnification carve-outs, and over-allotment mechanics from the pre-Reform tier era.

  • Constructing a counsel and auditor panel from EX-5 and EX-23 exhibits. Iterate documentFormatFiles for entries with type matching EX-5* (legality opinions) and EX-23* (auditor and expert consents), parse the firm name and signing office from each exhibit body, and join to the registrant CIK, SIC, and filedAt. The result is a longitudinal panel of which law firms and audit firms served which seasoned issuers during the integrated-disclosure period, usable for market-share studies and conflict checks.

  • A bounded pre-period dataset for Securities Offering Reform difference-in-differences studies. Because the corpus terminates at the December 2005 sunset, the closed S-2 population forms a clean treated-form sample. Researchers pair S-2 records with same-issuer S-1, S-3, or post-Reform filings using CIK to estimate the effect of Reform on offering frequency, time-to-effectiveness (measured across the S-2 / S-2/A chain), and disclosure length for mid-tier seasoned issuers who lost their dedicated form.

  • Training and evaluating prospectus-section NLP models on form-consistent data. The fixed corpus and stable Item 601 exhibit vocabulary support supervised tasks: section-boundary segmentation on the prospectus body, classification of exhibits by type, and S-2 vs. S-2/A pair-wise edit modeling. ASCII and HTML variants in the same window let teams test format-robust extractors, while metadata.json provides filing-level labels (form type, SIC, fiscal year end, state of incorporation) without external joins.

  • Litigation and forensic reconstruction of what was disclosed at offering. For Section 11 and Section 12 claims tied to pre-2005 offerings, experts pull the exact S-2 or S-2/A in force at the relevant date, capture the fee table, risk factors, use-of-proceeds language, and the list of filings incorporated by reference from the prospectus, then resolve those cross-references against contemporaneous 10-K and 10-Q records. The accession-numbered folder plus metadata.json provides a self-contained evidentiary unit with EDGAR-authoritative URLs (linkToFilingDetails, linkToHtml) for chain-of-custody references.

  • Backfilling historical offering attributes into security masters. Data engineering teams parse the calculation-of-registration-fee table from the primary document to capture security title, registered amount, and aggregate offering price, then enrich with entities[].cik, tickers, sic, and filedAt to produce point-in-time offering records keyed to CUSIP and ticker histories. The non-growing nature of the corpus makes this a one-time deterministic load rather than an ongoing pipeline.
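The first use case in the list above (grouping by entities[].fileNo and ordering by filedAt) can be sketched as follows. This is a minimal illustration, not a reference implementation. It assumes each record is a dictionary loaded from metadata.json, that filedAt is an ISO-8601 string with a UTC offset that datetime.fromisoformat can parse, and that accessionNo and formType are present. The helper name amendment_chains is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def amendment_chains(records: list[dict]) -> dict[str, list[dict]]:
    """Group filings by registration file number and sort each group by
    filing time, so an original S-2 precedes its S-2/A amendments."""
    chains: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        # A filing can list several entities; de-duplicate their file numbers.
        file_nos = {e["fileNo"] for e in rec.get("entities", []) if e.get("fileNo")}
        for file_no in file_nos:
            chains[file_no].append(rec)
    for filings in chains.values():
        filings.sort(key=lambda r: datetime.fromisoformat(r["filedAt"]))
    return dict(chains)


# Illustrative records (values are made up for the example).
records = [
    {"accessionNo": "A-2", "formType": "S-2/A", "filedAt": "2005-03-10T16:05:00-05:00",
     "entities": [{"fileNo": "333-111111"}]},
    {"accessionNo": "A-1", "formType": "S-2", "filedAt": "2005-02-01T09:30:00-05:00",
     "entities": [{"fileNo": "333-111111"}]},
    {"accessionNo": "B-1", "formType": "S-2", "filedAt": "2005-02-15T12:00:00-05:00",
     "entities": [{"fileNo": "333-222222"}]},
]

chains = amendment_chains(records)
```

Each chain can then be walked pairwise (filing i against filing i+1) to produce redlines of the primary documents.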

Dataset Access

The Form S-2 Files dataset is accessible through three endpoints: a metadata index, a full archive download, and per-container downloads. Container files are monthly ZIP archives keyed by year and month, starting at 1994-01 and ending with the form's retirement in late 2005; because the corpus is closed, no later containers will be added.

Dataset Index JSON API: https://api.sec-api.io/datasets/form-s2-files.json

Returns dataset-level metadata (name, description, last update timestamp, earliest sample date, total records, total size, form types covered, container format, and file types) along with the full list of container files. Each container entry includes its key, size, record count, last update timestamp, and direct download URL. This endpoint does not require an API key.

Poll this endpoint to monitor which containers were refreshed in the latest run via their updatedAt timestamps, and download only the containers that changed.
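A minimal sketch of that polling pattern, using only the Python standard library. It assumes updatedAt always carries millisecond precision and a trailing Z, as in the example response. The helper names fetch_index, parse_ts, and changed_containers are illustrative.

```python
import json
from datetime import datetime, timezone
from urllib.request import urlopen

INDEX_URL = "https://api.sec-api.io/datasets/form-s2-files.json"


def fetch_index(url: str = INDEX_URL) -> dict:
    """Download and decode the dataset index (no API key required)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def parse_ts(value: str) -> datetime:
    """Parse a timestamp such as 2026-04-15T08:02:51.286Z as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def changed_containers(index: dict, since: datetime) -> list[dict]:
    """Return container entries whose updatedAt is later than `since`."""
    return [c for c in index.get("containers", [])
            if parse_ts(c["updatedAt"]) > since]


# Offline illustration with a hand-written index fragment.
sample_index = {
    "containers": [
        {"key": "2005/2005-10.zip", "updatedAt": "2026-04-01T08:00:00.000Z"},
        {"key": "2005/2005-11.zip", "updatedAt": "2026-04-15T08:02:51.286Z"},
    ]
}
last_sync = datetime(2026, 4, 10, tzinfo=timezone.utc)
stale_keys = [c["key"] for c in changed_containers(sample_index, last_sync)]
```

In a real run, fetch_index() would replace sample_index, and last_sync would be persisted between runs.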

Example response:

{
  "datasetId": "1f13365b-9ae0-6973-a718-7adf2a243360",
  "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-s2-files.zip",
  "name": "Form S-2 Files Dataset",
  "updatedAt": "2026-04-15T08:02:51.286Z",
  "earliestSampleDate": "1994-01-01",
  "totalRecords": 12045,
  "totalSize": 256798968,
  "formTypes": ["S-2", "S-2/A"],
  "containerFormat": "ZIP",
  "fileTypes": ["TXT", "JSON", "HTML", "PDF"],
  "containers": [
    {
      "downloadUrl": "https://api.sec-api.io/datasets/form-s2-files/2005/2005-12.zip",
      "key": "2005/2005-12.zip",
      "size": 1842311,
      "records": 18,
      "updatedAt": "2026-04-15T08:02:51.286Z"
    }
  ]
}

Download Entire Dataset: https://api.sec-api.io/datasets/form-s2-files.zip?token=YOUR_API_KEY

Downloads the complete dataset as a single ZIP archive containing all monthly containers from 1994-01 through 2005-12. This endpoint requires a valid sec-api.io API key passed via the token query parameter.

Download Single Container: https://api.sec-api.io/datasets/form-s2-files/2005/2005-12.zip?token=YOUR_API_KEY

Downloads one monthly container ZIP instead of the full archive. Replace the year and month segments with any valid container key from the dataset index (e.g. 1994/1994-01.zip through 2005/2005-12.zip). This endpoint requires a valid sec-api.io API key passed via the token query parameter.
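As a rough sketch, the container URL can be assembled from a year and month and then streamed to disk. The helper names container_key, container_url, and download_container are illustrative. The URL pattern and the token query parameter are taken from the endpoint shown above.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.sec-api.io/datasets/form-s2-files"


def container_key(year: int, month: int) -> str:
    """Build a container key such as 2005/2005-12.zip."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    return f"{year:04d}/{year:04d}-{month:02d}.zip"


def container_url(key: str, api_key: str) -> str:
    """Append the API key as the token query parameter."""
    return f"{BASE_URL}/{key}?{urlencode({'token': api_key})}"


def download_container(key: str, api_key: str, dest_path: str) -> None:
    """Stream one monthly ZIP to disk without loading it fully into memory."""
    with urlopen(container_url(key, api_key), timeout=60) as resp, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)


example_url = container_url(container_key(2005, 12), "YOUR_API_KEY")
```

Valid keys should still be checked against the dataset index, since not every year and month combination is guaranteed to have a container.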

Frequently Asked Questions

What forms does this dataset cover?

The Form S-2 Files Dataset covers exactly two form types: S-2 (the original long-form registration statement under the Securities Act of 1933) and S-2/A (pre-effective or post-effective amendments to a previously filed S-2). No other Securities Act or Exchange Act forms appear in the corpus.

What does one record in this dataset represent?

One record is a single EDGAR submission identified by its 18-digit SEC accession number (20 characters when written with dashes). Physically, each record is an accession-numbered folder containing a metadata.json manifest, the primary S-2 or S-2/A registration-statement document, and every exhibit and CORRESP document filed with that submission. An issuer that filed an original S-2 and three S-2/A amendments produces four independent records, not one.
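A minimal sketch of walking the records inside one downloaded container. It assumes each record folder in the ZIP contains a file named metadata.json. The exact nesting depth is not specified here, so the code matches on the file name alone. The helper name iter_metadata is illustrative.

```python
import io
import json
import zipfile
from typing import Iterator


def iter_metadata(zf: zipfile.ZipFile) -> Iterator[tuple[str, dict]]:
    """Yield (record_folder, parsed_manifest) for every metadata.json
    found in an open container ZIP."""
    for name in zf.namelist():
        if name == "metadata.json" or name.endswith("/metadata.json"):
            folder = name.rsplit("/", 1)[0] if "/" in name else ""
            yield folder, json.loads(zf.read(name))


def count_by_form_type(zf: zipfile.ZipFile, field: str = "formType") -> dict[str, int]:
    """Tally records by a manifest field (formType is an assumed field name)."""
    counts: dict[str, int] = {}
    for _folder, manifest in iter_metadata(zf):
        label = manifest.get(field, "unknown")
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Usage would look like `with zipfile.ZipFile("2005-11.zip") as zf: print(count_by_form_type(zf))`, where the local file name is whatever the container was saved as.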

Who was eligible to file Form S-2?

Form S-2 was available only to seasoned U.S. domestic issuers — companies organized in the United States that had been subject to Exchange Act reporting for at least 36 consecutive months, were current and timely in their Section 13, 14, and 15(d) filings, and had not defaulted on dividends, sinking fund installments, borrowed money, or material long-term leases since their last audited fiscal year. Foreign private issuers, investment companies, and asset-backed issuers used separate form regimes.

What time period does the dataset cover, and why does it end in 2005?

The corpus spans EDGAR-era S-2 and S-2/A filings from January 1994 through December 2005. The SEC adopted Securities Offering Reform in SEC Release No. 33-8591, effective December 1, 2005, which eliminated Form S-2 and folded its use case into a revised Form S-1 and an expanded Form S-3. No S-2 or S-2/A filings exist after late 2005, which permanently closes the dataset.

What file formats are inside each record?

Document bodies are predominantly ASCII (.txt) and HTML (.htm, .html), with a minority of late-period PDF exhibits. Every document file — text or HTML — is wrapped in an EDGAR SGML document envelope (<DOCUMENT>, <TYPE>, <SEQUENCE>, <FILENAME>, optional <DESCRIPTION>, <TEXT>) that must be stripped before body parsing. The filing-level manifest is JSON. Containers are monthly ZIP archives.
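One way to strip the envelope is sketched below. It assumes the header tags each sit on their own line before <TEXT>, which is the conventional layout, and it falls back gracefully when the closing </TEXT> is missing. The helper name strip_envelope is illustrative.

```python
import re

# Header tags are unclosed, one per line: <TYPE>S-2, <SEQUENCE>1, ...
_HEADER_TAG = re.compile(r"^<(TYPE|SEQUENCE|FILENAME|DESCRIPTION)>(.*)$", re.MULTILINE)


def strip_envelope(raw: str) -> tuple[dict[str, str], str]:
    """Split an EDGAR document file into (header_fields, body).

    header_fields maps lowercase tag names to their values; body is the
    content between <TEXT> and </TEXT> with the wrapper removed.
    """
    start = raw.find("<TEXT>")
    if start == -1:
        # No envelope detected: treat the whole file as the body.
        return {}, raw
    header = raw[:start]
    fields = {m.group(1).lower(): m.group(2).strip()
              for m in _HEADER_TAG.finditer(header)}
    body_start = start + len("<TEXT>")
    end = raw.find("</TEXT>", body_start)
    body = raw[body_start:] if end == -1 else raw[body_start:end]
    return fields, body.strip("\r\n")


sample = (
    "<DOCUMENT>\n<TYPE>S-2\n<SEQUENCE>1\n<FILENAME>forms2.txt\n"
    "<DESCRIPTION>FORM S-2\n<TEXT>\nPROSPECTUS\nRisk Factors\n</TEXT>\n</DOCUMENT>\n"
)
fields, body = strip_envelope(sample)
```

The body returned for HTML documents still contains markup, so an HTML parser would be the next step for those files.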

How does this dataset differ from a Form S-1 or Form S-3 dataset?

Form S-1 historically required full restatement of business and financial disclosure; Form S-3 was the short-form vehicle for the most-seasoned issuers and supported Rule 415 shelf registration with full forward incorporation of future Exchange Act filings. S-2 sat between them: it permitted incorporation of prior 10-K and 10-Q content by reference but required physical delivery of those reports with the prospectus, and it lacked the full shelf mechanics of S-3. The S-2 corpus is therefore exclusively seasoned-issuer follow-on registrations from mid-tier issuers who cleared the three-year reporting bar but failed S-3's float or transactional tests.

Are images and the EDGAR full-submission text bundle included?

No. GRAPHIC documents (logos, signature images, organizational charts, photographs) are excluded from the ZIP archives but remain enumerated in documentFormatFiles with valid EDGAR documentUrl values for off-archive retrieval. The single concatenated <accessionWithDashes>.txt full-submission bundle that EDGAR exposes is also not stored locally — only the individual decomposed documents are written into each folder, and the bundle URL is referenced via linkToTxt and a trailing pseudo-entry in documentFormatFiles.
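A short sketch of collecting the excluded image URLs for off-archive retrieval. It assumes each documentFormatFiles entry is a dictionary with type and documentUrl keys, and that image entries carry the type value GRAPHIC. The helper name graphic_urls is illustrative.

```python
def graphic_urls(metadata: dict) -> list[str]:
    """Return the EDGAR URLs of GRAPHIC documents listed in a manifest.

    These files are enumerated in metadata.json but are not stored in
    the ZIP, so they have to be fetched from EDGAR separately.
    """
    urls = []
    for doc in metadata.get("documentFormatFiles", []):
        doc_type = (doc.get("type") or "").strip().upper()
        url = doc.get("documentUrl")
        if doc_type == "GRAPHIC" and url:
            urls.append(url)
    return urls


# Illustrative manifest fragment (URLs are placeholders, not real filings).
manifest = {
    "documentFormatFiles": [
        {"type": "S-2", "documentUrl": "https://example.invalid/forms2.txt"},
        {"type": "GRAPHIC", "documentUrl": "https://example.invalid/logo.gif"},
        {"type": "EX-5", "documentUrl": "https://example.invalid/ex5.txt"},
    ]
}
missing_images = graphic_urls(manifest)
```

Requests to EDGAR itself are subject to the SEC's own access policies, which are separate from the dataset endpoints described here.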