The Form S-4 Files Dataset is a complete archive of every Form S-4 and Form S-4/A registration statement filed on EDGAR, the Securities Act registration vehicle used whenever a U.S. registrant issues securities as consideration in a business combination — stock-for-stock mergers, share-exchange offers, holding-company reorganizations, de-SPAC business combinations, and debt-for-debt exchange offers. One record is a single EDGAR submission keyed by accession number, comprising the primary registration statement, every filed exhibit (merger agreement, charters, legality and tax opinions, material contracts, auditor consents, fairness opinions, filing-fee exhibit), and a metadata.json envelope that identifies filers, co-registrants, and per-document roles. The dataset is distributed as monthly ZIP containers following a YYYY/YYYY-MM.zip layout and covers filings from January 1994 through the present, refreshed daily. Filers include operating-company acquirers, holding-company reorganizers, SPACs consummating initial business combinations, and multi-registrant parent-plus-guarantor groups. Because S-4s almost always undergo SEC staff review, the dataset captures both the original S-4 and the full chain of S-4/A pre-effective amendments that trail each transaction through effectiveness.
- Dataset Index JSON API: programmatically retrieve the full list of dataset archive files, download URLs, and dataset metadata.
- Download Entire Dataset: download the entire dataset as a single archive file.
- Download Single Container: download a single container file (e.g. a monthly archive) from the dataset.
The dataset is the complete corpus of Form S-4 and Form S-4/A EDGAR submissions. Form S-4 is the Securities Act of 1933 registration statement used whenever a registrant issues securities as consideration in a business combination: mergers, stock-for-stock acquisitions, reorganizations, share-exchange offers, debt-for-debt exchange offers, and certain going-private transactions structured as combinations. The filing simultaneously performs two jobs — it registers the new securities with the Commission, and, where a shareholder vote is required, it doubles as the combined proxy statement/prospectus delivered to target-company shareholders asked to vote on or tender into the deal. Because of that dual role, a single S-4 typically contains deal-description narrative, historical financial statements of both acquirer and target, pro forma combined financials, risk factors, descriptions of the securities being issued and of the combined entity, and the full text of the merger agreement and ancillary contracts as exhibits. Form S-4/A is the amendment variant, filed to respond to SEC staff comment letters, to incorporate updated financials, or to reflect revised transaction terms. Amendments range from full restatements of the prospectus to narrow updates that touch only a single exhibit or paragraph; the record format is identical in either case.
The dataset spans January 1994 — the start of EDGAR's mandatory electronic-filing phase-in — through the present, and is distributed as monthly ZIP containers. Recent monthly cadence sits around thirty S-4 and S-4/A filings per month based on November 2025 sampling, reflecting both fresh registration statements and the multi-amendment lifecycle typical of SEC review. Form S-4 itself was adopted in 1985 to consolidate earlier business-combination registration forms; any pre-1994 S-4 filings exist only in paper records outside this dataset.
One record is a single EDGAR submission of Form S-4 or Form S-4/A, keyed by its SEC accession number. On disk, the record is one sub-folder inside a monthly ZIP whose name is the eighteen-digit accession in zero-padded, dash-stripped form (for example 000119312525262679/). Each folder contains exactly one metadata.json envelope together with the SGML-wrapped documents that constitute the original submission. The record unit is the filing, not the transaction: a single business combination typically generates an initial S-4 plus one or more S-4/A amendments, each of which is an independent record with its own accession, its own exhibit set, and its own metadata block. Per-folder document counts range from a two-file minimum (the updated primary document plus metadata.json, when an amendment re-files nothing else) up to roughly two dozen documents for deal-heavy filings that carry charters, bylaws, tax and legality opinions, consents, and extensive material-contract exhibits.
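As a small illustration of the folder-naming rule above, the following sketch converts between the dash-stripped folder name and the dashed accession number. The function names are hypothetical helpers, not part of the dataset; the 10-2-6 digit split follows the example accession shown in the text.

```python
import re


def folder_to_accession(folder_name: str) -> str:
    """Turn an 18-digit folder name into the dashed accession form."""
    digits = folder_name.strip("/")
    if not re.fullmatch(r"\d{18}", digits):
        raise ValueError(f"not an 18-digit accession folder: {folder_name!r}")
    # Dashed layout: 10-digit submitter id, 2-digit year, 6-digit sequence.
    return f"{digits[:10]}-{digits[10:12]}-{digits[12:]}"


def accession_to_folder(accession_no: str) -> str:
    """Strip the dashes to get the folder name used inside the monthly ZIP."""
    return accession_no.replace("-", "")
```

For the example in the text, `folder_to_accession("000119312525262679/")` gives `0001193125-25-262679`, which should equal the `accessionNo` value in that folder's metadata.json.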
The dataset is distributed as monthly ZIP containers laid out <year>/<year>-<month>.zip. Inside a monthly ZIP, the top level is a single directory named after the filing month (for example 2025-11/), the second level is one folder per accession number in dash-stripped form, and the third level is the flat set of filing documents plus the metadata.json envelope. All documents sit as siblings inside the accession folder; there is no further sub-structure separating the primary document from the exhibits. The exhibit role of each file is carried in the metadata.json envelope and in the SGML <TYPE> tag of the document itself, not in the directory layout. Bundling is strictly temporal: every accession filed in a given calendar month lands in that month's archive regardless of registrant, industry, or deal size, so the monthly ZIP is the natural unit for incremental ingestion.
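The three-level layout described above can be walked with the standard library alone. This is a minimal sketch, assuming member paths inside the ZIP look like `<YYYY-MM>/<accession>/<file>`; `index_monthly_zip` is a hypothetical helper name.

```python
import zipfile
from collections import defaultdict


def index_monthly_zip(zip_file) -> dict[str, list[str]]:
    """Map each accession folder to the file names stored under it.

    `zip_file` may be a path or a binary file-like object.
    """
    records = defaultdict(list)
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Expected shape: <YYYY-MM>/<accession>/<file>.
            # Directory entries end with "/" and yield an empty last part.
            if len(parts) == 3 and parts[2]:
                records[parts[1]].append(parts[2])
    return dict(records)
```

Because bundling is strictly by filing month, an incremental loader only needs to re-index the ZIP for the current month rather than the full history.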
Every accession folder carries exactly one metadata.json file that serves as the structured anchor for the filing. Its top-level scalar fields identify the submission on EDGAR:
- formType: either S-4 or S-4/A.
- accessionNo: the canonical dashed accession (for example 0001193125-25-262679); this is the same identifier encoded dash-free in the folder name.
- filedAt: ISO 8601 timestamp with timezone offset.
- description: human-readable form label, with [Amend] appended for /A filings.
- linkToFilingDetails, linkToTxt, linkToHtml: URLs pointing respectively to the primary document, the complete-submission .txt stream, and the EDGAR filing-index page on sec.gov.
- linkToXbrl: URL to the XBRL instance; frequently an empty string on plain S-4 filings because the form does not carry full financial-statement XBRL, only the structured filing-fee exhibit.
- id: a thirty-two-character hexadecimal content hash suitable for deduplication.

Three array fields carry the substantive structure of the submission.
documentFormatFiles[] enumerates every document in the original EDGAR submission — the primary registration statement, every exhibit, any graphics, and the trailing complete-submission .txt. Each entry is an object carrying sequence, size, documentUrl, description, and type. The type field holds the canonical exhibit taxonomy (S-4, S-4/A, EX-8.1, EX-10.11, EX-21.1, EX-23.1, EX-99.4, EX-FILING FEES, GRAPHIC, and so on) and is the authoritative handle for identifying a document's role; the final entry in the array is typically the complete-submission text file with a blank type and no sequence.
dataFiles[] lists the XBRL companion files (XSD schema, XML instance, extracted _htm.xml) when the filing carries inline XBRL beyond the fee table. It is commonly empty for plain S-4 submissions and populated for S-4/A filings that carry broader inline XBRL.
entities[] lists the filer(s) and co-registrants. Each entity object carries companyName, cik, fileNo, irsNo, fiscalYearEnd, stateOfIncorporation (a two-character SEC jurisdiction code; I0 for France, DE for Delaware, and so on), act, sic, filmNo, type, and an optional tickers[] array for publicly listed parties. Multi-registrant S-4 filings — a frequent pattern when a holding company co-files with its operating subsidiary, when guarantors co-register debt securities, or when spin-merger structures pair multiple affiliated registrants — produce multiple entity objects sharing a file-number group. The group is expressed as a primary file number plus dash-suffixed siblings (for example 333-289994 on the lead entity and 333-289994-01 on the co-registrant), allowing every issuer, guarantor, and co-registrant affiliate to be enumerated alongside its individual CIK and SIC code.
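The file-number grouping can be recovered by stripping the dash suffix from each entity's fileNo. The sketch below assumes file numbers follow the `333-289994` / `333-289994-01` pattern shown above; `base_file_number` and `group_registrants` are hypothetical helper names.

```python
import re
from collections import defaultdict


def base_file_number(file_no: str) -> str:
    """Strip a co-registrant suffix: '333-289994-01' -> '333-289994'."""
    cleaned = file_no.strip()
    match = re.fullmatch(r"(\d+-\d+)(?:-\d+)?", cleaned)
    # Fall back to the raw value if the pattern is not the one assumed here.
    return match.group(1) if match else cleaned


def group_registrants(entities: list[dict]) -> dict[str, list[dict]]:
    """Group entity objects from metadata.json by their shared base file number."""
    groups = defaultdict(list)
    for entity in entities:
        file_no = entity.get("fileNo") or ""
        if file_no:
            groups[base_file_number(file_no)].append(entity)
    return dict(groups)
```

A group with more than one member indicates a multi-registrant filing; each member still carries its own cik and sic for per-entity lookups.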
A seriesAndClassesContractsInformation[] array is present in the envelope schema but is typically empty for S-4, as it is used for investment-company contract disclosures that do not apply to this form.
The envelope advertises more documents than the ZIP ships: image entries (type: GRAPHIC, usually .jpg or .gif) and the complete-submission .txt appear in documentFormatFiles[] for completeness, but neither is placed in the archive. Consumers that need those assets must follow the documentUrl back to EDGAR.
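A loader can use the envelope to predict which entries will be missing from the archive. This is a rough sketch: it treats a blank type as the complete-submission .txt, which the text describes as typical rather than guaranteed, and `split_documents` is a hypothetical helper name.

```python
def split_documents(metadata: dict) -> tuple[list[dict], list[dict]]:
    """Partition documentFormatFiles[] into (expected in ZIP, EDGAR-only)."""
    shipped, remote = [], []
    for doc in metadata.get("documentFormatFiles", []):
        doc_type = (doc.get("type") or "").strip()
        # GRAPHIC entries and the blank-typed complete-submission .txt are
        # listed in the envelope but, per the text, not placed in the ZIP.
        if doc_type == "GRAPHIC" or not doc_type:
            remote.append(doc)
        else:
            shipped.append(doc)
    return shipped, remote
```

The `documentUrl` of each entry in the second list is the address to use when those assets are needed.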
The primary document is the file whose documentFormatFiles[].type is S-4 or S-4/A and whose internal SGML <TYPE> tag matches. File names are preparer-specific stubs — for example d37576ds4.htm, tm2524487-6_s4a.htm, or newtek2025exchangeoffer-fo.htm — with no standardized naming convention across filers, so role identification must rely on the type field or the SGML header, never on filename heuristics.
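Selecting the primary document by type rather than by filename might look like the sketch below. It assumes the local file name equals the last path segment of `documentUrl`, which is an inference from the layout described here, not a documented guarantee; both function names are hypothetical.

```python
import posixpath
from urllib.parse import urlparse

PRIMARY_TYPES = {"S-4", "S-4/A"}


def primary_document(metadata: dict):
    """Return the first documentFormatFiles[] entry typed S-4 or S-4/A, or None."""
    for doc in metadata.get("documentFormatFiles", []):
        if (doc.get("type") or "").strip() in PRIMARY_TYPES:
            return doc
    return None


def local_filename(doc: dict) -> str:
    """Assumed mapping: the file in the accession folder is named after the URL's last segment."""
    return posixpath.basename(urlparse(doc["documentUrl"]).path)
```

Cross-checking the result against the SGML `<TYPE>` line of the file itself gives a second, independent confirmation of the role.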
Every .htm in the archive — primary and exhibit alike — is an EDGAR SGML-wrapped document rather than a bare HTML file. The first lines always follow this preamble:
    <DOCUMENT>
    <TYPE>S-4
    <SEQUENCE>1
    <FILENAME>d37576ds4.htm
    <DESCRIPTION>S-4
    <TEXT>
    <HTML>... full HTML body ...</HTML>
    </TEXT>
    </DOCUMENT>
The <TYPE> token matches the documentFormatFiles[].type value, <SEQUENCE> matches the document's ordering within the submission, and <FILENAME> matches the local file name. Any HTML parser applied directly to the raw bytes will fail on the SGML preamble; extractors must either strip the wrapper first or operate on the payload between <TEXT> and </TEXT>.
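One way to separate the wrapper from the payload is sketched below, assuming each file holds a single `<DOCUMENT>` block with the header tags on their own lines as in the preamble shown above. `split_sgml_document` is a hypothetical helper name.

```python
import re

# Header lines have an opening tag only; the value runs to the end of the line.
HEADER_TAG = re.compile(r"^<(TYPE|SEQUENCE|FILENAME|DESCRIPTION)>(.*)$", re.MULTILINE)


def split_sgml_document(raw: str) -> tuple[dict[str, str], str]:
    """Return (header fields, payload between <TEXT> and </TEXT>)."""
    start = raw.find("<TEXT>")
    head = raw if start == -1 else raw[:start]
    header = {m.group(1): m.group(2).strip() for m in HEADER_TAG.finditer(head)}

    # Use the last </TEXT> so stray text inside the body cannot cut it short.
    end = raw.rfind("</TEXT>")
    if start == -1 or end == -1 or end < start:
        return header, ""
    payload = raw[start + len("<TEXT>"):end].strip()
    return header, payload
```

The returned payload can be handed to an HTML parser, while the header dictionary supplies the `TYPE` and `SEQUENCE` values to match against documentFormatFiles[].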
Inside the HTML body, the primary document follows the conventional registration-statement scaffold. The cover page identifies the registrant, the Securities Act registration file number (for example 333-291227), the state of incorporation, the IRS Employer Identification Number, the primary standard industrial classification code, the address and agent for service, and a reference to the calculation-of-registration-fee table (which, for post-compliance filings, points to the separate EX-FILING FEES exhibit rather than being tabulated inline). A letter to target shareholders and a notice of special meeting typically follow when the S-4 doubles as a proxy statement. The prospectus/proxy body then presents, in varying order depending on deal structure: a "Questions and Answers" summary, a "Summary of the Transaction", the detailed "The Merger" or "The Exchange Offer" section describing background, terms, consideration mechanics, fairness opinions, and regulatory approvals, a full "Risk Factors" section, "Selected Historical Financial Data" for each party, "Unaudited Pro Forma Condensed Combined Financial Statements" showing the combined balance sheet and income statement giving effect to the transaction, "Description of Capital Stock" or "Description of the Securities Being Registered", a "Comparison of Rights of Shareholders" contrasting the acquirer's and target's charter and bylaw provisions, a "Description of the Combined Entity" covering business overview, management, and post-closing governance, and a "Material U.S. Federal Income Tax Consequences" section keyed to the EX-8.1 tax opinion. Audited historical financial statements of the target (and sometimes the acquirer) are either included in full or incorporated by reference from Exchange Act filings. Undertakings, signatures of the registrant together with its directors and principal officers, and the exhibit index close the primary document.
S-4 submissions carry a characteristic exhibit set driven by Item 21 of Form S-4 and the exhibit requirements of Regulation S-K Item 601. Each exhibit is a separate .htm file in the accession folder whose <TYPE> tag and matching documentFormatFiles[].type value carry the exhibit number. The exhibits most relevant to S-4 are:
- EX-2.x: the plan of acquisition, reorganization, arrangement, liquidation, or succession, that is, the merger agreement or share-exchange agreement itself, together with any amendments. This is the definitive transaction contract.
- EX-3.x: charters and bylaws of the registrant, including post-closing amended-and-restated charters when the transaction reshapes the acquirer's equity structure. Filings that overhaul the capital structure can enumerate charters and bylaws running well past EX-3.10.
- EX-4.x: instruments defining the rights of security holders: indentures, rights agreements, specimen stock or note certificates, and supplemental indentures for newly registered debt.
- EX-5.x: legality opinion from counsel that the securities being registered will be validly issued, fully paid, and non-assessable.
- EX-8.x: tax opinion from counsel addressing the material U.S. federal income tax consequences of the transaction, particularly whether a merger qualifies as a tax-free reorganization under Section 368 of the Internal Revenue Code.
- EX-10.x: material contracts, including employment and retention agreements entered into in connection with the deal, voting and support agreements from significant shareholders, financing and debt-commitment letters, transition-services agreements, and sponsor-support and PIPE subscription agreements on de-SPAC S-4s.
- EX-21.x: list of subsidiaries of the registrant, relevant for both acquirer and, where applicable, target disclosures.
- EX-23.x: consents of independent auditors and other experts whose reports or opinions are included or incorporated by reference. S-4 filings almost universally carry multiple EX-23 entries (one per audit firm per entity whose statements appear) because both the acquirer's and the target's auditors must consent.
- EX-99.x: additional exhibits. On S-4 these typically include the form of proxy card, form of letter to shareholders, form of letter of transmittal and notice of guaranteed delivery for exchange offers, fairness opinions from financial advisors, and press releases announcing the transaction.
- EX-FILING FEES: the structured filing-fee calculation exhibit. Unlike every other .htm in the archive, this file's body is an inline XBRL document using the ix, xbrli, dei, and ffd namespaces to render machine-readable fee-calculation tables. It is present whenever fees are being calculated on the instant filing rather than deferred to a later amendment.

The presence or absence of specific exhibits is itself a strong signal of transaction structure: exchange offers carry letter-of-transmittal EX-99.x exhibits; stock-for-stock mergers requiring shareholder approval carry proxy-card EX-99.x and tax-opinion EX-8.x exhibits; deals that rewrite the acquirer's charter carry an enumerated run of EX-3.x files; and narrow amendments often consist of only an updated primary document plus a fresh EX-FILING FEES exhibit and one or more updated EX-23.x consents.
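The exhibit-presence signals above can be approximated by counting exhibit families per filing. This is a heuristic sketch only: the type value alone cannot tell a proxy card from a letter of transmittal within EX-99.x (that needs the description field or the exhibit text), and the helper names are hypothetical.

```python
from collections import Counter


def exhibit_family(doc_type: str) -> str:
    """Collapse an exhibit type to its family: 'EX-23.1' -> 'EX-23'."""
    return doc_type.strip().upper().split(".")[0]


def exhibit_profile(doc_types: list[str]) -> Counter:
    """Count exhibit families among documentFormatFiles[].type values."""
    return Counter(
        exhibit_family(t) for t in doc_types if (t or "").strip().upper().startswith("EX-")
    )


def looks_like_narrow_amendment(form_type: str, profile: Counter) -> bool:
    """Heuristic from the text: an S-4/A with only consents and/or a fee exhibit."""
    return form_type == "S-4/A" and set(profile) <= {"EX-23", "EX-FILING FEES"}
```

Splitting on the first dot keeps EX-2 and EX-21 distinct, so a subsidiaries list is never miscounted as a merger agreement.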
Each record includes the structured metadata envelope and every textual document from the original EDGAR submission: the primary S-4 or S-4/A registration statement, every exhibit filed on the submission (predominantly .htm, with occasional .txt, .pdf, or .xfd form-based filings), and the inline-XBRL filing-fee exhibit where present. SGML document wrappers are preserved intact, so the <TYPE>, <SEQUENCE>, <FILENAME>, and <DESCRIPTION> tags remain available for parsing and for correlating each file with its documentFormatFiles[] entry. The metadata envelope exposes issuer identifiers, co-registrant affiliations, SIC classification, tickers, file numbers, and exhaustive per-document URLs, so any externally hosted asset can be re-fetched deterministically from EDGAR.
Three categories of content are systematically omitted from each record. First, image files (type: GRAPHIC, typically .jpg or .gif, used for signature blocks, organization charts, deal-structure diagrams, auditor and fairness-opinion logos, and similar visual artifacts referenced from the HTML body) are enumerated in documentFormatFiles[] but are not placed in the ZIP. Second, the EDGAR complete-submission .txt bundle — the concatenated SGML file that carries every document of the submission in one stream — is enumerated in the envelope and addressable through linkToTxt, but is not included locally, because its content is already available disaggregated as the individual .htm documents. Third, XBRL companion data files (.xsd schemas, _htm.xml instance documents) listed in dataFiles[] are referenced by URL but not bundled; only the .htm carrier of the filing-fee iXBRL exhibit is shipped in the ZIP, because for that exhibit the structured data is embedded directly in the HTML file itself. Content incorporated by reference from other filings — most commonly the acquirer's Exchange Act reports cited in the S-4 body — is not expanded into the record; the S-4 text retains the textual reference, but the referenced filings remain in their own accession records elsewhere in EDGAR.
Although the dataset spans more than three decades, the underlying filing format evolved across that window. Filings from 1994 through the late 1990s were submitted as ASCII/SGML-only documents: the SGML document wrapper that still frames every file today was the entire format, and the body between <TEXT> and </TEXT> was plain text rather than HTML, with tables laid out as fixed-width character grids. HTML bodies became common in the early 2000s as EDGAR accepted HTML submissions, and for roughly the last two decades essentially every S-4 has carried a true HTML body inside the SGML wrapper, with CSS-styled tables, inline image references, and navigation anchors. The SGML preamble itself has remained stable throughout.
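Because the payload may be either HTML or fixed-width plain text depending on era, an extractor can branch on a quick check before choosing a parser. The sketch below is a heuristic under stated assumptions: it looks only for `<html>`, `<body>`, or `<div>` openers, and deliberately ignores `<TABLE>` because older ASCII filings can carry SGML-style table tags around fixed-width text. `body_format` is a hypothetical helper name.

```python
import re

# Tags that indicate a true HTML body; <TABLE> is intentionally not listed.
HTML_MARKER = re.compile(r"<\s*(html|body|div)\b", re.IGNORECASE)


def body_format(payload: str) -> str:
    """Classify the content between <TEXT> and </TEXT> as 'html' or 'text'."""
    return "html" if HTML_MARKER.search(payload) else "text"
```

Payloads classified as `text` are better handled with column-position logic than with an HTML table parser.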
Two regulatory shifts materially changed the record's content. First, amendments to Regulation S-K Item 601(b)(2) and related guidance now permit redaction of confidential commercial information from material-contract exhibits (EX-2 and EX-10) without a formal confidential-treatment request; S-4 filings after the rule change frequently contain redacted merger agreements with explicit [***] or bracketed redaction markers in the exhibit text. Second, and more consequential for structured extraction, the Commission adopted Rule 408 and amended Rule 411 to require filing-fee tables in a structured, inline-XBRL format for most fee-bearing registration statements including Form S-4, phased in from 2022 by filer class. Filings before the compliance date carry a free-form "Calculation of Registration Fee" table inside the primary document; filings after carry a separate EX-FILING FEES exhibit whose .htm body is an inline-XBRL document with ix:, xbrli:, dei:, and ffd: namespaces encoding each fee line-item as structured facts. The iXBRL fee exhibit is the single most reliably machine-parseable component of modern S-4 records.
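For a quick look at what the fee exhibit contains, the tagged facts can be scanned with the standard-library HTML parser. This is a lightweight sketch, not a substitute for a full iXBRL processor: it returns the rendered text of each fact and ignores scaling, sign, and format attributes, and it assumes fee facts are tagged with `ix:nonFraction` or `ix:nonNumeric` elements whose `name` begins with `ffd:`. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

IX_FACT_TAGS = ("ix:nonfraction", "ix:nonnumeric")  # HTMLParser lowercases tag names


class FeeFactParser(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.facts = []   # (name, contextRef, rendered text) for ffd: facts
        self._open = []   # stack of facts currently being read (handles nesting)

    def handle_starttag(self, tag, attrs):
        if tag in IX_FACT_TAGS:
            attr = dict(attrs)  # attribute names arrive lowercased
            self._open.append(
                {"name": attr.get("name") or "", "context": attr.get("contextref") or "", "text": []}
            )

    def handle_data(self, data):
        for fact in self._open:
            fact["text"].append(data)

    def handle_endtag(self, tag):
        if tag in IX_FACT_TAGS and self._open:
            fact = self._open.pop()
            if fact["name"].startswith("ffd:"):
                self.facts.append((fact["name"], fact["context"], "".join(fact["text"]).strip()))


def extract_fee_facts(document_text: str) -> list[tuple[str, str, str]]:
    """Collect ffd:-prefixed inline XBRL facts from an EX-FILING FEES file."""
    parser = FeeFactParser()
    parser.feed(document_text)
    parser.close()
    return parser.facts
```

The function accepts the file as stored, SGML wrapper included, since the wrapper lines contain no `ix:` elements. Numeric values come back as displayed strings (for example with thousands separators), so any arithmetic needs a separate normalization step.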
Several structural variants recur across the dataset and are worth recognizing explicitly.
- [...] metadata.json when no exhibits are re-filed.
- [...] -01, -02). The entities[] array captures every co-filer with its own CIK, SIC, and state of incorporation.
- [...] EX-10.x and EX-99.x.
- [...] EX-99.x series in place of proxy-card materials, and often omit the EX-8.x tax opinion when the exchange is taxable.

Several nuances materially affect downstream extraction.
- [...] tm2524487d9_ex23-1.htm, d37576dex231.htm, flyx-ex23_1.htm are all EX-23.1 consents in different filings). Use the SGML <TYPE> tag or the documentFormatFiles[].type value.
- [...] <DOCUMENT>, <TYPE>, <SEQUENCE>, <FILENAME>, <DESCRIPTION>, and <TEXT> lines as non-HTML content before the <HTML> payload begins.
- [...] EX-3.1, EX-3.3, and EX-3.7 without intervening numbers when the primary document's exhibit index enumerates more exhibits than are being filed on the instant submission, with the omitted ones incorporated by reference from prior filings. Gaps do not indicate missing documents.
- EX-FILING FEES requires an iXBRL-aware parser. Treating it as plain HTML yields only the rendered table text, not the structured facts (security class, proposed maximum aggregate offering price, fee rate, total fee paid). Use an XBRL/iXBRL parser that recognizes the ffd: namespace.
- [...] accessionNo, but all amendments of a given registration share the same entities[].fileNo with the original S-4. Sequence-of-amendments reconstruction must therefore join on file number; the description field's [Amend] marker and the formType value are the direct indicators of amendment status on any individual record.
- [...] [***] bracketed redaction markers inside merger agreements and ancillary contracts; extracted exhibit text will faithfully preserve these markers rather than the underlying confidential values.
- [...] <img> reference in the HTML body (signature images, organization charts, deal-structure diagrams, fairness-opinion logos) will fail to resolve when rendered locally. The underlying assets remain fetchable via the documentUrl entries of the corresponding documentFormatFiles[] objects.

The Form S-4 registrant is the entity issuing securities as consideration in a business-combination transaction. Depending on the structure, that is:
The target is not the filer, even though much of the prospectus describes the target's business, risk factors, and financials. The target is the counterparty whose shareholders will receive the acquirer's securities. Target information appears in the S-4 because Rule 145 treats the solicitation of target-shareholder consent as an "offer" of the acquirer's securities to those shareholders, and Section 5 of the Securities Act requires the entity offering and selling securities to register them. Where the target also needs a shareholder vote, the target typically files its own Schedule 14A proxy materials, or the S-4 is structured as a joint proxy statement/prospectus covering both votes.
Typical filers include domestic operating-company acquirers, holding-company reorganizers, SPACs consummating de-SPAC business combinations, and foreign private issuers using S-4 where F-4 eligibility is not met or where a domestic subsidiary is the immediate issuer. Co-registrants routinely appear on one S-4 when subsidiary guarantors of registered debt securities must themselves register, or when a new parent and its merging subsidiaries are all named issuers.
Filing is event-driven, not periodic. A new S-4 is prepared once the registrant has committed to a transaction in which it will issue registrable securities. Typical triggers:
Purely cash acquisitions do not trigger S-4. With no securities offered to target holders, those deals are disclosed, if at all, under Exchange Act tender-offer or proxy rules rather than Securities Act registration.
Form S-4/A is a pre-effective amendment (or, less commonly, a post-effective amendment). Because S-4s are almost always reviewed by the Division of Corporation Finance, amendments are the norm, not the exception. A new S-4/A is typically filed when:
Multiple S-4/A amendments per transaction are routine; contested or complex deals can produce five or more before effectiveness.
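Since each amendment is its own record, rebuilding the chain for one registration means grouping records by file number and ordering them by filing time. The sketch below assumes each metadata.json has been loaded into a dict with `filedAt` as an ISO 8601 string carrying a numeric offset and `entities[].fileNo` populated; `registration_key` and `amendment_chains` are hypothetical helper names.

```python
from collections import defaultdict
from datetime import datetime


def registration_key(metadata: dict):
    """Base file number of the first entity that has one, e.g. '333-289994'."""
    for entity in metadata.get("entities", []):
        file_no = (entity.get("fileNo") or "").strip()
        if file_no:
            # Drop any co-registrant suffix such as '-01'.
            return "-".join(file_no.split("-")[:2])
    return None


def amendment_chains(records: list[dict]) -> dict[str, list[dict]]:
    """Group filings by registration and sort each group chronologically."""
    chains = defaultdict(list)
    for metadata in records:
        key = registration_key(metadata)
        if key:
            chains[key].append(metadata)
    for filings in chains.values():
        filings.sort(key=lambda md: datetime.fromisoformat(md["filedAt"]))
    return dict(chains)
```

Within each chain, the `formType` sequence would normally read S-4 followed by zero or more S-4/A entries; a chain that starts with S-4/A suggests the original filing sits in a month not yet loaded.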
The S-4 must be declared effective before the acquirer may issue the registrable securities. Because the S-4 almost always carries the joint proxy statement/prospectus, the effectiveness date also sets the earliest date for mailing definitive proxy materials. State-law meeting notice and Regulation 14A timing then govern the vote, and closing follows shareholder approval and satisfaction of regulatory conditions (HSR clearance, foreign competition approvals, sector-specific approvals). The effectiveness-to-closing window typically runs four to eight weeks, longer where regulators delay or the deal is contested.
Form S-4 sits at the intersection of two regimes: Securities Act registration of newly issued shares and Exchange Act solicitation of shareholders who must vote or tender. Depending on deal structure, an M&A transaction may trigger one, both, or neither of those regimes. The comparisons below isolate the single event or condition that makes S-4 the right filing rather than each adjacent form.
Form S-1 is the general Securities Act registration used for IPOs, follow-ons, and resale registrations where shares are sold for cash. S-4 registers shares issued as merger consideration, with the target's shareholders receiving stock in exchange for their old stock rather than paying cash. The trigger is the nature of the consideration: cash offering uses S-1, stock-for-stock business combination uses S-4. Pricing ranges, underwriter syndicates, and use-of-proceeds live in S-1; exchange ratios, fairness opinions, and merger-consideration mechanics live in S-4.
F-4 is the S-4 equivalent when the issuer of the consideration securities qualifies as a foreign private issuer under Rule 405. The disclosure architecture is parallel, but F-4 permits IFRS financials without U.S. GAAP reconciliation and accommodates home-country governance practice. The trigger is solely the FPI status of the share issuer, not the target's domicile: a U.S. acquirer issuing shares to foreign targets files S-4; a foreign private issuer issuing shares to U.S. targets files F-4.
Form S-3 is a shelf registration available to seasoned issuers meeting float and reporting tests, used for transaction-agnostic, recurring capital raises taken down via 424(b) supplements. S-4 is a one-off, deal-specific document that goes effective once and is consumed in a single business combination. S-3 cannot register merger consideration; the presence of a named target, exchange ratio, and shareholder vote forces S-4.
Schedule 14A is the Exchange Act proxy form used when shareholders are solicited for a vote but no new registered securities are issued to them. The boundary is whether Securities Act registration is required: cash mergers require a DEFM14A with no S-4, while stock-for-stock deals require an S-4 whose combined proxy statement/prospectus absorbs the target's 14A function. If the target shareholders are receiving registered stock, the vote disclosure lives inside the S-4, not in a standalone 14A.
Schedule TO is the tender-offer statement required under Section 14(d) / Regulation 14D. Pure cash tender offers generate only a Schedule TO. Exchange offers paying newly registered securities generate both: Schedule TO for the tender-offer mechanics (bidder identity, conditions, withdrawal rights) and a co-filed S-4 for registration of the share consideration. Neither substitutes for the other. Going-private deals add a Schedule 13E-3 alongside whichever of S-4 or 14A applies.
Form 425 is the wrapper for written communications about a business combination made under Rules 165 and 425, covering press releases, investor decks, employee Q&As, and scripts distributed before S-4 effectiveness; it also serves as Rule 14a-12 soliciting material. 425 and S-4 are complementary, not alternatives: 425 captures the running stream of pre-effectiveness communications (dozens of short filings per deal), while S-4 is the single long registration document that ultimately goes effective.
Form 8-K time-stamps material events; S-4 registers and discloses. Item 1.01 marks signing of the merger agreement and starts the S-4 drafting clock; Item 2.01 marks closing and ends the S-4's useful life because the consideration shares have been issued. Item 9.01 typically attaches the merger agreement as Exhibit 2.1. Use 8-Ks for trigger dates and agreement text; use S-4 for the prospectus, pro forma financials, and fairness opinions.
Rule 424(b) filings document individual takedowns from an effective S-3 (or S-1) shelf, carrying pricing and final terms without re-filing the base registration. S-4 does not use 424(b) for takedowns: it reaches effectiveness through pre-effective S-4/A amendments, and consideration is fixed by the merger agreement's exchange ratio rather than a book-built price. The final proxy statement/prospectus is typically filed once under Rule 424(b)(3) after effectiveness, but no series of pricing supplements follows. Track shelf takedowns via 424(b); track iterative merger disclosure via the S-4/A series.
S-4 is the only SEC filing that simultaneously (a) registers newly issued shares under the Securities Act, (b) delivers a prospectus to target shareholders, and (c) solicits the target (and often acquirer) vote in stock-for-stock deals. S-1 and S-3 register without soliciting; Schedule 14A solicits without registering; Schedule TO governs tender mechanics without registering the exchange consideration; Form 425 and 8-K communicate and announce but do neither. Whenever registered securities are used as consideration in a business combination, S-4 is the correct primary source; for cash deals, going-private transactions, pure tender offers, or capital raises, an adjacent dataset above is correct instead.
Form S-4 filings bundle a negotiated merger agreement, deal chronology, fairness opinions, pro forma financials, tax and legality opinions, antitrust disclosure, and shareholder-vote mechanics into one registered prospectus. Each professional group below pulls a specific slice.
Coverage and M&A associates use S-4s as a precedent library. They extract exchange ratios, fixed and floating collars, and unaffected-price premia from "The Merger" and "Background of the Merger," and calibrate valuation work against the DCF ranges, selected-companies multiples, and premia-paid analyses disclosed in the "Opinion of Financial Advisor" section. Article I adjustment and walk-away provisions in EX-2.1 feed structuring templates.
Deal and disclosure counsel benchmark EX-2.1 across transactions: reps and warranties, interim covenants, no-shop and go-shop provisions (typically Section 5.03 or 5.04), fiduciary outs, MAE definitions, termination-fee triggers and quantum, and specific-performance clauses. EX-5.1 legality opinions and EX-8.1 Section 368(a) tax opinions serve as drafting templates, with counsel tracking evolving qualifications. "Risk Factors" and "Litigation Related to the Merger" language is reused when drafting a new S-4.
Arb desks parse Article VI closing conditions, "reasonable best efforts" versus "hell or high water" regulatory covenants, divestiture commitments, financing conditions, and the list in "Regulatory Approvals Required for the Merger." Article VIII termination provisions supply drop-dead dates, reverse termination fees, and ticking-fee mechanics for spread and downside models. Go-shop windows, matching rights, and superior-proposal definitions drive topping-bid probability.
Solicitors and governance advisors track S-4/A amendments for changes in voting mechanics. They work from "The Special Meeting" (record date, quorum, approval thresholds), "Appraisal Rights" or "Dissenters' Rights" statutory notices, and "Interests of Directors and Executive Officers in the Merger" for golden-parachute compensation disclosures and the related Rule 14a-21(c) advisory vote. Outputs include vote recommendations, solicitation strategy, and perfection-of-appraisal guidance.
Sell-side and buy-side analysts rebuild combined-entity economics from the "Unaudited Pro Forma Condensed Combined Financial Information" (Article 11 of Regulation S-X), modeling revenue, EBITDA, leverage, and EPS accretion or dilution. Management projections in "Certain Unaudited Prospective Financial Information" give a rare window into internal forecasts, and synergy disclosures (cost versus revenue, phasing, integration costs) feed models for both parties and sector peers.
In-house corp dev benchmarks termination fees as a percentage of equity value, reverse termination fees for financing or antitrust failure, CVR mechanics (in "Description of CVRs" or a CVR Agreement exhibit), escrow and holdback terms, and retention packages. "Background of the Merger" is used to study process design: auction versus bilateral, don't-ask-don't-waive standstills, and board-meeting sequencing.
Audit and transaction-services teams mine pro forma footnotes for ASC 805 application: intangible identification and amortization, bargain-purchase gains, step acquisitions, and transaction-cost treatment. EX-23.1 and EX-23.2 auditor consents document sign-off on historical financials. Valuation professionals use disclosed purchase price, allocation methodology, and goodwill-to-consideration ratios as comparable inputs.
Finance scholars build large-sample studies of premia, payment mix, announcement returns, and completion rates. Law-and-finance researchers study fiduciary-duty disclosures in "Background of the Merger," Revlon and Unocal implications of deal-protection devices, and the drafting response to Delaware doctrine (appraisal arbitrage, Corwin cleansing, MFW in controller deals). Coverage back to 1994 supports longitudinal work on market-check design, go-shop prevalence, and MAE drafting.
Competition economists and staff attorneys treat S-4s as the public analog to confidential HSR filings. "Regulatory Approvals," antitrust risk factors, and EX-2.1 efforts covenants disclose the parties' own market definitions, overlaps, and contemplated divestitures or behavioral remedies. Post-filing amendments often contain revised market-definition language after second-request negotiations, supporting retrospective review and divestiture-design research.
Deal-tracker platforms extract parties, announcement and effective dates, consideration structure, collars, termination fees, financial and legal advisors, advisor fees (from "Fees and Expenses" in the fairness-opinion section), and vote outcomes. These feed league tables, precedent-transaction databases, and fee-benchmarking products.
Deal reporters use S-4s and S-4/A amendments for negotiation chronology in "Background of the Merger," Party A and Party B references to alternative bidders, executive compensation on departure, merger-litigation settlements with supplemental disclosures, and revised fairness-opinion inputs. This supports deal post-mortems and coverage of contested transactions.
Teams building deal-term extractors and retrieval tools pair the prospectus narrative with the executed EX-2.1 to train models on provision extraction, deal-term classification, and closing-risk QA. S-4/A amendments supply supervision signals for revision tracking across a deal's life.
Concrete workflows anchored to specific S-4 exhibits, sections, and metadata fields.
M&A associates and deal counsel extract every EX-2.1 (and its amendments EX-2.2, EX-2.3) from S-4 filings in a target SIC code over the last five years, strip the SGML wrapper, and load the clean text into a clause-tagged repository. The output is a searchable precedent library for no-shop and go-shop language (typically Section 5.03 or 5.04), MAE carve-outs, fiduciary outs, and specific-performance provisions, used at first-draft time on new deals.
Event-driven desks ingest each month's ZIP, filter by formType in (S-4, S-4/A), and parse Article VI (closing conditions) and Article VIII (termination) from EX-2.1 plus the "Regulatory Approvals Required for the Merger" section of the primary document. The pipeline emits a daily watch list with drop-dead date, termination-fee quantum, reverse termination fee, and the regulatory-efforts standard ("reasonable best efforts" vs. "hell or high water") scored for downside and topping-bid risk.
Deal-data vendors walk every S-4 filed in a calendar year, locate each "Opinion of [Company]'s Financial Advisor" section in the primary document and the corresponding EX-99.x fairness opinion, and extract advisor name, fee structure (from "Fees and Expenses"), DCF ranges, selected-companies multiples, and premia-paid comparables. The output is an annual league table showing, for each advisor, deal count, aggregate deal value, and fee economics.
Sell-side equity analysts pull the "Unaudited Pro Forma Condensed Combined Financial Information" section from each S-4 covering a peer-group acquirer, together with "Certain Unaudited Prospective Financial Information" (management cases) and the synergy disclosures in "The Merger." These feed per-deal accretion/dilution models and sector-level synergy-phasing benchmarks.
Tax counsel aggregates every EX-8.1 across reorganization S-4s, indexed by acquirer jurisdiction and deal structure (forward triangular, reverse triangular, share exchange). The collection becomes a drafting bank for Section 368(a) qualifications, "should" vs. "will" opinion-level tracking, and representation-letter carve-outs.
Competition economists build a panel by joining entities[].fileNo across S-4 and S-4/A records to reconstruct each deal's amendment chain, then diff the "Regulatory Approvals" and antitrust risk-factor sections across successive amendments. Cross-referencing with closing (8-K Item 2.01) or abandonment (8-K Item 1.02) yields a labeled dataset of market-definition language and divestiture commitments for deals that did versus did not clear second-request review.
Data engineers extract EX-21.1 from every S-4 record, parse the subsidiary lists (entity name, jurisdiction of organization) for both acquirer and target, and merge with entities[] CIK and SIC to maintain a corporate-family graph that updates at each new filing. The output feeds KYC, sanctions, and exposure-aggregation tooling.
NLP teams build paired training data by aligning primary-document prospectus narrative (the "The Merger" and "Background of the Merger" sections) with the corresponding EX-2.1 clauses, and by using S-4/A revisions as supervision for clause-level diff tasks. The result is a fine-tuning corpus for deal-term extraction, MAE-carve-out classification, and closing-condition question answering.
Proxy solicitors and compensation consultants parse "Interests of Directors and Executive Officers in the Merger" (golden-parachute tables, Rule 14a-21(c) advisory votes on golden-parachute compensation) and "The Special Meeting" (record date, quorum, approval thresholds) across recent S-4s. Outputs feed vote-recommendation memos, parachute-magnitude peer reports, and appraisal-rights client guidance.
Capital-markets teams target the EX-FILING FEES exhibit on post-2022 filings, parse the ffd: namespace facts (security class, proposed maximum aggregate offering price, fee rate, total fee paid), and produce aggregate Securities Act registration-fee tallies by filer, industry, and month, bypassing the unstructured fee tables that precede the rule.
Shell-company S-4 filings are identified by the presence of sponsor-support and PIPE-subscription agreements in EX-10.x combined with target-only historical financials. Surveillance teams filter the monthly ZIP on those exhibit patterns to produce a de-SPAC pipeline report with sponsor identity, PIPE size, and earn-out structure for each transaction.
Corp dev teams extract the "Description of CVRs" section of the primary document and any CVR Agreement filed as an EX-10.x or EX-99.x exhibit across deals where contingent value rights are part of the consideration. The library of milestone definitions, payment caps, and dispute mechanisms is reused when structuring biotech and pharma acquisitions.
Journalists and deal-litigation teams reconstruct the amendment sequence by joining records on entities[].fileNo, then diff successive primary documents to surface added "Litigation Related to the Merger" disclosures, revised management projections, updated fairness-opinion inputs, and supplemental disclosures settling stockholder suits. The output is a timeline of disclosure changes for each contested transaction.
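The amendment-diff workflows above (regulatory-approval language, litigation disclosures, revised projections) can be sketched roughly as follows. This is a minimal illustration, not a production differ: it assumes each version of the primary document has already been reduced to plain text and ordered by filing date, and the function names added_lines and disclosure_timeline are hypothetical helpers introduced here.

```python
import difflib


def added_lines(old_text: str, new_text: str) -> list[str]:
    """Return lines present in new_text that were inserted or rewritten
    relative to old_text, in document order."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    added: list[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" = brand-new lines; "replace" = lines that changed.
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added


def disclosure_timeline(versions: list[tuple[str, str]]) -> list[tuple[str, list[str]]]:
    """versions is a date-ordered list of (label, plain_text) pairs, e.g.
    the original S-4 followed by each S-4/A. Returns, for every version
    after the first, the lines it added relative to its predecessor."""
    timeline = []
    for (_, prev_text), (label, text) in zip(versions, versions[1:]):
        timeline.append((label, added_lines(prev_text, text)))
    return timeline
```

Line-level diffs are sensitive to re-wrapped paragraphs, so real pipelines would likely normalize whitespace or diff at the sentence or section level first.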
The Form S-4 Files Dataset is available through three access modes: a JSON index API for programmatic discovery, a full archive download, and per-container monthly downloads. Containers follow a monthly partition pattern of YYYY/YYYY-MM.zip and cover filings from January 1994 to present, with daily updates. Image files are excluded from each container archive.
Dataset Index JSON API: https://api.sec-api.io/datasets/form-s4-files.json
Returns dataset metadata and the full list of monthly ZIP containers with their download URLs, sizes, record counts, and last-updated timestamps. Use this endpoint to monitor which containers changed in the most recent refresh and decide which to re-download. This endpoint does not require an API key.
Example response:
{
  "datasetId": "1f13365b-9ae0-68e2-a0f4-6e8e71978b6e",
  "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-s4-files.zip",
  "name": "Form S-4 Files Dataset",
  "updatedAt": "2026-04-23T03:02:36.235Z",
  "earliestSampleDate": "1994-01-01",
  "totalRecords": 386971,
  "totalSize": 13581316022,
  "formTypes": ["S-4", "S-4/A"],
  "containerFormat": "ZIP",
  "fileTypes": ["TXT", "JSON", "HTML", "PDF", "XFD"],
  "containers": [
    {
      "downloadUrl": "https://api.sec-api.io/datasets/form-s4-files/2026/2026-04.zip",
      "key": "2026/2026-04.zip",
      "size": 52341890,
      "records": 412,
      "updatedAt": "2026-04-23T03:02:36.235Z"
    }
  ]
}
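One way to use the index for change monitoring is to compare each container's updatedAt against the time of your last sync. The sketch below assumes the response shape shown in the example above; fetch_index, parse_ts, and changed_containers are hypothetical helper names, and how you persist the last-sync time is left open.

```python
import json
import urllib.request
from datetime import datetime, timezone

INDEX_URL = "https://api.sec-api.io/datasets/form-s4-files.json"


def fetch_index(url: str = INDEX_URL) -> dict:
    """Download and decode the dataset index (no API key needed)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def parse_ts(value: str) -> datetime:
    """Parse a timestamp such as 2026-04-23T03:02:36.235Z."""
    # Older Python versions reject a trailing "Z", so swap in an offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def changed_containers(index: dict, last_sync: datetime) -> list[dict]:
    """Return the containers whose updatedAt is later than last_sync."""
    return [
        c for c in index.get("containers", [])
        if parse_ts(c["updatedAt"]) > last_sync
    ]


# Example usage (last_sync must be timezone-aware):
#   index = fetch_index()
#   cutoff = datetime(2026, 4, 22, tzinfo=timezone.utc)
#   for c in changed_containers(index, cutoff):
#       print(c["key"], c["records"], c["size"])
```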
Download Entire Dataset: https://api.sec-api.io/datasets/form-s4-files.zip?token=YOUR_API_KEY
Downloads the complete dataset as a single ZIP archive covering all S-4 and S-4/A filings from 1994 to present. This endpoint requires an API key.
Download Single Container: https://api.sec-api.io/datasets/form-s4-files/2026/2026-04.zip?token=YOUR_API_KEY
Downloads one monthly container ZIP instead of the full archive. Container paths follow the YYYY/YYYY-MM.zip pattern and can be read directly from the downloadUrl field in the index JSON. This endpoint requires an API key.
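As a rough sketch, a container URL can be assembled from a key and an API key, then streamed to disk. The base URL and the token query parameter are taken from the examples above; container_url and download_container are hypothetical helper names, and in practice reading downloadUrl from the index JSON is the safer source for the path.

```python
import re
import shutil
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.sec-api.io/datasets/form-s4-files"
# Matches keys such as 2026/2026-04.zip (months 01 through 12).
KEY_PATTERN = re.compile(r"^(\d{4})/(\d{4})-(0[1-9]|1[0-2])\.zip$")


def container_url(key: str, token: str) -> str:
    """Build the download URL for one monthly container key."""
    match = KEY_PATTERN.match(key)
    if not match or match.group(1) != match.group(2):
        raise ValueError(f"not a YYYY/YYYY-MM.zip key: {key!r}")
    return f"{BASE_URL}/{key}?{urlencode({'token': token})}"


def download_container(key: str, token: str, dest_path: str) -> None:
    """Stream the container ZIP to dest_path without loading it in memory."""
    with urllib.request.urlopen(container_url(key, token)) as resp, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)


# Example usage:
#   download_container("2026/2026-04.zip", "YOUR_API_KEY", "2026-04.zip")
```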
The dataset covers Form S-4 and Form S-4/A, the Securities Act of 1933 registration statement for business combinations and its pre-effective (or, less commonly, post-effective) amendment variant. The formType field on every metadata.json envelope will be either S-4 or S-4/A.
One record is a single EDGAR submission of Form S-4 or S-4/A, keyed by its SEC accession number. Each record is a folder containing the primary registration statement, every filed exhibit (merger agreement, charters, legality and tax opinions, material contracts, auditor consents, fairness opinions, filing-fee exhibit), and a metadata.json envelope. The record unit is the filing, not the transaction: a single business combination typically generates one initial S-4 plus multiple S-4/A amendments, each of which is an independent record.
The filer is the entity issuing securities as consideration in a business combination — typically the acquirer in a stock-for-stock merger, the offeror in a securities exchange offer, or a newly formed holding company in a reorganization. The target is not the filer, even though much of the prospectus describes the target's business and financials. Multi-registrant S-4s are common when subsidiary guarantors co-register debt securities or when a new parent files alongside its merging subsidiaries.
F-4 is the S-4 equivalent when the issuer of the consideration securities qualifies as a foreign private issuer under Rule 405. The disclosure architecture is parallel, but F-4 permits IFRS financials without U.S. GAAP reconciliation. The trigger is the FPI status of the share issuer, not the target's domicile: a U.S. acquirer issuing shares to a foreign target files S-4, while a foreign acquirer issuing shares to a U.S. target files F-4. F-4 filings are in a separate dataset.
Form S-4/A is an amendment to a previously filed S-4, typically filed to respond to SEC staff comment letters, to incorporate updated financial statements, or to reflect changed deal terms. The record format is identical to an S-4, and the description field carries an [Amend] marker. All amendments of a given registration share the same entities[].fileNo as the original S-4, so amendment-chain reconstruction joins on file number rather than accession number. Amendments can range from fully restated prospectuses with refreshed exhibit sets to two-file records containing only an updated primary document and metadata.json.
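A minimal sketch of that file-number join is shown below. The entities[].fileNo and formType fields are described above; the filedAt field used for ordering is an assumption about the metadata.json envelope, and amendment_chains is a hypothetical helper name.

```python
from collections import defaultdict


def amendment_chains(records: list[dict]) -> dict[str, list[dict]]:
    """Group metadata.json envelopes by registration file number.

    Each value is one chain: the original S-4 plus its S-4/A amendments,
    sorted by the (assumed) filedAt field. A multi-registrant filing can
    carry several file numbers and then appears in several chains.
    """
    chains: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        # Use a set so a file number repeated across entities counts once.
        file_nos = {
            e.get("fileNo") for e in rec.get("entities", []) if e.get("fileNo")
        }
        for file_no in file_nos:
            chains[file_no].append(rec)
    for chain in chains.values():
        # ISO-8601 strings sort chronologically when compared as text.
        chain.sort(key=lambda r: r.get("filedAt", ""))
    return dict(chains)
```

Joining on accession number would not work here, because every amendment receives its own accession number.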
Image files are not included in the container ZIPs. Files of type GRAPHIC (typically .jpg or .gif, used for signature blocks, organization charts, deal-structure diagrams, and fairness-opinion logos) are enumerated in the documentFormatFiles[] array of metadata.json for completeness but are not placed in the archive. The EDGAR complete-submission .txt bundle is likewise enumerated but not bundled. Consumers that need either category can fetch the files directly from EDGAR using the documentUrl listed for each entry.
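To see which image files a record references but does not ship, the envelope can be filtered on the GRAPHIC type. This is a small sketch under the field names described above (documentFormatFiles[], type, documentUrl); graphic_urls is a hypothetical helper name.

```python
def graphic_urls(metadata: dict) -> list[str]:
    """Return the EDGAR URLs of GRAPHIC files listed in a metadata.json
    envelope. These files are enumerated but not present in the ZIP."""
    return [
        f["documentUrl"]
        for f in metadata.get("documentFormatFiles", [])
        if f.get("type") == "GRAPHIC" and f.get("documentUrl")
    ]
```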
The definitive merger agreement (or share-exchange agreement) is filed as the EX-2.x exhibit, identified by documentFormatFiles[].type equal to EX-2.1, EX-2.2, and so on. Do not rely on file names — preparer stubs such as d37576dex21.htm or tm2524487d9_ex2-1.htm are not standardized — use the type field in the metadata envelope or the SGML <TYPE> tag inside the document. The exhibit may contain [***] bracketed redaction markers where confidential commercial terms have been redacted under Item 601(b)(2) of Regulation S-K.
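Selecting exhibits by type rather than file name might look like the sketch below. It assumes the documentFormatFiles[].type values follow the EX-2.N form described above; merger_agreement_files and sgml_type are hypothetical helper names, and the <TYPE> parser only reads the first tag it finds.

```python
import re
from typing import Optional

# EX-2.1, EX-2.2, ... but not EX-21.1 (subsidiaries) or EX-23.1 (consents).
EX2_PATTERN = re.compile(r"^EX-2\.\d+$")
# The SGML header line looks like: <TYPE>EX-2.1
TYPE_TAG = re.compile(r"<TYPE>([^\r\n<]+)")


def merger_agreement_files(metadata: dict) -> list[dict]:
    """Return documentFormatFiles[] entries whose type is an EX-2.x exhibit."""
    return [
        f for f in metadata.get("documentFormatFiles", [])
        if EX2_PATTERN.match(f.get("type") or "")
    ]


def sgml_type(document_text: str) -> Optional[str]:
    """Read the exhibit type from the first SGML <TYPE> tag, if present."""
    match = TYPE_TAG.search(document_text)
    return match.group(1).strip() if match else None
```

Anchoring the pattern at both ends matters: a looser prefix test such as startswith("EX-2") would also pick up EX-21.1 and EX-23.1.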
The dataset is refreshed daily and covers filings from January 1994 (the start of EDGAR's mandatory electronic-filing phase-in) through the present. Recent monthly cadence sits around thirty S-4 and S-4/A filings per month based on November 2025 sampling. Containers are organized as monthly ZIPs in the pattern YYYY/YYYY-MM.zip, so incremental ingestion runs naturally at the monthly container level.
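For month-level incremental ingestion, the container keys can be enumerated directly from the YYYY/YYYY-MM.zip pattern. The sketch below is one possible approach: month_keys and keys_to_ingest are hypothetical helpers, and always re-fetching the latest month is an assumption motivated by the daily refresh (the updatedAt values in the index JSON remain the authoritative signal for which containers changed).

```python
from datetime import date
from typing import Iterator


def month_keys(start: date, end: date) -> Iterator[str]:
    """Yield container keys for every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield f"{year:04d}/{year:04d}-{month:02d}.zip"
        month += 1
        if month > 12:
            year, month = year + 1, 1


def keys_to_ingest(
    already_ingested: set[str],
    end: date,
    start: date = date(1994, 1, 1),
) -> list[str]:
    """Return keys not yet ingested, plus the latest month, which is
    assumed to still be receiving daily updates."""
    keys = list(month_keys(start, end))
    return [k for k in keys if k not in already_ingested or k == keys[-1]]
```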