Form S-4 Files Dataset

The Form S-4 Files Dataset is a complete archive of every Form S-4 and Form S-4/A registration statement filed on EDGAR, the Securities Act registration vehicle used whenever a U.S. registrant issues securities as consideration in a business combination — stock-for-stock mergers, share-exchange offers, holding-company reorganizations, de-SPAC business combinations, and debt-for-debt exchange offers. One record is a single EDGAR submission keyed by accession number, comprising the primary registration statement, every filed exhibit (merger agreement, charters, legality and tax opinions, material contracts, auditor consents, fairness opinions, filing-fee exhibit), and a metadata.json envelope that identifies filers, co-registrants, and per-document roles. The dataset is distributed as monthly ZIP containers following a YYYY/YYYY-MM.zip layout and covers filings from January 1994 through the present, refreshed daily. Filers include operating-company acquirers, holding-company reorganizers, SPACs consummating initial business combinations, and multi-registrant parent-plus-guarantor groups. Because S-4s almost always undergo SEC staff review, the dataset captures both the original S-4 and the full chain of S-4/A pre-effective amendments that trail each transaction through effectiveness.

Update Frequency: Daily
Updated at: 2026-05-19
Earliest Sample Date: 1994-01-01
Total Size: 13.6 GB
Total Records: 387,423
Container Format: ZIP
Content Types: TXT, JSON, HTML, PDF, XFD
Form Types: S-4, S-4/A

Dataset APIs

Programmatically retrieve the full list of dataset archive files, download URLs and dataset metadata.

Dataset Index JSON API

Download the entire dataset as a single archive file.

Download Entire Dataset:

Download a single container file (e.g. monthly archive) from the dataset.

Download Single Container:

Dataset Files

389 files · 13.6 GB
2026-05.zip · 24.1 MB · 331 records
2026-04.zip · 40.1 MB · 582 records
2026-03.zip · 26.8 MB · 355 records
2026-02.zip · 22.1 MB · 340 records
2026-01.zip · 46.1 MB · 517 records
2025-12.zip · 41.4 MB · 707 records
2025-11.zip · 13.8 MB · 247 records
2025-10.zip · 19.3 MB · 318 records
2025-09.zip · 39.2 MB · 561 records
2025-08.zip · 21.2 MB · 364 records
2025-07.zip · 32.9 MB · 394 records
2025-06.zip · 29.4 MB · 365 records
2025-05.zip · 35.4 MB · 407 records
2025-04.zip · 36.7 MB · 462 records
2025-03.zip · 17.6 MB · 258 records
2025-02.zip · 36.2 MB · 398 records
2025-01.zip · 42.5 MB · 462 records
2024-12.zip · 35.0 MB · 330 records
2024-11.zip · 41.6 MB · 476 records
2024-10.zip · 35.3 MB · 373 records
2024-09.zip · 18.8 MB · 263 records
2024-08.zip · 20.8 MB · 318 records
2024-07.zip · 40.0 MB · 422 records
2024-06.zip · 39.2 MB · 466 records
2024-05.zip · 41.0 MB · 535 records
2024-04.zip · 31.8 MB · 359 records
2024-03.zip · 23.0 MB · 384 records
2024-02.zip · 40.7 MB · 418 records
2024-01.zip · 50.8 MB · 657 records
2023-12.zip · 51.0 MB · 549 records
2023-11.zip · 58.6 MB · 646 records
2023-10.zip · 59.3 MB · 657 records
2023-09.zip · 68.3 MB · 887 records
2023-08.zip · 60.7 MB · 663 records
2023-07.zip · 54.4 MB · 586 records
2023-06.zip · 64.4 MB · 804 records
2023-05.zip · 57.0 MB · 716 records
2023-04.zip · 44.3 MB · 595 records
2023-03.zip · 17.2 MB · 242 records
2023-02.zip · 48.0 MB · 568 records
2023-01.zip · 43.9 MB · 438 records
2022-12.zip · 44.7 MB · 543 records
2022-11.zip · 51.8 MB · 675 records
2022-10.zip · 50.7 MB · 536 records
2022-09.zip · 42.4 MB · 427 records
2022-08.zip · 47.1 MB · 583 records
2022-07.zip · 52.0 MB · 593 records
2022-06.zip · 57.5 MB · 502 records
2022-05.zip · 62.8 MB · 793 records
2022-04.zip · 55.2 MB · 693 records
2022-03.zip · 36.4 MB · 524 records
2022-02.zip · 47.9 MB · 584 records
2022-01.zip · 59.4 MB · 515 records
2021-12.zip · 63.6 MB · 606 records
2021-11.zip · 72.3 MB · 581 records
2021-10.zip · 86.9 MB · 904 records
2021-09.zip · 85.8 MB · 879 records
2021-08.zip · 85.6 MB · 1,011 records
2021-07.zip · 91.5 MB · 1,182 records
2021-06.zip · 96.8 MB · 1,121 records
2021-05.zip · 127.1 MB · 1,340 records
2021-04.zip · 55.5 MB · 871 records
2021-03.zip · 61.6 MB · 1,013 records
2021-02.zip · 43.0 MB · 506 records
2021-01.zip · 49.4 MB · 711 records
2020-12.zip · 53.4 MB · 719 records
2020-11.zip · 49.9 MB · 668 records
2020-10.zip · 54.0 MB · 765 records
2020-09.zip · 39.6 MB · 537 records
2020-08.zip · 24.7 MB · 341 records
2020-07.zip · 17.1 MB · 258 records
2020-06.zip · 15.8 MB · 322 records
2020-05.zip · 15.3 MB · 311 records
2020-04.zip · 28.7 MB · 486 records
2020-03.zip · 23.1 MB · 461 records
2020-02.zip · 36.3 MB · 680 records
2020-01.zip · 21.1 MB · 526 records
2019-12.zip · 21.7 MB · 341 records
2019-11.zip · 21.6 MB · 381 records
2019-10.zip · 24.6 MB · 460 records
2019-09.zip · 19.8 MB · 346 records
2019-08.zip · 22.1 MB · 508 records
2019-07.zip · 19.1 MB · 325 records
2019-06.zip · 19.1 MB · 395 records
2019-05.zip · 18.7 MB · 347 records
2019-04.zip · 19.1 MB · 441 records
2019-03.zip · 14.5 MB · 384 records
2019-02.zip · 21.2 MB · 385 records
2019-01.zip · 27.9 MB · 610 records
2018-12.zip · 26.7 MB · 566 records
2018-11.zip · 29.3 MB · 639 records
2018-10.zip · 26.7 MB · 498 records
2018-09.zip · 18.9 MB · 375 records
2018-08.zip · 25.8 MB · 499 records
2018-07.zip · 22.4 MB · 455 records
2018-06.zip · 23.1 MB · 486 records
2018-05.zip · 22.0 MB · 543 records
2018-04.zip · 23.3 MB · 525 records
2018-03.zip · 16.2 MB · 450 records
2018-02.zip · 24.6 MB · 425 records
2018-01.zip · 19.8 MB · 460 records
2017-12.zip · 30.9 MB · 695 records
2017-11.zip · 24.3 MB · 513 records
2017-10.zip · 25.2 MB · 654 records
2017-09.zip · 26.6 MB · 475 records
2017-08.zip · 30.7 MB · 607 records
2017-07.zip · 24.1 MB · 565 records
2017-06.zip · 35.9 MB · 798 records
2017-05.zip · 28.4 MB · 908 records
2017-04.zip · 22.7 MB · 556 records
2017-03.zip · 22.0 MB · 534 records
2017-02.zip · 18.0 MB · 507 records
2017-01.zip · 21.7 MB · 460 records
2016-12.zip · 25.8 MB · 658 records
2016-11.zip · 24.4 MB · 471 records
2016-10.zip · 26.4 MB · 962 records
2016-09.zip · 24.3 MB · 509 records
2016-08.zip · 26.9 MB · 687 records
2016-07.zip · 23.9 MB · 558 records
2016-06.zip · 26.3 MB · 525 records
2016-05.zip · 26.0 MB · 740 records
2016-04.zip · 24.2 MB · 731 records
2016-03.zip · 19.6 MB · 793 records
2016-02.zip · 17.7 MB · 400 records
2016-01.zip · 20.1 MB · 493 records
2015-12.zip · 23.4 MB · 541 records
2015-11.zip · 17.7 MB · 456 records
2015-10.zip · 20.7 MB · 544 records
2015-09.zip · 21.6 MB · 557 records
2015-08.zip · 26.7 MB · 559 records
2015-07.zip · 30.1 MB · 926 records
2015-06.zip · 26.0 MB · 806 records
2015-05.zip · 31.0 MB · 751 records
2015-04.zip · 28.7 MB · 888 records
2015-03.zip · 20.4 MB · 743 records
2015-02.zip · 17.9 MB · 338 records
2015-01.zip · 24.1 MB · 600 records
2014-12.zip · 43.9 MB · 1,412 records
2014-11.zip · 23.0 MB · 509 records
2014-10.zip · 39.6 MB · 804 records
2014-09.zip · 28.0 MB · 1,053 records
2014-08.zip · 32.5 MB · 1,037 records
2014-07.zip · 33.0 MB · 760 records
2014-06.zip · 28.0 MB · 694 records
2014-05.zip · 27.9 MB · 661 records
2014-04.zip · 31.0 MB · 1,195 records
2014-03.zip · 26.0 MB · 891 records
2014-02.zip · 19.3 MB · 482 records
2014-01.zip · 23.5 MB · 625 records
2013-12.zip · 30.2 MB · 940 records
2013-11.zip · 30.2 MB · 939 records
2013-10.zip · 30.3 MB · 742 records
2013-09.zip · 40.8 MB · 1,436 records
2013-08.zip · 36.4 MB · 938 records
2013-07.zip · 30.7 MB · 683 records
2013-06.zip · 38.1 MB · 1,406 records
2013-05.zip · 26.3 MB · 752 records
2013-04.zip · 40.7 MB · 1,169 records
2013-03.zip · 27.0 MB · 906 records
2013-02.zip · 27.7 MB · 934 records
2013-01.zip · 35.5 MB · 855 records
2012-12.zip · 36.7 MB · 934 records
2012-11.zip · 30.8 MB · 745 records
2012-10.zip · 23.2 MB · 598 records
2012-09.zip · 30.9 MB · 776 records
2012-08.zip · 35.5 MB · 1,106 records
2012-07.zip · 20.0 MB · 558 records
2012-06.zip · 33.7 MB · 974 records
2012-05.zip · 30.1 MB · 767 records
2012-04.zip · 22.5 MB · 764 records
2012-03.zip · 20.8 MB · 612 records
2012-02.zip · 16.6 MB · 412 records
2012-01.zip · 17.9 MB · 638 records
2011-12.zip · 31.3 MB · 1,177 records
2011-11.zip · 25.5 MB · 625 records
2011-10.zip · 33.8 MB · 1,208 records
2011-09.zip · 41.5 MB · 1,319 records
2011-08.zip · 40.7 MB · 939 records
2011-07.zip · 49.8 MB · 1,201 records
2011-06.zip · 48.3 MB · 1,583 records
2011-05.zip · 38.6 MB · 1,537 records
2011-04.zip · 55.8 MB · 2,561 records
2011-03.zip · 42.1 MB · 1,773 records
2011-02.zip · 18.2 MB · 513 records
2011-01.zip · 16.6 MB · 464 records
2010-12.zip · 36.8 MB · 933 records
2010-11.zip · 39.7 MB · 941 records
2010-10.zip · 37.4 MB · 1,351 records
2010-09.zip · 22.9 MB · 609 records
2010-08.zip · 24.6 MB · 812 records
2010-07.zip · 22.8 MB · 694 records
2010-06.zip · 33.4 MB · 1,232 records
2010-05.zip · 37.8 MB · 1,913 records
2010-04.zip · 26.6 MB · 732 records
2010-03.zip · 37.2 MB · 1,333 records
2010-02.zip · 16.3 MB · 434 records
2010-01.zip · 25.4 MB · 589 records
2009-12.zip · 32.7 MB · 812 records
2009-11.zip · 27.6 MB · 549 records
2009-10.zip · 58.1 MB · 1,066 records
2009-09.zip · 40.6 MB · 1,026 records
2009-08.zip · 35.5 MB · 874 records
2009-07.zip · 38.3 MB · 963 records
2009-06.zip · 24.9 MB · 722 records
2009-05.zip · 20.1 MB · 480 records
2009-04.zip · 18.3 MB · 405 records
2009-03.zip · 13.1 MB · 454 records
2009-02.zip · 11.8 MB · 211 records
2009-01.zip · 15.4 MB · 355 records
2008-12.zip · 26.5 MB · 515 records
2008-11.zip · 22.7 MB · 613 records
2008-10.zip · 25.6 MB · 920 records
2008-09.zip · 24.0 MB · 714 records
2008-08.zip · 28.8 MB · 998 records
2008-07.zip · 21.0 MB · 516 records
2008-06.zip · 22.8 MB · 500 records
2008-05.zip · 29.3 MB · 1,135 records
2008-04.zip · 20.9 MB · 616 records
2008-03.zip · 19.5 MB · 559 records
2008-02.zip · 18.5 MB · 443 records
2008-01.zip · 21.5 MB · 579 records
2007-12.zip · 30.9 MB · 922 records
2007-11.zip · 18.8 MB · 550 records
2007-10.zip · 37.2 MB · 1,501 records
2007-09.zip · 38.7 MB · 936 records
2007-08.zip · 42.0 MB · 1,334 records
2007-07.zip · 39.8 MB · 1,380 records
2007-06.zip · 39.5 MB · 1,416 records
2007-05.zip · 38.0 MB · 1,028 records
2007-04.zip · 34.7 MB · 1,135 records
2007-03.zip · 34.8 MB · 1,331 records
2007-02.zip · 24.9 MB · 710 records
2007-01.zip · 24.2 MB · 631 records
2006-12.zip · 33.5 MB · 849 records
2006-11.zip · 32.3 MB · 1,005 records
2006-10.zip · 37.7 MB · 1,480 records
2006-09.zip · 30.2 MB · 837 records
2006-08.zip · 32.2 MB · 924 records
2006-07.zip · 31.7 MB · 948 records
2006-06.zip · 39.9 MB · 1,232 records
2006-05.zip · 36.0 MB · 972 records
2006-04.zip · 32.4 MB · 1,356 records
2006-03.zip · 26.5 MB · 1,010 records
2006-02.zip · 33.4 MB · 980 records
2006-01.zip · 28.2 MB · 871 records
2005-12.zip · 35.2 MB · 1,267 records
2005-11.zip · 35.7 MB · 1,143 records
2005-10.zip · 39.3 MB · 1,619 records
2005-09.zip · 33.7 MB · 1,161 records
2005-08.zip · 35.2 MB · 1,362 records
2005-07.zip · 44.4 MB · 1,274 records
2005-06.zip · 48.3 MB · 1,510 records
2005-05.zip · 49.8 MB · 1,767 records
2005-04.zip · 50.4 MB · 2,008 records
2005-03.zip · 27.9 MB · 1,008 records
2005-02.zip · 35.1 MB · 1,091 records
2005-01.zip · 33.3 MB · 1,204 records
2004-12.zip · 43.7 MB · 1,601 records
2004-11.zip · 33.0 MB · 1,343 records
2004-10.zip · 32.7 MB · 1,079 records
2004-09.zip · 45.4 MB · 1,401 records
2004-08.zip · 45.1 MB · 1,416 records
2004-07.zip · 46.9 MB · 1,909 records
2004-06.zip · 55.4 MB · 1,871 records
2004-05.zip · 48.6 MB · 1,906 records
2004-04.zip · 60.0 MB · 2,426 records
2004-03.zip · 46.7 MB · 1,846 records
2004-02.zip · 43.9 MB · 1,589 records
2004-01.zip · 39.6 MB · 1,530 records
2003-12.zip · 46.9 MB · 1,468 records
2003-11.zip · 39.9 MB · 1,225 records
2003-10.zip · 52.1 MB · 1,714 records
2003-09.zip · 49.3 MB · 1,933 records
2003-08.zip · 35.9 MB · 1,264 records
2003-07.zip · 49.7 MB · 2,054 records
2003-06.zip · 45.6 MB · 2,106 records
2003-05.zip · 41.1 MB · 1,384 records
2003-04.zip · 30.5 MB · 1,226 records
2003-03.zip · 20.1 MB · 843 records
2003-02.zip · 24.3 MB · 773 records
2003-01.zip · 30.0 MB · 1,044 records
2002-12.zip · 24.0 MB · 841 records
2002-11.zip · 24.7 MB · 818 records
2002-10.zip · 30.5 MB · 1,047 records
2002-09.zip · 30.1 MB · 1,097 records
2002-08.zip · 35.0 MB · 1,182 records
2002-07.zip · 35.3 MB · 1,374 records
2002-06.zip · 29.9 MB · 954 records
2002-05.zip · 49.6 MB · 1,898 records
2002-04.zip · 32.9 MB · 1,384 records
2002-03.zip · 26.2 MB · 1,083 records
2002-02.zip · 33.6 MB · 1,066 records
2002-01.zip · 34.4 MB · 1,357 records
2001-12.zip · 35.4 MB · 1,495 records
2001-11.zip · 36.5 MB · 1,500 records
2001-10.zip · 44.9 MB · 1,637 records
2001-09.zip · 37.9 MB · 1,443 records
2001-08.zip · 46.1 MB · 1,695 records
2001-07.zip · 35.9 MB · 1,345 records
2001-06.zip · 42.7 MB · 1,720 records
2001-05.zip · 29.2 MB · 1,238 records
2001-04.zip · 37.7 MB · 1,586 records
2001-03.zip · 21.9 MB · 859 records
2001-02.zip · 26.6 MB · 1,064 records
2001-01.zip · 24.7 MB · 1,000 records
2000-12.zip · 28.7 MB · 1,190 records
2000-11.zip · 27.0 MB · 1,095 records
2000-10.zip · 36.3 MB · 1,471 records
2000-09.zip · 26.1 MB · 992 records
2000-08.zip · 38.3 MB · 1,507 records
2000-07.zip · 32.9 MB · 1,319 records
2000-06.zip · 31.8 MB · 1,279 records
2000-05.zip · 42.7 MB · 1,794 records
2000-04.zip · 36.3 MB · 1,399 records
2000-03.zip · 36.8 MB · 1,568 records
2000-02.zip · 41.8 MB · 1,735 records
2000-01.zip · 34.5 MB · 1,318 records
1999-12.zip · 43.9 MB · 1,813 records
1999-11.zip · 42.8 MB · 1,681 records
1999-10.zip · 45.0 MB · 1,824 records
1999-09.zip · 52.4 MB · 2,038 records
1999-08.zip · 55.5 MB · 2,454 records
1999-07.zip · 53.4 MB · 2,437 records
1999-06.zip · 59.4 MB · 2,480 records
1999-05.zip · 47.5 MB · 2,143 records
1999-04.zip · 51.9 MB · 2,338 records
1999-03.zip · 52.2 MB · 1,657 records
1999-02.zip · 48.9 MB · 2,156 records
1999-01.zip · 42.6 MB · 1,676 records
1998-12.zip · 45.4 MB · 1,603 records
1998-11.zip · 53.4 MB · 2,244 records
1998-10.zip · 65.5 MB · 2,071 records
1998-09.zip · 83.0 MB · 3,393 records
1998-08.zip · 80.3 MB · 3,244 records
1998-07.zip · 84.1 MB · 3,629 records
1998-06.zip · 96.1 MB · 4,154 records
1998-05.zip · 72.3 MB · 2,960 records
1998-04.zip · 88.3 MB · 3,726 records
1998-03.zip · 52.2 MB · 2,062 records
1998-02.zip · 60.0 MB · 2,471 records
1998-01.zip · 67.2 MB · 2,580 records
1997-12.zip · 114.1 MB · 3,031 records
1997-11.zip · 72.6 MB · 2,965 records
1997-10.zip · 78.6 MB · 3,235 records
1997-09.zip · 61.3 MB · 2,732 records
1997-08.zip · 54.1 MB · 2,485 records
1997-07.zip · 61.9 MB · 2,362 records
1997-06.zip · 48.4 MB · 2,293 records
1997-05.zip · 49.2 MB · 1,975 records
1997-04.zip · 50.1 MB · 2,178 records
1997-03.zip · 36.1 MB · 1,587 records
1997-02.zip · 29.8 MB · 1,295 records
1997-01.zip · 47.9 MB · 1,787 records
1996-12.zip · 38.0 MB · 1,637 records
1996-11.zip · 32.5 MB · 1,345 records
1996-10.zip · 40.5 MB · 1,644 records
1996-09.zip · 33.0 MB · 1,405 records
1996-08.zip · 36.8 MB · 1,422 records
1996-07.zip · 40.8 MB · 1,755 records
1996-06.zip · 41.3 MB · 1,692 records
1996-05.zip · 31.8 MB · 1,256 records
1996-04.zip · 19.1 MB · 821 records
1996-03.zip · 13.2 MB · 554 records
1996-02.zip · 15.9 MB · 668 records
1996-01.zip · 16.7 MB · 586 records
1995-12.zip · 15.9 MB · 632 records
1995-11.zip · 13.9 MB · 623 records
1995-10.zip · 15.1 MB · 681 records
1995-09.zip · 15.2 MB · 673 records
1995-08.zip · 19.3 MB · 691 records
1995-07.zip · 9.1 MB · 492 records
1995-06.zip · 14.5 MB · 537 records
1995-05.zip · 11.2 MB · 401 records
1995-04.zip · 12.5 MB · 529 records
1995-03.zip · 7.9 MB · 315 records
1995-02.zip · 11.2 MB · 456 records
1995-01.zip · 11.1 MB · 458 records
1994-12.zip · 8.4 MB · 337 records
1994-11.zip · 10.5 MB · 443 records
1994-10.zip · 7.3 MB · 329 records
1994-09.zip · 6.4 MB · 340 records
1994-08.zip · 6.4 MB · 254 records
1994-07.zip · 4.9 MB · 235 records
1994-06.zip · 7.4 MB · 316 records
1994-05.zip · 5.8 MB · 246 records
1994-04.zip · 7.5 MB · 319 records
1994-03.zip · 6.2 MB · 230 records
1994-02.zip · 6.1 MB · 242 records
1994-01.zip · 7.8 MB · 303 records

What This Dataset Contains

The dataset is the complete corpus of Form S-4 and Form S-4/A EDGAR submissions. Form S-4 is the Securities Act of 1933 registration statement used whenever a registrant issues securities as consideration in a business combination: mergers, stock-for-stock acquisitions, reorganizations, share-exchange offers, debt-for-debt exchange offers, and certain going-private transactions structured as combinations. The filing simultaneously performs two jobs — it registers the new securities with the Commission, and, where a shareholder vote is required, it doubles as the combined proxy statement/prospectus delivered to target-company shareholders asked to vote on or tender into the deal. Because of that dual role, a single S-4 typically contains deal-description narrative, historical financial statements of both acquirer and target, pro forma combined financials, risk factors, descriptions of the securities being issued and of the combined entity, and the full text of the merger agreement and ancillary contracts as exhibits. Form S-4/A is the amendment variant, filed to respond to SEC staff comment letters, to incorporate updated financials, or to reflect revised transaction terms. Amendments range from full restatements of the prospectus to narrow updates that touch only a single exhibit or paragraph; the record format is identical in either case.

The dataset spans January 1994, early in EDGAR's phase-in of mandatory electronic filing, through the present, and is distributed as monthly ZIP containers. Recent months carry a few hundred S-4 and S-4/A filings each (247 records in November 2025 and 707 in December 2025, per the file listing above), reflecting both fresh registration statements and the multi-amendment lifecycle typical of SEC review. Form S-4 itself was adopted in 1985 to consolidate earlier business-combination registration forms; S-4 filings from before January 1994 fall outside this dataset.

Content Structure of a Single Record

1. What one record represents

One record is a single EDGAR submission of Form S-4 or Form S-4/A, keyed by its SEC accession number. On disk, the record is one sub-folder inside a monthly ZIP whose name is the eighteen-digit accession in zero-padded, dash-stripped form (for example 000119312525262679/). Each folder contains exactly one metadata.json envelope together with the SGML-wrapped documents that constitute the original submission. The record unit is the filing, not the transaction: a single business combination typically generates an initial S-4 plus one or more S-4/A amendments, each of which is an independent record with its own accession, its own exhibit set, and its own metadata block. Per-folder document counts range from a two-file minimum (the updated primary document plus metadata.json, when an amendment re-files nothing else) up to roughly two dozen documents for deal-heavy filings that carry charters, bylaws, tax and legality opinions, consents, and extensive material-contract exhibits.
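
The folder-name convention above can be sketched as a pair of conversion helpers. This is a minimal illustration, not part of the dataset's tooling; the function names are hypothetical, and the only input assumed is the eighteen-digit form described above.

```python
import re

_DIGITS_18 = re.compile(r"\d{18}")


def folder_to_accession(folder_name: str) -> str:
    """Turn a dash-stripped folder name into the dashed accession number."""
    digits = folder_name.strip("/")
    if not _DIGITS_18.fullmatch(digits):
        raise ValueError(f"expected 18 digits, got {folder_name!r}")
    # The dashed form groups the digits 10-2-6.
    return f"{digits[:10]}-{digits[10:12]}-{digits[12:]}"


def accession_to_folder(accession_no: str) -> str:
    """Turn a dashed accession number into the dash-stripped folder name."""
    digits = accession_no.replace("-", "")
    if not _DIGITS_18.fullmatch(digits):
        raise ValueError(f"expected 18 digits, got {accession_no!r}")
    return digits


print(folder_to_accession("000119312525262679/"))  # 0001193125-25-262679
```

Validating the digit count up front keeps a stray directory name from silently turning into a malformed accession.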

2. Container organization

The dataset is distributed as monthly ZIP containers laid out <year>/<year>-<month>.zip. Inside a monthly ZIP, the top level is a single directory named after the filing month (for example 2025-11/), the second level is one folder per accession number in dash-stripped form, and the third level is the flat set of filing documents plus the metadata.json envelope. All documents sit as siblings inside the accession folder; there is no further sub-structure separating the primary document from the exhibits. The exhibit role of each file is carried in the metadata.json envelope and in the SGML <TYPE> tag of the document itself, not in the directory layout. Bundling is strictly temporal: every accession filed in a given calendar month lands in that month's archive regardless of registrant, industry, or deal size, so the monthly ZIP is the natural unit for incremental ingestion.
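
A minimal sketch of walking that three-level layout with the standard library follows. The helper names are hypothetical, and the code assumes member paths of the shape month/accession/file as described above; anything that does not fit that shape is skipped rather than guessed at.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_accession(member_names):
    """Map accession folder -> list of file names, given ZIP member paths."""
    records = defaultdict(list)
    for name in member_names:
        if name.endswith("/"):
            continue  # explicit directory entry, not a document
        parts = PurePosixPath(name).parts
        if len(parts) != 3:
            continue  # not month/accession/file; skip instead of guessing
        _month, accession_folder, file_name = parts
        records[accession_folder].append(file_name)
    return dict(records)


def list_records(zip_path):
    """Open one monthly archive and group its members by accession."""
    with zipfile.ZipFile(zip_path) as zf:
        return group_by_accession(zf.namelist())
```

Calling `list_records` on a downloaded monthly archive would return one key per filing, which is a convenient unit for incremental ingestion.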

3. The metadata.json envelope

Every accession folder carries exactly one metadata.json file that serves as the structured anchor for the filing. Its top-level scalar fields identify the submission on EDGAR:

  • formType — either S-4 or S-4/A.
  • accessionNo — the canonical dashed accession (for example 0001193125-25-262679); this is the same identifier encoded dash-free in the folder name.
  • filedAt — ISO 8601 timestamp with timezone offset.
  • description — human-readable form label, with [Amend] appended for /A filings.
  • linkToFilingDetails, linkToTxt, linkToHtml — URLs pointing respectively to the primary document, the complete-submission .txt stream, and the EDGAR filing-index page on sec.gov.
  • linkToXbrl — URL to the XBRL instance; frequently an empty string on plain S-4 filings because the form does not carry full financial-statement XBRL, only the structured filing-fee exhibit.
  • id — a thirty-two-character hexadecimal content hash suitable for deduplication.
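
As a rough illustration, the scalar fields listed above can be read with the standard library. The function name and the returned keys are hypothetical, and the sketch assumes filedAt is in a form that `datetime.fromisoformat` accepts (for example `2025-11-03T16:31:05-05:00`, an invented timestamp).

```python
import json
from datetime import datetime


def read_envelope_header(raw_json: str) -> dict:
    """Pull a few identifying scalars out of a metadata.json document."""
    meta = json.loads(raw_json)
    return {
        "accession_no": meta["accessionNo"],
        "is_amendment": meta["formType"] == "S-4/A",
        # filedAt carries a UTC offset, so the result is timezone-aware.
        "filed_at": datetime.fromisoformat(meta["filedAt"]),
        # linkToXbrl is frequently an empty string on plain S-4 filings.
        "has_xbrl_link": bool(meta.get("linkToXbrl")),
    }
```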

Three array fields carry the substantive structure of the submission.

documentFormatFiles[] enumerates every document in the original EDGAR submission — the primary registration statement, every exhibit, any graphics, and the trailing complete-submission .txt. Each entry is an object carrying sequence, size, documentUrl, description, and type. The type field holds the canonical exhibit taxonomy (S-4, S-4/A, EX-8.1, EX-10.11, EX-21.1, EX-23.1, EX-99.4, EX-FILING FEES, GRAPHIC, and so on) and is the authoritative handle for identifying a document's role; the final entry in the array is typically the complete-submission text file with a blank type and no sequence.
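
One way to locate the primary registration statement from this array is sketched below. The helper names are hypothetical, and deriving the local file name from the last path segment of documentUrl is an assumption about how the archive names its files, not something the envelope guarantees.

```python
import posixpath
from urllib.parse import urlparse


def primary_document(meta: dict):
    """Return the documentFormatFiles[] entry whose type equals formType."""
    form_type = meta["formType"]
    for doc in meta.get("documentFormatFiles", []):
        if doc.get("type") == form_type:
            return doc
    return None  # no entry typed S-4 or S-4/A was found


def local_name(doc: dict) -> str:
    """Assumed local file name: the last path segment of documentUrl."""
    return posixpath.basename(urlparse(doc["documentUrl"]).path)
```

Matching on type rather than on position avoids depending on the primary document always being first in the array.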

dataFiles[] lists the XBRL companion files (XSD schema, XML instance, extracted _htm.xml) when the filing carries inline XBRL beyond the fee table. It is commonly empty for plain S-4 submissions and populated for S-4/A filings that carry broader inline XBRL.

entities[] lists the filer(s) and co-registrants. Each entity object carries companyName, cik, fileNo, irsNo, fiscalYearEnd, stateOfIncorporation (a two-character SEC jurisdiction code; I0 for France, DE for Delaware, and so on), act, sic, filmNo, type, and an optional tickers[] array for publicly listed parties. Multi-registrant S-4 filings — a frequent pattern when a holding company co-files with its operating subsidiary, when guarantors co-register debt securities, or when spin-merger structures pair multiple affiliated registrants — produce multiple entity objects sharing a file-number group. The group is expressed as a primary file number plus dash-suffixed siblings (for example 333-289994 on the lead entity and 333-289994-01 on the co-registrant), allowing every issuer, guarantor, and co-registrant affiliate to be enumerated alongside its individual CIK and SIC code.
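
The file-number grouping can be sketched as follows. The helper names are hypothetical; the code assumes only the pattern described above, a primary file number such as 333-289994 with dash-suffixed siblings such as 333-289994-01.

```python
from collections import defaultdict


def base_file_number(file_no: str) -> str:
    """Drop a co-registrant suffix: '333-289994-01' -> '333-289994'."""
    return "-".join(file_no.split("-")[:2])


def group_co_registrants(entities):
    """Map each primary file number to the company names registered under it."""
    groups = defaultdict(list)
    for entity in entities:
        file_no = entity.get("fileNo")
        if not file_no:
            continue  # entity without a file number cannot be grouped
        groups[base_file_number(file_no)].append(entity.get("companyName"))
    return dict(groups)
```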

A seriesAndClassesContractsInformation[] array is present in the envelope schema but is typically empty for S-4, as it is used for investment-company contract disclosures that do not apply to this form.

The envelope advertises more documents than the ZIP ships: image entries (type: GRAPHIC, usually .jpg or .gif) and the complete-submission .txt appear in documentFormatFiles[] for completeness, but neither is placed in the archive. Consumers that need those assets must follow the documentUrl back to EDGAR.
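
A small sketch of reconciling the envelope against what is actually on disk follows. The function name is hypothetical, and comparing by the last path segment of documentUrl is an assumption about how local file names relate to the URLs.

```python
import posixpath
from urllib.parse import urlparse


def documents_not_shipped(meta: dict, local_file_names):
    """Return documentUrl values whose file is absent from the accession folder."""
    local = set(local_file_names)
    missing = []
    for doc in meta.get("documentFormatFiles", []):
        url = doc.get("documentUrl", "")
        name = posixpath.basename(urlparse(url).path)
        if name and name not in local:
            missing.append(url)  # typically GRAPHIC entries and the full .txt
    return missing
```

The returned URLs are the ones a consumer would need to fetch from EDGAR to complete the record.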

4. The primary S-4 / S-4/A document

The primary document is the file whose documentFormatFiles[].type is S-4 or S-4/A and whose internal SGML <TYPE> tag matches. File names are preparer-specific stubs — for example d37576ds4.htm, tm2524487-6_s4a.htm, or newtek2025exchangeoffer-fo.htm — with no standardized naming convention across filers, so role identification must rely on the type field or the SGML header, never on filename heuristics.

Every .htm in the archive, primary and exhibit alike, is an EDGAR SGML-wrapped document rather than a bare HTML file. Each file follows this wrapper layout:

<DOCUMENT>
<TYPE>S-4
<SEQUENCE>1
<FILENAME>d37576ds4.htm
<DESCRIPTION>S-4
<TEXT>
<HTML>... full HTML body ...</HTML>
</TEXT>
</DOCUMENT>

The <TYPE> token matches the documentFormatFiles[].type value, <SEQUENCE> matches the document's ordering within the submission, and <FILENAME> matches the local file name. An HTML parser applied directly to the raw bytes will either reject the SGML preamble or surface the wrapper tags as stray content, depending on how lenient it is; extractors should either strip the wrapper first or operate on the payload between <TEXT> and </TEXT>.
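
Stripping the wrapper can be sketched as below. The function name is hypothetical, and the code assumes the wrapper layout shown above, with header tags on their own lines before <TEXT>.

```python
import re

_HEADER_TAG = re.compile(r"^<(TYPE|SEQUENCE|FILENAME|DESCRIPTION)>(.*)$")


def split_sgml_document(raw: str):
    """Return (header_dict, payload) for one SGML-wrapped document."""
    start = raw.find("<TEXT>")
    if start == -1:
        raise ValueError("no <TEXT> marker found")
    end = raw.rfind("</TEXT>")
    if end == -1:
        end = len(raw)  # tolerate a missing closing marker
    header = {}
    for line in raw[:start].splitlines():
        match = _HEADER_TAG.match(line.strip())
        if match:
            header[match.group(1)] = match.group(2).strip()
    payload = raw[start + len("<TEXT>"):end].strip()
    return header, payload
```

The payload can then be handed to an HTML parser, while the header dictionary can be cross-checked against the matching documentFormatFiles[] entry.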

Inside the HTML body, the primary document follows the conventional registration-statement scaffold. The cover page identifies the registrant, the Securities Act registration file number (for example 333-291227), the state of incorporation, the IRS Employer Identification Number, the primary standard industrial classification code, the address and agent for service, and a reference to the calculation-of-registration-fee table (which, for post-compliance filings, points to the separate EX-FILING FEES exhibit rather than being tabulated inline). A letter to target shareholders and a notice of special meeting typically follow when the S-4 doubles as a proxy statement. The prospectus/proxy body then presents, in varying order depending on deal structure: a "Questions and Answers" summary, a "Summary of the Transaction", the detailed "The Merger" or "The Exchange Offer" section describing background, terms, consideration mechanics, fairness opinions, and regulatory approvals, a full "Risk Factors" section, "Selected Historical Financial Data" for each party, "Unaudited Pro Forma Condensed Combined Financial Statements" showing the combined balance sheet and income statement giving effect to the transaction, "Description of Capital Stock" or "Description of the Securities Being Registered", a "Comparison of Rights of Shareholders" contrasting the acquirer's and target's charter and bylaw provisions, a "Description of the Combined Entity" covering business overview, management, and post-closing governance, and a "Material U.S. Federal Income Tax Consequences" section keyed to the EX-8.1 tax opinion. Audited historical financial statements of the target (and sometimes the acquirer) are either included in full or incorporated by reference from Exchange Act filings. Undertakings, signatures of the registrant together with its directors and principal officers, and the exhibit index close the primary document.

5. The S-4 exhibit taxonomy

S-4 submissions carry a characteristic exhibit set driven by Item 21 of Form S-4 and the exhibit requirements of Regulation S-K Item 601. Each exhibit is a separate .htm file in the accession folder whose <TYPE> tag and matching documentFormatFiles[].type value carry the exhibit number. The exhibits most relevant to S-4 are:

  • EX-2.x — the plan of acquisition, reorganization, arrangement, liquidation, or succession: the merger agreement or share-exchange agreement itself, together with any amendments. This is the definitive transaction contract.
  • EX-3.x — charters and bylaws of the registrant, including post-closing amended-and-restated charters when the transaction reshapes the acquirer's equity structure. Filings that overhaul the capital structure can enumerate charters and bylaws running well past EX-3.10.
  • EX-4.x — instruments defining the rights of security holders: indentures, rights agreements, specimen stock or note certificates, and supplemental indentures for newly registered debt.
  • EX-5.x — legality opinion from counsel that the securities being registered will be validly issued, fully paid, and non-assessable.
  • EX-8.x — tax opinion from counsel addressing the material U.S. federal income tax consequences of the transaction, particularly whether a merger qualifies as a tax-free reorganization under Section 368 of the Internal Revenue Code.
  • EX-10.x — material contracts, including employment and retention agreements entered into in connection with the deal, voting and support agreements from significant shareholders, financing and debt-commitment letters, transition-services agreements, and sponsor-support and PIPE subscription agreements on de-SPAC S-4s.
  • EX-21.x — list of subsidiaries of the registrant, relevant for both acquirer and, where applicable, target disclosures.
  • EX-23.x — consents of independent auditors and other experts whose reports or opinions are included or incorporated by reference. S-4 filings almost universally carry multiple EX-23 entries — one per audit firm per entity whose statements appear — because both the acquirer's and the target's auditors must consent.
  • EX-99.x — additional exhibits. On S-4 these typically include the form of proxy card, form of letter to shareholders, form of letter of transmittal and notice of guaranteed delivery for exchange offers, fairness opinions from financial advisors, and press releases announcing the transaction.
  • EX-FILING FEES — the structured filing-fee calculation exhibit. Unlike every other .htm in the archive, this file's body is an inline XBRL document using the ix, xbrli, dei, and ffd namespaces to render machine-readable fee-calculation tables. It is present whenever fees are being calculated on the instant filing rather than deferred to a later amendment.

The presence or absence of specific exhibits is itself a strong signal of transaction structure: exchange offers carry letter-of-transmittal EX-99.x exhibits; stock-for-stock mergers requiring shareholder approval carry proxy-card EX-99.x and tax-opinion EX-8.x exhibits; deals that rewrite the acquirer's charter carry an enumerated run of EX-3.x files; and narrow amendments often consist of only an updated primary document plus a fresh EX-FILING FEES exhibit and one or more updated EX-23.x consents.
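
The exhibit-profile idea can be sketched by counting exhibit families per filing. The function names and the narrow-amendment rule are illustrative heuristics derived from the paragraph above, not a classification the dataset itself provides.

```python
import re
from collections import Counter

_EXHIBIT = re.compile(r"^EX-(\d+)")


def exhibit_families(meta: dict) -> Counter:
    """Count documents per exhibit family, e.g. EX-23.1 and EX-23.2 -> EX-23."""
    counts = Counter()
    for doc in meta.get("documentFormatFiles", []):
        doc_type = (doc.get("type") or "").strip().upper()
        if doc_type == "EX-FILING FEES":
            counts["EX-FILING FEES"] += 1
            continue
        match = _EXHIBIT.match(doc_type)
        if match:
            counts[f"EX-{match.group(1)}"] += 1
    return counts


def looks_like_narrow_amendment(meta: dict) -> bool:
    """Heuristic: an S-4/A carrying nothing beyond consents and the fee exhibit."""
    families = set(exhibit_families(meta))
    return meta.get("formType") == "S-4/A" and families <= {"EX-23", "EX-FILING FEES"}
```

Similar set tests on EX-8, EX-3, or EX-99 families could flag the other deal structures mentioned above, though EX-99 contents would still need to be read from each entry's description.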

6. Included content

Each record includes the structured metadata envelope and every textual document from the original EDGAR submission: the primary S-4 or S-4/A registration statement, every exhibit filed on the submission (predominantly .htm, with occasional .txt, .pdf, or .xfd form-based filings), and the inline-XBRL filing-fee exhibit where present. SGML document wrappers are preserved intact, so the <TYPE>, <SEQUENCE>, <FILENAME>, and <DESCRIPTION> tags remain available for parsing and for correlating each file with its documentFormatFiles[] entry. The metadata envelope exposes issuer identifiers, co-registrant affiliations, SIC classification, tickers, file numbers, and exhaustive per-document URLs, so any externally hosted asset can be re-fetched deterministically from EDGAR.

7. Excluded or separate content

Three categories of content are systematically omitted from each record. First, image files (type: GRAPHIC, typically .jpg or .gif, used for signature blocks, organization charts, deal-structure diagrams, auditor and fairness-opinion logos, and similar visual artifacts referenced from the HTML body) are enumerated in documentFormatFiles[] but are not placed in the ZIP. Second, the EDGAR complete-submission .txt bundle — the concatenated SGML file that carries every document of the submission in one stream — is enumerated in the envelope and addressable through linkToTxt, but is not included locally, because its content is already available disaggregated as the individual .htm documents. Third, XBRL companion data files (.xsd schemas, _htm.xml instance documents) listed in dataFiles[] are referenced by URL but not bundled; only the .htm carrier of the filing-fee iXBRL exhibit is shipped in the ZIP, because for that exhibit the structured data is embedded directly in the HTML file itself. Content incorporated by reference from other filings — most commonly the acquirer's Exchange Act reports cited in the S-4 body — is not expanded into the record; the S-4 text retains the textual reference, but the referenced filings remain in their own accession records elsewhere in EDGAR.

8. Structural evolution over time

Although the dataset spans more than three decades, the underlying filing format evolved across that window. Filings from 1994 through the late 1990s were submitted as ASCII/SGML-only documents: the SGML document wrapper that still frames every file today was the entire format, and the body between <TEXT> and </TEXT> was plain text rather than HTML, with tables laid out as fixed-width character grids. HTML bodies became common in the early 2000s as EDGAR accepted HTML submissions, and for roughly the last two decades essentially every S-4 has carried a true HTML body inside the SGML wrapper, with CSS-styled tables, inline image references, and navigation anchors. The SGML preamble itself has remained stable throughout.
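
Because the payload may be plain text or HTML depending on era, a pipeline may want to branch on body type before parsing. The following is a rough heuristic sketch: the function name, the tag list, and the probe length are all assumptions. The <TABLE> tag is deliberately left out of the hints on the assumption that older plain-text filings may also use table markers around their fixed-width grids.

```python
import re

# Tags taken here as a sign of a real HTML body.
_HTML_HINT = re.compile(r"<\s*(html|body|div|font)\b", re.IGNORECASE)


def body_kind(payload: str, probe_chars: int = 4000) -> str:
    """Classify the <TEXT> payload as 'html' or 'text' by probing its start."""
    return "html" if _HTML_HINT.search(payload[:probe_chars]) else "text"
```

Plain-text bodies can then be routed to fixed-width table handling instead of an HTML parser.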

Two regulatory shifts materially changed the record's content. First, amendments to Regulation S-K Items 601(b)(2) and 601(b)(10) and related guidance now permit redaction of confidential commercial information from plan-of-acquisition and material-contract exhibits (EX-2 and EX-10) without a formal confidential-treatment request; S-4 filings after the rule change frequently contain redacted merger agreements with explicit [***] or bracketed redaction markers in the exhibit text. Second, and more consequential for structured extraction, the Commission adopted Rule 408 and amended Rule 411 to require filing-fee tables in a structured, inline-XBRL format for most fee-bearing registration statements including Form S-4, phased in from 2022 by filer class. Filings before the compliance date carry a free-form "Calculation of Registration Fee" table inside the primary document; filings after carry a separate EX-FILING FEES exhibit whose .htm body is an inline-XBRL document with ix:, xbrli:, dei:, and ffd: namespaces encoding each fee line-item as structured facts. The iXBRL fee exhibit is the single most reliably machine-parseable component of modern S-4 records.

9. Variants worth recognizing

Several structural variants recur across the dataset and are worth recognizing explicitly.

  • S-4/A amendments are a large fraction of records. An amendment can be as substantial as a fully restated prospectus with a refreshed exhibit set, or as minimal as a two-file record consisting solely of the updated primary document and metadata.json when no exhibits are re-filed.
  • Multi-registrant co-filings are characteristic of S-4 because holding-company / operating-subsidiary pairs, guarantor structures for registered debt, and spin-merger Reverse Morris Trust transactions generate multiple co-registrants sharing a file-number group (primary file number plus dash-suffixed siblings -01, -02). The entities[] array captures every co-filer with its own CIK, SIC, and state of incorporation.
  • Shell-company S-4s, filed by SPACs and newly formed acquisition vehicles in connection with de-SPAC business combinations, follow the same structural template but typically lack historical financials of the acquirer (which is the shell) and carry extensive target-company financials along with sponsor-support agreements and PIPE-subscription exhibits under EX-10.x and EX-99.x.
  • Exchange-offer S-4s, used in debt-for-debt and stock-for-stock tender transactions, carry letter-of-transmittal and notice-of-guaranteed-delivery exhibits in the EX-99.x series in place of proxy-card materials, and often omit the EX-8.x tax opinion when the exchange is taxable.
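The co-registrant pattern described above (a primary file number plus dash-suffixed siblings) can be collapsed to one group key. This is a minimal sketch under stated assumptions: the file-number shape "333-NNNNNN" with an optional numeric suffix is inferred from the description here, and the "cik" key name is hypothetical (the text only says each entity carries its own CIK).

```python
import re
from collections import defaultdict

# Assumed shape: "333-123456" with an optional "-01", "-02", ... sibling suffix.
_FILE_NO = re.compile(r"^(\d{3}-\d+)(?:-(\d+))?$")


def base_file_no(file_no: str) -> str:
    """Strip a co-registrant sibling suffix; return the input if unrecognized."""
    m = _FILE_NO.match(file_no.strip())
    return m.group(1) if m else file_no.strip()


def group_co_registrants(entities):
    """Map each base file number to the CIKs filing under it."""
    groups = defaultdict(list)
    for entity in entities:
        groups[base_file_no(entity["fileNo"])].append(entity["cik"])
    return dict(groups)


# Made-up entities[] array for illustration.
entities = [
    {"cik": "1000001", "fileNo": "333-123456"},
    {"cik": "1000002", "fileNo": "333-123456-01"},
    {"cik": "1000003", "fileNo": "333-123456-02"},
]
groups = group_co_registrants(entities)
print(groups)
```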

10. Interpretation and extraction notes

Several nuances materially affect downstream extraction.

  • Exhibit identification must ignore file names. Document file names are preparer-specific stubs with no guaranteed convention (tm2524487d9_ex23-1.htm, d37576dex231.htm, flyx-ex23_1.htm are all EX-23.1 consents in different filings). Use the SGML <TYPE> tag or the documentFormatFiles[].type value.
  • SGML wrappers must be stripped before HTML parsing. Raw HTML parsers will encounter <DOCUMENT>, <TYPE>, <SEQUENCE>, <FILENAME>, <DESCRIPTION>, and <TEXT> lines as non-HTML content before the <HTML> payload begins.
  • Exhibit numbering is non-contiguous. A filing may carry EX-3.1, EX-3.3, and EX-3.7 without intervening numbers when the primary document's exhibit index enumerates more exhibits than are being filed on the instant submission, with the omitted ones incorporated by reference from prior filings. Gaps do not indicate missing documents.
  • Target financial statements are frequently incorporated by reference. A "complete" view of the transaction may require traversing references into the target's 10-K and 10-Q accessions elsewhere in EDGAR; the S-4 body carries the textual reference but not the statements themselves.
  • EX-FILING FEES requires an iXBRL-aware parser. Treating it as plain HTML yields only the rendered table text, not the structured facts (security class, proposed maximum aggregate offering price, fee rate, total fee paid). Use an XBRL/iXBRL parser that recognizes the ffd: namespace.
  • Amendment-chain reconstruction joins on file number, not accession. Each S-4 and each S-4/A has its own accessionNo, but all amendments of a given registration share the same entities[].fileNo as the original S-4. Reconstructing the sequence of amendments must therefore join on file number; the description field's [Amend] marker and the formType value are the direct indicators of amendment status on any individual record.
  • Redacted commercial terms may appear in EX-2 and EX-10. Post-rule-change filings can carry [***] bracketed redaction markers inside merger agreements and ancillary contracts; extracted exhibit text will faithfully preserve these markers rather than the underlying confidential values.
  • Images are excluded, so rendered HTML will show broken image references. Any <img> reference in the HTML body — signature images, organization charts, deal-structure diagrams, fairness-opinion logos — will fail to resolve when rendered locally. The underlying assets remain fetchable via the documentUrl entries of the corresponding documentFormatFiles[] objects.
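The first two notes above (identify exhibits by <TYPE>, strip the SGML wrapper before HTML parsing) can be handled in one pass. The following is a minimal sketch that assumes each file holds a single <DOCUMENT> block with one header tag per line; the function name and the sample content are illustrative only.

```python
import re

# Header lines look like "<TYPE>EX-23.1": a tag followed by its value.
_HEADER = re.compile(r"^<(TYPE|SEQUENCE|FILENAME|DESCRIPTION)>(.*)$", re.MULTILINE)
# Capture everything between <TEXT> and </TEXT>, dropping only the first newline
# so that fixed-width plain-text bodies keep their leading spaces.
_TEXT = re.compile(r"<TEXT>\r?\n?(.*?)</TEXT>", re.DOTALL)


def split_sgml(raw: str):
    """Return (header_fields, body) for one SGML-wrapped document."""
    m = _TEXT.search(raw)
    head = raw[: m.start()] if m else raw
    fields = {tag.lower(): value.strip() for tag, value in _HEADER.findall(head)}
    body = m.group(1) if m else ""
    return fields, body


# Made-up wrapper for illustration; the file name is one of the stubs cited above.
raw = (
    "<DOCUMENT>\n"
    "<TYPE>EX-23.1\n"
    "<SEQUENCE>5\n"
    "<FILENAME>d37576dex231.htm\n"
    "<DESCRIPTION>CONSENT OF AUDITOR\n"
    "<TEXT>\n"
    "<HTML><BODY><P>We consent to the use of our report.</P></BODY></HTML>\n"
    "</TEXT>\n"
    "</DOCUMENT>\n"
)
fields, body = split_sgml(raw)
print(fields["type"], fields["filename"])
```

Only `body` should be handed to an HTML parser; `fields["type"]` is the value to use for exhibit identification instead of the file name.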

Who Files or Publishes This Dataset, and When

Who files: the issuer of the registrable consideration

The Form S-4 registrant is the entity issuing securities as consideration in a business-combination transaction. Depending on the structure, that is:

  • the acquirer in a stock-for-stock merger,
  • the offeror in a securities exchange offer,
  • a newly formed holding company in a reorganization, or
  • the surviving or reclassified entity where existing holders receive new registrable securities.

The target is not the filer, even though much of the prospectus describes the target's business, risk factors, and financials. The target is the counterparty whose shareholders will receive the acquirer's securities. Target information appears in the S-4 because Rule 145 treats the solicitation of target-shareholder consent as an "offer" of the acquirer's securities to those shareholders, and Section 5 of the Securities Act requires the entity offering and selling securities to register them. Where the target also needs a shareholder vote, the target typically files its own Schedule 14A proxy materials, or the S-4 is structured as a joint proxy statement/prospectus covering both votes.

Typical filers include domestic operating-company acquirers, holding-company reorganizers, SPACs consummating de-SPAC business combinations, and foreign private issuers using S-4 where F-4 eligibility is not met or where a domestic subsidiary is the immediate issuer. Co-registrants routinely appear on one S-4 when subsidiary guarantors of registered debt securities must themselves register, or when a new parent and its merging subsidiaries are all named issuers.

Regulatory framework

  • Securities Act Section 5 is the source obligation: no offer or sale of securities in interstate commerce without an effective registration statement. Form S-4 is the purpose-built form for business-combination registrations.
  • Rule 145 classifies mergers, consolidations, certain reclassifications, and asset transfers for securities as "offers" and "sales" to the voting security holders of the target. It is the conceptual hook that turns a corporate transaction into a Securities Act registration event.
  • Rule 165 permits written pre- and post-filing communications about the transaction, provided they are filed with the SEC (generally as Rule 425 submissions) and carry the required legends. Rule 425 communications are a related but separate dataset.
  • Rule 162 modifies timing rules so that an exchange offer may commence before the S-4 is declared effective, provided no securities are purchased until effectiveness.
  • Regulation M-A (Items 1000-1016) supplies the substantive M&A disclosure items cross-referenced by Form S-4: background of the transaction, reasons, fairness opinions, financial advisor reports, and past contacts.
  • Schedule 14A applies when acquirer, target, or both need a shareholder vote. The S-4 prospectus is then bound with a proxy statement as a joint proxy statement/prospectus and must satisfy both Securities Act and Regulation 14A requirements, including proxy timing rules.

Triggering events for a new S-4

Filing is event-driven, not periodic. A new S-4 is prepared once the registrant has committed to a transaction in which it will issue registrable securities. Typical triggers:

  • execution of a definitive merger agreement paying acquirer stock as all or part of consideration;
  • stock-for-stock acquisitions (fixed-ratio or floating-ratio);
  • exchange offers using the offeror's securities as consideration;
  • holding-company reorganizations in which a new parent issues shares for the existing operating company's shares;
  • reclassifications that change existing security rights enough to be a new offer under Rule 145;
  • spin-offs executed as part of a broader combination where the spun entity issues registrable shares;
  • de-SPAC initial business combinations.

Purely cash acquisitions do not trigger S-4. With no securities offered to target holders, those deals are disclosed, if at all, under Exchange Act tender-offer or proxy rules rather than Securities Act registration.

When S-4/A amendments are triggered

Form S-4/A is a pre-effective amendment (or, less commonly, a post-effective amendment). Because S-4s are almost always reviewed by the Division of Corporation Finance, amendments are the norm, not the exception. A new S-4/A is typically filed when:

  • the staff issues a comment letter and the registrant responds with revised disclosure (commonly on background of the transaction, fairness opinions, projections, pro forma financials, or risk factors);
  • financial statements of the acquirer or target must be updated as review stretches across a new interim or annual period;
  • deal terms change (exchange ratio, collars, cash/stock mix, contingent value rights);
  • a pricing amendment is required;
  • fairness opinions or board disclosures are updated or supplemented;
  • the registrant submits a final pre-effective amendment alongside a request for acceleration of effectiveness.

Multiple S-4/A amendments per transaction are routine; contested or complex deals can produce five or more before effectiveness.

Effectiveness, vote, and closing

The S-4 must be declared effective before the acquirer may issue the registrable securities. Because the S-4 almost always carries the joint proxy statement/prospectus, the effectiveness date also sets the earliest date for mailing definitive proxy materials. State-law meeting notice and Regulation 14A timing then govern the vote, and closing follows shareholder approval and satisfaction of regulatory conditions (HSR clearance, foreign competition approvals, sector-specific approvals). The effectiveness-to-closing window typically runs four to eight weeks, longer where regulators delay or the deal is contested.

Edge cases and co-registrant patterns

  • Multi-registrant filings: a single S-4 often names a parent issuer plus subsidiary co-registrants that will guarantee registered debt or emerge from the merger as named obligors. Each co-registrant signs the registration statement and is a filer of record.
  • De-SPAC S-4s: SPACs filing S-4s for their initial business combinations face heightened scrutiny following the 2024 SPAC rules. These filings frequently co-register successor shares and warrants, with the target treated as the accounting acquirer in historical financials.
  • Cross-border boundary with F-4: foreign acquirers register on S-4 when F-4 eligibility conditions are not met or a domestic subsidiary is the immediate issuer. Where the foreign acquirer itself qualifies, Form F-4 is the analogous form and those filings appear in the F-4 dataset, not here.
  • Newly formed holding-company registrants: in reorganizations, the top-of-structure filer may be an entity with no operating history, and its disclosure is dominated by the existing operating company's financials.
  • Substantive restructurings: where a deal is materially restructured, the filer may file a heavily revised S-4/A replacing the original prospectus, or withdraw the original and refile a fresh S-4.

How This Dataset Differs From Similar Datasets or Filings

Form S-4 sits at the intersection of two regimes: Securities Act registration of newly issued shares and Exchange Act solicitation of shareholders who must vote or tender. Depending on deal structure, an M&A transaction may trigger one, both, or neither of those regimes. The comparisons below isolate the single event or condition that makes S-4 the right filing rather than each adjacent form.

Form S-1 / S-1/A

Form S-1 is the general Securities Act registration used for IPOs, follow-ons, and resale registrations where shares are sold for cash. S-4 registers shares issued as merger consideration, with the target's shareholders receiving stock in exchange for their old stock rather than paying cash. The trigger is the nature of the consideration: cash offering uses S-1, stock-for-stock business combination uses S-4. Pricing ranges, underwriter syndicates, and use-of-proceeds live in S-1; exchange ratios, fairness opinions, and merger-consideration mechanics live in S-4.

Form F-4 / F-4/A

F-4 is the S-4 equivalent when the issuer of the consideration securities qualifies as a foreign private issuer under Rule 405. The disclosure architecture is parallel, but F-4 permits IFRS financials without U.S. GAAP reconciliation and accommodates home-country governance practice. The trigger is solely the FPI status of the share issuer, not the target's domicile: a U.S. acquirer issuing shares to foreign targets files S-4; a foreign acquirer issuing shares to U.S. targets files F-4.

Form S-3

Form S-3 is a shelf registration available to seasoned issuers meeting float and reporting tests, used for transaction-agnostic, recurring capital raises taken down via 424(b) supplements. S-4 is a one-off, deal-specific document that goes effective once and is consumed in a single business combination. S-3 cannot register merger consideration; the presence of a named target, exchange ratio, and shareholder vote forces S-4.

Schedule 14A (proxy statement)

Schedule 14A is the Exchange Act proxy form used when shareholders are solicited for a vote but no new registered securities are issued to them. The boundary is whether Securities Act registration is required: cash mergers require a DEFM14A with no S-4, while stock-for-stock deals require an S-4 whose combined proxy statement/prospectus absorbs the target's 14A function. If the target shareholders are receiving registered stock, the vote disclosure lives inside the S-4, not in a standalone 14A.

Schedule TO

Schedule TO is the tender-offer statement required under Section 14(d) / Regulation 14D. Pure cash tender offers generate only a Schedule TO. Exchange offers paying newly registered securities generate both: Schedule TO for the tender-offer mechanics (bidder identity, conditions, withdrawal rights) and a co-filed S-4 for registration of the share consideration. Neither substitutes for the other. Going-private deals add a Schedule 13E-3 alongside whichever of S-4 or 14A applies.

Form 425

Form 425 is the wrapper for written communications about a business combination made under Rules 165 and 425, covering press releases, investor decks, employee Q&As, and scripts distributed before S-4 effectiveness; it also serves as Rule 14a-12 soliciting material. 425 and S-4 are complementary, not alternatives: 425 captures the running stream of pre-effectiveness communications (dozens of short filings per deal), while S-4 is the single long registration document that ultimately goes effective.

Form 8-K (Items 1.01, 2.01)

Form 8-K time-stamps material events; S-4 registers and discloses. Item 1.01 marks signing of the merger agreement and starts the S-4 drafting clock; Item 2.01 marks closing and ends the S-4's useful life because the consideration shares have been issued. Item 9.01 typically attaches the merger agreement as Exhibit 2.1. Use 8-Ks for trigger dates and agreement text; use S-4 for the prospectus, pro forma financials, and fairness opinions.

Rule 424(b) prospectus supplements

Rule 424(b) filings document individual takedowns from an effective S-3 (or S-1) shelf, carrying pricing and final terms without re-filing the base registration. S-4 has no takedown mechanism: it reaches effectiveness through pre-effective S-4/A amendments, and consideration is fixed by the merger agreement's exchange ratio rather than a book-built price. The final proxy statement/prospectus is commonly filed under Rule 424(b)(3) once the S-4 is effective, but that filing reproduces the effective document rather than pricing a new offering. Track shelf takedowns via 424(b); track iterative merger disclosure via the S-4/A series.

Boundary summary

Among domestic-issuer forms, S-4 is the only filing that simultaneously (a) registers newly issued shares under the Securities Act, (b) delivers a prospectus to target shareholders, and (c) solicits the target (and often acquirer) vote in stock-for-stock deals; F-4 plays the same role for foreign private issuers. S-1 and S-3 register without soliciting; Schedule 14A solicits without registering; Schedule TO governs tender mechanics without registering the exchange consideration; Form 425 and 8-K communicate and announce but do neither. Whenever registered securities are used as consideration in a business combination, S-4 is the correct primary source; for cash deals, going-private transactions, pure tender offers, or capital raises, an adjacent dataset above is correct instead.

Who Uses This Dataset

Form S-4 filings bundle a negotiated merger agreement, deal chronology, fairness opinions, pro forma financials, tax and legality opinions, antitrust disclosure, and shareholder-vote mechanics into one registered prospectus. Each professional group below pulls a specific slice.

M&A bankers and deal advisors

Coverage and M&A associates use S-4s as a precedent library. They extract exchange ratios, fixed and floating collars, and unaffected-price premia from "The Merger" and "Background of the Merger," and calibrate valuation work against the DCF ranges, selected-companies multiples, and premia-paid analyses disclosed in the "Opinion of Financial Advisor" section. Article I adjustment and walk-away provisions in EX-2.1 feed structuring templates.

M&A and securities counsel

Deal and disclosure counsel benchmark EX-2.1 across transactions: reps and warranties, interim covenants, no-shop and go-shop provisions (typically Section 5.03 or 5.04), fiduciary outs, MAE definitions, termination-fee triggers and quantum, and specific-performance clauses. EX-5.1 legality opinions and EX-8.1 Section 368(a) tax opinions serve as drafting templates, with counsel tracking evolving qualifications. "Risk Factors" and "Litigation Related to the Merger" language is reused when drafting a new S-4.

Merger-arb and event-driven analysts

Arb desks parse Article VI closing conditions, "reasonable best efforts" versus "hell or high water" regulatory covenants, divestiture commitments, financing conditions, and the list in "Regulatory Approvals Required for the Merger." Article VIII termination provisions supply drop-dead dates, reverse termination fees, and ticking-fee mechanics for spread and downside models. Go-shop windows, matching rights, and superior-proposal definitions drive topping-bid probability.

Proxy solicitors and governance advisors

Solicitors and governance advisors track S-4/A amendments for changes in voting mechanics. They work from "The Special Meeting" (record date, quorum, approval thresholds), "Appraisal Rights" or "Dissenters' Rights" statutory notices, and "Interests of Directors and Executive Officers in the Merger" for golden-parachute compensation and the related Rule 14a-21(c) advisory-vote disclosures. Outputs include vote recommendations, solicitation strategy, and perfection-of-appraisal guidance.

Equity research analysts

Sell-side and buy-side analysts rebuild combined-entity economics from the "Unaudited Pro Forma Condensed Combined Financial Information" (Article 11 of Regulation S-X), modeling revenue, EBITDA, leverage, and EPS accretion or dilution. Management projections in "Certain Unaudited Prospective Financial Information" give a rare window into internal forecasts, and synergy disclosures (cost versus revenue, phasing, integration costs) feed models for both parties and sector peers.

Corporate development teams

In-house corp dev benchmarks termination fees as a percentage of equity value, reverse termination fees for financing or antitrust failure, CVR mechanics (in "Description of CVRs" or a CVR Agreement exhibit), escrow and holdback terms, and retention packages. "Background of the Merger" is used to study process design: auction versus bilateral, don't-ask-don't-waive standstills, and board-meeting sequencing.

Transaction-services auditors and valuation specialists

Audit and transaction-services teams mine pro forma footnotes for ASC 805 application: intangible identification and amortization, bargain-purchase gains, step acquisitions, and transaction-cost treatment. EX-23.1 and EX-23.2 auditor consents document sign-off on historical financials. Valuation professionals use disclosed purchase price, allocation methodology, and goodwill-to-consideration ratios as comparable inputs.

Corporate finance and law academics

Finance scholars build large-sample studies of premia, payment mix, announcement returns, and completion rates. Law-and-finance researchers study fiduciary-duty disclosures in "Background of the Merger," Revlon and Unocal implications of deal-protection devices, and the drafting response to Delaware doctrine (appraisal arbitrage, Corwin cleansing, MFW in controller deals). Coverage back to 1994 supports longitudinal work on market-check design, go-shop prevalence, and MAE drafting.

Antitrust economists and competition staff

Competition economists and staff attorneys treat S-4s as the public analog to confidential HSR filings. "Regulatory Approvals," antitrust risk factors, and EX-2.1 efforts covenants disclose the parties' own market definitions, overlaps, and contemplated divestitures or behavioral remedies. Post-filing amendments often contain revised market-definition language after second-request negotiations, supporting retrospective review and divestiture-design research.

Deal-data vendors

Deal-tracker platforms extract parties, announcement and effective dates, consideration structure, collars, termination fees, financial and legal advisors, advisor fees (from "Fees and Expenses" in the fairness-opinion section), and vote outcomes. These feed league tables, precedent-transaction databases, and fee-benchmarking products.

M&A journalists

Deal reporters use S-4s and S-4/A amendments for negotiation chronology in "Background of the Merger," Party A and Party B references to alternative bidders, executive compensation on departure, merger-litigation settlements with supplemental disclosures, and revised fairness-opinion inputs. This supports deal post-mortems and coverage of contested transactions.

NLP and LLM engineering teams

Teams building deal-term extractors and retrieval tools pair the prospectus narrative with the executed EX-2.1 to train models on provision extraction, deal-term classification, and closing-risk QA. S-4/A amendments supply supervision signals for revision tracking across a deal's life.

Specific Use Cases

Concrete workflows anchored to specific S-4 exhibits, sections, and metadata fields.

Merger-agreement precedent library

M&A associates and deal counsel extract every EX-2.1 (and its amendments EX-2.2, EX-2.3) from S-4 filings in a target SIC code over the last five years, strip the SGML wrapper, and load the clean text into a clause-tagged repository. The output is a searchable precedent library for no-shop and go-shop language (typically Section 5.03 or 5.04), MAE carve-outs, fiduciary outs, and specific-performance provisions, used at first-draft time on new deals.

Risk-arbitrage triage from new filings

Event-driven desks ingest each month's ZIP, filter by formType in (S-4, S-4/A), and parse Article VI (closing conditions) and Article VIII (termination) from EX-2.1 plus the "Regulatory Approvals Required for the Merger" section of the primary document. The pipeline emits a daily watch list with drop-dead date, termination-fee quantum, reverse termination fee, and the regulatory-efforts standard ("reasonable best efforts" vs. "hell or high water") scored for downside and topping-bid risk.

Fairness-opinion league tables

Deal-data vendors walk every S-4 filed in a calendar year, locate each "Opinion of [Company]'s Financial Advisor" section in the primary document and the corresponding EX-99.x fairness opinion, and extract advisor name, fee structure (from "Fees and Expenses"), DCF ranges, selected-companies multiples, and premia-paid comparables. The output is an annual league table of deal count, aggregate deal value, and fee economics per advisor.

Pro forma accretion / dilution modeling

Sell-side equity analysts pull the "Unaudited Pro Forma Condensed Combined Financial Information" section from each S-4 covering a peer-group acquirer, together with "Certain Unaudited Prospective Financial Information" (management cases) and the synergy disclosures in "The Merger." These feed per-deal accretion/dilution models and sector-level synergy-phasing benchmarks.

Section 368(a) tax-opinion drafting bank

Tax counsel aggregates every EX-8.1 across reorganization S-4s, indexed by acquirer jurisdiction and deal structure (forward triangular, reverse triangular, share exchange). The collection becomes a drafting bank for Section 368(a) qualifications, "should" vs. "will" opinion-level tracking, and representation-letter carve-outs.

Antitrust retrospective

Competition economists build a panel by joining entities[].fileNo across S-4 and S-4/A records to reconstruct each deal's amendment chain, then diff the "Regulatory Approvals" and antitrust risk-factor sections across successive amendments. Cross-referencing with closing (8-K Item 2.01) or abandonment (8-K Item 1.02) yields a labeled dataset of market-definition language and divestiture commitments for deals that did versus did not clear second-request review.

Subsidiary-graph construction

Data engineers extract EX-21.1 from every S-4 record, parse the subsidiary lists (entity name, jurisdiction of organization) for both acquirer and target, and merge with entities[] CIK and SIC to maintain a corporate-family graph that updates at each new filing. The output feeds KYC, sanctions, and exposure-aggregation tooling.

Training corpus for M&A-specialized LLMs

NLP teams build paired training data by aligning primary-document prospectus narrative ("The Merger" and "Background of the Merger" sections) with the corresponding EX-2.1 clauses, and by using S-4/A revisions as supervision for clause-level diff tasks. The result is a fine-tuning corpus for deal-term extraction, MAE-carve-out classification, and closing-condition question answering.

Governance and compensation benchmarking

Proxy solicitors and compensation consultants parse "Interests of Directors and Executive Officers in the Merger" (golden-parachute tables, Rule 14a-21(c) advisory votes on golden-parachute compensation) and "The Special Meeting" (record date, quorum, approval thresholds) across recent S-4s. Outputs feed vote-recommendation memos, parachute-magnitude peer reports, and appraisal-rights client guidance.

Filing-fee analytics from iXBRL

Capital-markets teams target the EX-FILING FEES exhibit on post-2022 filings, parse the ffd: namespace facts (security class, proposed maximum aggregate offering price, fee rate, total fee paid), and produce aggregate Securities Act registration-fee tallies by filer, industry, and month, bypassing the unstructured fee tables that precede the rule.
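As a rough illustration of the extraction step, the sketch below pulls ffd:-prefixed facts out of a well-formed inline-XBRL document using only the standard library. It is a simplification, not a full iXBRL processor: it ignores contexts, units, scale, sign, and format transformations, and it assumes the taxonomy is bound to the literal prefix "ffd". The element names, the ffd namespace URI, and all values in the sample are placeholders for illustration.

```python
import xml.etree.ElementTree as ET

IX_NS = "{http://www.xbrl.org/2013/inlineXBRL}"

# Made-up fee exhibit fragment. The ffd namespace URI and fact names below are
# placeholders, not values copied from a real filing.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL"
      xmlns:ffd="http://example.com/ffd-placeholder">
<body><table><tr>
<td><ix:nonNumeric name="ffd:OfferingSctyTitl" contextRef="c1">Common Stock</ix:nonNumeric></td>
<td><ix:nonFraction name="ffd:MaxAggtOfferingPric" contextRef="c1" unitRef="USD" decimals="2">1,250,000.00</ix:nonFraction></td>
<td><ix:nonFraction name="ffd:FeeAmt" contextRef="c1" unitRef="USD" decimals="2">191.38</ix:nonFraction></td>
</tr></table></body></html>"""


def ffd_facts(xhtml: str):
    """Return (name, contextRef, displayed_text) for each ffd: inline fact."""
    root = ET.fromstring(xhtml)
    facts = []
    for el in root.iter():
        if el.tag in (IX_NS + "nonFraction", IX_NS + "nonNumeric"):
            name = el.get("name", "")
            if name.startswith("ffd:"):
                # Displayed text only; a real processor would apply scale/format.
                text = "".join(el.itertext()).strip()
                facts.append((name, el.get("contextRef"), text))
    return facts


facts = ffd_facts(SAMPLE)
for fact in facts:
    print(fact)
```

For production tallies, a dedicated XBRL/iXBRL library is the safer choice, since the displayed text of a numeric fact is not always its reported value.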

De-SPAC deal surveillance

Shell-company S-4 filings are identified by the presence of sponsor-support and PIPE-subscription agreements in EX-10.x combined with target-only historical financials. Surveillance teams filter the monthly ZIP on those exhibit patterns to produce a de-SPAC pipeline report with sponsor identity, PIPE size, and earn-out structure for each transaction.

CVR and earn-out mechanics library

Corp dev teams extract the "Description of CVRs" section of the primary document and any CVR Agreement filed as an EX-10.x or EX-99.x exhibit across deals where contingent value rights are part of the consideration. The library of milestone definitions, payment caps, and dispute mechanisms is reused when structuring biotech and pharma acquisitions.

Amendment-chain revision tracking

Journalists and deal-litigation teams reconstruct the amendment sequence by joining records on entities[].fileNo, then diff successive primary documents to surface added "Litigation Related to the Merger" disclosures, revised management projections, updated fairness-opinion inputs, and supplemental disclosures settling stockholder suits. The output is a timeline of disclosure changes for each contested transaction.
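The join on entities[].fileNo might look like the sketch below. The accessionNo, formType, and entities[].fileNo fields come from the record description; the filedAt timestamp used for ordering is an assumed field name, and the sample records are invented.

```python
from collections import defaultdict
from datetime import datetime


def amendment_chains(records):
    """Group metadata records by file number, oldest filing first."""
    chains = defaultdict(list)
    for rec in records:
        # A multi-registrant record is listed under each of its file numbers.
        file_nos = {e["fileNo"] for e in rec.get("entities", []) if e.get("fileNo")}
        for file_no in file_nos:
            chains[file_no].append(rec)
    for recs in chains.values():
        # "filedAt" is an assumed ISO-8601 field name, not confirmed by the source.
        recs.sort(key=lambda r: datetime.fromisoformat(r["filedAt"]))
    return dict(chains)


# Made-up records for illustration.
records = [
    {"accessionNo": "0000000000-25-000003", "formType": "S-4/A",
     "filedAt": "2025-03-10T16:05:00-04:00", "entities": [{"fileNo": "333-000001"}]},
    {"accessionNo": "0000000000-25-000001", "formType": "S-4",
     "filedAt": "2025-01-15T17:00:00-05:00", "entities": [{"fileNo": "333-000001"}]},
    {"accessionNo": "0000000000-25-000002", "formType": "S-4/A",
     "filedAt": "2025-02-20T09:30:00-05:00", "entities": [{"fileNo": "333-000001"}]},
    {"accessionNo": "0000000000-25-000009", "formType": "S-4",
     "filedAt": "2025-02-01T12:00:00-05:00", "entities": [{"fileNo": "333-000002"}]},
]
chains = amendment_chains(records)
for rec in chains["333-000001"]:
    print(rec["formType"], rec["accessionNo"])
```

Successive primary documents within one chain can then be compared pairwise, for example with difflib, to surface added or revised disclosure.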

Dataset Access

The Form S-4 Files Dataset is available through three access modes: a JSON index API for programmatic discovery, a full archive download, and per-container monthly downloads. Containers follow a monthly partition pattern of YYYY/YYYY-MM.zip and cover filings from January 1994 to present, with daily updates. Image files are excluded from each container archive.

Dataset Index JSON API: https://api.sec-api.io/datasets/form-s4-files.json

Returns dataset metadata and the full list of monthly ZIP containers with their download URLs, sizes, record counts, and last-updated timestamps. Use this endpoint to monitor which containers changed in the most recent refresh and decide which to re-download. This endpoint does not require an API key.

Example response:

  {
    "datasetId": "1f13365b-9ae0-68e2-a0f4-6e8e71978b6e",
    "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-s4-files.zip",
    "name": "Form S-4 Files Dataset",
    "updatedAt": "2026-04-23T03:02:36.235Z",
    "earliestSampleDate": "1994-01-01",
    "totalRecords": 386971,
    "totalSize": 13581316022,
    "formTypes": ["S-4", "S-4/A"],
    "containerFormat": "ZIP",
    "fileTypes": ["TXT", "JSON", "HTML", "PDF", "XFD"],
    "containers": [
      {
        "downloadUrl": "https://api.sec-api.io/datasets/form-s4-files/2026/2026-04.zip",
        "key": "2026/2026-04.zip",
        "size": 52341890,
        "records": 412,
        "updatedAt": "2026-04-23T03:02:36.235Z"
      }
    ]
  }
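To decide which containers to re-download after a refresh, a client can compare each container's updatedAt against the time of its own last sync. A minimal sketch, assuming the index has already been fetched and decoded into a dict (for example with json.load); the function names, the second container entry, and the last-sync timestamp are illustrative.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse timestamps like '2026-04-23T03:02:36.235Z' into aware datetimes."""
    # Replace the trailing Z so older Python versions accept the string.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def containers_to_refresh(index: dict, last_sync: str):
    """Return keys of containers updated strictly after the last sync time."""
    cutoff = parse_ts(last_sync)
    return [
        c["key"]
        for c in index.get("containers", [])
        if parse_ts(c["updatedAt"]) > cutoff
    ]


# Trimmed, partly made-up index for illustration.
index = {
    "containers": [
        {"key": "2026/2026-04.zip", "updatedAt": "2026-04-23T03:02:36.235Z"},
        {"key": "2026/2026-03.zip", "updatedAt": "2026-04-01T03:01:10.000Z"},
    ]
}
stale = containers_to_refresh(index, "2026-04-20T00:00:00.000Z")
print(stale)
```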

Download Entire Dataset: https://api.sec-api.io/datasets/form-s4-files.zip?token=YOUR_API_KEY

Downloads the complete dataset as a single ZIP archive covering all S-4 and S-4/A filings from 1994 to present. This endpoint requires an API key.

Download Single Container: https://api.sec-api.io/datasets/form-s4-files/2026/2026-04.zip?token=YOUR_API_KEY

Downloads one monthly container ZIP instead of the full archive. Container paths follow the YYYY/YYYY-MM.zip pattern and can be read directly from the downloadUrl field in the index JSON. This endpoint requires an API key.
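When the index is not at hand, container keys for a date range can be generated from the YYYY/YYYY-MM.zip pattern. The helper names below are hypothetical; the base URL and the token query parameter follow the endpoint shown above, and YOUR_API_KEY is a placeholder.

```python
from datetime import date

BASE = "https://api.sec-api.io/datasets/form-s4-files"


def container_keys(start: date, end: date):
    """List monthly container keys from start's month through end's month."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{year:04d}/{year:04d}-{month:02d}.zip")
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return keys


def container_url(key: str, token: str) -> str:
    """Build the download URL for one monthly container."""
    return f"{BASE}/{key}?token={token}"


keys = container_keys(date(2025, 11, 1), date(2026, 2, 15))
urls = [container_url(k, "YOUR_API_KEY") for k in keys]
print(urls[0])
```

Reading downloadUrl from the index JSON remains the more robust option, since it reflects the containers that actually exist.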

Frequently Asked Questions

What forms does this dataset cover?

The dataset covers Form S-4 and Form S-4/A, the Securities Act of 1933 registration statement for business combinations and its pre-effective (or, less commonly, post-effective) amendment variant. The formType field on every metadata.json envelope will be either S-4 or S-4/A.

What does one record in this dataset represent?

One record is a single EDGAR submission of Form S-4 or S-4/A, keyed by its SEC accession number. Each record is a folder containing the primary registration statement, every filed exhibit (merger agreement, charters, legality and tax opinions, material contracts, auditor consents, fairness opinions, filing-fee exhibit), and a metadata.json envelope. The record unit is the filing, not the transaction: a single business combination typically generates one initial S-4 plus multiple S-4/A amendments, each of which is an independent record.

Who files Form S-4?

The filer is the entity issuing securities as consideration in a business combination — typically the acquirer in a stock-for-stock merger, the offeror in a securities exchange offer, or a newly formed holding company in a reorganization. The target is not the filer, even though much of the prospectus describes the target's business and financials. Multi-registrant S-4s are common when subsidiary guarantors co-register debt securities or when a new parent files alongside its merging subsidiaries.

What is the difference between S-4 and F-4?

F-4 is the S-4 equivalent when the issuer of the consideration securities qualifies as a foreign private issuer under Rule 405. The disclosure architecture is parallel, but F-4 permits IFRS financials without U.S. GAAP reconciliation. The trigger is the FPI status of the share issuer, not the target's domicile: a U.S. acquirer issuing shares to a foreign target files S-4, while a foreign acquirer issuing shares to a U.S. target files F-4. F-4 filings are in a separate dataset.

How do S-4/A amendments differ from the original S-4?

Form S-4/A is an amendment to a previously filed S-4, typically filed in response to SEC staff comment letters, updated financial statements, or changed deal terms. The record format is identical to an S-4, and the description field carries an [Amend] marker. All amendments of a given registration share the same entities[].fileNo as the original S-4, so amendment-chain reconstruction joins on file number rather than accession number. Amendments can range from fully restated prospectuses with refreshed exhibit sets to two-file records containing only an updated primary document and metadata.json.

Are image files included in the ZIP archives?

No. Image files (type: GRAPHIC, typically .jpg or .gif, used for signature blocks, organization charts, deal-structure diagrams, and fairness-opinion logos) are enumerated in the documentFormatFiles[] array of metadata.json for completeness but are not placed in the container ZIP. The EDGAR complete-submission .txt bundle is likewise enumerated but not bundled. Consumers that need either category can fetch it directly from EDGAR via the corresponding documentUrl.

Where do I find the merger agreement inside a record?

The definitive merger agreement (or share-exchange agreement) is filed as the EX-2.x exhibit, identified by documentFormatFiles[].type equal to EX-2.1, EX-2.2, and so on. Do not rely on file names — preparer stubs such as d37576dex21.htm or tm2524487d9_ex2-1.htm are not standardized — use the type field in the metadata envelope or the SGML <TYPE> tag inside the document. The exhibit may contain [***] bracketed redaction markers where confidential commercial terms have been redacted under Item 601(b)(2) of Regulation S-K.

How often is the dataset updated, and what time period does it cover?

The dataset is refreshed daily and covers filings from January 1994 (the start of EDGAR's mandatory electronic-filing phase-in) through the present. Recent monthly cadence sits around thirty S-4 and S-4/A filings per month based on November 2025 sampling. Containers are organized as monthly ZIPs in the pattern YYYY/YYYY-MM.zip, so incremental ingestion runs naturally at the monthly container level.