The Form F-4 Files Dataset packages every Form F-4 and Form F-4/A registration statement submitted to EDGAR by foreign private issuers since October 1994. Form F-4 is the registration statement prescribed by 17 CFR 239.34 under the Securities Act of 1933 for foreign private issuers (FPIs) registering securities issued in business-combination transactions: statutory mergers, consolidations, exchange offers, and Rule 145 transactions. One record in the dataset is one EDGAR submission, identified by accession number and delivered as a folder containing the byte-faithful original documents (main registration statement, exhibits, XBRL data files) plus a generated metadata.json sidecar that re-states the EDGAR submission header in structured form. Form F-4/A records are amendments (staff-comment responses, refreshed financials, revised exchange ratios, and pre- or post-effective amendments) and are included alongside initial F-4s, so the dataset captures the complete amendment chain of every F-4 registration on EDGAR. The dataset is distributed as ZIP containers with file types TXT, JSON, HTML, and PDF.
The dataset can be accessed in three ways: the Dataset Index JSON API, which programmatically returns the full list of dataset archive files, their download URLs, and dataset metadata; a single archive file containing the entire dataset; and individual container files (e.g. monthly archives) downloaded one at a time.
The dataset contains every Form F-4 and Form F-4/A registration-statement submission filed on EDGAR from October 1994 forward. Each record is a single accession-numbered submission rather than a single document: the underlying SEC registration statement (assembled by counsel and filed via EDGAR) is delivered alongside the dataset's packaging of that filing — one folder per accession number, byte-faithful documents, and a structured metadata sidecar. The folder name is the eighteen-digit accession number with the dashes stripped — for example, 0001683168-25-008354 becomes 000168316825008354.
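Because the folder name and the metadata.json accessionNo differ only by the dashes, converting between them is a string operation. A minimal Python sketch, assuming only the 10-2-6 digit layout shown in the example above; the helper names are hypothetical and not part of any dataset tooling:

```python
import re

# Hyphenated EDGAR accession numbers: 10 digits, 2 digits, 6 digits.
_ACCESSION_RE = re.compile(r"^(\d{10})-(\d{2})-(\d{6})$")


def accession_to_folder(accession_no: str) -> str:
    """Return the dataset folder name: the accession number with dashes stripped."""
    if not _ACCESSION_RE.match(accession_no):
        raise ValueError(f"not a hyphenated accession number: {accession_no!r}")
    return accession_no.replace("-", "")


def folder_to_accession(folder: str) -> str:
    """Re-insert the dashes so the value matches metadata.json accessionNo."""
    if not (len(folder) == 18 and folder.isdigit()):
        raise ValueError(f"not an 18-digit folder name: {folder!r}")
    return f"{folder[:10]}-{folder[10:12]}-{folder[12:]}"


print(accession_to_folder("0001683168-25-008354"))  # 000168316825008354
print(folder_to_accession("000168316825008354"))    # 0001683168-25-008354
```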
Form F-4 itself is a hybrid registration statement and prospectus. Part I is the prospectus delivered to security holders of the company being acquired; Part II contains supplementary information, undertakings, signatures, and the exhibit list, which are filed with the SEC but not delivered to holders. Because F-4 governs cross-border deals involving foreign issuers, the registrant and target financial statements in the prospectus may be presented in U.S. GAAP or in IFRS as issued by the IASB; statements prepared under any other home-country GAAP must be reconciled to U.S. GAAP. Multi-issuer business combinations frequently produce co-registrant filings under a single accession number: a master file number such as 333-289108 combined with suffixed file numbers 333-289108-01, 333-289108-02 for each co-registrant.
The dataset stores every original document type except graphics. Image files (GRAPHIC documents — JPGs, GIFs, PNGs embedded as logos, signature images, or chart images) are excluded from the bundle but remain referenced by URL inside metadata.json and by inline <IMG SRC="…"> tags inside the HTML. The file types found in the dataset are TXT, JSON, HTML, and PDF, distributed in ZIP containers.
Each accession-number folder contains:
- metadata.json: the structured index of the submission, described in detail below.
- Main registration statement: the document typed F-4 or F-4/A in the EDGAR submission header. In the modern era this is an inline-XBRL-tagged XHTML file, often several megabytes, containing the full prospectus and Part II content.
- Exhibits: EX-2.x, EX-3.x, EX-4.x, EX-5.x, EX-8.x, EX-10.x, EX-21.x, EX-23.x, EX-25.x, EX-99.x, and EX-FILING FEES. Most exhibits are wrapped in EDGAR's SGML <DOCUMENT> envelope around an inner HTML body; the EX-FILING FEES exhibit is itself iXBRL.
- XBRL data files: the taxonomy schema (.xsd) and the calculation, definition, label, and presentation linkbases (_cal.xml, _def.xml, _lab.xml, _pre.xml), plus extracted instance documents (*_htm.xml).
- Graphics (not included): referenced by URL in metadata.json and in the HTML body, but the bytes are absent locally.
HTML/HTM is the dominant format for the main statement and exhibits in the modern era; JSON is the metadata sidecar; TXT covers the legacy ASCII-era filings and the complete-submission text URL listed in metadata.json; PDF appears for occasional supplemental exhibits where issuers were permitted to file in that format. XML files (taxonomy linkbases and extracted XBRL instances) ride alongside as data files and are listed under metadata.json.dataFiles[].
metadata.json shape
The sidecar restates the EDGAR submission header in JSON and adds dataset-level identifiers. Its top-level keys are:
| Key | Type | Role |
|---|---|---|
formType | string | "F-4" or "F-4/A". |
accessionNo | string | Hyphenated EDGAR accession number, e.g. "0001683168-25-008354". |
linkToFilingDetails | string | URL to the primary document on sec.gov, often prefixed with https://www.sec.gov/ix?doc=… for iXBRL. |
description | string | Standard EDGAR description, e.g. "Form F-4/A — Registration of securities, foreign private issuers, business combinations: [Amend]". |
linkToTxt | string | URL to the complete-submission .txt file on EDGAR. |
filedAt | string | ISO-8601 timestamp with offset, e.g. "2025-11-14T07:26:12-05:00". |
documentFormatFiles | array | One entry per non-data document in the submission, including graphics that are not redistributed. |
dataFiles | array | XBRL/data documents (taxonomy linkbases, instance). May be empty for filings that do not ship XBRL. |
entities | array | One entry per filer; multi-filer business combinations produce two or three entries. |
seriesAndClassesContractsInformation | array | Series-and-class contract information; typically empty for F-4. |
linkToHtml | string | URL to the EDGAR filing-index page. |
linkToXbrl | string | URL to a separate XBRL package; commonly empty when XBRL is inline in the main document. |
id | string | 32-character internal record identifier. |
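Loading the sidecar needs nothing beyond the standard library. The sketch below is a hypothetical helper rather than official tooling: it reads a handful of the keys in the table and parses filedAt, which Python's datetime.fromisoformat accepts together with its UTC offset. The sample dict is made-up input trimmed to the keys the function reads.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path


@dataclass
class SubmissionSummary:
    accession_no: str
    form_type: str
    is_amendment: bool
    filed_at: datetime
    n_documents: int
    n_data_files: int
    n_entities: int


def summarize_metadata(meta: dict) -> SubmissionSummary:
    """Pull a few top-level fields out of a parsed metadata.json dict."""
    form_type = meta["formType"]
    if form_type not in ("F-4", "F-4/A"):
        raise ValueError(f"unexpected formType: {form_type!r}")
    return SubmissionSummary(
        accession_no=meta["accessionNo"],
        form_type=form_type,
        is_amendment=(form_type == "F-4/A"),
        # filedAt carries a UTC offset, e.g. "2025-11-14T07:26:12-05:00".
        filed_at=datetime.fromisoformat(meta["filedAt"]),
        n_documents=len(meta.get("documentFormatFiles", [])),
        n_data_files=len(meta.get("dataFiles", [])),  # may be empty without XBRL
        n_entities=len(meta.get("entities", [])),
    )


def load_summary(folder: Path) -> SubmissionSummary:
    """Read <folder>/metadata.json and summarize it."""
    with open(folder / "metadata.json", encoding="utf-8") as fh:
        return summarize_metadata(json.load(fh))


# Made-up input trimmed to the keys summarize_metadata reads.
sample = {
    "formType": "F-4/A",
    "accessionNo": "0001683168-25-008354",
    "filedAt": "2025-11-14T07:26:12-05:00",
    "documentFormatFiles": [{}, {}],
    "dataFiles": [],
    "entities": [{}, {}, {}],
}
print(summarize_metadata(sample))
```

On an extracted record, load_summary(Path("000168316825008354")) would read the real sidecar from that folder instead of the inline sample.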
documentFormatFiles[] and dataFiles[] items
Each item carries:
- sequence: EDGAR sequence number as a string ("1", "2", …). The complete-submission text-file row uses a single space " " for both sequence and type.
- size: byte count of the original document, encoded as a string.
- documentUrl: direct URL to the file on sec.gov.
- description: free-form description from the submission header, often an all-caps phrase (e.g. "AGREEMENT AND PLAN OF MERGER AND REORGANIZATION", "OPINION OF OGIER", "FORM OF PROXY CARD") truncated at roughly 80 characters.
- type: the EDGAR document-type code, one of F-4, F-4/A, EX-2.x, EX-3.x, EX-4.x, EX-5.x, EX-8.x, EX-10.x, EX-21.x, EX-23.x, EX-25.x, EX-99.x, EX-FILING FEES, GRAPHIC, XML, EX-101.SCH/CAL/DEF/LAB/PRE.
A large fraction of documentFormatFiles[] rows carry type: "GRAPHIC". Their bytes are not redistributed in the bundle, but the URL remains valid for re-fetching from EDGAR.
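The conventions above (string-encoded sizes, the single-space complete-submission row, GRAPHIC rows whose bytes are not bundled) are easy to trip over when iterating documentFormatFiles[]. A minimal sketch, assuming every row other than the complete-submission row has a numeric sequence; split_documents is a hypothetical name:

```python
from __future__ import annotations


def split_documents(meta: dict) -> tuple[list[dict], list[str]]:
    """Separate bundled documents from GRAPHIC rows whose bytes must be re-fetched."""
    local: list[dict] = []
    graphics: list[str] = []
    for item in meta.get("documentFormatFiles", []):
        doc_type = item.get("type", "")
        if doc_type.strip() == "":
            # Complete-submission .txt row: sequence and type are a single space.
            continue
        if doc_type == "GRAPHIC":
            graphics.append(item["documentUrl"])  # not in the bundle; fetch from sec.gov
            continue
        local.append({
            "sequence": int(item["sequence"]),
            "type": doc_type,
            "size": int(item["size"]),  # byte count stored as a string
            "description": item.get("description", ""),
            "url": item["documentUrl"],
        })
    return local, graphics
```

Keeping the GRAPHIC URLs in a separate list makes it straightforward to re-fetch images only for the records that actually need them.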
entities[] items
One entry per filer (registrant, co-registrant, subject company). Each entity carries:
- companyName: preserves the EDGAR (Filer) role suffix verbatim (e.g. "YHNA MS I Ltd (Filer)").
- cik: numeric CIK as a string, no zero-padding.
- irsNo: IRS employer identification number; frequently "000000000" for foreign private issuers without a U.S. EIN.
- fileNo: SEC file number, including any co-registrant suffix ("333-289108", "333-289108-01", "333-289108-02").
- filmNo: EDGAR film number assigned at acceptance.
- type: repeats the form type ("F-4" / "F-4/A").
- act: the statute under which the filing is made; "33" (Securities Act of 1933) for F-4.
- sic: SIC code combined with its textual label (e.g. "7371 Services-Computer Programming Services", "2834 Pharmaceutical Preparations"). May be absent for some co-filers.
- stateOfIncorporation: EDGAR state/country code (e.g. E9 Cayman Islands, D8 British Virgin Islands, V8 Switzerland, A1 British Columbia, Z4 Canada (federal level)) preserved as the raw code rather than the human-readable jurisdiction.
- fiscalYearEnd: MMDD string (e.g. "1231", "0731").
- tickers: optional array of trading symbols (e.g. ["VACH", "VACHU", "VACHW"]); often absent for unlisted private targets and shell registrants.
The multi-filer pattern is central to F-4. A typical business combination registration produces two or three entities[] entries that share a master fileNo prefix (e.g. 333-289108) with sequential suffixes, while SIC, ticker, state of incorporation, and fiscal-year-end vary across co-registrants because the entities sit in different industries and jurisdictions.
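Because the sidecar keeps raw header values, downstream code usually derives convenience fields itself. The sketch below is a hypothetical decode_entity helper: it strips only the documented (Filer) suffix, treats "000000000" as "no U.S. EIN", and deliberately leaves stateOfIncorporation as the raw EDGAR code rather than guessing a full lookup table. The CIK in the test data is made up.

```python
from __future__ import annotations

# Only "(Filer)" is documented above; extending this tuple is an assumption
# to revisit if other role suffixes appear in the data.
ROLE_SUFFIXES = ("Filer",)


def decode_entity(entity: dict) -> dict:
    """Derive convenience fields from one entities[] item; raw values stay untouched."""
    name, role = entity["companyName"], None
    for suffix in ROLE_SUFFIXES:
        tag = f" ({suffix})"
        if name.endswith(tag):
            name, role = name[: -len(tag)], suffix
            break

    sic_code = sic_label = None
    if entity.get("sic"):  # absent for some co-filers
        sic_code, _, label = entity["sic"].partition(" ")
        sic_label = label or None

    fye = entity.get("fiscalYearEnd")  # MMDD string, e.g. "0731"
    fiscal_year_end = (int(fye[:2]), int(fye[2:])) if fye else None

    irs_no = entity.get("irsNo", "")
    return {
        "name": name,
        "role": role,
        "cik": int(entity["cik"]),  # unpadded numeric string in the sidecar
        "has_us_ein": bool(irs_no) and irs_no != "000000000",
        "sic_code": sic_code,
        "sic_label": sic_label,
        "fiscal_year_end": fiscal_year_end,  # (month, day)
        "state_code": entity.get("stateOfIncorporation"),  # raw EDGAR code
        "tickers": entity.get("tickers", []),  # optional in the sidecar
    }
```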
The submission opens with EDGAR's SGML header — accession number, submission type, period of report (where applicable), public document count, filer blocks, and exhibit list. The dataset's metadata.json is the structured projection of this header.
The first pages of the main document carry the Form F-4 cover page: the exact name of the registrant as specified in its charter and (where applicable) the English translation; the state or other jurisdiction of incorporation; the IRS employer identification number (often 00-0000000 for foreign issuers); the primary standard industrial classification code; the principal executive office and agent for service of process in the United States; the file number(s) assigned to the registration; the title of each class of securities being registered; the proposed maximum aggregate offering price; for pre-2022 filings, the calculation of the registration fee directly on the cover; and the box-check disclosures (delaying amendment, large accelerated filer status, emerging-growth-company status, and the foreign-private-issuer accommodations).
Part I, the prospectus delivered to security holders, conventionally includes:
Part II contains:
Exhibits follow Form F-4 Item 21 / Regulation S-K Item 601 numbering and appear in the folder as separate .htm files:
- EX-FILING FEES: the filing-fee exhibit, filed as inline XBRL against the ffd taxonomy (xbrl.sec.gov/ffd/…), declaring ffd:SubmissnTp, ffd:FeeExhibitTp, ffd:RegnFileNb, ffd:OfferingTableNa, and the offering-table line items.
The exhibit set scales with deal complexity. SPAC and small-cap business combinations commonly ship only the main statement plus EX-5.1, one or more EX-23 consents, and EX-99.1 (form of proxy card). Operating-issuer combinations involving registered debt ship the full slate, including extensive EX-4 indenture exhibits and the EX-25.1 statement of trustee eligibility.
For filings that ship XBRL, the folder additionally contains the taxonomy schema (.xsd), the calculation linkbase (_cal.xml), the definition linkbase (_def.xml), the label linkbase (_lab.xml), the presentation linkbase (_pre.xml), and the extracted instance documents (*_htm.xml). These are listed under metadata.json.dataFiles[] rather than documentFormatFiles[], which keeps narrative documents and structured-data documents on separate axes of the metadata.
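Since each data file's role is encoded in its filename suffix, dataFiles[] can be grouped without opening the files. A sketch assuming the suffix conventions listed above hold for every file (anything else falls into "other"); xbrl_role and classify_data_files are hypothetical names:

```python
from __future__ import annotations

from pathlib import PurePosixPath
from urllib.parse import urlparse

# File-name suffix -> XBRL role, following the conventions described above.
_SUFFIX_ROLES = (
    ("_htm.xml", "instance"),
    ("_cal.xml", "calculation"),
    ("_def.xml", "definition"),
    ("_lab.xml", "label"),
    ("_pre.xml", "presentation"),
    (".xsd", "schema"),
)


def xbrl_role(filename: str) -> str:
    """Map a data-file name to its XBRL role, case-insensitively."""
    name = filename.lower()
    for suffix, role in _SUFFIX_ROLES:
        if name.endswith(suffix):
            return role
    return "other"


def classify_data_files(meta: dict) -> dict[str, list[str]]:
    """Group metadata.json dataFiles[] URLs by XBRL role."""
    groups: dict[str, list[str]] = {}
    for item in meta.get("dataFiles", []):
        url = item["documentUrl"]
        filename = PurePosixPath(urlparse(url).path).name
        groups.setdefault(xbrl_role(filename), []).append(url)
    return groups
```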
Two distinct on-disk formats coexist within the same modern filing folder:
Pure inline-XBRL XHTML. The main F-4 / F-4/A document and the EX-FILING FEES exhibit begin with an XML prolog (<?xml version='1.0' encoding='ASCII'?>) and an <html> root carrying the XBRL namespace family — ix (inline XBRL 2013), dei (xbrl.sec.gov/dei/…), us-gaap (fasb.org/us-gaap/…), srt, iso4217, an issuer-specific extension taxonomy, and, for fee exhibits, the ffd namespace. The body interleaves the prospectus narrative with <ix:nonNumeric>, <ix:nonFraction>, <ix:header>, and <ix:hidden> tags binding concepts such as dei:AmendmentFlag, dei:EntityCentralIndexKey, us-gaap:CommitmentsAndContingencies, and us-gaap:StockholdersEquity to context references that name reporting periods (e.g. From2025-04-29to2025-06-30). HTML tags are lowercase XHTML.
SGML <DOCUMENT> wrapper around HTML. Most other exhibits (EX-5.x, EX-10.x, EX-23.x, EX-99.x) are stored in EDGAR's submission-file format: a header block of pseudo-tags — <TYPE>, <SEQUENCE>, <FILENAME>, <DESCRIPTION> — terminating with <TEXT>, followed by the body HTML, then </TEXT></DOCUMENT>. The header pseudo-tags are unclosed (this is SGML, not XML). The inner HTML uses uppercase tags (<HTML>, <BODY>, <P>, <TABLE>, <TR>, <TD>) and frequently contains <IMG SRC="image_NNN.jpg"> references that resolve on sec.gov but are absent locally because graphics are stripped from the bundle.
EX-FILING FEES exhibits are commonly produced by the Novaworks Fee Exhibit Editor and carry editor-version metadata in HTML comments such as <!-- Field: Set; Name: Platform; Value: Novaworks Fee Exhibit Editor --> together with an MD5 of the source. Parsers should detect format per file (XML prolog vs. SGML <DOCUMENT> opener) rather than per record, because both dialects coexist inside the same accession-number folder.
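Per-file detection can key on the first bytes of each file. The sketch below is a minimal version built on assumptions: it checks for the XML prolog or the <DOCUMENT> opener (after an optional UTF-8 byte-order mark), and splits an SGML envelope into its unclosed header pseudo-tags and the inner HTML between <TEXT> and </TEXT>. It does not attempt legacy multi-document .txt files, and detect_format and unwrap_sgml are hypothetical names.

```python
from __future__ import annotations

import re

# Unclosed SGML header pseudo-tags, one per line, before <TEXT>.
_SGML_FIELD_RE = re.compile(r"^<(TYPE|SEQUENCE|FILENAME|DESCRIPTION)>(.*)$", re.MULTILINE)
_UTF8_BOM = b"\xef\xbb\xbf"


def detect_format(raw: bytes) -> str:
    """Classify one file by its opening bytes, independent of its folder."""
    if raw.startswith(_UTF8_BOM):
        raw = raw[len(_UTF8_BOM):]
    head = raw.lstrip()[:64].upper()
    if head.startswith(b"<?XML"):
        return "ixbrl-xhtml"  # XML prolog: main statement or fee exhibit
    if head.startswith(b"<DOCUMENT>"):
        return "sgml-wrapped"  # EDGAR submission-file envelope around HTML
    return "other"


def unwrap_sgml(text: str) -> tuple[dict[str, str], str]:
    """Split an SGML <DOCUMENT> envelope into header fields and the inner body."""
    start = text.find("<TEXT>")
    end = text.rfind("</TEXT>")
    if start == -1 or end < start:
        raise ValueError("no <TEXT> ... </TEXT> block found")
    fields = {key: value.strip() for key, value in _SGML_FIELD_RE.findall(text[:start])}
    body = text[start + len("<TEXT>"):end].strip()
    return fields, body
```

Running detect_format on every file in a folder, rather than once per record, matches the mixed-dialect layout described above.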
Included in each record:
- The byte-faithful original documents of the submission: main registration statement, exhibits, and XBRL data files.
- A generated metadata.json sidecar that re-states the EDGAR header, lists every original document (including those not redistributed), and exposes the structured filer/issuer fields.
Excluded from each record:
- Image files (GRAPHIC document type: JPG, GIF, PNG). The metadata still references them by URL, and inline <IMG> tags inside the HTML still point to the original filenames, so they can be retrieved from EDGAR if needed but are not present locally.
- SEC staff comment letters and issuer response letters, which EDGAR disseminates as separate submissions (CORRESP, UPLOAD) and which are not part of the F-4 submission itself.
Form F-4's required content has accumulated several material layers since its introduction:
The dataset spans October 1994 to present, traversing every EDGAR document-format era:
- ASCII era: SGML <DOCUMENT> blocks with plain-ASCII bodies. Tabular financial data is rendered with monospaced columns. No HTML, no XBRL.
- HTML era: <DOCUMENT>-wrapped HTML. No XBRL.
- Separate-XBRL era: XBRL data files (.xsd, _cal.xml, _def.xml, _lab.xml, _pre.xml, and instance .xml files) filed alongside the human-readable HTML.
- Inline XBRL era: XBRL tags embedded directly in the HTML as <ix:…> elements, producing iXBRL XHTML files declared with an XML prolog and many XBRL namespaces. Taxonomy linkbases continue to ride alongside as dataFiles, and extracted instance documents (*_htm.xml) are produced from the inline tags.
- Filing-fee exhibit era: the registration-fee table moves into the EX-FILING FEES exhibit, tagged with the ffd taxonomy, replacing the narrative fee table on the cover page.
Throughout these eras the dataset preserves the original byte content, so a record's encoding reflects the filing-era conventions: 1994 records read as plain ASCII text inside SGML wrappers, 2010-era records read as HTML inside SGML, and modern records read as a mix of pure iXBRL XHTML (main statement + fee exhibit) alongside SGML-wrapped HTML (most other exhibits).
- Amendment chains are keyed by master fileNo (the 333-xxxxxx prefix in entities[].fileNo), not by accession number: each amendment receives its own accession number but reuses the same file number.
- Co-registrant filings carry multiple entities[] entries that share a master fileNo and add suffixes -01, -02. SIC, ticker, state of incorporation, and fiscal-year-end vary across co-registrants because the entities sit in different industries and jurisdictions.
- Graphics are referenced but not bundled: <IMG> tags inside the HTML and documentFormatFiles[] entries with type: "GRAPHIC" point to filenames whose bytes are absent from the bundle. Their documentUrl on sec.gov remains valid for re-fetching when needed.
- In modern filings, XBRL facts are embedded inline in the main document as <ix:…> tags; the *_htm.xml data file in dataFiles[] is the canonical instance document produced from those inline tags.
- Raw EDGAR values are kept: the (Filer) role suffix on companyName, the literal "000000000" IRS number for foreign issuers without a U.S. EIN, and the EDGAR state/country codes (D8, E9, V8, A1, Z4) are preserved verbatim from the submission header rather than normalized to human-readable values.
Each record is one EDGAR submission of Form F-4 (initial registration statement) or Form F-4/A (pre- or post-effective amendment) by a foreign private issuer registering securities to be issued in a business combination, exchange offer, or other Rule 145(a) transaction. Form F-4 is the FPI counterpart to Form S-4.
Eligibility to file is defined by Securities Act Rule 405 and Exchange Act Rule 3b-4(c). An issuer organized outside the United States qualifies as an FPI unless both (i) more than 50 percent of its outstanding voting securities are held of record by U.S. residents, and (ii) any one of these is true: a majority of executive officers or directors are U.S. citizens or residents, more than 50 percent of assets are located in the United States, or the business is administered principally in the United States. FPI status is retested as of the last business day of the second fiscal quarter; an issuer that loses FPI status moves to the domestic regime and would file S-4 instead of F-4.
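The two-prong test reads naturally as a boolean function. This is a simplified sketch for illustration only: it takes the percentages as already computed, ignores the record-holder look-through rules that govern how U.S. holdings are counted, and the FpiFacts fields are hypothetical names rather than terms from the rule.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FpiFacts:
    """Hypothetical inputs, measured as of the last business day of Q2."""
    organized_outside_us: bool
    pct_voting_held_by_us_residents: float  # held of record, 0 to 100
    majority_officers_or_directors_us: bool  # U.S. citizens or residents
    pct_assets_in_us: float  # 0 to 100
    administered_principally_in_us: bool
    is_foreign_government: bool = False


def is_foreign_private_issuer(f: FpiFacts) -> bool:
    """Simplified two-prong test from Rule 405 / Rule 3b-4(c)."""
    if f.is_foreign_government or not f.organized_outside_us:
        return False
    us_ownership_prong = f.pct_voting_held_by_us_residents > 50
    us_contacts_prong = (
        f.majority_officers_or_directors_us
        or f.pct_assets_in_us > 50
        or f.administered_principally_in_us
    )
    # FPI status is lost only when BOTH prongs are met.
    return not (us_ownership_prong and us_contacts_prong)
```

The structure makes the asymmetry visible: majority U.S. ownership alone does not disqualify an issuer unless at least one U.S.-contact condition is also true.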
The F-4 filer is the FPI issuing the securities being registered. In a merger that is typically the acquirer or a parent holding company; in an exchange offer it is the offeror; in a Rule 145(a) reclassification or transfer of assets it is the entity whose securities will be issued to voting holders. The target is generally not the F-4 filer, although its financial statements and MD&A are commonly incorporated into the F-4 prospectus. Foreign governments and political subdivisions are not FPIs and use Schedule B, not F-4.
Form F-4 is required when an FPI offers or sells securities in a transaction subject to registration under Section 5 of the Securities Act of 1933 and the transaction is within the scope of the form. The principal triggers are:
There is no periodic cadence; F-4 is entirely event-driven by the underlying transaction.
The disclosure obligation flows from Section 5 of the Securities Act, with Rule 145 confirming that certain holder-vote transactions are "offers" and "sales" of the new securities. Form F-4 integrates Securities Act registration with Regulation S-K and Regulation S-X disclosure, applied through the FPI financial-statement rules (U.S. GAAP, or IFRS as issued by the IASB without reconciliation; otherwise reconciliation to U.S. GAAP). The F-4 prospectus typically does double duty as the offer-to-exchange document or proxy/information statement delivered to target holders, so its content overlaps with Regulations 14A, 14D, and 14E where those regimes also apply.
The standard sequence:
The original F-4 and its F-4/A amendments share a single registration lineage: the same registrant CIK and the same master file number (333-xxxxxx), with each filing in the chain carrying its own accession number.
Rule 145 was adopted in 1972 (Securities Act Release No. 5316). Form F-4 was adopted in 1985 together with Form S-4, which replaced the prior use of forms such as S-14 for business-combination registrations; it joined the F-series that the 1982 integrated disclosure system had established for foreign private issuers. The dataset's earliest records are from October 1994, reflecting EDGAR phase-in rather than the historical origin of the form; pre-1994 paper F-4 filings are not included.
Form F-4 sits at the crossing of two axes that govern most Securities Act registration choices: (1) issuer status — foreign private issuer versus U.S. domestic registrant, and (2) purpose — business-combination consideration versus general capital raising. Every adjacent form below differs from F-4 along one or both axes. Mapping those axes is the fastest way to know which filings overlap with F-4, which substitute for it, and which travel with it in the same deal.
S-4 is the single closest analogue. Both register securities issued as consideration in mergers, exchange offers, and Rule 145 transactions, and both contain a prospectus with target financials, pro formas, fairness opinions, and merger agreements. The dividing line is Rule 405 FPI status: F-4 is filed only by registrants qualifying as FPIs; S-4 by domestic registrants. This drives accounting (U.S. GAAP, IFRS as issued by the IASB, or home-country GAAP reconciled to U.S. GAAP on F-4; U.S. GAAP throughout on S-4) and the periodic report incorporated by reference as the baseline (Form 20-F vs. Form 10-K). The two datasets together approximate the full universe of registered M&A consideration in the U.S., but they are mutually exclusive per filing and cannot be merged without normalizing for accounting regime.
Form F-1 is the FPI registration statement for offerings not covered by a more specialized form (typically IPOs and follow-ons for cash). Same filer population as F-4 and the same Securities Act mechanics, but no target financials, no pro forma combination, no exchange-ratio mechanics, no Rule 145 framework. An FPI raising primary capital files F-1; the same FPI issuing stock as deal consideration files F-4. They are substitutes only in the rare case where a transaction can be structured either as a primary offering or as a registered exchange.
Form F-3 is the short-form FPI shelf available to seasoned issuers meeting reporting-history and float thresholds. It permits incorporation by reference of Exchange Act filings (notably Form 20-F) and supports continuous or delayed takedowns from a single base prospectus. F-4 is transaction-specific and long-form: even when it incorporates 20-F by reference, it must carry the deal prospectus, target financials, and pro formas that F-3 never includes. F-3 is generally not used for business combinations, although a registered acquirer may fund a cash-and-stock deal via a shelf takedown structured as a primary offering rather than an exchange.
Form S-1 is two steps removed from F-4: domestic issuer and not business-combination specific. It is useful only as a contrast point clarifying that F-4 is doubly specialized — by FPI status and by transactional purpose — whereas S-1 is the unspecialized domestic baseline.
Form 425 is a filing wrapper for written communications relating to a business combination that constitute prospectuses or solicitation material under Rules 165, 166, and 425 (investor presentations, press releases, employee communications, transcripts). It is not a registration statement and never substitutes for F-4. The two are deeply complementary: 425s typically begin before the F-4 is filed and continue through closing, while the F-4 is the formal registration and definitive prospectus. Full deal reconstruction requires pairing them.
When a business combination requires a U.S. shareholder vote, proxy materials enter the picture. A U.S. target whose shareholders must vote files Schedule 14A (PRE 14A, then DEF 14A); a 14C information statement applies when no solicitation occurs. F-4 governs issuance of securities under the Securities Act; 14A/14C governs solicitation of votes under the Exchange Act. The same physical document is routinely filed as a joint proxy statement / prospectus serving both regimes, but the filing identifiers and datasets remain distinct. FPI acquirer-side votes are usually governed by home-country law and rarely produce a 14A on the acquirer side, so 14A overlap is most common on the U.S. target side.
Schedule TO is the Exchange Act framework for tender and exchange offers (TO-T for third-party offers; TO-I for issuer self-tenders). Overlap with F-4 occurs only in exchange offers: when a bidder offers its own securities for target securities, the F-4 registers the consideration securities and the TO discloses offer mechanics, with the F-4 prospectus incorporated into the TO. A pure cash tender offer requires TO only, no F-4. A one-step statutory merger requires F-4 (if stock is consideration) but no TO; instead 14A typically accompanies it. The choice between exchange-offer structure (F-4 + TO) and merger structure (F-4 + 14A) is driven by deal speed, squeeze-out mechanics, and target-shareholder dynamics.
Eligible Canadian issuers in cross-border business combinations may instead register the offered securities under the multijurisdictional disclosure system (MJDS) on Form F-8 or Form F-80, which cover exchange offers and business combinations under different eligibility conditions and permit reduced disclosure. Form F-10 is the corresponding MJDS form for non-business-combination registered offerings. Where the Tier I cross-border exemptions apply (U.S. holders own no more than 10 percent of the subject company's securities), Rule 802 can exempt the offer from Securities Act registration on F-4 altogether, Rules 13e-4(h)(8) and 14d-1(c) provide parallel tender-offer relief, and the deal can proceed under home-country rules with a limited U.S. overlay.
Form 20-F is the Exchange Act annual report for FPIs. It is not a substitute for F-4 but is routinely incorporated by reference to supply registrant historical financials, MD&A, and risk factors. The relationship is hierarchical: 20-F is the periodic baseline; F-4 is the transactional event document that pulls 20-F content forward and layers in target financials, pro formas, and deal-specific items.
F-4/A filings are amendments to previously filed F-4 registration statements and are included in this dataset alongside initial F-4s. Amendments respond to SEC staff comments, refresh stale financials, reflect repricing or revised exchange ratios, or incorporate post-signing changes to the merger agreement. A typical transaction produces one F-4 and several F-4/A filings before effectiveness. They should be treated as a sequenced trail of a single registration, not as duplicates.
The F-4 dataset is defined by the intersection of FPI registrant status and business-combination purpose. It is not interchangeable with F-1 / F-3 (same FPI population, but general capital raising rather than deal consideration), S-4 (same business-combination purpose, but domestic registrants), S-1 (neither axis matches), 20-F (periodic reporting, not Securities Act registration), or 425, 14A/14C, and Schedule TO (companion filings under different statutes that travel with F-4 in the same deal but capture different content). Within its scope, F-4 is the authoritative, prospectus-level Securities Act document for cross-border and FPI-led registered M&A consideration — the only filing carrying the full registrant-and-target financial reconciliation, pro forma combination, and deal-mechanics disclosure for that population.
Form F-4 filings bundle a transaction prospectus, registrant and target financial statements (in U.S. GAAP, IFRS as issued by the IASB, or home-country GAAP reconciled to U.S. GAAP), pro formas, and a full exhibit set. Different professions mine different slices of the same record.
Sell-side bankers and in-house corp dev teams pull deal mechanics from the "Terms of the Transaction" and "The Merger Agreement" sections: exchange ratios, collars, walk-away rights, fiduciary-outs, termination and reverse termination fees, and closing conditions. The background-of-the-merger narrative and fairness-opinion exhibits feed precedent decks and break-fee benchmarking by jurisdiction, deal size, and consideration mix.
Risk-arb analysts size spreads and probability-weight outcomes using exchange ratio mechanics, collar formulas, election and proration rules, and regulatory conditions (antitrust, foreign investment review, sectoral approvals). EX-99 voting and support agreements quantify locked-up target shares. Historical F-4/F-4/A series support backtests of completion rates, time-to-close, and the predictive power of specific deal-protection terms.
Transactional counsel use the dataset as a precedent library for drafting, staff-comment response, and benchmarking. They search risk factors, tax-consequences disclosure, accounting treatment, and appraisal-rights summaries, and reuse EX-5 legality opinions, EX-8 tax opinions, EX-23 auditor consents, and EX-99 voting/support, lock-up, and stockholder agreements. F-4/A revision diffs expose the comment-and-response pattern, letting counsel anticipate review issues on comparable deals.
Equity research models the post-combination entity from pro forma financial statements, segment reconciliations, synergy disclosures, and any U.S. GAAP reconciliation footnotes. Credit analysts pair the same financials with assumed indebtedness, change-of-control covenants in material debt agreements, and any new financing exhibits to reassess pro forma leverage and covenant headroom on both legs of the combination.
Controllers and technical accounting groups at foreign private issuers use the dataset as a working reference for reconciliation of home-country GAAP to U.S. GAAP, purchase-accounting disclosure, and pro forma adjustment practice. They benchmark reconciliation footnotes, goodwill and intangible allocation tables, IFRS 8 vs. ASC 280 segment reporting, and revenue recognition disclosure against peers in the same industry and home jurisdiction.
Compliance and FCPA teams diligence cross-border counterparties using risk factors on bribery, sanctions, and export controls, the legal proceedings section, related-party transactions, and any disclosure of internal investigations or government inquiries. Output feeds onboarding, sanctions screening, and ongoing monitoring of combined entities operating in higher-risk geographies.
Operations teams running exchange-offer mechanics use the prospectus consideration and election sections plus EX-99 forms of letter of transmittal and exchange agent agreements to configure election deadlines, default elections, fractional-share treatment, and tender procedures for certificated and book-entry shares.
Quant teams build historical libraries of cross-border deal terms and outcomes using cover-page fields, EX-FILING FEES exhibits for registered share counts and aggregate transaction value, and any inline XBRL data. Features feed completion classifiers, premium models, and post-merger drift signals conditioned on jurisdiction, consideration mix, and deal-protection strength.
Data engineering teams use metadata.json (accession, filer, form type, exhibit-type tags) for indexing, then run document parsers on prospectus HTML and exhibit text to populate deal-terms, financials, and parties tables. Having every submitted document in one container simplifies snapshot rebuilds and reprocessing when extraction logic changes.
Engineering teams building clause-extraction and precedent-search products use exhibit-level segmentation across EX-5, EX-8, EX-23, and EX-99 merger, voting, support, and registration-rights agreements, plus the prospectus narrative for risk-factor and rationale clause libraries. F-4/A revision diffs provide labeled examples for revision-prediction and comment-response models.
Teams building retrieval systems on SEC content chunk the proxy statement/prospectus by section (summary, risk factors, the merger, material U.S. federal income tax consequences, accounting treatment, comparison of stockholder rights), embed exhibits separately by type, and link registration statements to their amendments via metadata. Mixed HTML/PDF/TXT formats and consistent EDGAR metadata make the corpus suitable for benchmarking parsing accuracy and grounded answers on cross-border M&A.
Finance academics use deal terms and outcomes for premium, completion, and announcement-return studies. Legal scholars analyze deal-protection evolution, fiduciary-out drafting, and forum-selection clauses across F-4/A revisions. Accounting researchers study reconciliation quality and pro forma disclosure. Coverage from 1994 forward supports event-time, calendar-time, and panel designs across jurisdictions and industries.
Deal practitioners extract transaction terms and exhibit precedent; investors price announced deals; accounting and compliance functions handle reconciliation and counterparty risk; data, legal-tech, and research teams build structured products on top of the corpus. Each role keys into a different layer of the same record — cover page and metadata.json, prospectus narrative, pro forma and reconciliation financials, XBRL, or the EX-5/EX-8/EX-23/EX-99/EX-FILING FEES exhibit set — which is why the full document bundle, including F-4/A amendments, is the working unit.
The following workflows show how teams operate on Form F-4 records in practice. Each ties to specific exhibits, sections, or metadata.json fields.
Parse the EX-107 (EX-FILING FEES) iXBRL exhibit to extract ffd:OfferingTableNa line items, registered share counts, aggregate transaction value, fee rate, and offsets. Joined to entities[].sic, entities[].stateOfIncorporation, and filedAt, this yields a panel of cross-border registered M&A volumes by jurisdiction and industry, plus per-deal SEC fee economics for budgeting and pitch decks.
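A hedged sketch of the first step, pulling ffd facts out of the fee exhibit with Python's standard html.parser. It assumes facts are carried in ix:nonNumeric or ix:nonFraction elements with a name attribute, returns the displayed text without applying scale, sign, or format attributes, and keeps contextRef so multi-row offering tables can be regrouped; FfdFactCollector and extract_ffd_facts are hypothetical names.

```python
from __future__ import annotations

from html.parser import HTMLParser


class FfdFactCollector(HTMLParser):
    """Collect inline-XBRL facts whose concept name is in the ffd namespace.

    HTMLParser lowercases tag and attribute names, so ix:nonNumeric arrives
    as "ix:nonnumeric" and contextRef as "contextref".
    """

    FACT_TAGS = {"ix:nonnumeric", "ix:nonfraction"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.facts: list[tuple[str, str, str]] = []  # (concept, contextRef, text)
        self._open: list[tuple[str, str, list[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.FACT_TAGS:
            a = dict(attrs)
            self._open.append((a.get("name") or "", a.get("contextref") or "", []))

    def handle_endtag(self, tag):
        if tag in self.FACT_TAGS and self._open:
            name, ctx, parts = self._open.pop()
            if name.startswith("ffd:"):
                text = " ".join("".join(parts).split())  # collapse whitespace
                self.facts.append((name, ctx, text))

    def handle_data(self, data):
        # Text counts toward every open fact, so nested facts also get their text.
        for _, _, parts in self._open:
            parts.append(data)


def extract_ffd_facts(xhtml: str) -> list[tuple[str, str, str]]:
    parser = FfdFactCollector()
    parser.feed(xhtml)
    parser.close()
    return parser.facts
```

The resulting (concept, contextRef, text) triples can then be joined to entities[].sic, entities[].stateOfIncorporation, and filedAt as described above.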
Parse the "Terms of the Transaction," "The Merger Agreement," and consideration-election sections of the main F-4 document for exchange ratios, fixed/floating collars, walk-away thresholds, election and proration mechanics, termination fees, and regulatory closing conditions. Combined with EX-99 voting and support agreements (locked-up share counts), these features feed spread-sizing models, completion-probability classifiers, and time-to-close estimators.
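As an illustration of the spread arithmetic, the sketch below assumes one common collar design: a fixed value per target share delivered through a floating exchange ratio that freezes at the collar bounds. Actual collars vary by merger agreement, so the structure, the function names, and the numbers in the usage lines are assumptions, not terms taken from any filing.

```python
from __future__ import annotations


def collar_exchange_ratio(acquirer_price: float, value_per_share: float,
                          lower: float, upper: float) -> float:
    """Floating ratio that delivers value_per_share while the acquirer price is
    inside [lower, upper]; outside the collar the ratio freezes at the bound."""
    if not 0 < lower <= upper:
        raise ValueError("collar bounds must satisfy 0 < lower <= upper")
    clamped = min(max(acquirer_price, lower), upper)
    return value_per_share / clamped


def gross_spread(target_price: float, acquirer_price: float,
                 ratio: float) -> tuple[float, float]:
    """Implied deal value per target share minus the target's market price,
    returned as (absolute spread, spread as a fraction of the target price)."""
    deal_value = ratio * acquirer_price
    spread = deal_value - target_price
    return spread, spread / target_price


# Hypothetical deal: $10.00 of acquirer stock per target share, $16-$24 collar.
ratio = collar_exchange_ratio(acquirer_price=20.0, value_per_share=10.0, lower=16.0, upper=24.0)
print(ratio)  # 0.5
print(gross_spread(9.5, 20.0, ratio))
```

Outside the collar the ratio stops floating, so target holders bear acquirer price moves again; that is why collar bounds matter to the spread-sizing and completion features above.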
Group records by master fileNo (the 333-xxxxxx prefix in entities[].fileNo) to assemble the F-4 / F-4/A amendment chain for a single registration, then diff successive prospectus and exhibit text to surface the staff-comment response pattern: added risk factors, refreshed financials, revised tax-consequences language, and reworked exchange-ratio mechanics. Output supports benchmarking of likely review issues on comparable in-flight deals and labeled training data for revision-prediction models.
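Assembling amendment chains is a group-and-sort over metadata.json files. This sketch assumes records are unpacked as one folder per accession number, each containing metadata.json, under a single root directory; that filedAt values share one ISO 8601 format so they sort correctly as strings; and that the master fileNo is the first two dash-separated parts of entities[].fileNo (so 333-123456-01 maps to 333-123456). Extracting text from the prospectus is left out, and difflib supplies the line diff.

```python
import difflib
import json
from collections import defaultdict
from pathlib import Path


def master_file_no(file_no):
    """'333-123456-01' -> '333-123456' (drop the co-registrant suffix)."""
    return "-".join(file_no.split("-")[:2])


def amendment_chains(dataset_root):
    """Map each master fileNo to its accession folders, ordered by filedAt."""
    chains = defaultdict(list)
    for meta_path in Path(dataset_root).glob("*/metadata.json"):
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        masters = {
            master_file_no(e["fileNo"])
            for e in meta.get("entities", [])
            if e.get("fileNo")
        }
        for master in masters:
            chains[master].append((meta.get("filedAt", ""), meta_path.parent.name))
    # Sort by filedAt string, breaking ties by accession folder name.
    return {m: [acc for _, acc in sorted(recs)] for m, recs in chains.items()}


def diff_versions(old_text, new_text, old_label="previous", new_label="amendment"):
    """Unified line diff between two successive versions of a document."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )
```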
Extract the audited financial statements, reconciliation footnotes, and pro forma combined statements from Part I of the prospectus, segmented by registrant and target headings. Filtering these by entities[].sic and stateOfIncorporation builds a peer-keyed reference set of reconciliation entries, purchase-price allocation tables, IFRS 8 vs. ASC 280 segment mappings, and pro forma adjustment practice for use by technical accounting teams and external reporters.
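A peer filter over metadata.json entities might look like the following. It assumes each record is available as an (accession folder, parsed metadata) pair, compares SIC codes as strings because their stored type is not specified here, and matches stateOfIncorporation verbatim against EDGAR's two-character state or country codes. The pro forma heading pattern is a hypothetical starting point for a section locator; heading wording varies across filings.

```python
import re

# Hypothetical heading wording; adjust per filing.
PRO_FORMA_HEADING = re.compile(
    r"^\s*UNAUDITED PRO FORMA CONDENSED COMBINED", re.IGNORECASE | re.MULTILINE
)


def peer_accessions(records, sic=None, state=None):
    """Return accession folders with at least one entity matching sic and/or state."""
    hits = []
    for accession, meta in records:
        for entity in meta.get("entities", []):
            if sic is not None and str(entity.get("sic", "")) != str(sic):
                continue
            if state is not None and entity.get("stateOfIncorporation") != state:
                continue
            hits.append(accession)
            break  # one matching entity is enough for this record
    return hits


def pro_forma_offsets(text):
    """Character offsets where a pro forma heading line begins."""
    return [m.start() for m in PRO_FORMA_HEADING.finditer(text)]
```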
Index EX-5.x legality opinions and EX-8.x tax opinions by issuing firm (Conyers Dill & Pearman, Ogier, Maples and Calder, Walkers, U.S. counsel), registrant stateOfIncorporation, and deal structure (Section 368 reorganization, scheme of arrangement, statutory merger). Counsel use the resulting precedent search to draft opinions for Cayman, BVI, Bermuda, and Delaware sub-issuer structures and to benchmark assumption and qualification language across firms.
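An opinion precedent index can start from exhibit type, a firm-name match, and a structure keyword. In this sketch the firm list is taken from the examples above, the structure keywords are illustrative, and the input is assumed to be (accession, document type, extracted text, state of incorporation) tuples assembled elsewhere. Plain substring matching stands in for proper entity recognition, so short names such as Ogier can produce false positives.

```python
import re
from collections import defaultdict

FIRMS = ["Conyers Dill & Pearman", "Ogier", "Maples and Calder", "Walkers"]
STRUCTURES = {  # illustrative keyword -> label map
    "section 368": "Section 368 reorganization",
    "scheme of arrangement": "scheme of arrangement",
    "statutory merger": "statutory merger",
}


def opinion_family(doc_type):
    """Map 'EX-5.1' -> 'EX-5' and 'EX-8.2' -> 'EX-8'; None for other exhibit types."""
    match = re.match(r"(EX-[58])(?:\.\d+)?$", doc_type.strip().upper())
    return match.group(1) if match else None


def index_opinions(exhibits):
    """Index (accession, doc_type, text, state) tuples by (family, firm, state, structure)."""
    index = defaultdict(list)
    for accession, doc_type, text, state in exhibits:
        family = opinion_family(doc_type)
        if family is None:
            continue  # not a legality or tax opinion
        lowered = text.lower()
        firms = [f for f in FIRMS if f.lower() in lowered] or ["(unmatched)"]
        structures = [lbl for key, lbl in STRUCTURES.items() if key in lowered] or ["(unclassified)"]
        for firm in firms:
            for structure in structures:
                index[(family, firm, state, structure)].append(accession)
    return dict(index)
```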
Chunk the main iXBRL XHTML prospectus by canonical section (Q&A, summary, risk factors, background of the merger, material U.S. federal income tax consequences, accounting treatment, comparison of rights of security holders, MD&A) and embed each exhibit type (EX-2, EX-5, EX-8, EX-10, EX-23, EX-99) as its own document class. Records are linked to their amendments through the master fileNo, producing a grounded retrieval corpus for cross-border M&A question-answering and a benchmark for parsing iXBRL XHTML alongside SGML-wrapped HTML in the same record.
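Section chunking can be approximated by splitting extracted text at lines that begin with a canonical heading. This heuristic sketch assumes the HTML or iXBRL XHTML has already been converted to plain text with headings on their own lines. It will also split on table-of-contents entries and on short body lines that happen to begin with a title such as "Summary", so a production pipeline would likely need layout cues as well.

```python
# Canonical titles from the workflow above; real headings vary in wording.
SECTIONS = [
    "questions and answers",
    "summary",
    "risk factors",
    "background of the merger",
    "material u.s. federal income tax consequences",
    "accounting treatment",
    "comparison of rights of security holders",
    "management's discussion and analysis",
]


def chunk_by_section(text):
    """Split plain text into (section, body) chunks at short lines starting with a title."""
    chunks, current, buf = [], "front matter", []
    for line in text.splitlines():
        # Normalize case and curly apostrophes before comparing with the titles.
        norm = line.strip().lower().replace("\u2019", "'")
        match = next((s for s in SECTIONS if norm.startswith(s)), None)
        if match and len(norm) < 120:  # long lines are treated as body text
            if buf:
                chunks.append((current, "\n".join(buf).strip()))
            current, buf = match, []
        else:
            buf.append(line)
    if buf:
        chunks.append((current, "\n".join(buf).strip()))
    return chunks
```

Each chunk can then be embedded with its section label as metadata, while exhibits are embedded separately under their own document class.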
Combine EX-21.x subsidiary lists with the multi-entity entities[] array (master fileNo plus -01, -02 suffixes, divergent SIC and stateOfIncorporation) to build a deal-time corporate graph linking the registrant, target, surviving entity, and named subsidiaries by jurisdiction. Output supports counterparty-risk diligence, sanctions and FCPA screening of combined entities, and post-close entity-master maintenance at data vendors.
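A deal-time entity graph can be built from the entities[] array plus subsidiary rows. The sketch assumes EX-21 rows are plain text with the name and jurisdiction separated by a tab or by two or more spaces (a hypothetical layout, since many EX-21 exhibits are HTML tables), attaches subsidiaries to the first master fileNo found, and uses only the fileNo, sic, and stateOfIncorporation fields named above.

```python
import re
from collections import defaultdict


def parse_ex21_rows(text):
    """Parse hypothetical 'Name<TAB or 2+ spaces>Jurisdiction' rows from EX-21 text."""
    rows = []
    for line in text.splitlines():
        cells = re.split(r"\t|\s{2,}", line.strip())
        if len(cells) >= 2:
            rows.append((cells[0], cells[-1]))
    return rows


def entity_graph(entities, subsidiaries=()):
    """Return (nodes, edges): node attributes by id, and parent id -> set of child ids."""
    nodes, edges, masters = {}, defaultdict(set), []
    for entity in entities:
        file_no = entity.get("fileNo")
        if not file_no:
            continue
        master = "-".join(file_no.split("-")[:2])  # 333-123456-01 -> 333-123456
        if master not in masters:
            masters.append(master)
        nodes.setdefault(master, {"kind": "registration", "state": None})
        nodes[file_no] = {
            "kind": "entity",
            "sic": entity.get("sic"),
            "state": entity.get("stateOfIncorporation"),
        }
        if file_no != master:
            edges[master].add(file_no)
    if masters:
        for name, jurisdiction in subsidiaries:
            sub_id = f"sub:{name}"
            nodes[sub_id] = {"kind": "subsidiary", "state": jurisdiction}
            edges[masters[0]].add(sub_id)  # assumption: EX-21 belongs to the first registrant
    return nodes, dict(edges)


def by_jurisdiction(nodes):
    """Group node ids by state or country of incorporation."""
    groups = defaultdict(list)
    for node_id, attrs in nodes.items():
        if attrs.get("state"):
            groups[attrs["state"]].append(node_id)
    return dict(groups)
```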
Dataset Index JSON API: https://api.sec-api.io/datasets/form-f4-files.json
This endpoint returns the dataset's metadata, including its name, description, last updated timestamp, earliest sample date, total record and size counts, covered form types (F-4, F-4/A), container format (ZIP), and contained file types (TXT, JSON, HTML, PDF). The response also includes the full dataset download URL and a list of all individual container files with per-container size, record count, last updated timestamp, and download URL. This endpoint can be polled daily to identify which containers were updated in the most recent refresh, allowing incremental downloads instead of re-fetching the full archive. No API key is required to access this endpoint.
{
  "datasetId": "1f13365b-9ae0-692c-99b0-82ddaf21130b",
  "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-f4-files.zip",
  "name": "Form F-4 Files Dataset",
  "updatedAt": "2026-04-24T03:02:20.356Z",
  "earliestSampleDate": "1994-10-01",
  "totalRecords": 33850,
  "totalSize": 1923243320,
  "formTypes": ["F-4", "F-4/A"],
  "containerFormat": "ZIP",
  "fileTypes": ["TXT", "JSON", "HTML", "PDF"],
  "containers": [
    {
      "downloadUrl": "https://api.sec-api.io/datasets/form-f4-files/2026/2026-04.zip",
      "key": "2026/2026-04.zip",
      "size": 13818783,
      "records": 154,
      "updatedAt": "2026-04-24T03:02:20.356Z"
    }
  ]
}
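The daily polling pattern can be sketched as a comparison of each container's updatedAt against the value saved from the previous run. The fetch uses only the standard library; how the previous snapshot is persisted (a JSON file, a database row) is left open, and the function names are illustrative.

```python
import json
import urllib.request

INDEX_URL = "https://api.sec-api.io/datasets/form-f4-files.json"


def fetch_index(url=INDEX_URL):
    """Download and parse the dataset index (no API key needed)."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


def changed_containers(index, last_seen):
    """Containers whose updatedAt differs from the previous poll (or are new).

    last_seen maps container key -> updatedAt string recorded on the prior run.
    """
    return [
        c for c in index.get("containers", [])
        if last_seen.get(c["key"]) != c["updatedAt"]
    ]


def snapshot(index):
    """State to persist for the next poll."""
    return {c["key"]: c["updatedAt"] for c in index.get("containers", [])}
```

On each run, download every container returned by changed_containers, then store snapshot(index) for the next comparison. Comparing for inequality rather than ordering also catches containers that appear for the first time.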
Download Entire Dataset: https://api.sec-api.io/datasets/form-f4-files.zip?token=YOUR_API_KEY
Downloads the complete Form F-4 Files dataset as a single ZIP archive covering all filings from October 1994 to the most recent refresh. This endpoint requires an API key.
Download Single Container: https://api.sec-api.io/datasets/form-f4-files/2026/2026-04.zip?token=YOUR_API_KEY
Downloads one monthly container file rather than the full archive, which is useful for retrieving only newly added or updated filings identified through the dataset index. This endpoint requires an API key.
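Both download endpoints take the API key as a token query parameter. A minimal standard-library sketch follows; the SEC_API_KEY environment variable name is an assumption, and the response is streamed to disk because the full archive is large (the sample index reports a totalSize of about 1.9 GB).

```python
import os
import shutil
import urllib.parse
import urllib.request


def with_token(download_url, token):
    """Append token=<key> to a download URL, preserving any existing query."""
    parts = urllib.parse.urlsplit(download_url)
    query = urllib.parse.parse_qsl(parts.query)
    query.append(("token", token))
    return urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))


def download(download_url, dest_path, token=None):
    """Stream a container or the full archive to dest_path."""
    token = token or os.environ["SEC_API_KEY"]  # hypothetical variable name
    url = with_token(download_url, token)
    with urllib.request.urlopen(url, timeout=300) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)  # copy in chunks instead of buffering in memory
    return dest_path
```

Reading the key from the environment keeps it out of source files and logs of the code itself.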
The dataset covers Form F-4 (initial registration statement) and Form F-4/A (pre- or post-effective amendment) filings submitted to EDGAR. F-4 is the registration statement prescribed by 17 CFR 239.34 under the Securities Act of 1933 for foreign private issuers registering securities issued in business-combination transactions.
One record is a single Form F-4 or Form F-4/A registration-statement submission, identified by its EDGAR accession number and packaged as one folder on disk. The folder contains the byte-faithful original-submission documents (main registration statement, every exhibit, and any XBRL data files) together with a generated metadata.json sidecar that re-states the EDGAR submission header and indexes every document the filer originally transmitted.
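Records can be read straight from a container ZIP with zipfile and json. The sketch assumes each record's files sit directly inside a folder named for the accession number (possibly under a parent prefix) and that the folder contains metadata.json. The helpers convert between the dashed accession number and the folder name by stripping or re-inserting the dashes (0001683168-25-008354 becomes 000168316825008354).

```python
import json
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def folder_name(accession_no):
    """'0001683168-25-008354' -> '000168316825008354'."""
    return accession_no.replace("-", "")


def accession_from_folder(folder):
    """Inverse of folder_name: re-insert dashes after 10 and 12 digits."""
    return f"{folder[:10]}-{folder[10:12]}-{folder[12:]}"


def iter_records(zip_source):
    """Yield (folder, metadata dict, other member names) for each record in a container."""
    with zipfile.ZipFile(zip_source) as zf:
        by_folder = defaultdict(list)
        for name in zf.namelist():
            if not name.endswith("/"):  # skip directory entries
                by_folder[PurePosixPath(name).parent].append(name)
        for folder, members in by_folder.items():
            meta_name = str(folder / "metadata.json")
            if meta_name in members:
                meta = json.loads(zf.read(meta_name))
                yield folder.name, meta, [m for m in members if m != meta_name]
```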
A foreign private issuer (as defined in Securities Act Rule 405 and Exchange Act Rule 3b-4(c)) issuing securities in a business combination, exchange offer, or other Rule 145(a) transaction must file Form F-4 to register those securities under Section 5 of the Securities Act. The target company is generally not the F-4 filer, although its financial statements and MD&A are commonly incorporated into the F-4 prospectus.
S-4 is the domestic counterpart to F-4, and the two forms are substantively parallel: both register securities issued as consideration in mergers, exchange offers, and Rule 145 transactions. The dividing line is Rule 405 FPI status. F-4 is available only to foreign private issuers, while S-4 is used by domestic registrants and by foreign issuers that do not qualify as FPIs. That status drives the accounting basis (on F-4, U.S. GAAP, IFRS as issued by the IASB, or home-country GAAP reconciled to U.S. GAAP; on S-4, U.S. GAAP throughout) and the incorporation-by-reference baseline (Form 20-F versus Form 10-K).
The file types found in the dataset are TXT, JSON, HTML, and PDF, packaged in ZIP containers. HTML/HTM is the dominant format for the main statement and exhibits in the modern era; JSON is the metadata sidecar; TXT covers legacy ASCII-era filings and the complete-submission text URL; PDF appears for occasional supplemental exhibits. XML files (XBRL taxonomy linkbases and extracted instances) ride alongside as data files listed under metadata.json.dataFiles[].
The dataset spans October 1994 to the present. The 1994 start date reflects the EDGAR phase-in rather than the historical origin of the form (Form F-4 was adopted in 1985); pre-1994 paper F-4 filings are not included.
Image files are not included. Documents of the GRAPHIC type (JPG, GIF, PNG) are excluded from each record. Their URLs and filenames remain visible in metadata.json and inside the HTML body through inline <IMG SRC="…"> tags, so they can be re-fetched from sec.gov if needed, but their bytes are not stored locally.
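Because image bytes are omitted, a pipeline that needs them has to rebuild their sec.gov URLs. This sketch assumes relative SRC values resolve against the filing's EDGAR archive folder (https://www.sec.gov/Archives/edgar/data/<CIK>/<accession without dashes>/) and takes the CIK as a parameter, since the metadata.json field that holds it is not named here. When fetching, SEC asks automated clients to send a descriptive User-Agent header and to stay within its fair-access rate limits.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag (tag and attribute names are lowercased)."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def image_urls(html, cik, accession_no):
    """Resolve <IMG SRC> values against the filing's EDGAR archive folder."""
    base = (
        "https://www.sec.gov/Archives/edgar/data/"
        f"{int(cik)}/{accession_no.replace('-', '')}/"
    )
    parser = ImgSrcParser()
    parser.feed(html)
    parser.close()
    # Absolute URLs pass through urljoin unchanged; relative names are resolved.
    return [urljoin(base, src) for src in parser.srcs]
```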