The Form F-1 Files Dataset is the corpus of foreign private issuer (FPI) Securities Act registration statements filed with the SEC on Form F-1 and its pre- or post-effectiveness amendments on Form F-1/A. Each record is one complete EDGAR submission keyed by a single SEC accession number, materialized as one folder containing a metadata.json submission header and every text, HTML, XML, or PDF document attached to that submission, with image binaries omitted by design. Filers are non-U.S. operating companies, holding companies, and other commercial entities organized outside the United States that qualify as foreign private issuers under Rule 405 of the Securities Act and that are registering securities for offer or sale in the U.S. market under Section 5 of the Securities Act of 1933. The dataset begins in June 1996, when EDGAR electronic filing had become mandatory for most domestic registrants (mandatory EDGAR filing for foreign private issuers followed in May 2002), and is delivered as monthly ZIP containers covering both initial F-1 registrations and the F-1/A amendment chains they generate during SEC review.
Form F-1 is the registration statement prescribed under the Securities Act of 1933 for foreign private issuers for which no other Securities Act form (F-3, F-4, F-7, F-8, F-10) is authorized or available. It is the foreign-issuer counterpart to the domestic Form S-1 and is most commonly used by non-U.S. companies undertaking an initial public offering of equity securities in the United States, but is also used for follow-on offerings, ADR programs that require full registration, resale registrations on behalf of selling shareholders, and registrations of debt and convertible instruments. The form is filed under the Securities Act of 1933 (act = "33" in the EDGAR header) and is assigned a 333- series file number that carries through every subsequent Form F-1/A amendment in the same registration.
Each record corresponds 1:1 to a row of the EDGAR full-submission index for form types F-1 and F-1/A, not to a per-issuer or per-offering aggregation. A single offering frequently produces many records over its lifecycle: one initial F-1 followed by a sequence of F-1/A amendments, all sharing the same 333-XXXXXX SEC file number but each with a distinct accession number, and each appearing as its own record. The substantive content is governed by the items prescribed in Form F-1 itself, which in turn incorporate by reference the disclosure schedule of Form 20-F. The bulk of the registrant's narrative and financial disclosure is therefore organized around the Form 20-F item numbering (Items 3 through 19), wrapped inside an offering-specific prospectus that addresses the offering mechanics dictated by the Securities Act and Regulation S-K Item 501 et seq. The financial statements must be prepared under U.S. GAAP, IFRS as issued by the IASB, or home-country GAAP reconciled to U.S. GAAP, with an auditor's report from a PCAOB-registered public accounting firm.
The dataset is distributed in monthly ZIP containers. File types found inside the containers are HTML/HTM, JSON (the metadata file), TXT (full-submission bundles), and PDF (occasional exhibit attachments). XML and XSD artifacts referenced in dataFiles[] are sometimes present locally and sometimes only reachable via the SEC URL recorded in the metadata.
A single record is materialized as one folder whose name is the 18-digit, zero-padded form of the SEC accession number with the two hyphens stripped (e.g., accession 0001104659-25-114804 becomes folder 000110465925114804). Inside the folder sit a metadata.json capturing the EDGAR submission header and every text/HTML/XML document that was part of the original submission, with image binaries omitted. The canonical dashed accession number is preserved inside metadata.json under accessionNo.
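Because the folder name and accessionNo differ only by the two hyphens, the mapping can be done with plain string operations. A minimal Python sketch (the function names are illustrative, not part of any dataset tooling):

```python
import re

# Dashed form stored in metadata.json under accessionNo: NNNNNNNNNN-YY-NNNNNN
_DASHED = re.compile(r"^(\d{10})-(\d{2})-(\d{6})$")


def accession_to_folder(accession_no: str) -> str:
    """Strip the two hyphens to get the 18-digit folder name."""
    if not _DASHED.match(accession_no):
        raise ValueError(f"not a dashed accession number: {accession_no!r}")
    return accession_no.replace("-", "")


def folder_to_accession(folder: str) -> str:
    """Re-insert the hyphens after digit 10 and digit 12."""
    if not (len(folder) == 18 and folder.isdigit()):
        raise ValueError(f"not an 18-digit accession folder: {folder!r}")
    return f"{folder[:10]}-{folder[10:12]}-{folder[12:]}"


print(accession_to_folder("0001104659-25-114804"))  # 000110465925114804
```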
Each record stacks two layers:
- a submission-level metadata layer (metadata.json) that mirrors the EDGAR SGML header and enumerates every document attached to the accession;
- a document layer made up of the submission's files themselves, each wrapped in its own EDGAR SGML <DOCUMENT> envelope.

Beneath those layers sit the analytic sub-structures that matter for interpretation: the prospectus narrative inside the main F-1 document, the Inline XBRL fact stream for filing-fee disclosures (and increasingly for cover-page tagging), the legal/tax/auditor opinion exhibits, the material-contract exhibits, and the filing-fee XML data files.
metadata.json captures the parsed EDGAR submission header. Top-level keys describe the filing as a whole:
- formType: "F-1" or "F-1/A".
- accessionNo: canonical dashed accession number (NNNNNNNNNN-YY-NNNNNN).
- description: human-readable form description (e.g., "Form F-1 - Registration statement for certain foreign private issuers", with [Amend] appended on amendments).
- filedAt: ISO-8601 timestamp with timezone offset (e.g., "2025-11-28T17:24:18-05:00").
- linkToFilingDetails: absolute URL to the primary F-1 document.
- linkToHtml: URL of the EDGAR -index.htm page.
- linkToTxt: URL of the full-submission .txt bundle.
- linkToXbrl: URL of the XBRL viewer; consistently empty for F-1 records even when XBRL is present, so XBRL discovery must rely on dataFiles[].
- id: opaque 32-character hex identifier (sec-api internal record id).

The header carries an entities[] array, which for F-1 is typically a single (Filer) element representing the foreign private issuer. Per-entity fields include:
- cik: issuer CIK without leading zeros.
- companyName: company name with role suffix in parentheses (e.g., "SU Group Holdings Ltd (Filer)").
- type: repeats the form type for that entity.
- act: the Act under which the entity files ("33", the Securities Act of 1933, for F-1).
- fileNo: 333-XXXXXX registration file number.
- filmNo: EDGAR film number.
- irsNo: IRS Employer Identification Number (uniformly "000000000" for foreign issuers without a U.S. EIN).
- fiscalYearEnd: "MMDD" four-digit string; sometimes absent on F-1/A amendment entities.
- stateOfIncorporation: EDGAR jurisdiction code (e.g., E9 Cayman Islands, D8 British Virgin Islands, G7 Denmark, K3 Hong Kong).
- sic: SIC code with description; ampersands appear HTML-entity encoded (&amp;).
- tickers[]: array of ticker symbols associated with the filer; may be empty.

The seriesAndClassesContractsInformation array is almost always empty for F-1 records, because that array is reserved for investment-company filings.
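A minimal sketch of reading the header and flattening the filer entity. It assumes the folder layout described above, treats the first entities[] element as the filer, and uses html.unescape to decode the entity-encoded sic string; the helper names and output keys are hypothetical.

```python
import html
import json
from pathlib import Path


def load_metadata(folder: Path) -> dict:
    """Read the metadata.json submission header from one accession folder."""
    with open(folder / "metadata.json", encoding="utf-8") as fh:
        return json.load(fh)


def filer_summary(meta: dict) -> dict:
    """Flatten a few top-level keys plus the first (Filer) entity.

    Optional fields such as fiscalYearEnd or tickers may be missing,
    so .get() is used for every per-entity field.
    """
    entity = meta["entities"][0]
    return {
        "accessionNo": meta["accessionNo"],
        "formType": meta["formType"],
        "filedAt": meta["filedAt"],
        "cik": entity.get("cik"),
        "fileNo": entity.get("fileNo"),
        "jurisdiction": entity.get("stateOfIncorporation"),
        "fiscalYearEnd": entity.get("fiscalYearEnd"),
        # sic may contain entities such as &amp;; decode for plain-text use
        "sic": html.unescape(entity.get("sic", "")),
        "tickers": entity.get("tickers", []),
        # "000000000" is a placeholder for issuers without a U.S. EIN
        "has_us_ein": entity.get("irsNo", "000000000") != "000000000",
    }
```

For example, `filer_summary(load_metadata(Path("000110465925114804")))` would summarize one accession folder, assuming that folder sits in the working directory.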
The documentFormatFiles[] array enumerates the primary documents in the submission in EDGAR sequence order. Each entry carries:
- sequence: numeric string ("1", "2", …); a single space " " is reserved for the trailing complete-submission .txt bundle row.
- size: byte count as a string.
- documentUrl: absolute SEC URL; Inline XBRL documents are exposed through the https://www.sec.gov/ix?doc=/Archives/... viewer rewriter.
- description: free-form text such as "F-1", "EXHIBIT 10.15", "FILING FEE IXBRL", "GRAPHIC", or "Complete submission text file".
- type: EDGAR document-type code drawn from a vocabulary that includes F-1, F-1/A, EX-1.1, EX-3.1, EX-4.x, EX-5.1, EX-8.1, EX-10.x, EX-16.1, EX-21.1, EX-23.1, EX-24.1, EX-99.x, EX-FILING FEES, GRAPHIC, etc.

The parallel dataFiles[] array uses the same shape but carries XBRL/XML auxiliary artifacts, with type values such as XML, EX-101.SCH, EX-101.CAL, EX-101.DEF, EX-101.LAB, and EX-101.PRE. For thin F-1/A amendments, dataFiles[] is frequently empty.
Each document referenced in documentFormatFiles[] is materialized as a file in the accession folder, with two structural exceptions: GRAPHIC entries (image binaries) are intentionally excluded from the dataset, and some dataFiles[] artifacts (the XBRL schema/calc/def/lab/pre set) may be referenced only by URL rather than stored locally. File names are filer-supplied and not standardized; common patterns include filing-agent ticket prefixes (e.g., tm2525392, mask, g085003, ea0267566) followed by a body-type suffix such as _f1, _f1a, formf-1, or formf-1a for the main registration statement, and ex<schedule>-<number> (e.g., ex5-1, ex10-15, ex21-1, ex23-1, ex99-1) for exhibits. The filing-fee exhibit appears under names like ex107, ex-fee, or ex-filingfees.
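Because GRAPHIC rows and the trailing bundle row are listed but not necessarily stored, it helps to reconcile documentFormatFiles[] against what is actually on disk. The sketch below assumes that a document's local file name equals the last path segment of its documentUrl, after undoing the ix?doc= viewer rewrite; that assumption and the helper names are illustrative.

```python
from pathlib import Path
from urllib.parse import parse_qs, urlsplit

SEC = "https://www.sec.gov"


def raw_document_url(document_url: str) -> str:
    """Undo the ix?doc= viewer rewrite so the URL points at the raw file."""
    parts = urlsplit(document_url)
    if parts.path == "/ix":
        doc = parse_qs(parts.query).get("doc", [""])[0]
        if doc:
            return SEC + doc
    return document_url


def local_documents(meta: dict, folder: Path) -> list[tuple[str, str, bool]]:
    """Return (type, file name, present locally) for each stored document.

    Skips GRAPHIC rows (image binaries are excluded from the dataset) and
    the complete-submission row, whose sequence is a single space.
    """
    rows = []
    for doc in meta.get("documentFormatFiles", []):
        if doc.get("type") == "GRAPHIC" or doc.get("sequence", "").strip() == "":
            continue
        name = raw_document_url(doc["documentUrl"]).rsplit("/", 1)[-1]
        rows.append((doc.get("type", ""), name, (folder / name).exists()))
    return rows
```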
Every document — main F-1 included — begins with the EDGAR SGML <DOCUMENT> envelope: five unquoted header tags (TYPE, SEQUENCE, FILENAME, DESCRIPTION, TEXT) followed by the inner <HTML> payload, e.g.:
```
<DOCUMENT>
<TYPE>F-1
<SEQUENCE>1
<FILENAME>g085003_f1.htm
<DESCRIPTION>F-1
<TEXT>
<HTML>
...
</HTML>
</TEXT>
</DOCUMENT>
```
This wrapper must be stripped before the inner HTML can be fed into a standard HTML parser. Filing-fee exhibits keep the same wrapper but embed an Inline XBRL XHTML body whose root carries namespace declarations such as xmlns:ix="http://www.xbrl.org/2013/inlineXBRL" and xmlns:ffd="http://xbrl.sec.gov/ffd/2025", with <ix:header>, <ix:nonNumeric>, and <ix:nonFraction> tags surfacing the fee-calculation facts.
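Stripping the envelope can be sketched with two regular expressions: one captures the payload between <TEXT> and </TEXT>, the other reads the unquoted header tags that precede it. This assumes one <DOCUMENT> per input string (a full-submission .txt bundle would first need to be split on <DOCUMENT>), and the function name is hypothetical.

```python
import re

# Payload between <TEXT> and the first </TEXT> (non-greedy, spans newlines).
_TEXT_BLOCK = re.compile(r"<TEXT>\s*(.*?)\s*</TEXT>", re.DOTALL | re.IGNORECASE)
# Unquoted header tags such as "<TYPE>F-1" run to the end of their line.
_HEADER_TAG = re.compile(r"^<(TYPE|SEQUENCE|FILENAME|DESCRIPTION)>(.*)$", re.MULTILINE)


def split_envelope(raw: str) -> tuple[dict, str]:
    """Return (header fields, inner payload) for one SGML-wrapped document."""
    body = _TEXT_BLOCK.search(raw)
    # Only read header tags that appear before <TEXT>, never inside the HTML.
    head = raw[: body.start()] if body else raw
    header = {m.group(1): m.group(2).strip() for m in _HEADER_TAG.finditer(head)}
    return header, (body.group(1) if body else raw)
```

The returned payload starts at <HTML> (or the XHTML root for fee exhibits) and can be handed to any ordinary HTML parser.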
The main F-1 document is the prospectus that, in updated form, will be delivered to investors after the registration becomes effective. Although the SEC does not impose a rigid table of contents, the prospectus is conventionally organized in the following order:
The Part II portion of the registration statement (Information Not Required in Prospectus) sits at the end of the main document and covers indemnification of directors and officers, recent sales of unregistered securities, the exhibit and financial-statement schedules index, undertakings, and the signature block executed by the registrant, the principal executive officer, the principal financial officer, the principal accounting officer, a majority of the directors, and the authorized U.S. representative.
Exhibits are filed as separate sequenced documents and are each wrapped in their own SGML <DOCUMENT> envelope. For a Form F-1 the typical exhibit slate, which follows the Regulation S-K Item 601 exhibit table called for by Form F-1 Item 8, includes:
Exhibit content varies in form: EX-3 documents are constitutional legal text; EX-5 and EX-8 are signed legal opinions; EX-10 documents are executed contracts with their own signature blocks, schedules, and exhibits; EX-21 is a tabular list of subsidiary names with their jurisdictions of incorporation; EX-23 is a brief one- to two-page auditor consent referencing the audit report and the use of the auditor's name; and EX-FILING FEES is a structured fee-calculation table rendered as Inline XBRL with the ffd: namespace.
Each accession folder includes metadata.json (always present) and every text/HTML/XML document of the original EDGAR submission that is not a GRAPHIC binary. This covers the main F-1 or F-1/A prospectus, the full slate of exhibits, the filing-fee Inline XBRL exhibit, and (where present) some of the XBRL schema and linkbase artifacts referenced in dataFiles[].
Image binaries embedded in the submission (logos, photographs, signature scans, structural diagrams) are excluded from the dataset, even though their GRAPHIC entries remain enumerated in documentFormatFiles[] for reference. Some XBRL linkbase files (the _cal.xml, _def.xml, _lab.xml, _pre.xml set, and the corresponding .xsd schema) referenced in dataFiles[] may not be present locally for thinner amendments — only the SEC URL pointer is preserved. The full-submission .txt bundle is referenced via linkToTxt and as the trailing row of documentFormatFiles[]; whether the bundle itself is materialized inside the folder depends on the submission. Externally-incorporated documents (prior filings cross-referenced by the prospectus) are not pulled in; they remain at their original accession numbers. Confidential draft registration statements (DRS / DRSLTR) filed under the JOBS Act / FAST Act confidential-submission accommodation are out of scope and live under different EDGAR form types; the dataset covers only the public F-1 and F-1/A submissions that follow the public filing of the offering.
F-1 and F-1/A records coexist in the same monthly partition and share an identical record shape. F-1/A folders are typically smaller because the amendment frequently re-files only the changed pages of the prospectus together with a refreshed auditor consent (EX-23.1), an updated legal opinion (EX-5.1), an updated filing-fee exhibit, and — when an auditor change has occurred — a former-auditor letter (EX-16.1). The amendment chain for a single offering is reconstructable from the shared 333-XXXXXX fileNo carried in entities[].fileNo, ordered chronologically by filedAt.
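A sketch of that reconstruction over a list of parsed metadata.json records. It assumes the 333- file number sits on the first entities[] element, and it parses filedAt with datetime.fromisoformat instead of comparing strings, since the timestamps carry UTC offsets that can differ across a chain (for example -05:00 versus -04:00).

```python
from collections import defaultdict
from datetime import datetime


def amendment_chains(records: list[dict]) -> dict[str, list[dict]]:
    """Group F-1 / F-1/A metadata records by 333- file number, oldest first."""
    chains = defaultdict(list)
    for meta in records:
        if meta.get("formType") not in ("F-1", "F-1/A"):
            continue
        file_no = meta["entities"][0].get("fileNo")
        if file_no:
            chains[file_no].append(meta)
    for chain in chains.values():
        # Offset-aware parsing keeps the order correct across DST changes.
        chain.sort(key=lambda m: datetime.fromisoformat(m["filedAt"]))
    return dict(chains)
```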
The substantive content requirements of Form F-1 have evolved meaningfully across the dataset's coverage window (June 1996 to present), driven by both Form F-1 instructions and the underlying Form 20-F item set that F-1 incorporates by reference:
EX-FILING FEES), required to be filed as Inline XBRL using the ffd taxonomy. The presence of this exhibit and its Inline XBRL structure is therefore a temporal marker in the dataset.

Early F-1 records (mid-1996 through roughly 2002) were filed predominantly as plain-text ASCII submissions, with a single .txt document carrying the entire registration statement and exhibits separated by EDGAR <DOCUMENT> envelopes. From the early 2000s onward, HTML became the default presentation format for the main F-1 and individual exhibits, and by the late 2000s essentially all F-1 documents were delivered as .htm files inside the same SGML envelope. Modern HTML payloads are styled with inline CSS and include extensive in-page tables for capitalization, dilution, executive compensation, principal shareholders, and the financial statements.
XBRL did not historically apply to Form F-1 except via the universal cover-page tagging requirements that the SEC extended progressively. The most material format change in recent years is the adoption of Inline XBRL for the filing-fee exhibit (Exhibit 107), in which the fee-calculation table is rendered as XHTML with <ix:> tags surfacing each fact (offering amount, fee rate, fee paid, offsets) under the ffd namespace. The main F-1 document itself can also carry Inline XBRL tags for cover-page facts; in such cases the documentUrl is exposed through the https://www.sec.gov/ix?doc=/... viewer rewriter. Auxiliary XBRL schema and linkbase artifacts (EX-101.SCH, EX-101.CAL, EX-101.DEF, EX-101.LAB, EX-101.PRE) appear in dataFiles[] for filings that include traditional, non-Inline XBRL data alongside the HTML.
- Amendment chains are reconstructed by grouping on the shared 333-XXXXXX fileNo across formType in (F-1, F-1/A) and ordering by filedAt.
- Every .htm payload begins with the unquoted SGML <DOCUMENT> header (five header tags before <HTML>); a parser that does not strip this header before processing will fail or produce incorrect DOM output.
- Filing-fee exhibits are Inline XBRL XHTML declaring the dei, ffd, xbrli, and ix namespaces. The fact stream is recoverable directly from the HTML body without a separate XML download.
- dataFiles[] is the XBRL discovery path. Because linkToXbrl is consistently empty for F-1 records even when XBRL is present, all XBRL artifact discovery should go through the dataFiles[] array.
- Some string fields in metadata.json (notably sic) preserve HTML-entity encoding such as &amp;; consumers rendering plain text should decode these.
- irsNo is uniformly "000000000" for foreign issuers without a U.S. EIN; stateOfIncorporation carries EDGAR's foreign-jurisdiction code set (e.g., E9 Cayman Islands, D8 British Virgin Islands, G7 Denmark, K3 Hong Kong), which is the most reliable jurisdiction signal at the metadata layer.
- The exhibit type code (EX-5.1, EX-10.15, EX-99.1, EX-FILING FEES, etc.) is assigned by the filer and not strictly normalized; consumers building exhibit-type indexes should canonicalize against Item 601 categories rather than rely on exact string matching.
- Image binaries are enumerated as GRAPHIC entries in documentFormatFiles[] but not stored in the record; downstream consumers needing those binaries must fetch them from the SEC URL.
- metadata.accessionNo is dashed (NNNNNNNNNN-YY-NNNNNN); the folder name is the same digits without dashes. Conversion is a simple substitution.

The filer of a Form F-1 or Form F-1/A is the issuer-registrant: a non-U.S. company that qualifies as a foreign private issuer (FPI) under Rule 405 of the Securities Act and Rule 3b-4 of the Exchange Act, and that is registering securities for offer or sale in the United States under the Securities Act of 1933.
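A foreign issuer (other than a foreign government) is an FPI unless both of the following hold:
- more than 50% of its outstanding voting securities are directly or indirectly held of record by U.S. residents; and
- any one of: a majority of its executive officers or directors are U.S. citizens or residents; more than 50% of its assets are located in the United States; or its business is administered principally in the United States.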
FPI status is tested as of the last business day of the most recently completed second fiscal quarter.
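The Rule 405 test reads naturally as a two-prong boolean: FPI status is lost only when more than 50% of the voting securities are held of record by U.S. residents and at least one U.S.-contacts condition holds (a majority of executive officers or directors are U.S. citizens or residents, more than 50% of assets are in the United States, or the business is administered principally in the United States). The sketch below encodes that logic over a hypothetical input record; it is a screening illustration, not a legal determination, and it does not model the rules for counting U.S. holders of record.

```python
from dataclasses import dataclass


@dataclass
class IssuerFacts:
    """Hypothetical input, measured as of the last business day of the
    most recently completed second fiscal quarter."""
    pct_voting_held_by_us_residents: float
    majority_officers_or_directors_us: bool
    pct_assets_in_us: float
    administered_principally_in_us: bool
    is_foreign_government: bool = False


def is_foreign_private_issuer(f: IssuerFacts) -> bool:
    """Two-prong Rule 405 test for an issuer organized outside the U.S."""
    if f.is_foreign_government:
        return False  # foreign governments are excluded from the definition
    us_held = f.pct_voting_held_by_us_residents > 50.0
    us_contacts = (
        f.majority_officers_or_directors_us
        or f.pct_assets_in_us > 50.0
        or f.administered_principally_in_us
    )
    # Status is lost only when BOTH prongs are met.
    return not (us_held and us_contacts)
```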
The filer population consists of operating companies, holding companies, and other commercial entities organized outside the United States, including:
The filer is always the issuer. Underwriters, depositaries, selling shareholders, auditors, and counsel are named in the registration statement and may sign consents or exhibits, but they do not file Form F-1.
The following parties register elsewhere and are outside this dataset:
An FPI that loses FPI status migrates to the S-series; a domestic issuer that becomes an FPI may move to F-1 or F-3.
Form F-1 is transactional and event-driven, not periodic. The trigger is the issuer's decision to undertake a registered offering of securities into the U.S. market under Section 5 of the Securities Act, where no shorter or more specialized Securities Act form is available.
Common trigger events:
There is no recurring schedule. Filing volume tracks U.S. capital-markets activity and FPI listing windows.
The initial F-1 is filed when the issuer is ready to begin (or to make public) its SEC registration process. Under the JOBS Act and subsequent SEC staff policy (expanded in 2017 to all FPIs), an issuer may first submit a draft F-1 confidentially; those drafts enter the public EDGAR record only when the issuer publicly files. The dataset reflects the public filing date, not the date of first SEC engagement.
A Form F-1/A is filed whenever the issuer needs to amend a pending F-1 before effectiveness. Typical drivers:
There is no fixed deadline for an F-1/A; timing is driven by the SEC review cycle, financial-statement age-out, and the issuer's pricing schedule.
The typical chronology for an F-1 IPO runs:
Form F-1 is a registration statement under the Securities Act of 1933. Key provisions:
The SEC's Division of Corporation Finance, including the Office of International Corporate Finance, reviews F-1 filings.
Form F-1 sits at the intersection of two regimes: Securities Act registration and the foreign private issuer framework. Several adjacent datasets overlap with it in either purpose (registering securities) or filer population (FPIs). The comparisons below identify the closest neighbors and the precise boundary between each one and Form F-1.
The closest functional analogue. S-1 and F-1 share nearly identical disclosure architecture: prospectus body, business description, risk factors, MD&A, plan of distribution, dilution, use of proceeds, and audited financials.
Key differences are filer eligibility and FPI accommodations:
Use S-1 for the U.S.-issuer IPO universe; F-1 for the FPI slice. The two are not interchangeable populations.
Same filer universe as F-1 but at the opposite end of the issuer-maturity curve. F-3 is available to seasoned FPIs meeting reporting-history and float thresholds and incorporates 20-F and 6-K content by reference rather than restating disclosures.
Use F-1 for first-time U.S. listings; F-3 for secondary capital raises by listed FPIs.
The FPI parallel to S-4. Both F-1 and F-4 are Securities Act registrations with prospectus-style disclosure, but the transaction context is different.
Use F-4 for cross-border M&A consideration disclosures; F-1 will not capture them.
Periodic, not transactional. The relationship to F-1 is sequential: an FPI files F-1 once (initial U.S. registration) and 20-F annually thereafter. F-3 takedowns frequently incorporate 20-F by reference.
Use 20-F as the longitudinal time series for an FPI's disclosure; use F-1 for the single registration event.
Furnished, not filed. 6-K is keyed to home-country disclosure obligations and exchange announcements rather than the fixed event list that drives Form 8-K. It is a continuous stream of interim items (press releases, interim financials, material announcements).
No content overlap with F-1, but the two are complementary when reconstructing an FPI's information record around an offering. Form 6-K is not a substitute for F-1.
Specialized registrations under the Multijurisdictional Disclosure System: F-10 (long-form), F-7 (rights offerings), F-8 and F-80 (business combinations, F-80 for larger deals). MJDS filings rely on Canadian disclosure documents and accept Canadian GAAP/IFRS without U.S.-style restructuring.
Eligible Canadian issuers generally choose MJDS over F-1 because of the lighter burden. As a result, the F-1 dataset systematically underrepresents MJDS-eligible Canadians and captures only Canadian issuers that opt out of or do not qualify for MJDS, plus all non-Canadian FPIs. Use MJDS-form datasets for Canadian cross-border filings.
F-1/A amendments are pre-effective (and occasionally post-effective) amendments to F-1 registration statements and form a substantial share of records, since SEC review typically produces multiple amendment rounds.
The meaningful contrast is between:
Studies of disclosure evolution, comment-letter response, or final-prospectus content must distinguish initial from final amendment. Treating all F-1 and F-1/A records as interchangeable conflates draft and final disclosure.
Sequential stages of the same offering. F-1 (and its amendments) is the pre-effective registration statement; 424(b) is the post-effective prospectus actually delivered to investors, capturing final pricing and any Rule 430A changes.
Neither substitutes for the other; together they describe the full offering lifecycle.
A short notice form filed by foreign issuers or bidders for cross-border tender offers, exchange offers, rights offerings, or business combinations qualifying for Rule 13e-4(h)(8), Rule 14d-1(c), or Securities Act Rule 801 or Rule 802 exemptions. The U.S. submission is a cover form attaching home-country offering documents, not a U.S.-format prospectus.
F-1 is the opposite case to Form CB: no exemption applies and the issuer produces a full U.S. registration statement. Both can describe FPI offerings reaching U.S. holders, but they represent different regulatory pathways and are not interchangeable.
The Form F-1 Files Dataset is the corpus of long-form Securities Act registration statements and pre-effective amendments filed by foreign private issuers that do not qualify for short-form registration (F-3), MJDS treatment (F-7/F-8/F-10/F-80), or a Securities Act exemption (Form CB).
It is:
For prospectus-level disclosure produced at the moment an FPI first registers a U.S. offering (or any later registered offering ineligible for short-form treatment), this dataset is the correct source. For any other FPI or registration scenario, one of the adjacent datasets above applies instead.
The Form F-1 corpus is the canonical record of foreign private issuer registration in the United States. Each profession below works on a different slice of the filing — prospectus text, financial statements, exhibits, or metadata — and converts it into a specific output.
Attorneys representing FPIs, sponsors, and underwriters use the dataset as a precedent library. They pull recent F-1s from the same jurisdiction, sector, and size band to benchmark risk-factor language (country, VIE, exchange-control, sanctions, enforcement), related-party and controlling-shareholder disclosure, plan of distribution, lock-ups, dual-class structures, and use-of-proceeds wording. The exhibit index drives most drafting work: legal opinions (Ex. 5), tax opinions (Ex. 8), underwriting agreements (Ex. 1), articles and bylaws (Ex. 3), material contracts (Ex. 10), subsidiary lists (Ex. 21), and auditor consents. Diffing successive F-1/A amendments reconstructs the staff comment-and-response cycle, which trains associates and informs negotiation on live deals.
IPO origination and equity capital markets teams build comparable-deal analyses for FPI mandates. From the cover, plan of distribution, and underwriting exhibit they extract offering size, primary/secondary mix, greenshoe, gross spread, syndicate composition, lock-up terms, pre-IPO cap tables, and insider holdings. MD&A and selected financial data feed growth and margin comps. These extracts populate pitch books, league-table materials, fee grids, and price-talk discussions.
Long-only fundamental, event-driven, and IPO-dedicated funds use F-1s as the primary diligence document for foreign offerings, where home-country disclosure is often non-English or sparse. They focus on the business description, customer concentration, audited financials and IFRS/GAAP reconciliations, jurisdiction-specific risks, governance disclosure around dual-class shares and FPI exemptions, related-party flows in VIE structures, and dilution. Comparing F-1/A amendments surfaces price-range cuts, share-count changes, restatements, and new risk factors as deal-momentum signals.
Due-diligence committees, new-issue review desks, and broker-dealer compliance officers use the corpus to support Section 11 and Section 12 defenses and FINRA Rule 5110 filings. They rely on the full prospectus for material-disclosure verification, auditor consents and financials for going-concern flags, legal opinions, and the underwriting agreement for compensation and conflicts. Prior FPI precedents test the adequacy of their own diligence records.
Researchers use the dataset as an empirical corpus for studies on FPI underpricing and long-run performance, JOBS Act emerging-growth-company effects (confidential submission, scaled disclosure), cross-listing and bonding theory, VIE and dual-class governance, IFRS-to-GAAP comparability, and textual analysis of risk factors and boilerplate. Filer CIK, accession number, form type, and amendment number support panel construction; the document text supports NLP pipelines.
Strategic acquirers and cross-border M&A advisors mine F-1s for precedent on targets in specific jurisdictions. Pre-IPO ownership and shareholder agreements, Exhibit 21 subsidiary lists, material contracts, foreign-investment and CFIUS disclosures, and post-IPO change-of-control restrictions inform target screening, valuation, deal structuring, and reps-and-warranties drafting.
Financial-crime teams at banks, prime brokers, custodians, and asset managers use F-1 disclosures to onboard FPIs and their controlling shareholders. They focus on beneficial-ownership and principal-shareholder tables, officer and director biographies, offshore holding structures, operations in sanctioned jurisdictions, state-owned counterparty dealings, PEP relationships, and pending legal or tax proceedings. The filings provide a reproducible primary-source record for enhanced-diligence files.
Analysts evaluating convertible bonds, pre-IPO loans, and cornerstone tickets in FPIs use the financial statements, capitalization tables, and Exhibit 10 credit agreements to assess leverage, liquidity runway, debt maturity, and covenants, then layer on use-of-proceeds and dilution to model post-offering capital structure.
Financial-data engineering teams ingest the corpus to build FPI offering databases and event tables. Work includes parsing prospectuses into normalized fields (size, share count, range, sector, country, auditor), extracting syndicate and exhibit chains, backtesting IPO and post-IPO strategies, linking F-1 records to subsequent 20-F, 6-K, and Form 4 filings, and training or evaluating LLMs on prospectus and risk-factor text for retrieval-augmented systems serving lawyers, bankers, and analysts. The combination of full-text documents (TXT/HTML/PDF) and structured JSON metadata supports both rule-based and LLM-based pipelines.
In summary, lawyers treat the Form F-1 Files Dataset as a precedent library, bankers as a comp database, analysts as a diligence source, compliance teams as a liability-defense file, researchers as a corpus, and engineers as a training and extraction substrate. The 1996-onward completeness of F-1 and F-1/A filings, combined with the full exhibit set, is what makes a single dataset support all of these workflows.
The following workflows draw on the prospectus body, exhibit slate, and submission metadata of the Form F-1 Files Dataset.
Benchmarking jurisdiction-specific risk-factor language for FPI prospectus drafting. Capital markets lawyers pull risk-factor sections from recent F-1s filed by issuers sharing the same stateOfIncorporation code (e.g., E9 Cayman Islands, K3 Hong Kong, D8 BVI) and SIC band, then diff the language to draft VIE, HFCAA, exchange-control, and enforceability-of-civil-liabilities risk factors. The output is a precedent bank that feeds first-draft risk-factor sections and staff-comment anticipation memos for live mandates.
Reconstructing the SEC comment-and-response cycle from F-1/A amendment chains. Grouping records by the shared 333-XXXXXX fileNo across formType in (F-1, F-1/A) and ordering by filedAt produces the full pre-effective amendment chain for a single offering. Section-by-section diffs across consecutive amendments surface staff-driven changes in disclosure (price ranges, share counts, new risk factors, restated financials, auditor changes signaled by EX-16.1), which trains associates, informs negotiation, and feeds academic studies of disclosure evolution.
Building an FPI IPO comparables database for ECM pitch books. Bankers parse the cover page, prospectus summary, plan of distribution, and EX-1.1 underwriting agreement to extract deal size, primary/secondary mix, greenshoe, gross spread, lead/co-manager syndicate, lock-up duration, and pre-IPO cap-table composition. Joined with entities[].sic, entities[].stateOfIncorporation, and entities[].tickers[], these fields populate league-table extracts, fee grids, and price-talk decks for new FPI mandates.
Extracting Exhibit 21 subsidiary lists for cross-border corporate-structure mapping. EX-21.1 documents are pulled from each accession folder and parsed into subsidiary-name and jurisdiction pairs, then linked to the parent CIK and 333- file number. The resulting graph supports CFIUS screening, CFC/PFIC tax analysis, sanctions exposure mapping for KYC files, and identification of VIE-tier and offshore SPV layers in PRC and Cayman holding structures.
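One way to sketch the parsing step with only the standard library: collect table cells row by row from the SGML-stripped EX-21 HTML, then keep rows with at least two non-empty cells. EX-21 layouts vary by filer, so the header-skipping keywords and the name-then-jurisdiction column order are assumptions, and the helper names are hypothetical.

```python
from html.parser import HTMLParser

# Assumed header-row hints; real EX-21 tables use many variants.
HEADER_HINTS = ("jurisdiction", "place of incorporation", "country")


class _RowCollector(HTMLParser):
    """Collect whitespace-normalized <td>/<th> text, one list per <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def subsidiary_pairs(html_body: str) -> list[tuple[str, str]]:
    """Return (subsidiary name, jurisdiction) guesses from table rows."""
    parser = _RowCollector()
    parser.feed(html_body)
    parser.close()
    pairs = []
    for row in parser.rows:
        cells = [c for c in row if c]
        if len(cells) >= 2 and not any(h in cells[1].lower() for h in HEADER_HINTS):
            pairs.append((cells[0], cells[1]))
    return pairs
```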
Harvesting Inline XBRL filing-fee facts from Exhibit 107. For recent filings whose EX-FILING FEES exhibit is tagged in Inline XBRL (the fee exhibit dates from 2022; Inline XBRL tagging of it was phased in afterward), the exhibit is parsed by reading the <ix:nonNumeric> and <ix:nonFraction> tags under the ffd: namespace inside the SGML-stripped XHTML body. Extracted facts (offering amount, fee rate, fee paid, prior-fee offsets) feed registered-offering size statistics, fee aggregation across an amendment chain, and reconciliation against final 424(b) prospectuses.
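A standard-library sketch of that fact harvest. Python's html.parser lowercases tag and attribute names, so the tags arrive as ix:nonfraction and ix:nonnumeric; the function returns the displayed text of each ffd: fact and does not apply the scale, sign, or format attributes that Inline XBRL uses to normalize numeric values. Helper names are hypothetical.

```python
from html.parser import HTMLParser

FACT_TAGS = ("ix:nonfraction", "ix:nonnumeric")  # names as lowercased by HTMLParser


class _FactCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.facts: list[tuple[str, str, str]] = []  # (concept, contextRef, text)
        self._open: list[list] = []                  # stack of [concept, context, parts]

    def handle_starttag(self, tag, attrs):
        if tag in FACT_TAGS:
            a = dict(attrs)
            self._open.append([a.get("name", ""), a.get("contextref", ""), []])

    def handle_endtag(self, tag):
        if tag in FACT_TAGS and self._open:
            name, ctx, parts = self._open.pop()
            self.facts.append((name, ctx, " ".join("".join(parts).split())))

    def handle_data(self, data):
        for fact in self._open:  # text also counts toward any enclosing fact
            fact[2].append(data)


def fee_facts(xhtml_body: str) -> list[tuple[str, str, str]]:
    """Return (concept, contextRef, displayed text) for ffd: facts only."""
    parser = _FactCollector()
    parser.feed(xhtml_body)
    parser.close()
    return [f for f in parser.facts if f[0].startswith("ffd:")]
```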
Building a textual NLP corpus for FPI prospectus retrieval and LLM evaluation. Data engineers strip the leading <DOCUMENT> SGML envelope from each .htm payload, segment the prospectus into canonical sections (risk factors, MD&A, business, related-party transactions, taxation, enforceability, plan of distribution), and index the segments by CIK, jurisdiction, SIC, and filedAt. The corpus drives retrieval-augmented systems for FPI counsel and analysts, fine-tuning evaluations on cross-border disclosure, and large-scale textual studies of EGC scaled-disclosure adoption and HFCAA risk-factor diffusion.
Linking F-1 IPO records to subsequent 20-F, 6-K, and Form 4 filings for post-IPO event studies. Researchers and quants use issuer CIK as the join key to chain each initial F-1 to the issuer's later 20-F annual reports, 6-K interim items, and insider Form 4 filings. The resulting panel supports underpricing studies, lock-up-expiry insider-selling analysis, post-IPO restatement tracking, and long-run performance work on the foreign-issuer slice of the U.S. market from June 1996 forward.
The Form F-1 Files Dataset is available through three access methods: a JSON metadata index, a full archive download, and individual container downloads. Containers are ZIP files organized by year and month, covering filings from June 1996 to present.
Dataset Index JSON API: https://api.sec-api.io/datasets/form-f1-files.json
Returns dataset-level metadata (name, description, last updated timestamp, earliest sample date, total records, total size, form types, container format, file types) along with the full dataset download URL and the list of all container files. Each container entry includes its key, size, record count, last updated timestamp, and download URL. Use this endpoint to monitor which containers were updated in the most recent refresh run and decide which to re-download incrementally. This endpoint does not require an API key.
Example response:
{
  "datasetId": "1f13365b-9ae0-692b-92be-c06b5f5da436",
  "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-f1-files.zip",
  "name": "Form F-1 Files Dataset",
  "updatedAt": "2026-05-07T02:50:21.261Z",
  "earliestSampleDate": "1996-06-01",
  "totalRecords": 88926,
  "totalSize": 4113149130,
  "formTypes": ["F-1", "F-1/A"],
  "containerFormat": "ZIP",
  "fileTypes": ["TXT", "JSON", "HTML", "PDF"],
  "containers": [
    {
      "downloadUrl": "https://api.sec-api.io/datasets/form-f1-files/2026/2026-04.zip",
      "key": "2026/2026-04.zip",
      "size": 13818783,
      "records": 154,
      "updatedAt": "2026-05-07T02:50:21.261Z"
    }
  ]
}
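Because the index needs no API key, an incremental sync can be driven entirely from it. The sketch below assumes the response shape shown in the example and that `updatedAt` values are ISO 8601 UTC timestamps. It lists the container keys refreshed since a locally stored sync time. `fetch_index` and `containers_to_refresh` are hypothetical helper names.

```python
from __future__ import annotations

import json
from datetime import datetime
from urllib.request import urlopen

INDEX_URL = "https://api.sec-api.io/datasets/form-f1-files.json"


def fetch_index(url: str = INDEX_URL) -> dict:
    """Download and decode the dataset index JSON (no API key required)."""
    with urlopen(url, timeout=60) as resp:
        return json.load(resp)


def _parse_ts(ts: str) -> datetime:
    # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalise it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def containers_to_refresh(index: dict, last_sync: str) -> list[str]:
    """Return container keys whose updatedAt is newer than the last local sync."""
    cutoff = _parse_ts(last_sync)
    return sorted(
        c["key"] for c in index["containers"] if _parse_ts(c["updatedAt"]) > cutoff
    )
```

After re-downloading the returned containers, storing the index's top-level `updatedAt` as the new `last_sync` keeps the next run incremental.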
Download Entire Dataset: https://api.sec-api.io/datasets/form-f1-files.zip?token=YOUR_API_KEY
Downloads the complete dataset as a single ZIP archive containing every monthly container from June 1996 onward. Because the full archive is large (roughly 4.1 GB per the totalSize in the example index response), prefer it only for building a full local mirror; for incremental syncs, use the per-container URLs from the index. This endpoint requires an API key.
Download Single Container: https://api.sec-api.io/datasets/form-f1-files/2026/2026-04.zip?token=YOUR_API_KEY
Downloads one monthly container ZIP. Each container holds one folder per accession number for every F-1 and F-1/A filed in that month; each folder contains the submission's metadata.json and all of its documents except image files. This endpoint requires an API key.
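A minimal download sketch, assuming container keys follow the `YYYY/YYYY-MM.zip` pattern seen in the index example and that the API key is passed as the `token` query parameter, as in the URLs above. `container_url` and `download` are hypothetical helpers; `download` works unchanged for the full-dataset URL.

```python
from __future__ import annotations

import os
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://api.sec-api.io/datasets/form-f1-files"


def container_url(year: int, month: int, api_key: str) -> str:
    """Build a monthly container URL from the YYYY/YYYY-MM.zip key pattern."""
    key = f"{year:04d}/{year:04d}-{month:02d}.zip"
    return f"{BASE}/{key}?{urlencode({'token': api_key})}"


def download(url: str, dest: str, chunk_size: int = 1 << 20) -> str:
    """Stream a ZIP to disk in chunks instead of loading it into memory."""
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    tmp = dest + ".part"
    with urlopen(url, timeout=300) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out, chunk_size)
    # Rename into place only after the transfer completes, so an interrupted
    # download never leaves a truncated file under the final name.
    os.replace(tmp, dest)
    return dest
```

For example, `download(container_url(2026, 4, "YOUR_API_KEY"), "f1/2026-04.zip")` fetches the April 2026 container.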
The dataset covers Form F-1, the long-form Securities Act registration statement filed by foreign private issuers, and Form F-1/A, the pre- or post-effectiveness amendment to such a registration. Both form types coexist in the same monthly partition and share an identical record shape.
One record is one complete EDGAR submission keyed by exactly one SEC accession number, materialized as a single folder containing a metadata.json submission header and every text, HTML, XML, or PDF document that was part of the original submission. A record corresponds 1:1 to a row of the EDGAR full-submission index for F-1 and F-1/A, not to a per-issuer or per-offering aggregation, so a single offering typically generates one initial F-1 plus a sequence of F-1/A amendment records.
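Since each record is one accession folder, a container can be walked folder by folder. This sketch assumes that each accession folder sits directly above its files, holds exactly one `metadata.json`, and is named after the accession number. These layout details are assumptions, so check them against a real container first. `iter_records` is a hypothetical helper.

```python
from __future__ import annotations

import json
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def iter_records(zip_source):
    """Yield (accession_folder, metadata_dict, document_names) per record.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as zf:
        folders: dict[str, list[str]] = defaultdict(list)
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            folders[str(PurePosixPath(name).parent)].append(name)

        for folder, members in sorted(folders.items()):
            meta_name = next(
                (m for m in members if PurePosixPath(m).name == "metadata.json"), None
            )
            if meta_name is None:
                continue  # not a record folder under our layout assumption
            meta = json.loads(zf.read(meta_name))
            docs = sorted(PurePosixPath(m).name for m in members if m != meta_name)
            yield PurePosixPath(folder).name, meta, docs
```

Counting yielded records per container and comparing with the `records` field in the index is a cheap integrity check after each download.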
The filer is always the issuer-registrant: a non-U.S. company that qualifies as a foreign private issuer under Rule 405 of the Securities Act and that is registering securities for offer or sale in the United States under the Securities Act of 1933, where no shorter or more specialized Securities Act form (F-3, F-4, F-7, F-8, F-10) is available. Underwriters, depositaries, selling shareholders, auditors, and counsel may be named in the registration statement and may sign exhibits, but they do not file the form themselves.
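The dataset begins in June 1996, when EDGAR electronic filing became mandatory for most domestic registrants, and runs to present. Foreign private issuers were not required to file electronically on EDGAR until 2002, so coverage in the early years includes only the F-1 filings that issuers chose to submit electronically. Paper-only F-1 filings, including all filings from before June 1996, are not part of this corpus.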
The dataset is distributed as monthly ZIP containers organized by year and month. File types found inside the containers are HTML/HTM, JSON (the metadata file), TXT (full-submission bundles), and PDF (occasional exhibit attachments). Image binaries enumerated as GRAPHIC entries in documentFormatFiles[] are intentionally excluded.
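Because graphics are excluded, a record folder will hold fewer files than its `documentFormatFiles[]` list enumerates. The sketch below derives the file names a folder should contain. It assumes each entry exposes `type` and `documentUrl` keys; these field names, the extension list, and the helper name `expected_documents` are assumptions for illustration.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions treated as image binaries (assumption; EDGAR graphics are mostly jpg/gif).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".gif", ".png", ".bmp", ".tif", ".tiff"}


def expected_documents(metadata: dict) -> list[str]:
    """File names from documentFormatFiles[] that should exist in the record folder.

    GRAPHIC entries and image extensions are skipped because the dataset
    omits image binaries.
    """
    names = []
    for entry in metadata.get("documentFormatFiles", []):
        name = PurePosixPath(urlparse(entry.get("documentUrl", "")).path).name
        if not name or str(entry.get("type", "")).upper() == "GRAPHIC":
            continue
        if PurePosixPath(name).suffix.lower() in IMAGE_SUFFIXES:
            continue
        names.append(name)
    return names
```

Comparing this list with the actual folder contents flags missing documents without treating absent images as errors.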
Form S-1 is the closest functional analogue but is used by domestic U.S. issuers, whose financial statements must be prepared under U.S. GAAP; an FPI filing on Form F-1 may instead present IFRS as issued by the IASB without a U.S. GAAP reconciliation. Form F-3 covers the same FPI filer universe as F-1 but is a short-form registration that relies on incorporation by reference and is available only to seasoned FPIs meeting reporting-history and public-float thresholds. F-1 is therefore used by FPIs at their first U.S. listing and by FPIs not yet eligible for F-3.
Each F-1/A is its own first-class record with its own accession number and folder. Reconstructing a single offering's full registration history therefore requires selecting records with formType F-1 or F-1/A, grouping them by the shared 333-XXXXXX file number carried in entities[].fileNo, and ordering each group chronologically by filedAt. Treating F-1 and F-1/A records as interchangeable conflates draft and final disclosure.
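The grouping rule can be sketched as follows, assuming each record's metadata has been loaded as a dict with `formType`, `filedAt`, and an `entities` list whose items carry `fileNo`. A record listing several 333- numbers (for example, with co-registrants) is placed in each matching chain. The helper name `registration_chains` and the `accessionNo` key used in the usage note are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime

REG_FORMS = {"F-1", "F-1/A"}


def _ts(value: str) -> datetime:
    # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def registration_chains(records: list[dict]) -> dict[str, list[dict]]:
    """Group F-1 and F-1/A records by 333- file number, oldest filing first."""
    chains: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        if rec.get("formType") not in REG_FORMS:
            continue
        file_nos = {
            e.get("fileNo")
            for e in rec.get("entities", [])
            if str(e.get("fileNo", "")).startswith("333-")
        }
        for file_no in file_nos:
            chains[file_no].append(rec)
    return {fn: sorted(recs, key=lambda r: _ts(r["filedAt"])) for fn, recs in chains.items()}
```

The first element of each chain is normally the initial F-1 and the rest are its amendments in filing order, which keeps draft and later disclosure apart when comparing, say, the accession numbers of successive versions.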