The Form 20-F Files Dataset is the document-level corpus of annual reports filed on Form 20-F and Form 20-F/A by foreign private issuers (FPIs) registered with the U.S. Securities and Exchange Commission. Each record is a single EDGAR submission identified by its accession number, materialized on disk as a flat folder containing a metadata.json inventory, the primary 20-F or 20-F/A document, and every separately-filed exhibit. The filer is always the issuer itself — a non-U.S.-organized company with securities registered under Section 12 of the Exchange Act or with a continuing reporting obligation under Section 15(d). Coverage runs from the earliest electronic 20-F filings in June 1995 through the present, spanning the full Item 1–19 architecture introduced by the 1999 IOSCO-aligned modernization, the 2007 elimination of the U.S. GAAP reconciliation for IFRS filers, and every subsequent SOX, Dodd-Frank, HFCAA, clawback, insider-trading, and cybersecurity rulemaking. Records are delivered as monthly ZIP containers partitioned by EDGAR filing month (YYYY/YYYY-MM.zip); the file types found inside a record are TXT, JSON, PDF, and HTML.
The Form 20-F Files Dataset captures one foreign-private-issuer annual report (or one amendment) per record, in essentially the textual form in which it was originally accepted by EDGAR, minus image assets and standalone XBRL data files (both of which are referenced in metadata.json but not packaged on disk). On disk a record materializes as a flat folder named with the 18-digit dash-stripped accession number (for example 000119312525319187), nested under a monthly partition YYYY/YYYY-MM.zip keyed to the EDGAR filing month (filedAt), not the fiscal period being reported. Inside the folder every artifact for the filing sits as a flat sibling: a metadata.json inventory plus the primary annual report document and each separately-filed exhibit as its own file. There are no nested sub-directories.
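The path rules above can be sketched in a few lines. This is a minimal illustration, not part of the dataset tooling: the helper names `folder_name` and `container_key` are hypothetical, and the sketch assumes the partition month is taken from `filedAt` in the timestamp's own UTC offset.

```python
from datetime import datetime


def folder_name(accession_no: str) -> str:
    """Return the dash-stripped accession number used as the record folder name."""
    name = accession_no.replace("-", "")
    if len(name) != 18 or not name.isdigit():
        raise ValueError(f"unexpected accession number: {accession_no!r}")
    return name


def container_key(filed_at: str) -> str:
    """Return the monthly partition key (YYYY/YYYY-MM.zip) for a filedAt value.

    Assumption: the month is read in the offset carried by the timestamp itself.
    """
    ts = datetime.fromisoformat(filed_at)
    return f"{ts.year:04d}/{ts.year:04d}-{ts.month:02d}.zip"
```

For the accession number `0001193125-25-319187` the folder name is `000119312525319187`; the container key depends only on `filedAt`, never on `periodOfReport`.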
Form 20-F is the annual report mandated by Sections 13 or 15(d) of the Securities Exchange Act of 1934 for foreign private issuers whose equity securities are registered or listed in the United States. It plays the same disclosure role as Form 10-K for domestic registrants but accommodates non-U.S. accounting frameworks (commonly IFRS as issued by the IASB), home-country governance regimes, and foreign corporate-law concepts. The form is also used as a registration statement under Section 12 of the Exchange Act when a foreign issuer first lists in the U.S., though the dataset's primary content is annual-report use. Form 20-F/A is an amendment to a previously filed 20-F: it carries the same form architecture, restates or supplements specific Items, and reuses the original periodOfReport of the report being amended — so the filing-month partition of the dataset does not coincide with the fiscal-year partition for amendments. A 20-F/A may republish the entire annual report with corrections, or it may target a small number of Items or exhibits (the dataset includes minimal amendments consisting of only the primary document plus, for example, a single clawback policy exhibit). The amendment does not replace the original record; it coexists with it under its own accession number.
The substantive disclosure schema is set by the SEC's General Instructions to Form 20-F. Parts I through III contain the substantive Items; Part IV captures supplemental information; signatures close the body; and a list of exhibits — many filed as separate documents — is governed by the Instructions as to Exhibits. The Item architecture is closely aligned with the IOSCO international disclosure standards for cross-border offerings.
Inside an accession folder the content layers, in load order, are:
- metadata.json: a single JSON object describing the filing, its filer entity, and a complete inventory of every document and data file in the original EDGAR submission.
- The primary report document, listed with type: "20-F" or type: "20-F/A" in documentFormatFiles.
- Each separately filed exhibit, listed under its own exhibit type code (through EX-99.x).
What is enumerated in metadata.json but not packaged on disk:
- GRAPHIC-type files (JPG, GIF, PNG embedded illustrations, charts, signatures-as-images, and inline-XBRL inline-image references), excluded by dataset policy.
- The XBRL dataFiles (the standalone instance document *_htm.xml and the schema and linkbase files EX-101.SCH/CAL/DEF/LAB/PRE), referenced via documentUrl but not physically included.
- The complete-submission .txt envelope, which is enumerated as the trailing row in documentFormatFiles (with a sentinel single-space sequence and type) but not bundled.
The authoritative description of what each file is comes from the documentFormatFiles array inside metadata.json; file names themselves are filer-agent-specific and do not encode role information consistently.
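As a rough sketch of the packaged versus referenced-only split, the inventory can be grouped by row type. The function name `split_inventory` and the group labels are illustrative assumptions; only the `documentFormatFiles`, `dataFiles`, and `type` keys come from the description above.

```python
def split_inventory(metadata: dict) -> dict[str, list[dict]]:
    """Group inventory rows by whether they are expected on disk."""
    groups: dict[str, list[dict]] = {
        "packaged": [],        # primary document and exhibits
        "graphic": [],         # listed but excluded by dataset policy
        "submission_txt": [],  # trailing complete-submission row (blank type)
    }
    for row in metadata.get("documentFormatFiles", []):
        row_type = str(row.get("type") or "").strip()
        if row_type == "GRAPHIC":
            groups["graphic"].append(row)
        elif not row_type:
            groups["submission_txt"].append(row)
        else:
            groups["packaged"].append(row)
    # Standalone XBRL artifacts are listed separately and not packaged.
    groups["xbrl_data"] = list(metadata.get("dataFiles", []))
    return groups
```

Comparing `len(groups["packaged"]) + 1` (for metadata.json) with the number of files in the folder is one way to sanity-check a record, under the assumptions stated here.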
metadata.json: field-by-field anatomy
metadata.json is the canonical record header. Its top-level keys are:
- formType: "20-F" or "20-F/A".
- accessionNo: the canonical dashed accession number (e.g. "0001104659-25-124926"); the folder name is the same value with dashes removed.
- filedAt: ISO-8601 timestamp with timezone offset capturing the EDGAR acceptance moment.
- periodOfReport: fiscal-period end date in YYYY-MM-DD form. For 20-F/A this points to the period of the original report being amended, which can predate filedAt by years.
- description: human-readable form description such as "Form 20-F - Annual and transition report of foreign private issuers [Sections 13 or 15(d)]".
- linkToFilingDetails: direct EDGAR URL to the primary report HTM, prefixed by the inline-XBRL viewer path https://www.sec.gov/ix?doc=... when the primary document carries iXBRL.
- linkToTxt: URL to the complete EDGAR submission text bundle (<accessionNo>.txt).
- linkToHtml: URL to the EDGAR submission index page (...-index.htm).
- linkToXbrl: present but consistently an empty string, even when XBRL data clearly exists; rely on dataFiles[] and the inline tagging in the primary document instead.
- id: internal 32-character hex content hash.
- seriesAndClassesContractsInformation: empty array for 20-F filings (the structure is reused from investment-company filings).
- documentFormatFiles: ordered array, one element per document in the EDGAR submission (primary report, each exhibit, each graphic, plus the trailing complete-submission text bundle).
- dataFiles: array enumerating the XBRL schema, linkbase files, and the extracted instance document.
- entities: array of filer descriptors, almost always one element for a 20-F.
Each documentFormatFiles[] entry carries sequence (string-typed; the complete-submission .txt row uses a literal space), size (bytes, string-typed), documentUrl, free-text description (e.g. "20-F", "ANNUAL REPORT", "CERTIFICATION", "DESCRIPTION OF SECURITIES", "LIST OF SUBSIDIARIES", "CONSENT OF ...", "GRAPHIC", "AMENDMENT NO 1", "Complete submission text file"), and the EDGAR submission type code.
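Because sequence and size are string-typed and the complete-submission row uses a literal space, a small normalization step avoids surprises downstream. The following is a sketch only; `normalize_row` is a hypothetical helper, and mapping blank values to `None` is a design choice rather than anything the dataset prescribes.

```python
def normalize_row(row: dict) -> dict:
    """Return a copy of a documentFormatFiles[] row with typed sequence/size.

    Blank or non-numeric values (such as the single-space sentinel on the
    complete-submission row) become None instead of raising.
    """
    seq = str(row.get("sequence") or "").strip()
    size = str(row.get("size") or "").strip()
    row_type = str(row.get("type") or "").strip()
    return {
        **row,
        "sequence": int(seq) if seq.isdigit() else None,
        "size": int(size) if size.isdigit() else None,
        "type": row_type or None,
    }
```

Sorting on the normalized sequence then requires an explicit rule for `None`, for example placing the sentinel row last.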
Type codes seen on Form 20-F filings include the primary 20-F / 20-F/A; narrative exhibits such as EX-1.2, EX-2, EX-2.1, EX-4.A(8); EX-8 / EX-8.1 (list of subsidiaries); governance and policy exhibits EX-11.1 (code of ethics), EX-14.1, EX-19.1 (insider-trading policies), EX-97 (clawback policy); SOX certifications EX-12.1 / EX-12.2 (Section 302) and EX-13.1 / EX-13.2 (Section 906); auditor and counsel consents EX-15.1 / EX-15.2 / EX-15.3 and EX-23.1; miscellaneous EX-16.1, EX-22.1, EX-99.1; and GRAPHIC rows for embedded image assets that are excluded from the on-disk package. The codes are not normalized across filers — the same conceptual exhibit may appear as EX-8 for one filer and EX-8.1 for another — so consumers should treat them as filer-supplied tags rather than an enumerated controlled vocabulary. The order roughly follows EDGAR submission sequence: primary 20-F, narrative exhibits, certifications, consents, other exhibits, graphics, and finally the complete-submission .txt bundle.
dataFiles[] entries follow the same shape and enumerate the XBRL artifacts referenced by the filing: EX-101.SCH taxonomy schema, EX-101.CAL calculation linkbase, EX-101.DEF definition linkbase, EX-101.LAB label linkbase, EX-101.PRE presentation linkbase, and an XML row labeled "EXTRACTED XBRL INSTANCE DOCUMENT" pointing at *_htm.xml. A filing whose taxonomy has no calculations may omit EX-101.CAL. Sequence numbers on dataFiles[] are typically non-contiguous because EDGAR slots them after the GRAPHIC rows.
entities[] describes the filer with companyName (with the role suffix (Filer) appended), cik (no leading zeros), fileNo (e.g. "001-14840"), filmNo, irsNo (commonly "000000000" for foreign filers), fiscalYearEnd in MMDD form (e.g. "0930", "1231", "0731"), stateOfIncorporation as the EDGAR state/country code (e.g. X0 Israel, D8 Cayman, E9 Singapore, A1 Alberta), act set to "34", type echoing the form type, sic carrying both the four-digit SIC code and its description (e.g. "7372 Services-Prepackaged Software"), and tickers as an array of trading symbols (e.g. ["DOX"], ["DFSC", "KWE", "KWEMF", "DFSCW"]).
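The entity fields above pack several values into single strings. A minimal parsing sketch follows; `parse_entity` and its output keys are hypothetical, and the sample values used in any call are made up rather than taken from a real filing.

```python
import re


def parse_entity(entity: dict) -> dict:
    """Split the packed string fields of one entities[] element."""
    # Drop the trailing role suffix, e.g. "Example Holdings Ltd (Filer)".
    name = re.sub(r"\s*\(Filer\)\s*$", "", entity.get("companyName", ""))
    # "7372 Services-Prepackaged Software" -> code and description.
    sic_code, _, sic_description = entity.get("sic", "").partition(" ")
    fye = entity.get("fiscalYearEnd", "")
    fiscal_year_end = (
        (int(fye[:2]), int(fye[2:])) if len(fye) == 4 and fye.isdigit() else None
    )
    return {
        "name": name,
        "cik10": str(entity.get("cik", "")).zfill(10),  # zero-padded CIK
        "sic_code": sic_code,
        "sic_description": sic_description,
        "fiscal_year_end": fiscal_year_end,  # (month, day) or None
        "jurisdiction": entity.get("stateOfIncorporation"),
        "tickers": list(entity.get("tickers", [])),
    }
```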
The primary annual report document carries the form's substantive disclosure, organized by the General Instructions to Form 20-F into the architecture below. Filings are not required to render the headings literally, but the conceptual order is consistent.
Cover page. Identifies the registrant and the report; marks whether the filing is an annual report, transition report, or shell-company report; declares the title and number of outstanding shares of each class as of the period end; indicates whether the registrant is an emerging-growth company or a shell company; identifies the basis of accounting (U.S. GAAP, IFRS as issued by the IASB, or "Other") and, when "Other," the specific items the auditor has reviewed; states whether the registrant has filed all reports required during the preceding 12 months; and (where applicable) indicates whether the registrant has prepared an Interactive Data File. Modern cover pages are exhaustively iXBRL-tagged via the dei (Document and Entity Information) namespace.
Part I.
Part II.
Part III.
Signatures. A signature block by an officer of the registrant certifying that the registrant has duly caused the report to be signed.
For modern 20-F filings the primary document is full-fidelity XHTML carrying inline-XBRL markup. The file begins with an <?xml ... ?> prologue and an <html> element bearing many namespace declarations (xmlns:dei, xmlns:us-gaap, xmlns:ix, xmlns:xbrli, xmlns:srt, plus xmlns:ifrs-full for IFRS filers and a filer-specific extension namespace). Tagged facts are embedded in elements such as <ix:nonNumeric contextRef="..." name="dei:DocumentType">20-F</ix:nonNumeric>, with <ix:header>, <ix:hidden>, <ix:references>, and <ix:resources> blocks at the top of <body>. These primary documents are large — frequently 1–6 MB — because they carry the entire annual report plus its structured tagging. DEI tagging blankets the cover page, the financial statements are tagged against the US-GAAP taxonomy or IFRS taxonomy as applicable, and (per the 2020 amendments) risk-factor section headings are detail-tagged. Inline XBRL was phased in for FPIs by SEC Release 33-10514, with FPIs using IFRS required to comply for fiscal periods ending on or after 15 June 2021.
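To illustrate how the cover-page tagging can be read, here is a minimal sketch that collects dei facts from `ix:nonNumeric` elements with the standard library. It assumes the primary document is well-formed XML (a tolerant HTML parser may be needed when it is not) and that the `ix` prefix is bound to the Inline XBRL 1.1 namespace; `dei_facts` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Inline XBRL 1.1 namespace, to which the ix prefix is normally bound.
IX_NS = "http://www.xbrl.org/2013/inlineXBRL"


def dei_facts(xhtml: bytes) -> dict[str, str]:
    """Return the first text value of each dei:* ix:nonNumeric fact."""
    root = ET.fromstring(xhtml)
    facts: dict[str, str] = {}
    for element in root.iter(f"{{{IX_NS}}}nonNumeric"):
        name = element.get("name", "")
        if name.startswith("dei:"):
            # itertext() also picks up text nested in child markup.
            facts.setdefault(name, "".join(element.itertext()).strip())
    return facts
```

Given the size of these documents (often several megabytes), an incremental parser such as `ET.iterparse` may be preferable in practice.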
Form 20-F's Instructions as to Exhibits define a numbered exhibit scheme. Each exhibit typically appears as a separate file in the accession folder. The most common exhibits and their roles:
- Sub-numbering (EX-1.1, EX-1.2, EX-1.3) is used when multiple constitutional documents are filed.
- EX-2.D (or a numbered sub-exhibit) is the post-2019 home of the Description of Securities Registered under Section 12.
- The EX-4.A series covers material agreements, employment and management contracts, and financing agreements. Sub-references such as EX-4.A(8) reflect the deeply nested numbering used by long-running filers.
- List of subsidiaries (EX-8 or EX-8.1), required by Item 19 and analogous to a 10-K's Exhibit 21.
- Code of ethics (EX-11.1), required since the 2003 SOX-driven amendments.
- Section 302 certifications (EX-12.1, EX-12.2).
- Section 906 certifications (EX-13.1, EX-13.2).
- Auditor consent (EX-15.1) and consents of other experts and counsel (EX-15.2, EX-15.3), required when financial statements are incorporated by reference into Securities Act registration statements.
The exhibit type code in documentFormatFiles[].type is the canonical role tag; the filename pattern (tmb-20250930xex12d1.htm, dox-ex4_a8.htm, birk-ex2_1.htm, etc.) is filer-agent-specific and not standardized.
The file types found in the dataset are TXT, JSON, PDF, and HTML, but in modern filings the on-disk content is overwhelmingly HTM plus the JSON manifest, with PDF and plain TXT appearing primarily in older filings. Two distinct HTM dialects coexist within a single accession folder:
Inline-XBRL XHTML (the primary 20-F / 20-F/A document). As described above, this is full-fidelity XHTML beginning with an <?xml ... ?> declaration and a long namespace-laden <html> element, with <ix:...> tagged facts embedded throughout the body.
SGML-wrapped legacy HTML (most exhibits). The file opens with the EDGAR submission-bundle SGML envelope tags and then an inner uppercase-tag HTML body:
<DOCUMENT>
<TYPE>EX-12.1
<SEQUENCE>4
<FILENAME>ea027079501ex12-1_ezgotech.htm
<DESCRIPTION>CERTIFICATION
<TEXT>
<HTML>
<HEAD>
<TITLE></TITLE>
</HEAD>
<BODY STYLE="font: 10pt Times New Roman, Times, Serif">
...exhibit body (uppercase HTML tags, inline styles, no XBRL)...
These files are not standalone XHTML — the <DOCUMENT> / <TYPE> / <SEQUENCE> / <FILENAME> / <DESCRIPTION> / <TEXT> lines are normally wrapped inside the complete-submission .txt envelope but here remain on each exhibit. A consumer that wants clean HTML must strip those leading lines (and any trailing </TEXT></DOCUMENT>) before passing to an HTML parser.
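The stripping step can be sketched as follows. This is one possible approach under the assumption that the envelope tags each sit on their own line, as in the sample above; `strip_sgml_envelope` is a hypothetical helper, and files without an envelope pass through with only leading and trailing blank lines removed.

```python
import re

# Envelope header lines such as "<TYPE>EX-12.1" or a bare "<TEXT>".
_HEADER = re.compile(r"<(DOCUMENT|TYPE|SEQUENCE|FILENAME|DESCRIPTION|TEXT)>.*")
# Envelope footer lines: "</TEXT>", "</DOCUMENT>", or both on one line.
_FOOTER = re.compile(r"(\s*</(TEXT|DOCUMENT)>\s*)+")


def strip_sgml_envelope(raw: str) -> str:
    """Remove leading SGML header lines and trailing </TEXT></DOCUMENT>."""
    lines = raw.splitlines()
    start, end = 0, len(lines)
    while start < end and (
        not lines[start].strip() or _HEADER.fullmatch(lines[start].strip())
    ):
        start += 1
    while end > start and (
        not lines[end - 1].strip() or _FOOTER.fullmatch(lines[end - 1])
    ):
        end -= 1
    return "\n".join(lines[start:end])
```

The result can then be handed to an HTML parser of choice; uppercase tags are not a problem for HTML parsers, though they are for strict XML parsers.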
There is no enforced naming standard; each filer agent has its own. Patterns commonly observed include:
- tmb-20250930x20f.htm, tmb-20250930xex12d1.htm (RR Donnelley convention: x separator, with d denoting sub-numbering such that 12d1 represents 12.1).
- ea0270795-20f_<slug>.htm, ea027079501ex12-2_<slug>.htm (sequential ea + accession suffix + ex<X-Y>_<slug>).
- dox-20250930.htm, dox-ex4_a8.htm; birk-20250930.htm, birk-ex2_1.htm; umewf_20f.htm, umewf_ex121.htm.
- arqq-20250930x20f.htm, lvro-20250630.htm: the primary 20-F is frequently named after the reporting period date.
The only reliable way to map a file to its role is via documentFormatFiles[] in metadata.json (matched on the trailing path segment of documentUrl).
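A file-to-role lookup along those lines might look like the sketch below. The names `trailing_segment` and `role_index` are hypothetical. The handling of a `?doc=` query is an assumption, included in case a documentUrl carries the inline-XBRL viewer prefix.

```python
from urllib.parse import parse_qs, urlparse


def trailing_segment(url: str) -> str:
    """Return the file name at the end of a documentUrl.

    Assumption: if the URL uses the viewer form (.../ix?doc=/Archives/...),
    the real path is carried in the doc query parameter.
    """
    parts = urlparse(url)
    doc = parse_qs(parts.query).get("doc")
    path = doc[0] if doc else parts.path
    return path.rsplit("/", 1)[-1]


def role_index(metadata: dict) -> dict[str, tuple[str, str]]:
    """Map on-disk file name -> (type code, description)."""
    index: dict[str, tuple[str, str]] = {}
    for row in metadata.get("documentFormatFiles", []):
        name = trailing_segment(row.get("documentUrl", ""))
        if name:
            index[name] = (
                str(row.get("type") or "").strip(),
                row.get("description", ""),
            )
    return index
```

Looking up each file name found in the accession folder in this index then yields its exhibit type without relying on filer-agent naming.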
A record physically includes:
- metadata.json describing the filing, its filer, and the full original-submission inventory.
- Each document listed in documentFormatFiles whose type is not GRAPHIC.
A record does not physically include:
- GRAPHIC-type files (embedded JPG/GIF/PNG illustrations, charts, signatures, and inline-image references), which are excluded from the on-disk package by dataset policy. Their documentUrl values in metadata.json remain the authoritative source if they are needed.
- Standalone XBRL data files (the EX-101.* schema and linkbase files and the extracted instance document *_htm.xml), which are listed in dataFiles but not packaged. The structured XBRL facts are nonetheless embedded inline in the primary document via the <ix:...> tags, so the structured data is fully recoverable from the on-disk material alone.
- The complete-submission .txt envelope, which is enumerated (with literal-space sequence and type) but not bundled; consumers who need the raw EDGAR submission can dereference linkToTxt.
Form 20-F's substantive disclosure schema has been amended multiple times since 1995, and the dataset spans those eras:
The dataset begins in June 1995, when EDGAR submissions were predominantly plain ASCII text, and extends to the present:
- Early filings are plain-text .txt files inside the SGML submission envelope, with tabular data rendered as fixed-width text.
- The <DOCUMENT>/<TYPE>/<SEQUENCE>/<FILENAME>/<DESCRIPTION>/<TEXT> header style for exhibits dates from the earlier filing eras, and that style persists in many exhibits even today.
- Standalone XBRL arrived as EX-101.* exhibits filed alongside the primary document. FPIs using IFRS were originally scheduled to phase in for fiscal years ending on or after 15 June 2011, but the requirement took effect only once the SEC made an IFRS taxonomy available (fiscal periods ending on or after 15 December 2017).
- Inline XBRL moved the tagged facts into the primary document (an EXTRACTED XBRL INSTANCE DOCUMENT continues to be listed in dataFiles).
Points to keep in mind when working with the records:
- Partitions are keyed to the YYYY-MM of filedAt, not periodOfReport. A 20-F/A in a December 2025 partition can amend a fiscal year ending years earlier; a 20-F filed in early 2026 will sit in a 2026 partition even though it covers fiscal 2025.
- linkToFilingDetails carries the https://www.sec.gov/ix?doc=/... viewer prefix when the primary document is iXBRL. Drop the prefix to retrieve the raw XHTML directly.
- linkToXbrl is consistently empty even when XBRL is plainly present. Rely on dataFiles[] and the inline tagging in the primary document.
- Map files to roles through documentFormatFiles[] by matching the trailing path segment of documentUrl. Filer-agent naming patterns vary (RR Donnelley tmb-...x20f.htm, Edgar Online ea...-20f_<slug>.htm, issuer-slug dox-20250930.htm, etc.).
- Exhibit type codes vary across filers (EX-8 vs EX-8.1 for list of subsidiaries; minor variation in EX-12.x / EX-13.x / EX-15.x numbering). Treat type codes as filer-supplied tags rather than a controlled vocabulary.
- Legacy exhibits carry the SGML <DOCUMENT>...<TEXT> header outside the <HTML> body. Strip those header lines (and any trailing </TEXT></DOCUMENT>) before parsing as HTML.
- sequence: " " sentinel. The complete-submission .txt row uses a literal space for both sequence and type; consumers should not assume sequence is numeric.
- For files that are listed but not packaged, the documentUrl values in metadata.json resolve directly to the EDGAR-hosted originals.
- metadata.json is the canonical inventory; the on-disk file count alone does not indicate whether a filing is "minimal" or "complete," because graphics and standalone XBRL artifacts are deliberately excluded from the package.
Each record is an EDGAR submission of Form 20-F or Form 20-F/A by a foreign private issuer (FPI) registered with the SEC. The filer is always the issuer itself: a non-U.S.-organized company with securities registered under Section 12 of the Securities Exchange Act of 1934, or with a continuing reporting obligation under Section 15(d).
"Foreign private issuer" is defined in Rule 405 (Securities Act) and Rule 3b-4 (Exchange Act). The status test is reapplied annually as of the last business day of the second fiscal quarter:
An issuer that fails the test loses FPI status at fiscal year end and transitions to domestic forms (10-K, 10-Q, 8-K, Regulation 14A proxy rules) the following fiscal year.
The 20-F population spans operating companies, holding companies, banks, insurers, mining and resource companies, and other non-U.S. enterprises listed on NYSE, NYSE American, Nasdaq, or OTC markets, or otherwise SEC-registered. Securities may take the form of ADRs, ordinary share listings, or registered debt. Dual-listed and U.S.-only-listed FPIs both file on Form 20-F.
The 20-F is a schedule-driven annual report, not an event-driven filing. The obligation arises from:
Rules 13a-1 and 15d-1 require an annual report on the form prescribed for the issuer class; Form 20-F's General Instructions designate it as the FPI annual report (other than for Form 40-F MJDS filers). Form 20-F can also serve as an Exchange Act Section 12 registration statement, but the overwhelming majority of records in this dataset are annual reports.
Form 20-F/A records amend a previously filed 20-F. Common triggers:
There is no fixed deadline; an amendment is filed when the need arises. A 20-F/A neither resets the original due date nor cures delinquent status for the underlying report.
Form 20-F was adopted in 1979 (Release No. 34-16371), consolidating prior foreign-issuer annual report forms. EDGAR availability begins around June 1995, consistent with the earliest dataset sample; mandatory electronic filing of 20-F by FPIs was substantially complete by the late 1990s. The dataset therefore reflects the full electronic record of FPI annual reports and amendments from June 1995 to the present.
The legal filer is always the issuer. Directors, senior management, controlling shareholders, subsidiaries, joint ventures, and counterparties are described in the report but are not filers of the 20-F record. In a corporate group, only the SEC-registered issuer files; non-registered parents and subsidiaries appear only as described entities.
Form 20-F sits within the SEC's foreign private issuer reporting regime. The most useful comparisons are with the domestic annual report (10-K), the MJDS Canadian annual report (40-F), the FPI interim/event report (6-K), the FPI registration statement family (F-1/F-3/F-4 and the domestic S-1), 20-F/A amendments, a hypothetical Items extraction product, and structured XBRL fact datasets. Each overlaps with 20-F in subject matter or filer population but differs sharply in scope, cadence, structure, or content type.
The closest functional analog. Both are Exchange Act annual reports with audited financials, business descriptions, risk factors, and MD&A. They are mutually exclusive at the filer level: 10-K covers domestic registrants, 20-F covers FPIs.
Key differences:
To assemble a complete SEC annual-report universe, 10-K and 20-F datasets must be combined.
The annual report used by qualifying Canadian issuers under the Multijurisdictional Disclosure System. Same role as 20-F (annual report by a non-U.S. issuer) but a different regime:
A 40-F dataset is complementary to 20-F, not a substitute.
The FPI interim and event-driven report. FPIs do not file Form 10-Q or Form 8-K; Form 6-K covers both functions. Differences from 20-F:
6-K is the natural year-round complement to 20-F for the same filer population.
Form F-1, Form F-3, and Form F-4 are the FPI Securities Act registration statements (long-form, shelf, and business combinations respectively). Form S-1 is the domestic equivalent of F-1; the S/F divide tracks the same domestic/FPI line as 10-K vs 20-F.
Relationship to 20-F:
20-F is the source for periodic FPI disclosure; F-1/F-3/F-4 datasets are the source for offering-stage disclosure.
20-F/A filings amend a previously filed 20-F to correct errors, restate financials, add omitted exhibits, or respond to staff comments. They are included in this dataset alongside originals.
A Form 20-F Items dataset would deliver parsed text aligned to the FPI item schedule (Item 3D Risk Factors, Item 4, Item 5, etc.). The Files dataset is different in kind:
The Files dataset is the source-of-truth raw corpus from which Items extractions can be derived.
XBRL datasets expose structured financial facts (revenues, assets, liabilities, cash flows, segments) tagged under US-GAAP or IFRS taxonomies. They overlap with 20-F because 20-F filers tag financials in Inline XBRL.
The two are complementary halves (numeric and textual) of the same underlying filings.
Form 20-F Files is the comprehensive, document-level corpus of FPI annual reports and their amendments, delivered as the original EDGAR submission contents. It is not the domestic annual report (10-K), the MJDS Canadian annual report (40-F), the FPI interim/event report (6-K), an FPI registration statement (F-1/F-3/F-4), a section-parsed Items product, or a structured XBRL fact dataset. It is the right corpus when research requires the full text, exhibits, and metadata of FPI annual filings under the 20-F item schedule, including IFRS financial statements and 20-F/A amendment tracking. Adjacent datasets cover offering-stage disclosure, intra-year FPI events, structured numeric data, or domestic-issuer comparisons, but none substitute for the 20-F filing record itself.
Form 20-F is the controlling annual disclosure for foreign private issuers listed on U.S. exchanges. Because no domestic 10-K equivalent exists for these companies, a narrow but professionally diverse set of users treat this dataset as the authoritative source.
Sell-side and buy-side analysts covering non-U.S. issuers build models from Item 5 (operating and financial review), Item 4 (business and segments), and Item 18 financial statements. They track segment redefinitions, FX exposure, dividend policy, and (for older years) IFRS-to-GAAP reconciliations. The 20-F is often the only English-language audited reference, so it anchors initiation reports, quarterly model updates, and ADR-vs-local pair trades.
Equity PMs use Item 3 risk factors, Item 7 related-party and controlling-shareholder disclosures, and Item 10 share-capital and exchange-control disclosures to size positions in foreign-domiciled names. Credit analysts pull debt schedules, covenant terms, and guarantor structures from the financial-statement notes to model recovery on cross-border bonds. The 1995-to-present history and 20-F/A amendments let them walk prior cycles and restructurings on the same issuer.
Disclosure counsel benchmark drafting against peer 20-Fs on risk factors, sanctions exposure, controlling-shareholder arrangements, ADR taxation, and exhibits such as charters and indentures. F-1 and F-3 teams use prior 20-Fs as the incorporation-by-reference base. Litigation and enforcement counsel reconstruct what was disclosed and when for Section 10(b), Section 18, and FCPA matters, using 20-F/A history to track restatements and comment-response patterns.
Accounting policy groups at audit firms, standard-setter staffs, and corporate technical-accounting teams treat 20-Fs as a large comparable corpus of IFRS-as-filed-in-the-U.S. They study impairment, lease, financial-instrument, revenue, and pension choices, plus segment reporting. The longitudinal span supports tracking IFRS adoption by 20-F filers and the elimination of the GAAP reconciliation.
Deal teams use Item 4 segment and subsidiary descriptions, Item 7 related-party disclosures, Item 10 share-capital tables, and the exhibit set (charters, bylaws, material agreements) as the deepest public diligence on foreign targets, partners, and competitors. Corporate development groups monitor competitor capex, capacity, and capital allocation from Item 5 to source acquisition and partnership candidates.
ADR program managers and cross-listing advisors track program-related disclosures, fee arrangements, dividend tax treatment, and changes in share-capital or voting rights affecting ADR holders. The dataset supports onboarding diligence for new programs and periodic review of existing ones.
Quant research and financial-NLP groups use the corpus to train and back-test cross-border equity signals from risk-factor language, MD&A tone, and forward-looking statements. The TXT, HTML, and PDF mix supports both clean-text and layout-aware extraction. RAG developers ground research-assistant answers about foreign issuers in the primary filing rather than summaries.
Governance analysts use Item 6 (directors, senior management, employees) and Item 7 (major shareholders, related parties) to assess board composition, independence, executive pay, and controlling-family or sovereign influence. ESG teams extract environmental, supply-chain, sanctions, and human-rights disclosures, plus referenced sustainability exhibits, to feed voting recommendations and engagement priorities for non-U.S. portfolios.
Compliance functions at broker-dealers, asset managers, and banks use 20-Fs for customer due diligence and counterparty review on foreign issuers. They focus on Section 13(r) Iran-related disclosures, operations in sanctioned jurisdictions, beneficial-ownership and government-proceedings disclosures, and ICFR and auditor-identification information relevant to PCAOB inspection risk. 20-F/A history supports restatement and remediation tracking.
Accounting and finance researchers use the dataset for text-based studies of risk-factor evolution, readability, and tone; empirical work on accruals, impairments, and segment reporting; and studies linking disclosure to ADR liquidity, pricing, and short interest. Full-document coverage including exhibits supports hand-collected variables beyond what structured databases capture.
What unifies these users is the absence of a domestic equivalent: for U.S.-listed foreign private issuers, the 20-F is the authoritative annual record, and this dataset is the practical means of working with it at scale across analytical, legal, accounting, governance, compliance, and machine-learning workflows.
Concrete workflows the Form 20-F Files dataset supports across research, legal, accounting, and engineering teams.
Building an IFRS-as-filed comparables corpus. Technical-accounting researchers use metadata.json to locate each filing's primary document, keep the filings whose primary document declares the ifrs-full namespace, and harvest the inline-XBRL-tagged Item 18 financial statements to build peer panels for impairment, lease, revenue-recognition, and pension policy choices. The longitudinal span (1995 to present) supports before/after studies of the 2007 elimination of the U.S. GAAP reconciliation.
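A cheap way to test for the ifrs-full declaration is to scan the start of the primary document, since the namespace declarations sit on the opening `<html>` element. This is a sketch under that assumption; `declares_namespace` and the 64 KB read window are illustrative choices, not dataset conventions.

```python
from pathlib import Path


def declares_namespace(path: str, prefix: str = "ifrs-full",
                       head_bytes: int = 65536) -> bool:
    """Check whether the head of a document declares xmlns:<prefix>.

    Only the first head_bytes are read, which avoids loading multi-megabyte
    primary documents in full.
    """
    with Path(path).open("rb") as handle:
        head = handle.read(head_bytes)
    return f"xmlns:{prefix}=".encode("ascii") in head
```

Filings that pass this check can then be parsed in full for their tagged financial statements.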
Benchmarking risk-factor language across FPI peers. Disclosure counsel and equity analysts pull Item 3.D from the primary 20-F across a peer set defined by SIC code (entities[].sic) and stateOfIncorporation, then diff risk-factor headings (detail-tagged since the 2020 amendments) year-over-year for a single issuer or across a peer group. Output feeds drafting markup for the next 20-F or risk-mapping decks for buy-side coverage.
Tracking restatements and amendment scope via 20-F/A. Litigation, enforcement, and forensic-accounting teams isolate formType: "20-F/A" records, pair each with its original 20-F (matched on filer cik plus periodOfReport), and parse the amendment's explanatory note to classify scope (full re-publication vs. single-exhibit add such as EX-97 clawback policy). The output is a restatement timeline keyed to specific Items and exhibits for Section 10(b), Section 18, and PCAOB-inspection narratives.
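The pairing step can be sketched as below, assuming each record is a parsed metadata.json dictionary and that filedAt always carries a timezone offset. `pair_amendments` is a hypothetical helper; treating the earliest 20-F for a (cik, periodOfReport) key as the original is a simplifying assumption.

```python
from datetime import datetime


def pair_amendments(records: list[dict]) -> tuple[dict[str, list[str]], list[str]]:
    """Pair 20-F/A records with originals on (cik, periodOfReport).

    Returns (pairs, unmatched): pairs maps an original accession number to
    its amendments in filing order; unmatched lists amendments whose
    original is not among the records.
    """
    def key(rec: dict) -> tuple[str, str]:
        entity = (rec.get("entities") or [{}])[0]
        return str(entity.get("cik", "")), rec.get("periodOfReport", "")

    ordered = sorted(records, key=lambda r: datetime.fromisoformat(r["filedAt"]))
    originals: dict[tuple[str, str], str] = {}
    pairs: dict[str, list[str]] = {}
    unmatched: list[str] = []
    for rec in ordered:
        k = key(rec)
        if rec.get("formType") == "20-F":
            original = originals.setdefault(k, rec["accessionNo"])
            pairs.setdefault(original, [])
        elif rec.get("formType") == "20-F/A":
            if k in originals:
                pairs[originals[k]].append(rec["accessionNo"])
            else:
                unmatched.append(rec["accessionNo"])
    return pairs, unmatched
```

Unmatched amendments usually mean the original sits in a container that has not been loaded, since amendments can be filed years after the report they amend.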
Mining the exhibit set for material contracts and subsidiary maps. M&A and credit teams index exhibits by documentFormatFiles[].type to extract EX-4.x material contracts (financing agreements, employment contracts, indentures), EX-8 / EX-8.1 subsidiary lists, and EX-2.D securities descriptions. Pipelines must strip the SGML <DOCUMENT>/<TYPE>/<TEXT> envelope from legacy exhibit HTML before parsing. Output supports diligence packs, recovery models on cross-border bonds, and entity-graph construction.
Detecting newly-required disclosures as rules phase in. Compliance and governance analysts watch for first appearances of Item 16I (HFCAA auditor-jurisdiction disclosure, 2021), Exhibit 19 insider-trading policies (2022), Exhibit 97 clawback policies (2023), and Item 16K cybersecurity (effective for fiscal years ending on or after 15 December 2023). Filtering on exhibit type codes and partition month (YYYY/YYYY-MM.zip) flags filers that lag rule adoption and produces a coverage map for stewardship and PCAOB-risk reviews.
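For the exhibit-based signals, a first-appearance map can be built from the inventory alone. The sketch below assumes parsed metadata.json dictionaries as input; `first_appearance` is a hypothetical helper, and matching on a type prefix (so that EX-97 also catches EX-97.1) reflects the non-normalized type codes.

```python
def first_appearance(records: list[dict], type_prefix: str) -> dict[str, str]:
    """Earliest filing month (YYYY-MM) per CIK showing an exhibit type.

    A type matches when it equals type_prefix or extends it with a dotted
    sub-number, e.g. "EX-97" matches "EX-97" and "EX-97.1" but not "EX-971".
    """
    first: dict[str, str] = {}
    for rec in records:
        types = {
            str(row.get("type") or "").strip()
            for row in rec.get("documentFormatFiles", [])
        }
        if not any(t == type_prefix or t.startswith(type_prefix + ".") for t in types):
            continue
        month = rec.get("filedAt", "")[:7]
        for entity in rec.get("entities", []):
            cik = str(entity.get("cik", ""))
            if cik not in first or month < first[cik]:
                first[cik] = month
    return first
```

Item-level disclosures such as Item 16I or Item 16K are not visible in the inventory and would need text search in the primary document instead.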
Grounding RAG and FPI-focused agents. NLP engineering teams chunk the iXBRL primary document by Item, attach accessionNo, cik, ticker, periodOfReport, and SIC as retrieval metadata, and use the result as the retrieval corpus for analyst-assistant agents covering ADR names. Because linkToFilingDetails carries the iXBRL viewer prefix, citations resolve directly back to the SEC-hosted source for end-user verification.
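A sketch of the retrieval metadata described here, keeping both the viewer link for citations and a raw link for fetching. The helper names and the output keys are hypothetical; the sketch assumes the viewer form is exactly `/ix?doc=<path>` on the same host.

```python
from urllib.parse import parse_qs, urlparse


def raw_document_url(link: str) -> str:
    """Drop the /ix?doc= viewer prefix, if present, from a filing link."""
    parts = urlparse(link)
    doc = parse_qs(parts.query).get("doc")
    if parts.path == "/ix" and doc:
        return f"{parts.scheme}://{parts.netloc}{doc[0]}"
    return link


def chunk_metadata(metadata: dict, item: str) -> dict:
    """Build the metadata attached to every chunk cut from one Item."""
    entity = (metadata.get("entities") or [{}])[0]
    link = metadata.get("linkToFilingDetails", "")
    return {
        "accessionNo": metadata.get("accessionNo"),
        "cik": entity.get("cik"),
        "tickers": list(entity.get("tickers", [])),
        "periodOfReport": metadata.get("periodOfReport"),
        "sic": entity.get("sic"),
        "item": item,
        "viewer_url": link,                 # human-facing citation link
        "raw_url": raw_document_url(link),  # direct XHTML for fetching
    }
```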
Cohort construction by jurisdiction and fiscal calendar. Quant and academic teams build issuer cohorts by combining entities[].stateOfIncorporation (e.g. X0 Israel, D8 Cayman, E9 Singapore), entities[].fiscalYearEnd (MMDD), and SIC, then align panel observations on periodOfReport rather than filedAt to avoid mixing fiscal years across the monthly partition. The cohort feeds event studies on cross-listing, sanctions exposure, and PCAOB-inspection access.
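The cohort step might be sketched as follows, again over parsed metadata.json dictionaries. `cohort_rows` and its row layout are hypothetical; the point of the sketch is that rows are ordered on periodOfReport while the filing month is kept only as an attribute.

```python
def cohort_rows(records: list[dict], jurisdictions: set[str]) -> list[dict]:
    """Select filings by stateOfIncorporation, ordered by fiscal period."""
    rows = []
    for rec in records:
        for entity in rec.get("entities", []):
            if entity.get("stateOfIncorporation") not in jurisdictions:
                continue
            rows.append({
                "cik": str(entity.get("cik", "")),
                "jurisdiction": entity["stateOfIncorporation"],
                "fiscalYearEnd": entity.get("fiscalYearEnd"),
                "period": rec.get("periodOfReport", ""),
                "filed_month": rec.get("filedAt", "")[:7],
                "formType": rec.get("formType"),
                "accessionNo": rec.get("accessionNo"),
            })
    # Align on the fiscal period, not on the filing month.
    rows.sort(key=lambda r: (r["cik"], r["period"]))
    return rows
```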
The Form 20-F Files Dataset can be accessed in three ways: through the dataset index JSON API, by downloading the full dataset archive, or by retrieving individual monthly container files.
Dataset Index JSON API: https://api.sec-api.io/datasets/form-20f-files.json
This endpoint returns dataset-level metadata (name, description, last updated timestamp, earliest sample date, total record count, total size, form types covered, container format, and file types) along with the full list of container files. Each container entry includes its size, record count, last updated timestamp, and direct download URL. Use this endpoint to monitor which containers were refreshed in the most recent update run and to selectively download only the containers that changed. This endpoint does not require an API key.
Example response:
{
  "datasetId": "1f13365b-9ae0-6914-8189-afd3117ffa88",
  "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-20f-files.zip",
  "name": "Form 20-F Files Dataset",
  "updatedAt": "2026-04-28T02:56:43.957Z",
  "earliestSampleDate": "1995-06-01",
  "totalRecords": 198017,
  "totalSize": 8987538136,
  "formTypes": ["20-F", "20-F/A"],
  "containerFormat": "ZIP",
  "fileTypes": ["TXT", "JSON", "PDF", "HTML"],
  "containers": [
    {
      "downloadUrl": "https://api.sec-api.io/datasets/form-20f-files/2026/2026-04.zip",
      "key": "2026/2026-04.zip",
      "size": 89241733,
      "records": 312,
      "updatedAt": "2026-04-28T02:56:43.957Z"
    }
  ]
}
Fetch the index with curl:
curl -s https://api.sec-api.io/datasets/form-20f-files.json
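The selective-refresh workflow can be sketched in Python as follows: fetch the index, keep only the containers whose updatedAt is later than the previous sync, and append the API key to each download URL. The function names and the `last_sync` bookkeeping are illustrative assumptions; only the index URL, the response fields, and the token query parameter come from the description above.

```python
import json
from datetime import datetime
from urllib.request import urlopen

INDEX_URL = "https://api.sec-api.io/datasets/form-20f-files.json"

def fetch_index(url=INDEX_URL):
    """Download and parse the dataset index (no API key required)."""
    with urlopen(url) as resp:
        return json.load(resp)

def parse_ts(ts):
    # Index timestamps look like "2026-04-28T02:56:43.957Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def changed_containers(index, last_sync):
    """Return the containers updated after `last_sync` (same timestamp format)."""
    cutoff = parse_ts(last_sync)
    return [c for c in index["containers"] if parse_ts(c["updatedAt"]) > cutoff]

def with_token(download_url, api_key):
    """Container downloads need the API key in the `token` query parameter."""
    return f"{download_url}?token={api_key}"
```

A caller would store the index's top-level updatedAt after each run and pass it as `last_sync` on the next one.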
Download Entire Dataset: https://api.sec-api.io/datasets/form-20f-files.zip?token=YOUR_API_KEY
Downloads the complete archive containing all Form 20-F and Form 20-F/A filings from June 1995 to the present in a single ZIP. This endpoint requires an API key passed via the token query parameter.
curl -o form-20f-files.zip \
  "https://api.sec-api.io/datasets/form-20f-files.zip?token=YOUR_API_KEY"
Download Single Container: https://api.sec-api.io/datasets/form-20f-files/2026/2026-04.zip?token=YOUR_API_KEY
Downloads one monthly container ZIP instead of the full dataset. Containers follow the path pattern /form-20f-files/<year>/<year>-<month>.zip, and the complete list of available containers is enumerated in the dataset index JSON. This endpoint requires an API key.
curl -o 2026-04.zip \
  "https://api.sec-api.io/datasets/form-20f-files/2026/2026-04.zip?token=YOUR_API_KEY"
The dataset covers Form 20-F and Form 20-F/A — the annual report and amendment forms used by foreign private issuers under Sections 13 or 15(d) of the Securities Exchange Act of 1934. It does not include Form 10-K (domestic annual reports), Form 40-F (Canadian MJDS annual reports), or Form 6-K (FPI interim and event reports).
One record is a single Form 20-F or Form 20-F/A submission as accepted by EDGAR, identified by its accession number. On disk it is a flat folder named with the 18-digit dash-stripped accession number, containing a metadata.json inventory, the primary 20-F or 20-F/A document, and each separately-filed exhibit as its own file.
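To illustrate the on-disk layout, the sketch below walks one monthly container and yields each record's metadata, keyed by the accession number restored to its dashed form (10-2-6 digits). It assumes the ZIP's internal paths end in `<accession>/metadata.json`; whether a container adds any leading path prefix is not stated here, so the code only inspects the last two path components.

```python
import json
import re
import zipfile

ACCESSION = re.compile(r"^\d{18}$")  # dash-stripped accession number

def dashed(accession):
    """000119312525319187 -> 0001193125-25-319187"""
    return f"{accession[:10]}-{accession[10:12]}-{accession[12:]}"

def iter_records(zip_path):
    """Yield (dashed accession number, metadata dict) per record folder."""
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            # A record is identified by <18 digits>/metadata.json.
            if len(parts) >= 2 and parts[-1] == "metadata.json" and ACCESSION.match(parts[-2]):
                with zf.open(name) as fh:
                    yield dashed(parts[-2]), json.load(fh)
```

Sibling files in the same folder (the primary document and exhibits) can be read with `zf.open` in the same way once the record folder is known.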
Form 20-F is filed by foreign private issuers — non-U.S.-organized companies with securities registered under Section 12 of the Exchange Act or with a continuing reporting obligation under Section 15(d). FPI status is defined in Rule 405 and Rule 3b-4 and reapplied annually as of the last business day of the second fiscal quarter. MJDS-eligible Canadian issuers may instead use Form 40-F.
The annual deadline is four months after fiscal year end, uniform across FPIs regardless of accelerated-filer status. A Form NT 20-F filed no later than one business day after the original due date provides a 15-calendar-day extension under Rule 12b-25. Form 20-F/A amendments have no fixed deadline and are filed when needed.
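The deadline arithmetic can be sketched as below. Two simplifications are assumed: a fiscal year ending on the last day of a month is treated as due on the last day of the month four months later (so 31 December maps to 30 April), and no adjustment is made for weekends or holidays.

```python
import calendar
from datetime import date, timedelta

def add_months(d, months):
    """Shift a date by whole months, keeping month-end dates at month end."""
    idx = d.month - 1 + months
    year, month = d.year + idx // 12, idx % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    was_month_end = d.day == calendar.monthrange(d.year, d.month)[1]
    day = last_day if was_month_end else min(d.day, last_day)
    return date(year, month, day)

def due_dates(fiscal_year_end):
    """Return (original due date, extended due date with Form NT 20-F)."""
    due = add_months(fiscal_year_end, 4)
    return due, due + timedelta(days=15)

# Example: a calendar-year filer.
print(due_dates(date(2024, 12, 31)))
```

Comparing filedAt with these dates gives a rough late-filing flag, subject to the simplifications noted above.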
The dataset begins in June 1995, the earliest period of EDGAR availability for Form 20-F filings, and runs to the present. It spans the 1999 IOSCO-aligned modernization, the 2007 elimination of the U.S. GAAP reconciliation for IFRS filers, and every subsequent Sarbanes-Oxley Act, Dodd-Frank, HFCAA, clawback, insider-trading, and cybersecurity rulemaking applicable to FPIs.
The file types are TXT, JSON, PDF, and HTML. In modern filings the on-disk content is overwhelmingly HTML stored as .htm files (an inline-XBRL XHTML primary document plus SGML-wrapped legacy HTML exhibits), alongside the metadata.json manifest; PDF and plain TXT appear primarily in older filings. The dataset is delivered as monthly ZIP containers partitioned by EDGAR filing month under the path pattern YYYY/YYYY-MM.zip.
The Files dataset delivers the complete EDGAR submission — primary document, exhibits, and supporting attachments (excluding images) — preserving the original HTML, TXT, PDF, and JSON metadata. A Form 20-F Items dataset would deliver parsed text aligned to the FPI item schedule (Item 3D Risk Factors, Item 4, Item 5, etc.), while an XBRL dataset would deliver structured numeric facts tagged under US-GAAP or IFRS taxonomies. Files is the source-of-truth raw corpus from which Items extractions and XBRL fact tables can be derived.