The Form 20FR12G Files Dataset is a complete archive of every EDGAR submission of form type 20FR12G and 20FR12G/A — the initial Section 12(g) Exchange Act registration statement filed by foreign private issuers using the Form 20-F disclosure template, together with its amendments. One record corresponds to one EDGAR accession: an accession folder containing a metadata.json envelope describing the submission and the original document files EDGAR received (the primary Form 20-F registration statement plus its non-image exhibits). Filers are non-U.S. operating companies, holding companies, and subsidiaries that have either crossed the Section 12(g) shareholder-of-record thresholds or are registering voluntarily in support of OTC trading, Rule 144 resale, or future listings. EDGAR-era electronic 20FR12G filings begin on 1996-09-01 and continue to the present, and the dataset is delivered as monthly ZIP containers holding a mix of HTML, plain-text TXT, JSON, and PDF files.
The dataset packages, in their original EDGAR-archived form, every initial Section 12(g) registration statement submitted on the Form 20-F template by a foreign private issuer, plus every 20FR12G/A amendment to such a registration. Each record is a self-contained, file-system-native rendition of one accession — both a structured-metadata record (the JSON envelope) and a document-archive record (the primary 20-F and its exhibits) — preserving the same shape EDGAR received and stored.
A 20FR12G filing is a registration statement under Section 12(g) of the Securities Exchange Act of 1934, submitted by a foreign private issuer using Form 20-F as the disclosure vehicle. The "20FR12G" submission type is the EDGAR routing label that combines two facts: (a) the registrant uses Form 20-F content, and (b) the purpose is the initial registration of a class of equity securities under Section 12(g) — typically because the issuer has crossed Section 12(g) shareholder/asset thresholds, is preparing for U.S. over-the-counter trading, or is responding to a compelled-registration situation. Variant 20FR12G/A amends a previously filed 20FR12G, restating, supplementing, or correcting portions of the original registration; amendments commonly add or refresh financial statements, fix cover-page errors, or respond to SEC staff comments.
Because the form is Form 20-F, the disclosure scope mirrors the 20-F annual-report regime: a comprehensive description of the issuer's business, properties, risk factors, governance, financial condition, and audited financial statements prepared either under U.S. GAAP or under IFRS as issued by the IASB (with prior-era variants permitting home-country GAAP plus a U.S. GAAP reconciliation). The dataset begins in September 1996 — consistent with the phase-in of mandatory EDGAR filing for foreign private issuers — and is distributed as monthly ZIP containers grouping accession folders under YYYY/YYYY-MM/ directories. The file types found across records are HTML/HTM, plain-text TXT, JSON (the manifest), and PDF.
One record in the Form 20FR12G Files Dataset is a single EDGAR submission of form type 20FR12G or 20FR12G/A, materialized as one accession folder named with the eighteen-digit dash-stripped accession number (for example 000106299323017949 for accession 0001062993-23-017949). The folder holds two kinds of artifacts: a single metadata.json describing the entire submission, and the original document files EDGAR received — the primary Form 20-F registration statement plus its exhibits. Image attachments (.jpg, .gif, .png) are catalogued in metadata.json but are not bundled inside the record. Accession folders are grouped under YYYY/YYYY-MM/ directories and packaged as monthly ZIP containers, but the unit of analysis is the accession folder.
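The folder naming convention is mechanical, so it can be sketched in a few lines. This is a minimal sketch, not an official client. The helper names are hypothetical. It also assumes that a record is bucketed by the calendar month of its filedAt timestamp, which the YYYY/YYYY-MM/ layout and the container keys in the dataset index suggest.

```python
from datetime import datetime


def accession_folder_name(accession_no: str) -> str:
    """Turn a dashed accession number into the 18-digit folder name."""
    folder = accession_no.replace("-", "")
    if len(folder) != 18 or not folder.isdigit():
        raise ValueError(f"unexpected accession number: {accession_no!r}")
    return folder


def container_key(filed_at: str) -> str:
    """Map a filedAt timestamp to a YYYY/YYYY-MM.zip container key.

    Assumption: records are bucketed by the month of filedAt as written
    (EDGAR's local offset), not by a UTC-converted date.
    """
    ts = datetime.fromisoformat(filed_at)
    return f"{ts:%Y}/{ts:%Y-%m}.zip"


def record_path(accession_no: str, filed_at: str) -> str:
    """Relative path of one accession folder inside the YYYY/YYYY-MM/ tree."""
    ts = datetime.fromisoformat(filed_at)
    return f"{ts:%Y}/{ts:%Y-%m}/{accession_folder_name(accession_no)}"
```

For example, accession_folder_name("0001062993-23-017949") returns "000106299323017949".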
A record is laid out in three concentric layers:
- The accession folder, which contains metadata.json and the original document files.
- metadata.json, a single JSON object describing the EDGAR submission as a whole: filer identity, timestamps, document manifest, and filing-level URLs.
- Each document file, an EDGAR SGML <DOCUMENT> envelope around an HTML, plain-text, or (less commonly) PDF payload.

The record is simultaneously a structured-metadata record (the JSON) and a document-archive record (the files). The two are tightly cross-referenced: every file on disk corresponds to an entry in metadata.json.documentFormatFiles[], with matching sequence, filename (encoded inside the SGML header), type, and description values.
metadata.json envelope
metadata.json is a single JSON object whose top-level fields describe the entire accession.
- formType: EDGAR form-type literal, either "20FR12G" or "20FR12G/A".
- accessionNo: dashed accession number, e.g. "0001062993-23-017949".
- filedAt: ISO-8601 timestamp with EDGAR's local offset, e.g. "2023-09-12T16:46:40-04:00".
- description: human-readable EDGAR description, typically "Form 20FR12G - Registration of securities [Section 12(g)]" or, for amendments, the same string with ": [Amend]" appended.
- linkToFilingDetails: canonical sec.gov URL to the primary document.
- linkToTxt: URL to the rolled-up combined SGML submission (*.txt) that EDGAR produces from all document parts.
- linkToHtml: URL to the EDGAR *-index.htm filing-index page.
- linkToXbrl: URL to an XBRL instance. This is ordinarily an empty string for this form type, because 20FR12G submissions generally do not carry XBRL.
- id: internal record identifier.
- documentFormatFiles[]: document manifest (described below).
- entities[]: one element per filer or subject company (described below).
- seriesAndClassesContractsInformation[]: investment-company-specific structure, empty for 20FR12G.
- dataFiles[]: reserved for structured-data files such as XBRL exhibits, generally empty for this form type.

documentFormatFiles[]
Each element describes one item in the submission. Per-entry fields:
- sequence: EDGAR sequence number assigned at submission (the rolled-up combined .txt row carries a blank or whitespace sequence).
- description: human label, such as "FORM 20FR12G/A", "EXHIBIT 15.4", "GRAPHIC", or "Complete submission text file".
- documentUrl: canonical sec.gov URL for the file.
- type: SEC document type code, such as 20FR12G, 20FR12G/A, EX-1.* (charter/bylaws), EX-2.* (instruments defining securities), EX-4.* (material contracts), EX-7.* (computations), EX-8.* (subsidiaries), EX-12.* (certifications, where applicable), EX-15.* (auditor consents and opinions), EX-99.* (additional exhibits), and GRAPHIC (image attachments).
- size: file size in bytes (as a string).

Image entries (type: "GRAPHIC") and the rolled-up combined .txt row are catalogued here but are not present as files inside the record on disk.
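Graphics and the rolled-up submission row are catalogued but not stored. For that reason, a consumer usually wants to split the manifest before touching the disk. The sketch below relies only on the type and sequence fields described above. The function names are hypothetical. Treating a blank sequence as the rolled-up row follows the note on sequence, not a published rule.

```python
import json
from pathlib import Path


def load_metadata(record_dir: Path) -> dict:
    """Read the metadata.json envelope of one accession folder."""
    with open(record_dir / "metadata.json", encoding="utf-8") as fh:
        return json.load(fh)


def split_manifest(metadata: dict) -> tuple[list[dict], list[dict]]:
    """Partition documentFormatFiles[] into (stored on disk, catalogued only)."""
    stored: list[dict] = []
    catalogued_only: list[dict] = []
    for entry in metadata.get("documentFormatFiles", []):
        is_graphic = (entry.get("type") or "").upper() == "GRAPHIC"
        # The rolled-up combined .txt row carries a blank or whitespace sequence.
        is_rollup = not (entry.get("sequence") or "").strip()
        if is_graphic or is_rollup:
            catalogued_only.append(entry)
        else:
            stored.append(entry)
    return stored, catalogued_only
```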
entities[]
Each filer is represented by an object built from EDGAR header fields. Fields include:
- cik: Central Index Key.
- companyName: registrant's name as on file with EDGAR, normally suffixed with a role marker such as "(Filer)".
- type: registrant's form-type role on this submission, mirroring formType.
- act: governing act, normally "34" for Section 12(g) registrations under the Exchange Act.
- fileNo: SEC file number, e.g. "000-52568".
- filmNo: EDGAR film number assigned at acceptance.
- sic: SIC code with description, e.g. "7372 Services-Prepackaged Software".
- irsNo: IRS employer identification number (often "000000000" for foreign issuers without a U.S. EIN).
- stateOfIncorporation: EDGAR's two-character location code. For foreign private issuers this almost always denotes a non-U.S. jurisdiction (for example A1 = British Columbia, A0 = Alberta, E9 = Cayman Islands).
- fiscalYearEnd: four-digit MMDD code, e.g. "1031" for October 31.

Most 20FR12G submissions have a single filer. Multi-filer submissions place additional objects in entities[], each carrying its own cik, companyName, fileNo, and role.
Every document file in the record — including HTML files — opens with the EDGAR SGML <DOCUMENT> wrapper rather than a clean HTML doctype. The wrapper carries header lines that mirror the corresponding documentFormatFiles[] entry, followed by the document payload between <TEXT> and </TEXT>:
<DOCUMENT>
<TYPE>EX-15.4
<SEQUENCE>2
<FILENAME>exhibit15-4.htm
<DESCRIPTION>EXHIBIT 15.4
<TEXT>
<html>
... rendered document body ...
</html>
</TEXT>
</DOCUMENT>
Consumers expecting strictly conformant HTML must either strip the SGML header lines or parse the wrapper before handing the inner payload to an HTML parser. The same envelope is used regardless of whether the inner payload is HTML, plain text, or a PDF blob (PDFs are uuencoded inside <TEXT> in older filings and stored as binary attachments in modern ones).
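The stripping step can be sketched in Python as follows. The sketch assumes the header layout shown above: one <TAG>value line per field, followed by a single <TEXT>...</TEXT> block. The function name is hypothetical. The approach is plain string handling, not a full SGML parser.

```python
import re

# One header line per field, e.g. "<TYPE>EX-15.4"; values run to end of line.
_HEADER_LINE = re.compile(r"^<(TYPE|SEQUENCE|FILENAME|DESCRIPTION)>(.*)$", re.MULTILINE)


def parse_document_envelope(raw: str) -> tuple[dict, str]:
    """Split an EDGAR <DOCUMENT> wrapper into header fields and the inner payload."""
    start = raw.find("<TEXT>")
    end = raw.rfind("</TEXT>")
    if start == -1 or end < start:
        raise ValueError("no <TEXT>...</TEXT> block found")
    headers = {
        name.lower(): value.strip()
        for name, value in _HEADER_LINE.findall(raw[:start])
    }
    # Drop only the line breaks that sit directly inside <TEXT> ... </TEXT>.
    payload = raw[start + len("<TEXT>"):end].strip("\r\n")
    return headers, payload
```

When headers["filename"] ends in .htm, the returned payload can be passed to an HTML parser such as Python's html.parser. For older records, reading files with errors="replace" is a pragmatic choice when the encoding is uncertain.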
The primary document carries type 20FR12G or 20FR12G/A and is by far the largest file in the record (commonly a few hundred kilobytes to several megabytes of HTML). It contains the full Form 20-F, filed in its capacity as a registration statement.
The document opens with the standard cover sheet:
- the form designation (FORM 20-F, or FORM 20-F/A with the amendment number).

After the cover, the body follows the Form 20-F instruction structure, organized into three Parts. A table of contents is typically inserted between the cover and Part I.
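Part I: Items 1 to 12. These cover key information and risk factors, the business, operating and financial review, directors and senior management, major shareholders and related party transactions, financial information, and additional information such as taxation and exchange controls, among others.
Part II: Items 13 to 16 and their lettered sub-items. These cover defaults, controls and procedures, audit-committee, code-of-ethics, accountant-fee, and corporate-governance disclosures, among others.
Part III: Items 17 to 19, covering the financial statements and exhibits.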
The primary document closes with signature blocks identifying the officer(s) signing on behalf of the registrant, the location, and the date.
References to images embedded in the HTML (for example <img src="form20fr12gax001.jpg" ...>) point to GRAPHIC attachments that are listed in metadata.json.documentFormatFiles[] but intentionally not included as files in the record.
Exhibits are separate files, each with its own <DOCUMENT> envelope and its own type code (EX-1.1, EX-2.1, EX-4.*, EX-8.1, EX-12.*, EX-15.*, EX-99.*, etc.). Exhibit content varies with the issuer's circumstances but typically includes:
- multiple EX-15.* consents when more than one audit firm is involved (for example because of a change of auditor, or because predecessor and successor auditors each opined on different fiscal years).

Exhibits are typically short relative to the primary document: a few kilobytes for consents, larger for material contracts, and occasionally PDF for scanned older instruments.
The record on disk contains:
- metadata.json, always.
- every non-image document listed in documentFormatFiles[], in HTML, plain text, or PDF as filed.

In modern records the body is overwhelmingly HTM plus the JSON manifest, while older records lean on TXT.
The record deliberately omits:
- Image attachments (.jpg, .gif, .png) referenced from the HTML and listed under type: "GRAPHIC" in documentFormatFiles[]. They remain catalogued in metadata but are not bundled.
- The rolled-up complete submission text file (*.txt) that EDGAR generates by concatenating all document parts. It is referenced via linkToTxt in metadata.json but is not stored as a separate file inside the record (the individual document parts are stored instead).

The substantive content requirements of Form 20-F, and therefore of 20FR12G registrations, have evolved through several SEC rulemakings since the mid-1990s.
For 20FR12G specifically, the practical effect is that older records carry a sparser Part II, no IFRS-without-reconciliation option, and no cybersecurity or mine-safety items, while modern records include the full slate of post-Sarbanes-Oxley and post-Dodd-Frank disclosure blocks even when many Part II items are marked "Not applicable" because of the registration-statement (rather than annual-report) posture.
Item 19 exhibit requirements likewise grew over the period, particularly with the addition of CEO/CFO certifications, auditor-consent expectations for amendments incorporating new audit reports, and expanded contract-exhibit categories.
The dataset spans submissions from September 1996 to the present, and the file-format mix in records evolves accordingly:
- Early plain-text records (predominant in the late 1990s): the <DOCUMENT> SGML envelope is the same, but <TEXT> payloads are unformatted ASCII with hand-laid tables and page-break markers. PDFs, when present, were uuencoded inside <TEXT> blocks. Records from this era often contain a single large .txt for the primary form and .txt exhibits.
- HTML records: the primary document moves to .htm with embedded <table> markup, and exhibits migrate to .htm as well. PDFs continue to appear for scanned material contracts, old by-law documents, and similar artifacts. Image references appear in HTML, with the corresponding .jpg/.gif attachments catalogued separately as GRAPHIC entries.
- All eras: linkToXbrl is typically empty, and dataFiles[] is typically empty.

The SGML <DOCUMENT> envelope, the metadata.json schema, and the accession-folder layout remain stable across all eras of the dataset. What changes is the format and richness of the inner <TEXT> payload.
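Because the payload format varies by era, a reader often needs to detect it before choosing a parser. The heuristics below are assumptions, not an EDGAR specification:
- a leading "begin <mode> <name>" line is treated as uuencoded content (the form used for PDFs in older records);
- "%PDF" is treated as a raw PDF;
- an <html tag is treated as HTML.

Decoding uses the standard library's binascii.a2b_uu, one line at a time. The function names are hypothetical.

```python
import binascii
import re

# Assumed uuencode preamble: "begin <octal mode> <file name>".
_UU_BEGIN = re.compile(r"begin [0-7]{3,4} \S")


def classify_payload(payload: str) -> str:
    """Heuristic guess at the <TEXT> payload format: uuencoded, pdf, html, or text."""
    stripped = payload.lstrip()
    if _UU_BEGIN.match(stripped):
        return "uuencoded"
    head = stripped[:512].lower()
    if head.startswith("%pdf"):
        return "pdf"
    if "<html" in head or "<!doctype html" in head:
        return "html"
    return "text"


def uudecode(payload: str) -> bytes:
    """Decode a 'begin ... end' uuencoded block (e.g. an older embedded PDF)."""
    lines = payload.strip().splitlines()
    if not lines or not _UU_BEGIN.match(lines[0]):
        raise ValueError("payload does not start with a uuencode 'begin' line")
    out = bytearray()
    for line in lines[1:]:
        if line.strip() == "end":
            break
        if line:  # skip empty lines
            out += binascii.a2b_uu(line)
    return bytes(out)
```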
- 20FR12G/A is a complete re-filing in form, but its content may amend only specific Items, with the cover page indicating "Amendment No. N". The amendment's metadata.json and documentFormatFiles[] describe only the amendment submission. The original 20FR12G remains a separate record under its own accession.
- Documents filed with an earlier accession and not refiled with an amendment do not appear in documentFormatFiles[] for the current record. They are obtainable only by following the cross-reference to the prior accession.
- When a 20FR12G/A updates or refreshes the financial statements, expect one or more EX-15.* consents dated near the amendment date. Each names the audit firm and the specific report being consented to. Multiple consents typically indicate a change of auditor across the periods presented.
- EDGAR location codes in stateOfIncorporation: U.S. state codes (AL...WY) cover U.S. jurisdictions, and a separate set of two-character codes covers non-U.S. provinces and countries (e.g. A1 British Columbia, A0 Alberta, E9 Cayman Islands). For 20FR12G the value almost always denotes a non-U.S. jurisdiction by design.
- The SGML envelope: the <DOCUMENT>, <TYPE>, <SEQUENCE>, <FILENAME>, and <DESCRIPTION> lines at the head of each document form a stable, line-oriented header rather than valid XML or HTML. Extracting the inner HTML cleanly requires recognizing this envelope and stripping it before passing content to an HTML parser.
- Because every stored file maps to an entry in documentFormatFiles[], metadata.json is the canonical index for traversing the record. It lists the exhibits that are present and references those (graphics, rolled-up .txt) that are not stored locally.

The filer is a foreign private issuer registering a class of equity securities under Section 12(g) of the Securities Exchange Act of 1934. The issuer files in its own capacity as registrant. The disclosure content follows the Form 20-F template, but the EDGAR submission type "20FR12G" designates a specific statutory purpose: an initial Section 12(g) class registration. This is distinct from the annual Form 20-F report and from Section 12(b) exchange-listing registrations.
To qualify as a foreign private issuer under Rule 405 of the Securities Act and Rule 3b-4 of the Exchange Act, a non-governmental, non-U.S. issuer must meet one of two tests:
- (a) 50% or less of its outstanding voting securities are held of record by U.S. residents; or
- (b) if more than 50% are U.S.-held, all three of the following hold: neither a majority of its executive officers nor a majority of its directors are U.S. citizens or residents, no more than 50% of its assets are located in the United States, and its business is not administered principally in the United States.

Non-U.S. issuers that fail this test register on Form 10 instead.
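Within the FPI population, 20FR12G is used by non-U.S. operating companies, holding companies, and subsidiaries that have either crossed the Section 12(g) holder-of-record thresholds or are registering voluntarily.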
Foreign sovereigns and supranational entities fall outside this population. Canadian issuers that elect the Multijurisdictional Disclosure System file Form 40FR12G instead. Foreign governments use Schedule B-family submissions, and U.S. domestic issuers use Form 10.
A 20FR12G filing arises from one of three paths.
Mandatory registration under Section 12(g)(1). A foreign private issuer must register a class of equity securities when both of the following hold on the last day of its fiscal year:
- it has total assets exceeding $10 million (the threshold set by Rule 12g-1); and
- the class is held of record by either 2,000 or more persons, or 500 or more persons who are not accredited investors.

Holders are counted under Rule 12g5-1. Rule 12g3-2(a) additionally exempts a foreign private issuer's class that is held of record by fewer than 300 U.S. residents. An issuer that crosses the thresholds without an available exemption must file the registration statement within 120 days after the fiscal year-end on which the thresholds were exceeded.
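The asset and holder-count tests reduce to a small predicate. This sketch assumes the caller has already counted holders under the applicable rules. It does not model any exemption, such as Rule 12g3-2(b). The class and function names are hypothetical.

```python
from dataclasses import dataclass

ASSET_THRESHOLD_USD = 10_000_000   # total assets must exceed this (Rule 12g-1)
HOLDERS_OF_RECORD = 2_000          # holders of record, or ...
NON_ACCREDITED_HOLDERS = 500       # ... holders of record who are not accredited investors


@dataclass(frozen=True)
class FiscalYearEndPosition:
    """Figures as of the last day of the fiscal year (hypothetical container)."""
    total_assets_usd: float
    holders_of_record: int        # already counted under the applicable rules
    non_accredited_holders: int


def crosses_section_12g_thresholds(p: FiscalYearEndPosition) -> bool:
    """Assets above $10M and either holder test met; exemptions are not modelled."""
    if p.total_assets_usd <= ASSET_THRESHOLD_USD:
        return False
    return (p.holders_of_record >= HOLDERS_OF_RECORD
            or p.non_accredited_holders >= NON_ACCREDITED_HOLDERS)
```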
Voluntary registration. A foreign private issuer below the mandatory thresholds, or one previously relying on Rule 12g3-2(b), may elect to register under Section 12(g) by filing 20FR12G. Common motivations include OTCQX/OTCQB participation, terminating Rule 12g3-2(b) reliance in connection with a registered offering, supporting Rule 144 resale, or preparing for a future national exchange listing.
Loss of the Rule 12g3-2(b) exemption. Rule 12g3-2(b) lets foreign private issuers avoid Section 12(g) registration if they maintain a primary trading market outside the U.S. and electronically publish specified home-country disclosure in English. Failing those conditions, or undertaking a disqualifying transaction, typically pushes the issuer onto 20FR12G (or 20FR12B if listing on a national exchange).
A 20FR12G registration statement becomes effective automatically 60 days after filing under Section 12(g)(1), subject to SEC authority to accelerate or postpone. For mandatory filers, the underlying filing deadline is 120 days after the triggering fiscal year-end. Once effective, the issuer becomes subject to Section 13(a) reporting: annual Form 20-F within four months of fiscal year-end and Form 6-K furnishings on the home-jurisdiction publication basis. 20FR12G is the entry point into ongoing FPI reporting, not a one-time act.
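Both periods are counted in days from a known date, so the arithmetic is straightforward. This is a minimal sketch, assuming simple calendar-day counting. It makes no adjustment for weekends or holidays, and it ignores any SEC acceleration or postponement. The function names are hypothetical.

```python
from datetime import date, timedelta

FILING_WINDOW = timedelta(days=120)     # after the triggering fiscal year-end
EFFECTIVENESS_LAG = timedelta(days=60)  # after filing, absent SEC action


def registration_deadline(fiscal_year_end: date) -> date:
    """Last calendar day to file a mandatory 20FR12G (no weekend/holiday roll)."""
    return fiscal_year_end + FILING_WINDOW


def automatic_effective_date(filed_on: date) -> date:
    """Date the registration becomes effective by lapse of time (no acceleration)."""
    return filed_on + EFFECTIVENESS_LAG
```

For example, a December 31, 2023 fiscal year-end gives registration_deadline(date(2023, 12, 31)) == date(2024, 4, 29), because 2024 is a leap year.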
20FR12G/A amendments respond to Division of Corporation Finance staff comments, refresh stale financial statements, correct disclosure, or add exhibits. Because effectiveness is automatic at day 60, amendments cluster in the run-up to that date.
EDGAR-era electronic 20FR12G filings begin in September 1996, consistent with the phase-in of mandatory EDGAR filing for foreign private issuers. Pre-EDGAR Section 12(g) registrations exist on paper but are outside this dataset.
Because the Form 20-F template is reused across registration, periodic reporting, and amendments — and because several adjacent forms cover similar issuers or similar statutory triggers — confusion is common. The comparisons below isolate each neighbor by triggering statute, filer population, lifecycle stage, and content scope.
Same disclosure template, opposite lifecycle stage. Form 20-F is the recurring annual report filed by foreign private issuers already registered; 20FR12G is the one-time entry document that establishes that reporting obligation in the first place. Pick 20FR12G to study new entrants, time-to-effectiveness, or the baseline disclosure at registration; pick 20-F to study the ongoing reporting population. The two are sequential, not substitutes.
Identical template and filer type, different statutory subsection. Form 20FR12B covers securities listed on a national exchange (NYSE, Nasdaq); 20FR12G covers securities that do not list but cross the Section 12(g) holder-of-record threshold (commonly OTC-traded ADRs). Filer populations diverge in practice: 20FR12B issuers tend to be larger foreign companies pursuing a U.S. listing; 20FR12G issuers are more often smaller-cap or OTC-only registrants caught by the holder threshold. Use 20FR12B for U.S. listing decisions; use 20FR12G for non-listed Exchange Act entry.
Restricted to eligible Canadian issuers under the Multijurisdictional Disclosure System, which permits Form 40-F (Canadian AIF and home-jurisdiction financials) instead of Form 20-F. The 12(g)/12(b) split parallels the 20FR series exactly. Disclosure structure is not line-for-line comparable to 20FR12G. To capture the full population of non-listed foreign entrants, combine 20FR12G with 40FR12G; 20FR12G alone excludes Canadian MJDS filers.
Different statute, different purpose. F-1/F-3 register the offer and sale of securities under the 1933 Act and contain a prospectus; 20FR12G registers a class of securities for ongoing reporting under the 1934 Act and contains no offering. The two can run in parallel during a U.S. IPO, but a Section 12(g) registrant with no public offering files only 20FR12G. Use Form F-1/Form F-3 for offering activity; use 20FR12G for reporting-universe entry.
Same statutory section, opposite content profile. Form 8-A is the abbreviated registration used by issuers already reporting under the Exchange Act (typically post-F-1) — a brief securities description that incorporates substantive disclosure by reference. 20FR12G is the full standalone Form 20-F package filed by issuers entering the reporting system through the Exchange Act path itself. Many foreign issuers reach Section 12(g) registration via Form 8-A12G after an F-1 IPO rather than via 20FR12G, so the two datasets capture distinct entry pathways and are not interchangeable proxies for "new foreign registrants."
Event-driven furnishing, not registration. Form 6-Ks transmit material information already made public in the home jurisdiction (interim financials, press releases, regulatory announcements) between annual 20-Fs. Content overlap with 20FR12G is minimal: 6-Ks lack the comprehensive business description, audited annual financials, and risk factor disclosure that define a 20FR12G. Use 6-K for ongoing disclosure flow; use 20FR12G for the one-time initial baseline.
The bookend to 20FR12G. Form 15-12G terminates Section 12(g) reporting once the issuer falls below the holder threshold or otherwise becomes eligible to deregister. Foreign private issuers may alternatively deregister on Form 15F under Rule 12h-6. Either way, the exit filing is a short procedural certification, not a disclosure document. A complete reporting-lifecycle dataset for a foreign 12(g) issuer pairs three things: the 20FR12G entry filing, the intervening 20-Fs, and the exit filing.
A submission belongs in the 20FR12G dataset only when four conditions hold simultaneously: foreign private issuer, initial Exchange Act registration, Section 12(g) (non-exchange-listed) pathway, and full Form 20-F disclosure template. Change any one and the filing moves elsewhere — domestic issuers to Form 10 or 8-A, exchange-listed foreign issuers to 20FR12B, Canadian MJDS filers to 40FR12G, already-reporting issuers to 8-A12G, annual updates to Form 20-F, and offerings to F-1/F-3. For complete coverage of non-listed foreign entrants, combine 20FR12G with 40FR12G; for the listed counterpart, add 20FR12B; for subsequent activity by the same issuers, follow with 20-F.
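The four-condition test can be expressed as a deliberately simplified routing function. It encodes only the alternatives named in this paragraph plus their 12(b) counterparts: 8-A12B, and 40FR12B, which follows the 12(g)/12(b) parallel noted for the 40FR series. Annual reports and Securities Act offerings are out of scope. This is an illustration with hypothetical parameter names, not a legal test.

```python
def route_exchange_act_entry(*, foreign_private_issuer: bool, already_reporting: bool,
                             exchange_listed: bool, mjds_form_40f: bool = False) -> str:
    """Simplified EDGAR form type for an initial Exchange Act class registration."""
    if not foreign_private_issuer:
        return "10 or 8-A"  # domestic issuers
    if already_reporting:
        # Abbreviated registration for issuers already reporting (e.g. after an F-1).
        return "8-A12B" if exchange_listed else "8-A12G"
    if mjds_form_40f:
        # Canadian MJDS filers use the Form 40-F template instead of Form 20-F.
        return "40FR12B" if exchange_listed else "40FR12G"
    return "20FR12B" if exchange_listed else "20FR12G"
```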
The dataset is read by lawyers, bankers, investors, compliance staff, governance specialists, academics, journalists, and ML engineers, each pulling different sections of the same filing.
U.S. counsel advising foreign private issuers use the corpus as a precedent library. They mine peer 20FR12G and 20FR12G/A filings for jurisdiction-specific drafting of risk factors, dual-class and home-country governance exemptions, enforcement-of-judgments language, and PFIC or sanctions disclosure. The /A amendment chain shows how SEC staff comments reshape risk factors, related-party disclosures, and financial statement reconciliations. Output: first-draft registration statements, comment-response memos, country-and-sector drafting checklists.
Bankers structuring ADR programs, direct listings, and dual listings study comparable cross-listings. They focus on the business description, operating and financial review, share-capital and shareholder-structure exhibits, and depositary arrangements to build comparable books, calibrate filing-to-effectiveness timelines, and prepare pitches for prospective foreign issuer clients of similar size and region.
Sell-side and buy-side analysts covering ADRs and foreign-domiciled names use 20FR12G filings to initiate coverage on issuers that have just entered U.S. reporting. Audited financials, segment data, the operating and financial review, and risk factors feed earnings models and valuation. The directors-and-senior-management and major-shareholder tables map the controlling-family, state-ownership, and VIE dynamics that drive minority-shareholder returns.
In-house disclosure counsel, controllers, and IR teams benchmark their own drafting against peers: risk factor structure, segment and geographic depth, the home-country-versus-NYSE/Nasdaq governance comparison, and reconciliation formatting. Output: internal disclosure checklists, audit-committee peer-comparison appendices, drafting guidance for subsequent annual 20-F reports.
Portfolio managers running international, EM, and frontier strategies use the filings as primary-source diligence at the moment a name enters the U.S. reporting net. Risk factors, related-party transactions, taxation (PFIC, withholding), and audited financials inform benchmark-inclusion validation, position sizing, and pre-purchase governance flags.
Onboarding teams at prime brokers, custodians, and clearing firms ingest legal name, jurisdiction of incorporation, share classes, depositary arrangements, beneficial ownership, and subsidiary lists from metadata.json and the standardized exhibits. These feed entity-master systems, sanctions screening, and tax-treaty eligibility workflows without re-keying from PDFs.
Forensic and short-focused analysts scrutinize related-party transactions, auditor identity and location, revenue recognition, segment reconciliations, contingent liabilities, and GAAP/IFRS reconciliations at the point of U.S. market entry. Diffing the original 20FR12G against /A amendments highlights where SEC staff pressed for more detail, flagging the line items most worth investigating.
Stewardship teams at large asset managers, proxy advisory firms, and governance research providers score newly registered foreign issuers using directors-and-senior-management, board committee, related-party, and home-country governance exemption sections. Output: governance ratings, voting guidance for first U.S.-era annual meetings, and engagement priority lists ranked by gap between home-country practice and U.S. exchange norms.
Researchers use the 1996-onward span and amendment history to study cross-listing choice, the bonding hypothesis, Section 12(b) versus 12(g) selection, and home-country versus U.S. disclosure differentials. The corpus supports event studies around registration effectiveness, longitudinal disclosure-quality work, and cross-jurisdiction NLP studies of risk-factor text.
Reporters covering cross-border listings, ADR markets, and EM equities pull risk factors, ownership tables, related-party disclosures, and audited financials for profiles, investigations, and explainers when a foreign issuer first becomes a U.S. registrant. The unified corpus replaces filing-by-filing EDGAR retrieval.
Strategy teams inside multinationals study foreign competitors that previously filed only at home. Business description, properties, segment financials, customer-concentration disclosures, and material-contracts exhibits feed market-entry analyses, pricing studies, and M&A target lists.
Teams building financial LLMs and retrieval systems treat the dataset as a structured cross-border disclosure corpus. HTML, TXT, and PDF documents alongside per-accession metadata.json support chunking, embedding, and entity-resolution pipelines; the /A amendment chain enables revision-aware training tasks such as redline detection, comment-letter response drafting, and point-in-time disclosure summarization without leakage.
Concrete workflows the Form 20FR12G Files Dataset supports, each tied to specific fields, items, or exhibits in the record.
U.S. counsel drafting Section 12(g) registrations mine peer filings for jurisdiction- and sector-matched language. They filter records by entities[].stateOfIncorporation and entities[].sic, then extract Item 3 risk factors, Item 7 related-party disclosures, Item 10 enforcement-of-judgments and exchange-controls sections, and the home-country governance discussion. Diffing the original 20FR12G against its 20FR12G/A chain reveals where SEC staff pressed for revision, producing a comment-driven drafting checklist and first-draft language for new clients.
Researchers and capital-markets desks isolate the universe of foreign private issuers that registered under Section 12(g) without an exchange listing. They do this by enumerating all formType = "20FR12G" accessions, keyed on entities[].cik, with fileNo, stateOfIncorporation, sic, and filedAt carried as attributes. Pairing these with later Form 20-F dataset accessions and with Form 15-12G (or Form 15F) deregistrations yields a clean entry-to-exit reporting-lifecycle panel for cross-listing, bonding-hypothesis, and time-to-effectiveness studies.
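The enumeration over an extracted container tree can be sketched as follows, assuming one metadata.json per accession folder with the documented fields. The sketch keeps the earliest original 20FR12G per CIK and skips amendments. It compares filedAt values as parsed timestamps, because the EDGAR offset can differ between records. The function names are hypothetical.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Iterable, Iterator


def iter_metadata(root: Path) -> Iterator[dict]:
    """Yield parsed metadata.json objects under an extracted YYYY/YYYY-MM/ tree."""
    for path in sorted(root.rglob("metadata.json")):
        with open(path, encoding="utf-8") as fh:
            yield json.load(fh)


def entry_panel(records: Iterable[dict]) -> dict[str, dict]:
    """Earliest original 20FR12G per CIK, with a few entity attributes."""
    panel: dict[str, dict] = {}
    for meta in records:
        if meta.get("formType") != "20FR12G":
            continue  # amendments are excluded from the entry panel
        filed = datetime.fromisoformat(meta["filedAt"])
        for ent in meta.get("entities", []):
            cik = ent.get("cik")
            current = panel.get(cik)
            if current is None or filed < current["_filed"]:
                panel[cik] = {
                    "cik": cik,
                    "accessionNo": meta.get("accessionNo"),
                    "filedAt": meta["filedAt"],
                    "fileNo": ent.get("fileNo"),
                    "sic": ent.get("sic"),
                    "stateOfIncorporation": ent.get("stateOfIncorporation"),
                    "_filed": filed,  # parsed timestamp, used only for comparisons
                }
    return panel
```

A typical call is entry_panel(iter_metadata(Path("extracted"))), where "extracted" is a hypothetical directory holding unzipped containers.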
Compliance and reference-data teams ingest entities[] directly into entity-master systems: cik, companyName, fileNo, irsNo, stateOfIncorporation, fiscalYearEnd, and sic populate the issuer record without re-keying. The primary 20-F's cover page supplies the address of principal executive offices, the company contact person, and the number of outstanding shares of each class. The subsidiaries exhibit (EX-8.*) feeds group-structure mapping for sanctions screening and tax-treaty eligibility workflows.
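The following is a minimal sketch of flattening entities[] into entity-master rows. It makes three assumptions:
- The role-marker list is hypothetical; "(Filer)" is the only marker the field description names.
- "000000000" is treated as a missing EIN, per the irsNo note.
- The sic string is split on its first space into a code and a description.

```python
# Hypothetical list of trailing role markers; "(Filer)" is the one named in the docs.
ROLE_MARKERS = ("(Filer)", "(Subject)")
MISSING_EIN = {"", "000000000"}


def entity_rows(meta: dict) -> list[dict]:
    """Flatten entities[] from one metadata.json into entity-master rows."""
    rows = []
    for ent in meta.get("entities", []):
        name = (ent.get("companyName") or "").strip()
        for marker in ROLE_MARKERS:
            if name.endswith(marker):
                name = name[: -len(marker)].rstrip()
        irs_no = (ent.get("irsNo") or "").strip()
        sic_code, _, sic_description = (ent.get("sic") or "").partition(" ")
        rows.append({
            "cik": ent.get("cik"),
            "companyName": name,
            "fileNo": ent.get("fileNo"),
            "irsNo": None if irs_no in MISSING_EIN else irs_no,
            "stateOfIncorporation": ent.get("stateOfIncorporation"),
            "fiscalYearEnd": ent.get("fiscalYearEnd"),  # MMDD string, e.g. "1031"
            "sicCode": sic_code or None,
            "sicDescription": sic_description or None,
        })
    return rows
```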
Sell-side and buy-side analysts initiating coverage at the moment of U.S. registration extract Item 5 (operating and financial review), Item 8 audited financials, Item 6 director-and-senior-management compensation, and the Item 7 major-shareholder table to build first-pass earnings models, identify controlling-family or state-ownership concentrations, and flag VIE structures. The Item 3 risk factors and Item 10 taxation discussion drive PFIC, withholding, and benchmark-inclusion notes for pre-purchase memos.
Stewardship and proxy-advisory teams score newly registered foreign issuers by parsing the Item 6 board-practices section and the Item 16G "significant ways in which corporate governance practices differ" disclosure (where present) against U.S. exchange standards. Combined with EX-1.* charter and bylaw exhibits, the output is a per-issuer governance gap score, a voting recommendation for the first U.S.-era annual meeting, and an engagement priority list.
Forensic accountants and short-side researchers track every 20FR12G/A against its parent accession. This surfaces the items SEC staff most often pushed issuers to revise: revenue recognition footnotes, segment reconciliations, related-party detail, and contingent liabilities. Counting and dating EX-15.* consent files across the amendment chain identifies auditor changes (multiple consents from different firms covering different fiscal years). It also identifies dating issues that warrant closer review of the underlying audit reports.
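One way to sketch this in Python is to group 20FR12G and 20FR12G/A records into chains and count the EX-15.* entries per accession. The grouping key (the primary filer's cik plus fileNo) is an assumption, not a documented link: it presumes that amendments carry the original's SEC file number. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable

FORM_TYPES = ("20FR12G", "20FR12G/A")


def amendment_chains(records: Iterable[dict]) -> dict[tuple, list[dict]]:
    """Group original and amendment records by (cik, fileNo), ordered by filedAt."""
    chains: dict[tuple, list[dict]] = defaultdict(list)
    for meta in records:
        if meta.get("formType") not in FORM_TYPES:
            continue
        primary = (meta.get("entities") or [{}])[0]  # first filer only (simplification)
        chains[(primary.get("cik"), primary.get("fileNo"))].append(meta)
    for chain in chains.values():
        chain.sort(key=lambda m: datetime.fromisoformat(m["filedAt"]))
    return dict(chains)


def ex15_count(meta: dict) -> int:
    """Number of EX-15.* entries (consents, opinions) in one accession's manifest."""
    return sum(
        1 for e in meta.get("documentFormatFiles", [])
        if (e.get("type") or "").upper().startswith("EX-15")
    )
```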
ML teams build cross-border disclosure corpora by stripping the SGML <DOCUMENT> envelope, splitting payloads by 20-F Item, and joining chunks to metadata.json fields (formType, filedAt, entities[].stateOfIncorporation, sic). The original-versus-/A pairing supports revision-aware training tasks: redline prediction, comment-letter response drafting, and point-in-time risk-factor summarization with clean temporal splits to prevent leakage between amendment versions.
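Here is a heuristic sketch of the item-splitting step. It assumes the payload has already been reduced to plain text, for example by stripping the envelope and the HTML tags. It splits at lines that start with "Item N." Because the table of contents repeats every heading, it keeps the longest chunk per item. The regular expression and function name are assumptions, not a documented parser.

```python
import re

# A line starting with "Item 3.", "ITEM 16G", "Item 4A" etc., followed by whitespace or end of line.
ITEM_HEADING = re.compile(r"^[ \t]*ITEM[ \t]+(\d{1,2}[A-K]?)\.?(?=[ \t]|$)",
                          re.IGNORECASE | re.MULTILINE)


def split_by_item(text: str) -> dict[str, str]:
    """Map item labels ("3", "16G", ...) to the longest chunk found for each."""
    matches = list(ITEM_HEADING.finditer(text))
    best: dict[str, str] = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunk = text[m.start():end].strip()
        label = m.group(1).upper()
        # Table-of-contents lines produce short chunks; the body section wins.
        if len(chunk) > len(best.get(label, "")):
            best[label] = chunk
    return best
```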
Dataset Index JSON API: https://api.sec-api.io/datasets/form-20fr12g-files.json
This endpoint returns metadata about the Form 20FR12G Files dataset, including its name, description, last updated timestamp, earliest sample date (1996-09-01), total record and size counters, covered form types (20FR12G, 20FR12G/A), container format (ZIP), and the file types contained within each archive (TXT, JSON, HTML, PDF). The response also includes the full dataset download URL and a list of all individual container files with per-container metadata such as size, record count, last updated timestamp, and a direct download URL. This endpoint does not require an API key.
The response can be polled to monitor which containers have been refreshed in the most recent update run, allowing you to selectively download only the containers that changed on a given day rather than re-downloading the entire archive.
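Here is a minimal polling sketch that uses only the standard library. It assumes the response shape of the example response for this endpoint: a containers[] array whose entries carry key and updatedAt. The sketch compares the live index against a saved {key: updatedAt} snapshot. A container counts as changed if it is new or if its timestamp differs. The function names are hypothetical.

```python
import json
import urllib.request

INDEX_URL = "https://api.sec-api.io/datasets/form-20fr12g-files.json"


def fetch_index(url: str = INDEX_URL, timeout: float = 30.0) -> dict:
    """GET the dataset index JSON (no API key needed for this endpoint)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def snapshot(index: dict) -> dict[str, str]:
    """Record {container key: updatedAt} so the next poll can be compared."""
    return {c["key"]: c["updatedAt"] for c in index.get("containers", [])}


def changed_containers(index: dict, previous: dict[str, str]) -> list[dict]:
    """Containers that are new, or whose updatedAt differs from the saved snapshot."""
    return [c for c in index.get("containers", [])
            if previous.get(c["key"]) != c["updatedAt"]]
```

After each run, persist snapshot(index), for example as a small JSON file. On the next run, download only the downloadUrl of each entry returned by changed_containers.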
Example response:
{
  "datasetId": "1f13365b-9ae0-69a1-b444-99ddb0887c45",
  "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-20fr12g-files.zip",
  "name": "Form 20FR12G Files Dataset",
  "updatedAt": "2026-04-15T12:04:40.004Z",
  "earliestSampleDate": "1996-09-01",
  "totalRecords": 7193,
  "totalSize": 318938287,
  "formTypes": ["20FR12G", "20FR12G/A"],
  "containerFormat": "ZIP",
  "fileTypes": ["TXT", "JSON", "HTML", "PDF"],
  "containers": [
    {
      "downloadUrl": "https://api.sec-api.io/datasets/form-20fr12g-files/2026/2026-04.zip",
      "key": "2026/2026-04.zip",
      "size": 13818783,
      "records": 154,
      "updatedAt": "2026-04-15T12:04:40.004Z"
    }
  ]
}
Download Entire Dataset: https://api.sec-api.io/datasets/form-20fr12g-files.zip?token=YOUR_API_KEY
Downloads the complete dataset as a single ZIP archive containing all Form 20FR12G and 20FR12G/A filings from September 1996 to the present. This endpoint requires an API key.
Download Single Container: https://api.sec-api.io/datasets/form-20fr12g-files/2026/2026-04.zip?token=YOUR_API_KEY
Downloads one monthly container archive instead of the full dataset. Use the downloadUrl values from the dataset index JSON API to target specific months. This endpoint requires an API key.
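Here is a sketch of a single-container download with the standard library. It appends the token query parameter exactly as shown in the endpoint URLs above. It then streams the response to a file named after the last path segment (e.g. 2026-04.zip). The helper names, timeout, and destination layout are assumptions. The same with_token helper also works for the full-dataset URL.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def with_token(download_url: str, api_key: str) -> str:
    """Append the token query parameter used by the download endpoints."""
    parts = urllib.parse.urlsplit(download_url)
    query = urllib.parse.parse_qsl(parts.query)
    query.append(("token", api_key))
    return urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))


def download_container(download_url: str, api_key: str, dest_dir: Path,
                       timeout: float = 120.0) -> Path:
    """Stream one container ZIP to dest_dir and return the local path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / Path(urllib.parse.urlsplit(download_url).path).name
    with urllib.request.urlopen(with_token(download_url, api_key), timeout=timeout) as resp, \
            open(target, "wb") as out:
        shutil.copyfileobj(resp, out)
    return target
```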
The dataset covers EDGAR submission types 20FR12G (initial Section 12(g) Exchange Act registration filed by a foreign private issuer using the Form 20-F template) and 20FR12G/A (amendments to a previously filed 20FR12G). Both share the Form 20-F disclosure structure but are filed for the purpose of class registration rather than annual reporting.
One record represents one EDGAR accession — a single 20FR12G or 20FR12G/A submission, materialized as an accession folder named with the eighteen-digit dash-stripped accession number. The folder contains a single metadata.json envelope describing the submission and the original document files EDGAR received: the primary Form 20-F registration statement plus its non-image exhibits.
A foreign private issuer (as defined under Rule 405 of the Securities Act and Rule 3b-4 of the Exchange Act) must file 20FR12G when both of the following hold on the last day of its fiscal year, unless an exemption applies: it has total assets exceeding $10 million, and a class of equity securities is held of record by 2,000 or more persons or by 500 or more non-accredited investors. Rule 12g3-2(a), for example, exempts a class held of record by fewer than 300 U.S. residents. Foreign private issuers may also file voluntarily, for example to support OTCQX/OTCQB participation or Rule 144 resale, or to terminate Rule 12g3-2(b) reliance.
A 20FR12G registration statement becomes effective automatically 60 days after filing under Section 12(g)(1), subject to the SEC's authority to accelerate or postpone. For mandatory filers, the underlying filing deadline is 120 days after the triggering fiscal year-end. Because effectiveness is automatic at day 60, 20FR12G/A amendments responding to staff comments commonly cluster in the run-up to that date.
Both use the same Form 20-F disclosure template, but they sit at opposite ends of the reporting lifecycle. 20FR12G is the one-time entry document that establishes the Exchange Act reporting obligation; Form 20-F is the recurring annual report filed thereafter. Use 20FR12G to study new entrants and time-to-effectiveness. Use the Form 20-F annual report dataset to study the ongoing FPI reporting population.
EDGAR-era electronic 20FR12G filings begin on 1996-09-01 and continue to the present. Records are delivered as monthly ZIP containers; inside each container, document payloads appear as HTML/HTM, plain-text TXT, JSON (the per-record metadata.json manifest), and PDF, with the format mix shifting from predominantly TXT in the late 1990s to predominantly HTM from the 2010s onward.
Neither images nor the rolled-up submission file are included. Image and graphic attachments (.jpg, .gif, .png) referenced from the HTML and listed under type: "GRAPHIC" in documentFormatFiles[] remain catalogued in metadata but are not bundled into the record on disk. The rolled-up "Complete submission text file" that EDGAR generates by concatenating all document parts is referenced via linkToTxt in metadata.json but is not stored locally; the individual document parts are stored instead.