The Form SE Files Dataset is a closed corpus of EDGAR submissions on SEC Form SE — the cover form prescribed for transmitting paper-format exhibits that accompany filings otherwise required to be made electronically under Regulation S-T. Each record represents a single Form SE accession, identified by its 18-digit EDGAR accession number, and pairs a structured metadata.json filing header with the original scanned PDF carrying the Form SE cover sheet and the paper exhibit it transmits. The form is filed by any electronic filer on EDGAR — Securities Act registrants, Exchange Act reporting companies, Trust Indenture Act filers, and Investment Company Act registrants — when Rule 201 (temporary hardship), Rule 202 (continuing hardship), or Rule 311 (paper-format exhibits) of Regulation S-T permits or compels paper submission. The dataset begins in February 2001 and is updated as new Form SE filings are accepted by EDGAR.
The Form SE Files Dataset captures every Form SE submission accepted by EDGAR. Form SE is the cover form prescribed by the SEC for the submission of paper-format exhibits relating to filings that are otherwise made electronically through EDGAR. It is adopted under four statutes simultaneously — the Securities Act of 1933, the Securities Exchange Act of 1934, the Trust Indenture Act of 1939, and the Investment Company Act of 1940 — and operates in conjunction with Rules 201, 202, and 311 of Regulation S-T.
The form's instructions require the filer to send four complete paper copies of both the cover form and the accompanying exhibit to the Commission. EDGAR captures the cover form, and typically a scanned image of the underlying paper exhibit, as the electronic record of that paper transmission. The form thus functions as a bridge: it lives inside EDGAR as a structured filing with a regular accession number, but it points outward to a paper original held physically by the Commission.
A distinctive consequence of the paper-hardship origin is the EDGAR pseudo-CIK 9999999997. EDGAR uses this reserved CIK in the filer position of accession numbers issued for paper-only and hardship submissions. Form SE accession numbers therefore characteristically take the shape 9999999997-YY-NNNNNN, even though the substantive registrant — the company whose paper exhibit is being transmitted — has its own real CIK that appears inside entities[].cik and inside the /edgar/data/<cik>/ segment of the document URLs. The dataset is distributed as monthly ZIP containers and ships only PDF and JSON file types; the dataset window begins February 2001 and extends to the present.
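The accession-number shape and the pseudo-CIK versus registrant-CIK split described above can be checked mechanically. The following is a minimal sketch, not part of the dataset tooling: the helper names are hypothetical, and the URL in the usage note is assembled from example values quoted elsewhere in this description rather than copied from a specific record.

```python
import re
from typing import Optional, Tuple

PSEUDO_CIK = "9999999997"  # reserved EDGAR CIK for paper/hardship submissions

# Hyphenated accession number: 10-digit prefix, 2-digit year, 6-digit sequence.
ACCESSION_RE = re.compile(r"^(\d{10})-(\d{2})-(\d{6})$")
# The registrant CIK is the path segment that follows /edgar/data/.
CIK_IN_URL_RE = re.compile(r"/edgar/data/(\d+)/")


def split_accession(accession_no: str) -> Tuple[str, str, str]:
    """Split a hyphenated accession number into (prefix, yy, sequence)."""
    match = ACCESSION_RE.match(accession_no)
    if match is None:
        raise ValueError(f"not a hyphenated accession number: {accession_no!r}")
    return match.group(1), match.group(2), match.group(3)


def uses_pseudo_cik(accession_no: str) -> bool:
    """True when the accession prefix is the paper/hardship pseudo-CIK."""
    return split_accession(accession_no)[0] == PSEUDO_CIK


def registrant_cik_from_url(url: str) -> Optional[str]:
    """Extract the real registrant CIK from an EDGAR document URL, if present."""
    match = CIK_IN_URL_RE.search(url)
    return match.group(1) if match else None
```

For example, uses_pseudo_cik("9999999997-21-004917") is true, while registrant_cik_from_url applied to a URL of the form https://www.sec.gov/Archives/edgar/data/1679198/999999999721004917/japanse.pdf yields "1679198".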
A single record in the Form SE Files dataset corresponds to one Form SE submission to EDGAR, identified by its 18-digit EDGAR accession number. Physically, the record is an accession folder living one level beneath a YYYY-MM/ directory inside a year-month ZIP container. The folder name is the accession number with hyphens stripped (for example 999999999721004917 for accession 9999999997-21-004917). Inside that folder sit two kinds of artifacts that together constitute the record:
- metadata.json, describing the filing in machine-readable form, and
- the original exhibit document(s) from the EDGAR submission, in practice a single scanned PDF.
A record has two structural layers:
- A metadata layer (metadata.json) capturing the filing header, party identifiers, document inventory, and EDGAR-side links exactly once per accession.
- A document layer holding the exhibit file(s) listed in that inventory.
The aggregate SGML submission envelope (the .txt file EDGAR generates for every accession) is referenced by URL inside the metadata but is not bundled inside the ZIP. Image attachments from the original submission are likewise excluded by dataset policy. The file types shipped in the dataset are PDF and JSON.
The folder is the atomic record unit. Its name is the 18-digit hyphen-stripped accession number, globally unique within EDGAR. The path shape is YYYY-MM/<accessionNoNoDashes>/. Contents are flat — there are no nested subdirectories.
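The path shape above lends itself to a simple walk over a downloaded container. The sketch below assumes members are laid out exactly as YYYY-MM/<accessionNoNoDashes>/<file> and uses hypothetical helper names; it groups file names by hyphenated accession number using only the standard-library zipfile module.

```python
import zipfile
from collections import defaultdict
from typing import Dict, List


def hyphenate_accession(folder_name: str) -> str:
    """Turn an 18-digit folder name back into the hyphenated accession number."""
    if len(folder_name) != 18 or not folder_name.isdigit():
        raise ValueError(f"not an 18-digit accession folder: {folder_name!r}")
    return f"{folder_name[:10]}-{folder_name[10:12]}-{folder_name[12:]}"


def files_by_accession(container: zipfile.ZipFile) -> Dict[str, List[str]]:
    """Group member file names by accession number."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name in container.namelist():
        parts = name.split("/")
        # Skip directory entries and anything not exactly three levels deep.
        if len(parts) != 3 or not parts[2]:
            continue
        grouped[hyphenate_accession(parts[1])].append(parts[2])
    return dict(grouped)
```

A typical call would be `with zipfile.ZipFile("2026-03.zip") as zf: records = files_by_accession(zf)`, where the local file name is whatever the container was saved as.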
metadata.json
Every record contains exactly one metadata.json, always under that exact name, on the order of 1–2 KB in size. It conveys, in a machine-readable structure, the same filing-header information that EDGAR exposes on its filing-index page, plus a manifest of the documents that belong to the submission. The fields populated for Form SE records are:
- formType: the form code, always "SE" for this dataset.
- accessionNo: the canonical hyphenated accession number, e.g. "9999999997-21-004917". The 9999999997 prefix reflects the paper-hardship pseudo-CIK convention.
- filedAt: ISO 8601 timestamp with timezone offset (Eastern, e.g. "2021-09-27T15:47:52-04:00") marking EDGAR acceptance of the cover form.
- effectivenessDate: ISO date (YYYY-MM-DD) marking the filing's effective date, frequently the same calendar day as filedAt.
- description: free-text label, typically "Form SE - Exhibits".
- linkToFilingDetails: absolute URL into https://www.sec.gov/Archives/edgar/data/<cik>/<accessionNoNoDashes>/ resolving to the primary exhibit document (the PDF).
- linkToTxt: URL to the complete SGML-wrapped submission .txt on EDGAR; the envelope itself is not included in the ZIP.
- linkToHtml: URL to the EDGAR filing-index HTML page for the accession (the -index.htm page).
- linkToXbrl: empty string for Form SE; this form type does not carry XBRL.
- documentFormatFiles[]: ordered array of document descriptors. Each entry contains sequence (a 1-based EDGAR document sequence number as a string, or a single space " " for the synthetic complete-submission entry), size (byte size as a string), documentUrl (absolute URL on sec.gov), type (the EDGAR document type code: "SE" for the cover/exhibit document, a single space " " for the aggregate text file), and, for the aggregate-text entry, description: "Complete submission text file". Entries with substantive type codes resolve to files physically present in the folder; the aggregate-text entry is an EDGAR-only reference.
- dataFiles[]: empty for Form SE; the form has no associated financial-report or XBRL data files.
- seriesAndClassesContractsInformation[]: empty for Form SE; this array is reserved for investment-company series/class identifiers.
- entities[]: array of party objects describing registrants and other parties of record. Each object carries:
  - cik: the real registrant CIK (distinct from the pseudo-CIK in the accession), e.g. "1679198".
  - companyName: name with a parenthetical role suffix such as " (Filer)" or " (Subject)".
  - type: the party-level form code, typically "SE".
  - fileNo: SEC file number tying the paper exhibit to its related electronic registration (e.g. "333-213968").
  - irsNo: IRS Employer Identification Number; often "000000000" when not assigned (e.g. for foreign sovereigns).
  - fiscalYearEnd: four-digit MMDD string (e.g. "0331" for a March 31 fiscal year-end).
  - stateOfIncorporation: EDGAR's two-character jurisdiction code, including foreign codes such as "M0" for Japan.
  - sic: four-digit SIC industry code with descriptive label, e.g. "8888 Foreign Governments".
  - act: numeric code identifying the statute under which the filing is made (for example "98" for the foreign-government carve-out series); these codes correspond to the four Acts under which Form SE is adopted.
  - filmNo: the EDGAR film number assigned at acceptance, e.g. "211281472".
- id: internal opaque 32-hex-character identifier uniquely keying the record within the dataset.
The substantive content of a Form SE filing lives in the exhibit document, which is in practice a single PDF placed alongside metadata.json. The PDF holds the Form SE cover sheet and the paper exhibit it transmits.
Filenames are preserved verbatim from the original EDGAR submission with no normalization. They are typically short and lowercased, frequently built from an issuer abbreviation plus the se mnemonic (for example japanse.pdf). The structural invariant is that any documentFormatFiles[] entry whose type is not the single-space placeholder resolves to a sibling file in the same folder.
A record may, in principle, carry multiple exhibit files if the original submission listed more than one document, in which case each appears as a sibling alongside metadata.json. Single-PDF records dominate.
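The structural invariant above (every documentFormatFiles[] entry whose type is not the single-space placeholder has a sibling file) can be turned into a small consistency check. This is a sketch under the assumption that the sibling file name equals the basename of documentUrl; the function names are hypothetical.

```python
import posixpath
from typing import Any, Dict, Iterable, List
from urllib.parse import urlparse

PLACEHOLDER = " "  # single space marks the synthetic complete-submission entry


def bundled_filenames(metadata: Dict[str, Any]) -> List[str]:
    """Basenames of the documents expected to sit beside metadata.json."""
    names = []
    for entry in metadata.get("documentFormatFiles", []):
        # The aggregate .txt envelope is an EDGAR-only reference, so skip it.
        if entry.get("type", PLACEHOLDER) == PLACEHOLDER:
            continue
        path = urlparse(entry["documentUrl"]).path
        names.append(posixpath.basename(path))
    return names


def missing_from_folder(metadata: Dict[str, Any], present: Iterable[str]) -> List[str]:
    """Expected documents that are absent from the accession folder listing."""
    present_set = set(present)
    return [name for name in bundled_filenames(metadata) if name not in present_set]
```

Passing the folder's file listing as `present` returns an empty list for a record that satisfies the invariant.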
- metadata.json capturing the filing header, document manifest, and entities.
- Image attachments (.gif, .jpg): excluded by dataset policy; the dataset ships only PDF and JSON.
- Aggregate .txt submission envelope: referenced by URL via linkToTxt and as a documentFormatFiles[] entry, but the file itself is not bundled.
- XBRL and data files: none; linkToXbrl is empty and dataFiles[] is empty.
Form SE has been remarkably stable across the dataset's February 2001 to present window. Its statutory authority (the four Acts named on the form) and its operating rules within Regulation S-T (Rules 201, 202, 311) have not undergone substantive structural change. The cover-sheet content set (electronic filer identification, reference to the related electronic filing, identification of the paper exhibit, statement of the rule relied on, and signature) has been required throughout.
What has shifted materially is the scope of paper-eligible exhibits. As Regulation S-T's electronic-filing mandate has tightened over successive amendments, the universe of exhibits eligible for paper submission under Rule 311 has narrowed, and hardship-exemption use under Rules 201 and 202 has been progressively constrained, with the Commission encouraging electronic re-submission once technical difficulties are resolved. The empirical consequence is the extreme sparsity of Form SE filings across the dataset window: the form is increasingly used only for the residual categories where paper remains appropriate, such as foreign-sovereign-style filings and certain certified or oversized exhibits.
The EDGAR pseudo-CIK 9999999997 convention for paper/hardship accession numbering has been in continuous use across the dataset window and is not a recent innovation; it is the mechanism by which EDGAR issues an accession number to a submission whose filer-side identification flows through the paper-hardship channel rather than through the registrant's own CIK.
Form SE exists specifically to register paper exhibits, and the form has never carried XBRL or structured financial data; the dataset reflects that uniformly. The relevant format evolution concerns how the paper exhibit is represented electronically inside the accession.
The metadata schema itself is consistent across the dataset: every record carries the same metadata.json field set regardless of filing year, so downstream consumers can parse all records uniformly.
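Because the field set is uniform, one parser can serve every record. The sketch below is illustrative only: the Party and SeRecord classes and the function names are hypothetical, and it reads just a few of the fields described earlier (accessionNo, filedAt, and the cik, companyName, and fileNo of each entity), splitting the parenthetical role suffix off companyName.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple

# Trailing "(Role)" suffix on companyName, e.g. "... (Filer)".
ROLE_SUFFIX_RE = re.compile(r"^(?P<name>.*?)\s*\((?P<role>[^()]+)\)\s*$")


@dataclass
class Party:
    cik: str
    name: str
    role: Optional[str]
    file_no: Optional[str]


@dataclass
class SeRecord:
    accession_no: str
    filed_at: datetime  # timezone-aware, offset taken from the ISO string
    parties: List[Party]


def split_role(company_name: str) -> Tuple[str, Optional[str]]:
    """Separate the company name from its trailing role suffix, if any."""
    match = ROLE_SUFFIX_RE.match(company_name)
    if match is None:
        return company_name.strip(), None
    return match.group("name"), match.group("role")


def parse_record(metadata: Dict[str, Any]) -> SeRecord:
    """Build a typed record from one parsed metadata.json document."""
    parties = []
    for entity in metadata.get("entities", []):
        name, role = split_role(entity.get("companyName", ""))
        parties.append(Party(entity.get("cik", ""), name, role, entity.get("fileNo")))
    return SeRecord(
        accession_no=metadata["accessionNo"],
        filed_at=datetime.fromisoformat(metadata["filedAt"]),
        parties=parties,
    )
```

Typical use is `parse_record(json.load(fh))` on an opened metadata.json; keeping the offset-aware datetime preserves the Eastern-time acceptance moment without further conversion.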
- The accession-number prefix is 9999999997, which is not a real company. The substantive registrant CIK lives inside entities[].cik, and the same CIK appears in the /edgar/data/<cik>/ path component of linkToFilingDetails, linkToTxt, and linkToHtml.
- documentFormatFiles[] mixes two kinds of entries: actual sibling files in the folder (with type codes such as "SE" and a numeric sequence like "1") and the synthetic complete-submission text file entry (with both type and sequence set to a single space " " and a description of "Complete submission text file"). Only the former resolve to files inside the ZIP; the latter is a reference to an EDGAR-hosted artifact that the dataset deliberately does not bundle.
- Party roles are carried as a parenthetical suffix on companyName (e.g. "... (Filer)", "... (Subject)"), not as a separate field. Multi-party Form SE filings list each party as a separate object in entities[] with the appropriate suffix.
- act codes. The numeric act field on each entity identifies the statute under which the filing is being made; these codes correspond to the four Acts under which Form SE is adopted (Securities Act of 1933, Exchange Act of 1934, Trust Indenture Act of 1939, Investment Company Act of 1940), with additional codes such as "98" used for specialized programs (foreign-government issuers, asset-backed and similar carve-outs).
- To locate the exhibit file, take the sequence: "1" entry of documentFormatFiles[] and resolve its documentUrl to the corresponding sibling file by basename.
- The related electronic filing is not named by dedicated metadata.json fields. The entities[].fileNo value (e.g. "333-213968") often coincides with the file number of the related registration and is the most reliable structured hook for cross-linking.
Form SE is filed by an electronic filer on EDGAR that needs to transmit a paper-format exhibit in connection with an otherwise electronic submission. It is a transmittal cover, not a substantive disclosure form.
The legally responsible filer is the same registrant, reporting person, or third-party filer responsible for the related electronic filing; counsel, financial printers, or filing agents typically prepare and submit the paper package on the filer's behalf, but the obligation runs to the electronic filer of record.
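The Form SE population spans every filer class operating under the four statutes the form is adopted under: Securities Act registrants, Exchange Act reporting companies and third-party filers, Trust Indenture Act filers, and Investment Company Act registrants.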
Form SE is filer-class agnostic within this universe. A "filer" subject to a paper exhibit obligation may be the issuer itself or a third party (bidder, beneficial owner, acquirer) whose underlying filing carries an exhibit not transmissible electronically.
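Form SE is transactional and event-driven, not periodic. It arises only when one of three Regulation S-T (17 CFR Part 232) provisions permits or compels paper submission of a document or exhibit: Rule 201 (temporary hardship exemption when unanticipated technical difficulties block timely electronic submission), Rule 202 (continuing hardship exemption granted by the Commission), or Rule 311 (paper exhibits permitted because their physical form resists electronic conversion or because they pre-date the filer's EDGAR mandate and are incorporated by reference).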
In practice, Rule 311 incorporations-by-reference of pre-EDGAR paper exhibits are the most common operative trigger; Rule 201 emergency filings are rare; Rule 202 filings exist only on Commission grant.
Form SE has no independent calendar. Its timing is anchored to the related electronic filing.
The procedural requirement under Regulation S-T is the submission of the prescribed paper copies of both the Form SE cover and the accompanying paper exhibit; the Commission assigns the submission an EDGAR accession number and exposes the metadata electronically.
Form SE occupies a narrow corner of EDGAR: it is a transmittal cover for paper-format exhibits attached to filings that are otherwise required to be electronic. Because it is neither a standalone disclosure nor an ordinary exhibit, it is easily confused with several adjacent record types — Form TH, Form CB, regular electronic exhibits, other paper-only filings under Rule 311, and confidential treatment requests.
Form TH is the closest neighbor by regulatory framework but addresses a different scenario.
Form CB is sometimes filed in paper, but the rationale is unrelated to hardship.
The closest content comparison is to ordinary electronic exhibits (Exhibits 2, 99, etc.) attached to 10-K, 10-Q, 8-K, S-1, N-1A, and similar filings.
A handful of filings have historically been permitted or required on paper under Rule 311 — for example, Form 144 prior to its mandatory electronic filing in 2023, Form 6, certain earlier-era Regulation A items, and miscellaneous exempt-issuer filings.
Confidential Treatment Requests are often confused with hardship-paper exhibits because both historically involved paper submissions and material not freely available in electronic EDGAR.
Form SE is distinctive because it is a transmittal mechanism, not a disclosure form. It exists at the seam between the electronic-filing regime and the residual category of documents that must or may be lodged on paper. Form TH handles temporary, whole-filing technical hardship; Form CB handles a specific cross-border exemption; regular electronic exhibits cover the inline default; other Rule 311 paper forms are themselves substantive submissions; CTRs handle confidentiality rather than format. None of them answers the question "what paper exhibits accompany an otherwise electronic filing, and what do they contain?" For that question — and for reconstructing an issuer's complete exhibit record — Form SE Files is the only source that captures both the cover-sheet linkage to the related electronic accession and the scanned paper exhibit itself.
Each Form SE record points to a paper exhibit invisible inside EDGAR's inline document tree, so the user base is narrow and specific: people who must reconstruct a complete filing, audit their own paper-exhibit history, or study how the paper-to-electronic transition played out.
Disclosure counsel and supporting paralegals use the dataset to reconstruct complete exhibit sets for registration statements, prospectuses, indentures, or Investment Company Act filings whose exhibit list points to a paper-only attachment. The entities[].fileNo, the related-form references on the cover sheet, and the description field in metadata.json map the SE record back to the line in the parent filing's exhibit index; the PDF documents carry the operative legal text — typically an indenture, trust agreement, long-form contract, or voluminous schedule. Workflow: closing binders, legal opinions, precedent review, and litigation discovery where inline EDGAR documents leave a placeholder.
In-house disclosure staff and outside compliance counsel use the dataset to audit a registrant's own historical submissions whenever Rule 201, 202, or 311 was invoked. The entities[].cik, filedAt timestamp, and the cover-sheet text inside the PDF confirm each paper exhibit was correctly cross-referenced and properly cited; the cover sheet verifies the logged exhibit number. Workflow: disclosure-controls testing, exhibit-inventory reconciliation, and responding to regulator inquiries about historical filings.
Engineers building filing-reconstruction pipelines and EDGAR-mirror systems use the dataset to patch a known gap: SE exhibits are not inline and are silently dropped by scrapers that walk only the primary document tree. The entities[].fileNo and the cover-sheet's stated parent form type provide the join keys that attach each SE record to its parent filing in a unified document graph; the PDFs feed OCR and text-extraction stages that normalize the content for search. Workflow: ensuring "complete filing" retrieval surfaces actually contain every exhibit, not a truncated subset.
Diligence teams reviewing a target's historical indentures, shelf takedowns, or fund-complex documentation use the dataset when the underlying contract — a long-form indenture, master trust agreement, or collateral schedule — was filed on paper under hardship. The entities[].cik, entities[].fileNo, and the cover-sheet description locate the right SE record; the PDF is the diligence material itself. Workflow: pulling the operative paper exhibit into the data room, comparing against later amendments, and confirming the terms of instruments that may still be live.
Regulator staff and disclosure-practice researchers use the dataset as a complete population — not a sample — for studying how Regulation S-T's paper-permitted provisions have been exercised. The filedAt date, the rule cited on the cover sheet (201, 202, or 311), the parent form type, and entities[] identifiers support frequency analysis by filer category, exhibit type, and year; the PDFs supply the substantive context for why paper was used. Workflow: descriptive studies of a vanishing filing practice and evidence for rulemaking on whether paper-permitted provisions remain necessary.
Archivists and librarians working with pre-EDGAR documents incorporated by reference into later electronic filings use the dataset as a bridge between the paper and electronic eras. The filedAt date, entities[] identifiers, and linkToFilingDetails locate the digital surrogate; the PDF is the archival object. Workflow: cataloguing and preserving electronic surrogates of paper exhibits so researchers do not have to chase physical copies.
Across all six groups, value comes from the same pairing — metadata.json fields that link the SE record to its parent electronic filing, plus the PDF carrying the substantive paper exhibit. Lawyers and diligence teams reconstruct exhibit sets; registrant compliance staff audit their paper-exhibit history; data engineers prevent silent gaps in filing pipelines; regulators and researchers treat the corpus as a closed population; archivists preserve the digital surrogates. In each case, programmatic retrieval is the only practical way to surface these records inside EDGAR.
The use cases below tie directly to the cover-sheet PDF and the metadata.json linkage fields that connect each SE accession to its related electronic filing.
Disclosure counsel pulls the SE record whose entities[].fileNo matches the file number on a parent S-1, S-3, or 424 prospectus and reads the cover sheet inside the PDF to identify the exhibit number and form type referenced. The scanned exhibit (typically a long-form indenture, trust agreement, or collateral schedule) is then dropped into the closing binder or litigation production alongside the inline electronic exhibits, eliminating the placeholder that EDGAR's inline document tree leaves in the parent filing.
Data-engineering teams that walk /edgar/data/<cik>/ document trees miss SE exhibits because the parent accession's exhibit index merely notes "filed in paper pursuant to Form SE." Using entities[].cik and entities[].fileNo from metadata.json as join keys, the pipeline attaches each SE PDF to its parent accession in a unified document graph, then feeds the PDFs through OCR so full-text search returns hits inside the paper exhibit rather than a truncated subset.
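The join step in this use case can be sketched as an in-memory index keyed on (cik, fileNo). This is a minimal illustration with hypothetical names; it assumes the parent filings in the surrounding pipeline expose the same CIK and file-number strings, which is not something the dataset itself guarantees.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

JoinKey = Tuple[str, str]  # (cik, fileNo)


def index_by_join_key(records: Iterable[Dict[str, Any]]) -> Dict[JoinKey, List[str]]:
    """Map each (cik, fileNo) pair to the Form SE accession numbers carrying it."""
    index: Dict[JoinKey, List[str]] = defaultdict(list)
    for metadata in records:
        accession = metadata["accessionNo"]
        for entity in metadata.get("entities", []):
            cik, file_no = entity.get("cik"), entity.get("fileNo")
            if not cik or not file_no:
                continue  # an entity without both values cannot serve as a join key
            bucket = index[(cik, file_no)]
            if accession not in bucket:
                bucket.append(accession)
    return dict(index)
```

Looking up `index.get((parent_cik, parent_file_no), [])` then returns the SE accessions to attach to a given parent filing, or an empty list when none exist.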
Researchers studying the residual paper-eligible filing population query entities[].sic (e.g. 8888 Foreign Governments), entities[].stateOfIncorporation (foreign codes such as M0 for Japan), and entities[].act (e.g. 98 for the foreign-government series) across the corpus to enumerate exactly which issuer categories still rely on Form SE. The cover sheet's stated rule (201, 202, or 311) is read from the PDF to classify each filing as temporary hardship, continuing hardship, or a Rule 311 format-impractical exhibit.
In-house disclosure staff filter the dataset by entities[].cik to surface every Form SE the registrant has ever filed, then cross-check each PDF cover sheet's stated file number, related form type, and exhibit reference against the registrant's internal exhibit log. The filedAt timestamp and filmNo from metadata.json confirm EDGAR acceptance details, supporting disclosure-controls testing and responses to staff comment letters about historical exhibits.
When a target's debt stack or fund complex includes an indenture, master trust agreement, or supplemental collateral schedule that was lodged on paper under Rule 311, diligence teams locate the SE accession via the target's CIK and the related fileNo, then extract the PDF as the operative contract text. The exhibit is compared against later electronic amendments and supplements pulled from the parent registration to confirm currently effective terms.
Disclosure-practice researchers treat the corpus as a complete population and chart filedAt year against the rule cited on each cover sheet, the entities[].sic industry code, and the parent form type referenced inside the PDF. The resulting frequency tables support descriptive studies of how Rule 201, 202, and 311 use has narrowed across successive Regulation S-T amendments and provide evidence for rulemaking on whether the paper-permitted provisions remain necessary.
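The metadata-only part of such a frequency table can be sketched as below. The function name is hypothetical, and the rule cited on the cover sheet and the parent form type are deliberately left out, because they live inside the PDF and would need a separate text-extraction step.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple


def filings_per_year_and_sic(records: Iterable[Dict[str, Any]]) -> Counter:
    """Count filings by (filing year, four-digit SIC code)."""
    counts: Counter = Counter()
    for metadata in records:
        year = int(metadata["filedAt"][:4])  # filedAt starts with YYYY
        # sic looks like "8888 Foreign Governments"; keep only the numeric code.
        sics = {entity.get("sic", "").split(" ", 1)[0] for entity in metadata.get("entities", [])}
        # Count each filing once per distinct SIC code; use "" when none is given.
        for sic in sics or {""}:
            counts[(year, sic)] += 1
    return counts
```

Sorting `counts.items()` gives a year-by-industry table that can be charted directly.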
Dataset Index JSON API: https://api.sec-api.io/datasets/form-se-files.json
This endpoint returns dataset-level metadata along with the list of available container files. The metadata includes the dataset name, description, last updated timestamp, earliest sample date, total record count and total size, the form types covered, the container format, and the content file types. For each container, the response includes its key, size, record count, last updated timestamp, and a direct download URL. The endpoint can be polled regularly to detect which containers were modified in the most recent refresh, so only changed monthly archives need to be re-downloaded. This endpoint does not require an API key.
Example response:
{
  "datasetId": "1f13365b-9ae0-696a-8804-c2e4dd43cd97",
  "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-se-files.zip",
  "name": "Form SE Files Dataset",
  "updatedAt": "2026-04-15T07:58:12.211Z",
  "earliestSampleDate": "2001-02-01",
  "totalRecords": 53,
  "totalSize": 1871716917,
  "formTypes": ["SE"],
  "containerFormat": "ZIP",
  "fileTypes": ["PDF", "JSON"],
  "containers": [
    {
      "downloadUrl": "https://api.sec-api.io/datasets/form-se-files/2026/2026-03.zip",
      "key": "2026/2026-03.zip",
      "size": 13818783,
      "records": 2,
      "updatedAt": "2026-04-15T07:58:12.211Z"
    }
  ]
}
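The polling pattern mentioned for this endpoint can be sketched as follows. The function names and the `seen` bookkeeping dictionary (container key mapped to the last updatedAt value observed) are assumptions for illustration; only the endpoint URL and the key and updatedAt container fields come from the description above.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen

INDEX_URL = "https://api.sec-api.io/datasets/form-se-files.json"


def fetch_index(url: str = INDEX_URL) -> Dict[str, Any]:
    """Download and parse the dataset index (no API key required)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def changed_containers(index: Dict[str, Any], seen: Dict[str, str]) -> List[Dict[str, Any]]:
    """Containers that are new or whose updatedAt differs from the stored value."""
    return [
        container
        for container in index.get("containers", [])
        if seen.get(container["key"]) != container["updatedAt"]
    ]
```

After re-downloading each changed container, storing `seen[container["key"]] = container["updatedAt"]` keeps the next poll incremental.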
Download Entire Dataset: https://api.sec-api.io/datasets/form-se-files.zip?token=YOUR_API_KEY
Downloads the complete Form SE Files dataset as a single ZIP archive containing every monthly container from February 2001 to the latest refresh. This endpoint requires an API key.
Download Single Container: https://api.sec-api.io/datasets/form-se-files/2026/2026-03.zip?token=YOUR_API_KEY
Downloads one monthly container ZIP, useful when only a specific period is needed or when synchronizing recently updated months identified through the dataset index API. This endpoint requires an API key.
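A single-container download can be scripted with the standard library alone. The sketch below builds the URL in the shape shown above and streams the response to disk; the function names are hypothetical, and error handling for failed or unauthorized requests is omitted.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.sec-api.io/datasets/form-se-files"


def container_url(year: int, month: int, token: str) -> str:
    """Build the download URL for one monthly container."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    key = f"{year:04d}/{year:04d}-{month:02d}.zip"
    return f"{BASE_URL}/{key}?{urlencode({'token': token})}"


def download_container(year: int, month: int, token: str, dest_path: str) -> None:
    """Stream one monthly ZIP to dest_path without loading it fully into memory."""
    url = container_url(year, month, token)
    with urlopen(url, timeout=60) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

When the container list has already been fetched from the dataset index, appending the token to each container's downloadUrl avoids rebuilding the key by hand.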
The dataset covers SEC Form SE, the cover form prescribed by the Commission for the submission of paper-format exhibits relating to filings that are otherwise made electronically through EDGAR. It is adopted under the Securities Act of 1933, the Securities Exchange Act of 1934, the Trust Indenture Act of 1939, and the Investment Company Act of 1940, and operates in conjunction with Rules 201, 202, and 311 of Regulation S-T.
One record corresponds to a single Form SE submission to EDGAR, identified by its 18-digit accession number and stored as an accession folder beneath a YYYY-MM/ directory inside a year-month ZIP container. Each folder contains a metadata.json filing header and one or more original EDGAR documents — almost always a single scanned PDF carrying both the Form SE cover sheet and the paper exhibit it transmits.
Form SE is filed by any electronic filer on EDGAR — including Securities Act registrants, Exchange Act reporting companies and third-party filers, Trust Indenture Act filers, and Investment Company Act registrants — that needs to transmit a paper-format exhibit in connection with an otherwise electronic submission. The legal obligation runs to the electronic filer of record for the related filing, even when counsel, financial printers, or filing agents prepare and submit the paper package.
Form SE is event-driven, not periodic. It is triggered by one of three Regulation S-T provisions: Rule 201 (temporary hardship exemption when unanticipated technical difficulties block timely electronic submission), Rule 202 (a continuing hardship exemption granted by the Commission), or Rule 311 (paper exhibits permitted because their physical form resists electronic conversion or because they pre-date the filer's EDGAR mandate and are incorporated by reference).
The dataset begins on February 1, 2001 and extends to the present, with new monthly containers added as additional Form SE filings are accepted by EDGAR. Because Form SE is rare, many monthly containers hold zero or one record; multi-month gaps between filings are normal and do not indicate missing data.
The dataset is distributed as monthly ZIP containers organized by YYYY-MM/ directory. Each accession folder ships only PDF and JSON files: the metadata.json filing header and the original-format paper-exhibit PDF preserved with its EDGAR filename. Image attachments and the aggregate SGML submission envelope are excluded by dataset policy, though the envelope remains reachable via the linkToTxt URL inside metadata.json.
Form TH is the electronic notification a filer submits to invoke Rule 201's temporary hardship exemption when an unanticipated technical failure prevents timely electronic submission of an entire filing; the underlying document goes in on paper and must be re-submitted electronically within six business days. Form SE, by contrast, is a transmittal cover for one or more discrete paper exhibits while the parent filing remains electronic, with no downstream electronic re-filing obligation for the paper exhibit itself. A single Rule 201 episode can produce both a Form TH notification and a Form SE paper transmittal.