The Form AW Files Dataset is a per-filing collection of every Form AW submission accepted by EDGAR — the Rule 477 letter that a Securities Act of 1933 registrant uses to request Commission consent to withdraw a single amendment to one of its registration statements. Each record is one EDGAR accession, identified by its 18-digit accession number and delivered as a folder containing a structured metadata.json manifest plus the original AW letter document(s) in their native form. The filer is always the registrant of the underlying registration statement (or its successor-in-interest); the substantive payload is a short procedural letter naming the registration, the specific amendment being retracted, and the Rule 477 representation that no securities were sold under it. Coverage runs from 1994 — the EDGAR phase-in for Securities Act submissions — to the present, with monthly container refreshes. The dataset is distributed as monthly ZIP archives partitioned under year directories.
The dataset captures Form AW filings made under Rule 477 of the Securities Act of 1933. Form AW is the EDGAR submission type used by registrants to ask the Commission for consent to withdraw an amendment to a Securities Act registration statement. It is not itself a registration statement: it is a short procedural letter that points back at another, already-filed registration statement (typically an S-1, S-3, F-1, N-14, POS AM, or any of their /A variants) and asks that one specific amendment to that statement be removed from the active filing record. Rule 477 conditions consent on the absence of any sale of securities under the affected amendment, so the letter must affirmatively make that representation.
The substantive payload of a Form AW filing is small: the registrant's identity, the file number of the registration statement involved, the precise amendment being withdrawn (form type, date, accession), and a brief statement of the reason. The dataset preserves the full original EDGAR document for each accession alongside a normalized JSON manifest that exposes the header fields needed to join AW records to their parent registrations. Coverage starts at the 1994 EDGAR phase-in for Securities Act submissions; equivalent paper withdrawal letters predating EDGAR are not represented. The file-types found in the dataset are TXT, JSON, HTML, and PDF, although in practice a Form AW record almost always consists of metadata.json plus a single HTML letter; PDF and TXT attachments appear only when a filer chose to submit the letter or supporting material in those formats, which is uncommon.
One record in the Form AW Files Dataset is exactly one Form AW submission as accepted by EDGAR — that is, one Rule 477 amendment-withdrawal request filed under the Securities Act of 1933, identified by a unique 18-digit EDGAR accession number. On disk, each record is materialised as a per-filing folder that pairs a structured manifest (metadata.json) with the original EDGAR submission documents associated with that accession, minus image attachments and the EDGAR-generated complete-submission .txt envelope. The folder is the unit of record: its accession number is the primary key, and everything inside the folder describes a single registrant's request to withdraw a single pre-effective or post-effective amendment to a previously filed Securities Act registration statement.
The dataset is delivered as a ZIP archive partitioned into monthly slices — one ZIP per calendar month under a year directory (for example 2025/2025-07.zip). Inside a monthly ZIP, each accession is its own subfolder whose name is the dehyphenated 18-digit accession number: accession 0001104659-25-068305 becomes folder 000110465925068305. The dashed canonical form of the accession is preserved inside metadata.json.accessionNo, so the folder name and the manifest can always be reconciled.
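The folder name is just the accession with its hyphens removed. The mapping can therefore be reversed using the fixed 10-2-6 digit grouping of EDGAR accession numbers (submitter CIK, two-digit year, sequence). A minimal Python sketch; the helper names are hypothetical and not part of the dataset:

```python
import re

# Dashed EDGAR accession: 10-digit submitter CIK, 2-digit year, 6-digit sequence.
_DASHED = re.compile(r"^(\d{10})-(\d{2})-(\d{6})$")
_FOLDER = re.compile(r"^\d{18}$")


def accession_to_folder(accession_no: str) -> str:
    """Convert a dashed accession number to its dehyphenated folder name."""
    if not _DASHED.match(accession_no):
        raise ValueError(f"not a dashed accession number: {accession_no!r}")
    return accession_no.replace("-", "")


def folder_to_accession(folder_name: str) -> str:
    """Rebuild the dashed 10-2-6 form from an 18-digit folder name."""
    if not _FOLDER.match(folder_name):
        raise ValueError(f"not an 18-digit folder name: {folder_name!r}")
    return f"{folder_name[:10]}-{folder_name[10:12]}-{folder_name[12:]}"


print(accession_to_folder("0001104659-25-068305"))  # 000110465925068305
print(folder_to_accession("000110465925068305"))    # 0001104659-25-068305
```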
Inside one accession folder there are exactly two kinds of artefacts:
- metadata.json: the canonical per-filing manifest, always present, always named identically.
- The AW letter: a single *.htm file containing the Rule 477 letter, with filer-chosen filenames such as tm2520604d4_aw.htm, ea0246832-05_aw.htm, esgh-20250702_s1awd.htm, or simply formaw.htm.

Records are extremely compact: a typical AW filing decompresses to a few kilobytes because the letter itself is short and image attachments are not packaged.
metadata.json
metadata.json is the structured summary of the EDGAR submission and the join point between the on-disk documents and external EDGAR resources. The top-level fields carried by the dataset are:
- formType: always "AW" for records in this dataset.
- accessionNo: the EDGAR accession number in canonical dashed form (e.g. 0001520138-25-000196).
- description: the static label "Form AW - Amendment Withdrawal Request".
- filedAt: ISO-8601 timestamp with timezone offset reflecting EDGAR's acceptance time (e.g. 2025-07-28T13:19:44-04:00).
- linkToFilingDetails: absolute URL of the primary AW HTML document on www.sec.gov.
- linkToTxt: URL of the complete SGML/text submission on EDGAR (the wrapper that is not bundled in the ZIP).
- linkToHtml: URL of the EDGAR -index.htm filing index page for human browsing.
- linkToXbrl: empty for AW filings.
- documentFormatFiles: array of every document attached to the original EDGAR submission.
- dataFiles: empty for AW filings.
- seriesAndClassesContractsInformation: empty array for AW filings.
- entities: array of filer entities; for AW this is a single registrant.
- id: 32-character hex identifier uniquely tagging the record.

documentFormatFiles[]
Each element describes one document in the original EDGAR submission, regardless of whether that document is physically present in the ZIP. Fields:
- sequence: EDGAR sequence number; "1" for the AW letter itself, "2" and higher for graphics or other exhibits, and a single space " " for the EDGAR-generated complete-submission text wrapper.
- type: EDGAR document type tag, such as "AW" for the letter, "GRAPHIC" for embedded images, or " " for the complete submission.
- description: human label such as "AW", "GRAPHIC", or "Complete submission text file".
- documentUrl: absolute SEC.gov URL to the document.
- size: byte size of the document, expressed as a string.

The first item is invariably the AW letter; subsequent items, when present, are typically a logo image_001.jpg (sequence 2, type GRAPHIC) and the auto-generated complete-submission .txt. Both are enumerated in this array even though they do not travel inside the ZIP.
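A loader can use these conventions to predict which files should actually be present in an accession folder. The sketch below assumes the type values described above (a blank type for the complete-submission wrapper, "GRAPHIC" for images) and treats every other document as packaged; classify_documents and the sample URLs are illustrative placeholders, not part of the dataset:

```python
from typing import Dict, List


def classify_documents(document_format_files: List[dict]) -> Dict[str, List[str]]:
    """Split documentFormatFiles[] into filenames expected inside the ZIP and
    filenames that are only reachable on SEC.gov."""
    result: Dict[str, List[str]] = {"in_zip": [], "sec_gov_only": []}
    for doc in document_format_files:
        filename = doc["documentUrl"].rsplit("/", 1)[-1]
        doc_type = doc.get("type", "").strip()
        # A blank type marks the EDGAR complete-submission wrapper and GRAPHIC
        # marks embedded images; neither is packaged in the dataset's ZIPs.
        if doc_type in ("", "GRAPHIC"):
            result["sec_gov_only"].append(filename)
        else:
            result["in_zip"].append(filename)
    return result


# Hypothetical entries shaped like documentFormatFiles[]; URLs are placeholders.
base = "https://www.sec.gov/Archives/edgar/data/0/000000000000000000/"
docs = [
    {"sequence": "1", "type": "AW", "documentUrl": base + "formaw.htm", "size": "5120"},
    {"sequence": "2", "type": "GRAPHIC", "documentUrl": base + "image_001.jpg", "size": "2048"},
    {"sequence": " ", "type": " ", "documentUrl": base + "0000000000-00-000000.txt", "size": "9000"},
]
print(classify_documents(docs))
# {'in_zip': ['formaw.htm'], 'sec_gov_only': ['image_001.jpg', '0000000000-00-000000.txt']}
```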
entities[]
For Form AW the array contains a single element representing the registrant whose registration-statement amendment is being withdrawn. Fields:
- companyName: registrant name with a parenthetical role suffix (e.g. "VIRTUS EQUITY TRUST (Filer)").
- cik: Central Index Key without leading zeros, as a string.
- fileNo: the Securities Act file number of the affected registration statement (e.g. "333-288369"). This is the load-bearing cross-reference back to the original S-1, S-3, F-1, N-14, POS AM, or other registration filing whose amendment is being withdrawn.
- type: form type contributed by this entity ("AW").
- filmNo: EDGAR film number assigned to the AW submission.
- act: Securities Act code; uniformly "98" (Securities Act of 1933), consistent with Rule 477's statutory home.
- irsNo: IRS Employer Identification Number when reported.
- stateOfIncorporation: two-letter US state or foreign jurisdiction code.
- fiscalYearEnd: four-digit MMDD.
- sic: SIC code with industry label when EDGAR carries it (e.g. "6022 State Commercial Banks", "1000 Metal Mining", "2836 Biological Products, (No Diagnostic Substances)"); often absent for registered investment companies.
- tickers: optional array of ticker symbols when assigned (e.g. ["SSBK"], ["LBSR"], ["ESGH"]).

entities[0].fileNo is the most analytically important field on this object: it is the key that reconnects an AW record to the registration statement whose amendment it withdraws, and it is the join column that turns isolated AW letters into events on a registration-statement timeline.
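Grouping AW records by entities[0].fileNo turns individual letters into per-registration event lists. A minimal sketch, assuming the manifests have already been loaded into Python dicts; group_by_file_number is a hypothetical helper. filedAt is parsed rather than compared as a string so that timestamps with different UTC offsets sort correctly:

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List


def group_by_file_number(manifests: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each entities[0].fileNo to the AW accession numbers filed against it,
    in EDGAR acceptance order."""
    ordered = sorted(manifests, key=lambda m: datetime.fromisoformat(m["filedAt"]))
    groups: Dict[str, List[str]] = defaultdict(list)
    for manifest in ordered:
        file_no = manifest["entities"][0]["fileNo"]
        groups[file_no].append(manifest["accessionNo"])
    return dict(groups)
```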
Each accession folder carries the AW filing as a single HTML file wrapped in an EDGAR SGML <DOCUMENT> envelope. The file is not a bare HTML page; the standard EDGAR document header sits before the <html> root tag:
```
<DOCUMENT>
<TYPE>AW
<SEQUENCE>1
<FILENAME>tm2520604d4_aw.htm
<DESCRIPTION>AW
<TEXT>
<html>
... (full HTML body of the Rule 477 withdrawal letter) ...
</html>
</TEXT>
</DOCUMENT>
```
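Because the envelope precedes the <html> root, it has to be peeled off before the letter can be handed to an HTML parser. A minimal sketch using only Python's re module, assuming one <DOCUMENT> per file and the one-tag-per-line header layout shown above; parse_edgar_document is a hypothetical name:

```python
import re
from typing import Dict

# Header tags sit one per line before <TEXT>; the value runs to end of line.
_HEADER_TAG = re.compile(r"^<(TYPE|SEQUENCE|FILENAME|DESCRIPTION)>(.*)$", re.MULTILINE)
_TEXT_BLOCK = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL | re.IGNORECASE)


def parse_edgar_document(raw: str) -> Dict[str, str]:
    """Split an EDGAR <DOCUMENT> envelope into its header tags and inner payload."""
    text_match = _TEXT_BLOCK.search(raw)
    if text_match is None:
        raise ValueError("no <TEXT> block found")
    # Only scan the preamble before <TEXT>, so markup inside the letter body
    # is never mistaken for envelope metadata.
    preamble = raw[: text_match.start()]
    header = {tag.lower(): value.strip() for tag, value in _HEADER_TAG.findall(preamble)}
    header["payload"] = text_match.group(1).strip()
    return header


sample = ("<DOCUMENT>\n<TYPE>AW\n<SEQUENCE>1\n<FILENAME>formaw.htm\n<DESCRIPTION>AW\n"
          "<TEXT>\n<html><body><p>Letter</p></body></html>\n</TEXT>\n</DOCUMENT>\n")
print(parse_edgar_document(sample)["filename"])  # formaw.htm
```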
Inside the <TEXT> block, the HTML body is a formal letter to the SEC and follows a tightly conventional structure across virtually all filers:
- A VIA EDGAR (or similar) transmission header at the top.
- A Re: line that identifies the registrant by name and lists the file number(s) of the registration statement being withdrawn (e.g. File No. 333-267772).
- Identification of the specific amendment being withdrawn: its form type (S-3/A, N-14/A, S-1/A, POS AM, etc.), filing date, and accession number.
- A brief statement of the reason, such as a mis-coded submission type (for example an S-3/A that should have been a POS AM), a decision to refile under a different rule, abandonment of the offering, or successor-in-interest restructurings.
- A signature block with /s/ Name, the signer's title, and the registrant's name (occasionally identifying a successor entity, for example "FB Financial Corporation as successor-in-interest to Southern States Bancshares, Inc.").

Because the body is HTML, it carries inline styling, paragraph structure, and occasionally <IMG> references to logos packaged elsewhere in the original EDGAR submission.
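This conventional structure makes the cross-references recoverable with simple patterns once the letter has been reduced to plain text. The sketch below is a heuristic, not a full parser: it assumes modern 333- file numbers (older numbering schemes would need extra patterns), dashed 10-2-6 accession numbers, and literal form-type tokens such as S-3/A or POS AM. All names are hypothetical:

```python
import re
from typing import Dict, List

# Assumption: modern Securities Act file numbers of the form 333-NNNNN(N).
FILE_NO = re.compile(r"\b(333-\d{5,6})\b")
# Dashed EDGAR accession numbers (10-2-6 digits).
ACCESSION = re.compile(r"\b(\d{10}-\d{2}-\d{6})\b")
# Amendment form types written literally in the letter, e.g. S-3/A, N-14/A, POS AM.
FORM_TYPE = re.compile(r"\b(POS AM|[SFN]-\d{1,2}/A)\b")


def extract_references(letter_text: str) -> Dict[str, List[str]]:
    """Collect file numbers, accession numbers and amendment form types
    mentioned in a plain-text AW letter, de-duplicated in order of appearance."""
    def unique(matches: List[str]) -> List[str]:
        return list(dict.fromkeys(matches))

    return {
        "file_numbers": unique(FILE_NO.findall(letter_text)),
        "accessions": unique(ACCESSION.findall(letter_text)),
        "form_types": unique(FORM_TYPE.findall(letter_text)),
    }
```

Because the metadata header stops at the file-number level, the accession captured here is the only machine-readable pointer to the specific amendment being withdrawn; matches should still be spot-checked, since a letter may also cite unrelated filings.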
For each accession, the record bundles:
- The metadata.json manifest in full.
- The original AW letter document(s) in native form, with the EDGAR SGML <DOCUMENT> wrapper preserved.

Two categories of content are intentionally omitted from each record:
- Image attachments. Embedded graphics, such as a logo image_001.jpg carried in the original submission as <TYPE>GRAPHIC documents, are excluded from the ZIP. They remain enumerated in documentFormatFiles[] (with type: "GRAPHIC" and a sequence of 2 or higher) and the HTML body may still contain <IMG SRC="image_001.jpg"> tags pointing at them, but the binary image files themselves are not packaged. This is a deliberate dataset-level decision and the reason the dataset's file-type set excludes image MIME types.
- The complete-submission .txt wrapper. EDGAR generates a single .txt artefact that concatenates the entire submission inside one large SGML envelope. That wrapper is enumerated in documentFormatFiles[] with sequence: " " and type: " " and is reachable through linkToTxt, but it is not redistributed in the ZIP. The dataset relies on the per-document HTML files plus the manifest instead.

Form AW has been an EDGAR submission type continuously since the mid-1990s, and its substantive content (registrant identification, affected file number, identification of the amendment being withdrawn, the no-sales representation, the reason, and a signature) has not changed materially because the underlying authority, Rule 477, has been stable. What has evolved is the document format the SEC accepted from filers:
- Plain-text (ASCII) letters: the letter sat between the <TYPE>AW … <TEXT> … </TEXT> markers as monospaced text, without HTML markup. The Rule 477 letter conventions (date, addressee, Re: line, no-sales representation, signature) were already in place but were rendered in plain text rather than styled HTML.
- HTML letters: the letter is an HTML document inside the same <DOCUMENT> envelope. The <TYPE>AW tag and sequence numbering carried over unchanged; only the inner <TEXT> payload changed from plain text to <html>...</html>.
- Current conventions: primary documents typically follow a filer-chosen *_aw.htm naming pattern. Embedded logos as GRAPHIC attachments became routine, and corresponding <IMG> references appear in the body.

The dataset's per-record structure abstracts over this evolution: regardless of whether the underlying letter was originally ASCII or HTML, every record is presented as one accession folder containing metadata.json plus the primary document(s) in their native form.
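Text-analysis pipelines therefore need to handle both payload styles. A minimal sketch with Python's standard html.parser module, assuming the payload has already been extracted from the <TEXT> block and that the presence of an <html tag is enough to tell HTML letters from ASCII ones; letter_text is a hypothetical helper:

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Accumulate visible text nodes, skipping <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True turns &nbsp; etc. into characters
        self.parts: list = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def letter_text(payload: str) -> str:
    """Return whitespace-normalized text for either an ASCII or an HTML letter."""
    if "<html" not in payload.lower():
        body = payload  # plain-text letter: use as-is
    else:
        collector = _TextCollector()
        collector.feed(payload)
        collector.close()
        body = " ".join(collector.parts)
    # str.split() with no argument also splits on non-breaking spaces.
    return " ".join(body.split())
```

Joining text nodes with spaces keeps words from separate paragraphs apart, at the cost of occasionally splitting a word that inline markup breaks in two (for example <b>Ex</b>ample).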
- Folder names use the dehyphenated 18-digit accession number, while the canonical dashed form is stored in metadata.json.accessionNo. Either form may be used as a key, but the two representations must not be mixed when joining datasets.
- entities[0].fileNo is the canonical join back to the registration statement whose amendment is being withdrawn. To reconstruct the lifecycle of an offering (original S-1/S-3/F-1/N-14, subsequent amendments, and the AW that retired one of those amendments), match on this file number rather than on CIK alone, because a single registrant may carry many active file numbers in parallel.
- metadata.json identifies the registration statement at the file-number level but does not break out the specific amendment accession that the AW retires. The HTML body of the letter is the only authoritative source for which amendment is being withdrawn (form type, filing date, accession number), and extracting that requires parsing the letter text.
- The SGML <DOCUMENT> header sits outside the <html> root, so naive HTML parsers should either skip past the <TEXT> marker before parsing or be tolerant of the leading SGML preamble.
- Because act is uniformly "98" and formType is uniformly "AW", those fields are useful integrity checks rather than discriminators within this dataset.

The filer is always the registrant of the underlying registration statement (or its successor-in-interest), identified in EDGAR by its own CIK and listed as the sole entity on the submission with act code "98". Counsel typically drafts and transmits the letter, and an officer, director, or authorized representative signs it, but the legal filer is the registrant itself; it is never the underwriter, transfer agent, or law firm.
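The invariants described above (formType always "AW", a single registrant entity, act code "98", and a populated fileNo to join on) can be checked cheaply at load time. A minimal sketch that reports violations instead of raising, so unexpected records can be flagged for review; integrity_problems is a hypothetical name:

```python
from typing import List


def integrity_problems(manifest: dict) -> List[str]:
    """Return the violated expectations for a Form AW manifest (empty if clean)."""
    problems = []
    if manifest.get("formType") != "AW":
        problems.append(f"formType is {manifest.get('formType')!r}, expected 'AW'")
    entities = manifest.get("entities") or []
    if len(entities) != 1:
        problems.append(f"expected exactly one entity, found {len(entities)}")
    for i, entity in enumerate(entities):
        if entity.get("act") != "98":
            problems.append(f"entities[{i}].act is {entity.get('act')!r}, expected '98'")
        if not entity.get("fileNo"):
            problems.append(f"entities[{i}].fileNo is missing")
    return problems
```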
The filer population is essentially the full universe of '33 Act registrants:
Form AW is filed under Rule 477 of the Securities Act of 1933 (17 CFR 230.477), which governs withdrawal of registration statements and amendments. Rule 477 provides that any registration statement or amendment may be withdrawn upon application of the registrant if the Commission consents, which requires a finding that withdrawal is consistent with the public interest and the protection of investors. Consent can operate through a deemed-grant mechanism: for an application filed before the registration statement becomes effective, Rule 477(b) treats the application as granted at the time it is filed unless the Commission notifies the registrant within 15 calendar days that it will not be granted. EDGAR acceptance of the AW submission is the recorded filing event; Commission consent itself is not a separate EDGAR record and is not packaged in this dataset.
A core substantive prerequisite is that no securities have been sold under the registration statement to which the amendment relates. The AW letter typically contains an explicit representation to that effect.
Form AW is event-driven and discretionary. The trigger is a registrant's decision that a particular pre-effective (/A) or post-effective (POS AM) amendment should be removed from active status on the EDGAR record. There is no periodic schedule, no statutory deadline, and no calendar-driven obligation. Common triggers include:
- A mis-coded submission type (for example, an amendment filed as an S-3/A instead of a POS AM): because EDGAR cannot silently re-tag an accepted submission, the registrant withdraws and re-files under the correct type.

Form AW is submitted through normal EDGAR channels and accepted within EDGAR's standard business-day filing window. The filedAt timestamp reflects EDGAR acceptance, not Commission consent. There is no statutory deadline: registrants file at their discretion whenever amendment housekeeping is needed, subject only to (i) the amendment already existing on the EDGAR record and (ii) the no-sales prerequisite under Rule 477.
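Since filedAt carries an explicit UTC offset, it can be normalized for cross-source comparisons or mapped to the monthly container that should hold the record. The container mapping below is an assumption: it buckets by the calendar month of the timestamp as written (EDGAR's Eastern-time offset), which the documentation does not state explicitly. The helper names are hypothetical:

```python
from datetime import datetime, timezone


def container_key(filed_at: str) -> str:
    """Map a filedAt timestamp to a monthly container key such as 2025/2025-07.zip.

    Assumption: containers are partitioned by the month of the timestamp as
    written (local Eastern offset), not by the UTC month.
    """
    accepted = datetime.fromisoformat(filed_at)
    return f"{accepted.year:04d}/{accepted.year:04d}-{accepted.month:02d}.zip"


def accepted_utc(filed_at: str) -> datetime:
    """Normalize an EDGAR acceptance timestamp to UTC."""
    return datetime.fromisoformat(filed_at).astimezone(timezone.utc)


print(container_key("2025-07-28T13:19:44-04:00"))  # 2025/2025-07.zip
```

A late-evening acceptance on the last day of a month is where the two views diverge: 2025-07-31T21:30:00-04:00 is still July in Eastern time but already August 1 in UTC.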
The submission itself is short: typically a one-page HTML letter (occasionally with a graphic) addressed to the Division of Corporation Finance or, for fund filings, the Division of Investment Management. It identifies the registrant, the file number, the amendment to be withdrawn, and a brief reason. Volume is naturally low — the dataset spans 1994 to present, averaging well under 200 filings per year. The 1994 lower bound reflects EDGAR phase-in for Securities Act submissions; equivalent paper withdrawal letters predating EDGAR are not represented.
Form AW is a narrow procedural filing under Rule 477 of the Securities Act of 1933 that retracts a single amendment to a registration statement, with Commission consent. Because the EDGAR ecosystem contains many "withdrawal" forms that look superficially similar but operate on different legal objects under different statutes, the comparisons below isolate exactly what each neighbor covers and where Form AW remains distinct.
The nearest sibling. Both AW and RW are filed under Rule 477, both require Commission consent, and both are short cover-letter submissions. The distinction is the object withdrawn:
Use RW to identify abandoned IPOs, pulled shelves, and dropped offerings. Use AW to track amendment-level course corrections inside a still-active registration. A registrant unwinding a deal may file an AW to drop a pending amendment and a separate RW to retire the underlying registration.
A rescission of a previously filed AW, used when the registrant changes course before (or shortly after) Commission consent on the AW. Volumes are very small. Each AW-WD targets a specific AW accession number and should be joined back to the AW dataset as a cancellation event rather than treated as a parallel withdrawal universe.
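Treating AW-WD as a cancellation layer reduces to a set lookup on accession numbers. A minimal sketch, assuming the AW accession that each AW-WD targets has already been extracted from a separate AW-WD source (AW-WD filings are not part of this dataset); apply_cancellations is hypothetical:

```python
from typing import Dict, Iterable, List, Tuple


def apply_cancellations(aw_accessions: Iterable[str],
                        aw_wd_targets: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Label each AW accession 'active' or 'cancelled', and list AW-WD targets
    that do not match any AW in the input (orphans worth investigating)."""
    aw_list = list(aw_accessions)
    targets = set(aw_wd_targets)
    status = {acc: ("cancelled" if acc in targets else "active") for acc in aw_list}
    orphans = sorted(targets - set(aw_list))
    return status, orphans
```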
These are the documents an AW acts against. The /A filings carry the substantive disclosure (prospectus updates, restated financials, revised deal terms, exhibits); the AW itself is a one- to three-page procedural letter referencing the /A by accession and file number. The AW dataset is content-thin and only becomes informative when joined to the corresponding /A filing it retracts.
These create the file number that every subsequent /A and AW points to. The hierarchy:
The base datasets are large and disclosure-rich; AW is a thin back-reference layer whose primary analytical value is as a join key into them.
Form 15 terminates or suspends registration of a class of securities and the associated periodic reporting duties (10-K, 10-Q, 8-K) under Section 12(g) or Rule 12h-3 (for Section 15(d) suspension) of the Securities Exchange Act of 1934. Form AW is a Securities Act of 1933 instrument acting on an offering registration. No statutory or operational overlap: Form 15 ends ongoing reporting for already-public issuers; AW retracts a single offering-side amendment, often pre-IPO or mid-shelf. Filer populations rarely intersect.
These W-suffix schedules withdraw tender offer or going-private filings under Section 13(e) and Section 14(d) of the Exchange Act:
The withdrawn object is a tender offer schedule, not a registration amendment, and the filer is typically a bidder or target. A study of "abandoned deals" might combine RW (abandoned offerings) with SC TO-W (abandoned tender offers), but AW does not belong in that bucket.
Each terminates a distinct, non-Securities-Act registration:
These share a procedural shape with AW but no substantive overlap. They terminate ongoing registered status of a person or entity; AW retracts a single document inside a Securities Act offering registration.
Form AW is uniquely the Rule 477 instrument for retracting one amendment to a Securities Act registration statement, with Commission consent, while leaving the base registration intact. It is not interchangeable with:
The dataset is procedurally precise but disclosure-thin: its value lies in linking amendment-level retraction events to the /A and base registration filings they reference, enabling analysis of Rule 477 consent activity, mid-review course corrections, and registrant behavior during the SEC amendment cycle.
Form AW filings are short but consequential, and different audiences extract value from different surfaces of the record — the metadata header, the Rule 477 letter body, and the file-number cross-reference to the underlying S-1, S-3, S-4, or F-series filing.
Use the letter bodies as a precedent library when drafting their own Rule 477 requests. Counsel study accepted phrasings, the granularity of stated reasons (market conditions, restructuring, refiling under a different form, abandonment), and the structure of the consent request. The metadata header (CIK, file number) together with the amendment identified in the letter body lets them pull the full procedural history of comparable transactions and benchmark against prior Commission treatment.
Reconcile AW filings against internal capitalization registries, securities-law calendars, and board materials. They match accession number, file number, and acceptance date to confirm that withdrawn amendments are formally closed out, and retain the Rule 477 letter in the corporate file as evidence that no stale registration remains active before a future offering.
Feed AW filings into deal-mortality and pipeline analyses. The file number and CIK link each withdrawal back to its parent S-1, S-3, or S-4, supporting completion-rate and time-to-effectiveness statistics across IPOs, follow-ons, and shelf takedowns. The stated reason in the letter body, where present, attributes mortality to market timing, structural redesign, or strategic exit, which feeds pitch decks and internal post-mortems on deals that lapsed before pricing.
Treat AW filings as a soft signal of stalled or restructured transactions. The registrant identifier, affected file number, and timing relative to prior amendments help update probability estimates for IPO pricing windows, secondary offerings, and stock-for-stock M&A registered on Form S-4. The signal is combined with the underlying registration to revise catalyst calendars and expected deal completion.
Run longitudinal studies on Rule 477 usage since 1994. The full population supports analysis of withdrawal frequency, clustering around macro events, and the distribution of stated reasons across sectors and form types — inputs to policy work on the consent requirement and the mechanics of registration amendments.
Ingest AW metadata to keep issuer profiles and registration timelines complete, particularly for companies that filed, withdrew, and refiled. File-number and CIK linkage normalizes each AW to its parent registration; the Rule 477 letter body serves as a corpus for NLP pipelines that classify withdrawal rationales and tag issuer events for downstream products.
Use AW filings as documentary evidence in audit-trail reconstruction. When a client reports a withdrawn amendment, auditors confirm a corresponding Form AW exists, that Commission consent was sought, and that the timing aligns with board minutes and management representations. The letter and metadata support workpapers on equity issuance disclosures, failed-raise going-concern considerations, and subsequent-events review.
Watch AW filings to detect pulled or restructured transactions. An AW tied to an S-4 can flag an abandoned or redesigned stock-for-stock merger; an AW on an S-1 amendment can flag a stalled IPO marketing process. The file-number link, withdrawal timing, and any reason in the Rule 477 letter feed spread management on announced deals, position sizing in issuers whose capital plans have shifted, and watchlists for likely refilers.
The use cases below tie directly to the three working surfaces of each record: the metadata.json header (CIK, entities[0].fileNo, filedAt, accession), the Rule 477 letter body (stated reason, identified amendment, no-sales representation, signature block), and the back-reference into the affected S-, F-, or N-series registration statement.
Join entities[0].fileNo and cik from every AW metadata.json to a base S-1, S-3, S-4, or F-1 registration index, then walk the /A history to identify which amendment each AW retracts (parsed from the letter body). Output: a per-file-number table marking each registration as completed, withdrawn at amendment level, or fully retired (when paired with RW), used to compute completion rates and median time-from-first-amendment-to-AW by sector and underwriter.
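Once the AW join has been made, the per-file-number classification is a precedence rule over event sets. A sketch with hypothetical inputs: effective_file_numbers stands in for whatever effectiveness signal the base registration index provides, and the precedence (RW over effectiveness over AW) is a design choice, not something the dataset prescribes:

```python
from typing import Dict, Iterable, Set


def registration_status(file_numbers: Iterable[str],
                        aw_file_numbers: Set[str],
                        rw_file_numbers: Set[str],
                        effective_file_numbers: Set[str]) -> Dict[str, str]:
    """Classify each registration file number using AW, RW and effectiveness events."""
    status: Dict[str, str] = {}
    for file_no in file_numbers:
        if file_no in rw_file_numbers:
            # An RW retires the whole registration, whatever else happened.
            status[file_no] = "fully retired"
        elif file_no in effective_file_numbers:
            status[file_no] = "completed"
        elif file_no in aw_file_numbers:
            status[file_no] = "withdrawn at amendment level"
        else:
            status[file_no] = "no AW/RW activity"
    return status
```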
Index the HTML letter bodies by stated reason (miscoded form type, refiling under a different rule, abandonment, successor-in-interest restructuring) and by retracted form (S-3/A, N-14/A, POS AM, etc.). Output: a searchable precedent set that returns accepted phrasings of the consent request, the no-sales representation, and the reason paragraph, filtered to comparable transaction types for use when drafting a new Rule 477 letter.
For each client-reported withdrawn amendment, look up the matching AW by cik plus entities[0].fileNo, confirm filedAt aligns with board minutes and management representations, and extract the signer name and title from the letter's signature block. Output: a workpaper exhibit confirming Commission consent was sought, with the AW HTML attached as evidence supporting equity-issuance disclosure and subsequent-events review.
Stream new AW records as they land in the monthly ZIP, filter on entities[0].sic and on retracted form type extracted from the letter body, and alert when an AW retracts an S-4/A (potential restructured or abandoned stock-for-stock deal) or a late-stage S-1/A (potential pulled IPO). Output: a watchlist with cik, parent file number, retracted accession, and stated reason, feeding spread management on announced deals and refiler tracking.
Label a sample of letter bodies with structured reason codes (market conditions, form-type correction, restructuring, abandonment, successor reorganization) and use them to train a classifier that tags every AW. Combine the predicted codes with filedAt from the manifest and sic and stateOfIncorporation from entities[0] to produce a longitudinal panel of Rule 477 usage since the mid-1990s for policy research and for vendor-side issuer-event feeds.
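Before a classifier is trained, a keyword baseline can bootstrap candidate labels and serve as a point of comparison. The sketch below substitutes simple substring rules for a learned model; the rule list and the tag_reasons name are hypothetical and would need tuning against real letters:

```python
from typing import Dict, List

# Hypothetical keyword rules; a trained classifier would replace these.
REASON_RULES: Dict[str, List[str]] = {
    "form_type_correction": ["incorrect form type", "inadvertently filed",
                             "should have been filed", "erroneously"],
    "market_conditions": ["market conditions"],
    "successor_reorganization": ["successor-in-interest", "successor in interest"],
    "abandonment": ["will not proceed", "no longer intends", "abandon"],
    "restructuring": ["restructur", "reorganiz"],
}


def tag_reasons(letter_text: str) -> List[str]:
    """Return every reason code whose keywords appear in the letter (case-insensitive),
    in the order the rules are declared."""
    lowered = letter_text.lower()
    return [code for code, keywords in REASON_RULES.items()
            if any(keyword in lowered for keyword in keywords)]
```

Multiple codes can fire for one letter, which is useful when reviewing ambiguous cases but means the output is a candidate label set rather than a single class.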
Scan signature blocks and Re: lines for phrasings such as "as successor-in-interest to" to identify AWs filed in the wake of mergers, redomiciliations, or holding-company reorganizations. Cross-reference the named predecessor against the registrant's cik and prior fileNos. Output: a list of corporate-action events visible only through the AW letter text, useful for completing issuer histories where the underlying S-4 or 8-K linkage is ambiguous.
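A regular expression is enough to surface predecessor names from signature blocks. A minimal sketch, assuming the text keeps its line breaks (as a signature block normally does) and that a predecessor name ends at a line break, quote, or parenthesis; predecessors is a hypothetical helper:

```python
import re
from typing import List

# "as successor-in-interest to <name>" (hyphenated or spaced), where the name
# runs until a newline, a double quote, or a parenthesis.
SUCCESSOR = re.compile(
    r"\bas\s+successor[\s-]+in[\s-]+interest\s+to\s+([^\n\"()]+)",
    re.IGNORECASE,
)


def predecessors(letter_text: str) -> List[str]:
    """Return de-duplicated predecessor names found after 'as successor-in-interest to'."""
    names: List[str] = []
    for match in SUCCESSOR.finditer(letter_text):
        name = match.group(1).strip().rstrip(",;")
        if name and name not in names:
            names.append(name)
    return names
```

In running prose a name that ends mid-line is over-captured up to the line break, so results should be reviewed before being treated as corporate-action events.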
Dataset Index JSON API: https://api.sec-api.io/datasets/form-aw-files.json
This endpoint returns metadata describing the Form AW Files Dataset, including its name, description, last refresh timestamp, earliest sample date (1994-01-01), covered form types, container format, file types, total record count, and total size. The containers[] array lists every individual container file in the dataset along with each container's key, size, record count, last updated timestamp, and direct download URL. Polling this endpoint allows you to detect which containers were updated in the most recent refresh run and selectively download only the changed archives. No API key is required to access the index.
Example response:
```json
{
  "datasetId": "1f13365b-9ae0-6957-a129-625be09ad17e",
  "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-aw-files.zip",
  "name": "Form AW Files Dataset",
  "updatedAt": "2026-04-25T03:01:06.958Z",
  "earliestSampleDate": "1994-01-01",
  "totalRecords": 4509,
  "totalSize": 11320625,
  "formTypes": ["AW"],
  "containerFormat": "ZIP",
  "fileTypes": ["TXT", "JSON", "HTML", "PDF"],
  "containers": [
    {
      "downloadUrl": "https://api.sec-api.io/datasets/form-aw-files/2026/2026-04.zip",
      "key": "2026/2026-04.zip",
      "size": 248391,
      "records": 17,
      "updatedAt": "2026-04-25T03:01:06.958Z"
    }
  ]
}
```
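The polling pattern described above can be sketched with the standard library alone. This assumes the response shape shown in the example response and that the caller persists a key-to-updatedAt mapping between runs; fetch_index, changed_containers, and that persistence scheme are hypothetical:

```python
import json
import urllib.request
from datetime import datetime
from typing import Dict, List

INDEX_URL = "https://api.sec-api.io/datasets/form-aw-files.json"


def fetch_index(url: str = INDEX_URL) -> dict:
    """Download and decode the dataset index (no API key is required)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def _parse(ts: str) -> datetime:
    # updatedAt ends in "Z"; fromisoformat only accepts that suffix from
    # Python 3.11, so rewrite it as an explicit UTC offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def changed_containers(index: dict, last_seen: Dict[str, str]) -> List[dict]:
    """Return containers that are new or whose updatedAt is later than the
    value recorded in last_seen (container key -> updatedAt)."""
    changed = []
    for container in index.get("containers", []):
        previous = last_seen.get(container["key"])
        if previous is None or _parse(container["updatedAt"]) > _parse(previous):
            changed.append(container)
    return changed
```

After each run, the caller would record every container's current updatedAt so that the next poll downloads only archives touched by the latest refresh.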
Download Entire Dataset: https://api.sec-api.io/datasets/form-aw-files.zip?token=YOUR_API_KEY
Downloads the complete Form AW Files Dataset as a single ZIP archive containing every container from January 1994 to the most recent refresh. This endpoint requires a valid SEC API key passed via the token query parameter.
Download Single Container: https://api.sec-api.io/datasets/form-aw-files/2026/2026-04.zip?token=YOUR_API_KEY
Downloads a single monthly container archive instead of the full dataset. Use the downloadUrl values from the containers[] array in the index JSON to retrieve specific months. This endpoint also requires a valid SEC API key.
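Putting the download URL, the token parameter, and the per-accession folder layout together, a container can be fetched and its manifests iterated in memory. A minimal sketch; download_container and iter_manifests are hypothetical names, and the API key placeholder must be replaced with a real key:

```python
import io
import json
import urllib.parse
import urllib.request
import zipfile
from typing import Iterator


def download_container(download_url: str, api_key: str) -> bytes:
    """Fetch one monthly ZIP, passing the API key as the token query parameter."""
    url = f"{download_url}?{urllib.parse.urlencode({'token': api_key})}"
    with urllib.request.urlopen(url, timeout=120) as response:
        return response.read()


def iter_manifests(zip_bytes: bytes) -> Iterator[dict]:
    """Yield each accession's parsed metadata.json from a monthly container."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Layout: <18-digit accession folder>/metadata.json next to the letter.
            if name.rsplit("/", 1)[-1] == "metadata.json":
                yield json.loads(archive.read(name))
```

Keeping the archive in memory suits monthly containers, which are small (the example index lists roughly 250 KB for 17 records); very large downloads such as the full-dataset ZIP are better streamed to disk first.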
Form AW is the EDGAR submission type used by a Securities Act of 1933 registrant to ask the Commission for consent to withdraw a single amendment to a previously filed registration statement. It is filed under Rule 477 of the Securities Act of 1933 (17 CFR 230.477) and takes the form of a short procedural letter rather than a substantive disclosure document.
One record is exactly one Form AW submission as accepted by EDGAR, identified by its unique 18-digit accession number. On disk, each record is a per-filing folder containing a metadata.json manifest plus the original AW letter document(s) — almost always a single HTML file wrapped in an SGML <DOCUMENT> envelope.
Form AW is event-driven and discretionary. Common triggers include a mis-coded EDGAR submission type (for example, an S-3/A that should have been a POS AM), abandonment or restructuring of the offering contemplated by the amendment, refiling under a different form, successor-in-interest cleanup after a merger, fund reorganization changes that moot a prior N-14 amendment, or material errors that make withdrawal cleaner than further amendment. There is no statutory deadline.
Both AW and RW are filed under Rule 477 and both require Commission consent, but they target different layers of the record. Form RW withdraws the entire registration statement (S-1, S-3, S-4, F-1, N-2, etc.), so the file number itself is retired. Form AW withdraws only a specific pre- or post-effective amendment, leaving the base registration statement and its file number live.
Rule 477 provides that any registration statement or amendment may be withdrawn upon application of the registrant if the Commission consents. A core substantive prerequisite is that no securities have been sold under the registration statement to which the amendment relates, and the AW letter typically contains an explicit representation to that effect. Consent operates through a deemed-grant mechanism if the staff does not object within a short period.
The dataset spans 1994 — the EDGAR phase-in for Securities Act submissions — to the present. It is delivered as monthly ZIP containers under year directories (for example 2025/2025-07.zip), and the index JSON endpoint exposes each container's updatedAt timestamp so consumers can detect and download only the containers changed in the most recent refresh.
The dataset is distributed as ZIP archives partitioned by calendar month. Inside each monthly ZIP, every accession is a subfolder named with the dehyphenated 18-digit accession number, containing metadata.json plus the original EDGAR documents. The file types present in the dataset are TXT, JSON, HTML, and PDF, though in practice almost every record consists of the JSON manifest plus a single HTML letter; image attachments and the EDGAR-generated complete-submission .txt wrapper are intentionally excluded.