The Form 40-33 Files Dataset is a document-level corpus of every Form 40-33 and Form 40-33/A submission received by EDGAR from January 2003 onward, lodged with the SEC under Section 33 of the Investment Company Act of 1940. Each record is one complete EDGAR accession packaging the litigation papers — pleadings, motions, court orders, judgments, proposed settlements, compromises, or discontinuances — served or filed in a stockholder derivative action brought "by or in the right of" a registered investment company. The filers are the registered investment companies themselves (mutual funds, ETFs, closed-end funds, and BDCs) and their affiliated-party defendants (typically advisers, sub-advisers, distributors, officers, and trustees), who must transmit these papers to the Commission within five or ten days of service depending on the manner of service. Records are delivered as per-accession directories inside calendar-month ZIP containers, each pairing a metadata.json header with every non-image document the filer attached to the original EDGAR submission.
The dataset captures the full population of Form 40-33 and Form 40-33/A accessions submitted to EDGAR from January 2003 to present. Form 40-33 is a lodgment filing: a document-transmittal vehicle through which registered investment companies and their affiliated-party defendants deposit copies of court papers from stockholder derivative actions with the SEC, as required by Section 33 of the Investment Company Act of 1940. Form 40-33/A is the amendment variant, used to correct, supplement, or supersede a previously submitted 40-33. The dataset's payload is therefore legal rather than financial — the content of each record is whatever court paper was served or filed, not a structured issuer disclosure.
Dataset coverage begins in January 2003, when electronic submission under the Form 40-33 code became the operative channel; earlier Section 33 lodgments exist only as paper records outside EDGAR. The dataset is distributed as calendar-month ZIP archives containing per-accession directories, and the file types that appear inside those directories are PDF, HTML, and JSON. Image files (such as .jpg and .gif exhibits) are stripped during dataset preparation and the wrapping SGML submission .txt file that EDGAR generates for each accession is referenced but not physically included; everything else the filer submitted is preserved verbatim.
A single record in this dataset is one complete Form 40-33 or Form 40-33/A submission, scoped to a single EDGAR accession number. Physically, a record is a per-accession directory packaged inside a calendar-month ZIP container. The directory is named with the 18-digit dashless accession number (for example 000139834425005779, corresponding to the canonical dashed form 0001398344-25-005779) and bundles a metadata.json header together with every document the filer attached to the original EDGAR submission, with the sole exception of image files, which are stripped during dataset preparation. One record therefore captures both the filing-level metadata envelope and the actual legal-paper payload that the Section 33 regime is designed to make public.
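The mapping between the dashless folder name and the dashed accession number can be sketched as follows. This is a minimal illustration; the function names `to_dashed` and `to_dashless` are hypothetical helpers, not part of the dataset or any API.

```python
import re


def to_dashed(accession_no_dashes: str) -> str:
    """Convert an 18-digit dashless accession number to the dashed form."""
    if not re.fullmatch(r"\d{18}", accession_no_dashes):
        raise ValueError(f"expected 18 digits, got {accession_no_dashes!r}")
    # The dashed form groups the digits as 10 + 2 + 6.
    return (
        f"{accession_no_dashes[:10]}-"
        f"{accession_no_dashes[10:12]}-"
        f"{accession_no_dashes[12:]}"
    )


def to_dashless(accession_no: str) -> str:
    """Convert a dashed accession number to the folder-name form."""
    return accession_no.replace("-", "")
```

For the example above, `to_dashed("000139834425005779")` yields `"0001398344-25-005779"`, which is the value carried in the accessionNo field of metadata.json.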
Form 40-33 is a disclosure vehicle required by Section 33 of the Investment Company Act of 1940. When a stockholder derivative action is filed in court against a registered investment company or an affiliated person of such a company, Section 33 obligates the defendants to deliver to the SEC copies of the pleadings, verdicts, judgments, proposed settlements, compromises, or discontinuances served or filed in that action. The filing is due within five days of service where personal service is made, and ten days where service is made otherwise. Form 40-33/A is the amendment variant, used to correct, supplement, or supersede a previously submitted 40-33.
The form is unusual among SEC filings in that it has no prescribed narrative body, no schedule of captioned items, and no standardized exhibit taxonomy. The content is whatever court paper was served or filed: a complaint, an amended complaint, a motion to dismiss, an answer, a court order, a stipulation of settlement, an approval order, or a notice of dismissal. Consequently the dataset's payload is legal, not financial: the exhibit is typically a single bundled PDF assembled from the court record, often scanned or natively printed, rather than a structured issuer disclosure.
The dataset is organized into per-month ZIP archives named <YYYY>-<MM>.zip, grouped under year-level folders. Inside each ZIP, every accession occupies its own folder named by the dashless accession number, and each folder contains exactly the files that make up one record. The canonical on-disk shape of a record folder is:
<YYYY>-<MM>/
    <accessionNoNoDashes>/
        metadata.json
        <filer-named-document-1>.pdf
        [<filer-named-document-2>.htm]
        ...
metadata.json is always present. Alongside it sit one or more attached documents, named exactly as the filer named them in EDGAR; no renormalization is performed. Filer-generated identifiers are preserved verbatim, so filenames such as fp0092762-1_4033.pdf (a service-bureau convention where fp<########>-<seq> is the filer prefix and the _4033 suffix advertises the form type) carry through intact. The SGML "complete submission text file" that EDGAR generates for every EDGAR accession is referenced by URL inside metadata.json but is not included as a physical file in the record folder; the folder holds only the actual filed documents plus the metadata header.
The file types found in the dataset are PDF, HTML, and JSON. In practice the attached documents for 40-33 are almost always PDFs (scanned or natively printed court bundles), occasionally accompanied by HTML cover or index pages when the filer chose to include them, and always paired with the single metadata.json.
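One way to walk a monthly container is sketched below, assuming the layout described above (an accession folder holding metadata.json and the filed documents). The function name `read_records` and the shape of its return value are illustrative choices, not something the dataset prescribes.

```python
import io
import json
import zipfile
from collections import defaultdict


def read_records(zip_source):
    """Group ZIP members by accession folder and parse each metadata.json.

    zip_source may be a path or a file-like object.
    Returns {accession_folder: {"metadata": dict or None, "documents": [names]}}.
    """
    records = defaultdict(lambda: {"metadata": None, "documents": []})
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            parts = info.filename.split("/")
            if len(parts) < 2:
                continue  # file is not inside an accession folder
            # The accession folder is the direct parent of each file.
            accession, filename = parts[-2], parts[-1]
            if filename == "metadata.json":
                records[accession]["metadata"] = json.loads(archive.read(info))
            else:
                records[accession]["documents"].append(filename)
    return dict(records)
```

Keying on the parent folder rather than on a fixed path depth keeps the sketch working whether or not the archive wraps its records in a top-level month folder.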
metadata.json is a flat EDGAR-style filing header object that mirrors the fields EDGAR exposes for any accession, adapted to the 40-33 context. The fields carried on a record are:
formType: the string "40-33" for original submissions or "40-33/A" for amendments.
accessionNo: the canonical dashed accession number, for example "0001398344-25-005779".
effectivenessDate: the EDGAR effectiveness date in YYYY-MM-DD form.
filedAt: an ISO 8601 timestamp with timezone offset, pinpointing when EDGAR accepted the submission.
description: a static form-level description string: "Form 40-33 - Copies of all Stockholder Derivative Actions filed with a court against an Investment Company or an Affiliate thereof [Section 33]".
linkToFilingDetails: an absolute sec.gov/Archives/edgar/... URL pointing to the primary filed document, typically the same PDF sitting next to the metadata file.
linkToTxt: URL to the full SGML submission .txt on EDGAR.
linkToHtml: URL to the EDGAR -index.htm landing page for the accession.
linkToXbrl: URL slot for a structured-data exhibit; empty on 40-33 because the form carries none.
id: an opaque 32-character hexadecimal identifier used internally by the dataset.
documentFormatFiles[]: an array mirroring EDGAR's "Document Format Files" table, with one entry per attached document. Each row contains sequence (stringified sequence number, with the SGML .txt row carrying a blank " "), size (byte size as a string), documentUrl (absolute EDGAR URL), type (typically "40-33" for the primary document and a single space " " for the bundled submission .txt row whose description is "Complete submission text file"), and description.
dataFiles[]: reserved for structured data exhibits; empty on 40-33.
seriesAndClassesContractsInformation[]: reserved for fund series/class identifiers; usually empty on 40-33 but populated when the filer attaches such metadata.
entities[]: an array of filer and affected-party records. Each entity object holds companyName (with a role suffix in parentheses such as "Firsthand Technology Value Fund, Inc. (Filer)"), cik (numeric string without leading zeros), irsNo, fileNo (commonly the 811-XXXXX series for registered investment companies or the 814-XXXXX series for business development companies), filmNo, act ("40" for the 1940 Act), fiscalYearEnd (an MMDD string such as "1231"), type (the entity-level form tag, typically mirroring formType), and tickers[] when the fund is exchange-listed.

The entities[] array is the primary surface for identifying the registered investment company defendant and any affiliated persons named in the same filing; the (Filer) annotation on companyName is the canonical signal for the submitting party, while other role tags may appear on multi-party derivative-action submissions.
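The following sketch shows one way to read the role suffix on companyName and to pick out the primary document rows from documentFormatFiles[]. The helper names are hypothetical, and matching a row's type against formType is an assumption based on the field descriptions above, not a documented rule.

```python
import re

# Captures a trailing "(Role)" suffix, e.g. "Some Fund, Inc. (Filer)".
ROLE_SUFFIX = re.compile(r"^(?P<name>.*?)\s*\((?P<role>[^()]*)\)\s*$")


def split_role(company_name):
    """Split 'Name (Role)' into (name, role); role is None without a suffix."""
    match = ROLE_SUFFIX.match(company_name)
    if match:
        return match.group("name"), match.group("role")
    return company_name.strip(), None


def filer_entities(metadata):
    """Entities whose companyName carries the (Filer) annotation."""
    return [
        entity
        for entity in metadata.get("entities", [])
        if split_role(entity.get("companyName", ""))[1] == "Filer"
    ]


def primary_documents(metadata):
    """Rows typed like the filing itself; the SGML .txt row has a blank type."""
    form_type = metadata.get("formType")
    return [
        row
        for row in metadata.get("documentFormatFiles", [])
        if row.get("type", "").strip() == form_type
    ]
```

Because type is only "typically" set to the form type on the primary document, an empty result from `primary_documents` should be treated as "unknown" rather than "no primary document".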
The attached document or documents carry the substantive legal content required by Section 33. Because the form has no prescribed structure, what a record's payload contains is driven entirely by the procedural stage of the underlying derivative action when the defendants were served. Across the dataset, the recurrent content categories are:
Payload documents are typically single PDFs bundling every paper served in a given round. The PDFs are often image-only scans of the court's filestamped copies, or text-over-image PDFs produced by a litigation support vendor; clean, text-native PDFs are less common. There is no standardized internal exhibit-tagging scheme within the document beyond the filer's own naming conventions (for example *_4033.pdf, ex99*.htm, or court-caption-derived names). Multi-document records do occur when the filer splits the pleadings, the order, and the proposed settlement into separate attachments, but single-bundle PDFs dominate.
Every record directory includes the full metadata.json header and every non-image document the filer submitted to EDGAR under the accession. Filer-chosen filenames are preserved, and documents appear in the folder regardless of whether they are the primary 40-33 document or a supplementary cover or exhibit page. HTML cover or index documents, when supplied by the filer, sit as peer files next to metadata.json under their original filer-assigned filenames.
Two categories of content are deliberately not included in the record folder. First, image files (such as .jpg and .gif exhibit graphics that may have been attached to the EDGAR submission) are stripped during dataset preparation and never appear. Second, the wrapping SGML submission .txt file that EDGAR generates to envelope the accession is not included as a physical file, although its location on EDGAR is referenced via the linkToTxt field. As a result, there is no <DOCUMENT>...<TEXT> SGML wrapper inside the record folder; the folder holds only the actual filed documents plus the metadata header. Nothing else from the EDGAR accession — the index HTML pages, any filer correspondence, or audit-trail headers — is repackaged into the record.
The formType field is the single axis of variation between original and amended submissions. Originals carry "40-33" and amendments carry "40-33/A". Directory layout, metadata schema, and payload conventions are identical between the two variants; amendments are later accessions that re-file, correct, or supplement previously served legal papers, and are linked to the original only implicitly through the filer's CIK, the form-type family, and the content of the attached documents. The dataset does not thread amendments to their originals — each accession stands as its own record.
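Since amendments are not threaded, any linkage has to be heuristic. The sketch below lists, for each 40-33/A, the earlier 40-33 accessions that share at least one entity CIK. It is only a candidate generator under that assumption; `candidate_originals` is a hypothetical helper, and confirming a match still requires reading the attached papers.

```python
from datetime import datetime


def candidate_originals(records):
    """Map each 40-33/A accessionNo to earlier 40-33 accessions sharing a CIK.

    records is an iterable of parsed metadata.json dicts.
    """
    records = list(records)

    def ciks(meta):
        return {e.get("cik") for e in meta.get("entities", []) if e.get("cik")}

    def filed(meta):
        # filedAt is ISO 8601 with a timezone offset, so values compare safely.
        return datetime.fromisoformat(meta["filedAt"])

    originals = sorted(
        (m for m in records if m["formType"] == "40-33"), key=filed
    )
    result = {}
    for amendment in (m for m in records if m["formType"] == "40-33/A"):
        result[amendment["accessionNo"]] = [
            original["accessionNo"]
            for original in originals
            if ciks(original) & ciks(amendment)
            and filed(original) <= filed(amendment)
        ]
    return result
```

A fund complex with several concurrent actions will produce more than one candidate per amendment, which is why the result is a list rather than a single accession.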
Form 40-33 has had a stable disclosure obligation throughout the dataset's coverage window (January 2003 to present), and its required content is defined by reference to whatever court papers were served rather than by an itemized disclosure schedule, so there are no Item-level reorganizations to narrate. The practical evolution that matters for a consumer of this dataset is the document format of the attachments themselves, which has shifted only modestly over time: early-2000s filings are more likely to be lower-resolution scanned PDFs or, in rare cases, HTML-rendered text; filings from the 2010s onward are more consistently higher-quality PDFs produced either directly from litigation support software or as scanned-and-OCR'd bundles. Throughout the coverage window the metadata envelope and the per-accession directory shape are consistent, and image-file stripping applies uniformly.
Several nuances matter when working with the anatomy of a 40-33 record:
entities[]. The entity array lists every party EDGAR associates with the accession in a filer or affiliate role; it does not enumerate court-side parties such as plaintiff stockholders, non-filing co-defendants, or nominal defendants. Those must be read from the pleadings.
Primary document. The primary 40-33 document is best identified from the type and description fields of the corresponding documentFormatFiles[] row. Because original filenames are preserved verbatim, name-based heuristics for identifying the primary 40-33 bundle are only approximate and vary across filers and time.
SGML submission text. Consumers who need anything beyond what metadata.json exposes must follow the linkToTxt URL back to EDGAR; the record folder itself contains no SGML file.

Form 40-33 is a lodgment filing used to deposit with the SEC copies of litigation papers from a stockholder derivative action brought "by or in the right of" a registered investment company. The filing population is narrow:
Plaintiffs (shareholders suing derivatively on the fund's behalf) do not file Form 40-33. Operating-company issuers, private funds relying on Sections 3(c)(1) or 3(c)(7), unregistered foreign funds, and broker-dealers acting outside a fund-affiliate capacity are outside the obligation entirely.
The sole basis is Section 33 of the Investment Company Act of 1940 (15 U.S.C. Section 80a-32). Section 33 requires every registered investment company, and every affiliated person who is a defendant in a derivative action instituted on its behalf, to transmit to the Commission copies of all pleadings, verdicts, and judgments filed with the court or served in connection with the action, together with any proposed settlement, compromise, or discontinuance.
Section 33 submissions are received through EDGAR as Form 40-33, with Form 40-33/A used for amendments. Form 40-33 is a pure lodgment/notice filing — not a registration statement, not a periodic report under Section 30, and not an Exchange Act filing. It carries no Regulation S-X, Regulation S-K, or XBRL obligations. Its function is to place the Commission on notice of derivative litigation touching a fund's affairs and to create an accessible record on EDGAR.
The obligation is strictly event-driven, not periodic. A filing is triggered whenever a covered document is served on, or filed in court by, a covered defendant in a qualifying derivative action. Triggering documents include:
Each qualifying document can generate its own Form 40-33 submission, so a single derivative case typically produces a sequence of filings across its docket. The trigger is the service or filing of a document — not the commencement of the suit — meaning defendants must transmit papers generated by opposing parties as well as their own. There is no materiality threshold: every qualifying document must be lodged regardless of the size or likely outcome of the action.
Section 33 sets a short deadline keyed to the manner of service:
Deadlines run from the date of service or filing and reset with each new triggering document. A fund complex with no derivative litigation in a given year files nothing; a single high-volume case can produce dozens of filings. There is no annual or quarterly cadence.
Form 40-33/A amendments are filed to correct or supplement an earlier submission, for example to substitute a cleaner copy of a lodged pleading, fix filer metadata, or attach a missing exhibit. An amendment does not reset the underlying Section 33 deadline. Within this dataset an amendment is tied to its original only implicitly, through the filer's CIK and the content of the attached documents.
Form 40-33 occupies a narrow niche: a mandatory document-transmittal filing by a registered investment company (or an affiliated defendant) conveying copies of litigation papers from a stockholder derivative action to the SEC. Several adjacent filings partially overlap with it, but none are substitutes. The strongest comparisons are with fund periodic disclosures that may mention litigation, operating-company litigation disclosures, other 1940 Act notice filings, and SEC enforcement releases.
Form N-CSR and N-CSRS are periodic certified shareholder reports from registered management investment companies containing financial statements, schedules of investments, trustee information, and, where material, narrative references to legal proceedings affecting the fund. Overlap with 40-33 is limited to those narrative mentions.
Key differences:
N-CEN (and its predecessor N-SAR) is a structured annual census filing covering operational facts about registered funds: service providers, fees, classes, directors, auditor, and certain regulatory yes/no items including litigation indicators. Overlap with 40-33 is limited to the mere existence of litigation.
Key differences:
Item 103 of Form 10-K and Part II, Item 1 of Form 10-Q contain operating-company narrative disclosure of material pending legal proceedings, including derivative actions against operating-company boards. The conceptual overlap with 40-33 is litigation disclosure.
Key differences:
Derivative litigation against fund complexes cannot be retrieved via Item 103 searches because funds do not file Form 10-K or 10-Q.
8-K is the current-report vehicle for operating-company event disclosure under the Exchange Act. Registered investment companies generally do not file 8-Ks, and no 8-K item transmits full litigation documents. For funds, derivative-action litigation surfaces through 40-33 document transmittals plus summary references in N-CSR and N-CEN. 40-33 is the investment-company functional counterpart to a litigation-event record, but with a primary-document content model rather than narrative items.
These are the closest neighbors by form-number lineage, all arising under the Investment Company Act. They are frequently confused with 40-33 due to the shared "40-" prefix.
None transmits litigation documents. 40-33 is the only 40-series filing whose content is primarily court papers and the only one triggered by external legal process rather than a registrant's own compliance calendar or discretionary application.
SEC Litigation Releases and Accounting and Auditing Enforcement Releases are SEC-authored publications describing actions the Commission itself has brought or resolved. They co-occur with 40-33 in keyword searches on "litigation," but the direction of disclosure is opposite.
Key differences:
Form 40-33 is non-substitutable along four dimensions no other EDGAR filing type combines:
N-CSR and N-CEN can signal that fund litigation exists; Item 103 covers only the operating-company analogue; 8-K is unavailable to funds for this purpose; Litigation Releases cover a disjoint enforcement universe; the remaining 40-series forms serve unrelated compliance or application functions. These datasets complement 40-33 but cannot replace it as the source of the underlying court record.
Because Form 40-33 bundles the actual pleadings, motions, orders, and settlement exhibits served in stockholder derivative actions against registered investment companies, the dataset is a primary-source corpus for a narrow set of professionals working across fund litigation, governance, and insurance.
Firms that specialize in Section 36(b) excessive-fee and derivative actions mine complaints and amended complaints to see how claims are pled, which defendants are named (adviser, distributor, sub-adviser, trustees), and which fact patterns survive motions to dismiss. Defense counsel use the answers, motion-to-dismiss (MTD) briefs, and opinions in reverse to build motion banks and benchmark arguments. Both sides use the metadata.json CIK, filer, and filedAt fields to reconstruct dockets and to associate each 40-33/A amendment with its base filing; metadata.json carries no service-date field, so service dates have to come from the court papers. Output: precedent memos, brief banks, case-strategy databases.
Legal and compliance teams ingest new 40-33 filings to monitor industry exposure, pulling caption, court, docket, and claim language from complaint PDFs and matching filer identifiers against their own affiliate lists to detect service on any group entity. Settlement exhibits feed escalation memos and board briefings, and drive litigation-hold decisions plus material-legal-proceedings disclosure in Form N-CSR and registration statements.
Underwriters use defendant rosters, settlement amounts, non-monetary terms, and dismissal outcomes to quantify derivative-action frequency and severity for fund-complex books. Adjusters track 40-33/A amendments on open claims and benchmark reserves against recent settlement ranges. Output: loss-ratio and severity models, rating factors, reserve reviews, and reinsurance loss schedules.
Experts retained on either side pull settlement agreements, stipulations, and expert declarations to extract damages methodologies, comparative fee benchmarks, and the structure of fee rebates, caps, or governance-reform consideration. They use these to calibrate settlement ranges for funds with comparable AUM, fee levels, and share-class structures, feeding expert reports, mediation submissions, and fairness-hearing declarations.
Trustees and their independent counsel read complaints served on peer complexes to identify practices drawing claims and review settlement exhibits for the structural reforms counsel are accepting (fee reductions, board-composition changes, expanded 15(c) disclosure). This informs advisory-contract renewal deliberations, 15(c) process memos, and board minutes on fee reasonableness.
Empirical legal and finance scholars code complaints by claim type, adviser affiliation, fee category, and class period, and code outcomes from judgment and settlement PDFs. The filing-date series supports time-series panels, event studies of litigation on fund flows, and research on how Gartenberg jurisprudence and board composition correlate with outcomes.
Analysts at industry associations and policy organizations track filing volume, claim mix across adviser types, and settlement-versus-dismissal rates to document private enforcement trends. Output: comment letters on SEC rulemakings, white papers on fund governance, and testimony on 1940 Act reform.
Reporters covering excessive-fee and fiduciary-duty disputes use filer and date metadata to detect new cases, complaint PDFs to source factual allegations, and settlement and judgment exhibits to verify resolution terms independent of party statements.
Teams building retrieval, outcome-prediction, and drafting assistants use the corpus as a cleanly scoped training and evaluation set for 1940 Act derivative actions. They extract text from complaint, motion, and settlement PDFs and key documents by accession number and CIK for linkage to other EDGAR datasets, powering embedding stores, outcome classifiers, and domain-tuned summarizers.
Across these roles, litigators and damages experts live in the pleading and settlement PDFs; in-house compliance and trustees rely on metadata plus outcome documents; insurers focus on settlement terms and defendant rosters; researchers and legal-tech teams consume the full corpus as structured data. Because filings contain the litigation papers themselves rather than summaries, the dataset is directly usable as primary evidence in briefs, expert reports, underwriting files, board memos, and empirical studies of fund derivative actions.
The dataset's record-level combination of EDGAR metadata and primary court papers supports a small number of concrete, repeatable workflows.
Building a Section 36(b) complaint bank for motion-to-dismiss drafting. Defense counsel at fund-complex firms extract complaint and amended-complaint PDFs from each accession folder, OCR the scanned bundles, and tag allegations by fee type, share class, and Gartenberg factor. The output is a searchable brief bank of claim language and the MTD rulings that followed, keyed to metadata.json CIK and filedAt for docket reconstruction.
Quantifying derivative-action severity for fund D&O underwriting. Underwriters parse stipulations of settlement and final approval orders out of the payload PDFs to pull settlement dollar amounts, fee-rebate terms, and governance-reform concessions. Joined to entities[] CIK and fileNo (the 811- or 814- series), these feed severity tables and rating factors on registered-fund and BDC D&O books, plus reserve benchmarks for open claims.
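The join step in this workflow can be sketched as below, assuming the fileNo prefixes described in the metadata section (811- for registered investment companies, 814- for business development companies). `registrant_kind` and `join_keys` are hypothetical helpers, and the category labels are illustrative.

```python
def registrant_kind(file_no):
    """Classify an entity by its fileNo prefix (811- or 814- series)."""
    if not file_no:
        return "unknown"
    if file_no.startswith("811-"):
        return "registered investment company"
    if file_no.startswith("814-"):
        return "business development company"
    return "other"


def join_keys(metadata):
    """One (accessionNo, cik, fileNo, kind) row per entity in a record."""
    return [
        (
            metadata.get("accessionNo"),
            entity.get("cik"),
            entity.get("fileNo"),
            registrant_kind(entity.get("fileNo")),
        )
        for entity in metadata.get("entities", [])
    ]
```

Settlement amounts and other severity inputs are not in metadata.json; they still have to be extracted from the payload PDFs and attached to these keys.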
Monitoring fund-complex litigation exposure in near real time. In-house legal and compliance teams run a daily pull of new 40-33 and 40-33/A accessions, match entities[] companyName and CIK against their affiliate roster, and route matched filings into litigation-hold, board-reporting, and N-CSR legal-proceedings disclosure workflows. Amendments are flagged by the /A suffix on formType and clustered to originals via CIK plus case caption parsed from the PDF.
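The roster-matching step can be sketched as follows. The roster format (a list of CIK strings, possibly zero-padded) is an assumption, and `normalize_cik`, `matched_entities`, and `is_amendment` are hypothetical helper names.

```python
def normalize_cik(cik):
    """Strip whitespace and leading zeros so padded and unpadded CIKs compare equal."""
    return str(cik).strip().lstrip("0")


def matched_entities(metadata, roster_ciks):
    """Entities in a record whose CIK appears on the affiliate roster."""
    roster = {normalize_cik(c) for c in roster_ciks}
    return [
        entity
        for entity in metadata.get("entities", [])
        if entity.get("cik") and normalize_cik(entity["cik"]) in roster
    ]


def is_amendment(metadata):
    """True when formType carries the /A suffix."""
    return metadata.get("formType", "").endswith("/A")
```

Matching on CIK is more reliable than matching on companyName, which carries a role suffix and may be spelled differently across filings.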
Calibrating damages and fee-rebate benchmarks for expert reports. Forensic accountants retained in 15(c) and excessive-fee matters pull settlement agreements and expert declarations from payload PDFs across comparable funds, extract fee-cap levels, rebate schedules, and AUM at settlement, and produce a comparables table used in mediation submissions and fairness-hearing declarations.
Peer-practice review for independent trustees at 15(c) renewal. Independent trustee counsel pull recent settlement exhibits filed by peer complexes and summarize the structural reforms (board composition changes, expanded 15(c) disclosure, fee reductions) accepted in each matter. The output is a short memo attached to advisory-contract renewal minutes, anchored to specific accession numbers.
Training and evaluating fund-litigation LLM assistants. Legal-tech teams use the per-accession folders as a cleanly scoped corpus of 1940 Act derivative pleadings, orders, and settlements. They OCR the PDFs, key extracts by accession number and CIK for join to N-CSR and N-CEN, and build outcome classifiers (dismissal vs. settlement vs. judgment) and summarization models tuned to Section 33 matters.
Empirical research on fund governance and derivative-action outcomes. Academic teams code complaints by claim type, defendant role (adviser, sub-adviser, distributor, trustees), and class period, and code resolutions from judgment and settlement PDFs. Using filedAt as the event timestamp, they build panels for event studies of litigation on fund flows and tests of how board composition correlates with dismissal rates.
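A first step toward such a panel is a monthly filing count keyed on filedAt, sketched below. `monthly_counts` is a hypothetical helper, and bucketing by the timestamp's own offset (rather than converting to UTC) is a simplifying assumption.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(metadata_records):
    """Count filings per (YYYY-MM, formType) using filedAt as the event time."""
    counts = Counter()
    for meta in metadata_records:
        # filedAt is ISO 8601 with a timezone offset; the month is taken
        # in that offset's local time, not in UTC.
        filed = datetime.fromisoformat(meta["filedAt"])
        counts[(filed.strftime("%Y-%m"), meta["formType"])] += 1
    return counts
```

Months with no filings simply do not appear as keys, so a balanced panel needs the missing months filled with zeros afterwards.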
Dataset Index JSON API: https://api.sec-api.io/datasets/form-4033-files.json
The dataset index endpoint returns metadata describing the Form 40-33 Files Dataset, including its name, description, last updated timestamp, earliest sample date (2003-01-01), total records, total size, covered form types (40-33, 40-33/A), container format (ZIP), and included file types (HTML, JSON, PDF). It also returns the download URL for the full dataset archive and the full list of container files, with per-container size, record count, updated timestamp, and download URL. Use this endpoint to monitor which containers were refreshed in the most recent run so you can download only the changed containers on a daily basis.
This endpoint does not require an API key.
Example response:
{
  "datasetId": "1f13365b-9ae0-699e-a07a-d6bcabefa83b",
  "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-4033-files.zip",
  "name": "Form 40-33 Files Dataset",
  "updatedAt": "2026-04-15T12:02:56.681Z",
  "earliestSampleDate": "2003-01-01",
  "totalRecords": 208,
  "totalSize": 565699218,
  "formTypes": ["40-33", "40-33/A"],
  "containerFormat": "ZIP",
  "fileTypes": ["HTML", "JSON", "PDF"],
  "containers": [
    {
      "downloadUrl": "https://api.sec-api.io/datasets/form-4033-files/2026/2026-03.zip",
      "key": "2026/2026-03.zip",
      "size": 13818783,
      "records": 4,
      "updatedAt": "2026-04-15T12:02:56.681Z"
    }
  ]
}
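The incremental-update pattern described for this endpoint can be sketched with the standard library as below. The field names come from the example response; the helper names and the idea of keeping a "last successful run" timestamp are assumptions made for illustration.

```python
import json
import urllib.request
from datetime import datetime

INDEX_URL = "https://api.sec-api.io/datasets/form-4033-files.json"


def fetch_index(url=INDEX_URL):
    """Download and parse the dataset index (no API key required)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp; older Pythons reject a trailing 'Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def changed_containers(index, since):
    """Containers whose updatedAt is later than the given ISO timestamp."""
    cutoff = parse_timestamp(since)
    return [
        container
        for container in index.get("containers", [])
        if parse_timestamp(container["updatedAt"]) > cutoff
    ]
```

A daily job would call `changed_containers(fetch_index(), last_run)` with the timestamp of its previous successful run and download only the containers returned.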
Download Entire Dataset: https://api.sec-api.io/datasets/form-4033-files.zip?token=YOUR_API_KEY
Use this URL to download the full dataset as a single ZIP archive containing every container. This endpoint requires an API key.
Download Single Container: https://api.sec-api.io/datasets/form-4033-files/2026/2026-03.zip?token=YOUR_API_KEY
Use a container downloadUrl from the index response to fetch one monthly archive instead of the entire dataset. This is the most efficient approach for incremental updates. This endpoint requires an API key.
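Fetching one container might look like the sketch below, which appends the token query parameter shown in the URLs above and streams the response to disk. `with_token` and `download_container` are hypothetical helpers; error handling and retries are left out.

```python
import shutil
import urllib.parse
import urllib.request


def with_token(download_url, api_key):
    """Append token=<api_key> to a dataset or container download URL."""
    parts = urllib.parse.urlsplit(download_url)
    query = urllib.parse.parse_qsl(parts.query)
    query.append(("token", api_key))
    return urllib.parse.urlunsplit(
        parts._replace(query=urllib.parse.urlencode(query))
    )


def download_container(download_url, api_key, destination):
    """Stream one ZIP container to the destination path and return that path."""
    url = with_token(download_url, api_key)
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination
```

Streaming with `shutil.copyfileobj` avoids holding a whole monthly archive in memory, which matters more for the full-dataset download than for a single container.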
The Form 40-33 Files Dataset covers Form 40-33 (original submissions) and Form 40-33/A (amendments). Both variants share the same directory layout, metadata schema, and payload conventions; the only axis of variation is the formType field.
One record is a single complete Form 40-33 or Form 40-33/A submission, scoped to a single EDGAR accession number and packaged as a per-accession directory named with the 18-digit dashless accession number. Each directory contains a metadata.json header plus every non-image document the filer attached to the original EDGAR submission.
Registered investment companies named as nominal defendants in a stockholder derivative action — open-end funds, ETFs, closed-end funds, BDCs, and unit investment trusts — must file, as must each affiliated-party defendant (investment advisers, sub-advisers, distributors, officers, trustees, and control persons). Plaintiff shareholders never file Form 40-33, and operating-company issuers, private funds, and unregistered foreign funds are outside the Section 33 obligation entirely.
Section 33 sets an event-driven deadline keyed to the manner of service: five days when the pleading, verdict, judgment, or settlement paper is served personally on the defendant (or filed by the defendant in court), and ten days when service is made by other means. Each qualifying document resets the clock, so a single derivative case typically produces a sequence of filings across its docket.
The dataset includes all Form 40-33 and Form 40-33/A filings submitted to EDGAR from January 2003 to present, which is when electronic submission under the Form 40-33 code became the operative channel. Earlier Section 33 lodgments exist only as paper records outside EDGAR.
The dataset is distributed as calendar-month ZIP archives named <YYYY>-<MM>.zip. Inside each ZIP, per-accession record folders contain a metadata.json header alongside the filer's attached documents, which in practice are almost always PDFs (scanned or natively printed court bundles), occasionally accompanied by HTML cover or index pages. Image files such as .jpg and .gif exhibits are stripped during dataset preparation.
N-CSR summarizes material litigation in prose on a semiannual or annual cadence, and Item 103 disclosure does the same for operating-company issuers, annually in Form 10-K and quarterly in Form 10-Q; neither reproduces the underlying court papers. Form 40-33, by contrast, is event-driven (filed within five or ten days of service) and transmits the actual complaint, motion, order, or settlement text, making it the sole EDGAR source for primary derivative-action documents involving registered investment companies.