The Form NSAR-B Files dataset is a closed historical corpus of every EDGAR submission of Form NSAR-B and its amendment counterpart Form NSAR-B/A, the second-half semi-annual operational and financial report mandated by Section 30 of the Investment Company Act of 1940 for registered management investment companies. Each record represents one accession: a single registrant (or umbrella series trust) reporting cumulative figures through fiscal year-end, filed within 60 days of the close of the reporting period. The form is filed by registered open-end funds, closed-end funds, small business investment companies, and series trusts; advisers, sub-advisers, brokers, underwriters, and accountants are described inside the answer file rather than filing separately. Coverage begins in January 1994 and runs through the SEC's rescission of Form N-SAR in 2018, with a continuing tail of NSAR-B/A amendments against pre-2018 reporting periods. Filings are delivered as monthly ZIP containers organized by filing month, with each accession folder containing a per-filing metadata.json, the SGML-wrapped primary NSAR-B answer file, and any item-77 or item-102 exhibits.
The dataset packages every Form NSAR-B and Form NSAR-B/A filing submitted to EDGAR from January 1994 onward. Form N-SAR is a semi-annual operational and financial questionnaire prescribed under Section 30 of the Investment Company Act of 1940 and Rule 30a-1 (historically Rule 30b1-1) for registered management investment companies. The "B" variant covers the second half of the fund's fiscal year and reports cumulative annual figures through fiscal year-end. Unlike narrative annual reports, N-SAR is a highly structured questionnaire: a fixed schedule of numbered items and sub-items soliciting specific data points about adviser arrangements, portfolio activity, share transactions, distribution practices, brokerage allocation, and audited financial results. NSAR-B is distinguished from its first-half sibling NSAR-A by its fiscal-year-end reach, the requirement to attach the independent accountant's report on internal controls, and broader annual-disclosure items.
Form N-SAR was rescinded by the SEC in 2018 under the investment company reporting modernization rules (Release No. 33-10231 / IC-32314) and replaced by Form N-CEN, an annual census report on a redesigned XML schema. Historical NSAR-B and NSAR-B/A filings remain on EDGAR and are preserved verbatim in this dataset. No new NSAR-B filings have been submitted against post-2018 fiscal periods, but NSAR-B/A amendments to pre-transition reports can still appear and are captured as fresh accessions. The dataset is distributed as monthly ZIP containers organized as YYYY/YYYY-MM.zip by filing month, with each container holding one folder per accession accepted by EDGAR during that month. For each accession, the dataset preserves a per-filing metadata.json together with every document that was part of the original EDGAR submission, with the sole exception of image files, which are excluded by design. File types found in the dataset are TXT, JSON, HTML, XFD, FRM, FIL, and PDF, reflecting the variety of exhibit packaging conventions used by filers across the 1994 to 2018 span.
A single record in the Form NSAR-B Files dataset is one complete EDGAR submission of Form NSAR-B or Form NSAR-B/A, identified by its 18-digit accession number. Physically, the record is a folder whose name is the accession number with dashes stripped (for example 000100472619000074, corresponding to accession 0001004726-19-000074). The folder bundles a per-filing metadata.json together with every document that was part of the original EDGAR submission, with the sole exception of image files. The unit of analysis is the filing as a whole: one registrant (or one umbrella trust), one reporting period ending at fiscal year-end, one accession, and the full set of answer-file responses plus any required exhibits. There is no archive-root metadata.json; metadata is always co-located with the documents of the filing it describes.
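The folder-naming rule above can be sketched in Python. This is a minimal illustration, not part of the dataset's tooling: the helper names `folder_to_accession` and `files_by_accession` are hypothetical, and the code assumes each ZIP member's parent folder is the accession folder.

```python
import io
import re
import zipfile
from collections import defaultdict


def folder_to_accession(folder: str) -> str:
    """Re-insert dashes into an 18-digit folder name (10-2-6 grouping)."""
    if not re.fullmatch(r"\d{18}", folder):
        raise ValueError(f"not an 18-digit accession folder: {folder!r}")
    return f"{folder[:10]}-{folder[10:12]}-{folder[12:]}"


def files_by_accession(container) -> dict:
    """Map each accession folder in a monthly ZIP to its file names.

    `container` may be a path or a binary file-like object. Treating the
    parent folder of each member as the accession folder is an assumption
    about the archive layout.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(container) as zf:
        for name in zf.namelist():
            parts = name.rstrip("/").split("/")
            if name.endswith("/") or len(parts) < 2:
                continue  # skip directory entries and top-level files
            groups[parts[-2]].append(parts[-1])
    return dict(groups)
```

Calling `folder_to_accession("000100472619000074")` yields the dash-formatted accession number used in `metadata.json`, which makes it easy to join folders back to EDGAR references.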
A record stacks three content layers:
- A per-filing metadata file (metadata.json) that catalogs the filing's identifiers, registrant entities, document inventory, and SEC.gov references.
- The primary NSAR-B answer file, wrapped in an SGML <DOCUMENT> envelope.
- Any exhibit documents, each wrapped in its own SGML <DOCUMENT> envelope.

The per-document SGML <DOCUMENT> wrapper has the canonical EDGAR shape: header tags <TYPE>, <SEQUENCE>, and <FILENAME> appear without closing tags, followed by a <TEXT> ... </TEXT> block bracketing the payload, all enclosed by <DOCUMENT> ... </DOCUMENT>. The wrapper is consistent across the answer file and all exhibits; only the payload inside <TEXT> differs.
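A minimal Python sketch of unwrapping that envelope follows. The function name `parse_document` is hypothetical, and the code assumes each header tag sits on its own line before `<TEXT>`, as described above; real files may carry additional header tags that this sketch ignores.

```python
import re

# Header tags named in the wrapper description; they have no closing tags.
HEADER_TAG = re.compile(r"^<(TYPE|SEQUENCE|FILENAME)>(.*)$", re.MULTILINE)
TEXT_BLOCK = re.compile(r"<TEXT>\n?(.*?)</TEXT>", re.DOTALL)


def parse_document(raw: str) -> dict:
    """Split one SGML-wrapped document into header fields and payload."""
    head, _, _ = raw.partition("<TEXT>")  # only scan headers before the payload
    fields = {m.group(1).lower(): m.group(2).strip() for m in HEADER_TAG.finditer(head)}
    body = TEXT_BLOCK.search(raw)
    fields["text"] = body.group(1) if body else ""
    return fields


sample = """<DOCUMENT>
<TYPE>NSAR-B
<SEQUENCE>1
<FILENAME>answer.fil
<TEXT>
000 B000000 04/30/2018
</TEXT>
</DOCUMENT>
"""
doc = parse_document(sample)
```

Restricting the header scan to the text before `<TEXT>` avoids misreading payload lines that happen to begin with an angle-bracket tag.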
The metadata object is a single flat JSON dictionary describing the filing and its document inventory. The intentional fields are:
- formType: either NSAR-B or NSAR-B/A (the /A suffix denotes an amendment to a previously submitted report).
- accessionNo: the dash-formatted EDGAR accession number (e.g. 0001004726-19-000074).
- linkToFilingDetails, linkToTxt, linkToHtml: SEC.gov URLs to the EDGAR filing index page, the complete-submission text file, and the primary HTML document respectively.
- linkToXbrl: present but always empty, because N-SAR predates XBRL tagging and no XBRL instance documents are ever attached to NSAR filings.
- description: the EDGAR human-readable form description, including the literal [Amend] tag for amendments.
- filedAt: ISO-8601 timestamp with timezone offset recording when EDGAR accepted the submission.
- id: an opaque dataset-internal hex identifier.
- documentFormatFiles[]: one entry per document in the EDGAR submission. Each entry carries a string sequence number, a string size in bytes, the SEC.gov documentUrl, the EDGAR type label (e.g. NSAR-B, NSAR-B/A, EX-99.77B ACCT LTTR), and an optional description. The array also contains a rollup entry for the complete-submission text file, recognizable by a blank " " sequence and type and the literal description "Complete submission text file"; this rollup is referenced but not extracted into the filing folder.
- dataFiles[]: always an empty array for this dataset; N-SAR has no structured data payloads to enumerate.
- entities[]: one object per registrant filing entity. Each carries companyName, a 10-digit zero-padded cik, a type describing the entity's role in the filing, fiscalYearEnd as an MMDD string ("0430" denotes April 30), a 2-letter stateOfIncorporation, the regulatory act code ("40" for the Investment Company Act of 1940), the SEC fileNo (e.g. 811-22497), a filmNo, and an irsNo (often 000000000 for series trusts).

Numeric-looking metadata values (size, cik, fiscalYearEnd, act, filmNo, irsNo) are uniformly serialized as JSON strings, preserving leading zeros and original EDGAR formatting.
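As a rough sketch of how the document inventory might be read, the Python below skips the rollup entry and converts only the fields that are safe to treat as integers. The function name `list_documents`, the exact key spellings `sequence` and `size`, and the sample values (including the placeholder URLs) are assumptions for illustration.

```python
def list_documents(metadata: dict) -> list:
    """Return the on-disk documents of a filing, skipping the rollup entry."""
    docs = []
    for entry in metadata.get("documentFormatFiles", []):
        # The complete-submission rollup has a blank sequence and is not
        # extracted into the filing folder.
        if not entry.get("sequence", "").strip():
            continue
        docs.append({
            "sequence": int(entry["sequence"]),
            "type": entry["type"],
            "size": int(entry["size"]),
            "url": entry["documentUrl"],
        })
    return docs


# Hypothetical metadata fragment; URLs are placeholders.
metadata = {
    "accessionNo": "0001004726-19-000074",
    "documentFormatFiles": [
        {"sequence": "1", "size": "20480", "type": "NSAR-B",
         "documentUrl": "https://example.invalid/answer.fil"},
        {"sequence": "2", "size": "4096", "type": "EX-99.77B ACCT LTTR",
         "documentUrl": "https://example.invalid/Audit.txt"},
        {"sequence": " ", "size": "30000", "type": " ",
         "description": "Complete submission text file",
         "documentUrl": "https://example.invalid/full.txt"},
    ],
}
documents = list_documents(metadata)
```

Identifiers such as `cik`, `fiscalYearEnd`, and `irsNo` are deliberately left out of the integer conversion, since casting them would drop their leading zeros.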
A critical interpretive nuance concerns series trusts. When a single NSAR-B is filed on behalf of multiple ETF or mutual-fund series under one umbrella registrant, only the umbrella trust appears in entities[]. The individual series names and their per-series financial data are not enumerated as separate entity rows: they are declared inside the answer file under item 007 and then propagated across subsequent items through the answer file's positional series-index encoding. Analysis at the series level must therefore parse the answer file rather than relying on metadata alone.
The principal NSAR-B payload is delivered as a fixed-format answer file, conventionally named answer.fil (the .fil extension is an EDGAR/N-SAR convention, though filer-chosen names occur). It is not prose, JSON, or XML: it is a line-oriented, column-positional encoding of responses to the standardized N-SAR question schedule. Each non-blank data line follows the structure:
    <3-digit item> <1-letter sub-item> <6-digit position code> <value>
For example, a line such as 000 B000000 04/30/2018 records item 000, sub-item B, position code 000000, with the value 04/30/2018 (the period-of-report end date). The six-digit position code packs two semantic dimensions and is the key to disaggregating multi-series, multi-adviser, multi-broker disclosures:
- Position codes of 000000 are registrant-wide.
- Codes of the form 0NNNNN enumerate per-series records (000100 = series 1, 000200 = series 2, and so on), tying back to the series declared in item 007.
- Letter pairs (AA, BB, CC, and so on) identify sub-record categories such as advisers (00AA01 = adviser 1 of the registrant), sub-advisers, principal underwriters, custodians, transfer agents, and brokers; the trailing two digits index instances within that category.

Numeric values are right-aligned within fixed-width value columns; short alphanumeric values flow leftward from the value-column origin.
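The line grammar and position-code rules above can be sketched as a small Python parser. This is an illustration under stated assumptions rather than an official decoder: the names `parse_line` and `describe_position` are hypothetical, and the slicing of the position code (middle two characters as series index or letter category, last two as instance) is inferred only from the three examples given here (000100, 000200, 00AA01).

```python
import re
from typing import NamedTuple, Optional

# item (3 digits), sub-item (1 letter), position code (6 chars), value.
LINE_RE = re.compile(r"^(\d{3})\s+([A-Z])\s*([0-9A-Z]{6})\s+(.*?)\s*$")


class Answer(NamedTuple):
    item: str
    sub_item: str
    position: str
    value: str


def parse_line(line: str) -> Optional[Answer]:
    """Parse one answer-file line; return None for non-data lines."""
    m = LINE_RE.match(line)
    return Answer(*m.groups()) if m else None


def describe_position(code: str) -> dict:
    """Classify a position code (slicing is an assumption, see lead-in)."""
    if code == "000000":
        return {"scope": "registrant"}
    middle, tail = code[2:4], code[4:6]
    if middle.isalpha():
        return {"scope": "sub-record", "category": middle, "instance": int(tail)}
    return {"scope": "series", "series": int(middle)}


row = parse_line("000 B000000 04/30/2018")
```

Values are kept as strings so that dates, identifiers, and right-aligned numerics can be interpreted per item later, once each key has been reconciled against the N-SAR schedule.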
Items in the N-SAR schedule are grouped by topic:
Two layout artifacts intrude on the otherwise grid-like file. Page-break markers of the form <PAGE> PAGE N appear throughout, reflecting the printed-form heritage of N-SAR; they are presentation-only and do not delimit logical records. At the foot of the answer file, outside the numbered grid, literal SIGNATURE and TITLE lines carry the signing officer's name and title. This is the only place where the responsible officer is disclosed in machine-parseable form; it does not propagate into metadata.json.
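A hedged Python sketch of handling both artifacts: it drops page-break markers and pulls the signer lines out of the data stream. The helper name `clean_answer_file` is hypothetical, and the code assumes the SIGNATURE and TITLE keywords start their lines and are followed by whitespace and the value.

```python
import re

PAGE_RE = re.compile(r"^\s*<PAGE>\s+PAGE\s+\d+\s*$")
SIGNER_RE = re.compile(r"^\s*(SIGNATURE|TITLE)\s+(.*\S)\s*$")


def clean_answer_file(text: str):
    """Return (data_lines, signer) with layout artifacts removed."""
    data_lines, signer = [], {}
    for line in text.splitlines():
        if PAGE_RE.match(line):
            continue  # presentation-only page break
        m = SIGNER_RE.match(line)
        if m:
            signer[m.group(1).lower()] = m.group(2)
            continue
        if line.strip():
            data_lines.append(line)
    return data_lines, signer


sample = (
    "000 B000000 04/30/2018\n"
    "<PAGE>      PAGE  2\n"
    "\n"
    "001 A000000 EXAMPLE TRUST\n"
    "SIGNATURE   JANE DOE\n"
    "TITLE       TREASURER\n"
)
lines, signer = clean_answer_file(sample)
```

The sample registrant and officer names are invented placeholders; only the line shapes follow the description above.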
NSAR-B records routinely include exhibit documents as separate SGML-wrapped files alongside the answer file. The most common is EX-99.77B ACCT LTTR, the independent registered public accountant's report on the fund's system of internal accounting controls, required by Item 77B. This exhibit is plain English prose, typically hard-wrapped at 60-70 characters per line, signed by the audit firm, and describing the auditors' procedures and findings.
Other item 77 sub-letter exhibits attach when the corresponding sub-item is triggered, including:
Item 102 attachments cover matters related to series-trust structures. Each exhibit's type label in documentFormatFiles[] follows the EX-99.<sub-item> <SHORT-LABEL> form. Exhibit filenames are filer-chosen (e.g. Audit.txt, 77B.htm) but always match the document name shown on the EDGAR filing index.
Each record's folder contains the per-filing metadata.json, the SGML-wrapped primary NSAR-B answer file, and the SGML-wrapped exhibit documents enumerated by EDGAR for that submission. JSON applies only to metadata.json; the answer file is almost always a .fil text payload; exhibits arrive as .txt, .htm/.html, .pdf, .frm, or .xfd depending on filer choice and era.
Image files are explicitly excluded — filers occasionally attached signature graphics, organizational charts, or scanned letterhead that would inflate the dataset without adding parseable content. The complete-submission rollup <accession>.txt (the concatenation of every part wrapped in a single SGML stream that EDGAR produces for download convenience) is referenced in documentFormatFiles[] but is not extracted into the folder, since its contents are fully recoverable by concatenating the per-document files already present. N-SAR predates the Commission's XBRL programs entirely, so linkToXbrl is structurally empty and dataFiles[] is always an empty array; there is no XBRL instance, no inline XBRL, and no structured financial data file in the record.
Form N-SAR remained substantially stable from its introduction through its 2018 rescission, but several content adjustments are visible across the dataset:
The early 1994-1996 portion of the dataset uses the original N-SAR answer schema; subsequent SEC updates revised the item count, refined sub-item labeling, and added items capturing newer fund practices (master-feeder structures, multi-class share arrangements, exchange-traded fund characteristics). The positional answer-file encoding accommodated these expansions without restructuring: new series, advisers, classes, or sub-advisers are added simply as new position-code rows under the same <item><sub-item><position-code> grammar.
Across the entire span of the dataset, the primary NSAR-B answer payload is delivered as a positionally encoded plain-text answer file inside an SGML wrapper. This format is unchanged from 1994 through the form's 2018 sunset; N-SAR never adopted HTML, XBRL, or iXBRL for its primary payload. What evolves over time is the exhibit layer. Early filings render item-77 exhibits as ASCII text within the SGML envelope. From the late 1990s onward, HTML-tagged exhibits become common, and PDF exhibits appear when filers chose to attach signed letters or formal accountant reports as PDFs. The less common .frm and .xfd file types correspond to specialized EDGAR exhibit conventions used in particular eras. The per-filing metadata.json is a dataset-side packaging artifact applied uniformly across the entire archive; it is not part of the original EDGAR submission.
Several characteristics of NSAR-B records warrant care during extraction and analysis:
- Parsers must read the <item><sub-item><position-code> key on each line and reconcile it against the official N-SAR schedule to map values into meaningful fields. The position code is not opaque: its internal structure (series index, sub-record category, sub-record instance) is essential for joining per-series and per-adviser rows back to the entities they describe.
- Individual series are not enumerated in entities[]. Any series-level analysis must parse the answer file.
- Amendments (NSAR-B/A) replace or correct earlier submissions but are filed as fresh accessions; the dataset does not link amendments to their originals beyond the file number and registrant CIK, so amendment chains must be reconstructed by joining on fileNo, cik, and the item-000 period of report.
- <PAGE> markers inside the answer file are layout artifacts of the printed N-SAR form and do not delimit logical records; parsers should strip them before tokenizing.
- The complete-submission text file is referenced in documentFormatFiles[] but not present on disk; downstream consumers expecting a single concatenated artifact must build it themselves or fall back to the per-document files.
- Numeric-looking metadata values are serialized as strings and should stay strings, since numeric casts would break the MMDD and CCNNNN patterns.
- The signing officer's identity (the SIGNATURE / TITLE lines) does not propagate into metadata.json and must be extracted from the answer payload directly.

Each NSAR-B record is filed by a registered management investment company acting as the EDGAR registrant. The filing is signed by an authorized officer of the fund (typically the treasurer or principal financial officer). Advisers, sub-advisers, principal underwriters, custodians, transfer agents, accountants, and affiliated brokers are described inside the answer file but have no NSAR filing obligation of their own. Where a series trust is the registrant, the trust is the single legal filer; its constituent series (mutual fund or ETF portfolios) are encoded inside the answer file at item 007, not as separate filers.
Form N-SAR — and therefore the NSAR-A / NSAR-B half-year pair — applied to management investment companies registered under the Investment Company Act of 1940:
Excluded from the dataset:
NSAR-B is periodic and schedule-driven, not event-driven. The trigger is the close of the registrant's fiscal year:
Because the trigger is each fund's own fiscal year-end, NSAR-B filings are distributed across all twelve calendar months. There is no materiality threshold: every registered management investment company in operation during the second half of its fiscal year had a filing obligation, including funds winding down or in the process of deregistering.
Form N-SAR was prescribed under:
Filings carry the 40 act designator in EDGAR metadata, distinguishing them from 1933 Act and 1934 Act filings.
The statutory deadline is 60 days after the close of the reporting period — for NSAR-B, 60 days after fiscal year-end. The deadline is uniform across all management investment company registrants; there is no accelerated-filer tier of the kind that applies to Exchange Act periodic reports. Late filings are accepted by EDGAR but represent a Section 30 reporting deficiency. Funds that deregistered mid-period generally remained obligated to file a final NSAR-B for the partial period.
An NSAR-B/A is an amendment to a previously filed NSAR-B, filed by the same registrant under the same CIK and file number. Amendments are triggered by the need to correct numeric responses, replace exhibits (such as the Item 77B accountants' internal-controls letter), or supplement missing disclosures. The amendment retains the original period-of-report date in item 000 B; only the formType changes to NSAR-B/A. There is no statutory deadline for amendments — an NSAR-B/A can be filed years after the original, and multiple amendments to the same period are possible.
The dataset is therefore closed-ended: no genuinely new NSAR-B original filings arise after fund fiscal year-ends in early-to-mid 2018, but NSAR-B/A amendments against the historical record continue to arrive.
For series trusts, the entities[] metadata lists only the trust as filer; per-series data must be parsed from the positional .fil payload under item 007.

Form NSAR-B belongs to a tightly clustered family of registered investment company filings under the Investment Company Act of 1940. The same registrant typically files several of these in parallel, which is the main source of confusion. The comparisons below isolate where each adjacent filing overlaps with NSAR-B and where the boundary lies.
NSAR-A is the direct sibling: identical data structure, identical item-numbered answer format, identical operational and financial items (adviser and sub-adviser identification, portfolio turnover, sales of shares, sales loads, 12b-1 fees, brokerage allocation, income and expense). The difference is purely the reporting window. NSAR-A covers the first six months of the fiscal year; NSAR-B covers the second six months plus full-year roll-up items (accountant-related disclosures and other annual-only fields) that NSAR-A does not carry. For any continuous panel of N-SAR operational data, NSAR-A is the required complement.
N-CEN replaced the entire N-SAR regime under the SEC's 2016 reporting modernization rules. It is annual rather than semi-annual, structured XML rather than FRM/XFD answer forms, and revises the item set (cybersecurity, securities lending, line-of-credit usage, updated classifications). NSAR-B effectively terminates with fiscal periods ending on or before May 31, 2018; N-CEN picks up after. The content overlap is substantial but item numbering and granularity differ, so linking the two requires an explicit mapping layer. NSAR-B is the historical leg (1994 to mid-2018); N-CEN is the forward leg.
N-CSR (annual) and Form N-CSRS (semi-annual) carry the shareholder-facing reports: schedule of investments, audited or unaudited financial statements, MD&A-style commentary, and Sarbanes-Oxley officer certifications. Annual N-CSR shares NSAR-B's fiscal-year window but differs in purpose and form: N-CSR is narrative and investor-directed; NSAR-B is a structured operational questionnaire directed to the Commission. Overlapping items such as portfolio turnover and expense ratios appear in N-CSR only as embedded line items inside financial statements, not as discrete machine-readable fields. N-CSR is not a substitute for NSAR-B on brokerage commission allocation, sub-adviser identity, or 12b-1 plan mechanics.
N-Q reported portfolio holdings for the first and third fiscal quarters from 2004 to 2019. It overlaps the NSAR-B era for roughly fifteen years but covers a different layer: line-item holdings, not operational metrics. NSAR-B reports a single portfolio turnover number; N-Q (plus N-CSR holdings schedules) shows the positions behind that turnover. Complementary, not substitutable.
N-PORT replaced N-Q in 2018-2019 with monthly portfolio reporting in structured XML (third month of each quarter made public 60 days after quarter end). Like N-Q, it is holdings-focused. It has no analogue for NSAR-B's adviser arrangements, brokerage practices, distribution fees, or sales-load economics. It is also high-frequency and natively structured, while NSAR-B is semi-annual and produced from FRM/XFD answer forms that require parsing.
Form N-MFP is the monthly portfolio and operational report for money market funds in structured XML. For the money market subset of NSAR-B filers, N-MFP provides far higher-frequency holdings, weighted average maturity, shadow NAV, and shareholder-flow data. NSAR-B covers a broader filer population (all management investment companies) and a broader operational scope (advisers, 12b-1, brokerage), but is shallower per filing for money market funds.
Form 24F-2 is filed annually by open-end funds and unit investment trusts to pay registration fees based on net share sales for the fiscal year. Its overlap with NSAR-B is narrow: gross sales of shares and the resulting fee calculation. NSAR-B carries the same share-sales data plus the full operational item set. 24F-2 is a partial cross-check for the sales-of-shares slice only.
Form 485APOS (subject to SEC review) and Form 485BPOS (immediately effective) update the fund's prospectus and Statement of Additional Information. They disclose adviser and sub-adviser identities, 12b-1 plan terms, fee tables, and sales-load schedules to investors. Overlap with NSAR-B is on identification and permitted structure, but the framing is forward-looking and disclosure-oriented (what the fund is permitted to do and charge), while NSAR-B is retrospective and operational (what the fund actually did and charged during the period). NSAR-B carries realized values; 485 filings describe permitted ranges and plan terms.
NSAR-B is distinct in this neighborhood for three reasons. First, it is the only filing family that aggregates fund-level operational disclosure at semi-annual cadence into a single structured questionnaire spanning advisers and sub-advisers, brokerage allocation and soft-dollar practices, 12b-1 payments, sales-load economics, portfolio turnover, and income and expense components. Second, it is fixed in historical scope: the 1994-2018 N-SAR regime has no continuous successor with identical item structure, since N-CEN reorganized the taxonomy and N-PORT, N-MFP, N-CSR, and 24F-2 each carry only fragments of the original NSAR content. Third, NSAR-B is the period-end leg of the N-SAR pair, carrying year-end-only items that NSAR-A omits.
NSAR-B is therefore complementary, not substitutable, with portfolio-holdings forms (N-Q, N-PORT, N-MFP) and shareholder-report forms (N-CSR, N-CSRS). It is only partially substitutable with 24F-2 (share-sales data) and with 485APOS/485BPOS (adviser and fee structure identification). For continuous coverage of the same operational items beyond mid-2018, N-CEN is the required forward extension; for full-fiscal-year coverage inside the N-SAR era, NSAR-A is the required complement.
NSAR-B captures fiscal-year operations for registered management investment companies from 1994 through the transition to Form N-CEN. The professional users below draw on its standardized items covering 12b-1 plans, sub-adviser identities, advisory fee schedules, brokerage allocation, portfolio turnover, sales loads, and expense breakdowns.
Empirical researchers build pre-N-CEN panels on fund fees, governance, and industry structure. Standardized items on 12b-1 rates, advisory fee schedules, sub-adviser identification, sales loads, CDSCs, turnover, and aggregate brokerage commissions support studies of fee dispersion, soft-dollar use, sub-advisory delegation, and turnover-expense relationships. Many use NSAR-B to extend vendor fee histories backward or to validate commercial data against primary disclosures.
Competitive-intelligence teams benchmark distribution economics and sub-advisory mandates across complexes. 12b-1 fees, front-end and deferred loads, principal underwriter identity, and transfer-agent relationships feed channel-economics models; sub-adviser identification, breakpoints, and period-end AUM map sub-advisory market share. Long horizons reveal multi-decade trends in fee compression, share-class proliferation, and portfolio-management outsourcing.
Fund counsel and in-house compliance staff retrieve historical NSAR-B and NSAR-B/A filings for regulatory inquiries, internal audits, and board briefings. Amendments are central because they expose corrections and restatements. Review typically focuses on historical brokerage allocation, affiliated transactions, and 12b-1 plan operation relevant to Rule 38a-1 programs.
Testifying experts assemble fee, expense, and AUM series for the funds at issue and comparator funds. Advisory and sub-advisory rates, total expenses, net assets, and 12b-1 fees support peer comparisons, breakpoint analysis, and economies-of-scale arguments. Multi-year series document the fee pattern across the limitations period and over a fund's full history.
Historians of the fund industry and analysts at regulatory agencies and policy institutes use NSAR-B as a primary source on the rise of no-load funds, the diffusion of multi-class structures, adviser consolidation, and the effects of Rule 12b-1. Related-party items inform work on independent-director behavior; aggregate brokerage data informs soft-dollar policy review.
Vendors of commercial fund databases, plus engineering teams at asset managers and consultancies, parse NSAR-B's numbered items into normalized tables of advisers, sub-advisers, commissions, turnover, sales and redemption activity, and 12b-1 spend. NSAR-B is treated as the canonical source for backfilling pre-N-CEN fields and reconciling vendor discrepancies; NSAR-B/A filings track corrections.
Forensic accountants reconstruct how sales charges, advisory and sub-advisory fees, commissions, soft-dollar arrangements, and 12b-1 payments were recorded in specific periods. Items on payments to affiliated brokers, directed brokerage, and aggregate commissions support inquiries into self-dealing, undisclosed conflicts, and misallocated expenses. Accession numbers and fiscal periods anchor findings to discrete submissions.
Independent directors, board counsel, and 15(c) consultants use NSAR-B to support advisory-contract renewals. Advisory and sub-advisory rates, expense ratios, distribution fees, and brokerage practices feed peer comparisons and trend analyses that boards are expected to weigh. The long history documents how fund economics have evolved across contract cycles.
Economists examining intermediation costs and best-execution policy use NSAR-B's commission totals, identified commission recipients, and directed-brokerage disclosures to study trading-cost externalities, soft-dollar usage, and revenue sharing.
Developers training retrieval-augmented systems on fund disclosures use NSAR-B to cover a regulatory regime sparsely represented in modern fund data. Structured numeric items plus narrative attachments support both extractive QA on fund-level facts and broader retrieval over legacy operational disclosures, including terminology and item structures that predate current forms.
The Form NSAR-B Files dataset supports the concrete workflows below. Each one ties to specific answer-file items, position codes, exhibits, or metadata fields.
Parse item 028 (12b-1 plan fees) from the answer file and divide by item 074 period-end net assets per series, joining on the series enumeration from item 007 and the 0NNNNN position codes that disaggregate multi-series trusts. Output a year-by-series panel of distribution-fee burden by share class, suitable for fee-compression trend analysis across the full pre-N-CEN era. Use entities[].cik and fileNo from metadata.json plus the item-000 period-of-report to align fiscal years across umbrella registrants.
Extract sub-adviser names and CRD/file identifiers from item 008 and the related sub-adviser items, keyed by the 00AA01, 00AA02, ... position-code instances. Pivot to an adviser-to-sub-adviser edge list with weights from item 075 average net assets, then track delegations year by year using the item-000 period date. Output supports studies on multi-manager mandate growth, sub-advisory market share by complex, and rebalancing of delegation following adviser consolidation.
For a target fund and its peer set, pull advisory fee rates and breakpoints from item 008, total expenses from item 072, average net assets from item 075, and 12b-1 payments from item 028 across the full available fiscal history. Compute effective fee rates net of waivers and produce per-year peer tables for Section 36(b) analysis, anchored to specific accession numbers, so each row in the litigation exhibit is traceable to a discrete EDGAR filing.
EX-99.77B ACCT LTTR exhibit corpus for internal-control language

Concatenate the SGML-wrapped EX-99.77B ACCT LTTR payloads identified via documentFormatFiles[].type across all accessions, normalize hard-wrapped 60-70 character text, and run topic models or template clustering keyed on audit firm signature. Produce a corpus of internal-control attestations across two decades of fund auditors, useful for studying boilerplate drift, audit-firm-specific language, and post-SOX disclosure shifts.
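The normalization step for hard-wrapped exhibit text might look like the Python sketch below. The function name `unwrap_paragraphs` is hypothetical, and the code assumes paragraphs are separated by at least one blank line, which may not hold for every filer's letter.

```python
import re


def unwrap_paragraphs(text: str) -> list:
    """Join hard-wrapped lines into one string per paragraph."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return [
        " ".join(line.strip() for line in p.splitlines() if line.strip())
        for p in paragraphs
        if p.strip()
    ]


letter = (
    "In planning and performing our audit of the\n"
    "financial statements, we considered internal\n"
    "control.\n"
    "\n"
    "This report is intended solely for the\n"
    "information of management.\n"
)
paragraphs = unwrap_paragraphs(letter)
```

The sample letter text is invented for illustration; real exhibits should first be unwrapped from their SGML `<TEXT>` block.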
Filter documentFormatFiles[] entries whose type matches EX-99.77K (changes in certifying accountant), EX-99.77C (matters submitted to a vote), and EX-99.77F (changes in directors or principal officers). Join to entities[].cik and filedAt to build a registrant-level event timeline of auditor changes, contested votes, and board turnover. Supports compliance monitoring workflows and event-study research designs.
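One way to sketch this filter in Python is shown below. The function name `exhibit_events`, the event labels, and the sample metadata are illustrative assumptions; the type prefixes are the three named above.

```python
# Exhibit type prefixes and the event each is taken to signal.
EVENT_PREFIXES = {
    "EX-99.77K": "change in certifying accountant",
    "EX-99.77C": "matters submitted to a vote",
    "EX-99.77F": "change in directors or principal officers",
}


def exhibit_events(metadata: dict) -> list:
    """Return one event row per matching exhibit in a filing's metadata."""
    ciks = [e["cik"] for e in metadata.get("entities", [])]
    events = []
    for doc in metadata.get("documentFormatFiles", []):
        doc_type = doc.get("type", "").strip()
        for prefix, label in EVENT_PREFIXES.items():
            if doc_type.startswith(prefix):
                events.append({
                    "cik": ciks[0] if ciks else None,
                    "filedAt": metadata.get("filedAt"),
                    "event": label,
                    "type": doc_type,
                })
    return events


# Hypothetical metadata fragment.
sample_metadata = {
    "filedAt": "2018-06-28T16:30:00-04:00",
    "entities": [{"cik": "0001004726"}],
    "documentFormatFiles": [
        {"type": "NSAR-B"},
        {"type": "EX-99.77B ACCT LTTR"},
        {"type": "EX-99.77K CHNG ACCNT"},
    ],
}
events = exhibit_events(sample_metadata)
```

Sorting the collected rows by `cik` and `filedAt` across all accessions would then give the registrant-level timeline.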
Identify NSAR-B/A filings via formType and the [Amend] tag in description, then group by (cik, fileNo, item-000 period-of-report) to recover the original-to-amended sequence the dataset does not explicitly link. Diff the answer files line by line on the <item><sub-item><position-code> key to surface restated values in items 072 (income/expense), 074 (balance sheet), or 028 (12b-1). Output feeds forensic accounting reviews and vendor-data reconciliation pipelines.
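The grouping and diffing steps can be sketched as follows. This Python sketch assumes each filing has already been flattened into a dict with `cik`, `fileNo`, `period`, and `filedAt` keys, and that each answer file has been parsed into a mapping keyed by (item, sub-item, position code); the function names and the sample sub-item letters are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def amendment_chains(filings: list) -> dict:
    """Group filings by (cik, fileNo, period) and order them by filedAt."""
    chains = defaultdict(list)
    for filing in filings:
        chains[(filing["cik"], filing["fileNo"], filing["period"])].append(filing)
    for chain in chains.values():
        # Parse the ISO-8601 timestamp so differing UTC offsets sort correctly.
        chain.sort(key=lambda f: datetime.fromisoformat(f["filedAt"]))
    return dict(chains)


def diff_answers(original: dict, amended: dict) -> dict:
    """Return {key: (before, after)} for every key whose value differs."""
    changes = {}
    for key in original.keys() | amended.keys():
        before, after = original.get(key), amended.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


filings = [
    {"accessionNo": "A2", "formType": "NSAR-B/A", "cik": "0001004726",
     "fileNo": "811-22497", "period": "04/30/2018",
     "filedAt": "2019-03-01T10:00:00-05:00"},
    {"accessionNo": "A1", "formType": "NSAR-B", "cik": "0001004726",
     "fileNo": "811-22497", "period": "04/30/2018",
     "filedAt": "2018-06-28T16:30:00-04:00"},
]
chains = amendment_chains(filings)
changes = diff_answers(
    {("072", "A", "000100"): "1500"},
    {("072", "A", "000100"): "1550", ("074", "N", "000100"): "98000"},
)
```

A value of `None` on either side of a change marks a line that exists in only one of the two filings.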
Parse items 007 (series), 008 (advisers), 010 (sub-advisers), 020-021 (principal underwriters and transfer agents), 026-027 (brokerage and soft-dollar), 062-064 (sales loads and CDSCs), 070 (portfolio turnover), and 072-075 (financials) into a normalized adviser/series/period schema. Map each row to its source accession, document sequence, and answer-file line number so downstream vendor tables can carry primary-source provenance for every cell.
Aggregate item 026 and item 027 commission totals and broker enumerations (using the 00BB01, 00BB02, ... broker-instance codes) by registrant CIK and fiscal year. Compare directed-brokerage share against aggregate commissions paid to affiliated brokers to produce a multi-decade view of soft-dollar intensity, supporting market-structure research and best-execution policy review.
Parse the trailing SIGNATURE and TITLE lines at the foot of each answer file (these do not propagate into metadata.json) and join to the registrant's cik and the item-000 period of report. Build a registrant-by-period roster of who signed each filing, then cross-reference against item-77F director-change exhibits and item 008 adviser identities to study officer-level continuity and accountability over a fund's lifecycle.
Dataset Index JSON API: [https://api.sec-api.io/datasets/form-nsarb-files.json](https://sec-api.io/datasets)
This endpoint returns the dataset's metadata and the complete list of container files available for download. The response includes the dataset name, description, last updated timestamp, earliest sample date (1994-01-01), total record and size counters, covered form types (NSAR-B, NSAR-B/A), container format (ZIP), and the file types contained inside each archive (TXT, JSON, HTML, XFD, FRM, FIL, PDF). It also exposes the full dataset download URL and a containers array listing every individual archive with its key, size, record count, updated timestamp, and direct download URL. This endpoint does not require an API key.
The index is the recommended way to monitor the dataset between refresh runs. By comparing the updatedAt field of each container against the previous index snapshot, downstream pipelines can detect which monthly archives changed in the latest refresh and download only those containers instead of pulling the full dataset.
Example response:
{
  "datasetId": "1f13365b-9ae0-68f5-8b45-f45d2ffd5b28",
  "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-nsarb-files.zip",
  "name": "Form NSAR-B Files Dataset",
  "updatedAt": "2026-04-14T11:12:24.793Z",
  "earliestSampleDate": "1994-01-01",
  "totalRecords": 313215,
  "totalSize": 980191659,
  "formTypes": ["NSAR-B", "NSAR-B/A"],
  "containerFormat": "ZIP",
  "fileTypes": ["TXT", "JSON", "HTML", "XFD", "FRM", "FIL", "PDF"],
  "containers": [
    {
      "downloadUrl": "https://api.sec-api.io/datasets/form-nsarb-files/2026/2026-03.zip",
      "key": "2026/2026-03.zip",
      "size": 13818783,
      "records": 154,
      "updatedAt": "2026-04-14T11:12:24.793Z"
    }
  ]
}
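The snapshot comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not an official client: the function name `changed_containers` is hypothetical, and it assumes both snapshots are already-parsed index responses whose `containers` entries carry the `key` and `updatedAt` fields shown in the example response.

```python
from typing import Any


def changed_containers(
    previous: dict[str, Any], current: dict[str, Any]
) -> list[dict[str, Any]]:
    """Return containers that are new or whose updatedAt differs.

    Both arguments are parsed index responses (hypothetical helper;
    field names follow the example response).
    """
    # Map each container key to the updatedAt value seen last time.
    seen = {c["key"]: c["updatedAt"] for c in previous.get("containers", [])}
    # A container is "changed" if it is absent from the old snapshot
    # or its timestamp no longer matches.
    return [
        c
        for c in current.get("containers", [])
        if seen.get(c["key"]) != c["updatedAt"]
    ]
```

A pipeline would typically persist the last index response to disk, call this helper after each new fetch, and download only the `downloadUrl` of each returned container.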
Download Entire Dataset: https://api.sec-api.io/datasets/form-nsarb-files.zip?token=YOUR_API_KEY
Downloads the complete dataset as a single ZIP archive containing every NSAR-B and NSAR-B/A filing from January 1994 to the latest refresh. This endpoint requires an API key.
Download Single Container: https://api.sec-api.io/datasets/form-nsarb-files/2026/2026-03.zip?token=YOUR_API_KEY
Downloads one monthly container instead of the full archive. Container keys follow a YYYY/YYYY-MM.zip pattern; the full list is available in the dataset index JSON API. This endpoint requires an API key.
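As a rough sketch, a container key and its download URL can be derived from a filing year and month. The helper names `container_key` and `container_url` are hypothetical; the base URL and the `token` query parameter are taken from the example URL above.

```python
from datetime import date
from urllib.parse import urlencode

# Base path taken from the single-container example URL.
BASE_URL = "https://api.sec-api.io/datasets/form-nsarb-files"


def container_key(year: int, month: int) -> str:
    """Build a YYYY/YYYY-MM.zip key (hypothetical helper)."""
    date(year, month, 1)  # raises ValueError for an invalid year or month
    return f"{year:04d}/{year:04d}-{month:02d}.zip"


def container_url(year: int, month: int, token: str) -> str:
    """Build the download URL for one monthly container."""
    query = urlencode({"token": token})
    return f"{BASE_URL}/{container_key(year, month)}?{query}"
```

Building keys this way is only a convenience; the index response remains the authoritative list of which containers actually exist.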
The dataset covers Form NSAR-B and Form NSAR-B/A. NSAR-B is the second-half semi-annual report on Form N-SAR for registered management investment companies, filed within 60 days of fiscal year-end. NSAR-B/A submissions are amendments to previously filed NSAR-B reports.
One record is a single complete EDGAR submission of Form NSAR-B or NSAR-B/A, identified by an 18-digit accession number. Physically, the record is a folder named after the accession number (with dashes stripped) that contains a per-filing metadata.json plus every original EDGAR document for that submission except image files.
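The mapping between a dashed accession number and its folder name can be sketched as follows. The function names are hypothetical, and the sketch assumes the standard EDGAR 10-2-6 digit grouping for the dashed form; the accession number in the usage note is made up for illustration.

```python
import re

# Dashed EDGAR form: 10 digits, 2 digits, 6 digits (18 digits in total).
_DASHED = re.compile(r"(\d{10})-(\d{2})-(\d{6})")
_PLAIN = re.compile(r"\d{18}")


def accession_to_folder(accession: str) -> str:
    """Strip dashes from an accession number to get the folder name."""
    match = _DASHED.fullmatch(accession)
    if match is None:
        raise ValueError(f"not a dashed accession number: {accession!r}")
    return "".join(match.groups())


def folder_to_accession(folder: str) -> str:
    """Restore the dashed form from an 18-digit folder name."""
    if _PLAIN.fullmatch(folder) is None:
        raise ValueError(f"not an 18-digit folder name: {folder!r}")
    return f"{folder[:10]}-{folder[10:12]}-{folder[12:]}"
```

For example, `accession_to_folder("0000950123-18-000456")` yields the folder name `000095012318000456`, and `folder_to_accession` reverses it.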
Form NSAR-B was filed by management investment companies registered under the Investment Company Act of 1940 — including open-end funds (mutual funds, money market funds, registered ETFs), closed-end funds, small business investment companies, and series trusts. Unit investment trusts, face-amount certificate companies, statutory BDCs, private funds, and standalone investment advisers are not part of this filer population.
The earliest filings date to January 1, 1994, when EDGAR began accepting investment company reports electronically. The SEC rescinded Form N-SAR effective June 1, 2018 and replaced it with Form N-CEN, so the dataset is a closed historical corpus with a continuing tail of NSAR-B/A amendments against pre-2018 reporting periods.
metadata.json is JSON. The primary NSAR-B answer file is a positionally encoded plain-text payload (typically .fil) inside an SGML <DOCUMENT> wrapper. Exhibits arrive as .txt, .htm/.html, .pdf, .frm, or .xfd depending on filer choice and era. N-SAR never adopted XBRL, so linkToXbrl is always empty and dataFiles[] is always an empty array.
Form N-CEN replaced the N-SAR regime under the SEC's 2016 reporting modernization rules. N-CEN is annual rather than semi-annual, is filed as structured XML rather than positional answer files, and revises the item set (adding cybersecurity, securities lending, and line-of-credit disclosures while reorganizing classifications). NSAR-B is the historical leg covering 1994 through mid-2018; N-CEN is the forward leg from June 2018 onward.
The dataset is distributed as monthly ZIP containers organized as YYYY/YYYY-MM.zip by filing month. The dataset index JSON API at https://api.sec-api.io/datasets/form-nsarb-files.json lists every container with its size, record count, and direct download URL; consumers can download the full archive or individual monthly containers using an API key.
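Once a monthly container is on disk, its per-filing metadata can be walked with the standard library. The sketch below is hedged on one assumption not spelled out here: that each metadata.json sits directly inside its accession folder, at whatever depth that folder appears in the ZIP. The function name `iter_metadata` is hypothetical.

```python
import json
import zipfile
from pathlib import PurePosixPath
from typing import Any, BinaryIO, Iterator, Union


def iter_metadata(
    zip_source: Union[str, BinaryIO],
) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (accession_folder, parsed metadata.json) pairs from a container.

    Assumes metadata.json lives directly inside its accession folder.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            path = PurePosixPath(name)  # ZIP member names use forward slashes
            if path.name != "metadata.json":
                continue
            # The immediate parent directory is taken as the accession folder.
            with archive.open(name) as handle:
                yield path.parent.name, json.load(handle)
```

Reading members straight from the archive avoids extracting a whole month of filings just to inspect the metadata.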