Form 497K Files Dataset

The Form 497K Files Dataset is a continuous, accession-level corpus of every Form 497K summary prospectus transmitted to EDGAR under Rule 497(k) of the Securities Act of 1933. Each record represents a single 497K submission by a registered open-end management investment company (a U.S. mutual fund or exchange-traded fund) and pairs a normalized metadata.json header with the original SGML-wrapped HTML summary prospectus as filed. The dataset begins in April 2009, when Rule 497(k) first became operational after the SEC's January 2009 Enhanced Disclosure and New Prospectus Delivery Option adopting release (Release No. 33-8998, effective 31 March 2009), and is refreshed daily as new filings are transmitted. Records are grouped into monthly ZIP containers named YYYY-MM.zip, each carrying one subdirectory per accession with the primary .htm document and its structured header. The corpus is designed for section-level extraction of the standardized Item 2 through Item 8 disclosures that Rule 498 prescribes for summary prospectuses.

Update Frequency: Daily
Updated at: 2026-05-19
Earliest Sample Date: 2009-04-01
Total Size: 3.7 GB
Total Records: 331,439
Container Format: ZIP
Content Types: HTML, JSON, TXT, PDF
Form Types: 497K

Dataset APIs

Programmatically retrieve the full list of dataset archive files, download URLs and dataset metadata.

Dataset Index JSON API

Download the entire dataset as a single archive file.

Download Entire Dataset:

Download a single container file (e.g. monthly archive) from the dataset.

Download Single Container:

Dataset Files

206 files · 3.7 GB
2026-05.zip · 11.1 MB · 1,639 records
2026-04.zip · 57.5 MB · 7,890 records
2026-03.zip · 16.1 MB · 1,321 records
2026-02.zip · 48.2 MB · 2,932 records
2026-01.zip · 23.7 MB · 1,565 records
2025-12.zip · 26.3 MB · 1,868 records
2025-11.zip · 16.0 MB · 1,117 records
2025-10.zip · 23.3 MB · 1,655 records
2025-09.zip · 22.5 MB · 1,568 records
2025-08.zip · 16.7 MB · 1,230 records
2025-07.zip · 20.0 MB · 1,332 records
2025-06.zip · 14.9 MB · 1,255 records
2025-05.zip · 20.4 MB · 1,699 records
2025-04.zip · 51.9 MB · 3,602 records
2025-03.zip · 22.6 MB · 1,685 records
2025-02.zip · 47.8 MB · 3,009 records
2025-01.zip · 22.2 MB · 1,571 records
2024-12.zip · 27.0 MB · 1,983 records
2024-11.zip · 13.9 MB · 1,096 records
2024-10.zip · 21.4 MB · 1,563 records
2024-09.zip · 16.3 MB · 1,229 records
2024-08.zip · 18.2 MB · 1,418 records
2024-07.zip · 19.1 MB · 1,490 records
2024-06.zip · 11.3 MB · 994 records
2024-05.zip · 19.9 MB · 1,464 records
2024-04.zip · 48.2 MB · 3,358 records
2024-03.zip · 18.3 MB · 1,725 records
2024-02.zip · 40.5 MB · 2,631 records
2024-01.zip · 19.5 MB · 1,634 records
2023-12.zip · 24.3 MB · 1,733 records
2023-11.zip · 10.9 MB · 864 records
2023-10.zip · 16.5 MB · 1,178 records
2023-09.zip · 21.5 MB · 1,611 records
2023-08.zip · 12.8 MB · 1,009 records
2023-07.zip · 16.9 MB · 1,183 records
2023-06.zip · 13.5 MB · 1,223 records
2023-05.zip · 18.1 MB · 1,350 records
2023-04.zip · 46.8 MB · 3,253 records
2023-03.zip · 17.4 MB · 1,686 records
2023-02.zip · 37.4 MB · 2,678 records
2023-01.zip · 17.6 MB · 1,346 records
2022-12.zip · 21.3 MB · 1,643 records
2022-11.zip · 11.3 MB · 810 records
2022-10.zip · 15.6 MB · 1,395 records
2022-09.zip · 19.1 MB · 1,381 records
2022-08.zip · 16.0 MB · 1,373 records
2022-07.zip · 17.8 MB · 1,262 records
2022-06.zip · 12.0 MB · 978 records
2022-05.zip · 14.8 MB · 1,097 records
2022-04.zip · 52.4 MB · 3,772 records
2022-03.zip · 21.9 MB · 1,925 records
2022-02.zip · 34.6 MB · 2,401 records
2022-01.zip · 18.2 MB · 1,409 records
2021-12.zip · 29.5 MB · 2,085 records
2021-11.zip · 12.7 MB · 1,239 records
2021-10.zip · 16.0 MB · 1,413 records
2021-09.zip · 17.9 MB · 1,471 records
2021-08.zip · 13.6 MB · 1,034 records
2021-07.zip · 18.7 MB · 1,566 records
2021-06.zip · 14.8 MB · 1,134 records
2021-05.zip · 13.3 MB · 1,111 records
2021-04.zip · 48.0 MB · 3,790 records
2021-03.zip · 18.4 MB · 1,619 records
2021-02.zip · 36.9 MB · 2,620 records
2021-01.zip · 14.5 MB · 1,161 records
2020-12.zip · 24.7 MB · 1,987 records
2020-11.zip · 10.1 MB · 980 records
2020-10.zip · 17.8 MB · 1,476 records
2020-09.zip · 17.5 MB · 1,405 records
2020-08.zip · 17.9 MB · 1,416 records
2020-07.zip · 17.9 MB · 1,453 records
2020-06.zip · 19.4 MB · 1,869 records
2020-05.zip · 18.4 MB · 1,706 records
2020-04.zip · 45.7 MB · 4,400 records
2020-03.zip · 19.2 MB · 2,642 records
2020-02.zip · 39.0 MB · 3,021 records
2020-01.zip · 17.6 MB · 1,396 records
2019-12.zip · 21.3 MB · 1,937 records
2019-11.zip · 14.3 MB · 1,278 records
2019-10.zip · 13.6 MB · 1,163 records
2019-09.zip · 16.2 MB · 1,494 records
2019-08.zip · 11.9 MB · 1,117 records
2019-07.zip · 18.7 MB · 1,689 records
2019-06.zip · 13.9 MB · 1,386 records
2019-05.zip · 22.6 MB · 1,936 records
2019-04.zip · 40.9 MB · 3,495 records
2019-03.zip · 14.8 MB · 1,540 records
2019-02.zip · 36.6 MB · 2,876 records
2019-01.zip · 19.2 MB · 1,916 records
2018-12.zip · 21.2 MB · 2,103 records
2018-11.zip · 12.4 MB · 1,226 records
2018-10.zip · 16.5 MB · 1,641 records
2018-09.zip · 16.1 MB · 1,330 records
2018-08.zip · 15.9 MB · 1,643 records
2018-07.zip · 15.8 MB · 1,355 records
2018-06.zip · 16.1 MB · 1,611 records
2018-05.zip · 17.3 MB · 1,724 records
2018-04.zip · 43.4 MB · 3,839 records
2018-03.zip · 16.1 MB · 1,437 records
2018-02.zip · 31.7 MB · 2,624 records
2018-01.zip · 16.8 MB · 1,524 records
2017-12.zip · 19.2 MB · 1,836 records
2017-11.zip · 13.1 MB · 1,535 records
2017-10.zip · 15.1 MB · 1,477 records
2017-09.zip · 16.4 MB · 1,589 records
2017-08.zip · 13.9 MB · 1,399 records
2017-07.zip · 15.2 MB · 1,595 records
2017-06.zip · 18.1 MB · 1,953 records
2017-05.zip · 20.1 MB · 1,882 records
2017-04.zip · 52.8 MB · 4,937 records
2017-03.zip · 21.9 MB · 2,142 records
2017-02.zip · 27.8 MB · 2,462 records
2017-01.zip · 15.4 MB · 1,448 records
2016-12.zip · 20.7 MB · 2,206 records
2016-11.zip · 11.9 MB · 1,183 records
2016-10.zip · 13.8 MB · 1,359 records
2016-09.zip · 20.7 MB · 2,090 records
2016-08.zip · 10.8 MB · 1,064 records
2016-07.zip · 18.8 MB · 1,643 records
2016-06.zip · 13.2 MB · 1,325 records
2016-05.zip · 11.8 MB · 1,151 records
2016-04.zip · 40.7 MB · 3,855 records
2016-03.zip · 14.3 MB · 1,481 records
2016-02.zip · 30.8 MB · 2,764 records
2016-01.zip · 16.7 MB · 1,603 records
2015-12.zip · 21.1 MB · 2,030 records
2015-11.zip · 11.5 MB · 1,136 records
2015-10.zip · 16.4 MB · 1,590 records
2015-09.zip · 16.4 MB · 1,447 records
2015-08.zip · 10.1 MB · 988 records
2015-07.zip · 20.1 MB · 1,812 records
2015-06.zip · 12.6 MB · 1,347 records
2015-05.zip · 14.9 MB · 1,446 records
2015-04.zip · 35.3 MB · 3,333 records
2015-03.zip · 13.2 MB · 1,309 records
2015-02.zip · 26.8 MB · 2,471 records
2015-01.zip · 12.6 MB · 1,197 records
2014-12.zip · 18.3 MB · 1,775 records
2014-11.zip · 9.4 MB · 861 records
2014-10.zip · 12.2 MB · 1,191 records
2014-09.zip · 14.5 MB · 1,302 records
2014-08.zip · 11.1 MB · 1,104 records
2014-07.zip · 14.3 MB · 1,356 records
2014-06.zip · 9.9 MB · 1,211 records
2014-05.zip · 13.8 MB · 1,435 records
2014-04.zip · 32.2 MB · 3,304 records
2014-03.zip · 10.0 MB · 1,043 records
2014-02.zip · 25.0 MB · 2,413 records
2014-01.zip · 16.6 MB · 1,641 records
2013-12.zip · 17.5 MB · 1,722 records
2013-11.zip · 10.1 MB · 953 records
2013-10.zip · 11.0 MB · 1,164 records
2013-09.zip · 13.3 MB · 1,349 records
2013-08.zip · 10.7 MB · 1,121 records
2013-07.zip · 15.3 MB · 1,514 records
2013-06.zip · 9.9 MB · 1,061 records
2013-05.zip · 13.5 MB · 1,592 records
2013-04.zip · 33.8 MB · 3,432 records
2013-03.zip · 11.4 MB · 1,266 records
2013-02.zip · 20.7 MB · 1,904 records
2013-01.zip · 12.4 MB · 1,303 records
2012-12.zip · 13.5 MB · 1,402 records
2012-11.zip · 7.5 MB · 804 records
2012-10.zip · 9.2 MB · 852 records
2012-09.zip · 12.0 MB · 1,200 records
2012-08.zip · 10.2 MB · 996 records
2012-07.zip · 11.2 MB · 1,129 records
2012-06.zip · 9.2 MB · 968 records
2012-05.zip · 11.7 MB · 1,318 records
2012-04.zip · 28.4 MB · 3,011 records
2012-03.zip · 8.3 MB · 895 records
2012-02.zip · 18.8 MB · 1,963 records
2012-01.zip · 10.9 MB · 1,120 records
2011-12.zip · 16.0 MB · 1,669 records
2011-11.zip · 7.2 MB · 660 records
2011-10.zip · 9.9 MB · 966 records
2011-09.zip · 9.2 MB · 947 records
2011-08.zip · 8.1 MB · 847 records
2011-07.zip · 10.4 MB · 1,178 records
2011-06.zip · 8.4 MB · 950 records
2011-05.zip · 14.3 MB · 1,433 records
2011-04.zip · 25.9 MB · 2,766 records
2011-03.zip · 11.6 MB · 1,268 records
2011-02.zip · 14.4 MB · 1,403 records
2011-01.zip · 12.3 MB · 1,328 records
2010-12.zip · 13.4 MB · 1,431 records
2010-11.zip · 7.6 MB · 773 records
2010-10.zip · 8.4 MB · 865 records
2010-09.zip · 7.4 MB · 841 records
2010-08.zip · 5.3 MB · 570 records
2010-07.zip · 15.3 MB · 1,007 records
2010-06.zip · 7.6 MB · 770 records
2010-05.zip · 9.8 MB · 1,127 records
2010-04.zip · 19.3 MB · 2,138 records
2010-03.zip · 9.2 MB · 1,023 records
2010-02.zip · 10.6 MB · 1,094 records
2010-01.zip · 4.8 MB · 554 records
2009-12.zip · 2.3 MB · 293 records
2009-11.zip · 3.9 MB · 347 records
2009-10.zip · 2.5 MB · 270 records
2009-09.zip · 1.6 MB · 124 records
2009-08.zip · 159.3 KB · 18 records
2009-07.zip · 227.3 KB · 23 records
2009-06.zip · 186.7 KB · 15 records
2009-05.zip · 104.8 KB · 12 records
2009-04.zip · 75.0 KB · 8 records

What This Dataset Contains

The dataset packages every EDGAR submission of Form 497K from April 2009 forward. Form 497K is not a free-standing registration form; it is the Rule 497 submission sub-type used to transmit a summary prospectus, the concise standardized disclosure document authorized by Rule 498 under the Securities Act of 1933. Rule 498 allows a fund with an effective Form N-1A registration statement to satisfy statutory prospectus-delivery obligations by providing (or making available online) a short summary prospectus rather than the full statutory prospectus, provided the summary adheres to the disclosure content and ordering requirements codified in Items 2 through 8 of Form N-1A. Rule 497(k) is the corresponding filing requirement for that summary prospectus.

A 497K filing therefore occupies a tightly constrained disclosure envelope. It is always a summary prospectus or a supplement/sticker to one; it is always tied back to an effective N-1A shelf; and it always arrives on EDGAR with structured header data identifying the specific fund series and share classes the summary covers. The content is investor-facing and narrative-plus-tabular, and filings are driven by fund-level disclosure events (fund launches, outcome-period rollovers, annual updates, fee changes, and similar) rather than by corporate events.

The dataset is distributed as monthly ZIP containers. Each accession folder inside a container holds a normalized metadata.json header and a single SGML-wrapped HTML document carrying the summary-prospectus text. Image binaries (performance charts, logos, payoff diagrams) and the composite SGML .txt submission wrapper are referenced by EDGAR URL in metadata.json but are not materialized on disk.

Content Structure of a Single Record

1. What one record represents

One record in the Form 497K Files Dataset is a single EDGAR submission of Form 497K. Physically, the record is a per-accession subdirectory inside a monthly ZIP container, named with the 18-digit zero-padded SEC accession number with dashes removed (for example 000121390025059889/ for accession 0001213900-25-059889). Each accession folder holds two materialized files: a structured metadata.json header and a single SGML-wrapped HTML document containing the summary-prospectus text itself.
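The naming rule above can be sketched in a few lines. This is a minimal illustration, and the helper name accession_to_folder is hypothetical rather than part of any provider tooling:

```python
import re


def accession_to_folder(accession_no: str) -> str:
    """Map a dashed accession number to the folder name described above."""
    digits = accession_no.replace("-", "")
    if not re.fullmatch(r"\d{1,18}", digits):
        raise ValueError(f"not an accession number: {accession_no!r}")
    # Left-pad to 18 digits; a well-formed accession number is already 18 long.
    return digits.zfill(18)


print(accession_to_folder("0001213900-25-059889"))  # 000121390025059889
```

The reverse mapping (folder name back to dashed form) is a fixed 10-2-6 split of the same 18 digits.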

A record ties together three layers: (1) the packaging unit on disk, (2) the normalized EDGAR submission header, and (3) the underlying summary-prospectus disclosure document as originally filed.

2. Packaging and container structure

At the container level, records are grouped into monthly ZIP archives named YYYY-MM.zip. Inside, a single top-level directory YYYY-MM/ holds one subdirectory per accession. A representative month contains on the order of 1,200–1,500 accession folders; the sampled June 2025 container holds 1,255 accession subdirectories.
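Because container names follow a fixed YYYY-MM.zip pattern, the expected set of archives for a date range can be enumerated without any API call. The sketch below assumes only that naming pattern; the function name container_names is illustrative:

```python
from datetime import date


def container_names(first: date, last: date) -> list[str]:
    """Return YYYY-MM.zip names for every month from first to last, inclusive."""
    names = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        names.append(f"{year:04d}-{month:02d}.zip")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


names = container_names(date(2009, 4, 1), date(2026, 5, 1))
print(len(names), names[0], names[-1])
```

For April 2009 through May 2026 this yields 206 names, which matches the file count shown in the Dataset Files listing.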

Inside each accession folder, the on-disk payload is essentially fixed:

  1. metadata.json — structured EDGAR header object. Always present, exactly one per record.
  2. <primary>.htm — one SGML-wrapped HTML document carrying the Form 497K summary prospectus. Always present, always exactly one.

Everything else that the filer attached to the EDGAR submission — GIF/JPG performance-chart and logo graphics and the concatenated SGML .txt composite submission file — is listed under metadata.json.documentFormatFiles[] with an EDGAR URL, but is not materialized inside the ZIP. The dataset deliberately omits image binaries; the composite .txt is not duplicated on disk because the primary HTML already carries the document content. Although the dataset brief advertises PDF and TXT format support, 497K submissions in practice use HTML as the primary presentation format almost universally, so the per-accession payload is effectively HTML plus JSON.
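As a rough sketch of how the layout above might be read with the Python standard library, the generator below walks one monthly ZIP and pairs each metadata.json with its primary HTML file. The function name read_records and the choice to skip incomplete folders are assumptions for illustration, not documented provider behavior:

```python
import io
import json
import zipfile
from collections import defaultdict


def read_records(zip_source):
    """Yield (folder, metadata, html_filename, html_bytes) for each accession.

    Assumes the layout YYYY-MM/<accession>/metadata.json plus one .htm file.
    Folders missing either file are skipped rather than raising.
    """
    with zipfile.ZipFile(zip_source) as zf:
        by_folder = defaultdict(list)
        for info in zf.infolist():
            parts = info.filename.split("/")
            if info.is_dir() or len(parts) < 3:
                continue  # directory entry, or a file outside an accession folder
            by_folder[parts[-2]].append(info)
        for folder, infos in sorted(by_folder.items()):
            meta = next((i for i in infos
                         if i.filename.endswith("/metadata.json")), None)
            html = next((i for i in infos
                         if i.filename.lower().endswith((".htm", ".html"))), None)
            if meta is None or html is None:
                continue
            yield (folder, json.loads(zf.read(meta)),
                   html.filename.rsplit("/", 1)[-1], zf.read(html))
```

A typical call would be `for folder, meta, name, raw in read_records("2025-06.zip"): ...`, where the path is wherever the container was downloaded. The HTML is returned as bytes because the encoding is filer-dependent.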

HTML filenames are filer-specific and do not reliably contain the literal string 497k. Observed conventions include:

  • <accessionStub>_497k.htm (e.g. ea0245768-03_497k.htm, tm2517758-2_497k.htm)
  • Donnelley-style d<digits>d497k.htm (e.g. d93511d497k.htm)
  • PGIM-style f<digits>d1.htm (e.g. f42385d1.htm, which notably omits 497k from the filename)
  • Generic short names such as c497k.htm and etf1_497k.htm
  • Hybrid patterns like g108182_rdv-isi.htm

The authoritative form identifier is the SGML <TYPE>497K tag inside the document wrapper and the formType field inside metadata.json, not the filename.

3. The metadata.json header

The per-accession JSON object is a flat, consistently keyed record that mirrors and normalizes the SGML <SEC-HEADER> block of the original EDGAR submission. Observed top-level fields:

  • formType — always the string "497K".
  • accessionNo — dashed SEC accession number, e.g. "0001213900-25-059889". The containing folder name is the same number with dashes stripped and zero-padded to 18 digits.
  • effectivenessDate — ISO YYYY-MM-DD date on which the summary prospectus becomes effective; typically the first day of the month following filing, or a series-launch-specific date.
  • filedAt — full ISO-8601 timestamp with timezone offset (e.g. "2025-06-30T20:23:34-04:00") marking EDGAR acceptance.
  • description — boilerplate human-readable form label, uniform across the dataset: "Form 497K - Summary Prospectus for certain open-end management investment companies filed pursuant to Securities Act Rule 497(K)".
  • linkToFilingDetails — EDGAR URL to the primary HTML summary prospectus.
  • linkToTxt — URL to the concatenated SGML submission text file.
  • linkToHtml — URL to the accession's -index.htm landing page on EDGAR.
  • linkToXbrl — empty string throughout. Summary prospectuses filed under 497K do not carry interactive-data attachments; the structured Risk/Return Summary XBRL that corresponds to the same disclosure is filed separately under the companion registration-statement amendment (typically 485BPOS).
  • id — 32-character hexadecimal record identifier used by the provider's API.
  • documentFormatFiles[] — enumeration of every attachment in the EDGAR submission.
  • entities[] — filer-entity array.
  • seriesAndClassesContractsInformation[] — structured fund series and share-class data.
  • dataFiles[] — structured data-file array; consistently empty ([]) for 497K records because no XBRL instance is filed under this sub-type.
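The fields above can be normalized into typed values with the standard library alone. In this sketch the function name summarize_header is hypothetical, and the sample dictionary is assembled from the example values quoted in the list (its effectivenessDate is an invented placeholder), so it is not a real record:

```python
from datetime import date, datetime


def summarize_header(meta: dict) -> dict:
    """Extract a few typed fields from a metadata.json object."""
    effective = meta.get("effectivenessDate")
    return {
        "accession_no": meta["accessionNo"],
        "form_type": meta["formType"],
        # filedAt carries a UTC offset, so the result is timezone-aware.
        "filed_at": datetime.fromisoformat(meta["filedAt"]),
        "effectiveness_date": date.fromisoformat(effective) if effective else None,
        # Expected to be False for 497K: linkToXbrl is "" and dataFiles is [].
        "has_xbrl": bool(meta.get("linkToXbrl")) or bool(meta.get("dataFiles")),
    }


sample = {
    "formType": "497K",
    "accessionNo": "0001213900-25-059889",
    "filedAt": "2025-06-30T20:23:34-04:00",
    "effectivenessDate": "2025-07-01",  # placeholder value, not from a real record
    "linkToXbrl": "",
    "dataFiles": [],
}
summary = summarize_header(sample)
print(summary["filed_at"].isoformat(), summary["has_xbrl"])
```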

4. documentFormatFiles[]

Each element describes one file that was part of the original EDGAR submission package:

  • sequence — filer-assigned ordering string ("1" for the primary document, "2", "3", … for subsequent graphics; a literal single space " " for the composite .txt submission wrapper).
  • size — byte count, serialized as a string.
  • documentUrl — canonical EDGAR URL to the individual file.
  • description — filer-supplied free-text caption (e.g. "FORM 497K", "GRAPHIC", "Complete submission text file"); occasionally absent.
  • type — EDGAR document type code: "497K" for the primary prospectus, "GRAPHIC" for embedded images, and " " (single space) for the composite text submission.

Sequence 1 is always the primary 497K HTML and is the only entry materialized on disk inside the accession folder. The remaining entries enumerate graphic attachments (performance bar charts, hypothetical-growth line charts, adviser logos, and occasional payoff-diagram images for defined-outcome ETFs) and the composite .txt — all represented as URLs, not binaries. Observed attachment counts range from two entries (HTML plus composite text only, typical of short stickers) up to eight or more for full prospectuses with many embedded charts; the sampled Royce Fund record, for instance, lists seven graphic attachments.
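One way to separate the three kinds of entries described above is to key on the sequence and type values, treating the single-space marker as empty after stripping. The helper name split_attachments and the example.invalid URLs in the sample are placeholders for illustration:

```python
def split_attachments(meta: dict) -> dict:
    """Separate the primary document, graphics, and composite .txt entries."""
    primary = None
    graphics = []
    composite = None
    for entry in meta.get("documentFormatFiles", []):
        # Both fields are a single space for the composite text file.
        sequence = (entry.get("sequence") or "").strip()
        doc_type = (entry.get("type") or "").strip()
        if sequence == "1":
            primary = entry
        elif doc_type == "GRAPHIC":
            graphics.append(entry)
        elif not sequence and not doc_type:
            composite = entry
    return {"primary": primary, "graphics": graphics, "composite": composite}


sample = {
    "documentFormatFiles": [
        {"sequence": "1", "size": "7000", "type": "497K",
         "description": "FORM 497K",
         "documentUrl": "https://example.invalid/primary.htm"},
        {"sequence": "2", "size": "1200", "type": "GRAPHIC",
         "documentUrl": "https://example.invalid/chart.jpg"},
        {"sequence": " ", "size": "9000", "type": " ",
         "description": "Complete submission text file",
         "documentUrl": "https://example.invalid/submission.txt"},
    ]
}
parts = split_attachments(sample)
print(parts["primary"]["documentUrl"], len(parts["graphics"]))
```

Note that size is serialized as a string, so it needs an explicit int() before any arithmetic.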

5. entities[]

An array of filer-entity objects capturing the SEC-registered parties attached to the submission. Per-entity fields observed:

  • companyName — legal entity name with its submission role appended in parentheses, e.g. "Innovator ETFs Trust (Filer)".
  • cik — unpadded Central Index Key.
  • fileNo — SEC file number. For 497K filings this is usually the Securities Act registration file number for the N-1A shelf ("333-xxxxxx"); older trusts use legacy "002-xxxxx" formats, and Investment Company Act registration numbers ("811-xxxxx") can also appear on multi-registered entities.
  • filmNo — the SEC-assigned film number for this specific filing event.
  • type — entity-scoped filing type ("497K").
  • act — Securities Act reference code ("33" for the 1933 Act).
  • irsNo — IRS Employer Identification Number; frequently "000000000" for mutual-fund statutory trusts.
  • fiscalYearEnd — four-digit MMDD; may be absent.
  • stateOfIncorporation — two-letter jurisdiction code (commonly "DE" for Delaware statutory trusts or "MA" for Massachusetts business trusts); occasionally absent.

6. seriesAndClassesContractsInformation[]

This is the structurally distinguishing header block for open-end fund filings, reflecting the Investment Company Act of 1940 series/class reporting framework mandatory since 2006. Each array element represents one fund series covered by the summary prospectus:

  • series — SEC series identifier of the form S000######.
  • name — series name (typically the marketing name of the fund, e.g. "Innovator Equity Dual Directional 15 Buffer ETF - July").
  • classesContracts[] — array of share-class objects, each with:
    • classContract — SEC class identifier of the form C000######.
    • name — share-class name (e.g. "Investor Class", "Service Class", or simply the ETF name for single-class ETFs).
    • ticker — exchange ticker symbol when one has been assigned (e.g. "BFJL", "RYDVX", "RDVIX"); the field is omitted entirely when no ticker applies, not emitted as an empty string.

Most 497K records describe a single series with a single share class — characteristic of ETFs and of per-series summary prospectuses issued by mutual-fund trusts. Multi-class mutual-fund series populate multiple classesContracts entries (for example a Royce Fund record listing both the Service Class RYDVX and the Investment Class RDVIX under one series). Less commonly, a single 497K covers multiple series, in which case the top-level array holds several series objects.
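For tabular analysis it is often convenient to flatten the nested series/class structure into one row per share class. The sketch below does that and maps a missing ticker key to None. The function name flatten_classes is hypothetical, and the series and class identifiers and the series name in the sample are placeholders, not real SEC identifiers:

```python
def flatten_classes(meta: dict) -> list[dict]:
    """Return one row per (series, share class) pair in a metadata.json object."""
    rows = []
    for series in meta.get("seriesAndClassesContractsInformation", []):
        for cls in series.get("classesContracts", []):
            rows.append({
                "series_id": series.get("series"),
                "series_name": series.get("name"),
                "class_id": cls.get("classContract"),
                "class_name": cls.get("name"),
                # The ticker key is omitted when unassigned, so default to None.
                "ticker": cls.get("ticker"),
            })
    return rows


sample = {
    "seriesAndClassesContractsInformation": [
        {
            "series": "S000000001",  # placeholder identifier
            "name": "Example Series",  # placeholder name
            "classesContracts": [
                {"classContract": "C000000001", "name": "Service Class", "ticker": "RYDVX"},
                {"classContract": "C000000002", "name": "Investment Class", "ticker": "RDVIX"},
                {"classContract": "C000000003", "name": "New Class"},
            ],
        }
    ]
}
rows = flatten_classes(sample)
print([(r["class_name"], r["ticker"]) for r in rows])
```

The number of distinct series_id values in the result gives the series count for the record.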

7. The Form 497K summary prospectus document

The primary .htm file in each accession folder is an SGML-wrapped HTML rendering of the statutory summary prospectus. The outer SGML envelope follows the canonical EDGAR document-wrapper pattern:

<DOCUMENT>
<TYPE>497K
<SEQUENCE>1
<FILENAME>d93511d497k.htm
<DESCRIPTION>ISHARES LARGE CAP DEEP BUFFER ETF
<TEXT>
<HTML>...</HTML>
</TEXT>
</DOCUMENT>

Inside <TEXT> is a self-contained HTML document. Observed sizes range from roughly 7 KB for one-page supplements or stickers up to 250–270 KB for fully embedded multi-series summary prospectuses with inline tables and graphic references.
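The envelope can be separated from the HTML payload with a couple of regular expressions. This is a minimal sketch that assumes each header tag sits on its own line, as in the wrapper shown above; the function name parse_wrapper is illustrative, and files read as bytes would need decoding first (for example with errors="replace"):

```python
import re

# Header tags are unclosed SGML tags: the value runs to the end of the line.
_HEADER_TAG = re.compile(r"^<(TYPE|SEQUENCE|FILENAME|DESCRIPTION)>(.*)$", re.MULTILINE)
_TEXT_BODY = re.compile(r"<TEXT>\s*(.*?)\s*</TEXT>", re.DOTALL)


def parse_wrapper(raw: str) -> dict:
    """Split an SGML-wrapped document into header fields and the HTML body."""
    header_part = raw.split("<TEXT>", 1)[0]
    fields = {m.group(1).lower(): m.group(2).strip()
              for m in _HEADER_TAG.finditer(header_part)}
    body = _TEXT_BODY.search(raw)
    fields["html"] = body.group(1) if body else None
    return fields


sample = """<DOCUMENT>
<TYPE>497K
<SEQUENCE>1
<FILENAME>d93511d497k.htm
<DESCRIPTION>ISHARES LARGE CAP DEEP BUFFER ETF
<TEXT>
<HTML>...</HTML>
</TEXT>
</DOCUMENT>
"""
doc = parse_wrapper(sample)
print(doc["type"], doc["filename"])
```

Checking doc["type"] == "497K" here is the document-side counterpart of checking formType in metadata.json.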

The body of the HTML implements the Rule 498 content schema — the ordered set of Items 2 through 8 of Form N-1A that defines a compliant summary prospectus — preceded by a cover-page block:

  • Cover page. Fund name, share-class name(s), ticker symbol(s) for each class, the date of the summary prospectus, and the legend required by Rule 498(b)(1)(v) directing investors to the full statutory prospectus and the Statement of Additional Information (SAI), with URLs and a toll-free number for free paper copies. The cover also carries the cross-reference to the date of the statutory prospectus and SAI incorporated by reference.
  • Investment Objective (Item 2). A concise statement of the fund's investment goal.
  • Fees and Expenses of the Fund (Item 3). Two standardized tables — the Shareholder Fees table (sales loads, deferred sales charges, redemption fees, exchange fees, account fees) and the Annual Fund Operating Expenses table (management fees, distribution/12b-1 fees, other expenses, acquired fund fees and expenses, total annual operating expenses, fee waivers/expense reimbursements, net expenses) — followed by the Example showing cumulative expenses on a hypothetical $10,000 investment over 1, 3, 5, and 10 years, and a Portfolio Turnover disclosure.
  • Principal Investment Strategies (Item 4(a)). Narrative describing how the fund intends to achieve its objective — including any 80% name-test policy, benchmark index, derivatives usage, and structured-outcome mechanics (buffer levels, cap rates, outcome periods) where applicable.
  • Principal Risks (Item 4(b)(1)). Itemized list of principal risks (market, issuer, liquidity, derivatives, concentration, non-diversification, plus fund-specific risks such as buffer/cap risk, FLEX options risk, or crypto-derivatives risk).
  • Performance (Item 4(b)(2)). A bar chart of annual total returns for up to ten calendar years, an average-annual-total-returns table comparing 1-, 5-, and 10-year (or since-inception) returns to a broad-based securities market index (and, under the SEC's 2022 Tailored Shareholder Report amendments, any additional index permitted for segment benchmarking), and a statement that past performance does not indicate future results. New funds without a full calendar year of performance substitute a disclosure to that effect for the chart.
  • Management (Item 5). Identification of the investment adviser and any sub-advisers, plus portfolio manager names, titles, and length of service.
  • Purchase and Sale of Fund Shares (Item 6). Minimum initial and subsequent investment amounts and the process by which investors buy and redeem shares — including ETF-specific creation/redemption-unit language for exchange-traded funds.
  • Tax Information (Item 7). Short statement on the tax character of distributions and the treatment of tax-advantaged accounts.
  • Payments to Broker-Dealers and Other Financial Intermediaries (Item 8). Standardized disclosure on revenue-sharing and similar intermediary compensation arrangements.

For multi-series trusts, this ordered block repeats once per series in the same HTML document, each introduced by its own fund-name heading.

A substantial minority of 497K submissions are supplements / stickers rather than full fresh summary prospectuses — short documents (often 1–10 KB) that amend a previously filed summary prospectus (for example updating a cap rate on an outcome-period ETF, adding a new share class, correcting a portfolio manager's name, or extending a fee waiver). These supplements reuse the 497K submission type because they modify the summary-prospectus content; their HTML body is correspondingly short and narrative rather than a complete Items 2–8 sequence.

8. Supporting attachments and excluded materials

Graphics embedded by the HTML — performance bar charts, hypothetical-growth line graphs, fund-company logos, and occasional payoff-diagram images for defined-outcome ETFs — are part of the original EDGAR submission and are fully enumerated under documentFormatFiles[] with their EDGAR documentUrl. They are, however, not materialized inside the ZIP; the dataset excludes image binaries by design. Likewise, the composite SGML .txt file that EDGAR produces as a concatenation of all attachments is referenced by URL (linkToTxt and the documentFormatFiles entry with type: " ") but not duplicated on disk. Consumers wanting those assets retrieve them directly from EDGAR via the URLs in metadata.json.

Multi-class and multi-series variants are not separate files — they are handled by repeated content blocks within the single HTML document and by multiple entries inside seriesAndClassesContractsInformation[].

Included in each record

  • The primary Form 497K HTML in its original SGML-wrapped form, preserving the <DOCUMENT>, <TYPE>, <SEQUENCE>, <FILENAME>, <DESCRIPTION>, and <TEXT> envelope.
  • A normalized metadata.json capturing the EDGAR header, filer entities, series/class identifiers and tickers, document list, timestamps, and EDGAR URLs.

Referenced but not materialized

  • Embedded graphic files (GIF, JPG) carrying charts, payoff diagrams, and logos.
  • The composite .txt submission wrapper.
  • The EDGAR -index.htm accession landing page.
  • The companion Form N-1A / 485BPOS registration statement (and its inline XBRL Risk/Return Summary data), which is filed under separate accessions and is structurally distinct from 497K.
  • The full statutory prospectus and Statement of Additional Information, which the summary prospectus incorporates by reference rather than embedding. Their full text is reachable only through the fund's N-1A filing history, not from inside a 497K record.

9. Evolution over time

  • 2009 introduction. The summary prospectus regime went live on 31 March 2009 (Release No. 33-8998); 497K filings begin appearing on EDGAR in April 2009. The earliest dataset records track the initial adopters, with volume expanding rapidly through 2010–2012 as major fund complexes migrated delivery programs to summary prospectuses.
  • Series/class identifier consistency. The SEC's series (S######) and class/contract (C######) identifiers, mandatory for registered open-end funds since 2006, are present in the SGML header from the start of the 497K window and are reflected in every record's seriesAndClassesContractsInformation[].
  • 2019 ETF Rule (Rule 6c-11). The ETF Rule standardized the treatment of ETFs as open-end funds and accelerated the use of 497K supplements for ETF disclosure updates (outcome-period resets, cap-rate changes, reference-index substitutions). The share of records that are short supplements grows materially from 2019 onward.
  • 2022 Tailored Shareholder Report amendments (Release No. 33-11125). The 2022 amendments to Form N-1A modified the Performance and Fees-and-Expenses items and introduced new requirements around benchmark indexes (requiring a broad-based securities market index and permitting additional indexes). Summary-prospectus content filed after the July 2024 compliance date reflects those amendments; earlier records retain the pre-amendment presentation.
  • Structured-outcome ETF proliferation. From roughly 2018 onward, a growing fraction of 497K filings corresponds to defined-outcome / buffer / floor ETFs (Innovator, First Trust, Allianz/AllianzIM, BlackRock iShares Deep Buffer, TrueShares, Calamos). These filers use 497K intensively for monthly or quarterly outcome-period launches; a single filer can produce dozens of sibling 497K accessions in one month, each tied to a different series in the S###### namespace.

Form 497K has always been filed electronically on EDGAR with an SGML wrapper and an HTML primary document — there is no pre-HTML era for this form type, because the form did not exist before 2009. The practical format variation is therefore narrow: SGML-wrapped HTML is used universally for the primary document, graphics are attached as <TYPE>GRAPHIC SGML documents referenced from inline <img> tags, and the dataset's exclusion of those binaries is a packaging choice rather than a format change.

10. Interpretation and extraction notes

  • One record equals one accession. A fund family filing a batch of monthly outcome-ETF summary prospectuses produces one record per accession; sibling accession numbers from the same filer commonly cluster in the same monthly container.
  • Supplements versus full prospectuses. Not every .htm contains a complete Items 2–8 disclosure. A 7 KB file is almost certainly a sticker or supplement amending a previously filed summary prospectus; a 200+ KB file is typically a full summary prospectus, potentially stacking multiple series.
  • Multi-series HTML. A single HTML document can contain several fund summary prospectuses concatenated sequentially; the length of seriesAndClassesContractsInformation[] is the reliable indicator of how many series are packed into the record.
  • Optional tickers. Some filings (notably new-series launches or pre-launch supplements) omit the ticker key inside a class entry entirely. Downstream parsers should treat absence as "not yet assigned or not applicable", not as an error.
  • Incorporation by reference. The summary prospectus legally incorporates the full statutory prospectus and SAI by reference. Their full text is therefore not inside a 497K record — readers needing them must follow the EDGAR links to the relevant N-1A filings for the same registrant.
  • Filename heuristics are unreliable. The primary HTML's filename does not always contain 497k (see PGIM's f42385d1.htm). Rely on the SGML <TYPE>497K tag and the formType field in metadata.json.
  • Amendments. There is no separate 497K/A form type: amendments to a 497K are refiled as new 497K accessions. The order of filedAt timestamps and distinct effectivenessDate values identify successive versions.
  • XBRL is not in this record. The Risk/Return Summary structured data corresponding to the summary prospectus is filed under the companion registration-statement amendment (typically 485BPOS) as an interactive-data exhibit under Rule 405 of Regulation S-T, not under 497K. Every record in this dataset therefore has an empty linkToXbrl and an empty dataFiles[] — a normal state, not a gap.
  • Section-level extraction. Because the core disclosure follows Form N-1A Items 2–8 in a stable order, heading-anchored parsing is tractable using the canonical labels ("Investment Objective", "Fees and Expenses of the Fund", "Principal Investment Strategies", "Principal Risks", "Performance", "Management", "Purchase and Sale of Fund Shares", "Tax Information", "Payments to Broker-Dealers and Other Financial Intermediaries"). Fee-table extraction benefits from the relatively strict column structure mandated by Item 3, though individual filers vary in HTML table markup conventions and in whether they wrap tables in <table> elements versus nested <div> grids.
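The heading-anchored approach described in the last note could look roughly like the sketch below. It is a simplification under stated assumptions: headings are matched only in title case or all caps, in canonical order, against tag-stripped text, and every name here (LABELS, html_to_text, split_sections) is illustrative. A label that also appears in body text before its real heading will produce a wrong split, so results need validation on real filings.

```python
import re
from html.parser import HTMLParser

LABELS = [
    "Investment Objective",
    "Fees and Expenses of the Fund",
    "Principal Investment Strategies",
    "Principal Risks",
    "Performance",
    "Management",
    "Purchase and Sale of Fund Shares",
    "Tax Information",
    "Payments to Broker-Dealers and Other Financial Intermediaries",
]


class _TextExtractor(HTMLParser):
    """Collect text nodes, discarding all markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


def split_sections(html: str) -> dict:
    """Slice the text between consecutive canonical headings found in order."""
    text = html_to_text(html)
    hits, cursor = [], 0
    for label in LABELS:
        pattern = re.compile(f"{re.escape(label)}|{re.escape(label.upper())}")
        match = pattern.search(text, cursor)
        if match:
            hits.append((match.start(), match.end(), label))
            cursor = match.end()
    sections = {}
    for i, (_, end, label) in enumerate(hits):
        stop = hits[i + 1][0] if i + 1 < len(hits) else len(text)
        sections[label] = text[end:stop].strip()
    return sections


sample = ("<html><body><p><b>Investment Objective</b></p><p>Seeks growth.</p>"
          "<p><b>Fees and Expenses of the Fund</b></p><p>Table here.</p>"
          "<p><b>Tax Information</b></p><p>Distributions are taxable.</p></body></html>")
print(split_sections(sample))
```

An empty or near-empty result is also a usable signal that the document is a supplement or sticker rather than a full Items 2–8 summary prospectus.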

Who Files or Publishes This Dataset, and When

Who files the record

Form 497K is filed by registered open-end management investment companies, that is, investment companies registered under the Investment Company Act of 1940 and classified as open-end management companies. The legal filer is the registrant (the trust or corporation that serves as the issuer), not its investment adviser, distributor, or transfer agent, even though those parties typically prepare the document.

The filer universe consists of:

  • Traditional open-end mutual funds, organized as series of Delaware statutory trusts, Massachusetts business trusts, Maryland corporations, or similar vehicles registered on Form N-1A.
  • Exchange-traded funds structured as open-end management investment companies (the dominant U.S. ETF structure), which register on Form N-1A and commonly use the summary prospectus regime.
  • Variable insurance product underlying funds and other open-end vehicles that have opted into the Rule 498 summary prospectus regime.

A fund enters the 497K population only after its board and registrant affirmatively elect the Rule 498 summary prospectus option. Election is voluntary — a fund may continue delivering the full statutory prospectus and never file a 497K — but most large U.S. open-end complexes use summary prospectuses for retail share classes. A single 497K typically covers one or more specific series and classes within the registrant, identified on EDGAR by series/class IDs.

Who does not file Form 497K

Form 497K is specific to the open-end fund summary prospectus regime. It is not filed by:

  • Closed-end funds (Form N-2), which are not eligible for Rule 498 and use other 497-family submission types.
  • Unit investment trusts (registered on N-8B-2 / S-6).
  • Business development companies, which are not open-end management investment companies.
  • Open-end funds that have not opted into Rule 498, which continue to satisfy Section 5(b)(2) via full statutory prospectus delivery and file 497, 497J, or other 497-family submissions.
  • Private funds, collective investment trusts, separate accounts (other than registered variable product accounts), and bank-maintained pools.
  • Foreign funds not registered under the 1940 Act.

Variable annuity and variable life contract summary prospectuses fall under the separate Rule 498A regime (adopted 2020) and are filed under insurance-product submission types tied to Forms N-3, N-4, and N-6, not as 497K.

Regulatory framework

Form 497K implements Rule 497(k) under the Securities Act of 1933, part of the Rule 497 prospectus-filing family that carries out the prospectus-filing obligations of Section 10 of the Securities Act. Rule 497(k) requires that a summary prospectus relied upon under Rule 498 be filed with the Commission no later than the date of first use.

The substantive content is governed by Items 2 through 8 of Form N-1A (fund name, investment objectives, fees and expenses, principal strategies and risks, past performance, adviser and portfolio managers, purchase/sale procedures, tax information, and payments to intermediaries), combined with the cover-page and required-legend provisions of Rule 498. The summary prospectus is effectively the Items 2 through 8 summary section packaged as a standalone document with legends pointing investors to the statutory prospectus, SAI, and shareholder reports.

The regime was established by the SEC's 2009 rulemaking "Enhanced Disclosure and New Prospectus Delivery Option for Registered Open-End Management Investment Companies" (Investment Company Act Release No. 28584 / Securities Act Release No. 8998, adopted January 13, 2009), which created Rule 498, added Rule 497(k), and amended Form N-1A. The earliest possible 497K filings on EDGAR accordingly begin in early 2009; there is no pre-2009 paper analog.

Triggering events

A registrant files a new Form 497K on each of the following events:

  1. Initial adoption of a summary prospectus for a series or class — for example, at the launch of a new fund, share class, or ETF. The 497K must be filed no later than first use with investors.
  2. Annual updates tied to Form N-1A. Open-end funds file an annual post-effective amendment under Rule 485(a) or Rule 485(b) (aligned with fiscal year end and the 16-month financial-update cycle). Each annual update produces a corresponding refreshed 497K so the short-form document tracks the updated Items 2 through 8 disclosure.
  3. Interim supplements ("stickers") for material changes between annual updates — including changes to fees or expense caps, principal strategies or risks, portfolio managers, adviser or sub-adviser identity, benchmark indices, fund name, or distribution arrangements. A supplemented summary prospectus is filed on 497K to keep the delivered document current.
  4. New share class or new series launches within an existing registrant. A 497K is filed for the new class or series no later than first use.

Timing and filing cadence

Rule 497(k) imposes a first-use deadline, not a periodic schedule. Filings cluster around:

  • the effective date of the annual Rule 485(b) post-effective amendment (routine annual updates);
  • the effective date of a Rule 485(a) amendment containing material changes;
  • the first distribution date of a statutory-prospectus supplement or sticker;
  • the commencement of operations of a new series or class.

Because cadence is driven by the fund's own update cycle and the occurrence of material changes, a stable single-series fund may file only a handful of 497Ks per year, while a large multi-series trust with frequent strategy or personnel changes may file many dozens.

Important distinctions

  • Filer vs. series. The legal filer is the registrant (trust or corporation). The "fund" as investors know it is usually a series within that registrant, and a single registrant often produces many 497K filings per year across its series and classes.
  • Summary vs. statutory prospectus. A 497K contains only the summary prospectus. The full statutory prospectus and SAI are filed separately as part of the N-1A registration statement and its post-effective amendments (e.g., 485APOS, 485BPOS) or as other 497-family definitive filings.
  • Sub-types within the 497 family. The family includes Form 497 (definitive materials), 497J (certification of no material change), and 497AD (certain advertising). This dataset covers only 497K.
  • No 497K/A amendment type. To correct or update a summary prospectus, the registrant files a new 497K; each filing stands on its own as the then-current document for the series or class it covers.
  • Delivery-regime conditions (Rule 498). Filing the 497K is one condition among several: the statutory prospectus, SAI, and most recent shareholder reports must be posted and accessible online at the URL identified in the summary prospectus; paper or electronic copies must be available on request free of charge; and the summary prospectus must meet Rule 498 content, format, and legend requirements. Failure on any condition means the summary prospectus does not satisfy the Section 5(b)(2) delivery obligation.
  • Voluntary election. Absence of 497K filings for a registrant does not imply noncompliance — it typically means the registrant continues to deliver the full statutory prospectus under the traditional regime.

How This Dataset Differs From Similar Datasets or Filings

Form 497K sits inside a dense cluster of mutual-fund disclosures on EDGAR. The most useful comparisons fall into four groups: other Rule 497 sub-types, the Form N-1A registration statement and its 485 amendment pathway, the shareholder- and portfolio-reporting regime (N-CSR/N-CSRS, N-PORT, N-CEN), and prospectus forms for non-open-end fund structures (N-2, N-3, N-4, N-6, N-14).

Other Rule 497 sub-types

Rule 497 is the post-effective filing rule that transmits prospectus materials to EDGAR. The sub-type suffix is load-bearing.

  • Form 497 (base). Definitive prospectus materials under Rule 497(a)-(j): full statutory prospectuses, stickers, and supplements. Heterogeneous in length and structure. 497K is specifically the Rule 497(k) summary prospectus — short and standardized. Filtering 497 for summary prospectuses yields a noisy mix; 497K isolates the population cleanly.
  • Form 497AD. Rule 482 advertising material. Marketing collateral subject to performance-advertising rules, not a statutory delivery vehicle.
  • Form 497H2. Narrow periodic supplements under Rule 497(h); not a full disclosure document.
  • Form 497J. A certification that the prospectus and SAI have not materially changed and no new filing is required. Procedural, with no fund disclosure content.

Among the 497 family, only 497K carries the standardized summary-prospectus content driven by Items 2 through 8 of Form N-1A, which is why it merits a dedicated, parseable dataset.

Form N-1A and the 485 pathway (485APOS, 485BPOS, 485BXT)

Form N-1A is the registration statement for open-end funds; it contains the full statutory prospectus (Part A) plus the Statement of Additional Information (Part B). The 485 series updates it — 485APOS (post-effective amendment subject to SEC review), 485BPOS (automatically effective), 485BXT (effective-date extensions).

497K is not independent disclosure. It is a condensed extract of Items 2 through 8 of the N-1A prospectus, repackaged in the Rule 498 summary format. N-1A and the 485 series are the authoritative long-form source; 497K is the investor-facing short form derived from them. Timing also differs: 485 filings cluster around annual update cycles (485BPOS amendments become effective automatically, while 485APOS amendments carry 60- or 75-day effectiveness windows), whereas 497K is transmitted whenever the summary prospectus itself is refreshed, stickered, or reissued for a new share class. The result is a one-to-many relationship from 485 events to 497K transmissions.

N-CSR, N-CSRS, and the 2024 Tailored Shareholder Report regime

N-CSR (annual) and N-CSRS (semi-annual) are certified shareholder reports carrying financial statements, schedules of investments, management performance discussion, and officer certifications. They are retrospective; 497K is prospective. 497K tells a prospective investor what they are buying; N-CSR tells existing shareholders what happened during the period.

The closest point of confusion is the Tailored Shareholder Report (TSR) regime effective July 2024, filed as an exhibit to N-CSR. TSRs are short, plain-English, visually formatted investor summaries that superficially resemble a 497K. The distinction is function: TSRs replace the long-form annual/semi-annual report and communicate historical results; 497K remains the pre-sale offering summary. The formats resemble each other, but the content does not overlap.

N-PORT and N-CEN

N-PORT (monthly holdings, filed quarterly with the third month public) and N-CEN (annual fund census) are structured data filings delivering position-level holdings, derivatives, liquidity classifications, and registrant-level attributes. They carry no narrative offering content. Overlap with 497K is effectively zero, but the two are complementary for a complete fund profile: 497K supplies the stated strategy, fees, and risk narrative; N-PORT supplies the actual holdings that implement it.

Prospectus forms for other fund structures

Rule 497(k) applies only to open-end management investment companies. Neighboring structures use different forms and, where a summary regime exists, a different rule.

  • Form N-2. Closed-end funds and BDCs. These registrants are not eligible for the Rule 498 summary prospectus and transmit prospectus materials under other 497-family submission types without the "K" suffix. Not in this dataset.
  • Forms N-3, N-4, N-6. Variable annuity and variable life separate accounts. Governed by a separate Rule 498A summary-prospectus regime for variable contracts, distinct from Rule 497(k).
  • Form N-14. Registration for fund mergers and reorganizations. Merger proxy/prospectus content, not ongoing-offering disclosure.

497K is strictly the open-end fund population: mutual funds and ETFs structured as open-end management investment companies.

The three-tier disclosure hierarchy (summary prospectus / statutory prospectus / SAI)

Rule 498 defines a layered-delivery model:

  1. Summary Prospectus (497K): the standardized top layer; prescribed Items 2 through 8 sections.
  2. Statutory Prospectus: Part A of N-1A; full narrative, referenced by and available behind the summary.
  3. SAI: Part B of N-1A; fundamental policies, trustee and portfolio-manager detail, tax and operational disclosures.

497K is the only tier that is its own distinct EDGAR submission type; the SAI and statutory prospectus are embedded in N-1A and its 485 amendments. That regulatory discreteness plus content standardization is why 497K supports a dedicated dataset in a way the other two tiers do not.

497K vs a generic 497 dataset

A generic 497 dataset aggregates every Rule 497 sub-type other than 497K (stickers, supplements, full re-filings, procedural certifications), with highly variable structure. 497K is carved out because the Items 2 through 8 format is standardized and therefore suitable for section-level extraction, fee-table parsing, risk-factor normalization, and performance-table extraction at scale. Users who need consistent, machine-parseable summary-prospectus content should go directly to 497K rather than filtering a broader 497 corpus.

Key differences at a glance

  • vs 497 (base): 497K is one standardized sub-type; 497 is a mixed bag of supplements and full prospectuses.
  • vs 497AD / 497H2 / 497J: advertising, narrow supplements, and certifications — none carry summary-prospectus content.
  • vs N-1A / 485 series: registration statement and its amendments are the authoritative long-form source; 497K is the derived short form.
  • vs N-CSR / N-CSRS / TSR: retrospective shareholder reporting; 497K is prospective offering disclosure.
  • vs N-PORT / N-CEN: structured holdings and census data; no narrative offering content.
  • vs N-2 / N-3 / N-4 / N-6 / N-14: other fund structures and transactions; outside the open-end Rule 497(k) scope.
  • vs SAI and statutory prospectus: both are embedded in N-1A rather than filed as their own submission type; 497K is the only independently submitted tier.

Boundary summary

Form 497K is narrowly defined: the Rule 497(k) summary prospectus for open-end management investment companies, filed from April 2009 forward, containing a condensed extract of Items 2 through 8 of the statutory prospectus. It is not marketing material (497AD), not a sticker or supplement (497, 497H2), not a certification (497J), not the registration statement (N-1A via 485APOS/485BPOS/485BXT), not a shareholder report (N-CSR/N-CSRS/TSR), not holdings or census data (N-PORT/N-CEN), and not applicable to closed-end, variable, or merger contexts (N-2, N-3, N-4, N-6, N-14). Its value lies in regulatory precision and content standardization: it is the single EDGAR submission type that reliably delivers the short-form, investor-facing mutual-fund summary prospectus and nothing else.

Who Uses This Dataset

Form 497K summary prospectuses are the front-line retail disclosure for open-end mutual funds. A continuous corpus from April 2009 onward gives several professional functions a long panel on fees, risks, strategies, advisers, and share-class structure, joined across the structured metadata.json header and the narrative HTML body.

Fund analytics and ratings teams

Analysts parse the fee and expense table (management fee, 12b-1, other expenses, acquired fund fees, total and net expense ratios, waivers, and 1/3/5/10-year example dollars) and the principal investment strategies and principal risks sections to drive category mapping, peer grouping, and scoring. metadata.json.seriesAndClassesContractsInformation[].ticker joins each class to NAV and flow feeds; effectivenessDate anchors the disclosure vintage used at any ratings cutoff. The adviser and sub-adviser block maintains the fund-adviser graph.

Quantitative researchers

Quant teams treat 497K filings as an event stream for fee changes, waiver extensions, new share classes, strategy rewrites, risk additions, benchmark changes, and adviser swaps. Panels are keyed on CIK, series ID, class ID, and ticker. Expense-ratio time series, strategy-text embeddings, and risk-section diffs feed flow-prediction, performance-persistence, and fee-compression models. entities[] and seriesAndClassesContractsInformation[] reconstruct fund-family hierarchies without scraping.

Fund-complex compliance, legal, and disclosure counsel

In-house and outside counsel benchmark peer drafting: principal risk wording, fee-table footnotes, waiver and expense-limitation language, class-structure descriptions, and investment-objective phrasing. When launching a new class or amending a fee schedule, teams filter recent peer 497Ks by effectivenessDate to validate that their own language tracks current market practice.

Product and distribution operations at broker-dealers, RIAs, and platforms

Product ops ingest 497Ks to keep shelves accurate. They reconcile ticker, CUSIP, and class mappings from seriesAndClassesContractsInformation[], detect class launches and closures via filedAt and effectivenessDate, and refresh internal fact sheets, platform screens, and point-of-sale materials from the HTML body. The stream also surfaces adviser changes, fee reductions, and reorganizations that drive shelf approval, breakpoints, and commission grids, and confirms that a current summary prospectus exists before a purchase is allowed.

Advisory platforms and RIA technology providers

Platforms need the current summary prospectus on demand, indexed by ticker, for prospectus delivery, cost-transparency tools, and Reg BI documentation. The dataset backs APIs that return the latest 497K by ticker or CIK/series/class, auto-populates account-opening disclosure bundles, and surfaces the relevant prospectus alongside client holdings.

Regtech and disclosure-monitoring vendors

NLP pipelines perform risk-factor extraction, fee-table parsing, supplement and sticker detection, and revision diffing. Outputs include expense-ratio benchmarking dashboards, risk-taxonomy tracking, and alerts on material edits to principal strategies or adviser identification. Join keys come from CIK, series ID, class ID, ticker, and effectivenessDate.

Academic researchers in finance and asset management

The 2009 start date aligns with the SEC summary prospectus regime, making the corpus a natural panel for fee compression, share-class proliferation, risk-disclosure evolution, strategy drift, adviser-subadviser networks, and disclosure-flow relationships. Researchers extract fee tables, performance, and risk text keyed on class, series, and CIK, then merge with returns and flows. effectivenessDate and entities[] anchor family identity and vintage across longitudinal samples.

Fintech builders of investor tooling

Cost calculators, fee-transparency apps, and fund comparison interfaces populate fields by ticker using seriesAndClassesContractsInformation[].ticker. Fee-table parsing drives cost projections; the strategies paragraph yields category labels and plain-English summaries; the adviser block supplies branding cues.

Litigation support and fund-fee counsel

Section 36(b) excessive-fee teams, mis-selling matters, and class-action counsel use effectivenessDate and filedAt to reconstruct the fee table, waivers, risk disclosures, and adviser identification in force on specific dates. Expert witnesses compile exhibits comparing a defendant fund's fee trajectory to peer funds, document when specific risk language appeared or disappeared, and trace class-specific fee differentials over time.

Across these functions, the join between entities[], seriesAndClassesContractsInformation[], effectivenessDate, filedAt, and the HTML body (fee table, principal risks, performance, adviser) is what makes each workflow scale.

Specific Use Cases

Each use case below anchors to specific fields in metadata.json and to identifiable sections of the Items 2-8 disclosure inside the primary HTML.

1. Building a class-level expense ratio panel for fee-compression studies

Parse the Item 3 Annual Fund Operating Expenses table from every 497K HTML and key each row by seriesAndClassesContractsInformation[].series, .classesContracts[].classContract, and .ticker. Stamp every observation with effectivenessDate to produce a monthly panel of management fee, 12b-1, other expenses, acquired fund fees, gross expense ratio, waiver, and net expense ratio from April 2009 forward. The output drives fee-compression regressions, peer-group median benchmarking, and share-class fee-differential analysis for Section 36(b) expert reports.
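As a rough illustration of the keying step only (not the fee-table parser itself, whose HTML layout varies by filer), the sketch below flattens one metadata.json header into class-level panel keys. It assumes effectivenessDate is an ISO YYYY-MM-DD string; the sample header and its identifiers are hypothetical.

```python
from typing import Iterator


def class_rows(meta: dict) -> Iterator[dict]:
    """Yield one panel key per (series, class) listed in a metadata.json header."""
    # Assumption: effectivenessDate is an ISO "YYYY-MM-DD" string.
    eff = meta.get("effectivenessDate") or ""
    for series in meta.get("seriesAndClassesContractsInformation", []):
        for cls in series.get("classesContracts", []):
            yield {
                "series": series.get("series"),
                "classContract": cls.get("classContract"),
                "ticker": cls.get("ticker"),
                "effectivenessDate": eff,
                "month": eff[:7],  # monthly panel bucket, e.g. "2025-06"
            }


# Hypothetical header, trimmed to the fields used above.
sample_meta = {
    "effectivenessDate": "2025-06-27",
    "seriesAndClassesContractsInformation": [
        {
            "series": "S000000001",
            "classesContracts": [
                {"classContract": "C000000001", "ticker": "AAAAX"},
                {"classContract": "C000000002", "ticker": "AAACX"},
            ],
        }
    ],
}

rows = list(class_rows(sample_meta))
```

Each parsed expense-table row would then be merged onto these keys, so the fee values themselves stay outside this sketch.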

2. Monthly outcome-period tracking for defined-outcome ETFs

Filter accessions where entities[].companyName matches Innovator, First Trust, AllianzIM, Calamos, BlackRock iShares Deep Buffer, or TrueShares, then group sibling accessions within a single monthly ZIP by filer CIK. Extract cap rate, buffer level, floor, and outcome-period start/end dates from the Principal Investment Strategies section, joined to the ETF ticker in seriesAndClassesContractsInformation[]. The result is a structured ladder of active outcome-period series for portfolio construction, laddered-product monitoring, and distribution-shelf maintenance.
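The filter-and-group step might look like the following sketch. The keyword list is a simplified, lowercase subset of the sponsor names above, and the entities[].cik and accessionNo field names are assumptions about the header layout; the sample records are hypothetical.

```python
from collections import defaultdict

# Simplified, lowercase subset of the sponsor names mentioned in the text.
SPONSOR_KEYWORDS = ("innovator", "first trust", "allianzim", "calamos", "trueshares")


def group_by_filer(records, keywords=SPONSOR_KEYWORDS):
    """Group accession numbers by filer CIK for records whose filer name matches."""
    groups = defaultdict(list)
    for meta in records:
        for entity in meta.get("entities", []):
            name = (entity.get("companyName") or "").lower()
            if any(k in name for k in keywords):
                # "cik" and "accessionNo" are assumed field names.
                groups[entity.get("cik")].append(meta.get("accessionNo"))
                break  # count each accession once
    return dict(groups)


# Hypothetical records for illustration.
sample_records = [
    {"accessionNo": "0000000000-25-000001",
     "entities": [{"companyName": "Example Innovator Series Trust", "cik": "1"}]},
    {"accessionNo": "0000000000-25-000002",
     "entities": [{"companyName": "Example Innovator Series Trust", "cik": "1"}]},
    {"accessionNo": "0000000000-25-000003",
     "entities": [{"companyName": "Unrelated Funds Trust", "cik": "2"}]},
]

siblings = group_by_filer(sample_records)
```

Extracting cap, buffer, and outcome-period values from the strategy text is filer-specific and is left out here.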

3. Risk-factor diffing across successive prospectus vintages

For a given series identifier, sort 497K accessions by filedAt and run heading-anchored extraction on the Principal Risks block. Produce a diff between consecutive versions to detect added, removed, or rewritten risk factors (e.g., appearance of crypto-derivatives risk, FLEX options risk, or new concentration language). Alerts feed regtech monitoring dashboards and compliance review of peer-family disclosure practice; the effectivenessDate pair identifies when each change took effect.
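A minimal sketch of the diff step, assuming the Principal Risks block has already been extracted to plain text with one risk heading per line. It uses Python's standard difflib; the sample risk names are illustrative only.

```python
import difflib


def risk_diff(old_text: str, new_text: str) -> dict:
    """Return risk lines added to or removed from the newer vintage."""
    old = [ln.strip() for ln in old_text.splitlines() if ln.strip()]
    new = [ln.strip() for ln in new_text.splitlines() if ln.strip()]
    added, removed = [], []
    for line in difflib.ndiff(old, new):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
        # Lines starting with "  " (unchanged) or "? " (hints) are ignored.
    return {"added": added, "removed": removed}


# Illustrative inputs: one risk heading per line.
older = "Market Risk.\nInterest Rate Risk."
newer = "Market Risk.\nFLEX Options Risk.\nInterest Rate Risk."

changes = risk_diff(older, newer)
```

A rewritten risk factor shows up in both lists (old wording removed, new wording added), so a downstream step would need to pair them if rewrites are to be reported separately.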

4. Current-prospectus-by-ticker API for Reg BI and prospectus delivery

Index every record by each classesContracts[].ticker and keep the accession with the latest effectivenessDate per ticker. Serve the primary HTML on demand so broker-dealers, RIA platforms, and account-opening flows can attach the current summary prospectus to a point-of-sale record, populate Reg BI disclosure bundles, and confirm that a live summary prospectus exists before permitting a buy ticket.
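The indexing rule can be sketched as below, assuming effectivenessDate is an ISO date string (so string comparison orders dates correctly) and that each header carries an accessionNo field; both are assumptions, and the sample records are hypothetical.

```python
def latest_by_ticker(records):
    """Map each ticker to the accession with the latest effectivenessDate."""
    latest = {}
    for meta in records:
        eff = meta.get("effectivenessDate") or ""
        for series in meta.get("seriesAndClassesContractsInformation", []):
            for cls in series.get("classesContracts", []):
                ticker = cls.get("ticker")
                if not ticker:
                    continue  # some classes have no ticker
                current = latest.get(ticker)
                # ISO date strings compare correctly as plain strings.
                if current is None or eff > current["effectivenessDate"]:
                    latest[ticker] = {
                        "accessionNo": meta.get("accessionNo"),  # assumed field
                        "effectivenessDate": eff,
                    }
    return latest


def _header(accession, eff):
    # Hypothetical header builder for the example.
    return {
        "accessionNo": accession,
        "effectivenessDate": eff,
        "seriesAndClassesContractsInformation": [
            {"series": "S000000001",
             "classesContracts": [{"classContract": "C000000001", "ticker": "AAAAX"}]}
        ],
    }


index = latest_by_ticker([
    _header("0000000000-24-000001", "2024-05-01"),
    _header("0000000000-25-000002", "2025-05-01"),
])
```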

5. Share-class and fund-launch event detection

Stream new 497K accessions by monthly container and compare the seriesAndClassesContractsInformation[] block against a rolling registry of known (series, classContract, ticker) tuples. First-time appearance of a C000###### identifier or ticker flags a share-class launch; disappearance across successive filings for the same series flags a closure. Events feed product-ops shelf updates, commission-grid refreshes, and quant flow-prediction models that condition on share-class proliferation.
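One way to sketch the launch check is a set difference against the rolling registry. The registry here is an in-memory set, which is an assumption for illustration; a production registry would be persisted. Sample identifiers are hypothetical.

```python
def class_tuples(meta: dict) -> set:
    """Collect (series, classContract, ticker) tuples from one header."""
    return {
        (s.get("series"), c.get("classContract"), c.get("ticker"))
        for s in meta.get("seriesAndClassesContractsInformation", [])
        for c in s.get("classesContracts", [])
    }


def detect_launches(registry: set, meta: dict) -> set:
    """Return tuples not seen before and add them to the registry in place."""
    new = class_tuples(meta) - registry
    registry.update(new)
    return new


registry = set()

first_filing = {"seriesAndClassesContractsInformation": [
    {"series": "S000000001",
     "classesContracts": [{"classContract": "C000000001", "ticker": "AAAAX"}]}]}
later_filing = {"seriesAndClassesContractsInformation": [
    {"series": "S000000001",
     "classesContracts": [{"classContract": "C000000001", "ticker": "AAAAX"},
                          {"classContract": "C000000002", "ticker": "AAACX"}]}]}

first = detect_launches(registry, first_filing)
second = detect_launches(registry, later_filing)
```

Because whole tuples are compared, a ticker change on an existing class also surfaces as a new tuple, which matches the "identifier or ticker" rule above but may need de-duplication by class ID downstream.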

6. Adviser and sub-adviser network reconstruction

Extract the Item 5 Investment Adviser block from the HTML and join to entities[] (filer CIK and file number) and seriesAndClassesContractsInformation[].series. Accumulated over the full 2009-present window, this yields a longitudinal bipartite graph of advisers, sub-advisers, and fund series, with portfolio-manager names and tenures. Detecting adviser swaps via changes in the Item 5 text between accessions for the same series supports subadvisory-mandate tracking and Item 4.01-style manager-change monitoring for fund complexes.

7. Supplement-versus-full-prospectus classification for corpus curation

Use HTML byte size, section-heading coverage, and the presence of a complete Items 2-8 sequence to split the corpus into full summary prospectuses and shorter stickers/supplements. Route full prospectuses to fee-table, performance-chart, and risk-extraction pipelines; route supplements to a narrower change-event extractor that captures cap-rate resets, fee-waiver extensions, portfolio-manager corrections, and benchmark substitutions. The split keeps downstream NLP pipelines from misreading 7 KB stickers as structured prospectuses.
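A crude version of this split is sketched below. The heading list follows the usual Form N-1A summary-section captions, but filers vary their wording, and both thresholds are hypothetical values that would need tuning against labeled samples.

```python
import re

# Typical summary-section captions, lowercase. Filers vary the wording, so
# treat this list as an assumption rather than a fixed vocabulary.
EXPECTED_HEADINGS = (
    "investment objective",
    "fees and expenses",
    "principal investment strategies",
    "principal risks",
    "performance",
    "management",
    "purchase and sale of fund shares",
    "tax information",
    "payments to broker-dealers and other financial intermediaries",
)


def classify(html: str, min_bytes: int = 20_000, min_headings: int = 6) -> str:
    """Label a document 'full' or 'supplement' from size and heading coverage."""
    text = re.sub(r"<[^>]+>", " ", html)           # drop tags
    text = re.sub(r"\s+", " ", text).lower()       # normalize whitespace
    hits = sum(1 for heading in EXPECTED_HEADINGS if heading in text)
    size = len(html.encode("utf-8"))
    return "full" if hits >= min_headings and size >= min_bytes else "supplement"


# Synthetic inputs for illustration only.
sticker_html = ("<html><body><p>Supplement to the Summary Prospectus. "
                "The portfolio manager has changed.</p></body></html>")
full_html = "".join(
    f"<h2>{h.title()}</h2><p>{'x' * 3000}</p>" for h in EXPECTED_HEADINGS
)

labels = (classify(sticker_html), classify(full_html))
```

Generic words such as "performance" and "management" also appear in body text, so heading-anchored matching (for example, restricting to bold or heading elements) would be a natural refinement.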

Dataset Access

The Form 497K Files Dataset can be accessed programmatically through a JSON index endpoint, downloaded in full as a single archive, or retrieved one monthly container at a time. Containers are organized by month, starting from April 2009.

Dataset Index JSON API: https://api.sec-api.io/datasets/form-497k-files.json

Returns dataset-level metadata (name, description, last updated timestamp, earliest sample date, total records, total size, form types, container format, and file types) along with a containers array listing every per-period ZIP archive with its key, download URL, size, record count, and last-modified timestamp. This endpoint does not require an API key and can be polled to monitor which monthly containers were refreshed in the most recent run, enabling incremental downloads instead of re-fetching the entire dataset.

Example response:

{
  "datasetId": "1f13365b-9ade-61dc-9da2-e4b13255c3bd",
  "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-497k-files.zip",
  "name": "Form 497K Files Dataset",
  "updatedAt": "2026-04-22T02:58:37.780Z",
  "earliestSampleDate": "2009-04-01",
  "totalRecords": 323522,
  "totalSize": 3665685620,
  "formTypes": ["497K"],
  "containerFormat": "ZIP",
  "fileTypes": ["HTML", "JSON", "TXT", "PDF"],
  "containers": [
    {
      "downloadUrl": "https://api.sec-api.io/datasets/form-497k-files/2025/2025-06.zip",
      "key": "2025/2025-06.zip",
      "size": 13818783,
      "records": 154,
      "updatedAt": "2026-04-22T02:58:37.780Z"
    }
  ]
}
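The incremental-download idea can be sketched as follows: fetch the index, then keep only containers whose updatedAt differs from the value recorded on the previous run. The last_seen mapping is a hypothetical local state store; the field names come from the example response.

```python
import json
import urllib.request

INDEX_URL = "https://api.sec-api.io/datasets/form-497k-files.json"


def fetch_index(url: str = INDEX_URL) -> dict:
    """Fetch the dataset index (no API key needed for this endpoint)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def refreshed_containers(index: dict, last_seen: dict) -> list:
    """Return containers that are new or whose updatedAt changed since last run."""
    return [
        c for c in index.get("containers", [])
        if last_seen.get(c["key"]) != c["updatedAt"]
    ]


# Offline example using the shape of the response shown in this section.
sample_index = {
    "containers": [
        {"key": "2025/2025-06.zip",
         "downloadUrl": "https://api.sec-api.io/datasets/form-497k-files/2025/2025-06.zip",
         "updatedAt": "2026-04-22T02:58:37.780Z"}
    ]
}
last_seen = {}  # hypothetical state: container key -> updatedAt from the prior run

pending = refreshed_containers(sample_index, last_seen)
last_seen.update({c["key"]: c["updatedAt"] for c in pending})
pending_again = refreshed_containers(sample_index, last_seen)
```

In a live run, `refreshed_containers(fetch_index(), last_seen)` would replace the offline sample, and each pending downloadUrl would be fetched with the API key appended.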

Download Entire Dataset: https://api.sec-api.io/datasets/form-497k-files.zip?token=YOUR_API_KEY

Downloads the complete dataset as one ZIP archive containing every monthly container. This endpoint requires an SEC API key, passed via the token query parameter or an Authorization header. Use this option for one-time bulk ingestion; for ongoing updates prefer the per-container approach below.

Download Single Container: https://api.sec-api.io/datasets/form-497k-files/2025/2025-06.zip?token=YOUR_API_KEY

Downloads one monthly container ZIP. Each container extracts to a YYYY-MM/ directory containing one subdirectory per accession number, which holds the metadata file and all documents from the original EDGAR submission (excluding image files). This endpoint requires an SEC API key, passed via the token query parameter or an Authorization header.
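Once a container is on disk, its headers can be read without extracting the archive. The sketch below builds a container URL from the pattern shown above and walks a ZIP for metadata.json members; the in-memory archive and the primary.htm file name are stand-ins for a real download.

```python
import io
import json
import zipfile


def container_url(year: int, month: int, token: str) -> str:
    """Build a monthly container URL following the pattern documented above."""
    return (f"https://api.sec-api.io/datasets/form-497k-files/"
            f"{year}/{year}-{month:02d}.zip?token={token}")


def iter_metadata(container):
    """Yield (accession_dir, header) for each metadata.json in a container ZIP.

    `container` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(container) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            if len(parts) >= 2 and parts[-1] == "metadata.json":
                yield parts[-2], json.loads(zf.read(name))


# Build a tiny in-memory container for illustration (hypothetical contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("2025-06/0000000000-25-000001/metadata.json",
                json.dumps({"formType": "497K"}))
    zf.writestr("2025-06/0000000000-25-000001/primary.htm", "<html></html>")
buf.seek(0)

found = list(iter_metadata(buf))
url = container_url(2025, 6, "YOUR_API_KEY")
```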

Frequently Asked Questions

What form does this dataset cover?

The dataset covers Form 497K, the Rule 497(k) submission sub-type used to transmit a summary prospectus under Rule 498 under the Securities Act of 1933. Every record has formType equal to the string "497K"; the dataset includes no other form types.

What does one record in this dataset represent?

One record represents a single EDGAR submission of Form 497K, materialized as a per-accession subdirectory inside a monthly ZIP container. Each subdirectory typically holds two files: a normalized metadata.json header and one SGML-wrapped HTML document containing the summary-prospectus text.

Who is required to file Form 497K?

Form 497K is filed by registered open-end management investment companies — traditional U.S. mutual funds and exchange-traded funds structured as open-end funds — that have affirmatively elected the Rule 498 summary prospectus option. Election is voluntary, so registrants that continue to deliver the full statutory prospectus do not appear in the 497K population.

What time period does the dataset cover?

The dataset begins on 2009-04-01 and extends to the present. That start date aligns with the effective date of the summary prospectus regime under Release No. 33-8998 (operative 13 March 2009); there is no pre-2009 paper analog for Form 497K.

What file format is the dataset distributed in?

The dataset is distributed as monthly ZIP containers named YYYY-MM.zip. Inside each container, every accession folder contains an HTML primary document (SGML-wrapped) and a metadata.json header. The dataset advertises support for HTML, JSON, TXT, and PDF file types, though 497K submissions in practice use HTML almost universally.

How does Form 497K differ from Form 497 without the "K" suffix?

Form 497 (base) carries definitive prospectus materials under Rule 497(a)-(j) — full statutory prospectuses, stickers, and supplements of highly variable structure. Form 497K is specifically the Rule 497(k) summary prospectus: short, standardized, and governed by Items 2 through 8 of Form N-1A. 497K isolates the parseable summary-prospectus population that a generic 497 filter would dilute with heterogeneous content.

Why is linkToXbrl always empty and dataFiles[] always []?

Summary prospectuses filed under 497K do not carry interactive-data attachments. The structured Risk/Return Summary XBRL that corresponds to the same disclosure is filed separately under the companion registration-statement amendment — typically 485BPOS — as an interactive-data exhibit under Regulation S-T. The empty XBRL fields are a normal state of every 497K record, not a gap in the dataset.