Form 497K3A Files Dataset

The Form 497K3A Files dataset is a closed historical corpus of mutual fund profile filings made on EDGAR under submission type 497K3A. Each record is one EDGAR accession submitted by an open-end management investment company registered on Form N-1A, packaging the fund profile authorized by the original Rule 498 — a concise, plain-English alternative to the full statutory prospectus that the SEC permitted from December 1998 until the profile framework was replaced by the summary prospectus regime in March 2009. The legal filer is the registrant (typically a Massachusetts business trust, Delaware statutory trust, or Maryland corporation), and a single 497K3A submission may carry profile content for multiple share classes or multiple series of an umbrella trust. The dataset's earliest sample date is December 1, 1998, and coverage tracks the full operational lifetime of the form through its retirement in March 2009. Files are distributed as monthly ZIP containers; record content includes a parsed metadata.json header alongside each non-image body document from the original EDGAR submission, in TXT, JSON, and HTML form.

Update Frequency
Daily
Updated at
2026-04-16
Earliest Sample Date
1998-12-01
Total Size
1.3 MB
Total Records
52
Container Format
ZIP
Content Types
TXT, JSON, HTML
Form Types
497K3A

Dataset APIs

Programmatically retrieve the full list of dataset archive files, download URLs and dataset metadata.

Dataset Index JSON API

Download the entire dataset as a single archive file.

Download Entire Dataset:

Download a single container file (e.g. monthly archive) from the dataset.

Download Single Container:

Dataset Files

34 files · 1.3 MB
File          Size      Records
2009-02.zip   49.4 KB   1
2008-11.zip   49.3 KB   1
2008-10.zip   49.8 KB   2
2008-08.zip   49.2 KB   1
2008-05.zip   43.3 KB   1
2008-02.zip   45.8 KB   1
2007-11.zip   43.1 KB   1
2007-09.zip   24.0 KB   1
2007-08.zip   40.7 KB   1
2007-05.zip   38.9 KB   1
2007-02.zip   37.1 KB   1
2006-11.zip   36.9 KB   1
2006-08.zip   36.7 KB   1
2006-05.zip   37.0 KB   1
2006-01.zip   58.5 KB   2
2005-11.zip   36.9 KB   1
2005-08.zip   38.6 KB   1
2005-05.zip   37.5 KB   1
2005-03.zip   35.3 KB   1
2004-11.zip   37.3 KB   1
2004-09.zip   36.5 KB   1
2004-05.zip   34.3 KB   1
2004-02.zip   53.6 KB   2
2003-11.zip   54.8 KB   2
2003-08.zip   30.8 KB   1
2003-05.zip   30.4 KB   1
2000-11.zip   7.8 KB    1
2000-05.zip   13.8 KB   1
2000-02.zip   14.1 KB   2
1999-11.zip   31.1 KB   5
1999-08.zip   5.3 KB    1
1999-07.zip   18.5 KB   3
1999-01.zip   92.4 KB   8
1998-12.zip   9.3 KB    1

What This Dataset Contains

The Form 497K3A Files dataset captures every Form 497K3A filing accepted by EDGAR during the form's operational window of just over ten years. Form 497K3A is a Rule 497 filing under the Securities Act of 1933 used by open-end management investment companies registered on Form N-1A to file a fund "profile": the concise, plain-English alternative to a full statutory prospectus authorised by SEC Rule 498 in 1998. The "K3A" suffix corresponds to Rule 497(k)(1)(iii)(A), the filing provision cited in EDGAR's description of the form; the profile's required content and its standardised question-and-answer ordering were prescribed by Rule 498. A 497K3A submission is the public dissemination vehicle for that profile: the actual document filed is the profile itself, not a registration statement, and its role is to satisfy prospectus delivery for investors who elected the profile alternative.

The form was operative from December 1998, when Rule 498 first took effect, until March 2009, when the SEC replaced the profile framework with the summary prospectus regime under amended Rule 498 (the "Form N-1A summary prospectus"). The dataset's temporal envelope tracks the full operational lifetime of the form; no records exist after the March 2009 retirement, and the dataset is closed-ended and historical. It is distributed as monthly ZIP archives, with per-record content shipped as JSON metadata plus the original-name body documents (TXT or HTML) preserved inside their EDGAR SGML envelopes. Image attachments listed in the EDGAR submission as GRAPHIC documents are intentionally excluded, but their inventory entries remain in the per-record metadata so that consumers can fetch them on demand from EDGAR.

Content Structure of a Single Record

What one record represents

One record in the Form 497K3A Files dataset is a single EDGAR filing of submission type 497K3A, identified by its accession number and materialised on disk as one accession-keyed folder. The folder bundles a parsed-header metadata.json together with every non-image document that was part of the original EDGAR submission, each preserved under its original filename and inside its original EDGAR SGML envelope. The unit of record is therefore the filing as a whole — not an extracted section, not a per-document row, not a per-fund or per-share-class observation. When a single 497K3A submission carried profiles for multiple share classes or multiple portfolios in one umbrella document, all of that content remains together in the same record because the underlying EDGAR submission was itself a single accession.

Container and on-disk layout

The dataset is distributed as monthly ZIP archives organised under a YYYY/YYYY-MM.zip path scheme. Extraction yields a top-level YYYY-MM/ directory whose immediate children are accession-number folders. Folder names use the path-safe 18-digit form of the accession number with no dashes (e.g. 000031321207000095), while metadata.json stores the canonical hyphenated form (0000313212-07-000095). Inside each accession folder sit metadata.json and one or more original-name body documents from the EDGAR submission. Because 497K3A is a low-volume form throughout its life, monthly containers are small.
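The layout above can be walked with the Python standard library alone. The following is a minimal sketch, assuming member paths of the form YYYY-MM/<accession folder>/<file> as described; the function name list_accessions is illustrative and not part of any dataset tooling, and the demo archive is synthetic (the month and the pairing of folder and file name are made up for the example).

```python
import io
import zipfile
from collections import defaultdict


def list_accessions(zip_source):
    """Map each accession folder in a monthly ZIP to the file names it holds.

    Assumes member paths look like 'YYYY-MM/<18-digit accession>/<file>'.
    """
    records = defaultdict(list)
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Skip directory entries and anything not nested exactly two levels deep.
            if len(parts) != 3 or not parts[2]:
                continue
            _month, accession, filename = parts
            records[accession].append(filename)
    return dict(records)


# Synthetic in-memory archive that mimics the documented layout.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("2007-05/000031321207000095/metadata.json", "{}")
    demo.writestr("2007-05/000031321207000095/inteqpro07ame.htm", "<DOCUMENT>")
buffer.seek(0)
print(list_accessions(buffer))
```

Reading directly from the archive avoids extracting to disk, which is convenient here because every monthly container is small.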

The file types present in the dataset are TXT, JSON, and HTML. JSON is reserved for the per-record metadata.json. HTML (most often with an .htm extension) is the dominant body-document format from the mid-2000s onward, while flat TXT is most common in the earliest years of the form. Image attachments listed in the EDGAR submission as GRAPHIC documents (typically .gif or .jpg) are intentionally excluded from the archive, although their inventory entries remain in metadata.json with their original filenames and direct EDGAR URLs so consumers can fetch them on demand.

metadata.json — parsed EDGAR header

metadata.json is a compact JSON object derived from the EDGAR submission header and document index. Its fields describe the filing and enumerate every document EDGAR received, regardless of whether the dataset ships that document.

  • formType — always 497K3A for this dataset.
  • accessionNo — the canonical hyphenated accession number, e.g. 0000313212-07-000095.
  • filedAt — an ISO-8601 timestamp with timezone offset capturing the EDGAR acceptance time.
  • description — the standard Rule 497 description string ("Form 497K3A — Profiles for certain open-end management investment companies, [Rule 497(k)(1)(iii)(A)]").
  • linkToFilingDetails, linkToTxt, linkToHtml, linkToXbrl — back-references to the EDGAR filing detail page, the full submission .txt, the rendered HTML, and the XBRL instance. linkToXbrl is empty for every record because XBRL was never required for 497K3A filings.
  • documentFormatFiles — an array of objects describing each document in the EDGAR submission: sequence number, document type (e.g. 497K3A, GRAPHIC), original filename, byte size, and document URL on EDGAR. The final entry typically points at the complete *.txt submission file. GRAPHIC entries remain listed even though their bytes are not shipped, providing a complete inventory and EDGAR URLs for downstream retrieval.
  • entities — an array of filer entity objects. Each entity carries cik, companyName (suffixed with the EDGAR role marker, e.g. (Filer)), fileNo (1933 Act file number), irsNo, fiscalYearEnd (as MMDD), act (commonly 33 for the Securities Act of 1933), type (the form type as reported by EDGAR for that entity), and filmNo (the SEC film identifier).
  • dataFiles — an empty array for this form type, because 497K3A carries no XBRL data files.
  • id — an opaque internal identifier used for deduplication and indexing.
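As an illustration of how the document inventory can be used, the sketch below lists the EDGAR URLs of GRAPHIC entries that are recorded in metadata.json but not shipped. The top-level field documentFormatFiles comes from the description above; the per-document key names "type" and "documentUrl" are assumptions made for this example and should be checked against a real metadata.json. The sample object and its URLs are synthetic.

```python
import json


def missing_graphics(metadata, type_key="type", url_key="documentUrl"):
    """Return EDGAR URLs of GRAPHIC documents that are listed but not shipped.

    type_key and url_key are ASSUMED key names, not confirmed by the dataset
    documentation; adjust them after inspecting a real metadata.json.
    """
    urls = []
    for doc in metadata.get("documentFormatFiles", []):
        if doc.get(type_key) == "GRAPHIC" and doc.get(url_key):
            urls.append(doc[url_key])
    return urls


# Synthetic sample; the URLs are placeholders, not real EDGAR links.
sample = json.loads("""
{
  "formType": "497K3A",
  "accessionNo": "0000313212-07-000095",
  "documentFormatFiles": [
    {"type": "497K3A", "documentUrl": "https://example.invalid/profile.htm"},
    {"type": "GRAPHIC", "documentUrl": "https://example.invalid/logo.gif"}
  ],
  "dataFiles": []
}
""")
print(missing_graphics(sample))
```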

Body documents — SGML envelope around HTML or plain text

Each non-image document listed in documentFormatFiles is shipped under its original EDGAR filename in the accession folder. Despite the .htm extension on most modern filings, the bytes on disk are not pure HTML: they are EDGAR SGML, with the document body bracketed by an opening <DOCUMENT> block whose header tags are unclosed in the EDGAR style and a <TEXT> payload region. Only </TEXT> and </DOCUMENT> carry explicit close tags:

<DOCUMENT>
<TYPE>497K3A
<SEQUENCE>1
<FILENAME>inteqpro07ame.htm
<TEXT>
... document payload ...
</TEXT>
</DOCUMENT>

Inside <TEXT>, the payload is either an HTML document (mid-2000s onward) or a flat ASCII text block with EDGAR-style monospaced table markup (early-era filings). To parse the file as HTML, consumers strip the leading <DOCUMENT>…<TEXT> envelope and the trailing </TEXT></DOCUMENT> lines and feed the inner content to an HTML parser.
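The stripping step can be sketched in a few lines of Python. This is one possible approach, not the dataset's own tooling: it assumes a single <TEXT>...</TEXT> region per file and reads only the three header tags shown in the example above. The function name split_sgml_document is hypothetical.

```python
import re

# Header tags in the EDGAR style: '<TAG>value' on one line, with no close tag.
HEADER_TAG = re.compile(r"^<(TYPE|SEQUENCE|FILENAME)>(.*)$", re.MULTILINE)


def split_sgml_document(raw):
    """Split one SGML-wrapped body file into (header dict, inner payload)."""
    start = raw.find("<TEXT>")
    end = raw.rfind("</TEXT>")
    if start == -1 or end == -1 or end < start:
        raise ValueError("no <TEXT>...</TEXT> region found")
    # Everything before <TEXT> is the envelope header.
    header = {
        tag.lower(): value.strip()
        for tag, value in HEADER_TAG.findall(raw[:start])
    }
    # Everything between <TEXT> and the last </TEXT> is the payload.
    payload = raw[start + len("<TEXT>"):end].strip("\r\n")
    return header, payload


raw = (
    "<DOCUMENT>\n<TYPE>497K3A\n<SEQUENCE>1\n<FILENAME>inteqpro07ame.htm\n"
    "<TEXT>\n<html><body>Profile</body></html>\n</TEXT>\n</DOCUMENT>\n"
)
header, payload = split_sgml_document(raw)
print(header)
print(payload)
```

The returned payload can then be handed to an HTML parser for modern filings or treated as plain text for early-era filings.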

The HTML payloads characteristic of profile filings are heavily print-oriented. They rely on inline <font> and <div style="…"> markup with explicit point sizes, named typefaces (e.g. Berkeley Book, Trajan, MetaPlusLF-MediumRoman), explicit color attributes, and numeric HTML entities such as &#160; (non-breaking space) and &#151; (em-dash). EDGAR redlining markers carried over from authoring tools — escaped &lt;R&gt;&lt;/R&gt; pairs — frequently bracket recently revised passages. Embedded <img src="…gif"> tags remain in the HTML and refer to the omitted GRAPHIC files; the URLs in metadata.json make those images recoverable from EDGAR.
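A hedged sketch of handling two of these features: it removes the escaped redlining markers while they are still escaped (so they never turn into stray tags) and collects the src values of <img> tags so that missing graphics can be matched against the inventory in metadata.json. Removing the markers is only one option; a consumer may prefer to keep them as literal text. The names REDLINE, ImageCollector, and clean_and_list_images are illustrative.

```python
import re
from html.parser import HTMLParser

# Escaped redline markers as they appear in the payload: &lt;R&gt; and &lt;/R&gt;
REDLINE = re.compile(r"&lt;/?R&gt;")


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def clean_and_list_images(payload):
    """Return (payload without redline markers, list of <img> src values)."""
    without_redlines = REDLINE.sub("", payload)
    collector = ImageCollector()
    collector.feed(without_redlines)
    collector.close()
    return without_redlines, collector.sources


demo = '<p>&lt;R&gt;Revised text&lt;/R&gt;</p><img src="logo.gif">'
cleaned, images = clean_and_list_images(demo)
print(cleaned)
print(images)
```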

Encoding is ASCII with numeric HTML entities for any non-ASCII characters. There is no byte-order mark and no explicit <meta charset> declaration. Line endings are predominantly \n. Body files are often a single long logical line of HTML, with newline-terminated breaks limited to the SGML wrapper itself, so line-oriented tools may report only a handful of lines for files that are hundreds of kilobytes long.

Typical content of a 497K3A profile

Rule 498 fixed the profile's required disclosures and ordered them as a short series of plain-language questions, so most 497K3A documents follow a recognisable internal sequence:

  1. Cover identifying block — fund name, share class designators, fund family, profile date, and a marker that the document is a Rule 498 fund profile. In modern filings this is a styled HTML banner; in early filings it is plain ASCII.
  2. Investment objectives — the fund's stated investment objectives, typically a single sentence or short paragraph, sometimes flagged as fundamental or non-fundamental.
  3. Principal investment strategies — narrative description of how the fund pursues its objective: asset classes, geographic focus, capitalisation ranges, derivatives use, security selection process.
  4. Principal risks — bulleted or paragraph-form enumeration of the principal risks of investing, including market, interest-rate, credit, foreign-investment, currency, and any fund-specific risk factors.
  5. Past performance — a bar chart of annual total returns and an average-annual-total-return table comparing the fund to a benchmark index, typically with a "best quarter / worst quarter" call-out and the standard disclaimer that past performance is not indicative of future results.
  6. Fee and expense table — the standardised shareholder fees table and the annual fund operating expenses table (management fees, distribution/12b-1 fees, other expenses, total annual operating expenses), commonly followed by an example illustrating dollar costs over 1, 3, 5, and 10 years.
  7. Investment adviser and portfolio managers — short identification of the investment adviser and key portfolio personnel.
  8. Purchase and sale of fund shares — minimum investment amounts, share-class descriptions, transaction channels, and redemption mechanics.
  9. Tax information — a brief summary of how distributions are taxed.
  10. Shareholder services: a brief description of the other services the fund makes available to its shareholders.

Filings range from compact single-fund profiles to multi-share-class or multi-portfolio profiles in which several funds share one umbrella document. Tabular sections (fees, performance) are rendered with HTML tables in modern filings and with monospaced ASCII alignment in early-era filings.
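Because the profile follows a recognisable sequence, section boundaries can be approximated by searching the text for heading phrases. The sketch below is a rough illustration under stated assumptions: the patterns are example phrasings rather than a complete list, real filings vary in wording, and the first match of a phrase may be a cross-reference rather than the section heading. The tag stripper is deliberately crude and is not a substitute for a real HTML parser.

```python
import html
import re

# Example heading phrases; these are assumptions and will need tuning per fund family.
SECTION_PATTERNS = {
    "objective": r"investment objectives?",
    "strategies": r"principal (investment )?strategies",
    "risks": r"principal risks",
    "performance": r"past performance",
    "fees": r"fees and expenses",
}


def to_text(payload):
    """Crude conversion: drop tags, decode entities, collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", payload)
    return re.sub(r"\s+", " ", html.unescape(no_tags)).strip()


def locate_sections(text):
    """Return {section name: offset of first match} for patterns that occur."""
    found = {}
    for name, pattern in SECTION_PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            found[name] = match.start()
    return found


payload = (
    "<div><font size=2>Investment&#160;Objective</font></div><p>Growth.</p>"
    "<b>Principal Risks</b><p>Market risk.</p><b>Fees and Expenses</b>"
)
sections = locate_sections(to_text(payload))
print(sections)
```

Sorting the returned offsets gives approximate section spans; slicing the text between consecutive offsets yields one chunk per detected section.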

What the dataset includes

Each record contains the complete parsed header (metadata.json), every non-image body document from the EDGAR submission preserved in its original SGML-wrapped form under its original filename, and the full document inventory in documentFormatFiles listing every file EDGAR received including those not shipped. The hyphenated and non-hyphenated accession numbers, filer CIK and entity attributes, file number, IRS number, fiscal year end, film number, and EDGAR-side URLs are all retained.

What is excluded

Image documents listed as GRAPHIC in EDGAR (typically .gif and .jpg files used for logos, charts, and decorative typography) are intentionally omitted from the accession folder. Their entries remain in metadata.json's documentFormatFiles array with their original filenames and direct EDGAR URLs, allowing any consumer to retrieve them from EDGAR if needed. No XBRL instance documents exist for this form because 497K3A was never subject to XBRL tagging requirements; dataFiles is uniformly empty and linkToXbrl is uniformly an empty string.

The dataset packages per-document files individually rather than the full EDGAR submission .txt concatenation. The metadata still records a reference and URL to the original full submission text file, but that consolidated artefact is not duplicated alongside the per-document files.

Evolution of required content over time

The 497K3A profile's core content requirements were fixed by Rule 498 at the form's inception in December 1998 and remained largely stable across the form's life of just over ten years. The set of disclosures (objective, strategies, risks, performance, fees, purchase/redemption procedures) and their ordered question-style presentation persisted from the earliest 1998 filings through the final filings in early 2009. Substantive changes were incremental: increased granularity in risk disclosure, evolution of fee-table conventions in line with broader Form N-1A amendments, and gradual standardisation of the performance-table benchmark and example formats. The form was discontinued when the SEC restructured prospectus delivery around the new summary prospectus under amended Rule 498, which absorbed the profile's role and triggered the removal of 497K3A as a valid EDGAR submission type.

Evolution of file format over time

Across the form's lifetime the on-disk presentation of the body document evolved while the SGML envelope remained constant.

  • December 1998 through the early 2000s: profiles were typically filed as plain ASCII inside the EDGAR SGML <TEXT> block, often within a <TYPE>497K3A document whose payload used monospaced text and EDGAR table conventions for fee and performance tables. The body document is therefore a flat text file in this era.
  • Mid-2000s onward: filings shifted to HTML payloads embedded inside the same <DOCUMENT>…<TEXT>…</TEXT></DOCUMENT> envelope. HTML markup is print-oriented and heavily inline-styled, with named typefaces, explicit point sizes, color attributes, numeric entities, and embedded <img> references to GRAPHIC files. EDGAR redlining tags surface as escaped &lt;R&gt; markers around revised passages.
  • Through the March 2009 retirement: HTML payloads continue to grow in typographic complexity but the SGML wrapper convention is unchanged, so the same parsing strategy — strip the wrapper, parse the inner HTML — applies uniformly to all modern-era records.

Interpretation notes

Several nuances matter for working with these records.

  • Body documents look like HTML by extension but are EDGAR SGML at the byte level. HTML parsing must be preceded by a step that removes the <DOCUMENT>, <TYPE>, <SEQUENCE>, <FILENAME>, and <TEXT> opening tags and the closing </TEXT></DOCUMENT> lines.
  • Escaped &lt;R&gt; redlining markers may interfere with naive HTML cleanup if a consumer unescapes entities indiscriminately; preserving them as literal text is usually safer.
  • <img> references inside the HTML point at files that are not in the accession folder, so any rendering pipeline must either suppress the missing images or fetch them from the EDGAR URLs preserved in documentFormatFiles.
  • The inline-styled, font-attribute-heavy HTML of modern filings is optimised for visual fidelity rather than semantic structure, so section detection generally relies on text patterns ("Investment Objective", "Principal Risks", "Fees and Expenses") rather than on consistent heading tags.
  • The dominant single-line HTML layout means line-based tools (wc -l, naive line splitters) will dramatically under-report the size and structure of the payload; byte- or token-oriented tooling is more reliable.
  • Multi-fund or multi-class profiles place several logical fund records inside one filing-level record; downstream extraction at the fund or share class level requires section-aware parsing rather than a one-record-per-fund assumption.
  • Accession numbers appear in two forms across the record — the path-safe 18-digit folder name and the canonical hyphenated form inside metadata.json — and joining the two requires inserting (or removing) the standard NNNNNNNNNN-YY-NNNNNN dashes.
  • Because the form was discontinued in March 2009, no current filings exist; the dataset is closed-ended and historical, and downstream pipelines should not expect new accessions.
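The accession-number join mentioned above is a fixed-position transformation. A minimal sketch follows, with hypothetical helper names to_hyphenated and to_folder_name:

```python
import re


def to_hyphenated(folder_name):
    """Convert an 18-digit folder name to the NNNNNNNNNN-YY-NNNNNN form."""
    if not re.fullmatch(r"\d{18}", folder_name):
        raise ValueError(f"expected 18 digits, got {folder_name!r}")
    return f"{folder_name[:10]}-{folder_name[10:12]}-{folder_name[12:]}"


def to_folder_name(accession_no):
    """Convert a hyphenated accession number to the 18-digit folder name."""
    if not re.fullmatch(r"\d{10}-\d{2}-\d{6}", accession_no):
        raise ValueError(f"unexpected accession format: {accession_no!r}")
    return accession_no.replace("-", "")


print(to_hyphenated("000031321207000095"))
print(to_folder_name("0000313212-07-000095"))
```

Validating the input shape before converting keeps a malformed folder name from silently producing a wrong join key.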

Who Files or Publishes This Dataset, and When

Who files

Each Form 497K3A submission is made by an open-end management investment company registered on Form N-1A — in practice, a mutual fund. The legal filer is the registrant itself, typically organized as a Massachusetts business trust, Delaware statutory trust, or Maryland corporation. A single registrant often operates as a series trust with many funds and share classes, so one 497K3A accession may carry profile content for one or several series.

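The form is not used by registrants outside the N-1A population: the original Rule 498 profile option was available only to open-end funds registered on Form N-1A.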

A filing agent or financial printer normally transmits the submission to EDGAR, but the disclosure obligation rests with the registrant and, indirectly, the principal underwriter that distributes the shares. Officers and trustees of the registrant are responsible for the underlying profile content.

Regulatory basis: Rule 497 plus the original Rule 498 profile

Form 497K3A is a submission type under Rule 497 of the Securities Act of 1933, the rule that requires registered investment companies to file with the SEC the prospectus-related materials they actually use in offering their shares.

The "K" family of 497 suffixes was created to carry fund profiles authorized under the original Rule 498, adopted by the SEC in March 1998. That version of Rule 498 let an N-1A-registered fund prepare a short, standardized profile summarizing:

  • investment objectives and principal investment strategies
  • principal risks
  • fee table and past performance
  • identity of the investment adviser
  • purchase, redemption, distribution, and tax information
  • shareholder services

Funds could deliver this profile to prospective investors in lieu of the full statutory prospectus, provided the statutory prospectus was made available on request and was delivered with the confirmation of the initial purchase.

Within the 497K suffix family, the trailing characters identified the role of the profile being filed (initial profile, revised profile, profile filed alongside other materials, etc.). 497K3A was one of these operational sub-codes for profile filings made against an already-effective N-1A registration statement. All 497K-suffix submissions in this 1998-2009 regime share the same legal authority: a Rule 498 profile filed pursuant to Rule 497.

Triggering event

A 497K3A filing is event-driven, not periodic. Rule 497 required any prospectus or profile used after the registration statement's effective date to be filed no later than the date it was first used. The trigger is therefore the fund's actual deployment of the profile — once the fund, through its principal underwriter or selling intermediaries, began distributing the profile, the registrant was obligated to file the corresponding 497K3A on EDGAR.

In practice, profiles were refreshed alongside the fund's annual N-1A update and its statutory prospectus, so 497K3A filings cluster around annual prospectus cycles, with additional filings whenever a profile was reissued or corrected mid-cycle.

Historical window: 1998 to early 2009

The original Rule 498 profile regime, adopted in March 1998, remained in place until it was replaced. In January 2009 the SEC adopted amendments to Form N-1A and a rewritten Rule 498 that established the summary prospectus: a short document that is itself part of the statutory prospectus and whose delivery (combined with online posting of the full prospectus) satisfies prospectus delivery obligations. The summary prospectus replaced the older profile concept entirely.

As funds transitioned during 2009, the profile-specific submission types, including 497K3A, were retired. The last 497K3A filings on EDGAR appear in early 2009. The 497K3A dataset is therefore closed and covers a fixed historical window from late 1998 through early 2009.

Important distinctions

Post-2009 "497K" is a different animal. After the 2009 amendments, EDGAR continues to accept a submission type labeled Form 497K, but those filings are summary prospectuses under amended Rule 498, filed pursuant to Rule 497(k), not original Rule 498 profiles. Despite the visually similar code, post-2009 497K filings are not part of this dataset and are governed by a different regulatory regime. The 497K3A code itself was not reused.

497K3A is not a registration filing. It is the EDGAR copy of a prospectus-equivalent document associated with an already-effective N-1A registration. It does not effect registration, does not move effective dates, and does not trigger staff review the way an N-1A post-effective amendment under Rule 485 does.

Profile use was optional. Funds that did not adopt the profile alternative filed nothing in the 497K family; they continued to file full statutory prospectuses and stickers under other Rule 497 suffixes (e.g., 497, 497J) and amended their registration statements under Rule 485. The 497K3A population is therefore narrower than the universe of N-1A registrants.

Series-trust filings. When a series trust filed one 497K3A covering multiple series or share classes, a single accession number was generated even though each series remained the substantive subject of its own disclosure.

Corrections. Amendments to a previously filed profile were handled by filing a new 497-family submission, not by amending the original accession in place.

No withdrawals at sunset. The discontinuation of 497K3A in 2009 is a regulatory boundary, not a corporate event. Funds did not withdraw existing 497K3A filings; they simply stopped producing new ones once the summary prospectus regime took effect.

How This Dataset Differs From Similar Datasets or Filings

Form 497K3A sits inside a tightly clustered family of mutual fund prospectus submission types filed under Rule 497 of the Securities Act of 1933, layered on top of the Form N-1A registration regime. The most useful comparisons are to other Rule 497 sub-types, to the post-effective amendments that contain the underlying statutory prospectus, to Form N-1A itself, and to the post-2009 summary prospectus framework that replaced the profile.

Form 497 (generic Rule 497 submission)

Form 497 is the parent submission type for definitive materials filed under Rule 497 after a registration statement is effective: full prospectuses, supplements, stickers, and certain sales literature. A bare 497 tag does not indicate which document type was filed.

497K3A is narrower on two axes: it is restricted to the fund profile authorized by the original Rule 498 (adopted 1998), and it encodes a specific delivery posture within the K-suffix taxonomy. Researchers using a generic 497 corpus capture 497K3A as a sibling category but cannot isolate profile documents without the suffix.

Form 497K1, 497K2, 497K3B (sibling K-suffixes, 1998–2009)

These siblings come from the same original Rule 498 profile regime and all carry profile content with substantially similar disclosure (objectives, strategies, risks, performance, fees, purchase/redemption procedures). The suffix encodes the operational and delivery context in which the profile was filed, not the substantive content.

The precise procedural meaning of each numeric suffix is an EDGAR submission-type convention rather than a category defined on the face of Rule 498 itself, and SEC public materials do not give a clean one-line definition for each suffix. What can be said accurately:

  • Form 497K1, Form 497K2, Form 497K3A, and Form 497K3B all denote Rule 498 profiles filed during the 1998–2009 window.
  • They differ in the filer's elected delivery arrangement (profile alone, profile with prospectus, or profile under other Rule 498-permitted arrangements) and in procedural variants within those branches.
  • 497K3A and 497K3B are the closest pair and the most easily confused, both falling within the "3" branch tied to profile-with-prospectus delivery scenarios.

For content-level analysis the suffixes should generally be combined; for compliance or delivery-mechanics research, the specific suffix is what carries the distinction. Avoid attributing precise rule citations to individual K-suffixes beyond what EDGAR submission-type documentation supports.

Form 497K and Form 497K-SP (post-2009 summary prospectus regime)

The amended Rule 498, adopted January 2009 and effective March 31, 2009, replaced the profile with the summary prospectus. The submission type 497K (no numeric suffix) is reused on EDGAR for these post-2009 filings, but the legal substance changed:

  • The summary prospectus is a short document keyed to Form N-1A Items 2–8, with prescribed ordering and content.
  • Its delivery, paired with website posting of the full statutory prospectus, satisfies Section 5(b)(2) prospectus delivery under Rule 498(b).
  • The pre-2009 profile under the original Rule 498 was a different document with a different content template and did not function as a prospectus-delivery substitute on the same terms.

Form 497K-SP appears in some EDGAR records in connection with summary prospectus filings under the post-2009 regime and belongs to that successor universe, not the 497K3A peer set.

The 497K3A dataset terminates in March 2009 because the original profile framework was rescinded. The post-2009 497K corpus is the conceptual successor but is not content-interchangeable; cross-regime comparisons must account for the document-format change.

Form 485APOS and Form 485BPOS

Form 485APOS and Form 485BPOS are post-effective amendments to Form N-1A registration statements. 485APOS is filed under Rule 485(a) and is subject to a delayed effective date pending staff review; 485BPOS is filed under Rule 485(b) and goes effective immediately or on a date certain. Both carry the full statutory prospectus, the Statement of Additional Information, and Part C.

The relationship to 497K3A is hierarchical: the profile summarizes information that lives in greater detail in the corresponding 485BPOS prospectus. 485BPOS is the source-of-truth offering document; 497K3A is the investor-facing summary derived from it. They are complementary, not substitutable.

Form N-1A

Form N-1A is the registration form for open-end management investment companies. It defines the disclosure architecture that flows into both the 485-series amendments and the 497-series filings. An N-1A registration statement covers Part A (the prospectus), Part B (the SAI), and Part C; a 497K3A record captures only the profile document and its EDGAR submission metadata, and does not contain N-1A's full content.

Boundary summary

Form 497K3A is distinct on four axes at once:

  1. Document type — the original Rule 498 profile, not a full prospectus, sticker, or post-2009 summary prospectus.
  2. Delivery posture — the "3A" branch within the K-suffix taxonomy, distinguished from 497K1, 497K2, and 497K3B.
  3. Regulatory era — bounded by the 1998 adoption of Rule 498 and its March 31, 2009 replacement by the amended Rule 498.
  4. Filer population — open-end funds registered on Form N-1A that elected to use the profile.

It is not a substitute for 485BPOS when full prospectus detail is needed, not interchangeable with the post-2009 497K summary prospectus corpus, and not interchangeable with sibling K-suffixes when delivery-posture distinctions matter. It is the closed historical record of the pre-2009 profile experiment that preceded and informed the summary prospectus regime.

Who Uses This Dataset

Because the Form 497K3A corpus is closed and bounded by the December 1998 to March 2009 profile era, its users are professionals who need precise retrieval of how funds disclosed objectives, strategies, risks, fees, and performance during that decade.

Investment-management lawyers and disclosure counsel

Securities lawyers advising open-end funds use the corpus as a precedent set for Rule 498 profile drafting. Disclosure counsel compare the narrative HTML sections (investment objectives, principal strategies, principal risks, performance, fee table) against current summary prospectus language under Form N-1A Items 2–8 to trace which conventions migrated into the 2009 regime and which were dropped. Litigation counsel handling claims tied to the 1998–2009 window pull profiles by accession number, using metadata.json (CIK, filing date, submission type) to anchor the evidentiary chain and the HTML body to quote risk language verbatim.

Fund compliance and registration teams

In-house compliance officers at fund complexes that survived the profile era use the dataset as a precedent library for legacy series. They reference older 497K3A filings for the same family to reconstruct earlier disclosures of strategies, derivatives use, redemption fees, and sales loads, then verify continuity (or document changes) against current filings. The fee and expense table and the principal risks section receive the heaviest use.

Regulatory economists and academic disclosure researchers

Policy staff and securities-regulation academics study the profile experiment itself: why it was adopted, how it was used, and why it was abandoned after roughly a decade. The closed corpus is small enough to read exhaustively and large enough to code qualitatively. The standardized structure (objectives, strategies, risks, performance, fees, purchase/redemption) permits direct cross-fund and cross-year comparison. Outputs include retrospectives on plain-English disclosure and comment-letter submissions on prospectus rulemakings.

Quantitative finance researchers

Quant teams reconstructing historical fund universes mine the fee table for total annual operating expenses, management fees, 12b-1 fees, and the one/three/five/ten-year cost example, and the past-performance bar chart and average annual total returns table for point-in-time, as-disclosed return histories. These structured fields cross-check commercial fund databases and support work on fee dispersion, share-class economics, and disclosed-versus-realized risk.

Data engineers building EDGAR pipelines

Financial data engineering teams use the dataset as a bounded, text-only fixture (images excluded) for 497K3A handling. Every accession is enumerated, so metadata.json serves as a complete CIK/date lookup and the HTML and TXT documents act as a regression suite for parser changes, section extractors, and cross-form joins to N-1A and later 497K filings.

Litigators and expert witnesses

Plaintiff and defense counsel in fund mis-selling, suitability, fee, and fiduciary-duty matters from the 1998-2009 period retrieve the exact profile delivered at a given date. Expert witnesses preparing damages or disclosure-adequacy reports rely on the fee table, performance table, and principal-risks language to establish what was and was not said; metadata.json filing date and accession number anchor the timeline.

Financial historians and educators

Historians of the U.S. fund industry use the narrative sections — especially principal strategies and principal risks — as primary source material on how funds described themselves through the late-1990s expansion and the 2008 crisis. Faculty teaching securities regulation and investment-company law use individual filings as self-contained classroom artifacts: the fixed profile order makes one filing a complete teaching example and a handful sufficient for comparative exercises.

ML and RAG developers working with historical fund text

Teams building retrieval-augmented systems and fine-tuning corpora for financial language models use the dataset as a structurally consistent, pre-2009 disclosure source. The uniform profile schema yields clean pairs for section classification, fee extraction, and risk-language summarization, and pairs naturally with later summary prospectus corpora to expose models to the predecessor style.

The common thread across these audiences is authoritative access to the same recurring artifacts: the metadata.json identifiers, the fee and expense table, the performance bar chart and returns table, and the principal-risks narrative. The dataset's value lies in being complete, bounded, and standardized.

Specific Use Cases

The following workflows show how the closed Form 497K3A archive (December 1998 to March 2009) is used in practice. Each ties to specific record fields: metadata.json identifiers and the standardized profile sections inside the SGML-wrapped body documents.

Reconstructing the as-delivered profile for a litigation timeline

Plaintiff and defense counsel working fund mis-selling or fee-disclosure cases from the profile era retrieve the exact document delivered to investors on a given date. The workflow joins metadata.json filedAt, accessionNo, and entities[].cik to a calendar of investor transactions, then quotes the principal risks narrative and fee table from the body HTML verbatim. Output is an exhibit-ready packet: filing-date provenance from the metadata header plus a clean text extraction of the relevant section.
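The date join in this workflow can be sketched in a few lines of Python. This is a minimal illustration, not a definitive method: the helper names are hypothetical, filedAt is assumed to begin with an ISO 8601 calendar date (YYYY-MM-DD), and "latest profile filed on or before the transaction date" is only a proxy for the document actually delivered, which counsel would still need to confirm.

```python
from datetime import date


def filing_date(meta: dict) -> date:
    # Assumption: filedAt starts with YYYY-MM-DD; the time and offset are ignored.
    return date.fromisoformat(meta["filedAt"][:10])


def has_cik(meta: dict, cik: str) -> bool:
    # Compare CIKs without leading zeros so "0000012345" matches "12345".
    wanted = str(cik).lstrip("0")
    return any(str(e.get("cik", "")).lstrip("0") == wanted
               for e in meta.get("entities", []))


def latest_profile_on_or_before(metas: list, cik: str, on: date):
    """Return the metadata dict of the most recent filing for `cik`
    filed on or before `on`, or None if there is none."""
    candidates = [m for m in metas if has_cik(m, cik) and filing_date(m) <= on]
    return max(candidates, key=filing_date, default=None)
```

Each investor transaction date can then be passed through latest_profile_on_or_before to obtain the accessionNo to cite in the exhibit packet.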

Building a fee-dispersion panel across the profile era

Quant researchers parse the standardized fee and expense table out of each profile to assemble a panel of management fees, 12b-1 fees, other expenses, total annual operating expenses, and the 1/3/5/10-year dollar example. CIK and fiscalYearEnd from entities anchor each row to a fund family; filedAt provides the as-disclosed timestamp. The resulting table cross-checks commercial fund databases and supports analyses of fee dispersion and share-class economics during 1998-2009.
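A rough sketch of the fee extraction step follows. It assumes the fee table has already been reduced to plain text with one line item per line and the percentage on the same line as its label; real filings vary in wording and layout, so the label fragments and the function name here are illustrative assumptions rather than a description of the dataset's actual structure.

```python
import re

# Assumption: these lowercase label fragments identify the fee lines of interest.
FEE_LABELS = {
    "management fee": "management_fee",
    "12b-1": "distribution_12b1_fee",
    "other expenses": "other_expenses",
    "total annual": "total_annual_operating_expenses",
}

# A percentage such as "0.75%" or ".75 %".
PCT_RE = re.compile(r"(\d*\.?\d+)\s*%")


def extract_fees(text: str) -> dict:
    """Map each recognized fee label to the first percentage on its line."""
    fees = {}
    for line in text.splitlines():
        match = PCT_RE.search(line)
        if not match:
            continue
        lowered = line.lower()
        for needle, key in FEE_LABELS.items():
            # Keep the first occurrence of each label only.
            if needle in lowered and key not in fees:
                fees[key] = float(match.group(1))
    return fees
```

Rows built this way can be keyed by CIK and filedAt from metadata.json; multi-class tables with several percentage columns would need column-aware parsing that this sketch does not attempt.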

Mining principal-strategies and principal-risks language for disclosure precedent

Investment-management lawyers drafting current summary prospectuses use the corpus as a precedent library. The workflow pulls the principal investment strategies and principal risks sections from each body HTML, segments them by fund family via CIK, and diffs phrasing on derivatives use, foreign-currency exposure, or redemption-fee mechanics across the family's profile history. Output is a precedent file showing which conventions migrated into the post-2009 summary prospectus and which were abandoned.
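The diffing step can be approximated with the Python standard library. The sketch below assumes the two section texts have already been extracted as plain strings; the sentence splitter is deliberately naive and the function names are hypothetical.

```python
import difflib
import re


def sentences(text: str) -> list:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def diff_sections(older: str, newer: str) -> dict:
    """Return sentences removed from `older` and added in `newer`."""
    a, b = sentences(older), sentences(newer)
    removed, added = [], []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(a[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(b[j1:j2])
    return {"removed": removed, "added": added}
```

Running diff_sections over consecutive filings for the same CIK, ordered by filedAt, gives a sentence-level change log of a family's risk language.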

Constructing a point-in-time as-disclosed return history

Performance-research teams extract the bar chart of annual total returns and the average-annual-total-return table (including best-quarter / worst-quarter call-outs and the benchmark column) from the past performance section. Pairing that with the filedAt timestamp yields an as-disclosed return record uncontaminated by later restatements, useful for studies of disclosed-versus-realized return and benchmark drift.

Regression fixture for an EDGAR 497-family parser

Data engineers maintaining EDGAR ingestion pipelines use the closed record set as a bounded test fixture. The mix of early-era flat-ASCII payloads and mid-2000s heavily inline-styled HTML inside the <DOCUMENT>...<TEXT> envelope exercises SGML-wrapper stripping, redlining-marker handling (<R>), GRAPHIC reference resolution from documentFormatFiles, and section detection by text pattern. Every accession is enumerated, so metadata.json doubles as the ground-truth CIK/date/accession lookup for parser regression tests.
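As an illustration of the wrapper-stripping and redlining-marker steps, a minimal sketch might look like the following. It assumes one <TEXT>...</TEXT> block per body document and treats both <R> and </R> as redlining markers; the function name is hypothetical, and a production parser would need to handle malformed or unclosed envelopes.

```python
import re

TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL | re.IGNORECASE)
REDLINE_RE = re.compile(r"</?R>", re.IGNORECASE)


def strip_envelope(raw: str) -> str:
    """Return the payload inside <TEXT>...</TEXT> with redline markers removed.

    Falls back to the whole input when no <TEXT> block is found.
    """
    match = TEXT_RE.search(raw)
    body = match.group(1) if match else raw
    return REDLINE_RE.sub("", body).strip()
```

The returned string can then be handed to an HTML parser or, for early flat-ASCII filings, processed directly as text.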

Cross-regime comparison against the post-2009 summary prospectus

Regulatory economists and academic researchers study the transition from profile to summary prospectus by aligning each 497K3A profile against the same fund family's first post-March-2009 497K filing. The fixed profile section order (objective, strategies, risks, performance, fees, purchase/redemption) maps onto Form N-1A Items 2-8, enabling structured side-by-side coding of which disclosures were retained, compressed, or dropped under amended Rule 498.

Dataset Access

The dataset is distributed as ZIP containers organized by month, covering filings from December 1998 through March 2009, when Form 497K3A was discontinued. Because the dataset is compact, most users will download the full archive directly, but the index JSON and per-container endpoints remain available for incremental workflows.

Dataset Index JSON API: https://api.sec-api.io/datasets/form-497k3a-files.json

Returns dataset metadata (name, description, last updated timestamp, earliest sample date, total records, total size, form types, container format, and file types), the download URL for the full dataset, and the list of monthly container files with per-container size, record count, updated timestamp, and download URL. Use this endpoint to monitor which containers were touched in the latest refresh run and to decide which monthly archives to fetch on a given day. This endpoint does not require an API key.

Example response:

{
  "datasetId": "1f13365b-9ae0-6a2e-b0a3-867d7bd2e7ab",
  "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-497k3a-files.zip",
  "name": "Form 497K3A Files Dataset",
  "updatedAt": "2026-04-16T08:33:43.524Z",
  "earliestSampleDate": "1998-12-01",
  "totalRecords": 52,
  "totalSize": 1258065,
  "formTypes": ["497K3A"],
  "containerFormat": "ZIP",
  "fileTypes": ["TXT", "JSON", "HTML"],
  "containers": [
    {
      "downloadUrl": "https://api.sec-api.io/datasets/form-497k3a-files/2009/2009-03.zip",
      "key": "2009/2009-03.zip",
      "size": 24576,
      "records": 1,
      "updatedAt": "2026-04-16T08:33:43.524Z"
    }
  ]
}
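For incremental workflows, the index response can be filtered by each container's updatedAt value. The sketch below uses only the Python standard library and the field names shown in the example response; the function names and the cutoff-based selection are illustrative assumptions, not part of the API.

```python
import json
from datetime import datetime
from urllib.request import urlopen

INDEX_URL = "https://api.sec-api.io/datasets/form-497k3a-files.json"


def fetch_index(url: str = INDEX_URL) -> dict:
    # The index endpoint does not require an API key.
    with urlopen(url) as resp:
        return json.load(resp)


def parse_ts(ts: str) -> datetime:
    # Rewrite a trailing "Z" so older Python versions can parse the timestamp.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def containers_updated_since(index: dict, cutoff_iso: str) -> list:
    """Return container entries whose updatedAt is later than the cutoff."""
    cutoff = parse_ts(cutoff_iso)
    return [c for c in index["containers"] if parse_ts(c["updatedAt"]) > cutoff]
```

A daily job could store the timestamp of its last run, call containers_updated_since(fetch_index(), last_run), and fetch only the downloadUrl values returned.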

Download Entire Dataset: https://api.sec-api.io/datasets/form-497k3a-files.zip?token=YOUR_API_KEY

Downloads the complete dataset as a single ZIP archive. Given the small overall size, this is typically the most convenient way to obtain the full corpus in one request. This endpoint requires an API key.

Download Single Container: https://api.sec-api.io/datasets/form-497k3a-files/2009/2009-03.zip?token=YOUR_API_KEY

Downloads one monthly container ZIP (for example, March 2009) instead of the full dataset. Use the downloadUrl values from the index JSON response to fetch specific months. This endpoint requires an API key.
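A minimal sketch of fetching one container and grouping its contents by record is shown below. It assumes each record sits in its own folder inside the ZIP alongside a metadata.json file, as described for this dataset; the exact folder naming is not assumed, and the function names are hypothetical.

```python
import io
import json
import zipfile
from pathlib import PurePosixPath
from urllib.request import urlopen


def container_url(download_url: str, token: str) -> str:
    # Append the API key as the token query parameter.
    return f"{download_url}?token={token}"


def download_container(download_url: str, token: str) -> bytes:
    with urlopen(container_url(download_url, token)) as resp:
        return resp.read()


def read_records(zip_bytes: bytes) -> dict:
    """Group ZIP members by parent folder: parsed metadata plus document names."""
    records = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            path = PurePosixPath(name)
            rec = records.setdefault(str(path.parent),
                                     {"metadata": None, "documents": []})
            if path.name == "metadata.json":
                rec["metadata"] = json.loads(zf.read(name))
            else:
                rec["documents"].append(path.name)
    return records
```

Reading the archive from memory avoids writing temporary files, which is reasonable here because individual containers are only tens of kilobytes.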

Frequently Asked Questions

What form does this dataset cover?

The dataset covers EDGAR submission type 497K3A, a Rule 497 filing under the Securities Act of 1933 used by open-end management investment companies registered on Form N-1A to file the fund "profile", the concise plain-English alternative to a full statutory prospectus authorized by the original SEC Rule 498. The "K3A" suffix corresponds to Rule 497(k)(1)(iii)(A), which governed the profile's required content and its question-and-answer ordering.

What does one record in this dataset represent?

One record is a single EDGAR 497K3A filing, identified by accession number and materialized as one accession-keyed folder containing a parsed metadata.json header and every non-image body document from the original EDGAR submission, preserved under its original filename inside its EDGAR SGML envelope. When a single submission carried profiles for multiple share classes or multiple series in one umbrella document, all of that content remains together in the same record.

Who is required to file Form 497K3A?

The legal filer is an open-end management investment company (a mutual fund) registered on Form N-1A, typically organized as a Massachusetts business trust, Delaware statutory trust, or Maryland corporation. Closed-end funds, business development companies, unit investment trusts, variable insurance separate accounts, and operating-company issuers do not use the form. Profile use was optional, so the 497K3A population is narrower than the universe of N-1A registrants.

What time period does the dataset cover?

The dataset covers the full operational lifetime of Form 497K3A, beginning at the December 1998 effective date of the original Rule 498 and ending in early 2009, when the SEC's January 2009 amendments replaced the profile framework with the summary prospectus regime under amended Rule 498 (effective March 31, 2009). The earliest sample date is December 1, 1998. No new accessions appear after the form's retirement; the dataset is closed-ended and historical.

What file formats are in the dataset?

Records are distributed as monthly ZIP containers under a YYYY/YYYY-MM.zip path scheme. Inside each accession folder, file types are TXT, JSON, and HTML: JSON for the per-record metadata.json, HTML (typically .htm) for body documents from the mid-2000s onward, and flat TXT for body documents in the earliest years of the form. Body documents are wrapped in an EDGAR SGML <DOCUMENT>...<TEXT>...</TEXT></DOCUMENT> envelope that must be stripped before HTML parsing.

How is Form 497K3A different from the post-2009 Form 497K?

Despite the visually similar code, post-2009 497K filings on EDGAR are summary prospectuses filed under Rule 497(k) and governed by amended Rule 498, not original Rule 498 profiles. The summary prospectus is keyed to Form N-1A Items 2-8 and, paired with website posting of the full statutory prospectus, satisfies prospectus-delivery obligations on different terms than the original profile. Post-2009 497K filings are not part of this dataset and must be analyzed as a separate, non-interchangeable corpus.

Are images and XBRL files included?

No. Image documents listed as GRAPHIC in EDGAR (typically .gif and .jpg files used for logos, charts, and decorative typography) are intentionally omitted, although their inventory entries remain in metadata.json's documentFormatFiles array with their original filenames and direct EDGAR URLs. No XBRL instance documents exist for this form because 497K3A was never subject to XBRL tagging requirements; dataFiles is uniformly empty and linkToXbrl is uniformly an empty string.
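Because the image inventory survives in metadata.json, omitted graphics can still be located from the record itself. The sketch below assumes each documentFormatFiles entry carries a type field (with the value "GRAPHIC" for images) and a documentUrl field holding the EDGAR link; those two key names are assumptions about the metadata layout and should be checked against an actual record.

```python
def omitted_graphics(meta: dict) -> list:
    """Return EDGAR URLs for GRAPHIC entries listed in documentFormatFiles.

    Assumption: entries use "type" and "documentUrl" keys.
    """
    return [
        entry.get("documentUrl")
        for entry in meta.get("documentFormatFiles", [])
        if entry.get("type") == "GRAPHIC"
    ]


def has_no_xbrl(meta: dict) -> bool:
    # Per the dataset description: dataFiles is empty and linkToXbrl is "".
    return not meta.get("dataFiles") and meta.get("linkToXbrl", "") == ""
```

A pipeline that needs the bar-chart images could pass the returned URLs to a separate fetch step against EDGAR, subject to EDGAR's own access rules.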