The Form PRE 14A Files Dataset is a record-by-record archive of preliminary proxy statements filed with the SEC under Regulation 14A of the Securities Exchange Act of 1934. One record corresponds to a single EDGAR PRE 14A submission — identified by its 18-digit accession number and materialized on disk as one folder containing a metadata.json filing-level descriptor and the original EDGAR document set with image files removed. The form is filed by registrants soliciting proxies on non-routine matters such as mergers, going-private transactions, charter amendments, equity-plan adoptions, and contested director elections, at least ten calendar days before the corresponding DEF 14A is sent to security holders. Coverage spans the full EDGAR era from January 1, 1994 to the present, and the dataset is distributed in ZIP containers carrying TXT, JSON, HTML, and PDF file types.
The dataset packages every PRE 14A submission accepted by EDGAR into a per-accession folder. Form PRE 14A is the preliminary proxy statement filed under Regulation 14A. Rule 14a-6(a) requires the registrant to lodge a preliminary copy of the proxy materials with the Commission at least ten calendar days before the definitive materials are released to security holders, giving the staff the option to review and comment before the solicitation reaches investors. PRE 14A is reserved for non-routine matters — mergers and acquisitions, going-private transactions, charter amendments, equity-plan adoptions, contested director elections, reverse stock splits, and similar items — that fall outside the safe harbor in Rule 14a-6(a) for proxies that solely concern routine annual-meeting business. Routine annual-meeting proxies are normally filed directly as DEF 14A.
The substantive content of a PRE 14A is governed by Schedule 14A (Rule 14a-101), which specifies the disclosures item by item. The filing is by construction a draft: the cover sheet and document text bear "PRELIMINARY COPY" and "PRELIMINARY PROXY STATEMENT — SUBJECT TO COMPLETION" markings, and the corresponding box on the Schedule 14A cover page is checked.
The dataset covers the entire population of bare-form PRE 14A filings on EDGAR from January 1994 forward. Revised preliminary proxies filed under the separate PRER14A form code are not included. The on-disk artifacts are TXT (legacy ASCII proxies and the complete-submission text bundle for every filing), HTML (the dominant primary-document format throughout the modern era), JSON (the per-record metadata.json descriptor), and PDF (used occasionally by filers for ancillary exhibits). Image files referenced in the original submission are excluded.
One record in the form-pre-14a-files dataset is a single EDGAR PRE 14A submission, identified by its 18-digit accession number and materialized on disk as one folder named after that accession (dashes stripped). Every artifact that belongs to the submission — the filing-level metadata descriptor and the full set of documents from the original EDGAR package, with image files removed — sits inside that folder. The unit is the filing, not the proposal, the registrant, or the meeting: a single accession folder may carry one operating-company proxy statement or a joint preliminary proxy filed by dozens of fund trusts under one accession.
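The mapping between the dashed accession number and the folder name can be sketched as follows. This is a minimal illustration; the function names are hypothetical, and the sample accession number in the usage note is made up.

```python
import re


def accession_to_folder(accession_no: str) -> str:
    """Strip the dashes from a dashed accession number (10-2-6 digits)."""
    if not re.fullmatch(r"\d{10}-\d{2}-\d{6}", accession_no):
        raise ValueError(f"unexpected accession format: {accession_no!r}")
    return accession_no.replace("-", "")


def folder_to_accession(folder: str) -> str:
    """Rebuild the dashed form from an 18-digit folder name."""
    if not re.fullmatch(r"\d{18}", folder):
        raise ValueError(f"unexpected folder name: {folder!r}")
    return f"{folder[:10]}-{folder[10:12]}-{folder[12:]}"
```

For example, `accession_to_folder("0001234567-25-000123")` gives `"000123456725000123"`, and `folder_to_accession` reverses it.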
Every accession folder in the dataset is composed of two layers:
- A metadata.json descriptor with a fixed name, encoding the filing-level header EDGAR records on the submission and on the filing-index page.
- The document files from the original EDGAR submission, with image files removed.

Image files (the GRAPHIC documents enumerated on the EDGAR index, typically signature scans, photographs, charts, and logo art) are removed from the on-disk container, although they remain enumerated inside metadata.json so the original submission inventory is still recoverable. XBRL side files referenced in metadata.json are likewise often not materialized as separate files because, in modern PRE 14A filings, the XBRL is embedded inline into the primary document.
In practice, most records reduce to two files on disk — metadata.json plus one primary .htm — but the layout is structured to accommodate any number of additional document files when the filer attaches separate exhibits.
metadata.json descriptor

metadata.json is a single JSON object that mirrors the EDGAR filing-level header plus the document inventory and entity roster from the EDGAR index page. Its principal keys are:
- formType: the literal string PRE 14A.
- accessionNo: the dashed 18-digit accession number; the same number with dashes stripped is the folder name.
- description: the EDGAR short description, typically "Form PRE 14A - Other preliminary proxy statements".
- filedAt: ISO-8601 timestamp of acceptance, with timezone offset (Eastern time on EDGAR).
- periodOfReport: YYYY-MM-DD, the meeting date or period of report stamped on the filing.
- linkToFilingDetails: URL of the primary HTML proxy document on www.sec.gov/Archives/edgar/data/....
- linkToTxt: URL of the EDGAR complete-submission text file (the legacy .txt bundle that wraps every document in the submission).
- linkToHtml: URL of the EDGAR filing-index HTML page (...-index.htm).
- linkToXbrl: URL of a standalone XBRL document if one exists; for PRE 14A this is typically empty because structured tagging is embedded inline.
- id: opaque 32-character hex identifier used internally as a stable record key.
- documentFormatFiles: array enumerating every file in the original EDGAR submission as listed on the index page. Each entry carries sequence ("1", "2", …; the trailing complete-submission txt entry uses a single space), description (free text such as "PRE 14A", "GRAPHIC", "EXHIBIT 99.1", "Complete submission text file"), type (the EDGAR document type code), documentUrl (canonical EDGAR URL), and size (byte count as a string). Sequence 1 is by convention the primary PRE 14A document; later sequences hold exhibits, graphics, XBRL artifacts, and the complete-submission txt.
- dataFiles: array of structured data attachments associated with the filing. For operating-company filers subject to inline XBRL tagging this typically lists the schema (EX-101.SCH, *.xsd) and the extracted inline XBRL instance (*_htm.xml) plus related linkbases. For investment-company filers it is typically empty.
- entities: array of one or more filer entity objects. Each carries cik, companyName (often suffixed with the EDGAR role string "(Filer)"), type (the form type for that entity), fileNo (the Exchange Act file number: 001- / 000- ranges for operating companies, 811- for registered investment companies), filmNo, act (Exchange Act, normally "34"), irsNo, stateOfIncorporation, fiscalYearEnd as a 4-character MMDD string, sic (SIC code with description, present for operating companies), and tickers (array of listed tickers, present for issuers with publicly traded equity). For joint fund-family proxies the array may hold many sibling entities, one per trust CIK, sharing the same accession number.
- seriesAndClassesContractsInformation: array used by Investment Company Act registrants. Each element is a series block with the EDGAR series identifier (S000XXXXXX), the fund name, and a classesContracts list whose items carry the class identifier (C000XXXXXX), the class/share-class name (e.g., "Class A", "Institutional Class"), and an optional ticker. For operating-company filings the array is empty; for fund-family proxies it may enumerate dozens of series and hundreds of classes.

The primary file is one .htm document carrying the full Schedule 14A proxy statement marked as preliminary. Its filename is filer-chosen and not standardized. Patterns include <ticker>-<yyyymmdd>.htm, <filer-prefix>_pre14a.htm, <filer-prefix>-pre14a_<slug>.htm, generic names such as formpre14a.htm or prelimproxy.htm, and filing-agent codes such as d<digits>dpre14a.htm (Donnelley) or tm<digits>d<n>_pre14a.htm (Toppan Merrill). The disk filename is mirrored verbatim from the basename of documentFormatFiles[0].documentUrl.
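As a rough sketch of how the descriptor might be consumed, the snippet below loads metadata.json from an accession folder and derives the primary document's filename from documentFormatFiles[0].documentUrl. The helper names are hypothetical, and the code assumes the key names listed above are present as described.

```python
import json
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def load_metadata(folder: str) -> dict:
    """Read the fixed-name descriptor from an accession folder."""
    return json.loads((Path(folder) / "metadata.json").read_text(encoding="utf-8"))


def primary_document_name(metadata: dict) -> str:
    """Basename of the first documentFormatFiles entry (the primary document by convention)."""
    first = metadata["documentFormatFiles"][0]
    return PurePosixPath(urlparse(first["documentUrl"]).path).name


def summarize(metadata: dict) -> dict:
    """Collect a few filing-level fields into a flat record."""
    return {
        "accessionNo": metadata.get("accessionNo"),
        "formType": metadata.get("formType"),
        "filedAt": metadata.get("filedAt"),
        "ciks": [e.get("cik") for e in metadata.get("entities", [])],
        "primaryDocument": primary_document_name(metadata),
    }
```

A typical call would be `summarize(load_metadata(path_to_accession_folder))`; the flat record is convenient for building an index across many folders.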
Two structural styles appear in modern PRE 14A primary documents:
SGML-wrapped HTML (legacy EDGAR document wrapper). The file opens with the EDGAR-internal <DOCUMENT> envelope:
```
<DOCUMENT>
<TYPE>PRE 14A
<SEQUENCE>1
<FILENAME>tm2519035d1_pre14a.htm
<DESCRIPTION>PRE 14A
<TEXT>
<HTML>... full HTML body of the preliminary proxy statement ...</HTML>
</TEXT>
</DOCUMENT>
```
The SGML tags carry the type, sequence number, filename, and description that EDGAR uses to identify the document inside the submission package; the actual proxy statement is the HTML inside <TEXT>...</TEXT>.
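A minimal sketch of separating the SGML header fields from the HTML body might look like this. It assumes one `<DOCUMENT>` envelope per file and header tags that each sit on their own line before `<TEXT>`; the function name is hypothetical.

```python
import re

HEADER_TAGS = ("TYPE", "SEQUENCE", "FILENAME", "DESCRIPTION")


def parse_sgml_document(raw: str) -> dict:
    """Split an SGML-wrapped primary document into header fields and body text."""
    # Only look for header tags in the part that precedes <TEXT>.
    head = raw.split("<TEXT>", 1)[0]
    result = {}
    for tag in HEADER_TAGS:
        match = re.search(rf"^<{tag}>(.*)$", head, flags=re.MULTILINE)
        result[tag.lower()] = match.group(1).strip() if match else None
    # The proxy statement itself is whatever sits between <TEXT> and </TEXT>.
    body = re.search(r"<TEXT>\s*(.*?)\s*</TEXT>", raw, flags=re.DOTALL)
    result["text"] = body.group(1) if body else raw
    return result
```

Falling back to the raw content when no `<TEXT>` block is found lets the same function be applied to files that carry no wrapper.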
Inline XBRL XHTML (iXBRL). The file opens with an XML prologue and an <html> root carrying the inline-XBRL namespace declarations (ix, xbrli, xbrldi, dei, us-gaap, ecd, srt, …). The first <body> contains an <ix:header> block with hidden <ix:nonNumeric> facts (cover-page tags such as dei:DocumentType = "PRE 14A", dei:EntityCentralIndexKey, dei:AmendmentFlag) and a sequence of <xbrli:context> definitions used for the pay-versus-performance disclosures. The visible cover page and proxy statement follow in a normal <div> tree, with selected facts (registrant name, ticker, pay-versus-performance amounts) tagged via inline XBRL elements. iXBRL primaries are noticeably larger than their SGML-wrapped counterparts because of the embedded contexts and tagged facts.
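One way to pull tagged facts out of an iXBRL primary with only the standard library is sketched below. It collects the `name` attribute and displayed text of each `ix:nonNumeric` and `ix:nonFraction` element. The class name is hypothetical, and the sketch returns the text as displayed; it does not apply `scale`, `sign`, or `format` attributes, which a real XBRL processor would.

```python
from html.parser import HTMLParser

FACT_TAGS = ("ix:nonnumeric", "ix:nonfraction")  # HTMLParser lowercases tag names


class InlineFactParser(HTMLParser):
    """Collect (name, displayed text) pairs for inline XBRL fact elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.facts = []
        self._open = []  # stack of (fact name, text chunks) for facts still open

    def handle_starttag(self, tag, attrs):
        if tag in FACT_TAGS:
            self._open.append((dict(attrs).get("name"), []))

    def handle_data(self, data):
        # Text belongs to every fact that is currently open (facts can nest).
        for _, chunks in self._open:
            chunks.append(data)

    def handle_endtag(self, tag):
        if tag in FACT_TAGS and self._open:
            name, chunks = self._open.pop()
            self.facts.append((name, "".join(chunks).strip()))


def extract_facts(html: str) -> list:
    parser = InlineFactParser()
    parser.feed(html)
    return parser.facts
```

Calling `extract_facts(primary_html)` returns a list of tuples in the order the closing tags are encountered. Resolving each fact's `contextRef` against the `<xbrli:context>` definitions is left out of this sketch.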
Regardless of structural style, the visible body carries the same Schedule 14A content elements, in roughly the order below.
The first visible page is the Schedule 14A cover sheet. It carries the heading "UNITED STATES SECURITIES AND EXCHANGE COMMISSION" and the legend "SCHEDULE 14A INFORMATION / Proxy Statement Pursuant to Section 14(a) of the Securities Exchange Act of 1934". Below that sits the filing-class checkbox group, in which the box for a preliminary proxy statement is checked.
The cover then identifies the registrant (and any other person filing the proxy statement, such as a separate soliciting party in a contested election) and resolves the Payment of Filing Fee section — historically a fee table with three options ("No fee required" / "Fee computed on table below" / "Fee paid previously with preliminary materials"), and since the 2022 amendments to Rule 14a-6(i) a simplified two-option block referencing the new fee-table exhibit (Exhibit 107) when a fee is required.
Following the cover, the document typically contains a chairman's or president's letter and a formal Notice of Annual or Special Meeting of Stockholders. The notice block states the date, time, location (or virtual-meeting URL/instructions), the record date for voting, and a bulleted list of proposals to be voted on.
The proxy statement proper follows. Its sections map to the items of Schedule 14A. The set actually present in any given filing depends on the matters being voted on, but the typical order is:
The closing section contains the form of proxy card or voting-instruction card, presented as a sample ballot enumerating each proposal with the corresponding For/Against/Abstain or For All/Withhold All/For All Except options, the registrant's recommendation on each proposal, signature blocks, and instructions for returning the proxy or voting electronically.
Filings that include transactional or governance items routinely append annexes after the proxy-statement body, embedded inside the same primary .htm rather than supplied as separate exhibit files. Common annexes include the merger agreement, the certificate-of-incorporation amendment, the equity-incentive plan document, the fairness opinion, fiscal-year audited financial statements (when incorporated rather than referenced), and reconciliations of any non-GAAP measures used in the proxy statement.
When a filer chooses to submit exhibits as separate files rather than annexing them to the primary document, those exhibit files appear as additional sequenced entries in documentFormatFiles and as additional files on disk inside the accession folder. Common standalone exhibits include the form of proxy card filed separately, an EX-99 letter to shareholders or investor presentation, and — since the 2022 fee-disclosure amendments — Exhibit 107 (EX-FILING FEES) carrying the Rule 0-11 fee calculation when a filing fee is required.
The dataset includes every non-image document from the original submission. This embraces HTML documents, plain-text exhibits, and PDFs where filers used PDF for ancillary materials. It excludes GRAPHIC files (.jpg, .gif, .png) regardless of whether the graphics are signature pages, charts, photographs, or logo art; the graphics remain enumerated in documentFormatFiles but are not present on disk. Standalone XBRL artifacts referenced in dataFiles are also typically embedded inline rather than materialized as separate files.
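The gap between what EDGAR lists and what is on disk can be checked with a short comparison of basenames. The sketch below is illustrative only: the function name and the returned keys are hypothetical, and it assumes every documentFormatFiles entry carries a documentUrl.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def reconcile(metadata: dict, folder: str) -> dict:
    """Compare documentUrl basenames with the files present in an accession folder."""
    listed = {
        PurePosixPath(urlparse(entry["documentUrl"]).path).name
        for entry in metadata.get("documentFormatFiles", [])
        if entry.get("documentUrl")
    }
    # metadata.json is added by the dataset and is never part of the EDGAR listing.
    on_disk = {p.name for p in Path(folder).iterdir() if p.is_file()} - {"metadata.json"}
    return {
        "on_disk": sorted(listed & on_disk),
        "listed_only": sorted(listed - on_disk),  # e.g. omitted graphics
        "disk_only": sorted(on_disk - listed),
    }
```

Entries under `listed_only` would be expected to include the excluded GRAPHIC files and any listed artifacts that were not materialized as separate files.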
Investment Company Act registrants — open-end funds, closed-end funds, and BDCs — file PRE 14A under the same form code but use a different content profile. The cover page identifies the trust by its 1940 Act file number (the 811- series), and the Schedule 14A body emphasizes Item 22 disclosures: approval or amendment of advisory or sub-advisory contracts, election of trustees, fund reorganizations and tax-free mergers, changes to fundamental investment policies, and Rule 12b-1 distribution-plan adoption or amendment. Joint proxies are common, with one accession number covering many fund trusts; in metadata.json this surfaces as a multi-entry entities array and a richly populated seriesAndClassesContractsInformation array enumerating the affected series (S000XXXXXX) and classes/contracts (C000XXXXXX). Operating-company-style executive-compensation tables and pay-versus-performance disclosures are absent from fund proxies.
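A simple way to separate fund proxies from operating-company proxies, using only the metadata.json keys described in this document, is sketched below. The function names are hypothetical, and the 811- prefix test is a heuristic based on the file-number convention noted above.

```python
def is_investment_company_filing(metadata: dict) -> bool:
    """True when any filer entity carries a 1940 Act file number (811- prefix)."""
    return any(
        str(entity.get("fileNo") or "").startswith("811-")
        for entity in metadata.get("entities", [])
    )


def count_series_and_classes(metadata: dict) -> tuple:
    """Return (number of series, number of classes/contracts) covered by the filing."""
    series = metadata.get("seriesAndClassesContractsInformation") or []
    classes = sum(len(block.get("classesContracts") or []) for block in series)
    return len(series), classes
```

For an operating-company filing the counts would normally be `(0, 0)`; a joint fund-family proxy may return dozens of series and hundreds of classes.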
Revised preliminary proxies are filed under the separate PRER14A form code and are not included in this dataset. Corrections or revisions to a preliminary proxy issued before the definitive version are therefore captured only under that form code; PRE 14A records here are first-filed preliminary proxies under the bare PRE 14A form code.
The structure of Schedule 14A has been progressively expanded since EDGAR began accepting PRE 14A filings in 1994, including the addition of the fee-table exhibit (EX-FILING FEES).

PRE 14A submissions span the full EDGAR era from January 1994 onward and have moved through three presentation regimes:
- Plain ASCII text inside the <DOCUMENT> / <TEXT> envelope. Tabular content (Summary Compensation Table, beneficial ownership) was hand-aligned with whitespace.
- HTML inside the <DOCUMENT> wrapper, with native HTML tables, bold headings, and limited inline images (which the dataset omits). The SGML wrapper continued to carry <TYPE>, <SEQUENCE>, <FILENAME>, and <DESCRIPTION> metadata for each document.
- Inline XBRL, with dei cover-page facts tagged in an <ix:header>. The 2022 Pay-versus-Performance rule extended inline tagging to a substantive disclosure block within the proxy statement, embedding numerous <xbrli:context> definitions and <ix:nonFraction> facts directly in the primary document. Since this rollout, operating-company PRE 14A primary documents are predominantly iXBRL XHTML, while investment-company filings, which are not subject to PvP, generally remain SGML-wrapped HTML.

- For joint fund-family proxies, the entities array in metadata.json is the only place where the individual trust CIKs are surfaced.
- documentFormatFiles is faithful to the EDGAR index and therefore enumerates more files than are present on disk: graphics are listed but excluded, and structured XBRL side files may be listed even when their content is embedded inline rather than supplied separately. To reconcile what is on disk against what was submitted to EDGAR, compare the basenames of documentUrl entries with the file inventory in the folder.
- linkToTxt always points to the EDGAR complete-submission txt, the authoritative SGML bundle of every document in the original filing including the omitted graphics. For downstream uses that require byte-for-byte fidelity to the EDGAR submission, that URL is the canonical reference.
- In iXBRL filings, the primary .htm is simultaneously the human-readable proxy statement and the XBRL instance document. Extracting structured cover-page or pay-versus-performance facts requires parsing <ix:header> and <ix:nonFraction> / <ix:nonNumeric> elements rather than relying solely on visible text.
- Filing-agent fingerprints (Toppan Merrill tm…, EdgarAgents ea…, Donnelley d…/dp…, Broadridge, RDG Filings) are visible in primary-document filenames and embedded HTML comments but are not surfaced as a structured field; downstream classification of agent or template requires document-level inspection.

Form PRE 14A is filed by registrants soliciting proxies from holders of securities registered under Section 12 of the Securities Exchange Act of 1934. The filer is always the soliciting party on whose proxy card the vote is being requested.
The reporting population includes:
Foreign private issuers are excluded. FPIs are exempt from Regulation 14A under Rule 3a12-3(b) and instead furnish proxy and information materials on Form 6-K. They do not appear in this dataset.
PRE 14A is event-driven, not periodic. The trigger is a decision to solicit proxies on at least one matter that is not within the routine list in Rule 14a-6(a). Routine items that bypass preliminary filing include uncontested director elections, ratification of accountants, Rule 14a-8 shareholder proposals, ordinary compensation- and benefit-plan votes, and Rule 14a-21 say-on-pay/say-on-frequency votes. Annual meetings limited to those items go directly to DEF 14A and produce no PRE 14A.
A preliminary filing is required when the solicitation includes any non-routine matter, typically:
Timing is anchored to the definitive distribution. Under Rule 14a-6, the preliminary proxy must be on file with the SEC at least ten calendar days before the definitive DEF 14A is first sent or given to security holders. The clock runs from the EDGAR filing date and gives Corporation Finance staff (or Investment Management staff, for funds) the chance to issue comments. Revisions in response to comments are filed as PRER14A amendments before the registrant clears to DEF 14A.
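The ten-calendar-day window can be approximated from the filedAt timestamp in metadata.json. This is a rough sketch, not legal guidance: the function name is hypothetical, and it assumes the count is a plain calendar-day offset from the local filing date carried in filedAt.

```python
from datetime import date, datetime, timedelta


def earliest_definitive_date(filed_at: str, review_days: int = 10) -> date:
    """Earliest date the definitive proxy could be sent, under the assumption above."""
    filed = datetime.fromisoformat(filed_at)  # ISO-8601 with timezone offset
    return filed.date() + timedelta(days=review_days)
```

For a filing accepted at `2025-06-20T16:05:12-04:00`, the function returns `date(2025, 6, 30)`.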
The governing framework is Section 14(a) and Regulation 14A (Rules 14a-1 through 14a-21), with line-item content prescribed by Schedule 14A. The PRE 14A document is, in substance, a Schedule 14A marked "PRELIMINARY COPY," subject to the antifraud standard of Rule 14a-9 and the pre-statement-communication rules of Rule 14a-12.
PRE 14A sits inside a dense cluster of Section 14(a) and 14(c) filings. Boundaries with neighboring forms typically resolve along four axes: filing stage (preliminary, definitive, supplemental, revised), trigger (routine vs non-routine, vote vs written consent, contested vs uncontested), filer side (issuer vs dissident vs voter), and scope (standalone proxy vs combined registration).
DEF 14A is the direct downstream counterpart. Same Schedule 14A scope, same registrant; PRE 14A is filed first for non-routine matters and held at least ten calendar days for staff review before the DEF 14A is mailed and used to solicit votes. PRE 14A may carry placeholder dates or to-be-finalized figures; DEF 14A is the operative, distributed version. Routine annual-meeting proxies skip the preliminary stage under Rule 14a-6(a), so PRE 14A is structurally biased toward non-routine matters while DEF 14A covers the full proxy population.
DEFA14A covers supplemental campaign materials (presentations, press releases, shareholder letters, Q&A) filed after the DEF 14A; it is not a draft proxy. PRE 14A is the full preliminary statement filed before the DEF 14A; DEFA14A accumulates after it. The two are sequential and complementary, never substitutes.
PRER14A is a refiled preliminary proxy with material changes, typically responding to staff comments; DEFR14A is the revised definitive. PRE 14A captures only the initial preliminary submission. Reconstructing the staff-comment redline requires pairing PRE 14A with PRER14A, both of which sit under separate form codes.
Schedule 14C governs corporate actions taken without soliciting proxies, usually because written consents from a majority of voting power are already in hand. Disclosure content overlaps heavily with Schedule 14A, but no proxy card is included and the filer is informing rather than asking. PRE 14A covers matters going to an actual vote; PRE 14C covers the parallel consent-driven universe. The two should not be conflated: the legal posture and the room for shareholder influence differ fundamentally.
Adversarial-side analogs: PREC14A (preliminary contested proxy), PREN14A (preliminary non-management proxy), PRRN14A (revised non-management preliminary), DFAN14A (additional definitive non-management materials, the dissident DEFA14A). PRE 14A can be filed by an issuer in a contested matter, but the C and N series specifically flag dissident or non-management activity under distinct form codes. A full proxy contest record requires PRE 14A on the issuer side combined with the C/N filings on the insurgent side.
PREM14A and DEFM14A are reserved form codes for mergers, acquisitions, and similar extraordinary transactions. Mergers are inherently non-routine and are the most common driver of preliminary-stage filings, but they are routed to PREM14A/DEFM14A rather than PRE 14A/DEF 14A. PRE 14A therefore carries non-routine, non-merger matters: charter amendments, equity plan adoptions over Rule 14a-6 thresholds, going-private elements not labeled as mergers, and contested elections without a transaction component. M&A solicitation studies should anchor on PREM14A/DEFM14A and use PRE 14A only as a secondary source.
Form 8-K Item 5.07 is the post-meeting vote tally, filed within four business days of the meeting. It is structurally tabular (votes for, against, abstain, broker non-votes per proposal) versus PRE 14A's narrative and exhibit-heavy form. Timing is opposite: PRE 14A frames the ask before the meeting, Item 5.07 records the outcome after. Neither substitutes for the other; they bookend the solicitation cycle.
Form N-PX is the voter-side disclosure filed by registered investment companies and, post-2022, certain institutional managers under Section 14A(d), reporting how they voted at portfolio companies. PRE 14A is issuer-side and pre-meeting; N-PX is voter-side and post-meeting. The datasets join on issuer and meeting date but capture opposite ends of the lifecycle.
Form S-4 is the Securities Act registration used when securities are issued in a business combination, exchange offer, or recapitalization. Where shareholder approval is also required, S-4 is commonly filed as a joint proxy statement/prospectus, folding Schedule 14A disclosures into a registration document subject to Securities Act review standards. For stock-for-stock mergers the S-4/joint proxy is typically the operative filing and no separate PREM14A or PRE 14A is filed. PRE 14A therefore underrepresents merger solicitations bundled with securities issuance; a complete merger corpus needs both.
10-K Part III (executive compensation, ownership, director independence, related-party transactions, accountant fees) is routinely satisfied by incorporating the definitive proxy by reference if filed within 120 days of fiscal year-end. The incorporation points to DEF 14A, not PRE 14A. PRE 14A is not the operative source for 10-K-linked governance and compensation disclosures; its value in this context is limited to studying how Part III content evolved between the preliminary and definitive versions.
PRE 14A is distinctive as the SEC-review-stage version of non-routine, vote-soliciting proxy materials filed under Schedule 14A by the issuer. It is narrower than the full proxy universe (excludes routine DEF 14As that skip the preliminary stage), distinct from Schedule 14C (vote vs consent), distinct from PREM14A (non-merger non-routine matters only), distinct from the contested C/N series (issuer-side rather than dissident), and distinct from S-4 (proxy disclosure only, no securities registration). Its unique value is the pre-clearance window: language and structure as first submitted to the staff, before the redline that produces the definitive proxy.
Preliminary proxies surface non-routine ballot items (mergers, charter changes, contested elections, equity-plan requests) before the definitive DEF 14A is mailed. Users anchor on specific sections: Schedule 14A item lists, Background-of-the-Merger, fairness opinions, CD&A, Say-on-Pay, exhibits, meeting and record dates, and the eventual PRE-to-DEF redline.
Event-driven traders price spreads off the Background-of-the-Merger narrative, the financial advisor's fairness opinion and projection tables, deal-protection terms (no-shop, matching rights, termination fees), and the disclosed meeting timeline. The preliminary version lets desks size positions before the record date is fixed and track PRER14A amendments, which often signal SEC comments or litigation pressure on disclosure.
Activists screen PRE 14A filings for contested slates, charter items affecting shareholder rights (advance-notice bylaws, supermajority thresholds, classified boards), and Rule 14a-8 proposals that cleared no-action review. They work from the proposed slate, incumbent biographies, and the matters-to-be-voted-on schedule to time counter-solicitations or escalation letters before definitive distribution.
Voting-recommendation analysts use the preliminary stage to begin drafting reports on complex ballots: equity-plan share requests, M&A votes, compensation amendments, and governance proposals. They focus on equity-plan dilution and burn-rate tables, CD&A and Say-on-Pay narrative, related-party transactions, and director-independence representations, getting analyses ready before the DEF 14A opens the formal voting window.
Disclosure lawyers mine the dataset as a precedent library when drafting their own preliminary proxies, benchmarking treatment of deal background, alternatives considered, advisor conflicts, appraisal rights, and golden-parachute disclosures. Litigation counsel reviews the same sections to assess strike-suit and disclosure-only settlement exposure on merger proxies. Proxy solicitation firms use the filing flow to map upcoming contested or complex votes.
In-house teams benchmark structure, exhibit usage, and disclosure ordering against peer PRE 14As before filing their own. They study peer PRER14A amendments to anticipate likely SEC staff comments on sensitive items: pay realignments, golden parachutes, dual-class structures, and shareholder proposal responses.
Reward-design teams compare CD&A drafts, peer-group selection rationale, performance-metric design, pay-versus-performance tables, clawback descriptions, and equity-plan share-request justifications across preliminary filings. PRE-to-DEF deltas reveal where staff comments or board feedback reshaped pay disclosure language.
Researchers run PRE-versus-DEF redline studies to identify disclosures modified after staff comments, plaintiff complaints, or institutional engagement. The same filings build event histories for studies linking disclosure quality, deal-protection design, or contest features to vote outcomes and post-deal performance.
Quant teams treat the dataset as a structured event feed. Filing date, meeting date, record date, matter-type classifications, and deal terms feed completion-probability and vote-outcome models. Textual features from Background-of-the-Merger, fairness-opinion language, and projection tables become inputs to systematic merger-arb and governance strategies.
Proxy operations teams at broker-dealers, custodians, and advisers use PRE 14A flow to stage upcoming non-routine ballots for pass-through voting, fund-board review, and ERISA fiduciary review. Issuer identifiers, meeting dates, and proposal-type tagging drive ballot setup and vote-recommendation pipelines.
Credit desks read deal-financing terms, pro forma capital structure, change-of-control covenants, and indemnification arrangements in M&A and recapitalization PRE 14As. Special-situations desks watch for asset-sale, going-private, and charter-change proposals that may trigger bond puts or covenant tests.
Governance and deal reporters use preliminary filings to surface contested elections, controversial pay, and complex M&A before the definitive proxy mails. Coverage back to 1994 supports long-horizon trend work on deal-protection terms, executive pay, and shareholder-proposal outcomes.
Teams building extraction and retrieval systems use the standardized Schedule 14A structure and paired PRE/DEF documents for training and evaluation: proposal classification, fairness-opinion extraction, board-slate parsing, equity-plan term extraction, and redline detection between draft and final disclosures.
PRE 14A records carry the first-filed, pre-clearance version of non-routine proxy materials. The use cases below tie to specific Schedule 14A items, exhibits, and the PRE-to-DEF lifecycle.
Event desks parse Item 14 sections (Background of the Merger, board reasons for approval, financial advisor fairness opinions and projection tables, no-shop and termination-fee provisions, treatment of equity awards, and the disclosed meeting timeline) out of the primary .htm to seed completion-probability and spread models. The periodOfReport meeting date and filedAt acceptance timestamp anchor the timeline; downstream PRER14A activity flags SEC-comment or litigation pressure on the same disclosure block. The preliminary stage gives desks a window of ten or more days to position before record dates are finalized.
Pair each PRE 14A primary document with the matching DEF 14A from the same registrant and meeting (the definitive proxy is filed under its own accession number) and diff Items 7, 8, 10, and 14 to surface staff-comment or plaintiff-driven edits: reworded fairness-opinion caveats, restated equity-plan dilution figures, softened CD&A language, added appraisal-rights detail, or new related-party disclosures. Output is a redline corpus used by governance researchers, disclosure counsel, and forensic accounting studies linking specific edits to vote outcomes or post-deal performance.
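A bare-bones version of the PRE-to-DEF comparison can be sketched with the standard library's difflib. The sketch assumes both documents have already been reduced to plain text (tag stripping and section alignment are out of scope here), and the function name is hypothetical.

```python
import difflib


def redline(pre_text: str, def_text: str) -> list:
    """Return unified-diff lines between preliminary and definitive text."""
    return list(
        difflib.unified_diff(
            pre_text.splitlines(),
            def_text.splitlines(),
            fromfile="PRE 14A",
            tofile="DEF 14A",
            lineterm="",
            n=0,  # no context lines; keep only the changed lines
        )
    )
```

Lines prefixed with `-` appear only in the preliminary text and lines prefixed with `+` only in the definitive text. A line-level diff is coarse; sentence- or token-level comparison would likely suit long proxy paragraphs better.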
Filter the corpus for filings carrying Rule 14a-19 universal-proxy cards, Item 4 solicitation-cost disclosures by a non-management party, Item 5 interests of certain persons, and bylaw or charter items affecting advance-notice or board-classification provisions. Cross-join with the contested C/N series (PREC14A, PREN14A, DFAN14A) on issuer CIK and meeting date to assemble full proxy-contest records, feeding activist trackers and counter-solicitation timing tools.
Pull Item 8 CD&A, Summary Compensation Table, Pay-versus-Performance table (extracted via inline XBRL <ix:nonFraction> facts in the iXBRL primaries), peer-group selection rationale, clawback descriptions, and Item 10 equity-plan share-reserve, burn-rate, dilution, and new-plan-benefits tables. Compensation consultants use these to benchmark share-request justification, performance-metric design, and CEO-pay quantum across SIC- and ticker-matched peers drawn from entities[].sic and entities[].tickers.
Securities and M&A counsel query the dataset as a template bank: golden-parachute tables, appraisal-rights notices, advisor-conflict disclosures, no-shop and matching-rights language, charter-amendment marked copies, and Exhibit 107 fee-table layouts. Filtering by entities[].sic, transaction type (Item 14 vs Item 19 vs Item 22), and filing-agent fingerprint in the primary filename (tm…, d…, ea…) produces precedent sets matched to deal type and drafting house.
For Investment Company Act registrants (file numbers in the 811- range), use the seriesAndClassesContractsInformation array to enumerate every affected series (S000XXXXXX) and class (C000XXXXXX) under a single joint-proxy accession. This drives ballot setup at custodians and advisers, monitoring of Item 22 advisory-contract approvals, sub-advisory changes, fund reorganizations, and 12b-1 plan amendments at the share-class level.
The standardized Schedule 14A item ordering and the paired PRE/DEF lifecycle make the dataset a labeled-by-structure training source for proposal classification (Items 7, 8, 10, 14, 19, 22, 24), fairness-opinion-section extraction, board-slate parsing, equity-plan term extraction, and redline detection. Inline XBRL cover-page facts in the <ix:header> provide gold labels for dei fields (entity, ticker, document type), and the per-record metadata.json supplies stable join keys (accessionNo, id, CIK) for retrieval indexes.
The Form PRE 14A Files Dataset is accessible through a JSON index endpoint, a full archive download, and per-container downloads. The dataset covers filings from January 1, 1994 to the present, is delivered in ZIP container format, and includes TXT, JSON, HTML, and PDF file types.
Dataset Index JSON API: https://api.sec-api.io/datasets/form-pre-14a-files.json
Returns dataset-level metadata (name, description, last updated timestamp, earliest sample date, total record count, total size, form types covered, container format, and file types), the full dataset download URL, and the list of all individual container files with per-container size, record count, updated timestamp, and download URL. This endpoint does not require an API key. Use it to monitor which containers were updated in the most recent refresh and to decide which containers to download incrementally.
{
  "datasetId": "1f13365b-9ae0-68fd-a540-d7488b5449a6",
  "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-pre-14a-files.zip",
  "name": "Form PRE 14A Files Dataset",
  "updatedAt": "2026-05-05T02:49:07.617Z",
  "earliestSampleDate": "1994-01-01",
  "totalRecords": 50453,
  "totalSize": 3088655779,
  "formTypes": ["PRE 14A"],
  "containerFormat": "ZIP",
  "fileTypes": ["TXT", "JSON", "HTML", "PDF"],
  "containers": [
    {
      "downloadUrl": "https://api.sec-api.io/datasets/form-pre-14a-files/2026/2026-03.zip",
      "key": "2026/2026-03.zip",
      "size": 13818783,
      "records": 154,
      "updatedAt": "2026-05-05T02:49:07.617Z"
    }
  ]
}
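As a rough sketch of the incremental-refresh workflow, the following Python snippet parses the index response and keeps only the containers whose updatedAt is newer than a locally stored cutoff. Field names are taken from the sample response; the helper names and the cutoff value are illustrative assumptions, and error handling is omitted.

```python
import json
import urllib.request
from datetime import datetime, timezone

INDEX_URL = "https://api.sec-api.io/datasets/form-pre-14a-files.json"


def fetch_index(url: str = INDEX_URL) -> dict:
    """Download and parse the dataset index (documented as not needing an API key)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def parse_ts(value: str) -> datetime:
    """Parse a timestamp such as '2026-05-05T02:49:07.617Z' into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def containers_updated_since(index: dict, cutoff: datetime) -> list:
    """Return the container entries refreshed after the given cutoff."""
    return [
        container
        for container in index.get("containers", [])
        if parse_ts(container["updatedAt"]) > cutoff
    ]
```

A typical call would be `containers_updated_since(fetch_index(), last_sync)`, where `last_sync` is a timezone-aware datetime saved from the previous run.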
Download Entire Dataset: https://api.sec-api.io/datasets/form-pre-14a-files.zip?token=YOUR_API_KEY
Downloads the complete dataset as a single ZIP archive containing all monthly containers. This endpoint requires an API key.
Download Single Container: https://api.sec-api.io/datasets/form-pre-14a-files/2026/2026-03.zip?token=YOUR_API_KEY
Downloads one individual monthly container ZIP, which is useful for incremental updates or when only a specific time period is needed. This endpoint requires an API key.
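The download URLs above follow a simple pattern, which the sketch below reproduces for a single container. The URL layout and the token query parameter come from the examples above; the function names and the destination path are hypothetical, and retries and error handling are left out.

```python
import shutil
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.sec-api.io/datasets/form-pre-14a-files"


def container_url(key: str, token: str) -> str:
    """Build the download URL for one monthly container, e.g. key='2026/2026-03.zip'."""
    return f"{BASE_URL}/{key}?{urlencode({'token': token})}"


def download_container(key: str, token: str, dest_path: str) -> str:
    """Stream the container ZIP to dest_path and return that path."""
    with urllib.request.urlopen(container_url(key, token)) as resp, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest_path
```

The key argument matches the key field of each entry in the index's containers array, so the two can be combined to download only the containers that changed.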
The dataset covers Form PRE 14A, the preliminary proxy statement filed under Regulation 14A of the Securities Exchange Act of 1934. The substantive content of each filing is governed by Schedule 14A (Rule 14a-101), and every filing is marked "PRELIMINARY COPY" with the corresponding box on the Schedule 14A cover page checked.
One record represents a single EDGAR PRE 14A submission, identified by its 18-digit accession number and materialized on disk as one folder named after that accession (dashes stripped). Each folder contains a metadata.json filing-level descriptor and the original EDGAR documents (the primary preliminary proxy statement plus any separately filed exhibits), with image GRAPHIC files removed.
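Converting between the dashed accession number and the dash-stripped folder name is a small string operation, sketched below. The 10-2-6 digit grouping is the standard EDGAR accession layout; the function names and the accession number in the usage note are made up for illustration.

```python
import re


def accession_to_folder(accession_no: str) -> str:
    """Strip dashes from an accession number to get the record folder name."""
    digits = accession_no.replace("-", "")
    if not re.fullmatch(r"\d{18}", digits):
        raise ValueError(f"not an 18-digit accession number: {accession_no!r}")
    return digits


def folder_to_accession(folder: str) -> str:
    """Restore the dashed 10-2-6 form from an 18-digit folder name."""
    if not re.fullmatch(r"\d{18}", folder):
        raise ValueError(f"not an 18-digit folder name: {folder!r}")
    return f"{folder[:10]}-{folder[10:12]}-{folder[12:]}"
```

For example, the hypothetical accession number 0001234567-24-000123 maps to the folder name 000123456724000123 and back.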
PRE 14A is filed by registrants soliciting proxies from holders of securities registered under Section 12 of the Exchange Act when the solicitation includes at least one matter that is not within the routine list in Rule 14a-6(a). The reporting population spans domestic operating companies, registered investment companies (open-end funds, closed-end funds, BDCs), bank holding companies, REITs, Section 12-registered limited partnerships, and SPACs. Foreign private issuers are excluded under Rule 3a12-3(b).
Under Rule 14a-6, the preliminary proxy must be on file with the SEC at least ten calendar days before the definitive DEF 14A is first sent or given to security holders. The window gives Corporation Finance staff (or Investment Management staff, for funds) the chance to issue comments before the solicitation reaches investors; revisions filed in response to staff comments use the PRER14A form code.
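For timing analyses that pair a PRE 14A with its DEF 14A, the ten-day window reduces to simple date arithmetic. The sketch below adds ten calendar days to the preliminary filing date and checks whether a given definitive date clears that gap. It is a simplification: how the filing day itself is counted is a legal question the code does not settle, and the function names are hypothetical.

```python
from datetime import date, timedelta

MIN_CALENDAR_DAYS = 10  # waiting period described in the text


def earliest_definitive_date(preliminary_filed: date) -> date:
    """Preliminary filing date plus ten calendar days (simplified counting)."""
    return preliminary_filed + timedelta(days=MIN_CALENDAR_DAYS)


def has_minimum_gap(preliminary_filed: date, definitive_sent: date) -> bool:
    """True if the definitive date is at least ten calendar days after the filing."""
    return definitive_sent >= earliest_definitive_date(preliminary_filed)
```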
PRE 14A is the preliminary, pre-clearance draft submitted for staff review on non-routine matters; DEF 14A is the definitive version mailed to security holders. Routine annual-meeting proxies skip the preliminary stage under Rule 14a-6(a) and are filed directly as DEF 14A, so the PRE 14A dataset is structurally biased toward non-routine items — mergers, going-private deals, charter amendments, equity-plan adoptions, and contested elections — while DEF 14A covers the full proxy population. PRE 14A also captures placeholder dates and to-be-finalized figures that are resolved in the corresponding DEF 14A.
This dataset does not include amended preliminary proxies. Those are filed under the separate PRE 14A/A form code and are captured in the form-specific PRE 14A/A dataset. This dataset contains only first-filed preliminary proxies under the bare PRE 14A form code.
The dataset is delivered in ZIP container format. Inside each container, records carry TXT (legacy ASCII proxies and the EDGAR complete-submission text bundle), HTML (the primary proxy-statement document, either SGML-wrapped HTML or Inline XBRL XHTML in the modern era), JSON (the per-record metadata.json descriptor), and PDF (used occasionally by filers for ancillary exhibits). Image files referenced in the original submission are excluded from the on-disk container.
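One way to inspect a downloaded container is to group its entries by record folder, load each metadata.json, and count the remaining files by extension. The sketch below assumes only what is stated above (one folder per record, containing a metadata.json); any directory levels above the record folder are not assumed, so the code takes the immediate parent directory of each file as the record folder. The function name is hypothetical.

```python
import json
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_container(zip_source) -> dict:
    """Map each record folder to its parsed metadata and a per-extension file count.

    zip_source may be a filesystem path or a binary file-like object.
    """
    records = {}
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = PurePosixPath(info.filename)
            if len(path.parts) < 2:
                continue  # skip files that are not inside any folder
            folder = path.parent.name
            record = records.setdefault(
                folder, {"metadata": None, "file_types": Counter()}
            )
            record["file_types"][path.suffix.lower().lstrip(".")] += 1
            if path.name == "metadata.json":
                record["metadata"] = json.loads(zf.read(info))
    return records
```

The extension counts give a quick view of how many TXT, HTML, JSON, and PDF files each record carries, which helps when deciding which parser to run on a given filing.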