Form ARS Files — Annual Report to Security Holders Filings Dataset

The Form ARS Files dataset is the EDGAR archive of the shareholder-facing "glossy" annual report furnished by U.S. operating-company registrants under Rule 14a-3 of the Securities Exchange Act of 1934. One record corresponds to a single Form ARS or Form ARS/A submission — one furnishing of an annual report to security holders by a registrant for a specific fiscal period — packaged as an accession-numbered folder containing the primary annual-report PDF and a metadata.json sidecar that mirrors the EDGAR submission header. The dataset spans January 1994 to the present and reflects a sharp density discontinuity at January 11, 2023, the compliance date on which amended Rule 101 of Regulation S-T made electronic submission of Form ARS on EDGAR in PDF format mandatory for all Regulation 14A proxy-soliciting issuers. Form types covered are ARS and ARS/A; file types found across the corpus are TXT, JSON, HTML, and PDF, with modern records standardized to one PDF plus the JSON metadata sidecar.

Update Frequency: Daily
Updated at: 2026-05-19
Earliest Sample Date: 1994-01-01
Total Size: 84.7 GB
Total Records: 15,534
Container Format: ZIP
Content Types: TXT, JSON, HTML, PDF
Form Types: ARS, ARS/A

Dataset APIs

Programmatically retrieve the full list of dataset archive files, download URLs and dataset metadata.

Dataset Index JSON API

Download the entire dataset as a single archive file.

Download Entire Dataset:

Download a single container file (e.g. monthly archive) from the dataset.

Download Single Container:

Dataset Files

359 files · 84.7 GB
2026-05.zip · 736.7 MB · 200 records
2026-04.zip · 11.5 GB · 3,234 records
2026-03.zip · 5.1 GB · 673 records
2026-02.zip · 357.1 MB · 44 records
2026-01.zip · 328.2 MB · 64 records
2025-12.zip · 485.1 MB · 61 records
2025-11.zip · 145.8 MB · 56 records
2025-10.zip · 428.7 MB · 108 records
2025-09.zip · 608.6 MB · 82 records
2025-08.zip · 386.1 MB · 52 records
2025-07.zip · 911.6 MB · 94 records
2025-06.zip · 562.7 MB · 85 records
2025-05.zip · 1.1 GB · 148 records
2025-04.zip · 14.6 GB · 1,707 records
2025-03.zip · 5.9 GB · 684 records
2025-02.zip · 416.9 MB · 46 records
2025-01.zip · 389.8 MB · 74 records
2024-12.zip · 352.7 MB · 59 records
2024-11.zip · 122.0 MB · 45 records
2024-10.zip · 628.1 MB · 116 records
2024-09.zip · 518.2 MB · 68 records
2024-08.zip · 551.9 MB · 70 records
2024-07.zip · 502.8 MB · 85 records
2024-06.zip · 589.5 MB · 86 records
2024-05.zip · 948.2 MB · 170 records
2024-04.zip · 11.0 GB · 1,726 records
2024-03.zip · 4.8 GB · 674 records
2024-02.zip · 416.2 MB · 40 records
2024-01.zip · 332.3 MB · 71 records
2023-12.zip · 369.8 MB · 54 records
2023-11.zip · 131.0 MB · 39 records
2023-10.zip · 280.4 MB · 91 records
2023-09.zip · 441.4 MB · 82 records
2023-08.zip · 427.2 MB · 66 records
2023-07.zip · 531.8 MB · 71 records
2023-06.zip · 395.3 MB · 86 records
2023-05.zip · 2.2 GB · 320 records
2023-04.zip · 9.3 GB · 1,532 records
2023-03.zip · 5.0 GB · 731 records
2023-02.zip · 318.2 MB · 37 records
2023-01.zip · 264.0 MB · 38 records
2022-10.zip · 280.7 KB · 2 records
2022-09.zip · 9.8 KB · 1 records
2022-08.zip · 203.7 KB · 1 records
2022-07.zip · 1.8 KB · 1 records
2022-06.zip · 196.2 KB · 1 records
2022-05.zip · 145.1 KB · 1 records
2022-04.zip · 1.1 MB · 7 records
2022-03.zip · 755.0 KB · 4 records
2022-01.zip · 31.8 KB · 1 records
2021-09.zip · 110.6 KB · 2 records
2021-08.zip · 157.3 KB · 1 records
2021-07.zip · 260.6 KB · 2 records
2021-05.zip · 2.5 KB · 1 records
2021-04.zip · 386.1 KB · 5 records
2021-03.zip · 1.1 MB · 5 records
2021-01.zip · 32.2 KB · 2 records
2020-10.zip · 191.0 KB · 3 records
2020-09.zip · 110.8 KB · 2 records
2020-07.zip · 289.3 KB · 2 records
2020-04.zip · 1.6 MB · 10 records
2020-03.zip · 77.6 MB · 6 records
2020-01.zip · 2.1 KB · 1 records
2019-11.zip · 79.0 KB · 1 records
2019-10.zip · 134.4 KB · 3 records
2019-09.zip · 102.1 KB · 2 records
2019-07.zip · 73.4 KB · 1 records
2019-06.zip · 321.2 KB · 2 records
2019-04.zip · 2.1 MB · 11 records
2019-03.zip · 1.2 MB · 5 records
2019-02.zip · 144.9 KB · 1 records
2019-01.zip · 2.1 KB · 1 records
2018-10.zip · 41.2 KB · 1 records
2018-09.zip · 103.1 KB · 1 records
2018-08.zip · 784.6 KB · 4 records
2018-07.zip · 66.4 KB · 1 records
2018-06.zip · 216.0 KB · 1 records
2018-05.zip · 490.8 KB · 9 records
2018-04.zip · 1.7 MB · 11 records
2018-03.zip · 838.2 KB · 3 records
2018-02.zip · 154.0 KB · 2 records
2018-01.zip · 2.0 KB · 1 records
2017-10.zip · 275.7 KB · 3 records
2017-09.zip · 250.3 KB · 2 records
2017-06.zip · 250.1 KB · 1 records
2017-05.zip · 207.7 KB · 2 records
2017-04.zip · 37.2 MB · 17 records
2017-03.zip · 927.2 KB · 4 records
2017-02.zip · 147.1 KB · 2 records
2017-01.zip · 2.0 KB · 1 records
2016-12.zip · 35.1 KB · 2 records
2016-11.zip · 34.0 KB · 1 records
2016-10.zip · 537.4 KB · 5 records
2016-09.zip · 257.9 KB · 2 records
2016-08.zip · 223.9 KB · 1 records
2016-07.zip · 183.7 KB · 2 records
2016-06.zip · 996.8 KB · 2 records
2016-05.zip · 473.2 KB · 3 records
2016-04.zip · 3.5 MB · 18 records
2016-03.zip · 579.9 KB · 6 records
2016-02.zip · 258.6 KB · 3 records
2016-01.zip · 1.8 KB · 1 records
2015-12.zip · 22.0 KB · 2 records
2015-11.zip · 131.3 KB · 2 records
2015-10.zip · 186.9 KB · 2 records
2015-09.zip · 583.4 KB · 4 records
2015-08.zip · 22 B · 0 records
2015-07.zip · 119.9 KB · 1 records
2015-06.zip · 227.0 KB · 1 records
2015-05.zip · 662.0 KB · 5 records
2015-04.zip · 2.7 MB · 18 records
2015-03.zip · 19.2 MB · 7 records
2015-02.zip · 152.5 KB · 1 records
2015-01.zip · 1.8 KB · 1 records
2014-12.zip · 336.8 KB · 4 records
2014-11.zip · 301.2 KB · 3 records
2014-10.zip · 64.4 KB · 1 records
2014-09.zip · 115.7 KB · 1 records
2014-08.zip · 22 B · 0 records
2014-07.zip · 22 B · 0 records
2014-06.zip · 213.6 KB · 1 records
2014-05.zip · 139.3 KB · 1 records
2014-04.zip · 1.8 MB · 11 records
2014-03.zip · 17.0 MB · 11 records
2014-02.zip · 139.9 KB · 1 records
2014-01.zip · 22.6 KB · 1 records
2013-12.zip · 216.8 KB · 3 records
2013-11.zip · 268.7 KB · 2 records
2013-10.zip · 34.6 KB · 1 records
2013-09.zip · 116.7 KB · 1 records
2013-08.zip · 131.5 KB · 1 records
2013-07.zip · 22 B · 0 records
2013-06.zip · 22 B · 0 records
2013-05.zip · 266.4 KB · 3 records
2013-04.zip · 1.8 MB · 11 records
2013-03.zip · 10.2 MB · 11 records
2013-02.zip · 22 B · 0 records
2013-01.zip · 235.3 KB · 3 records
2012-12.zip · 99.6 KB · 2 records
2012-11.zip · 22 B · 0 records
2012-10.zip · 22 B · 0 records
2012-09.zip · 112.1 KB · 1 records
2012-08.zip · 45.4 KB · 1 records
2012-07.zip · 22 B · 0 records
2012-06.zip · 74.3 KB · 1 records
2012-05.zip · 174.4 KB · 1 records
2012-04.zip · 1.4 MB · 9 records
2012-03.zip · 25.3 MB · 9 records
2012-02.zip · 125.6 KB · 1 records
2012-01.zip · 1.9 KB · 1 records
2011-12.zip · 205.2 KB · 3 records
2011-11.zip · 44.6 KB · 1 records
2011-10.zip · 1.2 MB · 2 records
2011-09.zip · 101.9 KB · 1 records
2011-08.zip · 30.8 KB · 1 records
2011-07.zip · 2.4 KB · 1 records
2011-06.zip · 120.6 KB · 1 records
2011-05.zip · 166.9 KB · 2 records
2011-04.zip · 1.9 MB · 18 records
2011-03.zip · 7.1 MB · 16 records
2011-02.zip · 22 B · 0 records
2011-01.zip · 155.1 KB · 1 records
2010-12.zip · 188.4 KB · 3 records
2010-11.zip · 143.2 KB · 1 records
2010-10.zip · 185.1 KB · 2 records
2010-09.zip · 87.1 KB · 1 records
2010-08.zip · 207.5 KB · 2 records
2010-07.zip · 22 B · 0 records
2010-06.zip · 22 B · 0 records
2010-05.zip · 142.6 KB · 1 records
2010-04.zip · 2.0 MB · 13 records
2010-03.zip · 1.8 MB · 14 records
2010-02.zip · 157.6 KB · 1 records
2010-01.zip · 11.4 KB · 1 records
2009-12.zip · 67.8 KB · 1 records
2009-11.zip · 22 B · 0 records
2009-10.zip · 288.5 KB · 3 records
2009-09.zip · 144.0 KB · 2 records
2009-08.zip · 27.8 KB · 1 records
2009-07.zip · 42.1 KB · 1 records
2009-06.zip · 226.2 KB · 2 records
2009-05.zip · 266.8 KB · 2 records
2009-04.zip · 1.3 MB · 10 records
2009-03.zip · 4.7 MB · 14 records
2009-02.zip · 290.2 KB · 8 records
2009-01.zip · 7.9 KB · 1 records
2008-12.zip · 65.0 KB · 1 records
2008-11.zip · 22 B · 0 records
2008-10.zip · 288.9 KB · 3 records
2008-09.zip · 312.0 KB · 3 records
2008-08.zip · 25.6 KB · 1 records
2008-07.zip · 22 B · 0 records
2008-06.zip · 102.5 KB · 1 records
2008-05.zip · 669.8 KB · 11 records
2008-04.zip · 1.9 MB · 14 records
2008-03.zip · 5.5 MB · 15 records
2008-02.zip · 391.2 KB · 4 records
2008-01.zip · 2.1 KB · 1 records
2007-12.zip · 135.2 KB · 2 records
2007-11.zip · 22 B · 0 records
2007-10.zip · 64.6 KB · 2 records
2007-09.zip · 180.6 KB · 3 records
2007-08.zip · 165.2 KB · 3 records
2007-07.zip · 22 B · 0 records
2007-06.zip · 62.2 KB · 1 records
2007-05.zip · 296.9 KB · 5 records
2007-04.zip · 2.0 MB · 18 records
2007-03.zip · 3.6 MB · 12 records
2007-02.zip · 181.5 KB · 2 records
2007-01.zip · 22 B · 0 records
2006-12.zip · 208.0 KB · 3 records
2006-11.zip · 141.8 KB · 7 records
2006-10.zip · 88.5 KB · 1 records
2006-09.zip · 55.0 KB · 2 records
2006-08.zip · 162.7 KB · 3 records
2006-07.zip · 22 B · 0 records
2006-06.zip · 165.6 KB · 2 records
2006-05.zip · 253.2 KB · 3 records
2006-04.zip · 5.6 MB · 16 records
2006-03.zip · 758.5 KB · 10 records
2006-02.zip · 22 B · 0 records
2006-01.zip · 230.9 KB · 3 records
2005-12.zip · 184.1 KB · 3 records
2005-11.zip · 130.7 KB · 5 records
2005-10.zip · 39.9 KB · 1 records
2005-09.zip · 5.0 KB · 1 records
2005-08.zip · 57.0 KB · 3 records
2005-07.zip · 194.5 KB · 8 records
2005-06.zip · 149.6 KB · 3 records
2005-05.zip · 498.5 KB · 6 records
2005-04.zip · 3.6 MB · 15 records
2005-03.zip · 1.6 MB · 25 records
2005-02.zip · 246.5 KB · 15 records
2005-01.zip · 20.0 KB · 1 records
2004-12.zip · 458.0 KB · 13 records
2004-11.zip · 126.0 KB · 7 records
2004-10.zip · 35.2 KB · 1 records
2004-09.zip · 215.6 KB · 4 records
2004-08.zip · 56.7 KB · 2 records
2004-07.zip · 178.4 KB · 7 records
2004-06.zip · 356.4 KB · 4 records
2004-05.zip · 684.2 KB · 6 records
2004-04.zip · 6.0 MB · 16 records
2004-03.zip · 1.1 MB · 18 records
2004-02.zip · 237.6 KB · 4 records
2004-01.zip · 134.2 KB · 2 records
2003-12.zip · 173.2 KB · 4 records
2003-11.zip · 183.5 KB · 9 records
2003-10.zip · 67.4 KB · 1 records
2003-09.zip · 251.0 KB · 4 records
2003-08.zip · 104.0 KB · 1 records
2003-07.zip · 93.6 KB · 2 records
2003-06.zip · 386.2 KB · 5 records
2003-05.zip · 230.1 KB · 6 records
2003-04.zip · 1.4 MB · 15 records
2003-03.zip · 943.0 KB · 19 records
2003-02.zip · 76.6 KB · 3 records
2003-01.zip · 162.4 KB · 3 records
2002-12.zip · 250.1 KB · 5 records
2002-11.zip · 162.2 KB · 4 records
2002-10.zip · 206.6 KB · 3 records
2002-09.zip · 250.9 KB · 6 records
2002-08.zip · 121.4 KB · 2 records
2002-07.zip · 56.8 KB · 2 records
2002-06.zip · 210.9 KB · 5 records
2002-05.zip · 140.9 KB · 4 records
2002-04.zip · 1.3 MB · 15 records
2002-03.zip · 1.0 MB · 17 records
2002-02.zip · 68.5 KB · 2 records
2002-01.zip · 54.6 KB · 1 records
2001-12.zip · 251.1 KB · 6 records
2001-11.zip · 1.4 MB · 7 records
2001-10.zip · 228.5 KB · 6 records
2001-09.zip · 91.0 KB · 4 records
2001-08.zip · 41.4 KB · 3 records
2001-07.zip · 7.3 MB · 4 records
2001-06.zip · 200.9 KB · 5 records
2001-05.zip · 138.3 KB · 5 records
2001-04.zip · 2.1 MB · 25 records
2001-03.zip · 679.4 KB · 15 records
2001-02.zip · 117.4 KB · 3 records
2001-01.zip · 87.1 KB · 2 records
2000-12.zip · 235.3 KB · 4 records
2000-11.zip · 51.3 KB · 1 records
2000-10.zip · 1.4 MB · 8 records
2000-09.zip · 200.0 KB · 6 records
2000-08.zip · 65.3 KB · 1 records
2000-07.zip · 291.7 KB · 4 records
2000-06.zip · 196.8 KB · 7 records
2000-05.zip · 287.8 KB · 7 records
2000-04.zip · 710.9 KB · 18 records
2000-03.zip · 1.1 MB · 25 records
2000-02.zip · 346.5 KB · 4 records
2000-01.zip · 21.1 KB · 1 records
1999-12.zip · 325.1 KB · 14 records
1999-11.zip · 243.9 KB · 4 records
1999-10.zip · 2.1 MB · 7 records
1999-09.zip · 173.4 KB · 3 records
1999-08.zip · 158.9 KB · 8 records
1999-07.zip · 303.1 KB · 6 records
1999-06.zip · 364.8 KB · 14 records
1999-05.zip · 260.7 KB · 9 records
1999-04.zip · 821.1 KB · 23 records
1999-03.zip · 1.2 MB · 30 records
1999-02.zip · 106.2 KB · 4 records
1999-01.zip · 23.9 KB · 1 records
1998-12.zip · 132.3 KB · 4 records
1998-11.zip · 179.4 KB · 6 records
1998-10.zip · 239.0 KB · 7 records
1998-09.zip · 228.8 KB · 6 records
1998-08.zip · 194.0 KB · 9 records
1998-07.zip · 313.2 KB · 7 records
1998-06.zip · 267.5 KB · 8 records
1998-05.zip · 398.0 KB · 9 records
1998-04.zip · 840.6 KB · 23 records
1998-03.zip · 713.9 KB · 19 records
1998-02.zip · 175.7 KB · 9 records
1998-01.zip · 23.8 KB · 2 records
1997-12.zip · 206.9 KB · 7 records
1997-11.zip · 194.6 KB · 5 records
1997-10.zip · 219.0 KB · 8 records
1997-09.zip · 452.1 KB · 9 records
1997-08.zip · 305.2 KB · 17 records
1997-07.zip · 149.0 KB · 3 records
1997-06.zip · 214.4 KB · 10 records
1997-05.zip · 281.7 KB · 13 records
1997-04.zip · 897.1 KB · 25 records
1997-03.zip · 1.5 MB · 47 records
1997-02.zip · 191.5 KB · 10 records
1997-01.zip · 163.8 KB · 6 records
1996-12.zip · 207.7 KB · 6 records
1996-11.zip · 94.0 KB · 4 records
1996-10.zip · 380.5 KB · 13 records
1996-09.zip · 232.2 KB · 10 records
1996-08.zip · 621.5 KB · 22 records
1996-07.zip · 92.3 KB · 3 records
1996-06.zip · 104.3 KB · 5 records
1996-05.zip · 351.2 KB · 8 records
1996-04.zip · 617.0 KB · 20 records
1996-03.zip · 1.3 MB · 33 records
1996-02.zip · 398.0 KB · 14 records
1996-01.zip · 83.2 KB · 5 records
1995-12.zip · 172.0 KB · 5 records
1995-11.zip · 336.9 KB · 10 records
1995-10.zip · 120.7 KB · 4 records
1995-09.zip · 151.1 KB · 6 records
1995-08.zip · 197.2 KB · 6 records
1995-07.zip · 120.3 KB · 2 records
1995-06.zip · 114.5 KB · 7 records
1995-05.zip · 228.4 KB · 6 records
1995-04.zip · 265.1 KB · 8 records
1995-03.zip · 552.1 KB · 13 records
1995-02.zip · 1.0 MB · 21 records
1994-10.zip · 118.7 KB · 4 records
1994-08.zip · 9.1 KB · 1 records
1994-07.zip · 85.3 KB · 2 records
1994-04.zip · 141.9 KB · 5 records
1994-03.zip · 559.7 KB · 25 records
1994-01.zip · 40.1 KB · 2 records
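
Several months in the listing show a 22 B container with 0 records. That size matches a ZIP file that holds nothing but its end-of-central-directory record, so these entries are most likely valid empty archives rather than truncated downloads. The following is a minimal Python sketch using only the standard library. The helper name count_records is illustrative, and it assumes each record contributes exactly one metadata.json member to the archive.

```python
import io
import zipfile


def count_records(zip_bytes: bytes) -> int:
    """Count records in a monthly container by counting metadata.json members."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sum(
            1
            for name in archive.namelist()
            if name.rsplit("/", 1)[-1] == "metadata.json"
        )


# Build an empty ZIP in memory to show where the 22-byte size comes from.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w"):
    pass  # no members are written
empty_zip = buffer.getvalue()

print(len(empty_zip))            # byte size of an archive with no members
print(count_records(empty_zip))  # number of records found in it
```

A download loop can use the same check to skip empty months without treating them as errors.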

What This Dataset Contains

Form ARS is the annual report to security holders required under Rule 14a-3 of the Securities Exchange Act of 1934. Rule 14a-3 obliges a registrant to furnish this report to shareholders before or together with the proxy statement for an annual meeting at which directors are to be elected. The document is the marketing-grade "glossy" annual report distributed to shareholders — distinct from the Form 10-K filed with the SEC, although the audited financial statements and management's discussion typically overlap heavily with those in the contemporaneous 10-K. Form ARS/A is an amendment to a previously furnished ARS, used when the registrant updates or corrects the report after its original furnishing.

The substantive content prescribed by Rule 14a-3(b) includes audited financial statements (balance sheets for the two most recent fiscal years and statements of income and cash flows for the three most recent fiscal years, with scaled requirements for smaller reporting companies), supplementary financial information, management's discussion and analysis of financial condition and results of operations, market and dividend information for the registrant's common equity, segment and geographic data, and identification of directors and executive officers. Beyond these mandated items there is no Item-by-Item template, so the layout, ordering, and design of each ARS are left to the issuer.

The dataset is delivered as monthly ZIP container files covering filings from January 1994 to the present. Modern records (post-January 2023) consist uniformly of one PDF plus the metadata.json sidecar, while older records may consist of differently formatted primary documents reflecting the conventions of their era. In all eras, the metadata sidecar follows the same JSON schema, providing a stable indexing layer regardless of the primary document's format.

Content Structure of a Single Record

What one record represents

One record in the Form ARS Files dataset corresponds to a single Form ARS or Form ARS/A submission on EDGAR, identified by a unique 18-digit accession number. On disk a record materializes as one accession-named folder containing two artifacts: a metadata.json sidecar that mirrors the EDGAR submission header, and a single primary PDF that is the glossy annual report itself. The unit of observation is the filing — one furnishing of an annual report to security holders by a registrant for a specific fiscal period — not a sub-document, page, or extracted section. Form ARS and Form ARS/A records share an identical on-disk shape; the distinction between an original report and an amendment is carried in metadata.json.formType, not in any folder or filename convention.

Internal structure of the underlying annual report

A typical glossy annual report contains, in roughly this order:

  • A cover page with the issuer's name, ticker, and fiscal-year designation, followed by a table of contents.
  • A letter from the chief executive officer (and sometimes a separate chairman's letter) addressing performance, strategy, capital allocation, and outlook.
  • Financial highlights and selected operating metrics, frequently presented graphically (revenue, earnings, free cash flow, segment KPIs, multi-year comparatives).
  • A business overview or "year in review" narrative describing segments, products, geographies, and end markets.
  • Management's discussion and analysis (MD&A) of financial condition and results of operations, including liquidity, capital resources, critical accounting estimates, and risk discussion.
  • The audited consolidated financial statements — balance sheet, statements of operations, comprehensive income, stockholders' equity, and cash flows — together with the accompanying notes.
  • The report of the independent registered public accounting firm, including the opinion on the financial statements and, where applicable, on internal control over financial reporting.
  • Corporate governance and shareholder-information disclosures: board composition, executive officers, common-stock market and dividend information, performance graph, and director nominees.
  • Back matter such as investor-relations contacts, transfer-agent information, stock-exchange listing, and annual-meeting logistics.

Photography, infographics, and brand design pervade the document. The form is engineered for shareholder readability rather than structured data capture, which is why every interior section ultimately lives inside the PDF as visual content rather than as discrete machine-readable fields.

On-disk layout of a single record

The dataset is delivered as monthly ZIP archives. Each archive unpacks to a year/month root whose immediate children are accession-numbered folders, one per filing. The accession folder name is the EDGAR accession number with hyphens removed (e.g., 0000896878-25-000061 becomes 000089687825000061). Inside each accession folder there are two files:

  1. metadata.json — always literally that name, never missing, one per record.
  2. The primary PDF — the glossy annual report, retaining the issuer's original EDGAR filename without normalization.
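
The folder-naming rule above can be sketched as a pair of small Python helpers. The function names are illustrative and not part of the dataset; the 10-2-6 digit grouping follows the dashed accession number format shown in the example.

```python
import re

# Dashed EDGAR accession number: 10 digits, 2 digits, 6 digits.
_DASHED = re.compile(r"(\d{10})-(\d{2})-(\d{6})")
# Folder name: the same 18 digits with the hyphens removed.
_BARE = re.compile(r"\d{18}")


def accession_to_folder(accession_no: str) -> str:
    """Convert '0000896878-25-000061' to '000089687825000061'."""
    match = _DASHED.fullmatch(accession_no)
    if match is None:
        raise ValueError(f"not a dashed accession number: {accession_no!r}")
    return "".join(match.groups())


def folder_to_accession(folder: str) -> str:
    """Convert '000089687825000061' back to '0000896878-25-000061'."""
    if _BARE.fullmatch(folder) is None:
        raise ValueError(f"not an 18-digit accession folder: {folder!r}")
    return f"{folder[:10]}-{folder[10:12]}-{folder[12:]}"


print(accession_to_folder("0000896878-25-000061"))
print(folder_to_accession("000089687825000061"))
```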

PDF filenames are not normalized and follow several conventions encountered across filing agents and registrants:

  • Filer-agent template names such as formars.pdf or form-ars.pdf (used by M2 Compliance, Issuer Direct, and similar agents).
  • Donnelley/Toppan/Workiva job-coded names such as d793293dars.pdf, tm2530112d1_ars.pdf, tm2531499d1_arsa.pdf, ny20056475x3_ars.pdf, or ea0266368-02.pdf.
  • Issuer-meaningful slugs such as intu014766-ars.pdf, panw014404-arsa.pdf, unf-ars-proxy_2025.pdf, gipr_annual_report_fy24.pdf, or lnsr-ars-2024annualrepor.pdf.

A trailing _arsa token (versus _ars) is a common but informal hint that the underlying form type is ARS/A; the authoritative signal is metadata.json.formType. The PDF can always be located generically by enumerating files in the accession folder and excluding metadata.json. PDF size varies widely with issuer design choices and page count, ranging from under one megabyte for small filers using simple templates to well over fifteen megabytes for large issuers with image-heavy designs.
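
The generic lookup described above (enumerate the accession folder and exclude metadata.json) might look like the following Python sketch. It builds a tiny archive in memory so that it runs on its own. The 2025-11/ root prefix and the file contents are assumptions made for illustration, and index_records is a hypothetical helper name.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def index_records(archive: zipfile.ZipFile) -> dict:
    """Map each accession folder to its metadata member and other members."""
    folders = defaultdict(list)
    for info in archive.infolist():
        if info.is_dir():
            continue
        # The accession folder is the immediate parent of each file.
        folders[PurePosixPath(info.filename).parent.name].append(info.filename)

    records = {}
    for folder, members in folders.items():
        metadata = [m for m in members if PurePosixPath(m).name == "metadata.json"]
        documents = [m for m in members if PurePosixPath(m).name != "metadata.json"]
        if metadata:  # skip folders that do not look like records
            records[folder] = {"metadata": metadata[0], "documents": documents}
    return records


# Illustrative archive; the root prefix and payloads are placeholders.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("2025-11/000089687825000061/metadata.json", "{}")
    archive.writestr("2025-11/000089687825000061/formars.pdf", b"%PDF-1.7")

with zipfile.ZipFile(buffer) as archive:
    records = index_records(archive)

for folder, entry in records.items():
    print(folder, entry["documents"])
```

Keying on the parent folder name rather than on a fixed path depth keeps the sketch working even if the archive root differs from the assumed layout.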

The metadata.json schema

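The metadata sidecar is a JSON object that mirrors the EDGAR <SEC-HEADER> block for the submission and adds dataset-internal identifiers. Most top-level fields are scalar strings; documentFormatFiles, dataFiles, entities, and seriesAndClassesContractsInformation are arrays. Its top-level fields are: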

  • formType — string, either "ARS" for an original annual report or "ARS/A" for an amendment.
  • accessionNo — the EDGAR accession number in dashed form (e.g., "0000896878-25-000061").
  • filedAt — ISO-8601 filing timestamp with timezone offset (e.g., "2025-11-26T12:13:30-05:00").
  • effectivenessDate — the date the filing took effect on EDGAR, formatted as YYYY-MM-DD.
  • periodOfReport — the fiscal-year-end the report covers, formatted as YYYY-MM-DD.
  • description — a human-readable label such as "Form ARS - Annual Report to Security Holders" or, for amendments, "Form ARS/A - Annual Report to Security Holders: [Amend]".
  • linkToFilingDetails — direct URL to the primary PDF on EDGAR.
  • linkToTxt — URL to the complete SGML submission text file on EDGAR.
  • linkToHtml — URL of the EDGAR filing-index page.
  • linkToXbrl — empty string for Form ARS records; ARS filings carry no XBRL.
  • documentFormatFiles[] — array mirroring EDGAR's "Document Format Files" table (see below).
  • dataFiles — empty array for Form ARS records, reflecting the absence of XBRL data files.
  • entities[] — array of filer/co-filer header records (see below).
  • seriesAndClassesContractsInformation — array, empty for ARS records; the field is populated only for fund-style submissions.
  • id — stable internal MD5 identifier for the record.
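
As a rough illustration of reading these top-level fields, the Python sketch below parses a trimmed metadata object into typed values. The sample is hand-written for this example: the periodOfReport value is invented, only a subset of fields is shown, and summarize is a hypothetical helper name.

```python
import json
from datetime import date, datetime

# Hand-written sample with a subset of the fields listed above.
sample = json.loads("""
{
  "formType": "ARS/A",
  "accessionNo": "0000896878-25-000061",
  "filedAt": "2025-11-26T12:13:30-05:00",
  "periodOfReport": "2025-07-31",
  "linkToXbrl": "",
  "dataFiles": []
}
""")


def summarize(metadata: dict) -> dict:
    """Turn the string-typed header fields into typed Python values."""
    return {
        "accession": metadata["accessionNo"],
        "is_amendment": metadata["formType"] == "ARS/A",
        # fromisoformat keeps the UTC offset, giving an aware datetime.
        "filed_at": datetime.fromisoformat(metadata["filedAt"]),
        "period_of_report": date.fromisoformat(metadata["periodOfReport"]),
        # Expected to be False for Form ARS records.
        "has_xbrl": bool(metadata.get("linkToXbrl")) or bool(metadata.get("dataFiles")),
    }


summary = summarize(sample)
print(summary["is_amendment"], summary["filed_at"].isoformat(), summary["has_xbrl"])
```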

documentFormatFiles[]

The array contains two entries describing the EDGAR document table for the submission:

  1. The primary PDF, with sequence: "1", type: "ARS" or "ARS/A", a filer-supplied description (commonly "ANNUAL REPORT TO SECURITY HOLDERS", "ARS", or "ARS/A"), a documentUrl pointing to the PDF on EDGAR, and a byte-string size.
  2. The SGML complete-submission wrapper, with sequence: " " and type: " ", whose documentUrl points to the <accession>.txt file on EDGAR. Only the URL is recorded — the SGML wrapper itself is not bundled into the dataset.
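
One way to pick out the primary document from this array is to look for the entry whose sequence is "1", as in the Python sketch below. The URLs and the size value are placeholders rather than real EDGAR links, and primary_document is a hypothetical helper name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Placeholder entries shaped like the two described above.
document_format_files = [
    {
        "sequence": "1",
        "type": "ARS",
        "description": "ANNUAL REPORT TO SECURITY HOLDERS",
        "documentUrl": "https://example.com/edgar/000089687825000061/formars.pdf",
        "size": "1048576",
    },
    {
        "sequence": " ",
        "type": " ",
        "documentUrl": "https://example.com/edgar/0000896878-25-000061.txt",
    },
]


def primary_document(entries: list):
    """Return filename, type, and integer size of the sequence-1 document."""
    for entry in entries:
        if entry.get("sequence", "").strip() == "1":
            path = urlparse(entry["documentUrl"]).path
            return {
                "filename": PurePosixPath(path).name,
                "type": entry["type"],
                "size_bytes": int(entry["size"]),  # size is stored as a string
            }
    return None  # no primary document entry found


print(primary_document(document_format_files))
```

The filename recovered this way can be compared against the non-metadata file in the accession folder as a consistency check.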

entities[]

Each element is one filer or co-filer, derived from the EDGAR header. Form ARS submissions almost always carry a single entity. Each entity object contains:

  • cik — bare-numeric Central Index Key (no zero padding).
  • companyName — the registrant's name with the EDGAR role suffix in parentheses, almost always (Filer).
  • type — the form type as recorded for the entity ("ARS" or "ARS/A").
  • act — the governing securities act, typically "34" for the Exchange Act.
  • fileNo — the SEC file number, retaining its embedded dash (e.g., "000-21180").
  • filmNo — the EDGAR film number assigned at acceptance.
  • irsNo — the registrant's IRS Employer Identification Number, digits only.
  • stateOfIncorporation — two-letter U.S. state or equivalent jurisdiction code.
  • fiscalYearEnd — four-digit MMDD string (e.g., "0731", "1231"), generally matching the month of periodOfReport.
  • sic — EDGAR SIC string combining the four-digit code and its human-readable description in a single space-delimited string (e.g., "7372 Services-Prepackaged Software"); HTML entities such as &amp; may appear undecoded, so consumers should unescape them before display or matching.
  • tickers — array of trading symbols associated with the registrant; usually a single symbol, occasionally a list of historical or alternate symbols.
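
A minimal Python sketch of normalizing one entity object follows. The sample entity is invented (company name, CIK, and ticker are placeholders), and normalize_entity is a hypothetical helper. The ten-digit zero padding is an assumption about what downstream EDGAR lookups expect and is not something the dataset itself provides.

```python
import html
import re

# Invented entity shaped like the fields listed above.
entity = {
    "cik": "1234567",
    "companyName": "Example Devices Corp (Filer)",
    "type": "ARS",
    "fiscalYearEnd": "0731",
    "sic": "3674 Semiconductors &amp; Related Devices",
    "tickers": ["EXMP"],
}


def normalize_entity(raw: dict) -> dict:
    """Split and coerce the string-typed entity fields."""
    # Decode HTML entities first, then split the code from its label.
    sic_code, _, sic_label = html.unescape(raw.get("sic", "")).partition(" ")
    fye = raw.get("fiscalYearEnd", "")
    return {
        "cik": int(raw["cik"]),
        "cik_padded": raw["cik"].zfill(10),
        # Only the "(Filer)" role suffix is stripped; other roles are kept.
        "company": re.sub(r"\s*\(Filer\)$", "", raw.get("companyName", "")),
        "sic_code": sic_code,
        "sic_label": sic_label,
        "fye_month": int(fye[:2]) if len(fye) == 4 else None,
        "fye_day": int(fye[2:]) if len(fye) == 4 else None,
        "tickers": list(raw.get("tickers", [])),
    }


print(normalize_entity(entity))
```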

Included content

For every accession in the dataset, a record contains:

  • The metadata.json sidecar with the EDGAR submission header, document table, and entity records described above.
  • The primary annual-report PDF — the glossy annual report end to end, including the CEO letter, financial highlights, business overview, MD&A, audited financial statements with notes, the auditor's report, governance and shareholder-information back matter, and any artwork or photography embedded in the PDF as raster content.

The narrative substance of a Form ARS record — CEO letter language, MD&A discussion, audit-report wording, footnote disclosures, financial-statement line items — is embedded in the PDF and is accessible only via PDF text extraction or, for image-rendered or scanned submissions, OCR. No structured field in metadata.json exposes the report's interior content.

Excluded or separate content

Several elements that appear in the underlying EDGAR submission are not materialized in a dataset record:

  • Image files. Standalone image attachments (JPG, PNG, GIF) that may have accompanied the original EDGAR submission are stripped from the record. Imagery embedded inside the primary PDF is preserved as part of that PDF; only loose image attachments are excluded.
  • The complete-submission .txt wrapper. The SGML complete-submission text file (e.g., 0000896878-25-000061.txt) is referenced in metadata.json via both linkToTxt and the second entry of documentFormatFiles[], but the file itself is not written into the accession folder. Consumers needing the SGML wrapper must fetch it from EDGAR using the provided URL.
  • XBRL. Form ARS filings do not carry inline or standalone XBRL. linkToXbrl is always empty and dataFiles is always an empty array. There is no structured financial data layer; financial figures live inside the PDF only.
  • Series and contracts information. seriesAndClassesContractsInformation is an empty array for Form ARS records because the field applies to fund-style filings rather than corporate annual reports.
  • Other ancillary EDGAR artifacts such as graphics manifests or correspondence files, when present in the original submission, are not bundled.

Form ARS versus Form ARS/A

Form ARS/A records are amendments to previously furnished annual reports. Their on-disk layout is identical to that of Form ARS records — one accession folder, one metadata.json, one PDF — and the only authoritative signal that a record is an amendment is the value "ARS/A" in metadata.json.formType (and correspondingly in the entity-level type and the primary documentFormatFiles[0].type). The description for amendments is rendered as "Form ARS/A - Annual Report to Security Holders: [Amend]". Filename hints such as a trailing _arsa token are common but not guaranteed. Amendments are a small fraction of the corpus, and amendment records do not carry a structured pointer to the original ARS accession they amend; relating amendments to originals requires matching on cik and periodOfReport.
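
The matching on cik and periodOfReport described above could be sketched in Python as follows. This is one possible heuristic, not a method defined by the dataset: it pairs each amendment with the most recent original filed at or before it. The sample records and accession numbers are invented, and link_amendments is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime


def link_amendments(records: list) -> dict:
    """Map each ARS/A accession number to a matching ARS accession, or None."""
    originals = defaultdict(list)
    for record in records:
        if record["formType"] == "ARS":
            for entity in record["entities"]:
                originals[(entity["cik"], record["periodOfReport"])].append(record)

    links = {}
    for record in records:
        if record["formType"] != "ARS/A":
            continue
        amended_at = datetime.fromisoformat(record["filedAt"])
        candidates = [
            original
            for entity in record["entities"]
            for original in originals.get((entity["cik"], record["periodOfReport"]), [])
            if datetime.fromisoformat(original["filedAt"]) <= amended_at
        ]
        # Prefer the latest original filed at or before the amendment.
        best = max(
            candidates,
            key=lambda r: datetime.fromisoformat(r["filedAt"]),
            default=None,
        )
        links[record["accessionNo"]] = best["accessionNo"] if best else None
    return links


# Invented records carrying only the fields the heuristic needs.
records = [
    {"formType": "ARS", "accessionNo": "0000000000-25-000001",
     "filedAt": "2025-03-20T16:05:00-04:00", "periodOfReport": "2024-12-31",
     "entities": [{"cik": "1234567"}]},
    {"formType": "ARS/A", "accessionNo": "0000000000-25-000002",
     "filedAt": "2025-04-02T09:30:00-04:00", "periodOfReport": "2024-12-31",
     "entities": [{"cik": "1234567"}]},
    {"formType": "ARS/A", "accessionNo": "0000000000-25-000003",
     "filedAt": "2025-04-10T10:00:00-04:00", "periodOfReport": "2024-12-31",
     "entities": [{"cik": "7654321"}]},
]

print(link_amendments(records))
```

An amendment that maps to None has no original in the loaded records, which can happen when the original was furnished on paper or sits in a monthly archive that was not loaded.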

How required content has changed over time

Form ARS is a furnishing under Rule 14a-3 rather than a filing built around an Item-by-Item form, so the SEC has historically prescribed the report's content through Rule 14a-3(b) rather than through a structural template. Updates to Rule 14a-3(b) over time have refined the financial-statement and selected-data requirements — most notably with the eventual phase-out of the five-year selected financial data table for many registrants — but the form's overall content character has remained consistent.

The most consequential change for this dataset concerns the furnishing channel rather than the content. Form ARS was historically furnished to shareholders directly and reached the SEC mostly on paper, with electronic submission voluntary. From the early EDGAR era forward, voluntary electronic submissions appeared in EDGAR in heterogeneous formats: frequently ASCII text, later HTML, and occasionally PDF. As of the January 11, 2023 compliance date, amended Rule 101 of Regulation S-T mandates electronic submission of Form ARS on EDGAR in PDF format, which makes EDGAR coverage essentially universal from that point forward and standardizes the artifact type.

How the data format has evolved

The dataset inherits the format heterogeneity of the pre-2023 EDGAR corpus, which is why TXT, JSON, HTML, and PDF all appear among its file types. Early voluntary submissions arrived as ASCII text or, later, HTML, and some issuers furnished PDFs voluntarily. From January 2023 onward, every record consists of one PDF plus the JSON metadata sidecar, reflecting the Reg S-T Rule 101 mandate. The metadata sidecar follows the same JSON schema in every era, so it serves as a stable index regardless of the primary document's format.

Interpretation and extraction notes

  • PDF as primary content. Every modern record's substantive content is locked inside a PDF. Narrative analysis (CEO letter mining, MD&A extraction, sentiment scoring, topic modeling) requires PDF text extraction or, for image-rendered or scanned submissions, OCR. There is no inline HTML or structured markup fallback for current-era records.
  • No XBRL. Financial figures inside Form ARS reports are not tagged. Quantitative analysis of ARS financials requires either parsing tables out of the PDF or cross-referencing the registrant's Form 10-K, where XBRL-tagged financials are available.
  • Stable triple, unstable filenames. PDF filenames vary by filer and filing agent; the reliable record locator is the triple (accession folder, metadata.json, single non-metadata file). The canonical PDF filename is also recoverable as the basename of documentFormatFiles[0].documentUrl.
  • Authoritative form-type signal. The formType field in metadata.json is the authoritative indicator of ARS versus ARS/A. Filename suffixes and description strings are corroborating but not contractual.
  • Entity-level cohorting. Even though the record carries no XBRL, entities[].sic and entities[].tickers are populated, enabling industry- or ticker-level cohorting of glossy annual reports without external lookups.
  • Co-filers are rare but possible. The entities[] array is structured to accommodate multiple filer/co-filer records, even though Form ARS submissions typically contain only one entity.
  • String-typed numerics. documentFormatFiles[].size is a byte count expressed as a string, and cik, filmNo, and irsNo are digit strings rather than integers; consumers building typed schemas must coerce these explicitly.

Who Files or Publishes This Dataset, and When

Who furnishes the record

The furnisher of Form ARS is the issuer itself — the registrant whose CIK appears on the EDGAR submission. Form ARS is produced by domestic operating-company issuers with a class of securities registered under Section 12 of the Securities Exchange Act of 1934, when they solicit proxies (or written consents in lieu of a meeting) under Section 14(a) and Regulation 14A for an annual meeting at which directors are to be elected.

The population spans large accelerated, accelerated, non-accelerated, and smaller reporting companies, including emerging growth companies, so long as they are running a Regulation 14A proxy solicitation. Officers, directors, and the audit firm appear inside the report's content but are not the filers; the registrant is the sole legal furnisher.

Registered investment companies are outside the Form ARS pathway; they deliver their annual reports through Form N-CSR instead.

What triggers the record

Form ARS is event-driven, not calendar-periodic. The trigger is the issuer's solicitation of proxies (or written consents) for an annual meeting at which directors are to be elected, governed by Rule 14a-3(b) of the Exchange Act. Rule 14a-3(b) requires that the proxy solicitation be preceded by or accompanied by an annual report to security holders containing audited financial statements and the prescribed narrative disclosure (MD&A, business description, market and selected financial data, and related items).

A companion provision, Rule 14a-3(c), requires the issuer to submit that annual report to the Commission on EDGAR no later than the date on which the report is first sent or given to security holders. This SEC-submission step — distinct from the act of furnishing the glossy to shareholders — is what produces the EDGAR record captured in this dataset.

Because the trigger is the meeting and not the fiscal year-end, an issuer that delays its annual meeting, or skips a year because of a transaction or restructuring, may generate no Form ARS record in that calendar period even though it remains a Section 12 reporting company. A special meeting held in lieu of an annual meeting at which directors are elected triggers the same Rule 14a-3(b) obligation and produces an ARS record on a non-standard cadence.

"Furnished," not "filed"

Rule 14a-3(c) provides that the annual report to security holders is furnished to the Commission, not filed. It is not deemed filed within the meaning of Section 18 of the Exchange Act, is not subject to Section 18 liability, and is not incorporated by reference into other Commission filings unless the registrant elects to incorporate it. The same audited financial statements may also be filed inside Form 10-K, where Section 18 liability does attach, but the glossy presentation transmitted under the ARS regime carries the lighter "furnished" status.

Regulatory framework and the January 11, 2023 EDGAR mandate

Before the 2022 amendments took effect, issuers typically satisfied Rule 14a-3(c) by mailing seven paper copies of the glossy to the Commission, with EDGAR submission permitted on a voluntary basis. In June 2022 the Commission amended Rule 101 of Regulation S-T (Rule 101(b)/(c)) to eliminate the paper alternative and require electronic submission on EDGAR in PDF format. The compliance date for that mandatory electronic submission was January 11, 2023.

This boundary shapes the dataset's temporal density. EDGAR has accepted ARS-coded submissions since 1994, but pre-2023 records reflect voluntary or transitional electronic submissions; from January 11, 2023 onward, every Regulation 14A proxy-soliciting issuer must submit the glossy electronically, so the post-mandate population is substantially more complete per fiscal year. Pre-EDGAR paper ARS reports existed for decades under the original proxy rules but are not part of this electronic dataset.

Form ARS/A amendments

Form ARS/A records are amendments to a previously furnished annual report. Amendments arise when an issuer needs to correct, supplement, or replace a previously transmitted glossy — for example, to fix a financial-statement error, substitute a corrected page, add an inadvertently omitted exhibit, or re-furnish the report after a reissued audit opinion. Because ARS submissions are furnished rather than filed, ARS/A amendments are not driven by the Section 18 liability dynamics that motivate 10-K/A amendments; they exist to give security holders and the Commission an accurate copy of what was (or should have been) distributed for the annual meeting. ARS and ARS/A share the same dataset record schema.

Timing and calendar clustering

There is no single SEC-wide ARS deadline analogous to the 60/75/90-day 10-K timetable. Cadence follows each issuer's own annual-meeting calendar:

  • The annual report must be furnished to security holders before or at the time of delivery of the proxy statement.
  • The Rule 14a-3(c) submission to the Commission must occur no later than the date the report is first sent or given to security holders.
  • Annual meetings cluster heavily in the spring (April through June) for December fiscal year-ends, producing a pronounced Form ARS submission peak from late March through May each year.
  • Issuers with non-calendar fiscal years (June, July, September, and January-end retailers, among others) submit on rolling schedules throughout the year.
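The seasonal pattern above can be checked against the dataset index. The following is a minimal sketch, assuming container entries carry the `key` and `records` fields shown in the example response under Dataset Access and that keys follow the `YYYY/YYYY-MM.zip` pattern; the function name is illustrative, and the sample counts are taken from the file listing at the top of this page.

```python
from collections import Counter

def records_by_calendar_month(containers):
    """Sum record counts per calendar month (1-12) across all years."""
    totals = Counter()
    for c in containers:
        # key is assumed to look like "2026/2026-04.zip"; read MM from the file name
        filename = c["key"].rsplit("/", 1)[-1]
        month = int(filename[5:7])
        totals[month] += c["records"]
    return dict(totals)

sample = [
    {"key": "2026/2026-04.zip", "records": 3234},
    {"key": "2026/2026-03.zip", "records": 673},
    {"key": "2025/2025-12.zip", "records": 61},
]
monthly = records_by_calendar_month(sample)
print(monthly)
```

Run over the full `containers` list, the totals for March through May should stand out if the spring proxy-season peak holds.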

Distinctions worth noting

  • Form ARS vs. Form 10-K. The 10-K is filed under Section 13(a) and is subject to Section 18 liability. Form ARS is furnished under Rule 14a-3 and is not. Many issuers satisfy Rule 14a-3(b) by wrapping or incorporating their 10-K; where a separate glossy is submitted as Form ARS, it and the 10-K remain distinct EDGAR records even when financial-statement content overlaps.
  • Form ARS vs. Form N-CSR. Operating companies use ARS; registered investment companies use N-CSR. A user seeking the annual report of a mutual fund or ETF will not find it in this dataset.
  • Form ARS vs. Form 20-F / 40-F. Foreign private issuers (FPIs) sit outside Regulation 14A and produce no ARS; their annual disclosure regime is form-based, not meeting-triggered.
  • Pre-2023 voluntary submissions. Records before January 11, 2023 reflect voluntary or transitional electronic submissions and are sparser per issuer-year than the post-mandate population.
  • Issuer identity in multi-tier structures. Only the entity that actually solicited proxies under Regulation 14A produces an ARS record; subsidiary registrants that did not solicit do not.

How This Dataset Differs From Similar Datasets or Filings

Form ARS sits in an unusual corner of EDGAR: the shareholder-facing annual report that overlaps in financial substance with the 10-K but diverges sharply in regulatory status, format, and structure. The closest comparison points are (1) the formal Section 13(a) annual reports (10-K, 20-F, 40-F), (2) the proxy materials ARS is procedurally tied to under Rule 14a-3, and (3) the investment-company shareholder-report regime (N-CSR/N-CSRS).

Form 10-K and 10-K/A

The 10-K is the most natural and most easily confused comparison. Both cover the same fiscal year and draw from the same audited financial statements; balance sheet, income statement, cash flows, MD&A, and auditor's report typically appear in both.

The differences are decisive:

  • Filed vs. furnished. 10-K is filed under Section 13(a) with full Section 18 liability. ARS is furnished under Rule 14a-3 and is generally not "filed" for Section 18 purposes unless the registrant expressly incorporates it.
  • Structure. 10-K is built around mandatory Items (1, 1A, 7, 8, etc.); ARS has no item numbering and no fixed schema.
  • Format and tagging. 10-K is HTML/text with Inline XBRL tagging of the financial statements and cover data. ARS, since the January 2023 mandate, is a single bound PDF with no XBRL or Inline XBRL component at all.
  • Presentation intent. 10-K is a regulatory disclosure document. ARS is a designed shareholder communication, often built around a CEO letter, photography, infographics, and graphic-design layouts that the 10-K specifically lacks.
  • 10-K wrap practice. Many issuers satisfy Rule 14a-3 by sending shareholders the 10-K itself (often inside a thin printed cover, the "10-K wrap") rather than producing a separate glossy report. Those issuers typically do not generate a Form ARS submission, so the dataset systematically under-represents registrants who use the wrap approach.

10-K/A and ARS/A parallel each other structurally but diverge in regime: 10-K/A restates a filed report and resets liability for the amended portions; ARS/A re-furnishes a corrected shareholder document without the same statutory consequences.

DEF 14A

DEF 14A is the document ARS is meant to accompany. Rule 14a-3 requires the annual report to be furnished before or together with the proxy statement when directors are up for election, so the two are temporally and procedurally linked.

They are not substitutes. DEF 14A is HTML, item-driven under Schedule 14A, and focused on vote items: director elections, executive compensation tables, say-on-pay, auditor ratification, shareholder proposals. ARS is PDF, narrative, and focused on the year's operating and financial story. Compensation and governance content live in the proxy; financials and MD&A live in the ARS (or the 10-K it wraps).

N-CSR and N-CSRS

N-CSR is the certified shareholder report for registered management investment companies and is the fund-side analog to ARS. It includes the schedule of investments, fund financial statements, and management's discussion of fund performance.

Differences from ARS:

  • Filer population. Mutual funds and closed-end funds, not operating companies.
  • Legal basis. Section 30 of the Investment Company Act and Rule 30b2-1, not Rule 14a-3.
  • Cadence. Annual (N-CSR) plus semi-annual (N-CSRS).
  • Status and structure. N-CSR is filed, item-numbered, and submitted in HTML with structured tagging. ARS is furnished, unstructured, and PDF.

The two datasets are mutually exclusive in practice.

Form 20-F and Form 40-F

20-F (foreign private issuers) and 40-F (Canadian MJDS issuers) are the annual reporting forms used in lieu of 10-K. They are filed, item-structured, and (for 20-F) Inline XBRL tagged.

Foreign private issuers are exempt from the U.S. proxy rules and generally do not produce a Form ARS. The shareholder-facing glossy annual report for foreign issuers is typically furnished as a 6-K exhibit, not via ARS. The Form ARS dataset is therefore overwhelmingly a domestic-issuer dataset; for foreign-issuer equivalents, look to 6-K furnished annual reports.

Pre-2023 Form ARS submissions

Before the January 11, 2023 compliance date for the June 2022 amendments to Rule 101 of Regulation S-T, electronic submission of the Rule 14a-3 annual report was not mandated. EDGAR ARS submissions before that date are sparse and inconsistent: many registrants did not submit ARS electronically, and those that did frequently submitted plain HTML wrappers or skeletal placeholders rather than the actual glossy report.

The 2023 PDF mandate created a sharp dataset-density discontinuity. Submissions from the compliance date onward reliably contain the full visual annual report as a PDF; earlier submissions range from complete glossy reports to near-empty stubs. A 10-K time series from 1994 to present is broadly comparable across years; a Form ARS time series is effectively two datasets stitched together at January 2023.

Boundary summary

Form ARS is the only EDGAR dataset where:

  • PDF is the primary content carrier rather than an auxiliary exhibit,
  • the underlying document is furnished rather than filed, with reduced Section 18 exposure, and
  • there is no XBRL or Inline XBRL layer at all.

It is not a substitute for 10-K when structured financial data, item-level disclosure, or filer-liability-grade text is needed; not a substitute for DEF 14A on governance, compensation, or vote-item disclosure; and not applicable to funds (N-CSR) or foreign private issuers (20-F, 40-F, 6-K). Coverage is also skewed by 10-K-wrap practice, which removes from the dataset any issuer that sends shareholders the 10-K itself in lieu of a separate glossy report.

What ARS uniquely captures is the shareholder communication itself: the CEO letter, the visual narrative, and the way management chooses to present the year to its owners rather than to the SEC. For research on corporate communication, impression management, narrative tone, visual disclosure, or comparison of "filed" versus "furnished" representations of the same fiscal year, no other EDGAR dataset substitutes.

Who Uses This Dataset

Each user group below pulls on a specific layer of the same record — designed PDFs with CEO letters, segment infographics, director biographies, sustainability spreads, and audited statements, paired with a metadata.json carrying accession, cik, fiscalYearEnd, entities, sic, filing date, and formType (ARS vs ARS/A).

ESG and stewardship researchers

Sustainability analysts and stewardship teams mine the PDF body for content absent from the 10-K: chair and CEO letters, climate and human-capital narratives, DEI commitments, and board oversight language. They cohort issuers by sic and align reporting cycles with fiscalYearEnd, then compare the glossy narrative against the matched 10-K to flag gaps between marketed ambition and audited risk language.

Equity and credit analysts

Fundamental and credit analysts use ARS PDFs for material missing from 10-K HTML: segment maps, geographic and end-market charts, multi-year KPI graphics, and management's strategy framing. Credit desks watch the CEO letter for capital-structure, dividend, and capex signals. The embedded audited statements anchor GAAP figures while non-GAAP segment views often appear more cleanly here than in the 10-K. Coverage back to 1994 supports long-horizon issuer narrative tracking.

Investor relations teams and IR consultants

IR officers and external advisors pull peer cohorts by sic and matching fiscalYearEnd to benchmark cover treatments, letter length and tone, KPI selection, segment storytelling, and ESG inclusions ahead of their own next annual report. The PDF format is itself the object of study: typography, layout, and infographic conventions inform design recommendations to executive sponsors.

Governance researchers and proxy advisors

Proxy and governance analysts read the ARS alongside the matched DEF 14A to assess director biographies, skills-matrix graphics, committee narrative, and the chair's governance letter as presented to shareholders. They use fiscalYearEnd and filing date to pair each ARS with the corresponding proxy and test consistency between shareholder-facing and formal disclosure.

Annual-report design agencies

Creative and art directors at design studios use the corpus as a reference library, surveying peer glossies within the same sic for cover concepts, photographic style, infographic conventions, and treatment of recurring spreads (segment overviews, multi-year financials, sustainability). The historical span supports tracing how annual-report design has evolved.

Corporate communications and executive speechwriters

Speechwriters and communications leads drafting CEO letters review peer cohorts filtered by sic and fiscal calendar to calibrate length, tone, topic mix, and rhetorical conventions, and to track how messaging shifts across cycles of macro stress and regulatory change.

Academic researchers in disclosure and communication

Accounting, finance, and corporate-communication researchers study tone, readability, narrative structure, and impression management. The 1994-onward span supports longitudinal work on CEO-letter sentiment and visual disclosure, while entities and fiscalYearEnd align matched ARS/10-K pairs for studies of voluntary versus regulated disclosure.

NLP and document-AI teams

ML engineers training PDF layout, table, and chart extraction systems use ARS files as visually rich evaluation data: multi-column layouts, sidebars, infographics, and embedded financial tables exercise capabilities that plain 10-K HTML does not. metadata.json provides clean ground-truth issuer, fiscal-year, and form-type labels; the ARS vs ARS/A flag supports amendment-detection training.

Specific Use Cases

The use cases below tie directly to the two artifacts in each record: the glossy annual-report PDF and the metadata.json sidecar (with formType, periodOfReport, entities[].sic, entities[].fiscalYearEnd, entities[].tickers, filedAt).

CEO-letter tone benchmarking within an industry cohort

Filter records by entities[].sic (e.g., "2834 Pharmaceutical Preparations") and a common fiscalYearEnd, extract the CEO and chair letters from the front matter of each PDF, and run sentiment, readability, and topic-modeling pipelines. The output is a per-issuer scorecard of letter length, tone polarity, and topic share (strategy, capital allocation, ESG, macro), usable by IR teams calibrating their next letter or by sell-side analysts tracking management mood across an industry.
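The cohort filter can be sketched as follows. This is an illustration under stated assumptions: `sic` is a string that begins with the four-digit code (as in the example above), `fiscalYearEnd` is an MMDD string such as "1231", and the function name and sample records are hypothetical.

```python
def in_cohort(meta, sic_code, fiscal_year_end):
    """True if any entity in the record matches the SIC code and fiscal year-end."""
    for entity in meta.get("entities", []):
        sic = str(entity.get("sic", ""))
        # sic may be "2834" or "2834 Pharmaceutical Preparations"; compare the leading code
        code = sic.split(" ", 1)[0]
        if code == sic_code and entity.get("fiscalYearEnd") == fiscal_year_end:
            return True
    return False

# Hypothetical metadata.json contents, trimmed to the fields used here
pharma = {"formType": "ARS",
          "entities": [{"sic": "2834 Pharmaceutical Preparations", "fiscalYearEnd": "1231"}]}
retailer = {"formType": "ARS",
            "entities": [{"sic": "5311 Retail-Department Stores", "fiscalYearEnd": "0131"}]}

cohort = [m for m in (pharma, retailer) if in_cohort(m, "2834", "1231")]
print(len(cohort))
```

Letter extraction and sentiment scoring would then run only over the PDFs of the records in `cohort`.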

ARS-vs-10-K narrative gap analysis

For each ARS, use entities[].cik and periodOfReport to locate the contemporaneous 10-K, extract MD&A and risk-factor text from both, and diff the language. The deliverable is a per-issuer table of statements present in the glossy ARS but absent from the audited 10-K (and vice versa) — useful for ESG analysts flagging marketed ambition that does not survive into Section 18 risk language, and for accounting researchers studying voluntary versus mandated disclosure.
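One simple way to approximate the diff step is a sentence-level set comparison. The sketch below assumes the text has already been extracted from the ARS PDF and the 10-K (extraction is out of scope here), uses a deliberately naive sentence splitter, and introduces illustrative function names; a production pipeline would likely use fuzzy or semantic matching instead of exact equality.

```python
import re

def normalize(sentence):
    """Collapse whitespace and lowercase so trivial layout differences are ignored."""
    return re.sub(r"\s+", " ", sentence).strip().lower()

def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p for p in (normalize(s) for s in parts) if p]

def narrative_gap(ars_text, tenk_text):
    """Return sentences that appear in only one of the two documents."""
    ars = split_sentences(ars_text)
    tenk = split_sentences(tenk_text)
    ars_set, tenk_set = set(ars), set(tenk)
    return {
        "ars_only": [s for s in ars if s not in tenk_set],
        "tenk_only": [s for s in tenk if s not in ars_set],
    }

# Made-up snippets for illustration only
gap = narrative_gap(
    "We aim for net zero by 2040. Revenue grew 5%.",
    "Revenue grew 5%.  Climate regulation may raise costs.",
)
print(gap)
```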

Amendment-driven correction tracking

Select records where metadata.json.formType == "ARS/A", then match each amendment to its original ARS by cik plus periodOfReport (no structured back-pointer exists). Diff the two PDFs to isolate corrected pages — typically restated tables, auditor-report changes, or director-biography fixes. The output is a corpus of "what gets amended in glossy annual reports," which auditors, governance analysts, and disclosure researchers can use to characterize post-furnishing correction patterns.
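The matching step can be sketched as below. It assumes each record is available as an (accession folder name, parsed metadata.json) pair and that the CIK can be read from the first element of `entities`; both are assumptions about the layout, and the accession numbers in the sample are placeholders.

```python
from collections import defaultdict

def record_key(meta):
    """(cik, periodOfReport) join key; cik is read from the first entity (assumption)."""
    entities = meta.get("entities") or [{}]
    return (entities[0].get("cik"), meta.get("periodOfReport"))

def match_amendments(records):
    """records: iterable of (accession_folder, metadata_dict) pairs.
    Returns {amendment_accession: [candidate original accessions]}."""
    records = list(records)
    originals = defaultdict(list)
    for accession, meta in records:
        if meta.get("formType") == "ARS":
            originals[record_key(meta)].append(accession)
    matches = {}
    for accession, meta in records:
        if meta.get("formType") == "ARS/A":
            matches[accession] = list(originals.get(record_key(meta), []))
    return matches

# Placeholder records for illustration
sample = [
    ("0000000000-24-000001", {"formType": "ARS", "periodOfReport": "2023-12-31",
                              "entities": [{"cik": "1111"}]}),
    ("0000000000-24-000002", {"formType": "ARS/A", "periodOfReport": "2023-12-31",
                              "entities": [{"cik": "1111"}]}),
    ("0000000000-24-000003", {"formType": "ARS/A", "periodOfReport": "2023-12-31",
                              "entities": [{"cik": "2222"}]}),
]
matches = match_amendments(sample)
print(matches)
```

An empty candidate list signals an amendment whose original is not in the loaded set, for example because it sits in a different monthly container.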

Visual disclosure and infographic extraction for document AI

Use the post-January-2023 slice (where every record is a single PDF under the Reg S-T mandate) as evaluation data for layout, chart, and table-extraction models. entities[].sic, entities[].tickers, and periodOfReport provide clean ground-truth labels; the ARS vs ARS/A distinction supports amendment-classification training. The artifact is a labeled benchmark of multi-column glossy pages, segment infographics, and embedded financial tables that plain 10-K HTML does not exercise.
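Selecting the post-mandate slice is a date comparison on `filedAt`. The sketch below assumes `filedAt` is an ISO 8601 string that begins with `YYYY-MM-DD`; the constant and function names are illustrative.

```python
from datetime import date

# Compliance date for mandatory electronic PDF submission of Form ARS
MANDATE_DATE = date(2023, 1, 11)

def is_post_mandate(meta):
    """True if the record was submitted on or after the compliance date."""
    # Only the date part is used; any time and offset suffix is ignored
    filed = date.fromisoformat(meta["filedAt"][:10])
    return filed >= MANDATE_DATE

before = {"filedAt": "2022-12-30T16:00:00-05:00"}
after = {"filedAt": "2023-01-11T09:30:00-05:00"}
print(is_post_mandate(before), is_post_mandate(after))
```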

Peer-cohort design library for IR and creative teams

Build a browsable peer library by joining entities[].sic with fiscalYearEnd to assemble cohorts of glossies covering the same fiscal cycle, then index PDFs by cover treatment, page count, KPI dashboard format, and sustainability-section presence. IR officers and design agencies use the resulting catalog to benchmark cover concepts, letter length, segment storytelling, and infographic conventions ahead of producing the next annual report.

Longitudinal CEO-letter corpus for academic research

Using the 1994-onward span, build a panel keyed by (cik, periodOfReport) of extracted CEO-letter text. Because Reg S-T made PDF furnishing mandatory only from January 2023, panel density is uneven across that boundary, and the unevenness is itself a research variable. The resulting dataset supports published work on impression management, sentiment around macro shocks, and the divergence between furnished shareholder communication and filed 10-K narrative for the same registrant-year.
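Panel density can be measured before any text is extracted. The sketch below counts distinct (cik, periodOfReport) keys per filing year, assuming `filedAt` starts with a four-digit year and that each entity carries a `cik`; the function name and sample values are hypothetical.

```python
from collections import defaultdict

def panel_density(metas):
    """Count distinct (cik, periodOfReport) keys per filing year."""
    keys_by_year = defaultdict(set)
    for meta in metas:
        year = int(meta["filedAt"][:4])
        for entity in meta.get("entities", []):
            keys_by_year[year].add((entity.get("cik"), meta.get("periodOfReport")))
    return {year: len(keys) for year, keys in sorted(keys_by_year.items())}

# Placeholder records: one issuer in 2021, two issuers in 2024 (one with an amendment)
sample = [
    {"filedAt": "2021-04-02", "periodOfReport": "2020-12-31", "entities": [{"cik": "1111"}]},
    {"filedAt": "2024-04-01", "periodOfReport": "2023-12-31", "entities": [{"cik": "1111"}]},
    {"filedAt": "2024-04-20", "periodOfReport": "2023-12-31", "entities": [{"cik": "1111"}]},
    {"filedAt": "2024-03-28", "periodOfReport": "2023-12-31", "entities": [{"cik": "2222"}]},
]
density = panel_density(sample)
print(density)
```

Because keys are deduplicated, an ARS and its ARS/A for the same registrant-year count once.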

ARS-DEF 14A pairing for governance presentation studies

Pair each ARS with the matching DEF 14A by cik and proxy-season alignment with periodOfReport, then extract director biographies, skills-matrix graphics, and committee descriptions from the ARS PDF and compare with the structured Schedule 14A items in the proxy. The output is a per-issuer consistency check between the glossy governance presentation shown to shareholders and the item-driven disclosure filed for the vote.

Dataset Access

The Form ARS Files dataset is accessible through three endpoints: a JSON metadata API, a full archive download, and per-container downloads. The dataset is delivered as monthly ZIP container files covering filings from January 1994 to the present.

Dataset Index JSON API: https://api.sec-api.io/datasets/form-ars-files.json

Returns dataset-level metadata (name, description, last updated timestamp, earliest sample date, total records, total size, form types, container format, and file types), the download URL for the full dataset archive, and a list of all monthly container files with their individual download URLs, sizes, record counts, and updated timestamps. This endpoint does not require an API key.

Poll this endpoint to monitor which containers were refreshed in the most recent run and decide which monthly archives to re-download on a day-by-day basis.

Example response:

{
  "datasetId": "1f13365b-9ae0-6900-b5e9-4e81e2695748",
  "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-ars-files.zip",
  "name": "Form ARS Files Dataset",
  "updatedAt": "2026-04-25T02:59:26.436Z",
  "earliestSampleDate": "1994-01-01",
  "totalRecords": 14330,
  "totalSize": 81339575866,
  "formTypes": ["ARS", "ARS/A"],
  "containerFormat": "ZIP",
  "fileTypes": ["TXT", "JSON", "HTML", "PDF"],
  "containers": [
    {
      "downloadUrl": "https://api.sec-api.io/datasets/form-ars-files/2026/2026-04.zip",
      "key": "2026/2026-04.zip",
      "size": 13818783,
      "records": 154,
      "updatedAt": "2026-04-25T02:59:26.436Z"
    }
  ]
}
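The polling workflow described above can be sketched as follows. The sketch assumes the response has the shape of the example (a `containers` list whose entries carry `key` and `updatedAt`); the `last_seen` mapping and the function names are illustrative, and how the previous poll's state is persisted is left to the caller.

```python
import json
import urllib.request

INDEX_URL = "https://api.sec-api.io/datasets/form-ars-files.json"

def fetch_index(url=INDEX_URL):
    """Fetch and parse the dataset index (no API key needed for this endpoint)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def containers_to_refresh(index, last_seen):
    """Return containers that are new or whose updatedAt changed since the last poll.
    last_seen maps container key -> updatedAt string recorded previously."""
    stale = []
    for c in index.get("containers", []):
        if last_seen.get(c["key"]) != c["updatedAt"]:
            stale.append(c)
    return stale

# Offline illustration using the example response's container entry
index = {"containers": [{"key": "2026/2026-04.zip",
                         "updatedAt": "2026-04-25T02:59:26.436Z"}]}
previous = {"2026/2026-04.zip": "2026-04-24T02:59:26.436Z"}
to_fetch = [c["key"] for c in containers_to_refresh(index, previous)]
print(to_fetch)
```

Comparing the timestamps for equality rather than ordering avoids parsing them and still catches every refreshed container.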

Download Entire Dataset: https://api.sec-api.io/datasets/form-ars-files.zip?token=YOUR_API_KEY

Downloads the complete dataset as a single ZIP archive containing every monthly container. This endpoint requires an API key.

Download Single Container: https://api.sec-api.io/datasets/form-ars-files/2026/2026-04.zip?token=YOUR_API_KEY

Downloads one monthly container ZIP instead of the full archive. Replace the year and month segments to fetch other periods. This endpoint requires an API key.
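Building the per-container URL from a year and month can be sketched as below, following the URL pattern shown above. The helper names and the destination-path argument are illustrative, and `YOUR_API_KEY` is a placeholder for a real key.

```python
import shutil
import urllib.request
from urllib.parse import urlencode

BASE = "https://api.sec-api.io/datasets/form-ars-files"

def container_url(year, month, token):
    """Build the download URL for one monthly container, e.g. 2026/2026-04.zip."""
    key = f"{year:04d}/{year:04d}-{month:02d}.zip"
    return f"{BASE}/{key}?{urlencode({'token': token})}"

def download_container(year, month, token, dest_path):
    """Stream one container ZIP to disk (requires a valid API key)."""
    url = container_url(year, month, token)
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)

url = container_url(2026, 4, "YOUR_API_KEY")
print(url)
```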

Frequently Asked Questions

What form does this dataset cover?

The dataset covers Form ARS, the annual report to security holders furnished under Rule 14a-3 of the Securities Exchange Act of 1934, together with Form ARS/A amendments to previously furnished annual reports. Form ARS is the marketing-grade "glossy" annual report distributed to shareholders alongside the proxy statement for an annual meeting at which directors are to be elected.

What does one record in this dataset represent?

One record corresponds to a single Form ARS or Form ARS/A submission on EDGAR, identified by a unique 18-digit accession number. On disk it materializes as one accession-named folder containing two artifacts: a metadata.json sidecar that mirrors the EDGAR submission header, and a single primary PDF that is the glossy annual report itself.

Who is required to furnish Form ARS?

Form ARS is furnished by the issuer: a domestic operating-company registrant with a class of securities registered under Section 12 of the Exchange Act that solicits proxies under Regulation 14A for an annual meeting at which directors are to be elected. Foreign private issuers, Section 15(d)-only reporters, and registered investment companies are outside this pathway; investment companies use Form N-CSR instead.

What time period does the dataset cover?

The dataset spans January 1994 to the present. Coverage density changes sharply at January 11, 2023, the compliance date on which amended Rule 101 of Regulation S-T made electronic submission on EDGAR in PDF format mandatory; pre-2023 records reflect voluntary or transitional electronic submissions and are sparser per issuer-year.

What file format is the dataset distributed in?

The dataset is delivered as monthly ZIP container files. Each accession folder inside a container holds a metadata.json sidecar plus the primary annual-report document; for records from January 2023 onward the primary document is a single PDF, while older records may be TXT, HTML, or PDF reflecting the conventions of their era. Form ARS records carry no XBRL — linkToXbrl is empty and dataFiles is always an empty array.
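Reading records out of a container can be sketched as follows. The exact path layout inside the ZIP is an assumption here: the sketch treats the immediate parent folder of each file as the accession folder and expects a file named metadata.json in it. The function name is illustrative, and the demo builds a tiny in-memory archive with placeholder content instead of touching a real download.

```python
import io
import json
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

def load_records(zip_source):
    """Group ZIP members by accession folder.
    Returns {folder: {"metadata": dict or None, "documents": [file names]}}."""
    records = defaultdict(lambda: {"metadata": None, "documents": []})
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            path = PurePosixPath(name)
            folder = path.parent.name  # assumed to be the accession number
            if path.name == "metadata.json":
                records[folder]["metadata"] = json.loads(zf.read(name))
            else:
                records[folder]["documents"].append(path.name)
    return dict(records)

# Demo archive with one placeholder record
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("0000000000-23-000001/metadata.json", json.dumps({"formType": "ARS"}))
    zf.writestr("0000000000-23-000001/annual-report.pdf", b"%PDF-1.4")
buf.seek(0)
demo = load_records(buf)
print(demo)
```

`load_records` also accepts a file path, so the same call works on a downloaded monthly container.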

How does Form ARS differ from Form 10-K?

The 10-K is filed under Section 13(a) with full Section 18 liability, structured around mandatory Items, and submitted as HTML/text with Inline XBRL tagging. Form ARS is furnished under Rule 14a-3 with reduced Section 18 exposure, has no item numbering, and is delivered as a single bound PDF with no XBRL layer. Many issuers satisfy Rule 14a-3 by sending shareholders the 10-K itself ("10-K wrap"), in which case no separate ARS submission is generated.

How are Form ARS/A amendments represented?

Form ARS/A records share the same on-disk shape as Form ARS records (one accession folder, one metadata.json, one PDF), and the only authoritative signal that a record is an amendment is the value "ARS/A" in metadata.json.formType. Amendments do not carry a structured pointer to the original ARS accession they amend; relating an ARS/A back to its original requires matching on cik and periodOfReport.