Form 10-K Annual Report Filings Dataset (1993–Present)

This dataset contains every Form 10-K annual report filing submitted to the SEC's EDGAR system from October 1993 through the present (more than 300,000 records totaling over 33 GB), including all variants of the 10-K form family in original HTML and TXT format with inline XBRL preserved where applicable. It is updated daily and covers the complete EDGAR electronic filing era without survivorship bias: bankrupt companies, deregistered issuers, acquired firms, and delisted registrants are all represented.

The dataset is built for bulk download, large-scale corpus processing, financial research, compliance monitoring, and AI or LLM pipelines that require the full original text of annual reports at scale.

Update Frequency: Daily
Updated at: 2026-04-04
Earliest Sample Date: 1993-10-01
Total Size: 33.8 GB
Total Records: 302,947
Container Format: ZIP
Content Types: TXT, JSON, HTML, PAPER
Form Types: 10-K, 10-K/A, 10-K405, 10-K405/A, 10-KSB, 10-KSB/A, 10-KT, 10-KT/A

Dataset APIs

Dataset Index JSON API: Programmatically retrieve the full list of dataset archive files, download URLs, and dataset metadata.

Download Entire Dataset: Download the entire dataset as a single archive file.

Download Single Container: Download a single container file (e.g., a monthly archive) from the dataset.

Dataset Files

390 files · 33.8 GB
File | Size | Records
2026-04.zip | 9.4 MB | 93
2026-03.zip | 616.7 MB | 6,468
2026-02.zip | 548.9 MB | 1,923
2026-01.zip | 16.8 MB | 105
2025-12.zip | 26.0 MB | 140
2025-11.zip | 39.8 MB | 170
2025-10.zip | 12.9 MB | 95
2025-09.zip | 32.1 MB | 186
2025-08.zip | 31.4 MB | 168
2025-07.zip | 19.1 MB | 133
2025-06.zip | 24.7 MB | 166
2025-05.zip | 33.1 MB | 227
2025-04.zip | 85.1 MB | 675
2025-03.zip | 561.6 MB | 3,077
2025-02.zip | 552.8 MB | 1,989
2025-01.zip | 18.6 MB | 129
2024-12.zip | 28.2 MB | 169
2024-11.zip | 39.1 MB | 190
2024-10.zip | 19.2 MB | 133
2024-09.zip | 34.8 MB | 202
2024-08.zip | 32.2 MB | 186
2024-07.zip | 25.8 MB | 171
2024-06.zip | 26.1 MB | 187
2024-05.zip | 35.8 MB | 236
2024-04.zip | 151.9 MB | 1,086
2024-03.zip | 463.5 MB | 2,706
2024-02.zip | 565.4 MB | 2,094
2024-01.zip | 18.8 MB | 144
2023-12.zip | 28.8 MB | 158
2023-11.zip | 35.7 MB | 186
2023-10.zip | 20.3 MB | 156
2023-09.zip | 31.6 MB | 188
2023-08.zip | 29.2 MB | 189
2023-07.zip | 21.2 MB | 143
2023-06.zip | 33.5 MB | 225
2023-05.zip | 50.1 MB | 390
2023-04.zip | 96.1 MB | 702
2023-03.zip | 704.7 MB | 3,679
2023-02.zip | 651.5 MB | 1,878
2023-01.zip | 18.8 MB | 123
2022-12.zip | 33.6 MB | 188
2022-11.zip | 48.3 MB | 190
2022-10.zip | 20.8 MB | 156
2022-09.zip | 37.2 MB | 221
2022-08.zip | 34.2 MB | 184
2022-07.zip | 19.7 MB | 134
2022-06.zip | 29.3 MB | 208
2022-05.zip | 43.3 MB | 324
2022-04.zip | 102.6 MB | 794
2022-03.zip | 744.0 MB | 3,940
2022-02.zip | 644.2 MB | 1,942
2022-01.zip | 22.8 MB | 151
2021-12.zip | 41.1 MB | 258
2021-11.zip | 45.7 MB | 200
2021-10.zip | 17.1 MB | 141
2021-09.zip | 32.4 MB | 256
2021-08.zip | 33.1 MB | 190
2021-07.zip | 18.6 MB | 155
2021-06.zip | 32.8 MB | 278
2021-05.zip | 60.5 MB | 403
2021-04.zip | 66.1 MB | 698
2021-03.zip | 623.3 MB | 3,529
2021-02.zip | 607.2 MB | 1,796
2021-01.zip | 17.9 MB | 134
2020-12.zip | 31.1 MB | 198
2020-11.zip | 41.9 MB | 197
2020-10.zip | 14.5 MB | 125
2020-09.zip | 24.9 MB | 210
2020-08.zip | 36.0 MB | 214
2020-07.zip | 17.0 MB | 147
2020-06.zip | 32.1 MB | 310
2020-05.zip | 46.8 MB | 442
2020-04.zip | 48.4 MB | 554
2020-03.zip | 465.6 MB | 2,999
2020-02.zip | 600.9 MB | 1,870
2020-01.zip | 18.7 MB | 168
2019-12.zip | 27.2 MB | 196
2019-11.zip | 42.9 MB | 262
2019-10.zip | 16.9 MB | 277
2019-09.zip | 22.6 MB | 201
2019-08.zip | 29.5 MB | 215
2019-07.zip | 14.4 MB | 161
2019-06.zip | 21.2 MB | 216
2019-05.zip | 28.1 MB | 245
2019-04.zip | 98.9 MB | 1,081
2019-03.zip | 407.5 MB | 2,817
2019-02.zip | 436.2 MB | 1,845
2019-01.zip | 14.9 MB | 157
2018-12.zip | 26.0 MB | 207
2018-11.zip | 35.2 MB | 230
2018-10.zip | 13.9 MB | 158
2018-09.zip | 23.9 MB | 228
2018-08.zip | 26.7 MB | 223
2018-07.zip | 14.3 MB | 161
2018-06.zip | 27.4 MB | 322
2018-05.zip | 29.4 MB | 276
2018-04.zip | 101.5 MB | 1,152
2018-03.zip | 417.5 MB | 2,894
2018-02.zip | 408.7 MB | 1,819
2018-01.zip | 14.9 MB | 179
2017-12.zip | 25.5 MB | 223
2017-11.zip | 32.8 MB | 232
2017-10.zip | 15.2 MB | 180
2017-09.zip | 28.1 MB | 244
2017-08.zip | 25.8 MB | 215
2017-07.zip | 14.6 MB | 184
2017-06.zip | 26.5 MB | 289
2017-05.zip | 30.1 MB | 329
2017-04.zip | 59.9 MB | 734
2017-03.zip | 457.1 MB | 3,308
2017-02.zip | 391.3 MB | 1,802
2017-01.zip | 16.6 MB | 181
2016-12.zip | 31.0 MB | 262
2016-11.zip | 30.9 MB | 235
2016-10.zip | 17.8 MB | 190
2016-09.zip | 30.3 MB | 281
2016-08.zip | 30.8 MB | 284
2016-07.zip | 15.8 MB | 201
2016-06.zip | 26.1 MB | 301
2016-05.zip | 30.8 MB | 282
2016-04.zip | 70.4 MB | 935
2016-03.zip | 411.2 MB | 3,203
2016-02.zip | 435.1 MB | 2,046
2016-01.zip | 17.8 MB | 209
2015-12.zip | 28.9 MB | 265
2015-11.zip | 34.3 MB | 263
2015-10.zip | 16.4 MB | 184
2015-09.zip | 34.1 MB | 334
2015-08.zip | 26.8 MB | 254
2015-07.zip | 19.3 MB | 245
2015-06.zip | 30.5 MB | 353
2015-05.zip | 29.0 MB | 301
2015-04.zip | 74.4 MB | 977
2015-03.zip | 487.7 MB | 3,662
2015-02.zip | 355.2 MB | 1,798
2015-01.zip | 19.6 MB | 260
2014-12.zip | 34.2 MB | 323
2014-11.zip | 37.1 MB | 278
2014-10.zip | 22.7 MB | 282
2014-09.zip | 38.9 MB | 372
2014-08.zip | 28.4 MB | 289
2014-07.zip | 20.2 MB | 256
2014-06.zip | 30.7 MB | 374
2014-05.zip | 33.2 MB | 370
2014-04.zip | 86.1 MB | 1,091
2014-03.zip | 462.9 MB | 3,565
2014-02.zip | 343.1 MB | 1,808
2014-01.zip | 21.2 MB | 288
2013-12.zip | 37.7 MB | 368
2013-11.zip | 36.7 MB | 306
2013-10.zip | 22.6 MB | 305
2013-09.zip | 37.5 MB | 380
2013-08.zip | 32.5 MB | 346
2013-07.zip | 24.9 MB | 336
2013-06.zip | 31.1 MB | 383
2013-05.zip | 34.1 MB | 386
2013-04.zip | 136.9 MB | 1,659
2013-03.zip | 400.9 MB | 3,115
2013-02.zip | 305.4 MB | 1,713
2013-01.zip | 22.5 MB | 302
2012-12.zip | 36.9 MB | 381
2012-11.zip | 33.6 MB | 342
2012-10.zip | 25.8 MB | 370
2012-09.zip | 36.7 MB | 403
2012-08.zip | 29.4 MB | 313
2012-07.zip | 20.9 MB | 307
2012-06.zip | 32.5 MB | 387
2012-05.zip | 32.1 MB | 399
2012-04.zip | 92.0 MB | 1,248
2012-03.zip | 418.9 MB | 3,655
2012-02.zip | 330.4 MB | 1,887
2012-01.zip | 20.8 MB | 304
2011-12.zip | 42.5 MB | 442
2011-11.zip | 36.0 MB | 353
2011-10.zip | 23.8 MB | 319
2011-09.zip | 45.0 MB | 493
2011-08.zip | 32.8 MB | 376
2011-07.zip | 23.8 MB | 323
2011-06.zip | 38.1 MB | 445
2011-05.zip | 39.0 MB | 520
2011-04.zip | 88.7 MB | 1,134
2011-03.zip | 474.7 MB | 4,120
2011-02.zip | 271.5 MB | 1,676
2011-01.zip | 27.5 MB | 378
2010-12.zip | 44.8 MB | 490
2010-11.zip | 35.4 MB | 397
2010-10.zip | 27.6 MB | 373
2010-09.zip | 43.4 MB | 502
2010-08.zip | 31.1 MB | 372
2010-07.zip | 25.0 MB | 332
2010-06.zip | 39.9 MB | 456
2010-05.zip | 36.4 MB | 432
2010-04.zip | 105.8 MB | 1,407
2010-03.zip | 496.8 MB | 4,291
2010-02.zip | 248.7 MB | 1,583
2010-01.zip | 27.2 MB | 394
2009-12.zip | 43.1 MB | 496
2009-11.zip | 36.9 MB | 421
2009-10.zip | 33.3 MB | 443
2009-09.zip | 42.7 MB | 501
2009-08.zip | 33.7 MB | 415
2009-07.zip | 31.7 MB | 453
2009-06.zip | 44.1 MB | 526
2009-05.zip | 40.2 MB | 500
2009-04.zip | 113.7 MB | 1,505
2009-03.zip | 518.2 MB | 4,727
2009-02.zip | 218.5 MB | 1,450
2009-01.zip | 24.3 MB | 348
2008-12.zip | 41.2 MB | 413
2008-11.zip | 25.4 MB | 258
2008-10.zip | 22.5 MB | 321
2008-09.zip | 37.5 MB | 432
2008-08.zip | 24.0 MB | 284
2008-07.zip | 21.2 MB | 336
2008-06.zip | 33.5 MB | 405
2008-05.zip | 27.8 MB | 405
2008-04.zip | 73.1 MB | 927
2008-03.zip | 399.3 MB | 4,552
2008-02.zip | 262.2 MB | 1,672
2008-01.zip | 15.7 MB | 171
2007-12.zip | 32.5 MB | 295
2007-11.zip | 26.9 MB | 231
2007-10.zip | 18.4 MB | 197
2007-09.zip | 29.9 MB | 352
2007-08.zip | 26.7 MB | 236
2007-07.zip | 19.8 MB | 195
2007-06.zip | 34.4 MB | 319
2007-05.zip | 30.6 MB | 314
2007-04.zip | 133.7 MB | 1,637
2007-03.zip | 434.6 MB | 4,605
2007-02.zip | 164.8 MB | 1,097
2007-01.zip | 16.0 MB | 165
2006-12.zip | 41.1 MB | 369
2006-11.zip | 17.9 MB | 163
2006-10.zip | 18.1 MB | 175
2006-09.zip | 36.0 MB | 359
2006-08.zip | 18.7 MB | 336
2006-07.zip | 15.1 MB | 170
2006-06.zip | 37.2 MB | 388
2006-05.zip | 25.5 MB | 365
2006-04.zip | 54.3 MB | 643
2006-03.zip | 519.7 MB | 6,175
2006-02.zip | 76.6 MB | 604
2006-01.zip | 15.4 MB | 197
2005-12.zip | 42.0 MB | 429
2005-11.zip | 19.1 MB | 208
2005-10.zip | 15.4 MB | 188
2005-09.zip | 38.2 MB | 448
2005-08.zip | 20.0 MB | 336
2005-07.zip | 18.5 MB | 236
2005-06.zip | 34.5 MB | 424
2005-05.zip | 29.8 MB | 495
2005-04.zip | 63.0 MB | 947
2005-03.zip | 486.1 MB | 6,377
2005-02.zip | 46.1 MB | 427
2005-01.zip | 14.5 MB | 210
2004-12.zip | 35.5 MB | 433
2004-11.zip | 17.4 MB | 185
2004-10.zip | 14.2 MB | 211
2004-09.zip | 31.4 MB | 427
2004-08.zip | 15.2 MB | 208
2004-07.zip | 15.0 MB | 233
2004-06.zip | 32.6 MB | 447
2004-05.zip | 18.2 MB | 297
2004-04.zip | 51.8 MB | 874
2004-03.zip | 438.6 MB | 6,078
2004-02.zip | 42.3 MB | 521
2004-01.zip | 16.0 MB | 279
2003-12.zip | 33.5 MB | 410
2003-11.zip | 12.9 MB | 178
2003-10.zip | 17.0 MB | 269
2003-09.zip | 36.1 MB | 451
2003-08.zip | 14.0 MB | 214
2003-07.zip | 18.6 MB | 337
2003-06.zip | 32.6 MB | 522
2003-05.zip | 24.5 MB | 397
2003-04.zip | 63.3 MB | 1,069
2003-03.zip | 404.8 MB | 5,668
2003-02.zip | 29.1 MB | 349
2003-01.zip | 14.4 MB | 259
2002-12.zip | 30.4 MB | 547
2002-11.zip | 15.4 MB | 340
2002-10.zip | 13.9 MB | 252
2002-09.zip | 29.4 MB | 465
2002-08.zip | 13.2 MB | 293
2002-07.zip | 17.2 MB | 370
2002-06.zip | 21.9 MB | 426
2002-05.zip | 20.7 MB | 424
2002-04.zip | 149.2 MB | 2,446
2002-03.zip | 256.1 MB | 4,404
2002-02.zip | 18.8 MB | 321
2002-01.zip | 13.0 MB | 356
2001-12.zip | 24.7 MB | 459
2001-11.zip | 9.9 MB | 199
2001-10.zip | 14.1 MB | 280
2001-09.zip | 25.9 MB | 497
2001-08.zip | 12.2 MB | 255
2001-07.zip | 14.9 MB | 285
2001-06.zip | 23.9 MB | 533
2001-05.zip | 19.0 MB | 471
2001-04.zip | 149.0 MB | 2,914
2001-03.zip | 215.4 MB | 4,312
2001-02.zip | 15.9 MB | 314
2001-01.zip | 12.0 MB | 279
2000-12.zip | 23.2 MB | 474
2000-11.zip | 9.6 MB | 204
2000-10.zip | 12.7 MB | 306
2000-09.zip | 28.9 MB | 586
2000-08.zip | 10.3 MB | 242
2000-07.zip | 11.3 MB | 272
2000-06.zip | 24.8 MB | 622
2000-05.zip | 28.2 MB | 605
2000-04.zip | 80.0 MB | 1,182
2000-03.zip | 560.6 MB | 6,180
2000-02.zip | 27.3 MB | 362
2000-01.zip | 21.4 MB | 348
1999-12.zip | 45.1 MB | 659
1999-11.zip | 13.8 MB | 225
1999-10.zip | 19.9 MB | 321
1999-09.zip | 45.2 MB | 609
1999-08.zip | 16.7 MB | 276
1999-07.zip | 19.2 MB | 335
1999-06.zip | 40.8 MB | 654
1999-05.zip | 20.9 MB | 411
1999-04.zip | 96.6 MB | 1,621
1999-03.zip | 513.2 MB | 6,199
1999-02.zip | 24.3 MB | 337
1999-01.zip | 22.1 MB | 370
1998-12.zip | 43.2 MB | 571
1998-11.zip | 13.9 MB | 232
1998-10.zip | 20.8 MB | 353
1998-09.zip | 46.2 MB | 647
1998-08.zip | 14.3 MB | 249
1998-07.zip | 21.0 MB | 360
1998-06.zip | 43.1 MB | 711
1998-05.zip | 33.7 MB | 581
1998-04.zip | 83.1 MB | 1,535
1998-03.zip | 515.0 MB | 6,315
1998-02.zip | 25.7 MB | 324
1998-01.zip | 23.2 MB | 384
1997-12.zip | 45.9 MB | 615
1997-11.zip | 13.6 MB | 230
1997-10.zip | 24.5 MB | 438
1997-09.zip | 48.3 MB | 701
1997-08.zip | 17.4 MB | 300
1997-07.zip | 19.0 MB | 398
1997-06.zip | 40.1 MB | 683
1997-05.zip | 31.3 MB | 634
1997-04.zip | 74.0 MB | 1,371
1997-03.zip | 463.3 MB | 6,000
1997-02.zip | 27.8 MB | 408
1997-01.zip | 22.9 MB | 415
1996-12.zip | 44.5 MB | 610
1996-11.zip | 15.4 MB | 279
1996-10.zip | 18.4 MB | 353
1996-09.zip | 43.6 MB | 642
1996-08.zip | 14.0 MB | 276
1996-07.zip | 23.1 MB | 419
1996-06.zip | 22.6 MB | 497
1996-05.zip | 19.6 MB | 428
1996-04.zip | 87.2 MB | 1,426
1996-03.zip | 191.0 MB | 2,543
1996-02.zip | 17.8 MB | 242
1996-01.zip | 12.4 MB | 216
1995-12.zip | 19.0 MB | 301
1995-11.zip | 7.1 MB | 139
1995-10.zip | 9.9 MB | 193
1995-09.zip | 23.8 MB | 347
1995-08.zip | 7.7 MB | 154
1995-07.zip | 7.3 MB | 156
1995-06.zip | 15.4 MB | 283
1995-05.zip | 10.3 MB | 230
1995-04.zip | 20.6 MB | 369
1995-03.zip | 165.1 MB | 1,854
1995-02.zip | 9.8 MB | 132
1995-01.zip | 6.5 MB | 105
1994-12.zip | 10.5 MB | 157
1994-11.zip | 3.7 MB | 65
1994-10.zip | 4.0 MB | 62
1994-09.zip | 11.3 MB | 152
1994-08.zip | 3.8 MB | 58
1994-07.zip | 3.9 MB | 63
1994-06.zip | 5.8 MB | 165
1994-05.zip | 7.4 MB | 115
1994-04.zip | 12.8 MB | 219
1994-03.zip | 123.2 MB | 1,249
1994-02.zip | 5.9 MB | 74
1994-01.zip | 2.8 MB | 47
1993-12.zip | 197.5 KB | 4
1993-11.zip | 66.4 KB | 1
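
The container files in the listing follow a YYYY-MM.zip naming pattern, one archive per calendar month. The sketch below enumerates the expected archive names, assuming that pattern holds for every month between the first and last file shown. The function name `monthly_archive_names` and the hard-coded end month are illustrative only and are not part of any published API.

```python
from datetime import date


def monthly_archive_names(first: date, last: date) -> list[str]:
    """Return 'YYYY-MM.zip' names for every month from first to last, inclusive."""
    names = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        names.append(f"{year:04d}-{month:02d}.zip")
        # Advance one calendar month, rolling over after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names


# The listing starts at 1993-11 and currently ends at 2026-04.
archives = monthly_archive_names(date(1993, 11, 1), date(2026, 4, 1))
print(len(archives), archives[0], archives[-1])
```

Comparing the generated names against the files actually downloaded is a cheap way to spot a missing month before starting a long processing run.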

What This Dataset Contains

Form variants included:

Form Type | Description
10-K | Standard annual report for domestic U.S. Exchange Act registrants
10-K/A | Amendment to a previously accepted 10-K; may refile the full document or amend specific items only
10-K405 / 10-K405/A | Legacy form type used through 2002, tied to the cover-page check box for Item 405 of Regulation S-K (Section 16 delinquent-filer disclosure); functionally equivalent to 10-K and treated as a standard annual report for research purposes
10-KSB / 10-KSB/A | Small business annual report; retired by SEC rule effective February 2008, with filings accepted through approximately March 2009
10-KT / 10-KT/A | Transition period annual report filed when a registrant changes its fiscal year end
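
When grouping filings across variants, it can help to split each form type string into its base form and an amendment flag. A minimal sketch, assuming form types appear exactly as written in the table above (for example "10-K405/A"); `classify_form_type` and `FORM_FAMILY` are hypothetical names introduced for illustration.

```python
FORM_FAMILY = {"10-K", "10-K405", "10-KSB", "10-KT"}


def classify_form_type(form_type: str) -> tuple[str, bool]:
    """Split a 10-K family form type into (base form, is_amendment)."""
    normalized = form_type.strip().upper()
    is_amendment = normalized.endswith("/A")
    base = normalized[:-2] if is_amendment else normalized
    if base not in FORM_FAMILY:
        raise ValueError(f"not a 10-K family form type: {form_type!r}")
    return base, is_amendment


print(classify_form_type("10-K405/A"))
print(classify_form_type("10-KT"))
```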

Date range: October 1993 to present. EDGAR mandatory electronic filing began in 1993 for large filers and expanded to all domestic registrants through the mid-1990s.

Update cadence: Daily. New filings are added within one business day of the EDGAR acceptance timestamp.

Registrant coverage: Survivorship-bias-free. All registrants required to file Form 10-K are represented — companies that subsequently went bankrupt, were acquired, went private, or were deregistered are included. No historical filers have been excluded.

Document formats by era:

Era | Format
1993–c. 1999 | Plain-text ASCII wrapped in SGML (<DOCUMENT>, <TEXT> delimiters); fixed-width financial tables
c. 2000–2009 | HTML; financial statements in <table> elements
2009–2019 | HTML primary document; standalone XBRL instance documents were filed as separate exhibits and are excluded from this dataset
2020–present | HTML with inline XBRL (iXBRL) embedded using ix:nonNumeric and ix:nonFraction elements; large accelerated filers required from 2020; all filers by 2021
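
Because the era boundaries are approximate, choosing a parser by inspecting the content is usually safer than choosing by filing date. A rough sketch of such a check, assuming the markers named in the table (ix: elements for iXBRL, HTML tags for the middle eras, plain text otherwise); `detect_format` and its three return labels are hypothetical.

```python
def detect_format(raw: str) -> str:
    """Guess which of the three document formats a filing uses."""
    lowered = raw.lower()
    if "<ix:nonfraction" in lowered or "<ix:nonnumeric" in lowered:
        return "ixbrl_html"
    if "<html" in lowered or "<table" in lowered:
        return "html"
    # No HTML or iXBRL markers found: treat as SGML-wrapped plain text.
    return "sgml_text"


print(detect_format("<TEXT>\nANNUAL REPORT\n</TEXT>"))
print(detect_format("<html><body><table></table></body></html>"))
```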

What is excluded: Exhibits filed as separate EDGAR documents (Exhibit 21 subsidiary lists, Exhibit 23 auditor consents, CEO/CFO certifications, material contracts); embedded images and scanned graphs; standalone XBRL taxonomy extension files (.xsd, .cal, .def, .lab, .pre); standalone XBRL instance documents from the 2009–2019 era; EDGAR submission header metadata.

Not included: Form 20-F (foreign private issuers), Form 40-F (Canadian MJDS issuers), Form 10-Q (quarterly reports), DEF 14A proxy statements.

Content Structure of a Single Record

One record equals one EDGAR submission: a single annual report filing by a single registrant for a single fiscal period, uniquely identified by the accession number (e.g., 0001193125-24-123456). The record provides the primary filing document — the full annual report text in original submission format. Exhibits and ancillary documents attached to the same EDGAR submission are not included.
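
An accession number has three dash-separated parts: a 10-digit identifier of the submitting filer or filing agent, a two-digit year, and a six-digit sequence number. The sketch below validates and splits one. The two-digit-year pivot at 93 is an assumption that matches this dataset's 1993 start, and `parse_accession` is a hypothetical helper name.

```python
import re
from typing import NamedTuple

ACCESSION_RE = re.compile(r"^(\d{10})-(\d{2})-(\d{6})$")


class Accession(NamedTuple):
    submitter_id: str  # 10-digit ID of the filer or filing agent
    year: int          # four-digit year, expanded from two digits
    sequence: int      # sequence number within that submitter and year


def parse_accession(value: str) -> Accession:
    match = ACCESSION_RE.match(value.strip())
    if match is None:
        raise ValueError(f"malformed accession number: {value!r}")
    yy = int(match.group(2))
    # Assumption: 93-99 means 1993-1999; everything else is 2000 or later.
    year = 1900 + yy if yy >= 93 else 2000 + yy
    return Accession(match.group(1), year, int(match.group(3)))


print(parse_accession("0001193125-24-123456"))
```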

Form 10-K is the SEC's comprehensive annual report under Rules 13a-1 and 15d-1 of the Securities Exchange Act of 1934. Unlike the annual report to shareholders, it is a regulatory filing with prescribed content under Regulation S-K (narrative disclosures) and Regulation S-X (financial statement form and content). The document is organized into four Parts.

Part I

Item 1 — Business Products and services; business segments and revenue contributions; key customer concentrations; supply chain and distribution; competitive dynamics; applicable regulatory framework; intellectual property (patents, trademarks, trade secrets); and, since fiscal year 2020, human capital resources (workforce composition, development, and retention programs). For diversified companies this section routinely runs 15–30 pages and is the primary narrative source for business classification, competitive landscape research, and industry-specific disclosure analysis.

Item 1A — Risk Factors Structured enumeration of material risks to the business, financial condition, and the registrant's securities. Required as a standalone item for non-smaller reporting companies since fiscal year 2005 (SEC Release No. 33-8591). Smaller reporting companies may omit. Risk factor disclosures range from a few pages to 30+ pages. A summary of risk factors is required when risk factors exceed 15 pages (SEC Release No. 33-10825). The primary source for NLP risk extraction, taxonomy construction, and year-over-year disclosure change analysis.

Item 1B — Unresolved Staff Comments Written SEC staff comments on periodic or registration reports remaining unresolved after 180 days. Blank in the vast majority of filings; a non-blank disclosure signals active SEC staff review.

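Item 1C — Cybersecurity Added by SEC Release No. 33-11216; required for fiscal years ending on or after December 15, 2023. Three components: processes for assessing, identifying, and managing material cybersecurity risks; whether cybersecurity threats, including prior incidents, have materially affected or are reasonably likely to materially affect the registrant; and board oversight and management's role in cybersecurity governance. Material incidents themselves are reported on Form 8-K Item 1.05. Absent from all filings before fiscal year 2023.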

Item 2 — Properties Material owned and leased physical properties: headquarters, manufacturing facilities, warehouses, retail locations, research sites.

Item 3 — Legal Proceedings Material pending litigation and regulatory proceedings meeting the Regulation S-K Item 103 materiality threshold. Environmental proceedings with government authority claims above $300,000 require specific disclosure.

Item 4 — Mine Safety Disclosures Required only for registrants operating domestic coal or metal/nonmetal mines under the Federal Mine Safety and Health Act. Virtually all non-mining filers carry a standard "Not applicable" placeholder.

Part II

Item 5 — Market for Registrant's Common Equity and Issuer Purchases Equity market data; number of registered holders; dividend history and policy; share repurchase activity; performance graph comparing five-year total stockholder return to a market index and peer group (optional for smaller reporting companies).

Item 6 — [Reserved] Formerly "Selected Financial Data" — a five-year table of selected financial metrics. Eliminated effective February 10, 2021 (SEC Release No. 33-10890). Pre-2021 filings contain this table; post-2021 filings carry a placeholder or omit the item. A structural discontinuity to account for in long-horizon time-series construction.

Item 7 — Management's Discussion and Analysis (MD&A) The most analytically dense section of the annual report. Standard content: results of operations (revenue, cost, margin, and operating expense drivers by segment and year-over-year comparison); critical accounting estimates; liquidity and capital resources (operating, investing, and financing cash flows; debt capacity; going concern assessment when applicable); off-balance-sheet arrangements. Contains forward-looking language, qualitative outlook, and management's narrative framing of financial results — the primary source for tone signals, guidance extraction, and linguistic complexity analysis.

Item 7A — Quantitative and Qualitative Disclosures About Market Risk Quantified exposure to interest rate, foreign currency, and commodity price risk. Accelerated and large accelerated filers typically provide sensitivity tables or value-at-risk estimates. Smaller reporting companies may omit this item.

Item 8 — Financial Statements and Supplementary Data The complete audited annual financial statements under U.S. GAAP (or investment company accounting for BDCs), comprising:

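  • Independent auditor's report, including auditor tenure and Critical Audit Matters (CAMs), which are required for large accelerated filers for fiscal years ending on or after June 30, 2019, and for most other filers for fiscal years ending on or after December 15, 2020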
  • Consolidated balance sheets (two periods)
  • Consolidated statements of operations or comprehensive income (three periods)
  • Consolidated statements of cash flows (three periods)
  • Consolidated statements of changes in stockholders' equity (three periods)
  • Notes to financial statements, including: accounting policies; revenue recognition (ASC 606); goodwill and intangibles; segment reporting (ASC 280); lease obligations (ASC 842); long-term debt schedule with maturity profile, covenant detail, and interest rates; pension and OPEB; share-based compensation; income taxes (rate reconciliation, deferred taxes, uncertain tax positions); fair value hierarchy (Level 1/2/3); related-party transactions; subsequent events

Item 9 — Changes in and Disagreements With Accountants Disclosure of auditor changes and disagreements on accounting or financial disclosure. Blank in most filings; a non-blank disclosure is rare and typically accompanied by a concurrent Form 8-K Item 4.01 filing.

Item 9A — Controls and Procedures Three sub-disclosures: (a) evaluation of disclosure controls and procedures under SOX Section 302; (b) management's annual report on ICFR effectiveness under SOX Section 404(a) — required for all reporting companies; (c) independent auditor attestation on ICFR under SOX Section 404(b) — required for accelerated and large accelerated filers only; EGCs exempt for up to five fiscal years post-IPO. Material weakness disclosures appear in sub-part (b).

Item 9B — Other Information Catch-all for events not yet reported on Form 8-K; since February 2023 (SEC Release No. 33-11138) also includes Rule 10b5-1 trading plan adoption, modification, and termination disclosures.

Item 9C — Foreign Jurisdictions Preventing PCAOB Inspections Adopted under the Holding Foreign Companies Accountable Act (HFCAA, 2020). Required when the registrant's auditor issued an audit report where PCAOB inspection access was restricted. Primarily relevant for Chinese-domiciled registrants and other issuers in PCAOB-restricted jurisdictions. Required for fiscal years ending after December 15, 2022.

Part III (Items 10–14)

Directors and corporate governance; executive compensation (Summary Compensation Table, Outstanding Equity Awards, Pension Benefits, Nonqualified Deferred Compensation); beneficial ownership of securities; related-party transactions and policies; accountant fees by category (audit, audit-related, tax, other).

Most calendar-year filers incorporate Part III by reference from the definitive proxy statement (DEF 14A) filed within 120 days of fiscal year end. When this is the case, the 10-K primary document contains only a cross-reference sentence — the substantive governance and compensation disclosures are in the DEF 14A, a separate EDGAR filing not included in this dataset. Registrants that do not timely file a proxy (many smaller, non-accelerated, and debt-only filers) include Part III substantively within the 10-K itself.

Part IV

Item 15 — Exhibits and Financial Statement Schedules Exhibit index listing all exhibits filed with or incorporated by reference into the 10-K (Exhibit 21 subsidiary list, Exhibit 23 auditor consent, Exhibit 31 SOX 302 certifications, Exhibit 32 SOX 906 certifications, Exhibit 10 material contracts, etc.); Regulation S-X Article 12 financial statement schedules when required. The listed exhibits are separate EDGAR documents and are not included in this dataset.

Item 16 — Form 10-K Summary Optional; rarely used in practice.

Signature block: Signed by the principal executive officer, principal financial officer, principal accounting officer, and a majority of the board of directors, each with title and execution date.
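
Because every filing follows the Part and Item outline described above, a regular-expression pass over heading lines is a common first step for section segmentation. A minimal sketch, assuming item headings start a line in the form "Item 1A" (any letter case) and that the last occurrence of each heading belongs to the body rather than the table of contents. Both assumptions are heuristics that fail on some filings, and `split_items` is a hypothetical helper name.

```python
import re

# Matches "Item 1", "ITEM 1A", "Item 15" and similar at the start of a line.
ITEM_HEADING = re.compile(r"^[ \t]*item[ \t]+(\d{1,2}[a-c]?)\b",
                          re.IGNORECASE | re.MULTILINE)


def split_items(text: str) -> dict[str, str]:
    """Map item labels such as '1A' to the text from that heading to the next."""
    last_start: dict[str, int] = {}
    for match in ITEM_HEADING.finditer(text):
        # Later matches overwrite earlier ones, which skips table-of-contents entries.
        last_start[match.group(1).upper()] = match.start()
    ordered = sorted(last_start.items(), key=lambda pair: pair[1])
    sections = {}
    for index, (label, start) in enumerate(ordered):
        end = ordered[index + 1][1] if index + 1 < len(ordered) else len(text)
        sections[label] = text[start:end].strip()
    return sections


sample = """TABLE OF CONTENTS
Item 1. Business
Item 1A. Risk Factors
PART I
Item 1. Business
We make widgets.
Item 1A. Risk Factors
Demand may fall.
"""
for label, body in split_items(sample).items():
    print(label, "->", body.splitlines()[-1])
```

Production pipelines typically add checks on top of this, for example requiring items to appear in ascending order or rejecting sections that are implausibly short.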

Parsing Notes for Practitioners

TXT/SGML format (1993–c. 1999): The primary 10-K document is the first <DOCUMENT> block with <TYPE>10-K. Strip <DOCUMENT>, <SEQUENCE>, <FILENAME>, and <TEXT> tags before processing. Financial tables use whitespace-padded columns with no semantic markup; extracting tabular data requires regex-based column alignment. Negative values are commonly shown with parentheses rather than a minus sign.
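
The steps in this note can be sketched with the standard library alone. The example below locates the first <DOCUMENT> block whose <TYPE> begins with 10-K, returns the contents of its <TEXT> section, splits a fixed-width table row on runs of two or more spaces, and converts parenthesized amounts to negative numbers. The helper names are hypothetical, and the two-space column rule is a heuristic that will not fit every filing.

```python
import re
from typing import Optional

DOCUMENT_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL | re.IGNORECASE)
TYPE_RE = re.compile(r"<TYPE>([^\s<]+)", re.IGNORECASE)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL | re.IGNORECASE)


def primary_document_text(submission: str) -> Optional[str]:
    """Return the <TEXT> body of the first 10-K family <DOCUMENT> block."""
    for block in DOCUMENT_RE.findall(submission):
        doc_type = TYPE_RE.search(block)
        if doc_type and doc_type.group(1).upper().startswith("10-K"):
            body = TEXT_RE.search(block)
            return body.group(1).strip() if body else None
    return None


def split_fixed_width_row(line: str) -> list[str]:
    """Split a whitespace-padded table row on runs of two or more spaces."""
    return re.split(r"\s{2,}", line.strip())


def parse_amount(cell: str) -> float:
    """Convert '1,234' to 1234.0 and '(1,234)' to -1234.0."""
    cleaned = cell.replace("$", "").replace(",", "").strip()
    if cleaned.startswith("(") and cleaned.endswith(")"):
        return -float(cleaned[1:-1])
    return float(cleaned)


sample = (
    "<DOCUMENT>\n<TYPE>10-K\n<SEQUENCE>1\n<FILENAME>a.txt\n<TEXT>\n"
    "Net income (loss)      (1,234)      5,678\n</TEXT>\n</DOCUMENT>\n"
    "<DOCUMENT>\n<TYPE>EX-27\n<TEXT>\nexhibit\n</TEXT>\n</DOCUMENT>"
)
row = split_fixed_width_row(primary_document_text(sample))
print(row[0], [parse_amount(cell) for cell in row[1:]])
```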

HTML format (c. 2000–present): Financial statement tables use nested <table>, <tr>, <td> elements with colspan/rowspan merged cells. Strip navigation headers, running footers, and CSS boilerplate before NLP processing. Character encoding: ISO-8859-1 for older filings; UTF-8 for most filings from approximately 2010 onward.
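
A small standard-library sketch of the two concerns in this note: decoding with a UTF-8 attempt followed by an ISO-8859-1 fallback, and flattening table rows while padding colspan cells so columns stay aligned. It deliberately ignores rowspan and nested tables, which real filings do contain. `decode_filing` and `TableRows` are hypothetical names.

```python
from html.parser import HTMLParser


def decode_filing(raw: bytes) -> str:
    """Try UTF-8 first, then fall back to ISO-8859-1 for older filings."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


class TableRows(HTMLParser):
    """Collect table rows as lists of cell strings, padding colspan cells."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None
        self._span = 1

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
            span = dict(attrs).get("colspan") or "1"
            self._span = int(span) if span.isdigit() else 1

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            # Pad merged cells with empty strings to keep columns aligned.
            self._row.extend([""] * (self._span - 1))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None


raw = (b'<table><tr><td colspan="2">Revenue</td><td>1,000</td></tr>'
       b"<tr><td>Caf\xe9</td><td>$</td><td>(5)</td></tr></table>")
parser = TableRows()
parser.feed(decode_filing(raw))
print(parser.rows)
```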

Inline XBRL (2020–present): ix:nonFraction wraps numerical financial values; ix:nonNumeric wraps textual XBRL content. Each element carries contextRef (period and entity) and name (US-GAAP or extension taxonomy concept); ix:nonFraction additionally carries unitRef. For NLP pipelines: use a namespace-aware parser, drop the non-displayed ix:header block, and unwrap the remaining ix:* elements, preserving inner text. For structured financial data extraction: parse XBRL context elements to recover period dates, entity identifiers, and segment dimensions.
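
Both uses can be sketched with `xml.etree.ElementTree`, assuming the document is well-formed XHTML and uses the Inline XBRL 1.1 namespace. The value handling is simplified: it strips thousands separators and applies the scale and sign attributes, but does not implement the ix transformation formats. `extract_facts` and `plain_text` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

IX_NS = "http://www.xbrl.org/2013/inlineXBRL"


def extract_facts(xhtml: str) -> list[dict]:
    """Collect numeric facts with their concept name, context, and unit."""
    facts = []
    for el in ET.fromstring(xhtml).iter(f"{{{IX_NS}}}nonFraction"):
        text = "".join(el.itertext()).strip().replace(",", "")
        try:
            value = float(text) * 10 ** int(el.get("scale", "0"))
        except ValueError:
            value = None  # displayed text needs a transformation format
        if value is not None and el.get("sign") == "-":
            value = -value
        facts.append({"name": el.get("name"), "contextRef": el.get("contextRef"),
                      "unitRef": el.get("unitRef"), "value": value})
    return facts


def plain_text(xhtml: str) -> str:
    """Drop the hidden ix:header block and return whitespace-normalized text."""
    root = ET.fromstring(xhtml)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == f"{{{IX_NS}}}header":
                parent.remove(child)
    return " ".join("".join(root.itertext()).split())


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:ix="http://www.xbrl.org/2013/inlineXBRL"><body>'
    '<ix:header><ix:hidden><ix:nonNumeric name="dei:DocumentType" '
    'contextRef="FY2023">10-K</ix:nonNumeric></ix:hidden></ix:header>'
    '<p>Revenue was $<ix:nonFraction name="us-gaap:Revenues" '
    'contextRef="FY2023" unitRef="USD" scale="6" decimals="-6">1,234'
    "</ix:nonFraction> million.</p></body></html>"
)
print(extract_facts(sample))
print(plain_text(sample))
```

Filings that use named HTML entities without declaring them will not parse this way; an HTML-tolerant parser is a reasonable substitute in that case.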

10-K/A amendments: Full-document refiling is most common. Partial-item amendments are identifiable from the amendment cover page and explanatory note; EDGAR submission header metadata is excluded from this dataset, so header fields are not available for this purpose. Point-in-time databases must define a version selection rule (original-as-filed vs. most-recently-amended).
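
One way to make the version selection rule explicit is a small function keyed on registrant and fiscal period. The sketch below assumes each filing has already been reduced to a record with a registrant identifier, form type, period end date, and filing date; the `Filing` fields and the rule names "original" and "latest" are hypothetical. Note that "latest" may select a partial-item amendment that does not contain the full report.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Filing:
    accession: str
    cik: str
    form_type: str
    period_end: date
    filed: date


def select_versions(filings: list[Filing], rule: str = "original") -> list[Filing]:
    """Pick one filing per (registrant, period) under the chosen rule."""
    if rule not in ("original", "latest"):
        raise ValueError(f"unknown rule: {rule!r}")
    chosen: dict[tuple[str, date], Filing] = {}
    for filing in sorted(filings, key=lambda f: f.filed):
        key = (filing.cik, filing.period_end)
        if rule == "latest":
            chosen[key] = filing  # later filings overwrite earlier ones
        elif not filing.form_type.endswith("/A") and key not in chosen:
            chosen[key] = filing  # first non-amendment wins
    return list(chosen.values())


history = [
    Filing("0000000001-24-000010", "1234", "10-K", date(2023, 12, 31), date(2024, 2, 20)),
    Filing("0000000001-24-000055", "1234", "10-K/A", date(2023, 12, 31), date(2024, 4, 25)),
]
print([f.accession for f in select_versions(history, "original")])
print([f.accession for f in select_versions(history, "latest")])
```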

10-K405 and 10-K405/A: Legacy form types discontinued after 2002. Content and item structure are functionally equivalent to standard 10-K filings of the same era. No special handling required beyond form type identification.

10-KSB and 10-KSB/A: Compressed item structure (Items 1–13 vs. 1–16); financial statements under Item 310 of Regulation S-B (two years of audited income data rather than three). Cross-period panels spanning 2007–2009 must handle the structural break when smaller companies transitioned from 10-KSB to 10-K.

10-KT and 10-KT/A: Transition period stated explicitly on the cover page. Income statement comparatives may cover a non-standard prior period. Longitudinal linking requires period-date-based matching rather than fiscal year conventions.

Who Files Form 10-K, and When

Filer Categories Covered

Domestic operating companies subject to Exchange Act periodic reporting under Section 13(a) or Section 15(d): companies listed on NYSE, Nasdaq, or other exchanges (Section 12(b)); companies meeting Section 12(g) holder-of-record thresholds (generally 2,000 holders of record); and issuers whose Securities Act registration statement became effective, triggering Section 15(d) reporting.

Filer sub-categories:

  • Large accelerated filers: public float ≥ $700 million; 60-day deadline; SOX 404(b) auditor attestation required
  • Accelerated filers: public float $75M–$700M; 75-day deadline; SOX 404(b) required
  • Non-accelerated filers: public float < $75M or no public float; 90-day deadline; SOX 404(b) not required
  • Smaller reporting companies (SRCs): public float < $250M, or annual revenue < $100M with public float < $700M or no public float (alternative test); may use scaled disclosures including optional omission of Items 1A and 7A and two (rather than three) years of audited financial statements
  • Emerging growth companies (EGCs): post-JOBS Act IPO (2011+); annual gross revenues below $1.235 billion; reduced executive compensation disclosures; exempt from SOX 404(b) for up to five fiscal years post-IPO

Shell companies and SPACs: File Form 10-K throughout their active reporting lifecycle. SPAC 10-Ks disclose trust account balances, extension provisions, and target search status. 10-KT filings commonly arise from SPACs that change fiscal year end in connection with a de-SPAC transaction. Post-combination, the surviving operating entity continues filing Form 10-K.

Business Development Companies (BDCs): Closed-end investment companies with BDC status under the Investment Company Act of 1940 that are Exchange Act reporting companies. BDC financial statements follow investment company accounting (ASC 946): portfolio investments at fair value; NAV per share as the primary balance sheet metric; schedule of investments; fair value hierarchy footnotes.

Debt-only registrants: High-yield bond issuers and other entities with only registered debt securities — no publicly traded equity — incur Exchange Act reporting obligations and file Form 10-K. Entirely absent from equity-focused databases (Compustat, CRSP, Bloomberg equity data). Their 10-Ks contain the most detailed covenant and debt structure disclosures in the 10-K universe: covenant packages, restricted payment baskets, intercreditor arrangements, and subordination hierarchies.

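Foreign private issuers on domestic forms: Foreign-incorporated issuers fail the foreign private issuer (FPI) test when U.S. residents hold more than 50% of outstanding voting securities and the issuer also has a U.S. nexus in management, assets, or business administration. Such issuers cannot claim FPI status and must file on domestic forms including Form 10-K. A small but notable sub-population.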

Companies in bankruptcy or financial distress: Exchange Act reporting obligations continue through Chapter 11 reorganization and, in many cases, through Chapter 7 liquidation, until deregistration is formally effective. Their 10-Ks include going concern opinions (Item 8 audit report), DIP financing disclosures (Item 7), and reorganization plan summaries. Because this dataset is survivorship-bias-free, all such filings are included.

Filing Deadlines

Filer Category | Public Float | Deadline After Fiscal Year End
Large accelerated filer | ≥ $700 million | 60 days
Accelerated filer | $75 million – < $700 million | 75 days
Non-accelerated filer | < $75 million or no public float | 90 days

Public float is measured as of the last business day of the most recently completed second fiscal quarter. A 15-day automatic extension is available via Form 12b-25 (NT 10-K). For December 31 fiscal year end filers — the largest cohort — the peak filing window runs approximately February 1 through March 31.
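
The deadline rules above reduce to simple date arithmetic. The sketch below classifies a filer by public float alone and adds 60, 75, or 90 days to the fiscal year end. It makes several labeled assumptions: deadlines landing on a weekend roll to the following Monday, federal holidays are ignored, the 15-day Form 12b-25 extension is counted from the rolled due date, and other filer-status conditions (such as the revenue test and reporting history) are not modeled. All function names are hypothetical.

```python
from datetime import date, timedelta

DEADLINE_DAYS = {"large accelerated": 60, "accelerated": 75, "non-accelerated": 90}


def filer_category(public_float_usd: float) -> str:
    """Classify by public float only (a simplification of the full rules)."""
    if public_float_usd >= 700_000_000:
        return "large accelerated"
    if public_float_usd >= 75_000_000:
        return "accelerated"
    return "non-accelerated"


def next_business_day(day: date) -> date:
    """Roll Saturday or Sunday forward to Monday; holidays are ignored."""
    while day.weekday() >= 5:
        day += timedelta(days=1)
    return day


def filing_deadline(fiscal_year_end: date, public_float_usd: float,
                    extended: bool = False) -> date:
    days = DEADLINE_DAYS[filer_category(public_float_usd)]
    due = next_business_day(fiscal_year_end + timedelta(days=days))
    if extended:
        # Assumption: the 15 calendar days run from the rolled due date.
        due = next_business_day(due + timedelta(days=15))
    return due


fye = date(2023, 12, 31)
for float_usd in (1_000_000_000, 100_000_000, 10_000_000):
    print(filer_category(float_usd), filing_deadline(fye, float_usd))
```

For a December 31, 2023 fiscal year end this yields dates between late February and early April 2024, consistent with the peak filing window described above.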

Dataset | Filer Population | Period | Audited Financials | Key Distinction
Form 10-K (this dataset) | Domestic issuers + FPIs on domestic forms | Annual | Yes (U.S. GAAP) | Full text, all 8 variants, 1993–present, daily updates, survivorship-bias-free
Form 20-F | Foreign private issuers (non-Canadian) | Annual | Yes (IFRS or U.S. GAAP) | Mutually exclusive filer population; IFRS permitted; 4-month deadline; different item structure
Form 40-F | Canadian MJDS issuers only | Annual | Yes (IFRS or Canadian GAAP) | Smallest annual report population; content from Canadian regulatory filings
Form 10-Q | Same as 10-K | Quarterly | No (reviewed, not audited) | Condensed; no full business description; three filed per fiscal year
DEF 14A | Exchange Act registrants soliciting proxies | Annual (pre-meeting) | No | Contains Part III disclosures often incorporated by reference into 10-K; not a periodic report
Form 8-K | Exchange Act registrants | Event-driven | Varies | Real-time single-event disclosure; not an annual report
Form 10-KSB (retired) | Small U.S. domestic issuers (pre-2009) | Annual | Yes | Included in this dataset as 10-KSB/10-KSB/A variants; compressed item structure; retired 2008

What makes this dataset distinct from commercial alternatives:

  • Original full text, no normalization: Every record is the primary filing document as accepted by EDGAR — no vendor-imposed parsing, truncation, or line-item normalization. HTML structure and iXBRL tagging are preserved.
  • All 8 variants unified: 10-K, 10-K/A, 10-K405, 10-K405/A, 10-KSB, 10-KSB/A, 10-KT, and 10-KT/A are all present. No separate queries are needed to retrieve a registrant's complete annual filing history.
  • Survivorship-bias-free across 33 years: Bankrupt, acquired, delisted, and deregistered filers are included. A requirement for valid empirical research on distress, long-run returns, and regulatory effects.
  • Debt-only registrant coverage: High-yield bond issuers and companies with only registered debt are present — a population systematically absent from equity-oriented databases.
  • Daily updates: New filings appear within one business day of EDGAR acceptance, which supports near-real-time compliance and monitoring pipelines.
  • iXBRL embedded for modern filings: Structured financial values are accessible directly from the included HTML documents for 2020+ filings without a separate XBRL download.

Who Uses This Dataset

Quantitative researchers and financial data scientists use Item 7 for MD&A tone signals and Item 8 with iXBRL to build survivorship-bias-free financial panel datasets that extend or supplement Compustat with bankrupt and deregistered filers. They run longitudinal factor models with 30+ year backtesting windows and use Item 1A to study risk category emergence and proliferation over time.

Credit analysts and fixed income researchers parse Item 8 debt footnotes for covenant terms, maturity schedules, EBITDA definitions, and restricted payment baskets. This dataset is especially valuable for high-yield bond issuers and debt-only registrants — companies with no public equity that are entirely absent from equity-focused databases but file Form 10-K with the most disclosure-rich covenant and debt structure language in the annual report universe.

NLP engineers and AI/ML practitioners use the full-text corpus of more than 300,000 filings for LLM pre-training, domain adaptation, and fine-tuning. The consistent item structure provides labeled training data for section segmentation without manual annotation. iXBRL tags enable text-to-structured-value alignment for financial NER models. RAG pipeline builders use item-level metadata to enable filtered retrieval by company, year, and disclosure type.

Compliance officers and regulatory monitoring teams ingest new 10-K filings daily to monitor material weakness disclosures (Item 9A), legal proceedings (Item 3), cybersecurity governance (Item 1C, post-2023), and PCAOB access risk (Item 9C) for counterparties and portfolio companies. Daily update cadence enables real-time alerts across all registrant types — including SPACs, BDCs, shell companies, and debt-only registrants not covered by commercial vendor surveillance products.

M&A analysts and investment bankers extract business descriptions (Item 1), audited historical financials and segment data (Item 8), and litigation exposure (Item 3) for due diligence, LBO and DCF modeling, and comparable company analysis. The survivorship-bias-free corpus supports precedent transaction research on companies that are no longer publicly traded.

Academic researchers in accounting, finance, and economics require the survivorship-bias-free full-population coverage, 33-year historical depth, and item-level text for empirical studies on disclosure quality, earnings quality, material weakness consequences, regulatory effects, and fraud prediction. The 10-KSB and 10-KT variants allow inclusion of small-filer and fiscal year transition observations typically excluded from commercial data providers.

Financial data vendors and platform operators use the raw corpus as an upstream source to build structured fundamental databases, full-text search indexes, and derived NLP analytics products. Original format preservation eliminates normalization constraints that would otherwise limit downstream product design.

Specific Use Cases

1. Survivorship-Bias-Free SOX 404 Material Weakness and Restatement Study (Academic Research)

An accounting researcher studying whether material weakness disclosures in Item 9A predict subsequent earnings restatements extracts Item 9A text from all 10-K and 10-K/A filings from 2004 to 2024. Because the dataset includes bankrupt and deregistered filers — disproportionately likely to have material weaknesses — the sample is free of survivorship bias. Restated fiscal years are identified from 10-K/A filings containing restated Item 8 financial statements, identifiable from the amendment cover page and restated financial statement headers. The (registrant, fiscal year) panel is linked by CIK to Compustat (GVKEY crosswalk) for financial control variables and to Audit Analytics for auditor identity. The result enables regression analysis of disclosure quality and audit outcomes across the full domestic filing population.
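The Item 9A extraction step might look like the sketch below. It assumes the filing has already been reduced to plain text with item headings at the start of a line; the function names and heading patterns are illustrative, and the keyword check only flags candidates, since a sentence such as "no material weaknesses were identified" also matches and needs downstream review.

```python
import re

# Heading patterns are heuristics; real filings vary in punctuation and spacing.
ITEM_9A = re.compile(r"^[ \t]*item[ \t]+9a\b", re.IGNORECASE | re.MULTILINE)
NEXT_ITEM = re.compile(r"^[ \t]*item[ \t]+(?:9b|9c|10)\b", re.IGNORECASE | re.MULTILINE)
MATERIAL_WEAKNESS = re.compile(r"material\s+weakness", re.IGNORECASE)


def extract_item_9a(text):
    """Return the Item 9A section text, or None if no heading is found."""
    headings = list(ITEM_9A.finditer(text))
    if not headings:
        return None
    # Use the last heading so a table-of-contents entry is skipped.
    start = headings[-1].start()
    following = NEXT_ITEM.search(text, headings[-1].end())
    end = following.start() if following else len(text)
    return text[start:end].strip()


def mentions_material_weakness(section):
    """Candidate flag only: True if the phrase appears anywhere in the section."""
    return bool(section) and bool(MATERIAL_WEAKNESS.search(section))
```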

2. Risk Factor Corpus for LLM Fine-Tuning (NLP Engineering)

An NLP engineering team at a financial technology company extracts Item 1A text from all 10-K HTML filings from 2005 (when Item 1A became a required standalone item) to the present, segments each filing's risk section into individual risk factor paragraphs using heading-detection heuristics, and labels each paragraph with CIK, fiscal year, SIC code, and filer category. The resulting corpus of approximately 15–20 million paragraphs is used to domain-adapt a transformer model and fine-tune a binary classifier distinguishing boilerplate from specific, material, and forward-looking risk language. The production pipeline processes each newly filed 10-K within one business day of EDGAR acceptance and surfaces meaningfully changed risk factors using cosine similarity comparison against the same registrant's prior-year Item 1A.
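The year-over-year comparison can be illustrated with a bag-of-words cosine similarity. This is a minimal sketch, not the team's production method: a real pipeline would more likely use TF-IDF or embedding vectors, and the 0.8 threshold and function names are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the term-count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def changed_paragraphs(current, prior, threshold=0.8):
    """Return (paragraph, best score) for current-year paragraphs whose
    closest prior-year paragraph scores below the threshold."""
    flagged = []
    for paragraph in current:
        best = max((cosine_similarity(paragraph, p) for p in prior), default=0.0)
        if best < threshold:
            flagged.append((paragraph, best))
    return flagged
```

Matching each paragraph against its best prior-year counterpart, rather than by position, keeps reordered risk factors from being reported as new.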

3. High-Yield Portfolio Covenant Monitoring (Credit Analysis)

A credit-focused hedge fund ingests 10-K and 10-K/A filings for approximately 80 portfolio bond issuers, including roughly 30 debt-only registrants absent from all equity databases. An automated pipeline extracts Item 8 long-term debt footnote text and parses it for covenant terms (leverage ratio, interest coverage, restricted payment baskets), upcoming maturity dates, and change-of-control provisions. Year-over-year diff comparison of covenant language identifies basket erosions and covenant amendments before they trigger a ratings action. Item 9A material weakness disclosures generate an immediate escalation alert to the portfolio manager. Item 7 Liquidity and Capital Resources text is extracted to track management's forward-looking refinancing narrative.
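A sentence-level diff of covenant language can be sketched with the standard library. The sentence splitter and the leverage-ratio pattern below are simplified assumptions; real debt footnotes phrase covenants in many ways and need a broader pattern set.

```python
import difflib
import re

# Matches phrasing such as "leverage ratio of 4.50 to 1.00" or "leverage ratio of 4.5x".
LEVERAGE = re.compile(
    r"leverage ratio[^.;]*?(\d+(?:\.\d+)?)\s*(?:x|to\s+1(?:\.0+)?)", re.IGNORECASE
)


def split_sentences(text):
    """Split after '.' or ';' when followed by whitespace (decimals stay intact)."""
    return [s.strip() for s in re.split(r"(?<=[.;])\s+", text) if s.strip()]


def covenant_changes(prior_text, current_text):
    """Return (removed sentences, added sentences) between two fiscal years."""
    removed, added = [], []
    for line in difflib.ndiff(split_sentences(prior_text), split_sentences(current_text)):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
    return removed, added


def leverage_ratio(sentence):
    """Return the first leverage ratio found in a sentence, or None."""
    match = LEVERAGE.search(sentence)
    return float(match.group(1)) if match else None
```

Comparing `leverage_ratio` on a removed sentence with its added counterpart gives a direct signal that a maintenance covenant was loosened or tightened.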

4. MD&A Tone Shift Signal for Systematic Equity (Quantitative Finance)

A systematic equity fund processes each newly filed 10-K through a daily pipeline that extracts Item 7 text, computes Loughran-McDonald positive and negative word count scores normalized by total word count, and compares the score to the same registrant's prior-year Item 7 retrieved by CIK and fiscal year − 1. The tone change delta is normalized within the registrant's SIC-2-digit industry peer group (all registrants filing 10-Ks in the same 60-day rolling window). Registrants in the bottom decile of peer-group-normalized tone change generate a short-side signal; top-decile registrants generate a long-side signal. Both are fed as inputs to the fund's multi-factor return prediction model.
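The scoring and normalization steps can be sketched as follows. The word sets are tiny placeholders, not the Loughran-McDonald dictionary, which must be obtained separately. Combining the counts into a single net score and using a z-score for peer normalization are both assumptions made for illustration.

```python
import re
from statistics import mean, pstdev

# Placeholder word lists for illustration only (NOT the Loughran-McDonald lists).
POSITIVE = {"improved", "strong", "gain", "favorable"}
NEGATIVE = {"loss", "decline", "impairment", "adverse"}


def tone_score(text):
    """(positive count - negative count) / total word count."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    positive = sum(w in POSITIVE for w in words)
    negative = sum(w in NEGATIVE for w in words)
    return (positive - negative) / len(words)


def tone_delta(current_item7, prior_item7):
    """Year-over-year change in tone for one registrant."""
    return tone_score(current_item7) - tone_score(prior_item7)


def peer_normalized(deltas):
    """Z-score each registrant's tone delta within one peer group.

    `deltas` maps a registrant identifier (for example a CIK) to its delta.
    """
    values = list(deltas.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {key: 0.0 for key in deltas}
    return {key: (value - mu) / sigma for key, value in deltas.items()}
```

Decile assignment then reduces to ranking the normalized values within each peer group.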

5. RAG-Based M&A Due Diligence Assistant (Legal Technology)

A legal technology company builds a retrieval-augmented generation system over 10-K filings for a transaction target and comparable companies. Each 10-K HTML document is loaded, stripped of navigation headers and CSS, segmented into item-level sections, and chunked at the paragraph level (300-500 tokens per chunk). Each chunk is tagged with CIK, registrant name, accession number, fiscal year, item number, and form type, embedded with a financial-domain model, and loaded into a vector store. M&A attorneys query the system in natural language — "What legal proceedings did the target disclose in the last three fiscal years?" or "Has this company ever disclosed a material weakness?" — and receive retrieved paragraphs with source citations grounded in the original SEC filing. The 10-K/A inclusion ensures restated financials and amended governance disclosures are indexed alongside originals.
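The chunking and tagging step might be sketched as below. Token counts are approximated by whitespace-separated words, which is an assumption; a real pipeline would count tokens with the embedding model's own tokenizer. The `Chunk` fields mirror the metadata listed above, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    cik: str
    registrant_name: str
    accession_number: str
    fiscal_year: int
    item: str
    form_type: str
    text: str


def chunk_paragraphs(paragraphs, metadata, max_tokens=500):
    """Greedily pack consecutive paragraphs into chunks of at most max_tokens.

    A single paragraph longer than max_tokens becomes its own oversized chunk.
    """
    chunks, buffer, count = [], [], 0
    for paragraph in paragraphs:
        size = len(paragraph.split())  # rough token estimate
        if buffer and count + size > max_tokens:
            chunks.append(Chunk(text="\n\n".join(buffer), **metadata))
            buffer, count = [], 0
        buffer.append(paragraph)
        count += size
    if buffer:
        chunks.append(Chunk(text="\n\n".join(buffer), **metadata))
    return chunks
```

Keeping paragraph boundaries intact means every retrieved chunk can be cited back to a contiguous passage of the original filing.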

6. SPAC Annual Report Compliance Surveillance (Compliance Monitoring)

A compliance officer at a SPAC-focused investment firm monitors 25 pre-combination SPACs through a daily pipeline that extracts Item 1 (trust account balance, extension terms, combination search status), Item 1A (extension risk, redemption risk, and liquidation risk language), and Item 7 Liquidity and Capital Resources (operating cash burn outside trust and remaining runway). 10-KT filings are flagged for additional review as indicators of a recent or imminent de-SPAC transaction or fiscal year change. Regulatory calendar alerts are set automatically 30, 60, and 90 days ahead of any extension deadline extracted from the Item 1 narrative.
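The alert scheduling can be illustrated with a short sketch. The "until Month D, YYYY" phrasing is an assumed pattern for how a deadline might appear in the Item 1 narrative; actual filings word this in many ways, so the regular expression and function names are illustrative only.

```python
import re
from datetime import date, timedelta

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
# Assumed phrasing: "... until March 15, 2026 to complete ..."
DEADLINE = re.compile(r"until\s+(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s+(\d{4})")


def extract_deadline(text):
    """Return the first 'until <date>' deadline found in the text, or None."""
    match = DEADLINE.search(text)
    if not match:
        return None
    month = MONTHS.index(match.group(1)) + 1
    return date(int(match.group(3)), month, int(match.group(2)))


def alert_dates(deadline, lead_days=(90, 60, 30)):
    """Return the calendar dates that fall lead_days before the deadline."""
    return [deadline - timedelta(days=n) for n in lead_days]
```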

7. Pre-XBRL Financial Time-Series Backfill for Multi-Decade Factor Research (Financial Data Vendor)

A financial data vendor extends its normalized fundamental database to 1993 by parsing TXT/SGML and early HTML 10-K primary documents from the pre-XBRL era. For TXT filings, the pipeline strips SGML wrappers, identifies financial statement sections using regex on standardized headers ("CONSOLIDATED STATEMENTS OF OPERATIONS," "CONSOLIDATED BALANCE SHEETS"), and parses fixed-width ASCII columns using position-based extraction. For HTML filings, a DOM-based table extractor resolves colspan merged headers and maps extracted line items to canonical taxonomy concepts via fuzzy label matching. Output rows in the form (CIK, accession number, fiscal year end, statement type, line item label, value, scale, unit) enable multi-decade factor construction and backtesting over a 30+ year window — covering the full EDGAR electronic filing era before the XBRL mandate.
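Position-based extraction from a fixed-width ASCII row can be sketched as follows. The column offsets are passed in by the caller and are an assumption; in practice they have to be inferred per filing, for example from the header row. The function names are hypothetical.

```python
import re


def parse_amount(cell):
    """Convert '1,234', '$ 1,234', or '(56)' to a float; return None for blanks or dashes.

    Parentheses or a leading minus sign are treated as negative.
    """
    if not re.search(r"\d", cell):
        return None
    negative = "(" in cell or cell.lstrip("$ ").startswith("-")
    value = float(re.sub(r"[^\d.]", "", cell))
    return -value if negative else value


def parse_row(line, label_width, column_width):
    """Split one statement row into (label, [values]) using fixed offsets."""
    label = line[:label_width].rstrip(" .")  # drop padding and dot leaders
    body = line[label_width:]
    cells = [body[i:i + column_width] for i in range(0, len(body), column_width)]
    return label, [parse_amount(cell.strip()) for cell in cells]


if __name__ == "__main__":
    row = f"{'Net sales ........':<30}{'$ 1,234':>12}{'(56)':>12}"
    print(parse_row(row, label_width=30, column_width=12))
```

The scale ("in thousands", "in millions") is stated in the statement header rather than in each row, so it has to be captured separately and applied to every value.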

Access and Format Notes

Container format: ZIP. The full dataset is distributed as a ZIP archive of primary filing documents.
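A downloaded container can be streamed without unpacking it to disk. The sketch below assumes filing documents are stored as `.htm`, `.html`, or `.txt` members; the internal layout of the archives is not specified here, so the extension filter and function name are assumptions to adjust after inspecting a real file.

```python
import io
import zipfile

FILING_EXTENSIONS = (".htm", ".html", ".txt")


def iter_filings(zip_source):
    """Yield (member name, decoded text) for each filing document in one archive.

    `zip_source` may be a file path, a file-like object, or raw bytes.
    """
    if isinstance(zip_source, bytes):
        zip_source = io.BytesIO(zip_source)
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            if not info.filename.lower().endswith(FILING_EXTENSIONS):
                continue
            raw = archive.read(info)
            # Older filings are not always valid UTF-8, so undecodable bytes are replaced.
            yield info.filename, raw.decode("utf-8", errors="replace")
```

Because the function is a generator, a multi-hundred-megabyte monthly archive is processed one document at a time.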

Content types: HTML (for filings from approximately 2000 onward), and TXT with SGML wrappers (for older filings and some modern filers that continue to file in plain text).

Record identifiers: Each record carries the registrant's CIK, the accession number (unique per EDGAR submission), the form type, and the period of report (fiscal year end date).
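These identifiers are convenient join keys once normalized. The sketch below assumes the usual EDGAR accession number layout of a 10-digit submitter identifier, a 2-digit year, and a 6-digit sequence number, and pads CIKs to 10 digits. The function names and the returned field names are hypothetical.

```python
import re

ACCESSION = re.compile(r"^(\d{10})-(\d{2})-(\d{6})$")


def parse_accession(accession_number):
    """Split an accession number such as '0001193125-15-118890' into parts."""
    match = ACCESSION.match(accession_number.strip())
    if not match:
        raise ValueError(f"unexpected accession number format: {accession_number!r}")
    two_digit_year = int(match.group(2))
    # EDGAR electronic filings begin in 1993, so 93-99 map to the 1990s.
    year = 1900 + two_digit_year if two_digit_year >= 93 else 2000 + two_digit_year
    return {
        "filer_id": match.group(1),  # the submitter, which may be a filing agent
        "filing_year": year,
        "sequence": int(match.group(3)),
    }


def normalize_cik(cik):
    """Return the CIK as a zero-padded 10-digit string."""
    return str(int(cik)).zfill(10)
```

Note that the leading 10 digits identify whoever submitted the filing, so they should not be used as a substitute for the registrant's CIK.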

iXBRL availability: No XBRL data is present before 2009. For 2009–2019 filings, XBRL data was filed as a separate instance document, which is excluded from this dataset. From 2020–2021 onward (phased in by filer category), iXBRL is embedded in the HTML primary document and is included.

Volume context: For December 31 fiscal year end registrants — the majority of U.S. public companies — the peak 10-K filing window runs approximately February 1 through March 31, accounting for several thousand filings in this period. Non-December fiscal year filers are distributed throughout the year, producing a continuous daily stream.

Intended use: Bulk download and large-scale corpus processing. For single-document lookup, EDGAR full-text search or the EDGAR filing viewer is more efficient than bulk dataset access.

Frequently Asked Questions

Does this dataset include 10-K/A amendments? Yes. 10-K/A filings are included as separate records, each with its own accession number and filing date. The original 10-K and all subsequent amendments for the same registrant and fiscal year share the same CIK and period-of-report date but differ in accession number. Users building point-in-time databases must define a version selection rule — original-as-filed, most-recently-amended, or all versions — and apply it consistently.
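A version selection rule can be made explicit in code. The sketch below is one possible implementation: the record field names (`cik`, `period_of_report`, `form_type`, `filing_date`) and the rule labels are hypothetical, and filing dates are assumed to be ISO-formatted strings so that they sort chronologically.

```python
def select_versions(records, rule="original"):
    """Pick one or more filings per (CIK, period of report).

    rule: "original" (first non-amendment), "latest" (most recent filing),
    or "all" (every version, in filing order).
    """
    if rule not in ("original", "latest", "all"):
        raise ValueError(f"unknown rule: {rule!r}")
    groups = {}
    for record in records:
        key = (record["cik"], record["period_of_report"])
        groups.setdefault(key, []).append(record)
    selected = []
    for versions in groups.values():
        versions.sort(key=lambda r: r["filing_date"])
        if rule == "all":
            selected.extend(versions)
        elif rule == "latest":
            selected.append(versions[-1])
        else:
            originals = [v for v in versions if not v["form_type"].endswith("/A")]
            # Fall back to the earliest amendment if no original is present.
            selected.append(originals[0] if originals else versions[0])
    return selected
```

For point-in-time research, the "original" rule avoids look-ahead bias, because an amendment filed months later was not available to the market at the original filing date.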

Are Form 20-F filings (foreign private issuers) included? No. Form 20-F is filed by foreign private issuers, a filer population that is largely mutually exclusive from the domestic 10-K population. For global coverage of EDGAR annual reports, a separate 20-F dataset is required.

What are the 10-K405 and 10-K405/A form types? These are legacy form types used through 2002. The "405" designation referred to a checkbox on the cover page relating to Item 405 of Regulation S-K, which covers disclosure of delinquent Section 16(a) ownership reports by insiders. The content and item structure of 10-K405 filings are functionally identical to standard 10-K filings of the same era. They are included in this dataset under their original EDGAR form type identifiers.

What happened to Form 10-KSB? Form 10-KSB was retired by SEC rule effective February 4, 2008. Smaller reporting companies transitioned to Form 10-K with scaled disclosure accommodations for fiscal years beginning after December 15, 2007. Historical 10-KSB filings through approximately early 2009 are included in this dataset. Note that 10-KSB uses a different item numbering structure (Items 1–13) and the financial statement rules of Item 310 of Regulation S-B, which differ from standard 10-K requirements.

How are Part III disclosures handled when incorporated by reference from the proxy statement? When a registrant incorporates Part III items (10–14) by reference from its DEF 14A proxy statement, the 10-K primary document contains only a cross-reference sentence — not the substantive director, compensation, or ownership disclosures. Those disclosures are in the DEF 14A, which is a separate EDGAR filing not included in this dataset. For research requiring full Part III data, users must cross-reference the corresponding DEF 14A by CIK and reporting period.

Does this dataset include structured financial data, or only the original documents? The dataset provides original filing documents in HTML and TXT format. For filings from approximately 2020–2021 onward (phased by filer category), inline XBRL embedded in the HTML provides machine-readable structured financial values. For earlier filings, financial values must be extracted from HTML tables or ASCII text using a parsing pipeline. No pre-normalized structured financial database is included — this dataset is the source material from which such databases are built.
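For filings that carry inline XBRL, numeric facts can be read directly from the primary HTML document. The sketch below uses only the standard library and handles the `ix:nonFraction` element with its `name`, `contextRef`, `unitRef`, `scale`, and `sign` attributes. It ignores the `format` attribute and assumes comma-grouped digits, so it is a starting point rather than a conformant iXBRL processor. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class InlineXbrlFacts(HTMLParser):
    """Collect numeric facts from ix:nonFraction elements."""

    def __init__(self):
        super().__init__()
        self.facts = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names.
        if tag == "ix:nonfraction":
            a = dict(attrs)
            self._current = {
                "name": a.get("name"),
                "context": a.get("contextref"),
                "unit": a.get("unitref"),
                "scale": int(a.get("scale") or 0),
                "sign": a.get("sign"),
                "text": "",
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "ix:nonfraction" and self._current is not None:
            fact, self._current = self._current, None
            digits = fact.pop("text").replace(",", "").strip()
            try:
                value = float(digits) * 10 ** fact["scale"]
            except ValueError:
                value = None  # for example a dash in an empty cell
            if value is not None and fact["sign"] == "-":
                value = -value
            fact["value"] = value
            self.facts.append(fact)


def extract_facts(html):
    """Return a list of fact dictionaries found in one HTML document."""
    parser = InlineXbrlFacts()
    parser.feed(html)
    return parser.facts
```

Each fact's `context` value points to a context element elsewhere in the same document, which is where the reporting period and any dimensional qualifiers are defined.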

Are debt-only registrants included? Yes. High-yield bond issuers and other companies with only registered debt securities and no publicly traded equity incur Exchange Act reporting obligations and file Form 10-K. They are included in this dataset and are typically absent from equity-focused databases (Compustat, CRSP, Bloomberg equity). Their annual reports often contain the most detailed covenant and debt structure disclosures in the 10-K universe.

Does the dataset cover the new Item 1C Cybersecurity disclosures? Yes. Item 1C was required for fiscal years ending on or after December 15, 2023. Filings for fiscal year 2023 onward include this item where applicable. Filings before fiscal year 2023 do not contain Item 1C. New filings with cybersecurity disclosures are available within one business day of EDGAR acceptance due to the daily update cadence.

What is the XBRL coverage gap for 2009–2019 filings? From 2009 to approximately 2020–2021, EDGAR required XBRL data to be filed as a separate instance document (.xml file) rather than embedded in the primary HTML document. Those standalone XBRL files were attached as separate exhibits to the EDGAR submission — not part of the primary filing document. This dataset includes only the primary filing document, so standalone XBRL instance documents from the 2009–2019 era are excluded. For structured financial data extraction from that period, users must parse HTML table content directly from the primary document. From 2020 onward (large accelerated filers) and 2021 onward (all others), inline XBRL is embedded in the primary HTML document and is present in this dataset.

How does the 10-K filing season affect daily volume? For December 31 fiscal year end filers — the majority of U.S. public companies — the peak filing window is roughly February 1 through March 31 (60–90 days after fiscal year end, with 12b-25 extensions allowed). This produces several thousand new 10-K records during that window. Non-December fiscal year filers produce filings throughout the year. The daily update cadence ensures records are available within one business day of EDGAR acceptance regardless of filing season.