This dataset contains every Form 10-K annual report filing submitted to the SEC's EDGAR system from October 1993 through the present — 290,000+ records totaling 30+ GB — including all variants of the 10-K form family in original HTML and TXT format with inline XBRL preserved where applicable. It is updated daily and covers the complete EDGAR electronic filing era without survivorship bias: bankrupt companies, deregistered issuers, acquired firms, and delisted registrants are all represented.
The dataset is built for bulk download, large-scale corpus processing, financial research, compliance monitoring, and AI or LLM pipelines that require the full original text of annual reports at scale.
Programmatically retrieve the full list of dataset archive files, download URLs and dataset metadata.
Dataset Index JSON API
Download the entire dataset as a single archive file.
Download Entire Dataset:
Download a single container file (e.g. monthly archive) from the dataset.
Download Single Container:
Form variants included:
| Form Type | Description |
|---|---|
| 10-K | Standard annual report for domestic U.S. Exchange Act registrants |
| 10-K/A | Amendment to a previously accepted 10-K; may refile the full document or amend specific items only |
| 10-K405 / 10-K405/A | Legacy form type used through 2002; the "405" refers to a cover-page check box concerning Regulation S-K Item 405 disclosure of delinquent Section 16 filers; functionally equivalent to 10-K and treated as a standard annual report for research purposes |
| 10-KSB / 10-KSB/A | Small business annual report; eliminated by SEC rule effective February 2008, with a transition period during which filings were accepted through approximately March 2009 |
| 10-KT / 10-KT/A | Transition period annual report filed when a registrant changes its fiscal year end |
Date range: October 1993 to present. EDGAR mandatory electronic filing began in 1993 for large filers and expanded to all domestic registrants through the mid-1990s.
Update cadence: Daily. New filings are added within one business day of the EDGAR acceptance timestamp.
Registrant coverage: Survivorship-bias-free. All registrants required to file Form 10-K are represented — companies that subsequently went bankrupt, were acquired, went private, or were deregistered are included. No historical filers have been excluded.
Document formats by era:
| Era | Format |
|---|---|
| 1993–c. 1999 | Plain-text ASCII wrapped in SGML (<DOCUMENT>, <TEXT> delimiters); fixed-width financial tables |
| c. 2000–2009 | HTML; financial statements in <table> elements |
| 2009–2019 | HTML primary document; standalone XBRL instance documents were filed as separate exhibits and are excluded from this dataset |
| 2020–present | HTML with inline XBRL (iXBRL) embedded using ix:nonNumeric and ix:nonFraction elements; large accelerated filers required from 2020; all filers by 2021 |
What is excluded: Exhibits filed as separate EDGAR documents (Exhibit 21 subsidiary lists, Exhibit 23 auditor consents, CEO/CFO certifications, material contracts); embedded images and scanned graphs; standalone XBRL taxonomy extension files (.xsd, .cal, .def, .lab, .pre); standalone XBRL instance documents from the 2009–2019 era; EDGAR submission header metadata.
Not included: Form 20-F (foreign private issuers), Form 40-F (Canadian MJDS issuers), Form 10-Q (quarterly reports), DEF 14A proxy statements.
One record equals one EDGAR submission: a single annual report filing by a single registrant for a single fiscal period, uniquely identified by the accession number (e.g., 0001193125-24-123456). The record provides the primary filing document — the full annual report text in original submission format. Exhibits and ancillary documents attached to the same EDGAR submission are not included.
Form 10-K is the SEC's comprehensive annual report under Rules 13a-1 and 15d-1 of the Securities Exchange Act of 1934. Unlike the annual report to shareholders, it is a regulatory filing with prescribed content under Regulation S-K (narrative disclosures) and Regulation S-X (financial statement form and content). The document is organized into four Parts.
Item 1. Business: Products and services; business segments and revenue contributions; key customer concentrations; supply chain and distribution; competitive dynamics; applicable regulatory framework; intellectual property (patents, trademarks, trade secrets); and, since fiscal year 2020, human capital resources (workforce composition, development, and retention programs). For diversified companies this section routinely runs 15–30 pages and is the primary narrative source for business classification, competitive landscape research, and industry-specific disclosure analysis.
Item 1A. Risk Factors: Structured enumeration of material risks to the business, financial condition, and the registrant's securities. Required as a standalone item since fiscal year 2005 (SEC Release No. 33-8591); smaller reporting companies may omit it. Risk factor disclosures range from a few pages to 30+ pages, and a summary is required when they exceed 15 pages (SEC Release No. 33-10825). Item 1A is the primary source for NLP risk extraction, taxonomy construction, and year-over-year disclosure change analysis.
Item 1B. Unresolved Staff Comments: Written SEC staff comments on periodic or registration reports that were received at least 180 days before fiscal year end and remain unresolved. Blank in the vast majority of filings; a non-blank disclosure signals an open SEC staff review.
Item 1C. Cybersecurity: Added by SEC Release No. 33-11216; required for fiscal years ending on or after December 15, 2023. Three components: material cybersecurity risk management processes; strategy and board/management governance; and material cybersecurity incidents if any. Absent from all filings before fiscal year 2023.
Item 2. Properties: Material owned and leased physical properties: headquarters, manufacturing facilities, warehouses, retail locations, research sites.
Item 3. Legal Proceedings: Material pending litigation and regulatory proceedings meeting the Regulation S-K Item 103 materiality threshold. Environmental proceedings with government authority claims above $300,000 require specific disclosure.
Item 4. Mine Safety Disclosures: Required only for registrants operating domestic coal or metal/nonmetal mines under the Federal Mine Safety and Health Act. Virtually all non-mining filers carry a standard "Not applicable" placeholder.
Item 5. Market for Registrant's Common Equity and Issuer Purchases: Equity market data; number of registered holders; dividend history and policy; share repurchase activity; performance graph comparing five-year total stockholder return to a market index and peer group (optional for smaller reporting companies).
Item 6. [Reserved]: Formerly "Selected Financial Data," a five-year table of selected financial metrics. Eliminated effective February 10, 2021 (SEC Release No. 33-10890). Pre-2021 filings contain this table; post-2021 filings carry a placeholder or omit the item. This is a structural discontinuity to account for in long-horizon time-series construction.
Item 7. Management's Discussion and Analysis (MD&A): The most analytically dense section of the annual report. Standard content: results of operations (revenue, cost, margin, and operating expense drivers by segment and year-over-year comparison); critical accounting estimates; liquidity and capital resources (operating, investing, and financing cash flows; debt capacity; going concern assessment when applicable); off-balance-sheet arrangements. It contains forward-looking language, qualitative outlook, and management's narrative framing of financial results, making it the primary source for tone signals, guidance extraction, and linguistic complexity analysis.
Item 7A. Quantitative and Qualitative Disclosures About Market Risk: Quantified exposure to interest rate, foreign currency, and commodity price risk. Accelerated and large accelerated filers typically provide sensitivity tables or value-at-risk estimates. Smaller reporting companies may omit this item.
Item 8. Financial Statements and Supplementary Data: The complete audited annual financial statements under U.S. GAAP (or investment company accounting for BDCs), together with the accompanying footnotes and the independent auditor's report.
Item 9. Changes in and Disagreements With Accountants: Disclosure of auditor changes and disagreements on accounting or financial disclosure. Blank in most filings; a non-blank disclosure is rare and typically accompanied by a concurrent Form 8-K Item 4.01 filing.
Item 9A. Controls and Procedures: Three sub-disclosures: (a) evaluation of disclosure controls and procedures under SOX Section 302; (b) management's annual report on the effectiveness of internal control over financial reporting (ICFR) under SOX Section 404(a), required for all reporting companies; (c) independent auditor attestation on ICFR under SOX Section 404(b), required for accelerated and large accelerated filers only, with emerging growth companies (EGCs) exempt for up to five fiscal years post-IPO. Material weakness disclosures appear in sub-part (b).
Item 9B. Other Information: Catch-all for events not yet reported on Form 8-K; since February 2023 (SEC Release No. 33-11138) also includes Rule 10b5-1 trading plan adoption, modification, and termination disclosures.
Item 9C. Foreign Jurisdictions Preventing PCAOB Inspections: Adopted under the Holding Foreign Companies Accountable Act (HFCAA, 2020). Required when the registrant's auditor issued an audit report where PCAOB inspection access was restricted. Primarily relevant for Chinese-domiciled registrants and other issuers in PCAOB-restricted jurisdictions. Required for fiscal years ending after December 15, 2022.
Part III (Items 10–14): Directors and corporate governance; executive compensation (Summary Compensation Table, Outstanding Equity Awards, Pension Benefits, Nonqualified Deferred Compensation); beneficial ownership of securities; related-party transactions and policies; accountant fees by category (audit, audit-related, tax, other).
Most calendar-year filers incorporate Part III by reference from the definitive proxy statement (DEF 14A) filed within 120 days of fiscal year end. When this is the case, the 10-K primary document contains only a cross-reference sentence — the substantive governance and compensation disclosures are in the DEF 14A, a separate EDGAR filing not included in this dataset. Registrants that do not timely file a proxy (many smaller, non-accelerated, and debt-only filers) include Part III substantively within the 10-K itself.
Item 15. Exhibits and Financial Statement Schedules: Exhibit index listing all exhibits filed with or incorporated by reference into the 10-K (Exhibit 21 subsidiary list, Exhibit 23 auditor consent, Exhibit 31 SOX 302 certifications, Exhibit 32 SOX 906 certifications, Exhibit 10 material contracts, etc.); Regulation S-X Article 12 financial statement schedules when required. The listed exhibits are separate EDGAR documents and are not included in this dataset.
Item 16. Form 10-K Summary: Optional; rarely used in practice.
Signature block: Signed by the principal executive officer, principal financial officer, principal accounting officer, and a majority of the board of directors, each with title and execution date.
TXT/SGML format (1993–c. 1999): The primary 10-K document is the first <DOCUMENT> block with <TYPE>10-K. Strip <DOCUMENT>, <SEQUENCE>, <FILENAME>, and <TEXT> tags before processing. Financial tables use whitespace-padded columns with no semantic markup; extracting tabular data requires regex-based column alignment. Negative values are commonly shown with parentheses rather than a minus sign.
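The SGML handling described above can be sketched in a few lines of Python. This is a minimal illustration, not the dataset's own tooling: the function name `primary_document` and the regular expressions are assumptions, and real filings may contain irregularities (for example a missing closing `</TEXT>` tag) that need extra care.

```python
import re

# Patterns for the SGML wrapper tags named in the text above.
DOC_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL | re.IGNORECASE)
TYPE_RE = re.compile(r"<TYPE>([^\s<]+)", re.IGNORECASE)
TEXT_RE = re.compile(r"<TEXT>(.*?)(?:</TEXT>|\Z)", re.DOTALL | re.IGNORECASE)


def primary_document(raw, form_prefix="10-K"):
    """Return the body of the first <DOCUMENT> block whose <TYPE> starts
    with form_prefix (this also matches 10-K/A, 10-K405, 10-KSB, 10-KT)."""
    for block in DOC_RE.findall(raw):
        doc_type = TYPE_RE.search(block)
        if doc_type and doc_type.group(1).upper().startswith(form_prefix):
            body = TEXT_RE.search(block)
            # The <SEQUENCE> and <FILENAME> lines sit before <TEXT>,
            # so taking only the <TEXT> body drops them as well.
            return body.group(1).strip() if body else None
    return None
```

Called on a raw submission string, the function returns the annual report text with the wrapper tags removed, or `None` when no matching block is found.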
HTML format (c. 2000–present): Financial statement tables use nested <table>, <tr>, <td> elements with colspan/rowspan merged cells. Strip navigation headers, running footers, and CSS boilerplate before NLP processing. Character encoding: ISO-8859-1 for older filings; UTF-8 for most filings from approximately 2010 onward.
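As a rough sketch of the encoding and boilerplate handling above, the following uses only the Python standard library. The helper names `decode_filing` and `html_to_text` are hypothetical. The sketch drops `<style>` and `<script>` content and falls back from UTF-8 to ISO-8859-1; it does not attempt to detect navigation headers or running footers, which usually need filer-specific heuristics.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <style> and <script>."""

    SKIP = {"style", "script"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def decode_filing(raw):
    """Try UTF-8 first, then fall back to ISO-8859-1 for older filings."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


def html_to_text(raw):
    parser = _TextExtractor()
    parser.feed(decode_filing(raw))
    parser.close()
    # Join fragments with spaces and collapse whitespace runs.
    return " ".join(" ".join(parser.parts).split())
```

Joining fragments with a space keeps table cells from running together, at the cost of occasionally inserting a space between adjacent inline elements.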
Inline XBRL (2020–present): ix:nonFraction wraps numerical financial values; ix:nonNumeric wraps textual XBRL content. Each element carries contextRef (period and entity), name (US-GAAP or extension taxonomy concept), and unitRef. For NLP pipelines: use a namespace-aware parser and strip ix:* elements, preserving inner text. For structured financial data extraction: parse XBRL context elements to recover period dates, entity identifiers, and segment dimensions.
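Both uses of inline XBRL described above can be sketched with the standard library's namespace-aware XML parser. The sketch assumes the primary document is well-formed XHTML and uses the Inline XBRL 1.1 namespace URI; the function names are illustrative. It removes the `ix:header` block (hidden facts and context definitions) before extracting text, which is a design choice of this example.

```python
import xml.etree.ElementTree as ET

IX = "http://www.xbrl.org/2013/inlineXBRL"  # Inline XBRL 1.1 namespace


def extract_facts(root):
    """List visible numeric facts with the attributes named above."""
    facts = []
    for el in root.iter(f"{{{IX}}}nonFraction"):
        facts.append({
            "name": el.get("name"),
            "contextRef": el.get("contextRef"),
            "unitRef": el.get("unitRef"),
            "text": "".join(el.itertext()).strip(),
        })
    return facts


def plain_text(root):
    """Drop ix:header, then keep the inner text of every other element.
    Note: this mutates the tree, so extract facts first if needed."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == f"{{{IX}}}header":
                parent.remove(child)
    return " ".join("".join(root.itertext()).split())
```

Because `itertext()` ignores element boundaries, the `ix:*` wrappers disappear from the output while the values they enclose stay in place in the sentence.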
10-K/A amendments: Full-document refiling is most common. Partial-item amendments are identifiable from the EDGAR header <ITEMS> field and the amendment cover page. Point-in-time databases must define a version selection rule (original-as-filed vs. most-recently-amended).
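A version selection rule can be made explicit in code. The sketch below is one possible formulation: the `Filing` record and its field names are hypothetical stand-ins for the record identifiers (CIK, accession number, form type, period of report) plus a filing date, and the two rules shown are the ones named above.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Filing:
    cik: str
    period_of_report: date
    accession: str
    form_type: str
    filed: date


def select_versions(filings, rule="original"):
    """Pick one filing per (CIK, period of report) under the given rule."""
    groups = defaultdict(list)
    for f in filings:
        groups[(f.cik, f.period_of_report)].append(f)

    chosen = []
    for key in sorted(groups):
        versions = sorted(groups[key], key=lambda f: (f.filed, f.accession))
        if rule == "original":
            # Prefer the earliest non-amendment; fall back to the earliest
            # record if only amendments are present.
            originals = [f for f in versions if not f.form_type.endswith("/A")]
            chosen.append(originals[0] if originals else versions[0])
        elif rule == "latest":
            chosen.append(versions[-1])
        else:
            raise ValueError(f"unknown rule: {rule}")
    return chosen
```

Under the `latest` rule, a partial-item amendment may be selected even though it does not restate the full document, so pipelines that need complete text may want to merge it with the original instead.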
10-K405 and 10-K405/A: Legacy form types used through 2002. Content and item structure are functionally equivalent to standard 10-K filings of the same era. No special handling required beyond form type identification.
10-KSB and 10-KSB/A: Compressed item structure (Items 1–13 vs. 1–16) and scaled financial statement requirements (two years of audited income data rather than three) under Item 310 of Regulation S-B, the predecessor of Article 8 of Regulation S-X. Cross-period panels spanning 2007–2009 must handle the structural break when smaller companies transitioned from 10-KSB to 10-K.
10-KT and 10-KT/A: Transition period stated explicitly on the cover page. Income statement comparatives may cover a non-standard prior period. Longitudinal linking requires period-date-based matching rather than fiscal year conventions.
Domestic operating companies subject to Exchange Act periodic reporting under Section 13(a) or Section 15(d): companies listed on NYSE, Nasdaq, or other exchanges (Section 12(b)); companies meeting Section 12(g) holder-of-record thresholds (generally 2,000 holders of record); and issuers whose Securities Act registration statement became effective, triggering Section 15(d) reporting.
Filer sub-categories:
Shell companies and SPACs: File Form 10-K throughout their active reporting lifecycle. SPAC 10-Ks disclose trust account balances, extension provisions, and target search status. 10-KT filings commonly arise from SPACs that change fiscal year end in connection with a de-SPAC transaction. Post-combination, the surviving operating entity continues filing Form 10-K.
Business Development Companies (BDCs): Closed-end investment companies with BDC status under the Investment Company Act of 1940 that are Exchange Act reporting companies. BDC financial statements follow investment company accounting (ASC 946): portfolio investments at fair value; NAV per share as the primary balance sheet metric; schedule of investments; fair value hierarchy footnotes.
Debt-only registrants: High-yield bond issuers and other entities with only registered debt securities — no publicly traded equity — incur Exchange Act reporting obligations and file Form 10-K. Entirely absent from equity-focused databases (Compustat, CRSP, Bloomberg equity data). Their 10-Ks contain the most detailed covenant and debt structure disclosures in the 10-K universe: covenant packages, restricted payment baskets, intercreditor arrangements, and subordination hierarchies.
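Foreign issuers on domestic forms: A foreign issuer loses foreign private issuer (FPI) status when U.S. residents hold more than 50% of its outstanding voting securities and it also has a majority of U.S. citizen or resident officers or directors, more than 50% of its assets in the United States, or a business administered principally in the United States. Such issuers must file on domestic forms, including Form 10-K. A small but notable sub-population.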
Companies in bankruptcy or financial distress: Exchange Act reporting obligations continue through Chapter 11 reorganization and, in many cases, through Chapter 7 liquidation, until deregistration is formally effective. Their 10-Ks include going concern opinions (Item 8 audit report), DIP financing disclosures (Item 7), and reorganization plan summaries. Because this dataset is survivorship-bias-free, all such filings are included.
| Filer Category | Public Float | Deadline After Fiscal Year End |
|---|---|---|
| Large accelerated filer | ≥ $700 million | 60 days |
| Accelerated filer | $75 million – < $700 million | 75 days |
| Non-accelerated filer | < $75 million or no public float | 90 days |
Public float is measured as of the last business day of the most recently completed second fiscal quarter. A 15-day automatic extension is available via Form 12b-25 (NT 10-K). For December 31 fiscal year end filers — the largest cohort — the peak filing window runs approximately February 1 through March 31.
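The deadline arithmetic in the table can be expressed as a small helper. This is a simplified sketch: the function names are illustrative, filer category is derived from public float alone as in the table, and the calculation ignores the roll-forward that applies when a deadline lands on a weekend or holiday.

```python
from datetime import date, timedelta

DEADLINE_DAYS = {"large accelerated": 60, "accelerated": 75, "non-accelerated": 90}


def filer_category(public_float):
    """Classify by public float in USD; None means no public float."""
    if public_float is not None and public_float >= 700_000_000:
        return "large accelerated"
    if public_float is not None and public_float >= 75_000_000:
        return "accelerated"
    return "non-accelerated"


def filing_deadline(fiscal_year_end, public_float, extended=False):
    """Calendar-day deadline; extended=True adds the 15-day Form 12b-25 extension."""
    days = DEADLINE_DAYS[filer_category(public_float)]
    if extended:
        days += 15
    return fiscal_year_end + timedelta(days=days)
```

For a December 31, 2023 fiscal year end, a registrant with a $1 billion public float gets February 29, 2024, and one with a $100 million float gets March 15, 2024.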
| Dataset | Filer Population | Period | Audited Financials | Key Distinction |
|---|---|---|---|---|
| Form 10-K (this dataset) | Domestic issuers + FPIs on domestic forms | Annual | Yes (U.S. GAAP) | Full text, all 8 variants, 1993–present, daily updates, survivorship-bias-free |
| Form 20-F | Foreign private issuers (non-Canadian) | Annual | Yes (IFRS or U.S. GAAP) | Mutually exclusive filer population; IFRS permitted; 4-month deadline; different item structure |
| Form 40-F | Canadian MJDS issuers only | Annual | Yes (IFRS or Canadian GAAP) | Smallest annual report population; content from Canadian regulatory filings |
| Form 10-Q | Same as 10-K | Quarterly | No (reviewed, not audited) | Condensed; no full business description; three filed per fiscal year |
| DEF 14A | Exchange Act registrants soliciting proxies | Annual (pre-meeting) | No | Contains Part III disclosures often incorporated by reference into 10-K; not a periodic report |
| Form 8-K | Exchange Act registrants | Event-driven | Varies | Real-time single-event disclosure; not an annual report |
| Form 10-KSB (retired) | Small U.S. domestic issuers (pre-2009) | Annual | Yes | Included in this dataset as 10-KSB/10-KSB/A variants; compressed item structure; retired 2008 |
What makes this dataset distinct from commercial alternatives:
Quantitative researchers and financial data scientists use Item 7 for MD&A tone signals and Item 8 with iXBRL to build survivorship-bias-free financial panel datasets that extend or supplement Compustat with bankrupt and deregistered filers. They run longitudinal factor models with 30+ year backtesting windows and use Item 1A to study risk category emergence and proliferation over time.
Credit analysts and fixed income researchers parse Item 8 debt footnotes for covenant terms, maturity schedules, EBITDA definitions, and restricted payment baskets. This dataset is especially valuable for high-yield bond issuers and debt-only registrants — companies with no public equity that are entirely absent from equity-focused databases but file Form 10-K with the most disclosure-rich covenant and debt structure language in the annual report universe.
NLP engineers and AI/ML practitioners use the full-text corpus of 290,000+ filings for LLM pre-training, domain adaptation, and fine-tuning. The consistent item structure provides labeled training data for section segmentation without manual annotation. iXBRL tags enable text-to-structured-value alignment for financial NER models. RAG pipeline builders use item-level metadata to enable filtered retrieval by company, year, and disclosure type.
Compliance officers and regulatory monitoring teams ingest new 10-K filings daily to monitor material weakness disclosures (Item 9A), legal proceedings (Item 3), cybersecurity governance (Item 1C, post-2023), and PCAOB access risk (Item 9C) for counterparties and portfolio companies. The daily update cadence enables alerts within one business day of EDGAR acceptance across all registrant types, including SPACs, BDCs, shell companies, and debt-only registrants not covered by commercial vendor surveillance products.
M&A analysts and investment bankers extract business descriptions (Item 1), audited historical financials and segment data (Item 8), and litigation exposure (Item 3) for due diligence, LBO and DCF modeling, and comparable company analysis. The survivorship-bias-free corpus supports precedent transaction research on companies that are no longer publicly traded.
Academic researchers in accounting, finance, and economics require the survivorship-bias-free full-population coverage, 33-year historical depth, and item-level text for empirical studies on disclosure quality, earnings quality, material weakness consequences, regulatory effects, and fraud prediction. The 10-KSB and 10-KT variants allow inclusion of small-filer and fiscal year transition observations that commercial data providers typically exclude.
Financial data vendors and platform operators use the raw corpus as an upstream source to build structured fundamental databases, full-text search indexes, and derived NLP analytics products. Original format preservation eliminates normalization constraints that would otherwise limit downstream product design.
An accounting researcher studying whether material weakness disclosures in Item 9A predict subsequent earnings restatements extracts Item 9A text from all 10-K and 10-K/A filings from 2004 to 2024. Because the dataset includes bankrupt and deregistered filers — disproportionately likely to have material weaknesses — the sample is free of survivorship bias. Restated fiscal years are identified from 10-K/A filings containing restated Item 8 financial statements, identifiable from the amendment cover page and restated financial statement headers. The (registrant, fiscal year) panel is linked by CIK to Compustat (GVKEY crosswalk) for financial control variables and to Audit Analytics for auditor identity. The result enables regression analysis of disclosure quality and audit outcomes across the full domestic filing population.
An NLP engineering team at a financial technology company extracts Item 1A text from all 10-K HTML filings from 2005 (when Item 1A became a required standalone item) to the present, segments each filing's risk section into individual risk factor paragraphs using heading-detection heuristics, and labels each paragraph with CIK, fiscal year, SIC code, and filer category. The resulting corpus of approximately 15–20 million paragraphs is used to domain-adapt a transformer model and fine-tune a binary classifier distinguishing boilerplate from specific, material, and forward-looking risk language. The production pipeline processes each newly filed 10-K within one business day of EDGAR acceptance and surfaces meaningfully changed risk factors using cosine similarity comparison against the same registrant's prior-year Item 1A.
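The year-over-year comparison step in this scenario could look roughly like the following. It is a deliberately simple sketch using bag-of-words vectors and the standard library; the function names and the 0.8 threshold are assumptions, and a production system would more likely use TF-IDF weights or model embeddings.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased term counts for one paragraph."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def changed_risk_factors(current, prior, threshold=0.8):
    """Return current-year paragraphs whose best match among the
    prior-year paragraphs falls below the similarity threshold."""
    prior_vecs = [bag_of_words(p) for p in prior]
    changed = []
    for para in current:
        vec = bag_of_words(para)
        best = max((cosine(vec, pv) for pv in prior_vecs), default=0.0)
        if best < threshold:
            changed.append(para)
    return changed
```

Matching each paragraph against its best prior-year counterpart, rather than by position, keeps reordered but unchanged risk factors from being flagged.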
A credit-focused hedge fund ingests 10-K and 10-K/A filings for approximately 80 portfolio bond issuers, including roughly 30 debt-only registrants absent from all equity databases. An automated pipeline extracts Item 8 long-term debt footnote text and parses it for covenant terms (leverage ratio, interest coverage, restricted payment baskets), upcoming maturity dates, and change-of-control provisions. Year-over-year diff comparison of covenant language identifies basket erosions and covenant amendments before they trigger a ratings action. Item 9A material weakness disclosures generate an immediate escalation alert to the portfolio manager. Item 7 Liquidity and Capital Resources text is extracted to track management's forward-looking refinancing narrative.
A systematic equity fund processes each newly filed 10-K through a daily pipeline that extracts Item 7 text, computes Loughran-McDonald positive and negative word count scores normalized by total word count, and compares the score to the same registrant's prior-year Item 7 retrieved by CIK and fiscal year − 1. The tone change delta is normalized within the registrant's SIC-2-digit industry peer group (all registrants filing 10-Ks in the same 60-day rolling window). Registrants in the bottom decile of peer-group-normalized tone change generate a short-side signal; top-decile registrants generate a long-side signal. Both are fed as inputs to the fund's multi-factor return prediction model.
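The scoring steps in this scenario can be sketched as below. The word sets are tiny placeholders for illustration only; a real pipeline would load the full Loughran-McDonald dictionary. The function names are hypothetical, and net tone (positive share minus negative share) is one reasonable way to combine the two scores.

```python
import re
from statistics import mean, pstdev

# Placeholder word lists, NOT the actual Loughran-McDonald dictionary.
POSITIVE = {"strong", "improved", "gain", "gains"}
NEGATIVE = {"loss", "losses", "decline", "impairment"}


def tone_scores(text):
    """Positive and negative word counts, each divided by total words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0, 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos / len(words), neg / len(words)


def net_tone(text):
    pos, neg = tone_scores(text)
    return pos - neg


def tone_delta(current_text, prior_text):
    """Change in net tone versus the same registrant's prior-year Item 7."""
    return net_tone(current_text) - net_tone(prior_text)


def peer_zscores(deltas):
    """Normalize tone deltas within one peer group (registrant -> delta)."""
    values = list(deltas.values())
    mu, sigma = mean(values), pstdev(values)
    return {k: (v - mu) / sigma if sigma else 0.0 for k, v in deltas.items()}
```

Decile assignment then reduces to ranking the z-scores within each peer group.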
A legal technology company builds a retrieval-augmented generation system over 10-K filings for a transaction target and comparable companies. Each 10-K HTML document is loaded, stripped of navigation headers and CSS, segmented into item-level sections, and chunked at the paragraph level (300-500 tokens per chunk). Each chunk is tagged with CIK, registrant name, accession number, fiscal year, item number, and form type, embedded with a financial-domain model, and loaded into a vector store. M&A attorneys query the system in natural language — "What legal proceedings did the target disclose in the last three fiscal years?" or "Has this company ever disclosed a material weakness?" — and receive retrieved paragraphs with source citations grounded in the original SEC filing. The 10-K/A inclusion ensures restated financials and amended governance disclosures are indexed alongside originals.
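The chunking and tagging step might be sketched as follows. Token counts are approximated by whitespace-separated words, which is an assumption; a real pipeline would count tokens with the embedding model's tokenizer. The sketch enforces only the upper bound and never splits a paragraph, so a single oversized paragraph becomes its own chunk.

```python
def chunk_paragraphs(paragraphs, metadata, max_tokens=500):
    """Greedily pack consecutive paragraphs into chunks of at most
    max_tokens (approximate), attaching the same metadata to each chunk."""
    chunks, buffer, size = [], [], 0

    def flush():
        nonlocal buffer, size
        if buffer:
            chunks.append({**metadata, "text": "\n\n".join(buffer), "tokens": size})
            buffer, size = [], 0

    for para in paragraphs:
        n = len(para.split())
        # Start a new chunk when this paragraph would exceed the limit.
        if buffer and size + n > max_tokens:
            flush()
        buffer.append(para)
        size += n
    flush()
    return chunks
```

The `metadata` dictionary would carry the tags listed above (CIK, registrant name, accession number, fiscal year, item number, form type) so that each chunk can be filtered and cited at retrieval time.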
A compliance officer at a SPAC-focused investment firm monitors 25 pre-combination SPACs through a daily pipeline that extracts Item 1 (trust account balance, extension terms, combination search status), Item 1A (extension risk, redemption risk, and liquidation risk language), and Item 7 Liquidity and Capital Resources (operating cash burn outside trust and remaining runway). 10-KT filings are flagged for additional review as indicators of a recent or imminent de-SPAC transaction or fiscal year change. Regulatory calendar alerts are set automatically 30, 60, and 90 days ahead of any extension deadline extracted from the Item 1 narrative.
A financial data vendor extends its normalized fundamental database to 1993 by parsing TXT/SGML and early HTML 10-K primary documents from the pre-XBRL era. For TXT filings, the pipeline strips SGML wrappers, identifies financial statement sections using regex on standardized headers ("CONSOLIDATED STATEMENTS OF OPERATIONS," "CONSOLIDATED BALANCE SHEETS"), and parses fixed-width ASCII columns using position-based extraction. For HTML filings, a DOM-based table extractor resolves colspan merged headers and maps extracted line items to canonical taxonomy concepts via fuzzy label matching. Output rows in the form (CIK, accession number, fiscal year end, statement type, line item label, value, scale, unit) enable multi-decade factor construction and backtesting over a 30+ year window — covering the full EDGAR electronic filing era before the XBRL mandate.
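For the TXT-era tables, a simplified row parser is sketched below. Note that it splits columns on runs of two or more spaces rather than on fixed character positions, which is a lighter-weight stand-in for the position-based extraction described above; the function names are illustrative. It also handles the parenthesized negatives common in these filings.

```python
import re


def parse_amount(token):
    """Convert '1,234', '$5,678' or '(1,234)' to a float; dashes mean no value."""
    t = token.strip().replace("$", "").replace(",", "").strip()
    if t in {"", "-", "--"}:
        return None
    negative = t.startswith("(") and t.endswith(")")
    if negative:
        t = t[1:-1]
    value = float(t)
    return -value if negative else value


def parse_row(line):
    """Split one table line into (label, [values]) on runs of 2+ spaces."""
    parts = re.split(r"\s{2,}", line.strip())
    label, cells = parts[0], parts[1:]
    # Trailing dot leaders ("Total revenues.....") are removed from the label.
    return label.rstrip(" ."), [parse_amount(c) for c in cells]
```

Scale (thousands or millions) is normally stated once in the table header rather than per row, so it has to be captured separately and applied to the parsed values.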
Container format: ZIP. The full dataset is distributed as a ZIP archive of primary filing documents.
Content types: HTML (for filings from approximately 2000 onward), and TXT with SGML wrappers (for older filings and some modern filers that continue to file in plain text).
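Iterating over a downloaded container can be done with the standard `zipfile` module. The sketch below assumes nothing about the archive's internal layout beyond file extensions; the member naming used in the usage note is hypothetical, and the actual structure should be checked against the dataset index.

```python
import io
import zipfile

PRIMARY_SUFFIXES = (".htm", ".html", ".txt")


def iter_filings(archive):
    """Yield (member_name, raw_bytes) for each HTML or TXT member.
    `archive` may be a filesystem path or a binary file-like object."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            if not info.filename.lower().endswith(PRIMARY_SUFFIXES):
                continue
            yield info.filename, zf.read(info)
```

Because the function is a generator, documents are read one at a time, which keeps memory use flat even for a multi-gigabyte archive. A typical loop would be `for name, raw in iter_filings("container.zip"): ...`, where `container.zip` is a placeholder file name.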
Record identifiers: Each record carries the registrant's CIK, the accession number (unique per EDGAR submission), the form type, and the period of report (fiscal year end date).
iXBRL availability: No XBRL of any kind exists before 2009. For 2009–2019 filings, XBRL data was filed as a separate instance document (excluded from this dataset). From 2020–2021 onward (phased by filer category), iXBRL is embedded in the HTML primary document and is included.
Volume context: For December 31 fiscal year end registrants — the majority of U.S. public companies — the peak 10-K filing window runs approximately February 1 through March 31, accounting for several thousand filings in this period. Non-December fiscal year filers are distributed throughout the year, producing a continuous daily stream.
Intended use: Bulk download and large-scale corpus processing. For single-document lookup, EDGAR full-text search or the EDGAR filing viewer is more efficient than bulk dataset access.
Does this dataset include 10-K/A amendments? Yes. 10-K/A filings are included as separate records, each with its own accession number and filing date. The original 10-K and all subsequent amendments for the same registrant and fiscal year share the same CIK and period-of-report date but differ in accession number. Users building point-in-time databases must define a version selection rule — original-as-filed, most-recently-amended, or all versions — and apply it consistently.
Are Form 20-F filings (foreign private issuers) included? No. Form 20-F is filed by foreign private issuers, a filer population that is largely mutually exclusive from the domestic 10-K population. For global coverage of EDGAR annual reports, a separate 20-F dataset is required.
What are the 10-K405 and 10-K405/A form types? These are legacy form types used through 2002. The "405" designation referred to a check box on the cover page concerning Item 405 of Regulation S-K, which requires disclosure of insiders who were delinquent in filing their Section 16 ownership reports. The content and item structure of 10-K405 filings are functionally identical to standard 10-K filings of the same era. They are included in this dataset under their original EDGAR form type identifiers.
What happened to Form 10-KSB? Form 10-KSB was retired by SEC rule effective February 4, 2008. Smaller reporting companies transitioned to Form 10-K with scaled disclosure accommodations for fiscal years beginning after December 15, 2007. Historical 10-KSB filings through approximately early 2009 are included in this dataset. Note that 10-KSB uses a different item numbering structure (Items 1–13) and the scaled financial statement rules of Regulation S-B (the predecessor of Article 8 of Regulation S-X), which differ from standard 10-K requirements.
How are Part III disclosures handled when incorporated by reference from the proxy statement? When a registrant incorporates Part III items (10–14) by reference from its DEF 14A proxy statement, the 10-K primary document contains only a cross-reference sentence — not the substantive director, compensation, or ownership disclosures. Those disclosures are in the DEF 14A, which is a separate EDGAR filing not included in this dataset. For research requiring full Part III data, users must cross-reference the corresponding DEF 14A by CIK and reporting period.
Does this dataset include structured financial data, or only the original documents? The dataset provides original filing documents in HTML and TXT format. For filings from approximately 2020–2021 onward (phased by filer category), inline XBRL embedded in the HTML provides machine-readable structured financial values. For earlier filings, financial values must be extracted from HTML tables or ASCII text using a parsing pipeline. No pre-normalized structured financial database is included — this dataset is the source material from which such databases are built.
Are debt-only registrants included? Yes. High-yield bond issuers and other companies with only registered debt securities — no publicly traded equity — incur Exchange Act reporting obligations and file Form 10-K. They are included in this dataset and are typically absent from all equity-focused databases (Compustat, CRSP, Bloomberg equity). Their annual reports typically contain the most detailed covenant and debt structure disclosures in the 10-K universe.
Does the dataset cover the new Item 1C Cybersecurity disclosures? Yes. Item 1C was required for fiscal years ending on or after December 15, 2023. Filings for fiscal year 2023 onward include this item where applicable. Filings before fiscal year 2023 do not contain Item 1C. New filings with cybersecurity disclosures are available within one business day of EDGAR acceptance due to the daily update cadence.
What is the XBRL coverage gap for 2009–2019 filings? From 2009 to approximately 2020–2021, EDGAR required XBRL data to be filed as a separate instance document (.xml file) rather than embedded in the primary HTML document. Those standalone XBRL files were attached as separate exhibits to the EDGAR submission — not part of the primary filing document. This dataset includes only the primary filing document, so standalone XBRL instance documents from the 2009–2019 era are excluded. For structured financial data extraction from that period, users must parse HTML table content directly from the primary document. From 2020 onward (large accelerated filers) and 2021 onward (all others), inline XBRL is embedded in the primary HTML document and is present in this dataset.
How does the 10-K filing season affect daily volume? For December 31 fiscal year end filers — the majority of U.S. public companies — the peak filing window is roughly February 1 through March 31 (60–90 days after fiscal year end, with 12b-25 extensions allowed). This produces several thousand new 10-K records during that window. Non-December fiscal year filers produce filings throughout the year. The daily update cadence ensures records are available within one business day of EDGAR acceptance regardless of filing season.