The Form 10-KSB Files Dataset is a closed historical corpus of small-business annual reports filed on EDGAR, covering every Form 10-KSB and Form 10-KSB/A submission accepted by the SEC from March 1, 1994 through the form's retirement in March 2009 and the trailing amendments that continue to arrive against pre-retirement fiscal years. Each record represents a single EDGAR accession — one annual report or one amendment — filed by a domestic "small business issuer" that qualified under Regulation S-B (revenues and public float both below $25 million) and elected the scaled reporting alternative to Form 10-K. Records are packaged as accession-numbered folders that bundle a normalized metadata.json manifest alongside the full set of SGML-wrapped documents from the original EDGAR submission — the primary 10-KSB body, exhibits, Sarbanes-Oxley certifications, correspondence, and supplemental binary attachments — with raster image binaries excluded but referenced. Folders are grouped into monthly ZIP containers and cover the full lifecycle of Regulation S-B, from the 1994 phase-in of mandatory EDGAR filing through the 2008–2009 wind-down that folded scaled disclosure into the "smaller reporting company" provisions of Form 10-K.
The dataset is an exhaustive collection of every Form 10-KSB and Form 10-KSB/A submission accepted by EDGAR across the form's life. Form 10-KSB was the small-business annual report form under Section 13 or 15(d) of the Securities Exchange Act of 1934, filed by registrants that qualified as "small business issuers" under Regulation S-B. It was an optional, scaled alternative to Form 10-K, with simplified disclosure instructions keyed to Regulation S-B rather than Regulation S-K. The form was accepted on EDGAR from the start of modern electronic filing in 1994 and was discontinued for fiscal years ending on or after December 15, 2007, with the final wave of originals clearing through the first quarter of 2009. Form 10-KSB/A is the amendment variant, used to restate, correct, or supplement a previously filed 10-KSB; amendments to pre-retirement periods continue to arrive on the 10-KSB/A form type rather than being refiled as 10-K/A.
The record unit is the filing (the EDGAR accession) — not the registrant, not the fiscal year, and not an individual exhibit. Each annual report is represented once, and each amendment of that annual report is a separate record with its own accession number and the 10KSB/A form type. Records are distributed as monthly ZIP containers named YYYY-MM.zip; decompressing a container yields a single YYYY-MM/ directory whose child folders each correspond to exactly one filing. Folder names are the 18-digit EDGAR accession number with dashes removed (e.g. accession 0001062993-09-000823 becomes folder 000106299309000823). The accession folder is the atomic record; every file inside it belongs to that one filing. File types found across the corpus are TXT, JSON, HTML, PDF, XFD (Xerox Formatted Document, a legacy EDGAR layout format), and FRM (form-data binary).
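The folder-naming rule above can be sketched in Python. This is a minimal illustration, not part of the dataset tooling: the function names are hypothetical, and the path layout `YYYY-MM/<accession folder>/<file>` is taken from the description above.

```python
import re
import zipfile
from collections import defaultdict


def accession_to_folder(accession_no: str) -> str:
    """Convert a dashed accession number into the dash-free folder name."""
    if not re.fullmatch(r"\d{10}-\d{2}-\d{6}", accession_no):
        raise ValueError(f"not a dashed accession number: {accession_no!r}")
    return accession_no.replace("-", "")


def folder_to_accession(folder: str) -> str:
    """Restore the dashes (10-2-6 digit grouping) from an 18-digit folder name."""
    if not re.fullmatch(r"\d{18}", folder):
        raise ValueError(f"not an 18-digit folder name: {folder!r}")
    return f"{folder[:10]}-{folder[10:12]}-{folder[12:]}"


def group_by_filing(member_names):
    """Group ZIP member paths 'YYYY-MM/<folder>/<file>' by accession folder."""
    filings = defaultdict(list)
    for name in member_names:
        parts = name.split("/")
        # Skip directory entries and anything not exactly three levels deep.
        if len(parts) == 3 and parts[2]:
            filings[parts[1]].append(parts[2])
    return dict(filings)


def list_filings(zip_path):
    """Return {accession folder: [file names]} for one monthly container."""
    with zipfile.ZipFile(zip_path) as zf:
        return group_by_filing(zf.namelist())
```

Calling `list_filings("2009-03.zip")` on a downloaded container would then give one dictionary key per filing, which matches the "accession folder is the atomic record" rule.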
Each accession folder is organized around two layers:
- metadata.json, a manifest that captures filing-level metadata derived from the EDGAR submission header.
- The submission documents, each preserved inside an EDGAR <DOCUMENT> SGML envelope. The primary 10-KSB (or 10-KSB/A) body is one of these documents; the remainder are exhibits, Sarbanes-Oxley certifications, and occasional binary supplements.
File naming inside the accession folder is not standardized: registrants submitted documents under whatever names they chose, so the primary body may appear as form10ksb.htm, <registrant>_10ksb.txt, ttii10ksb_a3.htm, or any analogous variant, and exhibits follow loose conventions such as ex31, ex32, or ex10. The only filename guaranteed in every record is metadata.json. File extensions found in the dataset are .txt, .htm/.html, .pdf, .xfd, and .frm, alongside the .json manifest. Raster images (.jpg, .gif) that were part of the original submission are deliberately omitted from the ZIP payload, though they remain enumerated in documentFormatFiles so their existence, size, and EDGAR URL stay discoverable.
metadata.json schema
The manifest is a single JSON object describing the filing as a whole. It carries the following fields:
- formType: "10KSB" or "10KSB/A".
- accessionNo: the dashed EDGAR accession number (e.g. "0001062993-09-000823").
- description: the human-readable form description from the EDGAR header, with [Amend] appended for amendments.
- filedAt: the filing timestamp as an ISO-8601 datetime with timezone offset.
- periodOfReport: the fiscal period end date in YYYY-MM-DD form.
- linkToFilingDetails: absolute URL of the primary document on sec.gov/Archives/edgar/....
- linkToTxt: absolute URL of the complete SGML submission bundle.
- linkToHtml: absolute URL of the EDGAR filing index page.
- linkToXbrl: empty string for every 10-KSB filing, since the form predates and was never within scope of the XBRL financial-reporting mandate.
- documentFormatFiles: an array of objects, one per item in the original EDGAR submission. Each entry carries sequence (the EDGAR ordering integer as a string; blank for the complete-submission rollup), size (byte size as a string), documentUrl (absolute EDGAR URL), description (human label such as "FORM 10KSB", "CERTIFICATION", or "SUPPLEMENTAL PDF"), and type (the EDGAR document-type code such as 10KSB, 10KSB/A, EX-31.1, EX-31.2, EX-32, EX-32.1, EX-32.2, EX-10.BB, EX-10.CC, GRAPHIC, CORRESP, or blank for the submission bundle). Entries typed GRAPHIC remain present for reference even though the underlying binary is excluded from the ZIP.
- dataFiles: an array reserved by EDGAR for structured data attachments (XBRL instance documents, R-files, exhibit datasets). Empty for every 10-KSB filing.
- entities: an array of filer-entity objects, typically of length one. Each entity carries companyName (with a trailing (Filer) role suffix), cik (ten-digit zero-padded), irsNo (EIN), fileNo (SEC file number, e.g. "000-12132"), filmNo (SEC film number), type (filing role code matching formType), sic (SIC code concatenated with its industry name, e.g. "1040 Gold and Silver Ores"), stateOfIncorporation (two-letter state or country code such as "WA", "NV", or "A1" for British Columbia; may be absent), fiscalYearEnd in MMDD form, act (Exchange Act number, usually "34"), and, when the registrant has a listed ticker, tickers (an array of ticker symbols).
- id: an opaque 32-character hexadecimal identifier assigned by the dataset publisher.
Every non-metadata file in the accession folder, whether .txt, .htm/.html, .pdf, .xfd, or .frm, is an EDGAR SGML-style document. The wrapper is identical across extensions:
    <DOCUMENT>
    <TYPE>10KSB/A
    <SEQUENCE>1
    <FILENAME>a5917242.txt
    <DESCRIPTION>MPM TECHNOLOGIES, INC. 10KSB/A
    <TEXT>
    ... body ...
    </TEXT>
    </DOCUMENT>
Header tags that consistently appear are <TYPE> (EDGAR document-type code, matching documentFormatFiles[].type), <SEQUENCE> (ordering integer matching documentFormatFiles[].sequence), <FILENAME> (original filename, matching the on-disk filename), and <DESCRIPTION> (human description; sometimes absent for graphics or bundles). The payload sits between <TEXT> and </TEXT>. For .htm artifacts the payload is a complete <HTML><HEAD>...<BODY>...</BODY></HTML> document; for .txt artifacts it is fixed-width ASCII, often with embedded <PAGE> markers delimiting the printed pages of the legacy paper form; for .pdf and other binary artifacts the payload is base64-encoded or uuencoded binary inside the same envelope. The SGML header sits outside the HTML document, so consumers who pass raw files directly to an HTML parser will see the <DOCUMENT>...<TEXT> preamble as leading noise; stripping the SGML wrapper before parsing is normally required.
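Stripping the wrapper can be sketched as follows. This is an illustrative parser under stated assumptions rather than an official one: the function name `parse_sgml_document` is hypothetical, and it assumes the header tags appear in upper case before the first `<TEXT>` tag, as in the sample above.

```python
import re

HEADER_TAGS = ("TYPE", "SEQUENCE", "FILENAME", "DESCRIPTION")


def parse_sgml_document(raw: str) -> dict:
    """Split one SGML-wrapped file into its header fields and <TEXT> payload."""
    header, sep, rest = raw.partition("<TEXT>")
    if not sep:
        raise ValueError("no <TEXT> tag found; not an SGML-wrapped document")

    fields = {}
    for tag in HEADER_TAGS:
        # Header values run from the tag to the end of the line.
        match = re.search(rf"<{tag}>([^\r\n<]*)", header)
        fields[tag.lower()] = match.group(1).strip() if match else None

    # Use the last </TEXT> so a payload that mentions the tag is not cut short.
    end = rest.rfind("</TEXT>")
    body = rest if end == -1 else rest[:end]
    fields["body"] = body.strip("\r\n")
    return fields
```

The returned `body` can then be handed to an HTML parser for .htm files or to a fixed-width table reader for .txt files. When reading files from disk, a permissive decoding such as `open(path, encoding="latin-1")` is a reasonable starting assumption for older ASCII-era filings, though the dataset description does not specify an encoding.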
The primary 10-KSB or 10-KSB/A document typically opens with the SEC letterhead block ("SECURITIES AND EXCHANGE COMMISSION / WASHINGTON, D.C. 20549 / FORM 10-KSB"), followed by the annual-vs-transition-report checkbox, the fiscal year covered, the Commission File Number, the registrant's exact legal name, its state or jurisdiction of incorporation, IRS EIN, principal executive office address and telephone, the Section 12(b)/12(g) registration statement listing titles and exchanges of registered securities, revenue for the most recent fiscal year, aggregate market value of voting and non-voting common equity held by non-affiliates, and the number of shares outstanding of each class as of a stated recent date. Many filings also include the documents-incorporated-by-reference notice on the cover page.
From there, the body follows the four-Part Regulation S-B structure:
Where the registrant elects to incorporate Part III by reference from a definitive proxy statement, Part III of the 10-KSB contains only a cross-reference note and the substantive compensation and governance content lives in a separate DEF 14A filing that is not part of this dataset. When the proxy is not timely filed, an amendment (10-KSB/A) typically supplies the Part III content directly.
Exhibits are submitted as separate files alongside the primary body and are enumerated in documentFormatFiles. The most common exhibit types encountered are:
- EX-31.1 / EX-31.2: Sarbanes-Oxley Section 302 certifications, signed individually by the CEO and the CFO, attesting to the accuracy and completeness of the annual report and to the effectiveness of disclosure controls and internal control over financial reporting.
- EX-32 / EX-32.1 / EX-32.2: Sarbanes-Oxley Section 906 certifications, attesting that the report fully complies with Exchange Act requirements and fairly presents the issuer's financial condition and results. Depending on the registrant's convention these appear as one combined EX-32 or as two individual files.
- EX-10.xx: material contracts, such as employment agreements, assignment agreements, loan agreements, license agreements, indemnification agreements, and similar operative documents.
- EX-21: subsidiaries of the registrant.
- EX-23: consents of independent registered public accounting firms.
- EX-14: code of ethics.
- CORRESP: correspondence with SEC staff, occasionally included in the submission.
- GRAPHIC: image attachments referenced in the submission; enumerated in metadata but excluded from the ZIP payload.
Supplemental PDF attachments (maps, product diagrams, scanned exhibits) are retained and appear in the accession folder with type equal to the parent form type or an appropriate exhibit code. Unlike raster images, PDFs are included in the ZIP.
A record contains the normalized metadata.json manifest and every non-image document from the original EDGAR submission, each preserved in its native SGML wrapper: the primary 10-KSB or 10-KSB/A body, every exhibit (Section 302 and Section 906 certifications, material contracts, subsidiary lists, auditor consents, and any other registrant-submitted exhibits), any correspondence items bundled with the submission, and supplemental binary attachments such as PDFs and legacy format files (.xfd, .frm).
Raster image files (.jpg, .gif, and equivalent graphic binaries) that were part of the original EDGAR submission are omitted from the ZIP payload, though their existence remains discoverable through documentFormatFiles entries of type GRAPHIC, which preserve the EDGAR URL, size, and description. There are no XBRL instance documents, schemas, linkbases, or R-files — not because they were stripped, but because Form 10-KSB was never subject to the XBRL financial-reporting mandate; linkToXbrl is an empty string and dataFiles is an empty array for every record. The EDGAR filing-index HTML page itself is not included in the record, though its URL is preserved in linkToHtml.
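Because the manifest inventories the original submission rather than the ZIP payload, a consumer may want to derive which files should actually be on disk. The sketch below is one way to do that. It assumes, without confirmation from the dataset description, that the on-disk filename equals the last path segment of documentUrl; the function names are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

PRIMARY_TYPES = {"10KSB", "10KSB/A"}


def expected_on_disk(manifest: dict) -> list[str]:
    """File names the manifest implies should exist in the accession folder.

    Skips GRAPHIC entries (excluded from the ZIP) and the complete-submission
    rollup, which is identified by its blank sequence.
    """
    names = []
    for entry in manifest.get("documentFormatFiles", []):
        if entry.get("type") == "GRAPHIC":
            continue
        if not entry.get("sequence", "").strip():
            continue
        # ASSUMPTION: on-disk name == last segment of the EDGAR URL path.
        names.append(PurePosixPath(urlparse(entry["documentUrl"]).path).name)
    return names


def primary_entry(manifest: dict):
    """Return the first entry typed 10KSB or 10KSB/A, or None if absent."""
    for entry in manifest.get("documentFormatFiles", []):
        if entry.get("type") in PRIMARY_TYPES:
            return entry
    return None
```

Since supplemental PDFs can also carry the parent form type, `primary_entry` returns only the first match; checking the file extension or the sequence value is a sensible extra guard.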
Form 10-KSB/A records are structurally identical to 10-KSB records: same folder layout, same manifest schema, same SGML-wrapped document set, same exhibit taxonomy. The differences are semantic and are carried in metadata and document-type codes. formType is "10KSB/A", description includes the [Amend] suffix, each entity's type field matches 10KSB/A, and the primary document's <TYPE> header is 10KSB/A. periodOfReport points back to the fiscal year end of the original annual report rather than to the amendment's filing date, which makes it straightforward to link amendments to the underlying reporting period.
In content, an amendment may be a full re-submission with all Parts reproduced, or a targeted amendment replacing only a specific Part or Item — most commonly Part III executive-compensation and governance disclosures when the registrant intended to incorporate Part III by reference from a proxy statement that was never timely filed, or Part II Item 7 financial statements following a restatement. Exhibits and certifications are typically re-filed with the amendment, and the primary body usually opens with an explanatory note identifying the original accession, the reason for the amendment, and the scope of changes. Multiple amendments to the same original can coexist and are distinguished by filing date and accession number; filenames often encode the amendment ordinal (e.g. ttii10ksb_a3.htm for a third amendment). Amendments do not replace originals in the dataset: both the original 10-KSB and any 10-KSB/A amendments coexist as separate records, and downstream uses that want a single "final" view of an annual report must collapse them on (cik, periodOfReport) using filedAt ordering.
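The collapse on (cik, periodOfReport) using filedAt ordering might look like the sketch below. It is a simplified illustration: the function name is hypothetical, and it takes the first element of entities as the filer, which holds only for the typical single-entity record.

```python
from datetime import datetime


def latest_per_period(manifests):
    """Keep the most recently filed record for each (cik, periodOfReport)."""
    latest = {}
    for m in manifests:
        # ASSUMPTION: the first entity is the filer (entities is usually length one).
        cik = m["entities"][0]["cik"]
        key = (cik, m["periodOfReport"])
        filed = datetime.fromisoformat(m["filedAt"])
        if key not in latest or filed > latest[key][0]:
            latest[key] = (filed, m)
    return {key: m for key, (_, m) in latest.items()}
```

Note that the latest record is not necessarily a complete report: a targeted amendment may replace only Part III or a single Item, so a true "final" view may require merging the amendment into the original rather than substituting it.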
Regulation S-B remained comparatively stable across the form's life, but several rule changes altered required content during the period the dataset covers:
- Sarbanes-Oxley certifications: filings that predate the Sarbanes-Oxley Act of 2002 carry no EX-31.x and EX-32.x exhibits, while post-2003 filings carry both.
Form 10-KSB was filed electronically on EDGAR from the form's inception through its retirement, and presentation evolved across that window:
- ASCII era: earlier filings are predominantly plain text (.txt) with embedded <PAGE> markers inside the SGML <TEXT> payload, reflecting the paginated typography of the paper form. Tables are rendered with spaces and rules; financial statements appear as ASCII columns. Legacy layout artifacts such as .xfd (Xerox Formatted Document) and .frm occasionally accompany these text filings.
- HTML era: later filings shift to HTML (.htm/.html), producing richer typography, true tables, and embedded styles inside the same SGML <DOCUMENT> wrapper. By the final years of the form's life, HTML is dominant and ASCII submissions are the minority.
Regardless of format, the EDGAR SGML envelope (<DOCUMENT> / <TYPE> / <SEQUENCE> / <FILENAME> / <DESCRIPTION> / <TEXT>) is uniformly present across both eras, giving records a consistent extraction surface.
Several nuances matter for working with records:
- The documentFormatFiles array is an authoritative inventory of the original EDGAR submission, not of the ZIP payload. Entries typed GRAPHIC remain present even though the files themselves are excluded; consumers should filter by type or extension when iterating on-disk.
- Identify the primary document by matching on documentFormatFiles[].type equalling 10KSB or 10KSB/A, or on the <TYPE> tag of the SGML header, rather than on filename heuristics.
- periodOfReport is the cleanest field for aligning records by fiscal year; filedAt reflects filing date and can lag the period end by several months, especially for amendments.
- When tickers are present in entities[].tickers, they reflect the registrant's listing at the time of filing; historical ticker changes and delistings are not reconciled within the record.
- Joining on entities[].cik and periodOfReport is the reliable way to assemble a registrant timeline.
Each record is an annual report (or amendment) filed on EDGAR by the registrant itself: a U.S. or Canadian domestic issuer that qualified as a "small business issuer" under Regulation S-B and elected the scaled annual-report form instead of Form 10-K. The filer is a public reporting company whose securities are registered under Section 12 of the Securities Exchange Act of 1934 or that is subject to Section 15(d) periodic reporting. Each accession number maps to one issuer's fiscal-year filing; records on form type 10-KSB/A are amendments filed by the same registrant.
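Eligibility was set by Item 10(a)(1) of Regulation S-B (17 CFR 228.10). A filer had to meet all of the following at the time of filing:
- be a U.S. or Canadian issuer;
- have annual revenues under $25 million;
- have a public float under $25 million;
- not be an investment company or an asset-backed issuer.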
A parent qualified only if its consolidated revenues and public float both met the thresholds. A company that crossed a threshold could continue using the small business forms for a one-year transition before moving to Form 10-K.
The practical filer population was dominated by micro- and nano-cap issuers: early-stage and development-stage operating companies, community and regional bank holding companies, former and current shell companies (often post-reverse-merger), exploration-stage mining and resource companies, small biotechs, and thinly traded OTC issuers.
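Form 10-KSB implemented the annual reporting obligation under:
- Section 13(a) of the Securities Exchange Act of 1934, for issuers with securities registered under Section 12;
- Section 15(d) of the Exchange Act, for issuers subject to periodic reporting under that section.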
Implementing rules were Rule 13a-1 and Rule 15d-1. Content requirements came from Regulation S-B (17 CFR Part 228), the SEC's integrated small-business disclosure system adopted in 1992 (Release No. 33-6949). Regulation S-B paralleled Regulation S-K but required less — fewer years of selected financial data, simplified executive compensation, narrower MD&A, and, under Item 310, two years of audited financial statements rather than three.
The triggering event was the close of the registrant's fiscal year. The report was due 90 calendar days after fiscal year end, per the form's General Instructions and Rules 13a-1 / 15d-1. Small business issuers were not subject to the 75- or 60-day accelerated-filer deadlines adopted in 2002 (Release No. 33-8128); they sat below those public-float thresholds by definition and kept the 90-day window throughout the form's life.
A Rule 12b-25 extension of up to 15 calendar days was available via a Form NT 10-KSB notice, pushing the effective deadline to day 105 after fiscal year end. NT 10-KSB filings are not included in this dataset.
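The deadline arithmetic above (90 calendar days, plus 15 under Rule 12b-25) can be worked through in a few lines. This is a simplified sketch with hypothetical function names; it counts calendar days only and ignores any roll-forward when a deadline falls on a weekend or holiday.

```python
from datetime import date, datetime, timedelta


def ksb_deadlines(fiscal_year_end: date) -> tuple[date, date]:
    """Return (day-90 due date, day-105 extended due date) for a fiscal year end."""
    due = fiscal_year_end + timedelta(days=90)
    extended = due + timedelta(days=15)  # Rule 12b-25 extension
    return due, extended


def filing_lag_days(period_of_report: str, filed_at: str) -> int:
    """Calendar days between periodOfReport and the date part of filedAt."""
    period = date.fromisoformat(period_of_report)
    filed = datetime.fromisoformat(filed_at).date()
    return (filed - period).days
```

For a fiscal year ending December 31, 2006, `ksb_deadlines(date(2006, 12, 31))` gives March 31, 2007 and April 15, 2007. Comparing `filing_lag_days` against 90 and 105 is a quick way to spot originals that were probably preceded by an NT 10-KSB notice, which is not itself in the dataset.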
Form 10-KSB was always an election. A qualifying issuer could file Form 10-K instead on any given year, and some did — typically to signal fuller disclosure to investors, lenders, or acquirers, or while preparing to exit the small-business regime. The dataset therefore contains only annual reports where the filer actually chose 10-KSB, not every annual report by every 10-KSB-eligible company.
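Records with form type 10-KSB/A are amendments to a previously filed Form 10-KSB. Amendments had no fixed deadline and were filed whenever the original report needed correction or supplementation. Common triggers:
- a restatement of the Part II Item 7 financial statements;
- Part III information supplied directly because the proxy statement it was to be incorporated from was not timely filed;
- corrections to the original report or re-filed exhibits and certifications;
- responses to SEC staff comments.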
Amendments rest on the same Exchange Act authority as the original filing and do not restart the 90-day clock.
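Release No. 33-8876 ("Smaller Reporting Company Regulatory Relief and Simplification," adopted December 19, 2007; effective February 4, 2008) restructured the regime. It:
- created the "smaller reporting company" (SRC) category;
- moved the scaled disclosure accommodations into Form 10-K for issuers meeting the SRC thresholds;
- phased out the S-B forms, including Form 10-KSB, Form 10-QSB, and Form SB-2.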
Under the transition, no Form 10-KSB could be filed for a fiscal year ending on or after October 31, 2008, and the form was formally retired effective March 15, 2009. Qualifying issuers moved to Form 10-K with SRC scaled accommodations.
The dataset spans EDGAR submissions from March 1, 1994 — coinciding with the phase-in of mandatory EDGAR filing — through the form's 2009 wind-down. A tail of late 10-KSB/A amendments to pre-cutoff fiscal years continues to appear after March 2009, because amendments to an original 10-KSB remained on the 10-KSB/A form type rather than being refiled as 10-K/A. No original 10-KSB was accepted for fiscal years ending on or after October 31, 2008.
Form 10-KSB sits at the intersection of domestic annual reporting and the retired Regulation S-B small-business regime. The most useful comparisons are to its full-disclosure counterpart (10-K), its post-2009 functional successor (10-K with Smaller Reporting Company accommodations), its amendment variant (10-KSB/A), its quarterly sibling (10-QSB), and its registration companion (SB-2).
Form 10-K
Same statutory purpose (Section 13/15(d) annual report for domestic issuers) but filed under Regulation S-K rather than the simplified Regulation S-B. A 10-K from the same period is longer, requires three years of audited income statements and cash flows (versus two for 10-KSB), carries more extensive segment and market-risk disclosure, and uses a different item taxonomy. The 10-K corpus is continuous 1994-present and far larger; the 10-KSB corpus is closed. Use 10-KSB data when the small-issuer tail from 1994-2009 must be represented.
Form 10-K with Smaller Reporting Company (SRC) accommodations
Effective March 15, 2009, the SEC retired 10-KSB and moved scaled accommodations into Form 10-K for issuers meeting the SRC thresholds. These successor filings sit in the 10-K dataset, not this one. Content is functionally similar (reduced compensation tables, two years of income-statement history, lighter market-risk disclosure), but SRC 10-Ks use 10-K item numbering and carry iXBRL financial exhibits, while 10-KSB filings use S-B numbering and have no XBRL at all. Any study of small-issuer annual reporting across the 2009 boundary must union this dataset with SRC-flagged 10-Ks.
Form 10-KSB/A
Included in this dataset alongside originals. The /A suffix marks restatements, corrections, late Part III information, or responses to SEC comments. Unlike original 10-KSBs — not accepted after March 2009 — 10-KSB/A filings continue to arrive when issuers amend pre-2009 reports. Treat them as a correction layer over the underlying filing; many amendments replace only specific items or exhibits rather than restating the full report.
Form 10-QSB
Quarterly counterpart under Regulation S-B, retired on the same March 2009 timeline. Same filer population and disclosure regime, but covers three months, is unaudited, and omits the full business description, risk factors, and governance disclosures. For a given issuer, Form 10-QSB filings are the three interim updates within the fiscal year anchored by the 10-KSB capstone. Natural complement when reconstructing a small-issuer filing timeline.
Form SB-2
The S-B registration statement for small-issuer securities offerings, also retired in 2008. Transactional rather than periodic: shares S-B disclosure vocabulary and audited-financials content with 10-KSB, but adds offering-specific sections (use of proceeds, plan of distribution, underwriting) and is filed at the point of an offering rather than annually. Useful paired with 10-KSB when tracking an issuer's full disclosure history, not as a substitute. (See Form SB-2.)
This dataset is distinct in four concrete ways: it is a closed original-filing corpus (post-March-2009 originals do not exist, though /A amendments still arrive); it is the only dataset scoped to the Regulation S-B disclosure vocabulary and item numbering; it is the authoritative source for the small-business issuer tail from 1994-2009 that is otherwise absent from 10-K data; and it is pre-XBRL, so financial-figure extraction must come from narrative text and tables, not tagged instance documents. When any of these attributes matter, none of the comparison datasets above substitutes for it.
Because 10-KSB covered issuers below the $25 million revenue and public-float thresholds, users are professionals working on micro-cap operating companies, shell vehicles, development-stage issuers, and pre-reverse-merger entities.
Reconstruct pre-2009 operating histories of issuers that graduated to 10-K, were acquired, reverse-merged, or went dark. Primary value: MD&A, going-concern language, scaled financials, Exhibit 21 subsidiary lists, and metadata linking former CIKs and names.
Treat the 1994-2009 10-KSB population as a natural-experiment cohort for scaled-disclosure, audit-selection, and reverse-merger studies. Use the full corpus plus metadata (CIK, SIC, period) and Section 302/906 certifications to build panels on disclosure quality, restatements, and internal-control representations.
Reconstruct fact patterns for issuers later subject to enforcement, receivership, or trustee work. Focus on auditor changes, missing certifications, going concern paragraphs, related-party disclosures, and the 10-KSB/A amendment trail. Exhibit 21 maps affiliate structures; metadata timestamps anchor investigative timelines.
Link modern registrants back to pre-2009 shells via reverse merger, name change, or Rule 12g-3 succession. Use the primary 10-KSB for historical risk factors, capitalization, and litigation; metadata for former CIKs and names; amendments for corrections bearing on historical representations. Supports due-diligence letters, successor-liability analysis, and SEC comment-letter responses.
Extend fundamental panels backward for issuers too small to appear in mainstream databases. Extract line items, auditor identity, and MD&A features from the primary document; use accession and period dates for point-in-time alignment; use 10-KSB/A history to avoid look-ahead bias around restatements.
Inspect the full filing history of candidate shells and their predecessors. Pull every 10-KSB and 10-KSB/A under a CIK, read going-concern and liquidity sections, and check certification continuity and dormant-subsidiary disclosures. Outputs feed red-flag memos and negotiation of reps, warranties, and indemnities. (For background on de-SPAC transactions, see the SEC's SPAC-related materials.)
Reconstruct capital structure and covenant history for defaulted micro-cap debt, legacy private placements, and abandoned securities whose issuers never filed a full 10-K. Use debt notes, liquidity discussion, and contingent-liability disclosures to evaluate recovery claims.
Integrate 10-KSB into CIK-keyed historical filing inventories. Use metadata.json to normalize CIK, filer name, filing date, period, and document inventory; ingest all submission files to support downstream text parsing, table extraction, and amendment diffing.
Use 10-KSB as a distinct training domain: MD&A, risk factors, and going-concern language written under scaled disclosure differ stylistically from large-issuer 10-Ks. 10-KSB/A amendment pairs supply supervised data for redline, change-detection, and restatement-classification tasks.
Study Regulation S-B in practice and the 2009 transition to scaled disclosure inside Form 10-K. Use metadata for cohort construction across the form's lifecycle and exhibits for qualitative analysis of how small issuers implemented scaled requirements.
Concrete workflows the Form 10-KSB Files Dataset supports, grounded in its metadata manifest, SGML-wrapped primary body, exhibit inventory, and closed 1994-2009 scope.
Pull every accession under a target CIK, sort by periodOfReport, and read the primary 10-KSB body for capitalization, authorized-share changes, and legal-proceedings history. Cross-reference entities[].companyName and fileNo across years to surface name changes and Rule 12g-3 successions, and inspect EX-21 subsidiary lists for dormant affiliates that survived a reverse merger. Output: a predecessor-tracing memo with historical reps tied to specific accessions.
Filter records where formType equals 10KSB/A and group by (entities[].cik, periodOfReport) to isolate amendments against each annual period. Parse the explanatory note at the top of each amended primary body to classify the amendment as a restatement, a Part III insertion, or an exhibit re-file. Output: a labeled corpus of amendment events usable for look-ahead-bias correction in factor research and for supervised restatement classification.
For each record, scan documentFormatFiles[].type for EX-31.1, EX-31.2, EX-32, EX-32.1, and EX-32.2, and join to filedAt and periodOfReport. Flag filings after the August 2002 Sarbanes-Oxley window that are missing expected certifications, and extract CEO/CFO signer names from the exhibit bodies. Output: a compliance-coverage table feeding fraud-risk screens and academic studies of SOX adoption among small issuers.
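A minimal sketch of the certification scan follows. The function name and the output keys are hypothetical, and the cutoff date is left as a required argument because the right boundary for "expected" certifications is an analytical choice rather than something the manifest states.

```python
from datetime import date, datetime


def certification_coverage(manifest: dict, cutoff: date) -> dict:
    """Summarise Section 302 / 906 exhibit presence for one filing."""
    types = [e.get("type", "") for e in manifest.get("documentFormatFiles", [])]
    has_302 = any(t.startswith("EX-31") for t in types)  # EX-31, EX-31.1, EX-31.2
    has_906 = any(t.startswith("EX-32") for t in types)  # EX-32, EX-32.1, EX-32.2
    filed_on = datetime.fromisoformat(manifest["filedAt"]).date()
    return {
        "accessionNo": manifest["accessionNo"],
        "periodOfReport": manifest["periodOfReport"],
        "filedOn": filed_on,
        "has302": has_302,
        "has906": has_906,
        # Flag only filings after the caller-chosen cutoff that lack either set.
        "flagged": filed_on > cutoff and not (has_302 and has_906),
    }
```

Targeted amendments that re-file only part of a report may legitimately omit some exhibits, so a flagged 10-KSB/A deserves a look at its explanatory note before it is treated as a gap. Signer names are not in the manifest and would still have to be extracted from the exhibit bodies.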
Extract the auditor's report block and any going-concern paragraph from the primary 10-KSB body, and capture auditor name changes across consecutive fiscal years for the same CIK. Pair with EX-23 consents to confirm the signing firm. Output: an auditor-tenure and going-concern flag series covering issuers too small to appear in Audit Analytics or mainstream fundamentals databases.
Iterate accession folders, locate the file whose documentFormatFiles[].type equals EX-21, strip the SGML wrapper, and parse the subsidiary list into (parent_cik, subsidiary_name, jurisdiction) triples. Chain across years per CIK to detect subsidiary additions, disposals, and shell-entity accumulation. Output: an affiliate graph used in forensic fact reconstruction and de-SPAC diligence on candidate shells.
Use the primary body of each record as a training document for language models targeting small-issuer disclosure style, which differs from Regulation S-K drafting conventions. Construct amendment-original document pairs on (cik, periodOfReport) for redline and change-detection supervision, and mine ASCII-era filings (pre-HTML migration) to build robust parsers for fixed-width <PAGE>-delimited financial tables. Output: a domain-specific tokenizer, an MD&A sentiment model tuned to scaled disclosure, and a supervised amendment-diff dataset.
Because linkToXbrl is empty and dataFiles is empty for every record, financial figures must be pulled from the primary body's narrative tables. Run layout-aware extraction over balance sheet, income statement, and cash-flow sections, tag each figure with periodOfReport and filedAt for point-in-time alignment, and use entities[].sic to stratify by industry. Output: a back-filled fundamentals panel for 1994-2009 micro-caps suitable for factor backtests that would otherwise suffer survivorship bias from mainstream-database coverage gaps.
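Stratifying by industry first requires splitting the combined sic string. The sketch below shows one way to do that, assuming the format shown in the schema (code, a space, then the industry name); the function names are hypothetical, and grouping on the first two digits of the code is an illustrative choice.

```python
from collections import Counter


def split_sic(sic: str) -> tuple[str, str]:
    """Split '1040 Gold and Silver Ores' into ('1040', 'Gold and Silver Ores')."""
    code, _, name = sic.strip().partition(" ")
    return code, name.strip()


def sic_major_group_counts(manifests) -> Counter:
    """Count filer entities per two-digit SIC prefix across manifests."""
    counts = Counter()
    for m in manifests:
        for entity in m.get("entities", []):
            sic = entity.get("sic")
            if sic:  # sic may be missing or empty
                code, _ = split_sic(sic)
                counts[code[:2]] += 1
    return counts
```

The same split gives a stable numeric key for joining extracted figures to an industry dimension, independent of how the industry name is worded.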
The Form 10KSB Files Dataset can be accessed in three ways: a metadata index endpoint, a full dataset archive download, and per-container downloads for individual monthly ZIPs.
Dataset Index JSON API: https://api.sec-api.io/datasets/form-10ksb-files.json
Returns dataset-level metadata (name, description, last updated timestamp, earliest sample date, total records, total size, form types, container format, file types), the full archive download URL, and the list of all container files with per-container key, size, records, updatedAt, and downloadUrl. Use this endpoint to discover which monthly containers were refreshed in the latest run and decide which ones to re-download. This endpoint does not require an API key.
Example response:
    {
      "datasetId": "1f13365b-9ae0-68fe-b307-8252739128cf",
      "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-10ksb-files.zip",
      "name": "Form 10KSB Files Dataset",
      "updatedAt": "2026-04-14T14:16:59.827Z",
      "earliestSampleDate": "1996-01-01",
      "totalRecords": 202195,
      "totalSize": 3048147976,
      "formTypes": ["10-KSB", "10-KSB/A"],
      "containerFormat": "ZIP",
      "fileTypes": ["TXT", "HTML"],
      "containers": [
        {
          "downloadUrl": "https://api.sec-api.io/datasets/form-10ksb-files/2009/2009-03.zip",
          "key": "2009/2009-03.zip",
          "size": 42183920,
          "records": 1184,
          "updatedAt": "2026-04-14T14:16:59.827Z"
        }
      ]
    }
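Deciding which containers to re-download comes down to comparing each container's updatedAt against the time of the last local sync. A minimal sketch, assuming the index JSON has already been loaded into a dictionary with the shape shown above (the function names are hypothetical):

```python
from datetime import datetime, timezone


def parse_ts(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp such as '2026-04-14T14:16:59.827Z'."""
    # datetime.fromisoformat() before Python 3.11 rejects a trailing 'Z'.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def containers_updated_since(index: dict, since: datetime) -> list[dict]:
    """Return the container entries refreshed after `since` (timezone-aware)."""
    return [
        c for c in index.get("containers", [])
        if parse_ts(c["updatedAt"]) > since
    ]
```

With `since = datetime(2026, 1, 1, tzinfo=timezone.utc)`, the example response above would yield its single container, whose downloadUrl can then be fetched with an API key.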
Download Entire Dataset: https://api.sec-api.io/datasets/form-10ksb-files.zip?token=YOUR_API_KEY
Downloads the full dataset as a single ZIP archive containing every monthly container. This endpoint requires an SEC API key.
Download Single Container: https://api.sec-api.io/datasets/form-10ksb-files/2009/2009-03.zip?token=YOUR_API_KEY
Downloads one monthly container ZIP listed in the containers[].downloadUrl field of the index JSON. Use this for incremental updates or when only a specific time window is needed. This endpoint requires an SEC API key.
Authentication: Pass your SEC API key either via the Authorization header or the token query parameter.
    # Fetch the index JSON (no API key required)
    curl -o form-10ksb-files.json \
      https://api.sec-api.io/datasets/form-10ksb-files.json

    # Download the full archive using the Authorization header
    curl -H "Authorization: YOUR_API_KEY" \
      -o form-10ksb-files.zip \
      https://api.sec-api.io/datasets/form-10ksb-files.zip

    # Download a single monthly container using the token query parameter
    wget -O 2009-03.zip \
      "https://api.sec-api.io/datasets/form-10ksb-files/2009/2009-03.zip?token=YOUR_API_KEY"
The dataset covers Form 10-KSB (the original small-business annual report under Regulation S-B) and Form 10-KSB/A (its amendment variant). Both form types satisfy the Section 13(a) or 15(d) annual reporting obligation of the Securities Exchange Act of 1934 for issuers that qualified as small business issuers.
One record is a single Form 10-KSB or Form 10-KSB/A submission to EDGAR, packaged as an accession-numbered folder that holds a normalized metadata.json descriptor plus the full set of SGML-wrapped documents from the original EDGAR submission. The record unit is the filing (the EDGAR accession) — not the registrant, not the fiscal year, and not an individual exhibit.
Form 10-KSB was an optional election available to U.S. or Canadian domestic issuers with revenues and public float both under $25 million that were not investment companies or asset-backed issuers, as specified by Item 10(a)(1) of Regulation S-B. Qualifying issuers could file Form 10-K instead on any given year, so the dataset contains only annual reports where the filer actually chose 10-KSB.
The dataset spans EDGAR submissions from March 1, 1994 — coinciding with the phase-in of mandatory EDGAR filing — through the form's retirement effective March 15, 2009. A tail of 10-KSB/A amendments to pre-cutoff fiscal years continues to appear after March 2009, because amendments to an original 10-KSB remain on the 10-KSB/A form type rather than being refiled as 10-K/A.
Records are grouped into monthly ZIP containers named YYYY-MM.zip, each containing one accession folder per filing. Inside each folder, file types include JSON (the metadata.json manifest), TXT, HTML, PDF, XFD (Xerox Formatted Document), and FRM, with every non-metadata file wrapped in an EDGAR <DOCUMENT> SGML envelope.
The dataset contains no XBRL data. Form 10-KSB was never subject to the XBRL financial-reporting mandate, so linkToXbrl is an empty string and dataFiles is an empty array for every record. Financial figures must be extracted from the narrative tables inside the primary 10-KSB body, typically using layout-aware parsing over ASCII-era fixed-width tables or HTML tables in later filings.
The 10-KSB corpus is the authoritative source for the small-business issuer tail from 1994-2009 that used Regulation S-B item numbering, two years of audited financial history, and simplified executive compensation and MD&A requirements — none of which appear in standard 10-K filings. Post-March-2009 scaled-disclosure annual reports are filed as 10-K with "smaller reporting company" accommodations and sit in the 10-K dataset, not this one; any analysis that crosses the 2009 boundary must union both datasets.