The Form SB-2 Files Dataset is a closed archive of every Securities Act of 1933 registration statement on Form SB-2 and every amendment on Form SB-2/A submitted to EDGAR by "small business issuers" under former Regulation S-B. Each record is a single EDGAR submission — one accession-number folder containing a structured metadata.json plus every document the registrant filed with the SEC, except binary image files. The dataset spans April 1995 (the start of mandatory EDGAR filing) through the form's 2008 rescission, with later wind-down amendments tied to pre-rescission registration statements. Filers are issuers of the securities being registered: micro-cap companies, shell vehicles, recent reverse-merger entities, and resource-exploration startups that qualified for the Regulation S-B scaled-disclosure regime. The corpus is distributed as monthly ZIP containers organized by year and covers form types SB-2 and SB-2/A.
The dataset packages the complete EDGAR submission for every SB-2 and SB-2/A accession across the form's regulatory lifespan. Form SB-2 was the long-form Securities Act registration statement for small business issuers under former Regulation S-B, the scaled-disclosure regime that applied to companies with revenues and public float each below the $25 million threshold. Functionally it served the same purpose as Form S-1 — registering securities for sale to the public — but with reduced obligations: shorter selected financial data, two years of audited financial statements rather than three, simpler executive compensation discussion, and abbreviated business and MD&A requirements. The form was promulgated alongside Regulation S-B in 1992, used continuously after EDGAR filing began in April 1995, and rescinded effective in 2008 when the SEC eliminated the "small business issuer" category and replaced it with the "smaller reporting company" framework folded into Regulation S-K.
Every record bundles three layers in one place: a structured JSON manifest describing the filing and the filer, the prospectus and exhibit content of the registration statement itself, and the SGML/EDGAR header that wraps every document body. The dataset preserves the underlying filing as filed — SGML envelopes intact, filer-controlled filenames unchanged, and exhibit ordering as submitted — so it functions as a source-of-truth bundle rather than an extracted slice. Containers are monthly ZIP files (YYYY/YYYY-MM.zip) covering form types SB-2 and SB-2/A, with file payloads in TXT, JSON, HTML, XFD, PDF, and FRM formats.
One record is a single EDGAR submission of either Form SB-2 (an initial small-business issuer registration statement under the Securities Act of 1933) or Form SB-2/A (a pre-effectiveness or post-effectiveness amendment to a previously filed SB-2). On disk, a record is one accession-number subdirectory within a monthly ZIP container. The subdirectory is named with the 18-digit EDGAR accession number with dashes stripped (for example, 0001144204-08-006062 becomes 000114420408006062). Inside that folder sit a single metadata.json plus every document the registrant submitted to EDGAR for that accession, with the sole exception of binary image files.
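The folder-naming rule can be illustrated with a short Python sketch. The helper names `accession_to_folder` and `folder_to_accession` are hypothetical and not part of the dataset or any API; the only assumption taken from the text is the 10-2-6 digit layout of the dashed accession number.

```python
import re


def accession_to_folder(accession_no: str) -> str:
    """Turn a dashed accession number into the dash-free folder name."""
    if not re.fullmatch(r"\d{10}-\d{2}-\d{6}", accession_no):
        raise ValueError(f"unexpected accession format: {accession_no!r}")
    return accession_no.replace("-", "")


def folder_to_accession(folder: str) -> str:
    """Restore the dashed form (10-2-6 digits) from an 18-digit folder name."""
    if not re.fullmatch(r"\d{18}", folder):
        raise ValueError(f"unexpected folder name: {folder!r}")
    return f"{folder[:10]}-{folder[10:12]}-{folder[12:]}"


if __name__ == "__main__":
    print(accession_to_folder("0001144204-08-006062"))
    print(folder_to_accession("000114420408006062"))
```

The reverse mapping is convenient when joining on-disk folders back to the accessionNo field in metadata.json.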
The underlying document is a prospectus-centric registration statement. It opens with the EDGAR/SGML cover and the form facing page (registrant name, state of incorporation, IRS number, primary SIC code, principal executive offices, agent for service, and the calculation-of-registration-fee table). It then steps through the prospectus proper, which under Regulation S-B item-numbered scaled disclosure typically contains: prospectus cover and outside back cover; prospectus summary; risk factors; cautionary language regarding forward-looking statements; use of proceeds; determination of offering price; dilution; selling shareholders table (where resale is contemplated); plan of distribution; description of securities to be registered; interests of named experts and counsel; description of business; description of property; legal proceedings; market for common equity and related shareholder matters; management's discussion and analysis or, for issuers without revenue history, the SB-2-specific shorter "plan of operation"; changes in and disagreements with accountants; directors, executive officers, promoters and control persons; executive compensation; security ownership of certain beneficial owners and management; certain relationships and related transactions; and the audited financial statements with notes. After the prospectus, Part II of the registration statement contains indemnification of directors and officers, recent sales of unregistered securities, the exhibit index, undertakings, and the signature block executed by the registrant, principal executive officer, principal financial officer, principal accounting officer, and a majority of the directors.
Each accession folder contains exactly one metadata.json and a flat set of document files; there are no nested subfolders. The metadata.json is the structured anchor: it enumerates every document the registrant filed under the accession, identifies the filer entity, and carries the form type, filed-at timestamp, accession number, and SEC file number. The companion document files carry the prospectus and exhibit content. Every document file — whether named with .htm, .html, .txt, .frm, .xfd, or .pdf — is wrapped in EDGAR's SGML <DOCUMENT> envelope, so each file begins with the pattern
<DOCUMENT>
<TYPE>SB-2
<SEQUENCE>1
<FILENAME>...
<DESCRIPTION>...
<TEXT>
... document body ...
</TEXT>
</DOCUMENT>
The <TYPE> tag holds the EDGAR document classifier (SB-2, SB-2/A, EX-3.1, EX-5.1, EX-10.3, EX-23.1, EX-24.1, EX-99, CORRESP, etc.), and <SEQUENCE>1 is reserved for the main registration statement. Inside <TEXT>, the body is either a full <html>…</html> document (often emitted by filing-agent tooling such as Vintage Filings's EDGARizer, RDG, or Donnelley, with embedded inline CSS and styled tables) or a legacy fixed-width ASCII layout using EDGAR's <TABLE>/<S>/<C> financial-table markers to delimit columnar financial data.
metadata.json
metadata.json mirrors the SEC-API filing-object shape and exposes the following top-level fields:
- id: opaque hex hash identifier, stable per filing.
- accessionNo: canonical EDGAR accession number with dashes (e.g., 0001144204-08-006062).
- formType: SB-2 or SB-2/A.
- description: human-readable form description.
- filedAt: ISO 8601 timestamp with timezone (Eastern, matching EDGAR acceptance).
- linkToFilingDetails: absolute URL of the primary registration statement on EDGAR.
- linkToTxt: URL of the consolidated SGML submission .txt on EDGAR.
- linkToHtml: URL of the EDGAR accession -index.htm page.
- linkToXbrl: empty for this form type; SB-2 predates XBRL applicability and was rescinded before any phase-in could reach it.
- documentFormatFiles: array of every document referenced by the submission.
- dataFiles: empty across SB-2 records (no XBRL or financial-data files were ever attached to the form).
- entities: array of registrant/filer records.
Each documentFormatFiles[] element carries sequence (numeric string; the synthetic "Complete submission text file" entry uses a blank sequence), type (the EDGAR document classifier), documentUrl (absolute EDGAR URL), size (bytes as a string), and an optional description (e.g., "OPINION OF QUARLES & BRADY LLP AS TO THE LEGALITY OF SECURITIES BEING REGISTERED", "GRAPHIC", "Complete submission text file").
Each entities[] element carries companyName (display name with parenthesized role suffix such as "(Filer)"), cik (10-digit zero-padded), irsNo (9-digit EIN, or "000000000" if not supplied), fileNo (the Securities Act registration file number, characteristically prefixed 333-), filmNo (8-digit EDGAR film number), act (always "33" because SB-2 is a Securities Act registration), type (form-type echo), sic (SIC code plus label, sometimes omitted), stateOfIncorporation (two-letter code), fiscalYearEnd (MMDD; sometimes omitted for shell companies), and tickers (often empty because SB-2 issuers were frequently not yet trading at the time of registration).
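As a rough illustration of the field layout described above, the following Python sketch loads one metadata.json and reduces it to the values most workflows start from. The function names `load_metadata` and `summarize` are hypothetical; the field names come from the description above, and the handling of blank or missing values is an assumption rather than documented behavior.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def load_metadata(record_dir: str) -> Dict[str, Any]:
    """Read metadata.json from one accession folder."""
    path = Path(record_dir) / "metadata.json"
    return json.loads(path.read_text(encoding="utf-8"))


def _to_int(value: Any) -> Optional[int]:
    """Numeric strings become ints; blank or missing values become None."""
    text = str(value or "").strip()
    return int(text) if text.isdigit() else None


def summarize(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a metadata record to identifiers, documents, and filers."""
    documents = [
        {
            # A blank sequence marks the synthetic complete-submission entry.
            "sequence": _to_int(d.get("sequence")),
            "type": d.get("type", ""),
            "size": _to_int(d.get("size")),
            "description": d.get("description", ""),
        }
        for d in meta.get("documentFormatFiles", [])
    ]
    filers = [
        {
            "companyName": e.get("companyName", ""),
            "cik": e.get("cik", ""),
            "fileNo": e.get("fileNo", ""),
            # sic, fiscalYearEnd, and tickers may be omitted, so use .get().
            "sic": e.get("sic"),
        }
        for e in meta.get("entities", [])
    ]
    return {
        "accessionNo": meta.get("accessionNo"),
        "formType": meta.get("formType"),
        "filedAt": meta.get("filedAt"),
        "documents": documents,
        "filers": filers,
    }
```

Using .get() throughout reflects the note that several entity fields are sometimes omitted.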
The sequence-1 document is the registration statement and prospectus itself, with <TYPE> set to SB-2 or SB-2/A. For HTML filings, the body is a single long HTML document containing the entire prospectus and Part II in reading order: cover page and calculation-of-registration-fee table at the top, prospectus narrative items in Regulation S-B order, audited financial statements as an embedded section (rendered as inline HTML tables or as image-free ASCII tables converted to HTML), and Part II followed by signatures. For plain-text filings, the same content is rendered as a fixed-width ASCII document. Amendment filings (SB-2/A) restate the entire registration statement rather than only the changed pages; redlines or bracketed change indicators are filer-discretionary and not standardized.
Exhibits are filed as additional sequenced documents in the same accession folder, each in its own SGML-wrapped file. The exhibit numbering follows Item 601 of Regulation S-B, which mirrored Regulation S-K with adjustments. The exhibit taxonomy used in this dataset is:
- EX-3.x: articles of incorporation, certificates of designation, bylaws, and committee charters. Multiple EX-3 exhibits often appear (EX-3.1 through EX-3.4 are common).
- EX-4.x: instruments defining the rights of security holders: specimen stock certificates, warrant agreements, indentures, registration-rights agreements.
- EX-5.1: opinion of counsel as to the legality of the securities being registered. Required and almost universally present.
- EX-10.x: material contracts: consulting and employment agreements, share-purchase agreements, leases, broker-dealer placement agreements, license agreements. Frequently the largest exhibit set by count.
- EX-21.x: list of subsidiaries (often omitted for shell-company filers with no subsidiaries).
- EX-23.x: consents of independent registered public accountants and consents of named legal counsel. Auditor consent (EX-23.1) is functionally always present because audited financials are incorporated.
- EX-24.1: power of attorney granted by directors and officers to allow execution of subsequent amendments.
- EX-99 and EX-99.x: additional ancillary documents: subscription agreements, escrow agreements, sample investor forms, press releases.
- CORRESP: correspondence with the SEC staff, treated as a document type within the submission rather than a numbered prospectus exhibit.
- GRAPHIC: image entries (JPG, GIF) referenced by the prospectus for logos, cover-page artwork, geological maps, and similar visuals. The GRAPHIC entries appear in metadata.json, but the binary image files are intentionally excluded from the on-disk record.
Every body file in the folder, regardless of file extension, opens with the same <DOCUMENT>/<TYPE>/<SEQUENCE>/<FILENAME>/<DESCRIPTION>/<TEXT> header block and closes with </TEXT></DOCUMENT>. The envelope is preserved as filed; it is not stripped during dataset assembly. Consumers extracting the raw prospectus or exhibit body must skip past the <TEXT> opening tag and stop at the closing </TEXT>.
For HTML payloads the inner content is a self-contained <html> document and may include filer-tool signatures (e.g., <!-- Document Created using EDGARizer HTML 3.0.4.0 -->) that are useful as provenance markers.
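A minimal sketch of the envelope handling described above, in Python. It assumes each header tag sits on its own line before <TEXT> and that the last </TEXT> in the file closes the body; the function name `parse_envelope` is hypothetical, and real filings may contain irregularities this does not handle.

```python
import re
from typing import Dict

HEADER_TAGS = ("TYPE", "SEQUENCE", "FILENAME", "DESCRIPTION")


def parse_envelope(raw: str) -> Dict[str, str]:
    """Split one SGML-wrapped document file into header fields and body."""
    text_open = re.search(r"<TEXT>", raw, flags=re.IGNORECASE)
    if text_open is None:
        raise ValueError("no <TEXT> tag found")

    header = raw[: text_open.start()]
    rest = raw[text_open.end():]

    # Stop at the last closing </TEXT>; if it is absent, keep everything.
    closes = list(re.finditer(r"</TEXT>", rest, flags=re.IGNORECASE))
    body = rest[: closes[-1].start()] if closes else rest

    fields: Dict[str, str] = {}
    for tag in HEADER_TAGS:
        # Header values run from the tag to the end of that line.
        match = re.search(rf"<{tag}>([^\r\n]*)", header, flags=re.IGNORECASE)
        if match:
            fields[tag.lower()] = match.group(1).strip()
    fields["body"] = body.strip("\r\n")
    return fields


def is_html(body: str) -> bool:
    """Heuristic only: treat the body as HTML if an <html> tag appears."""
    return re.search(r"<html[\s>]", body, flags=re.IGNORECASE) is not None
```

The returned `type` value can then be compared with the type field in metadata.json, and `is_html` decides whether to hand the body to an HTML parser or treat it as fixed-width text.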
- metadata.json for every accession.
- Correspondence (CORRESP) documents when present in the submission.
- XFD (paper-form facsimile data) and FRM files when the registrant attached them.
- The SGML envelope of every document, with its <TYPE>, <SEQUENCE>, <FILENAME>, and <DESCRIPTION> values.
- Binary image files are not included. GRAPHIC-typed entries remain in documentFormatFiles[] with their EDGAR URLs, so consumers can still fetch them from EDGAR if needed, but the bytes are not in the ZIP.
- The complete submission text file (the single .txt that EDGAR generates for an entire accession) is referenced in metadata.json (with a blank sequence value) but is not redistributed as a separate file in the folder. The same content is reconstructible by concatenating the included document files together with their preserved SGML envelopes.
- XBRL data is absent: dataFiles is empty and linkToXbrl is the empty string in every record.
The SB-2 corpus has a tightly bounded regulatory lifespan, April 1995 (EDGAR adoption) through 2008 (form rescission), and the disclosure requirements were comparatively stable across that window because Regulation S-B itself was largely unchanged after its 1992 promulgation. The substantive shifts within the corpus are:
- Auditor consents (EX-23.1) became more legally consequential after PCAOB-registered-firm requirements took effect.
- Late-1990s filings are typically a single <TEXT> block carrying fixed-width ASCII pages and EDGAR <TABLE>/<S>/<C> financial-table tags. Image content was effectively absent from this era.
- XBRL never reached the form: dataFiles and linkToXbrl are uniformly empty.

- Filenames are filer-controlled and inconsistent: filing-agent patterns (v######_sb2.htm, Donnelley's d-prefixed dsb2.htm / dex51.htm), descriptive lowercase names (legalopinion.htm, auditorconsent.htm), generic placeholders (filename2.htm, filename13.htm) emitted when the original filename slot was empty, and bare 8.3-style names (g2177.txt, ex5-1.txt) all coexist. The authoritative document classification is always the type field in metadata.json and the <TYPE> tag inside each file's SGML envelope, never the filename.
- The amendment ordinal cannot be determined from metadata.json alone. The formType field flattens every amendment to SB-2/A regardless of ordinal; only fileNo plus filedAt ordering reveal the sequence. Filers occasionally encode the amendment number in the filename (e.g., strasbaugh_sb2a3-020508.txt for a third amendment dated 2008-02-05), but this is convention, not specification.
- A single fileNo (the 333- Securities Act registration number) typically resolves to multiple records across the original SB-2 and several SB-2/A accessions. Deduplicating to "the registration" requires grouping on fileNo.
- The documentFormatFiles[] array is the source of truth for which files exist in the EDGAR submission, including the ones omitted from redistribution (images and the synthetic complete-submission text). When the on-disk folder is missing a file referenced in documentFormatFiles[], that file is either a GRAPHIC entry or the trailing Complete submission text file entry; both can be retrieved by following documentUrl to EDGAR.
- The SGML <DOCUMENT> envelope must be parsed out before HTML parsers will accept the body cleanly. HTML payloads usually start at the first <html> or <HTML> tag inside <TEXT>; plain-text payloads should be read between the <TEXT> and </TEXT> markers and treated as fixed-width ASCII with the EDGAR financial-table tags optionally promoted to columnar tables.
- Where EX-23.1 is missing or where audited financials appear unsigned, the filing is almost always an early-stage SB-2 that was withdrawn or never declared effective; the dataset preserves these as filed.
- CORRESP documents reflect issuer-to-staff communication and are not part of the prospectus; treat them as supplementary metadata about the comment-letter process rather than as disclosure to investors.
- Rely on the metadata.json type field and the SGML <TYPE> tag for document classification, on the entities[] block for issuer identification, and on filename matching only as a last resort.
Each record is a Securities Act of 1933 registration statement on Form SB-2, or an amendment on Form SB-2/A, submitted to EDGAR by an issuer that qualified as a "small business issuer" under former Regulation S-B. The filer is always the issuer of the securities being registered, never the underwriter, selling shareholder, or auditor (although those parties are named, and accountants and counsel provide consents).
To use the form, an issuer had to satisfy every prong of Item 10(a)(1) of Regulation S-B (cross-referenced in Securities Act Rule 405 and Exchange Act Rule 12b-2) at the time of filing:
Issuers exceeding either threshold, foreign private issuers other than Canadian filers, registered investment companies, business development companies, and asset-backed issuers were excluded and used Form S-1, F-1, N-1A, N-2, S-3, S-4, or S-11 as applicable.
Form SB-2 is event-driven, not periodic. Section 5 of the Securities Act prohibits any offer (Section 5(c)) or sale (Section 5(a)) of securities in interstate commerce unless a registration statement is on file and effective. An eligible issuer therefore filed Form SB-2 whenever it elected to register a public offering under the scaled Regulation S-B regime, including:
Effectiveness ran on the Section 8(a) twenty-day clock from the most recent filing or amendment, but in practice nearly all issuers requested acceleration under Rule 461 after staff review concluded. The acceleration request was submitted jointly by the issuer and any managing underwriters and granted by an SEC notice of effectiveness.
Form SB-2/A amendments are filed whenever the registration statement must be revised before or after effectiveness, including in response to:
Each SB-2/A receives its own EDGAR accession number but shares file-number lineage with the original SB-2.
Form SB-2 filings were submitted through EDGAR with:
Form SB-2 was adopted in 1992 under the Small Business Initiatives release (Release No. 33-6949), which created Regulation S-B and the integrated small business disclosure system. EDGAR coverage of SB-2 filings begins in April 1995 with the phase-in of mandatory electronic filing.
The form was rescinded effective February 4, 2008 by Release No. 33-8876 (Smaller Reporting Company Regulatory Relief and Simplification), which eliminated the "small business issuer" category and Regulation S-B and folded scaled disclosure into Regulation S-K under the new "smaller reporting company" definition (initially a $75 million public float test). Former SB-2 filers transitioned to Form S-1 with SRC scaled disclosure available within that form. No new originating SB-2 filings have been accepted by EDGAR since the rescission; post-2008 SB-2 or SB-2/A submissions in the dataset are generally post-effective amendments or wind-down filings tied to pre-rescission registration statements.
The Form SB-2 Files Dataset sits in a tight cluster of Securities Act registration filings. Its closest neighbors fall into three groups: other registration forms in use during SB-2's lifespan (1992-2008), the broader Regulation S-B small-business reporting regime, and exempt offering frameworks that competed with SB-2 for small-issuer capital formation. The distinctions below are rules-based: eligibility, trigger, scope, disclosure scaling, and timing.
S-1 is the general-purpose Securities Act registration statement, available to any issuer including those eligible for SB-2. The structural overlap is heavy: prospectus, use of proceeds, risk factors, business, MD&A, audited financials, management, exhibits. The distinction is the disclosure rulebook. SB-2 ran on Regulation S-B (two years of audited statements rather than three, scaled MD&A, lighter executive compensation tables); S-1 ran on full Regulation S-K and Regulation S-X. After SB-2's 2008 rescission, the small-issuer population migrated to S-1 under the new smaller-reporting-company accommodations within S-K. The post-2008 S-1 corpus therefore absorbs SB-2's filer base; SB-2 is its predecessor, not a substitute.
SB-1 was the smaller small-business form, capped at $10 million in any rolling twelve-month period and permitting a question-and-answer prospectus format. SB-2 had no offering-size ceiling for qualifying small-business issuers and required a conventional narrative prospectus. Both used Regulation S-B scaling, but SB-1 generated a much smaller and shallower historical population.
10-SB is an Exchange Act Section 12(g) registration, not a Securities Act offering registration. It registers a class of securities to make the issuer a reporting company; it does not register a transaction. There is no offering price, no use of proceeds, no underwriting section. It is the small-business analogue of Form 10, not of S-1. Use 10-SB to identify when a small issuer became a reporting company; use SB-2 to study how a small issuer raised capital.
S-3 is the short-form registration for seasoned issuers meeting eligibility tests (reporting history, timely filings, qualifying public float, historically $75 million). It incorporates by reference from Exchange Act filings rather than restating business and financial information. SB-2 sat on the opposite side of that eligibility line, designed for issuers without S-3 qualifications and often without any Exchange Act reporting history. SB-2 filings are self-contained and information-dense; S-3 filings are thin and reference-driven. The two datasets cover non-overlapping issuer populations.
S-11 is the dedicated registration statement for REITs and other real-estate-focused issuers, with real-estate-specific schedules (property tables, occupancy data, Schedule III). A small real estate issuer qualifying under Regulation S-B could in some cases register on SB-2 instead of S-11, creating overlap at that boundary. For real estate offering research, S-11 is the complete corpus; SB-2 captures only the small-issuer slice and lacks the standardized real-estate schedules.
Form F-1 is the Securities Act registration for foreign private issuers, following Form 20-F-aligned disclosure with IFRS or reconciled GAAP financials. SB-2 was never available to foreign private issuers; the Regulation S-B small-business definition applied only to U.S. and Canadian issuers meeting specific tests. The two datasets are mutually exclusive by filer geography and reporting regime.
Form 10-KSB and Form 10-QSB were the Regulation S-B annual and quarterly reports — the periodic counterparts to SB-2's registration role. Same scaled disclosure regime, same 2008 rescission, same migration path to 10-K and 10-Q with smaller-reporting-company accommodations. SB-2 captures the registration event; 10-KSB and 10-QSB capture the ongoing reporting that followed. Complementary, not substitutable.
Form 1-A is the offering statement for exempt offerings under Regulation A (and post-2015 Regulation A+). Reg A offerings are exempt from Section 5 and qualified by staff rather than declared effective. Disclosure is scaled below SB-2, financials are lighter, and offering size has historically been capped (originally $5 million; since 2015, $20 million for Tier 1 and $50 million for Tier 2, with the Tier 2 cap raised to $75 million in 2021). For small-issuer capital formation, Reg A is a parallel path, but legally and structurally distinct from SB-2.
The SB-2 Files Dataset packages the complete EDGAR submission per accession (metadata plus every document except images). Extracted datasets — prospectus-only, Item-level, exhibit-only — discard surrounding documents in exchange for cleaner, narrower content. The files dataset is the source-of-truth bundle; extracted variants are downstream refinements. They are complements.
The Form SB-2 Files Dataset is defined by the intersection of three constraints, each of which a neighboring dataset breaks:
Within that intersection, the dataset preserves the full EDGAR submission rather than an extracted slice, which is what distinguishes it from prospectus-only or exhibit-only derivatives of the same filings.
Users are concentrated in roles that work on small-issuer disclosure, micro-cap securities, historical enforcement, and the SB-2 to smaller reporting company transition. The closed 1995 to 2008 archive is dominated by micro-cap, penny-stock, early-stage, and shell or near-shell filers, and that filer profile shapes the user base.
Used as a precedent library for small-issuer registered offerings. Drafting teams pull historical prospectuses to model risk factors, plan of distribution, selling shareholder tables, and lock-up language for resale registrations, equity lines, PIPE warrant registrations, and best-efforts deals. Paralegals mine exhibit indexes for EX-5.1 legality opinions, EX-10 templates (subscription, registration rights, equity line, finder agreements), and EX-23 auditor consents. The SB-2 to SB-2/A amendment chain is itself a workflow input: comparing successive amendments shows how disclosure shifted in response to staff comments.
Used to reconstruct share-count and capital-structure history for long-listed issuers that went through multiple SB-2 registrations. The capitalization table, use-of-proceeds, and selling shareholder table together expose registered share counts, prior placement prices, warrant overhang, and the dilution path into today's float. For coverage initiations, SB-2 filings are often the only structured source for founder holdings, early seed positions, and pre-IPO convertible terms. Risk factors are read against current management narratives.
Used on penny-stock fraud, pump-and-dump, manipulation, and restatement matters, since many micro-cap enforcement cases trace back to an SB-2 or SB-2/A. Selling shareholder tables, plan of distribution, and EX-10 contracts (consulting, share-issuance, debt-conversion) reconstruct how shares moved into the float and who was paid in stock. Risk factors and use-of-proceeds are compared against actual cash use and later restatements. EX-23 consents and the disclosed auditor identity link issuers to small audit firms with their own PCAOB or SEC histories. Output is issuer chronologies and exhibit material for expert reports.
Used to triage issuers organized as shells or that became shells post-effectiveness and later served as reverse-merger vehicles. Diligence teams pull original promoter, auditor, CIK, file number, state of incorporation, and SIC code from metadata.json, then read use-of-proceeds and business description against subsequent operating history. Classic shell precursors (minimal proceeds, vague business plans, promoter-dominated selling shareholder tables) drive watchlists and target lists for custodianship petitions and shell revival projects.
Used when acquiring long-listed micro-cap targets or evaluating shell vehicles for go-public transactions. EX-10 material contracts surface legacy registration rights, anti-dilution clauses, board observer rights, and consultant share grants that may still be live. Selling shareholder tables identify legacy holders potentially still on the register. Metadata fields (CIK, file numbers, state of incorporation, fiscal year end, SIC) confirm continuity of the legal entity. Output is a legacy-securities diligence memo and a remediation plan for surviving registration rights.
Used for Section 15(g) penny-stock supervision, Rule 144 tacking on resale customers, and KYC on issuers onboarded for market-making or DVP settlement. Selling shareholder tables establish original holders and cost-basis representations; plan of distribution and legend disclosures establish whether shares were registered for resale or restricted. EX-5.1 confirms the legality basis for registered shares. CIK and file numbers in metadata tie SB-2 history to later trading symbols for surveillance and SRO inquiries.
Used for historical lookups tied to enforcement, registration revocations, and trading suspensions of SB-2-era issuers. Metadata (CIK, file numbers, SIC, state of incorporation, fiscal year end) indexes the corpus; prospectus body and exhibits drive substantive review. Section 12(j) deregistration, manipulation, and gatekeeper matters rely on EX-5.1 and EX-23 to identify the attorneys and auditors who signed offerings, feeding pattern analysis across issuers tied to the same gatekeepers.
Used as a primary corpus for small-issuer going-public dynamics, micro-cap IPO underpricing, scaled-disclosure cost, staff-review effectiveness, and the 2008 SRC regime change. The closed 1995 to 2008 window suits difference-in-differences and event-study designs. Risk factor text, executive compensation tables, capitalization tables, and offering-size figures are merged with later trading data. SB-2/A chains support studies on staff review intensity and the substantive effect of comment-letter cycles.
Used as training and evaluation data for extracting offering size and use-of-proceeds categories, classifying risk factors by topic, detecting shell-company language, and parsing capitalization and selling shareholder tables from heterogeneous HTML and TXT. The 1995 to 2008 span and per-accession metadata.json provide format-diverse coverage and ground-truth labels for entity linking and SIC classification. Shell-detection models use later observed outcomes (reverse mergers, deregistrations, enforcement) to label positives and negatives.
The workflows below draw on the prospectus body, the EX-3 through EX-99 exhibit set, the per-accession metadata.json, and the closed 1995 to 2008 amendment chains.
Pull every accession sharing a fileNo (the 333- Securities Act number), order by filedAt, and walk the SB-2 to SB-2/A chain to extract the calculation-of-registration-fee table, selling shareholder table, and capitalization section from each restatement. The output is a per-issuer share-issuance ledger keyed by CIK that ties registered share counts, warrant coverage, and named selling holders to the prices on the cover page. Used by micro-cap analysts to attribute today's float and overhang back to specific 2003 to 2008 placements.
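The grouping step of this workflow might look like the sketch below. It assumes each record is an already-parsed metadata.json dict, that fileNo is taken from the first entities[] element that carries one, and that filedAt parses with `datetime.fromisoformat`; the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any, Dict, Iterable, List, Optional


def file_number(meta: Dict[str, Any]) -> Optional[str]:
    """Return the first non-empty fileNo found in entities[]."""
    for entity in meta.get("entities", []):
        if entity.get("fileNo"):
            return entity["fileNo"]
    return None


def amendment_chains(records: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group records by fileNo and order each group by filedAt."""
    chains: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for meta in records:
        number = file_number(meta)
        if number is not None:
            chains[number].append(meta)
    for chain in chains.values():
        # filedAt carries a UTC offset, so parsed values compare correctly.
        chain.sort(key=lambda m: datetime.fromisoformat(m["filedAt"]))
    return dict(chains)
```

Each resulting list starts with the original SB-2 (when it is in the corpus) and continues through the SB-2/A accessions in filing order, which is the order needed for extracting the fee table and selling shareholder table from each restatement.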
Filter documentFormatFiles[] on type values matching EX-10.* and pull the SGML-wrapped bodies for equity line agreements, registration rights agreements, finder agreements, consulting share-issuance contracts, and Standby Equity Distribution Agreements. Drafting teams cluster these by counterparty (Cornell Capital, Dutchess, YA Global, etc.) to model current PIPE and ELOC documents on language that survived staff review. The exhibit description field in metadata.json accelerates the initial filter before any document is opened.
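A small Python sketch of that filter, assuming a parsed metadata.json dict. The helper name `exhibits_matching` and the default pattern are illustrative choices, not part of the dataset; the pattern accepts EX-10 and EX-10.x while rejecting unrelated types such as EX-101.

```python
import re
from typing import Any, Dict, List


def exhibits_matching(meta: Dict[str, Any], pattern: str = r"EX-10(\.\S+)?$") -> List[Dict[str, Any]]:
    """Return documentFormatFiles[] entries whose type matches the pattern."""
    rx = re.compile(pattern, flags=re.IGNORECASE)
    return [
        doc
        for doc in meta.get("documentFormatFiles", [])
        if rx.match(doc.get("type", ""))
    ]


def describe(docs: List[Dict[str, Any]]) -> List[str]:
    """One line per exhibit: type, optional description, EDGAR URL."""
    return [
        f'{d.get("type", "")}\t{d.get("description", "")}\t{d.get("documentUrl", "")}'
        for d in docs
    ]
```

Scanning the description values first narrows the set before any SGML-wrapped body is opened.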
Group accessions by fileNo, then diff the sequence-1 prospectus across consecutive SB-2/A filings and align the diffs against any CORRESP documents in the same accession folders. The result is a labeled dataset of "staff comment leads to disclosure change" pairs covering risk factors, use of proceeds, plan of distribution, and going-concern language. Securities counsel use this to anticipate staff requests on small-issuer registrations; academic researchers use it to study comment-letter effectiveness across the 1995 to 2008 window.
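The diff step could be sketched with Python's standard difflib, assuming the two sequence-1 bodies have already been extracted from their SGML envelopes (and, for HTML filings, reduced to text by whatever means the consumer prefers). The function name `diff_bodies` is hypothetical, and line-level diffing is only one of several reasonable granularities.

```python
import difflib
from typing import List


def _normalized_lines(text: str) -> List[str]:
    """Drop blank lines and collapse runs of whitespace so layout noise is ignored."""
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def diff_bodies(previous: str, amended: str, context: int = 0) -> List[str]:
    """Return unified-diff lines between two prospectus bodies."""
    return list(
        difflib.unified_diff(
            _normalized_lines(previous),
            _normalized_lines(amended),
            fromfile="previous",
            tofile="amended",
            lineterm="",
            n=context,
        )
    )


def added_lines(diff: List[str]) -> List[str]:
    """Lines introduced by the amendment (excluding the '+++' file header)."""
    return [line[1:] for line in diff if line.startswith("+") and not line.startswith("+++")]
```

Added lines can then be matched against the text of any CORRESP document in the same folder to build the comment-to-change pairs.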
Score each record on shell-precursor signals extracted from sequence-1: a "plan of operation" rather than full MD&A, two-year audited statements with minimal revenue, vague Item 101 business descriptions, promoter-dominated selling shareholder tables, and missing EX-21 subsidiary lists. Cross-reference cik, stateOfIncorporation, and sic from metadata.json against later trading suspensions and Section 12(j) deregistrations. Output is a ranked watchlist of dormant CIKs for custodianship petitions and revival diligence.
Parse the EX-5.1 legality opinion and EX-23.1 auditor consent from every accession to extract the issuing law firm and the consenting audit firm, then aggregate by gatekeeper across the corpus. The resulting issuer-to-counsel and issuer-to-auditor adjacency lists feed enforcement research on small audit firms and securities counsel who appeared repeatedly on filings later tied to manipulation, restatement, or deregistration matters. The description field on each exhibit (e.g., "OPINION OF ... AS TO THE LEGALITY OF SECURITIES BEING REGISTERED") gives a reliable starting filter.
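As a first pass before parsing the exhibit bodies, the description field alone can seed the counsel side of the aggregation. The Python sketch below assumes descriptions follow the "OPINION OF ... AS TO" wording quoted above; many filers use other wording or leave the description blank, so this is a starting filter rather than a complete extractor. The names `counsel_from_description` and `counsel_counts` are hypothetical.

```python
import re
from collections import Counter
from typing import Any, Dict, Iterable, Optional

OPINION_RX = re.compile(r"OPINION OF\s+(.+?)\s+AS TO\b", flags=re.IGNORECASE)


def counsel_from_description(description: Optional[str]) -> Optional[str]:
    """Pull the firm name out of an 'OPINION OF <firm> AS TO ...' description."""
    match = OPINION_RX.search(description or "")
    return match.group(1).strip().upper() if match else None


def counsel_counts(records: Iterable[Dict[str, Any]]) -> Counter:
    """Count EX-5.x exhibits per named firm across parsed metadata records."""
    counts: Counter = Counter()
    for meta in records:
        for doc in meta.get("documentFormatFiles", []):
            if doc.get("type", "").upper().startswith("EX-5"):
                firm = counsel_from_description(doc.get("description"))
                if firm:
                    counts[firm] += 1
    return counts
```

Firm names will still need normalization (punctuation, "LLP" variants, successor firms) before the counts are used as adjacency data.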
Use the format-diverse population (legacy fixed-width ASCII with EDGAR <TABLE>/<S>/<C> markers from the late 1990s alongside filing-agent HTML from EDGARizer, RDG, and Donnelley in the mid-2000s) to train extractors for offering size, use-of-proceeds categories, risk-factor topics, and selling shareholder rows. The per-accession metadata.json supplies ground-truth formType, sic, stateOfIncorporation, and entity labels; the SGML <TYPE> tag inside each document body provides authoritative document classification independent of filer-controlled filenames.
The Form SB-2 Files Dataset is accessible through three endpoints: a JSON metadata index, a full archive download, and per-container downloads. Containers are monthly ZIP files organized by year, covering filings from April 1995 onward across SB-2 and SB-2/A form types.
Dataset Index JSON API: https://api.sec-api.io/datasets/form-sb2-files.json
Returns dataset-level metadata (name, description, last updated timestamp, earliest sample date, total record count, total size, form types, container format, and file types) along with the full dataset download URL and a list of all monthly container files. Each container entry includes its key, size, record count, last updated timestamp, and individual download URL. Poll this endpoint daily to detect which monthly containers were refreshed in the most recent run, then download only those containers. This endpoint does not require an API key.
Example response:
{
  "datasetId": "1f13365b-9ae0-6917-849d-750c95918b65",
  "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-sb2-files.zip",
  "name": "Form SB-2 Files Dataset",
  "updatedAt": "2026-04-14T15:11:27.498Z",
  "earliestSampleDate": "1995-04-01",
  "totalRecords": 143958,
  "totalSize": 3060933189,
  "formTypes": ["SB-2", "SB-2/A"],
  "containerFormat": "ZIP",
  "fileTypes": ["TXT", "JSON", "HTML", "XFD", "PDF", "FRM"],
  "containers": [
    {
      "downloadUrl": "https://api.sec-api.io/datasets/form-sb2-files/2008/2008-02.zip",
      "key": "2008/2008-02.zip",
      "size": 13818783,
      "records": 154,
      "updatedAt": "2026-04-14T15:11:27.498Z"
    }
  ]
}
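One possible way to act on this response is to remember each container's updatedAt between runs and re-download only the entries whose timestamp changed. The sketch below uses only the fields shown in the example response. The seen mapping and both function names are assumptions introduced for illustration.

```python
import json
import urllib.request

INDEX_URL = "https://api.sec-api.io/datasets/form-sb2-files.json"


def fetch_index(url=INDEX_URL):
    """Fetch and decode the dataset index (no API key needed for this endpoint)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def changed_containers(index, seen):
    """Return container entries that are new or whose updatedAt differs.

    `seen` is a hypothetical local state: a dict mapping container key
    to the updatedAt value recorded on the previous run.
    """
    return [
        c for c in index["containers"]
        if seen.get(c["key"]) != c["updatedAt"]
    ]
```

After a successful download, storing seen[c["key"]] = c["updatedAt"] for each fetched container keeps the next run incremental.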
Download Entire Dataset: https://api.sec-api.io/datasets/form-sb2-files.zip?token=YOUR_API_KEY
Downloads the complete dataset as a single ZIP archive containing every monthly container. This endpoint requires an API key.
Download Single Container: https://api.sec-api.io/datasets/form-sb2-files/2008/2008-02.zip?token=YOUR_API_KEY
Downloads one monthly container archive identified by its YYYY/YYYY-MM.zip key, allowing incremental retrieval of only the months that changed. This endpoint requires an API key.
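A minimal sketch of building and fetching a single-container URL follows. It assumes the URL pattern shown above, with the token passed as a query parameter. The key validation and both helper names are illustrative additions, not part of the API.

```python
import re
import shutil
import urllib.parse
import urllib.request

BASE_URL = "https://api.sec-api.io/datasets/form-sb2-files/"
# Keys look like YYYY/YYYY-MM.zip, e.g. 2008/2008-02.zip
_KEY = re.compile(r"^(\d{4})/(\d{4})-(0[1-9]|1[0-2])\.zip$")


def container_url(key, token):
    """Build the download URL for one monthly container key."""
    m = _KEY.match(key)
    if not m or m.group(1) != m.group(2):
        raise ValueError(f"not a YYYY/YYYY-MM.zip key: {key!r}")
    return BASE_URL + key + "?" + urllib.parse.urlencode({"token": token})


def download_container(key, token, dest):
    """Stream one container to the local path `dest`."""
    with urllib.request.urlopen(container_url(key, token), timeout=60) as resp, \
            open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Streaming with shutil.copyfileobj avoids holding a multi-megabyte archive in memory. Retry and error handling are left out of this sketch.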
The dataset covers Form SB-2 (the initial small-business issuer registration statement under the Securities Act of 1933) and Form SB-2/A (pre-effectiveness and post-effectiveness amendments to a previously filed SB-2). Both form types share fileNo lineage when they belong to the same registration.
One record is a single EDGAR submission identified by an 18-digit accession number. On disk it is one accession-number folder containing a metadata.json manifest plus every document the registrant submitted to EDGAR for that accession, except binary image files. Each record bundles a structured JSON manifest, the prospectus and Part II content, exhibits, and the SGML envelopes wrapping every document body.
The filer was always the issuer of the securities being registered, and only issuers that qualified as "small business issuers" under Regulation S-B could use the form — meaning revenues and public float each below $25 million, organization in the United States or Canada, and not registered under the Investment Company Act. Foreign private issuers (other than Canadian filers), registered investment companies, business development companies, and asset-backed issuers were ineligible.
The dataset begins in April 1995, when EDGAR electronic filing became mandatory, and runs through Form SB-2's rescission in 2008 under Release No. 33-8876. Post-2008 records in the dataset are wind-down amendments tied to pre-rescission registration statements; no new originating SB-2 filings have been accepted by EDGAR since the rescission.
The dataset is distributed as monthly ZIP containers organized by year, with keys of the form YYYY/YYYY-MM.zip. Inside each container, every accession is its own subdirectory containing a metadata.json plus document files in TXT, HTML, XFD, PDF, or FRM format, each wrapped in EDGAR's SGML <DOCUMENT> envelope.
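Given that layout, a downloaded container can be walked without unpacking it to disk. The sketch below treats the directory holding each metadata.json as the accession folder. That reading of the layout, the function name, and the folder name used in any test data are assumptions.

```python
import json
import zipfile
from pathlib import PurePosixPath


def iter_records(container):
    """Yield (accession_folder, metadata, document_names) per accession.

    `container` is a path or file-like object for one monthly ZIP.
    The folder that holds a metadata.json is taken as the accession folder.
    """
    with zipfile.ZipFile(container) as zf:
        # Skip explicit directory entries; keep only files.
        names = [n for n in zf.namelist() if not n.endswith("/")]
        for name in names:
            path = PurePosixPath(name)
            if path.name != "metadata.json":
                continue
            folder = path.parent
            metadata = json.loads(zf.read(name))
            docs = sorted(
                n for n in names
                if PurePosixPath(n).parent == folder and n != name
            )
            yield folder.name, metadata, docs
```

Each yielded document name can be passed to ZipFile.read in a second pass to load the SGML-wrapped body only for the records that survive a metadata filter.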
Form S-1 is the general-purpose Securities Act registration statement and runs on full Regulation S-K and S-X disclosure; Form SB-2 ran on the scaled Regulation S-B regime, with two years of audited financials rather than three and lighter MD&A and compensation requirements. After SB-2's 2008 rescission, the small-issuer population migrated to S-1 with smaller-reporting-company accommodations folded into Regulation S-K, so the post-2008 S-1 corpus absorbs SB-2's filer base — SB-2 is its predecessor, not a substitute.
The dataset contains no XBRL data. Form SB-2 was rescinded before XBRL phase-in reached small registrants, so no record carries an XBRL instance document or inline-XBRL structured data. The dataFiles array and linkToXbrl field in metadata.json are uniformly empty across the corpus.