The Form 1-A Files Dataset is a complete, monthly-refreshed corpus of EDGAR offering-statement submissions filed under Regulation A — both initial Form 1-A filings and Form 1-A/A pre-qualification and post-qualification amendments. Each record represents one EDGAR submission identified by its 18-digit accession number and bundles the dataset header, the structured Form 1-A cover-page XML, the XSL-rendered cover view, the narrative offering circular, and every text-bearing exhibit attached to the submission. Records are filed by the issuer itself — early- and growth-stage operating companies, real-estate sponsors, special-purpose vehicles, non-listed REITs, and eligible Canadian issuers — seeking SEC qualification before any sales of securities under Reg A. Electronic Form 1-A coverage on EDGAR begins in January 2002, with filing volume heavily concentrated after the Reg A+ rewrite (Release 33-9741) took effect on June 19, 2015. The dataset is distributed as monthly ZIP containers covering form types 1-A and 1-A/A, with file payloads in XML, HTML, JSON, TXT, and PDF.
The dataset assembles, on a per-submission basis, every Form 1-A and Form 1-A/A filing made on EDGAR. Form 1-A is the offering statement prescribed by Regulation A under Section 3(b) of the Securities Act of 1933. It functions as a quasi-prospectus: it must qualify with the SEC before sales are permitted, and it carries substantive disclosures about the issuer, the securities, the terms of the offering, the use of proceeds, the risks, the management, and supporting financials. Modern Reg A is two-tiered: Tier 1 covers offerings up to $20 million in any rolling 12-month period and remains subject to state blue-sky review with no audit requirement; Tier 2 covers offerings up to $75 million, requires audited GAAP financial statements, preempts state review, and triggers ongoing reporting on Forms 1-K, 1-SA, and 1-U. A Form 1-A/A amends a previously filed offering statement under the same 024-##### offering file number (distinct from the 333-##### series used for full registration).
Internally an EDGAR Form 1-A submission is a hybrid of (a) a structured cover-page XML produced through the EDGAR Online Forms "Reg A Filer" workflow and (b) a set of attached HTML documents containing the actual narrative and legal content. The XML carries the machine-readable issuer/offering data; the HTML attachments carry the prose, tables, signatures, and legal text. The dataset preserves both layers as separate files and ships records inside monthly ZIP containers keyed YYYY-MM/, packaged as form-1a-files/YYYY/YYYY-MM.zip. Coverage runs from January 2002 to present and spans the pre-2015 legacy Reg A regime, the 2015 Reg A+ rewrite, and the 2021 amendment that raised the Tier 2 cap from $50 million to $75 million.
One record in the Form 1-A Files Dataset is a single EDGAR offering-statement submission filed under Regulation A, either an original Form 1-A (initial offering statement) or a Form 1-A/A (pre-qualification or post-qualification amendment), identified by its 18-digit accession number. A record is not a single document but the full bundle for one submission: the dataset-generated submission header, the structured Form 1-A cover-page XML, the XSL-rendered view of that cover page, the narrative offering circular, and every text-bearing exhibit attached to the submission. On disk the record is one folder named with the dash-stripped accession number (e.g. 000149315225025045 for accession 0001493152-25-025045), placed inside a monthly ZIP keyed YYYY-MM/ and shipped as form-1a-files/YYYY/YYYY-MM.zip. Every file in that folder, taken together, is the record.
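The folder-naming rule can be sketched as a pair of small helpers. This is an illustration only: the function names are hypothetical, and the 10-2-6 digit grouping is the standard EDGAR accession layout rather than something the dataset defines.

```python
import re

# Dashed EDGAR accession number: 10 digits, 2 digits, 6 digits (18 in total).
ACCESSION_RE = re.compile(r"^\d{10}-\d{2}-\d{6}$")


def accession_to_folder(accession_no: str) -> str:
    """Return the dash-stripped folder name for a dashed accession number."""
    if not ACCESSION_RE.match(accession_no):
        raise ValueError(f"not a dashed accession number: {accession_no!r}")
    return accession_no.replace("-", "")


def folder_to_accession(folder: str) -> str:
    """Re-insert the dashes into an 18-digit folder name."""
    if not re.fullmatch(r"\d{18}", folder):
        raise ValueError(f"not an 18-digit folder name: {folder!r}")
    return f"{folder[:10]}-{folder[10:12]}-{folder[12:]}"


def record_prefix(accession_no: str, filed_month: str) -> str:
    """Path prefix of a record inside its monthly ZIP (filed_month is 'YYYY-MM')."""
    return f"{filed_month}/{accession_to_folder(accession_no)}/"
```

For example, `record_prefix("0001493152-25-025045", "2025-11")` yields the prefix under which that record's files sit inside `2025-11.zip`.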
Within the monthly ZIP, each record occupies one folder whose name is the accession number with all dashes removed. Representative tree:
form-1a-files/2025/2025-11.zip
└── 2025-11/
    ├── 000149315225025045/
    │   ├── metadata.json
    │   ├── primary_doc.xml
    │   ├── partiiandiii.htm
    │   ├── ex1-1.htm
    │   ├── ex2-2.htm
    │   ├── ex3-1.htm
    │   ├── ex3-2.htm
    │   ├── ex8-1.htm
    │   ├── ex11-1.htm
    │   ├── ex11-2.htm
    │   ├── ex12-1.htm
    │   ├── ex17-1.htm
    │   └── xsl1-A_X01/
    │       └── primary_doc.xml
    └── ...
Five file roles populate that folder:
| File | Format | Role |
|---|---|---|
| metadata.json | JSON | Dataset-produced submission header (always present) |
| primary_doc.xml | EDGAR XML (oneafiler schema) | Structured Form 1-A cover page (Issuer Information / Part I) |
| xsl1-A_X01/primary_doc.xml | XHTML 1.0 Strict | XSL-rendered human-readable view of the cover page |
| partiiandiii.htm (or filer-slug variant) | SGML-wrapped HTML | Offering circular: Part II narrative plus Part III exhibit index |
| ex<N>[-<M>].htm | SGML-wrapped HTML | One file per attached exhibit |
metadata.json is the one component never absent. primary_doc.xml and its XSL render appear in essentially every modern submission. The narrative offering-circular file and the exhibit set vary in count from a single document (a minimal cover-only amendment) to several dozen (a fully exhibited initial filing).
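As a rough sketch of how the folder layout can be walked, the helper below groups the members of one monthly ZIP by record folder using only the standard library. It assumes the `YYYY-MM/<accession>/<file>` nesting shown in the tree; the function name is hypothetical.

```python
import zipfile
from collections import defaultdict


def files_by_record(zip_source) -> dict[str, list[str]]:
    """Map each record folder (dash-stripped accession) to its file paths.

    zip_source may be a filesystem path or a binary file-like object.
    Assumes members are laid out as YYYY-MM/<accession>/<file...>.
    """
    records = defaultdict(list)
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            # Skip directory entries and anything not nested inside a record folder.
            if name.endswith("/") or len(parts) < 3:
                continue
            accession_folder = parts[1]
            # Keep the path relative to the record folder, e.g. "xsl1-A_X01/primary_doc.xml".
            records[accession_folder].append("/".join(parts[2:]))
    return dict(records)
```

A record whose list contains only `metadata.json` and the cover-page files would correspond to the minimal cover-only amendment case; one with dozens of `ex*.htm` entries to a fully exhibited filing.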
metadata.json: the dataset header

The dataset's normalized header for the submission, derived from the EDGAR submission index. Fields:

- formType: "1-A" or "1-A/A".
- accessionNo: dashed accession (e.g. "0001493152-25-025045").
- description: EDGAR human-readable form description (e.g. "Form 1-A/A - Offering Statement [Regulation A]: [Amend]").
- filedAt: ISO-8601 timestamp with timezone.
- linkToFilingDetails, linkToTxt, linkToHtml, linkToXbrl: canonical sec.gov URLs for, respectively, the rendered primary doc, the SGML full-submission text file, the EDGAR filing-index page, and any XBRL data (empty for Reg A).
- documentFormatFiles[]: one element per document in the EDGAR submission, each with sequence, size, documentUrl, optional description, and type (e.g. "1-A/A", "PART II AND III", "ADD EXHB", "GRAPHIC"). The terminating element with empty sequence/type and description: "Complete submission text file" references the SGML .txt wrapper.
- entities[]: filer/issuer records, each with companyName, cik, irsNo, fileNo (the 024-##### Reg A offering file number), filmNo, type, act (always "33" for Reg A), sic with description, stateOfIncorporation, fiscalYearEnd (MMDD), and optional tickers[].
- seriesAndClassesContractsInformation[]: typically empty for Form 1-A.
- dataFiles[]: typically empty (Reg A does not carry XBRL data files).
- id: opaque dataset record identifier.

primary_doc.xml: the structured Form 1-A cover page

An XML instance against the http://www.sec.gov/edgar/rega/oneafiler namespace. Root element <edgarSubmission> splits into <headerData> and <formData>.
<headerData> carries submission-level routing fields: submissionType, and <filerInfo> containing the issuer's CIK/CCC, liveTestFlag, and a <filer> block holding <issuerCredentials> and <issuerCik>. The Reg A offering file number is exposed at the form level (see below) rather than as a header attribute.
<formData> carries the substantive cover-page disclosures, organized into the following sub-blocks (note the schema's idiosyncratic spelling — juridication instead of jurisdiction — appears in element names exactly as shown):
- <issuerInfo>: issuer name, address, contact telephone, jurisdiction of organization, year of incorporation, primary SIC, IRS employer-ID, full-time / part-time employee headcounts, the offering file number, and a balance-sheet/income-statement snapshot covering cash, accounts receivable, PP&E, total assets, accounts payable, long-term debt, total liabilities, stockholder equity, total revenues, total expenses, depreciation and amortization, net income, EPS basic and diluted, and the auditor's name.
- <commonEquity>, <preferredEquity>, <debtSecurities> (each repeatable): class name, outstanding amount, CUSIP, and listing venue for each class of issued security.
- <issuerEligibility>, <applicationRule262>: yes/no certifications that the issuer is eligible under Rule 251 and not subject to the bad-actor disqualifications of Rule 262.
- <summaryInfo>: the offering-summary block, with indicateTier1Tier2Offering, financialStatementAuditStatus, types of securities offered, securitiesOffered count, pricePerSecurity, issuerAggregateOffering, totalAggregateOffering, named service providers (sales-commissions broker, finders, auditors, legal counsel) with their fees and CRD numbers, and estimatedNetAmount to the issuer.
- <juridictionSecuritiesOffered>: repeating two-letter codes for both issueJuridicationSecuritiesOffering (states where the issuer offers directly) and dealersJuridicationSecuritiesOffering (states where dealer activity is contemplated). Tier 2 filings often list all 50 states plus DC and PR; Tier 1 filings tend to list only the states actually qualified.
- <securitiesIssued>, <unregisteredSecuritiesAct>: disclosure of prior unregistered issuances within the lookback window and the exemption(s) relied upon.

This XML is the structured machine-readable counterpart to the cover-page items of the form. It does not contain the offering-circular narrative; that lives in the HTML attachment.
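A minimal sketch of reading a few of the summary fields named above with the standard library follows. It searches by element name in any namespace, so it does not rely on the exact nesting inside `<formData>`; the function name, the choice of fields, and the sample document (including its values) are illustrative assumptions, not real filing data.

```python
import xml.etree.ElementTree as ET

SUMMARY_FIELDS = (
    "indicateTier1Tier2Offering",
    "financialStatementAuditStatus",
    "pricePerSecurity",
    "totalAggregateOffering",
)


def cover_summary(xml_text: str) -> dict:
    """Return the first occurrence of each summary field, or None if absent."""
    root = ET.fromstring(xml_text)
    out = {}
    for field in SUMMARY_FIELDS:
        # "{*}" matches the tag in any namespace (Python 3.8+).
        node = root.find(f".//{{*}}{field}")
        out[field] = node.text.strip() if node is not None and node.text else None
    return out


# Hypothetical, heavily trimmed cover page used only to exercise the helper.
SAMPLE = """<edgarSubmission xmlns="http://www.sec.gov/edgar/rega/oneafiler">
  <formData>
    <summaryInfo>
      <indicateTier1Tier2Offering>Tier2</indicateTier1Tier2Offering>
      <pricePerSecurity>10.00</pricePerSecurity>
    </summaryInfo>
  </formData>
</edgarSubmission>"""

summary = cover_summary(SAMPLE)
```

Values come back as strings exactly as filed; converting amounts to numbers and normalizing the tier label are left to the caller because the literal encodings are not specified here.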
xsl1-A_X01/primary_doc.xml: the rendered cover page

The EDGAR XSL-rendered HTML view of the structured primary_doc.xml. Despite the .xml extension and the path under xsl1-A_X01/, this file is XHTML 1.0 Strict (declared via <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" ...>) styled with /css/REGA_1A_print.css. It presents the structured fields as a tabular cover page and adds no information beyond what is already in the XML, but it is the rendered form most users see on the EDGAR website. It is the only HTML-flavored file in the record that is not SGML-wrapped.
Offering circular (partiiandiii.htm and variants)

The narrative body of the filing, the offering circular itself, is delivered as a single SGML-wrapped HTML document, conventionally named partiiandiii.htm because it embodies both Part II (the prospectus-like Offering Circular) and Part III (the Index to Exhibits). Some filers substitute a custom filename, usually formed as <issuer-prefix>_1a.htm or <issuer-prefix>_1aa.htm (e.g. glob_1aa.htm, spot_1aa.htm, bdcc_1a.htm, karx_1a.htm, tranquil_1a.htm, alkaline_1a.htm), though looser variants such as bhic1a111925.htm also occur.
The narrative typically follows the post-2015 SEC-prescribed Offering Circular item order:
Exhibits are attached as separate SGML-wrapped HTML files, one per exhibit. The post-2015 Item 17 exhibit table groups exhibits by numeric category — for example: 1 (underwriting/placement-agent agreements), 2 (charter and bylaws), 3 (instruments defining rights of securityholders), 4 (subscription agreements), 6 (material contracts), 7 (acquisition-related plans), 8 (escrow agreements), 9 (letters re unaudited interim information), 10 (powers of attorney), 11 (consents of experts and auditors), 12 (opinions of legal counsel re legality), 13 (testing-the-waters materials), 15 (additional exhibits), 17 (additional schedules or correspondence).
Three exhibit-filename schemes are observed in the dataset, and they coexist within the same monthly ZIP:
- Category-and-sequence names: ex1-1.htm, ex3-2.htm, ex11-1.htm, ex11-2.htm, ex12-1.htm, ex17-1.htm. The first number is the Item 17 category; the second is the within-category sequence.
- Sequential names: ex1.htm, ex2.htm, … ex31.htm. Position within the sequence does not indicate exhibit type.
- Filer-prefixed names: spot_ex6z3.htm, bdcc_ex41.htm, tranquil_ex0303.htm, bhicex11-1.htm. Some embed the category-and-sequence digits, others do not.

Exhibit content is heterogeneous: charter documents, bylaws, certificates of designation, subscription agreements, escrow agreements, broker/placement-agent agreements, material contracts, employment agreements, consents of independent registered public accounting firms, legal opinions on legality of securities, testing-the-waters communications, and supplemental financial schedules.
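The three filename schemes can be told apart with two anchored regular expressions, as in the sketch below. The scheme labels and function name are hypothetical, and the result is only a first-pass hint: filer-chosen names carry no guaranteed meaning, so the category returned for the first scheme should be treated as a guess.

```python
import re

# ex<category>-<sequence>.htm, e.g. ex11-2.htm
CATEGORY_SEQ = re.compile(r"^ex(\d+)-(\d+)\.htm$", re.IGNORECASE)
# ex<sequence>.htm, e.g. ex31.htm
SEQUENTIAL = re.compile(r"^ex(\d+)\.htm$", re.IGNORECASE)


def classify_exhibit_name(filename: str):
    """Return (scheme, category, sequence); unknown parts are None."""
    m = CATEGORY_SEQ.match(filename)
    if m:
        return ("category-sequence", int(m.group(1)), int(m.group(2)))
    m = SEQUENTIAL.match(filename)
    if m:
        # The position in a flat sequence says nothing about exhibit type.
        return ("sequential", None, int(m.group(1)))
    # Anything else is treated as a filer-specific name.
    return ("custom", None, None)
```

Because both patterns are anchored at the start of the name, a prefixed file such as `bhicex11-1.htm` falls into the custom bucket even though it embeds category-and-sequence digits.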
SGML wrapper on .htm files

Every .htm file in the record except the XSL render (that is, the offering circular and every exhibit) is not plain HTML. Each is one <DOCUMENT> segment lifted from the EDGAR full-submission .txt, with the actual HTML payload nested inside <TEXT>...</TEXT> and prefaced by an SGML header:
<DOCUMENT>
<TYPE>ADD EXHB
<SEQUENCE>8
<FILENAME>ex11-1.htm
<DESCRIPTION>ADD EXHB
<TEXT>
<HTML>
... (full HTML payload) ...
</HTML>
</TEXT>
</DOCUMENT>
These tags are SGML, not XML: the preamble lines (<TYPE>, <SEQUENCE>, <FILENAME>, <DESCRIPTION>) have no closing tags and rely on newline-as-terminator, and <TEXT> is closed by </TEXT> only at the document's end. The <TYPE> value mirrors the documentFormatFiles[].type in metadata.json (e.g. "1-A/A", "PART II AND III", "ADD EXHB"); <SEQUENCE> matches the EDGAR submission ordering. Robust extraction strips the preamble and the <TEXT>/</TEXT> envelope and parses only the inner HTML payload.
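The extraction step described above might look like the following sketch. It assumes uppercase tags and one preamble tag per line, as in the sample; the function name is hypothetical, and files without a `<TEXT>` envelope are passed through unchanged.

```python
import re

HEADER_TAGS = ("TYPE", "SEQUENCE", "FILENAME", "DESCRIPTION")


def unwrap_sgml(raw: str):
    """Split an SGML-wrapped .htm file into (preamble dict, inner HTML)."""
    start = raw.find("<TEXT>")
    end = raw.rfind("</TEXT>")
    if start == -1 or end == -1 or end < start:
        # No envelope found: treat the whole file as the payload.
        return {}, raw
    header = {}
    for line in raw[:start].splitlines():
        # Preamble tags have no closing tag; the value runs to end of line.
        m = re.match(r"<([A-Z]+)>(.*)$", line.strip())
        if m and m.group(1) in HEADER_TAGS:
            header[m.group(1)] = m.group(2).strip()
    payload = raw[start + len("<TEXT>"):end].strip()
    return header, payload
```

The returned payload can then be handed to any HTML parser, while the preamble dict supplies the `<TYPE>` and `<SEQUENCE>` values for matching against `documentFormatFiles[]`.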
The file types found in the dataset are XML, HTML, JSON, TXT, and PDF. In practice:
- JSON: metadata.json only.
- XML: primary_doc.xml only.
- HTML: every .htm except the XSL render is SGML-wrapped.
- TXT: the full-submission text file is referenced from metadata.json (the EDGAR full-submission .txt) but is not bundled into the ZIP; the per-document SGML segments are already broken out as the individual .htm files.

Included in each record folder:
- The metadata.json header.
- primary_doc.xml and its XSL-rendered XHTML view.
- The narrative offering circular and every text-bearing exhibit.

Excluded from each record folder:
- Images and other graphics, which appear only as entries in metadata.json -> documentFormatFiles with type: "GRAPHIC". A heavily illustrated offering circular may reference dozens of image_###.jpg and <exhibit>_###.jpg files at sec.gov URLs that have no on-disk counterpart. Renderers that need the images must fetch them via the documentUrl of each GRAPHIC entry.
- The full-submission .txt wrapper. Its content is preserved in decomposed form across the per-document .htm files; the consolidated wrapper itself is referenced by URL in the metadata but not duplicated locally.

The dataset spans January 2002 through the present, which straddles the most consequential rewrite of Form 1-A in its modern history. On June 19, 2015, the SEC's "Regulation A+" amendments adopted under Title IV of the JOBS Act took effect, replacing Form 1-A in its entirety. Records on either side of that transition differ substantially in content structure:
- Pre-2015 records follow the legacy Regulation A layout and predate the oneafiler namespace.
- Post-2015 records are organized into Part I (the structured cover page, filed as primary_doc.xml under the oneafiler namespace), Part II (a single Offering Circular item set replacing the Model A / Model B fork), and Part III (Index to Exhibits, governed by the Item 17 numeric categories). Two tiers were established: Tier 1 up to $20 million and Tier 2 originally up to $50 million, raised to $75 million effective March 15, 2021. Tier 2 introduced audited-financials requirements, ongoing periodic reporting (Form 1-K annual, Form 1-SA semiannual, Form 1-U current), and federal preemption of state blue-sky review. Testing-the-waters communications acquired a defined exhibit slot (Item 17(13)). Reg A filing volume rose sharply after 2015, so the bulk of records reflect the post-2015 architecture described above.

The summaryInfo/indicateTier1Tier2Offering value, the breadth of juridictionSecuritiesOffered lists, and the presence of testing-the-waters exhibits are all post-2015 artifacts. Pre-2015 records use the older structured-cover layout and a Model A or Model B narrative offering circular.
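The tier ceilings and effective dates stated above can be encoded as a small lookup, useful for sanity-checking extracted offering amounts. This is a sketch under two assumptions: the function name is hypothetical, and the filing date is used as a stand-in for whichever date legally governs the cap.

```python
from datetime import date
from typing import Optional

REG_A_PLUS_EFFECTIVE = date(2015, 6, 19)   # Reg A+ rewrite took effect
TIER2_CAP_RAISED = date(2021, 3, 15)       # Tier 2 cap raised to $75 million


def offering_cap_usd(tier: int, filed: date) -> Optional[int]:
    """Return the 12-month offering ceiling in dollars, or None pre-2015."""
    if tier not in (1, 2):
        raise ValueError(f"tier must be 1 or 2, got {tier!r}")
    if filed < REG_A_PLUS_EFFECTIVE:
        # The two-tier structure did not exist under the legacy regime.
        return None
    if tier == 1:
        return 20_000_000
    return 75_000_000 if filed >= TIER2_CAP_RAISED else 50_000_000
```

A record whose totalAggregateOffering exceeds the value returned for its tier and filing date is a candidate for a parsing error or a mislabeled tier, and is worth a manual look.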
- Amendments are separate records, not replacements; entities[].fileNo (024-#####) is the join key.
- A lenient HTML parser will tolerate the <DOCUMENT>/<TYPE>/<SEQUENCE>/<FILENAME>/<DESCRIPTION>/<TEXT> lines but will misclassify them as text nodes, polluting downstream extraction. The robust approach is to detect the SGML preamble, extract the substring between <TEXT> and </TEXT>, and parse that as HTML.
- Offering circulars and exhibits may contain <img src="image_001.jpg"> (and similar) references. These point to files listed in metadata.json -> documentFormatFiles but absent from the ZIP. Resolve them via the sec.gov documentUrl of the matching GRAPHIC entry.
- Filenames are an unreliable guide to exhibit type; the dependable signal is the <TYPE>/<DESCRIPTION> SGML preamble inside the file paired with the matching documentFormatFiles[] entry in metadata.json.
- Some filers embed financial statements in partiiandiii.htm; others attach them as separate exhibits. Tier 2 filings carry audited GAAP statements; Tier 1 filings may carry unaudited statements. Extraction logic should look in both locations and disambiguate via exhibit <DESCRIPTION>/<TYPE>.
- summaryInfo/indicateTier1Tier2Offering drives whether audited financials, ongoing reporting, and federal preemption apply, and it correlates with the financialStatementAuditStatus field, the breadth of the jurisdiction lists, and the offering-amount ceilings. It is the single most important structured field in primary_doc.xml for downstream classification.
- A schema validator written for the oneafiler namespace will not validate against pre-2015 cover pages.

A Form 1-A record is filed on EDGAR by an issuer seeking to qualify a securities offering under Regulation A (Securities Act of 1933, Section 3(b)(2), as added by Title IV of the JOBS Act of 2012). The filer is always the issuer itself; underwriters, placement agents, selling securityholders, and signing officers and directors appear inside the offering statement but do not file it.
To be eligible under Rule 251(a), an issuer must be organized in, and have its principal place of business in, the United States or Canada. In practice the eligible filer population includes:
Rule 251(b) categorically excludes:
Exchange Act reporting companies were originally barred but became eligible following the 2018 Economic Growth, Regulatory Relief, and Consumer Protection Act amendment to Rule 251(b)(2).
Form 1-A is a single form, but the issuer elects one of two tiers on the cover, and the election controls disclosure depth, state-law preemption, and ongoing reporting:
Tier election determines whether a filing chain produces downstream Form 1-K, 1-SA, 1-U, and 1-Z filings; those records live in their own form-type populations.
Form 1-A is filed before any sales occur: securities may not be sold in reliance on Regulation A until the SEC issues a notice of qualification under Rule 252(e). Records arise from these triggers:
Section 3(b) small-offering authority dates to the Securities Act of 1933, and Form 1-A in some form has existed since the SEC's earliest rulemaking. Paper-era Reg A filings predating EDGAR are not in this dataset. Electronic Form 1-A coverage on EDGAR begins in January 2002, with filing volume heavily concentrated after June 19, 2015, when the Reg A+ rewrite under Release 33-9741 made Tier 2 a viable capital-raising channel.
Form 1-A overlaps with several SEC offering and reporting forms, but each differs on dollar cap, eligibility, audit standard, ongoing-reporting consequences, and the legal mechanics of qualification versus registration. The closest neighbors are full Securities Act registrations (S-1, S-3, F-1), other exempt-offering filings (Form D, Form C), and the surrounding Regulation A family (1-K, 1-SA, 1-U, 1-Z, and 253(g) supplements). Form 10 sits adjacent on the Exchange Act side.
Form S-1 / Form S-1/A. Structurally the nearest cousin: both are pre-offering disclosure documents containing a prospectus or offering circular, risk factors, MD&A, business description, use of proceeds, officer and director disclosure, and financials. Differences:
Form S-3. A streamlined shelf registration limited to seasoned issuers meeting public-float and reporting-history tests, supporting incorporation by reference. Form 1-A has no float test and is not available for shelf takedowns; most 1-A issuers cannot incorporate by reference because they are not Exchange Act reporters. S-3 is a follow-on tool for established public companies; 1-A is an entry path for smaller and emerging issuers.
Form F-1. Foreign analog of S-1, with home-country accounting accommodations (IFRS without reconciliation in many cases). Form 1-A is restricted to issuers organized in and principally based in the U.S. or Canada, so the populations rarely intersect.
Form 10. Registers a class of securities under the Exchange Act, triggering 10-K/10-Q/8-K reporting; it does not register an offering. Form 1-A qualifies the offering itself but does not register a class. The two are sometimes paired when a Tier 2 issuer voluntarily files Form 10 or 8-A to obtain a listing. Form 10 creates a reporting company; Form 1-A creates a qualified offering.
These are post-qualification complements to Form 1-A, not substitutes:
A full Reg A lifecycle requires the 1-A (base offering document), 1-A/A (pre-qualification amendments), 253(g) supplements (post-qualification updates), and the 1-K/1-SA/1-U stream until 1-Z exit.
Form D (Regulation D). A brief XML notice filing for Rule 504 ($10M cap) and Rule 506(b)/(c) private placements. Differences from Form 1-A:
Form D and Form 1-A cover overlapping issuer populations (small private companies raising capital) but are not interchangeable: Form D is metadata; Form 1-A is a full disclosure document.
Form C (Regulation Crowdfunding). Section 4(a)(6) exemption capped at $5M per 12 months, conducted exclusively through registered intermediaries (funding portals or broker-dealers). Like Form 1-A, Form C requires structured disclosure including financials and risk factors, but content is narrower, the cap is far lower, secondary trading is restricted for one year, and the issuer population skews earlier-stage. Form C does not preempt state blue-sky in the same Section 18 sense as Tier 2 Reg A (Reg CF has its own preemption mechanic under Section 4(a)(6)). Form C-AR, Form C-U, and Form C-TR provide ongoing/exit reporting analogous to the 1-K/1-U/1-Z stream.
Form 1-A is the only SEC offering document that is (a) qualified rather than registered, (b) capped at $20M (Tier 1) or $75M (Tier 2) per 12 months, (c) able to preempt state blue-sky review under Section 18 (Tier 2 only) while permitting both general solicitation and non-accredited retail participation, and (d) followed by a purpose-built lighter ongoing-reporting regime (1-K/1-SA/1-U) rather than the full 10-K/10-Q/8-K stack. It is narrower in cap and lighter in audit and reporting consequences than S-1/F-1; far richer in narrative and financial content than Form D or Form C; and aimed at a different issuer stage and population than S-3 or Form 10. The 1-A/A amendments, 253(g) supplements, and 1-K/1-SA/1-U/1-Z reports are its true companion datasets, together describing a Reg A issuer from initial qualification through exit.
Form 1-A offering statements are used as a precedent library, market dataset, diligence file, enforcement record, and research corpus. The roles below anchor on different parts of the filing, including offering-circular sections, exhibits, financial statements, metadata, and amendment chains.
Issuer's counsel preparing Tier 1 and Tier 2 offerings use the corpus as a drafting precedent. They pull the "Plan of Distribution," "Use of Proceeds," "Risk Factors," and "Business" sections from comparable issuers to benchmark hedge language and operational detail. Placement-agent counsel compare commission structures, lock-ups, and escrow arrangements in exhibits. The 1-A versus 1-A/A diff is the most useful signal: it reveals which disclosures staff push back on, which drives comment-letter prediction on a new filing.
Corporation Finance reviewers focused on Regulation A use the dataset to prepare comment letters and maintain consistency across the small-issuer pipeline. They focus on financial-statement footnotes, going-concern language, related-party transactions, officer and director disclosures, and material-contract exhibits. Amendment frequency and the substance of corrections feed both individual review and aggregate program assessment.
Capital markets desks build league tables and pricing comps from the dataset. Tier, maximum offering amount, and security-type metadata, combined with cover-page disclosures, set headline economics; the "Plan of Distribution" section and the placement-agent agreement exhibits reveal selling-agent commissions, expense reimbursements, and warrant coverage. Coverage analysts use the pipeline to identify candidates for follow-on raises, PIPEs, or graduation to S-1.
Diligence and onboarding teams at funding portals and Tier 2 placement marketplaces vet prospective issuers against prior 1-A filings, financial statements, bad-actor disclosures, and the exhibit index (organizational documents, escrow agreements, subscription documents). Aggregate metrics on tier mix, sector mix, average raise size, and amendment frequency feed product, pricing, and risk dashboards.
Pre-IPO and growth analysts use Form 1-A as a window into otherwise opaque private-company fundamentals. The financial statements, capitalization disclosures, and prior-round narratives in the offering circular support diligence memos on alternative-asset sponsors and consumer brands raising under Reg A. Use-of-proceeds and risk-factor language help classify a raise as a bridge, Series B substitute, or retail-distribution play.
Corporate development and private-credit underwriters mine the "Business" section, customer and supplier disclosures, property descriptions, and material-contract exhibits for intelligence on targets that do not file 10-Ks. Segment-level disclosures in the offering circular support revenue-mix, gross-margin, and working-capital triangulation in target memos.
CFOs, controllers, general counsel, and compliance officers at Reg A issuers track ongoing obligations and align internal disclosure with peer practice. They reference the exhibit index, qualification-correspondence patterns across amendment chains, bad-actor representations, ongoing reporting commitments, and state blue-sky carve-outs to maintain internal disclosure libraries.
Empirical researchers studying the JOBS Act, retail investor protection, and exempt offerings use the dataset as a primary corpus. Tier selection, offering size, time-to-qualification (from amendment timestamps), industry classification, and outcomes linked to 1-K, 1-SA, 1-U follow-ons or full registration support panel datasets. The offering-circular text supports NLP work on risk-factor evolution, readability, and disclosure tone.
Reporters covering Reg A markets and small-issuer fraud source stories from the offering-circular narrative, named officers and directors, related-party disclosures, and use-of-proceeds language. Amendment history, qualification status, and withdrawal indicators support stories on stalled or pulled deals; aggregate tier, sector, and raise-size metadata supports trend coverage.
Enforcement attorneys, forensic accountants, and plaintiff securities firms generate enforcement leads from the corpus. Forensic accountants compare financial statements across 1-A/A amendments and later 1-K, 1-SA, and 1-U filings to identify restated figures, shifting going-concern language, and emergent related-party transactions. Plaintiff teams mine biographical disclosures and bad-actor representations for inconsistencies; material-contract, escrow, and selling-agent exhibits anchor specific allegations.
Engineering teams at alternative-data vendors and systematic shops parse the JSON metadata and the cover-page XML, extract structured fields (tier, offering amount, security type, dates, filer identifiers), and run document parsing over the HTML and PDF offering circulars to lift financial statements, officer lists, and risk-factor text. Outputs feed entity-resolution layers linking Reg A issuers to later registrations, name changes, and post-qualification reporting, and power models that flag anomalous amendment patterns.
Analysts covering Reg A debt, preferred, and asset-backed offerings underwrite non-traditional credits from the description of indebtedness, indenture and covenant exhibits, collateral descriptions, and sponsor or servicer disclosures. Comparative analysis across peer Reg A debt issuances supports coupon, tenor, redemption, and structural-protection benchmarks.
Teams building retrieval-augmented systems use the corpus as a structured-yet-narrative body of small-issuer text. Each filing pairs JSON metadata and cover-page XML with offering-circular HTML, exhibits, and PDFs, supporting chunking, embedding, and retrieval indexes for small-issuer Q&A, comparable-company lookup, and disclosure-pattern detection.
Capital markets desks at Reg A-focused broker-dealers construct league tables and economics comps. They pull summaryInfo/indicateTier1Tier2Offering, pricePerSecurity, totalAggregateOffering, and the named-service-provider blocks (sales-commissions broker, finders, auditors, legal counsel, with CRD numbers and fees) from primary_doc.xml, then cross-reference selling-agent commissions, expense caps, and warrant coverage in the ex1-*.htm placement-agent agreements. Output is a quarterly Tier 2 comp table by sector and security type, used to set headline economics on new mandates.
Issuer's counsel preparing a Tier 2 filing assemble the qualification history of comparable offerings by joining all records sharing the same entities[].fileNo (024-#####) and ordering them by filedAt. They diff the offering-circular HTML (partiiandiii.htm) and exhibit set across the 1-A and successive 1-A/A records, focusing on revisions to risk factors, going-concern language, related-party transactions, and use-of-proceeds tables. The resulting diff log feeds a comment-letter prediction memo for new drafts.
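The join described here can be sketched as a grouping over already-loaded metadata.json dictionaries. The function names are hypothetical, and the sketch assumes every filedAt value carries a timezone, as the field description states.

```python
from collections import defaultdict
from datetime import datetime


def parse_filed_at(value: str) -> datetime:
    """Parse an ISO-8601 timestamp; a trailing 'Z' is rewritten for older Pythons."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def amendment_chains(records: list[dict]) -> dict[str, list[dict]]:
    """Group metadata.json dicts by Reg A file number, oldest filing first."""
    chains = defaultdict(list)
    for rec in records:
        # A record can list several entities; count each 024- number once.
        file_nos = {
            ent["fileNo"]
            for ent in rec.get("entities", [])
            if str(ent.get("fileNo", "")).startswith("024-")
        }
        for file_no in file_nos:
            chains[file_no].append(rec)
    for chain in chains.values():
        chain.sort(key=lambda r: parse_filed_at(r["filedAt"]))
    return dict(chains)
```

Each resulting list is an ordered sequence of submissions that can be diffed pairwise, first 1-A against first 1-A/A, and so on down the chain.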
Forensic accountants and plaintiff-side analysts pull the <issuerInfo> balance-sheet and income-statement snapshot from primary_doc.xml plus the audited statements embedded in partiiandiii.htm or attached as exhibits, then chain them to later 1-K, 1-SA, and 1-U filings via CIK and offering file number. Restated revenue, shifting going-concern language, and newly disclosed related-party transactions become enforcement and litigation leads, with specific exhibit citations to material contracts and consents (ex11-*.htm, ex6-*.htm).
Onboarding teams at funding portals and Reg A placement marketplaces ingest each new monthly ZIP, filter records where summaryInfo/indicateTier1Tier2Offering is Tier 2 and financialStatementAuditStatus indicates audited GAAP, and rank candidates by totalAggregateOffering, SIC, and stateOfIncorporation. The bad-actor certification in <applicationRule262> and the auditor name from <issuerInfo> feed a pre-screen scorecard; the juridictionSecuritiesOffered block tells sales which states are in scope.
Empirical researchers studying the 2015 and 2021 Reg A+ amendments build panels keyed on accession number and CIK. They extract tier, offering ceiling, audit status, security type, and primary SIC from primary_doc.xml and filedAt from metadata.json; approximate time-to-qualification from the timestamp gap between an initial 1-A and the final 1-A/A in the same 024-##### chain; and link issuers forward to 1-K, 1-SA, 1-U, or full S-1 registrations. The resulting panel supports regressions on amendment count, qualification latency, and post-qualification survival.
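The timestamp-gap measure can be sketched as below for one chain of metadata.json dictionaries sharing a file number. It is only a proxy: the dataset does not carry the qualification date itself, and a 1-A/A may be a post-qualification amendment, so the gap can overstate the true latency. The function name is hypothetical.

```python
from datetime import datetime
from typing import Optional


def days_first_1a_to_last_amendment(chain: list[dict]) -> Optional[float]:
    """Days from the earliest 1-A to the latest 1-A/A; None if not computable."""
    def ts(rec: dict) -> datetime:
        # Rewrite a trailing 'Z' so older Pythons can parse the timestamp.
        return datetime.fromisoformat(rec["filedAt"].replace("Z", "+00:00"))

    initial = [ts(r) for r in chain if r["formType"] == "1-A"]
    amendments = [ts(r) for r in chain if r["formType"] == "1-A/A"]
    if not initial or not amendments:
        return None
    start, end = min(initial), max(amendments)
    if end < start:
        # Amendments dated before the initial filing indicate a broken chain.
        return None
    return (end - start).total_seconds() / 86400
```

Chains that return None (no amendment, or no initial filing inside the coverage window) are better excluded from latency regressions than imputed.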
LLM application teams chunk and embed the offering-circular HTML and exhibit set after stripping the SGML <DOCUMENT>/<TEXT> wrapper from each .htm. Chunks are tagged with accessionNo, exhibit <TYPE> and <DESCRIPTION> from the SGML preamble, Item 17 category, CIK, SIC, and tier. The index powers comparable-issuer Q&A ("show risk-factor language from Tier 2 consumer-brand issuers raising more than $20M") and disclosure-pattern lookup that conventional EDGAR search cannot answer.
Disclosure counsel and platform legal teams build a clause library by harvesting Item 17 category 4 (subscription agreements) and category 8 (escrow agreements) exhibits across the corpus. They identify the right files via the <TYPE>/<DESCRIPTION> SGML preamble paired with documentFormatFiles[] entries (filename heuristics are unreliable due to mixed naming schemes), then cluster by issuer SIC and security class to surface standard versus aggressive terms on minimum investment, refund mechanics, escrow agent identity, and break conditions.
Corporate development teams at strategic acquirers mine Form 1-A as a fundamentals source on private targets that do not file 10-Ks. They extract the "Description of Business," "Description of Property," MD&A, and customer/supplier disclosures from partiiandiii.htm, plus material-contract exhibits in Item 17 category 6, to triangulate revenue mix, segment economics, working capital, and key contractual relationships. Output is a target memo built from primary disclosure rather than vendor-summarized data.
Dataset Index JSON API: https://api.sec-api.io/datasets/form-1a-files.json
Returns dataset-level metadata (name, description, last updated timestamp, earliest sample date, total record and size counts, covered form types, container format, and file types) along with a list of all individual container files. Each container entry includes its key, size, records, updatedAt, and a downloadUrl. Use this endpoint to monitor which containers were updated in the most recent refresh and to decide which files to download incrementally. This endpoint does not require an API key.
Example response:
{
  "datasetId": "1f13365b-9ae0-6939-bf28-d0916cc9b4df",
  "datasetDownloadUrl": "https://api.sec-api.io/datasets/form-1a-files.zip",
  "name": "Form 1-A Files Dataset",
  "updatedAt": "2026-04-25T02:56:39.138Z",
  "earliestSampleDate": "2002-01-01",
  "totalRecords": 56805,
  "totalSize": 2593830705,
  "formTypes": ["1-A", "1-A/A"],
  "containerFormat": "ZIP",
  "fileTypes": ["XML", "HTML", "JSON", "TXT", "PDF"],
  "containers": [
    {
      "downloadUrl": "https://api.sec-api.io/datasets/form-1a-files/2026/2026-04.zip",
      "key": "2026/2026-04.zip",
      "size": 13818783,
      "records": 154,
      "updatedAt": "2026-04-25T02:56:39.138Z"
    }
  ]
}
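For incremental refreshes, a small Python sketch like the following could fetch the index and keep only containers updated after a cutoff. It uses only the fields shown in the example response; the function names are illustrative, and the string comparison assumes every `updatedAt` value uses the same UTC ISO-8601 format as the example.

```python
import json
from urllib.request import urlopen

INDEX_URL = "https://api.sec-api.io/datasets/form-1a-files.json"


def fetch_index(url=INDEX_URL):
    """Download and parse the dataset index (no API key needed)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def containers_updated_since(index, cutoff_iso):
    """Return container entries whose updatedAt is later than cutoff_iso.

    Timestamps in one fixed UTC ISO-8601 format sort correctly as plain
    strings, so no date parsing is required here.
    """
    return [c for c in index["containers"] if c["updatedAt"] > cutoff_iso]
```

A typical run would call `fetch_index()` once, pass the result to `containers_updated_since` with the timestamp of the previous sync, and download only the returned entries.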
Download Entire Dataset: https://api.sec-api.io/datasets/form-1a-files.zip?token=YOUR_API_KEY
Downloads the complete dataset as a single ZIP archive containing all Form 1-A and 1-A/A filings from January 2002 to present. This endpoint requires an API key.
Download Single Container: https://api.sec-api.io/datasets/form-1a-files/2026/2026-04.zip?token=YOUR_API_KEY
Downloads one monthly ZIP container instead of the full archive. Use the downloadUrl values from the index JSON's containers array to fetch specific months. This endpoint requires an API key.
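A container download might look like the sketch below, which appends the `token` query parameter to a `downloadUrl` taken from the index and streams the response to disk. The helper names and the timeout value are illustrative choices.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen


def with_token(download_url, api_key):
    """Append the API key as the `token` query parameter."""
    separator = "&" if "?" in download_url else "?"
    return f"{download_url}{separator}{urlencode({'token': api_key})}"


def download_container(download_url, api_key, dest_path):
    """Stream one monthly ZIP container to dest_path and return that path."""
    url = with_token(download_url, api_key)
    with urlopen(url, timeout=120) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)  # copies in chunks, not all in memory
    return dest_path
```

For example, `download_container(entry["downloadUrl"], "YOUR_API_KEY", "2026-04.zip")` would save the April 2026 container next to the script.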
The dataset covers Form 1-A (initial Regulation A offering statements) and Form 1-A/A (pre-qualification and post-qualification amendments). Both form types use the same underlying schema, and an amendment carries the same offering file number (024-#####) as the statement it amends. Amendments do not replace prior filings on disk; each is its own record under its own accession number.
One record is a single EDGAR offering-statement submission identified by its 18-digit accession number. It is the full bundle EDGAR received: the dataset's metadata.json header, the structured Form 1-A cover-page XML (primary_doc.xml), the XSL-rendered cover view, the narrative offering circular (partiiandiii.htm or a filer-slug variant), and every text-bearing exhibit attached to the submission.
The filer is always the issuer itself, seeking to qualify a securities offering under Regulation A before any sales occur. Under Rule 251(b), eligible issuers must be organized in, and have their principal place of business in, the United States or Canada; investment companies, BDCs, blank-check companies, mineral-rights issuers, asset-backed issuers, and Rule 262 bad-actor disqualified issuers are categorically excluded.
Tier 1 covers offerings up to $20 million in any rolling 12 months, requires no audit, and remains subject to state blue-sky review. Tier 2 covers offerings up to $75 million, requires audited U.S. GAAP financials, preempts state registration under Section 18 of the Securities Act, and triggers ongoing reporting on Forms 1-K (annual), 1-SA (semiannual), 1-U (current), and 1-Z (exit). The election is captured in summaryInfo/indicateTier1Tier2Offering in primary_doc.xml.
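Reading the tier election from primary_doc.xml can be sketched with the standard library alone. The example matches elements by local name so that it works whether or not the file declares an XML namespace; the namespace handling and the sample value in the usage note are assumptions, and only the `summaryInfo/indicateTier1Tier2Offering` path comes from the text above.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def read_tier(xml_data):
    """Return the text of summaryInfo/indicateTier1Tier2Offering, or None.

    `xml_data` may be str or bytes. Matching on local names avoids
    hard-coding a namespace URI, which this sketch does not assume.
    """
    root = ET.fromstring(xml_data)
    for summary in root.iter():
        if _local(summary.tag) != "summaryInfo":
            continue
        for child in summary.iter():
            if _local(child.tag) == "indicateTier1Tier2Offering":
                return (child.text or "").strip() or None
    return None
```

Calling `read_tier(open("primary_doc.xml", "rb").read())` returns whatever string the filer's XML carries for the election; inspect a few records to learn the exact values before branching on them.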
Electronic Form 1-A coverage on EDGAR begins in January 2002 and runs to the present, refreshed monthly. The dataset spans the pre-2015 legacy Reg A regime, the June 19, 2015 Reg A+ rewrite (Release 33-9741) that established the two-tier structure, and the 2021 amendment (Release 33-10884) that raised the Tier 2 cap from $50 million to $75 million. Paper-era Reg A filings predating EDGAR are not included.
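When building panels across these regimes, it can help to label each filing by the rules in force on its filing date. The sketch below uses the June 19, 2015 date given above; the March 15, 2021 effective date for the $75 million cap is an assumption not stated in this document and should be confirmed against Release 33-10884 before use.

```python
from datetime import date

REG_A_PLUS_EFFECTIVE = date(2015, 6, 19)
# ASSUMPTION: effective date of the $75M Tier 2 cap; verify before use.
TIER2_75M_EFFECTIVE = date(2021, 3, 15)


def regime_for(filed_on):
    """Label a filing date with the Regulation A regime in force."""
    if filed_on < REG_A_PLUS_EFFECTIVE:
        return "legacy Reg A"
    if filed_on < TIER2_75M_EFFECTIVE:
        return "Reg A+ ($50M Tier 2 cap)"
    return "Reg A+ ($75M Tier 2 cap)"
```

The label is based on filing date only; an offering filed under one regime and amended under the next will show different labels on its 1-A and 1-A/A records.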
Each record folder contains JSON (metadata.json), XML (the EDGAR primary_doc.xml under the oneafiler schema), and HTML (the XSL-rendered cover page and every offering-circular and exhibit file). PDFs appear occasionally as per-document exhibit alternatives. Every .htm except the XSL render is SGML-wrapped — the actual HTML payload is nested inside a <DOCUMENT>/<TEXT> envelope that must be stripped before HTML parsing. Image files referenced by filings are excluded from the ZIP and must be fetched from sec.gov via the documentUrl in metadata.json.
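To get an overview of what a downloaded container holds, a sketch like the following groups ZIP entries by their parent folder and file extension. It assumes each record occupies its own folder inside the archive, as the description above suggests; the exact folder naming is not assumed.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def files_by_record(zip_source):
    """Map each folder in the ZIP to {extension: [file names]}.

    `zip_source` is a path or a binary file-like object. ZIP entry
    names always use forward slashes, hence PurePosixPath.
    """
    records = defaultdict(lambda: defaultdict(list))
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            path = PurePosixPath(info.filename)
            ext = path.suffix.lower().lstrip(".") or "(none)"
            records[str(path.parent)][ext].append(path.name)
    return {folder: dict(exts) for folder, exts in records.items()}
```

Records whose listing lacks a `json` or `xml` entry are worth flagging, since metadata.json and primary_doc.xml are described as part of every record.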
Form 1-A is qualified rather than registered, capped at $20M (Tier 1) or $75M (Tier 2) per 12 months, and followed by the lighter 1-K/1-SA/1-U reporting regime rather than the full 10-K/10-Q/8-K stack. Compared with Form S-1, it has a much lower offering cap and lighter audit and reporting consequences. Compared with Form D, it is a full disclosure document rather than a brief notice. Compared with Form C (Reg CF), which is capped at $5M and conducted exclusively through registered intermediaries, it permits substantially larger raises with broader retail participation.
Fundamental post-qualification changes — new security type or terms, new selling securityholders, changed plan of distribution, or material business change — require a full Form 1-A/A and re-qualification before sales resume. Non-fundamental updates (pricing within permitted bounds, minor disclosure updates) go through an offering-circular supplement under Rule 253(g) and appear under separate EDGAR types 253G1 through 253G4, which are not part of this dataset.