sec-api.ioSEC API by D2V
FilingsPricingSandboxDocs
Log inGet Free API Key
API Documentation
Introduction

Filing Query API
Full-Text Search API
Stream API
Download & PDF Generator API
XBRL-to-JSON Converter 
Extractor API 

Form ADV API - Investment Advisers

Form 3/4/5 API - Insider Trading
Form 144 API - Restricted Sales
Form 13F API - Institut. Holdings
Form 13D/13G API - Activist Invst.
Form N-PORT API - Mutual Funds

Form N-CEN API - Annual Reports
Form N-PX API - Proxy Voting

Form S-1/424B4 API - IPOs, Notes
Form C API - Crowdfunding
Form D API - Private Sec. Offerings
Form 1-A/1-K/1-Z - Reg A Offerings
Overview
Example: Python

Form 8-K API - Item 4.01
Form 8-K API - Item 4.02
Form 8-K API - Item 5.02

Executive Compensation API
Directors & Board Members
Company Subsidiaries
Outstanding Shares & Public Float

SEC Enforcement Actions
SEC Litigation Releases
SEC Administrative Proceedings
AAER Database API
SRO Filings Database

CIK, CUSIP, Ticker Mapping API
EDGAR Entities Database

Financial Statements

Analysis of Regulation A Offering Statement Disclosures

Open In Colab   Download Notebook

On this page:
  • Quick Start
    • Download Dataset
      • Analyzing Data
        • Offering Tier
          • Offering amounts
            • Form 1-K Annual Report Dataset
              • Form 1-Z Exit Reports

                This notebook demonstrates an exploratory data analysis examining offering statement disclosures filed under Regulation A in SEC Form 1-A, as well as subsequent updates submitted via SEC Forms 1-K and 1-Z.

                Since 2015, companies have submitted Part I of these disclosures, containing notification details, in XML format. Leveraging our Regulation A Offering Statement API, we transform these disclosures into a standardized JSON format, facilitating comprehensive and efficient analysis.

                Our analysis addresses several critical dimensions:

                • Temporal trends in the number of Regulation A disclosures from 2015 to 2024, segmented by quarter, month, and intraday timing (pre-market, regular market hours, after-market).
                • Distribution patterns across structured data fields, including the proportion of disclosures categorized by specific form types.
                • Analysis of the offered amounts, including their distribution and temporal evolution.
                • Progress tracking of offering campaigns over the reported years.
                • Final aggregated metrics, including total securities sold and the corresponding completion percentages at campaign conclusion.
                • Analysis of fee percentages relative to the amounts raised during offering campaigns.

                Quick Start

                To quickly retrieve data for a specific company, modify the following example as needed. For more detail, see Regulation A Offering Statement API and sec-api-python package readme.

                %pip install sec_api # use %pip for reliable install in current environment
                # NOTE: Replace with your own API key
                API_KEY_SEC_API = "YOUR_API_KEY"
                from sec_api import RegASearchAllApi
                import json

                searchApi = RegASearchAllApi(api_key=API_KEY_SEC_API)

                search_params = {
                    "query": "cik:1061040",
                    "from": "0",
                    "size": "1",
                    "sort": [{"filedAt": {"order": "desc"}}],
                }

                # get C filing metadata: issuer background, offering details,
                # financial information, and more
                response = searchApi.get_data(search_params)
                form_1a_filing = response["data"]

                print(json.dumps(form_1a_filing, indent=2))
                {
                  "id": "d9e400922a251c022bcebf0cc1c2f632",
                  "accessionNo": "0001477932-25-001240",
                  "fileNo": "024-12580",
                  "formType": "1-A",
                  "filedAt": "2025-02-25T06:47:00-05:00",
                  "cik": "1061040",
                  "ticker": "",
                  "companyName": "New Generation Consumer Group, Inc.",
                  "employeesInfo": [
                    {
                      "issuerName": "New Generation Consumer Group Inc.",
                      "jurisdictionOrganization": "DE",
                      "yearIncorporation": "1998",
                      "cik": "0001061040",
                      "sicCode": 7371,
                      "irsNum": "11-3118271",
                      "fullTimeEmployees": 1,
                      "partTimeEmployees": 0
                    }
                  ],
                  "issuerInfo": {
                    "street1": "7950 E. Redfield Rd, Unit 210",
                    "city": "Scottsdale",
                    "stateOrCountry": "AZ",
                    "zipCode": "85260",
                    "phoneNumber": "480-755-0591",
                    "connectionName": "Eric Newlan",
                    "industryGroup": "Other",
                    "cashEquivalents": 0,
                    "investmentSecurities": 0,
                    "accountsReceivable": 0,
                    "propertyPlantEquipment": 0,
                    "totalAssets": 100000,
                    "accountsPayable": 27724,
                    "longTermDebt": 0,
                    "totalLiabilities": 27724,
                    "totalStockholderEquity": 72276,
                    "totalLiabilitiesAndEquity": 100000,
                    "totalRevenues": 0,
                    "costAndExpensesApplToRevenues": 0,
                    "depreciationAndAmortization": 0,
                    "netIncome": 0,
                    "earningsPerShareBasic": 0,
                    "earningsPerShareDiluted": 0
                  },
                  "commonEquity": [
                    {
                      "commonEquityClassName": "Common Stock",
                      "outstandingCommonEquity": 1786672777,
                      "commonCusipEquity": "584976302",
                      "publiclyTradedCommonEquity": "OTC Pink"
                    }
                  ],
                  "preferredEquity": [
                    {
                      "preferredEquityClassName": "Series A-2 Preferred Stock",
                      "outstandingPreferredEquity": 1000000,
                      "preferredCusipEquity": "0000000",
                      "publiclyTradedPreferredEquity": "N/A"
                    }
                  ],
                  "debtSecurities": [
                    {
                      "debtSecuritiesClassName": "None",
                      "outstandingDebtSecurities": 0,
                      "cusipDebtSecurities": "0000000",
                      "publiclyTradedDebtSecurities": "N/A"
                    }
                  ],
                  "issuerEligibility": {
                    "certifyIfTrue": true
                  },
                  "applicationRule262": {
                    "certifyIfNotDisqualified": true,
                    "certifyIfBadActor": false
                  },
                  "summaryInfo": {
                    "indicateTier1Tier2Offering": "Tier1",
                    "financialStatementAuditStatus": "Unaudited",
                    "securitiesOfferedTypes": [
                      "Equity (common or preferred stock)"
                    ],
                    "offerDelayedContinuousFlag": true,
                    "offeringYearFlag": false,
                    "offeringAfterQualifFlag": true,
                    "offeringBestEffortsFlag": true,
                    "solicitationProposedOfferingFlag": false,
                    "resaleSecuritiesAffiliatesFlag": false,
                    "securitiesOffered": 2000000000,
                    "outstandingSecurities": 1821272777,
                    "pricePerSecurity": 0.0005,
                    "issuerAggregateOffering": 1000000,
                    "securityHolderAggegate": 0,
                    "qualificationOfferingAggregate": 0,
                    "concurrentOfferingAggregate": 0,
                    "totalAggregateOffering": 1000000,
                    "underwritersFees": 0,
                    "salesCommissionsServiceProviderFees": 0,
                    "finderFeesFee": 0,
                    "auditorFees": 0,
                    "legalServiceProviderName": "Newlan Law Firm, PLLC",
                    "legalFees": 7500,
                    "promotersFees": 0,
                    "blueSkyServiceProviderName": "State Regulators",
                    "blueSkyFees": 2500,
                    "estimatedNetAmount": 990000
                  },
                  "juridictionSecuritiesOffered": {
                    "jurisdictionsOfSecOfferedNone": true,
                    "issueJuridicationSecuritiesOffering": [
                      "AK",
                      "AL",
                      "AR",
                      "AZ",
                      "CA",
                      "CO",
                      "CT",
                      "DC",
                      "DC",
                      "DE",
                      "FL",
                      "GA",
                      "HI",
                      "IA",
                      "ID",
                      "IL",
                      "IN",
                      "KS",
                      "KY",
                      "LA",
                      "MA",
                      "MD",
                      "ME",
                      "MI",
                      "MN",
                      "MO",
                      "MS",
                      "MT",
                      "NC",
                      "ND",
                      "NE",
                      "NH",
                      "NJ",
                      "NM",
                      "NV",
                      "NY",
                      "OH",
                      "OK",
                      "OR",
                      "PA",
                      "RI",
                      "SC",
                      "SD",
                      "TN",
                      "TX",
                      "UT",
                      "VA",
                      "VT",
                      "WA",
                      "WI",
                      "WV",
                      "WY",
                      "PR"
                    ]
                  },
                  "unregisteredSecurities": {
                    "ifUnregsiteredNone": false
                  },
                  "securitiesIssued": [
                    {
                      "securitiesIssuerName": "New Generation Consumer Group Inc.",
                      "securitiesIssuerTitle": "COMMON",
                      "securitiesIssuedTotalAmount": 500000000,
                      "securitiesPrincipalHolderAmount": 0,
                      "securitiesIssuedAggregateAmount": "$50000; TERMS OF CONTRACTS; DETERMINATION OF BOARD OF DIRECTORS"
                    }
                  ],
                  "unregisteredSecuritiesAct": {
                    "securitiesActExcemption": "Section 4(a)(2) of the Securities Act of 1933, as amended"
                  }
                }

                Download Dataset

                To load and prepare the dataset of over 10,000 offering statement filings from Forms 1-A, 1-K and 1-Z in since June 2015, we utilize the Search endpoint of the Regulation A Offering Statements API. This endpoint can retrieve the filings of all three subtypes. The following code handles data loading and preparation by executing multiple download processes in parallel, significantly reducing downloading time.

                Once downloaded, all data objects are saved in JSONL format to ./reg-a-dataset.jsonl.

                Downloading the data may take several minutes.

                import sys
                import os
                import time
                import random

                # from multiprocessing import Pool # use in .py files only
                from concurrent.futures import ThreadPoolExecutor

                YEARS = range(2025, 2014, -1) # from 2025 to 2015
                TEMP_FILE_TEMPLATE = "./temp_file_reg_a_{}.jsonl"
                TARGET_FILE = "./reg-a-dataset.jsonl"


                def process_year(year):
                    backoff_time = random.randint(10, 800) / 1000
                    print(f"Starting year {year} with backoff time {backoff_time:,}s")
                    sys.stdout.flush()
                    time.sleep(backoff_time)

                    tmp_filename = TEMP_FILE_TEMPLATE.format(year)
                    tmp_file = open(tmp_filename, "a")

                    for month in range(12, 0, -1):
                        search_from = 0
                        month_counter = 0

                        while True:
                            query = f"filedAt:[{year}-{month:02d}-01 TO {year}-{month:02d}-31]"
                            searchRequest = {
                                "query": query,
                                "from": search_from,
                                "size": "50",
                                "sort": [{"filedAt": {"order": "desc"}}],
                            }

                            response = None
                            try:
                                response = searchApi.get_data(searchRequest)
                            except Exception as e:
                                print(f"{year}-{month:02d} error: {e}")
                                sys.stdout.flush()
                                continue

                            if response == None or len(response["data"]) == 0:
                                break

                            search_from += 50
                            month_counter += len(response["data"])
                            jsonl_data = "\n".join([json.dumps(entry) for entry in response["data"]])
                            tmp_file.write(jsonl_data + "\n")

                        print(f"Finished loading {month_counter} filings for {year}-{month:02d}")
                        sys.stdout.flush()

                    tmp_file.close()

                    return year


                if not os.path.exists(TARGET_FILE):
                    with ThreadPoolExecutor(max_workers=4) as pool:
                        processed_years = list(pool.map(process_year, YEARS))
                    print("Finished processing all years.", processed_years)

                    # merge the temporary files into one final file
                    with open(TARGET_FILE, "a") as outfile:
                        for year in YEARS:
                            temp_file = TEMP_FILE_TEMPLATE.format(year)
                            if os.path.exists(temp_file):
                                with open(temp_file, "r") as infile:
                                    outfile.write(infile.read())
                else:
                    print("File already exists. Skipping download.")
                File already exists. Skipping download.

                Analyzing Data

                # install all dependencies required for the notebook
                # %pip install pandas numpy matplotlib seaborn
                import pandas as pd
                import numpy as np
                import matplotlib.pyplot as plt
                import matplotlib.style as style
                import matplotlib.ticker as mtick
                import seaborn as sns

                style.use("default")

                params = {
                    "axes.labelsize": 8,
                    "font.size": 8,
                    "legend.fontsize": 8,
                    "xtick.labelsize": 8,
                    "ytick.labelsize": 8,
                    "font.family": "sans-serif",
                    "axes.spines.top": False,
                    "axes.spines.right": False,
                    "grid.color": "grey",
                    "axes.grid": True,
                    "axes.grid.axis": "y",
                    "grid.alpha": 0.5,
                    "grid.linestyle": ":",
                }

                plt.rcParams.update(params)

                form_name = "Reg A"
                form_name_escaped = "reg-a"
                TARGET_FILE = "./REG-A-dataset.jsonl"
                structured_data = pd.read_json(TARGET_FILE, lines=True)
                structured_data = pd.json_normalize(structured_data.to_dict(orient="records"))

                structured_data["filedAt"] = pd.to_datetime(structured_data["filedAt"], utc=True)
                structured_data["filedAt"] = structured_data["filedAt"].dt.tz_convert("US/Eastern")
                structured_data = structured_data.sort_values("filedAt", ascending=True).reset_index(
                    drop=True
                )
                structured_data.drop_duplicates("accessionNo", keep="first", inplace=True)
                structured_data["year"] = structured_data["filedAt"].dt.year
                structured_data["month"] = structured_data["filedAt"].dt.month
                structured_data["qtr"] = structured_data["month"].apply(lambda x: (x - 1) // 3 + 1)
                structured_data["dayOfWeek"] = structured_data["filedAt"].dt.day_name()
                # filedAtClass: preMarket (4:00AM-9:30AM), regularMarket (9:30AM-4:00PM), afterMarket (4:00PM-8:00PM)
                structured_data["filedAtClass"] = structured_data["filedAt"].apply(
                    lambda x: (
                        "preMarket"
                        if x.hour < 9 or (x.hour == 9 and x.minute < 30)
                        else (
                            "regularMarket"
                            if x.hour < 16
                            else "afterMarket" if x.hour < 20 else "other"
                        )
                    )
                )

                structured_data.head()

                unique_years = structured_data["year"].nunique()
                unique_companies = structured_data["cik"].nunique()
                unique_filings = structured_data["accessionNo"].nunique()
                min_year = structured_data["year"].min()
                max_year = structured_data["year"].max()
                max_year_full = max_year - 1 # to avoid incomplete data for the current year
                print("Loaded dataframe with main documents of Regulation A Offering Statement filings")
                print(f"Number of filings: {unique_filings:,}")
                print(f"Number of records: {len(structured_data):,}")
                print(f"Number of years: {unique_years:,} ({min_year}-{max_year})")
                print(f"Number of unique companies: {unique_companies:,}")

                structured_data.head()
                Loaded dataframe with main documents of Regulation A Offering Statement filings
                Number of filings: 11,411
                Number of records: 11,411
                Number of years: 11 (2015-2025)
                Number of unique companies: 1,696
                Out[7]:
                idaccessionNofileNoformTypefiledAtciktickercompanyNameemployeesInfocommonEquity...issuerInfo.depositsissuerInfo.totalInterestIncomeissuerInfo.totalInterestExpensesissuerInfo.totalInvestmentsissuerInfo.policyLiabilitiesAndAccrualsyearmonthqtrdayOfWeekfiledAtClass
                0aa6c5e9327a19e1db80556a577e07dc10001645471-15-000001024-104551-A2015-06-22 15:19:25-04:001645471Tuscan Gardens Secured Income Fund LLC[{'issuerName': 'Tuscan Gardens Secured Income...[{'commonEquityClassName': '0', 'outstandingCo......NaNNaNNaNNaNNaN201562MondayregularMarket
                1747b503117a537e5c7729301c33ddeec0001587999-15-000003024-104561-A2015-06-23 09:56:37-04:001587999Southern Tier Region Rural Broadband Company, ...[{'issuerName': 'Southern Tier Region Rural Br...[{'commonEquityClassName': 'common voting', 'o......NaNNaNNaNNaNNaN201562TuesdayregularMarket
                26476f78a7b89dca2c0dc0c456b2be5380001587999-15-000004024-104571-A2015-06-23 10:40:37-04:001579586Finger Lakes Region Rural Broadband Company, Inc.[{'issuerName': 'Finger Lakes Region Rural Bro...[{'commonEquityClassName': 'common voting', 'o......NaNNaNNaNNaNNaN201562TuesdayregularMarket
                3ff589f8779fddb23498dcada5d0b75ac0001587999-15-000005024-104581-A2015-06-23 11:49:12-04:001640170Western Gateway Region Rural Broadband Company...[{'issuerName': 'Western Gateway Region Rural ...[{'commonEquityClassName': 'common voting', 'o......NaNNaNNaNNaNNaN201562TuesdayregularMarket
                44aa98853a21d4de1f9550dbef5cf76430001587999-15-000007024-104591-A2015-06-23 12:34:20-04:001644516Mid-Hudson Broadband Co[{'issuerName': 'Mid-Hudson Broadband Co', 'ju...[{'commonEquityClassName': 'common voting', 'o......NaNNaNNaNNaNNaN201562TuesdayregularMarket

                5 rows × 121 columns

                structured_data.info()
                <class 'pandas.core.frame.DataFrame'>
                Index: 11411 entries, 0 to 11435
                Columns: 121 entries, id to filedAtClass
                dtypes: datetime64[ns, US/Eastern](1), float64(47), int32(2), int64(2), object(69)
                memory usage: 10.5+ MB
                for s in structured_data.columns:
                    print(f"{s}: {structured_data[s].notna().size} non-nan values")
                id: 11411 non-nan values
                accessionNo: 11411 non-nan values
                fileNo: 11411 non-nan values
                formType: 11411 non-nan values
                filedAt: 11411 non-nan values
                cik: 11411 non-nan values
                ticker: 11411 non-nan values
                companyName: 11411 non-nan values
                employeesInfo: 11411 non-nan values
                commonEquity: 11411 non-nan values
                preferredEquity: 11411 non-nan values
                debtSecurities: 11411 non-nan values
                securitiesIssued: 11411 non-nan values
                unregisteredSecurities: 11411 non-nan values
                periodOfReport: 11411 non-nan values
                item1: 11411 non-nan values
                item1Info: 11411 non-nan values
                item2: 11411 non-nan values
                certificationSuspension: 11411 non-nan values
                signatureTab: 11411 non-nan values
                summaryInfoOffering: 11411 non-nan values
                issuerInfo.street1: 11411 non-nan values
                issuerInfo.street2: 11411 non-nan values
                issuerInfo.city: 11411 non-nan values
                issuerInfo.stateOrCountry: 11411 non-nan values
                issuerInfo.zipCode: 11411 non-nan values
                issuerInfo.phoneNumber: 11411 non-nan values
                issuerInfo.connectionName: 11411 non-nan values
                issuerInfo.industryGroup: 11411 non-nan values
                issuerInfo.cashEquivalents: 11411 non-nan values
                issuerInfo.investmentSecurities: 11411 non-nan values
                issuerInfo.accountsReceivable: 11411 non-nan values
                issuerInfo.propertyPlantEquipment: 11411 non-nan values
                issuerInfo.totalAssets: 11411 non-nan values
                issuerInfo.accountsPayable: 11411 non-nan values
                issuerInfo.longTermDebt: 11411 non-nan values
                issuerInfo.totalLiabilities: 11411 non-nan values
                issuerInfo.totalStockholderEquity: 11411 non-nan values
                issuerInfo.totalLiabilitiesAndEquity: 11411 non-nan values
                issuerInfo.totalRevenues: 11411 non-nan values
                issuerInfo.costAndExpensesApplToRevenues: 11411 non-nan values
                issuerInfo.depreciationAndAmortization: 11411 non-nan values
                issuerInfo.netIncome: 11411 non-nan values
                issuerInfo.earningsPerShareBasic: 11411 non-nan values
                issuerInfo.earningsPerShareDiluted: 11411 non-nan values
                issuerInfo.nameAuditor: 11411 non-nan values
                issuerEligibility.certifyIfTrue: 11411 non-nan values
                applicationRule262.certifyIfNotDisqualified: 11411 non-nan values
                summaryInfo.indicateTier1Tier2Offering: 11411 non-nan values
                summaryInfo.financialStatementAuditStatus: 11411 non-nan values
                summaryInfo.securitiesOfferedTypes: 11411 non-nan values
                summaryInfo.offerDelayedContinuousFlag: 11411 non-nan values
                summaryInfo.offeringYearFlag: 11411 non-nan values
                summaryInfo.offeringAfterQualifFlag: 11411 non-nan values
                summaryInfo.offeringBestEffortsFlag: 11411 non-nan values
                summaryInfo.solicitationProposedOfferingFlag: 11411 non-nan values
                summaryInfo.resaleSecuritiesAffiliatesFlag: 11411 non-nan values
                summaryInfo.securitiesOffered: 11411 non-nan values
                summaryInfo.outstandingSecurities: 11411 non-nan values
                summaryInfo.pricePerSecurity: 11411 non-nan values
                summaryInfo.issuerAggregateOffering: 11411 non-nan values
                summaryInfo.securityHolderAggegate: 11411 non-nan values
                summaryInfo.qualificationOfferingAggregate: 11411 non-nan values
                summaryInfo.concurrentOfferingAggregate: 11411 non-nan values
                summaryInfo.totalAggregateOffering: 11411 non-nan values
                summaryInfo.underwritersServiceProviderName: 11411 non-nan values
                summaryInfo.underwritersFees: 11411 non-nan values
                summaryInfo.salesCommissionsServiceProviderName: 11411 non-nan values
                summaryInfo.salesCommissionsServiceProviderFees: 11411 non-nan values
                summaryInfo.findersFeesServiceProviderName: 11411 non-nan values
                summaryInfo.finderFeesFee: 11411 non-nan values
                summaryInfo.auditorServiceProviderName: 11411 non-nan values
                summaryInfo.auditorFees: 11411 non-nan values
                summaryInfo.legalServiceProviderName: 11411 non-nan values
                summaryInfo.legalFees: 11411 non-nan values
                summaryInfo.promotersServiceProviderName: 11411 non-nan values
                summaryInfo.promotersFees: 11411 non-nan values
                summaryInfo.blueSkyServiceProviderName: 11411 non-nan values
                summaryInfo.blueSkyFees: 11411 non-nan values
                summaryInfo.estimatedNetAmount: 11411 non-nan values
                summaryInfo.clarificationResponses: 11411 non-nan values
                juridictionSecuritiesOffered.jurisdictionsOfSecOfferedNone: 11411 non-nan values
                juridictionSecuritiesOffered.issueJuridicationSecuritiesOffering: 11411 non-nan values
                unregisteredSecuritiesAct.securitiesActExcemption: 11411 non-nan values
                unregisteredSecuritiesAct: 11411 non-nan values
                applicationRule262.certifyIfBadActor: 11411 non-nan values
                juridictionSecuritiesOffered.jurisdictionsOfSecOfferedSame: 11411 non-nan values
                unregisteredSecurities.ifUnregsiteredNone: 11411 non-nan values
                juridictionSecuritiesOffered.dealersJuridicationSecuritiesOffering: 11411 non-nan values
                summaryInfo.brokerDealerCrdNumber: 11411 non-nan values
                summaryInfo.securitiesOfferedOtherDesc: 11411 non-nan values
                issuerInfo: 11411 non-nan values
                issuerEligibility: 11411 non-nan values
                applicationRule262: 11411 non-nan values
                summaryInfo: 11411 non-nan values
                juridictionSecuritiesOffered: 11411 non-nan values
                item1.formIndication: 11411 non-nan values
                item1.fiscalYearEnd: 11411 non-nan values
                item1.street1: 11411 non-nan values
                item1.city: 11411 non-nan values
                item1.stateOrCountry: 11411 non-nan values
                item1.zipCode: 11411 non-nan values
                item1.phoneNumber: 11411 non-nan values
                item1.issuedSecuritiesTitle: 11411 non-nan values
                item2.regArule257: 11411 non-nan values
                item1.street2: 11411 non-nan values
                item1.issuerName: 11411 non-nan values
                item1.phone: 11411 non-nan values
                item1.commissionFileNumber: 11411 non-nan values
                issuerInfo.loans: 11411 non-nan values
                issuerInfo.propertyAndEquipment: 11411 non-nan values
                issuerInfo.deposits: 11411 non-nan values
                issuerInfo.totalInterestIncome: 11411 non-nan values
                issuerInfo.totalInterestExpenses: 11411 non-nan values
                issuerInfo.totalInvestments: 11411 non-nan values
                issuerInfo.policyLiabilitiesAndAccruals: 11411 non-nan values
                year: 11411 non-nan values
                month: 11411 non-nan values
                qtr: 11411 non-nan values
                dayOfWeek: 11411 non-nan values
                filedAtClass: 11411 non-nan values
                structured_data_full_years = structured_data[
                    structured_data["year"].between(min_year, max_year - 1)
                ]
                def plot_timeseries(ts, title):
                    fig, ax = plt.subplots(figsize=(4, 2.5))
                    ts["count"].plot(ax=ax, legend=False)
                    ax.set_title(title)
                    ax.set_xlabel("Year")
                    ax.set_ylabel(f"Number of\n{form_name} Filings")
                    ax.set_xticks(np.arange(min_year, max_year, 1))
                    ax.yaxis.set_major_formatter(mtick.StrMethodFormatter("{x:,.0f}"))
                    ax.set_xlim(min_year - 1, max_year)
                    ax.grid(axis="x")
                    ax.set_axisbelow(True)
                    plt.xticks(rotation=45, ha="right")

                    for year in range(min_year, max_year, 1):
                        year_y_max = ts.loc[year, "count"]
                        ax.vlines(year, 0, year_y_max, linestyles=":", colors="grey", alpha=0.5, lw=1)

                    plt.tight_layout()
                    plt.show()


                reg_a_filings = (
                    structured_data_full_years.drop_duplicates(subset=["accessionNo"])
                    .groupby(["year"])
                    .size()
                    .to_frame(name="count")
                )

                plot_timeseries(
                    reg_a_filings,
                    title=f"{form_name} Filings per Year ({min_year} - {max_year_full})",
                )
                count_formType = (
                    structured_data_full_years.drop_duplicates(subset=["accessionNo"])
                    .groupby(["formType"])
                    .size()
                    .sort_values(ascending=False)
                    .to_frame(name="Count")
                ).rename_axis("Submission Type")
                count_formType["Pct"] = (
                    count_formType["Count"].astype(int) / count_formType["Count"].astype(int).sum()
                ).map("{:.0%}".format)
                count_formType["Count"] = count_formType["Count"].map(lambda x: f"{x:,}")

                print(f"{form_name} Disclosures by Submission Type ({min_year} - {max_year_full})")
                count_formType
                Reg A Disclosures by Submission Type (2015 - 2024)
                Out[12]:
                CountPct
                Submission Type
                1-A/A4,15237%
                1-A POS2,24220%
                1-A2,07318%
                1-K1,96517%
                1-A-W4284%
                1-Z3113%
                1-K/A1031%
                1-Z/A90%
                1-Z-W60%
                form_counts_by_type_and_year = (
                    structured_data_full_years.drop_duplicates(subset=["accessionNo"])
                    .groupby(["year", "formType"])
                    .size()
                    .to_frame(name="count")
                    .unstack(fill_value=0)
                )

                form_counts_by_type_and_year.loc["Total"] = form_counts_by_type_and_year.sum()
                form_counts_by_type_and_year["Total"] = form_counts_by_type_and_year.sum(axis=1)


                print(f"{form_name} counts from {min_year} to {max_year_full}.")
                form_counts_by_type_and_year
                Reg A counts from 2015 to 2024.
                Out[13]:
                countTotal
                formType1-A1-A POS1-A-W1-A/A1-K1-K/A1-Z1-Z-W1-Z/A
                year
                20155331010700500178
                20161458431347131902632
                2017125148223535972411740
                2018144196313848742411872
                201920419841444109922001027
                2020255291456061551125001388
                202136032455684244730041708
                2022352368564623563364201693
                2023264374643944582059101634
                2024171256733714841149111417
                Total20732242428415219651033116911289
                fig, ax = plt.subplots(figsize=(6, 3))
                form_counts_by_type_and_year["count"].drop("Total").plot(
                    kind="bar", stacked=True, ax=ax
                )
                ax.set_xlabel("Year")
                ax.set_ylabel("Number of Filings")
                ax.yaxis.set_major_formatter(mtick.StrMethodFormatter("{x:,.0f}"))
                ax.grid(axis="x")
                ax.set_axisbelow(True)
                handles, labels = ax.get_legend_handles_labels()
                ax.legend(
                    list(reversed(handles)),
                    list(reversed(labels)),
                    title="Form Type",
                    labelspacing=0.15,
                )
                ax.set_title(
                    f"{form_name} Filings by Form Type per Year ({min_year} - {max_year_full})"
                )
                plt.show()
                counts_qtr_yr_piv = (
                    structured_data_full_years.groupby(["year", "qtr"]).size().unstack().fillna(0)
                ).astype(int)

                print(f"{form_name} counts by quarter from {min_year} to {max_year_full}.")
                counts_qtr_yr_piv.T
                Reg A counts by quarter from 2015 to 2024.
                Out[15]:
                year2015201620172018201920202021202220232024
                qtr
                10137177216182260402323373247
                212133183251322379523632704686
                367173185225275359393356301241
                499189195180248390390382256243
                plt.figure(figsize=(4, 2))
                sns.heatmap(
                    counts_qtr_yr_piv.T,
                    annot=True, # Display the cell values
                    fmt="d", # Integer formatting
                    cmap="magma", # Color map
                    cbar_kws={"label": "Count"}, # Colorbar label
                    mask=counts_qtr_yr_piv.T == 0, # Mask the cells with value 0
                    cbar=False,
                    annot_kws={"fontsize": 7},
                )
                plt.grid(False)
                plt.title(f"{form_name} Counts by Quarter {min_year} to {max_year_full}")
                plt.xlabel("Year")
                plt.ylabel("Quarter")
                plt.tight_layout()
                plt.show()
                form_types = count_formType.index.tolist()

                fig, axes = plt.subplots(3, 3, figsize=(9, 7))

                cnt = 0
                for formType in form_types:
                    data = (
                        structured_data_full_years[structured_data_full_years["formType"] == formType]
                        .groupby(["year", "qtr"])
                        .size()
                        .unstack()
                        .fillna(0)
                        .astype(int)
                        .reindex(columns=range(1, 5), fill_value=0) # ensure all month are included
                    )

                    filing_name = formType
                    # if data.sum().sum() < 100:
                    # continue

                    ax = axes.flatten()[cnt]

                    sns.heatmap(
                        data.T,
                        ax=ax,
                        annot=True, # Display the cell values
                        fmt="d", # Integer formatting
                        cmap="magma", # Color map
                        cbar_kws={"label": "Count"}, # Colorbar label
                        mask=data.T == 0, # Mask the cells with value 0
                        cbar=False,
                        annot_kws={"fontsize": 7},
                    )
                    ax.grid(False)
                    ax.set_title(f"{filing_name} Counts")
                    ax.set_xlabel("Year")
                    ax.set_ylabel("Quarter")

                    cnt += 1

                fig.suptitle(f"Regulation A Filing Counts by Quarter {min_year} to {max_year_full}")
                plt.tight_layout()
                counts_qtr_yr = counts_qtr_yr_piv.stack().reset_index(name="count")

                fig, ax = plt.subplots(figsize=(6, 2.5))
                counts_qtr_yr_piv.plot(kind="bar", ax=ax, legend=True)
                ax.legend(title="Quarter", loc="upper right", bbox_to_anchor=(1.15, 1))
                ax.set_title(f"Number of {form_name} Filings per Quarter\n({min_year}-{max_year_full})")
                ax.set_xlabel("Year")
                ax.set_ylabel(f"Number of\n{form_name} Filings")
                ax.yaxis.set_major_formatter(mtick.StrMethodFormatter("{x:,.0f}"))
                ax.grid(axis="x")
                ax.set_axisbelow(True)
                plt.tight_layout()
                plt.show()
                counts_month_yr_piv = (
                    structured_data_full_years.groupby(["year", "month"]).size().unstack().fillna(0)
                ).astype(int)

                plt.figure(figsize=(6, 4))
                sns.heatmap(
                    counts_month_yr_piv,
                    annot=True,
                    fmt="d",
                    cmap="magma",
                    cbar_kws={"label": "Count"},
                    mask=counts_month_yr_piv == 0,
                    cbar=False,
                    annot_kws={"size": 7},
                )
                # convert x-labels to month names: 1 => Jan, 2 => Feb, etc.
                plt.xticks(
                    ticks=np.arange(0.5, 12.5, 1),
                    labels=[pd.to_datetime(str(i), format="%m").strftime("%b") for i in range(1, 13)],
                )
                plt.grid(False)
                plt.title(f"{form_name} Counts by Month ({min_year} - {max_year_full})")
                plt.xlabel("")
                plt.ylabel("Year")
                plt.tight_layout()
                plt.show()
                counts_initial_only_month_yr_piv = (
                    structured_data_full_years[structured_data_full_years["formType"] == "1-A"]
                    .groupby(["year", "month"])
                    .size()
                    .unstack()
                    .fillna(0)
                ).astype(int)

                plt.figure(figsize=(6, 4))
                sns.heatmap(
                    counts_initial_only_month_yr_piv,
                    annot=True,
                    fmt="d",
                    cmap="magma",
                    cbar_kws={"label": "Count"},
                    mask=counts_initial_only_month_yr_piv == 0,
                    cbar=False,
                    annot_kws={"size": 7},
                )
                # convert x-labels to month names: 1 => Jan, 2 => Feb, etc.
                plt.xticks(
                    ticks=np.arange(0.5, 12.5, 1),
                    labels=[pd.to_datetime(str(i), format="%m").strftime("%b") for i in range(1, 13)],
                )
                plt.grid(False)
                plt.title(f"Initial Filing Counts (Form 1-A) by Month ({min_year} - {max_year_full})")
                plt.xlabel("")
                plt.ylabel("Year")
                plt.tight_layout()
                plt.show()
                print(
                    f"Descriptive statistics for Form 1-A filing counts by month from {min_year} to {max_year_full}."
                )
                month_stats = (
                    counts_initial_only_month_yr_piv.loc[2004:]
                    .describe(percentiles=[0.025, 0.975])
                    .round(0)
                    .astype(int)
                )
                month_stats
                Descriptive statistics for Form 1-A filing counts by month from 2015 to 2024.
                Out[21]:
                month123456789101112
                count101010101010101010101010
                mean161319141620171918181818
                std101011119797119129
                min000007299668
                2.5%212119399668
                50%141018121618181816141816
                97.5%322935322831292940333933
                max343036342832292943353933
                def plot_box_plot_as_line(
                    data: pd.DataFrame,
                    x_months=True,
                    title="",
                    x_label="",
                    x_pos_mean_label=2,
                    pos_labels=None,
                    pos_high_low=None,
                    y_label="",
                    y_formatter=lambda x, p: "{:.0f}".format(int(x) / 1000),
                    show_high_low_labels=True,
                    show_inline_labels=True,
                    show_bands=True,
                    figsize=(4, 2.5),
                    line_source="mean",
                ):
                    fig, ax = plt.subplots(figsize=figsize)

                    line_to_plot = data[line_source]
                    lower_label = "2.5%"
                    upper_label = "97.5%"
                    lower = data[lower_label]
                    upper = data[upper_label]

                    line_to_plot.plot(ax=ax)

                    if show_bands:
                        ax.fill_between(line_to_plot.index, lower, upper, alpha=0.2)

                    if x_months:
                        ax.set_xlim(0.5, 12.5)
                        ax.set_xticks(range(1, 13))
                        ax.set_xticklabels(["J", "F", "M", "A", "M", "J", "J", "A", "S", "O", "N", "D"])

                    ax.yaxis.set_major_formatter(mtick.FuncFormatter(y_formatter))
                    ax.set_ylabel(y_label)
                    ax.set_xlabel(x_label)
                    ax.set_title(title)

                    ymin, ymax = ax.get_ylim()
                    y_scale = ymax - ymin

                    max_x = int(line_to_plot.idxmax())
                    max_y = line_to_plot.max()
                    min_x = int(line_to_plot.idxmin())
                    min_y = line_to_plot.min()

                    ax.axvline(
                        max_x,
                        ymin=0,
                        ymax=((max_y - ymin) / (ymax - ymin)),
                        linestyle="dashed",
                        color="tab:blue",
                        alpha=0.5,
                    )
                    ax.scatter(max_x, max_y, color="tab:blue", s=10)
                    ax.axvline(
                        min_x,
                        ymin=0,
                        ymax=((min_y - ymin) / (ymax - ymin)),
                        linestyle="dashed",
                        color="tab:blue",
                        alpha=0.5,
                    )
                    ax.scatter(min_x, min_y, color="tab:blue", s=10)

                    x_pos_mean_label_int = int(x_pos_mean_label)
                    if show_inline_labels:
                        mean_x = x_pos_mean_label
                        mean_y = line_to_plot.iloc[x_pos_mean_label_int] * 1.02
                        upper_x = x_pos_mean_label
                        upper_y = upper.iloc[x_pos_mean_label_int]
                        lower_x = x_pos_mean_label
                        lower_y = lower.iloc[x_pos_mean_label_int] * 0.95

                        if pos_labels:
                            mean_x = pos_labels["mean"]["x"]
                            mean_y = pos_labels["mean"]["y"]
                            upper_x = pos_labels["upper"]["x"]
                            upper_y = pos_labels["upper"]["y"]
                            lower_x = pos_labels["lower"]["x"]
                            lower_y = pos_labels["lower"]["y"]

                        ax.text(mean_x, mean_y, "Mean", color="tab:blue", fontsize=8)
                        ax.text(upper_x, upper_y, upper_label, color="tab:blue", fontsize=8)
                        ax.text(lower_x, lower_y, lower_label, color="tab:blue", fontsize=8)

                    if show_high_low_labels:
                        high_x_origin = max_x
                        high_y_origin = max_y
                        high_x_label = high_x_origin + 0.5
                        high_y_label = high_y_origin + 0.1 * y_scale
                        if pos_high_low:
                            high_x_label = pos_high_low["high"]["x"]
                            high_y_label = pos_high_low["high"]["y"]
                        ax.annotate(
                            "High",
                            (high_x_origin, high_y_origin),
                            xytext=(high_x_label, high_y_label),
                            arrowprops=dict(facecolor="black", arrowstyle="->"),
                        )

                        low_x_origin = min_x * 1.01
                        low_y_origin = min_y
                        low_x_label = low_x_origin + 1.5
                        low_y_label = low_y_origin - 0.1 * y_scale
                        if pos_high_low:
                            low_x_label = pos_high_low["low"]["x"]
                            low_y_label = pos_high_low["low"]["y"]
                        ax.annotate(
                            "Low",
                            (low_x_origin, low_y_origin),
                            xytext=(low_x_label, low_y_label),
                            arrowprops=dict(facecolor="black", arrowstyle="->"),
                        )

                    ax.grid(axis="x")
                    ax.set_axisbelow(True)

                    plt.tight_layout()
                    plt.show()


                plot_box_plot_as_line(
                    data=month_stats.T,
                    title=f"Descriptive Statistics for Form 1-A Filings by Month\n({min_year} - {max_year_full})",
                    x_label="Month",
                    y_label="Number of\nForm 1-A Filings",
                    y_formatter=lambda x, p: "{:.0f}".format(int(x)),
                    x_pos_mean_label=5,
                )
                form_types = count_formType.index.tolist()

                fig, axes = plt.subplots(4, 3, figsize=(9, 7))

                cnt = 0
                for formType in form_types:

                    data = (
                        structured_data_full_years[structured_data_full_years["formType"] == formType]
                        .groupby(["year", "month"])
                        .size()
                        .unstack()
                        .fillna(0)
                        .reindex(columns=range(1, 13), fill_value=0) # ensure all month are included
                    )

                    # if data.sum().sum() < 100:
                    # continue

                    ax = axes.flatten()[cnt]
                    cnt += 1
                    try:
                        data.boxplot(
                            ax=ax,
                            grid=False,
                            showfliers=False,
                            flierprops=dict(marker="o", markersize=3),
                            patch_artist=True,
                            boxprops=dict(facecolor="white", color="tab:blue"),
                            showmeans=True,
                            meanline=True,
                            meanprops={"color": "tab:blue", "linestyle": ":"},
                            medianprops={"color": "black"},
                            capprops={"color": "none"},
                        )

                        ax.set_title(f"Form {formType}")
                        ax.set_xlabel("")
                        ax.set_ylabel(f"Form {formType} Count")
                        xticklabels = [
                            pd.to_datetime(str(x), format="%m").strftime("%b") for x in range(1, 13)
                        ]
                        ax.set_xticklabels(xticklabels)
                        ax.tick_params(axis="x", rotation=45)
                    except Exception as e:
                        print(f"Error: {e}")

                # disable the empty subplots
                for i in range(cnt, 12):
                    axes.flatten()[i].axis("off")

                fig.suptitle(f"{form_name} Filings by Month\n({min_year} - {max_year_full})")
                plt.tight_layout()
                plt.show()
                counts_per_month_by_formType = (
                    structured_data[["year", "month", "accessionNo", "formType"]]
                    .groupby(["year", "month", "formType"])
                    .count()
                    .rename(columns={"accessionNo": "count"})
                    .pivot_table(
                        index=["year", "month"], # Rows
                        columns="formType", # Columns
                        values="count", # Values to fill
                        fill_value=0, # Replace NaN with 0
                    )
                    .astype(int)
                    .reset_index() # Make year and month normal columns
                )

                counts_per_month_by_formType
                Out[24]:
                formTypeyearmonth1-A1-A POS1-A-W1-A/A1-K1-K/A1-Z1-Z-W1-Z/A
                020156700500000
                120157200400000
                2201589022200000
                32015912031300000
                42015106131500000
                ....................................
                112202410141864060101
                113202411101743310100
                114202412152143941700
                11520251101252652200
                11620252171532230000

                117 rows × 11 columns

                fix, ax = plt.subplots(figsize=(6, 4))

                ax.stackplot(
                    counts_per_month_by_formType["year"].astype(str)
                    + "-"
                    + counts_per_month_by_formType["month"].astype(str),
                    *[counts_per_month_by_formType[ft] for ft in form_types],
                    labels=[f"{ft}" for ft in form_types],
                    alpha=0.8,
                )
                handles, labels = ax.get_legend_handles_labels()
                ax.legend(
                    list(reversed(handles)),
                    list(reversed(labels)),
                    title="Form Type",
                    labelspacing=0.15,
                )

                ax.set_title(f"{form_name} Filings per Month")
                ax.set_ylabel("Filings per Month")
                xticks = (
                    counts_per_month_by_formType["year"].astype(str)
                    + "-"
                    + counts_per_month_by_formType["month"].astype(str)
                )
                ax.set_xticks([i for i, x in enumerate(xticks) if x.endswith("-1")])
                ax.set_xticklabels(
                    [label.get_text()[:4] for label in ax.get_xticklabels()], rotation=90, ha="left"
                )

                ax.grid(axis="y", linestyle=":", alpha=0.5)
                ax.spines["top"].set_visible(False)
                ax.spines["right"].set_visible(False)
                # draw vertical lines for each first month of the year, dotted, transparency 0.5,
                # with height of the y value for the respective month
                for year, month in counts_per_month_by_formType[["year", "month"]].values:
                    if month == 1:
                        ax.vlines(
                            f"{year}-{month}",
                            ymin=0,
                            ymax=counts_per_month_by_formType[
                                (counts_per_month_by_formType["year"] == year)
                                & (counts_per_month_by_formType["month"] == month)
                            ]
                            .drop(columns=["year", "month"])
                            .sum(axis=1),
                            linestyle=":",
                            alpha=0.5,
                            color="grey",
                        )

                ax.axvspan("2020-1", "2022-1", alpha=0.1, color="red", zorder=-100)
                ax.text(
                    "2020-12",
                    ax.get_ylim()[1] - 45,
                    "COVID",
                    horizontalalignment="center",
                    verticalalignment="center",
                    color="red",
                    alpha=0.5,
                )
                plt.show()
                counts_filedAtClass = (
                    (
                        structured_data.drop_duplicates(subset=["accessionNo"])
                        .groupby(["filedAtClass"])
                        .size()
                        .sort_values(ascending=False)
                        .to_frame(name="Count")
                    )
                    .rename_axis("Publication Time")
                    .sort_values("Count", ascending=True)
                )
                counts_filedAtClass["Pct"] = (
                    counts_filedAtClass["Count"].astype(int)
                    / counts_filedAtClass["Count"].astype(int).sum()
                ).map("{:.0%}".format)
                counts_filedAtClass["Count"] = counts_filedAtClass["Count"].map(lambda x: f"{x:,}")
                counts_filedAtClass.index = (
                    counts_filedAtClass.index.str.replace("preMarket", "Pre-Market (4:00 - 9:30 AM)")
                    .str.replace("marketHours", "Market Hours (9:30 AM - 4:00 PM)")
                    .str.replace("afterMarket", "After Market (4:00 - 8:00 PM)")
                )
                counts_filedAtClass = counts_filedAtClass.reindex(counts_filedAtClass.index[::-1])

                print(
                    f"{form_name} filing counts by pre-market, regular market hours,\n"
                    "and after-market publication time ({min_year} - {max_year_full})."
                )
                counts_filedAtClass
                Reg A filing counts by pre-market, regular market hours,
                and after-market publication time ({min_year} - {max_year_full}).
                Out[26]:
                CountPct
                Publication Time
                After Market (4:00 - 8:00 PM)4,91243%
                regularMarket4,73141%
                other9538%
                Pre-Market (4:00 - 9:30 AM)8157%
                counts_dayOfWeek = (
                    structured_data.drop_duplicates(subset=["accessionNo"])
                    .groupby(["dayOfWeek"])
                    .size()
                    .to_frame(name="Count")
                ).rename_axis("Day of the Week")
                counts_dayOfWeek["Pct"] = (
                    counts_dayOfWeek["Count"].astype(int) / counts_dayOfWeek["Count"].astype(int).sum()
                ).map("{:.0%}".format)
                counts_dayOfWeek["Count"] = counts_dayOfWeek["Count"].map(lambda x: f"{x:,}")

                print(f"{form_name} filing counts by day of the week ({min_year} - {max_year}).")
                counts_dayOfWeek.loc[["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]]
                Reg A filing counts by day of the week (2015 - 2025).
                Out[27]:
                CountPct
                Day of the Week
                Monday2,19119%
                Tuesday2,05218%
                Wednesday2,10018%
                Thursday2,01718%
                Friday3,05027%

                Offering Tier

                from IPython.display import display


                def plot_count_by_category(df, category, title=None, pretty_name=None):
                    if pretty_name is None:
                        pretty_name = category
                    count_formType = (
                        df.drop_duplicates(subset=["accessionNo"])
                        .groupby([category])
                        .size()
                        .sort_values(ascending=False)
                        .to_frame(name="Count")
                    ).rename_axis(pretty_name)
                    count_formType["Pct"] = (
                        count_formType["Count"].astype(int) / count_formType["Count"].astype(int).sum()
                    ).map("{:.0%}".format)
                    count_formType["Count"] = count_formType["Count"].map(lambda x: f"{x:,}")

                    print(f"{form_name} filing count by {pretty_name} ({min_year} - {max_year_full})")
                    display(count_formType)

                    counts_by_category_month_year = (
                        df.drop_duplicates(subset=["accessionNo"])
                        .groupby(["year", category])
                        .size()
                        .to_frame(name="count")
                        .rename_axis(["year", pretty_name])
                        .unstack(fill_value=0)
                    )

                    counts_by_category_month_year.loc["Total"] = counts_by_category_month_year.sum()
                    counts_by_category_month_year["Total"] = counts_by_category_month_year.sum(axis=1)

                    print(
                        f"{form_name} counts by {pretty_name} and year from {min_year} to {max_year_full}."
                    )
                    display(counts_by_category_month_year)

                    fig, ax = plt.subplots(figsize=(6, 3))
                    counts_by_category_month_year["count"].drop("Total").plot(
                        kind="bar", stacked=True, ax=ax
                    )
                    ax.set_xlabel("Year")
                    ax.set_ylabel("Number of Filings")
                    ax.yaxis.set_major_formatter(mtick.StrMethodFormatter("{x:,.0f}"))
                    ax.grid(axis="x")
                    ax.set_axisbelow(True)
                    handles, labels = ax.get_legend_handles_labels()
                    ax.legend(
                        list(reversed(handles)),
                        list(reversed(labels)),
                        title=pretty_name,
                        labelspacing=0.15,
                    )
                    ax.set_title(
                        f"{form_name} Disclosures by {pretty_name} per Year ({min_year} - {max_year_full})"
                    )
                    plt.show()


                plot_count_by_category(
                    structured_data,
                    "summaryInfo.indicateTier1Tier2Offering",
                    pretty_name="Offering Tier",
                )
                Reg A filing count by Offering Tier (2015 - 2024)
                CountPct
                Offering Tier
                Tier26,75379%
                Tier11,81621%
                Reg A counts by Offering Tier and year from 2015 to 2024.
                countTotal
                Offering TierTier1Tier2
                year
                20158281163
                2016252324576
                2017177449626
                2018188536724
                2019174672846
                20202099431152
                202122311451368
                20221989841182
                20231618711032
                2024134664798
                20251884102
                Total181667538569

                Under Regulation A, Tier 1 allows companies to raise up to $20 million annually without audited financial statements but requires state-by-state compliance, while Tier 2 allows up to $75 million annually with audited financial statements, ongoing SEC reporting, investor limits, and exemption from state securities regulations.

                Offering amounts

                In this section, we analyze the offering amount in the initial Form 1-A filings.

                form_1a = structured_data_full_years[structured_data_full_years["formType"] == "1-A"]
                form_1a["securitiesIssued"].info()
                <class 'pandas.core.series.Series'>
                Index: 2073 entries, 0 to 11310
                Series name: securitiesIssued
                Non-Null Count Dtype
                -------------- -----
                1388 non-null object
                dtypes: object(1)
                memory usage: 32.4+ KB
                data = form_1a["summaryInfo.totalAggregateOffering"]
                data = data[data > 1000]

                # Define log-spaced bins
                bin_edges = np.logspace(np.log10(min(data)), np.log10(max(data)), num=20)

                fig, ax = plt.subplots(figsize=(3, 2))
                ax.hist(
                    data,
                    bins=bin_edges,
                    color="steelblue",
                    edgecolor="black",
                    linewidth=0.5,
                )
                ax.set_yscale("log")
                ax.set_xscale("log")
                ax.xaxis.set_major_formatter(mtick.StrMethodFormatter("{x:,.0f}"))
                ax.yaxis.set_major_formatter(mtick.StrMethodFormatter("{x:,.0f}"))
                ax.tick_params(axis="x", rotation=45)
                ax.set_title(
                    f"Offering Amount Distribution in Form 1A Filings ({min_year} - {max_year})"
                )
                ax.set_xlabel("Offering Amount ($)")
                ax.set_ylabel("Count")
                plt.show()
                offering_amounts = (
                    form_1a[["accessionNo", "fileNo", "year", "summaryInfo.totalAggregateOffering"]]
                    .drop_duplicates(subset=["accessionNo", "fileNo"])
                    .groupby(["year"])
                    .sum()["summaryInfo.totalAggregateOffering"]
                )

                # offering_amounts.loc["Total"] = offering_amounts.sum()

                print(f"Offering Amount in Form 1A filings from {min_year} to {max_year_full}.")
                offering_amounts
                Offering Amount in Form 1A filings from 2015 to 2024.
                Out[31]:
                year
                2015 1.088415e+09
                2016 2.006582e+09
                2017 2.827206e+09
                2018 2.781807e+09
                2019 4.068989e+09
                2020 4.498605e+09
                2021 7.562768e+09
                2022 6.471967e+09
                2023 4.821942e+09
                2024 4.239667e+09
                Name: summaryInfo.totalAggregateOffering, dtype: float64
                fig, ax = plt.subplots(figsize=(3.5, 2))
                offering_amounts.apply(lambda x: x / 1e6).plot(kind="bar", stacked=True, ax=ax)
                ax.set_xlabel("Year")
                ax.set_ylabel("Offering Amount (Million $)")
                ax.yaxis.set_major_formatter(mtick.StrMethodFormatter("{x:,.0f}"))
                ax.grid(axis="x")
                ax.set_axisbelow(True)
                ax.set_title(
                    f"Offering Amount in Form 1A filings per Year ({min_year} - {max_year_full})"
                )
                plt.show()

                Form 1-K Annual Report Dataset

                In this section we analyze the selling progress of offered securities by year since initial offering of the securities based on information in the Form 1-K Annual Report filings.

                form_1k_df = structured_data_full_years[structured_data_full_years["formType"] == "1-K"]
                form_1k_df = form_1k_df.dropna(subset=["summaryInfo"])
                form_1k_df = form_1k_df.explode("summaryInfo")
                form_1k_df = pd.json_normalize(form_1k_df.to_dict(orient="records"))
                selling_progress_data = form_1k_df[
                    [
                        "accessionNo",
                        "fileNo",
                        "filedAt",
                        "summaryInfo.offeringCommenceDate",
                        "summaryInfo.qualifiedSecuritiesSold",
                        "summaryInfo.offeringSecuritiesSold",
                    ]
                ].dropna()
                # convert the date fields to datetime
                selling_progress_data["summaryInfo.offeringCommenceDate"] = pd.to_datetime(
                    selling_progress_data["summaryInfo.offeringCommenceDate"], errors="coerce"
                ).dt.date
                selling_progress_data["filedAt"] = pd.to_datetime(
                    selling_progress_data["filedAt"], utc=True
                )
                selling_progress_data["filedAt"] = selling_progress_data["filedAt"].dt.tz_convert(
                    "US/Eastern"
                )
                selling_progress_data["filedAtDate"] = selling_progress_data["filedAt"].dt.date

                selling_progress_data["timeOfferedDays"] = (
                    selling_progress_data["summaryInfo.offeringCommenceDate"]
                    - selling_progress_data["filedAtDate"]
                ).apply(lambda x: -x.days)

                selling_progress_data["percentSold"] = (
                    selling_progress_data["summaryInfo.offeringSecuritiesSold"]
                    / selling_progress_data["summaryInfo.qualifiedSecuritiesSold"]
                )
                selling_progress_data
                Out[36]:
                accessionNofileNofiledAtsummaryInfo.offeringCommenceDatesummaryInfo.qualifiedSecuritiesSoldsummaryInfo.offeringSecuritiesSoldfiledAtDatetimeOfferedDayspercentSold
                00001214659-16-01107824R-000092016-04-29 11:05:02-04:002015-11-203.009600e+0716920576.02016-04-291610.562220
                10001376474-16-00067924R-000132016-05-06 13:16:15-04:002016-02-046.000000e+060.02016-05-06920.000000
                20001644600-16-00014224R-000072016-05-20 12:03:45-04:002016-01-212.000000e+070.02016-05-201200.000000
                30001644600-16-00017124R-000222016-07-27 14:09:16-04:002016-03-226.432247e+061644018.02016-07-271270.255590
                40001477932-16-01259324R-000242016-09-19 17:53:12-04:002015-02-032.500000e+071400000.02016-09-195940.056000
                ..............................
                4820001410708-24-00000524R-009152024-09-13 12:43:16-04:002023-08-181.500000e+090.02024-09-133920.000000
                4830001925674-24-00000424R-009572024-09-13 21:00:42-04:002023-12-112.500000e+074505096.02024-09-132770.180204
                4840001096906-24-00194024R-005292024-09-30 21:52:59-04:002021-09-271.221760e+07135260.02024-09-3010990.011071
                4850001062993-24-01810124R-006942024-10-30 12:54:39-04:002022-07-265.000000e+062101819.02024-10-308270.420364
                4860001829126-24-00867024R-008892024-12-31 17:25:50-05:002023-08-107.500000e+063870.02024-12-315090.000516

                484 rows × 9 columns

                # scatter plot of the time offered vs the percentage of securities qualified
                fig, ax = plt.subplots(figsize=(4, 3))
                sns.scatterplot(
                    x="timeOfferedDays",
                    y="percentSold",
                    data=selling_progress_data,
                    alpha=0.5,
                    ax=ax,
                )
                ax.set_xlabel("Time Offered (days)")
                ax.set_ylabel("Percentage `sold`")
                ax.set_title("Time Offered vs Percentage Sold")
                ax.set_xlim(0, 2100)
                plt.show()

                Form 1-Z Exit Reports

                form_1z_df = structured_data_full_years[structured_data_full_years["formType"] == "1-Z"]
                form_1z_suminfo = form_1z_df.dropna(subset=["summaryInfoOffering"])
                form_1z_suminfo = form_1z_suminfo.explode("summaryInfoOffering")
                form_1z_suminfo = pd.json_normalize(form_1z_suminfo.to_dict(orient="records"))
                form_1z_suminfo.info()
                <class 'pandas.core.frame.DataFrame'>
                RangeIndex: 270 entries, 0 to 269
                Columns: 144 entries, id to summaryInfoOffering.crdNumberBrokerDealer
                dtypes: datetime64[ns, US/Eastern](1), float64(110), int64(4), object(29)
                memory usage: 303.9+ KB
                aggregated_value_df = form_1z_suminfo[
                    [
                        "accessionNo",
                        "fileNo",
                        "filedAt",
                        "summaryInfoOffering.offeringCommenceDate",
                        "summaryInfoOffering.offeringSecuritiesQualifiedSold",
                        "summaryInfoOffering.offeringSecuritiesSold",
                        "summaryInfoOffering.pricePerSecurity",
                        "summaryInfoOffering.findersFees",
                        "summaryInfoOffering.legalFees",
                        "summaryInfoOffering.auditorFees",
                        "summaryInfoOffering.blueSkyFees",
                        "summaryInfoOffering.promotersFees",
                        "summaryInfoOffering.salesCommissionsFee",
                        "summaryInfoOffering.underwriterFees",
                    ]
                ]
                securities_sold = aggregated_value_df["summaryInfoOffering.offeringSecuritiesSold"]
                price_per_share = aggregated_value_df["summaryInfoOffering.pricePerSecurity"]
                aggregated_value_df = aggregated_value_df.assign(
                    aggregatedValue=securities_sold * price_per_share
                ).dropna(subset=["aggregatedValue"])
                aggregated_value_df = aggregated_value_df.loc[
                    aggregated_value_df["aggregatedValue"] > 1000
                ].copy()
                aggregated_value_df.head()
                Out[40]:
                accessionNofileNofiledAtsummaryInfoOffering.offeringCommenceDatesummaryInfoOffering.offeringSecuritiesQualifiedSoldsummaryInfoOffering.offeringSecuritiesSoldsummaryInfoOffering.pricePerSecuritysummaryInfoOffering.findersFeessummaryInfoOffering.legalFeessummaryInfoOffering.auditorFeessummaryInfoOffering.blueSkyFeessummaryInfoOffering.promotersFeessummaryInfoOffering.salesCommissionsFeesummaryInfoOffering.underwriterFeesaggregatedValue
                20001144204-15-06520724R-000032015-11-13 14:20:10-05:0009-02-2015545000.0505000.010.00NaN508000.030000.06000.0NaNNaNNaN5050000.0
                30001389049-15-00000324R-000042015-11-19 18:03:09-05:0001-30-201560000.060000.025.000.00.00.00.00.00.00.01500000.0
                40001144204-15-07365324R-000032015-12-31 17:00:02-05:0010-30-20151453000.01303000.010.00NaN16000.04000.03000.0NaNNaNNaN13030000.0
                50001506275-16-00000424R-000062016-01-28 15:58:00-05:0012-29-20144613422.0429000.01000.000.017898.00.04396.00.00.00.0429000000.0
                60001354488-16-00726324R-000142016-05-06 19:16:18-04:0003-02-20161023110.0885345.012.22NaN150000.025000.0NaNNaNNaNNaN10818915.9
                # convert the date fields to datetime
                aggregated_value_df["summaryInfoOffering.offeringCommenceDate"] = pd.to_datetime(
                    aggregated_value_df["summaryInfoOffering.offeringCommenceDate"], errors="coerce"
                ).dt.date
                aggregated_value_df["filedAt"] = pd.to_datetime(
                    aggregated_value_df["filedAt"], utc=True
                )
                aggregated_value_df["filedAt"] = aggregated_value_df["filedAt"].dt.tz_convert(
                    "US/Eastern"
                )
                aggregated_value_df["filedAtDate"] = aggregated_value_df["filedAt"].dt.date

                aggregated_value_df = aggregated_value_df.dropna(
                    subset=["filedAtDate", "summaryInfoOffering.offeringCommenceDate"]
                )

                aggregated_value_df.loc[:, "timeOfferedDays"] = (
                    aggregated_value_df["summaryInfoOffering.offeringCommenceDate"]
                    - aggregated_value_df["filedAtDate"]
                ).apply(lambda x: -x.days)

                aggregated_value_df.loc[:, "percentSold"] = (
                    100
                    * aggregated_value_df["summaryInfoOffering.offeringSecuritiesSold"]
                    / aggregated_value_df["summaryInfoOffering.offeringSecuritiesQualifiedSold"]
                )

                aggregated_value_df.loc[:, "raisedPerYear"] = aggregated_value_df["aggregatedValue"] / (
                    aggregated_value_df["timeOfferedDays"] / 365
                )

                aggregated_value_df = aggregated_value_df.dropna(subset=["raisedPerYear"])
                aggregated_value_df
                Out[42]:
                accessionNofileNofiledAtsummaryInfoOffering.offeringCommenceDatesummaryInfoOffering.offeringSecuritiesQualifiedSoldsummaryInfoOffering.offeringSecuritiesSoldsummaryInfoOffering.pricePerSecuritysummaryInfoOffering.findersFeessummaryInfoOffering.legalFeessummaryInfoOffering.auditorFeessummaryInfoOffering.blueSkyFeessummaryInfoOffering.promotersFeessummaryInfoOffering.salesCommissionsFeesummaryInfoOffering.underwriterFeesaggregatedValuefiledAtDatetimeOfferedDayspercentSoldraisedPerYear
                20001144204-15-06520724R-000032015-11-13 14:20:10-05:002015-09-025.450000e+05505000.010.0000NaN508000.030000.06000.0NaNNaNNaN5.050000e+062015-11-137292.6605502.560069e+07
                30001389049-15-00000324R-000042015-11-19 18:03:09-05:002015-01-306.000000e+0460000.025.00000.00.00.00.00.00.00.01.500000e+062015-11-19293100.0000001.868601e+06
                40001144204-15-07365324R-000032015-12-31 17:00:02-05:002015-10-301.453000e+061303000.010.0000NaN16000.04000.03000.0NaNNaNNaN1.303000e+072015-12-316289.6765317.670887e+07
                50001506275-16-00000424R-000062016-01-28 15:58:00-05:002014-12-294.613422e+06429000.01000.00000.017898.00.04396.00.00.00.04.290000e+082016-01-283959.2989543.964177e+08
                60001354488-16-00726324R-000142016-05-06 19:16:18-04:002016-03-021.023110e+06885345.012.2200NaN150000.025000.0NaNNaNNaNNaN1.081892e+072016-05-066586.5346836.075237e+07
                ............................................................
                2650001683168-24-00847724R-009752024-12-03 15:35:28-05:002023-08-291.000000e+089000000.00.0100NaN15000.0NaN1500.0NaNNaNNaN9.000000e+042024-12-034629.0000007.110390e+04
                2660001683168-24-00857224R-009762024-12-09 15:38:10-05:002023-08-072.000000e+09665000000.00.0002NaN15000.0NaN1500.0NaNNaNNaN1.330000e+052024-12-0949033.2500009.907143e+04
                2670001214659-24-02007224R-009772024-12-09 16:11:33-05:002023-09-111.000000e+0715117666.00.0053NaNNaNNaNNaNNaNNaNNaN8.012363e+042024-12-09455151.1766606.427500e+04
                2680001683168-24-00878324R-005082024-12-17 14:17:32-05:002023-09-262.083333e+0785849.03.6000NaN90000.046500.0NaNNaNNaN3090.03.090564e+052024-12-174480.4120752.517982e+05
                2690001829126-24-00867324R-008892024-12-31 17:27:40-05:002023-08-107.500000e+063870.010.00000.0132500.040000.04750.00.0387.00.03.870000e+042024-12-315090.0516002.775147e+04

                195 rows × 19 columns

                def plot_histogram(data, title, x_label, y_label, log_scale=True):
                    if log_scale:
                        bin_edges = np.logspace(np.log10(min(data)), np.log10(max(data)), num=20)
                    else:
                        bin_edges = 20
                    fig, ax = plt.subplots(figsize=(3, 2))
                    ax.hist(
                        data,
                        bins=bin_edges,
                        color="steelblue",
                        edgecolor="black",
                        linewidth=0.5,
                    )
                    if log_scale:
                        ax.set_yscale("log")
                        ax.set_xscale("log")
                        ax.xaxis.set_major_formatter(mtick.StrMethodFormatter("{x:,.0f}"))
                        ax.yaxis.set_major_formatter(mtick.StrMethodFormatter("{x:,.0f}"))
                        ax.tick_params(axis="x", rotation=45)
                    ax.set_title(title)
                    ax.set_xlabel(x_label)
                    ax.set_ylabel(y_label)

                    return fig, ax


                fig, ax = plot_histogram(
                    aggregated_value_df["raisedPerYear"],
                    "Raised Capital per Year reported in Form 1-Z Filings",
                    "Raised Capital per Year ($)",
                    "Count",
                )

                ax.axvline(75_000_000, color="red", linestyle="--", label="Maximum\namount allowed")
                ax.legend()
                plt.show()

                There are some outliers here probably cause by errors in the reporting. The maximum allowed amount to be raised per year under Regulation A is $75 million. Let's exclude all filings which report a higher number and visualize the total amount raised.

                aggregated_value_df = aggregated_value_df[
                    aggregated_value_df["raisedPerYear"] < 75_000_000
                ]
                fig, ax = plot_histogram(
                    aggregated_value_df["aggregatedValue"],
                    f"Final aggregated value of securities sold\nin Form 1-Z filings ({min_year} - {max_year_full})",
                    "Offering Amount ($)",
                    "Count",
                )
                fig, ax = plot_histogram(
                    aggregated_value_df["percentSold"][
                        aggregated_value_df["percentSold"].between(0, 100)
                    ],
                    f"Percentage of securities sold in Form 1-Z filings\n({min_year} - {max_year_full})",
                    "Percentage Sold",
                    "Count",
                    log_scale=False,
                )
                aggregated_value_df
                Out[47]:
                accessionNofileNofiledAtsummaryInfoOffering.offeringCommenceDatesummaryInfoOffering.offeringSecuritiesQualifiedSoldsummaryInfoOffering.offeringSecuritiesSoldsummaryInfoOffering.pricePerSecuritysummaryInfoOffering.findersFeessummaryInfoOffering.legalFeessummaryInfoOffering.auditorFeessummaryInfoOffering.blueSkyFeessummaryInfoOffering.promotersFeessummaryInfoOffering.salesCommissionsFeesummaryInfoOffering.underwriterFeesaggregatedValuefiledAtDatetimeOfferedDayspercentSoldraisedPerYear
                20001144204-15-06520724R-000032015-11-13 14:20:10-05:002015-09-025.450000e+055.050000e+0510.0000NaN508000.030000.06000.0NaNNaNNaN5.050000e+062015-11-137292.6605502.560069e+07
                30001389049-15-00000324R-000042015-11-19 18:03:09-05:002015-01-306.000000e+046.000000e+0425.00000.00.00.00.00.00.00.01.500000e+062015-11-19293100.0000001.868601e+06
                60001354488-16-00726324R-000142016-05-06 19:16:18-04:002016-03-021.023110e+068.853450e+0512.2200NaN150000.025000.0NaNNaNNaNNaN1.081892e+072016-05-066586.5346836.075237e+07
                70001621388-16-00000424R-000152016-05-16 12:17:55-04:002015-04-011.000000e+101.015000e+090.01000.038991.00.02050.0NaN0.00.01.015000e+072016-05-1641110.1500009.013990e+06
                80001104659-16-13174624R-000202016-07-08 16:03:16-04:002015-11-301.538462e+061.184726e+066.5000NaN113274.022880.05000.0NaNNaNNaN7.700719e+062016-07-0822177.0071671.271838e+07
                ............................................................
                2650001683168-24-00847724R-009752024-12-03 15:35:28-05:002023-08-291.000000e+089.000000e+060.0100NaN15000.0NaN1500.0NaNNaNNaN9.000000e+042024-12-034629.0000007.110390e+04
                2660001683168-24-00857224R-009762024-12-09 15:38:10-05:002023-08-072.000000e+096.650000e+080.0002NaN15000.0NaN1500.0NaNNaNNaN1.330000e+052024-12-0949033.2500009.907143e+04
                2670001214659-24-02007224R-009772024-12-09 16:11:33-05:002023-09-111.000000e+071.511767e+070.0053NaNNaNNaNNaNNaNNaNNaN8.012363e+042024-12-09455151.1766606.427500e+04
                2680001683168-24-00878324R-005082024-12-17 14:17:32-05:002023-09-262.083333e+078.584900e+043.6000NaN90000.046500.0NaNNaNNaN3090.03.090564e+052024-12-174480.4120752.517982e+05
                2690001829126-24-00867324R-008892024-12-31 17:27:40-05:002023-08-107.500000e+063.870000e+0310.00000.0132500.040000.04750.00.0387.00.03.870000e+042024-12-315090.0516002.775147e+04

                169 rows × 19 columns

                # calculate the total fees
                fees = [
                    "findersFees",
                    "legalFees",
                    "auditorFees",
                    "blueSkyFees",
                    "promotersFees",
                    "salesCommissionsFee",
                    "underwriterFees",
                ]
                aggregated_value_df = aggregated_value_df.assign(
                    totalFees=aggregated_value_df[[f"summaryInfoOffering.{fee}" for fee in fees]].sum(
                        axis=1
                    )
                )

                aggregated_value_df = aggregated_value_df.assign(
                    feePercentage=(
                        100 * aggregated_value_df["totalFees"] / aggregated_value_df["aggregatedValue"]
                    )
                )

                fig, ax = plot_histogram(
                    aggregated_value_df["feePercentage"][
                        aggregated_value_df["feePercentage"].between(0, 100)
                    ],
                    f"Percentage of fees reported in Form 1-Z filings\n({min_year} - {max_year_full})",
                    "Percentage of Fees",
                    "Count",
                    log_scale=False,
                )
                # scatter plot of the total fees vs the aggregated value
                fig, ax = plt.subplots(figsize=(4, 3))
                ax.scatter(
                    aggregated_value_df["aggregatedValue"],
                    aggregated_value_df["feePercentage"],
                    alpha=0.5,
                )
                ax.set_xlabel("Aggregated Value ($)")
                ax.set_ylabel("Percentage of Fees")
                ax.set_title(
                    f"Aggregated Value of Sold Securities vs Total Fees\n"
                    f"reported in Form 1-Z filings ({min_year} - {max_year_full})"
                )
                ax.set_xscale("log")
                ax.set_yscale("log")
                ax.xaxis.set_major_formatter(mtick.StrMethodFormatter("{x:,.0f}"))
                ax.yaxis.set_major_formatter(mtick.StrMethodFormatter("{x:,.0f}"))
                ax.tick_params(axis="x", rotation=45)
                plt.show()

                Footer

                Products

                • EDGAR Filing Search API
                • Full-Text Search API
                • Real-Time Filing Stream API
                • Filing Download & PDF Generator API
                • XBRL-to-JSON Converter
                • 10-K/10-Q/8-K Item Extractor
                • Investment Adviser & Form ADV API
                • Insider Trading Data - Form 3, 4, 5
                • Restricted Sales Notifications - Form 144
                • Institutional Holdings - Form 13F
                • Form N-PORT API - Investment Company Holdings
                • Form N-CEN API - Annual Reports by Investment Companies
                • Form N-PX API - Proxy Voting Records
                • Form 13D/13G API
                • Form S-1/424B4 - IPOs, Debt & Rights Offerings
                • Form C - Crowdfunding Offerings
                • Form D - Private Placements & Exempt Offerings
                • Regulation A Offering Statements API
                • Changes in Auditors & Accountants
                • Non-Reliance on Prior Financial Statements
                • Executive Compensation Data API
                • Directors & Board Members Data
                • Company Subsidiaries Database
                • Outstanding Shares & Public Float
                • SEC Enforcement Actions
                • Accounting & Auditing Enforcement Releases (AAERs)
                • SRO Filings
                • CIK, CUSIP, Ticker Mapping

                General

                • Pricing
                • Features
                • Supported Filings
                • EDGAR Filing Statistics

                Account

                • Sign Up - Start Free Trial
                • Log In
                • Forgot Password

                Developers

                • API Sandbox
                • Documentation
                • Resources & Tutorials
                • Python API SDK
                • Node.js API SDK

                Legal

                • Terms of Service
                • Privacy Policy

                Legal

                • Terms of Service
                • Privacy Policy

                SEC API

                © 2025 sec-api.io by Data2Value GmbH. All rights reserved.

                SEC® and EDGAR® are registered trademarks of the U.S. Securities and Exchange Commission (SEC).

                EDGAR is the Electronic Data Gathering, Analysis, and Retrieval system operated by the SEC.

                sec-api.io and Data2Value GmbH are independent of, and not affiliated with, sponsored by, or endorsed by the U.S. Securities and Exchange Commission.

                sec-api.io is classified under SIC code 7375 (Information Retrieval Services), providing on-demand access to structured data and online information services.