sec-api.io · SEC API by D2V
Analysis of Financial Restatements and Non-Reliance Disclosures


On this page:
  • Data Loading and Preparation
  • Standardization of Data Fields
  • Visualization of Non-Reliance Disclosures over Time
  • Distribution of Disclosures by Their Characteristics
  • Parties Responsible for Discovering Issues
  • Most Involved Auditors in Restatements
  • Affected Periods by Restatements
  • Affected Financial Statement Items

                We illustrate how to perform an exploratory data analysis of disclosures informing investors that previously issued financial statements of companies publicly traded on U.S. stock exchanges can no longer be relied upon (also known as financial restatements). These restatements are disclosed in Form 8-K filings with the SEC, specifically under Item 4.02, titled "Non-Reliance on Previously Issued Financial Statements or a Related Audit Report or Completed Interim Review." Because companies present these disclosures as free text, we use our Structured Data API to extract and structure the relevant information, making it available for detailed analysis.

                Our analysis will focus on several key areas:

                • Number of Item 4.02 disclosures filed each year from 2004 to 2024, broken down by quarter, by month, and by time of day (pre-market, regular market, after-market).
                • Distribution of disclosures across structured data fields, such as the proportion of disclosures reporting material weaknesses in internal controls.
                • Identification of the party most often responsible for discovering the issue, whether it was the company itself, its auditor, or the SEC.
                • Number of times an auditor was involved in the restatement process.
                • Number of reporting periods (quarters or years) affected by each restatement.
                • Statistics concerning the financial statement items impacted by the restatements.
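For orientation, a single structured record from the API contains fields like the following. The field names are the ones used throughout this analysis; the values below are hypothetical, for illustration only:

```python
# Hypothetical example of one structured Item 4.02 record; field names follow
# the dataset used below, the values are made up for illustration.
record = {
    "accessionNo": "0001234567-24-000001",
    "formType": "8-K",
    "filedAt": "2024-05-01T08:15:00-04:00",
    "items": ["Item 4.02: Non-Reliance on Previously Issued Financial Statements"],
    "item4_02": {
        "identifiedBy": ["Company"],
        "auditors": ["EY"],
        "affectedReportingPeriods": ["Q1 2023", "FY 2022"],
        "affectedLineItems": ["net income", "equity"],
    },
}

# Derived flags of the kind computed later in this analysis:
identified_by_company = "Company" in record["item4_02"]["identifiedBy"]
num_quarters = sum(
    1 for p in record["item4_02"]["affectedReportingPeriods"] if p.startswith("Q")
)
print(identified_by_company, num_quarters)  # True 1
```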

                Data Loading and Preparation

                To load and prepare the data, we will use the Form 8-K Item 4.02 Structured Data API to download all structured data related to Form 8-K filings that include Item 4.02 disclosures. The data spanning the years 2004 to 2024 is saved in a JSONL file ./form-8k-item-4-02-structured-data.jsonl.

                import os
                import json
                import re
                import pandas as pd
                import numpy as np
                import matplotlib.pyplot as plt
                import matplotlib.style as style
                import matplotlib.ticker as mtick

                style.use("default")

                params = {
                    "axes.labelsize": 8, "font.size": 8, "legend.fontsize": 8,
                    "xtick.labelsize": 8, "ytick.labelsize": 8, "font.family": "sans-serif",
                    "axes.spines.top": False, "axes.spines.right": False, "grid.color": "grey",
                    "axes.grid": True, "axes.grid.axis": "y", "grid.alpha": 0.5, "grid.linestyle": ":",
                }

                plt.rcParams.update(params)
                !pip install sec-api
                from sec_api import Form_8K_Item_X_Api

                item_X_api = Form_8K_Item_X_Api("YOUR_API_KEY")

                YEARS = range(2024, 2003, -1) # from 2024 to 2004
                TARGET_FILE = "./form-8k-item-4-02-structured-data.jsonl"

                if not os.path.exists(TARGET_FILE):
                    for year in YEARS:
                        done = False
                        search_from = 0
                        year_counter = 0

                        while not done:
                            searchRequest = {
                                "query": f"item4_02:* AND filedAt:[{year}-01-01 TO {year}-12-31]",
                                "from": search_from,
                                "size": "50",
                                "sort": [{"filedAt": {"order": "desc"}}],
                            }

                            response = item_X_api.get_data(searchRequest)

                            if len(response["data"]) == 0:
                                break

                            search_from += 50
                            year_counter += len(response["data"])

                            with open(TARGET_FILE, "a") as f:
                                for entry in response["data"]:
                                    f.write(json.dumps(entry) + "\n")

                        print(f"Finished loading {year_counter} Item 4.02 for year {year}")
                else:
                    print("File already exists, skipping download")
                Finished loading 240 Item 4.02 for year 2024
                Finished loading 262 Item 4.02 for year 2023
                Finished loading 304 Item 4.02 for year 2022
                Finished loading 864 Item 4.02 for year 2021
                Finished loading 96 Item 4.02 for year 2020
                Finished loading 98 Item 4.02 for year 2019
                Finished loading 132 Item 4.02 for year 2018
                Finished loading 141 Item 4.02 for year 2017
                Finished loading 173 Item 4.02 for year 2016
                Finished loading 216 Item 4.02 for year 2015
                Finished loading 243 Item 4.02 for year 2014
                Finished loading 326 Item 4.02 for year 2013
                Finished loading 349 Item 4.02 for year 2012
                Finished loading 402 Item 4.02 for year 2011
                Finished loading 458 Item 4.02 for year 2010
                Finished loading 465 Item 4.02 for year 2009
                Finished loading 576 Item 4.02 for year 2008
                Finished loading 786 Item 4.02 for year 2007
                Finished loading 1057 Item 4.02 for year 2006
                Finished loading 1013 Item 4.02 for year 2005
                Finished loading 172 Item 4.02 for year 2004

                Standardization of Data Fields

                The following section includes boilerplate code used to normalize various fields and enhance the dataset by deriving additional variables through field combinations. For example, the Item 4.02 dataset contains information on affected financial statement line items as disclosed in non-reliance filings. However, these line items often lack standard nomenclature, deviating from US GAAP definitions. To address this, we apply standardization techniques, such as converting "net loss" to "net income" or "cost of goods sold" to "cost of sales."

                The approach described below primarily relies on regular expressions (regex) to identify and normalize patterns, which effectively standardizes the majority of cases. However, some inconsistencies remain, leading to a small portion of false positives. For the purposes of this analysis, the error rate is considered negligible.

                A similar standardization process is applied to auditor names, ensuring consistency across entries. For instance, some companies report their auditor as "Ernst & Young," while others use the abbreviation "EY." These variations are unified under a single standardized label to improve data consistency and facilitate accurate analysis.

                def standardize_affected_line_item(affected_line_item):
                    item = affected_line_item.lower()
                    item = re.sub(r"(net revenue.?|net sales)", "revenue", item, flags=re.IGNORECASE)
                    item = re.sub(r"net loss", "net income", item, flags=re.IGNORECASE)
                    item = re.sub(r"^net income \(loss\)$", "net income", item, flags=re.IGNORECASE)
                    item = re.sub(
                        r"^net income per share$", "earnings per share", item, flags=re.IGNORECASE
                    )
                    item = re.sub(
                        r"^total current liabilities$", "current liabilities", item, flags=re.IGNORECASE
                    )
                    item = re.sub(
                        r"^total current assets$", "current assets", item, flags=re.IGNORECASE
                    )
                    item = re.sub(r"^liabilities$", "total liabilities", item, flags=re.IGNORECASE)
                    item = re.sub(r"^assets$", "total assets", item, flags=re.IGNORECASE)
                    item = re.sub(r"^earnings$", "net income", item, flags=re.IGNORECASE)
                    item = re.sub(
                        r"^derivative liability$", "derivative liabilities", item, flags=re.IGNORECASE
                    )
                    item = re.sub(
                        r"^additional paid in capital$",
                        "additional paid-in capital",
                        item,
                        flags=re.IGNORECASE,
                    )
                    item = re.sub(r"^cost of goods sold$", "cost of sales", item, flags=re.IGNORECASE)
                    item = re.sub(
                        r"total stockholders' equity|total shareholders' equity|shareholders' equity|shareholder's equity|stockholders' equity|equity section|temporary equity|equity classification",
                        "equity",
                        item,
                        flags=re.IGNORECASE,
                    )
                    item = re.sub(r"^total equity$", "equity", item, flags=re.IGNORECASE)
                    # earnings per share calculation => earnings per share
                    item = re.sub(
                        r"earnings per share calculation|diluted earnings per share|earnings per share \(eps\)|diluted net income \(loss\) per share|loss per share",
                        "earnings per share",
                        item,
                        flags=re.IGNORECASE,
                    )
                    item = re.sub(r"^net sales$", "revenue", item, flags=re.IGNORECASE)
                    item = re.sub(r"^revenues$", "revenue", item, flags=re.IGNORECASE)
                    return item


                def standardize_affected_line_items(affected_line_items):
                    if isinstance(affected_line_items, list):
                        return [standardize_affected_line_item(item) for item in affected_line_items]
                    return affected_line_items


                def standardize_auditor(auditor: str) -> str:
                    substitutions = [
                        (r"\.|,", ""),
                        (r"LLP", ""),
                        (r" LLC", ""),
                        (r" PLLC", ""),
                        (r"BDO .*", "BDO"),
                        (r".*PwC.*", "PricewaterhouseCoopers"),
                        (r"PricewaterhouseCoopers", "PwC"),
                        (r"Deloitte & Touche", "Deloitte"),
                        (r"Ernst & Young", "EY"),
                        (r"(.*)?Malone & Bailey(.*)?", "MaloneBailey"),
                        (r"(.*)?WithumSmith(.*)?", "WithumSmith+Brown"),
                    ]

                    for pattern, replacement in substitutions:
                        auditor = re.sub(pattern, replacement, auditor, flags=re.IGNORECASE)

                    # set to empty string if the following patterns are found
                    to_empty_string_patterns = [
                        (r"Independent registered public accounting firm", ""),
                        (r"independent registered public", ""),
                        (r"Not specified", ""),
                        (r"Not explicitly mentioned", ""),
                        (r"Independent accountant", ""),
                        (r"NaN", ""),
                        (r"Unknown", ""),
                    ]

                    for pattern, replacement in to_empty_string_patterns:
                        # check if pattern is found in auditor. if yes, return empty string
                        if re.search(pattern, auditor, flags=re.IGNORECASE):
                            return "Unknown"

                    if auditor == "":
                        return "Unknown"

                    return auditor.strip()


                def standardize_auditors(auditors):
                    if isinstance(auditors, list):
                        return [standardize_auditor(auditor) for auditor in auditors]
                    return auditors
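As a quick sanity check, here is a condensed, self-contained restatement of two of the rules above (reduced for illustration; the full functions handle many more patterns):

```python
import re

def normalize_item(item: str) -> str:
    # Two of the normalization rules from above, condensed for illustration.
    item = item.lower()
    item = re.sub(r"net loss", "net income", item)
    item = re.sub(r"^cost of goods sold$", "cost of sales", item)
    return item

print(normalize_item("Net Loss"))            # net income
print(normalize_item("Cost of Goods Sold"))  # cost of sales
```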
                structured_data = pd.read_json(TARGET_FILE, lines=True)

                # add date-related columns
                structured_data["filedAt"] = pd.to_datetime(structured_data["filedAt"], utc=True)
                structured_data["filedAt"] = structured_data["filedAt"].dt.tz_convert("US/Eastern")
                structured_data["year"] = structured_data["filedAt"].dt.year
                structured_data["month"] = structured_data["filedAt"].dt.month
                structured_data["qtr"] = structured_data["filedAt"].dt.quarter
                structured_data["dayOfWeek"] = structured_data["filedAt"].dt.day_name()
                # filedAtClass: preMarket (4:00AM-9:30AM), regularMarket (9:30AM-4:00PM), afterMarket (4:00PM-8:00PM)
                structured_data["filedAtClass"] = structured_data["filedAt"].apply(
                    lambda x: (
                        "preMarket"
                        if x.hour < 9 or (x.hour == 9 and x.minute < 30)
                        else (
                            "regularMarket"
                            if x.hour < 16
                            else "afterMarket" if x.hour < 20 else "other"
                        )
                    )
                )
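The session classification above can also be written as a small standalone helper. The function name is ours, for illustration; the cutoffs mirror the lambda, including its treatment of filings before 4:00 AM as pre-market:

```python
from datetime import time

def classify_filing_time(t: time) -> str:
    # Illustrative helper mirroring the filedAtClass lambda above.
    if t < time(9, 30):
        return "preMarket"        # before 9:30 AM ET
    if t < time(16, 0):
        return "regularMarket"    # 9:30 AM - 4:00 PM ET
    if t < time(20, 0):
        return "afterMarket"      # 4:00 PM - 8:00 PM ET
    return "other"                # overnight

print(classify_filing_time(time(8, 0)))    # preMarket
print(classify_filing_time(time(17, 30)))  # afterMarket
```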
                # convert long-form of each item into item id only, e.g. "Item 4.02: ..." => "4.02"
                structured_data["items"] = structured_data["items"].apply(
                    lambda items: [re.search(r"\d+\.\d+", x).group(0) if x else None for x in items]
                )
                # explode column "item4_02" into multiple columns
                # where each column is a key-value pair of the JSON object
                # and drop all structured data columns for items, eg "item5_02"
                item_cols = list(
                    structured_data.columns[
                        structured_data.columns.str.contains(r"item\d+_", case=False)
                    ]
                )
                structured_data = pd.concat(
                    [
                        structured_data.drop(item_cols, axis=1),
                        structured_data["item4_02"].apply(pd.Series),
                    ],
                    axis=1,
                )
                # drop "id" column
                structured_data.drop(["id"], axis=1, inplace=True)
                # standardize affected line items
                structured_data["affectedLineItems"] = structured_data["affectedLineItems"].apply(
                    standardize_affected_line_items
                )
                # standardize auditor names
                structured_data["auditors"] = structured_data["auditors"].apply(standardize_auditors)
                # add "hasBig4Auditor" (bool). True if auditor is one of the Big 4 (Deloitte, EY, KPMG, PwC)
                structured_data["hasBig4Auditor"] = structured_data["auditors"].apply(
                    lambda x: any(auditor in ["Deloitte", "EY", "KPMG", "PwC"] for auditor in x)
                )
                # add column: "numberPeriodsAffected" = number of periods affected
                structured_data["numberPeriodsAffected"] = structured_data[
                    "affectedReportingPeriods"
                ].apply(lambda x: len(x) if isinstance(x, list) else 0)
                # add column: "numberQuartersAffected" = number of "Q\d" occurrences in "affectedReportingPeriods"
                structured_data["numberQuartersAffected"] = structured_data[
                    "affectedReportingPeriods"
                ].apply(lambda x: len([period for period in x if re.search(r"Q\d", period)]))
                # add column: "numberYearsAffected" = number of "FY" occurrences in "affectedReportingPeriods"
                structured_data["numberYearsAffected"] = structured_data[
                    "affectedReportingPeriods"
                ].apply(lambda x: len([period for period in x if re.search(r"FY", period)]))
                # add "reportedWithEarnings" (bool). True if item 2.02 or 9.01 is present
                structured_data["reportedWithEarnings"] = structured_data["items"].apply(
                    lambda x: "2.02" in x or "9.01" in x
                )
                # add "reportedWithOtherItems" (bool). True if more than one item is present
                structured_data["reportedWithOtherItems"] = structured_data["items"].apply(
                    lambda x: len(x) > 1
                )
                # add column "issueIdentifiedByAuditor" (bool).
                structured_data["identifiedByAuditor"] = structured_data["identifiedBy"].apply(
                    lambda x: "Auditor" in x if isinstance(x, list) else False
                )
                # add column "identifiedByCompany"
                structured_data["identifiedByCompany"] = structured_data["identifiedBy"].apply(
                    lambda x: "Company" in x if isinstance(x, list) else False
                )
                # add column "identifiedBySec"
                structured_data["identifiedBySec"] = structured_data["identifiedBy"].apply(
                    lambda x: "SEC" in x if isinstance(x, list) else False
                )
                # check if revenue or net income adjustment contains "million" or "billion"
                structured_data["revenueAdjustmentContainsWordMillion"] = structured_data[
                    "revenueAdjustment"
                ].apply(
                    # case-insensitive check whether the free-text adjustment
                    # mentions "million" or "billion"; False for non-string values
                    lambda x: (
                        bool(re.search(r"million|billion", x, re.IGNORECASE))
                        if isinstance(x, str)
                        else False
                    )
                )
                structured_data["netIncomeAdjustmentContainsWordMillion"] = structured_data[
                    "netIncomeAdjustment"
                ].apply(
                    lambda x: (
                        bool(re.search(r"million|billion", x, re.IGNORECASE))
                        if isinstance(x, str)
                        else False
                    )
                )
                print(
                    f"Loaded {len(structured_data):,} records from Item 4.02 disclosures between 2004 and 2024."
                )
                structured_data.head()
                Loaded 8,373 records from Item 4.02 disclosures between 2004 and 2024.
                Out[85]:
                accessionNo | formType | filedAt | periodOfReport | cik | ticker | companyName | items | year | month | ... | numberPeriodsAffected | numberQuartersAffected | numberYearsAffected | reportedWithEarnings | reportedWithOtherItems | identifiedByAuditor | identifiedByCompany | identifiedBySec | revenueAdjustmentContainsWordMillion | netIncomeAdjustmentContainsWordMillion
                0001477932-24-008354 | 8-K | 2024-12-27 17:29:12-05:00 | 2024-12-24 | 1437750 | TRXA | T-REX Acquisition Corp. | [4.02, 5.02] | 2024 | 12 | ... | 1 | 0 | 1 | False | True | True | True | False | False | False
                0001437749-24-038225 | 8-K | 2024-12-23 08:00:28-05:00 | 2024-12-20 | 1000230 | OCC | OPTICAL CABLE CORP | [4.02] | 2024 | 12 | ... | 7 | 6 | 1 | False | False | False | True | False | False | False
                0001683168-24-008906 | 8-K | 2024-12-23 07:52:53-05:00 | 2024-12-19 | 725394 | DFCO | DALRADA FINANCIAL CORP | [4.02] | 2024 | 12 | ... | 1 | 1 | 0 | False | False | False | True | False | False | False
                0001437749-24-038156 | 8-K | 2024-12-20 17:15:00-05:00 | 2024-12-19 | 914122 | PPIH | Perma-Pipe International Holdings, Inc. | [4.02] | 2024 | 12 | ... | 1 | 1 | 0 | False | False | False | True | False | False | False
                0001140361-24-049546 | 8-K | 2024-12-13 17:16:31-05:00 | 2024-12-10 | 1856028 | SDIG | Stronghold Digital Mining, Inc. | [4.02, 9.01] | 2024 | 12 | ... | 3 | 3 | 0 | True | True | False | True | True | False | False

                5 rows × 43 columns
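The period-counting logic used for the derived columns can be checked on a toy list:

```python
import re

# Toy affectedReportingPeriods list (hypothetical values):
periods = ["Q1 2023", "Q2 2023", "FY 2022"]

num_periods = len(periods)
num_quarters = len([p for p in periods if re.search(r"Q\d", p)])
num_years = len([p for p in periods if re.search(r"FY", p)])
print(num_periods, num_quarters, num_years)  # 3 2 1
```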

                Visualization of Non-Reliance Disclosures over Time

                item_4_02_counts = (
                    structured_data.drop_duplicates(subset=["accessionNo"])
                    .groupby(["year"])
                    .size()
                    .to_frame(name="count")
                )

                print(f"Item 4.02 counts from 2004 to 2024.")
                item_4_02_counts.T
                Item 4.02 counts from 2004 to 2024.
                Out[86]:
                year   2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  ...  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024
                count   171   962  1004   780   572   465   455   402   346   326  ...   213   173   141   132    98    96   864   304   262   240

                1 rows × 21 columns

                def plot_timeseries(ts, title):
                    fig, ax = plt.subplots(figsize=(4, 2.5))
                    ts["count"].plot(ax=ax, legend=False)
                    ax.set_title(title)
                    ax.set_xlabel("Year")
                    ax.set_ylabel("Number of\nItem 4.02 Filings")
                    ax.set_xticks(np.arange(2004, 2025, 2))
                    ax.yaxis.set_major_formatter(mtick.StrMethodFormatter("{x:,.0f}"))
                    ax.set_xlim(2003, 2025)
                    ax.grid(axis="x")
                    ax.set_axisbelow(True)
                    plt.xticks(rotation=45, ha="right")

                    for year in YEARS:
                        year_y_max = ts.loc[year, "count"]
                        ax.vlines(year, 0, year_y_max, linestyles=":", colors="grey", alpha=0.5, lw=1)

                    plt.tight_layout()
                    plt.show()


                plot_timeseries(
                    item_4_02_counts,
                    title="Disclosures of Financial Statement Non-Reliance\nForm 8-K with Item 4.02 per Year (2004 - 2024)",
                )
                structured_data["qtr"] = structured_data["month"].apply(lambda x: (x - 1) // 3 + 1)

                counts_qtr_yr_piv = (
                    structured_data.drop_duplicates(subset=["accessionNo"])
                    .groupby(["year", "qtr"])
                    .size()
                    .unstack()
                    .fillna(0)
                ).astype(int)

                print(f"Item 4.02 counts by quarter from 2004 to 2024.")
                counts_qtr_yr_piv.T
                Item 4.02 counts by quarter from 2004 to 2024.
                Out[88]:
                year  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  ...  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024
                qtr
                1        0   301   281   239   187   118   132   120    99   106  ...    57    59    34    41    27    28    29   136    80    61
                2        0   221   222   201   132   105   131   115    95    97  ...    49    42    41    34    28    24   370    56    62    70
                3       33   190   239   168   130   124    94    76    79    54  ...    54    31    28    20    20    22    39    59    59    52
                4      138   250   262   172   123   118    98    91    73    69  ...    53    41    38    37    23    22   426    53    61    57

                4 rows × 21 columns

                counts_qtr_yr = counts_qtr_yr_piv.stack().reset_index(name="count")

                fig, ax = plt.subplots(figsize=(6, 2.5))
                counts_qtr_yr_piv.plot(kind="bar", ax=ax, legend=True)
                ax.legend(title="Quarter", loc="upper right", bbox_to_anchor=(1.02, 1))
                ax.set_title("Number of Non-Reliance Disclosures per Quarter\n(2004 - 2024)")
                ax.set_xlabel("Year")
                ax.set_ylabel("Number of\nItem 4.02 Filings")
                ax.yaxis.set_major_formatter(mtick.StrMethodFormatter("{x:,.0f}"))
                ax.grid(axis="x")
                ax.set_axisbelow(True)
                plt.tight_layout()
                plt.show()
                counts_month_yr_piv = (
                    structured_data.drop_duplicates(subset=["accessionNo"])
                    .groupby(["year", "month"])
                    .size()
                    .unstack()
                    .fillna(0)
                ).astype(int)

                print(f"Item 4.02 counts by month from 2004 to 2024.")
                counts_month_yr_piv
                Item 4.02 counts by month from 2004 to 2024.
                Out[90]:
                month    1    2    3    4    5    6    7    8    9   10   11   12
                year
                2004     0    0    0    0    0    0    0    7   26   29   63   46
                2005    37   89  175   85   91   45   46   91   53   63  123   64
                2006    45   84  152   81   78   63   60  115   64   76  109   77
                2007    60   67  112   85   62   54   41   91   36   53   76   43
                2008    40   74   73   63   45   24   32   62   36   31   60   32
                2009    33   40   45   50   33   22   31   46   47   36   58   24
                2010    37   41   54   49   48   34   23   47   24   34   32   32
                2011    24   26   70   49   39   27   28   31   17   30   45   16
                2012    26   28   45   39   31   25   26   35   18   26   38    9
                2013    18   40   48   41   36   20   11   29   14   15   39   15
                2014    16   25   31   22   26   19   11   24   12   17   26   11
                2015    10   19   28   24   21    4   20   19   15   18   25   10
                2016    17   14   28   15   21    6    9   11   11    5   25   11
                2017    10    9   15   20   11   10    7   15    6   10   21    7
                2018    12   14   15   14   13    7    3   11    6   16   15    6
                2019     6    7   14   12   10    6    9    8    3    8   10    5
                2020     2   13   13    9   12    3    8   10    4    7   11    4
                2021     4   10   15   54  264   52   20   11    8    6  271  149
                2022    38   46   52   23   23   10    6   38   15   13   30   10
                2023     9   25   46   24   28   10   10   28   21   17   25   19
                2024    10   23   28   33   31    6   11   27   14   20   23   14
                print(f"Descriptive statistics for Item 4.02 counts by month from 2005 to 2024.")
                month_stats = (
                    counts_month_yr_piv.loc[2005:]
                    .describe(percentiles=[0.025, 0.975])
                    .round(0)
                    .astype(int)
                )
                month_stats
                Descriptive statistics for Item 4.02 counts by month from 2005 to 2024.
                Out[91]:
                month    1    2    3    4    5    6    7    8    9   10   11   12
                count   20   20   20   20   20   20   20   20   20   20   20   20
                mean    23   35   53   40   46   22   21   37   21   25   53   28
                std     16   25   45   24   56   18   15   30   17   19   60   35
                min      2    7   13    9   10    3    3    8    3    5   10    4
                2.5%     3    8   13   10   10    3    4    9    3    5   10    4
                50%     18   26   45   36   31   20   16   28   15   18   31   14
                97.5%   53   87  164   85  182   59   53  104   59   70  201  115
                max     60   89  175   85  264   63   60  115   64   76  271  149
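The percentile bands produced by `describe(percentiles=[0.025, 0.975])` can be illustrated on a toy series (the numbers below are made up, not Item 4.02 counts):

```python
import pandas as pd

# Minimal illustration of the 2.5% / 97.5% bands computed above:
df = pd.DataFrame({"m": [1, 2, 3, 4, 100]})
stats = df.describe(percentiles=[0.025, 0.975])
print(stats.loc["2.5%", "m"])   # lower band (linear interpolation)
print(stats.loc["97.5%", "m"])  # upper band
```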
                def plot_box_plot_as_line(
                    data: pd.DataFrame,
                    x_months=True,
                    title="",
                    x_label="",
                    x_pos_mean_label=2,
                    pos_labels=None, # {"mean": {"x": 2, "y": 150}, "upper": {"x": 2, "y": 150}, "lower": {"x": 2, "y": 150}},
                    pos_high_low=None, # {"high": {"x": 2, "y": 150}, "low": {"x": 2, "y": 150}},
                    y_label="",
                    y_formatter=lambda x, p: "{:.0f}".format(int(x) / 1000),
                    show_high_low_labels=True,
                    show_inline_labels=True,
                    show_bands=True,
                    figsize=(4, 2.5),
                    line_source="mean",
                ):
                    fig, ax = plt.subplots(figsize=figsize)

                    line_to_plot = data[line_source]
                    lower_label = "2.5%"
                    upper_label = "97.5%"
                    lower = data[lower_label]
                    upper = data[upper_label]

                    line_to_plot.plot(ax=ax)

                    if show_bands:
                        ax.fill_between(line_to_plot.index, lower, upper, alpha=0.2)

                    if x_months:
                        ax.set_xlim(0.5, 12.5)
                        ax.set_xticks(range(1, 13))
                        ax.set_xticklabels(["J", "F", "M", "A", "M", "J", "J", "A", "S", "O", "N", "D"])

                    ax.yaxis.set_major_formatter(mtick.FuncFormatter(y_formatter))
                    ax.set_ylabel(y_label)
                    ax.set_xlabel(x_label)
                    ax.set_title(title)

                    ymin, ymax = ax.get_ylim()
                    y_scale = ymax - ymin

                    max_x = int(line_to_plot.idxmax())
                    max_y = line_to_plot.max()
                    min_x = int(line_to_plot.idxmin())
                    min_y = line_to_plot.min()

                    ax.axvline(
                        max_x,
                        ymin=0,
                        ymax=((max_y - ymin) / (ymax - ymin)),
                        linestyle="dashed",
                        color="tab:blue",
                        alpha=0.5,
                    )
                    ax.scatter(max_x, max_y, color="tab:blue", s=10)
                    ax.axvline(
                        min_x,
                        ymin=0,
                        ymax=((min_y - ymin) / (ymax - ymin)),
                        linestyle="dashed",
                        color="tab:blue",
                        alpha=0.5,
                    )
                    ax.scatter(min_x, min_y, color="tab:blue", s=10)

                    x_pos_mean_label_int = int(x_pos_mean_label)
                    if show_inline_labels:
                        mean_x = x_pos_mean_label
                        mean_y = line_to_plot.iloc[x_pos_mean_label_int] * 1.02
                        upper_x = x_pos_mean_label
                        upper_y = upper.iloc[x_pos_mean_label_int]
                        lower_x = x_pos_mean_label
                        lower_y = lower.iloc[x_pos_mean_label_int] * 0.95

                        if pos_labels:
                            mean_x = pos_labels["mean"]["x"]
                            mean_y = pos_labels["mean"]["y"]
                            upper_x = pos_labels["upper"]["x"]
                            upper_y = pos_labels["upper"]["y"]
                            lower_x = pos_labels["lower"]["x"]
                            lower_y = pos_labels["lower"]["y"]

                        ax.text(mean_x, mean_y, "Mean", color="tab:blue", fontsize=8)
                        ax.text(upper_x, upper_y, upper_label, color="tab:blue", fontsize=8)
                        ax.text(lower_x, lower_y, lower_label, color="tab:blue", fontsize=8)

                    if show_high_low_labels:
                        high_x_origin = max_x
                        high_y_origin = max_y
                        high_x_label = high_x_origin + 0.5
                        high_y_label = high_y_origin + 0.1 * y_scale
                        if pos_high_low:
                            high_x_label = pos_high_low["high"]["x"]
                            high_y_label = pos_high_low["high"]["y"]
                        ax.annotate(
                            "High",
                            (high_x_origin, high_y_origin),
                            xytext=(high_x_label, high_y_label),
                            arrowprops=dict(facecolor="black", arrowstyle="->"),
                        )

                        low_x_origin = min_x * 1.01
                        low_y_origin = min_y
                        low_x_label = low_x_origin + 1.5
                        low_y_label = low_y_origin - 0.1 * y_scale
                        if pos_high_low:
                            low_x_label = pos_high_low["low"]["x"]
                            low_y_label = pos_high_low["low"]["y"]
                        ax.annotate(
                            "Low",
                            (low_x_origin, low_y_origin),
                            xytext=(low_x_label, low_y_label),
                            arrowprops=dict(facecolor="black", arrowstyle="->"),
                        )

                    ax.grid(axis="x")
                    ax.set_axisbelow(True)

                    plt.tight_layout()
                    plt.show()


                plot_box_plot_as_line(
                    data=month_stats.T,
                    title="Descriptive Statistics for Item 4.02 Filings by Month\n(2005 - 2024)",
                    x_label="Month",
                    y_label="Number of\nItem 4.02 Filings",
                    y_formatter=lambda x, p: "{:.0f}".format(int(x)),
                )
                fig, ax = plt.subplots(figsize=(3.5, 3))

                counts_month_yr_piv.loc[2005:].boxplot(
                    ax=ax,
                    grid=False,
                    showfliers=False,
                    flierprops=dict(marker="o", markersize=3),
                    patch_artist=True,
                    boxprops=dict(facecolor="white", color="tab:blue"),
                    showmeans=True,
                    meanline=True,
                    meanprops={"color": "tab:blue", "linestyle": ":"},
                    medianprops={"color": "black"},
                    capprops={"color": "none"},
                )

                ax.set_title("Item 4.02 Disclosures by Month\n(2005 - 2024)")
                ax.set_xlabel("Month")
                ax.set_ylabel("Item 4.02 Count")
                xticklabels = [pd.to_datetime(str(x), format="%m").strftime("%b") for x in range(1, 13)]
                ax.set_xticklabels(xticklabels)
                plt.xticks(rotation=45)
                plt.tight_layout()
                plt.show()
                counts_filedAtClass = (
                    structured_data.drop_duplicates(subset=["accessionNo"])
                    .groupby(["filedAtClass"])
                    .size()
                    .sort_values(ascending=False)
                    .to_frame(name="Count")
                ).rename_axis("Publication Time")
                counts_filedAtClass["Pct"] = (
                    counts_filedAtClass["Count"].astype(int)
                    / counts_filedAtClass["Count"].astype(int).sum()
                ).map("{:.0%}".format)
                counts_filedAtClass["Count"] = counts_filedAtClass["Count"].map(lambda x: f"{x:,}")
                counts_filedAtClass.index = (
                    counts_filedAtClass.index.str.replace("preMarket", "Pre-Market (4:00 - 9:30 AM)")
                    .str.replace("regularMarket", "Market Hours (9:30 AM - 4:00 PM)")
                    .str.replace("afterMarket", "After Market (4:00 - 8:00 PM)")
                )
                counts_filedAtClass = counts_filedAtClass.reindex(counts_filedAtClass.index[::-1])

                print(
                    "Item 4.02 counts by pre-market, regular market hours,\nand after-market publication time (2004 - 2024)."
                )
                counts_filedAtClass
                Item 4.02 counts by pre-market, regular market hours,
                and after-market publication time (2004 - 2024).

                Publication Time                      Count    Pct
                other                                   304     4%
                Pre-Market (4:00 - 9:30 AM)             943    11%
                Market Hours (9:30 AM - 4:00 PM)      2,049    25%
                After Market (4:00 - 8:00 PM)         4,950    60%
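                The `filedAtClass` buckets arrive pre-computed in the structured dataset. As a rough sketch of how such a session label could be derived from a filing timestamp (the `classify_session` helper below is hypothetical, not part of the API; timestamps are assumed to already be in US/Eastern):

                ```python
                import pandas as pd

                def classify_session(ts: pd.Timestamp) -> str:
                    """Bucket a filing timestamp (US/Eastern) into a market session."""
                    minutes = ts.hour * 60 + ts.minute
                    if 4 * 60 <= minutes < 9 * 60 + 30:
                        return "preMarket"        # 4:00 - 9:30 AM
                    if 9 * 60 + 30 <= minutes < 16 * 60:
                        return "regularMarket"    # 9:30 AM - 4:00 PM
                    if 16 * 60 <= minutes < 20 * 60:
                        return "afterMarket"      # 4:00 - 8:00 PM
                    return "other"                # overnight filings

                # two example acceptance times: one pre-market, one after hours
                filedAt = pd.Series(pd.to_datetime(["2023-05-01 08:15:00", "2023-05-01 17:45:00"]))
                sessions = filedAt.map(classify_session)
                ```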
                counts_dayOfWeek = (
                    structured_data.drop_duplicates(subset=["accessionNo"])
                    .groupby(["dayOfWeek"])
                    .size()
                    .to_frame(name="Count")
                ).rename_axis("Day of the Week")
                counts_dayOfWeek["Pct"] = (
                    counts_dayOfWeek["Count"].astype(int) / counts_dayOfWeek["Count"].astype(int).sum()
                ).map("{:.0%}".format)
                counts_dayOfWeek["Count"] = counts_dayOfWeek["Count"].map(lambda x: f"{x:,}")

                print("Item 4.02 disclosures by day of the week (2004 - 2024).")
                counts_dayOfWeek.loc[["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]]
                Item 4.02 disclosures by day of the week (2004 - 2024).

                Day of the Week    Count    Pct
                Monday             1,639    20%
                Tuesday            1,705    21%
                Wednesday          1,532    19%
                Thursday           1,560    19%
                Friday             1,810    22%
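                The `dayOfWeek` field can likewise be reproduced from the raw `filedAt` timestamp with pandas' datetime accessor, for example:

                ```python
                import pandas as pd

                # two example filing timestamps; .dt.day_name() yields the weekday label
                filedAt = pd.Series(pd.to_datetime(["2024-01-05 16:05:00", "2024-01-08 09:00:00"]))
                day_of_week = filedAt.dt.day_name()
                ```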

                Distribution of Disclosures by Their Characteristics

                bool_variables_to_analyze = [
                    "impactIsMaterial",
                    "restatementIsNecessary",
                    "impactYetToBeDetermined",
                    "materialWeaknessIdentified",
                    "reportedWithOtherItems",
                    "reportedWithEarnings",
                    "netIncomeDecreased",
                    "netIncomeIncreased",
                    "netIncomeAdjustmentContainsWordMillion",
                    "revenueDecreased",
                    "revenueIncreased",
                    "revenueAdjustmentContainsWordMillion",
                    "identifiedByAuditor",
                    "identifiedByCompany",
                    "identifiedBySec",
                ]

                var_to_label = {
                    "impactIsMaterial": "Impact is Material",
                    "restatementIsNecessary": "Restatement is Necessary",
                    "impactYetToBeDetermined": "Impact Yet to be Determined",
                    "materialWeaknessIdentified": "Material Weakness Identified",
                    "reportedWithOtherItems": "Reported with Other Items",
                    "reportedWithEarnings": "Reported with Earnings Announcement",
                    "netIncomeDecreased": "Net Income Decreased",
                    "netIncomeIncreased": "Net Income Increased",
                    "netIncomeAdjustmentContainsWordMillion": "Net Inc. Adj. Contains 'Million'",
                    "revenueDecreased": "Revenue Decreased",
                    "revenueIncreased": "Revenue Increased",
                    "revenueAdjustmentContainsWordMillion": "Revenue Adj. Contains 'Million'",
                    "identifiedByAuditor": "Identified by Auditor",
                    "identifiedByCompany": "Identified by Company",
                    "identifiedBySec": "Identified by SEC",
                }

                bool_variables_stats = []

                for variable in bool_variables_to_analyze:
                    variable_stats = (
                        structured_data[variable]
                        .value_counts()
                        .to_frame()
                        .reset_index()
                        .rename(columns={variable: "value"})
                    )
                    variable_stats = variable_stats.sort_values(by="value", ascending=False)
                    variable_stats["pct"] = (
                        variable_stats["count"] / variable_stats["count"].sum() * 100
                    ).round(1)
                    variable_stats.index = pd.MultiIndex.from_tuples(
                        [(variable, row["value"]) for _, row in variable_stats.iterrows()],
                    )
                    variable_stats.drop(columns="value", inplace=True)

                    bool_variables_stats.append(variable_stats)


                bool_variables_stats = pd.concat(bool_variables_stats, axis=0)
                bool_variables_stats.index.set_names(["Variable", "Value"], inplace=True)
                bool_variables_stats.rename(index=var_to_label, columns={"count": "Samples", "pct": "Pct."}, inplace=True)
                bool_variables_stats["Samples"] = bool_variables_stats["Samples"].apply(lambda x: f"{x:,.0f}")

                print("Number of non-reliance filings by \ntheir disclosed characteristics (2004 - 2024):")
                bool_variables_stats
                Number of non-reliance filings by 
                their disclosed characteristics (2004 - 2024):

                Variable                               Value    Samples    Pct.
                Impact is Material                     True       6,224    74.3
                                                       False      2,149    25.7
                Restatement is Necessary               True       8,155    97.4
                                                       False        218     2.6
                Impact Yet to be Determined            True       2,052    24.5
                                                       False      6,321    75.5
                Material Weakness Identified           True       2,241    26.8
                                                       False      6,132    73.2
                Reported with Other Items              True       3,676    43.9
                                                       False      4,697    56.1
                Reported with Earnings Announcement    True       3,248    38.8
                                                       False      5,125    61.2
                Net Income Decreased                   True       2,084    24.9
                                                       False      6,289    75.1
                Net Income Increased                   True         758     9.1
                                                       False      7,615    90.9
                Net Inc. Adj. Contains 'Million'       True         449     5.4
                                                       False      7,924    94.6
                Revenue Decreased                      True         633     7.6
                                                       False      7,740    92.4
                Revenue Increased                      True         204     2.4
                                                       False      8,169    97.6
                Revenue Adj. Contains 'Million'        True         196     2.3
                                                       False      8,177    97.7
                Identified by Auditor                  True       2,173    26.0
                                                       False      6,200    74.0
                Identified by Company                  True       6,996    83.6
                                                       False      1,377    16.4
                Identified by SEC                      True       1,279    15.3
                                                       False      7,094    84.7
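                The loop above nests each boolean's `value_counts()` under a two-level `(Variable, Value)` index; on a toy frame the same pattern reduces to a minimal sketch (column names here are illustrative):

                ```python
                import pandas as pd

                toy = pd.DataFrame({
                    "impactIsMaterial": [True, True, False, True],
                    "restatementIsNecessary": [True, False, True, True],
                })

                parts = []
                for col in toy.columns:
                    # count True/False occurrences and express them as percentages
                    stats = toy[col].value_counts().to_frame("Samples")
                    stats["Pct."] = (stats["Samples"] / stats["Samples"].sum() * 100).round(1)
                    # lift the boolean values into a (Variable, Value) MultiIndex
                    stats.index = pd.MultiIndex.from_product(
                        [[col], stats.index], names=["Variable", "Value"]
                    )
                    parts.append(stats)

                bool_stats = pd.concat(parts)
                ```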

                Parties Responsible for Discovering Issues

                identifiedBy = (
                    structured_data["identifiedBy"].explode().value_counts().to_frame().head(3)
                )
                identifiedBy.index.name = "Identified By"
                identifiedBy.columns = ["Count"]
                identifiedBy["Pct."] = identifiedBy["Count"] / identifiedBy["Count"].sum() * 100
                identifiedBy["Pct."] = identifiedBy["Pct."].round(1)
                identifiedBy["Count"] = identifiedBy["Count"].apply(lambda x: f"{x:,.0f}")

                print(
                    "Top 3 entities identifying issues in\npreviously reported financial statements (2004 - 2024):"
                )
                identifiedBy
                Top 3 entities identifying issues in
                previously reported financial statements (2004 - 2024):

                Identified By    Count    Pct.
                Company          6,996    67.0
                Auditor          2,173    20.8
                SEC              1,279    12.2
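                `identifiedBy` is a list-valued column (a single filing can credit several parties), which is why the cell above calls `.explode()` before counting. A minimal illustration with made-up rows:

                ```python
                import pandas as pd

                # each row holds the list of parties credited with discovering the issue
                identified_by = pd.Series([["Company"], ["Company", "Auditor"], ["SEC"]])

                # explode() flattens the lists to one party per row before counting
                counts = identified_by.explode().value_counts()
                ```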

                Most Involved Auditors in Restatements

                all_auditors = structured_data["auditors"].explode()
                all_auditors = all_auditors[all_auditors.str.len() > 0].reset_index(drop=True)
                auditors = all_auditors.value_counts().to_frame().reset_index()
                auditors["pct"] = auditors["count"] / auditors["count"].sum() * 100
                auditors["pct"] = auditors["pct"].round(2)

                print("Top 10 auditors involved in \nnon-reliance disclosures from 2004 to 2024:")
                auditors.head(10)
                Top 10 auditors involved in 
                non-reliance disclosures from 2004 to 2024:
                    auditors             count    pct
                0   PwC                    639   9.50
                1   EY                     598   8.89
                2   Marcum                 477   7.09
                3   Deloitte               436   6.48
                4   KPMG                   430   6.39
                5   WithumSmith+Brown      335   4.98
                6   BDO                    327   4.86
                7   Unknown                324   4.82
                8   Grant Thornton         179   2.66
                9   MaloneBailey           100   1.49
                auditors_year = structured_data[["auditors", "year", "accessionNo"]].explode("auditors")

                auditors_year_pivot = pd.pivot_table(
                    auditors_year,
                    index="auditors",
                    columns="year",
                    values="accessionNo",
                    aggfunc="count",
                    fill_value=0,
                )

                auditors_year_pivot["total"] = auditors_year_pivot.sum(axis=1)
                auditors_year_pivot = auditors_year_pivot.sort_values(by="total", ascending=False)

                top_10_auditors = auditors_year_pivot.head(10)

                others = auditors_year_pivot[~auditors_year_pivot.index.isin(top_10_auditors.index)]
                others = others.sum().to_frame().T
                others.index = ["Others"]

                top_10_auditors = pd.concat([top_10_auditors, others], axis=0)

                top_10_auditors
                [Output: an 11 × 22 table of Item 4.02 filing counts per auditor and year (2004 - 2024) plus a "total" column; the per-year cells are omitted here. The totals match the counts above, with the "Others" bucket at 2,883 filings.]

                11 rows × 22 columns
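                `pd.pivot_table` with `aggfunc="count"` tallies one filing per (auditor, year) cell; a small sketch with synthetic rows:

                ```python
                import pandas as pd

                # synthetic filings: one row per (auditor, year, accession number)
                rows = pd.DataFrame({
                    "auditors": ["PwC", "PwC", "EY", "PwC"],
                    "year": [2023, 2023, 2023, 2024],
                    "accessionNo": ["a1", "a2", "a3", "a4"],
                })

                pivot = pd.pivot_table(
                    rows,
                    index="auditors",
                    columns="year",
                    values="accessionNo",
                    aggfunc="count",   # count filings per cell
                    fill_value=0,      # missing (auditor, year) combos become 0
                )
                pivot["total"] = pivot.sum(axis=1)
                ```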

                fig, ax = plt.subplots(figsize=(5, 3))

                top_10_auditors.drop(columns="total").T.plot(
                    kind="bar", stacked=True, ax=ax, cmap="tab20"
                )

                ax.set_title("Number of Item 4.02 Filings\nby Auditor and Year")
                ax.set_xlabel("Year")
                ax.set_ylabel("Number of Filings")
                ax.xaxis.grid(False)
                ax.set_axisbelow(True)
                handles, labels = ax.get_legend_handles_labels() # reverse order of legend items
                ax.legend(
                    reversed(handles),
                    reversed(labels),
                    title="Auditor",
                    bbox_to_anchor=(1.05, 1),
                    labelspacing=0.3,
                    fontsize=8,
                )

                plt.tight_layout()
                plt.show()

                Affected Periods by Restatements

                print(
                    "Descriptive statistics for number of years and quarters \naffected by Item 4.02 filings (2004 - 2024):"
                )
                quarters_stats = (
                    structured_data[["numberQuartersAffected", "numberYearsAffected"]]
                    .describe()
                    .round(0)
                    .astype(int)
                )
                quarters_stats.T
                Descriptive statistics for number of years and quarters 
                affected by Item 4.02 filings (2004 - 2024):

                                          count    mean    std    min    25%    50%    75%    max
                numberQuartersAffected     8373       2      3      0      1      2      3     56
                numberYearsAffected        8373       1      1      0      0      1      2     20
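                `describe()` produces the count/mean/std/quantile columns in one call; for instance, on a toy column:

                ```python
                import pandas as pd

                # toy version of the affected-quarters column
                affected = pd.DataFrame({"numberQuartersAffected": [0, 1, 2, 3, 4]})

                # same rounding chain as above: describe, round, cast to int
                stats = affected.describe().round(0).astype(int)
                ```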

                Affected Financial Statement Items

                affectedLineItems_stats = (
                    structured_data["affectedLineItems"]
                    .explode()
                    .value_counts()
                    .to_frame()
                    .reset_index()
                    .head(10)
                )
                print(
                    "Top 10 line items affected by non-reliance disclosures\nacross all years (2004 - 2024):"
                )
                affectedLineItems_stats
                Top 10 line items affected by non-reliance disclosures
                across all years (2004 - 2024):

                    affectedLineItems             count
                0   net income                     2346
                1   equity                         1075
                2   revenue                         966
                3   total liabilities               679
                4   additional paid-in capital      672
                5   earnings per share              564
                6   accumulated deficit             543
                7   total assets                    450
                8   retained earnings               404
                9   warrants                        334
                line_items_year_pivot = pd.pivot_table(
                    structured_data.explode("affectedLineItems"),
                    index="affectedLineItems",
                    columns="year",
                    values="accessionNo",
                    aggfunc="count",
                    fill_value=0,
                    margins=True,
                )
                line_items_year_pivot = line_items_year_pivot.sort_values(by="All", ascending=False)
                print("Top 20 line items affected by Item 4.02 filings per year (2004 - 2024):")
                line_items_year_pivot.head(20)
                Top 20 line items affected by Item 4.02 filings per year (2004 - 2024):

                [Output: a 20 × 22 table of counts per affected line item and year (2004 - 2024); the per-year columns are omitted here. The "All" column totals are:]

                affectedLineItems              All
                All                          27320
                net income                    2346
                equity                        1075
                revenue                        966
                total liabilities              679
                additional paid-in capital     672
                earnings per share             564
                accumulated deficit            543
                total assets                   450
                retained earnings              404
                warrants                       334
                cost of sales                  333
                interest expense               309
                goodwill                       272
                derivative liabilities         259
                current liabilities            232
                accounts receivable            226
                common stock                   177
                inventory                      175
                income tax expense             165

                20 rows × 22 columns
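                Passing `margins=True` is what adds the "All" row and column that the sort above keys on; a toy example (rows are illustrative, not real filings):

                ```python
                import pandas as pd

                # synthetic exploded rows: one row per (line item, year, accession number)
                rows = pd.DataFrame({
                    "affectedLineItems": ["net income", "revenue", "net income"],
                    "year": [2023, 2023, 2024],
                    "accessionNo": ["a1", "a2", "a3"],
                })

                pivot = pd.pivot_table(
                    rows,
                    index="affectedLineItems",
                    columns="year",
                    values="accessionNo",
                    aggfunc="count",
                    fill_value=0,
                    margins=True,  # appends an "All" row and column holding totals
                )
                ```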
