Analysis of Accountant Changes and Disagreement Reports


On this page:
• Data Loading and Preparation
  • Item 4.01 Example
• Visualization of Change-of-Accountant Disclosures over Time
• Distribution of Disclosures by Their Characteristics
  • Disagreements
  • ICFR Weaknesses
  • Reason for End of Engagement
  • Opinion Types of Audit Reports

We illustrate how to perform an exploratory data analysis of disclosures that inform investors about accountant changes at publicly traded companies on U.S. stock exchanges, including potential disagreements with the former accountant. These changes are disclosed in Form 8-K filings with the SEC, specifically under Item 4.01, titled "Changes in Registrant's Certifying Accountant", and are presented by companies in free-text form. Using our Structured Data API, we extract and structure the relevant information from the text, making it available for detailed analysis.

                Our analysis will focus on several key areas:

• Number of Item 4.01 disclosures made each year from 2004 to 2024, per quarter and month, and by time of day (pre-market, regular market hours, after-market)
• Distribution of disclosures across structured data fields, such as the proportion of disclosures reporting a going concern
• Number of disagreements disclosed per year, and whether they were resolved
• Disclosed weaknesses in internal control over financial reporting (ICFR)
• Reasons for the termination of the engagement with the former accountant
• Types of opinion expressed in the audit reports

                Data Loading and Preparation

To load and prepare the data, we use the Form 8-K Item 4.01 Structured Data API to download all structured data for Form 8-K filings that include an Item 4.01 disclosure. The data, spanning 2004 to 2024, is saved to the JSONL file ./form-8k-item-4-01-structured-data.jsonl.

                import os
                import json
                import pandas as pd
                import numpy as np
                import matplotlib.pyplot as plt
                import matplotlib.style as style
                import matplotlib.ticker as mtick

                style.use("default")

                params = {
                    "axes.labelsize": 8, "font.size": 8, "legend.fontsize": 8,
                    "xtick.labelsize": 8, "ytick.labelsize": 8, "font.family": "sans-serif",
                    "axes.spines.top": False, "axes.spines.right": False, "grid.color": "grey",
                    "axes.grid": True, "axes.grid.axis": "y", "grid.alpha": 0.5, "grid.linestyle": ":",
                }

                plt.rcParams.update(params)
                !pip install sec-api
                from sec_api import Form_8K_Item_X_Api

                item_X_api = Form_8K_Item_X_Api("YOUR_API_KEY")

                YEARS = range(2024, 2003, -1) # from 2024 to 2004
                TARGET_FILE = "./form-8k-item-4-01-structured-data.jsonl"

                if not os.path.exists(TARGET_FILE):
                    for year in YEARS:
                        done = False
                        search_from = 0
                        year_counter = 0

                        while not done:
                            searchRequest = {
                                "query": f"item4_01:* AND filedAt:[{year}-01-01 TO {year}-12-31]",
                                "from": search_from,
                                "size": "50",
                                "sort": [{"filedAt": {"order": "desc"}}],
                            }

                            response = item_X_api.get_data(searchRequest)

                            if len(response["data"]) == 0:
                                break

                            search_from += 50
                            year_counter += len(response["data"])

                            with open(TARGET_FILE, "a") as f:
                                for entry in response["data"]:
                                    f.write(json.dumps(entry) + "\n")

                        print(f"Finished loading {year_counter} Item 4.01 for year {year}")
                else:
                    print("File already exists, skipping download")
                Finished loading 1003 Item 4.01 for year 2024
                Finished loading 844 Item 4.01 for year 2023
                Finished loading 793 Item 4.01 for year 2022
                Finished loading 772 Item 4.01 for year 2021
                Finished loading 606 Item 4.01 for year 2020
                Finished loading 717 Item 4.01 for year 2019
                Finished loading 837 Item 4.01 for year 2018
                Finished loading 982 Item 4.01 for year 2017
                Finished loading 957 Item 4.01 for year 2016
                Finished loading 1216 Item 4.01 for year 2015
                Finished loading 1410 Item 4.01 for year 2014
                Finished loading 1566 Item 4.01 for year 2013
                Finished loading 1173 Item 4.01 for year 2012
                Finished loading 1448 Item 4.01 for year 2011
                Finished loading 1646 Item 4.01 for year 2010
                Finished loading 2102 Item 4.01 for year 2009
                Finished loading 1628 Item 4.01 for year 2008
                Finished loading 1974 Item 4.01 for year 2007
                Finished loading 2175 Item 4.01 for year 2006
                Finished loading 2104 Item 4.01 for year 2005
                Finished loading 771 Item 4.01 for year 2004
                structured_data = pd.read_json(TARGET_FILE, lines=True)

                structured_data["filedAt"] = pd.to_datetime(structured_data["filedAt"], utc=True)
                structured_data["filedAt"] = structured_data["filedAt"].dt.tz_convert("US/Eastern")
                structured_data["year"] = structured_data["filedAt"].dt.year
                structured_data["month"] = structured_data["filedAt"].dt.month
                structured_data["dayOfWeek"] = structured_data["filedAt"].dt.day_name()
                # filedAtClass:
                # preMarket (4:00 AM to 9:30 AM),
                # regularMarket (9:30 AM to 4:00 PM),
                # afterMarket (4:00 PM to 8:00 PM),
                # other (8:00 PM to 10:00 PM)
                structured_data["filedAtClass"] = structured_data["filedAt"].apply(
                    lambda x: (
                        "preMarket"
                        if x.hour < 9 or (x.hour == 9 and x.minute < 30)
                        else (
                            "regularMarket"
                            if x.hour < 16
                            else "afterMarket" if x.hour < 20 else "other"
                        )
                    )
                )
                unique_years = structured_data["year"].nunique()
                unique_companies = structured_data["cik"].nunique()
                print(
                    f"Loaded {len(structured_data):,} Item 4.01 structured data records from {YEARS[-1]} to {YEARS[0]}, \ncovering {unique_years} years and {unique_companies:,} unique companies."
                )
                structured_data.head()
                Loaded 26,724 Item 4.01 structured data records from 2004 to 2024, 
                covering 21 years and 10,495 unique companies.
                Out[12]:
  | id | accessionNo | formType | filedAt | periodOfReport | cik | ticker | companyName | items | item4_01 | item5_02 | item4_02 | year | month | dayOfWeek | filedAtClass
0 | 4ceff8128ec58ee74c3a30d2682ea462 | 0001599916-24-000323 | 8-K | 2024-12-31 12:22:18-05:00 | 2024-05-03 | 1367408 | OILY | Sino American Oil Co | [Item 4.01: Changes in Registrant's Certifying... | {'keyComponents': 'Sino American Oil Co. dismi... | NaN | NaN | 2024 | 12 | Tuesday | regularMarket
1 | bab09dc10fa2f239ef5e6abcc3971d6c | 0001683168-24-009057 | 8-K | 2024-12-31 08:36:36-05:00 | 2024-12-31 | 1827855 | MCLE | Medicale Corp. | [Item 4.01: Changes in Registrant's Certifying... | {'keyComponents': 'Medicale Corp. dismissed Gr... | NaN | NaN | 2024 | 12 | Tuesday | preMarket
2 | 80d01f5a70c47acac5f31e3c1b222785 | 0001493152-24-052550 | 8-K | 2024-12-30 17:13:59-05:00 | 2024-06-17 | 1850767 | ACUT | Accustem Sciences Inc. | [Item 4.01: Changes in Registrant's Certifying... | {'keyComponents': 'On June 17, 2024, Mercurius... | NaN | NaN | 2024 | 12 | Monday | afterMarket
3 | beb9b44a4b20c882bacffe1cf9f39ac9 | 0001493152-24-052512 | 8-K | 2024-12-30 16:41:43-05:00 | 2024-12-30 | 1643988 | LPTV | Loop Media, Inc. | [Item 4.01: Changes in Registrant's Certifying... | {'keyComponents': 'The Company dismissed Marcu... | NaN | NaN | 2024 | 12 | Monday | afterMarket
4 | 0cf18570d923c6e72e44a4a567704a04 | 0001213900-24-112993 | 8-K | 2024-12-27 13:45:06-05:00 | 2024-12-20 | 2034406 | NaN | Tortoise Capital Series Trust | [Item 4.01: Changes in Registrant's Certifying... | {'keyComponents': 'As a result of the Mergers,... | NaN | NaN | 2024 | 12 | Friday | regularMarket
                print(structured_data.info())
                <class 'pandas.core.frame.DataFrame'>
                RangeIndex: 26724 entries, 0 to 26723
                Data columns (total 16 columns):
                 # Column Non-Null Count Dtype
                --- ------ -------------- -----
                 0 id 26724 non-null object
                 1 accessionNo 26724 non-null object
                 2 formType 26724 non-null object
                 3 filedAt 26724 non-null datetime64[ns, US/Eastern]
                 4 periodOfReport 26724 non-null object
                 5 cik 26724 non-null int64
                 6 ticker 20390 non-null object
                 7 companyName 26724 non-null object
                 8 items 26724 non-null object
                 9 item4_01 26724 non-null object
                 10 item5_02 1895 non-null object
                 11 item4_02 263 non-null object
                 12 year 26724 non-null int32
                 13 month 26724 non-null int32
                 14 dayOfWeek 26724 non-null object
                 15 filedAtClass 26724 non-null object
                dtypes: datetime64[ns, US/Eastern](1), int32(2), int64(1), object(12)
                memory usage: 3.1+ MB
                None
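The pre-market/regular-market/after-market bucketing above uses a row-wise `apply` with nested conditionals. The same classification can be sketched in vectorized form with `pd.cut` on minutes since midnight; the timestamps below are hypothetical examples, and the bin edges assume the bucket boundaries stated in the comments (9:30 AM, 4:00 PM, 8:00 PM):

```python
import pandas as pd

# Hypothetical example timestamps (naive, Eastern wall-clock time)
ts = pd.Series(pd.to_datetime([
    "2024-12-31 08:36:36",  # before 9:30 AM -> preMarket
    "2024-12-31 12:22:18",  # during market hours -> regularMarket
    "2024-12-30 17:13:59",  # after 4:00 PM -> afterMarket
]))

# Convert each timestamp to minutes since midnight, then bucket.
# Bins: (-1, 569] = before 9:30 AM, (569, 959] = 9:30 AM-4:00 PM,
# (959, 1199] = 4:00-8:00 PM, (1199, 1440] = after 8:00 PM.
minutes = ts.dt.hour * 60 + ts.dt.minute
filed_at_class = pd.cut(
    minutes,
    bins=[-1, 9 * 60 + 29, 16 * 60 - 1, 20 * 60 - 1, 24 * 60],
    labels=["preMarket", "regularMarket", "afterMarket", "other"],
)
print(filed_at_class.tolist())  # ['preMarket', 'regularMarket', 'afterMarket']
```

This avoids a Python-level function call per row, which matters little at ~27k rows but keeps the boundary logic in one declarative place.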

                Item 4.01 Example

                print(json.dumps(structured_data["item4_01"][0], indent=2))
                {
                  "keyComponents": "Sino American Oil Co. dismissed BF Borgers CPA PC as its independent auditor on May 3, 2024, due to BF Borgers not being permitted to appear or practice before the SEC. Subsequently, the company appointed Michael Gillespie & Associates, PLLC as its new auditor on August 28, 2024.",
                  "newAccountantDate": "2024-08-28",
                  "engagedNewAccountant": true,
                  "formerAccountantDate": "2024-05-03",
                  "engagementEndReason": "dismissal",
                  "formerAccountantName": "BF Borgers CPA PC",
                  "newAccountantName": "Michael Gillespie & Associates, PLLC",
                  "reportedDisagreements": false,
                  "reportableEventsExist": true,
                  "reportableEventsList": [
                    "Identified material weaknesses in internal control over financial reporting."
                  ],
                  "reportedIcfrWeakness": true,
                  "opinionType": "unqualified",
                  "goingConcern": true,
                  "goingConcernDetail": "The reports included an explanatory paragraph relating to the Company\u2019s ability to continue as a going concern.",
                  "approvedChange": true
                }
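Each `item4_01` entry is a nested dict like the one above. For column-wise analysis, such records can be flattened into a DataFrame with `pd.json_normalize`; a minimal sketch using two hypothetical records whose field names are taken from the example output (in the notebook, the input would be `structured_data["item4_01"]`):

```python
import pandas as pd

# Hypothetical records mimicking the item4_01 structure shown above
records = [
    {"engagementEndReason": "dismissal", "reportedDisagreements": False, "goingConcern": True},
    {"engagementEndReason": "resignation", "reportedDisagreements": True, "goingConcern": False},
]

# json_normalize turns each dict key into a column; missing keys become NaN
item_4_01_flat = pd.json_normalize(records)
print(item_4_01_flat["engagementEndReason"].tolist())  # ['dismissal', 'resignation']
```

Because not every filing populates every field, the flattened frame will contain NaN where a key is absent, which is convenient for the per-field `value_counts` analysis further below.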

Visualization of Change-of-Accountant Disclosures over Time

                item_4_01_counts = (
                    structured_data.drop_duplicates(subset=["accessionNo"])
                    .groupby(["year"])
                    .size()
                    .to_frame(name="count")
                )

                print(f"Item 4.01 counts from {YEARS[-1]} to {YEARS[0]}")
                item_4_01_counts.T
                Item 4.01 counts from 2004 to 2024
                Out[16]:
year  | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | ... | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024
count | 771  | 2104 | 2175 | 1974 | 1628 | 2102 | 1646 | 1448 | 1173 | 1566 | ... | 1216 | 957  | 982  | 837  | 717  | 606  | 772  | 793  | 844  | 1003

                1 rows × 21 columns

                def plot_timeseries(ts, title):
                    fig, ax = plt.subplots(figsize=(4, 2.5))
                    ts["count"].plot(ax=ax, legend=False)
                    ax.set_title(title)
                    ax.set_xlabel("Year")
                    ax.set_ylabel("Number of\nItem 4.01 Filings")
                    ax.set_xticks(np.arange(2004, 2025, 2))
                    ax.yaxis.set_major_formatter(mtick.StrMethodFormatter("{x:,.0f}"))
                    ax.set_xlim(2003, 2025)
                    ax.grid(axis="x")
                    ax.set_axisbelow(True)
                    plt.xticks(rotation=45, ha="right")

                    for year in YEARS:
                        year_y_max = ts.loc[year, "count"]
                        ax.vlines(year, 0, year_y_max, linestyles=":", colors="grey", alpha=0.5, lw=1)

                    plt.tight_layout()
                    plt.show()


                plot_timeseries(
                    item_4_01_counts,
                    title="Disclosures of Change of Accountant\nForm 8-K with Item 4.01 per Year (2004 - 2024)",
                )
                structured_data["qtr"] = structured_data["month"].apply(lambda x: (x - 1) // 3 + 1)

                counts_qtr_yr_piv = (
                    structured_data.drop_duplicates(subset=["accessionNo"])
                    .groupby(["year", "qtr"])
                    .size()
                    .unstack()
                    .fillna(0)
                ).astype(int)

                print(f"Item 4.01 counts by quarter from 2004 to 2024.")
                counts_qtr_yr_piv.T
                Item 4.01 counts by quarter from 2004 to 2024.
                Out[22]:
year | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | ... | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024
qtr
1    | 0    | 462  | 586  | 572  | 529  | 387  | 494  | 478  | 306  | 441  | ... | 315  | 245  | 257  | 244  | 191  | 184  | 172  | 163  | 175  | 246
2    | 0    | 632  | 606  | 527  | 387  | 433  | 379  | 415  | 267  | 422  | ... | 338  | 292  | 260  | 217  | 232  | 157  | 184  | 198  | 234  | 404
3    | 227  | 550  | 496  | 439  | 322  | 789  | 413  | 232  | 252  | 371  | ... | 312  | 188  | 223  | 183  | 148  | 111  | 222  | 215  | 211  | 171
4    | 544  | 460  | 487  | 436  | 390  | 493  | 360  | 323  | 348  | 332  | ... | 251  | 232  | 242  | 193  | 146  | 154  | 194  | 217  | 224  | 182

                4 rows × 21 columns
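As an aside, pandas exposes the calendar quarter directly via the `Series.dt.quarter` accessor, which is equivalent to the `(month - 1) // 3 + 1` arithmetic used above; a sketch with hypothetical dates:

```python
import pandas as pd

# Hypothetical filing dates, one per quarter of interest
filed_at = pd.Series(pd.to_datetime(["2024-01-15", "2024-05-03", "2024-12-31"]))

# dt.quarter yields 1-4, matching (month - 1) // 3 + 1
print(filed_at.dt.quarter.tolist())  # [1, 2, 4]
```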

                counts_qtr_yr = counts_qtr_yr_piv.stack().reset_index(name="count")

                fig, ax = plt.subplots(figsize=(6, 2.5))
                counts_qtr_yr_piv.plot(kind="bar", ax=ax, legend=True)
                ax.legend(title="Quarter", loc="upper right", bbox_to_anchor=(0.92, 1))
                ax.set_title("Number of Change-of-Accountant Disclosures per Quarter\n(2004 - 2024)")
                ax.set_xlabel("Year")
                ax.set_ylabel("Number of\nItem 4.01 Filings")
                ax.yaxis.set_major_formatter(mtick.StrMethodFormatter("{x:,.0f}"))
                ax.grid(axis="x")
                ax.set_axisbelow(True)
                plt.tight_layout()
                plt.show()
                counts_month_yr_piv = (
                    structured_data.drop_duplicates(subset=["accessionNo"])
                    .groupby(["year", "month"])
                    .size()
                    .unstack()
                    .fillna(0)
                ).astype(int)

                print(f"Item 4.01 counts by month from 2004 to 2024.")
                counts_month_yr_piv
                Item 4.01 counts by month from 2004 to 2024.
                Out[25]:
month | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12
year
2004  | 0   | 0   | 0   | 0   | 0   | 0   | 0   | 61  | 166 | 211 | 174 | 159
2005  | 130 | 155 | 177 | 218 | 185 | 229 | 177 | 209 | 164 | 151 | 178 | 131
2006  | 188 | 155 | 243 | 208 | 202 | 196 | 163 | 166 | 167 | 160 | 155 | 172
2007  | 175 | 202 | 195 | 201 | 185 | 141 | 157 | 146 | 136 | 172 | 147 | 117
2008  | 212 | 155 | 162 | 148 | 129 | 110 | 110 | 103 | 109 | 128 | 141 | 121
2009  | 141 | 94  | 152 | 185 | 140 | 108 | 127 | 329 | 333 | 287 | 105 | 101
2010  | 190 | 119 | 185 | 160 | 114 | 105 | 112 | 174 | 127 | 131 | 114 | 115
2011  | 153 | 123 | 202 | 172 | 115 | 128 | 86  | 80  | 66  | 94  | 124 | 105
2012  | 78  | 91  | 137 | 105 | 85  | 77  | 67  | 112 | 73  | 114 | 123 | 111
2013  | 155 | 124 | 162 | 165 | 129 | 128 | 144 | 138 | 89  | 107 | 121 | 104
2014  | 150 | 114 | 180 | 127 | 98  | 105 | 133 | 111 | 94  | 132 | 76  | 90
2015  | 99  | 102 | 114 | 120 | 99  | 119 | 109 | 115 | 88  | 91  | 80  | 80
2016  | 86  | 70  | 89  | 109 | 103 | 80  | 69  | 67  | 52  | 63  | 81  | 88
2017  | 88  | 68  | 101 | 94  | 95  | 71  | 48  | 104 | 71  | 59  | 95  | 88
2018  | 75  | 68  | 101 | 87  | 67  | 63  | 61  | 63  | 59  | 106 | 32  | 55
2019  | 60  | 50  | 81  | 101 | 66  | 65  | 67  | 45  | 36  | 53  | 45  | 48
2020  | 54  | 50  | 80  | 54  | 34  | 69  | 46  | 29  | 36  | 49  | 62  | 43
2021  | 53  | 45  | 74  | 55  | 40  | 89  | 86  | 63  | 73  | 59  | 56  | 79
2022  | 40  | 41  | 82  | 55  | 66  | 77  | 63  | 68  | 84  | 94  | 59  | 64
2023  | 39  | 46  | 90  | 82  | 83  | 69  | 64  | 79  | 68  | 75  | 67  | 82
2024  | 75  | 63  | 108 | 90  | 214 | 100 | 66  | 67  | 38  | 60  | 66  | 56
                print(f"Descriptive statistics for Item 4.01 counts by month from 2004 to 2024.")
                month_stats = (
                    counts_month_yr_piv.loc[2005:]
                    .describe(percentiles=[0.025, 0.975])
                    .round(0)
                    .astype(int)
                )
                month_stats
                Descriptive statistics for Item 4.01 counts by month from 2004 to 2024.
                Out[27]:
month | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12
count | 20  | 20  | 20  | 20  | 20  | 20  | 20  | 20  | 20  | 20  | 20  | 20
mean  | 112 | 97  | 136 | 127 | 112 | 106 | 98  | 113 | 98  | 109 | 96  | 92
std   | 55  | 46  | 50  | 52  | 51  | 43  | 41  | 69  | 67  | 56  | 40  | 32
min   | 39  | 41  | 74  | 54  | 34  | 63  | 46  | 29  | 36  | 49  | 32  | 43
2.5%  | 39  | 43  | 77  | 54  | 37  | 64  | 47  | 37  | 36  | 51  | 38  | 45
50%   | 94  | 92  | 126 | 114 | 101 | 102 | 86  | 104 | 78  | 100 | 88  | 89
97.5% | 202 | 180 | 224 | 213 | 208 | 213 | 170 | 272 | 254 | 232 | 167 | 153
max   | 212 | 202 | 243 | 218 | 214 | 229 | 177 | 329 | 333 | 287 | 178 | 172
                def plot_box_plot_as_line(
                    data: pd.DataFrame,
                    x_months=True,
                    title="",
                    x_label="",
                    x_pos_mean_label=2,
                    pos_labels=None, # {"mean": {"x": 2, "y": 150}, "upper": {"x": 2, "y": 150}, "lower": {"x": 2, "y": 150}},
                    pos_high_low=None, # {"high": {"x": 2, "y": 150}, "low": {"x": 2, "y": 150}},
                    y_label="",
                    y_formatter=lambda x, p: "{:.0f}".format(int(x) / 1000),
                    show_high_low_labels=True,
                    show_inline_labels=True,
                    show_bands=True,
                    figsize=(4, 2.5),
                    line_source="mean",
                ):
                    fig, ax = plt.subplots(figsize=figsize)

                    line_to_plot = data[line_source]
                    lower_label = "2.5%"
                    upper_label = "97.5%"
                    lower = data[lower_label]
                    upper = data[upper_label]

                    line_to_plot.plot(ax=ax)

                    if show_bands:
                        ax.fill_between(line_to_plot.index, lower, upper, alpha=0.2)

                    if x_months:
                        ax.set_xlim(0.5, 12.5)
                        ax.set_xticks(range(1, 13))
                        ax.set_xticklabels(["J", "F", "M", "A", "M", "J", "J", "A", "S", "O", "N", "D"])

                    ax.yaxis.set_major_formatter(mtick.FuncFormatter(y_formatter))
                    ax.set_ylabel(y_label)
                    ax.set_xlabel(x_label)
                    ax.set_title(title)

                    ymin, ymax = ax.get_ylim()
                    y_scale = ymax - ymin

                    max_x = int(line_to_plot.idxmax())
                    max_y = line_to_plot.max()
                    min_x = int(line_to_plot.idxmin())
                    min_y = line_to_plot.min()

                    ax.axvline(
                        max_x,
                        ymin=0,
                        ymax=((max_y - ymin) / (ymax - ymin)),
                        linestyle="dashed",
                        color="tab:blue",
                        alpha=0.5,
                    )
                    ax.scatter(max_x, max_y, color="tab:blue", s=10)
                    ax.axvline(
                        min_x,
                        ymin=0,
                        ymax=((min_y - ymin) / (ymax - ymin)),
                        linestyle="dashed",
                        color="tab:blue",
                        alpha=0.5,
                    )
                    ax.scatter(min_x, min_y, color="tab:blue", s=10)

                    x_pos_mean_label_int = int(x_pos_mean_label)
                    if show_inline_labels:
                        mean_x = x_pos_mean_label
                        mean_y = line_to_plot.iloc[x_pos_mean_label_int] * 1.02
                        upper_x = x_pos_mean_label
                        upper_y = upper.iloc[x_pos_mean_label_int]
                        lower_x = x_pos_mean_label
                        lower_y = lower.iloc[x_pos_mean_label_int] * 0.95

                        if pos_labels:
                            mean_x = pos_labels["mean"]["x"]
                            mean_y = pos_labels["mean"]["y"]
                            upper_x = pos_labels["upper"]["x"]
                            upper_y = pos_labels["upper"]["y"]
                            lower_x = pos_labels["lower"]["x"]
                            lower_y = pos_labels["lower"]["y"]

                        ax.text(mean_x, mean_y, "Mean", color="tab:blue", fontsize=8)
                        ax.text(upper_x, upper_y, upper_label, color="tab:blue", fontsize=8)
                        ax.text(lower_x, lower_y, lower_label, color="tab:blue", fontsize=8)

                    if show_high_low_labels:
                        high_x_origin = max_x
                        high_y_origin = max_y
                        high_x_label = high_x_origin + 0.5
                        high_y_label = high_y_origin + 0.1 * y_scale
                        if pos_high_low:
                            high_x_label = pos_high_low["high"]["x"]
                            high_y_label = pos_high_low["high"]["y"]
                        ax.annotate(
                            "High",
                            (high_x_origin, high_y_origin),
                            xytext=(high_x_label, high_y_label),
                            arrowprops=dict(facecolor="black", arrowstyle="->"),
                        )

                        low_x_origin = min_x * 1.01
                        low_y_origin = min_y
                        low_x_label = low_x_origin + 1.5
                        low_y_label = low_y_origin - 0.1 * y_scale
                        if pos_high_low:
                            low_x_label = pos_high_low["low"]["x"]
                            low_y_label = pos_high_low["low"]["y"]
                        ax.annotate(
                            "Low",
                            (low_x_origin, low_y_origin),
                            xytext=(low_x_label, low_y_label),
                            arrowprops=dict(facecolor="black", arrowstyle="->"),
                        )

                    ax.grid(axis="x")
                    ax.set_axisbelow(True)

                    plt.tight_layout()
                    plt.show()


                plot_box_plot_as_line(
                    data=month_stats.T,
                    title="Descriptive Statistics for Item 4.01 Filings by Month\n(2005 - 2024)",
                    x_label="Month",
                    y_label="Number of\nItem 4.01 Filings",
                    y_formatter=lambda x, p: "{:.0f}".format(int(x)),
                    x_pos_mean_label=5,
                )
                fig, ax = plt.subplots(figsize=(3.5, 3))

                counts_month_yr_piv.loc[2005:].boxplot(
                    ax=ax,
                    grid=False,
                    showfliers=False,
                    flierprops=dict(marker="o", markersize=3),
                    patch_artist=True,
                    boxprops=dict(facecolor="white", color="tab:blue"),
                    showmeans=True,
                    meanline=True,
                    meanprops={"color": "tab:blue", "linestyle": ":"},
                    medianprops={"color": "black"},
                    capprops={"color": "none"},
                )

                ax.set_title("Item 4.01 Disclosures by Month\n(2005 - 2024)")
                ax.set_xlabel("Month")
                ax.set_ylabel("Item 4.01 Count")
                xticklabels = [pd.to_datetime(str(x), format="%m").strftime("%b") for x in range(1, 13)]
                ax.set_xticklabels(xticklabels)
                plt.xticks(rotation=45)
                plt.tight_layout()
                plt.show()
                counts_filedAtClass = (
                    structured_data.drop_duplicates(subset=["accessionNo"])
                    .groupby(["filedAtClass"])
                    .size()
                    .sort_values(ascending=False)
                    .to_frame(name="Count")
                ).rename_axis("Publication Time")
                counts_filedAtClass["Pct"] = (
                    counts_filedAtClass["Count"].astype(int)
                    / counts_filedAtClass["Count"].astype(int).sum()
                ).map("{:.0%}".format)
                counts_filedAtClass["Count"] = counts_filedAtClass["Count"].map(lambda x: f"{x:,}")
                counts_filedAtClass.index = (
                    counts_filedAtClass.index.str.replace("preMarket", "Pre-Market (4:00 - 9:30 AM)")
                    .str.replace("regularMarket", "Market Hours (9:30 AM - 4:00 PM)")
                    .str.replace("afterMarket", "After Market (4:00 - 8:00 PM)")
                )
                counts_filedAtClass = counts_filedAtClass.reindex(counts_filedAtClass.index[::-1])

                print(
                    f"Item 4.01 counts by pre-market, regular market hours,\nand after-market publication time (2004 - 2024)."
                )
                counts_filedAtClass
                Item 4.01 counts by pre-market, regular market hours,
                and after-market publication time (2004 - 2024).
                Out[34]:
Publication Time                 | Count  | Pct
other                            | 637    | 2%
Pre-Market (4:00 - 9:30 AM)      | 1,668  | 6%
Market Hours (9:30 AM - 4:00 PM) | 10,647 | 40%
After Market (4:00 - 8:00 PM)    | 13,772 | 52%
                counts_dayOfWeek = (
                    structured_data.drop_duplicates(subset=["accessionNo"])
                    .groupby(["dayOfWeek"])
                    .size()
                    .to_frame(name="Count")
                ).rename_axis("Day of the Week")
                counts_dayOfWeek["Pct"] = (
                    counts_dayOfWeek["Count"].astype(int) / counts_dayOfWeek["Count"].astype(int).sum()
                ).map("{:.0%}".format)
                counts_dayOfWeek["Count"] = counts_dayOfWeek["Count"].map(lambda x: f"{x:,}")

                print(f"Item 4.01 disclosures by day of the week (2004 - 2024).")
                counts_dayOfWeek.loc[["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]]
                Item 4.01 disclosures by day of the week (2004 - 2024).
                Out:
Day of the Week | Count | Pct
Monday          | 4,830 | 18%
Tuesday         | 5,285 | 20%
Wednesday       | 5,363 | 20%
Thursday        | 5,320 | 20%
Friday          | 5,926 | 22%

                Distribution of Disclosures by Their Characteristics

                structured_data["item4_01"].apply(
                    lambda x: x["consultedNewAccountant"] if "consultedNewAccountant" in x else None
                ).value_counts().to_frame().reset_index().rename(columns={"item4_01": "value"})
                Out[55]:
  | value | count
0 | False | 20423
1 | True  | 125
                bool_variables_to_analyze = [
                    "engagedNewAccountant",
                    "consultedNewAccountant",
                    "reportedDisagreements",
                    "resolvedDisagreements",
                    "reportableEventsExist",
                    "reportedIcfrWeakness",
                    "remediatedIcfrWeakness",
                    "goingConcern",
                    "auditDisclaimer",
                    "authorizedInquiry",
                    "approvedChange",
                ]

                var_to_label = {
                    "engagedNewAccountant": "New Accountant was Engaged",
                    "consultedNewAccountant": "Consulted new Accountant Prior to Engagement",
                    "reportedDisagreements": "Disagreements Reported",
                    "resolvedDisagreements": "Reported Disagreements were Resolved",
                    "reportableEventsExist": "Reportable Events exist",
                    "reportedIcfrWeakness": "ICFR Weakness Reported",
                    "remediatedIcfrWeakness": "Reported ICFR Weakness Remediated",
                    "goingConcern": "Report includes Going Concern Statement",
                    "auditDisclaimer": "Audit Report includes Disclaimer of Opinion",
                    "authorizedInquiry": "Former Accountant Authorized to Respond to Inquiries",
                    "approvedChange": "Change Approved by Board or Audit Committee",
                }


                total_samples = len(structured_data)
                # Create a row for the total samples
                total_row = pd.DataFrame(
                    {
                        "Samples": [f"{total_samples:,.0f}"],
                        "Pct.": [""],
                        "Pct. tot.": [100],
                    },
                    index=pd.MultiIndex.from_tuples([("Total", "")], names=["Variable", "Value"]),
                )


                bool_variables_stats = []

                for variable in bool_variables_to_analyze:
                    variable_stats = (
                        structured_data["item4_01"]
                        .apply(lambda x: x[variable] if variable in x else None)
                        .value_counts()
                        .to_frame()
                        .reset_index()
                        .rename(columns={"item4_01": "value"})
                    )
                    variable_stats = variable_stats.sort_values(by="value", ascending=False)
                    variable_stats["pct"] = (
                        variable_stats["count"] / variable_stats["count"].sum() * 100
                    ).round(1)
                    variable_stats["pct_tot"] = (variable_stats["count"] / total_samples * 100).round(1)
                    variable_stats.index = pd.MultiIndex.from_tuples(
                        [(variable, row["value"]) for _, row in variable_stats.iterrows()],
                    )
                    variable_stats.drop(columns="value", inplace=True)

                    bool_variables_stats.append(variable_stats)

                bool_variables_stats = pd.concat(bool_variables_stats, axis=0)
                bool_variables_stats.index.set_names(["Variable", "Value"], inplace=True)
                bool_variables_stats.rename(
                    index=var_to_label,
                    columns={"count": "Samples", "pct": "Pct.", "pct_tot": "Pct. tot."},
                    inplace=True,
                )
                bool_variables_stats["Samples"] = bool_variables_stats["Samples"].apply(
                    lambda x: f"{x:,.0f}"
                )


                bool_variables_stats = pd.concat([total_row, bool_variables_stats])


print(
    "Number of change-of-accountant filings by\ntheir disclosed characteristics (2004 - 2024):"
)
                bool_variables_stats
Number of change-of-accountant filings by
their disclosed characteristics (2004 - 2024):
Out:

Variable                                               Value    Samples   Pct.   Pct. tot.
Total                                                           26,724           100.0
New Accountant was Engaged                             True     20,913    97.8    78.3
                                                       False       476     2.2     1.8
Consulted new Accountant Prior to Engagement           True        125     0.6     0.5
                                                       False    20,423    99.4    76.4
Disagreements Reported                                 True        419     1.7     1.6
                                                       False    23,848    98.3    89.2
Reported Disagreements were Resolved                   True        160    42.0     0.6
                                                       False       221    58.0     0.8
Reportable Events exist                                True      2,441    13.1     9.1
                                                       False    16,192    86.9    60.6
ICFR Weakness Reported                                 True      3,281    64.5    12.3
                                                       False     1,805    35.5     6.8
Reported ICFR Weakness Remediated                      True        576    56.4     2.2
                                                       False       445    43.6     1.7
Report includes Going Concern Statement                True     10,562    95.2    39.5
                                                       False       532     4.8     2.0
Audit Report includes Disclaimer of Opinion            True         58     1.7     0.2
                                                       False     3,397    98.3    12.7
Former Accountant Authorized to Respond to Inquiries   True      2,501    96.6     9.4
                                                       False        87     3.4     0.3
Change Approved by Board or Audit Committee            True     19,588    98.4    73.3
                                                       False       319     1.6     1.2
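The per-flag statistics above pull each flag out of the nested `item4_01` dictionaries with `.apply()`. As a side note, `pd.json_normalize` flattens such nested records in a single step. A minimal sketch on made-up records (the field names mirror the API schema, but the values are invented for illustration):

```python
import pandas as pd

# Made-up stand-ins for structured Item 4.01 records (not real API output).
records = [
    {"item4_01": {"reportedDisagreements": True, "approvedChange": True}},
    {"item4_01": {"reportedDisagreements": False, "approvedChange": True}},
    {"item4_01": {"approvedChange": False}},  # missing flag becomes NaN
]
df = pd.DataFrame(records)

# Flatten the nested dicts into one column per flag.
flags = pd.json_normalize(df["item4_01"].tolist())

# value_counts() skips NaN, so filings without the flag drop out of the tally.
counts = flags["reportedDisagreements"].value_counts()
```

This mirrors what `structured_data["item4_01"].apply(pd.Series)` does later in the notebook.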

Now we count the occurrences where each flag takes its value of interest, e.g., "True" for "Disagreements Reported" and "False" for "Change Approved by Board or Audit Committee".

                bool_variables_to_analyze_true = [
                    "consultedNewAccountant",
                    "reportedDisagreements",
                    "resolvedDisagreements",
                    "reportableEventsExist",
                    "reportedIcfrWeakness",
                    "remediatedIcfrWeakness",
                    "goingConcern",
                    "auditDisclaimer",
                ]
                bool_variables_to_analyze_false = ["authorizedInquiry", "approvedChange"]

                events_var_labels = {
                    "consultedNewAccountant": "Consulted with New Accountant",
                    "reportedDisagreements": "Reported Disagreements",
                    "resolvedDisagreements": "Reported Disagreements Resolved",
                    "reportableEventsExist": "Reportable Events in Filing",
                    "reportedIcfrWeakness": "ICFR Weakness Reported",
                    "remediatedIcfrWeakness": "Reported ICFR Weakness Remediated",
                    "goingConcern": "Report includes Going Concern Statement",
                    "auditDisclaimer": "Audit Report includes Disclaimer of Opinion",
                    "authorizedInquiry": "Former Accountant Not Authorized to Respond to Inquiries",
                    "approvedChange": "Change Not Approved by Board or Audit Committee",
                }

                item_4_01_exploded = structured_data["item4_01"].apply(pd.Series)
                item_4_01_exploded = pd.concat([structured_data, item_4_01_exploded], axis=1)

                true_count_year_pivot = pd.pivot_table(
                    item_4_01_exploded,
                    index="year",
                    # values=["goingConcern", "reportedIcfrWeakness"],
                    values=bool_variables_to_analyze_true,
                    aggfunc=lambda x: (x == True).sum(),
                    fill_value=0,
                )

                false_count_year_pivot = pd.pivot_table(
                    item_4_01_exploded,
                    index="year",
                    values=bool_variables_to_analyze_false,
                    aggfunc=lambda x: (x == False).sum(),
                    fill_value=0,
                )

                event_counts_year_pivot = pd.concat(
                    [true_count_year_pivot, false_count_year_pivot], axis=1
                )
                event_counts_year_pivot = event_counts_year_pivot.T
                event_counts_year_pivot["total"] = event_counts_year_pivot.sum(axis=1)
                event_counts_year_pivot = event_counts_year_pivot.sort_values(
                    by="total", ascending=False
                )


                event_counts_year_pivot.rename(index=events_var_labels, inplace=True)
                event_counts_year_pivot
Out[60]:
10 rows × 22 columns; per-year counts (2004 - 2024) omitted. Row totals:

Report includes Going Concern Statement                     10,562
ICFR Weakness Reported                                       3,281
Reportable Events in Filing                                  2,441
Reported ICFR Weakness Remediated                              576
Reported Disagreements                                         419
Change Not Approved by Board or Audit Committee                319
Reported Disagreements Resolved                                160
Consulted with New Accountant                                  125
Former Accountant Not Authorized to Respond to Inquiries        87
Audit Report includes Disclaimer of Opinion                     58
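The `aggfunc=lambda x: (x == True).sum()` used above counts only explicit `True` values; both `False` and missing (`None`) entries contribute zero. A self-contained toy example (the data is invented for illustration):

```python
import pandas as pd

# Toy frame: one row per filing, with a year and a nullable boolean flag.
toy = pd.DataFrame({
    "year": [2020, 2020, 2021, 2021, 2021],
    "goingConcern": [True, None, True, False, True],
})

pivot = pd.pivot_table(
    toy,
    index="year",
    values="goingConcern",
    aggfunc=lambda x: (x == True).sum(),  # None == True and False == True are both False
    fill_value=0,
)

counts = {int(year): int(n) for year, n in pivot["goingConcern"].items()}
print(counts)  # {2020: 1, 2021: 2}
```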

                top_5_events = event_counts_year_pivot.head(5)

# all events outside the top 5 are bucketed into "Others"
others = event_counts_year_pivot[
    ~event_counts_year_pivot.index.isin(top_5_events.index)
]
                others = others.sum().to_frame().T
                others.index = ["Others"]

                top_5_events = pd.concat([top_5_events, others], axis=0)
                fig, ax = plt.subplots(figsize=(8, 3))

                top_5_events.drop(columns="total").T.plot(kind="bar", stacked=True, ax=ax, cmap="tab20")

                ax.set_title("Number of Item 4.01 Filings\nby disclosure event and year")
                ax.set_xlabel("Year")
                ax.set_ylabel("Number of Filings")
                ax.xaxis.grid(False)
                ax.set_axisbelow(True)
                handles, labels = ax.get_legend_handles_labels() # reverse order of legend items

                labels = [
                    "declination" if label == "declination to stand for reappointment" else label
                    for label in labels
                ]

                ax.legend(
                    reversed(handles),
                    reversed(labels),
                    title="Disclosure Events",
                    bbox_to_anchor=(1.05, 1),
                    labelspacing=0.3,
                    fontsize=8,
                )

                plt.tight_layout()
                plt.show()

                Disagreements

                disagreements_year_pivot = pd.pivot_table(
                    item_4_01_exploded,
                    index="year",
                    values=["reportedDisagreements", "resolvedDisagreements"],
                    aggfunc=lambda x: (x == True).sum(),
                    fill_value=0,
                )


                disagreements_year_pivot = disagreements_year_pivot.T
                disagreements_year_pivot["total"] = disagreements_year_pivot.sum(axis=1)
                disagreements_year_pivot = disagreements_year_pivot.sort_values(
                    by="total", ascending=False
                )
                disagreements_year_pivot
Out[63]:
2 rows × 22 columns; per-year counts (2004 - 2024) omitted. Row totals:

reportedDisagreements    419
resolvedDisagreements    160

                fig, ax = plt.subplots(figsize=(5, 3))

                disagreements_year_pivot.T.drop("total").plot(
                    kind="bar", stacked=False, ax=ax, cmap="tab20"
                )

                ax.set_title("Number of Item 4.01 Filings\nwith disagreements disclosure by year")
                ax.set_xlabel("Year")
                ax.set_ylabel("Number of Filings")
                ax.xaxis.grid(False)
                ax.set_axisbelow(True)
handles, labels = ax.get_legend_handles_labels()

ax.legend(
    handles,
    labels,
    title="Disagreements",
    bbox_to_anchor=(1.05, 1),
    labelspacing=0.3,
    fontsize=8,
)

                plt.tight_layout()
                plt.show()

                ICFR Weaknesses

                icfr_weaknesses_pivot = pd.pivot_table(
                    item_4_01_exploded,
                    index="year",
                    values=[
                        "remediatedIcfrWeakness",
                        "reportedIcfrWeakness",
                    ],
                    aggfunc=lambda x: (x == True).sum(),
                    fill_value=0,
                )


                icfr_weaknesses_pivot = icfr_weaknesses_pivot.T
                icfr_weaknesses_pivot["total"] = icfr_weaknesses_pivot.sum(axis=1)
                icfr_weaknesses_pivot = icfr_weaknesses_pivot.sort_values(by="total", ascending=False)
                icfr_weaknesses_pivot
Out[65]:
2 rows × 22 columns; per-year counts (2004 - 2024) omitted. Row totals:

reportedIcfrWeakness     3,281
remediatedIcfrWeakness     576
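Taking the two totals above, the share of filings whose reported ICFR weakness is also disclosed as remediated works out to roughly one in six:

```python
# Totals from the pivot above (2004 - 2024).
reported_total = 3281    # filings reporting an ICFR weakness
remediated_total = 576   # filings reporting the weakness as remediated

remediation_share = round(remediated_total / reported_total * 100, 1)
print(f"{remediation_share}% of reported ICFR weaknesses disclosed as remediated")  # 17.6%
```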

                fig, ax = plt.subplots(figsize=(8, 3))

                icfr_weaknesses_pivot.T.drop("total").plot(
                    kind="bar", stacked=False, ax=ax, cmap="tab20"
                )

ax.set_title("Number of Item 4.01 Filings\nwith ICFR weakness disclosures by year")
                ax.set_xlabel("Year")
                ax.set_ylabel("Number of Filings")
                ax.xaxis.grid(False)
                ax.set_axisbelow(True)
handles, labels = ax.get_legend_handles_labels()

ax.legend(
    handles,
    labels,
    title="ICFR Weakness",
    bbox_to_anchor=(1.05, 1),
    labelspacing=0.3,
    fontsize=8,
)

                plt.tight_layout()
                plt.show()

                Reason for End of Engagement

                engagementEndReason = (
                    item_4_01_exploded["engagementEndReason"].explode().value_counts().to_frame().head(5)
                )
engagementEndReason.index.name = "End of Engagement Reason"
                engagementEndReason.columns = ["Count"]
                engagementEndReason["Pct."] = (
                    engagementEndReason["Count"] / engagementEndReason["Count"].sum() * 100
                )
                engagementEndReason["Pct."] = engagementEndReason["Pct."].round(1)
                engagementEndReason["Count"] = engagementEndReason["Count"].apply(lambda x: f"{x:,.0f}")

print(
    "Top 4 reasons for end of engagement with the former accountant\nif stated in the Item 4.01 filings (2004 - 2024):"
)
                engagementEndReason.head(4)
Top 4 reasons for end of engagement with the former accountant
if stated in the Item 4.01 filings (2004 - 2024):
Out[69]:

End of Engagement Reason                  Count    Pct.
dismissal                                14,941    65.5
resignation                               6,082    26.6
declination to stand for reappointment    1,187     5.2
dissolution                                 609     2.7
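A filing can state more than one end-of-engagement reason, which is why `engagementEndReason` holds lists and is `.explode()`-ed before counting. The pattern in isolation, on invented values:

```python
import pandas as pd

# Each filing may list several end-of-engagement reasons.
reasons = pd.Series([
    ["dismissal"],
    ["resignation", "dismissal"],
    ["dismissal"],
])

# explode() turns each list element into its own row;
# value_counts() then tallies reasons across all filings.
counts = reasons.explode().value_counts()
print({k: int(v) for k, v in counts.items()})  # {'dismissal': 3, 'resignation': 1}
```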
                endReason_year = item_4_01_exploded[
                    ["engagementEndReason", "year", "accessionNo"]
                ].explode("engagementEndReason")

                endReason_year_pivot = pd.pivot_table(
                    endReason_year,
                    index="engagementEndReason",
                    columns="year",
                    values="accessionNo",
                    aggfunc="count",
                    fill_value=0,
                )

                endReason_year_pivot["total"] = endReason_year_pivot.sum(axis=1)
                endReason_year_pivot = endReason_year_pivot.sort_values(by="total", ascending=False)
                # remove artifacts
                endReason_year_pivot = endReason_year_pivot[
                    endReason_year_pivot["total"] >= 0.001 * endReason_year_pivot["total"].sum()
                ]

                endReason_year_pivot
Out[70]:
4 rows × 22 columns; per-year counts (2004 - 2024) omitted. Row totals:

engagementEndReason
dismissal                                 14,941
resignation                                6,082
declination to stand for reappointment     1,187
dissolution                                  609

                fig, ax = plt.subplots(figsize=(5, 3))

                endReason_year_pivot.drop(columns="total").T.plot(
                    kind="bar", stacked=True, ax=ax, cmap="tab20"
                )

                ax.set_title("Number of Item 4.01 Filings\nby End of Engagement Reason and Year")
                ax.set_xlabel("Year")
                ax.set_ylabel("Number of Filings")
                ax.xaxis.grid(False)
                ax.set_axisbelow(True)
                handles, labels = ax.get_legend_handles_labels() # reverse order of legend items

                labels = [
                    "declination" if label == "declination to stand for reappointment" else label
                    for label in labels
                ]

                ax.legend(
                    reversed(handles),
                    reversed(labels),
                    title="Disengagement Reason",
                    bbox_to_anchor=(1.05, 1),
                    labelspacing=0.3,
                    fontsize=8,
                )

                plt.tight_layout()
                plt.show()

                Opinion Types of Audit Reports

                opinionType = item_4_01_exploded["opinionType"].explode().value_counts().to_frame().head(5)
                opinionType.index.name = "Audit Opinion Type"
                opinionType.columns = ["Count"]
                opinionType["Pct."] = opinionType["Count"] / opinionType["Count"].sum() * 100
                opinionType["Pct."] = opinionType["Pct."].round(1)
                opinionType["Count"] = opinionType["Count"].apply(lambda x: f"{x:,.0f}")

print(
    "Opinion types of the audit reports if stated in the Item 4.01 filings (2004 - 2024):"
)
                opinionType
Opinion types of the audit reports if stated in the Item 4.01 filings (2004 - 2024):
Out[73]:

Audit Opinion Type    Count    Pct.
unqualified          18,128    97.2
qualified               477     2.6
adverse                  38     0.2
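As an aside, the manual `Count / Count.sum() * 100` step can be replaced by `value_counts(normalize=True)`, which returns shares directly. A sketch on invented opinion labels that roughly mirror the distribution above:

```python
import pandas as pd

# Invented opinion labels for illustration (not drawn from the dataset).
opinions = pd.Series(["unqualified"] * 97 + ["qualified"] * 2 + ["adverse"])

# normalize=True yields fractions; multiply by 100 for percentages.
pct = (opinions.value_counts(normalize=True) * 100).round(1)
shares = {k: float(v) for k, v in pct.items()}
print(shares)  # {'unqualified': 97.0, 'qualified': 2.0, 'adverse': 1.0}
```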
                opinionType_year = item_4_01_exploded[["opinionType", "year", "accessionNo"]].explode(
                    "opinionType"
                )

                opinionType_year_pivot = pd.pivot_table(
                    opinionType_year,
                    index="opinionType",
                    columns="year",
                    values="accessionNo",
                    aggfunc="count",
                    fill_value=0,
                )

                opinionType_year_pivot["total"] = opinionType_year_pivot.sum(axis=1)
                opinionType_year_pivot = opinionType_year_pivot.sort_values(by="total", ascending=False)
                # remove artifacts
                opinionType_year_pivot = opinionType_year_pivot[
                    opinionType_year_pivot["total"] >= 0.001 * opinionType_year_pivot["total"].sum()
                ]

                opinionType_year_pivot
Out[74]:
3 rows × 22 columns; per-year counts (2004 - 2024) omitted. Row totals:

opinionType
unqualified    18,128
qualified         477
adverse            38

                fig, ax = plt.subplots(figsize=(5, 3))

                opinionType_year_pivot.drop(columns="total").T.plot(
                    kind="bar", stacked=True, ax=ax, cmap="tab20"
                )

ax.set_title("Number of Item 4.01 Filings\nby Audit Opinion Type and Year")
                ax.set_xlabel("Year")
                ax.set_ylabel("Number of Filings")
                ax.xaxis.grid(False)
                ax.set_axisbelow(True)
handles, labels = ax.get_legend_handles_labels()  # reverse order of legend items

ax.legend(
    reversed(handles),
    reversed(labels),
    title="Opinion Type",
    bbox_to_anchor=(1.05, 1),
    labelspacing=0.3,
    fontsize=8,
)

                plt.tight_layout()
                plt.show()

© 2025 sec-api.io by Data2Value GmbH. All rights reserved.