Analysis of Financial Restatements and Non-Reliance Disclosures
We illustrate how to perform an exploratory data analysis of disclosures informing investors that previously issued financial statements of companies publicly traded on U.S. stock exchanges should no longer be relied upon (also known as financial restatements). Companies disclose these events in Form 8-K filings with the SEC under Item 4.02, titled "Non-Reliance on Previously Issued Financial Statements or a Related Audit Report or Completed Interim Review." Because companies present these disclosures as free text, we use our Structured Data API to extract and structure the relevant information, making it available for detailed analysis.
Our analysis will focus on several key areas:
- Number of Item 4.02 disclosures per year from 2004 to 2023, broken down by quarter, month, day of the week, and time of day (pre-market, regular market hours, after-market).
- Distribution of disclosures across structured data fields, such as the proportion of disclosures reporting material weaknesses in internal controls.
- Identification of the party most often responsible for discovering the issue, whether it was the company itself, its auditor, or the SEC.
- Number of times each auditor was involved in the restatement process.
- Number of reporting periods (quarters or years) affected by each restatement.
- Statistics concerning the financial statement items impacted by the restatements.
Data Loading and Preparation
To load and prepare the data, we will use the Form 8-K Structured Data API to download all structured data related to Form 8-K filings that include Item 4.02 disclosures. For the sake of brevity, the data spanning the years 2004 to 2023 has already been preprocessed and saved in a CSV file, allowing us to jump directly into the analysis phase.
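The loading step itself is not shown in this notebook. As a minimal sketch, assuming the preprocessed CSV stores list-valued columns (such as auditors or affectedLineItems) as Python list literals like "['EY']" and stores filedAt as a timestamp with a UTC offset, the file could be read back as follows. The function name load_structured_data and the LIST_COLUMNS selection are hypothetical; adjust them to the actual file.

```python
import ast
import io

import pandas as pd

# Hypothetical selection of columns assumed to be stored as Python list literals.
LIST_COLUMNS = [
    "items", "identifiedIssues", "affectedReportingPeriods", "identifiedBy",
    "reasonsForRestatement", "auditors", "affectedLineItems",
]


def load_structured_data(path_or_buffer) -> pd.DataFrame:
    """Read the preprocessed CSV and restore list and datetime columns (sketch)."""
    df = pd.read_csv(path_or_buffer, dtype={"accessionNo": str})
    for col in LIST_COLUMNS:
        if col in df.columns:
            # "['EY']" -> ['EY']; missing cells become empty lists.
            df[col] = df[col].apply(
                lambda s: ast.literal_eval(s) if isinstance(s, str) and s.startswith("[") else []
            )
    # Offsets alternate between -05:00 (EST) and -04:00 (EDT): parse as UTC, then convert.
    df["filedAt"] = pd.to_datetime(df["filedAt"], utc=True).dt.tz_convert("America/New_York")
    return df


# Two rows in the assumed CSV layout: the first uses values from the sample row shown
# further below, the second is made up.
sample_csv = io.StringIO(
    "accessionNo,filedAt,auditors,items\n"
    "0000898080-04-000677,2004-12-30 17:01:53-05:00,\"['EY']\",\"['4.02']\"\n"
    "0000000000-21-000001,2021-05-03 08:15:00-04:00,[],\"['4.02', '9.01']\"\n"
)
sample = load_structured_data(sample_csv)
```

Converting filedAt to a single timezone matters because, in recent pandas versions, strings with mixed offsets are otherwise not parsed into a proper datetime column, which would break later calls such as `.dt.strftime`.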
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.style as style
import matplotlib.ticker as mtick
style.use("default")
params = {
"axes.labelsize": 8, "font.size": 8, "legend.fontsize": 8,
"xtick.labelsize": 8, "ytick.labelsize": 8, "font.family": "sans-serif",
"axes.spines.top": False, "axes.spines.right": False, "grid.color": "grey",
"axes.grid": True, "axes.grid.axis": "y", "grid.alpha": 0.5, "grid.linestyle": ":",
}
plt.rcParams.update(params)
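# Data loading goes here: read the preprocessed CSV into a DataFrame named structured_data,
# with list columns (e.g. auditors, affectedLineItems) as Python lists and filedAt as a timezone-aware datetime.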
print(f"Loaded {len(structured_data):,} structured data records from 2004 to 2023.")
structured_data.head()
Loaded 8,133 structured data records from 2004 to 2023.
accessionNo | formType | filedAt | periodOfReport | cik | ticker | companyName | items | month | year | qtr | keyComponents | identifiedIssues | affectedReportingPeriods | identifiedBy | restatementIsNecessary | reasonsForRestatement | impactYetToBeDetermined | impactOfError | impactIsMaterial | materialWeaknessIdentified | auditors | affectedLineItems | netIncomeDecreased | netIncomeIncreased | netIncomeAdjustment | revenueDecreased | revenueIncreased | revenueAdjustment | eventClassification | numberPeriodsAffected | numberQuartersAffected | numberYearsAffected | reportedWithOtherItems | reportedWithEarnings | identifiedByAuditor | identifiedByCompany | identifiedBySec | revenueAdjustmentContainsWordMillion | netIncomeAdjustmentContainsWordMillion | filedAtClass | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0000898080-04-000677 | 8-K | 2004-12-30 17:01:53-05:00 | 2004-12-30 | 1064122 | SCOTTISH RE GROUP LTD | [4.02] | 12 | 2004 | 4 | Scottish Re Group Limited determined that its ... | [Incorrectly reported premiums earned, Incorre... | [Q2 2004, Q3 2004] | [Company] | True | [Incorrect references within the spreadsheets ... | False | Understated net income by $1.1 million in Q2 2... | True | False | [EY] | [premiums earned, claims and other policy bene... | True | True | False | False | Financial Restatement Due to Revenue Recogniti... | 2 | 2 | 0 | False | False | False | True | False | False | False | afterMarket | |||
1 | 0001035704-04-000802 | 8-K | 2004-12-30 14:02:00-05:00 | 2004-12-28 | 1013880 | TTEC | TELETECH HOLDINGS INC | [4.02, 9.01] | 12 | 2004 | 4 | The Audit Committee of TeleTech Holdings, Inc.... | [Tax basis depreciation expense, Liability for... | [FY 2001, FY 2002, FY 2003, Q1 2004, Q2 2004, ... | [Company, Auditor, SEC] | True | [Correct tax basis depreciation expense, Recor... | False | The adjustments were initially deemed immateri... | True | False | [Arthur Andersen] | [sg&a, cos, tax, net income] | True | False | - | False | False | Financial Restatement Due to Tax and Other Adj... | 6 | 3 | 3 | True | True | True | True | True | False | True | marketHours | |
2 | 0001144204-04-023140 | 8-K | 2004-12-30 12:18:24-05:00 | 2004-12-28 | 799235 | ERHE | ENVIRONMENTAL REMEDIATION HOLDING CORP | [4.02, 9.01] | 12 | 2004 | 4 | Environmental Remediation Holding Corporation ... | [Beneficial conversion feature of convertible ... | [Q4 2003, Q1 2004, Q2 2004] | [Company] | True | [Beneficial conversion feature of convertible ... | False | The Company has recorded a beneficial conversi... | False | False | [] | [convertible debt, interest expense] | False | False | False | False | Financial Restatement Due to Convertible Debt ... | 3 | 3 | 0 | True | True | False | True | False | False | False | marketHours | ||
3 | 0000899681-04-000864 | 8-K | 2004-12-27 13:19:05-05:00 | 2004-12-21 | 1276676 | NATIONSRENT COMPANIES INC | [4.02] | 12 | 2004 | 4 | NationsRent Companies, Inc. determined that in... | [Under-accrued incentive compensation bonuses,... | [Q3 2004, Nine-month period ended September 30... | [Company] | True | [Under-accrued incentive compensation bonuses,... | False | The corrections are expected to decrease net i... | False | False | [] | [cost of rental, selling, general and administ... | True | False | False | False | Financial Restatement Due to Under-accrual of ... | 2 | 1 | 0 | False | False | False | True | False | False | False | marketHours | |||
4 | 0001193125-04-219339 | 8-K | 2004-12-23 17:24:54-05:00 | 2004-12-17 | 735993 | BKBO | BAKBONE SOFTWARE INC | [4.02, 9.01] | 12 | 2004 | 4 | BakBone Software Incorporated concluded that c... | [Error in the calculation of a non-cash benefi... | [Q2 2004, FY 2004, Q1 2005] | [Company, Auditor] | True | [Error in the calculation of a non-cash benefi... | True | The corrected calculation will result in an in... | True | False | [KPMG, Deloitte] | [net income, net income per common share, shar... | True | False | True | False | not exceed $500,000 | Financial Restatement Due to Beneficial Conver... | 3 | 2 | 1 | True | True | True | True | False | False | False | afterMarket |
item_4_02_counts = (
structured_data.drop_duplicates(subset=["accessionNo"])
.groupby(["year"])
.size()
.to_frame(name="count")
)
print(f"Item 4.02 counts from 2004 to 2023.")
item_4_02_counts.T
Item 4.02 counts from 2004 to 2023.
year | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 171 | 962 | 1004 | 780 | 572 | 465 | 455 | 402 | 346 | 326 | 240 | 213 | 173 | 141 | 132 | 98 | 96 | 864 | 304 | 262 |
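Summed over all years, these counts give 8,006 filings, fewer than the 8,133 structured records loaded above, so some accession numbers appear in more than one record. That is why each counting cell first drops duplicate accessionNo values. The toy example below uses made-up data to show the difference; counting distinct accession numbers with `nunique()` is an equivalent alternative.

```python
import pandas as pd

# Made-up records: filing "A" appears twice, "B" and "C" once.
toy = pd.DataFrame({
    "accessionNo": ["A", "A", "B", "C"],
    "year": [2021, 2021, 2021, 2022],
})

# Counting rows counts records, not filings.
rows_per_year = toy.groupby("year").size()
# Keeping one row per accession number counts each filing once, as in the cell above.
filings_per_year = toy.drop_duplicates(subset=["accessionNo"]).groupby("year").size()
# Equivalent: count distinct accession numbers per year.
filings_per_year_alt = toy.groupby("year")["accessionNo"].nunique()
```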
def plot_timeseries(ts, title):
    fig, ax = plt.subplots(figsize=(4, 2.5))
    ts["count"].plot(ax=ax, legend=False)
    ax.set_title(title); ax.set_xlabel("Year"); ax.set_ylabel("Number of\nItem 4.02 Filings")
    ax.set_xticks(np.arange(2004, 2024, 2)); ax.yaxis.set_major_formatter(mtick.StrMethodFormatter("{x:,.0f}"))
    ax.set_xlim(2003, 2023); ax.grid(axis="x"); ax.set_axisbelow(True); plt.xticks(rotation=45, ha="right")
    for year in range(2004, 2024, 1):
        year_y_max = ts.loc[year, "count"]
        ax.vlines(year, 0, year_y_max, linestyles=":", colors="grey", alpha=0.5, lw=1)
    plt.tight_layout()
    plt.show()
plot_timeseries(
item_4_02_counts,
title="Disclosures of Financial Statement Non-Reliance\nForm 8-K with Item 4.02 per Year (2004 - 2023)"
)
structured_data["qtr"] = structured_data["month"].apply(lambda x: (x - 1) // 3 + 1)
counts_qtr_yr_piv = (
structured_data.drop_duplicates(subset=["accessionNo"])
.groupby(["year", "qtr"])
.size()
.unstack()
.fillna(0)
).astype(int)
print(f"Item 4.02 counts by quarter from 2004 to 2023.")
counts_qtr_yr_piv.T
Item 4.02 counts by quarter from 2004 to 2023.
qtr \ year | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 301 | 281 | 239 | 187 | 118 | 132 | 120 | 99 | 106 | 72 | 57 | 59 | 34 | 41 | 27 | 28 | 29 | 136 | 80 |
2 | 0 | 221 | 222 | 201 | 132 | 105 | 131 | 115 | 95 | 97 | 67 | 49 | 42 | 41 | 34 | 28 | 24 | 370 | 56 | 62 |
3 | 33 | 190 | 239 | 168 | 130 | 124 | 94 | 76 | 79 | 54 | 47 | 54 | 31 | 28 | 20 | 20 | 22 | 39 | 59 | 59 |
4 | 138 | 250 | 262 | 172 | 123 | 118 | 98 | 91 | 73 | 69 | 54 | 53 | 41 | 38 | 37 | 23 | 22 | 426 | 53 | 61 |
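The quarter column above is derived from the filing month with integer arithmetic, `(month - 1) // 3 + 1`. A quick check that this matches the calendar quarter pandas reports for a datetime column via `.dt.quarter`, which would be an alternative when a datetime column is at hand:

```python
import pandas as pd

# First day of every month in an arbitrary year.
dates = pd.Series(pd.date_range("2023-01-01", periods=12, freq="MS"))
months = dates.dt.month

# Integer arithmetic used in the notebook: 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4.
qtr_arith = (months - 1) // 3 + 1
# Calendar quarter as reported by pandas for a datetime column.
qtr_dt = dates.dt.quarter

print(qtr_arith.tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
```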
counts_qtr_yr = counts_qtr_yr_piv.stack().reset_index(name="count")
fig, ax = plt.subplots(figsize=(6, 2.5))
counts_qtr_yr_piv.plot(kind="bar", ax=ax, legend=True)
ax.legend(title="Quarter", loc="upper right", bbox_to_anchor=(1.02, 1))
ax.set_title("Number of Non-Reliance Disclosures per Quarter\n(2004 - 2023)")
ax.set_xlabel("Year"); ax.set_ylabel("Number of\nItem 4.02 Filings")
ax.yaxis.set_major_formatter(mtick.StrMethodFormatter("{x:,.0f}"))
ax.grid(axis="x"); ax.set_axisbelow(True)
plt.tight_layout()
plt.show()
counts_month_yr_piv = (
structured_data.drop_duplicates(subset=["accessionNo"])
.groupby(["year", "month"])
.size()
.unstack()
.fillna(0)
).astype(int)
print(f"Item 4.02 counts by month from 2004 to 2023.")
counts_month_yr_piv
Item 4.02 counts by month from 2004 to 2023.
year \ month | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2004 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 26 | 29 | 63 | 46 |
2005 | 37 | 89 | 175 | 85 | 91 | 45 | 46 | 91 | 53 | 63 | 123 | 64 |
2006 | 45 | 84 | 152 | 81 | 78 | 63 | 60 | 115 | 64 | 76 | 109 | 77 |
2007 | 60 | 67 | 112 | 85 | 62 | 54 | 41 | 91 | 36 | 53 | 76 | 43 |
2008 | 40 | 74 | 73 | 63 | 45 | 24 | 32 | 62 | 36 | 31 | 60 | 32 |
2009 | 33 | 40 | 45 | 50 | 33 | 22 | 31 | 46 | 47 | 36 | 58 | 24 |
2010 | 37 | 41 | 54 | 49 | 48 | 34 | 23 | 47 | 24 | 34 | 32 | 32 |
2011 | 24 | 26 | 70 | 49 | 39 | 27 | 28 | 31 | 17 | 30 | 45 | 16 |
2012 | 26 | 28 | 45 | 39 | 31 | 25 | 26 | 35 | 18 | 26 | 38 | 9 |
2013 | 18 | 40 | 48 | 41 | 36 | 20 | 11 | 29 | 14 | 15 | 39 | 15 |
2014 | 16 | 25 | 31 | 22 | 26 | 19 | 11 | 24 | 12 | 17 | 26 | 11 |
2015 | 10 | 19 | 28 | 24 | 21 | 4 | 20 | 19 | 15 | 18 | 25 | 10 |
2016 | 17 | 14 | 28 | 15 | 21 | 6 | 9 | 11 | 11 | 5 | 25 | 11 |
2017 | 10 | 9 | 15 | 20 | 11 | 10 | 7 | 15 | 6 | 10 | 21 | 7 |
2018 | 12 | 14 | 15 | 14 | 13 | 7 | 3 | 11 | 6 | 16 | 15 | 6 |
2019 | 6 | 7 | 14 | 12 | 10 | 6 | 9 | 8 | 3 | 8 | 10 | 5 |
2020 | 2 | 13 | 13 | 9 | 12 | 3 | 8 | 10 | 4 | 7 | 11 | 4 |
2021 | 4 | 10 | 15 | 54 | 264 | 52 | 20 | 11 | 8 | 6 | 271 | 149 |
2022 | 38 | 46 | 52 | 23 | 23 | 10 | 6 | 38 | 15 | 13 | 30 | 10 |
2023 | 9 | 25 | 46 | 24 | 28 | 10 | 10 | 28 | 21 | 17 | 25 | 19 |
print(f"Descriptive statistics for Item 4.02 counts by month from 2005 to 2023.")
month_stats = (
    # Start in 2005: 2004 is a partial year with no Item 4.02 filings before August.
    counts_month_yr_piv.loc[2005:]
    .describe(percentiles=[0.025, 0.975])
    .round(0)
    .astype(int)
)
month_stats
Descriptive statistics for Item 4.02 counts by month from 2005 to 2023.
month | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 |
mean | 23 | 35 | 54 | 40 | 47 | 23 | 21 | 38 | 22 | 25 | 55 | 29 |
std | 16 | 26 | 46 | 25 | 57 | 18 | 16 | 31 | 18 | 20 | 61 | 35 |
min | 2 | 7 | 13 | 9 | 10 | 3 | 3 | 8 | 3 | 5 | 10 | 4 |
2.5% | 3 | 8 | 13 | 10 | 10 | 3 | 4 | 9 | 3 | 5 | 10 | 4 |
50% | 18 | 26 | 45 | 39 | 31 | 20 | 20 | 29 | 15 | 17 | 32 | 15 |
97.5% | 53 | 87 | 165 | 85 | 186 | 59 | 54 | 104 | 59 | 70 | 204 | 117 |
max | 60 | 89 | 175 | 85 | 264 | 63 | 60 | 115 | 64 | 76 | 271 | 149 |
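The 2.5% and 97.5% rows are not confidence bounds. With only 19 yearly values per month, `describe()` obtains them by linear interpolation between the two smallest and the two largest observations, so they sit close to the minimum and maximum. A check using the January column of the monthly table:

```python
import pandas as pd

# January counts for 2005-2023, copied from the monthly table above.
jan = pd.Series([37, 45, 60, 40, 33, 37, 24, 26, 18, 16, 10, 17, 10, 12, 6, 2, 4, 38, 9])

stats = jan.describe(percentiles=[0.025, 0.975])
# describe() uses the same default linear interpolation as Series.quantile().
q = jan.quantile([0.025, 0.5, 0.975])

# Position of the 2.5% quantile among the 19 sorted values: 0.025 * (19 - 1) = 0.45,
# i.e. 45% of the way from the smallest value (2) to the second smallest (4).
print(round(stats["2.5%"], 2), stats["50%"], round(stats["97.5%"], 2))  # 2.9 18.0 53.25
```

Rounded to integers, these give the 3, 18 and 53 shown for January in the statistics table.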
def plot_box_plot_as_line(
    data: pd.DataFrame,
    x_months=True,
    title="",
    x_label="",
    x_pos_mean_label=2,
    pos_labels=None,  # {"mean": {"x": 2, "y": 150}, "upper": {"x": 2, "y": 150}, "lower": {"x": 2, "y": 150}},
    pos_high_low=None,  # {"high": {"x": 2, "y": 150}, "low": {"x": 2, "y": 150}},
    y_label="",
    y_formatter=lambda x, p: "{:.0f}".format(int(x) / 1000),
    show_high_low_labels=True,
    show_inline_labels=True,
    show_bands=True,
    figsize=(4, 2.5),
    line_source="mean",
):
    fig, ax = plt.subplots(figsize=figsize)
    line_to_plot = data[line_source]
    lower_label = "2.5%"
    upper_label = "97.5%"
    lower = data[lower_label]
    upper = data[upper_label]
    line_to_plot.plot(ax=ax)
    if show_bands:
        ax.fill_between(line_to_plot.index, lower, upper, alpha=0.2)
    if x_months:
        ax.set_xlim(0.5, 12.5)
        ax.set_xticks(range(1, 13))
        ax.set_xticklabels(["J", "F", "M", "A", "M", "J", "J", "A", "S", "O", "N", "D"])
    ax.yaxis.set_major_formatter(mtick.FuncFormatter(y_formatter))
    ax.set_ylabel(y_label); ax.set_xlabel(x_label); ax.set_title(title)
    ymin, ymax = ax.get_ylim()
    y_scale = ymax - ymin
    max_x = int(line_to_plot.idxmax())
    max_y = line_to_plot.max()
    min_x = int(line_to_plot.idxmin())
    min_y = line_to_plot.min()
    ax.axvline(
        max_x,
        ymin=0,
        ymax=((max_y - ymin) / (ymax - ymin)),
        linestyle="dashed",
        color="tab:blue",
        alpha=0.5,
    )
    ax.scatter(max_x, max_y, color="tab:blue", s=10)
    ax.axvline(
        min_x,
        ymin=0,
        ymax=((min_y - ymin) / (ymax - ymin)),
        linestyle="dashed",
        color="tab:blue",
        alpha=0.5,
    )
    ax.scatter(min_x, min_y, color="tab:blue", s=10)
    x_pos_mean_label_int = int(x_pos_mean_label)
    if show_inline_labels:
        mean_x = x_pos_mean_label
        mean_y = line_to_plot.iloc[x_pos_mean_label_int] * 1.02
        upper_x = x_pos_mean_label
        upper_y = upper.iloc[x_pos_mean_label_int]
        lower_x = x_pos_mean_label
        lower_y = lower.iloc[x_pos_mean_label_int] * 0.95
        if pos_labels:
            mean_x = pos_labels["mean"]["x"]
            mean_y = pos_labels["mean"]["y"]
            upper_x = pos_labels["upper"]["x"]
            upper_y = pos_labels["upper"]["y"]
            lower_x = pos_labels["lower"]["x"]
            lower_y = pos_labels["lower"]["y"]
        ax.text(mean_x, mean_y, "Mean", color="tab:blue", fontsize=8)
        ax.text(upper_x, upper_y, upper_label, color="tab:blue", fontsize=8)
        ax.text(lower_x, lower_y, lower_label, color="tab:blue", fontsize=8)
    if show_high_low_labels:
        high_x_origin = max_x
        high_y_origin = max_y
        high_x_label = high_x_origin + 0.5
        high_y_label = high_y_origin + 0.1 * y_scale
        if pos_high_low:
            high_x_label = pos_high_low["high"]["x"]
            high_y_label = pos_high_low["high"]["y"]
        ax.annotate(
            "High",
            (high_x_origin, high_y_origin),
            xytext=(high_x_label, high_y_label),
            arrowprops=dict(facecolor="black", arrowstyle="->"),
        )
        low_x_origin = min_x * 1.01
        low_y_origin = min_y
        low_x_label = low_x_origin + 1.5
        low_y_label = low_y_origin - 0.1 * y_scale
        if pos_high_low:
            low_x_label = pos_high_low["low"]["x"]
            low_y_label = pos_high_low["low"]["y"]
        ax.annotate(
            "Low",
            (low_x_origin, low_y_origin),
            xytext=(low_x_label, low_y_label),
            arrowprops=dict(facecolor="black", arrowstyle="->"),
        )
    ax.grid(axis="x"); ax.set_axisbelow(True)
    plt.tight_layout()
    plt.show()
plot_box_plot_as_line(
data=month_stats.T,
title="Descriptive Statistics for Item 4.02 Filings by Month\n(2005 - 2023)",
x_label="Month",
y_label="Number of\nItem 4.02 Filings",
y_formatter=lambda x, p: "{:.0f}".format(int(x)),
)
counts_filedAtClass = (
structured_data.drop_duplicates(subset=["accessionNo"])
.groupby(["filedAtClass"])
.size()
.sort_values(ascending=False)
.to_frame(name="Count")
).rename_axis("Publication Time")
counts_filedAtClass["Pct"] = (
counts_filedAtClass["Count"].astype(int)
/ counts_filedAtClass["Count"].astype(int).sum()
).map("{:.0%}".format)
counts_filedAtClass["Count"] = counts_filedAtClass["Count"].map(lambda x: f"{x:,}")
counts_filedAtClass.index = (
counts_filedAtClass.index.str.replace("preMarket", "Pre-Market (4:00 - 9:30 AM)")
.str.replace("marketHours", "Market Hours (9:30 AM - 4:00 PM)")
.str.replace("afterMarket", "After Market (4:00 - 8:00 PM)")
)
counts_filedAtClass = counts_filedAtClass.reindex(counts_filedAtClass.index[::-1])
print(
f"Item 4.02 counts by pre-market, regular market hours,\nand after-market publication time (2004 - 2023)."
)
counts_filedAtClass
Item 4.02 counts by pre-market, regular market hours,
and after-market publication time (2004 - 2023).
Publication Time | Count | Pct |
---|---|---|
Pre-Market (4:00 - 9:30 AM) | 899 | 11% |
Market Hours (9:30 AM - 4:00 PM) | 2,022 | 25% |
After Market (4:00 - 8:00 PM) | 5,085 | 64% |
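The filedAtClass labels ship with the dataset, and the exact rule behind them is not shown here. Together, the three classes account for all 8,006 filings, so every filing received one of them. A minimal sketch of such a rule, assuming US Eastern time, the boundaries in the row labels above, and treating anything before 9:30 AM as pre-market and anything from 4:00 PM onward as after-market; the function name classify_filed_at is hypothetical. It reproduces the classes of the first two sample rows (afterMarket at 17:01, marketHours at 14:02).

```python
from datetime import time

import pandas as pd

# Assumed boundaries (US Eastern time), taken from the table labels.
MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)


def classify_filed_at(ts: pd.Timestamp) -> str:
    """Map a timezone-aware filing timestamp to preMarket / marketHours / afterMarket."""
    t = ts.tz_convert("America/New_York").time()
    if t < MARKET_OPEN:
        return "preMarket"
    if t < MARKET_CLOSE:
        return "marketHours"
    return "afterMarket"


filed_at = pd.Series(pd.to_datetime([
    "2004-12-30 17:01:53-05:00",  # SCOTTISH RE GROUP LTD, sample row 0
    "2004-12-30 14:02:00-05:00",  # TELETECH HOLDINGS INC, sample row 1
    "2021-05-03 08:15:00-04:00",  # made-up pre-market timestamp
], utc=True))
print(filed_at.map(classify_filed_at).tolist())  # ['afterMarket', 'marketHours', 'preMarket']
```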
structured_data["dayOfWeek"] = structured_data["filedAt"].dt.strftime('%a')
counts_dayOfWeek = (
structured_data.drop_duplicates(subset=["accessionNo"])
.groupby(["dayOfWeek"])
.size()
.to_frame(name="Count")
).rename_axis("Day of the Week")
counts_dayOfWeek["Pct"] = (
counts_dayOfWeek["Count"].astype(int)
/ counts_dayOfWeek["Count"].astype(int).sum()
).map("{:.0%}".format)
counts_dayOfWeek['Count'] = counts_dayOfWeek['Count'].map(lambda x: f"{x:,}")
print(f"Item 4.02 disclosures by day of the week (2004 - 2023).")
counts_dayOfWeek.loc[['Mon', 'Tue', 'Wed', 'Thu', 'Fri']]
Item 4.02 disclosures by day of the week (2004 - 2023).
Day of the Week | Count | Pct |
---|---|---|
Mon | 1,584 | 20% |
Tue | 1,655 | 21% |
Wed | 1,490 | 19% |
Thu | 1,519 | 19% |
Fri | 1,758 | 22% |
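Selecting the weekdays with `.loc[[...]]` fixes their order but raises a KeyError if one of the listed days has no filings, which can happen on small subsets such as a single year. A sketch of a more defensive variant on made-up timestamps: reindex onto a fixed weekday list and fill missing days with zero.

```python
import pandas as pd

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Made-up filing times; 2021-05-03 was a Monday, 05-05 a Wednesday, 05-07 a Friday.
filed_at = pd.Series(
    pd.to_datetime(["2021-05-03 17:00", "2021-05-03 18:30", "2021-05-05 09:00", "2021-05-07 16:45"])
    .tz_localize("America/New_York")
)

counts = (
    filed_at.dt.strftime("%a")        # 'Mon', 'Tue', ... as in the notebook
    .value_counts()
    .reindex(WEEKDAYS, fill_value=0)  # fixed order; days without filings get 0
)
pct = (counts / counts.sum() * 100).round(0)

print(counts.tolist())  # [2, 0, 1, 0, 1, 0, 0]
```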
bool_variables_to_analyze = [
"impactIsMaterial",
"restatementIsNecessary",
"impactYetToBeDetermined",
"materialWeaknessIdentified",
"reportedWithOtherItems",
"reportedWithEarnings",
"netIncomeDecreased",
"netIncomeIncreased",
"netIncomeAdjustmentContainsWordMillion",
"revenueDecreased",
"revenueIncreased",
"revenueAdjustmentContainsWordMillion",
"identifiedByAuditor",
"identifiedByCompany",
"identifiedBySec",
]
var_to_label = {
"impactIsMaterial": "Impact is Material",
"restatementIsNecessary": "Restatement is Necessary",
"impactYetToBeDetermined": "Impact Yet to be Determined",
"materialWeaknessIdentified": "Material Weakness Identified",
"reportedWithOtherItems": "Reported with Other Items",
"reportedWithEarnings": "Reported with Earnings Announcement",
"netIncomeDecreased": "Net Income Decreased",
"netIncomeIncreased": "Net Income Increased",
"netIncomeAdjustmentContainsWordMillion": "Net Inc. Adj. Contains 'Million'",
"revenueDecreased": "Revenue Decreased",
"revenueIncreased": "Revenue Increased",
"revenueAdjustmentContainsWordMillion": "Revenue Adj. Contains 'Million'",
"identifiedByAuditor": "Identified by Auditor",
"identifiedByCompany": "Identified by Company",
"identifiedBySec": "Identified by SEC",
}
bool_variables_stats = []
for variable in bool_variables_to_analyze:
    variable_stats = (
        structured_data[variable]
        .value_counts()
        .to_frame()
        .reset_index()
        .rename(columns={variable: "value"})
    )
    variable_stats = variable_stats.sort_values(by="value", ascending=False)
    variable_stats["pct"] = (
        variable_stats["count"] / variable_stats["count"].sum() * 100
    ).round(1)
    variable_stats.index = pd.MultiIndex.from_tuples(
        [(variable, row["value"]) for _, row in variable_stats.iterrows()],
    )
    variable_stats.drop(columns="value", inplace=True)
    bool_variables_stats.append(variable_stats)
bool_variables_stats = pd.concat(bool_variables_stats, axis=0)
bool_variables_stats.index.set_names(["Variable", "Value"], inplace=True)
bool_variables_stats.rename(index=var_to_label, columns={"count": "Samples", "pct": "Pct."}, inplace=True)
bool_variables_stats["Samples"] = bool_variables_stats["Samples"].apply(lambda x: f"{x:,.0f}")
print(f"Number of non-reliance filings by \ntheir disclosed characteristics (2004 - 2023):")
bool_variables_stats
Number of non-reliance filings by
their disclosed characteristics (2004 - 2023):
Variable | Value | Samples | Pct. |
---|---|---|---|
Impact is Material | True | 6,054 | 74.4 |
Impact is Material | False | 2,079 | 25.6 |
Restatement is Necessary | True | 7,925 | 97.4 |
Restatement is Necessary | False | 208 | 2.6 |
Impact Yet to be Determined | True | 1,983 | 24.4 |
Impact Yet to be Determined | False | 6,150 | 75.6 |
Material Weakness Identified | True | 2,116 | 26.0 |
Material Weakness Identified | False | 6,017 | 74.0 |
Reported with Other Items | True | 3,564 | 43.8 |
Reported with Other Items | False | 4,569 | 56.2 |
Reported with Earnings Announcement | True | 3,149 | 38.7 |
Reported with Earnings Announcement | False | 4,984 | 61.3 |
Net Income Decreased | True | 2,049 | 25.2 |
Net Income Decreased | False | 6,084 | 74.8 |
Net Income Increased | True | 747 | 9.2 |
Net Income Increased | False | 7,386 | 90.8 |
Net Inc. Adj. Contains 'Million' | True | 440 | 5.4 |
Net Inc. Adj. Contains 'Million' | False | 7,693 | 94.6 |
Revenue Decreased | True | 615 | 7.6 |
Revenue Decreased | False | 7,518 | 92.4 |
Revenue Increased | True | 196 | 2.4 |
Revenue Increased | False | 7,937 | 97.6 |
Revenue Adj. Contains 'Million' | True | 186 | 2.3 |
Revenue Adj. Contains 'Million' | False | 7,947 | 97.7 |
Identified by Auditor | True | 2,122 | 26.1 |
Identified by Auditor | False | 6,011 | 73.9 |
Identified by Company | True | 6,782 | 83.4 |
Identified by Company | False | 1,351 | 16.6 |
Identified by SEC | True | 1,262 | 15.5 |
Identified by SEC | False | 6,871 | 84.5 |
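Two remarks on the table above. First, for a boolean column the share of True values is simply its mean, and `value_counts(normalize=True)` returns both shares at once. Second, the loop runs on all 8,133 records rather than on the 8,006 unique filings, so a filing with several records is counted several times. A sketch on made-up data showing both computations; note that `drop_duplicates` keeps the first record of each filing.

```python
import pandas as pd

# Made-up records: filing "A" has two records, "B" and "C" one each.
toy = pd.DataFrame({
    "accessionNo": ["A", "A", "B", "C"],
    "materialWeaknessIdentified": [True, True, False, False],
})
col = "materialWeaknessIdentified"

# Share of True per record: the mean of a boolean column (2 of 4 records).
pct_records = toy[col].mean() * 100
# value_counts(normalize=True) returns the True and False shares together.
shares_records = toy[col].value_counts(normalize=True) * 100
# Share of True per filing: keep the first record of each accession number (1 of 3 filings).
pct_filings = toy.drop_duplicates(subset=["accessionNo"])[col].mean() * 100

print(pct_records, round(pct_filings, 1))  # 50.0 33.3
```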
identifiedBy = structured_data["identifiedBy"].explode().value_counts().to_frame().head(3)
identifiedBy.index.name = "Identified By"
identifiedBy.columns = ["Count"]
identifiedBy["Pct."] = identifiedBy["Count"] / identifiedBy["Count"].sum() * 100
identifiedBy["Pct."] = identifiedBy["Pct."].round(1)
identifiedBy["Count"] = identifiedBy["Count"].apply(lambda x: f"{x:,.0f}")
print(f"Top 3 entities identifying issues in\npreviously reported financial statements (2004 - 2023):")
identifiedBy
Top 3 entities identifying issues in
previously reported financial statements (2004 - 2023):
Identified By | Count | Pct. |
---|---|---|
Company | 6,782 | 66.7 |
Auditor | 2,122 | 20.9 |
SEC | 1,262 | 12.4 |
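The percentages in this table are shares of all mentions of these three parties (6,782 + 2,122 + 1,262 = 10,166), not shares of filings, because one disclosure can name several parties. Relative to the 8,133 records, the company is named in 83.4% of them, which matches "Identified by Company" in the boolean table above. The distinction on made-up data:

```python
import pandas as pd

# Made-up identifiedBy lists; a disclosure can name more than one party.
identified_by = pd.Series([["Company"], ["Company", "Auditor"], ["Company", "SEC"], ["Auditor"]])

mentions = identified_by.explode().value_counts()
# Share of all mentions (what the table above reports): sums to 100%.
share_of_mentions = mentions / mentions.sum() * 100
# Share of records naming each party: can sum to more than 100%.
share_of_records = mentions / len(identified_by) * 100

print(share_of_mentions["Company"], share_of_records["Company"])  # 50.0 75.0
```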
all_auditors = structured_data["auditors"].explode()
all_auditors = all_auditors[all_auditors.str.len() > 0].reset_index(drop=True)
auditors = all_auditors.value_counts().to_frame().reset_index()
auditors["pct"] = auditors["count"] / auditors["count"].sum() * 100
auditors["pct"] = auditors["pct"].round(2)
print("Top 10 auditors involved in \nnon-reliance disclosures from 2004 to 2023:")
auditors.head(10)
Top 10 auditors involved in
non-reliance disclosures from 2004 to 2023:
| | auditors | count | pct |
---|---|---|---|
0 | PwC | 626 | 9.62 |
1 | EY | 570 | 8.76 |
2 | Marcum | 459 | 7.06 |
3 | Deloitte | 430 | 6.61 |
4 | KPMG | 425 | 6.53 |
5 | Unknown | 323 | 4.97 |
6 | WithumSmith+Brown | 322 | 4.95 |
7 | BDO | 317 | 4.87 |
8 | Grant Thornton | 163 | 2.51 |
9 | MaloneBailey | 99 | 1.52 |
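Filings without a named auditor do not distort these counts: `explode()` turns an empty auditors list into NaN, and the `.str.len() > 0` filter removes those rows along with empty strings, since `NaN > 0` evaluates to False. A small check on made-up values:

```python
import pandas as pd

# Made-up auditors lists, including an empty list and an empty string.
auditors_lists = pd.Series([["EY"], [], ["KPMG", "Deloitte"], [""]])

exploded = auditors_lists.explode()  # the empty list becomes NaN, "" stays ""
# NaN.str.len() is NaN and NaN > 0 is False, so NaN rows are dropped along with "".
kept = exploded[exploded.str.len() > 0].reset_index(drop=True)

print(kept.tolist())  # ['EY', 'KPMG', 'Deloitte']
```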
auditors_year = structured_data[["auditors", "year", "accessionNo"]].explode("auditors")
auditors_year_pivot = pd.pivot_table(
auditors_year,
index="auditors",
columns="year",
values="accessionNo",
aggfunc="count",
fill_value=0,
)
auditors_year_pivot["total"] = auditors_year_pivot.sum(axis=1)
auditors_year_pivot = auditors_year_pivot.sort_values(by="total", ascending=False)
top_10_auditors = auditors_year_pivot.head(10)
others = auditors_year_pivot[~auditors_year_pivot.index.isin(top_10_auditors.index)]
others = others.sum().to_frame().T
others.index = ["Others"]
top_10_auditors = pd.concat([top_10_auditors, others], axis=0)
top_10_auditors
year | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PwC | 42 | 159 | 149 | 48 | 28 | 21 | 20 | 14 | 18 | 17 | 18 | 16 | 16 | 5 | 11 | 6 | 6 | 9 | 10 | 13 | 626 |
EY | 20 | 87 | 83 | 85 | 31 | 31 | 12 | 18 | 25 | 31 | 28 | 14 | 14 | 8 | 7 | 3 | 7 | 24 | 16 | 26 | 570 |
Marcum | 0 | 0 | 0 | 0 | 0 | 3 | 5 | 5 | 1 | 4 | 3 | 1 | 3 | 2 | 1 | 5 | 5 | 340 | 45 | 36 | 459 |
Deloitte | 11 | 103 | 104 | 54 | 38 | 20 | 13 | 10 | 8 | 11 | 7 | 9 | 16 | 5 | 3 | 0 | 1 | 6 | 2 | 9 | 430 |
KPMG | 24 | 116 | 78 | 54 | 30 | 13 | 11 | 10 | 7 | 15 | 5 | 3 | 4 | 3 | 0 | 4 | 5 | 29 | 10 | 4 | 425 |
Unknown | 7 | 63 | 40 | 35 | 25 | 14 | 20 | 10 | 12 | 19 | 8 | 11 | 1 | 5 | 1 | 4 | 2 | 36 | 6 | 4 | 323 |
WithumSmith+Brown | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 2 | 1 | 4 | 0 | 0 | 0 | 0 | 237 | 58 | 17 | 322 |
BDO | 8 | 44 | 23 | 26 | 24 | 2 | 12 | 6 | 11 | 7 | 4 | 11 | 15 | 15 | 17 | 11 | 13 | 18 | 33 | 17 | 317 |
Grant Thornton | 4 | 34 | 20 | 23 | 10 | 7 | 14 | 3 | 11 | 3 | 4 | 1 | 3 | 0 | 3 | 1 | 3 | 10 | 6 | 3 | 163 |
MaloneBailey | 0 | 8 | 9 | 19 | 3 | 4 | 14 | 15 | 2 | 1 | 0 | 5 | 1 | 2 | 6 | 1 | 0 | 2 | 2 | 5 | 99 |
Others | 19 | 181 | 267 | 251 | 234 | 211 | 240 | 219 | 158 | 143 | 124 | 109 | 77 | 67 | 58 | 54 | 56 | 97 | 105 | 101 | 2771 |
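The top-10-plus-Others step can be wrapped into a small reusable helper, for example to regroup the same pivot with a different number of named auditors. A sketch; the name top_n_with_others is hypothetical, and it assumes a pivot table with a total column as built above.

```python
import pandas as pd


def top_n_with_others(pivot: pd.DataFrame, n: int, total_col: str = "total") -> pd.DataFrame:
    """Keep the n rows with the largest total and sum all remaining rows into 'Others'."""
    ranked = pivot.sort_values(by=total_col, ascending=False)
    top, rest = ranked.iloc[:n], ranked.iloc[n:]
    others = rest.sum().to_frame().T  # one row holding the column sums of the remainder
    others.index = ["Others"]
    return pd.concat([top, others], axis=0)


# Made-up auditor-by-year counts.
toy = pd.DataFrame({2021: [5, 1, 2, 0], 2022: [1, 3, 0, 1]}, index=["A", "B", "C", "D"])
toy["total"] = toy.sum(axis=1)

print(top_n_with_others(toy, n=2))
```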
fig, ax = plt.subplots(figsize=(5, 3))
top_10_auditors.drop(columns="total").T.plot(
kind="bar",
stacked=True,
ax=ax,
cmap="tab20"
)
ax.set_title("Number of Item 4.02 Filings\nby Auditor and Year")
ax.set_xlabel("Year"); ax.set_ylabel("Number of Filings")
ax.xaxis.grid(False); ax.set_axisbelow(True)
handles, labels = ax.get_legend_handles_labels()
# Reverse the legend entries so they follow the top-to-bottom order of the stacked bars.
ax.legend(handles[::-1], labels[::-1], title="Auditor", bbox_to_anchor=(1.05, 1), labelspacing=0.3, fontsize=8)
plt.tight_layout()
plt.show()
print(f"Descriptive statistics for number of years and quarters \naffected by Item 4.02 filings (2004 - 2023):")
quarters_stats = structured_data[["numberQuartersAffected", "numberYearsAffected"]].describe().round(0).astype(int)
quarters_stats.T
Descriptive statistics for number of years and quarters
affected by Item 4.02 filings (2004 - 2023):
| | count | mean | std | min | 25% | 50% | 75% | max |
---|---|---|---|---|---|---|---|---|
numberQuartersAffected | 8133 | 2 | 3 | 0 | 1 | 2 | 3 | 56 |
numberYearsAffected | 8133 | 1 | 1 | 0 | 0 | 1 | 2 | 20 |
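The notebook does not show how numberQuartersAffected and numberYearsAffected are derived. The sample rows at the top are consistent with a simple rule: every entry in affectedReportingPeriods counts as a period, labels starting with "Q" are quarters, and labels starting with "FY" are fiscal years. A sketch of that hypothetical rule (the function name count_periods is ours), checked against sample rows 0, 3 and 4, which report 2/2/0, 2/1/0 and 3/2/1 periods/quarters/years:

```python
import pandas as pd


def count_periods(labels: list[str]) -> dict:
    """Hypothetical rule: 'Q<n> <year>' labels are quarters, 'FY <year>' labels are years."""
    quarters = sum(1 for p in labels if p.startswith("Q"))
    years = sum(1 for p in labels if p.startswith("FY"))
    return {"periods": len(labels), "quarters": quarters, "years": years}


# affectedReportingPeriods of sample rows 0, 3 (label truncated as displayed) and 4.
samples = pd.Series([
    ["Q2 2004", "Q3 2004"],
    ["Q3 2004", "Nine-month period ended September 30..."],
    ["Q2 2004", "FY 2004", "Q1 2005"],
])
counts = pd.DataFrame(samples.map(count_periods).tolist())
print(counts)
```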
affectedLineItems_stats = (
structured_data["affectedLineItems"]
.explode()
.value_counts()
.to_frame()
.reset_index()
.head(10)
)
print(f"Top 10 line items affected by non-reliance disclosures\nacross all years (2004 - 2023):")
affectedLineItems_stats
Top 10 line items affected by non-reliance disclosures
across all years (2004 - 2023):
| | affectedLineItems | count |
---|---|---|
0 | net income | 2305 |
1 | equity | 1062 |
2 | revenue | 936 |
3 | total liabilities | 663 |
4 | additional paid-in capital | 651 |
5 | earnings per share | 553 |
6 | accumulated deficit | 520 |
7 | total assets | 434 |
8 | retained earnings | 397 |
9 | warrants | 333 |
line_items_year_pivot = pd.pivot_table(
structured_data.explode("affectedLineItems"),
index="affectedLineItems",
columns="year",
values="accessionNo",
aggfunc="count",
fill_value=0,
margins=True,
)
line_items_year_pivot = line_items_year_pivot.sort_values(by="All", ascending=False)
print(f"Top 20 line items affected by Item 4.02 filings per year (2004 - 2023):")
line_items_year_pivot.head(20)
Top 20 line items affected by Item 4.02 filings per year (2004 - 2023):
affectedLineItems \ year | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | All |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
All | 508 | 3089 | 3468 | 2424 | 2114 | 1684 | 1757 | 1422 | 1142 | 1171 | 819 | 810 | 610 | 394 | 402 | 367 | 346 | 2256 | 903 | 850 | 26536 |
net income | 55 | 348 | 301 | 226 | 208 | 127 | 159 | 118 | 95 | 85 | 78 | 65 | 42 | 27 | 30 | 34 | 31 | 152 | 59 | 65 | 2305 |
equity | 11 | 65 | 109 | 49 | 45 | 43 | 58 | 50 | 27 | 42 | 13 | 18 | 14 | 7 | 13 | 15 | 7 | 403 | 47 | 26 | 1062 |
revenue | 31 | 150 | 104 | 81 | 70 | 48 | 46 | 50 | 37 | 48 | 35 | 29 | 29 | 26 | 21 | 22 | 22 | 23 | 28 | 36 | 936 |
total liabilities | 11 | 37 | 56 | 35 | 26 | 37 | 41 | 43 | 23 | 26 | 30 | 18 | 12 | 8 | 7 | 9 | 10 | 194 | 24 | 16 | 663 |
additional paid-in capital | 3 | 39 | 65 | 73 | 52 | 48 | 49 | 36 | 18 | 24 | 26 | 6 | 11 | 8 | 6 | 9 | 8 | 108 | 52 | 10 | 651 |
earnings per share | 8 | 56 | 69 | 48 | 53 | 18 | 39 | 33 | 26 | 13 | 9 | 8 | 5 | 5 | 7 | 6 | 5 | 82 | 52 | 11 | 553 |
accumulated deficit | 8 | 24 | 39 | 51 | 38 | 25 | 30 | 28 | 20 | 24 | 17 | 14 | 8 | 5 | 7 | 5 | 11 | 103 | 50 | 13 | 520 |
total assets | 10 | 38 | 38 | 34 | 40 | 43 | 24 | 35 | 20 | 28 | 25 | 21 | 11 | 14 | 9 | 7 | 9 | 6 | 11 | 11 | 434 |
retained earnings | 7 | 62 | 63 | 34 | 41 | 42 | 32 | 17 | 11 | 15 | 11 | 16 | 5 | 7 | 5 | 6 | 5 | 7 | 6 | 5 | 397 |
warrants | 2 | 5 | 23 | 12 | 6 | 6 | 10 | 5 | 3 | 3 | 2 | 1 | 0 | 0 | 0 | 0 | 1 | 247 | 3 | 4 | 333 |
cost of sales | 10 | 26 | 48 | 38 | 31 | 23 | 23 | 16 | 15 | 15 | 7 | 14 | 12 | 2 | 10 | 2 | 4 | 6 | 3 | 12 | 317 |
interest expense | 8 | 43 | 42 | 33 | 38 | 24 | 29 | 17 | 12 | 9 | 9 | 8 | 7 | 1 | 1 | 4 | 3 | 1 | 2 | 11 | 302 |
goodwill | 3 | 27 | 39 | 26 | 19 | 21 | 12 | 26 | 20 | 11 | 7 | 6 | 9 | 5 | 3 | 5 | 3 | 6 | 6 | 7 | 261 |
derivative liabilities | 0 | 0 | 21 | 18 | 11 | 4 | 21 | 10 | 21 | 15 | 9 | 14 | 6 | 3 | 2 | 4 | 1 | 88 | 6 | 1 | 255 |
current liabilities | 11 | 21 | 21 | 21 | 17 | 18 | 11 | 17 | 8 | 16 | 11 | 10 | 4 | 3 | 5 | 3 | 5 | 7 | 3 | 13 | 225 |
accounts receivable | 2 | 18 | 57 | 12 | 19 | 12 | 10 | 15 | 9 | 9 | 8 | 8 | 6 | 8 | 3 | 7 | 2 | 2 | 5 | 4 | 216 |
common stock | 2 | 8 | 27 | 18 | 12 | 13 | 12 | 10 | 6 | 12 | 5 | 3 | 4 | 7 | 1 | 2 | 4 | 19 | 7 | 3 | 175 |
inventory | 3 | 23 | 28 | 20 | 19 | 4 | 11 | 7 | 8 | 1 | 2 | 9 | 7 | 4 | 7 | 2 | 3 | 1 | 2 | 7 | 168 |
income tax expense | 0 | 28 | 22 | 14 | 19 | 10 | 13 | 7 | 2 | 9 | 5 | 2 | 4 | 5 | 7 | 0 | 2 | 1 | 2 | 9 | 161 |
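Because `margins=True` adds an All row, and that row has the largest All value, sorting puts it first: the table above therefore shows the totals plus the top 19 line items. The All total (26,536) counts line-item mentions across records, not filings. A sketch on made-up data that drops the margin row before taking the top N:

```python
import pandas as pd

# Made-up records with list-valued affectedLineItems.
toy = pd.DataFrame({
    "accessionNo": ["A", "B", "C", "D"],
    "year": [2021, 2021, 2022, 2022],
    "affectedLineItems": [["net income", "equity"], ["net income", "warrants"], ["net income"], ["equity"]],
})

pivot = pd.pivot_table(
    toy.explode("affectedLineItems"),
    index="affectedLineItems",
    columns="year",
    values="accessionNo",
    aggfunc="count",
    fill_value=0,
    margins=True,  # adds an "All" row and an "All" column with totals
)
ranked = pivot.sort_values(by="All", ascending=False)  # the "All" row now comes first
top_items = ranked.drop(index="All").head(2)           # exclude it before taking the top N

print(top_items.index.tolist())  # ['net income', 'equity']
```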