Analysis of SEC Administrative Proceedings with Python
This tutorial provides a step-by-step guide to accessing and analyzing SEC administrative proceedings using Python and the Administrative Proccedings API. The dataset for this tutorial contains all SEC administrative proceedings from 1995 to 2024, including parties involved, order types, penalties, and more.
The analysis will cover the following topics:
- Data loading, cleaning, and preprocessing
- Proceedings published per year and month
- Top 10 most common order types
- Penalty amounts per year and respondent type
- And more
!pip install sec-api
# NOTE: Replace with your own API key
API_KEY_SEC_API = "YOUR_API_KEY"
from sec_api import SecAdministrativeProceedingsApi
adminProceedingsApi = SecAdministrativeProceedingsApi(API_KEY_SEC_API)
searchRequest = {
"query": "releasedAt:[2024-01-01 TO 2024-12-31]",
"from": "0",
"size": "50",
"sort": [{"releasedAt": {"order": "desc"}}],
}
response = adminProceedingsApi.get_data(searchRequest)
proceedings = response["data"]
print(f"Loaded first {len(proceedings)} proceedings")
Loaded first 50 proceedings
import pandas as pd
df = pd.DataFrame(proceedings)
df.head()
Out[3]:
id | releasedAt | releaseNo | fileNumbers | respondents | respondentsText | resources | title | summary | tags | ... | parallelActionsTakenBy | hasAgreedToSettlement | hasAgreedToPayPenalty | penaltyAmounts | requestedRelief | violatedSections | orders | investigationConductedBy | litigationLedBy | otherAgenciesInvolved | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0ab80b58b2fcf40e7497aa0000759a37 | 2024-12-31T12:19:45-05:00 | [34-102060, AAER-4554] | [3-22386] | [{'name': 'Accell Audit & Compliance, PA', 'ty... | Accell Audit & Compliance, PA | [{'label': 'primary', 'url': 'https://www.sec.... | ORDER INSTITUTING PUBLIC ADMINISTRATIVE PROCEE... | The SEC has instituted public administrative p... | [fraudulent financial reporting, accounting mi... | ... | [] | True | True | [{'penaltyAmount': '75000', 'penaltyAmountText... | [] | [Section 10(b) of the Exchange Act, Rule 10b-5] | [Accell is suspended from appearing or practic... | [] | [] | [] |
1 | 0dbf2331563447a91e64b78077239115 | 2024-12-30T13:02:35-05:00 | [IA-6806] | [3-22385] | [{'name': 'Steven J. Susoeff', 'type': 'indivi... | Steven J. Susoeff | [{'label': 'primary', 'url': 'https://www.sec.... | ORDER INSTITUTING ADMINISTRATIVE PROCEEDINGS P... | The SEC has instituted administrative proceedi... | [fraudulent cherry-picking, investment adviser... | ... | [] | True | False | [] | [] | [Sections 17(a) of the Securities Act of 1933,... | [Respondent Susoeff is barred from association... | [] | [] | [] |
2 | 8ad2aaf148601d33de3460509dd08cec | 2024-12-27T17:12:33-05:00 | [34-102047, AAER-4553] | [3-22384] | [{'name': 'Christopher Hiestand', 'type': 'ind... | Christopher Hiestand, CPA | [{'label': 'primary', 'url': 'https://www.sec.... | ORDER INSTITUTING PUBLIC ADMINISTRATIVE PROCEE... | The SEC has instituted public administrative p... | [fraudulent financial reporting, accounting mi... | ... | [] | True | True | [{'penaltyAmount': '50000', 'penaltyAmountText... | [] | [Section 10(b) of the Exchange Act, Rule 10b-5] | [Hiestand is suspended from appearing or pract... | [] | [] | [] |
3 | 804947f56e7146f94b3ce9bf64d8012a | 2024-12-23T14:42:30-05:00 | [34-102031] | [3-22383] | [{'name': 'Richard Brown', 'type': 'individual... | Richard Brown | [{'label': 'primary', 'url': 'https://www.sec.... | ORDER INSTITUTING ADMINISTRATIVE PROCEEDINGS P... | The SEC has instituted administrative proceedi... | [securities fraud, disclosure fraud] | ... | [United States District Court for the Eastern ... | True | False | [] | [disgorgement of profits, civil penalties] | [Section 17(a) of the Securities Act of 1933, ... | [Respondent Brown is barred from association w... | [] | [] | [] |
4 | 91a2c34129c53e911759a959be1aa829 | 2024-12-20T17:51:48-05:00 | [33-11349] | [3-22382] | [{'name': 'Tai Mo Shan Limited', 'type': 'comp... | Tai Mo Shan Limited | [{'label': 'primary', 'url': 'https://www.sec.... | ORDER INSTITUTING CEASE-AND-DESIST PROCEEDINGS... | The SEC has instituted cease-and-desist procee... | [unregistered securities, crypto, misleading c... | ... | [] | True | True | [{'penaltyAmount': '73452756', 'penaltyAmountT... | [disgorgement of profits, civil penalties] | [Sections 5(a) and 5(c) of the Securities Act,... | [Respondent Tai Mo Shan cease and desist from ... | [] | [] | [] |
5 rows × 22 columns
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 22 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 50 non-null object
1 releasedAt 50 non-null object
2 releaseNo 50 non-null object
3 fileNumbers 50 non-null object
4 respondents 50 non-null object
5 respondentsText 50 non-null object
6 resources 50 non-null object
7 title 50 non-null object
8 summary 50 non-null object
9 tags 50 non-null object
10 entities 50 non-null object
11 complaints 50 non-null object
12 parallelActionsTakenBy 50 non-null object
13 hasAgreedToSettlement 50 non-null bool
14 hasAgreedToPayPenalty 50 non-null bool
15 penaltyAmounts 50 non-null object
16 requestedRelief 50 non-null object
17 violatedSections 50 non-null object
18 orders 50 non-null object
19 investigationConductedBy 50 non-null object
20 litigationLedBy 50 non-null object
21 otherAgenciesInvolved 50 non-null object
dtypes: bool(2), object(20)
memory usage: 8.0+ KB
None
import os
import json
import time
import random
import sys
from concurrent.futures import ThreadPoolExecutor
import pandas as pd
from sec_api import SecAdministrativeProceedingsApi
adminProceedingsApi = SecAdministrativeProceedingsApi(API_KEY_SEC_API)
YEARS = range(2024, 1994, -1) # from 2024 to 2004
TEMP_FILE_TEMPLATE = "./temp_file_{}.jsonl"
TARGET_FILE = "./sec-administrative-proceedings.jsonl"
def process_year(year):
backoff_time = random.randint(10, 800) / 1000
print(f"Starting year {year} with backoff time {backoff_time:,}s"); sys.stdout.flush()
time.sleep(backoff_time)
tmp_filename = TEMP_FILE_TEMPLATE.format(year)
tmp_file = open(tmp_filename, "a")
for month in range(12, 0, -1):
search_from = 0
month_counter = 0
while True:
query = f"releasedAt:[{year}-{month:02d}-01 TO {year}-{month:02d}-31]"
searchRequest = {
"query": query,
"from": search_from,
"size": "50",
"sort": [{"releasedAt": {"order": "desc"}}],
}
response = None
try:
response = adminProceedingsApi.get_data(searchRequest)
except Exception as e:
print(f"{year}-{month:02d} error: {e}"); sys.stdout.flush()
continue
if response == None or len(response["data"]) == 0:
break
search_from += 50
month_counter += len(response["data"])
jsonl_data = "\n".join([json.dumps(entry) for entry in response["data"]])
tmp_file.write(jsonl_data + "\n")
print(f"Finished loading {month_counter} proceedings for {year}-{month:02d}")
sys.stdout.flush()
tmp_file.close()
return year
if not os.path.exists(TARGET_FILE):
with ThreadPoolExecutor(max_workers=4) as pool:
processed_years = list(pool.map(process_year, YEARS))
print("Finished processing all years.", processed_years)
# merge the temporary files into one final file
with open(TARGET_FILE, "a") as outfile:
for year in YEARS:
temp_file = TEMP_FILE_TEMPLATE.format(year)
if os.path.exists(temp_file):
with open(temp_file, "r") as infile:
outfile.write(infile.read())
else:
print("File already exists. Skipping download.")
File already exists. Skipping download.
# load JSONL into dataframe
df = pd.read_json(TARGET_FILE, lines=True)
# convert "releasedAt" to datetime and to EST timezone
df["releasedAt"] = pd.to_datetime(df["releasedAt"], utc=True)
df["releasedAt"] = df["releasedAt"].dt.tz_convert("US/Eastern")
df["releasedAtYear"] = df["releasedAt"].dt.year
df["releasedAtMonth"] = df["releasedAt"].dt.month
df["releasedAtYearMonth"] = df["releasedAt"].map(lambda x: x.strftime("%Y-%m"))
# Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
df["releasedAtDay"] = df["releasedAt"].dt.day_name()
df["hasAgreedToSettlement"] = df["hasAgreedToSettlement"].astype(bool)
df["hasAgreedToPayPenalty"] = df["hasAgreedToPayPenalty"].astype(bool)
first_year = df["releasedAtYear"].min(); last_year = df["releasedAtYear"].max()
print(f"Loaded {len(df):,.0f} SEC administrative proceedings from {first_year} to {last_year}\n")
df.head()
Loaded 17,987 SEC administrative proceedings from 1995 to 2024
Out:
id | releasedAt | releaseNo | fileNumbers | respondents | respondentsText | resources | title | summary | tags | ... | requestedRelief | violatedSections | orders | investigationConductedBy | litigationLedBy | otherAgenciesInvolved | releasedAtYear | releasedAtMonth | releasedAtYearMonth | releasedAtDay | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0ab80b58b2fcf40e7497aa0000759a37 | 2024-12-31 12:19:45-05:00 | [34-102060, AAER-4554] | [3-22386] | [{'name': 'Accell Audit & Compliance, PA', 'ty... | Accell Audit & Compliance, PA | [{'label': 'primary', 'url': 'https://www.sec.... | ORDER INSTITUTING PUBLIC ADMINISTRATIVE PROCEE... | The SEC has instituted public administrative p... | [fraudulent financial reporting, accounting mi... | ... | [] | [Section 10(b) of the Exchange Act, Rule 10b-5] | [Accell is suspended from appearing or practic... | [] | [] | [] | 2024 | 12 | 2024-12 | Tuesday |
1 | 0dbf2331563447a91e64b78077239115 | 2024-12-30 13:02:35-05:00 | [IA-6806] | [3-22385] | [{'name': 'Steven J. Susoeff', 'type': 'indivi... | Steven J. Susoeff | [{'label': 'primary', 'url': 'https://www.sec.... | ORDER INSTITUTING ADMINISTRATIVE PROCEEDINGS P... | The SEC has instituted administrative proceedi... | [fraudulent cherry-picking, investment adviser... | ... | [] | [Sections 17(a) of the Securities Act of 1933,... | [Respondent Susoeff is barred from association... | [] | [] | [] | 2024 | 12 | 2024-12 | Monday |
2 | 8ad2aaf148601d33de3460509dd08cec | 2024-12-27 17:12:33-05:00 | [34-102047, AAER-4553] | [3-22384] | [{'name': 'Christopher Hiestand', 'type': 'ind... | Christopher Hiestand, CPA | [{'label': 'primary', 'url': 'https://www.sec.... | ORDER INSTITUTING PUBLIC ADMINISTRATIVE PROCEE... | The SEC has instituted public administrative p... | [fraudulent financial reporting, accounting mi... | ... | [] | [Section 10(b) of the Exchange Act, Rule 10b-5] | [Hiestand is suspended from appearing or pract... | [] | [] | [] | 2024 | 12 | 2024-12 | Friday |
3 | 804947f56e7146f94b3ce9bf64d8012a | 2024-12-23 14:42:30-05:00 | [34-102031] | [3-22383] | [{'name': 'Richard Brown', 'type': 'individual... | Richard Brown | [{'label': 'primary', 'url': 'https://www.sec.... | ORDER INSTITUTING ADMINISTRATIVE PROCEEDINGS P... | The SEC has instituted administrative proceedi... | [securities fraud, disclosure fraud] | ... | [disgorgement of profits, civil penalties] | [Section 17(a) of the Securities Act of 1933, ... | [Respondent Brown is barred from association w... | [] | [] | [] | 2024 | 12 | 2024-12 | Monday |
4 | 91a2c34129c53e911759a959be1aa829 | 2024-12-20 17:51:48-05:00 | [33-11349] | [3-22382] | [{'name': 'Tai Mo Shan Limited', 'type': 'comp... | Tai Mo Shan Limited | [{'label': 'primary', 'url': 'https://www.sec.... | ORDER INSTITUTING CEASE-AND-DESIST PROCEEDINGS... | The SEC has instituted cease-and-desist procee... | [unregistered securities, crypto, misleading c... | ... | [disgorgement of profits, civil penalties] | [Sections 5(a) and 5(c) of the Securities Act,... | [Respondent Tai Mo Shan cease and desist from ... | [] | [] | [] | 2024 | 12 | 2024-12 | Friday |
5 rows × 26 columns
import matplotlib.pyplot as plt
import matplotlib.style as style
style.use("default")
params = {
"axes.labelsize": 8, "font.size": 8, "legend.fontsize": 8,
"xtick.labelsize": 8, "ytick.labelsize": 8, "font.family": "sans-serif",
"axes.spines.top": False, "axes.spines.right": False, "grid.color": "grey",
"axes.grid": True, "axes.grid.axis": "y", "grid.alpha": 0.5, "grid.linestyle": ":",
}
plt.rcParams.update(params)
# piv table with years = index, months = columns, values = count of proceedings
unique_proceedings = df.explode('fileNumbers').drop_duplicates(subset=['fileNumbers'], keep='last')
df_year_month = unique_proceedings.pivot_table(
# df_year_month = df.pivot_table(
index="releasedAtYear",
columns="releasedAtMonth",
values="id",
aggfunc="count",
fill_value=0,
)
total_col = df_year_month.sum(axis=1)
mean_col = round(df_year_month.mean(axis=1), 0)
median_col = round(df_year_month.median(axis=1), 0)
df_year_month["total"] = total_col
df_year_month["mean"] = mean_col
df_year_month["median"] = median_col
total_row = df_year_month.sum(axis=0)
mean_row = round(df_year_month.mean(axis=0), 0)
median_row = round(df_year_month.median(axis=0), 0)
df_year_month.loc["total"] = total_row
df_year_month.loc["mean"] = mean_row
df_year_month.loc["median"] = median_row
df_year_month = df_year_month.astype(int)
print("SEC Administrative Proceedings by Year and Month")
df_year_month
SEC Administrative Proceedings by Year and Month
Out[8]:
releasedAtMonth | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | total | mean | median |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
releasedAtYear | |||||||||||||||
1995 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 11 | 9 | 7 | 28 | 2 | 0 |
1996 | 25 | 27 | 15 | 9 | 14 | 14 | 11 | 8 | 66 | 19 | 8 | 12 | 228 | 19 | 14 |
1997 | 17 | 19 | 15 | 17 | 19 | 13 | 10 | 16 | 75 | 6 | 5 | 25 | 237 | 20 | 16 |
1998 | 15 | 3 | 14 | 19 | 13 | 20 | 21 | 17 | 56 | 16 | 8 | 14 | 216 | 18 | 16 |
1999 | 19 | 15 | 20 | 24 | 27 | 11 | 10 | 49 | 61 | 14 | 15 | 15 | 280 | 23 | 17 |
2000 | 4 | 15 | 15 | 19 | 13 | 32 | 9 | 13 | 64 | 12 | 10 | 28 | 234 | 20 | 14 |
2001 | 8 | 18 | 16 | 10 | 31 | 28 | 14 | 16 | 39 | 15 | 16 | 18 | 229 | 19 | 16 |
2002 | 20 | 19 | 33 | 25 | 13 | 27 | 36 | 23 | 24 | 18 | 30 | 39 | 307 | 26 | 24 |
2003 | 23 | 20 | 21 | 20 | 39 | 26 | 30 | 35 | 49 | 33 | 19 | 23 | 338 | 28 | 24 |
2004 | 12 | 27 | 31 | 19 | 29 | 24 | 35 | 53 | 56 | 32 | 24 | 35 | 377 | 31 | 30 |
2005 | 18 | 30 | 15 | 3 | 1 | 2 | 1 | 2 | 15 | 2 | 2 | 8 | 99 | 8 | 2 |
2006 | 3 | 10 | 3 | 2 | 4 | 3 | 5 | 4 | 13 | 0 | 5 | 2 | 54 | 4 | 4 |
2007 | 4 | 1 | 1 | 2 | 3 | 3 | 4 | 7 | 12 | 0 | 1 | 2 | 40 | 3 | 2 |
2008 | 1 | 7 | 8 | 5 | 3 | 4 | 6 | 4 | 13 | 3 | 1 | 2 | 57 | 5 | 4 |
2009 | 7 | 4 | 3 | 14 | 19 | 44 | 36 | 38 | 27 | 33 | 28 | 31 | 284 | 24 | 28 |
2010 | 42 | 26 | 45 | 41 | 42 | 34 | 37 | 39 | 57 | 20 | 36 | 28 | 447 | 37 | 38 |
2011 | 34 | 62 | 32 | 36 | 47 | 39 | 43 | 29 | 48 | 35 | 36 | 31 | 472 | 39 | 36 |
2012 | 40 | 56 | 38 | 34 | 39 | 33 | 35 | 33 | 47 | 28 | 28 | 44 | 455 | 38 | 36 |
2013 | 32 | 24 | 38 | 50 | 38 | 18 | 36 | 44 | 97 | 53 | 28 | 33 | 491 | 41 | 37 |
2014 | 51 | 51 | 41 | 38 | 36 | 54 | 42 | 48 | 131 | 38 | 58 | 33 | 621 | 52 | 45 |
2015 | 35 | 47 | 47 | 55 | 37 | 101 | 45 | 55 | 104 | 53 | 38 | 57 | 674 | 56 | 50 |
2016 | 43 | 57 | 48 | 46 | 23 | 57 | 35 | 149 | 85 | 43 | 41 | 53 | 680 | 57 | 47 |
2017 | 58 | 42 | 29 | 46 | 45 | 47 | 28 | 52 | 95 | 28 | 18 | 33 | 521 | 43 | 44 |
2018 | 23 | 26 | 33 | 41 | 49 | 49 | 46 | 85 | 152 | 15 | 24 | 52 | 595 | 50 | 44 |
2019 | 13 | 21 | 105 | 24 | 31 | 33 | 67 | 104 | 158 | 17 | 14 | 20 | 607 | 51 | 28 |
2020 | 51 | 35 | 20 | 35 | 31 | 30 | 51 | 44 | 157 | 22 | 14 | 34 | 524 | 44 | 34 |
2021 | 14 | 18 | 16 | 17 | 76 | 27 | 59 | 63 | 98 | 13 | 20 | 28 | 449 | 37 | 24 |
2022 | 32 | 56 | 19 | 25 | 33 | 41 | 18 | 68 | 142 | 17 | 14 | 16 | 481 | 40 | 28 |
2023 | 13 | 12 | 17 | 43 | 71 | 27 | 26 | 57 | 177 | 11 | 14 | 15 | 483 | 40 | 22 |
2024 | 17 | 38 | 17 | 25 | 25 | 28 | 3 | 53 | 178 | 57 | 48 | 50 | 539 | 45 | 33 |
total | 674 | 786 | 755 | 744 | 851 | 869 | 799 | 1208 | 2297 | 664 | 612 | 788 | 11047 | 920 | 757 |
mean | 22 | 26 | 25 | 25 | 28 | 29 | 27 | 40 | 77 | 22 | 20 | 26 | 368 | 31 | 25 |
median | 18 | 22 | 20 | 24 | 30 | 28 | 29 | 38 | 62 | 18 | 17 | 28 | 412 | 34 | 26 |
fig, ax = plt.subplots(figsize=(5, 3))
df_year_month.loc[1996:2024]["total"].plot(
kind="line",
ax=ax,
marker="o",
markersize=3,
linewidth=1,
)
ax.set_title("Administrative Proceedings with \nUnique File Numbers per Year")
ax.set_xlabel("Year")
ax.set_ylabel("Number of Proceedings")
plt.tight_layout()
plt.grid(axis="x")
ax.set_axisbelow(True)
plt.show()
data:image/s3,"s3://crabby-images/49265/49265166e187af97b2c6fab71bb5f95ab79613c6" alt=""
# piv table for hasAgreedToSettlement
# col = hasAgreedToSettlement (true, false), index = year, values = count of cases
df_year_settlement = unique_proceedings.pivot_table(
index="releasedAtYear",
columns="hasAgreedToSettlement",
values="id",
aggfunc="count",
fill_value=0,
)
total_col = df_year_settlement.sum(axis=1)
mean_col = round(df_year_settlement.mean(axis=1), 0)
median_col = round(df_year_settlement.median(axis=1), 0)
total_row = df_year_settlement.sum(axis=0)
mean_row = round(df_year_settlement.mean(axis=0), 0)
median_row = round(df_year_settlement.median(axis=0), 0)
df_year_settlement.loc["total"] = total_row
df_year_settlement.loc["mean"] = mean_row
df_year_settlement.loc["median"] = median_row
df_year_settlement = df_year_settlement.astype(int)
df_year_settlement
Out[10]:
hasAgreedToSettlement | False | True |
---|---|---|
releasedAtYear | ||
1995 | 3 | 25 |
1996 | 56 | 172 |
1997 | 46 | 191 |
1998 | 45 | 171 |
1999 | 71 | 209 |
2000 | 41 | 193 |
2001 | 34 | 195 |
2002 | 56 | 251 |
2003 | 69 | 269 |
2004 | 95 | 282 |
2005 | 25 | 74 |
2006 | 13 | 41 |
2007 | 6 | 34 |
2008 | 13 | 44 |
2009 | 120 | 164 |
2010 | 192 | 255 |
2011 | 207 | 265 |
2012 | 219 | 236 |
2013 | 212 | 279 |
2014 | 245 | 376 |
2015 | 287 | 387 |
2016 | 201 | 479 |
2017 | 170 | 351 |
2018 | 174 | 421 |
2019 | 176 | 431 |
2020 | 178 | 346 |
2021 | 168 | 281 |
2022 | 209 | 272 |
2023 | 151 | 332 |
2024 | 131 | 408 |
total | 3613 | 7434 |
mean | 120 | 248 |
median | 126 | 260 |
# stacked bar chart for hasAgreedToSettlement
fig, ax = plt.subplots(figsize=(6, 4))
df_year_settlement.drop(["total", "mean", "median"]).plot(
kind="bar", stacked=False, ax=ax, color=["#1f77b4", "#ff7f0e"]
)
plt.title("SEC Administrative Proceedings by Year and Settlement")
plt.xlabel("Year")
plt.ylabel("Count")
plt.legend(["No Settlement", "Settlement"], loc="upper right")
plt.grid(axis="x")
ax.set_axisbelow(True)
plt.tight_layout()
plt.show()
data:image/s3,"s3://crabby-images/00224/00224ceec0cc080a14a3d01178a5f621cf3e64bb" alt=""
# piv table for hasAgreedToPayPenalty
# col = hasAgreedToPayPenalty (true, false), index = year, values = count of cases
df_year_penalty = unique_proceedings.pivot_table(
index="releasedAtYear",
columns="hasAgreedToPayPenalty",
values="id",
aggfunc="count",
fill_value=0,
)
total_col = df_year_penalty.sum(axis=1)
mean_col = round(df_year_penalty.mean(axis=1), 0)
median_col = round(df_year_penalty.median(axis=1), 0)
total_row = df_year_penalty.sum(axis=0)
mean_row = round(df_year_penalty.mean(axis=0), 0)
median_row = round(df_year_penalty.median(axis=0), 0)
df_year_penalty.loc["total"] = total_row
df_year_penalty.loc["mean"] = mean_row
df_year_penalty.loc["median"] = median_row
df_year_penalty = df_year_penalty.astype(int)
df_year_penalty
Out[12]:
hasAgreedToPayPenalty | False | True |
---|---|---|
releasedAtYear | ||
1995 | 22 | 6 |
1996 | 179 | 49 |
1997 | 184 | 53 |
1998 | 158 | 58 |
1999 | 212 | 68 |
2000 | 167 | 67 |
2001 | 170 | 59 |
2002 | 238 | 69 |
2003 | 270 | 68 |
2004 | 263 | 114 |
2005 | 63 | 36 |
2006 | 35 | 19 |
2007 | 24 | 16 |
2008 | 37 | 20 |
2009 | 210 | 74 |
2010 | 355 | 92 |
2011 | 374 | 98 |
2012 | 375 | 80 |
2013 | 369 | 122 |
2014 | 396 | 225 |
2015 | 427 | 247 |
2016 | 420 | 260 |
2017 | 313 | 208 |
2018 | 326 | 269 |
2019 | 421 | 186 |
2020 | 309 | 215 |
2021 | 264 | 185 |
2022 | 313 | 168 |
2023 | 249 | 234 |
2024 | 223 | 316 |
total | 7366 | 3681 |
mean | 246 | 123 |
median | 256 | 86 |
# stacked bar chart for hasAgreedToSettlement
fig, ax = plt.subplots(figsize=(6, 3.5))
df_year_penalty.drop(["total", "mean", "median"]).plot(
kind="bar", stacked=False, ax=ax, color=["#1f77b4", "#ff7f0e"]
)
plt.title("SEC Administrative Proceedings by Year and Penalty")
plt.xlabel("Year")
plt.ylabel("Count")
plt.legend(["Did not agree to penalty", "Agreed to penalty"], loc="upper left")
plt.grid(axis="x")
ax.set_axisbelow(True)
plt.tight_layout()
plt.show()
data:image/s3,"s3://crabby-images/6b7c7/6b7c77c832f53a9747c81715a05e28767a25c752" alt=""
all_penalties = []
file_numbers = {}
# iterate over all rows, extract penalties and append to all_penalties
for i, row in unique_proceedings.iterrows():
penaltyAmounts = row["penaltyAmounts"]
if isinstance(penaltyAmounts, list):
for penalty in penaltyAmounts:
if "penaltyAmount" in penalty:
# find entity with "name" == "imposedOn"
entity = list(filter(lambda x: x["name"] == penalty["imposedOn"], row["entities"]))
entity_type = entity[0]["type"] if entity else None
all_penalties.append(
{
# "caseCitation": row['caseCitation'],
"fileNumber": row["fileNumbers"],
"releaseNo": row["releaseNo"],
"releasedAt": row["releasedAt"],
"releasedAtYear": row["releasedAtYear"],
"releasedAtMonth": row["releasedAtMonth"],
"amount": penalty["penaltyAmount"],
"imposedOn": penalty["imposedOn"],
"imposedOnType": entity_type,
# "url": row["url"],
}
)
all_penalties_df = pd.DataFrame(all_penalties)
all_penalties_df["amount"] = all_penalties_df["amount"].astype(float)
# remove all doubles by caseCitation and imposedOn
all_penalties_df = all_penalties_df.drop_duplicates(
subset=["fileNumber", "imposedOn", "amount"], keep="last"
)
all_penalties_df
Out[14]:
fileNumber | releaseNo | releasedAt | releasedAtYear | releasedAtMonth | amount | imposedOn | imposedOnType | |
---|---|---|---|---|---|---|---|---|
0 | 3-22386 | [34-102060, AAER-4554] | 2024-12-31 12:19:45-05:00 | 2024 | 12 | 75000.00 | Accell Audit & Compliance, PA | company |
1 | 3-22384 | [34-102047, AAER-4553] | 2024-12-27 17:12:33-05:00 | 2024 | 12 | 50000.00 | Christopher Hiestand | individual |
2 | 3-22382 | [33-11349] | 2024-12-20 17:51:48-05:00 | 2024 | 12 | 73452756.00 | Tai Mo Shan Limited | company |
3 | 3-22382 | [33-11349] | 2024-12-20 17:51:48-05:00 | 2024 | 12 | 12916153.00 | Tai Mo Shan Limited | company |
4 | 3-22382 | [33-11349] | 2024-12-20 17:51:48-05:00 | 2024 | 12 | 36726378.00 | Tai Mo Shan Limited | company |
... | ... | ... | ... | ... | ... | ... | ... | ... |
5933 | 3-8865 | [34-36338] | 1995-10-05 09:11:20-04:00 | 1995 | 10 | 10000.00 | John Laurienti | individual |
5934 | 3-8865 | [34-36338] | 1995-10-05 09:11:20-04:00 | 1995 | 10 | 50000.00 | Dickinson & Co. | company |
5935 | 3-8864 | [34-36334] | 1995-10-04 09:11:20-04:00 | 1995 | 10 | 11402.50 | Lewco Securities Corporation | company |
5936 | 3-8863 | [34-36333] | 1995-10-04 09:11:20-04:00 | 1995 | 10 | 107000.00 | French American Banking Corporation | company |
5937 | 3-8863 | [34-36333] | 1995-10-04 09:11:20-04:00 | 1995 | 10 | 15918.84 | French American Banking Corporation | company |
5909 rows × 8 columns
# aggregate amount by year
penalties_year = all_penalties_df.groupby("releasedAtYear")["amount"].sum()
penalties_year = penalties_year.astype(int)
penalties_year = pd.DataFrame(penalties_year)
penalties_year["amount"] = round(penalties_year["amount"] / 1_000_000, 2)
print("Total Penalties in Million USD by Year")
penalties_year
Total Penalties in Million USD by Year
Out[15]:
amount | |
---|---|
releasedAtYear | |
1995 | 5.22 |
1996 | 30.19 |
1997 | 20.13 |
1998 | 20.49 |
1999 | 54.28 |
2000 | 131.97 |
2001 | 635.92 |
2002 | 60.40 |
2003 | 431.53 |
2004 | 1602.46 |
2005 | 752.19 |
2006 | 317.54 |
2007 | 31.34 |
2008 | 681.46 |
2009 | 243.64 |
2010 | 1566.58 |
2011 | 1011.27 |
2012 | 319.56 |
2013 | 889.66 |
2014 | 4464.57 |
2015 | 1254.10 |
2016 | 1787.28 |
2017 | 887.40 |
2018 | 1845.09 |
2019 | 944.43 |
2020 | 3181.67 |
2021 | 792.16 |
2022 | 1557.67 |
2023 | 1478.46 |
2024 | 2158.77 |
fig, ax = plt.subplots(figsize=(5, 3))
penalties_year["amount"].plot(
kind="line",
ax=ax,
marker="o",
markersize=3,
linewidth=1,
)
ax.get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "{:,}".format(int(x))))
ax.set_title("Penalties per Year")
ax.set_xlabel("Year")
ax.set_ylabel("Penalty Amount\nin Million USD")
plt.tight_layout()
plt.grid(axis="x")
ax.set_axisbelow(True)
plt.show()
data:image/s3,"s3://crabby-images/61be1/61be138a93650dbdd974eee36dd49f5fb712cf1a" alt=""
# sort all penalties by amount, show top 10
top_10_penalties = all_penalties_df.sort_values("amount", ascending=False).head(10)
top_10_penalties["amount"] = round(top_10_penalties["amount"] / 1_000_000, 2)
top_10_penalties["amount"] = top_10_penalties["amount"].map("{:,.1f}".format)
print("Top 10 SEC Penalties in Million USD between 1995 and 2024")
top_10_penalties[['amount', 'imposedOn', 'releasedAt', 'releaseNo']]
Top 10 SEC Penalties in Million USD between 1995 and 2024
Out[17]:
amount | imposedOn | releasedAt | releaseNo | |
---|---|---|---|---|
3808 | 900.0 | S.A.C. Capital Advisors, LLC | 2014-06-27 09:21:21-04:00 | [IA-3864] |
3809 | 900.0 | CR Intrinsic Investors, LLC | 2014-06-27 09:21:21-04:00 | [IA-3864] |
3810 | 900.0 | Sigma Capital Management, LLC | 2014-06-27 09:21:21-04:00 | [IA-3864] |
3807 | 900.0 | S.A.C. Capital Advisors, L.P. | 2014-06-27 09:21:21-04:00 | [IA-3864] |
2027 | 853.2 | Petróleo Brasileiro S.A. – Petrobras | 2018-09-27 09:24:56-04:00 | [33-10561, 34-84205, AAER-3989] |
944 | 675.0 | Allianz Global Investors U.S. LLC | 2022-05-17 09:28:19-04:00 | [34-94927, IA-6027] |
1345 | 606.3 | The Goldman Sachs Group, Inc. | 2020-10-22 09:26:36-04:00 | [34-90243, AAER-4191] |
5341 | 606.0 | Republic New York Securities Corp. | 2001-12-17 08:12:59-05:00 | [34-45157] |
4329 | 583.5 | Barry J. Minkow | 2011-11-22 08:18:30-05:00 | [IA-3320] |
1449 | 556.7 | Herbalife Nutrition Ltd. | 2020-08-28 09:26:48-04:00 | [34-89704, AAER-4165] |
# create piv table across years and imposedOnType, with values = sum of penalties
penalties_entity_type = all_penalties_df.pivot_table(
index="releasedAtYear",
columns="imposedOnType",
values="amount",
aggfunc="sum",
fill_value=0,
)
total_col = penalties_entity_type.sum(axis=1)
mean_col = round(penalties_entity_type.mean(axis=1), 0)
median_col = round(penalties_entity_type.median(axis=1), 0)
total_row = penalties_entity_type.sum(axis=0)
mean_row = round(penalties_entity_type.mean(axis=0), 0)
median_row = round(penalties_entity_type.median(axis=0), 0)
penalties_entity_type.loc["total"] = total_row
penalties_entity_type.loc["mean"] = mean_row
penalties_entity_type.loc["median"] = median_row
# format to million and , notation
penalties_entity_type = round(penalties_entity_type / 1_000_000, 2)
penalties_entity_type = penalties_entity_type.map("{:,.1f}".format)
print("SEC Penalties in Million USD by Year and Entity Type")
penalties_entity_type = penalties_entity_type[['company', 'individual']]
penalties_entity_type
SEC Penalties in Million USD by Year and Entity Type
Out[18]:
imposedOnType | company | individual |
---|---|---|
releasedAtYear | ||
1995 | 5.2 | 0.0 |
1996 | 21.2 | 8.9 |
1997 | 11.6 | 5.8 |
1998 | 13.4 | 3.3 |
1999 | 51.3 | 1.9 |
2000 | 119.7 | 12.2 |
2001 | 607.9 | 28.0 |
2002 | 13.4 | 46.9 |
2003 | 375.4 | 56.0 |
2004 | 1,002.4 | 206.5 |
2005 | 231.4 | 5.6 |
2006 | 297.7 | 2.6 |
2007 | 28.3 | 3.1 |
2008 | 520.7 | 101.1 |
2009 | 106.0 | 12.2 |
2010 | 1,120.7 | 39.2 |
2011 | 374.0 | 606.1 |
2012 | 161.9 | 141.3 |
2013 | 741.8 | 145.2 |
2014 | 4,128.8 | 51.9 |
2015 | 913.4 | 83.5 |
2016 | 1,713.4 | 39.7 |
2017 | 790.6 | 38.1 |
2018 | 1,705.0 | 46.1 |
2019 | 802.8 | 78.2 |
2020 | 3,121.8 | 23.0 |
2021 | 764.8 | 23.8 |
2022 | 1,333.2 | 19.0 |
2023 | 1,041.8 | 18.5 |
2024 | 1,904.9 | 29.7 |
total | 24,024.4 | 1,877.4 |
mean | 800.8 | 62.6 |
median | 564.3 | 28.8 |
fig, ax = plt.subplots(figsize=(5, 3))
data_to_plot = penalties_entity_type.loc[1995:2024].map(
lambda x: float(x.replace(",", ""))
)
data_to_plot.plot(kind="bar", stacked=False, ax=ax, color=["#1f77b4", "#ff7f0e"])
plt.title("Proceedings Penalties by Year and Entity Type")
plt.xlabel("Year")
plt.ylabel("Penalty Amount\nin Million USD")
plt.legend(["Company", "Individual"], loc="upper right")
plt.grid(axis="x")
ax.set_axisbelow(True)
plt.tight_layout()
plt.show()
data:image/s3,"s3://crabby-images/22aa1/22aa1809ab11242c425d1c1ededa62020378b26d" alt=""
all_tags = []
for i, row in unique_proceedings.iterrows():
tags = row["tags"]
if isinstance(tags, list):
all_tags.extend(tags)
all_tags = pd.Series(all_tags)
all_tags = all_tags.value_counts().reset_index()
all_tags.columns = ["tag", "count"]
print("Top 10 Tags in SEC Administrative Proceedings")
all_tags.head(10)
Top 10 Tags in SEC Administrative Proceedings
Out[20]:
tag | count | |
---|---|---|
0 | securities fraud | 2115 |
1 | delinquent filings | 1676 |
2 | disclosure fraud | 1336 |
3 | fraud | 894 |
4 | accounting fraud | 743 |
5 | unregistered securities | 634 |
6 | securities violation | 532 |
7 | investment adviser | 499 |
8 | insider trading | 464 |
9 | misrepresentation | 395 |
# count all unique requestedRelief
all_requested_relief = []
for i, row in unique_proceedings.iterrows():
requestedRelief = row["requestedRelief"]
if isinstance(requestedRelief, list):
all_requested_relief.extend(requestedRelief)
all_requested_relief = pd.DataFrame(all_requested_relief, columns=["requestedRelief"])
all_requested_relief["requestedRelief"] = (
all_requested_relief["requestedRelief"]
.str.replace("cease-and-desist order", "cease and desist order")
.replace("permanent injunctions", "permanent injunction")
)
all_requested_relief = all_requested_relief.value_counts().reset_index()
print("Top 10 Requested Reliefs in SEC Administrative Proceedings")
all_requested_relief.head(10)
Top 10 Requested Reliefs in SEC Administrative Proceedings
Out[21]:
requestedRelief | count | |
---|---|---|
0 | civil penalties | 2591 |
1 | disgorgement of profits | 2045 |
2 | cease and desist order | 1709 |
3 | permanent injunction | 913 |
4 | suspend or revoke the registration of each cla... | 635 |
5 | censure | 466 |
6 | civil money penalty | 378 |
7 | suspend for a period not exceeding twelve months | 287 |
8 | revoke the registration of each class of secur... | 285 |
9 | disgorgement | 234 |
# count all unique violatedSections
all_violated_sections = []
for i, row in unique_proceedings.iterrows():
violatedSections = row["violatedSections"]
if isinstance(violatedSections, list):
all_violated_sections.extend(violatedSections)
all_violated_sections = pd.Series(all_violated_sections)
all_violated_sections = all_violated_sections.value_counts().reset_index()
all_violated_sections.columns = ["violatedSections", "count"]
print("Top 10 Violated Securities Laws in SEC Administrative Proceedings")
all_violated_sections.head(10)
Top 10 Violated Securities Laws in SEC Administrative Proceedings
Out[22]:
violatedSections | count | |
---|---|---|
0 | Rule 10b-5 | 2547 |
1 | Exchange Act Section 13(a) | 1701 |
2 | Section 10(b) of the Exchange Act | 1445 |
3 | Rules 13a-1 and 13a-13 | 1251 |
4 | Section 17(a) of the Securities Act of 1933 | 1137 |
5 | Section 10(b) of the Securities Exchange Act o... | 699 |
6 | Section 17(a) of the Securities Act | 570 |
7 | Rules 13a-1 | 528 |
8 | Section 13(a) of the Exchange Act | 492 |
9 | Rules 13a-13 | 472 |