How to Download Exhibit 99 Files from Form 8-K Filings
This tutorial guides you through the process of finding and downloading Exhibit 99 files attached to EDGAR Form 8-K filings. It is split into two main parts: locating the relevant filings and downloading the corresponding exhibit files.
Tutorial Overview
Finding Exhibit 99 Files: We use the EDGAR Filing Query API to search for Form 8-K filings that include Exhibit 99 files. The API provides access to the entire EDGAR filings database, from 1993 to the present. We will retrieve a list of URLs that point directly to these Exhibit 99 files.
Downloading the Exhibit 99 Files: Next, we utilize the Download API to download the files. This API supports downloading up to 40 files in parallel directly from our servers, and is not restricted by the SEC's rate limits. The files will be saved and organized into folders based on year, month, and filer for easy access.
What is Exhibit 99?
Exhibit 99 is classified as "additional exhibits" under § 229.601 (Item 601) Exhibits of Regulation S-K. These exhibits contain supplementary information that, while not filed directly as an EDGAR filing, is attached to a filing for further context. Examples include:
- Investor presentations
- Press releases
- Supplier agreements
- Reports from independent accounting firms
Each triggering event category for a Form 8-K filing can include an Exhibit 99 attachment. For example, a company might file a press release as Exhibit 99 under Item 5.02 when announcing leadership changes.
Example Exhibit 99 Files
Setup
We start by installing the sec-api Python package and importing the necessary libraries, such as pandas. Be sure to replace YOUR_API_KEY with your personal API key, which you can find in your profile on sec-api.io/signup.
!pip install -q sec-api pandas
API_KEY = "YOUR_API_KEY"
import pandas as pd
from sec_api import QueryApi, RenderApi
queryApi = QueryApi(API_KEY)
renderApi = RenderApi(API_KEY)
Find URLs of Exhibit 99 Files
We will use the Filing Query API to retrieve URLs for Exhibit 99 files attached to Form 8-K filings. For reproducibility, we'll focus on the date range from January 1, 2020, to December 31, 2020. This ensures that running the code at any time will yield the same results as shown in this tutorial.
The Query API allows searches based on various parameters such as form type, filing date, and company name. You can explore all available search options in the API documentation.
To search for Form 8-K filings with Exhibit 99 files filed during 2020, we will use the following query:
formType:"8-K" AND documentFormatFiles.type:*99* AND filedAt:[2020-01-01 TO 2020-12-31]
The query uses Lucene syntax, where field:value specifies the field to search in and the value that field should contain. For example, formType:"8-K" searches for records where the formType field exactly matches "8-K". Multiple conditions can be combined using logical operators such as AND, OR, and NOT. For a more detailed explanation of Lucene syntax, refer to this guide.
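As a rough illustration of how such a query string is put together, the sketch below joins individual field:value conditions with AND. The helper name build_lucene_query is hypothetical and not part of the sec-api package; it only concatenates strings and sends no request.

```python
def build_lucene_query(conditions):
    """Join individual Lucene conditions with the AND operator."""
    # Skip empty entries so that optional filters can be passed as "".
    return " AND ".join(c for c in conditions if c)


lucene_query = build_lucene_query(
    [
        'formType:"8-K"',                      # exact match on the form type
        "documentFormatFiles.type:*99*",       # wildcard match on the file type
        "filedAt:[2020-01-01 TO 2020-12-31]",  # inclusive date range
    ]
)
print(lucene_query)
```

Keeping the conditions in a list makes it straightforward to append or drop a filter without editing one long string.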
Now, let's fetch the 10 most recent Form 8-K filings with Exhibit 99 from 2020 using a simple example.
lucene_query = 'formType:"8-K" AND documentFormatFiles.type:*99* AND filedAt:[2020-01-01 TO 2020-12-31]'
query = {
    "query": lucene_query,
    # defines the start record to fetch. Used for pagination.
    "from": "0",
    # defines how many records to return. Maximum is 50.
    "size": "10",
    # sort results by filedAt, starting with the most recent filings.
    "sort": [{"filedAt": {"order": "desc"}}],
}
response = queryApi.get_filings(query)
print(f"Number of Form 8-K filings with exhibit 99 in 2020: {response['total']}")
Number of Form 8-K filings with exhibit 99 in 2020: {'value': 10000, 'relation': 'gte'}
The Query API identified at least 10,000 Form 8-K filings with Exhibit 99 attachments filed in 2020, as indicated by the gte (greater than or equal to) value in the relation field of the response.
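A minimal sketch of how the total object could be turned into a readable message. The helper describe_total is hypothetical, and treating every relation other than gte as an exact count is an assumption, since only gte appears in the output above.

```python
def describe_total(total):
    """Turn the Query API `total` object into a readable string."""
    value = total["value"]
    # "gte" means the true count is at least `value`.
    if total.get("relation") == "gte":
        return f"at least {value:,}"
    # Assumption: any other relation is treated as an exact count.
    return f"exactly {value:,}"


print(describe_total({"value": 10000, "relation": "gte"}))
```

A check like this helps detect when a result set has hit the cap and the search needs to be narrowed.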
For use cases requiring Exhibit 99 files tied to a specific event category, such as Item 5.02 (Departure of Directors or Certain Officers), you can refine the query by adding an items filter like this:
formType:"8-K" AND documentFormatFiles.type:*99* AND filedAt:[2020-01-01 TO 2020-12-31] AND items:"5.02"
The API response includes two main fields:
- total: the total number of filings matching the query.
- filings: a list of filings that meet the search criteria.
Each filing object contains over 20 metadata fields, including details such as the unique accession number, the CIK, ticker and name of the filer, filing date, information about attached exhibits and files, and more.
Next, we'll convert the JSON response from the Query API into a DataFrame and display the first 5 rows for review.
filings = pd.DataFrame(response["filings"])
filings[
    [
        "accessionNo",
        "filedAt",
        "companyName",
        "cik",
        "ticker",
        "items",
        "documentFormatFiles",
    ]
].head()
accessionNo | filedAt | companyName | cik | ticker | items | documentFormatFiles | |
---|---|---|---|---|---|---|---|
0 | 0001654954-20-014094 | 2020-12-31T17:20:28-05:00 | TOMI Environmental Solutions, Inc. | 314227 | TOMZ | [Item 5.07: Submission of Matters to a Vote of... | [{'sequence': '1', 'size': '49284', 'documentU... |
1 | 0001104659-20-141038 | 2020-12-31T17:10:56-05:00 | GOLD RESOURCE CORP | 1160791 | GORO | [Item 5.02: Departure of Directors or Certain ... | [{'sequence': '1', 'size': '40187', 'documentU... |
2 | 0001104659-20-141037 | 2020-12-31T17:08:55-05:00 | McEwen Mining Inc. | 314203 | MUX | [Item 3.02: Unregistered Sales of Equity Secur... | [{'sequence': '1', 'size': '27630', 'documentU... |
3 | 0001640334-20-003199 | 2020-12-31T17:07:50-05:00 | Lexaria Bioscience Corp. | 1348362 | LEXX | [Item 7.01: Regulation FD Disclosure, Item 9.0... | [{'sequence': '1', 'size': '16594', 'documentU... |
4 | 0001493152-20-024694 | 2020-12-31T17:04:37-05:00 | MONMOUTH REAL ESTATE INVESTMENT CORP | 67625 | MNR.PC | [Item 7.01: Regulation FD Disclosure, Item 8.0... | [{'sequence': '1', 'size': '41725', 'documentU... |
We are primarily interested in the documentUrl field within the documentFormatFiles array of a filing. This field contains the URL of an attached file, including Exhibit 99 files. Each object inside the documentFormatFiles array includes the following fields:
- sequence (string, optional): The sequence number of the file attached to the filing, e.g., "1".
- description (string, optional): A brief description of the file, e.g., "EXHIBIT 31.1".
- documentUrl (string): The URL of the file hosted on SEC.gov.
- type (string, optional): The type of the file, e.g., "EX-32.1", "GRAPHIC", or "10-Q".
- size (string, optional): The file size in bytes, e.g., "6627216".
For example, here are the first three objects in the documentFormatFiles array of the first filing object:
filings["documentFormatFiles"][0][:3]
[{'sequence': '1',
'size': '49284',
'documentUrl': 'https://www.sec.gov/Archives/edgar/data/314227/000165495420014094/tomi_8k.htm',
'description': 'CURRENT REPORT',
'type': '8-K'},
{'sequence': '2',
'size': '24539',
'documentUrl': 'https://www.sec.gov/Archives/edgar/data/314227/000165495420014094/tomi_ex991.htm',
'description': 'PRESENTATION',
'type': 'EX-99.1'},
{'sequence': '3',
'size': '35390',
'documentUrl': 'https://www.sec.gov/Archives/edgar/data/314227/000165495420014094/tomi_ex991000.jpg',
'description': 'IMAGE',
'type': 'GRAPHIC'}]
Note: A single filing may include multiple Exhibit 99 files, resulting in multiple EX-99 types within the documentFormatFiles array.
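To make the note concrete, the following sketch counts how often each file type occurs in one filing's file list and derives the number of Exhibit 99 files from that. The helper count_file_types and the fourth sample entry are hypothetical; the first three entries are shortened from the response shown above. Because type is optional, the sketch reads it with a default instead of indexing directly.

```python
from collections import Counter


def count_file_types(document_format_files):
    """Count how often each file type occurs in one filing's file list."""
    # `type` is optional, so fall back to a placeholder instead of raising KeyError.
    return Counter(f.get("type", "UNKNOWN") for f in document_format_files)


sample_files = [
    {"sequence": "1", "type": "8-K"},
    {"sequence": "2", "type": "EX-99.1"},
    {"sequence": "3", "type": "GRAPHIC"},
    {"sequence": "4"},  # hypothetical entry without a type
]

counts = count_file_types(sample_files)
# Sum the counts of all types that start with the EX-99 prefix.
ex_99_count = sum(n for t, n in counts.items() if t.startswith("EX-99"))
print(counts, ex_99_count)
```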
Let's extract the URLs of all Exhibit 99 files from the response and create a new DataFrame that includes the exhibit URLs along with other relevant metadata fields, such as the filing date, CIK and ticker of the filer, and accession number.
def extract_ex_99_urls(row):
    # collect one record per Exhibit 99 file attached to the filing
    urls = []
    for file in row["documentFormatFiles"]:
        # "type" is optional, so default to an empty string
        if "EX-99" in file.get("type", ""):
            urls.append(
                {
                    "filedAt": row["filedAt"],
                    "accessionNo": row["accessionNo"],
                    "cik": row["cik"],
                    "ticker": row["ticker"],
                    "type": file["type"],
                    "exhibit99Url": file["documentUrl"],
                }
            )
    return urls
exhibit_99_urls = filings.apply(lambda row: extract_ex_99_urls(row), axis=1)
exhibit_99_urls = pd.DataFrame(exhibit_99_urls.explode().to_list())
exhibit_99_urls
filedAt | accessionNo | cik | ticker | type | exhibit99Url | |
---|---|---|---|---|---|---|
0 | 2020-12-31T17:20:28-05:00 | 0001654954-20-014094 | 314227 | TOMZ | EX-99.1 | https://www.sec.gov/Archives/edgar/data/314227... |
1 | 2020-12-31T17:10:56-05:00 | 0001104659-20-141038 | 1160791 | GORO | EX-99.1 | https://www.sec.gov/Archives/edgar/data/116079... |
2 | 2020-12-31T17:08:55-05:00 | 0001104659-20-141037 | 314203 | MUX | EX-99.1 | https://www.sec.gov/Archives/edgar/data/314203... |
3 | 2020-12-31T17:08:55-05:00 | 0001104659-20-141037 | 314203 | MUX | EX-99.2 | https://www.sec.gov/Archives/edgar/data/314203... |
4 | 2020-12-31T17:07:50-05:00 | 0001640334-20-003199 | 1348362 | LEXX | EX-99.1 | https://www.sec.gov/Archives/edgar/data/134836... |
5 | 2020-12-31T17:04:37-05:00 | 0001493152-20-024694 | 67625 | MNR.PC | EX-99.1 | https://www.sec.gov/Archives/edgar/data/67625/... |
6 | 2020-12-31T17:01:35-05:00 | 0001469709-20-000101 | 1647705 | GBBT | EX-99.1 | https://www.sec.gov/Archives/edgar/data/164770... |
7 | 2020-12-31T17:01:35-05:00 | 0001469709-20-000101 | 1647705 | GBBT | EX-99.2 | https://www.sec.gov/Archives/edgar/data/164770... |
8 | 2020-12-31T17:00:11-05:00 | 0001104659-20-141025 | 1815903 | PTPI | EX-99.1 | https://www.sec.gov/Archives/edgar/data/181590... |
9 | 2020-12-31T17:00:11-05:00 | 0001104659-20-141025 | 1815903 | PTPI | EX-99.2 | https://www.sec.gov/Archives/edgar/data/181590... |
10 | 2020-12-31T17:00:10-05:00 | 0001477932-20-007599 | 1281984 | WDLF | EX-99.1 | https://www.sec.gov/Archives/edgar/data/128198... |
11 | 2020-12-31T16:55:00-05:00 | 0001104659-20-141020 | 837852 | IDEX | EX-99.1 | https://www.sec.gov/Archives/edgar/data/837852... |
12 | 2020-12-31T16:43:06-05:00 | 0001580695-20-000463 | 1372183 | NXTP | EX-99.1 | https://www.sec.gov/Archives/edgar/data/137218... |
13 | 2020-12-31T16:43:06-05:00 | 0001580695-20-000463 | 1372183 | NXTP | EX-99.2 | https://www.sec.gov/Archives/edgar/data/137218... |
Now that we've successfully extracted the Exhibit 99 URLs for 10 filings, let's implement a function to retrieve Exhibit 99 URLs for all Form 8-K filings in 2020. The Query API limits results to 10,000 entries per search expression. Since our initial search found more than 10,000 Form 8-K filings with Exhibit 99 files in 2020, we need to refine our search to reduce the result set. A simple solution is to query filings on a month-by-month basis, ensuring that we capture all Exhibit 99 files by running the query for each month separately.
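A minimal sketch of the month-by-month split, assuming one filedAt range per calendar month. The helper month_date_filters is a hypothetical name; it uses the standard library's calendar module to end each range on the month's actual last day and does not call the Query API.

```python
import calendar


def month_date_filters(year):
    """Return one filedAt range filter per month of `year`."""
    filters = []
    for month in range(1, 13):
        # monthrange returns (weekday of the first day, number of days in the month).
        last_day = calendar.monthrange(year, month)[1]
        filters.append(
            f"filedAt:[{year}-{month:02d}-01 TO {year}-{month:02d}-{last_day:02d}]"
        )
    return filters


for date_filter in month_date_filters(2020)[:3]:
    print(date_filter)
```

If a single month ever exceeds the 10,000-result cap, the same idea can be applied to shorter intervals such as weeks.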
To speed up the process, we'll use the pandarallel package to parallelize the pandas apply function across multiple processes. pandarallel extends pandas DataFrame objects with a parallel_apply method, which behaves similarly to the standard apply method but processes each row in parallel across multiple threads. The number of threads (or processes) is controlled by the nb_workers parameter. In our case, we set nb_workers=10, instructing pandarallel to spawn ten threads. Once parallel_apply is invoked, pandarallel efficiently distributes the rows among the threads and waits for all threads to complete before returning the results.
In this scenario, the fetch_exhibit_99_urls function is applied to each row in the queries DataFrame via parallel_apply, enabling us to execute 10 Query API requests concurrently to retrieve the Exhibit 99 URLs for each query.
Note: While we use the terms processes, workers, and threads interchangeably in this tutorial, they have distinct technical meanings. For simplicity, we treat them as synonymous in this context.
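The fan-out pattern described above can be illustrated without pandarallel. The sketch below swaps in the standard library's ThreadPoolExecutor (threads rather than pandarallel's worker processes) and a placeholder function fake_fetch in place of a real Query API request; both function names and the total of 120 records are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def page_offsets(total, page_size=50):
    """Return the `from` offsets needed to page through `total` records."""
    return list(range(0, total, page_size))


def fake_fetch(offset):
    """Stand-in for one Query API request; returns the record numbers of a page."""
    return list(range(offset, min(offset + 50, 120)))


offsets = page_offsets(120)
with ThreadPoolExecutor(max_workers=10) as pool:
    # map() runs the calls concurrently and yields results in input order.
    pages = list(pool.map(fake_fetch, offsets))

# Flatten the list of pages into a single list of records.
records = [record for page in pages for record in page]
print(len(records))
```

Each page request is independent of the others, which is what makes this kind of workload easy to distribute across workers.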
!pip install -q pandarallel ipywidgets
from pandarallel import pandarallel
pandarallel.initialize(nb_workers=10, progress_bar=False)
INFO: Pandarallel will run on 10 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
"""
fetch_exhibit_99_urls(query, retry_counter=0)
Fetches the exhibit 99 URLs for a given Query API query.
Parameters:
- query (dict): The Query API query to be used to fetch filing metadata and exhibit 99 URLs.
- retry_counter (int): The number of times the function has retried to fetch data.
Returns:
- list: A list of exhibit 99 URLs for the filings returned by the query.
"""
def fetch_exhibit_99_urls(query, retry_counter=0):
    try:
        response = queryApi.get_filings(query)
    except Exception:
        # retry up to three times before giving up on this query
        if retry_counter < 3:
            print(f"Retrying... {retry_counter}")
            return fetch_exhibit_99_urls(query, retry_counter + 1)
        else:
            print(f"Failed to fetch data after {retry_counter} retries")
            return []

    if len(response["filings"]) == 0:
        return []

    filings = pd.DataFrame(response["filings"])
    return filings.apply(lambda row: extract_ex_99_urls(row), axis=1).explode().to_list()
"""
fetch_all_exhibit_99_urls(start_year, end_year)
Fetches all exhibit 99 URLs of Form 8-K filings for the specified range of years.
Parameters:
- start_year (int): The start year of the range (inclusive).
- end_year (int): The end year (inclusive).
Returns:
- DataFrame: One row per exhibit 99 URL, sorted by filing date in ascending order.
"""
def fetch_all_exhibit_99_urls(start_year, end_year):
    if start_year > end_year:
        raise ValueError("start_year must be less than or equal to end_year")

    all_exhibit_99_urls = []

    for year in range(start_year, end_year + 1):
        print(f"Fetching exhibit 99 URLs for year {year}")

        for month in range(1, 13):
            print(f"  Processing month: {month}")
            queries = []
            query_from = 0
            form_type_filter = 'formType:"8-K"'
            file_filter = "documentFormatFiles.type:*99*"
            date_filter = f"filedAt:[{year}-{month:02d}-01 TO {year}-{month:02d}-31]"
            lucene_query = f"{form_type_filter} AND {file_filter} AND {date_filter}"

            query = {
                "query": lucene_query,
                "from": query_from,
                "size": "50",
                "sort": [{"filedAt": {"order": "desc"}}],
            }

            # first request only determines how many filings match in this month
            response = queryApi.get_filings(query)
            total_filings = response["total"]["value"]
            print(f"  Found {total_filings} filings in {year}-{month:02d}")

            if total_filings == 0:
                continue

            # create one query per page, with from values of 0, 50, 100, etc.
            for i in range(0, total_filings, 50):
                queries.append(
                    {
                        "query": {
                            "query": lucene_query,
                            "from": i,
                            "size": "50",
                            "sort": [{"filedAt": {"order": "desc"}}],
                        }
                    }
                )

            queries = pd.DataFrame(queries)
            # use pandarallel to parallelize the fetching of exhibit 99 URLs
            exhibit_99_urls = queries["query"].parallel_apply(fetch_exhibit_99_urls)
            all_exhibit_99_urls.extend(exhibit_99_urls)

    # flatten, filter, and sort the exhibit 99 URLs
    all_exhibit_99_urls_flat = [item for sublist in all_exhibit_99_urls for item in sublist]
    # drop placeholder values produced for filings without an exhibit 99 entry
    all_exhibit_99_urls_flat = [item for item in all_exhibit_99_urls_flat if isinstance(item, dict)]
    all_exhibit_99_urls_df = pd.DataFrame(all_exhibit_99_urls_flat)
    all_exhibit_99_urls_df["filedAt"] = pd.to_datetime(all_exhibit_99_urls_df["filedAt"], utc=True)
    all_exhibit_99_urls_df["filedAt"] = all_exhibit_99_urls_df["filedAt"].dt.tz_convert("America/New_York")
    all_exhibit_99_urls_df = all_exhibit_99_urls_df.sort_values("filedAt", ascending=True)

    return all_exhibit_99_urls_df
exhibit_99_urls_2020 = fetch_all_exhibit_99_urls(2020, 2020)
Fetching exhibit 99 URLs for year 2020
Processing month: 1
Found 3068 filings in 2020-01
Processing month: 2
Found 4037 filings in 2020-02
Processing month: 3
Found 3762 filings in 2020-03
Processing month: 4
Found 4023 filings in 2020-04
Processing month: 5
Found 4768 filings in 2020-05
Processing month: 6
Found 2673 filings in 2020-06
Processing month: 7
Found 3688 filings in 2020-07
Processing month: 8
Found 4376 filings in 2020-08
Processing month: 9
Found 2489 filings in 2020-09
Processing month: 10
Found 3899 filings in 2020-10
Processing month: 11
Found 4426 filings in 2020-11
Processing month: 12
Found 2762 filings in 2020-12
print(f"{len(exhibit_99_urls_2020):,} exhibit 99 URLs fetched for 2020")
exhibit_99_urls_2020
53,166 exhibit 99 URLs fetched for 2020
filedAt | accessionNo | cik | ticker | type | exhibit99Url | |
---|---|---|---|---|---|---|
3726 | 2020-01-02 06:03:38-05:00 | 0001104659-20-000041 | 1526113 | GNL | EX-99.1 | https://www.sec.gov/Archives/edgar/data/152611... |
3725 | 2020-01-02 06:04:33-05:00 | 0001104659-20-000050 | 1568162 | RTL | EX-99.1 | https://www.sec.gov/Archives/edgar/data/156816... |
3724 | 2020-01-02 06:38:00-05:00 | 0000052795-20-000004 | 52795 | AXE | EX-99.2 | https://www.sec.gov/Archives/edgar/data/52795/... |
3723 | 2020-01-02 06:38:00-05:00 | 0000052795-20-000004 | 52795 | AXE | EX-99.1 | https://www.sec.gov/Archives/edgar/data/52795/... |
3722 | 2020-01-02 06:41:40-05:00 | 0001193125-20-000100 | 1337553 | AERI | EX-99.1 | https://www.sec.gov/Archives/edgar/data/133755... |
... | ... | ... | ... | ... | ... | ... |
49778 | 2020-12-31 17:07:50-05:00 | 0001640334-20-003199 | 1348362 | LEXX | EX-99.1 | https://www.sec.gov/Archives/edgar/data/134836... |
49777 | 2020-12-31 17:08:55-05:00 | 0001104659-20-141037 | 314203 | MUX | EX-99.2 | https://www.sec.gov/Archives/edgar/data/314203... |
49776 | 2020-12-31 17:08:55-05:00 | 0001104659-20-141037 | 314203 | MUX | EX-99.1 | https://www.sec.gov/Archives/edgar/data/314203... |
49775 | 2020-12-31 17:10:56-05:00 | 0001104659-20-141038 | 1160791 | GORO | EX-99.1 | https://www.sec.gov/Archives/edgar/data/116079... |
49774 | 2020-12-31 17:20:28-05:00 | 0001654954-20-014094 | 314227 | TOMZ | EX-99.1 | https://www.sec.gov/Archives/edgar/data/314227... |
53166 rows × 6 columns
Download Exhibit 99 Files
With the list of Exhibit 99 URLs created, we can now download the files in parallel using the Download API. The Download API supports downloading up to 40 EDGAR filings or exhibit files simultaneously.
To optimize this process, we use pandarallel again to spawn 10 threads, enabling parallel downloads. Additionally, we'll activate the progress bar feature to monitor the download progress.
Note: When using standard for or while loops, downloads occur sequentially, meaning each download waits for the previous one to complete. By using pandarallel or other multiprocessing techniques, multiple downloads can run concurrently, each in its own thread or process, greatly speeding up the process.
We organize the downloaded Exhibit 99 files using the following directory structure:
ex-99-files/
└── 2020/
├── 01/
│ └── CIK-1/
│ └── accessionNo_ex99-filename.html
├── 02/
└── ...
└── 12/
This structure ensures that files are sorted by year, month, and company (CIK), making them easier to manage and reference.
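As a small sketch of how a target path following this layout could be derived from a filing's metadata, the hypothetical helper build_file_path below combines the filing date, CIK, accession number, and the last segment of the exhibit URL. It only builds the path and does not create directories or download anything; the sample values are taken from the first filing shown earlier.

```python
from datetime import datetime
from pathlib import PurePosixPath


def build_file_path(filed_at, cik, accession_no, exhibit_url, root="ex-99-files"):
    """Build <root>/<year>/<month>/<cik>/<accessionNo>_<filename>."""
    filename = exhibit_url.split("/")[-1]  # last URL segment is the file name
    return PurePosixPath(
        root,
        str(filed_at.year),
        f"{filed_at.month:02d}",  # zero-padded month, e.g. "01"
        str(cik),
        f"{accession_no}_{filename}",
    )


path = build_file_path(
    datetime(2020, 12, 31, 17, 20, 28),
    "314227",
    "0001654954-20-014094",
    "https://www.sec.gov/Archives/edgar/data/314227/000165495420014094/tomi_ex991.htm",
)
print(path)
```

Prefixing the file name with the accession number keeps files from different filings of the same company from overwriting each other.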
import os
pandarallel.initialize(nb_workers=10, progress_bar=True)
def download_ex_99_file(row, retry_counter=0):
    accession_no = row["accessionNo"]
    cik = row["cik"]
    exhibit_99_url = row["exhibit99Url"]
    exhibit_99_filename = exhibit_99_url.split("/")[-1]
    publication_year = row["filedAt"].year
    publication_month = row["filedAt"].month

    file_name = f"{accession_no}_{exhibit_99_filename}"
    file_path = (
        f"ex-99-files/{publication_year}/{publication_month:02d}/{cik}/{file_name}"
    )
    # create the year/month/CIK folders if they do not exist yet
    os.makedirs(os.path.dirname(file_path), exist_ok=True)

    content = None
    try:
        content = renderApi.get_filing(exhibit_99_url)
    except Exception:
        # retry up to three times before reporting the failed URL
        if retry_counter < 3:
            return download_ex_99_file(row, retry_counter + 1)
        else:
            print(f"Failed: {exhibit_99_url}")
            return

    with open(file_path, "wb") as f:
        f.write(content.encode("utf-8"))
# download sample of 1000
exhibit_99_urls_2020[:1000].parallel_apply(download_ex_99_file, axis=1)
# download all
# exhibit_99_urls_2020.parallel_apply(download_ex_99_file, axis=1)
print("Download complete")
INFO: Pandarallel will run on 10 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
Download complete