Download SEC Filings With Python
Find ready-to-execute Python examples to download SEC EDGAR filings and exhibits in their original format or as PDFs. The examples cover how to locate and download historical 10-K and 10-Q filings to local disk, export financial statements from these filings as Excel files, download XBRL data files attached to EDGAR filings, and retrieve various exhibits such as material contracts, press releases, and more.
- Quick Start
- Download SEC 10-K Filings of Russell 3000 Companies
- Export Financial Statements from 10-K and 10-Q Filings to Excel Files
- Download SEC Filings as PDF
- Download XBRL Data Files from SEC Filings
- Download Material Contracts (Exhibit 10) from 10-K Filings
- Download Press Releases with Financial Results in Exhibit 99 from 8-K Filings
Quick Start
This Python quick start example demonstrates how to download SEC EDGAR filings and exhibits using the Filing Download API with the `sec-api` package. The package includes a `RenderApi` class, which provides the `.get_filing(URL)` method to download the original EDGAR filing or exhibit from a provided URL. The example also demonstrates how to download SEC filings and exhibits as PDFs using the `.get_pdf(URL)` method of the `PdfGeneratorApi` class.
pip install sec-api
from sec_api import RenderApi
renderApi = RenderApi(api_key="YOUR_API_KEY")
# examples
url_8k_html = "https://www.sec.gov/Archives/edgar/data/1045810/000104581023000014/nvda-20230222.htm"
url_8k_txt = "https://www.sec.gov/Archives/edgar/data/1045810/000104581023000014/0001045810-23-000014.txt"
url_exhibit99 = "https://www.sec.gov/Archives/edgar/data/1045810/000104581023000014/q4fy23pr.htm"
url_xbrl_instance = "https://www.sec.gov/Archives/edgar/data/1045810/000104581023000014/nvda-20230222_htm.xml"
url_excel_file = "https://www.sec.gov/Archives/edgar/data/1045810/000104581023000014/Financial_Report.xlsx"
url_pdf_file = "https://www.sec.gov/Archives/edgar/data/1798925/999999999724004095/filename1.pdf"
url_image_file = "https://www.sec.gov/Archives/edgar/data/1424404/000106299324017776/form10kxz001.jpg"
filing_8k_html = renderApi.get_filing(url_8k_html)
filing_8k_txt = renderApi.get_filing(url_8k_txt)
exhibit99 = renderApi.get_filing(url_exhibit99)
xbrl_instance = renderApi.get_filing(url_xbrl_instance)
# use .get_file() and set return_binary=True
# to get non-text files such as images, PDFs, etc.
excel_file = renderApi.get_file(url_excel_file, return_binary=True)
pdf_file = renderApi.get_file(url_pdf_file, return_binary=True)
image_file = renderApi.get_file(url_image_file, return_binary=True)
from sec_api import PdfGeneratorApi
pdfGeneratorApi = PdfGeneratorApi("YOUR_API_KEY")
# Tesla's 2024 10-K filing URL
filing_10K_url = "https://www.sec.gov/Archives/edgar/data/1318605/000162828024002390/tsla-20231231.htm"
# Nvidia's 2024 proxy statement (DEF14A)
proxy_statement_url = "https://www.sec.gov/Archives/edgar/data/1045810/000104581024000104/nvda-20240514.htm"
# Form 4 disclosing Berkshire Hathaway's $86 million purchase of SIRI stock
filing_4_url = "https://www.sec.gov/Archives/edgar/data/315090/000095017024114414/xslF345X05/ownership.xml"
# Microsoft's Form 8-K filing disclosing a cybersecurity incident
filing_8K_url = "https://www.sec.gov/Archives/edgar/data/789019/000119312524011295/d708866d8k.htm"
# Exhibit 99 disclosing updates of financial results
exhibit_99_url = "https://www.sec.gov/ix?doc=/Archives/edgar/data/1320695/000132069520000148/ths12-31x201910krecast.htm"
pdf_10K_filing = pdfGeneratorApi.get_pdf(filing_10K_url)
pdf_proxy_filing = pdfGeneratorApi.get_pdf(proxy_statement_url)
pdf_4_filing = pdfGeneratorApi.get_pdf(filing_4_url)
pdf_8K_filing = pdfGeneratorApi.get_pdf(filing_8K_url)
pdf_ex_99 = pdfGeneratorApi.get_pdf(exhibit_99_url)
Download SEC 10-K Filings of Russell 3000 Companies
This ready-to-execute Python example demonstrates how to download all SEC 10-K filings for Russell 3000 companies from 2014 to 2023. The example can be easily adapted to download filings for other companies or different form types.
Steps:
- Generate a list of tickers for all current Russell 3000 companies.
- Use the Query API to retrieve the URLs of historical 10-K filings for each ticker.
- Download all 10-K filings using the Filing Download API.
Create a List of Tickers for Russell 3000 Companies
The constituents of the Russell 3000 index can be found in the latest Form N-PORT filings for ETFs that mirror the index, or they can be directly downloaded as a CSV file from the corresponding ETF’s website.
For this example, the most up-to-date list of tickers is extracted from the CSV file available on the iShares Russell 3000 ETF website. The CSV file includes the ticker, name, asset class, sector, and other details for each holding, and is downloaded and saved locally as `russell-3000-constituents.csv`.
The iShares website provides both historical and current constituent data:
- Current constituents can be downloaded directly from the ETF's holdings page.
- Historical constituents for a specific date can be retrieved by appending a query parameter such as `asOfDate=20221230` to the download URL, as in the example below for 2022-12-30.
The CSV file with current constituents is used to generate the list of tickers for Russell 3000 companies, which will serve as input for the next step in the process.
import requests
url = "https://www.ishares.com/us/products/239714/ishares-russell-3000-etf/1467271812596.ajax?" + \
"fileType=csv&fileName=IWV_holdings&dataType=fund&asOfDate=20221230"
response = requests.get(url)
with open("russell-3000-constituents.csv", "wb") as f:
    f.write(response.content)
The CSV file begins with nine metadata lines and ends with a disclaimer, both of which need to be removed. The remaining rows contain the tickers, names, and other information of Russell 3000 companies. Using pandas, the CSV file can be read while skipping the first nine lines, and the last two lines can be removed by slicing the DataFrame.
first_15_lines = "\n".join(response.text.split("\n")[:15])
print("First 15 lines of the CSV file:\n")
print(first_15_lines)
First 15 lines of the CSV file:
iShares Russell 3000 ETF
Fund Holdings as of,"Dec 30, 2022"
Inception Date,"May 22, 2000"
Shares Outstanding,"49,100,000.00"
Stock,"-"
Bond,"-"
Cash,"-"
Other,"-"
Ticker,Name,Sector,Asset Class,Market Value,Weight (%),Notional Value,Shares,Price,Location,Exchange,Currency,FX Rate,Market Currency,Accrual Date
"AAPL","APPLE INC","Information Technology","Equity","559,365,151.11","5.16","559,365,151.11","4,305,127.00","129.93","United States","NASDAQ","USD","1.00","USD","-"
"MSFT","MICROSOFT CORP","Information Technology","Equity","513,917,712.42","4.74","513,917,712.42","2,142,931.00","239.82","United States","NASDAQ","USD","1.00","USD","-"
"AMZN","AMAZON COM INC","Consumer Discretionary","Equity","213,823,596.00","1.97","213,823,596.00","2,545,519.00","84.00","United States","NASDAQ","USD","1.00","USD","-"
"BRKB","BERKSHIRE HATHAWAY INC CLASS B","Financials","Equity","159,603,687.60","1.47","159,603,687.60","516,684.00","308.90","United States","New York Stock Exchange Inc.","USD","1.00","USD","-"
"GOOGL","ALPHABET INC CLASS A","Communication","Equity","151,996,026.75","1.40","151,996,026.75","1,722,725.00","88.23","United States","NASDAQ","USD","1.00","USD","-"
import pandas as pd
russell_3000 = pd.read_csv("russell-3000-constituents.csv", skiprows=9)
# remove last two rows
russell_3000 = russell_3000.iloc[:-2]
print("Number of all constituents:", len(russell_3000))
print("First five Russell 3000 constituents:")
russell_3000.head()
Number of all constituents: 2611
First five Russell 3000 constituents:
| | Ticker | Name | Sector | Asset Class | Market Value | Weight (%) | Notional Value | Shares | Price | Location | Exchange | Currency | FX Rate | Market Currency | Accrual Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AAPL | APPLE INC | Information Technology | Equity | 559,365,151.11 | 5.16 | 559,365,151.11 | 4,305,127.00 | 129.93 | United States | NASDAQ | USD | 1.0 | USD | - |
| 1 | MSFT | MICROSOFT CORP | Information Technology | Equity | 513,917,712.42 | 4.74 | 513,917,712.42 | 2,142,931.00 | 239.82 | United States | NASDAQ | USD | 1.0 | USD | - |
| 2 | AMZN | AMAZON COM INC | Consumer Discretionary | Equity | 213,823,596.00 | 1.97 | 213,823,596.00 | 2,545,519.00 | 84.00 | United States | NASDAQ | USD | 1.0 | USD | - |
| 3 | BRKB | BERKSHIRE HATHAWAY INC CLASS B | Financials | Equity | 159,603,687.60 | 1.47 | 159,603,687.60 | 516,684.00 | 308.90 | United States | New York Stock Exchange Inc. | USD | 1.0 | USD | - |
| 4 | GOOGL | ALPHABET INC CLASS A | Communication | Equity | 151,996,026.75 | 1.40 | 151,996,026.75 | 1,722,725.00 | 88.23 | United States | NASDAQ | USD | 1.0 | USD | - |
Create a List of URLs for 10-K Filings
This part demonstrates how to use the Query API to retrieve the URLs of historical Form 10-K filings for all current Russell 3000 constituents and save them to local disk. Given the 2,611 constituents, a maximum of 2,611 Form 10-K filings per year can be expected, though likely fewer, as some holdings represent money market funds, futures, or other non-equity securities.
The Query API can return up to 10,000 results per search query, utilizing pagination. Since the search spans from 2014 to 2023 (a 10-year period), approximately 25,000 10-K filings are expected, based on an average of 2,500 Russell 3000 constituents disclosing 10-K filings per year. The initial search query might look like this:
formType:"10-K" AND filedAt:[2014-01-01 TO 2023-12-31] AND ticker:(<all tickers of Russell 3000 constituents>)
However, since the API can only return up to 10,000 results per query, the search must be broken into smaller chunks to ensure that each query retrieves fewer than 10,000 filings. One approach is to create individual search queries per company ticker, as each company typically files one 10-K per year, totaling around 10 filings over the 10-year period. A more efficient strategy would be to search for multiple companies (e.g., 100 tickers) at a time, which would yield around 1,000 10-K filings per query over 10 years.
Here’s an example of how to split the search queries:
# Query 1: First 100 constituents of the Russell 3000. Use pagination to retrieve all 1,000 10-K filings.
formType:"10-K" AND filedAt:[2014-01-01 TO 2023-12-31] AND ticker:(<first 100 tickers of Russell 3000 constituents>)
# Query 2: Next 100 constituents.
formType:"10-K" AND filedAt:[2014-01-01 TO 2023-12-31] AND ticker:(<next 100 tickers of Russell 3000 constituents>)
# Continue splitting the search by 100 tickers at a time...
As a first step, a simple function is defined to split the list of 2,611 tickers into batches of 100. The function `create_batches` takes a list of tickers and a batch size as input and returns a list of batches, where each batch contains at most the specified number of tickers. For example, given the list `[A, B, C, D, E, F]` and a batch size of 3, the result is two batches: `[[A, B, C], [D, E, F]]`.
def create_batches(tickers=[], batch_size=100):
    return [list(tickers[i : i + batch_size]) for i in range(0, len(tickers), batch_size)]
ticker_batches = create_batches(russell_3000["Ticker"], batch_size=100)
# convert ticker_batches to dataframe with one column "Tickers"
# where each row contains a list (batch) of tickers
ticker_batches_df = pd.DataFrame({"Tickers": ticker_batches})
print("Number of batches:", len(ticker_batches_df))
print("First five ticker batches:")
ticker_batches_df.head()
Number of batches: 27
First five ticker batches:
| | Tickers |
|---|---|
| 0 | [AAPL, MSFT, AMZN, BRKB, GOOGL, UNH, GOOG, JNJ... |
| 1 | [SLB, REGN, VRTX, BDX, ZTS, TGT, APD, ITW, BSX... |
| 2 | [PH, PRU, MSCI, YUM, CHTR, ALL, ECL, KMI, LULU... |
| 3 | [VMC, AEE, ETR, WY, FE, FTV, EBAY, LEN, DTE, A... |
| 4 | [TXT, ETSY, MOS, HWM, WRB, AVY, SWKS, SYF, FIC... |
The next Python code snippet iterates over all batches, creating a search query for each batch to retrieve all 10-K filings for the respective tickers from 2014 to 2023. The filing URLs are then extracted from the Query API response and saved in a DataFrame. Since the Query API can handle up to 40 requests per second, the `pandarallel` package is used to parallelize requests and improve efficiency by performing multiple queries simultaneously.
`pandarallel` extends pandas DataFrame objects with the `.parallel_apply(func)` method, which applies a function in parallel across rows. The number of concurrent tasks is controlled by the `nb_workers` parameter, set via `pandarallel.initialize(nb_workers=4)`. Using `.parallel_apply`, the function that retrieves all 10-K filing URLs is applied to each batch of tickers, enabling parallel processing of multiple batches and significantly speeding up the retrieval of metadata.
pip install sec-api pandarallel ipywidgets
from pandarallel import pandarallel
from sec_api import QueryApi
pandarallel.initialize(nb_workers=4, progress_bar=True)
SEC_API_KEY = "YOUR_API_KEY"
queryApi = QueryApi(api_key=SEC_API_KEY)
INFO: Pandarallel will run on 4 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
def get_10K_filing_urls(row=None):
    if row is None:
        return pd.DataFrame()
    ticker_batch = row["Tickers"]
    if len(ticker_batch) == 0:
        return pd.DataFrame()
    # create a query string to search for 10-K filings for the given tickers
    # "(ticker:AAPL OR ticker:MSFT OR ...)"
    ticker_query = " OR ".join([f'ticker:"{ticker}"' for ticker in ticker_batch])
    ticker_query = f"({ticker_query})"
    # search for 10-K filings filed between 2014-01-01 and 2023-12-31
    date_query = "filedAt:[2014-01-01 TO 2023-12-31]"
    # exclude 10-K/A and NT 10-K filings
    form_type_query = 'formType:"10-K" AND NOT formType:"10-K/A" AND NOT formType:"NT"'
    search_query = f"{ticker_query} AND {date_query} AND {form_type_query}"
    search_params = {
        "query": search_query,
        "from": 0,
        "size": 50,
        "sort": [{"filedAt": {"order": "desc"}}],
    }
    print(f"Fetching filings for {ticker_batch[:4]}...\n")
    filing_urls = []
    # paginate through all result pages until no filings are returned
    while True:
        search_results = queryApi.get_filings(search_params)
        filings = search_results["filings"]
        if len(filings) == 0:
            break
        # extract metadata for each filing
        # { "ticker": "...", "cik": "...", "filedAt": "...", "filingUrl": "..." }
        metadata = [
            {
                "ticker": f["ticker"],
                "cik": f["cik"],
                "filedAt": f["filedAt"],
                "accessionNo": f["accessionNo"],
                "filingUrl": f["linkToFilingDetails"],
            }
            for f in filings
        ]
        filing_urls.extend(metadata)
        search_params["from"] += search_params["size"]
    return pd.DataFrame(filing_urls)
# use the first two batches of tickers for a test run
metadata = ticker_batches_df[:2].parallel_apply(get_10K_filing_urls, axis=1)
# uncomment the line below to process all batches
# metadata = ticker_batches_df.parallel_apply(get_10K_filing_urls, axis=1)
# concatenate the metadata dataframes to get a single dataframe
metadata = pd.concat(metadata.tolist(), ignore_index=True)
Fetching filings for ['SLB', 'REGN', 'VRTX', 'BDX']...
Fetching filings for ['AAPL', 'MSFT', 'AMZN', 'BRKB']...
metadata.to_csv("russell-3000-10k-filing-urls.csv", index=False)
print("Filing metadata of 10-K filings disclosed between 2014 and 2023:")
metadata
Filing metadata of 10-K filings disclosed between 2014 and 2023:
| | ticker | cik | filedAt | accessionNo | filingUrl |
|---|---|---|---|---|---|
| 0 | AMAT | 6951 | 2023-12-15T16:01:24-05:00 | 0000006951-23-000041 | https://www.sec.gov/Archives/edgar/data/6951/0... |
| 1 | DE | 315189 | 2023-12-15T10:27:39-05:00 | 0001558370-23-019812 | https://www.sec.gov/Archives/edgar/data/315189... |
| 2 | AVGO | 1730168 | 2023-12-14T16:54:05-05:00 | 0001730168-23-000096 | https://www.sec.gov/Archives/edgar/data/173016... |
| 3 | DIS | 1744489 | 2023-11-21T17:04:04-05:00 | 0001744489-23-000216 | https://www.sec.gov/Archives/edgar/data/174448... |
| 4 | ADI | 6281 | 2023-11-21T16:24:08-05:00 | 0000006281-23-000203 | https://www.sec.gov/Archives/edgar/data/6281/0... |
| ... | ... | ... | ... | ... | ... |
| 1936 | GD | 40533 | 2014-02-07T09:05:33-05:00 | 0000040533-14-000002 | https://www.sec.gov/Archives/edgar/data/40533/... |
| 1937 | BIIB | 875045 | 2014-02-06T17:28:12-05:00 | 0000875045-14-000004 | https://www.sec.gov/Archives/edgar/data/875045... |
| 1938 | GM | 1467858 | 2014-02-06T12:49:42-05:00 | 0001467858-14-000043 | https://www.sec.gov/Archives/edgar/data/146785... |
| 1939 | CMG | 1058090 | 2014-02-04T20:22:21-05:00 | 0001193125-14-035451 | https://www.sec.gov/Archives/edgar/data/105809... |
| 1940 | SLB | 87347 | 2014-01-31T08:41:08-05:00 | 0001564590-14-000090 | https://www.sec.gov/Archives/edgar/data/87347/... |
1941 rows × 5 columns
Download SEC 10-K Filings to Local Disk
After aggregating the URLs of all 10-K filings between 2014 and 2023 for companies in the Russell 3000 index, the `.get_filing(file_url)` method of the `RenderApi` class in the `sec-api` package is used to download the filings in their original HTML or text format. This method accepts the URL of the filing or any other EDGAR file (such as an exhibit) and returns the original content.
For each company, the 10-K filings are downloaded and saved in a structured directory on the local disk, organized as `filings/{ticker}/{filing_date}_{accession_number}.html`, where the filing content is stored as `.html` or `.txt` files. The filename includes the filing date and accession number for easy reference. An example of this structure is:
filings/
AAPL/
2022-10-31_0000320193-22-000002.html
2021-10-29_0000320193-21-000096.html
...
MSFT/
2022-07-29_0001564590-22-031000.html
2021-07-30_0001564590-21-034056.html
...
...
The `.get_filing(file_url)` method sends requests to the Filing Download API and supports up to 50 requests per second. To utilize this limit efficiently, the `pandarallel` package is used again to parallelize the downloading process. A new function `download_filing(row)` is defined to handle downloading and saving the filing content for each row in the `metadata` DataFrame. By applying this function to the DataFrame with `metadata.parallel_apply(download_filing, axis=1)`, multiple filings are downloaded simultaneously, significantly speeding up the process.
import os
from sec_api import RenderApi
renderApi = RenderApi(SEC_API_KEY)
def download_filing(row):
    ticker = row["ticker"]
    accessionNo = row["accessionNo"]
    filedAt = row["filedAt"].split("T")[0]
    filing_url = row["filingUrl"]
    try:
        content = renderApi.get_filing(filing_url)
        # create the "filings/{ticker}/" folder if it doesn't exist yet
        os.makedirs(f"filings/{ticker}/", exist_ok=True)
        # derive the file extension (htm, html or txt) from the filing URL
        file_type = filing_url.split("/")[-1].split(".")[-1]
        local_file_name = f"filings/{ticker}/{filedAt}_{accessionNo}.{file_type}"
        with open(local_file_name, "w") as f:
            f.write(content)
        print(f"✅ Downloaded {local_file_name}")
    except Exception:
        print(f"❌ {ticker}: download failed for {filing_url}")
# perform test download of first 10 filings
downloaded = metadata[:10].parallel_apply(download_filing, axis=1)
# uncomment the line below to download all filings
# downloaded = metadata.parallel_apply(download_filing, axis=1)
print(f"Completed downloading {len(downloaded)} filings")
✅ Downloaded filings/AMAT/2023-12-15_0000006951-23-000041.htm
✅ Downloaded filings/DIS/2023-11-21_0001744489-23-000216.htm
✅ Downloaded filings/DE/2023-12-15_0001558370-23-019812.htm
✅ Downloaded filings/ADI/2023-11-21_0000006281-23-000203.htm
✅ Downloaded filings/V/2023-11-15_0001403161-23-000099.htm
✅ Downloaded filings/QCOM/2023-11-01_0000804328-23-000055.htm
✅ Downloaded filings/AVGO/2023-12-14_0001730168-23-000096.htm
✅ Downloaded filings/SBUX/2023-11-17_0000829224-23-000058.htm
✅ Downloaded filings/AAPL/2023-11-02_0000320193-23-000106.htm
✅ Downloaded filings/ACN/2023-10-12_0001467373-23-000324.htm
Completed downloading 10 filings
Using a simple `for` loop to iterate over each filing's metadata would download all filings sequentially: each request must finish before the next one starts, and most of that time is spent waiting for the network. This approach would take several hours to complete. By spreading the download tasks across four independent workers with `pandarallel`, the total download time is reduced by approximately 75%, to a quarter of the sequential time. With five workers, the time would be reduced to a fifth.
Consider the following example:
- Average time to download a single filing: 300 ms
- Number of filings to download: 10 years × 2,500 filings per year = 25,000 filings
Time calculation for downloading sequentially (1 filing at a time):
- Total time = (25,000 filings × 300 ms) / 1,000 ms per second / 60 seconds per minute = 125 minutes (about 2 hours)
Time calculation for downloading with four workers (4 filings in parallel):
- Total time = (25,000 filings × 300 ms) / 1,000 / 60 / 4 workers ≈ 31 minutes (about half an hour)
Downloading 4 filings in parallel instead of 1 at a time therefore significantly reduces the overall download time and makes much better use of the API's request limit.
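The arithmetic above can be reproduced with a short, self-contained sketch. The 300 ms average download time and the 25,000-filing count are the assumptions stated in the example, not measured values:

```python
# Back-of-the-envelope estimate: total download time in minutes
# for a given number of parallel workers.
AVG_MS_PER_FILING = 300  # assumed average time to download one filing
N_FILINGS = 10 * 2500    # 10 years x ~2,500 filings per year

def total_minutes(n_workers):
    # total work in ms, converted to minutes, divided across workers
    return N_FILINGS * AVG_MS_PER_FILING / 1000 / 60 / n_workers

print(f"Sequential (1 worker): {total_minutes(1):.0f} minutes")
print(f"Parallel (4 workers):  {total_minutes(4):.2f} minutes")
```

This also makes it easy to check how the estimate scales with other worker counts before picking `nb_workers`.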
Export Financial Statements from 10-K and 10-Q Filings to Excel Files
SEC Form 10-K and 10-Q filings often include an attached Excel file named `Financial_Report.xlsx`, which contains key financial statements across multiple sheets, such as:
- Document and entity information
- Balance sheets
- Income statements (statements of operations)
- Statement of shareholders' equity
- Statements of cash flows
- Notes to financial statements
For example, Amazon's financial statements for Q2 2024, included in its 10-Q filing, can be found in this Excel file.
To access the Excel file from any 10-K or 10-Q filing, the URL can be derived from the original filing URL as follows:
Original URL:
https://www.sec.gov/Archives/edgar/data/1018724/000101872424000130/amzn-20240630.htm
                                        ^-----^ ^----------------^ ^---------------^
                                          CIK     Accession No.     Filing Filename
Replace the filing filename with `Financial_Report.xlsx`:
https://www.sec.gov/Archives/edgar/data/1018724/000101872424000130/Financial_Report.xlsx
                                        ^-----^ ^----------------^ ^-------------------^
                                          CIK     Accession No.       Excel Filename
The Excel file can be downloaded using the `.get_file(file_url, return_binary=True)` method of the `RenderApi` class. When downloading non-text, binary files such as Excel files, PDFs, or images, the `return_binary=True` flag must be set to ensure the file is retrieved in binary rather than text format.
pip install sec-api
import re
from sec_api import RenderApi
renderApi = RenderApi("YOUR_API_KEY")
# Form 10-Q filing URL for Amazon (AMZN) for the period Q2 2024
filing_url = "https://www.sec.gov/Archives/edgar/data/1018724/000101872424000130/amzn-20240630.htm"
# replace the filing file name with `Financial_Report.xlsx`
excel_file_url = re.sub(r"/[^\/]+\.htm", "/Financial_Report.xlsx", filing_url)
print("10-Q Filing URL:", filing_url)
print("Excel file URL:\t", excel_file_url)
10-Q Filing URL: https://www.sec.gov/Archives/edgar/data/1018724/000101872424000130/amzn-20240630.htm
Excel file URL: https://www.sec.gov/Archives/edgar/data/1018724/000101872424000130/Financial_Report.xlsx
excel_file = renderApi.get_file(excel_file_url, return_binary=True)
with open("Financial_Report.xlsx", "wb") as file:
    file.write(excel_file)
print("Excel file saved to Financial_Report.xlsx")
Excel file saved to Financial_Report.xlsx
Download SEC Filings as PDF
Most SEC EDGAR filings are published in HTML format, while older filings were submitted as plain text (`.txt`). Certain form types, such as Form 4 (insider trading reports), Form 13F (institutional investment manager reports), and Form N-PORT (investment company portfolio reports), are filed in XML format only. In rare instances, some filings, such as SEC staff actions (`ORDER`), are published directly as PDFs.
Since most SEC filings are not published in PDF format, converting the original content is necessary to download them as PDFs. The PDF Generator API offers this functionality by converting HTML, XML, or text-based filings and exhibits into PDFs while preserving the original formatting, including images, fonts, and tables. The API supports downloading any EDGAR filing or attached exhibit as a PDF, such as Form 10-K, 10-Q, 8-K, DEF 14A, Exhibit 99, and more.
In Python, a filing or exhibit can be downloaded as a PDF file with the `.get_pdf(file_url)` method of the `PdfGeneratorApi` class from the `sec-api` package. This method accepts the URL of the filing or exhibit and returns the converted PDF, preserving the original content along with all images, tables, and formatting. Images, such as voting proposals in proxy statements, are optimized for printing, and invisible inline XBRL tags are removed to reduce the PDF file size and prevent bloating.
Example Use Cases:
- Downloading a Form 10-K filing as a PDF
- Downloading a proxy statement (DEF 14A) as a PDF
- Downloading a Form 4 filing as a PDF
- Downloading a Form 8-K filing as a PDF
- Downloading an Exhibit 99 file from a Form 8-K filing as a PDF
pip install sec-api
from sec_api import PdfGeneratorApi
pdfGeneratorApi = PdfGeneratorApi("YOUR_API_KEY")
# Tesla's 2024 10-K filing URL
filing_10K_url = "https://www.sec.gov/Archives/edgar/data/1318605/000162828024002390/tsla-20231231.htm"
# Nvidia's 2024 proxy statement (DEF14A)
proxy_statement_url = "https://www.sec.gov/Archives/edgar/data/1045810/000104581024000104/nvda-20240514.htm"
# Form 4 disclosing Berkshire Hathaway's $86 million purchase of SIRI stock
filing_4_url = "https://www.sec.gov/Archives/edgar/data/315090/000095017024114414/xslF345X05/ownership.xml"
# Microsoft's Form 8-K filing disclosing a cybersecurity incident
filing_8K_url = "https://www.sec.gov/Archives/edgar/data/789019/000119312524011295/d708866d8k.htm"
# Exhibit 99 disclosing updates of financial results
exhibit_99_url = "https://www.sec.gov/ix?doc=/Archives/edgar/data/1320695/000132069520000148/ths12-31x201910krecast.htm"
# convert all EDGAR filings and exhibits to PDF
pdf_10K_filing = pdfGeneratorApi.get_pdf(filing_10K_url)
pdf_proxy_filing = pdfGeneratorApi.get_pdf(proxy_statement_url)
pdf_4_filing = pdfGeneratorApi.get_pdf(filing_4_url)
pdf_8K_filing = pdfGeneratorApi.get_pdf(filing_8K_url)
pdf_ex_99 = pdfGeneratorApi.get_pdf(exhibit_99_url)
# save the PDF files to disk
with open("tesla_10K.pdf", "wb") as file:
    file.write(pdf_10K_filing)
with open("nvidia_proxy_statement.pdf", "wb") as file:
    file.write(pdf_proxy_filing)
with open("berkshire_form_4.pdf", "wb") as file:
    file.write(pdf_4_filing)
with open("microsoft_8K.pdf", "wb") as file:
    file.write(pdf_8K_filing)
with open("exhibit_99.pdf", "wb") as file:
    file.write(pdf_ex_99)
Download XBRL Data Files from SEC Filings
This example demonstrates how to find and download original XBRL data files attached to SEC filings, such as annual reports on Form 10-K.
Note: XBRL data is also accessible in an aggregated and structured JSON format via the XBRL-to-JSON API.
XBRL (eXtensible Business Reporting Language) files are XML-based documents that provide structured data for SEC EDGAR filings. They contain financial information like income statements, balance sheets, entity details (e.g., address, ticker symbol, auditor), and text blocks such as notes to financial statements. Many filings, such as annual and quarterly reports (Form 10-K, Form 10-Q), prospectuses (Form 424Bx) and registration statements (Form S-1, etc.), include an XBRL schema file that defines the structure of the filing data, along with other files that hold the actual structured data in XBRL format.
The metadata for these XBRL files, including their URLs and types, is found in the `dataFiles` array within the filing object returned by the Query API. Key fields such as `documentUrl`, `type`, and `description` provide details about each XBRL file, including its URL on EDGAR, the file type (e.g., `EX-101.INS`), and a description such as `XBRL INSTANCE DOCUMENT`.
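As an illustration of how these fields can be used, the sketch below filters a filing's `dataFiles` array by file type to collect the download URLs. The `filing` dictionary is a hypothetical, abbreviated stand-in for a filing object returned by the Query API, and its `documentUrl` values are placeholders:

```python
# Hypothetical, abbreviated filing object; only the "dataFiles" field
# is shown, with placeholder documentUrl values.
filing = {
    "dataFiles": [
        {"type": "EX-101.SCH", "description": "XBRL SCHEMA FILE",
         "documentUrl": "https://www.sec.gov/Archives/edgar/data/0/0/example.xsd"},
        {"type": "EX-101.INS", "description": "XBRL INSTANCE DOCUMENT",
         "documentUrl": "https://www.sec.gov/Archives/edgar/data/0/0/example.xml"},
    ]
}

def data_file_urls(filing, file_type):
    # collect the documentUrl of every attached XBRL file of the given type;
    # filings without XBRL attachments simply yield an empty list
    return [
        f["documentUrl"]
        for f in filing.get("dataFiles", [])
        if f["type"] == file_type
    ]

print(data_file_urls(filing, "EX-101.INS"))
```

The same filter works for any of the XBRL file types (e.g., `EX-101.SCH`, `EX-101.CAL`, `EX-101.DEF`, `EX-101.LAB`) shown in the metadata output below.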
This example shows how to retrieve and download XBRL files attached to Form 10-Q filings from 2020 to 2023. The steps include:
- Find and aggregate all URLs of the XBRL files attached to Form 10-Q filings.
- Download all XBRL XML files retrieved in step 1 using the Filing Download API, and save them locally.
The example can be easily adapted to search for any filing form type, filer, or date range. For instance, it can be modified to locate XBRL files from 424B2 prospectuses.
pip install sec-api
SEC_API_KEY = "YOUR_API_KEY"
Finding URLs of XBRL Files
To locate all EDGAR 10-Q filings that include XBRL data, a match-any search query such as `dataFiles:*` can be used. This query identifies any filing that contains a non-empty `dataFiles` array, as the array exclusively holds XBRL data. To narrow down the search further, the form type filter `formType:"10-Q" AND NOT formType:"10-Q/A"` is added, ensuring only 10-Q filings are retrieved, excluding amended versions. A date range filter is also applied to limit the search to quarterly reports with XBRL data published within a specific time frame.
The final query looks like this:
dataFiles:* AND formType:"10-Q" AND NOT formType:"10-Q/A" AND filedAt:[2020-01-01 TO 2023-12-31]
Since the Query API can return a maximum of 10,000 filings per search query, and the number of 10-Q filings matching the above criteria exceeds 10,000, the query needs to be broken into smaller subsets. One approach is to construct search queries by month, iterating through the result pages for each month from 2020 to 2023. With a maximum of approximately 5,000 10-Q filings per month, the 10,000 result limit is never exceeded, allowing all filings to be fetched month by month, year over year.
import pandas as pd
from sec_api import QueryApi
queryApi = QueryApi(SEC_API_KEY)
filings = []
base_query = 'dataFiles:* AND formType:"10-Q" AND NOT formType:"10-Q/A"'
start_year = 2020
end_year = 2023
for year in range(start_year, end_year + 1):
    print(f"Starting to download metadata of filings from {year}")
    for month in range(1, 13):
        print(f"-- Starting month {month}")
        date_range_query = f"filedAt:[{year}-{month:02d}-01 TO {year}-{month:02d}-31]"
        query = f"{base_query} AND {date_range_query}"
        search_parameters = {
            "query": query,
            "from": 0,
            "size": 50,
            "sort": [{"filedAt": {"order": "desc"}}],
        }
        # paginate through all result pages of the month
        while True:
            response = queryApi.get_filings(search_parameters)
            if len(response["filings"]) == 0:
                break
            filings.append(response["filings"])
            search_parameters["from"] += 50
            # comment out the following break to fetch all result pages
            break
filings = [item for sublist in filings for item in sublist]
filings = pd.DataFrame(filings)
Starting to download metadata of filings from 2022
-- Starting month 1
-- Starting month 2
-- Starting month 3
-- Starting month 4
-- Starting month 5
-- Starting month 6
-- Starting month 7
-- Starting month 8
-- Starting month 9
-- Starting month 10
-- Starting month 11
-- Starting month 12
Starting to download metadata of filings from 2023
-- Starting month 1
-- Starting month 2
-- Starting month 3
-- Starting month 4
-- Starting month 5
-- Starting month 6
-- Starting month 7
-- Starting month 8
-- Starting month 9
-- Starting month 10
-- Starting month 11
-- Starting month 12
print(f"Total filings fetched: {len(filings)}")
print("10-Q filing metadata including XBRL data files:")
filings[["ticker", "cik", "formType", "accessionNo", "filedAt", "dataFiles"]].head(10)
Total filings fetched: 1200
10-Q filing metadata including XBRL data files:
| | ticker | cik | formType | accessionNo | filedAt | dataFiles |
|---|---|---|---|---|---|---|
| 0 | EVOA | 728447 | 10-Q | 0000950170-22-000601 | 2022-01-31T19:24:32-05:00 | [{'sequence': '6', 'size': '125600', 'document... |
| 1 | EVOA | 728447 | 10-Q | 0000950170-22-000600 | 2022-01-31T19:22:42-05:00 | [{'sequence': '6', 'size': '82410', 'documentU... |
| 2 | EVOA | 728447 | 10-Q | 0000950170-22-000599 | 2022-01-31T19:20:34-05:00 | [{'sequence': '6', 'size': '862403', 'document... |
| 3 | TVC | 1376986 | 10-Q | 0001376986-22-000005 | 2022-01-31T17:36:44-05:00 | [{'sequence': '6', 'size': '120761', 'document... |
| 4 | HP | 46765 | 10-Q | 0000046765-22-000006 | 2022-01-31T17:23:47-05:00 | [{'sequence': '5', 'size': '59546', 'documentU... |
| 5 | LUB | 16099 | 10-Q | 0000016099-22-000006 | 2022-01-31T16:52:26-05:00 | [{'sequence': '7', 'size': '50542', 'documentU... |
| 6 | DLHC | 785557 | 10-Q | 0000785557-22-000003 | 2022-01-31T16:32:17-05:00 | [{'sequence': '5', 'size': '38022', 'documentU... |
| 7 | CRUS | 772406 | 10-Q | 0000772406-22-000006 | 2022-01-31T16:01:19-05:00 | [{'sequence': '6', 'size': '36119', 'documentU... |
| 8 | MNRO | 876427 | 10-Q | 0000876427-22-000003 | 2022-01-31T15:51:34-05:00 | [{'sequence': '6', 'size': '34194', 'documentU... |
| 9 | ADP | 8670 | 10-Q | 0000008670-22-000014 | 2022-01-31T15:19:08-05:00 | [{'sequence': '8', 'size': '51863', 'documentU... |
import json
print("Metadata of XBRL files from the first filing:")
print(json.dumps(response["filings"][0]["dataFiles"], indent=2))
Metadata of XBRL files from the first filing:
[
{
"sequence": "6",
"size": "21999",
"documentUrl": "https://www.sec.gov/Archives/edgar/data/1884072/000119983523000643/jewl-20230630.xsd",
"description": "XBRL SCHEMA FILE",
"type": "EX-101.SCH"
},
{
"sequence": "7",
"size": "36299",
"documentUrl": "https://www.sec.gov/Archives/edgar/data/1884072/000119983523000643/jewl-20230630_cal.xml",
"description": "XBRL CALCULATION FILE",
"type": "EX-101.CAL"
},
{
"sequence": "8",
"size": "68475",
"documentUrl": "https://www.sec.gov/Archives/edgar/data/1884072/000119983523000643/jewl-20230630_def.xml",
"description": "XBRL DEFINITION FILE",
"type": "EX-101.DEF"
},
{
"sequence": "9",
"size": "197267",
"documentUrl": "https://www.sec.gov/Archives/edgar/data/1884072/000119983523000643/jewl-20230630_lab.xml",
"description": "XBRL LABEL FILE",
"type": "EX-101.LAB"
},
{
"sequence": "10",
"size": "157708",
"documentUrl": "https://www.sec.gov/Archives/edgar/data/1884072/000119983523000643/jewl-20230630_pre.xml",
"description": "XBRL PRESENTATION FILE",
"type": "EX-101.PRE"
},
{
"sequence": "42",
"size": "298216",
"documentUrl": "https://www.sec.gov/Archives/edgar/data/1884072/000119983523000643/jewl-10q_htm.xml",
"description": "EXTRACTED XBRL INSTANCE DOCUMENT",
"type": "XML"
}
]
xbrl_files = filings.explode("dataFiles")[
    ["ticker", "cik", "formType", "accessionNo", "filedAt", "dataFiles"]
]

columns_to_add = ["type", "description", "documentUrl"]
for col in columns_to_add:
    xbrl_files[col] = xbrl_files["dataFiles"].apply(
        lambda x: x[col] if col in x else None
    )
xbrl_files = xbrl_files.drop(columns=["dataFiles"])

# save to CSV file
xbrl_files.to_csv("xbrl_files.csv", index=False)

print("XBRL data files:")
xbrl_files
XBRL data files:
  | ticker | cik | formType | accessionNo | filedAt | type | description | documentUrl
---|---|---|---|---|---|---|---|---
0 | EVOA | 728447 | 10-Q | 0000950170-22-000601 | 2022-01-31T19:24:32-05:00 | EX-101.SCH | XBRL TAXONOMY EXTENSION SCHEMA DOCUMENT | https://www.sec.gov/Archives/edgar/data/728447... |
0 | EVOA | 728447 | 10-Q | 0000950170-22-000601 | 2022-01-31T19:24:32-05:00 | EX-101.PRE | XBRL TAXONOMY EXTENSION PRESENTATION LINKBASE ... | https://www.sec.gov/Archives/edgar/data/728447... |
0 | EVOA | 728447 | 10-Q | 0000950170-22-000601 | 2022-01-31T19:24:32-05:00 | EX-101.LAB | XBRL TAXONOMY EXTENSION LABEL LINKBASE DOCUMENT | https://www.sec.gov/Archives/edgar/data/728447... |
0 | EVOA | 728447 | 10-Q | 0000950170-22-000601 | 2022-01-31T19:24:32-05:00 | EX-101.CAL | XBRL TAXONOMY EXTENSION CALCULATION LINKBASE D... | https://www.sec.gov/Archives/edgar/data/728447... |
0 | EVOA | 728447 | 10-Q | 0000950170-22-000601 | 2022-01-31T19:24:32-05:00 | EX-101.DEF | XBRL TAXONOMY EXTENSION DEFINITION LINKBASE DO... | https://www.sec.gov/Archives/edgar/data/728447... |
... | ... | ... | ... | ... | ... | ... | ... | ... |
1199 | AITR | 1966734 | 10-Q | 0001493152-23-045528 | 2023-12-20T11:13:54-05:00 | EX-101.CAL | XBRL CALCULATION FILE | https://www.sec.gov/Archives/edgar/data/196673... |
1199 | AITR | 1966734 | 10-Q | 0001493152-23-045528 | 2023-12-20T11:13:54-05:00 | EX-101.DEF | XBRL DEFINITION FILE | https://www.sec.gov/Archives/edgar/data/196673... |
1199 | AITR | 1966734 | 10-Q | 0001493152-23-045528 | 2023-12-20T11:13:54-05:00 | EX-101.LAB | XBRL LABEL FILE | https://www.sec.gov/Archives/edgar/data/196673... |
1199 | AITR | 1966734 | 10-Q | 0001493152-23-045528 | 2023-12-20T11:13:54-05:00 | EX-101.PRE | XBRL PRESENTATION FILE | https://www.sec.gov/Archives/edgar/data/196673... |
1199 | AITR | 1966734 | 10-Q | 0001493152-23-045528 | 2023-12-20T11:13:54-05:00 | XML | EXTRACTED XBRL INSTANCE DOCUMENT | https://www.sec.gov/Archives/edgar/data/196673... |
7196 rows × 8 columns
Download XBRL Files from SEC Filings
In the final step, the XBRL files from SEC filings are downloaded and organized into the following folder structure: `xbrl-files/<cik>/<accessionNo>/<edgar_file_type>.<file_extension>`, where `<file_extension>` is the extension of the source file (e.g., `xml` or `xsd`). An example folder structure is shown below:
xbrl-files/
320193/
0000320193-21-000139/
EX-101.SCH.xsd
EX-101.CAL.xml
EX-101.DEF.xml
EX-101.LAB.xml
EX-101.PRE.xml
XML.xml
.../
The following table provides an overview of the XBRL file types and their descriptions:
EDGAR File Type | File Extension | Description |
---|---|---|
EX-101.SCH | *.xsd | XBRL Taxonomy Schema |
EX-101.CAL | *.xml | XBRL Calculation Linkbase |
EX-101.DEF | *.xml | XBRL Definition Linkbase |
EX-101.LAB | *.xml | XBRL Label Linkbase |
EX-101.PRE | *.xml | XBRL Presentation Linkbase |
XML | *.xml | XBRL Instance Document |
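Combining the folder convention above with this table, the local path of each data file can be derived from the EDGAR file type and the extension of its `documentUrl`. The `local_xbrl_path` helper below is a hypothetical sketch of that mapping, not part of the `sec-api` package:

```python
from pathlib import Path

def local_xbrl_path(cik, accession_no, edgar_file_type, document_url):
    # the extension comes from the source URL: .xsd for the schema file,
    # .xml for the linkbases and the extracted instance document
    extension = document_url.rsplit(".", 1)[-1]
    return Path("xbrl-files") / str(cik) / accession_no / f"{edgar_file_type}.{extension}"

schema_url = "https://www.sec.gov/Archives/edgar/data/1884072/000119983523000643/jewl-20230630.xsd"
print(local_xbrl_path(1884072, "0001199835-23-000643", "EX-101.SCH", schema_url))
# xbrl-files/1884072/0001199835-23-000643/EX-101.SCH.xsd
```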
To efficiently download multiple XBRL files at once, the `pandarallel` package is used to parallelize the download process across multiple workers, significantly speeding up the retrieval process.
pip install pandarallel ipywidgets
import os
from pandarallel import pandarallel
from sec_api import RenderApi
pandarallel.initialize(nb_workers=4, progress_bar=True)
renderApi = RenderApi(SEC_API_KEY)
INFO: Pandarallel will run on 4 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
def download_and_save_xbrl_file(row):
    cik = row["cik"]
    accessionNo = row["accessionNo"]
    file_url = row["documentUrl"]
    file_type = row["type"]
    file_extension = file_url.split(".")[-1]
    try:
        xbrl_data = renderApi.get_file(file_url)
        xbrl_file_name = f"{file_type}.{file_extension}"
        folder_path = f"xbrl-files/{cik}/{accessionNo}"
        file_path = f"{folder_path}/{xbrl_file_name}"
        # exist_ok avoids a race condition when multiple workers
        # create the same folder concurrently
        os.makedirs(folder_path, exist_ok=True)
        with open(file_path, "w") as f:
            f.write(xbrl_data)
    except Exception as e:
        print(f"Failed to download {file_url} for {cik} - {accessionNo}: {e}\n")
        return None

# download and save the first 50 XBRL files
results = xbrl_files[:50].parallel_apply(download_and_save_xbrl_file, axis=1)
# uncomment the line below to download all XBRL files
# results = xbrl_files.parallel_apply(download_and_save_xbrl_file, axis=1)
print(f"Downloaded {len(results)} XBRL files")
Downloaded 50 XBRL files
Download Material Contracts (Exhibit 10) from SEC 10-K Filings
This example demonstrates how to locate material contracts disclosed in Exhibit 10 of SEC filings—such as Forms 10-K, 10-Q, or S-1—and download them in their original HTML format and as PDFs to local disk.
For this example, the focus is on Exhibit 10 contracts disclosed in Form 10-K filings from 2020 to 2023. The steps are as follows:
- Use the Query API to find URLs for Exhibit 10 files within 10-K filings.
- Download the material contracts as both HTML and PDF files using the Filing Download and PDF Generator APIs.
pip install sec-api
SEC_API_KEY = "YOUR_API_KEY"
Find and Aggregate URLs of Exhibit 10 Files from 10-K Filings
The Query API is used to locate and compile URLs for material contracts disclosed in Exhibit 10 from 10-K filings submitted between 2020 and 2023. The following search query filters the desired filings:
formType:"10-K" AND documentFormatFiles.type:"EX-10" AND filedAt:[2020-01-01 TO 2023-12-31]
This search retrieves metadata for all Form 10-K filings containing `EX-10` (Exhibit 10) documents filed within the specified date range (January 2020 to December 2023). The `documentFormatFiles` array within each filing's metadata includes detailed information about each attached document, such as its URL (`documentUrl`), type, description, and size. An example structure of the `documentFormatFiles` array is shown below:
"documentFormatFiles": [
{
"sequence": "3",
"size": "50752",
"documentUrl": "https://www.sec.gov/Archives/edgar/data/72331/000007233123000242/ex10-unordsonxformofstocko.htm",
"description": "EX-10.U",
"type": "EX-10.U"
}
// ... additional documents
],
// ... other filing metadata
Examples of Exhibit 10 Files
Exhibit 10 material contracts encompass a variety of agreement types, including but not limited to:
- Compensation plans for non-employee directors
- Stock award notices
- Board resolutions (e.g., issuance of convertible notes)
- Investment management trust agreements
- Credit card program agreements
- License agreements
- Credit agreements
- Note purchase agreements
- Share exchange agreements
Although the Query API locates filings containing Exhibit 10, it cannot filter specific types of material contracts within Exhibit 10. To identify specific contract types, additional filtering can be performed client-side by downloading the HTML content and searching for specific keywords or phrases, such as "license agreement," within the exhibit text.
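The client-side filtering mentioned above can be as simple as stripping the HTML tags from a downloaded exhibit and searching the remaining text for a phrase. The `mentions_keyword` helper below is a hypothetical sketch; in practice, the HTML content would come from downloading an `exhibit10Url` with the Filing Download API:

```python
import re

def mentions_keyword(html: str, keyword: str) -> bool:
    # remove tags, collapse whitespace, then search case-insensitively
    text = re.sub(r"<[^>]+>", " ", html)
    text = re.sub(r"\s+", " ", text)
    return keyword.lower() in text.lower()

sample = '<p>This <b>License</b> Agreement (the "Agreement") is made ...</p>'
print(mentions_keyword(sample, "license agreement"))  # True
print(mentions_keyword(sample, "credit agreement"))   # False
```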
import pandas as pd
from sec_api import QueryApi

queryApi = QueryApi(api_key=SEC_API_KEY)

query = 'formType:"10-K" AND documentFormatFiles.type:"EX-10" AND filedAt:[2020-01-01 TO 2023-12-31]'

search_params = {
    "query": query,
    "from": 0,
    "size": 50,
    "sort": [{"filedAt": {"order": "desc"}}],
}

response = queryApi.get_filings(search_params)

print("Number of 10-K filings with Exhibit 10 (2020-2023) found:")
print(response["total"]["value"])
Number of 10-K filings with Exhibit 10 (2020-2023) found:
923
import re

def is_exhibit_10(file):
    return bool(re.search(r"EX-10", file["type"]))

def get_exhibit_10_urls():
    exhibits = []
    has_more_filings = True
    query = 'formType:"10-K" AND documentFormatFiles.type:"EX-10" AND filedAt:[2020-01-01 TO 2023-12-31]'
    search_params = {
        "query": query,
        "from": 0,
        "size": 50,
        "sort": [{"filedAt": {"order": "desc"}}],
    }
    while has_more_filings:
        response = queryApi.get_filings(search_params)
        if len(response["filings"]) == 0:
            break
        for filing in response["filings"]:
            for file in filing["documentFormatFiles"]:
                if is_exhibit_10(file):
                    exhibits.append(
                        {
                            "accessionNo": filing["accessionNo"],
                            "filedAt": filing["filedAt"],
                            "companyName": filing["companyName"],
                            "ticker": filing["ticker"],
                            "cik": filing["cik"],
                            "exhibit10Url": file["documentUrl"],
                        }
                    )
        search_params["from"] += 50
    return pd.DataFrame(exhibits)
exhibit_10_files = get_exhibit_10_urls()
print("Exhibit 10 files:")
exhibit_10_files
Exhibit 10 files:
  | accessionNo | filedAt | companyName | ticker | cik | exhibit10Url
---|---|---|---|---|---|---
0 | 0000072331-23-000242 | 2023-12-20T17:04:04-05:00 | NORDSON CORP | NDSN | 72331 | https://www.sec.gov/Archives/edgar/data/72331/... |
1 | 0000072331-23-000242 | 2023-12-20T17:04:04-05:00 | NORDSON CORP | NDSN | 72331 | https://www.sec.gov/Archives/edgar/data/72331/... |
2 | 0000072331-23-000242 | 2023-12-20T17:04:04-05:00 | NORDSON CORP | NDSN | 72331 | https://www.sec.gov/Archives/edgar/data/72331/... |
3 | 0000072331-23-000242 | 2023-12-20T17:04:04-05:00 | NORDSON CORP | NDSN | 72331 | https://www.sec.gov/Archives/edgar/data/72331/... |
4 | 0001437749-23-034783 | 2023-12-18T17:26:13-05:00 | HOVNANIAN ENTERPRISES INC | HOV | 357294 | https://www.sec.gov/Archives/edgar/data/357294... |
... | ... | ... | ... | ... | ... | ... |
2796 | 0001133421-20-000006 | 2020-01-30T06:39:26-05:00 | NORTHROP GRUMMAN CORP /DE/ | NOC | 1133421 | https://www.sec.gov/Archives/edgar/data/113342... |
2797 | 0001067701-20-000008 | 2020-01-29T16:35:59-05:00 | UNITED RENTALS, INC. | URI | 1067701 | https://www.sec.gov/Archives/edgar/data/104716... |
2798 | 0001067701-20-000008 | 2020-01-29T16:35:59-05:00 | UNITED RENTALS NORTH AMERICA INC | URI | 1047166 | https://www.sec.gov/Archives/edgar/data/104716... |
2799 | 0001564590-20-002467 | 2020-01-29T16:32:43-05:00 | SYNNEX CORP | SNX | 1177394 | https://www.sec.gov/Archives/edgar/data/117739... |
2800 | 0001052918-20-000008 | 2020-01-10T13:18:02-05:00 | Timberline Resources Corp | TLRS | 1288750 | https://www.sec.gov/Archives/edgar/data/128875... |
2801 rows × 6 columns
exhibit_10_files['exhibit10Url'][:10].to_list()
['https://www.sec.gov/Archives/edgar/data/72331/000007233123000242/ex10-unordsonxformofstocko.htm',
'https://www.sec.gov/Archives/edgar/data/72331/000007233123000242/ex10-vnordsonxformofrestri.htm',
'https://www.sec.gov/Archives/edgar/data/72331/000007233123000242/ex10-wnoticeofawardpsufy24.htm',
'https://www.sec.gov/Archives/edgar/data/72331/000007233123000242/ex10-xrestrictedshareunita.htm',
'https://www.sec.gov/Archives/edgar/data/357294/000143774923034783/ex_606399.htm',
'https://www.sec.gov/Archives/edgar/data/357294/000143774923034783/ex_605281.htm',
'https://www.sec.gov/Archives/edgar/data/357294/000143774923034783/ex_605282.htm',
'https://www.sec.gov/Archives/edgar/data/357294/000143774923034783/ex_605283.htm',
'https://www.sec.gov/Archives/edgar/data/357294/000143774923034783/ex_605284.htm',
'https://www.sec.gov/Archives/edgar/data/357294/000143774923034783/ex_605285.htm']
Download Material Contracts as HTML and PDF Files
This final step illustrates how to download both HTML and PDF versions of material contracts disclosed in Exhibit 10. The Filing Download and PDF Generator APIs facilitate the file downloads, while `pandarallel` parallelizes the process, enabling concurrent downloads for improved speed.
The downloaded Exhibit 10 files are organized into a structured folder hierarchy as follows:
exhibit_10_files/<cik>/<accession_number>/<exhibit_filename>.(htm|pdf)
Example folder structure:
exhibit_10_files/
72331/
000007233123000242/
ex10-stock-options.htm
ex10-stock-options.pdf
ex10-share-unit-awards.htm
ex10-share-unit-awards.pdf
...
...
pip install pandarallel ipywidgets
import os
from pandarallel import pandarallel
from sec_api import RenderApi, PdfGeneratorApi
pandarallel.initialize(nb_workers=5, progress_bar=True)
renderApi = RenderApi(SEC_API_KEY)
pdfGeneratorApi = PdfGeneratorApi(SEC_API_KEY)
def download_exhibit_10_file(row):
    cik = row["cik"]
    accessionNo = row["accessionNo"]
    url = row["exhibit10Url"]
    file_name_html = url.split("/")[-1]
    file_name_pdf = file_name_html.replace(".htm", ".pdf")
    folder = f"exhibit_10_files/{cik}/{accessionNo}/"
    # exist_ok avoids a race condition between parallel workers
    os.makedirs(folder, exist_ok=True)
    exhibit_file_html = renderApi.get_file(url)
    exhibit_file_pdf = pdfGeneratorApi.get_pdf(url)
    # save HTML and PDF file
    with open(folder + file_name_html, "w") as file:
        file.write(exhibit_file_html)
    with open(folder + file_name_pdf, "wb") as file:
        file.write(exhibit_file_pdf)

# download the first 20 Exhibit 10 files
results = exhibit_10_files[:20].parallel_apply(download_exhibit_10_file, axis=1)
# uncomment to download all Exhibit 10 files
# results = exhibit_10_files.parallel_apply(download_exhibit_10_file, axis=1)
INFO: Pandarallel will run on 5 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
Download Press Releases with Financial Results in Exhibit 99 from SEC 8-K Filings
This tutorial outlines the steps to download press releases (PRs) that disclose annual and quarterly financial results, filed as Exhibit 99 within SEC 8-K filings.
Steps for downloading PRs with financial results in Exhibit 99:
- Use the Query API to locate all 8-K filings that include press releases with financial results in Exhibit 99.
- Extract the URLs for each press release from the filing metadata and save them locally.
- Download the press releases using the list of URLs with the Download API.
!pip install -q sec-api
from sec_api import QueryApi
api_key = "YOUR_API_KEY"
queryApi = QueryApi(api_key)
Exhibit 99 in 8-K Filings
Press releases (PRs) are typically attached to 8-K filings as Exhibit 99. To identify all relevant 8-K filings containing PRs, it is necessary to filter for those that include Exhibit 99, excluding 8-Ks without it.
The example below shows the filing details page from an 8-K filing by Nvidia, which contains two Exhibit 99 files:
- Press release with financial results: Q4FY23 PRESS RELEASE (q4fy23pr.htm)
- CFO commentary: Q4FY23 CFO COMMENTARY (q4fy23cfocommentary.htm)
Types of Exhibit 99 Content in 8-K Filings
Exhibit 99 in 8-K filings includes a diverse range of material information, often beyond just press releases. Below is an overview of the types of content typically disclosed in Exhibit 99:
- Press releases about annual and quarterly financial results, management changes, and other material events
- CFO commentary on company performance
- Investor updates, such as slides presented at a conference with investors and analysts, health conference presentations, and general business updates
- Announcements of
  - Cash dividends
  - Share repurchase agreements and authorizations
  - Agreements and plans of merger and acquisition
  - Private offerings of convertible senior notes
  - Results of preclinical studies
  - Phase 1 clinical trial updates
  - Launches of clinical trials
  - New Drug Applications (NDAs)
- Letters of intent with counterparties
- Changes in management, such as the appointment of a new CFO or board member
- Receipt of a NASDAQ notification about non-compliance with the minimum bid price requirement
- Receipt of a Continued Listing Standard Notice from the NYSE
- Stockholder approval of amended Articles of Incorporation
- Letters to shareholders
- Conference posters
- Asset acquisition term sheets
- Reports of independent registered public accounting firms
- Notifications of total voting rights
- Material change reports
- Consulting agreements
- By-laws
- Board actions and resolutions
- Management resignation letters
- Results of annual meetings of stockholders
- Invitations to shareholder meetings
- Earnings call conference slides
8-K filings are used to inform investors about a multitude of material events, such as changes to management, board members, auditors, by-laws and more. Each event is categorized under one of 33 items, such as Item 9.01 Financial Statements and Exhibits which covers, among other things, quarterly or annual business performance updates. Refer to Supported 8-K Section Items for a complete list of all 8-K items.
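Because each filing's metadata contains an `items` list of strings such as `"Item 2.02: Results of Operations and Financial Condition"`, item-based filtering can also be replicated client-side. The `reports_financial_results` helper below is a hypothetical sketch of that check:

```python
def reports_financial_results(items):
    # Item 2.02 (results of operations) and Item 9.01 (financial statements
    # and exhibits) together signal an earnings-related 8-K
    has_2_02 = any(item.startswith("Item 2.02") for item in items)
    has_9_01 = any(item.startswith("Item 9.01") for item in items)
    return has_2_02 and has_9_01

items = [
    "Item 2.02: Results of Operations and Financial Condition",
    "Item 9.01: Financial Statements and Exhibits",
]
print(reports_financial_results(items))  # True
print(reports_financial_results(["Item 5.02: Departure of Directors or Certain Officers"]))  # False
```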
Find 8-K Filings with Exhibit 99
An 8-K filing reporting a management change (Item 5.02 Departure of Directors or Certain Officers) and one disclosing annual financial results (Item 2.02 Results of Operations and Financial Condition and Item 9.01 Financial Statements and Exhibits) can both include Exhibit 99 attachments. For this example, however, only 8-K filings with Exhibit 99 and items 2.02 and 9.01 are relevant.
The search criteria are as follows:
- `formType:"8-K"` to include both 8-K and 8-K/A filings,
- `documentFormatFiles.type:99` to focus only on exhibits of type `99` (e.g., `EX-99.1` and `EXHIBIT99`),
- `items:"2.02" AND items:"9.01"` to limit results to filings that include Item 2.02 Results of Operations and Financial Condition and Item 9.01 Financial Statements and Exhibits.
Combining these conditions with the `AND` operator results in the following query:
formType:"8-K" AND items:"2.02" AND items:"9.01" AND documentFormatFiles.type:(99, 99*, *99, *99*)
search_params = {
    "query": 'formType:"8-K" AND documentFormatFiles.type:(99, 99*, *99, *99*) AND items:"9.01" AND items:"2.02"',
    "from": "0",
    "size": "50",
    "sort": [{"filedAt": {"order": "desc"}}],
}

response = queryApi.get_filings(search_params)
response = queryApi.get_filings(search_params)
Let's convert the metadata of the first 50 matching filings into a pandas DataFrame.
import pandas as pd
# convert Query API response into a DataFrame
filings = pd.DataFrame.from_records(response['filings'])
print('Keys of the metadata for each filing')
print('---------------------------------')
print(*list(filings.keys()), sep='\n')
Keys of the metadata for each filing
---------------------------------
ticker
formType
accessionNo
cik
companyNameLong
companyName
linkToFilingDetails
description
linkToTxt
filedAt
documentFormatFiles
periodOfReport
entities
id
seriesAndClassesContractsInformation
items
linkToHtml
linkToXbrl
dataFiles
filings.head(3)
  | ticker | formType | accessionNo | cik | companyNameLong | companyName | linkToFilingDetails | description | linkToTxt | filedAt | documentFormatFiles | periodOfReport | entities | id | seriesAndClassesContractsInformation | items | linkToHtml | linkToXbrl | dataFiles
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | NUWE | 8-K | 0001140361-24-045016 | 1506492 | Nuwellis, Inc. (Filer) | Nuwellis, Inc. | https://www.sec.gov/Archives/edgar/data/150649... | Form 8-K - Current report - Item 2.02 Item 9.01 | https://www.sec.gov/Archives/edgar/data/150649... | 2024-11-01T16:45:31-04:00 | [{'sequence': '1', 'size': '29515', 'documentU... | 2024-11-01 | [{'fiscalYearEnd': '1231', 'stateOfIncorporati... | 05cca44e7a981e8c0640dafab4a50149 | [] | [Item 2.02: Results of Operations and Financia... | https://www.sec.gov/Archives/edgar/data/150649... | [{'sequence': '3', 'size': '3997', 'documentUr... | |
1 | CPSH | 8-K | 0001437749-24-033004 | 814676 | CPS TECHNOLOGIES CORP/DE/ (Filer) | CPS TECHNOLOGIES CORP/DE/ | https://www.sec.gov/Archives/edgar/data/814676... | Form 8-K - Current report - Item 2.02 Item 8.0... | https://www.sec.gov/Archives/edgar/data/814676... | 2024-11-01T16:40:51-04:00 | [{'sequence': '1', 'size': '24803', 'documentU... | 2024-11-01 | [{'fiscalYearEnd': '1228', 'stateOfIncorporati... | aee603751997acf7a312c9543c0e1f53 | [] | [Item 2.02: Results of Operations and Financia... | https://www.sec.gov/Archives/edgar/data/814676... | [{'sequence': '4', 'size': '3409', 'documentUr... | |
2 | INMB | 8-K | 0001213900-24-093531 | 1711754 | Inmune Bio, Inc. (Filer) | Inmune Bio, Inc. | https://www.sec.gov/Archives/edgar/data/171175... | Form 8-K - Current report - Item 2.02 Item 9.01 | https://www.sec.gov/Archives/edgar/data/171175... | 2024-11-01T16:23:48-04:00 | [{'sequence': '1', 'size': '24650', 'documentU... | 2024-10-31 | [{'fiscalYearEnd': '1231', 'stateOfIncorporati... | 8a3b08e2f061a3c4d0b97044c089fb76 | [] | [Item 2.02: Results of Operations and Financia... | https://www.sec.gov/Archives/edgar/data/171175... | [{'sequence': '4', 'size': '3018', 'documentUr... |
To extract URLs for all Exhibit 99 files from the `documentFormatFiles` list, iterate through each filing's metadata and filter the `documentFormatFiles` entries whose `type` contains "99". Since `type` values are not standardized, partial matching on "99" is necessary to capture variations like `EX-99.1`, `EX-99`, or `EX-99.01`.
documentFormatFiles = [doc for sublist in list(filings['documentFormatFiles']) for doc in sublist]
exhibit_99s = list(filter(lambda doc: '99' in doc['type'], documentFormatFiles))
exhibit_99s[:5]
[{'sequence': '2',
'size': '14187',
'documentUrl': 'https://www.sec.gov/Archives/edgar/data/1506492/000114036124045016/ef20037953_99-1.htm',
'description': 'EXHIBIT 99.1',
'type': 'EX-99.1'},
{'sequence': '2',
'size': '200055',
'documentUrl': 'https://www.sec.gov/Archives/edgar/data/814676/000143774924033004/ex_741572.htm',
'description': 'EXHIBIT 99.1 PRESS RELEASE',
'type': 'EX-99.1'},
{'sequence': '3',
'size': '184062',
'documentUrl': 'https://www.sec.gov/Archives/edgar/data/814676/000143774924033004/ex_741707.htm',
'description': 'EXHIBIT 99.2',
'type': 'EX-99.2'},
{'sequence': '2',
'size': '100966',
'documentUrl': 'https://www.sec.gov/Archives/edgar/data/1711754/000121390024093531/ea021928902ex99-1_inmune.htm',
'description': 'PRESS RELEASE OF INMUNE BIO INC., DATED OCTOBER 31, 2024',
'type': 'EX-99.1'},
{'sequence': '2',
'size': '66514',
'documentUrl': 'https://www.sec.gov/Archives/edgar/data/1538822/000110465924113497/tm2427172d1_ex99-1.htm',
'description': 'EXHIBIT 99.1',
'type': 'EX-99.1'}]
With the logic in place to locate metadata for relevant 8-K filings and extract URLs for Exhibit 99 files, the next step is to create a function, `download_metadata(start_year, end_year)`, which encapsulates all necessary steps and executes them over a specified range of years. The function returns a DataFrame with the URLs of all Exhibit 99 files from the selected 8-K filings. The results are saved to the CSV file `exhibit-99-8k-filings-metadata.csv` for further processing.
from pathlib import Path

def download_metadata(start_year=2020, end_year=2023):
    output_file = "exhibit-99-8k-filings-metadata.csv"

    # return the cached metadata if it was already downloaded
    if Path(output_file).is_file():
        result = pd.read_csv(output_file)
        return result

    print("✅ Starting download process")

    frames = []

    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            # page through each month's results in batches of 50 filings
            for from_index in range(0, 9950, 50):
                date_range_query = f"filedAt:[{year}-{month:02d}-01 TO {year}-{month:02d}-31]"
                form_type_query = 'formType:"8-K"'
                document_format_query = "documentFormatFiles.type:(99, 99*, *99, *99*)"
                items_query = 'items:("9.01" AND "2.02")'
                query = (
                    form_type_query
                    + " AND "
                    + document_format_query
                    + " AND "
                    + items_query
                    + " AND "
                    + date_range_query
                )
                search_params = {
                    "query": query,
                    "from": from_index,
                    "size": "50",
                    "sort": [{"filedAt": {"order": "desc"}}],
                }
                response = queryApi.get_filings(search_params)
                if len(response["filings"]) == 0:
                    break
                filings = pd.DataFrame.from_records(response["filings"])
                documentFormatFiles = [
                    doc
                    for sublist in list(filings["documentFormatFiles"])
                    for doc in sublist
                ]
                exhibit_99s_list = list(
                    filter(lambda doc: "99" in doc["type"], documentFormatFiles)
                )
                exhibit_99s_df = pd.DataFrame.from_records(exhibit_99s_list)
                frames.append(exhibit_99s_df)
                print(
                    "Month {year}-{month:02d}, from {from_index} completed".format(
                        year=year, month=month, from_index=from_index
                    )
                )
        print("✅ Downloaded metadata for year", year)

    result = pd.concat(frames)
    result.to_csv(output_file, index=False)

    number_metadata_downloaded = len(result)
    print(
        "✅ Download completed. Metadata downloaded for {} filings.".format(
            number_metadata_downloaded
        )
    )
    return result
exhibit_99s = download_metadata(start_year=2023, end_year=2023)
✅ Starting download process
Month 2023-01, from 0 completed
Month 2023-02, from 0 completed
Month 2023-03, from 0 completed
Month 2023-04, from 0 completed
Month 2023-05, from 0 completed
Month 2023-06, from 0 completed
Month 2023-07, from 0 completed
Month 2023-08, from 0 completed
Month 2023-09, from 0 completed
Month 2023-10, from 0 completed
Month 2023-11, from 0 completed
Month 2023-12, from 0 completed
✅ Downloaded metadata for year 2023
✅ Download completed. Metadata downloaded for 720 filings.
print('Number of Exhibit 99 URLs found for 2023:', len(exhibit_99s))
exhibit_99s
Number of Exhibit 99 URLs found for 2023: 720
  | sequence | size | documentUrl | description | type
---|---|---|---|---|---
0 | 2 | 76339 | https://www.sec.gov/Archives/edgar/data/117515... | EXHIBIT 99.1 | EX-99.1 |
1 | 2 | 310936 | https://www.sec.gov/Archives/edgar/data/130221... | EX-99.1 | EX-99.1 |
2 | 2 | 4285072 | https://www.sec.gov/Archives/edgar/data/103754... | EX-99.1 | EX-99.1 |
3 | 3 | 274620 | https://www.sec.gov/Archives/edgar/data/103754... | EX-99.2 | EX-99.2 |
4 | 2 | 4285072 | https://www.sec.gov/Archives/edgar/data/103754... | EX-99.1 | EX-99.1 |
... | ... | ... | ... | ... | ... |
54 | 3 | 24419 | https://www.sec.gov/Archives/edgar/data/122338... | EX-99.2 | EX-99.2 |
55 | 2 | 90940 | https://www.sec.gov/Archives/edgar/data/171662... | PRESS RELEASE | EX-99.1 |
56 | 4 | 29311 | https://www.sec.gov/Archives/edgar/data/730255... | EX-99.1 | EX-99.1 |
57 | 2 | 5475 | https://www.sec.gov/Archives/edgar/data/706129... | EX-99.1 | EX-99.1 |
58 | 2 | 103416 | https://www.sec.gov/Archives/edgar/data/779544... | EX-99.1 | EX-99.1 |
720 rows × 5 columns
Download Press Releases from Exhibit 99 as HTML and PDF
With all Exhibit 99 URLs collected, the next step is to download the press releases in both HTML and PDF formats. The Download API retrieves the original HTML content, while the PDF Generator API converts this HTML content into PDF format. The downloaded files are organized using the following folder structure:
exhibit-99-files/
- <cik>/
- <accessionNo>-<fileName>.htm
- <accessionNo>-<fileName>.pdf
- ...
- ...
The `download_exhibit(metadata)` function handles the folder creation for each company's CIK and downloads each exhibit into its respective folder. Files are named according to the pattern `<accessionNo>-<fileName>.htm` and `<accessionNo>-<fileName>.pdf`, where `accessionNo` and `fileName` are derived from the metadata.
from sec_api import RenderApi, PdfGeneratorApi
import os

renderApi = RenderApi(api_key)
pdfGeneratorApi = PdfGeneratorApi(api_key)

def download_exhibit(metadata):
    url = metadata["documentUrl"].replace("ix?doc=/", "")
    try:
        # the CIK and the accession number are the numeric parts of the URL path
        numeric_parts = [part for part in url.split("/") if part.isdigit()]
        cik = numeric_parts[0]
        accession_number = numeric_parts[1]
        new_folder = "./exhibit-99-files/" + cik
        os.makedirs(new_folder, exist_ok=True)
        file_content = renderApi.get_filing(url)
        file_content_pdf = pdfGeneratorApi.get_pdf(url)
        file_name = accession_number + "-" + url.split("/")[-1]
        file_name_pdf = file_name.replace(".htm", ".pdf")
        with open(new_folder + "/" + file_name, "w") as f:
            f.write(file_content)
        with open(new_folder + "/" + file_name_pdf, "wb") as f:
            f.write(file_content_pdf)
    except Exception as e:
        print(f"❌ download failed: {url} ({e})")
To parallelize the exhibit download process with `pandarallel`, configure it to use four workers, enabling concurrent downloads of four exhibits at a time. The `download_exhibit` function is then applied to each row of the `exhibit_99s` DataFrame using the `.parallel_apply` method from `pandarallel`.
!pip install -q pandarallel
from pandarallel import pandarallel
number_of_workers = 4
pandarallel.initialize(progress_bar=True, nb_workers=number_of_workers, verbose=0)
# run a quick test and download 50 exhibits
sample = exhibit_99s.head(50)
sample.parallel_apply(download_exhibit, axis=1)
# uncomment to download all exhibits
# exhibit_99s.parallel_apply(download_exhibit, axis=1)
print('✅ Download completed')
✅ Download completed