Download Financial Statements from EDGAR Filings as Excel File
On this page:
In this tutorial, we will guide you through the process of downloading the Financial_Report.xlsx
Excel files attached to 10-K and 10-Q filings using Python.
Here is an overview of the steps involved:
- Query API: We will start by using the Query API to filter the EDGAR database and build a list of URLs for the financial reports we are interested in. For this example, we will focus on a small universe of companies, including Microsoft, Amazon, Netflix, Apple, and Tesla. However, you can easily expand the search universe by adjusting the search query to include more companies or even all filings.
- Render API: Once we have the list of URLs, we will utilize the Render API to download the
Financial_Report.xlsx
files. These files will be saved in thereports
folder on our local disk. The Render API allows us to retrieve any filing and exhibit on EDGAR, including the financial reports in Excel format.
The resulting folder structure will look similar to the following:
By following this tutorial, you will be able to download the financial statements from EDGAR filings for the specified companies and store them locally as Excel files.
Overview of Financial_Report.xlsx
The Financial_Report.xlsx
Excel file contains essential tables and footnotes extracted from 10-K and 10-Q EDGAR filings. These financial statements are also available in XBRL format. The Excel file includes various sheets that provide comprehensive information, such as:
- Cover page: Contains general information about the filing.
- Income statement (also known as CONDENSED CONSOLIDATED STATEMENTS OF OPERATIONS): Presents the company's revenue, expenses, and net income or loss for a specific period.
- Balance sheet: Shows the company's assets, liabilities, and shareholders' equity at a specific point in time.
- Cash flow statement: Illustrates the company's cash inflows and outflows from operating, investing, and financing activities.
- Summary of significant accounting policies: Describes the company's accounting principles and policies.
- Footnotes to all financial statements: Provides additional information and explanations for the financial data.
- Disaggregated revenues by product and services: Breaks down the company's revenues by different product lines or services.
Here is an example of the income statement sheet from the Financial_Report.xlsx
file attached to Apple's 10-Q filing for Q1 2023. You can access the Excel file here.
You can find the corresponding 10-Q filing here.
Another example from the same file is the net sales disaggregated by products and services, as shown below:
Getting Started
To begin, we need to install the sec-api
Python package. This package provides the necessary functionalities for interacting with the Query API. Once installed, you can use the QueryApi
class from the sec-api
package to make API calls and retrieve filing metadata for 10-K and 10-Q filings.
To install the sec-api
package, you can use the following command:
!pip install -q sec-api
After installing the package, you can instantiate the QueryApi
class by providing your API key. The API key is required to authenticate your requests to the Query API. Once instantiated, you can utilize the .get_filings(search_query)
function provided by the QueryApi
class to make API calls and retrieve the desired filing metadata for 10-K and 10-Q filings, or any other filing of interest.
API_KEY = 'YOUR_API_KEY'
from sec_api import QueryApi
queryApi = QueryApi(api_key=API_KEY)
Find URLs of All Financial Reports Excel Files
In the first step, we will retrieve the metadata for 10-K and 10-Q filings, including information such as the ticker and name of the filer, filed at date, form type, URL to the filing, and exhibits.
To accomplish this, we will utilize the Query API, which allows us to search the EDGAR database using various metadata fields. By constructing a Lucene search query, we can retrieve up to 200 objects that represent the metadata for each filing. To ensure that we retrieve all matching objects and not just the first 200, we have implemented the get_filings(query)
function, which handles pagination.
The search expression is written in Lucene syntax. The Lucene queries used to build the list of URLs for the Excel files associated with each company and EDGAR filing consist of three main parts:
Form Type Filter
The form type filter searches for all 10-K and 10-Q EDGAR filings while excluding amended filings (10-K/A and 10-Q/A) as well as NT filings (notifications of late filings).
formType:("10-K", "10-Q") AND NOT formType:("10-K/A", "10-Q/A", NT)
Company Filter
The company or ticker filter matches the metadata field ticker
with one of the following tickers: MSFT, TSLA, AMZN, NFLX, or AAPL.
ticker:(MSFT, TSLA, AMZN, NFLX, AAPL)
Date Range Filter
The date range filter searches for filings filed between 2022-01-01
and 2022-12-31
, including both dates.
filedAt:[2022-01-01 TO 2022-12-31]
By combining these filters using the AND operator, we can construct the final search query.
For more information on the Lucene query syntax, you can refer to the overview of the Lucene query syntax here.
import pandas as pd
def get_filings(query):
from_param = 0
size_param = 200
all_filings = []
while True:
query['from'] = from_param
query['size'] = size_param
response = queryApi.get_filings(query)
filings = response['filings']
if len(filings) == 0:
break
all_filings.extend(filings)
from_param += size_param
return pd.json_normalize(all_filings)
form_type_query = 'formType:("10-K", "10-Q") AND NOT formType:("10-K/A", "10-Q/A", NT)'
ticker_query = 'ticker:(MSFT, TSLA, AMZN, NFLX, AAPL)'
date_range_query = 'filedAt:[2022-01-01 TO 2022-12-31]'
lucene_query = f"{form_type_query} AND {ticker_query} AND {date_range_query}"
search_query = {
"query": lucene_query,
"from": "0",
"size": "200",
"sort": [{ "filedAt": { "order": "desc" } }]
}
filings = get_filings(search_query)
filings.head()
id | accessionNo | cik | ticker | companyName | companyNameLong | formType | description | filedAt | linkToTxt | linkToHtml | linkToXbrl | linkToFilingDetails | entities | documentFormatFiles | dataFiles | seriesAndClassesContractsInformation | periodOfReport | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 83331c1958a4c0a2718d7c47eaf6157c | 0001018724-22-000023 | 1018724 | AMZN | AMAZON COM INC | AMAZON COM INC (Filer) | 10-Q | Form 10-Q - Quarterly report [Sections 13 or 1... | 2022-10-27T18:23:57-04:00 | https://www.sec.gov/Archives/edgar/data/101872... | https://www.sec.gov/Archives/edgar/data/101872... | https://www.sec.gov/Archives/edgar/data/101872... | [{'companyName': 'AMAZON COM INC (Filer)', 'ci... | [{'sequence': '1', 'description': '10-Q', 'doc... | [{'sequence': '6', 'description': 'XBRL TAXONO... | [] | 2022-09-30 | |
1 | 9b58a0b3d70dc9de06bdf3d560b3317a | 0000320193-22-000108 | 320193 | AAPL | Apple Inc. | Apple Inc. (Filer) | 10-K | Form 10-K - Annual report [Section 13 and 15(d... | 2022-10-27T18:01:14-04:00 | https://www.sec.gov/Archives/edgar/data/320193... | https://www.sec.gov/Archives/edgar/data/320193... | https://www.sec.gov/Archives/edgar/data/320193... | [{'companyName': 'Apple Inc. (Filer)', 'cik': ... | [{'sequence': '1', 'description': '10-K', 'doc... | [{'sequence': '8', 'description': 'XBRL TAXONO... | [] | 2022-09-24 | |
2 | 4a746e26f404299c8bb632a6c864b1fa | 0001564590-22-035087 | 789019 | MSFT | MICROSOFT CORP | MICROSOFT CORP (Filer) | 10-Q | Form 10-Q - Quarterly report [Sections 13 or 1... | 2022-10-25T16:08:55-04:00 | https://www.sec.gov/Archives/edgar/data/789019... | https://www.sec.gov/Archives/edgar/data/789019... | https://www.sec.gov/Archives/edgar/data/789019... | [{'companyName': 'MICROSOFT CORP (Filer)', 'ci... | [{'sequence': '1', 'description': '10-Q', 'doc... | [{'sequence': '9', 'description': 'XBRL TAXONO... | [] | 2022-09-30 | |
3 | 9b09ba89105cd748ffc3dc4958a13f37 | 0000950170-22-019867 | 1318605 | TSLA | Tesla, Inc. | Tesla, Inc. (Filer) | 10-Q | Form 10-Q - Quarterly report [Sections 13 or 1... | 2022-10-24T06:08:50-04:00 | https://www.sec.gov/Archives/edgar/data/131860... | https://www.sec.gov/Archives/edgar/data/131860... | https://www.sec.gov/Archives/edgar/data/131860... | [{'companyName': 'Tesla, Inc. (Filer)', 'cik':... | [{'sequence': '1', 'description': '10-Q', 'doc... | [{'sequence': '5', 'description': 'XBRL TAXONO... | [] | 2022-09-30 | |
4 | 8248fa5382cb23fca66d1a1653dbeb15 | 0001065280-22-000368 | 1065280 | NFLX | NETFLIX INC | NETFLIX INC (Filer) | 10-Q | Form 10-Q - Quarterly report [Sections 13 or 1... | 2022-10-20T16:04:17-04:00 | https://www.sec.gov/Archives/edgar/data/106528... | https://www.sec.gov/Archives/edgar/data/106528... | https://www.sec.gov/Archives/edgar/data/106528... | [{'companyName': 'NETFLIX INC (Filer)', 'cik':... | [{'sequence': '1', 'description': '10-Q', 'doc... | [{'sequence': '6', 'description': 'XBRL TAXONO... | [] | 2022-09-30 |
Now that you have created the complete list of filing metadata and saved it in the dataframe filings
, let's convert the metadata of all the found filings into a new dataframe called urls
. This new dataframe will contain the following information for each filing:
- Ticker
- Form type
- Filed at date
- Filing URL
- URL to the corresponding
Financial_Report.xlsx
file
urls = filings[['ticker',
'formType',
'periodOfReport',
'filedAt',
'linkToFilingDetails']].rename(columns={'linkToFilingDetails': 'filingUrl'})
urls['financialReportsUrl'] = urls['filingUrl'].apply(lambda url: '/'.join(url.split('/')[:-1]) + '/Financial_Report.xlsx')
urls.head(10)
ticker | formType | periodOfReport | filedAt | filingUrl | financialReportsUrl | |
---|---|---|---|---|---|---|
0 | AMZN | 10-Q | 2022-09-30 | 2022-10-27T18:23:57-04:00 | https://www.sec.gov/Archives/edgar/data/101872... | https://www.sec.gov/Archives/edgar/data/101872... |
1 | AAPL | 10-K | 2022-09-24 | 2022-10-27T18:01:14-04:00 | https://www.sec.gov/Archives/edgar/data/320193... | https://www.sec.gov/Archives/edgar/data/320193... |
2 | MSFT | 10-Q | 2022-09-30 | 2022-10-25T16:08:55-04:00 | https://www.sec.gov/Archives/edgar/data/789019... | https://www.sec.gov/Archives/edgar/data/789019... |
3 | TSLA | 10-Q | 2022-09-30 | 2022-10-24T06:08:50-04:00 | https://www.sec.gov/Archives/edgar/data/131860... | https://www.sec.gov/Archives/edgar/data/131860... |
4 | NFLX | 10-Q | 2022-09-30 | 2022-10-20T16:04:17-04:00 | https://www.sec.gov/Archives/edgar/data/106528... | https://www.sec.gov/Archives/edgar/data/106528... |
5 | AMZN | 10-Q | 2022-06-30 | 2022-07-28T18:24:06-04:00 | https://www.sec.gov/Archives/edgar/data/101872... | https://www.sec.gov/Archives/edgar/data/101872... |
6 | AAPL | 10-Q | 2022-06-25 | 2022-07-28T18:06:56-04:00 | https://www.sec.gov/Archives/edgar/data/320193... | https://www.sec.gov/Archives/edgar/data/320193... |
7 | MSFT | 10-K | 2022-06-30 | 2022-07-28T16:06:19-04:00 | https://www.sec.gov/Archives/edgar/data/789019... | https://www.sec.gov/Archives/edgar/data/789019... |
8 | TSLA | 10-Q | 2022-06-30 | 2022-07-25T06:07:25-04:00 | https://www.sec.gov/Archives/edgar/data/131860... | https://www.sec.gov/Archives/edgar/data/131860... |
9 | NFLX | 10-Q | 2022-06-30 | 2022-07-21T16:02:17-04:00 | https://www.sec.gov/Archives/edgar/data/106528... | https://www.sec.gov/Archives/edgar/data/106528... |
Download Financial_reports.xlsx
of All EDGAR Filings
The Render API provides a convenient way to download any EDGAR filing or exhibit, allowing for up to 40 downloads per second. It offers a straightforward interface to facilitate the download process.
For instance, if we consider the original URL of Apple's Financial_Report.xlsx
file:
sec.gov/Archives/edgar/data/320193/000032019323000064/Financial_Report.xlsx
To download the file using the Render API, we can make a simple modification to the URL:
archive.sec-api.io/320193/000032019323000064/Financial_Report.xlsx
In essence, we replace sec.gov/Archives/edgar/data
with archive.sec-api.io
in the URL.
To download the file and save it to our local disk, we can utilize the requests.get(URL)
function. The download_report(filing)
function serves as a convenient wrapper that encapsulates this entire logic.
import os, requests
folder_path = './reports'
if not os.path.exists(folder_path):
os.makedirs(folder_path)
def download_report(filing):
reports_path = filing['financialReportsUrl'].replace('https://www.sec.gov/Archives/edgar/data/', '')
base_url = 'https://archive.sec-api.io/' + reports_path
render_api_url = base_url + '?token=' + API_KEY
response = requests.get(render_api_url)
file_name = f"{filing['ticker']}-{filing['periodOfReport']}-{filing['formType']}.xlsx"
file_path = f"{folder_path}/{file_name}"
output = open(file_path, 'wb')
output.write(response.content)
output.close()
Python typically executes commands in a blocking manner, allowing only one download at a time. Consequently, downloading thousands of financial statements would be a time-consuming process.
However, you can optimize the task by parallelizing it using the pandarallel
package. This package offers a convenient way to parallelize operations on dataframes, reducing the need for additional boilerplate code.
To illustrate the concept, let's download four reports at a time in parallel by setting the number_of_workers
parameter to 4
.
!pip install -q pandarallel
from pandarallel import pandarallel
number_of_workers = 4
pandarallel.initialize(progress_bar=True, nb_workers=number_of_workers, verbose=0)
urls.parallel_apply(download_report, axis=1)
print(f"✅ Downloaded {len(urls)} reports")
VBox(children=(HBox(children=(IntProgress(value=0, description='0.00%', max=5), Label(value='0 / 5'))), HBox(c…
✅ Downloaded 20 reports