Download Financial Statements from EDGAR Filings as Excel File

On this page:

Overview of Financial_Report.xlsx
Getting Started
Find URLs of All Financial Reports Excel Files
Download Financial_reports.xlsx of All EDGAR Filings

In this tutorial, we will guide you through the process of downloading the Financial_Report.xlsx Excel files attached to 10-K and 10-Q filings using Python.

Here is an overview of the steps involved:

Query API: We will start by using the Query API to filter the EDGAR database and build a list of URLs for the financial reports we are interested in. For this example, we will focus on a small universe of companies, including Microsoft, Amazon, Netflix, Apple, and Tesla. However, you can easily expand the search universe by adjusting the search query to include more companies or even all filings.
Render API: Once we have the list of URLs, we will utilize the Render API to download the Financial_Report.xlsx files. These files will be saved in the reports folder on our local disk. The Render API allows us to retrieve any filing and exhibit on EDGAR, including the financial reports in Excel format.

The resulting folder structure will look similar to the following:

Financial Reports Folder Structure

By following this tutorial, you will be able to download the financial statements from EDGAR filings for the specified companies and store them locally as Excel files.

Overview of `Financial_Report.xlsx`

The Financial_Report.xlsx Excel file contains essential tables and footnotes extracted from 10-K and 10-Q EDGAR filings. These financial statements are also available in XBRL format. The Excel file includes various sheets that provide comprehensive information, such as:

Cover page: Contains general information about the filing.
Income statement (also known as CONDENSED CONSOLIDATED STATEMENTS OF OPERATIONS): Presents the company's revenue, expenses, and net income or loss for a specific period.
Balance sheet: Shows the company's assets, liabilities, and shareholders' equity at a specific point in time.
Cash flow statement: Illustrates the company's cash inflows and outflows from operating, investing, and financing activities.
Summary of significant accounting policies: Describes the company's accounting principles and policies.
Footnotes to all financial statements: Provides additional information and explanations for the financial data.
Disaggregated revenues by product and services: Breaks down the company's revenues by different product lines or services.

Here is an example of the income statement sheet from the Financial_Report.xlsx file attached to Apple's 10-Q filing for Q1 2023. You can access the Excel file here.

Example Excel File with Financial Statements

You can find the corresponding 10-Q filing here.

Another example from the same file is the net sales disaggregated by products and services, as shown below:

Disaggregated Revenues

Getting Started

To begin, we need to install the sec-api Python package. This package provides the necessary functionalities for interacting with the Query API. Once installed, you can use the QueryApi class from the sec-api package to make API calls and retrieve filing metadata for 10-K and 10-Q filings.

To install the sec-api package, you can use the following command:

!pip install -q sec-api

After installing the package, you can instantiate the QueryApi class by providing your API key. The API key is required to authenticate your requests to the Query API. Once instantiated, you can utilize the .get_filings(search_query) function provided by the QueryApi class to make API calls and retrieve the desired filing metadata for 10-K and 10-Q filings, or any other filing of interest.

API_KEY = 'YOUR_API_KEY'

from sec_api import QueryApi

queryApi = QueryApi(api_key=API_KEY)

Find URLs of All Financial Reports Excel Files

In the first step, we will retrieve the metadata for 10-K and 10-Q filings, including information such as the ticker and name of the filer, filed at date, form type, URL to the filing, and exhibits.

To accomplish this, we will utilize the Query API, which allows us to search the EDGAR database using various metadata fields. By constructing a Lucene search query, we can retrieve up to 200 objects that represent the metadata for each filing. To ensure that we retrieve all matching objects and not just the first 200, we have implemented the get_filings(query) function, which handles pagination.

The search expression is written in Lucene syntax. The Lucene queries used to build the list of URLs for the Excel files associated with each company and EDGAR filing consist of three main parts:

Form Type Filter

The form type filter searches for all 10-K and 10-Q EDGAR filings while excluding amended filings (10-K/A and 10-Q/A) as well as NT filings (notifications of late filings).

formType:("10-K", "10-Q") AND NOT formType:("10-K/A", "10-Q/A", NT)

Company Filter

The company or ticker filter matches the metadata field ticker with one of the following tickers: MSFT, TSLA, AMZN, NFLX, or AAPL.

ticker:(MSFT, TSLA, AMZN, NFLX, AAPL)

Date Range Filter

The date range filter searches for filings filed between 2022-01-01 and 2022-12-31, including both dates.

filedAt:[2022-01-01 TO 2022-12-31]

By combining these filters using the AND operator, we can construct the final search query.

For more information on the Lucene query syntax, you can refer to the overview of the Lucene query syntax here.

import pandas as pd

def get_filings(query):
  from_param = 0
  size_param = 200
  all_filings = []

  while True:
    query['from'] = from_param
    query['size'] = size_param

    response = queryApi.get_filings(query)
    filings = response['filings']

    if len(filings) == 0:
      break

    all_filings.extend(filings)

    from_param += size_param

  return pd.json_normalize(all_filings)


form_type_query  = 'formType:("10-K", "10-Q") AND NOT formType:("10-K/A", "10-Q/A", NT)'
ticker_query     = 'ticker:(MSFT, TSLA, AMZN, NFLX, AAPL)'
date_range_query = 'filedAt:[2022-01-01 TO 2022-12-31]'

lucene_query = f"{form_type_query} AND {ticker_query} AND {date_range_query}"

search_query = {
  "query": lucene_query,
  "from": "0",
  "size": "200",
  "sort": [{ "filedAt": { "order": "desc" } }]
}

filings = get_filings(search_query)

filings.head()

Out:

	id	accessionNo	cik	ticker	companyName	companyNameLong	formType	description	filedAt	linkToTxt	linkToHtml	linkToFilingDetails	entities	documentFormatFiles	dataFiles	seriesAndClassesContractsInformation	periodOfReport
0	83331c1958a4c0a2718d7c47eaf6157c	0001018724-22-000023	1018724	AMZN	AMAZON COM INC	AMAZON COM INC (Filer)	10-Q	Form 10-Q - Quarterly report [Sections 13 or 1...	2022-10-27T18:23:57-04:00	https://www.sec.gov/Archives/edgar/data/101872...	https://www.sec.gov/Archives/edgar/data/101872...	https://www.sec.gov/Archives/edgar/data/101872...	[{'companyName': 'AMAZON COM INC (Filer)', 'ci...	[{'sequence': '1', 'description': '10-Q', 'doc...	[{'sequence': '6', 'description': 'XBRL TAXONO...	[]	2022-09-30
1	9b58a0b3d70dc9de06bdf3d560b3317a	0000320193-22-000108	320193	AAPL	Apple Inc.	Apple Inc. (Filer)	10-K	Form 10-K - Annual report [Section 13 and 15(d...	2022-10-27T18:01:14-04:00	https://www.sec.gov/Archives/edgar/data/320193...	https://www.sec.gov/Archives/edgar/data/320193...	https://www.sec.gov/Archives/edgar/data/320193...	[{'companyName': 'Apple Inc. (Filer)', 'cik': ...	[{'sequence': '1', 'description': '10-K', 'doc...	[{'sequence': '8', 'description': 'XBRL TAXONO...	[]	2022-09-24
2	4a746e26f404299c8bb632a6c864b1fa	0001564590-22-035087	789019	MSFT	MICROSOFT CORP	MICROSOFT CORP (Filer)	10-Q	Form 10-Q - Quarterly report [Sections 13 or 1...	2022-10-25T16:08:55-04:00	https://www.sec.gov/Archives/edgar/data/789019...	https://www.sec.gov/Archives/edgar/data/789019...	https://www.sec.gov/Archives/edgar/data/789019...	[{'companyName': 'MICROSOFT CORP (Filer)', 'ci...	[{'sequence': '1', 'description': '10-Q', 'doc...	[{'sequence': '9', 'description': 'XBRL TAXONO...	[]	2022-09-30
3	9b09ba89105cd748ffc3dc4958a13f37	0000950170-22-019867	1318605	TSLA	Tesla, Inc.	Tesla, Inc. (Filer)	10-Q	Form 10-Q - Quarterly report [Sections 13 or 1...	2022-10-24T06:08:50-04:00	https://www.sec.gov/Archives/edgar/data/131860...	https://www.sec.gov/Archives/edgar/data/131860...	https://www.sec.gov/Archives/edgar/data/131860...	[{'companyName': 'Tesla, Inc. (Filer)', 'cik':...	[{'sequence': '1', 'description': '10-Q', 'doc...	[{'sequence': '5', 'description': 'XBRL TAXONO...	[]	2022-09-30
4	8248fa5382cb23fca66d1a1653dbeb15	0001065280-22-000368	1065280	NFLX	NETFLIX INC	NETFLIX INC (Filer)	10-Q	Form 10-Q - Quarterly report [Sections 13 or 1...	2022-10-20T16:04:17-04:00	https://www.sec.gov/Archives/edgar/data/106528...	https://www.sec.gov/Archives/edgar/data/106528...	https://www.sec.gov/Archives/edgar/data/106528...	[{'companyName': 'NETFLIX INC (Filer)', 'cik':...	[{'sequence': '1', 'description': '10-Q', 'doc...	[{'sequence': '6', 'description': 'XBRL TAXONO...	[]	2022-09-30

Now that you have created the complete list of filing metadata and saved it in the dataframe filings, let's convert the metadata of all the found filings into a new dataframe called urls. This new dataframe will contain the following information for each filing:

Ticker
Form type
Filed at date
Filing URL
URL to the corresponding Financial_Report.xlsx file

urls = filings[['ticker',
                'formType',
                'periodOfReport',
                'filedAt',
                'linkToFilingDetails']].rename(columns={'linkToFilingDetails': 'filingUrl'})

urls['financialReportsUrl'] = urls['filingUrl'].apply(lambda url: '/'.join(url.split('/')[:-1]) + '/Financial_Report.xlsx')

urls.head(10)

Out:

	ticker	formType	periodOfReport	filedAt	filingUrl	financialReportsUrl
0	AMZN	10-Q	2022-09-30	2022-10-27T18:23:57-04:00	https://www.sec.gov/Archives/edgar/data/101872...	https://www.sec.gov/Archives/edgar/data/101872...
1	AAPL	10-K	2022-09-24	2022-10-27T18:01:14-04:00	https://www.sec.gov/Archives/edgar/data/320193...	https://www.sec.gov/Archives/edgar/data/320193...
2	MSFT	10-Q	2022-09-30	2022-10-25T16:08:55-04:00	https://www.sec.gov/Archives/edgar/data/789019...	https://www.sec.gov/Archives/edgar/data/789019...
3	TSLA	10-Q	2022-09-30	2022-10-24T06:08:50-04:00	https://www.sec.gov/Archives/edgar/data/131860...	https://www.sec.gov/Archives/edgar/data/131860...
4	NFLX	10-Q	2022-09-30	2022-10-20T16:04:17-04:00	https://www.sec.gov/Archives/edgar/data/106528...	https://www.sec.gov/Archives/edgar/data/106528...
5	AMZN	10-Q	2022-06-30	2022-07-28T18:24:06-04:00	https://www.sec.gov/Archives/edgar/data/101872...	https://www.sec.gov/Archives/edgar/data/101872...
6	AAPL	10-Q	2022-06-25	2022-07-28T18:06:56-04:00	https://www.sec.gov/Archives/edgar/data/320193...	https://www.sec.gov/Archives/edgar/data/320193...
7	MSFT	10-K	2022-06-30	2022-07-28T16:06:19-04:00	https://www.sec.gov/Archives/edgar/data/789019...	https://www.sec.gov/Archives/edgar/data/789019...
8	TSLA	10-Q	2022-06-30	2022-07-25T06:07:25-04:00	https://www.sec.gov/Archives/edgar/data/131860...	https://www.sec.gov/Archives/edgar/data/131860...
9	NFLX	10-Q	2022-06-30	2022-07-21T16:02:17-04:00	https://www.sec.gov/Archives/edgar/data/106528...	https://www.sec.gov/Archives/edgar/data/106528...

Download `Financial_reports.xlsx` of All EDGAR Filings

The Render API provides a convenient way to download any EDGAR filing or exhibit, allowing for up to 40 downloads per second. It offers a straightforward interface to facilitate the download process.

For instance, if we consider the original URL of Apple's Financial_Report.xlsx file:

sec.gov/Archives/edgar/data/320193/000032019323000064/Financial_Report.xlsx

To download the file using the Render API, we can make a simple modification to the URL:

archive.sec-api.io/320193/000032019323000064/Financial_Report.xlsx

In essence, we replace sec.gov/Archives/edgar/data with archive.sec-api.io in the URL.

To download the file and save it to our local disk, we can utilize the requests.get(URL) function. The download_report(filing) function serves as a convenient wrapper that encapsulates this entire logic.

import os, requests

folder_path = './reports'

if not os.path.exists(folder_path):
  os.makedirs(folder_path)


def download_report(filing):
  reports_path = filing['financialReportsUrl'].replace('https://www.sec.gov/Archives/edgar/data/', '')
  base_url = 'https://archive.sec-api.io/' + reports_path
  render_api_url = base_url + '?token=' + API_KEY

  response = requests.get(render_api_url)

  file_name = f"{filing['ticker']}-{filing['periodOfReport']}-{filing['formType']}.xlsx"
  file_path = f"{folder_path}/{file_name}"

  output = open(file_path, 'wb')
  output.write(response.content)
  output.close()

Python typically executes commands in a blocking manner, allowing only one download at a time. Consequently, downloading thousands of financial statements would be a time-consuming process.

However, you can optimize the task by parallelizing it using the pandarallel package. This package offers a convenient way to parallelize operations on dataframes, reducing the need for additional boilerplate code.

To illustrate the concept, let's download four reports at a time in parallel by setting the number_of_workers parameter to 4.

!pip install -q pandarallel

from pandarallel import pandarallel

number_of_workers = 4
pandarallel.initialize(progress_bar=True, nb_workers=number_of_workers, verbose=0)

urls.parallel_apply(download_report, axis=1)
print(f"✅ Downloaded {len(urls)} reports")

VBox(children=(HBox(children=(IntProgress(value=0, description='0.00%', max=5), Label(value='0 / 5'))), HBox(c…

✅ Downloaded 20 reports

Download Financial Statements from EDGAR Filings as Excel File

Overview of Financial_Report.xlsx

Getting Started

Find URLs of All Financial Reports Excel Files

Download Financial_reports.xlsx of All EDGAR Filings

Overview of `Financial_Report.xlsx`

Download `Financial_reports.xlsx` of All EDGAR Filings