
Download SEC Filings from EDGAR

Open In Colab   Download Notebook

On this page:

  • Getting Started
  • Create a List of URLs for All EDGAR Filings
  • Define the Filing Search Query
  • Response of the Query API
  • Create a List of 10-K URLs on EDGAR
  • Create a List of 10-Q URLs on EDGAR
  • Download EDGAR Filings to Disk
  • Download EDGAR Filings as PDFs

                This Python tutorial will guide you through the process of downloading SEC filings from the EDGAR database and saving them to your local disk. By following this tutorial, you will be able to download filings without being blocked by sec.gov, as we will utilize the Query API and Render API provided by sec-api.io. No prior Python experience is required, and you can run the example in your browser using the "Open in Colab" button.

                The tutorial consists of two main steps:

1. Building a List of HTML Filing URLs: Using the Query API, you will search and filter the EDGAR database to create a list of URLs for all matching HTML filings and save this list to your local disk.
2. Downloading the Filings: Using the Render API, you will download the filings and save them to the filings folder on your local disk. Because the filings are served directly from sec-api.io servers, you can download up to 40 filings in parallel without being blocked by sec.gov. We will also cover how to download all filings as PDF files.

                The tutorial focuses on downloading 10-K filings filed between 2020 and 2022, but you can adjust the search criteria and date range as needed. Additionally, the examples provided can be adapted to download other form types, filing exhibits, and XBRL files.

                Following is an example of the folder structure for the downloaded filings:

[Image: 10-K folder structure]

                Please note that downloading and parsing master.idx index files is not required for this tutorial.

                Let's get started with downloading SEC filings from EDGAR!

                Getting Started

                To begin, we need to install the sec-api Python package, which will enable us to utilize the Query API and Render API for accessing and downloading SEC filings from the EDGAR database.

                The Query API allows us to filter the EDGAR database using different search criteria, such as form types, filing dates, tickers, and more. On the other hand, the Render API enables us to download any EDGAR filing or exhibit at a speed of up to 40 downloads per second.

                !pip install -q sec-api
                API_KEY = 'YOUR_API_KEY'

                Create a List of URLs for All EDGAR Filings

                To obtain the URLs of all EDGAR filings that match our search criteria, we will utilize the Query API. This API allows us to search and filter all filings on the EDGAR database filed since 1994 using various parameters, such as form types, filing dates, tickers, and more. By defining a search query, we can retrieve the metadata of all filings that meet our specified criteria.

                Please refer to the full documentation of the Query API to learn more about all available search parameters.

                Define the Filing Search Query

                Defining a search query is straightforward. For example, to retrieve all 10-K filings, we can use the following query:

                formType:"10-K"

                It's important to note that this search query will also include "10-K/A" (amended 10-K) and "NT 10-K" (notification of inability to timely file Form 10-K) filings. If you want to exclude these types, you can modify the query as follows:

                formType:"10-K" AND NOT formType:("10-K/A", NT)

                Additionally, you can narrow down the search by specifying a filing date range. For example, to search for filings filed between January 1, 2020, and December 31, 2020, you can use the following search term:

                filedAt:[2020-01-01 TO 2020-12-31]

                The search query follows the Lucene syntax. For more information on building complex search terms and using Lucene, refer to the Lucene Query Syntax Overview documentation.
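
Both filters can be combined with AND into a single search term. For example, to retrieve all original 10-K filings filed in 2020:

formType:"10-K" AND NOT formType:("10-K/A", NT) AND filedAt:[2020-01-01 TO 2020-12-31]

This is the same combination of filters that the code further below builds programmatically, month by month.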

                Response of the Query API

                The Query API will return the metadata of all filings that match our search query. Each filing's metadata contains various information, including:

                • formType: The EDGAR form type (e.g., 10-K, 10-Q).
• cik: The Central Index Key (CIK) of the filer, with leading zeros removed.
                • companyName: The name of the filer.
                • linkToFilingDetails: The URL to the HTML version of the filing on EDGAR.
                • linkToHtml: The URL to the index page of the filing, which lists all attachments, exhibits, XBRL files, images, and more.
                • filedAt: The date and time when the filing was accepted by the EDGAR system.
                • periodOfReport: The reporting period covered by the filing.
                • documentFormatFiles: An array of primary files associated with the filing, including the filing itself and additional exhibits or documents.
                • dataFiles: A list of data files attached to the filing, such as XBRL files.

                These are just a few key parameters of the metadata. For a comprehensive list of available response parameters per filing, refer to the Query API documentation.
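
To make this concrete, here is a minimal sketch that sends the combined search term from the previous section to the Query API and prints a few of these metadata fields for the first matching filing. It assumes the sec-api package is installed and API_KEY is set, as shown in the Getting Started section; the full, paginated version follows below.

from sec_api import QueryApi

queryApi = QueryApi(api_key=API_KEY)

# minimal sketch: fetch the first page of matching filings and
# print a few metadata fields of the first result
query = {
  "query": 'formType:"10-K" AND NOT formType:("10-K/A", NT) AND filedAt:[2020-01-01 TO 2020-12-31]',
  "from": 0,
  "size": 10,
  "sort": [{ "filedAt": { "order": "desc" } }]
}

response = queryApi.get_filings(query)
first_filing = response['filings'][0]

print(first_filing['formType'])             # e.g. "10-K"
print(first_filing['companyName'])
print(first_filing['filedAt'])
print(first_filing['linkToFilingDetails'])  # URL of the HTML filing on EDGAR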

                Create a List of 10-K URLs on EDGAR

                To create a comprehensive list of URLs for all 10-K filings on EDGAR, we need to handle pagination, use date range filters, and iterate month by month to avoid hitting the maximum response limit of 10,000 filings per search universe.

                The code provided demonstrates how to achieve this. Here are the key components explained:

                1. The get_10K_metadata() function takes start_year and end_year parameters to define the range of years for which we want to retrieve the metadata.

                2. Within the get_10K_metadata() function, a nested loop iterates over the specified years and months. For each month, a Lucene query is constructed using the date_range_filter and form_type_filter variables. The date_range_filter ensures that only filings within the specified month are included, while the form_type_filter excludes amended filings (10-K/A) and notifications (NT).

                3. The query_from and query_size variables are initialized to handle pagination. The query_from parameter represents the offset or starting position in the search results, and the query_size parameter determines the number of filings to retrieve per request.

                4. The while True loop ensures that all filings are fetched by incrementing the query_from value and retrieving the next set of filings until no more matches are returned.

5. The metadata of each filing is extracted and stored in a dataframe. The standardize_filing_url() function removes the ix?doc=/ part from URLs that point to the inline XBRL (iXBRL) viewer, so that each URL links to the original HTML filing instead.

                6. The resulting dataframe is appended to the frames list, and the number of downloaded objects is tracked.

                7. After iterating through all specified years and months, the frames are concatenated into a single dataframe called result. Any entries without a ticker symbol are removed.

from sec_api import QueryApi
import pandas as pd

queryApi = QueryApi(api_key=API_KEY)


def standardize_filing_url(url):
  # remove the iXBRL viewer prefix so the URL points to the original HTML filing
  return url.replace('ix?doc=/', '')


                def get_10K_metadata(start_year = 2021, end_year = 2022):
                  frames = []

                  for year in range(start_year, end_year + 1):
                    number_of_objects_downloaded = 0

                    for month in range(1, 13):
                      padded_month = str(month).zfill(2) # "1" -> "01"
                      date_range_filter = f'filedAt:[{year}-{padded_month}-01 TO {year}-{padded_month}-31]'
                      form_type_filter = f'formType:"10-K" AND NOT formType:("10-K/A", NT)'
                      lucene_query = date_range_filter + ' AND ' + form_type_filter

                      query_from = 0
                      query_size = 200

                      while True:
                        query = {
                          "query": lucene_query,
                          "from": query_from,
                          "size": query_size,
                          "sort": [{ "filedAt": { "order": "desc" } }]
                        }

                        response = queryApi.get_filings(query)
                        filings = response['filings']

                        if len(filings) == 0:
                          break
                        else:
                          query_from += query_size

                        metadata = list(map(lambda f: {'ticker': f['ticker'],
                                                       'cik': f['cik'],
                                                       'formType': f['formType'],
                                                       'filedAt': f['filedAt'],
                                                       'filingUrl': f['linkToFilingDetails']
                                                      }, filings))

                        df = pd.DataFrame.from_records(metadata)
                        # remove all entries without a ticker symbol
                        df = df[df['ticker'].str.len() > 0]
                        df['filingUrl'] = df['filingUrl'].apply(standardize_filing_url)
                        frames.append(df)
                        number_of_objects_downloaded += len(df)

                    print(f'✅ Downloaded {number_of_objects_downloaded} metadata objects for year {year}')

                  result = pd.concat(frames)

                  print(f'✅ Download completed. Metadata downloaded for {len(result)} filings.')

                  return result
                metadata_10K = get_10K_metadata(start_year=2020, end_year=2022)
                ✅ Downloaded 5019 metadata objects for year 2020
                ✅ Downloaded 5890 metadata objects for year 2021
                ✅ Downloaded 6454 metadata objects for year 2022
                ✅ Download completed. Metadata downloaded for 17363 filings.
                metadata_10K
                Out:
    ticker      cik formType                    filedAt                                          filingUrl
0     DOMH    12239     10-K  2020-01-31T18:42:32-05:00  https://www.sec.gov/Archives/edgar/data/12239/...
1     SCRH   831489     10-K  2020-01-31T17:25:50-05:00  https://www.sec.gov/Archives/edgar/data/831489...
2     EBAY  1065088     10-K  2020-01-31T16:53:51-05:00  https://www.sec.gov/Archives/edgar/data/106508...
4       BA    12927     10-K  2020-01-31T13:23:40-05:00  https://www.sec.gov/Archives/edgar/data/12927/...
5     NOBH    72205     10-K  2020-01-31T11:54:47-05:00  https://www.sec.gov/Archives/edgar/data/72205/...
..     ...      ...      ...                        ...                                                ...
154    TGL  1905956     10-K  2022-12-05T16:38:57-05:00  https://www.sec.gov/Archives/edgar/data/190595...
155   DLHC   785557     10-K  2022-12-05T16:16:18-05:00  https://www.sec.gov/Archives/edgar/data/785557...
156   VERU   863894     10-K  2022-12-05T15:23:56-05:00  https://www.sec.gov/Archives/edgar/data/863894...
157   MCLE  1827855     10-K  2022-12-02T16:27:58-05:00  https://www.sec.gov/Archives/edgar/data/182785...
159   RGCO  1069533     10-K  2022-12-02T14:47:39-05:00  https://www.sec.gov/Archives/edgar/data/106953...

                17363 rows × 5 columns

                You can save the entire list of URLs of all 10-K filings to a CSV file named metadata_10K.csv using the following command:

                metadata_10K.to_csv('metadata_10K.csv', index=False)
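
If you return to the tutorial in a later session, you can reload the saved metadata from the CSV file instead of re-running the Query API requests, for example:

# reload the previously saved metadata instead of querying the Query API again
metadata_10K = pd.read_csv('metadata_10K.csv')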

                Let's inspect the downloaded metadata by displaying all 10-K filings filed by Apple. We expect to see three filings, and we can verify this by executing the following code:

                metadata_10K[metadata_10K['ticker'] == 'AAPL']
                Out:
   ticker     cik formType                    filedAt                                          filingUrl
4    AAPL  320193     10-K  2020-10-29T18:06:25-04:00  https://www.sec.gov/Archives/edgar/data/320193...
12   AAPL  320193     10-K  2021-10-28T18:04:28-04:00  https://www.sec.gov/Archives/edgar/data/320193...
28   AAPL  320193     10-K  2022-10-27T18:01:14-04:00  https://www.sec.gov/Archives/edgar/data/320193...

                Create a List of 10-Q URLs on EDGAR

                To create a list of URLs for all 10-Q filings on EDGAR, you can update the form_type_filter in the get_10K_metadata(start_year, end_year) function to include the desired form type. The resulting Lucene search query would look like this:

                formType:"10-Q" AND NOT formType:("10-Q/A", NT)

                You can rename the get_10K_metadata function to get_10Q_metadata to reflect the change in form type. The rest of the function remains the same.

                def get_10Q_metadata(start_year = 2021, end_year = 2022):
                  frames = []

                  for year in range(start_year, end_year + 1):
                    number_of_objects_downloaded = 0

                    for month in range(1, 13):
                      padded_month = str(month).zfill(2) # "1" -> "01"
                      date_range_filter = f'filedAt:[{year}-{padded_month}-01 TO {year}-{padded_month}-31]'
                      form_type_filter = f'formType:"10-Q" AND NOT formType:("10-Q/A", NT)'
                      lucene_query = date_range_filter + ' AND ' + form_type_filter

                      query_from = 0
                      query_size = 200

                      while True:
                        query = {
                          "query": lucene_query,
                          "from": query_from,
                          "size": query_size,
                          "sort": [{ "filedAt": { "order": "desc" } }]
                        }

                        response = queryApi.get_filings(query)
                        filings = response['filings']

                        if len(filings) == 0:
                          break
                        else:
                          query_from += query_size

                        metadata = list(map(lambda f: {'ticker': f['ticker'],
                                                       'cik': f['cik'],
                                                       'formType': f['formType'],
                                                       'filedAt': f['filedAt'],
                                                       'filingUrl': f['linkToFilingDetails']
                                                      }, filings))

                        df = pd.DataFrame.from_records(metadata)
                        # remove all entries without a ticker symbol
                        df = df[df['ticker'].str.len() > 0]
                        df['filingUrl'] = df['filingUrl'].apply(standardize_filing_url)
                        frames.append(df)
                        number_of_objects_downloaded += len(df)

                    print(f'✅ Downloaded {number_of_objects_downloaded} metadata objects for year {year}')

                  result = pd.concat(frames)

                  print(f'✅ Download completed. Metadata downloaded for {len(result)} filings.')

                  return result
                metadata_10Q = get_10Q_metadata(start_year=2020, end_year=2020)
                ✅ Downloaded 15638 metadata objects for year 2020
                ✅ Download completed. Metadata downloaded for 15638 filings.
                metadata_10Q
                Out:
    ticker      cik formType                    filedAt                                          filingUrl
1     SOBR  1425627     10-Q  2020-01-31T17:38:31-05:00  https://www.sec.gov/Archives/edgar/data/142562...
2     BTTR  1471727     10-Q  2020-01-31T17:19:14-05:00  https://www.sec.gov/Archives/edgar/data/147172...
3     KOSS    56701     10-Q  2020-01-31T16:37:01-05:00  https://www.sec.gov/Archives/edgar/data/56701/...
4     FLEX   866374     10-Q  2020-01-31T16:24:59-05:00  https://www.sec.gov/Archives/edgar/data/866374...
5     CVCO   278166     10-Q  2020-01-31T16:21:17-05:00  https://www.sec.gov/Archives/edgar/data/278166...
..     ...      ...      ...                        ...                                                ...
181   HOME  1646228     10-Q  2020-12-02T06:29:18-05:00  https://www.sec.gov/Archives/edgar/data/164622...
182   KDCE  1049011     10-Q  2020-12-01T19:18:45-05:00  https://www.sec.gov/Archives/edgar/data/104901...
183   NIHK  1084475     10-Q  2020-12-01T14:09:58-05:00  https://www.sec.gov/Archives/edgar/data/108447...
184    TJX   109198     10-Q  2020-12-01T11:19:23-05:00  https://www.sec.gov/Archives/edgar/data/109198...
185   GSGG  1668523     10-Q  2020-12-01T11:11:48-05:00  https://www.sec.gov/Archives/edgar/data/166852...

                15638 rows × 5 columns

                Download EDGAR Filings to Disk

                In this final step, we will create a download_filing(metadata) function that uses the get_filing(filing_url) method from the RenderApi class to download the content of the filings. Each filing will be saved in a folder named after the corresponding ticker. The file name will include the filing date, form type, and the original name of the file on EDGAR.

The resulting folder structure will look like this:

[Image: 10-K folder structure]

                To speed up the download process, we will utilize the pandarallel package, which allows us to apply the download_filing function in parallel to multiple rows. By specifying the number_of_workers, we can control the number of workers running in parallel. It's important to note that setting a high number of workers may lead to rate limit issues with the Render API, so it's recommended to choose a reasonable value.

                from sec_api import RenderApi

                renderApi = RenderApi(api_key=API_KEY)
                import os

def download_filing(metadata):
  ticker = metadata['ticker']
  url = metadata['filingUrl']

  try:
    # save each filing to ./filings/<ticker>/<date>_<formType>_<original file name>
    new_folder = './filings/' + ticker
    date = metadata['filedAt'][:10]
    file_name = date + '_' + metadata['formType'] + '_' + url.split('/')[-1]

    if not os.path.isdir(new_folder):
      os.makedirs(new_folder)

    # download the filing content via the Render API
    file_content = renderApi.get_filing(url)

    with open(new_folder + "/" + file_name, "w") as f:
      f.write(file_content)
  except Exception as e:
    print(f"❌ {ticker}: download failed: {url} ({e})")
                download_filing(metadata_10K.iloc[0])
                print('✅ Sample 10-K filing downloaded for {}'.format(metadata_10K.iloc[0]['ticker']))
                ✅ Sample 10-K filing downloaded for DOMH
                !pip install -q pandarallel
                from pandarallel import pandarallel

                number_of_workers = 4
                pandarallel.initialize(progress_bar=True, nb_workers=number_of_workers, verbose=0)
# run a quick sample and download the first 500 filings
                sample = metadata_10K.sort_values('ticker').head(500)
                sample.parallel_apply(download_filing, axis=1)

                # download all filings
                # metadata_10K.parallel_apply(download_filing, axis=1)

                print('✅ Download completed')
                ✅ Download completed
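
As a quick sanity check (this helper is not part of the original notebook), you can count the files that were written to the filings folder:

import os

# count all files written to the filings folder (illustrative check only)
downloaded_files = [
    os.path.join(root, name)
    for root, _, files in os.walk('./filings')
    for name in files
]
print(f'{len(downloaded_files)} filings saved to disk')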

                Download EDGAR Filings as PDFs

                In this section, we demonstrate how to download the 10-K filings as PDF files using the Render API, which can convert HTML filings into PDF versions.

                The code uses the requests library to make a GET request to the PDF Generator API (PDF_GENERATOR_API) with the appropriate parameters. The API converts the HTML filing into a PDF file, which is then streamed and saved to the specified folder location.

                Note that response.raise_for_status() is used to check the status of the API response and raise an exception if an error occurs during the request. This helps handle any potential errors during the download process.

                The code example demonstrates downloading a sample of 10 filings in parallel. You can adjust the number of filings to download by modifying the sample2 dataframe.

                import requests

                PDF_GENERATOR_API = 'https://api.sec-api.io/filing-reader'

def download_pdf(metadata):
  ticker = metadata['ticker']
  filing_url = metadata['filingUrl']

  try:
    new_folder = './filings/' + ticker
    date = metadata['filedAt'][:10]
    file_name = date + '_' + metadata['formType'] + '_' + filing_url.split('/')[-1] + '.pdf'

    if not os.path.isdir(new_folder):
      os.makedirs(new_folder)

    # request the PDF version of the filing from the PDF Generator API and stream it to disk
    api_url = f"{PDF_GENERATOR_API}?token={API_KEY}&type=pdf&url={filing_url}"
    response = requests.get(api_url, stream=True)
    response.raise_for_status()

    with open(new_folder + "/" + file_name, "wb") as file:
      for chunk in response.iter_content(chunk_size=8192):
        file.write(chunk)
  except Exception as e:
    print(f"❌ {ticker}: download failed: {filing_url} ({e})")
                sample2 = metadata_10K.sort_values('ticker').head(10)
                sample2.parallel_apply(download_pdf, axis=1)

                # download all filings as PDFs
                # metadata_10K.parallel_apply(download_pdf, axis=1)

                print('✅ Download completed')
                ✅ Download completed
