EDGAR Index APIs
The EDGAR Index APIs provide access to SEC EDGAR filings and their attached files (exhibits, images, headers, etc.) from 1994 to present, and include the Ingestion Logs API, File Index Archive APIs, and Tar Archive APIs.
The APIs enable incremental synchronization, historical backfills, and full-content bulk downloads.
{
  "lastUpdatedAt": "2026-01-16T21:54:18-05:00",
  "total": { "value": 2709, "relation": "eq" },
  "data": [
    {
      "accessionNo": "0001214659-26-000611",
      "formType": "4",
      "filedAt": "2026-01-16T21:53:50-05:00"
    },
    // ... more entries
    {
      "accessionNo": "0000000000-25-005071",
      "formType": "LETTER",
      "filedAt": "2025-05-13T11:36:12-04:00"
    }
    // ... more entries
  ]
}
Overview of Endpoints
Ingestion Logs (Daily Publication Index)
Track filings that were published and indexed by our systems on a specific day, providing accession number, form type, and filedAt timestamp.
GET https://api.sec-api.io/edgar-index/ingestion-log/YYYY-MM-DD
The date YYYY-MM-DD refers to the indexing/publication date of filings, not the filedAt timestamp (EDGAR acceptance). For example, SEC comment letters are often published weeks or months after acceptance.
Refer to the Ingestion Logs API section for full details.
File Index Archives (Metadata Only)
Retrieve URLs and metadata for all EDGAR files, not just filings.
GET https://api.sec-api.io/edgar-index/archive/files/index.json
GET https://api.sec-api.io/edgar-index/archive/files/YYYY-MM-DD.jsonl.gz
Includes: filing documents, headers, index pages, exhibits, XML/XBRL files, images, submission text files, and all other published artifacts.
Refer to the File Index Archive APIs section for full details.
Tar Archives (Full Content)
Download the entire EDGAR directory tree for a given filedAt date, including file contents.
GET https://api.sec-api.io/edgar-index/archive/tar/index.json
GET https://api.sec-api.io/edgar-index/archive/tar/YYYY-MM-DD.tar
All files inside a tar archive are individually gzip-compressed, and comprise the complete set of EDGAR filings and attached files for that date.
Which Endpoint Should You Use?
The table below summarizes the recommended endpoint combinations based on data type, execution cadence, and operational intent.
| Data Type | Execution Cadence | Use Case | Recommended Endpoint(s) |
|---|---|---|---|
| Metadata | Daily | Validate that your Stream API consumer successfully received all filings published on a specific date. | Ingestion Logs |
| Metadata | Daily | Verify that your Query API–based ingestion pipeline loaded all filing metadata for a specific publication date. | Ingestion Logs |
| Metadata | Daily | Nightly batch job that first retrieves all accession numbers published that day, then loads the complete metadata for each filing via the Query API. | Ingestion Logs + Query API |
| Metadata + Content | Daily | Identify specific form types published on a given date (e.g., all 424B2 prospectuses for selected entities) and download their full filing content to meet regulatory or exchange requirements. | Ingestion Logs + Download API |
| Content | Daily | Perform full daily replication of the EDGAR file tree—including filings, exhibits, index pages, header files, and XML/XBRL—typically for AI/ML/LLM training, fine-tuning, or inference workloads. | Tar Archive |
| Content | One-time | Backfill multiple years or decades of the complete EDGAR dataset. | Tar Archive |
| Content | One-time | Backfill specific form types (e.g., all 10-K and 10-Q filings over the past 15 years) without downloading unrelated files. | File Index Archives (or Query API) + Download API |
Practical Guidance
- Use Ingestion Logs for completeness guarantees and daily reconciliation. Avoid using Ingestion Logs for historical backfills. They are optimized for daily recurring synchronizations, not bulk history retrieval.
- Use File Index Archives when you need fine-grained, historical file selection without downloading unnecessary content.
- Use Tar Archives for high-throughput historical ingestion or full-content replication at scale.
Filing Metadata vs Filing Content
| Category | Definition | Examples | Sample |
|---|---|---|---|
| Metadata | Descriptive and structural information that identifies/classifies a filing and its associated files, without including the file contents themselves. | Accession number, form type, filing and acceptance timestamps (filedAt), reporting entity CIKs, file paths, document types, and URLs to filings and exhibits, etc. | {"accessionNo": "0001628280-25-045968", "formType": "10-Q", "...": "..."} |
| Content | The actual content data of a file, such as the entire text and tables in a filing, or the binary content of an image or PDF. | Filings (HTML, XML, TXT), extracted structured data (JSON), XBRL instance and taxonomy files, images, PDFs, Excel files, and other attached artifacts | Text: Our mission is to accelerate the world’s transition to sustainable energy. We design, develop, ... |
Ingestion Logs API
The Ingestion Logs API returns, in JSON format, the metadata of all EDGAR filings that were published and indexed by our systems on the specified date. Each entry provides the accession number, form type, and filedAt timestamp of one filing.
Coverage & Update Policy
- Coverage: All filings, from December 1, 2025 to present
- Update latency: under 300 milliseconds; new log entries are added continuously throughout the day as filings are indexed.
Request Structure
Endpoint: https://api.sec-api.io/edgar-index/ingestion-log/YYYY-MM-DD
HTTP Method: GET
Response Format: JSON
Replace YYYY-MM-DD with the date of interest, such as 2026-01-16. The date YYYY-MM-DD refers to the indexing/publication date of filings, not the filedAt timestamp (EDGAR acceptance). For example, SEC comment letters are often published weeks or months after acceptance; accordingly, the Ingestion Log for 2026-01-16 includes comment letters with filedAt dates of 2025-05-13, 2025-06-17, and 2025-08-28.
Example: GET https://api.sec-api.io/edgar-index/ingestion-log/2026-01-16
Response Structure
The Ingestion Logs API returns a JSON object with the following keys:
- lastUpdatedAt (string) - Timestamp indicating when the ingestion log for a given YYYY-MM-DD date was last updated, in RFC 3339 / ISO 8601 format (e.g., 2026-01-20T21:56:14-05:00). This value corresponds to the indexing time of the most recently processed filing for that date.
- total.value (number) - Number of filings indexed on that date, e.g. 2861. Filings are de-duplicated and counted by accession number.
- data (array of objects) - De-duplicated list of filings metadata (one entry per accession number). EDGAR filings that appear multiple times due to multiple reporting entities are normalized into a single entry. Each object has the following keys:
  - accessionNo (string) - Accession number of the filing in the format \d{10}-\d{2}-\d{6}, e.g. 0001062993-26-000350.
  - formType (string) - Form type of the filing, e.g. SCHEDULE 13G.
  - filedAt (date string) - The Accepted attribute of a filing in ISO 8601 format, showing the date and time the filing was accepted by the EDGAR system, e.g. 2026-01-20T21:55:54-05:00.
Example Response
Request to https://api.sec-api.io/edgar-index/ingestion-log/2026-01-16?token=YOUR_API_KEY:
{
  "lastUpdatedAt": "2026-01-16T21:54:18-05:00",
  "total": {
    "value": 2709,
    "relation": "eq"
  },
  "data": [
    {
      "accessionNo": "0001214659-26-000611",
      "formType": "4",
      "filedAt": "2026-01-16T21:53:50-05:00"
    },
    // ... more entries
    {
      "accessionNo": "0000000000-25-005071",
      "formType": "LETTER",
      "filedAt": "2025-05-13T11:36:12-04:00"
    }
    // ... more entries
  ]
}
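A response like the one above could be fetched and summarized roughly as follows. This is a minimal sketch using only the Python standard library; the helper names (`ingestion_log_url`, `fetch_ingestion_log`, `count_by_form_type`) are illustrative and not part of the API, and the demonstration runs offline on the two entries shown in the example response.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

BASE_URL = "https://api.sec-api.io/edgar-index/ingestion-log"


def ingestion_log_url(date: str, api_key: str) -> str:
    """Build the request URL for one publication date (YYYY-MM-DD)."""
    return f"{BASE_URL}/{date}?" + urllib.parse.urlencode({"token": api_key})


def fetch_ingestion_log(date: str, api_key: str) -> dict:
    """Download and decode the ingestion log (performs a network call)."""
    url = ingestion_log_url(date, api_key)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def count_by_form_type(log: dict) -> Counter:
    """Count the entries of the `data` array per form type."""
    return Counter(entry["formType"] for entry in log["data"])


# Offline demonstration with the two entries from the example response.
sample = {
    "lastUpdatedAt": "2026-01-16T21:54:18-05:00",
    "total": {"value": 2709, "relation": "eq"},
    "data": [
        {"accessionNo": "0001214659-26-000611", "formType": "4",
         "filedAt": "2026-01-16T21:53:50-05:00"},
        {"accessionNo": "0000000000-25-005071", "formType": "LETTER",
         "filedAt": "2025-05-13T11:36:12-04:00"},
    ],
}
print(count_by_form_type(sample))
```

In a real job, `fetch_ingestion_log("2026-01-16", "YOUR_API_KEY")` would take the place of the `sample` dictionary.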
File Index Archive APIs
The File Index Archive APIs provide access to URLs and metadata for all EDGAR files, not just filings. This includes filing documents, headers, index pages, exhibits, XML/XBRL files, images, submission text files, and all other published artifacts.
| Endpoint | Description | HTTP Method | Response Format |
|---|---|---|---|
| /edgar-index/archive/files/index.json | Index file listing all available daily file index archives, including file paths, last update timestamps, file sizes, and object counts for each YYYY-MM-DD.jsonl.gz dataset. | GET | JSON |
| /edgar-index/archive/files/YYYY-MM-DD.jsonl.gz | Gzip-compressed JSON Lines (JSONL) file containing URLs and metadata for all files published on EDGAR with a filedAt date matching the specified YYYY-MM-DD, including filings and all attached exhibits. | GET | JSONL (gzip-compressed) |
Coverage & Update Policy
- Form types: 100% (all EDGAR form types)
- Entities: 100% (issuers, filers, reporters; multi-entity filings supported)
- History: 1993/1994 to present
- Update frequency: Daily at 10:30 PM ET. At 10:30 PM, the dataset for that date is complete and safe to ingest.
/edgar-index/archive/files/index.json
Returns an index of all available daily file index archives. Each entry describes a gzip-compressed JSON Lines (YYYY-MM-DD.jsonl.gz) file containing metadata for every EDGAR file associated with filings whose filedAt date matches the YYYY-MM-DD date of the filename.
This index enables clients to detect which historical archive files have changed and need to be reprocessed.
The response is an array of objects with the following fields:
- key (string) - Filename of the archive, for example 2025-12-01.jsonl.gz.
- updatedAt (string) - ISO 8601 (RFC 3339) timestamp indicating when the archive was last updated, for example 2026-01-08T05:00:06.000Z. Archives may be updated long after their nominal date. For example, if an SEC comment letter is published on 2026-01-16 with a filedAt date of 2025-06-17, the filing is appended to 2025-06-17.jsonl.gz, and the updatedAt value of that archive is updated to 2026-01-16.
- size (number) - Size of the compressed .jsonl.gz file in bytes.
- objectCount (number) - Number of JSON objects (i.e., filings) contained in the archive.
Response Example
[
  {
    "key": "2025-12-01.jsonl.gz",
    "updatedAt": "2026-01-08T05:00:06.000Z",
    "size": 74929,
    "objectCount": 343454
  },
  {
    "key": "2025-12-02.jsonl.gz",
    "updatedAt": "2026-01-08T11:00:53.000Z",
    "size": 227586,
    "objectCount": 343454
  }
  // ... more files
]
/edgar-index/archive/files/YYYY-MM-DD.jsonl.gz
Returns a gzip-compressed JSON Lines (JSONL) file. Each line is a self-contained JSON object representing a single EDGAR filing and all files published as part of that filing.
This endpoint provides a complete file-level view of EDGAR publications for filings whose filedAt date matches the specified YYYY-MM-DD of the filename.
File Coverage: Each archive includes metadata for all EDGAR files associated with a filing, including but not limited to:
- Filing index pages
- SGML header files
- Primary filing documents
- Exhibits (e.g., bylaws, lease agreements, press releases, earnings call transcripts)
- XML, XBRL files, and XBRL instance documents
- XBRL ZIP archives
- Images
- PDFs
- Filing summary files
- Complete submission text files (.txt)
- Financial statements in Excel format
- XBRL-to-HTML rendered financial statement tables (R<X>.htm)
- Any other file published as part of the filing
JSONL Record Structure
Each line in a YYYY-MM-DD.jsonl.gz file represents a complete, standalone JSON object with the following structure:
- accessionNo (string) - Accession number of the filing, for example 0001493152-26-002304.
- filedAt (string) - EDGAR acceptance timestamp (Accepted on the EDGAR filing index page), identical to the filedAt value returned by the Query API, for example 2026-01-14T20:57:07-05:00. Format: ISO 8601 (RFC 3339).
- formType (string) - Form type of the filing, for example 10-Q.
- entities (array of objects) - Array of filer entities associated with the filing. Each object represents one entity and includes its CIK, file number, and film number, for example [{"cik": "1823635", "fileNo": "000-56425", "filmNo": "26534532"}].
  - cik (string) - Central Index Key (CIK) of the entity, with leading zeros removed, e.g. 1823635.
  - fileNo (string) - SEC file number, for example 000-56425.
  - filmNo (string) - SEC film number, for example 26534532.
- files (array of objects) - Array of all files published as part of the filing. This includes files listed on the EDGAR filing index page as well as additional files present in the filing directory.
  - sequence (string) - Document sequence number (Seq) as shown on the EDGAR filing index page and in the corresponding documentFormatFiles object returned by the Query API. May be empty if the file is not listed on the index page.
  - description (string) - EDGAR document description (e.g., PRIMARY DOCUMENT, Complete submission text file). May be empty if the file is not listed on the index page.
  - document (string) - File name as published by EDGAR, for example edgardoc.html.
  - type (string) - EDGAR document type, for example EX-31.1. May be empty for files without a designated document type.
  - size (number) - File size in bytes.
  - path (string) - Relative path to the file within the filing directory. For XML-to-HTML rendered files, this may include a subdirectory (e.g., xslF345X05/ownership.xml), whereas raw files appear directly (e.g., ownership.xml or 0001493152-26-002304-index.html).
  - downloadUrl (string) - URL used to download the file via the Download API, for example https://archive.sec-api.io/000197292826000001/xslF345X05/edgardoc.xml. Requests require authentication via ?token=YOUR_API_KEY or an Authorization header.
Example of a decompressed jsonl.gz file
{"accessionNo":"0001972928-26-000001","formType": "4","filedAt":"2026-01-12T19:04:08-05:00","entities":[{"cik":"..","fileNo":"..","filmNo":".."}],"files":[]}
{ ... }
{ ... }
Examples of pretty-printed single records from a jsonl.gz file
Example: Form 4 Filing
{
  "accessionNo": "0001972928-26-000001",
  "formType": "4",
  "filedAt": "2026-01-12T19:04:08-05:00",
  "entities": [
    { "cik": "1972928", "fileNo": "001-34756", "filmNo": "26528094" },
    { "cik": "1318605" }
  ],
  "files": [
    // filing index page and headers
    {
      "path": "0001972928-26-000001-index.htm",
      "downloadUrl": "https://archive.sec-api.io/000197292826000001/0001972928-26-000001-index.htm"
    },
    {
      "path": "0001972928-26-000001-index-headers.html",
      "downloadUrl": "https://archive.sec-api.io/000197292826000001/0001972928-26-000001-index-headers.html"
    },
    {
      "path": "0001972928-26-000001.hdr.sgml",
      "downloadUrl": "https://archive.sec-api.io/000197292826000001/0001972928-26-000001.hdr.sgml"
    },
    // document format files
    {
      "sequence": "1",
      "description": "PRIMARY DOCUMENT",
      "document": "edgardoc.html",
      "type": "4",
      "path": "xslF345X05/edgardoc.xml",
      "downloadUrl": "https://archive.sec-api.io/000197292826000001/xslF345X05/edgardoc.xml"
    },
    {
      "sequence": "1",
      "description": "PRIMARY DOCUMENT",
      "document": "edgardoc.xml",
      "type": "4",
      "size": "3954",
      "path": "edgardoc.xml",
      "downloadUrl": "https://archive.sec-api.io/000197292826000001/edgardoc.xml"
    },
    {
      "description": "Complete submission text file",
      "size": "5393",
      "path": "0001972928-26-000001.txt",
      "downloadUrl": "https://archive.sec-api.io/000197292826000001/0001972928-26-000001.txt"
    }
  ]
}
Example: Form 10-Q Filing
{
  "accessionNo": "0001493152-26-002304",
  "filedAt": "2026-01-14T20:57:07-05:00",
  "formType": "10-Q",
  "entities": [
    { "cik": "1823635", "fileNo": "000-56425", "filmNo": "26534532" }
  ],
  "files": [
    // filing index page and headers
    {
      "path": "0001493152-26-002304-index.htm",
      "downloadUrl": "https://archive.sec-api.io/000149315226002304/0001493152-26-002304-index.htm"
    },
    {
      "path": "0001493152-26-002304-index-headers.html",
      "downloadUrl": "https://archive.sec-api.io/000149315226002304/0001493152-26-002304-index-headers.html"
    },
    {
      "path": "0001493152-26-002304.hdr.sgml",
      "downloadUrl": "https://archive.sec-api.io/000149315226002304/0001493152-26-002304.hdr.sgml"
    },
    // document format files
    {
      "sequence": "1",
      "description": "10-Q",
      "document": "form10-q.htm",
      "type": "10-Q",
      "size": "1302492",
      "path": "form10-q.htm",
      "downloadUrl": "https://archive.sec-api.io/000149315226002304/form10-q.htm"
    },
    {
      "sequence": "2",
      "description": "EX-31.1",
      "document": "ex31-1.htm",
      "type": "EX-31.1",
      "size": "16906",
      "path": "ex31-1.htm",
      "downloadUrl": "https://archive.sec-api.io/000149315226002304/ex31-1.htm"
    },
    // ... more exhibit files
    {
      "sequence": "",
      "description": "Complete submission text file",
      "document": "0001493152-26-002304.txt",
      "type": "",
      "size": "8230004",
      "path": "0001493152-26-002304.txt",
      "downloadUrl": "https://archive.sec-api.io/000149315226002304/0001493152-26-002304.txt"
    },
    // data files = XBRL/XML files
    {
      "sequence": "6",
      "description": "XBRL SCHEMA FILE",
      "document": "ecxj-20251130.xsd",
      "type": "EX-101.SCH",
      "size": "66700",
      "path": "ecxj-20251130.xsd",
      "downloadUrl": "https://archive.sec-api.io/000149315226002304/ecxj-20251130.xsd"
    },
    // ... more XBRL files
    {
      "sequence": "104",
      "description": "EXTRACTED XBRL INSTANCE DOCUMENT",
      "document": "form10-q_htm.xml",
      "type": "XML",
      "size": "1539560",
      "path": "form10-q_htm.xml",
      "downloadUrl": "https://archive.sec-api.io/000149315226002304/form10-q_htm.xml"
    },
    // all other files in directory listing of filing
    {
      "size": "200620",
      "path": "0001493152-26-002304-xbrl.zip",
      "downloadUrl": "https://archive.sec-api.io/000149315226002304/0001493152-26-002304-xbrl.zip"
    },
    {
      "size": "51446",
      "path": "R1.htm",
      "downloadUrl": "https://archive.sec-api.io/000149315226002304/R1.htm"
    },
    {
      "size": "48818",
      "path": "FilingSummary.xml",
      "downloadUrl": "https://archive.sec-api.io/000149315226002304/FilingSummary.xml"
    }
    // ...
  ]
}
FAQ
Common questions about querying the EDGAR Index APIs, the response shape, and the bulk archives.
How do I get a complete list of every SEC filing that was published on a specific date?
To pull every filing published on a given day, request the Ingestion Logs endpoint for that calendar date at /edgar-index/ingestion-log/YYYY-MM-DD. The date in the path refers to the indexing/publication date inside the sec-api.io system, not to the EDGAR acceptance time, so all filings indexed on that day are returned regardless of when EDGAR originally accepted them.
The response body lists each distinct filing once in the data array, with the accession number (data.accessionNo), the form type (data.formType), and the EDGAR acceptance timestamp (data.filedAt). The total count for that date is exposed at total.value. Coverage starts on December 1, 2025.
How can I verify that my real-time filings pipeline did not miss any filings yesterday?
To confirm a real-time pipeline did not drop filings, treat the Ingestion Logs endpoint as the source of truth for what was published on a given day. Request /edgar-index/ingestion-log/YYYY-MM-DD for yesterday's date and compare the set of accession numbers in the response (data.accessionNo) against the accession numbers your ingestion store recorded for that same date.
The Ingestion Logs response is de-duplicated by accession number, so any accession that appears in data.accessionNo but not in your store is a missed filing. The total expected count is available at total.value, which gives a fast row-count check before the full diff. The Ingestion Logs documentation explicitly recommends this endpoint for daily reconciliation against the Stream API.
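The reconciliation described above amounts to a set difference. The sketch below assumes the ingestion log has already been decoded into a dictionary and that your own store can produce the set of accession numbers it recorded for the same date; the function names and the second and third accession numbers in the demonstration are hypothetical.

```python
def find_missed_filings(log: dict, stored_accession_nos) -> list:
    """Return accession numbers present in the log but absent from the store."""
    published = {entry["accessionNo"] for entry in log["data"]}
    return sorted(published - set(stored_accession_nos))


def counts_match(log: dict, stored_accession_nos) -> bool:
    """Fast pre-check: compare total.value with the number of stored filings."""
    return log["total"]["value"] == len(set(stored_accession_nos))


# Offline demonstration with a three-entry log and a store missing one filing.
log = {
    "total": {"value": 3, "relation": "eq"},
    "data": [
        {"accessionNo": "0001214659-26-000611", "formType": "4"},
        {"accessionNo": "0000000000-26-000002", "formType": "8-K"},
        {"accessionNo": "0000000000-26-000003", "formType": "10-Q"},
    ],
}
store = {"0001214659-26-000611", "0000000000-26-000003"}

if not counts_match(log, store):
    print("missed:", find_missed_filings(log, store))
```

Running the count check first keeps the common case cheap; the full diff is only needed when the counts disagree.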
How do I find out exactly when an SEC filing was accepted by EDGAR versus when it was published to the public index?
Both timestamps are available through /edgar-index/ingestion-log/YYYY-MM-DD. The EDGAR acceptance time appears on each filing entry as data.filedAt, which is the Accepted attribute set by EDGAR when the submission was received. The publication/indexing date is the YYYY-MM-DD you place in the endpoint path; that is the date the filing first appeared in the sec-api.io index.
The two dates frequently differ. SEC comment letters in particular are often published weeks or months after acceptance, so an Ingestion Log for 2026-01-16 can include entries whose data.filedAt falls on 2025-05-13, 2025-06-17, or other earlier dates. Comparing data.filedAt to the date portion of the endpoint path is the supported way to measure the gap between EDGAR acceptance and public indexing.
How can I retrieve all accession numbers from a given day so I can then pull full filing metadata for each one?
First call /edgar-index/ingestion-log/YYYY-MM-DD to enumerate the accession numbers for the day. The response's data array contains one entry per filing, and each entry's data.accessionNo is the accession number you need for downstream lookups.
Then pass each returned accession number to the Query API at https://api.sec-api.io, filtering on the accession number to retrieve the full filing metadata record. This Ingestion Logs plus Query API combination is the documented pattern for a nightly batch that first lists every published accession and then loads the complete metadata for each one.
How do I identify SEC comment letters that were just released today but actually pertain to filings accepted months earlier?
Request /edgar-index/ingestion-log/YYYY-MM-DD for today's date, then keep only the entries whose form type is a comment letter (data.formType set to LETTER). Because the date in the endpoint path is the publication/indexing date, every entry in the response was released today even though many were accepted by EDGAR much earlier.
For each letter entry, compare the EDGAR acceptance timestamp on the entry (data.filedAt) to today's date. Any letter whose data.filedAt is materially earlier than the path date is a letter that was accepted months ago but only published today. The Ingestion Logs documentation calls out this specific behavior, noting that a log for 2026-01-16 can include LETTER entries with filedAt values such as 2025-05-13, 2025-06-17, or 2025-08-28.
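The comparison above can be sketched as a small filter. The 30-day threshold used for "materially earlier" is an assumption chosen for illustration, as is the function name; the sample entries come from the example response in this document.

```python
from datetime import date, datetime


def delayed_letters(log: dict, publication_date: str, min_lag_days: int = 30) -> list:
    """Return (accessionNo, lag_in_days) for LETTER entries accepted well
    before the publication date given in the endpoint path."""
    published = date.fromisoformat(publication_date)
    results = []
    for entry in log["data"]:
        if entry["formType"] != "LETTER":
            continue
        # filedAt is an ISO 8601 timestamp with a UTC offset; keep its local date.
        accepted = datetime.fromisoformat(entry["filedAt"]).date()
        lag = (published - accepted).days
        if lag >= min_lag_days:
            results.append((entry["accessionNo"], lag))
    return results


log = {
    "data": [
        {"accessionNo": "0001214659-26-000611", "formType": "4",
         "filedAt": "2026-01-16T21:53:50-05:00"},
        {"accessionNo": "0000000000-25-005071", "formType": "LETTER",
         "filedAt": "2025-05-13T11:36:12-04:00"},
    ]
}
print(delayed_letters(log, "2026-01-16"))
```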
How often is the daily ingestion log refreshed during the trading day?
The Ingestion Logs endpoint at /edgar-index/ingestion-log/YYYY-MM-DD is updated continuously, with new entries appended in under 300 milliseconds as filings are indexed. Each response carries a lastUpdatedAt timestamp that reflects when the most recent filing for that date was processed, so consumers can detect new activity by polling and watching lastUpdatedAt advance.
How do I tell whether an ingestion log for a given day is finalized or still receiving new entries?
The Ingestion Logs response at /edgar-index/ingestion-log/YYYY-MM-DD exposes lastUpdatedAt, the ISO 8601 timestamp of the most recently indexed filing for that date. New entries are added continuously throughout the day, so a fresh lastUpdatedAt value means the day is still being written to.
There is no explicit finalized flag on the Ingestion Logs response itself. The practical signal is that lastUpdatedAt stops advancing well past the end of the trading day, and the entry count at total.value has stabilized between polls. For a hard cutoff on related daily file-index archives, consult /edgar-index/archive/files/index.json, where the per-day YYYY-MM-DD.jsonl.gz archive is published once the dataset for that date is complete.
How can I download URLs and metadata for every file (not just the primary document) attached to all filings on a specific historical date?
Request /edgar-index/archive/files/YYYY-MM-DD.jsonl.gz for the date of interest. The response is a gzip-compressed JSON Lines file in which each line is a self-contained JSON record for one filing whose EDGAR filedAt falls on that date.
Each record carries accessionNo, formType, filedAt, and an entities array, plus a files array that enumerates every artifact published with that filing. For each file, files.path is the relative path within the filing directory, files.document is the EDGAR-published name, files.type is the EDGAR document type (for example EX-31.1, XML, or EX-101.SCH), files.description is the human-readable label, files.size is the byte count, and files.downloadUrl is the URL you fetch to retrieve the actual file. This covers index pages, header files, primary documents, exhibits, XBRL files, images, PDFs, the complete submission text file, and any other artifact in the filing directory.
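Reading such an archive could look roughly like this. The sketch assumes the archive bytes are already in memory (for example, after downloading the file) and that the text is UTF-8 encoded; the helper names are illustrative, and the demonstration compresses a tiny two-record payload locally instead of calling the API.

```python
import gzip
import io
import json


def iter_records(gz_bytes: bytes):
    """Yield one dict per non-empty line of a gzip-compressed JSONL payload."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)


def list_files(record: dict) -> list:
    """Return (type, path, downloadUrl) for every file of one filing record."""
    return [(f.get("type", ""), f["path"], f["downloadUrl"])
            for f in record["files"]]


# Offline demonstration: build a small archive in memory.
base = "https://archive.sec-api.io/000197292826000001/"
lines = [
    json.dumps({
        "accessionNo": "0001972928-26-000001",
        "formType": "4",
        "filedAt": "2026-01-12T19:04:08-05:00",
        "entities": [{"cik": "1972928"}, {"cik": "1318605"}],
        "files": [{"type": "4", "path": "edgardoc.xml",
                   "downloadUrl": base + "edgardoc.xml"}],
    }),
    json.dumps({"accessionNo": "0001493152-26-002304", "formType": "10-Q",
                "filedAt": "2026-01-14T20:57:07-05:00",
                "entities": [], "files": []}),
]
sample_gz = gzip.compress("\n".join(lines).encode("utf-8"))

records = list(iter_records(sample_gz))
print(len(records), list_files(records[0]))
```

Because `iter_records` is a generator, a large daily archive can be processed record by record without holding every filing in memory.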
How do I get a catalogue of all available daily file-index archives along with their sizes and freshness?
Request /edgar-index/archive/files/index.json. The response is an array where each entry describes one daily YYYY-MM-DD.jsonl.gz archive.
Each entry exposes the filename (key, for example 2025-12-01.jsonl.gz), the last update time (updatedAt, ISO 8601), the compressed size in bytes (size), and the number of filings packed into that archive (objectCount). This index lets a client see every archive that exists, how large each one is, and when it was most recently rewritten.
How can I detect which historical daily archives have been amended or updated since I last ingested them?
Poll /edgar-index/archive/files/index.json and use the updatedAt field on each archive entry as the change marker. Each entry's key identifies the archive (for example 2025-06-17.jsonl.gz) and updatedAt is the ISO 8601 timestamp of the most recent write to that archive.
Keep a record of the updatedAt value you saw the last time you ingested each archive, and on the next poll re-fetch only those YYYY-MM-DD.jsonl.gz files where the new updatedAt is later than the stored one. Archives are commonly updated long after their nominal date because, for example, a comment letter published on 2026-01-16 with an EDGAR filedAt of 2025-06-17 is appended to 2025-06-17.jsonl.gz and bumps that archive's updatedAt.
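A minimal sketch of that bookkeeping follows. It assumes the previously seen `updatedAt` values are kept in a plain dictionary keyed by archive filename; how that state is persisted is up to the client, and the function names are illustrative.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def archives_to_refetch(index_entries: list, last_seen: dict) -> list:
    """Return the keys of archives that are new or updated since the last run."""
    stale = []
    for entry in index_entries:
        previous = last_seen.get(entry["key"])
        if previous is None or parse_ts(entry["updatedAt"]) > parse_ts(previous):
            stale.append(entry["key"])
    return stale


# Index entries taken from the response example; last_seen is hypothetical state.
index_entries = [
    {"key": "2025-12-01.jsonl.gz", "updatedAt": "2026-01-08T05:00:06.000Z",
     "size": 74929, "objectCount": 343454},
    {"key": "2025-12-02.jsonl.gz", "updatedAt": "2026-01-08T11:00:53.000Z",
     "size": 227586, "objectCount": 343454},
]
last_seen = {
    "2025-12-01.jsonl.gz": "2026-01-08T05:00:06.000Z",
    "2025-12-02.jsonl.gz": "2026-01-07T00:00:00.000Z",
}
print(archives_to_refetch(index_entries, last_seen))
```

Parsing the timestamps rather than comparing raw strings keeps the check correct even if the stored and returned values use different UTC offset notations.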
How do I bulk-download the entire raw content of EDGAR filings (HTML, XML, XBRL, exhibits, images) for a single day?
Request /edgar-index/archive/tar/YYYY-MM-DD.tar for the date you want. The response is a tar archive covering the complete EDGAR directory tree for filings whose filedAt date matches YYYY-MM-DD, including filings, exhibits, index pages, header files, XML, and XBRL documents. Every file inside the tar is individually gzip-compressed.
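Unpacking such a tar could be sketched as below. The member naming inside the real archives is not described here, so the sketch does not rely on file extensions: it checks each member for the gzip magic bytes and decompresses when they are present. The demonstration builds a tiny tar in memory with a hypothetical member name.

```python
import gzip
import io
import tarfile

GZIP_MAGIC = b"\x1f\x8b"


def iter_tar_files(fileobj):
    """Yield (member_name, decompressed_bytes) for each regular file in a tar."""
    with tarfile.open(fileobj=fileobj, mode="r:") as archive:
        for member in archive:
            if not member.isfile():
                continue
            raw = archive.extractfile(member).read()
            # Inner files are individually gzip-compressed; detect by magic bytes.
            if raw[:2] == GZIP_MAGIC:
                raw = gzip.decompress(raw)
            yield member.name, raw


def build_demo_tar() -> io.BytesIO:
    """Create an in-memory tar holding one gzip-compressed file (demo only)."""
    payload = gzip.compress(b"<html>demo</html>")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo(name="demo/filing.htm")  # hypothetical name
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


for name, content in iter_tar_files(build_demo_tar()):
    print(name, len(content))
```

For a downloaded archive, passing `open("2026-01-16.tar", "rb")` as `fileobj` would stream members from disk instead of memory.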
How can I list all dates for which a full-content tar archive is available?
Request /edgar-index/archive/tar/index.json. The response enumerates the dates for which a YYYY-MM-DD.tar full-content archive exists, so a client can discover the available range before requesting individual day tars at /edgar-index/archive/tar/YYYY-MM-DD.tar.
How do I backfill many years of complete EDGAR content for an LLM training corpus?
Iterate through /edgar-index/archive/tar/YYYY-MM-DD.tar across the date range you want to ingest. Each daily tar contains the entire EDGAR directory tree for filings whose filedAt date matches that day, with every inner file individually gzip-compressed, which is the recommended endpoint for full-content replication at scale and for one-time backfills spanning multiple years or decades.
Use /edgar-index/archive/tar/index.json to discover the dates that have an available tar before iterating, so you do not request days that are not yet published. The File Index Archive APIs cover EDGAR history from 1993/1994 to the present, which bounds the practical backfill window.
How do I selectively download only 10-K and 10-Q filings over the last 15 years without pulling unrelated files?
Iterate through /edgar-index/archive/files/YYYY-MM-DD.jsonl.gz for each day across the 15-year window. For every JSONL record, keep only those whose formType equals 10-K or 10-Q, and drop the rest before downloading any content. This avoids pulling the much larger volume of unrelated filings that the full-content tar archives would include.
For each retained record, walk the files array and download each artifact through its files.downloadUrl. This File Index Archives plus Download API pattern is the documented option when you need fine-grained, form-type-restricted historical selection rather than full-content replication.
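The selection step can be sketched as a generator over already-parsed JSONL records. The function name is illustrative, and the match is exact, so amendments such as 10-K/A would need to be added to the set explicitly if they are wanted.

```python
WANTED_FORM_TYPES = frozenset({"10-K", "10-Q"})


def select_download_urls(records, form_types=WANTED_FORM_TYPES):
    """Yield (accessionNo, downloadUrl) for every file of the wanted filings."""
    for record in records:
        if record.get("formType") not in form_types:
            continue
        for file_entry in record.get("files", []):
            yield record["accessionNo"], file_entry["downloadUrl"]


# Offline demonstration with trimmed records.
records = [
    {"accessionNo": "0001493152-26-002304", "formType": "10-Q", "files": [
        {"path": "form10-q.htm",
         "downloadUrl": "https://archive.sec-api.io/000149315226002304/form10-q.htm"},
        {"path": "ex31-1.htm",
         "downloadUrl": "https://archive.sec-api.io/000149315226002304/ex31-1.htm"},
    ]},
    {"accessionNo": "0001972928-26-000001", "formType": "4", "files": [
        {"path": "edgardoc.xml",
         "downloadUrl": "https://archive.sec-api.io/000197292826000001/edgardoc.xml"},
    ]},
]
for accession_no, url in select_download_urls(records):
    print(accession_no, url)
```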
How can I find the URL to a specific exhibit (e.g., EX-31.1) inside a filing?
Open the filing's record inside /edgar-index/archive/files/YYYY-MM-DD.jsonl.gz for the day the filing was accepted by EDGAR. Walk the files array and look for the entry whose EDGAR document type matches the exhibit you want, for example files.type set to EX-31.1. The files.downloadUrl on that entry is the direct URL for the exhibit, and files.description and files.document give the human label and original file name.
How do I get the download URL for the complete submission text (.txt) file of a filing?
Inside the filing's record in /edgar-index/archive/files/YYYY-MM-DD.jsonl.gz, scan the files array for the entry whose files.description is Complete submission text file. That entry's files.downloadUrl is the URL of the consolidated .txt submission file, and files.path typically ends in .txt (for example 0001493152-26-002304.txt).
How can I list all XBRL instance and schema files for a 10-Q so I can run my own financial extraction?
Pull the 10-Q's record from /edgar-index/archive/files/YYYY-MM-DD.jsonl.gz for the day it was accepted, then filter the files array by EDGAR document type. The XBRL instance document is the entry whose files.type is XML (its files.description is typically EXTRACTED XBRL INSTANCE DOCUMENT), and the XBRL schema is the entry whose files.type is EX-101.SCH. Additional XBRL-related entries follow the same EX-101.* type pattern.
Each matching entry exposes the file name (files.document), the byte size (files.size), and the direct URL (files.downloadUrl), which together give you the full set of XBRL artifacts you need to feed into a custom extraction pipeline.
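As a rough illustration, the type-based filter could be written as follows. It relies on the conventions described above (type XML for the extracted instance document, EX-101.* for the remaining XBRL files); the function name is an assumption, and the sample entries are trimmed from the 10-Q example record.

```python
def xbrl_files(record: dict) -> list:
    """Return the file entries whose type marks them as XBRL artifacts."""
    matches = []
    for file_entry in record["files"]:
        file_type = file_entry.get("type", "")
        if file_type == "XML" or file_type.startswith("EX-101."):
            matches.append(file_entry)
    return matches


record = {
    "accessionNo": "0001493152-26-002304",
    "formType": "10-Q",
    "files": [
        {"type": "10-Q", "document": "form10-q.htm", "size": "1302492"},
        {"type": "EX-101.SCH", "document": "ecxj-20251130.xsd", "size": "66700"},
        {"type": "XML", "document": "form10-q_htm.xml", "size": "1539560"},
        {"path": "FilingSummary.xml", "size": "48818"},  # no type on this entry
    ],
}
for entry in xbrl_files(record):
    print(entry["type"], entry["document"])
```

Using `.get("type", "")` matters because files that are not listed on the filing index page may carry no type at all.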
How do I find every filing on a given day that has more than one reporting entity (e.g., insider and issuer)?
Read every JSONL record from /edgar-index/archive/files/YYYY-MM-DD.jsonl.gz for that day and count the length of the entities array on each record. Records whose entities array contains more than one object are multi-entity filings, which is the standard shape for filings that report both an insider and an issuer (for example, a Form 4 with one entry for the reporting person and one for the issuer).
For each multi-entity record, the per-entity details are exposed as entities.cik, entities.fileNo, and entities.filmNo. The accessionNo on the record is the identifier you would use to look up the filing elsewhere.
How can I retrieve all filings on a date that were filed by a specific CIK across all roles (filer/issuer/reporter)?
Iterate every JSONL record in /edgar-index/archive/files/YYYY-MM-DD.jsonl.gz for that day and keep records where any element of the entities array has entities.cik equal to the CIK you are looking for. Because entities enumerates all filer entities associated with the filing (issuers, filers, and reporters), this single check captures every role the CIK played that day.
Each matching record exposes accessionNo, formType, filedAt, the per-entity entities.fileNo and entities.filmNo, and a files array with files.downloadUrl values for every artifact in the filing.
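The CIK check can be sketched like this. Since entities.cik is stored with leading zeros removed, the sketch strips leading zeros from the search value as well, so that a zero-padded CIK still matches; the function names are illustrative.

```python
def normalize_cik(cik) -> str:
    """Drop leading zeros so '0001318605' and '1318605' compare equal."""
    return str(cik).lstrip("0")


def filings_for_cik(records, cik) -> list:
    """Return records in which any entity carries the given CIK."""
    target = normalize_cik(cik)
    return [
        record for record in records
        if any(normalize_cik(entity.get("cik", "")) == target
               for entity in record.get("entities", []))
    ]


# Offline demonstration with trimmed records from the examples.
records = [
    {"accessionNo": "0001972928-26-000001", "formType": "4",
     "entities": [{"cik": "1972928", "fileNo": "001-34756", "filmNo": "26528094"},
                  {"cik": "1318605"}]},
    {"accessionNo": "0001493152-26-002304", "formType": "10-Q",
     "entities": [{"cik": "1823635", "fileNo": "000-56425", "filmNo": "26534532"}]},
]
print([r["accessionNo"] for r in filings_for_cik(records, "0001318605")])
```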
How do I look up the SEC file number and film number associated with a particular filing?
Open the filing's record inside /edgar-index/archive/files/YYYY-MM-DD.jsonl.gz for the day the filing was accepted, identified by its accessionNo. The entities array on that record carries one object per reporting entity, and each object exposes the SEC file number as entities.fileNo (for example 000-56425) and the SEC film number as entities.filmNo (for example 26534532).
How can I locate XBRL-to-HTML rendered financial statement tables (the R1.htm, R2.htm, ... files) for a 10-K?
Read the 10-K's record from /edgar-index/archive/files/YYYY-MM-DD.jsonl.gz for the day it was accepted and scan the files array for entries whose files.path matches the rendered-table pattern (for example R1.htm, R2.htm, and so on). These rendered files are listed alongside the primary document and exhibits, each with its own files.size and files.downloadUrl that you fetch to retrieve the HTML table.
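A simple pattern match is enough to pick out the rendered tables. The sketch below assumes the files sit directly in the filing directory, as in the R1.htm example, and sorts them numerically so that R10.htm follows R2.htm; the names are illustrative.

```python
import re

RENDERED_TABLE = re.compile(r"^R(\d+)\.htm$")


def rendered_tables(record: dict) -> list:
    """Return the R<n>.htm file entries of a record, ordered by n."""
    hits = []
    for file_entry in record["files"]:
        match = RENDERED_TABLE.match(file_entry["path"])
        if match:
            hits.append((int(match.group(1)), file_entry))
    return [entry for _, entry in sorted(hits, key=lambda pair: pair[0])]


record = {
    "files": [
        {"path": "R10.htm"},
        {"path": "FilingSummary.xml"},
        {"path": "R2.htm"},
        {"path": "R1.htm"},
        {"path": "form10-k.htm"},  # hypothetical primary document name
    ]
}
print([entry["path"] for entry in rendered_tables(record)])
```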
How do I download a filing's pre-packaged XBRL ZIP bundle in one shot rather than file-by-file?
Locate the filing's record in /edgar-index/archive/files/YYYY-MM-DD.jsonl.gz and scan the files array for the entry whose files.path ends in -xbrl.zip (for example 0001493152-26-002304-xbrl.zip). Fetching that entry's files.downloadUrl returns the pre-packaged XBRL bundle as a single ZIP, which avoids the file-by-file download of the individual XBRL instance and schema artifacts.
How can I get only the SGML header file for a filing to parse submission-level metadata?
In the filing's record inside /edgar-index/archive/files/YYYY-MM-DD.jsonl.gz, scan the files array for the entry whose files.path ends in .hdr.sgml (for example 0001972928-26-000001.hdr.sgml). That entry's files.downloadUrl is the direct URL to the SGML header, which is what you fetch to parse submission-level metadata without pulling the full filing.
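Both this lookup and the -xbrl.zip lookup above reduce to a suffix match on files.path. A minimal sketch, with a hypothetical helper name and sample records that carry only the example paths quoted in the text:

```python
def find_file_by_suffix(record, suffix):
    """Return the first file entry whose path ends with suffix, else None."""
    for entry in record.get("files", []):
        if entry.get("path", "").endswith(suffix):
            return entry
    return None


# Sample records holding only the example paths quoted above.
xbrl_filing = {"files": [{"path": "0001493152-26-002304-xbrl.zip"}]}
header_filing = {"files": [{"path": "0001972928-26-000001.hdr.sgml"}]}

bundle = find_file_by_suffix(xbrl_filing, "-xbrl.zip")
header = find_file_by_suffix(header_filing, ".hdr.sgml")
```

The returned entry's downloadUrl is then the URL to fetch.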
How can I estimate storage needs before downloading a day's worth of EDGAR content?
For metadata-level sizing, request /edgar-index/archive/files/index.json and read the size and objectCount fields on the entry for the date of interest. size is the byte size of the compressed YYYY-MM-DD.jsonl.gz archive, and objectCount is the number of filings packed into it, which together let you project how many records you will process and how much disk the metadata archive itself consumes.
For full-content sizing, request /edgar-index/archive/tar/index.json, the catalog of available daily tar archives, and check the entry for the date of interest to gauge that day's tar before requesting /edgar-index/archive/tar/YYYY-MM-DD.tar.
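For the metadata-level estimate, the arithmetic can be sketched as below. The shape of index.json is an assumption here: the sketch treats it as a list of entries, each with key, size, and objectCount. The function name and sample numbers are made up.

```python
def metadata_footprint(index_entries, days):
    """Sum compressed bytes and filing counts for the given YYYY-MM-DD days."""
    wanted = {f"{day}.jsonl.gz" for day in days}
    total_bytes = 0
    total_filings = 0
    for entry in index_entries:
        if entry.get("key") in wanted:
            total_bytes += entry["size"]
            total_filings += entry["objectCount"]
    return total_bytes, total_filings


# Illustrative index entries; sizes and counts are made up.
sample_index = [
    {"key": "2026-01-15.jsonl.gz", "size": 1_500_000, "objectCount": 3000},
    {"key": "2026-01-16.jsonl.gz", "size": 1_200_000, "objectCount": 2500},
    {"key": "2026-01-17.jsonl.gz", "size": 100_000, "objectCount": 200},
]
footprint = metadata_footprint(sample_index, ["2026-01-15", "2026-01-16"])
```

Because size is the compressed size, the decompressed JSONL needs more room than this total if you store it unpacked.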
How do I authenticate requests to download files referenced by downloadUrl in the archives?
Every files.downloadUrl returned by /edgar-index/archive/files/YYYY-MM-DD.jsonl.gz points at https://archive.sec-api.io/... and is authenticated with your sec-api.io API key. There are two supported options: include ?token=YOUR_API_KEY on the URL itself, or attach the credential through an Authorization header.
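Both options can be sketched with the standard library. The helper names are hypothetical, the sample URL path is a placeholder rather than a real file, and sending the raw API key as the Authorization header value is an assumption to confirm against your account documentation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request


def with_token(download_url, api_key):
    """Option 1: append token=API_KEY, preserving any existing query string."""
    parts = urlsplit(download_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("token", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))


def with_header(download_url, api_key):
    """Option 2: build a request carrying the key in an Authorization header.

    Assumption: the header value is the raw API key.
    """
    return Request(download_url, headers={"Authorization": api_key})


# Placeholder URL for illustration; not a real archive path.
url = "https://archive.sec-api.io/hypothetical/path/doc.htm"
token_url = with_token(url, "YOUR_API_KEY")
request = with_header(url, "YOUR_API_KEY")
```

Either result can be passed to urllib.request.urlopen; the header form keeps the key out of logged URLs.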
How do I process the daily file-index archive without loading the whole gzip into memory?
Request /edgar-index/archive/files/YYYY-MM-DD.jsonl.gz and stream the response. The archive is gzip-compressed JSON Lines, and each line is a self-contained, standalone JSON object representing one filing and all of its files. Decompress the byte stream incrementally and parse one JSON object per line, which lets you handle archives of any size without holding the full payload in memory.
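A sketch of the incremental decode using only the standard library. The generator name iter_jsonl_gz is hypothetical; it accepts any binary file-like object, so it works on an HTTP response stream as well as on the in-memory sample used here.

```python
import gzip
import io
import json


def iter_jsonl_gz(fileobj):
    """Yield one parsed JSON object per line from a gzip-compressed stream."""
    # gzip.open accepts a file object and decompresses lazily as lines are read.
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)


# In-memory stand-in for the HTTP response body (made-up records).
payload = gzip.compress(
    b'{"accessionNo": "0000000000-26-000001"}\n'
    b'{"accessionNo": "0000000000-26-000002"}\n'
)
records = list(iter_jsonl_gz(io.BytesIO(payload)))

# Against the live endpoint the same generator would be fed the response:
#   with urllib.request.urlopen(request) as response:
#       for record in iter_jsonl_gz(response):
#           ...
```

Processing records inside the loop, rather than collecting them into a list as the demo does, is what keeps memory use flat.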
How do I track which form types were most common on a particular day in EDGAR?
Request /edgar-index/ingestion-log/YYYY-MM-DD for the day of interest and group the data array by data.formType, counting the entries in each group. Because the response is de-duplicated by accession number, each filing contributes exactly one observation to the form-type tally, and the total filing count is exposed at total.value for a sanity check against the per-form-type sum.
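The tally and the sanity check can be sketched as follows, assuming the response has been parsed into a dict. The sample is a two-entry abbreviation of the response shape shown at the top of this page, with total.value adjusted to match.

```python
from collections import Counter


def form_type_counts(log):
    """Count ingestion-log entries per formType."""
    return Counter(entry["formType"] for entry in log["data"])


# Abbreviated sample in the documented response shape.
log = {
    "total": {"value": 2, "relation": "eq"},
    "data": [
        {"accessionNo": "0001214659-26-000611", "formType": "4",
         "filedAt": "2026-01-16T21:53:50-05:00"},
        {"accessionNo": "0000000000-25-005071", "formType": "LETTER",
         "filedAt": "2025-05-13T11:36:12-04:00"},
    ],
}
counts = form_type_counts(log)
ranked = counts.most_common()  # most frequent form types first
totals_match = sum(counts.values()) == log["total"]["value"]
```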
How can I monitor for late additions to a historical day (e.g., a comment letter appended weeks later)?
Poll /edgar-index/archive/files/index.json and watch the updatedAt field on the entry whose key is the YYYY-MM-DD.jsonl.gz archive for the historical day of interest. When a late filing such as a comment letter is published with an EDGAR filedAt of that day, it is appended to that day's archive and updatedAt advances.
A newer updatedAt than the value you last recorded for that key is the signal to re-fetch /edgar-index/archive/files/YYYY-MM-DD.jsonl.gz and diff the new JSONL records against your stored set; any records with an accessionNo you have not seen are the late additions.
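The two steps, change detection and diffing, can be sketched like this. The helper names are hypothetical, and the sketch assumes updatedAt is an ISO 8601 timestamp with a UTC offset, like the filedAt values elsewhere in the API; the sample values are made up.

```python
from datetime import datetime


def archive_changed(entry, last_seen_updated_at):
    """True if the index entry's updatedAt is newer than the stored value."""
    if last_seen_updated_at is None:
        return True
    current = datetime.fromisoformat(entry["updatedAt"])
    return current > datetime.fromisoformat(last_seen_updated_at)


def late_additions(records, seen_accessions):
    """Return records whose accessionNo is not in the stored set."""
    return [r for r in records if r["accessionNo"] not in seen_accessions]


# Illustrative values only.
entry = {"key": "2025-05-13.jsonl.gz", "updatedAt": "2026-01-16T21:54:18-05:00"}
changed = archive_changed(entry, "2025-05-13T22:30:00-04:00")

refetched = [{"accessionNo": "A-1"}, {"accessionNo": "A-2"}]
added = late_additions(refetched, seen_accessions={"A-1"})
```

Store the new updatedAt only after the diff has been processed, so a failed run is retried on the next poll.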
How do I build a nightly job that pulls only the filings published today plus their full document content?
First request /edgar-index/ingestion-log/YYYY-MM-DD for today's date and read the data array; each entry's data.accessionNo, data.formType, and data.filedAt describe a filing that was published today. This is the documented endpoint for daily completeness against publication date.
Then, for each accession number returned, look up the corresponding record in /edgar-index/archive/files/YYYY-MM-DD.jsonl.gz for that filing's filedAt date, which can differ from today's publication date, and fetch the filing's artifacts through the files.downloadUrl values on that record. The downloadUrl endpoints sit on archive.sec-api.io and are authenticated by appending ?token=YOUR_API_KEY or sending an Authorization header. This Ingestion Logs plus Download pattern is the recommended combination for a nightly metadata-plus-content job.
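Because the ingestion log is keyed by publication date while the file index archives are keyed by filedAt date, the job first has to work out which archives to open. A sketch of that grouping step, assuming the date portion of filedAt (as written, in its own UTC offset) names the archive; the function name is hypothetical and the sample reuses the entries shown at the top of this page.

```python
from collections import defaultdict


def archives_to_fetch(log):
    """Map each filedAt date (YYYY-MM-DD) to the accession numbers to pull from it."""
    by_day = defaultdict(set)
    for entry in log["data"]:
        day = entry["filedAt"][:10]  # date part of the ISO 8601 timestamp
        by_day[day].add(entry["accessionNo"])
    return dict(by_day)


# Abbreviated sample in the documented response shape.
log = {"data": [
    {"accessionNo": "0001214659-26-000611", "formType": "4",
     "filedAt": "2026-01-16T21:53:50-05:00"},
    {"accessionNo": "0000000000-25-005071", "formType": "LETTER",
     "filedAt": "2025-05-13T11:36:12-04:00"},
]}
plan = archives_to_fetch(log)
```

Each key in plan identifies one YYYY-MM-DD.jsonl.gz archive to stream, and the associated set tells the job which records in that archive to keep.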
How do I know when today's file index archive is finalized and safe to ingest?
The File Index Archive APIs publish each daily YYYY-MM-DD.jsonl.gz dataset at 10:30 PM ET. Under the documented coverage policy, the dataset for a given date is complete and safe to ingest from that time onward, so after the cutoff you can request /edgar-index/archive/files/YYYY-MM-DD.jsonl.gz for today's date and treat it as final.
As a programmatic check, request /edgar-index/archive/files/index.json and read the entry whose key equals today's YYYY-MM-DD.jsonl.gz. Once that entry's updatedAt reflects a time at or after the 10:30 PM ET publication, and its size and objectCount are populated, the archive is ready to download.
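A rough sketch of that check on a single index entry. It assumes updatedAt is an ISO 8601 timestamp expressed with an Eastern Time offset, like the filedAt values elsewhere in the API; if it is reported in UTC, convert it to Eastern Time before comparing. The function name and sample entries are made up.

```python
from datetime import datetime, time

CUTOFF = time(22, 30)  # 10:30 PM, per the publication schedule above


def is_finalized(entry, day):
    """True if the entry for `day` (YYYY-MM-DD) looks complete and published."""
    if not entry.get("size") or not entry.get("objectCount"):
        return False
    # Assumption: the timestamp's own offset is Eastern Time, so its
    # wall-clock date and time can be compared to the cutoff directly.
    updated = datetime.fromisoformat(entry["updatedAt"])
    local_day = updated.date().isoformat()
    return local_day > day or (local_day == day and updated.time() >= CUTOFF)


# Illustrative entries only.
ready = is_finalized(
    {"key": "2026-01-16.jsonl.gz", "updatedAt": "2026-01-16T22:31:00-05:00",
     "size": 1_200_000, "objectCount": 2500},
    "2026-01-16",
)
too_early = is_finalized(
    {"key": "2026-01-16.jsonl.gz", "updatedAt": "2026-01-16T21:54:18-05:00",
     "size": 1_200_000, "objectCount": 2500},
    "2026-01-16",
)
```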