sec-api.ioSEC API by D2V
FilingsPricingSandboxDocs
Log inGet Free API Key
  1. Home
  2. Tutorials

EDGAR Filer Analysis

Open In Colab   Download Notebook

On this page:

  • Data Sourcing and Preparation
    • Data Collection
      • Understanding Multiple Entities
        • Dataset Availability
          • Data Enrichment
          • Getting Started
            • Augmenting with Sector & Industry Data
              • Augmenting with SIC Codes and SEC Offices
              • Visualizing EDGAR Filers' Trends
                • Sector and Industry Analysis of EDGAR Filers
                  • Industry-Level Analysis
                  • EDGAR Entities by SIC and SEC Office

                    Welcome to the EDGAR Filer Analysis tutorial! Through this tutorial, we aim to offer a comprehensive overview of the trends and factors that have influenced the growth and fluctuations of entities on the SEC EDGAR platform from 1994 to 2022.

                    By employing various visualization techniques, we'll shed light on the number of EDGAR filers over this time span. Additionally, we'll enrich our dataset by incorporating sector and industry information, SIC codes, and the respective SEC office designations for each filer.

                    You'll get hands-on experience crafting stacked bar charts, line graphs, heatmaps, box plots, and more to unearth meaningful correlations and insights. Here's a sneak peek of the visualizations you're about to explore:

                    gallery

                    Below, we highlight some of the key observations and insights derived from the EDGAR filer statistics:

                    • Peak Filers: The year 2021 recorded the highest number of EDGAR filers ever.
                    • Sectoral Growth: In 2021, there was a significant influx of new EDGAR filers, second only to 2003. Notably, the industrials and healthcare sectors observed the most pronounced growth.
                    • SPAC Surge: A noticeable surge in Blank cheque companies (SPACs or shell companies) was evident between 2020 and 2022.
                    • Tech Sector Warning: There were anomalous increases in new tech sector filers between 1997 and 1999, hinting at the impending dot-com bubble of 2000.
                    • Financial Crisis Precursor: Between 2003 and 2006, there was a spike in asset-backed securities – a potential early warning sign of the looming financial crisis.
                    • Assignment Trends: Over time, the number of new EDGAR filers linked to specific sectors or industries declined, while unassigned filers saw an uptick.
                    • SIC Code Omissions: Astonishingly, 90% of EDGAR filers are not assigned an SIC code, making them unlinked to any specific SEC office.
                    • Assignment Decline: Beginning in 2000, there was a consistent decline in the number of EDGAR filers associated with a sector, SIC code, or SEC office. This trend was counterbalanced by an increase in filers without any such affiliations.

                    Data Sourcing and Preparation

                    Throughout this section, note that we use the terms filer and entity interchangeably.

                    Data Collection

                    The annual count of EDGAR filers was determined using the Query API. We extracted metadata for all EDGAR filings between 1994 and 2022, which included details like form type (e.g., 10-K) and filing entities (such as CIK, SIC, IRS number, and filer name). By referencing these entities in respective filings for each year, we compiled a comprehensive list. However, a deep dive into the aggregation process will not be covered here. Be prepared: executing the code responsible for this data collection and aggregation can span several days.

                    Understanding Multiple Entities

                    A single filing can encompass multiple entities. Take the insider trading form, Form 4, as an example. It always includes at least two entities: the issuer (the company issuing the securities) and the reporter (the individual or entity acquiring or disposing of the securities). Thus, one SEC Form 4 filing, with a unique accession number, results in two separate records in the EDGAR index—one for each entity. Here's an illustrative example showcasing Meta (formerly Facebook) and Mark Zuckerberg as the two entities:

                    Form 4 Example

                    form-4-entities-example

                    On the other hand, S-4 filings can encompass even more entities. The following example houses nine distinct entities:

                    S-4 Filing Example

                    form-s-4-entities-example

                    Key takeaways:

                    • A single filing can contain multiple entities.
                    • Every entity will possess a CIK and name, but not all will have an SIC code.

                    Dataset Availability

                    For the sake of convenience, we've prepared the entities dataset in advance. You can freely download the data from the following links:

                    • Entities from 1994 to 2011
                    • Entities from 2012 to 2022

                    The dataset structure is as follows:

                    {
                    "2022": [
                    {
                    "cik": "789019",
                    "companyName": "MICROSOFT CORP (Issuer)",
                    "irsNo": "911144442",
                    "fiscalYearEnd": "0630",
                    "stateOfIncorporation": "WA",
                    "sic": "7372 Services-Prepackaged Software",
                    "year": "2022"
                    },
                    {
                    "cik": "902012",
                    "companyName": "GATES WILLIAM H III (Reporting)",
                    "year": "2022"
                    }
                    // ... additional filers
                    ]
                    // ... other years
                    }

                    Data Enrichment

                    In the forthcoming steps, we will enhance our dataset by adding sector and industry data for each entity. This is achieved through the Mapping API, which links each entity's CIK to its corresponding sector and industry. If a mapping isn't available, the value will be represented as "NaN". Furthermore, we will associate the Standard Industrial Classification (SIC) codes of each entity to the appropriate SEC office, as detailed by the SEC's guidelines found here.

                    Getting Started

                    As we venture further into our EDGAR Filer Analysis, our next step is crucial: obtaining and processing our primary dataset. In this section, we will walk you through the process of:

                    1. Downloading the Entities Data: We'll programmatically fetch the entities.json files which house our essential data spanning from 1994 to 2022.
                    2. Conversion to a DataFrame: The raw JSON data will be transformed into a structured pandas DataFrame. This format is more conducive to the analytical operations we'll be performing.
                    3. Enriching the Data: We'll augment our dataset by appending sector, industry, and SEC office designations to each entity. This added information will give us a richer context as we delve deeper into our analysis.

                    By the end of this section, you'll have a robust, enriched dataset at your fingertips, ready for detailed exploration and visualization. Let's dive in!

                    !pip install -q sec-api
                    import pandas as pd
                    import numpy as np
                    import matplotlib.pyplot as plt
                    import matplotlib.ticker as mtick
                    import seaborn as sns
                    import json
                    import requests

                    plt.rcParams.update({'font.size': 9})
                    # load entities_1994_2011.json into entities_1994_2011 variable
                    url_a = "https://statistics.sec-api.io/tutorials/edgar-filer-analysis/entities_1994_2011.json"
                    response_a = requests.get(url_a)
                    entities_1994_2011 = response_a.json()

                    url_b = "https://statistics.sec-api.io/tutorials/edgar-filer-analysis/entities_2012_2022.json"
                    response_b = requests.get(url_b)
                    entities_2012_2022 = response_b.json()

                    # merge entities_1994_2011 and entities_2012_2022 into entities variable
                    entities_json = {**entities_1994_2011, **entities_2012_2022}

                    Next, we'll transform the list of entities from the JSON format into a pandas DataFrame. This DataFrame will feature the following columns: year, cik, irsNo, companyName, fiscalYearEnd, stateOfIncorporation, and sic.

                    def convert_json_to_dataframe(entities_json: dict) -> pd.DataFrame:
                        entities_json_list = []

                        for year in entities_json.keys():
                            for entity in entities_json[year]:
                                entity['cik'] = entity['cik'].lstrip('0')
                                entity['year'] = year
                                entities_json_list.append(entity)

                        entities = pd.DataFrame(entities_json_list)

                        return entities


                    entities = convert_json_to_dataframe(entities_json)
                    entities.drop(labels=['act', 'fileNo', 'type', 'filmNo'], axis=1, inplace=True)
                    year = entities['year']
                    entities.drop(labels=['year'], axis=1, inplace=True)
                    entities.insert(0, 'year', year)
                    entities['year'] = entities['year'].astype(int)
                    print('All EDGAR entities from 1994 - 2022')
                    entities # 3056738 rows
                    All EDGAR entities from 1994 - 2022
                    Out:
                    yearfiscalYearEndstateOfIncorporationcikcompanyNamesicirsNo
                    019941231LA96035SYSTEM FUELS INC (Filer)0000NaN
                    119941130NaN89954TRUST FOR SHORT TERM U S GOVERNMENT SECURITIES...NaN251289316
                    219940331MA901823PUTNAM MANAGED HIGH YIELD TRUST (Filer)0000046733967
                    319941231NY872243EMPIRE STATE MUNICIPAL EXEMPT TRUST GUARANTEED...0000NaN
                    419941231NY872246EMPIRE STATE MUNICIPAL EXEMPT TRUST GUARANTEED...0000NaN
                    ........................
                    305673320221231NY1956590GO UES Investors LLC (Filer)NaN920625088
                    305673420221231T21955130Broomfield International Ltd (Filed by)NaN000000000
                    305673520221231DE1895349Bridger Healthcare SIV I, LP (Filer)NaN000000000
                    30567362022NaNNaN1830807Eden Douglas Todd (Filed by)NaNNaN
                    30567372022NaNDE1895098XN Opportunities IV LP (Filer)NaN000000000

                    3056738 rows × 7 columns

                    Augmenting with Sector & Industry Data

                    To further enrich our dataset, we'll incorporate sector and industry information for each entity using the Mapping API. Start by retrieving the mapping data for all CIKs available in the mapping database. Once obtained, we'll associate the CIKs from our entities DataFrame with their respective sectors and industries. This is achieved using pandas' df_A.merge(df_B) function, allowing us to seamlessly append the sector and industry columns to our DataFrame.

                    from sec_api import MappingApi

                    mappingApi = MappingApi(api_key='YOUR_API_KEY')
                    mappings_json = mappingApi.resolve('cik', '[0-9]')
                    mappings = pd.DataFrame(mappings_json).drop(columns=["id"])
                    mappings["cik"] = mappings["cik"].astype(str)
                    mappings[mappings["sector"] == ""] = np.nan
                    mappings[mappings["industry"] == ""] = np.nan

                    The mappings DataFrame comprises 16 columns. Among these, the cik, sector, and industry columns are most important for our analysis.

                    print(f"Number of mappings: {len(mappings)}")
                    mappings.head()
                    Number of mappings: 41847
                    Out:
                    nametickercikcusipexchangeisDelistedcategorysectorindustrysicsicSectorsicIndustryfamaSectorfamaIndustrycurrencylocation
                    0AGILENT TECHNOLOGIES INCA109087200846U101NYSEFalseDomestic Common StockHealthcareDiagnostics & Research3826ManufacturingLaboratory Analytical InstrumentsMeasuring and Control EquipmentUSDCalifornia; U.S.A
                    1ALCOA CORPAA1675149013872106NYSEFalseDomestic Common StockBasic MaterialsAluminum3334ManufacturingPrimary Production Of AluminumSteel Works EtcUSDPennsylvania; U.S.A
                    2ADMIRALTY BANCORP INCAAAB1066808007231103NASDAQTrueDomestic Common StockFinancial ServicesBanks - Regional6022Finance Insurance And Real EstateState Commercial BanksBankingUSDFlorida; U.S.A
                    3ALTANA AKTIENGESELLSCHAFTAAAGY118280202143N103NYSETrueADR Common StockHealthcareBiotechnology2834ManufacturingPharmaceutical PreparationsPharmaceutical ProductsEURJordan
                    4ADVANCED ACCELERATOR APPLICATIONS SAAAAP161178700790T100NASDAQTrueADR Common StockHealthcareBiotechnology2834ManufacturingPharmaceutical PreparationsPharmaceutical ProductsEURFrance
                    tmp_mappings = mappings.dropna(subset=['sector', 'industry'])
                    print(f"{len(entities)} filers before merging")
                    entities = entities.merge(tmp_mappings[['cik', 'sector', 'industry']].drop_duplicates(subset=['cik']), how='left', on='cik')
                    entities.drop_duplicates(inplace=True)
                    entities.reset_index(drop=True, inplace=True)
                    print(f"{len(entities)} filers after merging")
                    3056738 filers before merging
                    3056738 filers after merging

                    Upon merging the sector and industry data into our entities DataFrame, the final list of entities looks like this:

                    entities[entities['year']==2022].head()
                    Out:
                    yearfiscalYearEndstateOfIncorporationcikcompanyNamesicirsNosectorindustry
                    289822120221231DE19617JPMORGAN CHASE & CO (Filer)6021 National Commercial Banks132624428Financial ServicesBanks - Diversified
                    289822220221231DE1665650JPMorgan Chase Financial Co. LLC (Filer)6021 National Commercial Banks475462128NaNNaN
                    289822320221231X0312070BARCLAYS BANK PLC (Filer)6029 Commercial Banks, NEC000000000NaNNaN
                    28982242022NaNNaN1504477Blend L Michael (Reporting)NaNNaNNaNNaN
                    289822520221231DE1805833System1, Inc. (Issuer)7370 Services-Computer Programming, Data Proce...981531250IndustrialsSpecialty Business Services

                    Augmenting with SIC Codes and SEC Offices

                    Our next enhancement involves associating SIC codes with their respective SEC Offices. Fortunately, the SEC website provides a detailed mapping between SIC codes and SEC offices, which can be found here: SEC SIC Code List.

                    For ease of access, we've already extracted and converted this mapping, inclusive of SIC codes, SEC office designations, and industry titles, into a JSON file. You can fetch it directly from: SIC-to-Office Mapping JSON.

                    In this section, we'll:

                    1. Download the provided JSON file.
                    2. Separate the SIC column in our entities DataFrame into two distinct columns: sic_code (housing the 3-4 digit code) and sic_industry (containing the industry title).
                    3. Integrate the SIC-to-Office mapping with our entities DataFrame.

                    Post these steps, our enriched entities DataFrame will feature an additional office column, indicating the designated SEC office for each entry.

                    sic_code_url = "https://statistics.sec-api.io/tutorials/edgar-filer-analysis/sic-codes-list.json"
                    sic_code_response = requests.get(sic_code_url)
                    sic_codes_json = sic_code_response.json()
                    sic_codes = pd.DataFrame(sic_codes_json)
                    sic_codes['sic_code'] = sic_codes['sic_code'].astype(str)
                    sic_codes
                    Out:
                    sic_codeofficeindustry_title
                    0100Industrial Applications and ServicesAGRICULTURAL PRODUCTION-CROPS
                    1200Industrial Applications and ServicesAGRICULTURAL PROD-LIVESTOCK & ANIMAL SPECIALTIES
                    2700Industrial Applications and ServicesAGRICULTURAL SERVICES
                    3800Industrial Applications and ServicesFORESTRY
                    4900Industrial Applications and ServicesFISHING, HUNTING AND TRAPPING
                    ............
                    4398880Office of International Corp FinAMERICAN DEPOSITARY RECEIPTS
                    4408888Office of International Corp FinFOREIGN GOVERNMENTS
                    4418900Office of Trade & ServicesSERVICES-SERVICES, NEC
                    4429721Office of International Corp FinINTERNATIONAL AFFAIRS
                    4439995Office of Real Estate & ConstructionNON-OPERATING ESTABLISHMENTS

                    444 rows × 3 columns

                    # split "sic" column in sic_stats into "sic_code" and "sic_industry" columns by splitting at first space
                    entities[['sic_code', 'sic_industry']] = entities['sic'].str.split(n=1, expand=True)
                    entities['sic_code'] = entities['sic_code'].str.lstrip('0')
                    entities['sic_code'].replace('', np.nan, inplace=True)
                    entities['sic_industry'].replace({None: np.nan}, inplace=True)
                    print(f"{len(entities)} filers before merging")
                    entities = entities.merge(sic_codes[['sic_code', 'office']], how='left', on='sic_code')
                    entities.reset_index(drop=True, inplace=True)
                    print(f"{len(entities)} filers after merging")
                    3056738 filers before merging
                    3056738 filers after merging

                    We're set! Our entities DataFrame now encompasses sector and industry details for each EDGAR entity, complemented by the designated SEC Office, SIC code, and SIC industry title. With this enriched dataset in hand, we're ready to delve into visualization and analysis.

                    entities[entities['year']==2022].head()
                    Out:
                    yearfiscalYearEndstateOfIncorporationcikcompanyNamesicirsNosectorindustrysic_codesic_industryoffice
                    289822120221231DE19617JPMORGAN CHASE & CO (Filer)6021 National Commercial Banks132624428Financial ServicesBanks - Diversified6021National Commercial BanksOffice of Finance
                    289822220221231DE1665650JPMorgan Chase Financial Co. LLC (Filer)6021 National Commercial Banks475462128NaNNaN6021National Commercial BanksOffice of Finance
                    289822320221231X0312070BARCLAYS BANK PLC (Filer)6029 Commercial Banks, NEC000000000NaNNaN6029Commercial Banks, NECOffice of Finance
                    28982242022NaNNaN1504477Blend L Michael (Reporting)NaNNaNNaNNaNNaNNaNNaN
                    289822520221231DE1805833System1, Inc. (Issuer)7370 Services-Computer Programming, Data Proce...981531250IndustrialsSpecialty Business Services7370Services-Computer Programming, Data Processing...Office of Technology

                    Visualizing EDGAR Filers' Trends

                    It's time to bring our data to life through visualization. In this section, we'll illustrate the trajectory of EDGAR filers over the years.

                    1. Annual EDGAR Filers: A bar chart will showcase the number of entities that filed an EDGAR filing from 1994 to 2022.
                    2. New vs. Total Filers: We'll compute the yearly count of new EDGAR filers and contrast this against the cumulative total using compelling visuals.
                    3. Growth Analysis: To further understand the dynamics, we'll calculate and visualize the year-on-year percentage growth in new filers.

                    These visualizations will not only offer a bird's-eye view of the trends but also equip us with insights into the nuances of filer growth and the factors influencing them.

                    years = entities['year'].unique().tolist()
                    num_entities = entities.groupby('year').size().tolist()

                    plt.bar(years, num_entities)
                    plt.xlabel('Year'), plt.ylabel('Entities'), plt.title('EDGAR Entities per Year')
                    ax = plt.gca()
                    ax.yaxis.set_major_formatter(mtick.FuncFormatter(lambda x, p: format(int(x), ',')))
                    plt.xticks(years, rotation=90), plt.grid(axis='y'), plt.gca().set_axisbelow(True)
                    plt.show()
                    EDGAR Entities per Year

                    Let us now find entities that are seen for the first time in each year, allowing us to track when each entity first appeared in the dataset.

                    For this, we define the function get_new_filers_per_year that processes the entities dataframe to identify new entities (or 'filers') for each year, i.e., entities that are seen for the first time in that year.

                    def get_new_filers_per_year(entities):
                        new_filers = None
                        processed_ciks = set()
                        years = entities['year'].unique().tolist()

                        for year in years:
                            ciks = entities[entities['year'] == year]['cik'].tolist()
                            if year == 1994:
                                new_filers = pd.DataFrame(entities[entities['year'] == year])
                                processed_ciks.update(ciks)
                                continue

                            year_entities = entities[entities['year'] == year]
                            new_filers_for_year = year_entities[~year_entities['cik'].isin(processed_ciks)]
                            new_filers = pd.concat([new_filers, new_filers_for_year], ignore_index=True)

                            processed_ciks.update(ciks)

                        return new_filers
                    new_filers = get_new_filers_per_year(entities)
                    print('New EDGAR entities in 2020:', len(new_filers[new_filers['year'] == 2020])) # 35055
                    New EDGAR entities in 2020: 35055

                    The next code snippet is designed to visualize the distribution of entities over the years, specifically highlighting the difference between all entities and the new entities added each year.

                    years = entities['year'].unique().tolist()
                    num_entities = entities.groupby('year').size().tolist()
                    num_new_filers = new_filers.groupby('year').size().tolist()

                    plt.bar(years, num_entities, label='All entities')
                    plt.bar(years, num_new_filers, label='New entities')
                    plt.xlabel('Year'), plt.ylabel('Entities'), plt.title('New & Old EDGAR Entities per Year')
                    plt.legend()
                    ax = plt.gca()
                    ax.yaxis.set_major_formatter(mtick.FuncFormatter(lambda x, p: format(int(x), ',')))
                    plt.xticks(years, rotation=90), plt.grid(axis='y'), plt.gca().set_axisbelow(True)
                    plt.show()
                    New & Old EDGAR Entities per Year
                    years = entities['year'].unique().tolist()

                    filers_df = pd.DataFrame({'total filers': num_entities, 'new filers': num_new_filers}, index=sorted(years))
                    filers_df['new filers %'] = filers_df['new filers'] / filers_df['total filers'] * 100
                    filers_df['new filers %'] = filers_df['new filers %'].map('{:,.1f}'.format)
                    filers_df['filers growth YoY'] = filers_df['total filers'].pct_change() * 100
                    filers_df['filers growth YoY'] = filers_df['filers growth YoY'].map('{:,.1f}'.format)
                    filers_df['new filers growth YoY'] = filers_df['new filers'].pct_change() * 100
                    filers_df['new filers growth YoY'] = filers_df['new filers growth YoY'].map('{:,.1f}'.format)
                    filers_df = filers_df.sort_index(ascending=False)

                    filers_df.head()
                    Out:
                    total filersnew filersnew filers %filers growth YoYnew filers growth YoY
                    20221585174868230.7-1.0-9.4
                    20211601865371333.518.353.2
                    20201353933505525.91.410.3
                    20191334943178323.80.5-1.2
                    20181328263217324.2-1.77.9
                    filers_df['new filers growth YoY'] = filers_df['new filers growth YoY'].astype(float)
                    plt.figure(figsize=(5, 4)), plt.bar(filers_df.index, filers_df['new filers growth YoY'])
                    plt.title('New filers growth YoY (%)'), plt.xlabel('Year'), plt.ylabel('New filers growth YoY (%)')
                    plt.grid(axis='y'), plt.grid(axis='x')
                    plt.gca().set_axisbelow(True), plt.xticks(filers_df.index, rotation=90)
                    plt.show()
                    New filers growth YoY (%)

                    Let's summarize the insights from the EDGAR filer's data so far collected:

                    1. The year 2021 marked an all-time high in the number of EDGAR entities, reaching a pinnacle of 160,186 filers. The year recorded a 53% surge in new filers relative to 2020.
                    2. 2021 recorded the second-highest count of newcomers to the EDGAR platform.
                    3. While 2022 experienced a slight dip in new filers compared to its preceding year, it still ranks third in the all-time highest new filer counts.
                    4. Despite the challenges of the COVID-19 pandemic in 2020, both new and total EDGAR filers saw an upward trend compared to 2019.
                    5. The record for the most significant number of new filers goes to 2003, standing second only to 2021, while 2002 witnessed a staggering 230% growth in new filers when contrasted with 2001. It's worth noting that the subsequent year, 2004, witnessed the enforcement of the Sarbanes-Oxley Act, which augmented reporting norms for public entities. This legislative change possibly fueled the surge in new filers in 2003.
                    6. The global financial crisis had a visible impact on EDGAR's metrics. For the first time since EDGAR's inception in 1994, the total number of entities declined during 2008 and 2009.
                    7. A consistent year-on-year decrease in new filers was observed from 2003 to 2009.
                    8. Post-2009, from 2010 onwards, there has been a gradual year-on-year increase in both new and overall entities on EDGAR.

                    Sector and Industry Analysis of EDGAR Filers

                    In this section, we'll focus on a detailed analysis of EDGAR filers segmented by sectors and industries from 1994 to 2022. Our approach will involve:

                    1. Data Segmentation: Utilizing pivot tables, we'll categorize the number of EDGAR filers per sector and industry over the specified timeframe.
                    2. Visualization: Stacked bar charts and line charts will be employed to represent the distribution and trends of filers across different sectors and industries.
                    3. Anomaly Detection: We'll also examine the data for any anomalies which might indicate underlying market trends or significant events.
                    sector_stats = entities.groupby(['year', 'sector'], dropna=False).cik.count().reset_index(name='cik_count')
                    sector_stats[sector_stats['year'] == 2022].sort_values(by='cik_count', ascending=False).head(20)
                    Out:
                    yearsectorcik_count
                    3472022NaN151476
                    3422022Healthcare1461
                    3432022Industrials1360
                    3452022Technology934
                    3412022Financial Services921
                    3382022Consumer Cyclical631
                    3362022Basic Materials374
                    3442022Real Estate315
                    3402022Energy308
                    3372022Communication Services303
                    3392022Consumer Defensive296
                    3462022Utilities138

                    We've segmented the data into 12 unique sectors, one of which is the 'NaN' category. This category encompasses entities that haven't been designated a specific sector, such as trusts or insiders often documented in Form 4.

                    From the 2022 snapshot generated, we can observe the distribution of EDGAR filers across these 12 sectors. The 'NaN' category stands out with a substantial count of 151,477 entities. However, looking beyond this, the 'Healthcare' sector is prominently represented with 1,461 entities, closely trailed by the 'Industrials' sector with 1,360 entities. Such figures indicate a high level of activity and growth in these sectors. Additionally, the 'Technology' and 'Financial Services' sectors remain influential with 934 and 920 entities, respectively. On the other end of the spectrum, the strictly regulated 'Utilities' sector has the least presence, with only 138 entities for 2022.

                    To examine the distribution of filers across sectors over the years, we construct a pivot table. This table designates years as the index column and sectors as column labels. Each cell within this matrix indicates the quantity of EDGAR filers corresponding to a specific sector for a particular year.

                    sector_stats_pivot = sector_stats.pivot(index='year', columns='sector', values='cik_count')
                    sector_stats_pivot[-10:]
                    Out:
                    sectorNaNBasic MaterialsCommunication ServicesConsumer CyclicalConsumer DefensiveEnergyFinancial ServicesHealthcareIndustrialsReal EstateTechnologyUtilities
                    year
                    2013124658452370725311453108711588003371250140
                    2014128024426324639296455107811787443371062146
                    2015129015415310622296451103512037423431040148
                    201612613540730061228143010191185731340966150
                    20171287844023016082804339801213738340920151
                    20181265553872955992684159781272718335861143
                    20191273003782956042703879411292719325848135
                    20201288453643026212763619551403918320891137
                    202115301537531964429733995414851344318958138
                    202215147637430363129630892114611360315934138

                    Let's start by visualizing the data to get a clearer sense of the trends. Once we have the visual representation, we can identify and list the key insights.

                    sector_stats_pivot.plot(kind='bar', stacked=True, figsize=(7, 5))
                    plt.title("EDGAR Filers per Sector per Year")
                    plt.ylabel("Filers"), plt.xlabel("Year"), plt.tight_layout()
                    ax = plt.gca()
                    ax.yaxis.set_major_formatter(mtick.FuncFormatter(lambda x, p: format(int(x), ',')))
                    handles, labels = ax.get_legend_handles_labels()
                    ax.legend(reversed(handles), reversed(labels), loc='lower left', bbox_to_anchor=(1, 0.5), fontsize=8)
                    plt.show()
                    EDGAR Filers per Sector per Year
                    sector_stats_pivot.drop(columns=[np.nan]).plot(kind='bar', stacked=True, figsize=(7, 5))
                    plt.title("EDGAR Filers per Sector per Year (Without NaN Category)")
                    plt.ylabel("Filers"), plt.xlabel("Year"), plt.tight_layout()
                    ax = plt.gca()
                    ax.yaxis.set_major_formatter(mtick.FuncFormatter(lambda x, p: format(int(x), ',')))
                    handles, labels = ax.get_legend_handles_labels()
                    ax.legend(reversed(handles), reversed(labels), loc='lower left', bbox_to_anchor=(1, 0.5), fontsize=8)
                    plt.show()
                    EDGAR Filers per Sector per Year (Without NaN Category)
                    sector_stats_pivot.drop(columns=[np.nan]).plot(kind='line', figsize=(7, 5))
                    plt.title("EDGAR Filers per Sector per Year")
                    plt.ylabel("Filers"), plt.xlabel("Year"), plt.tight_layout(), plt.grid(True), plt.gca().set_axisbelow(True)
                    plt.xticks(sector_stats_pivot.index, rotation=90)
                    ax = plt.gca()
                    ax.yaxis.set_major_formatter(mtick.FuncFormatter(lambda x, p: format(int(x), ',')))
                    handles, labels = ax.get_legend_handles_labels()
                    ax.legend(reversed(handles), reversed(labels), loc='lower left', bbox_to_anchor=(1, 0.5), fontsize=8)
                    plt.show()
                    EDGAR Filers per Sector per Year

                    From the visual representation of the EDGAR filers per sector over time, we can derive the following key insights:

                    1. Technology Boom and Subsequent Stabilization: The Technology sector experienced a significant rise in the number of filers from 1996 to 2000, likely corresponding to the dot-com boom. However, after 2000, there was a decline, and then the number of filers stabilized and even increased slightly in the most recent years.

                    2. Financial Services Peak and Decline: The Financial Services sector saw a steady increase in the number of filers up to around 1999, after which there was a gradual decline, with minor fluctuations in the subsequent years.

                    3. Healthcare's Consistent Rise: The Healthcare sector has experienced a consistent increase in the number of filers over the years, with notable growth in the most recent years.

                    4. Consumer Cyclical Variability: The Consumer Cyclical sector experienced a steady rise until around 1999, followed by a decline. The sector has seen fluctuations but remains one of the larger sectors in terms of filers.

                    5. Stability in Utilities: The Utilities sector has remained relatively stable over the years in terms of the number of filers.

                    6. Industrials' Recent Surge: The Industrials sector has seen a significant rise in the number of filers in the last couple of years, indicating recent activity or interest in this sector.

                    7. General Sector Trends: Most sectors experienced growth in the number of filers during the mid to late 90s, followed by stabilization or slight decline in the 2000s. The past decade has seen either stabilization or growth for many sectors, with some exceptions.

                    8. Smallest Sectors: Communication Services and Real Estate have consistently been among the sectors with the lowest number of filers, though they too have experienced fluctuations over the years.

                    9. Impact of Global Events: Notable declines in many sectors can be observed around 2008, likely corresponding to the global financial crisis.

                    These insights provide a high-level view of the trends and shifts in the different sectors over the years based on the EDGAR filers' data. Further analysis could delve deeper into the reasons behind these trends, especially the recent surge in the Industrials sector.

                    sector_stats_new_filers = new_filers.groupby(['year', 'sector'], dropna=False).cik.count().reset_index(name='cik_count')
                    sector_stats_new_filers[sector_stats_new_filers['year'] == 2022].sort_values(by='cik_count', ascending=False).head(20)
                    Out:
                    yearsectorcik_count
                    3462022NaN48522
                    3422022Industrials51
                    3412022Healthcare21
                    3442022Technology21
                    3372022Consumer Cyclical18
                    3352022Basic Materials11
                    3362022Communication Services9
                    3392022Energy9
                    3402022Financial Services8
                    3382022Consumer Defensive7
                    3432022Real Estate3
                    3452022Utilities2
                    sector_stats_new_filers_pivot = sector_stats_new_filers.pivot(index='year', columns='sector', values='cik_count')
                    sector_stats_new_filers_pivot.fillna(0, inplace=True)
                    sector_stats_new_filers_pivot = sector_stats_new_filers_pivot.astype(int)
                    sector_stats_new_filers_pivot[-5:]
                    Out:
                    sectorNaNBasic MaterialsCommunication ServicesConsumer CyclicalConsumer DefensiveEnergyFinancial ServicesHealthcareIndustrialsReal EstateTechnologyUtilities
                    year
                    20183192010143311103181217323
                    2019315391115241772571217442
                    2020344261022562114514722511856
                    20215290715265024834964565839
                    2022485221191879821513212
                    sector_stats_new_filers_pivot.loc[1997:, :].drop(columns=[np.nan]).plot(kind='bar', stacked=True, figsize=(7, 5))
                    plt.title("New EDGAR Filers per Sector Per Year"), plt.ylabel("New Filers"), plt.xlabel("Year"), plt.tight_layout()
                    ax = plt.gca()
                    ax.yaxis.set_major_formatter(mtick.FuncFormatter(lambda x, p: format(int(x), ',')))
                    handles, labels = ax.get_legend_handles_labels()
                    ax.legend(reversed(handles), reversed(labels), loc='lower left', bbox_to_anchor=(1, 0.5), fontsize=8)
                    plt.show()
                    New EDGAR Filers per Sector Per Year

                    Industry-Level Analysis

                    Building on our sectorial analysis, we now narrow our focus to the industry level, offering a more granular perspective on the landscape of EDGAR filers over last three decades.

                    While bar charts will provide a clear representation of the data, we'll also introduce heatmaps to capture and visualize correlations between industries. Such visual tools offer a nuanced understanding of industry interplay and their respective prominence over time.

                    industry_stats = entities.groupby(['year', 'industry'], dropna=False).cik.count().reset_index(name='cik_count')
                    industry_stats[industry_stats['year'] == 2022].sort_values(by='cik_count', ascending=False).head(10)
                    Out:
                    yearindustrycik_count
                    43602022NaN151476
                    42302022Biotechnology831
                    43342022Shell Companies644
                    42262022Banks - Regional417
                    43362022Software - Application317
                    42922022Medical Devices181
                    43372022Software - Infrastructure165
                    42202022Asset Management141
                    42522022Drug Manufacturers - Specialty & Generic115
                    42982022Oil & Gas E&P103
                    print('Number of industries:', len(industry_stats['industry'].unique()))
                    Number of industries: 154
                    industry_stats_pivot = industry_stats.pivot(index='year', columns='industry', values='cik_count')
                    industry_stats_pivot.fillna(0, inplace=True)
                    industry_stats_pivot = industry_stats_pivot.astype(int)
                    industry_stats_pivot[-5:]
                    Out:
                    industryNaNAdvertising AgenciesAerospace & DefenseAgricultural InputsAirlinesAirports & Air ServicesAluminumApparel ManufacturingApparel RetailAsset Management...Travel ServicesTruckingUraniumUtilities - DiversifiedUtilities - Independent Power ProducersUtilities - Regulated ElectricUtilities - Regulated GasUtilities - Regulated WaterUtilities - RenewableWaste Management
                    year
                    2018126555416422241062546136...1521101496823171225
                    2019127300426417241062343137...2021101486220181329
                    2020128845457317251062244145...212381596019161830
                    2021153015527917251262541149...202191575719172327
                    2022151476517919251262337141...1822101575619162526

                    5 rows × 154 columns

                    industry_stats_pivot.loc[1997:, :].drop(columns=[np.nan]).plot(kind='bar', stacked=True, figsize=(7, 5))
                    plt.title("EDGAR Filers per Industry Per Year")
                    plt.ylabel("New Filers"), plt.xlabel("Year"), plt.legend().set_visible(False)
                    plt.show()
                    EDGAR Filers per Industry Per Year
                    industry_stats_new_filers = new_filers.groupby(['year', 'industry'], dropna=False).cik.count().reset_index(name='cik_count')
                    industry_stats_pivot_new_filers = industry_stats_new_filers.pivot(index='year', columns='industry', values='cik_count')
                    industry_stats_pivot_new_filers.fillna(0, inplace=True)
                    industry_stats_pivot_new_filers = industry_stats_pivot_new_filers.astype(int)
                    industry_stats_pivot_new_filers[-5:]
                    Out:
                    industryNaNAdvertising AgenciesAerospace & DefenseAgricultural InputsAirlinesAirports & Air ServicesAluminumApparel ManufacturingApparel RetailAsset Management...Travel ServicesTruckingUraniumUtilities - DiversifiedUtilities - Independent Power ProducersUtilities - Regulated ElectricUtilities - Regulated GasUtilities - Regulated WaterUtilities - RenewableWaste Management
                    year
                    201831920102110103...0010002011
                    201931539232010015...0100001012
                    2020344263910200110...2101000050
                    202152907790020218...0000001172
                    202248522122010001...0110000022

                    5 rows × 154 columns

                    top_10_industries = {}
                    for year in range(1994, 2023):
                        top_10_industries[year] = industry_stats_new_filers[industry_stats_new_filers['year'] == year].sort_values(by='cik_count', ascending=False).head(20)
                        top_10_industries[year] = top_10_industries[year][top_10_industries[year]['cik_count'] > 15]

                    top_10_industries_df = pd.concat(top_10_industries, axis=0)
                    top_10_industries_df.reset_index(inplace=True)
                    top_10_industries_df.drop(columns=['level_0', 'level_1'], inplace=True)
                    top_10_industries_df['year'] = top_10_industries_df['year'].astype(int)
                    top_10_industries_df
                    Out:
                    yearindustrycik_count
                    01994NaN9915
                    11994Banks - Regional137
                    21994Software - Application65
                    31994Biotechnology63
                    41994Specialty Industrial Machinery59
                    ............
                    1712021Software - Application38
                    1722021Software - Infrastructure22
                    1732021Banks - Regional16
                    1742022NaN48522
                    1752022Shell Companies29

                    176 rows × 3 columns

                    top_10_industries_df = top_10_industries_df['industry'].unique().tolist()
                    industry_stats_pivot_new_filers_top_10 = industry_stats_pivot_new_filers[top_10_industries_df]
                    industry_stats_pivot_new_filers_top_10.loc[1997:, :].drop(columns=[np.nan]).plot(kind='bar', stacked=True, figsize=(7, 5))
                    plt.title("New EDGAR Filers per Top 10 Industry Per Year"), plt.ylabel("New Filers"), plt.xlabel("Year")
                    ax = plt.gca()
                    handles, labels = ax.get_legend_handles_labels()
                    ax.legend(reversed(handles), reversed(labels), loc='center left', bbox_to_anchor=(1, 0.5), fontsize=8)
                    plt.show()
                    New EDGAR Filers per Top 10 Industry Per Year
                    industry_stats_pivot_new_filers_top_10_pct_change = industry_stats_pivot_new_filers_top_10.pct_change() * 100
                    industry_stats_pivot_new_filers_top_10_pct_change = industry_stats_pivot_new_filers_top_10_pct_change.iloc[1:, :]
                    industry_stats_pivot_new_filers_top_10_pct_change = industry_stats_pivot_new_filers_top_10_pct_change.replace([np.inf, -np.inf], 0)
                    industry_stats_pivot_new_filers_top_10_pct_change = industry_stats_pivot_new_filers_top_10_pct_change.iloc[:, 1:]
                    industry_stats_pivot_new_filers_top_10_pct_change = industry_stats_pivot_new_filers_top_10_pct_change.reindex(sorted(industry_stats_pivot_new_filers_top_10_pct_change.columns), axis=1)
                    industry_stats_pivot_new_filers_top_10_pct_change[-5:]
                    Out:
                    industryAerospace & DefenseAsset ManagementAuto PartsBanks - RegionalBiotechnologyBuilding Products & EquipmentCommunication EquipmentComputer HardwareDiagnostics & ResearchElectrical Equipment & Parts...Semiconductor Equipment & MaterialsSemiconductorsShell CompaniesSoftware - ApplicationSoftware - InfrastructureSpecialty ChemicalsSpecialty Industrial MachinerySpecialty RetailTelecom ServicesUtilities - Regulated Electric
                    year
                    2018-100.00000050.000000150.00000062.50000012.0000000.0000000.000000100.00.0200.000000...-100.00.000000-66.66666730.000000-18.181818-50.0-50.000000-100.000000-100.000000-100.0
                    20190.00000066.666667-60.000000-30.769231-23.214286100.000000200.0000000.0300.0-66.666667...NaN0.000000166.66666746.153846-22.222222-50.0100.0000000.0000000.000000NaN
                    2020200.000000100.000000150.0000000.000000116.27907050.000000-66.666667300.025.0300.000000...NaN200.0000002212.500000136.84210585.7142860.0200.000000225.000000200.000000NaN
                    20210.000000-20.000000-40.00000077.777778-31.182796-66.666667200.000000-87.5-40.0150.000000...0.033.333333125.945946-15.55555669.230769100.0-66.666667-76.92307733.333333NaN
                    2022-77.777778-87.500000-33.333333-75.000000-79.687500-100.000000-66.6666670.0-100.0-100.000000...-100.0-50.000000-93.062201-76.315789-77.272727-100.00.000000-100.000000-100.000000NaN

                    5 rows × 33 columns

                    sns.heatmap(industry_stats_pivot_new_filers_top_10_pct_change, cmap='Blues')
                    plt.gcf().set_size_inches(7, 7), plt.title("Change of New Filers per Industry per Year")
                    plt.ylabel("Year"), plt.xlabel(""), plt.tight_layout(), plt.grid(color='lightgrey', linestyle='dashed', linewidth=0.5)
                    ax = plt.gca()
                    ax.tick_params(axis='x', rotation=50), ax.set_xticklabels(ax.get_xticklabels(), ha='left')
                    plt.xticks(np.arange(len(industry_stats_pivot_new_filers_top_10_pct_change.columns)), industry_stats_pivot_new_filers_top_10_pct_change, rotation=90)
                    plt.yticks(np.arange(0.5, len(industry_stats_pivot_new_filers_top_10_pct_change.loc[1995:, :].index), 1), industry_stats_pivot_new_filers_top_10_pct_change.loc[1995:, :].index, rotation=0)
                    plt.show()
                    Change of New Filers per Industry per Year
                    industry_stats_pivot_new_filers_top_10_corr = industry_stats_pivot_new_filers_top_10_pct_change.corr()
                    industry_stats_pivot_new_filers_top_10_corr = industry_stats_pivot_new_filers_top_10_corr[(industry_stats_pivot_new_filers_top_10_corr > 0.5) | (industry_stats_pivot_new_filers_top_10_corr < -0.5)]
                    # hide all values below the diagonal
                    mask = np.tril(np.ones_like(industry_stats_pivot_new_filers_top_10_corr, dtype=bool))
                    plt.figure(figsize=(12, 10))
                    sns.heatmap(industry_stats_pivot_new_filers_top_10_corr, annot=False, cmap='coolwarm', vmin=-1, vmax=1, mask=mask)
                    plt.title('Correlation Heatmap between Industries'), plt.tight_layout(), plt.gcf().set_size_inches(10, 7)
                    plt.ylabel(''), plt.xlabel('')
                    plt.gca().xaxis.grid(color='lightgrey', linestyle='dashed', linewidth=0.5, which='major', alpha=0.5)
                    plt.gca().yaxis.grid(color='lightgrey', linestyle='dashed', linewidth=0.5, which='major', alpha=0.5)
                    plt.gca().set_axisbelow(True)
                    ax = plt.gca()
                    ax.tick_params(axis='x', rotation=50), ax.set_xticklabels(ax.get_xticklabels(), ha='right')
                    plt.show()
                    Correlation Heatmap between Industries
                    most_correlated_industries = industry_stats_pivot_new_filers_top_10_corr.sum().sort_values(ascending=False).head(10)
                    print('Top 10 industries correlated the most with the other industries')
                    print('-'*70)
                    most_correlated_industries
                    Top 10 industries correlated the most with the other industries
                    ----------------------------------------------------------------------
                    Out:
                    industry
                    Semiconductors 7.680265
                    Biotechnology 7.614659
                    Telecom Services 5.887208
                    Computer Hardware 5.270098
                    Software - Application 4.974338
                    Other Industrial Metals & Mining 4.202393
                    Electronic Components 4.129252
                    Shell Companies 4.060234
                    Specialty Retail 3.819267
                    Specialty Industrial Machinery 3.433053
                    dtype: float64

                    EDGAR Entities by SIC and SEC Office

                    We now turn our attention to two key categorizations: the Standard Industrial Classification (SIC) codes and the designated SEC Offices. While SIC codes offer insights into the primary business activities of these entities, the SEC Offices provide an organizational perspective on how the SEC manages and oversees these entities.

                    SEC Offices Overview: There are 12 distinct SEC Offices responsible for different sectors and industries. These are:

                    • Industrial Applications and Services
                    • Energy & Transportation
                    • Finance
                    • International Corp Fin
                    • Life Sciences
                    • Manufacturing
                    • Real Estate & Construction
                    • Structured Finance
                    • Technology
                    • Trade & Services
                    • Crypto Assets (Note: Some entities fall under both "Finance" and "Crypto Assets")

                    SIC Codes Overview: The SIC system classifies 444 industries using a unique 3 or 4 digit code. For instance, while code 7374 represents Computer Processing & Data Preparation, another code like 2834 denotes Pharmaceutical Preparations.

                    In this section, we will analyze and visualize the distribution of EDGAR entities based on their SIC codes and the SEC Offices they fall under, focusing especially on the trends over the past three decades.

                    office_stats = entities.groupby(['year', 'office'], dropna=False).cik.count().reset_index(name='cik_count')
                    office_stats
                    Out:
                    yearofficecik_count
                    01994Industrial Applications and Services315
                    11994Office of Energy & Transportation561
                    21994Office of Finance514
                    31994Office of Finance or Office of Crypto Assets11
                    41994Office of International Corp Fin1
                    ............
                    3692022Office of Real Estate & Construction1928
                    3702022Office of Structured Finance1115
                    3712022Office of Technology1256
                    3722022Office of Trade & Services1683
                    3732022NaN146558

                    374 rows × 3 columns

                    office_stats_pivot = office_stats.pivot(index='year', columns='office', values='cik_count')
                    office_stats_pivot.fillna(0, inplace=True)
                    office_stats_pivot = office_stats_pivot.astype(int)
                    office_stats_pivot[-5:]
                    Out:
                    officeNaNIndustrial Applications and ServicesOffice of Crypto AssetsOffice of Energy & TransportationOffice of FinanceOffice of Finance or Office of Crypto AssetsOffice of International Corp FinOffice of Life SciencesOffice of ManufacturingOffice of Real Estate & ConstructionOffice of Structured FinanceOffice of TechnologyOffice of Trade & Services
                    year
                    2018121938878107149011686110985613241215109911291452
                    201912249889498141611217417791712781227106511411588
                    20201244838608412901094749897612641485106710921526
                    2021148335948911285111397101111012991916109511981598
                    2022146558942891196107912391117312841928111512561683
                    sns.heatmap(office_stats_pivot.drop(columns=[np.nan]), cmap='Blues')
                    plt.gcf().set_size_inches(5, 7)
                    plt.title("EDGAR Filers per SEC Office per Year"), plt.ylabel("Year"), plt.xlabel("SEC Office")
                    plt.tight_layout(), plt.gca().yaxis.grid(color='lightgrey', linestyle='dashed', linewidth=0.5)
                    ax = plt.gca()
                    ax.tick_params(axis='x', rotation=50), ax.set_xticklabels(ax.get_xticklabels(), ha='right')
                    plt.show()
                    EDGAR Filers per SEC Office per Year
                    office_stats_pivot.drop(columns=[np.nan]).plot(kind='line', figsize=(7, 5))
                    plt.title("EDGAR Filers per SEC Office per Year")
                    plt.ylabel("Filers"), plt.xlabel("Year"), plt.tight_layout(), plt.grid(True), plt.gca().set_axisbelow(True)
                    plt.xticks(sector_stats_pivot.index, rotation=90)
                    ax = plt.gca()
                    ax.yaxis.set_major_formatter(mtick.FuncFormatter(lambda x, p: format(int(x), ',')))
                    handles, labels = ax.get_legend_handles_labels()
                    ax.legend(reversed(handles), reversed(labels), loc='lower left', bbox_to_anchor=(1, 0.5), fontsize=8)
                    plt.show()
                    EDGAR Filers per SEC Office per Year - Line Plot
                    office_stats_pivot.drop(columns=[np.nan]).plot(kind='bar', stacked=True, figsize=(5, 4))
                    plt.title("CIK Count per Office Over the Years"), plt.ylabel("CIK Count"), plt.xlabel("Year")
                    plt.tight_layout()
                    ax = plt.gca()
                    ax.yaxis.set_major_formatter(mtick.FuncFormatter(lambda x, p: format(int(x), ',')))
                    handles, labels = ax.get_legend_handles_labels()
                    ax.legend(reversed(handles), reversed(labels), loc='center left', bbox_to_anchor=(1, 0.5), fontsize=8)
                    plt.show()
                    CIK Count per Office Over the Years
                    office_stats_pivot.plot(kind='bar', stacked=True, figsize=(6, 5))
                    plt.title("CIK Count per Office Over the Years"), plt.ylabel("CIK Count"), plt.xlabel("Year")
                    plt.tight_layout()
                    ax = plt.gca()
                    ax.yaxis.set_major_formatter(mtick.FuncFormatter(lambda x, p: format(int(x), ',')))
                    handles, labels = ax.get_legend_handles_labels()
                    ax.legend(reversed(handles), reversed(labels), loc='center left', bbox_to_anchor=(1, 0.5), fontsize=8)
                    plt.show()
                    CIK Count per Office Over the Years
                    office_stats_pivot.loc[1996:2022].drop(columns=[np.nan]).plot(kind='box', figsize=(7, 5))
                    plt.title("Entities per SEC Office per Year (1996 - 2022)"), plt.ylabel("Entities"), plt.xlabel("SEC Office"), plt.tight_layout()
                    ax = plt.gca()
                    ax.tick_params(axis='x', rotation=50), ax.set_xticklabels(ax.get_xticklabels(), ha='right')
                    ax.yaxis.set_major_formatter(mtick.FuncFormatter(lambda x, p: format(int(x), ',')))
                    plt.gca().xaxis.grid(color='lightgrey', linestyle='dashed', linewidth=0.5, which='major', alpha=0.5)
                    plt.show()
                    Entities per SEC Office per Year - Box Plot (1996 - 2022)

                    In analyzing the EDGAR filers per SEC Office over the years, several interesting trends emerge. First and foremost, it's evident that the majority of filers is not associated with a SIC code and therefore not associated with an SEC Office. Historically, in 1994, 70% of all filers were not associated with a SIC code. By 2022, this number has surged to 92%. Such a large proportion of filers not being associated with an SEC Office raises the question of who is responsible for overseeing these filers and their filings at the SEC.

                    To provide context, persons reporting security acquisitions/disposals (Form 3, 4, 5) are not associated with a SIC code. However, they are still mandated to file with the SEC and are subsequently included as an entity in the EDGAR filer data. Given that Form 3/4/5 filings account for around 25% of all filings filed annually, this might shed light on the high number of filers not associated with a SIC code. Yet, it only accounts for a fraction of the 92% of filers without a SIC code.

                    Delving deeper into specific SEC Offices, the Office of Life Sciences has shown consistent growth, while offices like the Office of Energy & Transportation, Office of Finance, and Office of Technology have seen a mostly consistent decline. The Office of Structured Finance, in particular, peaked around 2005 before witnessing a sharp decline and then stabilizing after 2010. On the other hand, the Office of Crypto Assets, starting from a modest base, has displayed growth, underscoring the rising interest or activity in crypto assets over the years. The dynamics of these trends reflect shifts in sectors or focus areas over the years and can be crucial for understanding the evolving landscape of financial filings.

                    sic_stats = entities.groupby(['year', 'sic'], dropna=False).cik.count().reset_index(name='cik_count')
                    sic_stats_pivot = sic_stats.pivot(index='year', columns='sic', values='cik_count')
                    sic_stats_pivot.fillna(0, inplace=True)
                    sic_stats_pivot = sic_stats_pivot.loc[:, sic_stats_pivot.mean() >= 50]
                    sic_stats_pivot = sic_stats_pivot.astype(int)
                    sic_stats_pivot[-5:]
                    Out:
                    sicNaN00001000 Metal Mining1040 Gold and Silver Ores1311 Crude Petroleum &amp; Natural Gas1381 Drilling Oil &amp; Gas Wells1382 Oil &amp; Gas Field Exploration Services1400 Mining &amp; Quarrying of Nonmetallic Minerals (No Fuels)1531 Operative Builders2834 Pharmaceutical Preparations...7372 Services-Prepackaged Software7373 Services-Computer Integrated Systems Design7374 Services-Computer Processing &amp; Data Preparation7389 Services-Business Services, NEC7990 Services-Miscellaneous Amusement &amp; Recreation8200 Services-Educational Services8731 Services-Commercial Physical &amp; Biological Research8742 Services-Management Consulting Services8880 American Depositary Receipts9995 Non-Operating Establishments
                    year
                    20181218548314311631840404755658...3075912519433493979689
                    20191224296913413328540344946693...333531312073857377513612
                    20201244255811111324331254355715...32346140194476134735616
                    20211482795610712024334343950797...37548167212447236695618
                    20221464896910211919726253728835...4055317621948763276468

                    5 rows × 56 columns

                    top_20_sics_2022 = sic_stats[sic_stats['year'] == 2022].sort_values(by='cik_count', ascending=False)[:10]
                    top_20_sics_2022
                    Out:
                    yearsiccik_count
                    126242022NaN146489
                    1253620226189 Asset-Backed Securities1115
                    1255720226770 Blank Checks1098
                    1228720222834 Pharmaceutical Preparations835
                    1257820227372 Services-Prepackaged Software405
                    1256120226798 Real Estate Investment Trusts390
                    1252520226022 State Commercial Banks253
                    1252320225990 Retail-Retail Stores, NEC248
                    1228920222836 Biological Products, (No Diagnostic Subst...240
                    1258420227389 Services-Business Services, NEC219
                    sic_stats_wo_nan = sic_stats_pivot.drop(columns=[np.nan, '0000'])
                    sns.heatmap(sic_stats_wo_nan, cmap='Blues')
                    plt.gcf().set_size_inches(13, 5)
                    plt.title("CIK Count per SIC Over the Years"), plt.ylabel("Year"), plt.xlabel("SIC")
                    plt.gca().yaxis.grid(color='lightgrey', linestyle='dashed', linewidth=0.5)
                    ax = plt.gca()
                    ax.set_xticklabels(ax.get_xticklabels(), ha='left')
                    plt.xticks(np.arange(len(sic_stats_wo_nan.columns)), sic_stats_wo_nan, rotation=90)
                    plt.show()
                    CIK Count per SIC Over the Years - Heatmap
                    sns.heatmap(sic_stats_wo_nan.loc[2009:2022], cmap='Blues')
                    plt.gcf().set_size_inches(13, 4)
                    plt.title("CIK Count per SIC Over the Years"), plt.ylabel("Year"), plt.xlabel("SIC")
                    plt.gca().yaxis.grid(color='lightgrey', linestyle='dashed', linewidth=0.5)
                    ax = plt.gca()
                    ax.set_xticklabels(ax.get_xticklabels(), ha='left')
                    plt.xticks(np.arange(len(sic_stats_wo_nan.columns)), sic_stats_wo_nan, rotation=90)
                    plt.show()
                    CIK Count per SIC Over the Years (2009 - 2022) - Heatmap

                    Footer

                    Products

                    • EDGAR Filing Search API
                    • Full-Text Search API
                    • Real-Time Filing Stream API
                    • Filing Download & PDF Generator API
                    • XBRL-to-JSON Converter
                    • 10-K/10-Q/8-K Item Extractor
                    • Investment Adviser & Form ADV API
                    • Insider Trading Data - Form 3, 4, 5
                    • Restricted Sales Notifications - Form 144
                    • Institutional Holdings - Form 13F
                    • Form N-PORT API - Investment Company Holdings
                    • Form N-PX API - Proxy Voting Records
                    • Form 13D/13G API
                    • Form S-1/424B4 - IPOs, Debt & Rights Offerings
                    • Form C - Crowdfunding Offerings
                    • Form D - Private Placements & Exempt Offerings
                    • Regulation A Offering Statements API
                    • Changes in Auditors & Accountants
                    • Non-Reliance on Prior Financial Statements
                    • Executive Compensation Data API
                    • Directors & Board Members Data
                    • Company Subsidiaries Database
                    • Outstanding Shares & Public Float
                    • SEC Enforcement Actions
                    • Accounting & Auditing Enforcement Releases (AAERs)
                    • SRO Filings
                    • CIK, CUSIP, Ticker Mapping

                    General

                    • Pricing
                    • Features
                    • Supported Filings
                    • EDGAR Filing Statistics

                    Account

                    • Sign Up - Start Free Trial
                    • Log In
                    • Forgot Password

                    Developers

                    • API Sandbox
                    • Documentation
                    • Resources & Tutorials
                    • Python API SDK
                    • Node.js API SDK

                    Legal

                    • Terms of Service
                    • Privacy Policy

                    Legal

                    • Terms of Service
                    • Privacy Policy

                    SEC API

                    © 2025 sec-api.io by Data2Value GmbH. All rights reserved.

                    SEC® and EDGAR® are registered trademarks of the U.S. Securities and Exchange Commission (SEC).

                    EDGAR is the Electronic Data Gathering, Analysis, and Retrieval system operated by the SEC.

                    sec-api.io and Data2Value GmbH are independent of, and not affiliated with, sponsored by, or endorsed by the U.S. Securities and Exchange Commission.

                    sec-api.io is classified under SIC code 7375 (Information Retrieval Services), providing on-demand access to structured data and online information services.