sec-api.ioSEC API by D2V
FilingsPricingSandboxDocs
Log inGet Free API Key
  1. Home
  2. Tutorials

Extract Financial Statements from SEC Filings and XBRL Data with Python

Open In Colab   Download Notebook

On this page:

  • Overview of XBRL and the Converter API
    • XBRL Files of Apple's 10-K
    • Getting Started
      • Extract Income Statement from XBRL Data in 10-K Filing
        • Export Income Statement from SEC 10-K Filing to Excel
          • Extract and Merge Financial Statements from Multiple 10-K Filings
            • Extract and Visualize Revenue by Product Category per Year from 10-Ks
              • Extract and Visualize Revenue by Region per Year from 10-Ks

                Welcome to this Python tutorial on how to extract financial data from XBRL in SEC filings using Python. You will learn how to convert XBRL data into a pandas dataframe, extract income statements and balance sheets from 10-K filings, and build financial tables from EDGAR XBRL files. We will also explore how to extract and visualize revenue metrics by product categories and geographical regions.

                You don't need to have any prior understanding of XBRL to follow and complete this tutorial successfully.

                The 10-K filings of Apple serve as an excellent example throughout the tutorial. To retrieve the XBRL data of an SEC filing by its URL, we use the .xbrl_to_json(filing_url) function. The XBRL data is already converted to JSON format, and the financial statements are standardized for ease of use.

                It's important to note that XBRL data often includes non-standard names for financial statements such as ConsolidatedStatementsofOperations and ConsolidatedStatementsOfLossIncome. To solve this issue, the API automatically converts non-standard names for all financial statements, for example, income statements to StatementsOfIncome.

                All US GAAP item names are standardized as well. For instance, us-gaap_SalesRevenueGoodsNet is renamed to SalesRevenueGoodsNet by removing the "us-gaap_" prefix. It's crucial to bear in mind that GAAP items are not renamed to their parent name. As an example, us-gaap_SalesRevenueGoodsNet is not renamed to Revenue. Though, the original GAAP item name is always maintained.

                Overview of XBRL and the Converter API

                Feel free to skip to the next section if you already have a good understanding of XBRL.

                XBRL (eXtensible Business Reporting Language) is a standardized format for exchanging business and financial information between organizations, such as the US SEC and publicly listed companies. The majority of unstructured data in commonly known filings like 10-Ks, is nowadays available in a structured, but highly complex format, as XBRL.

                Before the introduction of XBRL, extracting financial statements like income statements from filings involved scraping and heavily post-processing the data to structure it. With XBRL, we can now access information such as revenue, net profit, etc. without scraping the HTML filing. However, using raw XBRL data can be challenging due to its complexity. This is where our XBRL-to-JSON Converter API comes in, simplifying the process of parsing XBRL data.

                The SEC requires certain filings to include structured XBRL data. These filings include 10-K, 10-Q, 8-K, 20-F, S-1 and many others. A comprehensive list of filings that contain XBRL data can be found here.

                XBRL Files of Apple's 10-K

                Let's take one of Apple's 10-K filings as an example:

                xbrl-files-example

                As we can see from the screenshot of the EDGAR filing details page above, the XBRL data is divided across six different XML files:

                • Schema document (aapl-20200926.xsd)
                • Calculation linkbase doc (aapl-20200926_cal.xml)
                • Definition linkbase doc (_def.xml)
                • Label linkbase doc (_lab.xml)
                • Presentation linkbase doc (_pre.xml)
                • Instance doc (_htm.xml)

                These files contain the financial statements and financial metrics such as revenue, earnings per share, net profit, and various text and notes to financial statements, along with information about how the data should be represented, for example in what order keys such as revenue, cost of goods sold, should be displayed in an income statement.

                The XBRL-to-JSON Converter API takes the URL of Apple's 10-K filing as input, merges all those XBRL XML files into a single data object, converts the merged content into JSON data, standardizes all element labels and returns the data in JSON format. You don't have to take care of any part of that process anymore. You don't have to worry about merging and combining individual XBRL files or learn how to find the context of XBRL instance elements. All this heavy lifting is done by the XBRL-to-JSON Converter API.

                In summary, you simply provide the URL of the 10-K filing and you receive a completely structured SEC filing in JSON format as a result that can simply be imported into a pandas dataframe.

                intro

                Getting Started

                We start with installing the sec-api Python package an instantiating the XbrlApi() providing us with a simply way to access the XBRL-to-JSON converter.

                API_KEY = 'YOUR_API_KEY'
                !pip install -q sec-api
                from sec_api import XbrlApi

                xbrlApi = XbrlApi(API_KEY)

                Extract Income Statement from XBRL Data in 10-K Filing

                We show you how to extract the income statement of Apple's 10-K filing by using the .xbrl_to_json(htm_url) function and providing the URL of the filing as the input parameter. The function extracts and standardizes all XBRL XML and XSD files attached to the filing into a JSON format, making it easy to locate and analyze specific financial data. The function also returns all other financial statements from the 10-K filing, such as the balance sheet and cash flow statement, as well as all disclosure notes.

                In 10-K filings, income statements are disclosed in item 8 "Financial Statements and Supplementary Data". Using the .xbrl_to_json(htm_url) function, we don't have to scrape the filing section and convert the HTML table into a dataframe, and instead can simply access the statement with aapl_xbrl_json['StatementsOfIncome']. Following is an overview of the mapping between the financial statements and the keys of their XBRL counterparts.

                Financial StatementXBRL-JSON Counterpart
                Income statementxbrl_json['StatementsOfIncome']
                Balance sheetxbrl_json['BalanceSheets']
                Cash flow statementxbrl_json['StatementsOfCashFlows']

                If you want to locate the URLs of Apple's or other companies' 10-K filings, you can use our Query API to do so.

                # URL of Apple's 10-K filings
                url_10k_aapl = 'https://www.sec.gov/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htm'

                aapl_xbrl_json = xbrlApi.xbrl_to_json(htm_url=url_10k_aapl)
                print("Keys of income statement dictionary in XBRL from Apple's 10-K filing")
                print('--------------------------------------------------------------------')
                print(*list(aapl_xbrl_json['StatementsOfIncome'].keys()), sep='
                '
                )
                Keys of income statement dictionary in XBRL from Apple's 10-K filing
                --------------------------------------------------------------------
                RevenueFromContractWithCustomerExcludingAssessedTax
                CostOfGoodsAndServicesSold
                GrossProfit
                ResearchAndDevelopmentExpense
                SellingGeneralAndAdministrativeExpense
                OperatingExpenses
                OperatingIncomeLoss
                NonoperatingIncomeExpense
                IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest
                IncomeTaxExpenseBenefit
                NetIncomeLoss
                EarningsPerShareBasic
                EarningsPerShareDiluted
                WeightedAverageNumberOfSharesOutstandingBasic
                WeightedAverageNumberOfDilutedSharesOutstanding

                Next, we build a function get_income_statement(xbrl_json) to convert the income statement inside the XBRL-JSON data to a pandas dataframe while ignoring financial metrics related to specific dimensions, such as revenue generated by product category or geographical region.

                import pandas as pd 

                # convert XBRL-JSON of income statement to pandas dataframe
                def get_income_statement(xbrl_json):
                    income_statement_store = {}

                    # iterate over each US GAAP item in the income statement
                    for usGaapItem in xbrl_json['StatementsOfIncome']:
                        values = []
                        indicies = []

                        for fact in xbrl_json['StatementsOfIncome'][usGaapItem]:
                            # only consider items without segment. not required for our analysis.
                            if 'segment' not in fact:
                                index = fact['period']['startDate'] + '-' + fact['period']['endDate']
                                # ensure no index duplicates are created
                                if index not in indicies:
                                    values.append(fact['value'])
                                    indicies.append(index)

                        income_statement_store[usGaapItem] = pd.Series(values, index=indicies)

                    income_statement = pd.DataFrame(income_statement_store)
                    # switch columns and rows so that US GAAP items are rows and each column header represents a date range
                    return income_statement.T


                income_statement_aapl = get_income_statement(aapl_xbrl_json)


                print("Income statement of Apple's 2022 10-K filing as dataframe")
                print('---------------------------------------------------------')
                income_statement_aapl
                Income statement of Apple's 2022 10-K filing as dataframe
                ---------------------------------------------------------
                Out:
                2021-09-26-2022-09-242020-09-27-2021-09-252019-09-29-2020-09-26
                RevenueFromContractWithCustomerExcludingAssessedTax394328000000365817000000274515000000
                CostOfGoodsAndServicesSold223546000000212981000000169559000000
                GrossProfit170782000000152836000000104956000000
                ResearchAndDevelopmentExpense262510000002191400000018752000000
                SellingGeneralAndAdministrativeExpense250940000002197300000019916000000
                OperatingExpenses513450000004388700000038668000000
                OperatingIncomeLoss11943700000010894900000066288000000
                NonoperatingIncomeExpense-334000000258000000803000000
                IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest11910300000010920700000067091000000
                IncomeTaxExpenseBenefit19300000000145270000009680000000
                NetIncomeLoss998030000009468000000057411000000
                EarningsPerShareBasic6.155.673.31
                EarningsPerShareDiluted6.115.613.28
                WeightedAverageNumberOfSharesOutstandingBasic162159630001670127200017352119000
                WeightedAverageNumberOfDilutedSharesOutstanding163258190001686491900017528214000

                The following screenshot displays the same consolidated statements of operations, also known as the income statement, from the 10-K filing of Apple in "Item 8 - Financial Statements and Supplementary Data".

                apple-10-k-income-statement

                To provide further context, we included a table that maps the financial metric names such as "net sales" and "cost of sales" from the human-readable 10-K to its corresponding XBRL counterparts accessible in our income_statement_aapl dataframe. The 10-K filing can be accessed through this link.

                Human-Readable 10-K FilingCounterpart in XBRL
                Total net salesRevenueFromContractWithCustomerExcludingAssessedTax
                Total cost of salesCostOfGoodsAndServicesSold
                Gross marginGrossProfit
                Research and developmentResearchAndDevelopmentExpense
                Selling, general and administrativeSellingGeneralAndAdministrativeExpense
                Total operating expensesOperatingExpenses
                Operating incomeOperatingIncomeLoss
                Other income/(expense), netNonoperatingIncomeExpense
                Income before provision for income taxesIncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest
                Provision for income taxesIncomeTaxExpenseBenefit
                Net incomeNetIncomeLoss
                Earnings per share: BasicEarningsPerShareBasic
                Earnings per share: DilutedEarningsPerShareDiluted
                Shares used in computing earnings per share: BasicWeightedAverageNumberOfSharesOutstandingBasic
                Shares used in computing earnings per share: DilutedWeightedAverageNumberOfDilutedSharesOutstanding

                The income statement with the same values but with different financial metric names can also be found on the EDGAR interactive data page, which is generated from the same XBRL data. You can access it by visiting this link.

                apple-interactive-data-income-statement

                Export Income Statement from SEC 10-K Filing to Excel

                Extracting and exporting financial statements from SEC filings, such as 10-K, 10-Q, S-1, F-1, or 20-F, to Excel can be done with just a few lines of Python code. The .xbrl_to_json(htm_url) function takes the URL of the filing as input and returns the filing's XBRL data in JSON format. After extracting the financial statement, such as the income statement, with the get_income_statement(xbrl_json) function, the resulting pandas dataframe can be easily exported to Excel using the .to_excel(filename) method.

                url_10k_aapl = 'https://www.sec.gov/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htm'

                xbrl_json_aapl = xbrlApi.xbrl_to_json(htm_url=url_10k_aapl)

                income_statement_aapl = get_income_statement(xbrl_json_aapl)

                income_statement_aapl.to_excel('income-statement-aapl.xlsx')

                The resulting exported income statement in Excel looks like this:

                exported-income-statement-as-excel

                Extract and Merge Financial Statements from Multiple 10-K Filings

                So far we only looked at extracting data from a single 10-K filing. Let's take a step further and explore how to extract data from multiple 10-K filings filed in recent years and merge the data for further analysis.

                We specifically look at Apple's 10-Ks filed annually between 2019 and 2022. Firstly, we provide the URLs of the filings to the xbrl_to_json() converter function, and then convert the result into separate income statements for each 10-K, where each statement is represented by a dataframe. Finally, we merge all the statements into a single dataframe and remove all the metrics that are duplicated across multiple filings. For instance, the 10-K filed in 2022 has revenue data for the timeframe from 2020 to 2021, while the 10-K filed in 2021 includes the same data for this period.

                # Apple's 10Ks of the last 4 years, 2019 to 2022
                url_10k_2019 = 'https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/a10-k20199282019.htm'
                url_10k_2020 = 'https://www.sec.gov/Archives/edgar/data/320193/000032019320000096/aapl-20200926.htm'
                url_10k_2021 = 'https://www.sec.gov/Archives/edgar/data/320193/000032019321000105/aapl-20210925.htm'
                url_10k_2022 = 'https://www.sec.gov/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htm'

                xbrl_json_2019 = xbrlApi.xbrl_to_json(htm_url=url_10k_2019)
                xbrl_json_2020 = xbrlApi.xbrl_to_json(htm_url=url_10k_2020)
                xbrl_json_2021 = xbrlApi.xbrl_to_json(htm_url=url_10k_2021)
                xbrl_json_2022 = xbrlApi.xbrl_to_json(htm_url=url_10k_2022)
                income_statement_2019 = get_income_statement(xbrl_json_2019)
                income_statement_2020 = get_income_statement(xbrl_json_2020)
                income_statement_2021 = get_income_statement(xbrl_json_2021)
                income_statement_2022 = get_income_statement(xbrl_json_2022)
                income_statements_merged = pd.concat([income_statement_2019, income_statement_2020, income_statement_2021, income_statement_2022], axis=0, sort=False)

                # sort & reset the index of the merged dataframe
                income_statements_merged = income_statements_merged.sort_index().reset_index()

                # convert cells to float
                income_statements_merged = income_statements_merged.applymap(lambda x: pd.to_numeric(x, errors='ignore'))


                print("Merged, uncleaned financials of all income statements")
                print('-----------------------------------------------------')
                income_statements_merged.head(10)
                Merged, uncleaned financials of all income statements
                -----------------------------------------------------
                Out:
                index2016-09-25-2017-09-302017-10-01-2017-12-302017-10-01-2018-09-292017-12-31-2018-03-312018-04-01-2018-06-302018-07-01-2018-09-292018-09-30-2018-12-292018-09-30-2019-09-282018-12-30-2019-03-302019-03-31-2019-06-292019-06-30-2019-09-282019-09-29-2019-12-282019-09-29-2020-09-262019-12-29-2020-03-282020-03-29-2020-06-272020-06-28-2020-09-262020-09-27-2021-09-252021-09-26-2022-09-24
                0CostOfGoodsAndServicesSold1.410480e+11NaN1.637560e+11NaNNaNNaNNaN1.617820e+11NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
                1CostOfGoodsAndServicesSoldNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN1.695590e+11NaNNaNNaN2.129810e+112.235460e+11
                2CostOfGoodsAndServicesSoldNaNNaNNaNNaNNaNNaNNaN1.617820e+11NaNNaNNaNNaN1.695590e+11NaNNaNNaN2.129810e+11NaN
                3CostOfGoodsAndServicesSoldNaNNaN1.637560e+11NaNNaNNaNNaN1.617820e+11NaNNaNNaNNaN1.695590e+11NaNNaNNaNNaNNaN
                4EarningsPerShareBasicNaNNaN3.000000e+00NaNNaNNaN1.052.990000e+000.620.550.761.263.310000e+000.640.650.74NaNNaN
                5EarningsPerShareBasicNaNNaNNaNNaNNaNNaNNaN2.990000e+00NaNNaNNaNNaN3.310000e+00NaNNaNNaN5.670000e+00NaN
                6EarningsPerShareBasicNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN3.310000e+00NaNNaNNaN5.670000e+006.150000e+00
                7EarningsPerShareBasic9.270000e+003.921.201000e+012.752.362.944.221.197000e+012.472.203.05NaNNaNNaNNaNNaNNaNNaN
                8EarningsPerShareDiluted9.210000e+003.891.191000e+012.732.342.914.181.189000e+012.462.183.03NaNNaNNaNNaNNaNNaNNaN
                9EarningsPerShareDilutedNaNNaN2.980000e+00NaNNaNNaN1.052.970000e+000.610.550.761.253.280000e+000.640.650.73NaNNaN
                income_statements = income_statements_merged.groupby('index').max()

                # reindex the merged dataframe using the index of the first dataframe
                income_statements = income_statements.reindex(income_statement_2019.index)

                # loop over the columns
                for col in income_statements.columns[1:]:
                    # extract start and end dates from the column label
                    splitted = col.split('-')
                    start = '-'.join(splitted[:3])
                    end = '-'.join(splitted[3:])

                    # convert start and end dates to datetime objects
                    start_date = pd.to_datetime(start)
                    end_date = pd.to_datetime(end)

                    # calculate the duration between start and end dates
                    duration = (end_date - start_date).days / 360

                    # drop the column if duration is less than a year
                    if duration < 1:
                        income_statements.drop(columns=[col], inplace=True)

                # convert datatype of cells to readable format, e.g. "2.235460e+11" becomes "223546000000"
                income_statements = income_statements.apply(lambda row: pd.to_numeric(row, errors='coerce', downcast='integer').astype(str), axis=1)


                print("Income statements from Apple's 10-K filings (2016 to 2022) as dataframe")
                print('------------------------------------------------------------------------')
                income_statements
                Income statements from Apple's 10-K filings (2016 to 2022) as dataframe
                ------------------------------------------------------------------------
                Out:
                2016-09-25-2017-09-302017-10-01-2018-09-292018-09-30-2019-09-282019-09-29-2020-09-262020-09-27-2021-09-252021-09-26-2022-09-24
                RevenueFromContractWithCustomerExcludingAssessedTax229234000000265595000000260174000000274515000000365817000000394328000000
                CostOfGoodsAndServicesSold141048000000163756000000161782000000169559000000212981000000223546000000
                GrossProfit8818600000010183900000098392000000104956000000152836000000170782000000
                ResearchAndDevelopmentExpense115810000001423600000016217000000187520000002191400000026251000000
                SellingGeneralAndAdministrativeExpense152610000001670500000018245000000199160000002197300000025094000000
                OperatingExpenses268420000003094100000034462000000386680000004388700000051345000000
                OperatingIncomeLoss61344000000708980000006393000000066288000000108949000000119437000000
                NonoperatingIncomeExpense274500000020050000001807000000803000000258000000-334000000
                IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest64089000000729030000006573700000067091000000109207000000119103000000
                IncomeTaxExpenseBenefit15738000000133720000001048100000096800000001452700000019300000000
                NetIncomeLoss483510000005953100000055256000000574110000009468000000099803000000
                EarningsPerShareBasic9.2712.0111.973.315.676.15
                EarningsPerShareDiluted9.2111.9111.893.285.616.11
                WeightedAverageNumberOfSharesOutstandingBasic52172420001982151000018471336000173521190001670127200016215963000
                WeightedAverageNumberOfDilutedSharesOutstanding52516920002000043500018595651000175282140001686491900016325819000

                Extract and Visualize Revenue by Product Category per Year from 10-Ks

                In this section, we explore how Apple's revenue is distributed across its major products, such as iPhone, Mac, iPad, and wearables. Additionally, we analyze the geographic distribution of revenue to determine the areas from which Apple generates its income.

                Both the product and geographic revenue data are also included in the XBRL data we previously extracted from the 10-K filings. This data can be extracted and converted into a pandas dataframe for further analysis and visualization. By doing this, we can get insights into Apple's revenue streams and the markets where Apple generates the most income.

                We begin by merging all the revenue objects from ['StatementsOfIncome']['RevenueFromContractWithCustomerExcludingAssessedTax'] in each 10-K filing into a list, which we then convert into a dataframe. We remove any duplicate values and then filter the revenue objects by focusing only on those that are relevant to Apple's products, such as iPhone, iPad, Mac, and wearables.

                all_revenues_json = xbrl_json_2019['StatementsOfIncome']['RevenueFromContractWithCustomerExcludingAssessedTax'] + \
                                    xbrl_json_2020['StatementsOfIncome']['RevenueFromContractWithCustomerExcludingAssessedTax'] + \
                                    xbrl_json_2021['StatementsOfIncome']['RevenueFromContractWithCustomerExcludingAssessedTax'] + \
                                    xbrl_json_2022['StatementsOfIncome']['RevenueFromContractWithCustomerExcludingAssessedTax']

                all_revenues = pd.json_normalize(all_revenues_json)

                all_revenues.drop_duplicates(inplace=True)
                # convert the 'value' column to a numeric type
                all_revenues['value'] = all_revenues['value'].astype(int)

                mask_iphone = all_revenues['segment.value'] == 'aapl:IPhoneMember'
                mask_ipad = all_revenues['segment.value'] == 'aapl:IPadMember'
                mask_mac = all_revenues['segment.value'] == 'aapl:MacMember'
                mask_wearables = all_revenues['segment.value'] == 'aapl:WearablesHomeandAccessoriesMember'

                revenue_product = all_revenues[(mask_iphone | mask_ipad | mask_mac | mask_wearables)]

                # pivot the dataframe to create a new dataframe with period.endDate as the index,
                # segment.value as the columns, and value as the values
                revenue_product_pivot = revenue_product.pivot(index='period.endDate', columns='segment.value', values='value')


                print("Apple's revenues by product from 2017 to 2022")
                print('---------------------------------------------')
                revenue_product_pivot
                Apple's revenues by product from 2017 to 2022
                ---------------------------------------------
                Out:
                segment.valueaapl:IPadMemberaapl:IPhoneMemberaapl:MacMemberaapl:WearablesHomeandAccessoriesMember
                period.endDate
                2017-09-30188020000001393370000002556900000012826000000
                2018-09-29183800000001648880000002519800000017381000000
                2019-09-28212800000001423810000002574000000024482000000
                2020-09-26237240000001377810000002862200000030620000000
                2021-09-25318620000001919730000003519000000038367000000
                2022-09-24292920000002054890000004017700000041241000000

                The next step involves plotting the pivot table as a stacked bar chart, representing the revenue distribution of Apple's major products over time.

                import matplotlib.pyplot as plt
                import matplotlib.ticker as ticker

                # plot the histogram bar chart
                ax = revenue_product_pivot.plot(kind='bar', stacked=True, figsize=(8, 6))

                # rotate the x-axis labels by 0 degrees
                plt.xticks(rotation=0)

                # set the title and labels for the chart
                ax.set_title("Apple's Revenue by Product Category", fontsize=16, fontweight='bold')
                ax.set_xlabel('Period End Date', fontsize=12)
                ax.set_ylabel('Revenue (USD)', fontsize=12)

                # set the legend properties
                ax.legend(title='Product Category', loc='upper left', fontsize='small', title_fontsize=10)

                # add gridlines
                ax.grid(axis='y', linestyle='--', alpha=0.3)

                # remove the top and right spines
                ax.spines['top'].set_visible(False)
                ax.spines['right'].set_visible(False)

                # format y-axis ticks to show values in millions in dollars
                formatter = ticker.FuncFormatter(lambda x, pos: '$%1.0fB' % (x*1e-9))
                plt.gca().yaxis.set_major_formatter(formatter)

                # map the original labels to new labels
                label_map = {
                    'aapl:IPhoneMember': 'iPhone',
                    'aapl:MacMember': 'Mac',
                    'aapl:IPadMember': 'iPad',
                    'aapl:WearablesHomeandAccessoriesMember': 'Wearables & Home'
                }

                # create a list of new labels based on the original labels
                new_labels = [label_map[label] for label in sorted(revenue_product['segment.value'].unique())]
                handles, _ = ax.get_legend_handles_labels()
                plt.legend(handles=handles[::-1], labels=new_labels[::-1])

                # add the values in billions of dollars to each part of the bar
                for p in ax.containers:
                    ax.bar_label(p, labels=['%.1f' % (v/1e9) for v in p.datavalues],
                                 label_type='center', fontsize=8)

                plt.show()

                revenue-per-product-category

                Extract and Visualize Revenue by Region per Year from 10-Ks

                The process for extracting and visualizing revenue per geographical region is similar to the previous example. However, instead of filtering for product-related revenue metrics, we filter the XBRL data for revenue metrics related to different regions, namely America, Europe, China, Japan, and the rest of Asia.

                mask_america = all_revenues['segment.value'] == 'aapl:AmericasSegmentMember'
                mask_europe = all_revenues['segment.value'] == 'aapl:EuropeSegmentMember'
                mask_china = all_revenues['segment.value'] == 'aapl:GreaterChinaSegmentMember'
                mask_japan = all_revenues['segment.value'] == 'aapl:JapanSegmentMember'
                mask_asia_rest = all_revenues['segment.value'] == 'aapl:RestOfAsiaPacificSegmentMember'

                revenue_geo = all_revenues[(mask_america | mask_europe | mask_china | mask_japan | mask_asia_rest)]

                # pivot the dataframe to create a new dataframe with period.endDate as the index, segment.value as the columns, and value as the values
                revenue_geo_pivot = revenue_geo.pivot(index='period.endDate', columns='segment.value', values='value')


                print("Apple's revenues by region from 2017 to 2022")
                print('---------------------------------------------')
                revenue_geo_pivot
                Apple's revenues by region from 2017 to 2022
                ---------------------------------------------
                Out:
                segment.valueaapl:AmericasSegmentMemberaapl:EuropeSegmentMemberaapl:GreaterChinaSegmentMemberaapl:JapanSegmentMemberaapl:RestOfAsiaPacificSegmentMember
                period.endDate
                2017-09-309660000000054938000000447640000001773300000015199000000
                2018-09-2911209300000062420000000519420000002173300000017407000000
                2019-09-2811691400000060288000000436780000002150600000017788000000
                2020-09-2612455600000068640000000403080000002141800000019593000000
                2021-09-2515330600000089307000000683660000002848200000026356000000
                2022-09-2416965800000095118000000742000000002597700000029375000000
                # plot the histogram bar chart
                ax = revenue_geo_pivot.plot(kind='bar', stacked=True, figsize=(8, 6))

                # rotate the x-axis labels by 0 degrees
                plt.xticks(rotation=0)

                # set the title and labels for the chart
                ax.set_title("Apple's Revenue by Region", fontsize=16, fontweight='bold')
                ax.set_xlabel('Period End Date', fontsize=12)
                ax.set_ylabel('Revenue (USD)', fontsize=12)

                # set the legend properties
                ax.legend(title='Region', loc='upper left', fontsize='small', title_fontsize=10)

                # add gridlines
                ax.grid(axis='y', linestyle='--', alpha=0.3)

                # remove the top and right spines
                ax.spines['top'].set_visible(False)
                ax.spines['right'].set_visible(False)

                # format y-axis ticks to show values in millions in dollars
                formatter = ticker.FuncFormatter(lambda x, pos: '$%1.0fB' % (x*1e-9))
                plt.gca().yaxis.set_major_formatter(formatter)

                # map the original labels to new labels
                label_map = {
                    'aapl:AmericasSegmentMember': 'America',
                    'aapl:EuropeSegmentMember': 'Europe',
                    'aapl:GreaterChinaSegmentMember': 'China',
                    'aapl:JapanSegmentMember': 'Japan',
                    'aapl:RestOfAsiaPacificSegmentMember': 'Asia Rest'
                }

                # create a list of new labels based on the original labels
                new_labels = [label_map[label] for label in sorted(revenue_geo['segment.value'].unique())]
                handles, _ = ax.get_legend_handles_labels()
                plt.legend(handles=handles[::-1], labels=new_labels[::-1])

                # add the values in billions of dollars to each part of the bar
                for p in ax.containers:
                    ax.bar_label(p, labels=['%.1f' % (v/1e9) for v in p.datavalues],
                                 label_type='center', fontsize=8)

                plt.show()

                revenue-by-region

                Footer

                Products

                • EDGAR Filing Search API
                • Full-Text Search API
                • Real-Time Filing Stream API
                • Filing Download & PDF Generator API
                • XBRL-to-JSON Converter
                • 10-K/10-Q/8-K Item Extractor
                • Investment Adviser & Form ADV API
                • Insider Trading Data - Form 3, 4, 5
                • Restricted Sales Notifications - Form 144
                • Institutional Holdings - Form 13F
                • Form N-PORT API - Investment Company Holdings
                • Form N-PX API - Proxy Voting Records
                • Form 13D/13G API
                • Form S-1/424B4 - IPOs, Debt & Rights Offerings
                • Form C - Crowdfunding Offerings
                • Form D - Private Placements & Exempt Offerings
                • Regulation A Offering Statements API
                • Changes in Auditors & Accountants
                • Non-Reliance on Prior Financial Statements
                • Executive Compensation Data API
                • Directors & Board Members Data
                • Company Subsidiaries Database
                • Outstanding Shares & Public Float
                • SEC Enforcement Actions
                • Accounting & Auditing Enforcement Releases (AAERs)
                • SRO Filings
                • CIK, CUSIP, Ticker Mapping

                General

                • Pricing
                • Features
                • Supported Filings
                • EDGAR Filing Statistics

                Account

                • Sign Up - Start Free Trial
                • Log In
                • Forgot Password

                Developers

                • API Sandbox
                • Documentation
                • Resources & Tutorials
                • Python API SDK
                • Node.js API SDK

                Legal

                • Terms of Service
                • Privacy Policy

                Legal

                • Terms of Service
                • Privacy Policy

                SEC API

                © 2025 sec-api.io by Data2Value GmbH. All rights reserved.

                SEC® and EDGAR® are registered trademarks of the U.S. Securities and Exchange Commission (SEC).

                EDGAR is the Electronic Data Gathering, Analysis, and Retrieval system operated by the SEC.

                sec-api.io and Data2Value GmbH are independent of, and not affiliated with, sponsored by, or endorsed by the U.S. Securities and Exchange Commission.

                sec-api.io is classified under SIC code 7375 (Information Retrieval Services), providing on-demand access to structured data and online information services.