Extract Financial Statements from SEC Filings and XBRL Data with Python
On this page:
- Overview of XBRL and the Converter API
- Getting Started
- Extract Income Statement from XBRL Data in 10-K Filing
- Export Income Statement from SEC 10-K Filing to Excel
- Extract and Merge Financial Statements from Multiple 10-K Filings
- Extract and Visualize Revenue by Product Category per Year from 10-Ks
- Extract and Visualize Revenue by Region per Year from 10-Ks
Welcome to this Python tutorial on how to extract financial data from XBRL in SEC filings using Python. You will learn how to convert XBRL data into a pandas dataframe, extract income statements and balance sheets from 10-K filings, and build financial tables from EDGAR XBRL files. We will also explore how to extract and visualize revenue metrics by product categories and geographical regions.
You don't need to have any prior understanding of XBRL to follow and complete this tutorial successfully.
The 10-K filings of Apple serve as an excellent example throughout the tutorial. To retrieve the XBRL data of an SEC filing by its URL, we use the .xbrl_to_json(filing_url)
function. The XBRL data is already converted to JSON format, and the financial statements are standardized for ease of use.
It's important to note that XBRL data often includes non-standard names for financial statements such as ConsolidatedStatementsofOperations
and ConsolidatedStatementsOfLossIncome
. To solve this issue, the API automatically converts non-standard names for all financial statements, for example, income statements to StatementsOfIncome
.
All US GAAP item names are standardized as well. For instance, us-gaap_SalesRevenueGoodsNet
is renamed to SalesRevenueGoodsNet
by removing the "us-gaap_" prefix. It's crucial to bear in mind that GAAP items are not renamed to their parent name. As an example, us-gaap_SalesRevenueGoodsNet
is not renamed to Revenue
. Though, the original GAAP item name is always maintained.
Overview of XBRL and the Converter API
Feel free to skip to the next section if you already have a good understanding of XBRL.
XBRL (eXtensible Business Reporting Language) is a standardized format for exchanging business and financial information between organizations, such as the US SEC and publicly listed companies. The majority of unstructured data in commonly known filings like 10-Ks, is nowadays available in a structured, but highly complex format, as XBRL.
Before the introduction of XBRL, extracting financial statements like income statements from filings involved scraping and heavily post-processing the data to structure it. With XBRL, we can now access information such as revenue, net profit, etc. without scraping the HTML filing. However, using raw XBRL data can be challenging due to its complexity. This is where our XBRL-to-JSON Converter API comes in, simplifying the process of parsing XBRL data.
The SEC requires certain filings to include structured XBRL data. These filings include 10-K, 10-Q, 8-K, 20-F, S-1 and many others. A comprehensive list of filings that contain XBRL data can be found here.
XBRL Files of Apple's 10-K
Let's take one of Apple's 10-K filings as an example:
As we can see from the screenshot of the EDGAR filing details page above, the XBRL data is divided across six different XML files:
- Schema document (aapl-20200926.xsd)
- Calculation linkbase doc (aapl-20200926_cal.xml)
- Definition linkbase doc (_def.xml)
- Label linkbase doc (_lab.xml)
- Presentation linkbase doc (_pre.xml)
- Instance doc (_htm.xml)
These files contain the financial statements and financial metrics such as revenue, earnings per share, net profit, and various text and notes to financial statements, along with information about how the data should be represented, for example in what order keys such as revenue, cost of goods sold, should be displayed in an income statement.
The XBRL-to-JSON Converter API takes the URL of Apple's 10-K filing as input, merges all those XBRL XML files into a single data object, converts the merged content into JSON data, standardizes all element labels and returns the data in JSON format. You don't have to take care of any part of that process anymore. You don't have to worry about merging and combining individual XBRL files or learn how to find the context of XBRL instance elements. All this heavy lifting is done by the XBRL-to-JSON Converter API.
In summary, you simply provide the URL of the 10-K filing and you receive a completely structured SEC filing in JSON format as a result that can simply be imported into a pandas dataframe.
Getting Started
We start with installing the sec-api
Python package an instantiating the XbrlApi()
providing us with a simply way to access the XBRL-to-JSON converter.
API_KEY = 'YOUR_API_KEY'
!pip install -q sec-api
from sec_api import XbrlApi
xbrlApi = XbrlApi(API_KEY)
Extract Income Statement from XBRL Data in 10-K Filing
We show you how to extract the income statement of Apple's 10-K filing by using the .xbrl_to_json(htm_url)
function and providing the URL of the filing as the input parameter. The function extracts and standardizes all XBRL XML and XSD files attached to the filing into a JSON format, making it easy to locate and analyze specific financial data. The function also returns all other financial statements from the 10-K filing, such as the balance sheet and cash flow statement, as well as all disclosure notes.
In 10-K filings, income statements are disclosed in item 8 "Financial Statements and Supplementary Data". Using the .xbrl_to_json(htm_url)
function, we don't have to scrape the filing section and convert the HTML table into a dataframe, and instead can simply access the statement with aapl_xbrl_json['StatementsOfIncome']
. Following is an overview of the mapping between the financial statements and the keys of their XBRL counterparts.
Financial Statement | XBRL-JSON Counterpart |
---|---|
Income statement | xbrl_json['StatementsOfIncome'] |
Balance sheet | xbrl_json['BalanceSheets'] |
Cash flow statement | xbrl_json['StatementsOfCashFlows'] |
If you want to locate the URLs of Apple's or other companies' 10-K filings, you can use our Query API to do so.
# URL of Apple's 10-K filings
url_10k_aapl = 'https://www.sec.gov/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htm'
aapl_xbrl_json = xbrlApi.xbrl_to_json(htm_url=url_10k_aapl)
print("Keys of income statement dictionary in XBRL from Apple's 10-K filing")
print('--------------------------------------------------------------------')
print(*list(aapl_xbrl_json['StatementsOfIncome'].keys()), sep='
')
Keys of income statement dictionary in XBRL from Apple's 10-K filing
--------------------------------------------------------------------
RevenueFromContractWithCustomerExcludingAssessedTax
CostOfGoodsAndServicesSold
GrossProfit
ResearchAndDevelopmentExpense
SellingGeneralAndAdministrativeExpense
OperatingExpenses
OperatingIncomeLoss
NonoperatingIncomeExpense
IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest
IncomeTaxExpenseBenefit
NetIncomeLoss
EarningsPerShareBasic
EarningsPerShareDiluted
WeightedAverageNumberOfSharesOutstandingBasic
WeightedAverageNumberOfDilutedSharesOutstanding
Next, we build a function get_income_statement(xbrl_json)
to convert the income statement inside the XBRL-JSON data to a pandas dataframe while ignoring financial metrics related to specific dimensions, such as revenue generated by product category or geographical region.
import pandas as pd
# convert XBRL-JSON of income statement to pandas dataframe
def get_income_statement(xbrl_json):
income_statement_store = {}
# iterate over each US GAAP item in the income statement
for usGaapItem in xbrl_json['StatementsOfIncome']:
values = []
indicies = []
for fact in xbrl_json['StatementsOfIncome'][usGaapItem]:
# only consider items without segment. not required for our analysis.
if 'segment' not in fact:
index = fact['period']['startDate'] + '-' + fact['period']['endDate']
# ensure no index duplicates are created
if index not in indicies:
values.append(fact['value'])
indicies.append(index)
income_statement_store[usGaapItem] = pd.Series(values, index=indicies)
income_statement = pd.DataFrame(income_statement_store)
# switch columns and rows so that US GAAP items are rows and each column header represents a date range
return income_statement.T
income_statement_aapl = get_income_statement(aapl_xbrl_json)
print("Income statement of Apple's 2022 10-K filing as dataframe")
print('---------------------------------------------------------')
income_statement_aapl
Income statement of Apple's 2022 10-K filing as dataframe
---------------------------------------------------------
2021-09-26-2022-09-24 | 2020-09-27-2021-09-25 | 2019-09-29-2020-09-26 | |
---|---|---|---|
RevenueFromContractWithCustomerExcludingAssessedTax | 394328000000 | 365817000000 | 274515000000 |
CostOfGoodsAndServicesSold | 223546000000 | 212981000000 | 169559000000 |
GrossProfit | 170782000000 | 152836000000 | 104956000000 |
ResearchAndDevelopmentExpense | 26251000000 | 21914000000 | 18752000000 |
SellingGeneralAndAdministrativeExpense | 25094000000 | 21973000000 | 19916000000 |
OperatingExpenses | 51345000000 | 43887000000 | 38668000000 |
OperatingIncomeLoss | 119437000000 | 108949000000 | 66288000000 |
NonoperatingIncomeExpense | -334000000 | 258000000 | 803000000 |
IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest | 119103000000 | 109207000000 | 67091000000 |
IncomeTaxExpenseBenefit | 19300000000 | 14527000000 | 9680000000 |
NetIncomeLoss | 99803000000 | 94680000000 | 57411000000 |
EarningsPerShareBasic | 6.15 | 5.67 | 3.31 |
EarningsPerShareDiluted | 6.11 | 5.61 | 3.28 |
WeightedAverageNumberOfSharesOutstandingBasic | 16215963000 | 16701272000 | 17352119000 |
WeightedAverageNumberOfDilutedSharesOutstanding | 16325819000 | 16864919000 | 17528214000 |
The following screenshot displays the same consolidated statements of operations, also known as the income statement, from the 10-K filing of Apple in "Item 8 - Financial Statements and Supplementary Data".
To provide further context, we included a table that maps the financial metric names such as "net sales" and "cost of sales" from the human-readable 10-K to its corresponding XBRL counterparts accessible in our income_statement_aapl
dataframe. The 10-K filing can be accessed through this link.
Human-Readable 10-K Filing | Counterpart in XBRL |
---|---|
Total net sales | RevenueFromContractWithCustomerExcludingAssessedTax |
Total cost of sales | CostOfGoodsAndServicesSold |
Gross margin | GrossProfit |
Research and development | ResearchAndDevelopmentExpense |
Selling, general and administrative | SellingGeneralAndAdministrativeExpense |
Total operating expenses | OperatingExpenses |
Operating income | OperatingIncomeLoss |
Other income/(expense), net | NonoperatingIncomeExpense |
Income before provision for income taxes | IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest |
Provision for income taxes | IncomeTaxExpenseBenefit |
Net income | NetIncomeLoss |
Earnings per share: Basic | EarningsPerShareBasic |
Earnings per share: Diluted | EarningsPerShareDiluted |
Shares used in computing earnings per share: Basic | WeightedAverageNumberOfSharesOutstandingBasic |
Shares used in computing earnings per share: Diluted | WeightedAverageNumberOfDilutedSharesOutstanding |
The income statement with the same values but with different financial metric names can also be found on the EDGAR interactive data page, which is generated from the same XBRL data. You can access it by visiting this link.
Export Income Statement from SEC 10-K Filing to Excel
Extracting and exporting financial statements from SEC filings, such as 10-K, 10-Q, S-1, F-1, or 20-F, to Excel can be done with just a few lines of Python code. The .xbrl_to_json(htm_url)
function takes the URL of the filing as input and returns the filing's XBRL data in JSON format. After extracting the financial statement, such as the income statement, with the get_income_statement(xbrl_json)
function, the resulting pandas dataframe can be easily exported to Excel using the .to_excel(filename)
method.
url_10k_aapl = 'https://www.sec.gov/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htm'
xbrl_json_aapl = xbrlApi.xbrl_to_json(htm_url=url_10k_aapl)
income_statement_aapl = get_income_statement(xbrl_json_aapl)
income_statement_aapl.to_excel('income-statement-aapl.xlsx')
The resulting exported income statement in Excel looks like this:
Extract and Merge Financial Statements from Multiple 10-K Filings
So far we only looked at extracting data from a single 10-K filing. Let's take a step further and explore how to extract data from multiple 10-K filings filed in recent years and merge the data for further analysis.
We specifically look at Apple's 10-Ks filed annually between 2019 and 2022. Firstly, we provide the URLs of the filings to the xbrl_to_json()
converter function, and then convert the result into separate income statements for each 10-K, where each statement is represented by a dataframe. Finally, we merge all the statements into a single dataframe and remove all the metrics that are duplicated across multiple filings. For instance, the 10-K filed in 2022 has revenue data for the timeframe from 2020 to 2021, while the 10-K filed in 2021 includes the same data for this period.
# Apple's 10Ks of the last 4 years, 2019 to 2022
url_10k_2019 = 'https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/a10-k20199282019.htm'
url_10k_2020 = 'https://www.sec.gov/Archives/edgar/data/320193/000032019320000096/aapl-20200926.htm'
url_10k_2021 = 'https://www.sec.gov/Archives/edgar/data/320193/000032019321000105/aapl-20210925.htm'
url_10k_2022 = 'https://www.sec.gov/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htm'
xbrl_json_2019 = xbrlApi.xbrl_to_json(htm_url=url_10k_2019)
xbrl_json_2020 = xbrlApi.xbrl_to_json(htm_url=url_10k_2020)
xbrl_json_2021 = xbrlApi.xbrl_to_json(htm_url=url_10k_2021)
xbrl_json_2022 = xbrlApi.xbrl_to_json(htm_url=url_10k_2022)
income_statement_2019 = get_income_statement(xbrl_json_2019)
income_statement_2020 = get_income_statement(xbrl_json_2020)
income_statement_2021 = get_income_statement(xbrl_json_2021)
income_statement_2022 = get_income_statement(xbrl_json_2022)
income_statements_merged = pd.concat([income_statement_2019, income_statement_2020, income_statement_2021, income_statement_2022], axis=0, sort=False)
# sort & reset the index of the merged dataframe
income_statements_merged = income_statements_merged.sort_index().reset_index()
# convert cells to float
income_statements_merged = income_statements_merged.applymap(lambda x: pd.to_numeric(x, errors='ignore'))
print("Merged, uncleaned financials of all income statements")
print('-----------------------------------------------------')
income_statements_merged.head(10)
Merged, uncleaned financials of all income statements
-----------------------------------------------------
index | 2016-09-25-2017-09-30 | 2017-10-01-2017-12-30 | 2017-10-01-2018-09-29 | 2017-12-31-2018-03-31 | 2018-04-01-2018-06-30 | 2018-07-01-2018-09-29 | 2018-09-30-2018-12-29 | 2018-09-30-2019-09-28 | 2018-12-30-2019-03-30 | 2019-03-31-2019-06-29 | 2019-06-30-2019-09-28 | 2019-09-29-2019-12-28 | 2019-09-29-2020-09-26 | 2019-12-29-2020-03-28 | 2020-03-29-2020-06-27 | 2020-06-28-2020-09-26 | 2020-09-27-2021-09-25 | 2021-09-26-2022-09-24 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | CostOfGoodsAndServicesSold | 1.410480e+11 | NaN | 1.637560e+11 | NaN | NaN | NaN | NaN | 1.617820e+11 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | CostOfGoodsAndServicesSold | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.695590e+11 | NaN | NaN | NaN | 2.129810e+11 | 2.235460e+11 |
2 | CostOfGoodsAndServicesSold | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.617820e+11 | NaN | NaN | NaN | NaN | 1.695590e+11 | NaN | NaN | NaN | 2.129810e+11 | NaN |
3 | CostOfGoodsAndServicesSold | NaN | NaN | 1.637560e+11 | NaN | NaN | NaN | NaN | 1.617820e+11 | NaN | NaN | NaN | NaN | 1.695590e+11 | NaN | NaN | NaN | NaN | NaN |
4 | EarningsPerShareBasic | NaN | NaN | 3.000000e+00 | NaN | NaN | NaN | 1.05 | 2.990000e+00 | 0.62 | 0.55 | 0.76 | 1.26 | 3.310000e+00 | 0.64 | 0.65 | 0.74 | NaN | NaN |
5 | EarningsPerShareBasic | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.990000e+00 | NaN | NaN | NaN | NaN | 3.310000e+00 | NaN | NaN | NaN | 5.670000e+00 | NaN |
6 | EarningsPerShareBasic | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.310000e+00 | NaN | NaN | NaN | 5.670000e+00 | 6.150000e+00 |
7 | EarningsPerShareBasic | 9.270000e+00 | 3.92 | 1.201000e+01 | 2.75 | 2.36 | 2.94 | 4.22 | 1.197000e+01 | 2.47 | 2.20 | 3.05 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
8 | EarningsPerShareDiluted | 9.210000e+00 | 3.89 | 1.191000e+01 | 2.73 | 2.34 | 2.91 | 4.18 | 1.189000e+01 | 2.46 | 2.18 | 3.03 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
9 | EarningsPerShareDiluted | NaN | NaN | 2.980000e+00 | NaN | NaN | NaN | 1.05 | 2.970000e+00 | 0.61 | 0.55 | 0.76 | 1.25 | 3.280000e+00 | 0.64 | 0.65 | 0.73 | NaN | NaN |
income_statements = income_statements_merged.groupby('index').max()
# reindex the merged dataframe using the index of the first dataframe
income_statements = income_statements.reindex(income_statement_2019.index)
# loop over the columns
for col in income_statements.columns[1:]:
# extract start and end dates from the column label
splitted = col.split('-')
start = '-'.join(splitted[:3])
end = '-'.join(splitted[3:])
# convert start and end dates to datetime objects
start_date = pd.to_datetime(start)
end_date = pd.to_datetime(end)
# calculate the duration between start and end dates
duration = (end_date - start_date).days / 360
# drop the column if duration is less than a year
if duration < 1:
income_statements.drop(columns=[col], inplace=True)
# convert datatype of cells to readable format, e.g. "2.235460e+11" becomes "223546000000"
income_statements = income_statements.apply(lambda row: pd.to_numeric(row, errors='coerce', downcast='integer').astype(str), axis=1)
print("Income statements from Apple's 10-K filings (2016 to 2022) as dataframe")
print('------------------------------------------------------------------------')
income_statements
Income statements from Apple's 10-K filings (2016 to 2022) as dataframe
------------------------------------------------------------------------
2016-09-25-2017-09-30 | 2017-10-01-2018-09-29 | 2018-09-30-2019-09-28 | 2019-09-29-2020-09-26 | 2020-09-27-2021-09-25 | 2021-09-26-2022-09-24 | |
---|---|---|---|---|---|---|
RevenueFromContractWithCustomerExcludingAssessedTax | 229234000000 | 265595000000 | 260174000000 | 274515000000 | 365817000000 | 394328000000 |
CostOfGoodsAndServicesSold | 141048000000 | 163756000000 | 161782000000 | 169559000000 | 212981000000 | 223546000000 |
GrossProfit | 88186000000 | 101839000000 | 98392000000 | 104956000000 | 152836000000 | 170782000000 |
ResearchAndDevelopmentExpense | 11581000000 | 14236000000 | 16217000000 | 18752000000 | 21914000000 | 26251000000 |
SellingGeneralAndAdministrativeExpense | 15261000000 | 16705000000 | 18245000000 | 19916000000 | 21973000000 | 25094000000 |
OperatingExpenses | 26842000000 | 30941000000 | 34462000000 | 38668000000 | 43887000000 | 51345000000 |
OperatingIncomeLoss | 61344000000 | 70898000000 | 63930000000 | 66288000000 | 108949000000 | 119437000000 |
NonoperatingIncomeExpense | 2745000000 | 2005000000 | 1807000000 | 803000000 | 258000000 | -334000000 |
IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest | 64089000000 | 72903000000 | 65737000000 | 67091000000 | 109207000000 | 119103000000 |
IncomeTaxExpenseBenefit | 15738000000 | 13372000000 | 10481000000 | 9680000000 | 14527000000 | 19300000000 |
NetIncomeLoss | 48351000000 | 59531000000 | 55256000000 | 57411000000 | 94680000000 | 99803000000 |
EarningsPerShareBasic | 9.27 | 12.01 | 11.97 | 3.31 | 5.67 | 6.15 |
EarningsPerShareDiluted | 9.21 | 11.91 | 11.89 | 3.28 | 5.61 | 6.11 |
WeightedAverageNumberOfSharesOutstandingBasic | 5217242000 | 19821510000 | 18471336000 | 17352119000 | 16701272000 | 16215963000 |
WeightedAverageNumberOfDilutedSharesOutstanding | 5251692000 | 20000435000 | 18595651000 | 17528214000 | 16864919000 | 16325819000 |
Extract and Visualize Revenue by Product Category per Year from 10-Ks
In this section, we explore how Apple's revenue is distributed across its major products, such as iPhone, Mac, iPad, and wearables. Additionally, we analyze the geographic distribution of revenue to determine the areas from which Apple generates its income.
Both the product and geographic revenue data are also included in the XBRL data we previously extracted from the 10-K filings. This data can be extracted and converted into a pandas dataframe for further analysis and visualization. By doing this, we can get insights into Apple's revenue streams and the markets where Apple generates the most income.
We begin by merging all the revenue objects from ['StatementsOfIncome']['RevenueFromContractWithCustomerExcludingAssessedTax']
in each 10-K filing into a list, which we then convert into a dataframe. We remove any duplicate values and then filter the revenue objects by focusing only on those that are relevant to Apple's products, such as iPhone, iPad, Mac, and wearables.
all_revenues_json = xbrl_json_2019['StatementsOfIncome']['RevenueFromContractWithCustomerExcludingAssessedTax'] + \
xbrl_json_2020['StatementsOfIncome']['RevenueFromContractWithCustomerExcludingAssessedTax'] + \
xbrl_json_2021['StatementsOfIncome']['RevenueFromContractWithCustomerExcludingAssessedTax'] + \
xbrl_json_2022['StatementsOfIncome']['RevenueFromContractWithCustomerExcludingAssessedTax']
all_revenues = pd.json_normalize(all_revenues_json)
all_revenues.drop_duplicates(inplace=True)
# convert the 'value' column to a numeric type
all_revenues['value'] = all_revenues['value'].astype(int)
mask_iphone = all_revenues['segment.value'] == 'aapl:IPhoneMember'
mask_ipad = all_revenues['segment.value'] == 'aapl:IPadMember'
mask_mac = all_revenues['segment.value'] == 'aapl:MacMember'
mask_wearables = all_revenues['segment.value'] == 'aapl:WearablesHomeandAccessoriesMember'
revenue_product = all_revenues[(mask_iphone | mask_ipad | mask_mac | mask_wearables)]
# pivot the dataframe to create a new dataframe with period.endDate as the index,
# segment.value as the columns, and value as the values
revenue_product_pivot = revenue_product.pivot(index='period.endDate', columns='segment.value', values='value')
print("Apple's revenues by product from 2017 to 2022")
print('---------------------------------------------')
revenue_product_pivot
Apple's revenues by product from 2017 to 2022
---------------------------------------------
segment.value | aapl:IPadMember | aapl:IPhoneMember | aapl:MacMember | aapl:WearablesHomeandAccessoriesMember |
---|---|---|---|---|
period.endDate | ||||
2017-09-30 | 18802000000 | 139337000000 | 25569000000 | 12826000000 |
2018-09-29 | 18380000000 | 164888000000 | 25198000000 | 17381000000 |
2019-09-28 | 21280000000 | 142381000000 | 25740000000 | 24482000000 |
2020-09-26 | 23724000000 | 137781000000 | 28622000000 | 30620000000 |
2021-09-25 | 31862000000 | 191973000000 | 35190000000 | 38367000000 |
2022-09-24 | 29292000000 | 205489000000 | 40177000000 | 41241000000 |
The next step involves plotting the pivot table as a stacked bar chart, representing the revenue distribution of Apple's major products over time.
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
# plot the histogram bar chart
ax = revenue_product_pivot.plot(kind='bar', stacked=True, figsize=(8, 6))
# rotate the x-axis labels by 0 degrees
plt.xticks(rotation=0)
# set the title and labels for the chart
ax.set_title("Apple's Revenue by Product Category", fontsize=16, fontweight='bold')
ax.set_xlabel('Period End Date', fontsize=12)
ax.set_ylabel('Revenue (USD)', fontsize=12)
# set the legend properties
ax.legend(title='Product Category', loc='upper left', fontsize='small', title_fontsize=10)
# add gridlines
ax.grid(axis='y', linestyle='--', alpha=0.3)
# remove the top and right spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# format y-axis ticks to show values in millions in dollars
formatter = ticker.FuncFormatter(lambda x, pos: '$%1.0fB' % (x*1e-9))
plt.gca().yaxis.set_major_formatter(formatter)
# map the original labels to new labels
label_map = {
'aapl:IPhoneMember': 'iPhone',
'aapl:MacMember': 'Mac',
'aapl:IPadMember': 'iPad',
'aapl:WearablesHomeandAccessoriesMember': 'Wearables & Home'
}
# create a list of new labels based on the original labels
new_labels = [label_map[label] for label in sorted(revenue_product['segment.value'].unique())]
handles, _ = ax.get_legend_handles_labels()
plt.legend(handles=handles[::-1], labels=new_labels[::-1])
# add the values in billions of dollars to each part of the bar
for p in ax.containers:
ax.bar_label(p, labels=['%.1f' % (v/1e9) for v in p.datavalues],
label_type='center', fontsize=8)
plt.show()
Extract and Visualize Revenue by Region per Year from 10-Ks
The process for extracting and visualizing revenue per geographical region is similar to the previous example. However, instead of filtering for product-related revenue metrics, we filter the XBRL data for revenue metrics related to different regions, namely America, Europe, China, Japan, and the rest of Asia.
mask_america = all_revenues['segment.value'] == 'aapl:AmericasSegmentMember'
mask_europe = all_revenues['segment.value'] == 'aapl:EuropeSegmentMember'
mask_china = all_revenues['segment.value'] == 'aapl:GreaterChinaSegmentMember'
mask_japan = all_revenues['segment.value'] == 'aapl:JapanSegmentMember'
mask_asia_rest = all_revenues['segment.value'] == 'aapl:RestOfAsiaPacificSegmentMember'
revenue_geo = all_revenues[(mask_america | mask_europe | mask_china | mask_japan | mask_asia_rest)]
# pivot the dataframe to create a new dataframe with period.endDate as the index, segment.value as the columns, and value as the values
revenue_geo_pivot = revenue_geo.pivot(index='period.endDate', columns='segment.value', values='value')
print("Apple's revenues by region from 2017 to 2022")
print('---------------------------------------------')
revenue_geo_pivot
Apple's revenues by region from 2017 to 2022
---------------------------------------------
segment.value | aapl:AmericasSegmentMember | aapl:EuropeSegmentMember | aapl:GreaterChinaSegmentMember | aapl:JapanSegmentMember | aapl:RestOfAsiaPacificSegmentMember |
---|---|---|---|---|---|
period.endDate | |||||
2017-09-30 | 96600000000 | 54938000000 | 44764000000 | 17733000000 | 15199000000 |
2018-09-29 | 112093000000 | 62420000000 | 51942000000 | 21733000000 | 17407000000 |
2019-09-28 | 116914000000 | 60288000000 | 43678000000 | 21506000000 | 17788000000 |
2020-09-26 | 124556000000 | 68640000000 | 40308000000 | 21418000000 | 19593000000 |
2021-09-25 | 153306000000 | 89307000000 | 68366000000 | 28482000000 | 26356000000 |
2022-09-24 | 169658000000 | 95118000000 | 74200000000 | 25977000000 | 29375000000 |
# plot the histogram bar chart
ax = revenue_geo_pivot.plot(kind='bar', stacked=True, figsize=(8, 6))
# rotate the x-axis labels by 0 degrees
plt.xticks(rotation=0)
# set the title and labels for the chart
ax.set_title("Apple's Revenue by Region", fontsize=16, fontweight='bold')
ax.set_xlabel('Period End Date', fontsize=12)
ax.set_ylabel('Revenue (USD)', fontsize=12)
# set the legend properties
ax.legend(title='Region', loc='upper left', fontsize='small', title_fontsize=10)
# add gridlines
ax.grid(axis='y', linestyle='--', alpha=0.3)
# remove the top and right spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# format y-axis ticks to show values in millions in dollars
formatter = ticker.FuncFormatter(lambda x, pos: '$%1.0fB' % (x*1e-9))
plt.gca().yaxis.set_major_formatter(formatter)
# map the original labels to new labels
label_map = {
'aapl:AmericasSegmentMember': 'America',
'aapl:EuropeSegmentMember': 'Europe',
'aapl:GreaterChinaSegmentMember': 'China',
'aapl:JapanSegmentMember': 'Japan',
'aapl:RestOfAsiaPacificSegmentMember': 'Asia Rest'
}
# create a list of new labels based on the original labels
new_labels = [label_map[label] for label in sorted(revenue_geo['segment.value'].unique())]
handles, _ = ax.get_legend_handles_labels()
plt.legend(handles=handles[::-1], labels=new_labels[::-1])
# add the values in billions of dollars to each part of the bar
for p in ax.containers:
ax.bar_label(p, labels=['%.1f' % (v/1e9) for v in p.datavalues],
label_type='center', fontsize=8)
plt.show()