sec-api.ioSEC API by D2V
FilingsPricingSandboxDocs
Log inGet Free API Key
  1. Home
  2. Tutorials

Analyze Form 13F Filings with Python - Fund Performance Monitoring & More

Open In Colab   Download Notebook

On this page:

  • Trend Analysis of Active Funds
    • Fund Lifespan and Survival Rates
      • Evolution of Asset Values
        • Ranking Funds by Performance Factors
          • Outlier Detection
            • Performance Metrics Calculation & Ranking
              • Visualizing Fund Performance Relative to the Benchmark
                • Alpha Distribution Across Funds
                • Conclusion

                  This tutorial guides you through the process of acquiring, cleansing, and analyzing 13F-HR filings filed between 1998 and 2023 on the SEC's EDGAR database. These filings provide a snapshot of institutional investment managers' holdings, offering insights into fund performance and investment strategies. Our focus will encompass several analytical dimensions:

                  1. Trend Analysis of Active Funds: We'll examine the total number of active funds from 1998 to 2023 over time to assess trends in fund creation. This will help us understand whether the asset management industry is expanding or contracting in recent years.
                  2. Fund Lifespan and Survival Rates: By analyzing the operational duration of funds, we will calculate survival rates and determine quantiles to understand the typical lifespan of these funds. This analysis reveals that less than half of the funds remain active for more than five years, highlighting the competitive nature of the asset management industry.
                  3. Evolution of Asset Values: Aggregating the holdings values reported in 13F filings from 2013 to 2023, we'll track the progression of asset values to gauge fund performance over time. This longitudinal analysis provides insights into how institutional portfolios have adjusted in response to market conditions.
                  4. Per-Fund Analysis: Delving into individual fund performances, we will compute key financial metrics such as maximum drawdown, beta, alpha, and annualized returns. These factors will form the basis of a performance ranking, identifying the top 20 best-performing funds.
                  5. Performance Visualization: We will visualize the performance of these top funds against the SP500 index (ticker: SPY) as a benchmark. This comparison aims to evaluate the relative merits of investing in these funds versus a passive market investment strategy, including the potential benefits of replicating the holdings of top-performing funds.
                  6. Alpha Distribution Analysis: By examining the distribution of alpha across all funds, we aim to quantify the extent to which funds have outperformed the market on a risk-adjusted basis over the last decade. This involves categorizing funds into alpha buckets and analyzing the performance quantiles within each category.

                  Data Visualization Techniques

                  • Comparative Fund Performance: Charting fund performance alongside the SP500 index to visually assess relative investment returns.
                  • Fund Survival Rates: Graphically representing the longevity of funds to highlight industry attrition rates.
                  • Alpha Distribution Histogram: Showcasing the distribution of funds across different levels of market outperformance.

                  APIs Utilized for Data Gathering

                  • Query API: For retrieving 13F-HR filings from 1998 to 2023.
                  • Filing Render API: To extract cover page details and total holdings values.

                  Background on 13F Filings

                  13F filings are structured with a cover page detailing the fund manager's address, the number of holdings, and the aggregated value of reported assets. The holdings list enumerates each invested company or instrument, specifying the security type (e.g., stock, note), quantity held, and position value.

                  This tutorial assumes readers have a foundational understanding of 13F filings, allowing us to focus on the analytical techniques and insights these documents can provide to financial professionals.

                  Trend Analysis of Active Funds

                  In this section, we'll perform a trend analysis on the total number of active funds from 1998 to 2023. This analysis will help us discern patterns in fund creation, offering insights into whether the asset management industry is experiencing growth or decline.

                  Data Collection Process

                  We utilized the Query API to compile a comprehensive dataset of 13F filings. This dataset includes the CIKs of funds, accession numbers of their 13F filings, and the filing dates. To gather this data yourself, use the following query in the Query API:

                  {
                  "query": "formType:"13F" AND filedAt:[YEAR-MONTH-01 TO YEAR-MONTH-31]",
                  "from": "0",
                  "size": "50",
                  "sort": [{"filedAt": {"order": "desc"}}]
                  }

                  Replace YEAR and MONTH with the desired timeframe, incrementally adjusting the from parameter by 50 to paginate through the results. Continue this process for each month from 1998 to 2023, resetting from to 0 and updating YEAR/MONTH as necessary, until all data is collected.

                  Data Transformation and Analysis

                  After acquiring the filing metadata in JSON format, we transformed, cleaned, standardized and saved it to 13f-filings-metadata.csv. The resulting dataset is then loaded into a pandas DataFrame for use in this tutorial.

                  import pandas as pd
                  import numpy as np
                  import matplotlib.pyplot as plt
                  import matplotlib.style as style
                  import matplotlib.ticker as mtick
                  import textwrap
                  from functools import wraps

                  style.use("default")

                  params = {
                      "axes.labelsize": 8,
                      "font.size": 8,
                      "legend.fontsize": 8,
                      "xtick.labelsize": 8,
                      "ytick.labelsize": 8,
                      "text.usetex": False,
                      "font.family": "sans-serif",
                  }

                  plt.rcParams.update(params)


                  def round_decorator(decimals):
                      def decorator(func):
                          def wrapper(*args, **kwargs):
                              result = func(*args, **kwargs)
                              return round(result, decimals)

                          return wrapper

                      return decorator


                  def memoize(func):
                      cache = {}

                      @wraps(func)
                      def wrapper(*args, **kwargs):
                          # convert pd.Series/list to a hashable tuple for caching
                          hashable_args = tuple(
                              arg if not isinstance(arg, (pd.Series, list)) else tuple(arg)
                              for arg in args
                          )

                          hashable_kwargs = {
                              k: v if not isinstance(v, (pd.Series, list)) else tuple(v)
                              for k, v in kwargs.items()
                          }

                          key = (hashable_args, tuple(sorted(hashable_kwargs.items())))

                          if key not in cache:
                              cache[key] = func(*args, **kwargs)
                          return cache[key]

                      return wrapper
                  funds_filings = pd.read_csv('./data/13f-filings-metadata.csv')
                  funds_filings
                  Out:
                  cikformTypeaccessionNofiledAtyear
                  0102542113F-HR0000732697-98-0000041998-02-06 05:00:00+00:001998
                  193752213F-HR0000937522-98-0000141998-02-11 05:00:00+00:001998
                  2104479713F-HR0000950144-98-0013011998-02-12 05:00:00+00:001998
                  3100154013F-HR0000201196-98-0000011998-02-13 05:00:00+00:001998
                  4102621413F-HR0000914121-98-0001701998-02-17 05:00:00+00:001998
                  ..................
                  390142200472013F-HR0001398344-23-0232982023-12-21 23:24:48+00:002023
                  390143200472013F-HR0001398344-23-0232992023-12-21 23:26:06+00:002023
                  390144200472013F-HR0001398344-23-0233022023-12-21 23:28:56+00:002023
                  390145200472013F-HR0001398344-23-0233032023-12-21 23:29:23+00:002023
                  390146173113413F-HR/A0001214659-23-0167552023-12-22 20:44:54+00:002023

                  390147 rows × 5 columns

                  funds_per_year_stats = (
                      funds_filings.drop_duplicates(subset=["cik", "year"])
                      .groupby(funds_filings["year"])["cik"]
                      .count()
                      .to_frame()
                      .reset_index()
                      .rename(columns={"filedAt": "year", "cik": "count"})
                  )

                  print("Number of active funds per year (2014 - 2023):")
                  funds_per_year_stats[-10:]
                  Number of active funds per year (2014 - 2023):
                  Out:
                  yearcount
                  1620143876
                  1720154221
                  1820164393
                  1920174533
                  2020184901
                  2120195224
                  2220205596
                  2320216191
                  2420227032
                  2520237254
                  fig, ax = plt.subplots(figsize=(4, 3))

                  ax.plot(funds_per_year_stats["year"], funds_per_year_stats["count"], color="black")

                  ax.set_xticks(funds_per_year_stats["year"])
                  ax.set_xticklabels(funds_per_year_stats["year"], rotation=90)
                  ax.yaxis.grid(True, linestyle="--", which="major", color="grey", alpha=0.25)
                  ax.spines["top"].set_visible(False)
                  ax.spines["right"].set_visible(False)
                  ax.set_ylabel("Investment Managers")
                  ax.get_yaxis().set_major_formatter(
                      mtick.FuncFormatter(lambda x, p: format(int(x), ","))
                  )

                  ax.axvspan(1999.5, 2001.5, alpha=0.1, color="red", zorder=-100)
                  ax.axvspan(2006.5, 2008.5, alpha=0.1, color="red", zorder=-100)
                  ax.axvspan(2019.5, 2021.5, alpha=0.1, color="red", zorder=-100)

                  ax.text(
                      2000.5,
                      ax.get_ylim()[1] - 400,
                      "Dot-com
                  Bubble",
                      horizontalalignment="center",
                      verticalalignment="center",
                      color="red",
                      alpha=0.5,
                  )
                  ax.text(
                      2007.5,
                      ax.get_ylim()[1] - 400,
                      "GFC",
                      horizontalalignment="center",
                      verticalalignment="center",
                      color="red",
                      alpha=0.5,
                  )
                  ax.text(
                      2020.5,
                      ax.get_ylim()[1] - 400,
                      "COVID",
                      horizontalalignment="center",
                      verticalalignment="center",
                      color="red",
                      alpha=0.5,
                  )

                  caption = (
                      "Figure: Active Institutional Investment Managers per Year. The figure shows the number of institutional investment managers that filed at least one 13F-HR form in a given year. "
                      + "Managers represent funds such as Bridgewater Associates or Pershing Square. Funds are counted based on their CIK and the year a corresponding 13F was filed."
                  )
                  caption = "
                  ".join(textwrap.wrap(caption, 60))
                  fig.text(0.5, -0.30, caption, ha="center", fontsize=9)

                  plt.tight_layout()
                  plt.show()
                  Active Institutional Investment Managers per Year
                  total_number_of_funds = funds_filings["cik"].nunique()
                  print("Total number of funds (1998 - 2023):", format(total_number_of_funds, ","))
                  Total number of funds (1998 - 2023): 13,035
                  funds_per_year = (
                      funds_filings[["cik", "year"]]
                      .copy()
                      .drop_duplicates(subset=["cik", "year"])
                      .sort_values(by=["year"])
                      .reset_index(drop=True)
                  )

                  entering_and_exiting_funds = [
                      # {
                      # "year": 2013,
                      # "entering": 210,
                      # "exiting": 10
                      # },
                      # {
                      # "year": 2014,
                      # "entering": 41,
                      # "exiting": 104
                      # }
                      # ...
                  ]

                  for year in funds_per_year["year"].unique():
                      funds_year = funds_per_year[funds_per_year["year"] == year]

                      stats = {"year": year, "entering": 0, "exiting": 0}

                      if len(entering_and_exiting_funds) == 0:
                          stats["entering"] = len(funds_year["cik"])
                          entering_and_exiting_funds.append(stats)
                          continue

                      funds_prev_year = funds_per_year[funds_per_year["year"] == year - 1]

                      # calculate number of CIKs that are in funds_year, but not in funds_prev_year
                      # these are new funds
                      new_funds = funds_year[~funds_year["cik"].isin(funds_prev_year["cik"])]

                      # calculate number of CIKs that are in funds_prev_year, but not in funds_year
                      # these are funds that stopped filing
                      funds_that_stopped_filing = funds_prev_year[
                          ~funds_prev_year["cik"].isin(funds_year["cik"])
                      ]

                      stats["entering"] = len(new_funds["cik"])
                      stats["exiting"] = len(funds_that_stopped_filing["cik"])

                      entering_and_exiting_funds.append(stats)


                  entering_and_exiting_funds = pd.DataFrame(entering_and_exiting_funds)
                  entering_and_exiting_funds.sort_values(by=["year"], inplace=True)

                  print("Number of entering and exiting funds per year")
                  entering_and_exiting_funds[-10:]
                  Number of entering and exiting funds per year
                  Out:
                  yearenteringexiting
                  162014552253
                  172015564219
                  182016445273
                  192017497357
                  202018677309
                  212019618295
                  222020734362
                  232021949354
                  2420221175334
                  252023739517
                  fig, ax = plt.subplots(figsize=(4, 3))

                  ax.bar(
                      entering_and_exiting_funds["year"][1:],
                      entering_and_exiting_funds["entering"][1:],
                      color="black",
                      label="New Funds",
                      zorder=100,
                  )

                  ax.bar(
                      entering_and_exiting_funds["year"][1:],
                      entering_and_exiting_funds["exiting"][1:] * -1,
                      color="grey",
                      label="Funds that Stopped Filing",
                      zorder=100,
                  )

                  ax.get_yaxis().set_major_formatter(
                      mtick.FuncFormatter(lambda x, p: format(int(x), ","))
                  )

                  ax.set_xticks(entering_and_exiting_funds["year"][1:])
                  ax.set_xticklabels(entering_and_exiting_funds["year"][1:], rotation=90)
                  ax.yaxis.grid(True, linestyle="--", which="major", color="grey", alpha=0.25)
                  ax.spines["top"].set_visible(False)
                  ax.spines["right"].set_visible(False)

                  # add text "New Funds" onto the figure
                  ax.text(
                      2005,
                      680,
                      "New Funds",
                      horizontalalignment="center",
                      verticalalignment="center",
                      color="black",
                      fontsize=8,
                  )

                  ax.text(
                      2005,
                      -400,
                      "Exiting Funds",
                      horizontalalignment="center",
                      verticalalignment="center",
                      color="black",
                      fontsize=8,
                  )

                  ax.axvspan(1999.5, 2001.5, alpha=0.1, color="red", zorder=-100)
                  ax.axvspan(2006.5, 2008.5, alpha=0.1, color="red", zorder=-100)
                  ax.axvspan(2019.5, 2021.5, alpha=0.1, color="red", zorder=-100)

                  # add text labels for financial crisis and covid
                  ax.text(
                      2002,
                      ax.get_ylim()[1] - 200,
                      "Dot-com
                  Bubble",
                      horizontalalignment="center",
                      verticalalignment="center",
                      color="red",
                      alpha=0.5,
                  )
                  ax.text(
                      2007.5,
                      ax.get_ylim()[1] - 200,
                      "GFC",
                      horizontalalignment="center",
                      verticalalignment="center",
                      color="red",
                      alpha=0.5,
                  )
                  ax.text(
                      2020.5,
                      ax.get_ylim()[1] - 200,
                      "COVID",
                      horizontalalignment="center",
                      verticalalignment="center",
                      color="red",
                      alpha=0.5,
                  )

                  ax.set_ylabel("Investment Managers - Entering/Exiting")

                  caption = "Figure: Number of New and Exiting Institutional Investment Managers per Year. The figure shows the number of new and exiting funds per year. A new fund is defined as a fund that filed a 13F-HR form for the first time in a given year. A fund that stopped filing is defined as a fund that filed a 13F-HR form in the previous year, but not in the current year. Funds are counted based on their CIK."
                  caption = "
                  ".join(textwrap.wrap(caption, 60))
                  fig.text(0.5, -0.40, caption, ha="center", fontsize=9)

                  plt.show()
                  Number of New and Exiting Institutional Investment Managers per Year

                  Over the past 26 years, from 1998 to 2023, our analysis reveals a dynamic landscape in the fund management industry, involving 13,035 hedge funds and investment managers. Notably, 2023 emerged as a landmark year, showcasing the highest number of active funds, with a remarkable surge in new entrants from under 500 in 2016 to over 1,000 by 2022. This period witnessed the greatest influx of newly registered funds, highlighting a significant shift in the industry's composition.

                  The year 2022 stood out not only for its record-setting new fund registrations but also for signaling a pronounced market adjustment. The repercussions of this strong market correction became particularly evident in 2023, a year that saw over 500 funds withdrawing from the investment arena, surpassing the exit rates observed in the aftermath of the Global Financial Crisis (GFC) in 2009 and 2010. This trend underscores a notable correlation between market downturns or recessions and an increase in the number of funds ceasing operations, with the peak of exits often coinciding with the transition from a bear to a bull market. This suggests that many funds tend to exit the market just as conditions begin to improve.

                  For clarity, "entering funds" are identified as those reporting a Form 13F for the first time, whereas 'exiting funds' are recognized as those with previous 13F filings but none in the given year. Given the SEC's mandate requiring quarterly disclosures (four times per year), the absence of any filings from a fund within a year is a strong indicator of its operational cessation.

                  Fund Lifespan and Survival Rates

                  This section delves into the operational longevity of investment funds, focusing on calculating their survival rates and establishing quantiles to gauge the typical lifespan within the asset management industry.

                  We begin by identifying the distinct funds that submitted at least one Form 13F filing from 1998 to 2000. We then examine which of these funds remained operational in 2023, a full 23 years later, to calculate the "survival rate." This metric reflects the percentage of funds that continue to exist and operate, offering insights into the long-term viability and stability within the sector.

                  cik_of_early_funds = funds_filings[funds_filings['year'].isin([1998,1999,2000])]['cik'].unique()
                  cik_of_active_funds = funds_filings[funds_filings['year'].isin([2023])]['cik'].unique()

                  # create intersection of cik_of_early_funds and cik_of_active_funds
                  cik_of_early_funds = set(cik_of_early_funds)
                  cik_of_active_funds = set(cik_of_active_funds)

                  intersection = cik_of_early_funds.intersection(cik_of_active_funds)
                  intersection_pct = round(len(intersection) / len(cik_of_early_funds) * 100, 2)

                  print("Number of funds that were active in...")
                  print("1998, 1999 and 2000:\t\t", len(cik_of_early_funds))
                  print("1998, 1999, 2000 and 2023:\t", len(intersection), f"({intersection_pct}% survival rate)")
                  Number of funds that were active in...
                  1998, 1999 and 2000: 2256
                  1998, 1999, 2000 and 2023: 659 (29.21% survival rate)

                  In the next step, we quantify the duration of activity for each fund by tracking the number of years it reported Form 13F filings to the SEC via EDGAR. This step involves plotting the data on a histogram where the x-axis represents the number of active years, and the y-axis shows the count of funds for each duration of activity.

                  Utilizing the entire period from 1998 to 2023, we create 26 distinct buckets to categorize the funds based on the number of years they were operational. This approach allows us to visualize the distribution of fund lifespans across the observed timeframe.

                  Following the distribution analysis, we compute basic descriptive statistics to capture the central tendency and dispersion of fund lifespans. These statistics include the mean (average number of active years), standard deviation (variation in lifespan), and the first, second (median), and third quantiles.

                  ciks = funds_filings["cik"].unique()

                  years_active = []

                  for cik in ciks:
                      years = funds_filings[funds_filings["cik"] == cik]["year"].unique()
                      years_active.append({"cik": cik, "years_active": len(years), "years": years})

                  years_active = pd.DataFrame(years_active)

                  fig, ax = plt.subplots(figsize=(4, 3))

                  bins_range = range(1, 27)
                  ax.hist(years_active["years_active"], bins=bins_range, color="black")

                  # move all bars by 0.5 to the left
                  ax.set_xticks(np.array(bins_range) - 0.5)
                  ax.set_xticklabels(bins_range, rotation=90, fontsize=8, ha="center")

                  ax.yaxis.grid(True, linestyle="--", which="major", color="grey", alpha=0.25)
                  ax.spines["top"].set_visible(False)
                  ax.spines["right"].set_visible(False)

                  ax.set_xlabel("Years Active")
                  ax.set_ylabel("Number of Funds")

                  caption = (
                      "Figure: Distribution of Life Span of Funds (1998 - 2023). The figure shows the distribution of "
                      + "the number of years a fund was active. A fund is considered active if it filed a 13F-HR form "
                      + "in a given year. Funds are counted based on their CIK."
                  )

                  caption = "
                  ".join(textwrap.wrap(caption, 60))
                  fig.text(0.5, -0.30, caption, ha="center", fontsize=9)

                  plt.show()

                  print("Descriptive Statistics of the Life Spans of Funds (1998 - 2023)")
                  pd.DataFrame(years_active["years_active"].describe().round(2))
                  Histogram of Fund Lifespans
                  Descriptive Statistics of the Life Spans of Funds (1998 - 2023)
                  Out:
                  years_active
                  count13035.00
                  mean7.22
                  std6.53
                  min1.00
                  25%2.00
                  50%5.00
                  75%10.00
                  max26.00

                  The dataset, encompassing 13,035 funds, reveals a diverse landscape of fund activity durations. On average, funds have been active for approximately 7.22 years, suggesting a moderate operational lifespan across the sector. However, the standard deviation of 6.53 years highlights a significant variance in fund lifespans, indicating that while some funds operate for a brief period, others demonstrate much longer endurance.

                  A closer look at the distribution shows that the minimum active duration is just 1 year, reflecting the challenges some funds face in sustaining operations. The 25th percentile, at 2 years, implies that a quarter of the funds ceased filings within two years of operation. The median value of 5 years suggests that half of the funds were active for five years or less, providing a more resistant measure of central tendency against outliers compared to the average.

                  At the higher end, the 75th percentile reveals that three-quarters of the funds were active for 10 years or less, and 25% of all funds were active for 10 years and more, underscoring that a significant portion of the sector does not surpass a decade of activity. Remarkably, the maximum lifespan extends to 26 years, showcasing that a select few funds have managed to navigate the asset management industry successfully over the entire period analyzed.

                  Evolution of Asset Values

                  In this part of the tutorial, we delve into the progression of asset values from 2013 to 2023 by analyzing the holdings values reported in 13F filings. This analysis aims to shed light on the performance of institutional portfolios over time, revealing how they've adapted to shifting market conditions.

                  Data Extraction and Processing

                  Our focus is on the aggregate holding values of all funds for the period 2013-2023. We achieve this by extracting the total value of holdings as reported on the cover page of each 13F-HR filing. The cover page provides a snapshot of a fund's reported assets, essential for evaluating fund performance at a glance.

                  To extract the Form 13F Information Table Value Total from each cover page, we first identify the URL of these pages. The URL is located in the documentUrl parameter within the documentFormatFiles array of a filing's metadata object, accessible via the Query API.

                  Example of Extracting Data

                  Upon locating the cover page URL, we use a basic XML parser to retrieve the tableValueTotal, which represents the aggregate value of the fund's holdings. Below is an example of a cover page and of how the metadata object appears, highlighting the necessary parameters for locating the cover page:

                  Cover page example (source filing):

                  cover-page-example

                  Filing metadata object example:

                  {
                  "formType": "13F-HR",
                  "accessionNo": "0001398344-23-022864",
                  "periodOfReport": "2023-09-30",
                  "documentFormatFiles": [
                  {
                  "sequence": "1",
                  "documentUrl": "https://www.sec.gov/Archives/edgar/data/2004720/000139834423022864/xslForm13F_X02/primary_doc.xml",
                  "type": "13F-HR"
                  }
                  ]
                  }

                  Data Collection and Standardization

                  For efficiency, we've already gathered this information using the Query API and further processed it using the Render API to obtain and parse all relevant cover pages. This process involved extracting the number of holdings and their total value from each 13F filing's cover page. Given the challenge of over 30% of holding values being incorrectly reported across all funds, we applied various outlier detection algorithms to standardize the data.

                  The final dataset, encapsulated in cover-pages-cleaned.csv, includes seven columns: the CIK of the fund, filing date and time, the periodOfReport (referring to the filing's quarter), accession number, number of holdings, reported value of all holdings, and the standardized holdings value — a figure refined through our outlier detection and standardization process.

                  cover_pages = pd.read_csv("./data/cover-pages-cleaned.csv", parse_dates=["periodOfReport", "filedAt"])
                  print("13F Cover Pages (2013 - 2023)")
                  cover_pages.sort_values(by=["filedAt"])
                  13F Cover Pages (2013 - 2023)
                  Out:
                  filerCikfiledAtperiodOfReportaccessionNonumberHoldingsholdingsValueReportedholdingsValueStandardized
                  189810405922013-10-01 09:47:12-04:002013-09-300001040592-13-0000135941219594121959000
                  29887676842013-10-01 11:38:08-04:002013-09-300000767684-13-000012122881429881429000
                  79014522082013-10-01 12:08:50-04:002013-09-300001144204-13-05326881581943581943000
                  16918056762013-10-01 12:12:11-04:002013-09-300001144204-13-05327154113073908451307390845
                  186914265882013-10-01 16:57:06-04:002013-09-300001426588-13-000006182194373194373000
                  ........................
                  8838620047202023-12-21 18:22:18-05:002018-12-310001398344-23-023296482129656986129656986
                  9527020047202023-12-21 18:24:48-05:002019-06-300001398344-23-023298469160161167160161167
                  9036020047202023-12-21 18:26:06-05:002019-03-310001398344-23-023299472142784489142784489
                  9908820047202023-12-21 18:28:56-05:002019-09-300001398344-23-023302487167369301167369301
                  10561820047202023-12-21 18:29:23-05:002019-12-310001398344-23-023303475183817738183817738

                  198835 rows × 7 columns

                  def plot_holding_values(cover_pages):
                      fig, ax = plt.subplots(figsize=(6, 3))

                      cover_pages_grouped = cover_pages.groupby("periodOfReport")
                      cover_pages_grouped = (
                          cover_pages_grouped["holdingsValueStandardized"].sum().to_frame().reset_index()
                      )

                      ax.plot(
                          cover_pages_grouped["periodOfReport"],
                          cover_pages_grouped["holdingsValueStandardized"],
                          color="black",
                      )

                      xticks = cover_pages_grouped["periodOfReport"].dt.date.unique().tolist()
                      ax.set_xticks(xticks)
                      ax.set_xticklabels(xticks, rotation=90)
                      ax.yaxis.grid(True, linestyle="--", which="major", color="grey", alpha=0.25)
                      ax.spines["top"].set_visible(False)
                      ax.spines["right"].set_visible(False)

                      # draw vertical lines for every Q1. e.g. 2013-03-31, 2014-03-31, 2015-03-31, ...
                      y_min = ax.get_ylim()[0]

                      for year in cover_pages_grouped["periodOfReport"].dt.year.unique():
                          x_value = str(year) + "-03-31"
                          x_value = pd.to_datetime(x_value, format="%Y-%m-%d")
                          if (x_value in cover_pages_grouped["periodOfReport"].values) == False:
                              continue
                          y_value = cover_pages_grouped[cover_pages_grouped["periodOfReport"] == x_value][
                              "holdingsValueStandardized"
                          ].values[0]

                          ax.vlines(
                              x=x_value,
                              ymin=y_min,
                              ymax=y_value,
                              color="grey",
                              alpha=0.25,
                              zorder=-100,
                              linestyle="--",
                              linewidth=0.8,
                          )

                      ax.set_ylabel("Value of all Holdings (USD, Billions)")

                      ax.get_yaxis().set_major_formatter(
                          mtick.FuncFormatter(lambda x, p: format(int(x) / 1000000000, ",.0f"))
                      )

                      ax.axvspan(
                          pd.to_datetime("2020-03-31", format="%Y-%m-%d"),
                          pd.to_datetime("2021-12-31", format="%Y-%m-%d"),
                          alpha=0.1,
                          color="red",
                          zorder=-100,
                      )

                      ax.text(
                          pd.to_datetime("2020-12-31", format="%Y-%m-%d"),
                          ax.get_ylim()[1] * 0.95,
                          "COVID",
                          horizontalalignment="center",
                          verticalalignment="center",
                          color="red",
                          alpha=0.5,
                      )

                      ax.set_title(
                          "Holding Values across all Investment Managers
                  per Quarter (2013 - 2023)"
                      )

                      total_number_of_funds = cover_pages["filerCik"].nunique()
                      number_of_13F_filings = cover_pages["accessionNo"].nunique()

                      caption = (
                          "Figure: Total Value of all Holdings per Quarter. The figure shows the total dollar value of all holdings per quarterly reporting period (end of March, June, September and December). "
                          + "The value is calculated by summing the value of all holdings reported in a given quarter per investment manager. The value is standardized to USD billions. "
                          + f"A total of {format(total_number_of_funds, ',')} investment managers and {format(number_of_13F_filings, ',')} 13F filings are included in the figure. "
                      )

                      caption = "
                  ".join(textwrap.wrap(caption, 70))
                      fig.text(0.5, -0.35, caption, ha="center", fontsize=9)

                      # replace x labels with year and quarter
                      def get_quarter(period):
                          if period.month == 3:
                              return "Q1"
                          elif period.month == 6:
                              return "Q2"
                          elif period.month == 9:
                              return "Q3"
                          elif period.month == 12:
                              return "Q4"
                          else:
                              return "Q0"

                      def format_period(period):
                          qtr = get_quarter(period)
                          if qtr != "Q1":
                              return qtr
                          return str(period)[0:4] + " " + get_quarter(period)

                      ax.set_xticklabels(
                          [format_period(period) for period in xticks],
                          rotation=90,
                          ha="center",
                          fontsize=8,
                      )

                      plt.tight_layout()
                      plt.show()


                  plot_holding_values(cover_pages)
                  Total Value of all Fund Holdings per Quarter

                  The figure presents a time series analysis of the aggregated dollar value of holdings reported quarterly by investment managers, delineating the standard reporting periods that conclude at the end of March, June, September, and December.

                  The trend from the beginning of 2013 to the onset of 2020 is characterized by a relative stagnation in holding values, suggesting a period of flat growth across the combined portfolios of the investment industry. During this timeframe, the growth in total holdings value does not exhibit significant upward momentum.

                  A pivotal shift in the trajectory of holding values is marked by the COVID-19 pandemic. From 2020 to the end of 2021, there is an escalation in holding values, reflecting perhaps a reactive surge in investment during the pandemic period. In the fourth quarter of 2021, the aggregated assets reached an unprecedented peak, surpassing $60 trillion. This surge is subsequently countered by a downturn that coincides with the market correction in 2022, indicating a retraction of holding values in response to changing economic conditions.

                  Ranking Funds by Performance Factors

                  Our analysis extends to calculating factors to rank and compare funds by performance. We focus on measures such as maximum drawdown, Sharpe and Sortine ratio, beta, alpha, and annualized returns to establish a ranking of the top-performing funds.

                  Outlier Detection

                  A crucial step before computing these metrics is the exclusion of erroneous data through an outlier detection method as roughly 12% of funds active in 2023 have discrepancies in their 13F filings. These errors stretch beyond the variations we cleaned in the first chapter of this tutorial, and are not merely superficial but rather substantial. They range from the incorrectly swapping the number of holdings with the aggregate holding values, to inaccuracies in the reporting of aggregate holding values, such as missing million-dollar units. Additionally, some funds have submitted duplicate filings for a single period or have reported an implausible zero total holding value.

                  To address these issues, the outlier detection algorithm in the detect_outliers function pinpoints outliers by their percentage changes. The process begins by calculating the percentage change between consecutive holdingsValueStandardized entries in the cover pages of a specific fund. We then apply a threshold of 2000% to flag significant increases as outliers. If a row doesn't initially qualify as an outlier but is preceded by one that does — and the percentage change is neither excessively high nor too low — it too is classified as outlier. This approach ensures we capture outliers that could be part of an anomaly spanning multiple periods. Further refinement is done for drastic drops, with a -96% threshold used to identify extreme decreases in holding values. The resulting cover pages dataframe for a given fund includes a new column outlier with boolean flags for these deviations.

                  def detect_outliers(df, column):
                      outliers_df = df.copy()
                      outliers_df["pctChange"] = outliers_df[column].pct_change()
                      outlier_threshold_1 = 20
                      outliers_df["outlier"] = outliers_df["pctChange"] > outlier_threshold_1
                      outliers_df["outlierPrevRow"] = (
                          outliers_df["outlier"].shift(1).fillna(False).astype(bool)
                      )
                      outliers_df["outlier"] = (outliers_df["outlier"] == False) & outliers_df[
                          "outlierPrevRow"
                      ] & (outliers_df["pctChange"] < 0.7) & (
                          outliers_df["pctChange"] > -0.5
                      ) | outliers_df[
                          "outlier"
                      ]
                      outliers_df["outlierPrevRow"] = (
                          outliers_df["outlier"].shift(1).fillna(False).astype(bool)
                      )
                      outliers_df["outlier"] = (outliers_df["pctChange"] < -0.96) & ~outliers_df[
                          "outlier"
                      ] & ~outliers_df["outlierPrevRow"] | outliers_df["outlier"]
                      outliers_df["outlierPrevRow"] = (
                          outliers_df["outlier"].shift(1).fillna(False).astype(bool)
                      )
                      return outliers_df


                  print("Example of a fund (CIK: 1906802) with outliers:")
                  outliers_example = detect_outliers(
                      cover_pages[cover_pages["filerCik"] == 1906802], "holdingsValueStandardized"
                  )
                  outliers_example[
                      [
                          "periodOfReport",
                          "numberHoldings",
                          "holdingsValueReported",
                          "holdingsValueStandardized",
                          "outlier",
                      ]
                  ]
                  Example of a fund (CIK: 1906802) with outliers:
                  Out:
                  periodOfReportnumberHoldingsholdingsValueReportedholdingsValueStandardizedoutlier
                  1576232022-03-31143132484132484000False
                  1643262022-06-30128225137137000True
                  1749022022-12-31145132717600132717600True
                  1802372023-03-31143133881144133881144True
                  1876732023-06-30143141582997141582997False
                  1922172023-09-30143131914090131914090False
                  active_ciks_in_2023 = cover_pages[cover_pages["periodOfReport"].dt.year == 2023]['filerCik'].unique().tolist()

                  active_ciks_wo_outliers = []
                  active_ciks_with_outliers = []

                  for cik in active_ciks_in_2023:
                      cik_df = cover_pages[cover_pages['filerCik'] == cik]
                      outliers = detect_outliers(cik_df, 'holdingsValueStandardized')
                      has_outliers = outliers['outlier'].any()
                      if has_outliers == False:
                          active_ciks_wo_outliers.append(cik)
                      else:
                          active_ciks_with_outliers.append(cik)

                  pct_of_funds_w_outliers = round(len(active_ciks_with_outliers) / len(active_ciks_in_2023) * 100, 2)
                  print("Active funds in 2023:", format(len(active_ciks_in_2023), ","))
                  print("Active funds with outliers:", format(len(active_ciks_with_outliers), ","), f"({pct_of_funds_w_outliers}%)")
                  print("Active funds without outliers:", format(len(active_ciks_wo_outliers), ","))
                  Active funds in 2023: 6,966
                  Active funds with outliers: 814 (11.69%)
                  Active funds without outliers: 6,152

                  Performance Metrics Calculation & Ranking

                  To ascertain the relative performance of investment funds, we apply a range of metrics to rank and sort them effectively:

                  • Total Return: This measures the change from the first quarter's total holdings value to the most recent quarter, providing a snapshot of overall growth or decline.
                  • Asset Growth Rates: We evaluate performance over one, three, and five years based on asset growth rates. This metric focuses on the appreciation of the assets themselves, independent of new capital inflows.
                  • Maximum Drawdown: By examining quarter-over-quarter peak-to-trough declines in the aggregated value of all holdings, we assess the largest potential loss experienced over a specified period.
                  • Annualized Volatility: We calculate this as the standard deviation of quarterly returns, offering insight into the variability of fund returns over time.
                  • Sharpe Ratio: This common measure of risk-adjusted return divides the fund's annualized return by its annualized volatility.
                  • Sortino Ratio: Similar to the Sharpe Ratio, the Sortino Ratio measures the risk-adjusted return but uses only the annualized downside volatility, focusing on negative asset price volatility.
                  • Beta: Beta provides a measure of a fund's volatility relative to the overall market.
                  • Alpha: Alpha indicates the fund's ability to outperform the market benchmark on a risk-adjusted basis.
                  • T-Statistics and P-Values: These statistical measures test the significance of fund returns against benchmark returns, crucial for establishing the reliability of performance differences.
                  • Most Recent Total Holdings Value: This figure reflects the latest reported aggregate value of a fund's holdings.
                  • Number of Reporting Periods: The total number of quarters for which the fund has reported data.
                  • Number of Recently Reported Holdings: This is the count of holdings reported in the most recent filing period.

                  For the purpose of these calculations, we adopt a 2% annualized risk-free rate. The S&P 500 index, accessed via the ETF ticker SPY, serves as our benchmark. Historical adjusted closing prices are sourced from Yahoo Finance, enabling us to compute both beta and alpha for each fund with respect to this benchmark.

                  from scipy import stats


                  def get_time_factor(return_period=None):
                      if not return_period in ["daily", "monthly", "quarterly", "Q", "yearly"]:
                          raise ValueError(
                              'return_period must be one of: ["daily", "weekly", "monthly", "quarterly", "yearly"]'
                          )

                      if return_period == "daily":
                          return 252
                      elif return_period == "monthly":
                          return 12
                      elif return_period == "quarterly" or return_period == "Q":
                          return 4
                      elif return_period == "yearly":
                          return 1


                  @round_decorator(3)
                  def calculate_annualized_returns(fund_cover_pages):
                      quarter = fund_cover_pages["periodOfReport"].dt.quarter.values[-1]
                      period_mask = fund_cover_pages["periodOfReport"].dt.quarter == quarter
                      holding_values = fund_cover_pages[period_mask]
                      annual_returns = (1 + holding_values["holdingsValueStandardized"].pct_change()) ** (
                          1 / 1
                      ) - 1
                      periods = holding_values["periodOfReport"]

                      return pd.DataFrame({"annual_returns": annual_returns, "periods": periods}).dropna()


                  @round_decorator(3)
                  def calculate_beta(portfolio_returns, benchmark_returns):
                      if len(portfolio_returns) == 0:
                          return np.nan
                      covariance = np.cov(portfolio_returns, benchmark_returns)[0][1]
                      benchmark_variance = np.var(benchmark_returns)
                      beta = covariance / benchmark_variance
                      return beta


                  @round_decorator(3)
                  def calculate_alpha(
                      portfolio_returns, benchmark_returns, risk_free_rate, beta, return_period
                  ):
                      if len(portfolio_returns) == 0:
                          return np.nan
                      time_factor = get_time_factor(return_period)
                      portfolio_annual_return = np.mean(portfolio_returns) * time_factor
                      benchmark_annual_return = np.mean(benchmark_returns) * time_factor
                      alpha = portfolio_annual_return - (
                          risk_free_rate + beta * (benchmark_annual_return - risk_free_rate)
                      )
                      return alpha


                  @round_decorator(3)
                  def calculate_sharpe_ratio(portfolio_returns, risk_free_rate, return_period):
                      if len(portfolio_returns) == 0:
                          return np.nan
                      time_factor = get_time_factor(return_period)

                      annualized_return = np.mean(portfolio_returns) * time_factor
                      annualized_std = np.std(portfolio_returns) * np.sqrt(time_factor)

                      sharpe_ratio = (annualized_return - risk_free_rate) / annualized_std

                      return sharpe_ratio


                  @round_decorator(3)
                  def calculate_sortino_ratio(
                      portfolio_returns, minimum_acceptable_return, return_period
                  ):
                      if len(portfolio_returns) == 0:
                          return np.nan

                      time_factor = get_time_factor(return_period)

                      # convert annualized MAR to a daily/weekly/quarterly rate
                      normalized_mar = (1 + minimum_acceptable_return) ** (1 / time_factor) - 1
                      # difference between portfolio returns and the MAR
                      excess_returns = portfolio_returns - normalized_mar
                      negative_excess_returns = excess_returns[excess_returns < 0]

                      if len(negative_excess_returns) == 0:
                          return np.nan

                      average_return = np.mean(portfolio_returns)
                      downside_deviation = np.std(negative_excess_returns)

                      annualized_return = average_return * time_factor
                      annualized_downside_deviation = downside_deviation * np.sqrt(time_factor)

                      if annualized_downside_deviation == 0:
                          return np.nan

                      sortino_ratio = (annualized_return - normalized_mar) / annualized_downside_deviation

                      return sortino_ratio


                  def calculate_t_stat(portfolio_returns, benchmark_returns):
                      t_statistic, p_value = stats.ttest_ind(portfolio_returns, benchmark_returns)
                      return t_statistic, p_value
                  import yfinance as yf


                  def get_spy_data():
                      start_date = cover_pages["periodOfReport"].min() - pd.Timedelta(days=3)
                      end_date = cover_pages["periodOfReport"].max() + pd.Timedelta(days=3)

                      print("Start date:", start_date)
                      print("End date:", end_date)

                      spy_data = yf.download(
                          "SPY",
                          start=start_date,
                          end=end_date,
                      )

                      # check for each portfolio_values["date"].dt.date.unique() if it exists in spy_data.index
                      # if not, check if next day exists in spy_data.index. if not, check if next next day exists in spy_data.index
                      # add all dates that exist to dates_filter
                      dates_filter = []

                      for date in pd.to_datetime(cover_pages["periodOfReport"].dt.date.unique().tolist()):
                          if date in spy_data.index:
                              dates_filter.append(date)
                          else:
                              new_date = date - pd.Timedelta(days=1)
                              if new_date in spy_data.index:
                                  dates_filter.append(new_date)
                              else:
                                  new_date = date - pd.Timedelta(days=2)
                                  if new_date in spy_data.index:
                                      dates_filter.append(new_date)
                                  else:
                                      print(f"no date for {date}")

                      spy_data = spy_data[spy_data.index.isin(dates_filter)].copy()

                      spy_data.drop(columns=["Open", "High", "Low", "Close", "Volume"], inplace=True)

                      # standardize dates to that they all represent the end of the quarter
                      # e.g. "2013-06-28" -> "2013-06-30"
                      def to_quarter_end(date):
                          if date.month == 3:
                              return date.replace(day=31)
                          elif date.month == 6:
                              return date.replace(day=30)
                          elif date.month == 9:
                              return date.replace(day=30)
                          elif date.month == 12:
                              return date.replace(day=31)
                          else:
                              return date

                      new_index = pd.Series(spy_data.index).apply(to_quarter_end)

                      spy_data.set_index(new_index, inplace=True)

                      return spy_data


                  @memoize
                  def get_benchmark_returns(periods):
                      adj_close = spy_data[spy_data.index.isin(periods)]
                      returns = adj_close["Adj Close"].pct_change().dropna()
                      return returns


                  spy_data = get_spy_data()
                  print(f"Rows: {len(spy_data)}")
                  spy_data.head()
                  Start date: 2013-09-27 00:00:00
                  End date: 2023-10-03 00:00:00
                  [*********************100%%**********************] 1 of 1 completed
                  Rows: 41
                  Out:
                  Adj Close
                  Date
                  2013-09-30139.467758
                  2013-12-31154.146393
                  2014-03-31156.771637
                  2014-06-30164.859512
                  2014-09-30166.730240
                  risk_free_rate = 0.02  # annualized


                  def get_fund_performance_metrics(fund_cover_pages):
                      fund_cover_pages = fund_cover_pages.drop_duplicates(subset=["periodOfReport"])
                      fund_cover_pages = fund_cover_pages[
                          fund_cover_pages["holdingsValueStandardized"] > 0
                      ]
                      metrics = {}

                      if len(fund_cover_pages) < 1:
                          return None

                      last_row = fund_cover_pages.iloc[-1]
                      quarter = last_row["periodOfReport"].quarter
                      recent_year = last_row["periodOfReport"].year

                      metrics["assetValue"] = last_row["holdingsValueStandardized"].sum()
                      metrics["#holdings"] = last_row["numberHoldings"]
                      metrics["firstPeriod"] = fund_cover_pages.iloc[0]["periodOfReport"]
                      periods = fund_cover_pages["periodOfReport"].dt.date.unique().tolist()
                      metrics["#periods"] = len(periods)

                      if len(fund_cover_pages) <= 4:
                          return metrics

                      metrics["maxDrawdownQtrPct"] = round(
                          fund_cover_pages["holdingsValueStandardized"].pct_change().min() * 100, 2
                      )

                      row_5yr = fund_cover_pages[
                          (fund_cover_pages["periodOfReport"].dt.year == recent_year - 5)
                          & (fund_cover_pages["periodOfReport"].dt.quarter == quarter)
                      ]
                      row_3yr = fund_cover_pages[
                          (fund_cover_pages["periodOfReport"].dt.year == recent_year - 3)
                          & (fund_cover_pages["periodOfReport"].dt.quarter == quarter)
                      ]
                      row_1yr = fund_cover_pages[
                          (fund_cover_pages["periodOfReport"].dt.year == recent_year - 1)
                          & (fund_cover_pages["periodOfReport"].dt.quarter == quarter)
                      ]

                      quarterly_returns = (
                          fund_cover_pages["holdingsValueStandardized"].pct_change().dropna()
                      )
                      periods = fund_cover_pages["periodOfReport"].dt.date.unique().tolist()
                      benchmark_returns = get_benchmark_returns(periods)

                      if len(row_1yr) > 0:
                          if row_1yr["holdingsValueStandardized"].values[0] > 0:
                              pct_change_1yr = (
                                  last_row["holdingsValueStandardized"]
                                  - row_1yr["holdingsValueStandardized"].values[0]
                              ) / row_1yr["holdingsValueStandardized"].values[0]
                          else:
                              pct_change_1yr = 0

                          annualized_volatility = fund_cover_pages[
                              "holdingsValueStandardized"
                          ].pct_change().std() * np.sqrt(4)

                          beta = calculate_beta(quarterly_returns, benchmark_returns)

                          alpha = calculate_alpha(
                              quarterly_returns, benchmark_returns, risk_free_rate, beta, "Q"
                          )

                          sharpe_ratio = calculate_sharpe_ratio(quarterly_returns, risk_free_rate, "Q")
                          sortino_ratio = calculate_sortino_ratio(quarterly_returns, risk_free_rate, "Q")
                          t_stat_benchmark, p_value_benchmark = calculate_t_stat(
                              quarterly_returns, benchmark_returns
                          )

                          metrics["volYr"] = round(annualized_volatility, 2)
                          metrics["sharpe"] = sharpe_ratio
                          metrics["sortino"] = sortino_ratio
                          metrics["beta"] = beta
                          metrics["alpha"] = alpha
                          metrics["t-stat"] = round(t_stat_benchmark, 3)
                          metrics["p"] = round(p_value_benchmark, 3)
                          metrics["assets.pct_change_1yr"] = round(pct_change_1yr * 100, 2)

                      if len(row_3yr) > 0 and row_3yr["holdingsValueStandardized"].values[0] > 0:
                          pct_change_3yr = (
                              last_row["holdingsValueStandardized"]
                              - row_3yr["holdingsValueStandardized"].values[0]
                          ) / row_3yr["holdingsValueStandardized"].values[0]
                          metrics["assets.pct_change_3yr"] = round(pct_change_3yr * 100, 2)
                      else:
                          metrics["assets.pct_change_3yr"] = np.nan

                      if len(row_5yr) > 0 and row_5yr["holdingsValueStandardized"].values[0] > 0:
                          pct_change_5yr = (
                              last_row["holdingsValueStandardized"]
                              - row_5yr["holdingsValueStandardized"].values[0]
                          ) / row_5yr["holdingsValueStandardized"].values[0]
                          metrics["assets.pct_change_5yr"] = round(pct_change_5yr * 100, 2)
                      else:
                          metrics["assets.pct_change_5yr"] = np.nan

                      return metrics


                  print("Single fund performance metrics (CIK: 1759236)")
                  get_fund_performance_metrics(cover_pages[cover_pages["filerCik"] == 1759236])
                  Single fund performance metrics (CIK: 1759236)
                  Out:
                  {'assetValue': 324876323,
                   '#holdings': 111,
                   'firstPeriod': Timestamp('2019-03-31 00:00:00'),
                   '#periods': 19,
                   'maxDrawdownQtrPct': -19.87,
                   'volYr': 0.29,
                   'sharpe': 0.932,
                   'sortino': 1.784,
                   'beta': 1.046,
                   'alpha': 0.147,
                   't-stat': 0.924,
                   'p': 0.362,
                   'assets.pct_change_1yr': 30.9,
                   'assets.pct_change_3yr': 82.12,
                   'assets.pct_change_5yr': nan}
                  risk_free_rate = 0.02  # annualized


                  def get_performance_metrics(cover_pages):
                      performance_by_cik = {}

                      for cik in cover_pages["filerCik"].unique():
                          try:
                              metrics = get_fund_performance_metrics(
                                  cover_pages[cover_pages["filerCik"] == cik]
                              )
                              if metrics is not None:
                                  performance_by_cik[cik] = metrics
                          except Exception as e:
                              print(f"Error for cik {cik}: {e}")

                      performance_by_cik = pd.DataFrame.from_dict(
                          performance_by_cik, orient="index"
                      ).reset_index(names=["cik"])
                      return performance_by_cik
                  all_funds_metrics = get_performance_metrics(cover_pages)
                  active_funds_metrics = all_funds_metrics[all_funds_metrics["cik"].isin(active_ciks_wo_outliers)]
                  long_active_funds_mask = active_funds_metrics["firstPeriod"].dt.year <= 2018
                  positive_sharpe_mask = active_funds_metrics["sharpe"] > 0
                  positive_5yr_change_mask = active_funds_metrics["assets.pct_change_5yr"] > 0

                  best_performing = active_funds_metrics[
                      long_active_funds_mask & positive_sharpe_mask & positive_5yr_change_mask
                  ].copy()

                  best_performing.sort_values(by=["sharpe"], ascending=False, inplace=True)
                  best_performing.reset_index(drop=True, inplace=True)

                  print(f"Ranking of {len(best_performing):,} filtered funds active in 2023 ranked by their Sharpe ratio:")
                  best_performing.head(10)
                  Ranking of 2,047 filtered funds active in 2023 ranked by their Sharpe ratio:
                  Out:
                  cikassetValue#holdingsfirstPeriod#periodsmaxDrawdownQtrPctvolYrsharpesortinobetaalphat-statpassets.pct_change_1yrassets.pct_change_3yrassets.pct_change_5yr
                  01737109566888254315572018-03-3122-8.660.312.18112.2801.1630.5303.4030.00281.11249.451612.68
                  1169821822271184517022017-06-3026-8.980.182.0485.3720.5800.3072.4970.01618.26150.13463.99
                  28386187729545653382018-03-3123-2.770.201.95624.0620.8600.2982.3330.02540.71214.78561.58
                  335449761233110005122018-03-3122-10.910.221.8776.0700.5630.3302.1590.03722.87125.88418.69
                  41731061101691697717062018-03-3123-7.980.271.8396.7570.6230.4222.6630.01144.65317.551062.82
                  515402357494194968843502013-09-3041-15.360.151.8362.6570.8580.1802.4350.01729.6195.65178.02
                  6174407320703076272972017-03-3125-13.820.211.8295.2380.9110.2722.2000.0339.05129.71326.88
                  7148288067028660337982013-09-3041-14.240.221.7734.0590.5660.3173.2470.002108.81358.451103.50
                  811418027690388090932662013-09-3041-17.980.231.7223.4060.6800.3193.2690.00234.15106.95288.03
                  917216081646575000822017-12-3124-11.270.251.7044.8700.5720.3652.4250.01938.9297.45425.19

                  Visualizing Fund Performance Relative to the Benchmark

                  Building on the Sharpe ratio rankings from the previous analysis, we proceed to visualize the performance trajectories of the leading funds. The goal is to compare these funds' growth over time with the S&P 500 benchmark, offering a perspective on the merits of active management against a passive market strategy. This visualization may also provide insights into the advantages of cloning the investment strategies of high-performing funds.

                  We will focus on the top 15 funds with the strongest performance records. To ensure a fair comparison, the S&P 500 index's performance is normalized to match the initial value reported by each fund in the first quarter of their respective reporting period. This approach allows for a direct, relative performance assessment over the same timeframe.

                  def get_quarter(period):
                      if period.month == 3:
                          return "Q1"
                      elif period.month == 6:
                          return "Q2"
                      elif period.month == 9:
                          return "Q3"
                      elif period.month == 12:
                          return "Q4"
                      else:
                          return period


                  def format_period(period):
                      qtr = get_quarter(period)
                      if qtr == "Q3":
                          return qtr
                      if qtr != "Q1":
                          # return qtr
                          return ""
                      return str(period)[0:4] + " " + get_quarter(period)


                  def plot_assets_over_time(cover_pages_of_cik, ax=None, show_y_axis_label=True, title=None):
                      cik = cover_pages_of_cik.iloc[0]["filerCik"]

                      if ax is None:
                          fig, ax = plt.subplots(figsize=(6, 3))

                      ax.plot(
                          cover_pages_of_cik["periodOfReport"],
                          cover_pages_of_cik["holdingsValueStandardized"],
                          color="black",
                      )

                      periods = cover_pages_of_cik["periodOfReport"].dt.date.unique().tolist()
                      benchmark = spy_data[spy_data.index.isin(periods)].copy()

                      # scale benchmark to same scale as holdingsValueStandardized
                      benchmark["value_positions"] = (
                          benchmark["Adj Close"] / benchmark["Adj Close"].iloc[0]
                      ) * cover_pages_of_cik["holdingsValueStandardized"].values[0]

                      ax.plot(
                          benchmark.index,
                          benchmark["value_positions"],
                          color="blue",
                          linestyle="--",
                          alpha=0.5,
                      )

                      xticks = cover_pages_of_cik["periodOfReport"].dt.date.unique().tolist()

                      ax.set_xticks(xticks)
                      ax.set_xticklabels(xticks, rotation=90, fontsize=6)

                      ax.yaxis.grid(True, linestyle="--", which="major", color="grey", alpha=0.25)

                      ax.spines["top"].set_visible(False)
                      ax.spines["right"].set_visible(False)

                      if show_y_axis_label:
                          ax.set_ylabel("Asset Value (USD, Millions)")
                      ax.get_yaxis().set_major_formatter(
                          mtick.FuncFormatter(lambda x, p: format(int(x) / 1000000, ",.0f"))
                      )
                      # ax.set_title(f"CIK {cik} - Holdings Value over Time")
                      ax.set_title(f"CIK {cik}" if title is None else title)

                      y_min = ax.get_ylim()[0]

                      for year in cover_pages_of_cik["periodOfReport"].dt.year.unique():
                          x_value = str(year) + "-03-31"
                          x_value = pd.to_datetime(x_value, format="%Y-%m-%d")
                          if (x_value in cover_pages_of_cik["periodOfReport"].values) == False:
                              continue
                          y_value = cover_pages_of_cik[cover_pages_of_cik["periodOfReport"] == x_value][
                              "holdingsValueStandardized"
                          ].values[0]
                          ax.vlines(
                              x=x_value,
                              ymin=y_min,
                              ymax=y_value,
                              color="grey",
                              alpha=0.25,
                              zorder=-100,
                              linestyle="--",
                              linewidth=0.8,
                          )

                      ax.axvspan(
                          pd.to_datetime("2020-03-31", format="%Y-%m-%d"),
                          pd.to_datetime("2021-12-31", format="%Y-%m-%d"),
                          alpha=0.1,
                          color="red",
                          zorder=-100,
                      )

                      ax.text(
                          pd.to_datetime("2020-12-31", format="%Y-%m-%d"),
                          ax.get_ylim()[1] * 0.95,
                          "COVID",
                          horizontalalignment="center",
                          verticalalignment="center",
                          color="red",
                          alpha=0.5,
                      )

                      pct_growth_5yr = 0
                      pct_growth_3yr = 0
                      pct_growth_1yr = 0

                      fund_metrics = all_funds_metrics[all_funds_metrics["cik"] == cik]

                      try:
                          pct_growth_5yr = "{:.2f}%".format(fund_metrics.iloc[0]["assets.pct_change_5yr"])
                      except:
                          pass
                      try:
                          pct_growth_3yr = "{:.2f}%".format(fund_metrics.iloc[0]["assets.pct_change_3yr"])
                      except:
                          pass
                      try:
                          pct_growth_1yr = "{:.2f}%".format(fund_metrics.iloc[0]["assets.pct_change_1yr"])
                      except:
                          pass

                      # add text to figure
                      # get x-anchor by getting first x value of xticks
                      x_anchor = xticks[0]
                      ax.text(
                          x_anchor,
                          ax.get_ylim()[1] * 0.90,
                          f"5 yr return: {pct_growth_5yr}
                  3 yr return: {pct_growth_3yr}
                  1 yr return: {pct_growth_1yr}",
                          horizontalalignment="left",
                          verticalalignment="top",
                          color="black",
                          alpha=0.8,
                      )

                      number_of_13F_filings = cover_pages_of_cik["accessionNo"].nunique()

                      caption = (
                          f"Figure: Quarterly Value of all Holdings of Fund {cik}. The figure shows the total dollar value of all holdings per quarterly reporting period (end of March, June, September and December). "
                          + f"A total of {format(number_of_13F_filings, ',')} 13F filings are included in the figure. "
                      )

                      caption = "
                  ".join(textwrap.wrap(caption, 70))

                      # replace x labels with year and quarter
                      ax.set_xticklabels(
                          [format_period(period) for period in xticks],
                          rotation=90,
                          ha="center",
                          fontsize=8,
                      )


                  cik = best_performing.iloc[0]["cik"]
                  print("CIK:", cik)
                  plot_assets_over_time(cover_pages[cover_pages["filerCik"] == cik], title=f"Holding Values per Quarter (CIK: {cik})")
                  CIK: 1737109
                  Holding Values per Quarter (CIK: 1737109)
                  fig, axs = plt.subplots(5, 3, figsize=(9, 12))

                  for i in range(0, 15):
                      cik = best_performing.iloc[i]['cik']
                      row = i // 3
                      col = i % 3
                      show_y_axis_label = i % 3 == 0
                      plot_assets_over_time(cover_pages[cover_pages["filerCik"] == cik], ax=axs[row, col], show_y_axis_label=show_y_axis_label)

                  fig.suptitle("Top 15 Investment Managers by Total Asset Growth
                  ", fontsize=12)
                  plt.tight_layout()
                  plt.show()
                  Top 15 Investment Managers by Total Asset Growth

                  Alpha Distribution Across Funds

                  In our final analysis, we turn our attention to the distribution of alpha among all funds over the past decade. Alpha, a key indicator of excess return on a risk-adjusted basis, serves as a measure of a fund's ability to beat the market. We will categorize funds into different alpha groups and assess the performance distribution within these segments. The objective is to determine the prevalence and degree of market outperformance among funds. This examination seeks to identify the frequency and quantity of funds that consistently achieve returns above market benchmarks.

                  cik_filter = [1525947, 1801507, 1476534, 96223, 1532618, 1761013, 1781948, 1720777, 1820680, 714142]

                  data_to_plot = all_funds_metrics[
                      (all_funds_metrics['alpha'].isna() == False)
                      & (all_funds_metrics["alpha"] < 3)
                      & (all_funds_metrics["alpha"] > -3)
                      & (all_funds_metrics["#periods"] > 5)
                      & (all_funds_metrics["#holdings"] > 5)
                      & (all_funds_metrics["assets.pct_change_3yr"].isna() == False)
                      & (all_funds_metrics["maxDrawdownQtrPct"] > -70)
                  ]

                  data_to_plot.sort_values(by='alpha', ascending=False).head(10)
                  Out:
                  cikassetValue#holdingsfirstPeriod#periodsmaxDrawdownQtrPctvolYrsharpesortinobetaalphat-statpassets.pct_change_1yrassets.pct_change_3yrassets.pct_change_5yr
                  607517610132590385683018212019-03-3119-67.957.160.4917.6306.7042.6850.9790.335227.48383.91NaN
                  428816334464394156411852015-03-3135-26.176.540.38112.867-0.4272.4971.0510.2975.9736.95115.05
                  5616172831911886552405372018-03-3122-20.235.540.48220.6861.0652.4891.0310.3098.0892.43123.49
                  647017809853841330002732019-09-3016-54.053.730.6807.8460.8762.3441.2080.23752.44100.44NaN
                  194513047621688600000822013-09-3014-4.765.010.60175.4937.0042.2411.0070.3241030.221195.71NaN
                  6311176764036476308819522019-03-3116-48.192.090.8476.114-3.4032.1711.4510.158-0.92417.45NaN
                  65211766904186003058530452019-12-3116-53.213.110.7544.6681.1592.1541.3470.18953.16293.46NaN
                  572517220534841953793112018-03-3122-58.374.080.5236.3720.7921.9921.1050.27613.89-73.27220.43
                  217513647423627977394535497322013-09-3039-21.618.330.34121.1997.7001.9380.9950.32323.3933.8856.98
                  26681420473380528670902772013-09-3041-15.574.820.40119.7170.1141.8971.1840.24024.1051.59625.65
                  fig, ax = plt.subplots(figsize=(6, 3))


                  x = data_to_plot["alpha"]

                  ax.hist(x, bins=100, color="black", alpha=0.9)

                  ax.set_yscale("log")

                  ax.set_xlabel("Alpha")
                  ax.set_ylabel("Number of Funds")

                  ax.axhline(y=100, color="grey", alpha=0.5, linestyle="--", linewidth=0.8, zorder=-100)
                  ax.axhline(y=10, color="grey", alpha=0.5, linestyle="--", linewidth=0.8, zorder=-100)
                  ax.axvline(x=0, color="red", alpha=0.5, linestyle="--", linewidth=1.8, zorder=100)

                  ax.spines["top"].set_visible(False)
                  ax.spines["right"].set_visible(False)

                  ax.set_title("Alpha Distribution of All Funds (2013 - 2023)")

                  plt.tight_layout()
                  plt.show()

                  print("Alpha Distribution of All Funds (2013 - 2023) - Descriptive Stats")
                  pd.DataFrame(data_to_plot["alpha"].describe().round(2))
                  Alpha Distribution of All Funds (2013 - 2023)
                  Alpha Distribution of All Funds (2013 - 2023) - Descriptive Stats
                  Out:
                  alpha
                  count4022.00
                  mean0.04
                  std0.26
                  min-0.85
                  25%-0.08
                  50%0.00
                  75%0.11
                  max2.68

                  Our analysis focuses on a subset of 4,022 funds from the original pool of 13,000, spanning the years 2013 to 2023. This subset was selected based on criteria including a minimum of five reported quarters, possession of over five holdings, and a calculable three-year asset growth or decline rate.

                  Our findings reveal that 50% of the funds active within this timeframe have an Alpha of 0.0. This suggests a balanced distribution where half of the funds did not outperform the S&P 500, while the other half managed to surpass it. Remarkably, 25% of these funds achieved market outperformance by more than 11 percentage points, registering an Alpha of 0.11 or higher. Conversely, the bottom 25% of funds underperformed the market by at least 8%, indicated by an Alpha of -0.08 or lower.

                  Conclusion

                  In 2023, the asset management industry witnessed a historical peak in activity, with over 7,000 funds filing a 13F, marking the highest number since records began in 1998. This growth trajectory, however, was not linear, with significant fluctuations influenced by macroeconomic events such as the dot-com bubble, the Global Financial Crisis (GFC), and the COVID-19 pandemic. These periods saw an initial surge in new fund entries, followed by a notable increase in exits, reflecting the industry's sensitivity to economic cycles.

                  An analysis of fund longevity and survival rates revealed that only 29% of the 2,256 funds active from 1998 to 2000 remained in operation by 2023. On average, funds maintained activity for around 7.22 years, though the standard deviation of 6.53 years indicates a broad dispersion in operational durations. Specifically, 25% of funds ceased operations within two years, while the median lifespan was five years. Notably, a quarter of all funds achieved a lifespan exceeding ten years.

                  The aggregate asset growth rates remained relatively flat from 2013 to early 2020, followed by a dramatic increase from early 2020 to the end of 2021. This period, marked by the COVID-19 pandemic and subsequent economic stimulus measures, saw holding values double from 30trilliontoover60 trillion, only to decline as part of the 2022 market correction.

                  By calculating various performance metrics such as the Sharpe and Sortino ratios, along with alpha and beta, and analyzing the distribution of alpha, we discerned a division in fund performance over the last decade. Half of the funds underperformed the market, while the remaining half outperformed. Specifically, the bottom 25% of funds underperformed the market by more than 8%, whereas the top 25% exceeded market performance by more than 11% on a risk-adjusted basis.

                  Footer

                  Products

                  • EDGAR Filing Search API
                  • Full-Text Search API
                  • Real-Time Filing Stream API
                  • Filing Download & PDF Generator API
                  • XBRL-to-JSON Converter
                  • 10-K/10-Q/8-K Item Extractor
                  • Investment Adviser & Form ADV API
                  • Insider Trading Data - Form 3, 4, 5
                  • Restricted Sales Notifications - Form 144
                  • Institutional Holdings - Form 13F
                  • Form N-PORT API - Investment Company Holdings
                  • Form N-PX API - Proxy Voting Records
                  • Form 13D/13G API
                  • Form S-1/424B4 - IPOs, Debt & Rights Offerings
                  • Form C - Crowdfunding Offerings
                  • Form D - Private Placements & Exempt Offerings
                  • Regulation A Offering Statements API
                  • Changes in Auditors & Accountants
                  • Non-Reliance on Prior Financial Statements
                  • Executive Compensation Data API
                  • Directors & Board Members Data
                  • Company Subsidiaries Database
                  • Outstanding Shares & Public Float
                  • SEC Enforcement Actions
                  • Accounting & Auditing Enforcement Releases (AAERs)
                  • SRO Filings
                  • CIK, CUSIP, Ticker Mapping

                  General

                  • Pricing
                  • Features
                  • Supported Filings
                  • EDGAR Filing Statistics

                  Account

                  • Sign Up - Start Free Trial
                  • Log In
                  • Forgot Password

                  Developers

                  • API Sandbox
                  • Documentation
                  • Resources & Tutorials
                  • Python API SDK
                  • Node.js API SDK

                  Legal

                  • Terms of Service
                  • Privacy Policy

                  Legal

                  • Terms of Service
                  • Privacy Policy

                  SEC API

                  © 2025 sec-api.io by Data2Value GmbH. All rights reserved.

                  SEC® and EDGAR® are registered trademarks of the U.S. Securities and Exchange Commission (SEC).

                  EDGAR is the Electronic Data Gathering, Analysis, and Retrieval system operated by the SEC.

                  sec-api.io and Data2Value GmbH are independent of, and not affiliated with, sponsored by, or endorsed by the U.S. Securities and Exchange Commission.

                  sec-api.io is classified under SIC code 7375 (Information Retrieval Services), providing on-demand access to structured data and online information services.