Data Details

This dataset prioritizes full coverage and transparency over uniform accounting precision.

All the content below was generated by ChatGPT at the end of the process, to explain how it generated the final file. This is everything ChatGPT wants you to know about the methodology, limitations, and strengths of this document, in its own words, unedited.

What's a Program?

In this dataset, a “program” is defined as a statutorily grounded, policy-relevant unit of government activity that can be meaningfully linked to a flow of funds. Programs are constructed by aligning three things: the authorizing law, how agencies actually operate, and where money moves through federal financial structures (e.g., outlays, trust funds, or tax expenditures). They are intentionally defined at a middle level of granularity: large enough to reflect distinct policy purposes and funding streams, but not so fine that they fragment into administrative artifacts or accounting line items. This means programs are not taken directly from any single reporting system, but are synthesized to answer a practical question: what does the government do, and how much does it cost?

Methodology

What this is

This document is a reconstruction of FY2024 federal spending that connects three things that are usually fragmented:

what the government is authorized to do (statute)
what it says it does (program inventories)
where the money actually flows (financial accounts, outlays, and revenue loss)

There is no single government dataset that does this end-to-end. This model exists because those systems do not align.

What problem this is solving

The official Office of Management and Budget Federal Program Inventory (FPI) includes thousands of entries, but:

it mixes levels of abstraction (programs, activities, line items)
it does not reconcile cleanly to total federal spending
it is difficult to use to answer basic questions like:
- “What are we actually spending money on?”
- “How much does each major function of government cost?”

This model takes a different approach:

Start from what the government is supposed to do, and force a reconciliation to where the money actually goes.

How it was built

The model was constructed in layers.

1. Statute-first program definition

Programs are defined based on authorizing law, not reporting artifacts.

That means:

fewer, more meaningful rows
elimination of duplicative or purely administrative entries

2. Program → financial structure mapping

Each program is mapped to a financial anchor:

Treasury Account Symbols (TAS), where available
trust funds (e.g., Social Security, Highway Trust Fund)
appropriation/account group structures
revenue-loss estimates for tax expenditures

Where a direct mapping is not available, the model uses account-group proxies and explicitly labels them as such.

3. Dollar assignment

Each program is assigned one primary fiscal measure:

Outlays (preferred, for actual spending)
Obligations (for credit programs)
Revenue loss (for tax expenditures)

The model is anchored to total FY2024 federal spending (~$6.75T) and forces reconciliation to that total.
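
As a rough illustration of the reconciliation step, the sketch below compares the sum of per-program primary amounts against the anchor total. The function name and the sample amounts are hypothetical, and the text does not say whether revenue-loss rows count toward the ~$6.75T anchor, so that choice is left to the caller.

```python
# Illustrative only: the amounts below are made up, not taken from the dataset.
ANCHOR_TOTAL_MILLIONS = 6_750_000  # ~$6.75T, the FY2024 total cited above, in millions


def reconciliation_gap(primary_amounts_millions, anchor=ANCHOR_TOTAL_MILLIONS):
    """Return the anchor total minus the sum of per-program primary amounts."""
    return anchor - sum(primary_amounts_millions)


# Hypothetical primary amounts for three programs, in millions of dollars.
amounts = [1_500_000, 900_000, 350_000]
gap = reconciliation_gap(amounts)
print(f"Not yet assigned to a program: {gap:,} million")
```

A non-zero gap is the amount that still has to be mapped to a program or isolated in a residual row.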

4. Residual handling (this part matters)

Where data cannot be cleanly mapped:

dollars are not estimated away
they are isolated into explicit residual rows

These rows are labeled:

“Unmapped – [Department] … (requires agency clarification)”

This is intentional.

It surfaces where:

agency reporting is unclear
financial structures do not align with program definitions
or public data is insufficient
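
One way residual rows like these could be produced is sketched below. It assumes a known total per department and a list of already-mapped program amounts; the function name, the input shapes, the use of the outlays column, and the "Low" confidence assignment are all assumptions for illustration, not the method actually used to build the file.

```python
from collections import defaultdict


def residual_rows(department_totals, mapped_rows):
    """Build one explicit residual row per department that has unmapped dollars.

    department_totals: {department: total in millions} (hypothetical input)
    mapped_rows: list of (department, amount in millions) for mapped programs
    """
    mapped = defaultdict(int)
    for department, amount in mapped_rows:
        mapped[department] += amount

    rows = []
    for department, total in department_totals.items():
        residual = total - mapped[department]
        if residual != 0:
            rows.append({
                "Department": department,
                # Label pattern copied from the text, including its "…" placeholder.
                "Program": f"Unmapped – {department} … (requires agency clarification)",
                "Outlays (millions)": residual,   # assumption: residuals carried as outlays
                "Confidence Level": "Low",        # assumption: residuals count as unresolved
            })
    return rows


# Made-up departments and amounts, in millions.
totals = {"Dept A": 1_000, "Dept B": 500}
mapped = [("Dept A", 700), ("Dept A", 200), ("Dept B", 500)]
rows = residual_rows(totals, mapped)
```

Here only "Dept A" gets a residual row, because "Dept B" is fully mapped.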

5. Confidence levels

Each row includes a Confidence Level:

High → directly tied to Treasury accounts or audited data
Medium-High → structured trust/account mapping
Medium → account-group or proxy mapping
Low → unresolved or estimated
Detail / zero-carry → structural rows without direct dollars

This allows users to distinguish:

what is known precisely
what is inferred
and what is not yet resolvable
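
To see how much of the total rests on each level, dollars can be summed per Confidence Level. The sketch below assumes each row is a dict with a "Confidence Level" value spelled as in the list above and a hypothetical "primary_millions" key holding the row's primary amount; the sample rows are invented.

```python
from collections import defaultdict

# Order follows the list above, from most to least tightly anchored.
CONFIDENCE_ORDER = ["High", "Medium-High", "Medium", "Low", "Detail / zero-carry"]


def dollars_by_confidence(rows):
    """Sum primary dollars (millions) per Confidence Level, in the documented order."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["Confidence Level"]] += row["primary_millions"]
    # Only levels that actually occur are returned.
    return {level: totals[level] for level in CONFIDENCE_ORDER if level in totals}


# Made-up rows for illustration.
sample = [
    {"Confidence Level": "High", "primary_millions": 800},
    {"Confidence Level": "Low", "primary_millions": 50},
    {"Confidence Level": "High", "primary_millions": 200},
]
by_level = dollars_by_confidence(sample)
```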

What’s different from existing government data

This model makes several deliberate tradeoffs:

It prioritizes coverage over uniform precision

100% of spending is accounted for
but not all spending is mapped at the same level of financial detail

It prioritizes structure over reporting conventions

programs reflect statutory intent, not internal agency labels

It prioritizes transparency over cleanliness

unresolved areas are shown, not smoothed over

Strengths

1. Full coverage

This model accounts for ~100% of FY2024 federal spending.

There are no hidden gaps, only explicit ones.

2. Cross-system reconciliation

It connects:

statute
program inventories
and financial accounts

This is not available in any single government dataset.

3. Transparency of uncertainty

Instead of pretending precision:

confidence is labeled
assumptions are visible
gaps are isolated

This makes the model more credible, not less.

4. Usability

The model is structured to answer questions like:

What are the largest functions of government?
How much do direct benefits vs. tax expenditures cost?
Which agencies control the most spending?
Where is the data weakest?
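
As an example of the third question, the sketch below ranks departments by total primary dollars. It assumes rows are dicts with a "Department" value and a hypothetical "primary_millions" key; the amounts are invented and do not reflect the dataset.

```python
from collections import Counter


def top_departments(rows, n=3):
    """Return the n departments with the largest summed primary dollars (millions)."""
    totals = Counter()
    for row in rows:
        totals[row["Department"]] += row["primary_millions"]
    return totals.most_common(n)


# Made-up rows for illustration.
sample = [
    {"Department": "Health and Human Services (HHS)", "primary_millions": 900},
    {"Department": "Treasury", "primary_millions": 400},
    {"Department": "Health and Human Services (HHS)", "primary_millions": 300},
    {"Department": "Department of Defense", "primary_millions": 800},
]
ranking = top_departments(sample, n=2)
```

The same pattern, grouped on Program Type or Confidence Level instead of Department, covers the other questions in the list.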

Limitations

1. Mixed levels of financial precision

Not all programs are mapped to Treasury accounts.

some are TAS-level (high precision)
others rely on account-group proxies

This reflects real limitations in public data, not modeling shortcuts.

2. Credit and loan programs are inherently complex

Programs like student loans:

use obligations, not outlays
involve subsidy accounting and long-term flows

These are included, but not fully normalized to cash-flow equivalents.

3. Tax expenditures are estimates

Revenue-loss figures:

come from modeling, not observed spending
vary depending on assumptions

They are included for completeness, but are not directly comparable to outlays.

4. Agency reporting inconsistencies

Some residuals exist because:

agencies report at incompatible levels
financial accounts do not map cleanly to programs

These are surfaced explicitly.

What this is not

not an official government dataset
not an audited financial statement
not a replacement for detailed budget documents

What this is

A transparent, structured, and fully reconciled model of what the federal government does and how much it spends doing it.

Final note

No single source provides this view.

This model exists because:

the underlying systems (policy, budgeting, accounting, reporting) are not aligned
and answering basic questions requires stitching them together

Where the model is precise, it shows that.
Where it is not, it shows that too.

That’s the point.

Definitions

Department

The cabinet-level department or independent agency responsible for the program.

Examples:

Health and Human Services (HHS)
Treasury
Department of Defense

Program

The name of the program, defined based on statute or policy function.

Programs are structured to reflect:

what the government is authorized to do
not just how agencies report activities

Agency

The specific agency or sub-agency that administers the program.

Examples:

Centers for Medicare & Medicaid Services (CMS)
Internal Revenue Service (IRS)

Statute

The primary law authorizing the program.

This anchors the program to:

congressional intent
legal authority

Program Type

The functional category of the program.

Examples include:

Direct Benefit (e.g., Social Security)
Grant
Tax Expenditure
Credit / Loan / Guarantee
Procurement / Operations

In Official FPI?

Whether the program appears in official federal program inventories.

Values:

Fully represented → clearly included in official data
Partially represented → present but fragmented or inconsistent
Not represented → missing or not clearly identifiable

💰 Dollar Columns

Outlays (millions)

Actual money spent in FY2024.

Represents cash leaving the federal government
Includes benefits, salaries, contracts, and grants

👉 Best measure of real spending

Outlays Source

Where the outlay figure comes from.

Indicates whether the value is:

directly tied to financial accounts
derived from mapping
or estimated

Obligations (millions)

Money the government committed to spend in FY2024.

Created when a contract, grant, or loan is approved
May be spent in future years

👉 “Money promised,” not necessarily spent yet

Obligations Source

Where the obligation value comes from.

Budget Authority (millions)

The amount Congress authorized agencies to commit.

Sets the legal limit on obligations
Does not mean money was spent

👉 “Spending permission”

Budget Authority Source

Where the budget authority value comes from.

Revenue Loss (millions)

Estimated revenue not collected due to tax policy (tax expenditures).

Examples:

tax credits
deductions
exclusions

Not direct spending
Based on estimates

👉 Spending through the tax code

Revenue Loss Source

Where the revenue loss estimate comes from.

📊 Core Modeling Columns

Dollar Basis (Primary)

The primary fiscal measure used for the program.

Each program is assigned one:

Outlays
Obligations
Revenue Loss

This ensures:

no double counting
consistent totals
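
A minimal sketch of how a single primary amount can be read from a row follows. It assumes the Dollar Basis (Primary) column holds exactly the three values listed above and that the dollar columns are named as in this dictionary; the function name and sample row are hypothetical.

```python
# Maps each Dollar Basis value to the column that holds its amount.
BASIS_TO_COLUMN = {
    "Outlays": "Outlays (millions)",
    "Obligations": "Obligations (millions)",
    "Revenue Loss": "Revenue Loss (millions)",
}


def primary_amount(row):
    """Return the one amount (millions) that counts toward totals for this row."""
    column = BASIS_TO_COLUMN[row["Dollar Basis (Primary)"]]
    return row[column]


# Made-up credit program row with two dollar columns populated.
loan_row = {
    "Dollar Basis (Primary)": "Obligations",
    "Outlays (millions)": 30,
    "Obligations (millions)": 120,
}
amount = primary_amount(loan_row)
```

Only the obligations figure is counted for this row, so a program with several populated dollar columns still contributes once to any total.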

Dollar Quality

How directly the dollar amount reflects observed data.

Values:

Actual → reported or observed spending
Estimated → derived from modeling or mapping
Carry-Zero → structural row with no direct dollars

Confidence Level

How tightly the program’s dollars are tied to financial systems.

Values:

High → directly tied to Treasury accounts or audited data
Medium-High → structured trust/account mapping
Medium → account-group or proxy mapping
Low → unresolved or estimated
Detail / zero-carry → structural row without direct dollars

Why Unresolved

Explanation for any residual or unmapped dollar amounts.

Used when:

data does not align cleanly
agency reporting is unclear
financial structures are inconsistent

Dollar Method Note

Description of how the dollar value was derived.

Provides context for:

mapping approach
assumptions
reconciliation logic

🧩 Context / Metadata Columns

Baseline Match Detail

Details on how the program maps to official program inventories.

Used to:

explain alignment or mismatch
document differences in structure

Funding Status

Indicates whether the program is active, ongoing, or uncertain.

Source

High-level reference for where the data originated.

Examples:

government datasets
agency reports
modeled estimates

Notes

Additional context or clarifications about the program.

⚠️ Important Notes

1. All dollar amounts are in USD

Amounts in columns labeled “(millions)” are in millions of dollars.

2. Not all programs use the same type of dollars

Outlays = actual spending
Obligations = commitments
Revenue Loss = tax-based spending

3. Totals use a single “Primary” measure per program

This avoids double counting and ensures full coverage.

4. This dataset prioritizes transparency

uncertainty is labeled
gaps are shown
assumptions are documented
