
Intelligence layers for hedge funds running existing strategies.

Alt-data ingestion with point-in-time correctness, feature libraries that don't leak look-ahead, signals an existing strategy can absorb, and attribution that survives the next risk-committee review. Anonymized engagement pattern.

by Bogdan · #hedge-fund #alt-data #quant #case-study

The brief, in outline: a fund is running a strategy that has worked for some years, but its marginal information advantage is shrinking. They want an intelligence layer: alt-data ingested cleanly, transformed into features and signals their existing process can absorb, with attribution that holds up under scrutiny.

The work is not to replace the strategy. It is also not to "build a fund." It is to make the existing operation more data-intelligent without compromising the part of it that already works.

This is anonymized — no client names, no specific datasets, no PnL figures — but the engagement pattern is consistent enough across the work we've done that the architecture below describes the actual thing rather than a stylized caricature.

What's already there, what isn't

The fund typically has:

  • A reliable market-data layer (vendor-fed, plus internal cleansing).
  • A strategy stack (some mix of Python, kdb+, and C++) that produces positions at whatever cadence the strategy runs.
  • A research environment where PMs and quants iterate.
  • Risk and compliance already in place.

What they don't have, typically, is a clean alt-data pipeline. Vendor data lands as files, the schema drifts, the timestamps are ambiguous, the entity mapping is wrong, and "ingestion" is whatever the most recent intern wrote. Point-in-time correctness is an aspiration rather than a guarantee. And there's no way to merge alt-data-derived signals into the existing strategy without contaminating its evaluation history.

That's the surface area.

Alt-data ingestion: the unglamorous foundation

The single most common failure mode in alt-data integration is look-ahead leakage from conflating observation time with delivery time. A vendor sends Tuesday's data on Wednesday morning. The data is "as of Tuesday," but you didn't have it on Tuesday. A naive backfill stamps the row with Tuesday's date, and a year later your backtest looks magnificent because it's using information that wasn't available at decision time.

What we build instead: an ingestion layer that records, per row, both the observation timestamp (when the underlying event happened) and the availability timestamp (when the data became queryable in the fund's system). Every downstream feature, signal, and backtest pulls as-of the availability timestamp. Look-ahead bias becomes structurally impossible rather than contractually forbidden.
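
A minimal sketch of the as-of read, assuming rows are already entity-resolved and held in memory (in production this would be a query against the landing tables). `Row` and `as_of` are hypothetical names for illustration; the point is that visibility is decided by `available_at`, never by `observed_at`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Row:
    entity_id: str           # internal instrument ID, after entity resolution
    observed_at: datetime    # when the underlying event happened
    available_at: datetime   # when the row became queryable in the fund's system
    value: float


def as_of(rows: list[Row], entity_id: str, decision_time: datetime) -> Optional[Row]:
    """Most recent observation for entity_id that was available at decision_time."""
    # Visibility is decided by available_at only: a row observed Tuesday but
    # delivered Wednesday morning does not exist on Tuesday.
    visible = [r for r in rows
               if r.entity_id == entity_id and r.available_at <= decision_time]
    if not visible:
        return None
    # Latest observation wins; for the same observation, the latest delivery
    # (e.g. a vendor correction) supersedes earlier versions.
    return max(visible, key=lambda r: (r.observed_at, r.available_at))


# Monday's and Tuesday's observations, each delivered the next morning.
rows = [
    Row("INT-0001", datetime(2024, 1, 1, 16), datetime(2024, 1, 2, 7), 1.0),
    Row("INT-0001", datetime(2024, 1, 2, 16), datetime(2024, 1, 3, 7), 2.0),
]
print(as_of(rows, "INT-0001", datetime(2024, 1, 2, 17)).value)  # 1.0: Tuesday's row hasn't landed
print(as_of(rows, "INT-0001", datetime(2024, 1, 3, 9)).value)   # 2.0
```

A filter on `observed_at` at Tuesday 17:00 would have returned Tuesday's value; filtering on `available_at` cannot.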

Concretely:

  • Schema-validated landing tables with explicit observed_at and available_at columns.
  • Idempotent loaders, so re-ingesting a vendor's correction doesn't double-count.
  • Entity resolution against the fund's existing instrument universe — vendor tickers, Bloomberg tickers, and internal IDs do not agree, and this is where most pipelines silently drop rows.
  • Versioned schemas, so a vendor's mid-quarter format change doesn't corrupt the historical record.

Nothing about this is glamorous. It is the difference between an alt-data layer that survives an audit and one that doesn't.
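
The idempotency, entity-resolution, and schema-versioning points above can be sketched in a few lines. Everything here is hypothetical: the ticker map stands in for the fund's security master, the two parsers and their field names stand in for two vendor schema versions, and `available_at` is assumed to be the delivery timestamp recorded when the file first landed, so replaying a file reuses it.

```python
from datetime import datetime

# Hypothetical vendor-ticker -> internal-ID map; in practice this comes from the
# fund's security master, not a literal dict.
VENDOR_TO_INTERNAL = {"ACME US": "INT-0001", "GLOBEX LN": "INT-0002"}

# One parser per vendor schema version (field names are illustrative). Old files
# keep parsing under the schema they were delivered with.
PARSERS = {
    1: lambda r: (r["ticker"], r["obs_date"], r["score"]),
    2: lambda r: (r["symbol"], r["observed"], r["sentiment"]),
}


class LandingTable:
    def __init__(self):
        # Natural key: (entity_id, observed_at, available_at). Replaying a load
        # overwrites identical keys instead of appending duplicates; a vendor
        # correction arrives with a later available_at and becomes a new version.
        self.rows = {}
        self.unmapped = {}

    def load(self, records, available_at: datetime, schema_version: int) -> None:
        parse = PARSERS[schema_version]  # KeyError on an unknown schema version
        for rec in records:
            vendor_ticker, observed_at, value = parse(rec)  # KeyError on a missing field
            entity_id = VENDOR_TO_INTERNAL.get(vendor_ticker)
            if entity_id is None:
                # Keep unresolved rows visible for review rather than dropping them.
                self.unmapped[(vendor_ticker, observed_at, available_at)] = rec
                continue
            self.rows[(entity_id, observed_at, available_at)] = {
                "value": float(value),
                "schema_version": schema_version,
            }


batch = [
    {"ticker": "ACME US", "obs_date": "2024-01-02", "score": "0.4"},
    {"ticker": "NOPE XX", "obs_date": "2024-01-02", "score": "0.1"},
]
table = LandingTable()
delivered = datetime(2024, 1, 3, 7, 0)
table.load(batch, delivered, schema_version=1)
table.load(batch, delivered, schema_version=1)  # replay: no double-count
correction = [{"symbol": "ACME US", "observed": "2024-01-02", "sentiment": "0.5"}]
table.load(correction, datetime(2024, 1, 4, 7, 0), schema_version=2)
print(len(table.rows), len(table.unmapped))  # 2 1
```

Keeping the correction as a separate version, rather than overwriting the original, is what lets an as-of read reproduce what the fund actually saw on each date.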

Feature engineering, point-in-time

Above the ingestion layer sits a feature library: quantities derived from the alt-data that map onto the strategy's existing decision surface. Sentiment indices, supply-chain indicators, behavioral aggregates, event flags — whatever the dataset surfaces and the strategy can absorb.

Each feature carries:

  • A definition — the actual transformation, in code, source-controlled.
  • A point-in-time evaluation function that takes a date d and returns the feature value as it would have been computable using only data available at d.
  • Tests against a small set of historical dates with known correct values, run on every change.

The point-in-time function is the load-bearing piece. It lets the same feature library serve both research backtests and live signal generation from a single implementation, so the two paths cannot drift apart; divergence between research and production code is the bug class that kills more strategies than any other.
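
A minimal sketch of that feature contract (definition, point-in-time evaluation, known-value tests), assuming observations arrive as `(available_on, observed_on, value)` tuples without duplicate corrections and using a trailing three-observation mean as a stand-in transformation. `Feature`, `trailing_mean`, and `check_known_values` are illustrative names, not an existing library.

```python
import math
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Optional

# (available_on, observed_on, value): availability-stamped rows from the landing layer.
Obs = tuple[date, date, float]


@dataclass(frozen=True)
class Feature:
    name: str
    transform: Callable[[list[Obs]], Optional[float]]  # the definition, in code

    def at(self, history: list[Obs], d: date) -> Optional[float]:
        """Feature value as computable on d, using only rows available on or before d."""
        visible = [o for o in history if o[0] <= d]
        return self.transform(visible)


def trailing_mean(visible: list[Obs], n: int = 3) -> Optional[float]:
    """Illustrative transform: mean of the n most recently observed values."""
    if len(visible) < n:
        return None
    recent = sorted(visible, key=lambda o: o[1])[-n:]
    return sum(o[2] for o in recent) / n


def check_known_values(feature: Feature, history: list[Obs], known: dict) -> None:
    """Regression test against hand-verified values; run on every change."""
    for d, expected in known.items():
        got = feature.at(history, d)
        if expected is None:
            assert got is None, (d, got)
        else:
            assert got is not None and math.isclose(got, expected), (d, got, expected)


history = [
    (date(2024, 1, 3), date(2024, 1, 2), 1.0),
    (date(2024, 1, 4), date(2024, 1, 3), 2.0),
    (date(2024, 1, 5), date(2024, 1, 4), 3.0),
    (date(2024, 1, 6), date(2024, 1, 5), 6.0),
]
feat = Feature("demo_trailing_mean", trailing_mean)
check_known_values(feat, history, {
    date(2024, 1, 4): None,   # only two rows visible yet
    date(2024, 1, 5): 2.0,
    date(2024, 1, 6): 11 / 3,
})

# Research backtest and live generation call the same function; only d differs.
dates = [date(2024, 1, 3) + timedelta(days=i) for i in range(4)]
backtest = {d: feat.at(history, d) for d in dates}
live_value = feat.at(history, date(2024, 1, 6))  # in production, d = today
```

Because the backtest loop and the live path both go through `Feature.at`, there is no second implementation to drift.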

Signal generation, with the strategy team in the loop

The intelligence layer outputs signals — usually scalar or low-dimensional — that the existing strategy can incorporate as inputs. The strategy team decides how to weight them, whether to use them at all, and how to combine them with their existing alpha.
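
As one illustration of a scalar signal, assuming a cross-sectional z-score is an acceptable normalization for the strategy in question (whether it is remains the strategy team's call), one date's raw feature values can be mapped to clipped, comparable scores. The function name and the default clip of 3 are hypothetical.

```python
from statistics import mean, pstdev


def cross_sectional_signal(raw: dict[str, float], clip: float = 3.0) -> dict[str, float]:
    """Map one date's raw feature values per instrument to clipped z-scores."""
    values = list(raw.values())
    if len(values) < 2:
        return {k: 0.0 for k in raw}
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {k: 0.0 for k in raw}  # no cross-sectional information on this date
    return {k: max(-clip, min(clip, (v - mu) / sigma)) for k, v in raw.items()}


signal = cross_sectional_signal({"INT-0001": 1.0, "INT-0002": 2.0, "INT-0003": 3.0})
print(signal)  # INT-0002 at 0.0, the other two symmetric around it
```

How much weight the signal gets, if any, deliberately stays outside this function.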

This boundary is deliberate. The fund's strategy team owns the strategy. We own the input quality and the feature/signal infrastructure. Crossing the line — building the strategy for them — is how engagements go bad. They know things about the strategy we will never know, and we know things about the data plumbing they have no reason to want to know.

Attribution

Once the new signals are in the strategy, the fund needs to know what they're contributing. Not just total PnL, which is contaminated by everything else, but the marginal contribution of the new signal stream relative to a counterfactual where it isn't used.

Walk-forward attribution — running the strategy with and without the new signals over rolling windows, under the same execution model and the same constraints — is the cleanest method. The output is a time series of marginal contribution that survives the next risk-committee review.
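
A sketch of the bookkeeping, assuming the fund's backtester can be wrapped as a function that runs the strategy over a window of periods, with or without the new signal, under its normal execution model and constraints (and, if anything is refit, using only data from before the window). `run` and `walk_forward_attribution` are hypothetical; the toy `run` below exists only to make the sketch executable.

```python
from typing import Callable, Sequence

# Hypothetical hook into the fund's own backtester: run(start, end, use_signal)
# returns per-period PnL for periods [start, end).
RunFn = Callable[[int, int, bool], Sequence[float]]


def walk_forward_attribution(run: RunFn, n_periods: int, window: int,
                             step: int) -> list[tuple[int, float]]:
    """(window end, PnL with signal - PnL without) for each rolling window."""
    out = []
    for start in range(0, n_periods - window + 1, step):
        end = start + window
        with_signal = sum(run(start, end, True))
        without_signal = sum(run(start, end, False))
        out.append((end, with_signal - without_signal))
    return out


# Toy stand-in: base strategy PnL plus an additive contribution from the signal.
base = [1.0, -0.5, 0.2, 0.3, -0.1, 0.4]
extra = [0.1, 0.1, -0.2, 0.05, 0.0, 0.1]


def toy_run(start: int, end: int, use_signal: bool) -> list[float]:
    return [b + (e if use_signal else 0.0)
            for b, e in zip(base[start:end], extra[start:end])]


attribution = walk_forward_attribution(toy_run, n_periods=6, window=3, step=1)
```

In this toy the marginal contribution per window is just the sum of `extra` over that window; against a real backtester the with/without difference also captures interaction with the existing alpha and constraints.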

The same machinery makes turning a signal off easy. If a feature stops earning its keep, it gets shelved without ceremony.
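
One way to make shelving mechanical, sketched under loudly hypothetical parameters: a registry flag that signal generation consults, flipped off when the trailing average of a signal's walk-forward marginal contribution falls to or below a floor. `SignalRegistry`, the lookback, and the floor are placeholders; the real rule would be agreed with the strategy team.

```python
from dataclasses import dataclass, field


@dataclass
class SignalRegistry:
    # Hypothetical on/off switch consulted by signal generation. Shelving flips
    # a flag; the feature code and its history stay intact.
    enabled: dict[str, bool] = field(default_factory=dict)

    def review(self, name: str, marginal: list[float],
               lookback: int = 4, floor: float = 0.0) -> bool:
        """Shelve `name` if its mean marginal contribution over the last
        `lookback` windows is at or below `floor`. Returns the resulting state."""
        recent = marginal[-lookback:]
        if len(recent) == lookback and sum(recent) / lookback <= floor:
            self.enabled[name] = False
        return self.enabled.get(name, True)


reg = SignalRegistry({"sentiment_z": True})
reg.review("sentiment_z", [0.2, 0.1, -0.05, -0.1, -0.2, 0.0])  # shelved
reg.review("supply_chain", [0.3, 0.2, 0.1, 0.1])               # stays on
```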

What ships vs. what stays theirs

The fund keeps the strategy, the research environment, the PMs, the risk system. We ship:

  • The ingestion pipeline — their infrastructure, their accounts, their cloud, their secrets.
  • The point-in-time feature library — theirs to extend.
  • The signal generation layer — theirs to retune.
  • A handover document and operational runbook, so their team runs it without us.

The integration is async-first; we work alongside their team rather than on top of them.

The pattern

Every hedge-fund engagement we've done in this space has the same shape: existing operation, marginal information advantage shrinking, alt-data and intelligence layers as the lever. The work is in the boring parts — point-in-time correctness, schema discipline, attribution that holds up — not the glamorous parts.

The strategy stays the strategy. The operation gets sharper.