§ Product

Cross-Lingual News & Filings Intelligence

Intraday sentiment scores and event extractions for tickers, sectors, and macro themes, sourced from non-English news, regulatory filings, and social media, covering the languages where your existing stack goes dark.

Engagement
6–10 week build · monthly model retraining
Built for
Macro PMs · Event-driven PMs · Multi-strat research leads
§ Problem

Every fund already has English-language news sentiment from RavenPack, Bloomberg, or AlphaSense. The edge is in the Chinese filings, the Brazilian regulator notices, and the Japanese local press story picked up three days before it reaches the wire.

What this is

A sentiment and event-extraction layer for the languages your English-language stack can't read. Three layers:

  • Source ingestion. Per-language sources: business press, regulatory filings (CSRC, KRX, HKEX, etc.), and selected social media, all wired into a normalized pipeline with deduplication and source-level reliability scoring.
  • Per-language NER + scoring. Language-specific named-entity recognition for the names that matter to your universe — tickers, executives, regulators, brands. Sentiment and event classification with models fine-tuned per language, not translated-then-scored.
  • Delivery. Intraday scores and event-time alerts via API. Snippets surfaced alongside scores so an analyst can audit the source.
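The ingestion step above (deduplication plus source-level reliability scoring) can be sketched as follows. This is a minimal illustration under stated assumptions, not the production pipeline: the `Item` record, the `fingerprint`, `dedupe`, and `entity_score` helpers, the exact-match hashing, and the reliability-weighted mean are all hypothetical choices made for the example, and sentiment is assumed to have been scored upstream in the source language.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    source: str       # hypothetical source id, e.g. "press" or "social"
    entity: str       # resolved entity, e.g. a ticker
    text: str         # original-language text
    sentiment: float  # score in [-1, 1], assumed computed upstream


def fingerprint(text: str) -> str:
    """Hash of lowercased, whitespace-normalized text (exact-duplicate key)."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(items, reliability):
    """Keep one item per (entity, fingerprint): the most reliable source's copy."""
    best = {}
    for item in items:
        key = (item.entity, fingerprint(item.text))
        kept = best.get(key)
        if kept is None or reliability.get(item.source, 0.0) > reliability.get(kept.source, 0.0):
            best[key] = item
    return list(best.values())


def entity_score(items, reliability, entity):
    """Reliability-weighted mean sentiment for one entity; None if no usable weight."""
    rows = [i for i in dedupe(items, reliability) if i.entity == entity]
    total = sum(reliability.get(i.source, 0.0) for i in rows)
    if total == 0:
        return None
    return sum(reliability.get(i.source, 0.0) * i.sentiment for i in rows) / total
```

In this sketch a syndicated copy of the same story collapses to a single item before scoring, so a widely reposted article does not count several times, and a low-reliability source moves the entity score less than a high-reliability one. Near-duplicate detection (rewrites, partial quotes) would need something stronger than an exact hash.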

How it's built

For each supported language: an NER model fine-tuned on financial-domain text, an event-classification head, a sentiment head, and a source-reliability scoring layer. Backbone: one transformer per language (a multilingual XLM-RoBERTa-class model for most, a language-specific model where the data justifies it). Inference is served behind a thin FastAPI layer. Backtest infra runs against an internal labeled set, which is extended in each engagement to cover the fund's universe.
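One way to picture the per-language layout described above is a registry that routes each document to its own language bundle. The sketch below is an assumption-laden illustration: `LanguageBundle`, `Scored`, `UnsupportedLanguage`, `score_document`, and `demo_bundle` are hypothetical names, and the model calls are stand-in functions rather than real fine-tuned models. It is written in plain Python so it runs without a web framework; the serving layer would wrap a function like `score_document`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class LanguageBundle:
    """The three per-language components: NER, event head, sentiment head."""
    ner: Callable[[str], List[str]]
    classify_event: Callable[[str], str]
    score_sentiment: Callable[[str], float]


@dataclass(frozen=True)
class Scored:
    language: str
    entities: List[str]
    event: str
    sentiment: float
    reliability: float  # source-level reliability, looked up per source


class UnsupportedLanguage(KeyError):
    """Raised instead of silently falling back to another language's models."""


def score_document(text: str, language: str, source: str,
                   bundles: Dict[str, LanguageBundle],
                   source_reliability: Dict[str, float]) -> Scored:
    bundle = bundles.get(language)
    if bundle is None:
        raise UnsupportedLanguage(language)
    return Scored(
        language=language,
        entities=bundle.ner(text),
        event=bundle.classify_event(text),
        sentiment=bundle.score_sentiment(text),
        reliability=source_reliability.get(source, 0.0),
    )


# Stand-in bundle for illustration only; real bundles would wrap trained models.
demo_bundle = LanguageBundle(
    ner=lambda text: ["ACME"] if "ACME" in text else [],
    classify_event=lambda text: "earnings",
    score_sentiment=lambda text: 0.4,
)
```

Failing loudly on an unsupported language is a deliberate choice in this sketch: routing Korean text through a Japanese bundle would return a score that looks valid and is not.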

What you get

  • The source list curated for your universe.
  • A scoring API — intraday sentiment, event classification, entity-level scores.
  • Snippets surfaced alongside scores (translated for the analyst, scored in source).
  • Backtest infra tied to your existing research stack.
  • A monthly model-retraining schedule and the labeled-data process behind it.
§ How we engage

Engagement is shape, not list.

Length and price are functions of the data and the destination. The shape below is the typical engagement.

Length
6–10 week build · monthly model retraining

Scoped during the discovery call against the actual data and the operation it integrates with.

Lead
Bogdan

Principal engineer. Architecture and most code ship through one keyboard.

Cadence
Async, weekly

Written updates in between; calls when the decision needs the room.

Bar
Production

Async correctness, capacity under burst, observability at every boundary.

§ Questions

What buyers ask about this one.

  • We have RavenPack. Why would we add this?

    RavenPack is English-first. The serious edge sits where everyone else is dark — Chinese consumer brands picked up on Weibo, Brazilian commodity exporters covered in Portuguese press, Korean conglomerate filings in Korean. The product is built to cover those gaps, not duplicate the English coverage you already pay for.

  • Which languages are supported?

    Chinese (Simplified + Traditional), Japanese, Korean, Portuguese, Spanish, Russian, German, French — depth varies. The engagement starts by picking the two or three you actually trade in and going deep, rather than spreading thin. The named-entity recognition layer is the bottleneck for each new language, not the translation.

  • What's the source list?

    Curated per language. The first engagement defines it against the universe you trade. For Chinese: top business press (Caixin, 21st Century Business Herald), regulatory filings (CSRC, HKEX), and Weibo and Zhihu signal where useful. Per-language equivalents elsewhere. The curation matters more than the breadth.

  • How do you handle translation drift between source and signal?

    We don't translate then score. We score in the source language with a language-specific model, then surface the relevant snippet (translated) alongside the score so the analyst can audit. Translation as evidence, not as input.

  • Pricing?

    Scoped against language depth and source-list size. Discovery call covers both.
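The answer on translation drift above (score in the source language, translate only for audit) comes down to an ordering constraint, which can be sketched as follows. The names `AuditedScore` and `score_then_translate`, the `snippet_chars` parameter, and the two callables are hypothetical; the scorer and translator stand in for whatever models an engagement would actually use.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AuditedScore:
    score: float                       # computed from source-language text only
    source_snippet: str                # original-language evidence
    translated_snippet: Optional[str]  # analyst-facing; None if translation failed


def score_then_translate(source_text: str,
                         score_fn: Callable[[str], float],
                         translate_fn: Callable[[str], str],
                         snippet_chars: int = 200) -> AuditedScore:
    # The scorer never sees translated text, so translation quality cannot
    # leak into the signal.
    score = score_fn(source_text)
    snippet = source_text[:snippet_chars]
    try:
        translated = translate_fn(snippet)
    except Exception:
        # A failed translation degrades the audit view, not the score.
        translated = None
    return AuditedScore(score, snippet, translated)
```

Because the translation is produced after the score and never fed back into it, swapping or upgrading the translator changes what the analyst reads and leaves historical scores untouched.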

§ The next step

If the deliverable matches the gap, the next step is one call.

We'll scope length and price against your data and the operation it integrates with. No retainer, no fishing.

Bogdan and team · async-first · OP—2026