Event Detection in Financial Markets

Turning noisy news flow into market-moving events - and using those events to model next-day stock direction.

NLP + Event Mining | Time-Series Signals | Clustering + Retrieval | Sentiment + Directional Prediction | ReutersBiz (2016-2020) + Large-cap stocks

What this project does

Most finance headlines are noise. This system finds the days that look like real market events, extracts the dominant event from each day's headlines, and predicts next-day direction from how similar historical events moved the stock.

Thesis-style pipeline
1) Detect event dates

Identify days with abnormal return behavior (volatility-aware) to focus learning on impactful windows.

2) Mine the event from headlines

Group same-day headlines into clusters and keep the dominant cluster as the event candidate for that day.

3) Predict direction

Match the new event to similar historical events (semantic similarity) and infer the expected direction using sentiment + neighbors.

Outcome: cleaner training signal | Outcome: interpretable timelines | Outcome: higher accuracy on predicted days

Methodology (high-level)

A simple pipeline: find event-like days from price behavior, extract the dominant event from headlines, then predict direction by matching to similar historical events.

Non-technical
Flowchart: Prices + Headlines -> Events -> Prediction
Replace with a clean diagram: EDD -> Event mining (clustering) -> Event matching -> Sentiment-weighted direction.
[placeholder]
Step 1 - Find event dates

Use price time-series to detect days that behave unusually (relative to recent and longer-term volatility). These are likely event days.

Input: open prices | Output: event-date candidates
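
A minimal sketch of Step 1, not the thesis implementation. It assumes the abnormality score is the absolute daily return divided by the larger of a short-window and a long-window trailing volatility. The function names, window lengths (5 and 20 days), and threshold (3.0) are illustrative assumptions, and the signed-return component of the real score is left out.

```python
from statistics import pstdev


def abnormality_scores(open_prices, short_win=5, long_win=20):
    """Return (returns, scores); scores[t] rates how unusual returns[t] is."""
    # Simple open-to-open returns; returns[t] is the move from day t to day t + 1.
    returns = [open_prices[i] / open_prices[i - 1] - 1.0
               for i in range(1, len(open_prices))]
    scores = [0.0] * len(returns)
    for t in range(long_win, len(returns)):
        # Trailing windows exclude the current day, so a jump cannot mask itself.
        short_vol = pstdev(returns[t - short_win:t])
        long_vol = pstdev(returns[t - long_win:t])
        vol = max(short_vol, long_vol)  # assumption: be conservative across horizons
        scores[t] = abs(returns[t]) / vol if vol > 0 else 0.0
    return returns, scores


def event_dates(open_prices, threshold=3.0, short_win=5, long_win=20):
    """Indices into open_prices of days whose score reaches the threshold."""
    _, scores = abnormality_scores(open_prices, short_win, long_win)
    return [t + 1 for t, s in enumerate(scores) if s >= threshold]
```

Taking the larger of the two volatilities is one simple way to keep a calm recent week from making an ordinary move look abnormal; other combinations are equally plausible.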
Step 2 - Extract the event

On each event day, group similar headlines and keep the dominant cluster as the day's event representation (summarized in the thesis version).

Input: headlines | Output: event cluster + summary
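
A rough sketch of Step 2 under simplifying assumptions: word-overlap (Jaccard) similarity stands in for the semantic similarity used in the thesis version, and a greedy single-pass grouping stands in for whatever clustering algorithm the project actually uses. All names and the 0.3 similarity cutoff are hypothetical, and no summarization is attempted.

```python
def tokens(headline):
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(".,:;!?'\"$@").lower() for w in headline.split()} - {""}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_headlines(headlines, min_sim=0.3):
    """Greedy clustering: join the most similar existing cluster or start a new one."""
    toks = [tokens(h) for h in headlines]
    clusters = []  # each cluster is a list of indices into headlines
    for i, t in enumerate(toks):
        best, best_sim = None, min_sim
        for c in clusters:
            sim = max(jaccard(t, toks[j]) for j in c)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


def dominant_event(headlines, min_sim=0.3):
    """Headlines of the largest cluster; smaller clusters are treated as outliers."""
    clusters = cluster_headlines(headlines, min_sim)
    if not clusters:
        return []
    return [headlines[i] for i in max(clusters, key=len)]
```

With made-up input such as three headlines about one acquisition and one about oil prices, the three related headlines form the dominant cluster and the oil headline is dropped.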
Step 3 - Predict direction

Match the new event to similar historical events and infer the expected return direction using sentiment and historical outcomes.

Output: up / down (or abstain) | With explanation
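
One possible way to turn Step 3 into code, offered as a sketch only. It assumes each retrieved neighbor carries a similarity score and a past direction (+1 or -1), that sentiment is a single score in [-1, 1], and that the system abstains when no neighbor is close enough or when the blended score is near zero. The function name, weights, and thresholds are all hypothetical.

```python
def predict_direction(neighbors, sentiment, k=3, min_sim=0.5,
                      margin=0.2, sentiment_weight=0.5):
    """neighbors: list of (similarity, past_direction), past_direction in {+1, -1}.

    Returns (label, score) where label is 'up', 'down', or 'abstain'.
    """
    # Keep only sufficiently similar historical events, most similar first.
    top = sorted((n for n in neighbors if n[0] >= min_sim), reverse=True)[:k]
    if not top:
        return "abstain", 0.0
    # Similarity-weighted vote of past outcomes, in [-1, 1].
    total = sum(sim for sim, _ in top)
    vote = sum(sim * direction for sim, direction in top) / total
    # Blend the historical vote with the sentiment of the new event.
    score = (1 - sentiment_weight) * vote + sentiment_weight * sentiment
    if abs(score) < margin:
        return "abstain", score
    return ("up" if score > 0 else "down"), score
```

Returning the score alongside the label keeps the explanation simple: the neighbors in `top` and the sentiment value are exactly what produced the call.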
Technical notes (optional)
  • Event-date detection (EDD): a volatility-aware abnormality score combining short/long horizons and signed/absolute returns.
  • Event mining: clustering headlines per day to isolate the dominant event while filtering outliers.
  • Thesis-style representation: semantic similarity + automatic cluster summarization for robust matching and cleaner event text.
  • Evaluation: standard accuracy on predicted days + a coverage-aware view using RIS.

Tesla walkthrough (visual story)

A single-company view makes the pipeline intuitive: most days have headlines, few days behave like events, and even fewer produce confident predictions.

Interactive-friendly
Calendar: News Days vs Event Dates vs Predictions
Overlay 3 layers: (1) days with Tesla headlines, (2) detected event dates, (3) days where the model issues a prediction.
[placeholder]
Timeline: Price/Return with Event Markers
Line chart + vertical markers for news days / event dates / predicted days (with up/down arrows).
[placeholder]
EDD score over time
Abnormality score with threshold; highlighted selected dates.
[placeholder]
Clustering snapshot (same-day headlines)
Show cluster sizes + representative summary per cluster.
[placeholder]
Nearest historical events (semantic match)
Event candidate -> top-k similar historical clusters, with summaries and past outcomes.
[placeholder]
Modern thesis upgrade (high level)

Compared to the original paper implementation, the thesis version uses semantic similarity and automatic cluster summaries to make the event representation cleaner and matching more robust - while keeping the same core "event mining -> prediction" storyline.

Semantic similarity (embeddings) | Cluster summarization (LLM/NLG) | Finance sentiment (modern)
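
To make the semantic-matching idea concrete, here is a small sketch of top-k retrieval by cosine similarity. It assumes every event has already been turned into a fixed-length embedding vector by some model (which model is not specified here), and the tuple layout and function names are illustrative assumptions. The vectors in any example would be toy values, not real embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_events(query_vec, history, k=2):
    """history: list of (event_id, vector, past_direction).

    Returns the k most similar events as (similarity, event_id, past_direction),
    most similar first.
    """
    scored = [(cosine(query_vec, vec), event_id, direction)
              for event_id, vec, direction in history]
    return sorted(scored, reverse=True)[:k]
```

Because matching happens on vectors and not on shared words, two events described with completely different wording can still land next to each other.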

Results snapshot

A quick, portfolio-friendly summary of what improved when I focused training on event dates and event-level representations.

Directional accuracy (overall)
0.65
Using event-date filtering (EDD) to build the historical event set.
Baseline comparison
RSH: 0.58 | Market Model: 0.35
Same test window; event filtering reduces noise and helps the predictor.
Coverage-aware behavior
Predicts selectively
The model is designed to abstain on noisy days and focus on days that resemble historical events.
Coverage vs Accuracy chart
Plot accuracy on predicted days vs prediction coverage (fraction of days predicted).
[placeholder]
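
The data behind such a chart could be computed along these lines. This is a sketch that assumes each test day has a confidence value and a flag for whether the predicted direction was correct; sweeping a confidence cutoff then gives one (coverage, accuracy) point per cutoff. The function name and input layout are hypothetical.

```python
def coverage_accuracy_curve(scored, thresholds):
    """scored: list of (confidence, is_correct) pairs, one per test day.

    Returns a list of (threshold, coverage, accuracy); accuracy is None
    when no day passes the threshold.
    """
    points = []
    for th in thresholds:
        # Days below the cutoff count as abstentions.
        kept = [ok for conf, ok in scored if conf >= th]
        coverage = len(kept) / len(scored) if scored else 0.0
        accuracy = sum(kept) / len(kept) if kept else None
        points.append((th, coverage, accuracy))
    return points
```

Plotting coverage on the x-axis and accuracy on the y-axis shows how much reliability is gained for each slice of days the model declines to predict.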

Evaluation (Accuracy + RIS)

Selective prediction changes how you should evaluate a model. I report standard accuracy on predicted days, and also use a coverage-aware metric.

Current (standard) reporting
  • Directional accuracy computed on the subset of days where the system issues a prediction.
  • Number of predictions reported alongside accuracy to make the "selective" behavior explicit.
Accuracy on predicted days | #predictions / time window
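
The standard reporting above amounts to a few lines of arithmetic. The sketch below assumes abstentions are represented as `None` and directions as the strings 'up' and 'down'; the function name and return keys are illustrative.

```python
def selective_metrics(predictions, actuals):
    """predictions: 'up', 'down', or None (abstain); actuals: 'up' or 'down'."""
    # Only days with an issued prediction enter the accuracy calculation.
    issued = [(p, a) for p, a in zip(predictions, actuals) if p is not None]
    n = len(issued)
    coverage = n / len(predictions) if predictions else 0.0
    accuracy = sum(p == a for p, a in issued) / n if n else None
    return {"n_predictions": n, "coverage": coverage, "accuracy": accuracy}
```

For example, with five test days, three issued predictions, and two of them correct, accuracy on predicted days is 2/3 while coverage is 3/5. Reporting both numbers together keeps a high accuracy on very few days from being misread.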
RIS (coverage-aware)

RIS is used to evaluate models that predict on a subset of samples, balancing correctness with selectivity. This makes it easier to compare systems that intentionally abstain under uncertainty.

Selective prediction | Comparable across coverages
Placeholder: add a short RIS definition / formula + link to details.

Reuters key events vs model-detected event headlines (Tesla)

A fun, human-readable check: compared against Reuters' curated Tesla event timeline, the detected events align closely in meaning, often with completely different wording.

Story mode
Date | Reuters | Model
June 21, 2016
Timeline
Reuters summary
curated

Tesla announces its plan to buy SolarCity, a solar energy system company in which Musk holds a stake, for $2.9 billion.

Detected headline
matched meaning

breaking: tesla makes offer to acquire solar company solarcity

July 28, 2017
Timeline
Reuters summary
curated

Musk hands over the first Model 3s to employee buyers, announcing over half a million advance reservations for the new electric sedan starting at $35,000. Musk anticipates "at least six months of manufacturing hell."

Detected headline
matched meaning

tesla drops after musk warns of 'manufacturing hell': @randewich $tsla

Nov. 1, 2017
Timeline
Reuters summary
curated

Tesla pushes back its target to build 5,000 Model 3s per week to the first quarter of 2018 from an original target of December due to production bottlenecks.

Detected headline
matched meaning

tesla reports biggest-ever quarterly loss, model 3 delays $tsla

April 3, 2018
Timeline
Reuters summary
curated

Musk says Tesla will not need to raise more capital in 2018. Shares jump as much as 6.9 percent.

Detected headline
matched meaning

tesla says no need for capital raise as model 3 output rises

May 2, 2018
Timeline
Reuters summary
curated

Tesla shares slump after Musk cuts off analysts on a conference call asking about company finances, criticizing their "boring, bonehead" questions. Tesla loses $2 billion in stock market value.

Detected headline
matched meaning

the price of cutting off analysts? for tesla, it's $2 billion

Aug. 1, 2018
Timeline
Reuters summary
curated

Tesla reports its biggest-ever loss but shares rise on Musk's claims of positive cash flow and profit in the second half of 2018, and signs of more consistent Model 3 production.

Detected headline
matched meaning

happening now: tesla reports second-quarter revenue of $4 billion vs. $2.79 billion reported last year. tesla shares down 1.9 percent in choppy trading after the bell following results $tsla

Aug. 7, 2018
Timeline
Reuters summary
curated

Musk surprises investors by using Twitter to announce he is considering taking Tesla private at $420 per share, adding "Funding secured."

Detected headline
matched meaning

buyers stampede to tesla stock after elon musk tweets he is considering taking the company private. @alexandriasage reports:

Note: this section intentionally uses the paper-style "headline representative" to preserve the original comparison format.

Demo (event matching + explanation)

A recruiter-friendly interaction: paste a headline and see the closest historical events, the cluster summary, and the expected direction.

Add later
Interactive demo placeholder
Input: headline(s) -> Output: event summary, top similar historical events, predicted sign, confidence/abstain.
[placeholder]

Where I'd take it next

A few high-leverage upgrades that make the system more robust and more deployable.

Adaptive event-date detection

Automatically adapt short/long windows (and thresholds) to market regime so "abnormal" stays meaningful across volatility shifts.

Event retrieval + calibration

Tighten semantic retrieval, and add calibrated confidence so the system can trade off coverage vs reliability explicitly.

Resources

Quick links for readers and reviewers.

Tip: a short GIF of the calendar/timeline interaction here would likely increase time-on-page.