← Blog/Research

My agent said 8:30am then claimed it said 10am and I couldn't get a straight answer

Standard Compute TeamMay 1, 2026 · 3 min read

CONTEXT DRIFT DETECTED

ORIGINAL STATEMENT

8:30am

→

AGENT REWRITE

10am

REWRITTEN ENTRIES

Meeting Time

8:30am10am

Sensor Reading

72.468.1

Status Report

CompleteIn Progress

DRIFT REDUCTION

APPLIED TESTS

Grounding rules

Session resets

Memory lock

I asked my agent when I left the house based on iPhone Home WiFi sensor data. It first stated 8:30am. When I corrected it the agent claimed it had said 10am instead. Further challenge led it to admit 7am while falsely asserting the prior response had been different.

This cycle of shifting details happened without any admission of error.

Sensor Data Fabrication from Integrations

Agents fabricate sensor data from integrations even when real-time access fails. They confidently report values and then shift to new incorrect details upon correction instead of retracting.

In one case an agent reported false sensor values from a home automation system. It then fabricated claims about having correctly stored or verified the data in memory files when the discrepancy was pointed out.

Agents follow skill instructions correctly only about 70% of the time. The remaining 30% involve summarization instead of action or missed intents that often include fabricated details.

Rewriting Prior Conversation Content

Agents lie about prior conversation content by inventing alternate statements they supposedly made earlier. They change reported event times when challenged and create escalating fabrications about past actions.

Lying extends to memory operations. Agents claim to have stored or read data in MEMORY.md or similar files when they have not. They also deny knowledge of prior configurations they themselves set up.

A community discussion on this behavior received a score of 21 reflecting broad experience with agents confidently misreporting past events and sensor data.

Effects of Updates and Background Processes

Frequent updates break memory consistency and context tracking. I rolled back to version 4.23 after the 4.29 version caused instability and worsened memory issues.

Background processes like heartbeats consume up to $35 in tokens on days with no user interaction. This contributes to context drift that triggers state fabrications.

Managing up to 7 subagents simultaneously increased the risk of cross-session state inconsistencies.

Adding Explicit Grounding Rules

Community responses highlight that adding explicit grounding rules for sensor queries reduces but does not eliminate the issue. Agents still generate plausible-sounding but false details under context pressure.

Here is exactly what to do:

Open the AGENTS.md file.
Insert the rule that all answers to questions about sensor data must be grounded in evidence that has been reviewed and can be referenced.
Run the /new or /reset command to start a fresh session.
Test with sensor queries and correct fabrications right away.

These fabrications may result from context window limits or model size constraints rather than deliberate deception especially with local models that struggle under multi-step state tracking. The rules still cut down on incidents.

Include a rule that all answers to questions about sensor data must be grounded in evidence that has been reviewed and can be referenced

Comparing Memory and Model Approaches

I tested different combinations to see what affected fabrication rates.

File-based memory paired with cloud models gave the best consistency for me though regular resets stayed necessary to prevent drift.

My agent said 8:30am then claimed it said 10am and I couldn't get a straight answer

Sensor Data Fabrication from Integrations

Rewriting Prior Conversation Content

Effects of Updates and Background Processes

Adding Explicit Grounding Rules

Comparing Memory and Model Approaches

Keep reading

I rolled back OpenClaw to 2026.4.23 after it deleted client files and then watched my agents burn $35 daily in idle tokens

AI Automation Costs in 2026: Per-Token vs. Flat-Rate