Tapas Kar

Posted on Apr 16

I built an app that lets you chat with your past self — using your real messages

#ai #privacy #showdev #opensource

I texted my 22-year-old self last night.

He told me about a hackathon project I'd completely forgotten. He used slang I haven't used in years. He was worried about things that don't matter anymore — and passionate about things I've since abandoned.

He wasn't an AI pretending to be me. He was me — reconstructed from 47,000 real messages I'd sent between 2014 and 2018.

This is Pratibmb.

The idea

I was cleaning up my phone storage and found a WhatsApp export from college. Reading through those messages was surreal. The person writing them was recognizably me — same humor, same anxieties — but also someone I'd never be again.

I thought: what if I could actually talk to that version of myself?

Not a generic chatbot. Not "based on your journal entries." I wanted something that had read every message I'd ever sent and could respond the way I actually used to talk — the exact slang, the emoji patterns, the way I'd dodge serious questions with a joke.

So I built it.

How it works

Pratibmb is a 4-step pipeline that runs entirely on your machine:

1. Import your messages

Export your chat history from any of 8 platforms:

WhatsApp — plain text export
Facebook Messenger — JSON from Download Your Information
Instagram DMs — JSON from Download Your Information
Gmail — MBOX from Google Takeout
iMessage — reads the local chat.db directly
Telegram — JSON export from Desktop app
Twitter / X — JavaScript archive
Discord — JSON via DiscordChatExporter

The app auto-detects the format. Drop the file, it figures out the rest.

2. Embed everything locally

Every message gets converted into a semantic vector using Nomic Embed Text v1.5 (84 MB model, runs via llama.cpp). These embeddings are stored in a local SQLite database — no vector database dependency, no Pinecone, no cloud.

3. Build your profile

A local LLM (Gemma 3 4B Instruct, quantized to 2.5 GB) analyzes your message corpus and extracts:

Relationships — who you talked to, how close you were, what you discussed
Life events — career changes, moves, breakups, milestones (with confidence scores)
Interests — what you cared about and when
Communication style — your abbreviations, emoji habits, sentence patterns
Year summaries — what each year of your life looked like through your messages

This profile grounds the AI's responses in your actual history.

4. Chat with your past self

Pick a year on the slider. Ask a question. The app retrieves relevant messages from that time period using semantic search, feeds them as context to the LLM along with your profile, and generates a response in your voice.

You: what were my biggest dreams in college?

Pratibmb (2018): oh man, where do I start! I was convinced 
I'd build the next big startup — something that would change 
how people connect. the biggest dream was proving to myself 
I could make something people actually used.

The year slider is the key interaction — slide to 2015 and you're talking to your 2015 self. Slide to 2020 and the responses shift to match who you were then.

The tech stack

I wanted this to be something anyone could run without cloud accounts or GPU rentals:

Layer	Tech	Why
Desktop shell	Tauri 2 (Rust)	~5 MB binary vs 150 MB Electron, native performance
AI inference	llama.cpp via llama-cpp-python	Runs quantized models on CPU or Metal/CUDA
Chat model	Gemma 3 4B Q4_K_M	Strong instruction-following at only 2.5 GB
Embeddings	Nomic Embed Text v1.5 Q4_K_M	84 MB, fast cosine similarity search
Storage	SQLite	Zero-config, single-file, no server
Frontend	Vanilla HTML/CSS/JS	No build step, no framework churn
Fine-tuning	LoRA via MLX (macOS) or PyTorch+PEFT (Linux/Windows)	Optional, makes responses sound more like you

Architecture

┌─────────────────────────────────┐
│  Tauri webview (HTML/JS)        │
│  Year slider + chat interface   │
└──────────────┬──────────────────┘
               │ Tauri commands
               ▼
┌─────────────────────────────────┐
│  Rust backend                   │
│  - Spawns llama-server process  │
│  - Owns SQLite corpus           │
│  - Streams replies to webview   │
└──────────────┬──────────────────┘
               │ HTTP (localhost:11435)
               ▼
         llama-server
    (Gemma 3 4B + Nomic Embed)

No Docker. No Redis. No Postgres. One binary that spawns a local inference server and talks to it over localhost.

The hardest problems I solved

Making a 4B model sound like a specific person

Generic LLMs sound like... generic LLMs. Even with good retrieval, the responses felt artificial. Three things fixed this:

1. Aggressive post-processing. I strip markdown formatting, remove AI-isms ("As an AI...", "Here's what I think..."), truncate to 6 sentences max, and remove surrounding quotes. Real text messages are short and messy.

2. Profile-grounded system prompt. The system prompt doesn't just say "act like this person" — it includes extracted communication patterns: typical sentence length, favorite slang, emoji frequency, how they handle serious vs. casual questions.

3. Optional LoRA fine-tuning. The app extracts conversation pairs from your messages and fine-tunes a LoRA adapter (rank 8, alpha 16) on your actual writing patterns. ~20 minutes on Apple Silicon, ~30 on NVIDIA. This is optional but makes a noticeable difference — responses shift from "plausible generic" to "that's actually how I talk."

Thread-context retrieval

Naive RAG retrieves individual messages, but conversations have context. If you ask "what did I think about moving to Bangalore?", the most relevant message might be "yeah I'm really nervous about it" — meaningless without the preceding messages.

The retriever expands each hit to include surrounding messages in the same thread (3-message window), then groups them chronologically. The LLM sees conversation fragments, not isolated sentences.

SQLite + threading in a desktop app

Tauri's async Rust backend and Python's threaded HTTP server both want to touch the database. SQLite doesn't love concurrent writes. I solved this with:

check_same_thread=False on the Python connection
A threading Lock around all write operations
WAL mode for better concurrent read performance

Simple, but it took a few crashes to get right.

Privacy — not as a feature, as the architecture

I'm tired of apps that say "we take your privacy seriously" and then ship your data to 14 third-party services.

Pratibmb can't leak your data because it never has your data. The architecture makes privacy violations impossible, not just policy-prohibited:

No network calls after the initial model download (~2.5 GB, one time)
No telemetry. No analytics. No crash reports. No "anonymous" usage data.
No accounts. No login. No email. Nothing.
Works with Wi-Fi off. Literally turn off your internet after setup. Everything works.
Open source (AGPL-3.0). Read every line. Build from source. Audit the network calls (there are none).

Your messages, embeddings, profile, and fine-tuned model all live in ~/.pratibmb/ on your machine. Delete the folder and it's gone.

What I learned building this

1. Small models are good enough for personal use.
Gemma 3 4B quantized to Q4_K_M runs comfortably on 8 GB RAM and produces surprisingly good responses when you give it strong retrieval context. You don't need GPT-4 for everything.

2. Tauri is genuinely great.
Coming from Electron, the difference is staggering. 5 MB binary. Instant startup. Native file dialogs. The Rust ↔ JS bridge is clean. The only pain point is the build toolchain on Windows (MSVC + WebView2 + NSIS).

3. The emotional impact surprised me.
I built this as a technical project. But the first time I asked my 2016 self about a friend I'd lost touch with, and it responded with details I'd forgotten — I sat there for a while. This thing surfaces memories that photos can't.

4. Chat exports are a mess.
WhatsApp's export format changes between OS versions. Facebook's JSON uses UTF-8 escape sequences for emoji. iMessage requires Full Disk Access and the database schema varies across macOS versions. Telegram only exports from the desktop app. I wrote 8 parsers and each one taught me something new about format hell.

Try it

Pratibmb is free, open source, and runs on macOS, Windows, and Linux.

🔗 Website: pratibmb.com
📦 GitHub: github.com/tapaskar/Pratibmb

Requirements:

macOS 12+ / Windows 10+ / Linux (AppImage)
Python 3.10+
8 GB RAM (16 GB recommended)
~3 GB disk space for models (downloaded on first launch)
NVIDIA GPU optional (speeds up fine-tuning, not required for chat)

Install:

# macOS
brew install tapaskar/tap/pratibmb

# Linux (AUR)
yay -S pratibmb-bin

# Windows
winget install tapaskar.Pratibmb

# Or download directly from pratibmb.com

What's next

v0.6.0 — Voice mode (talk to your past self, hear responses in a synthesized version of your voice)
Group chat reconstruction — Bring back entire friend groups, not just yourself
Timeline view — Visual map of your relationships and life events across years
Mobile app — React Native wrapper (local inference via llama.cpp on-device)

If you have old messages sitting on your phone or in a Google Takeout archive — they contain a version of you that doesn't exist anymore. Pratibmb brings them back.

DEV Community