I texted my 22-year-old self last night.
He told me about a hackathon project I'd completely forgotten. He used slang I haven't used in years. He was worried about things that don't matter anymore — and passionate about things I've since abandoned.
He wasn't an AI pretending to be me. He was me — reconstructed from 47,000 real messages I'd sent between 2014 and 2018.
This is Pratibmb.
The idea
I was cleaning up my phone storage and found a WhatsApp export from college. Reading through those messages was surreal. The person writing them was recognizably me — same humor, same anxieties — but also someone I'd never be again.
I thought: what if I could actually talk to that version of myself?
Not a generic chatbot. Not "based on your journal entries." I wanted something that had read every message I'd ever sent and could respond the way I actually used to talk — the exact slang, the emoji patterns, the way I'd dodge serious questions with a joke.
So I built it.
How it works
Pratibmb is a 4-step pipeline that runs entirely on your machine:
1. Import your messages
Export your chat history from any of 8 platforms:
- WhatsApp — plain text export
- Facebook Messenger — JSON from Download Your Information
- Instagram DMs — JSON from Download Your Information
- Gmail — MBOX from Google Takeout
- iMessage — reads the local `chat.db` directly
- Telegram — JSON export from Desktop app
- Twitter / X — JavaScript archive
- Discord — JSON via DiscordChatExporter
The app auto-detects the format. Drop the file in, and it figures out the rest.
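Under the hood, detection just needs the file extension and a peek at the payload. Here's a minimal sketch of the idea; the function and heuristics are illustrative, not Pratibmb's actual code:

```python
import json
from pathlib import Path

def detect_format(path: str) -> str:
    """Guess the export format from the extension and a peek at the payload."""
    p = Path(path)
    if p.suffix == ".txt":
        return "whatsapp"                 # "[date, time] Sender: text" lines
    if p.suffix == ".mbox":
        return "gmail"                    # Google Takeout MBOX
    if p.suffix == ".db":
        return "imessage"                 # local chat.db
    if p.suffix == ".js":
        return "twitter"                  # X archives ship data as JS files
    if p.suffix == ".json":
        data = json.loads(p.read_text(encoding="utf-8"))
        if isinstance(data, dict) and {"name", "messages"} <= data.keys():
            return "telegram"             # Telegram Desktop export shape
        return "messenger|instagram|discord"  # disambiguate by inner keys
    raise ValueError(f"Unrecognized export: {path}")
```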
2. Embed everything locally
Every message gets converted into a semantic vector using Nomic Embed Text v1.5 (84 MB model, runs via llama.cpp). These embeddings are stored in a local SQLite database — no vector database dependency, no Pinecone, no cloud.
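For the curious, the whole step fits in a few lines of llama-cpp-python. This is a sketch under my own assumptions (model filename, table schema, brute-force scan), not the app's actual module. One real detail worth knowing: Nomic Embed v1.5 expects a task prefix like `search_document:` on its inputs.

```python
import sqlite3

import numpy as np
from llama_cpp import Llama

# Model filename, schema, and helper names are assumptions for this sketch.
embedder = Llama(model_path="nomic-embed-text-v1.5.Q4_K_M.gguf",
                 embedding=True, verbose=False)

db = sqlite3.connect("corpus.db")
db.execute("""CREATE TABLE IF NOT EXISTS messages
              (id INTEGER PRIMARY KEY, ts INTEGER, sender TEXT,
               text TEXT, embedding BLOB)""")

def embed(text: str, prefix: str = "search_document: ") -> np.ndarray:
    # Nomic Embed v1.5 expects a task prefix on its inputs
    out = embedder.create_embedding(prefix + text)
    return np.array(out["data"][0]["embedding"], dtype=np.float32)

def store(ts: int, sender: str, text: str) -> None:
    db.execute("INSERT INTO messages (ts, sender, text, embedding) VALUES (?, ?, ?, ?)",
               (ts, sender, text, embed(text).tobytes()))
    db.commit()

def search(query: str, k: int = 10) -> list[tuple[float, str]]:
    q = embed(query, prefix="search_query: ")
    q /= np.linalg.norm(q)
    scored = []
    for text, blob in db.execute("SELECT text, embedding FROM messages"):
        v = np.frombuffer(blob, dtype=np.float32)
        scored.append((float(q @ v / np.linalg.norm(v)), text))
    return sorted(scored, reverse=True)[:k]
```

A brute-force cosine scan over a personal-sized corpus is fast enough that skipping a dedicated vector database costs essentially nothing.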
3. Build your profile
A local LLM (Gemma 3 4B Instruct, quantized to 2.5 GB) analyzes your message corpus and extracts:
- Relationships — who you talked to, how close you were, what you discussed
- Life events — career changes, moves, breakups, milestones (with confidence scores)
- Interests — what you cared about and when
- Communication style — your abbreviations, emoji habits, sentence patterns
- Year summaries — what each year of your life looked like through your messages
This profile grounds the AI's responses in your actual history.
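To make that concrete, here's roughly what one extraction pass could look like; the prompt wording, JSON schema, and model filename are my illustration, not the app's exact prompts:

```python
from llama_cpp import Llama

# Filename is an assumption; any Gemma 3 4B instruct GGUF works the same way.
llm = Llama(model_path="gemma-3-4b-it-Q4_K_M.gguf", n_ctx=8192, verbose=False)

def summarize_year(year: int, sampled_messages: list[str]) -> str:
    """Ask the local model to distill one year of messages into a profile chunk."""
    prompt = (
        f"Below are messages I sent in {year}.\n\n"
        + "\n".join(sampled_messages)
        + "\n\nReturn JSON with keys: interests, life_events (each with a "
          "confidence score from 0 to 1), and communication_style."
    )
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,  # keep extraction close to deterministic
    )
    return out["choices"][0]["message"]["content"]
```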
4. Chat with your past self
Pick a year on the slider. Ask a question. The app retrieves relevant messages from that time period using semantic search, feeds them as context to the LLM along with your profile, and generates a response in your voice.
You: what were my biggest dreams in college?
Pratibmb (2018): oh man, where do I start! I was convinced
I'd build the next big startup — something that would change
how people connect. the biggest dream was proving to myself
I could make something people actually used.
The year slider is the key interaction — slide to 2015 and you're talking to your 2015 self. Slide to 2020 and the responses shift to match who you were then.
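Mechanically, the slider is just a timestamp filter on retrieval plus a year-specific system prompt. A sketch, reusing the hypothetical `embed()` and `db` helpers and the `llm` handle from the earlier sketches:

```python
import datetime

import numpy as np

def chat_as_year(question: str, year: int, profile: str, k: int = 8) -> str:
    start = int(datetime.datetime(year, 1, 1).timestamp())
    end = int(datetime.datetime(year + 1, 1, 1).timestamp())
    q = embed(question, prefix="search_query: ")
    q /= np.linalg.norm(q)

    # Score only messages sent during the selected year
    scored = []
    for text, blob in db.execute(
            "SELECT text, embedding FROM messages WHERE ts >= ? AND ts < ?",
            (start, end)):
        v = np.frombuffer(blob, dtype=np.float32)
        scored.append((float(q @ v / np.linalg.norm(v)), text))
    context = "\n".join(t for _, t in sorted(scored, reverse=True)[:k])

    system = (f"You are the {year} version of this person.\n{profile}\n"
              f"Real messages you sent around then:\n{context}")
    out = llm.create_chat_completion(messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ])
    return out["choices"][0]["message"]["content"]
```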
The tech stack
I wanted this to be something anyone could run without cloud accounts or GPU rentals:
| Layer | Tech | Why |
|---|---|---|
| Desktop shell | Tauri 2 (Rust) | ~5 MB binary vs 150 MB Electron, native performance |
| AI inference | llama.cpp via llama-cpp-python | Runs quantized models on CPU or Metal/CUDA |
| Chat model | Gemma 3 4B Q4_K_M | Strong instruction-following at only 2.5 GB |
| Embeddings | Nomic Embed Text v1.5 Q4_K_M | 84 MB, fast cosine similarity search |
| Storage | SQLite | Zero-config, single-file, no server |
| Frontend | Vanilla HTML/CSS/JS | No build step, no framework churn |
| Fine-tuning | LoRA via MLX (macOS) or PyTorch+PEFT (Linux/Windows) | Optional, makes responses sound more like you |
Architecture
┌─────────────────────────────────┐
│ Tauri webview (HTML/JS) │
│ Year slider + chat interface │
└──────────────┬──────────────────┘
│ Tauri commands
▼
┌─────────────────────────────────┐
│ Rust backend │
│ - Spawns llama-server process │
│ - Owns SQLite corpus │
│ - Streams replies to webview │
└──────────────┬──────────────────┘
│ HTTP (localhost:11435)
▼
llama-server
(Gemma 3 4B + Nomic Embed)
No Docker. No Redis. No Postgres. One binary that spawns a local inference server and talks to it over localhost.
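A nice side effect of that split: anything that speaks HTTP can drive the inference layer. llama-server exposes an OpenAI-compatible API, so a quick smoke test from Python looks roughly like this (assuming the app has already spawned the server with the chat model loaded):

```python
import requests

resp = requests.post(
    "http://localhost:11435/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "hey, what's up?"}],
        "temperature": 0.8,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```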
The hardest problems I solved
Making a 4B model sound like a specific person
Generic LLMs sound like... generic LLMs. Even with good retrieval, the responses felt artificial. Three things fixed this:
1. Aggressive post-processing. I strip markdown formatting, remove AI-isms ("As an AI...", "Here's what I think..."), truncate to 6 sentences max, and remove surrounding quotes. Real text messages are short and messy. (There's a sketch of this pass after the list.)
2. Profile-grounded system prompt. The system prompt doesn't just say "act like this person" — it includes extracted communication patterns: typical sentence length, favorite slang, emoji frequency, how they handle serious vs. casual questions.
3. Optional LoRA fine-tuning. The app extracts conversation pairs from your messages and fine-tunes a LoRA adapter (rank 8, alpha 16) on your actual writing patterns. It takes ~20 minutes on Apple Silicon, ~30 on NVIDIA, and makes a noticeable difference — responses shift from "plausible generic" to "that's actually how I talk." (A config sketch follows below.)
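The cleanup pass from point 1 is mostly regex work. A minimal sketch, with patterns that are mine rather than the app's source:

```python
import re

AI_ISMS = re.compile(r"^(as an ai\b[^.!?]*[.!?]|here's what i think[:,]?)\s*",
                     re.IGNORECASE)

def humanize(reply: str, max_sentences: int = 6) -> str:
    reply = reply.strip().strip('"').strip("'")    # surrounding quotes
    reply = re.sub(r"[*_`#]+", "", reply)          # markdown formatting
    reply = AI_ISMS.sub("", reply)                 # leading AI-isms
    sentences = re.split(r"(?<=[.!?])\s+", reply)  # crude sentence split
    return " ".join(sentences[:max_sentences]).strip()
```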
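And the fine-tuning from point 3 boils down to a small adapter config on the PyTorch+PEFT path. Rank and alpha match the numbers above; the target modules are my assumption for a Gemma-style model:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                 # adapter rank
    lora_alpha=16,       # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
# Applied with peft.get_peft_model(base_model, lora_config) before training
# on (prompt, your_reply) pairs extracted from the corpus.
```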
Thread-context retrieval
Naive RAG retrieves individual messages, but conversations have context. If you ask "what did I think about moving to Bangalore?", the most relevant message might be "yeah I'm really nervous about it" — meaningless without the preceding messages.
The retriever expands each hit to include surrounding messages in the same thread (3-message window), then groups them chronologically. The LLM sees conversation fragments, not isolated sentences.
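A sketch of that expansion step, assuming a `thread_id` column on the messages table (my illustration, not the app's real schema):

```python
def expand_hit(msg_id: int, window: int = 1) -> list[str]:
    """Return the hit plus its neighbors in the same thread (3-message window)."""
    thread_id = db.execute("SELECT thread_id FROM messages WHERE id = ?",
                           (msg_id,)).fetchone()[0]
    rows = db.execute(
        "SELECT id, text FROM messages WHERE thread_id = ? ORDER BY ts",
        (thread_id,)).fetchall()
    i = [r[0] for r in rows].index(msg_id)
    lo = max(0, i - window)
    return [text for _, text in rows[lo:i + window + 1]]
```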
SQLite + threading in a desktop app
Tauri's async Rust backend and Python's threaded HTTP server both want to touch the database. SQLite doesn't love concurrent writes. I solved this with:
- `check_same_thread=False` on the Python connection
- A threading `Lock` around all write operations
- WAL mode for better concurrent read performance
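Concretely, the shape is roughly this (a sketch, not the actual module):

```python
import sqlite3
import threading

db = sqlite3.connect("corpus.db", check_same_thread=False)
db.execute("PRAGMA journal_mode=WAL")   # readers don't block the writer

write_lock = threading.Lock()

def insert_message(ts: int, sender: str, text: str) -> None:
    with write_lock:                    # serialize all writes
        db.execute("INSERT INTO messages (ts, sender, text) VALUES (?, ?, ?)",
                   (ts, sender, text))
        db.commit()
```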
Simple, but it took a few crashes to get right.
Privacy — not as a feature, as the architecture
I'm tired of apps that say "we take your privacy seriously" and then ship your data to 14 third-party services.
Pratibmb can't leak your data because your data never leaves your machine. The architecture makes privacy violations impossible, not just policy-prohibited:
- No network calls after the initial model download (~2.5 GB, one time)
- No telemetry. No analytics. No crash reports. No "anonymous" usage data.
- No accounts. No login. No email. Nothing.
- Works with Wi-Fi off. Literally turn off your internet after setup. Everything works.
- Open source (AGPL-3.0). Read every line. Build from source. Audit the network calls (there are none).
Your messages, embeddings, profile, and fine-tuned model all live in ~/.pratibmb/ on your machine. Delete the folder and it's gone.
What I learned building this
1. Small models are good enough for personal use.
Gemma 3 4B quantized to Q4_K_M runs comfortably on 8 GB RAM and produces surprisingly good responses when you give it strong retrieval context. You don't need GPT-4 for everything.
2. Tauri is genuinely great.
Coming from Electron, the difference is staggering. 5 MB binary. Instant startup. Native file dialogs. The Rust ↔ JS bridge is clean. The only pain point is the build toolchain on Windows (MSVC + WebView2 + NSIS).
3. The emotional impact surprised me.
I built this as a technical project. But the first time I asked my 2016 self about a friend I'd lost touch with, and it responded with details I'd forgotten — I sat there for a while. This thing surfaces memories that photos can't.
4. Chat exports are a mess.
WhatsApp's export format changes between OS versions. Facebook's JSON uses UTF-8 escape sequences for emoji. iMessage requires Full Disk Access and the database schema varies across macOS versions. Telegram only exports from the desktop app. I wrote 8 parsers and each one taught me something new about format hell.
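One concrete example from that list: Facebook's export double-encodes UTF-8, so emoji arrive as mojibake after `json.loads`. The standard round-trip repairs it; the helper name here is mine:

```python
def fix_facebook_text(s: str) -> str:
    # Facebook's JSON escapes raw UTF-8 bytes as \u00XX sequences; after
    # json.loads they land as Latin-1 code points. Re-encode, then decode.
    return s.encode("latin-1").decode("utf-8")

fix_facebook_text("it was great \u00f0\u009f\u0098\u008a")  # -> "it was great 😊"
```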
Try it
Pratibmb is free, open source, and runs on macOS, Windows, and Linux.
🔗 Website: pratibmb.com
📦 GitHub: github.com/tapaskar/Pratibmb
Requirements:
- macOS 12+ / Windows 10+ / Linux (AppImage)
- Python 3.10+
- 8 GB RAM (16 GB recommended)
- ~3 GB disk space for models (downloaded on first launch)
- NVIDIA GPU optional (speeds up fine-tuning, not required for chat)
Install:
# macOS
brew install tapaskar/tap/pratibmb
# Linux (AUR)
yay -S pratibmb-bin
# Windows
winget install tapaskar.Pratibmb
# Or download directly from pratibmb.com
What's next
- v0.6.0 — Voice mode (talk to your past self, hear responses in a synthesized version of your voice)
- Group chat reconstruction — Bring back entire friend groups, not just yourself
- Timeline view — Visual map of your relationships and life events across years
- Mobile app — React Native wrapper (local inference via llama.cpp on-device)
If you have old messages sitting on your phone or in a Google Takeout archive, they contain a version of you that doesn't exist anymore. Pratibmb brings that version back.
