A high-performance, hardware-aligned CSV column extractor written in pure C, built to avoid the out-of-memory (OOM) failures that Python Pandas can hit when ingesting very large files.
Data engineering teams frequently process massive CSV/NDJSON logs (10GB+). By default, pd.read_csv() loads the entire dataset into RAM, which pushes teams toward expensive memory-optimized AWS instances (e.g., r6g.xlarge) and causes frequent OOM crashes, often just to extract a few columns.
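This engine uses mmap (memory mapping) to map the file into the process's address space, so the operating system pages data in from the SSD on demand instead of the program allocating a buffer for the whole file. It uses raw C pointers and a custom state machine to scan the mapped bytes and extract the requested columns, aiming for throughput close to what the storage hardware allows.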
- Includes a seamless Python wrapper (axiom_pandas_accelerator.py) so data engineers don't have to leave their native environment.
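The mmap-plus-state-machine idea described above can be illustrated with a minimal Python sketch. This is not the engine's C implementation: the function name `extract_columns` and its behavior (comma delimiter, double-quote quoting, quotes kept in the returned bytes) are assumptions made for illustration, and a byte-by-byte loop in Python is far slower than the same loop written with C pointers.

```python
import mmap
import os

COMMA, QUOTE, NEWLINE, CR = ord(","), ord('"'), ord("\n"), ord("\r")


def extract_columns(path, wanted):
    """Yield, for each row, the raw bytes of the columns listed in `wanted`.

    Hypothetical helper for illustration only. Quoted fields are returned
    with their surrounding quotes left in place.
    """
    order = sorted(set(wanted))
    wanted_set = set(order)
    if os.path.getsize(path) == 0:
        return  # mmap cannot map an empty file
    with open(path, "rb") as f, mmap.mmap(
        f.fileno(), 0, access=mmap.ACCESS_READ
    ) as mm:
        size = len(mm)
        in_quotes = False  # state: are we inside a quoted field?
        col = 0            # index of the field currently being scanned
        start = 0          # byte offset where the current field begins
        row = {}
        for pos in range(size):
            byte = mm[pos]  # the OS pages this byte in on demand
            if byte == QUOTE:
                # An escaped quote ("") toggles twice, so the state stays right.
                in_quotes = not in_quotes
            elif not in_quotes and (byte == COMMA or byte == NEWLINE):
                if col in wanted_set:
                    end = pos
                    # Drop the CR of a CRLF line ending.
                    if byte == NEWLINE and end > start and mm[end - 1] == CR:
                        end -= 1
                    row[col] = mm[start:end]  # copy only the wanted field
                start = pos + 1
                if byte == COMMA:
                    col += 1
                else:
                    yield [row.get(c, b"") for c in order]
                    row, col = {}, 0
        # Final row when the file does not end with a newline.
        if start < size or col > 0:
            if col in wanted_set:
                row[col] = mm[start:size]
            yield [row.get(c, b"") for c in order]
```

Iterating with `for fields in extract_columns("huge_data.csv", [0, 9]):` processes one row at a time, so only the selected fields are copied out of the mapping; bytes of unwanted columns are scanned but never copied.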
Tested on: Acer Nitro 16 (Ryzen 7)
- Pandas Baseline (Read-Only): 3.21 seconds ❌ (High RAM usage)
- Axiom Engine (Read + Write): 1.23 seconds ✅ (Virtually Zero RAM)
- Speedup: ~2.6x faster end-to-end execution, with a 99% smaller memory footprint.
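For anyone who wants to rerun a comparison like this, the following is a minimal timing sketch. The helper names `best_wall_time` and `speedup` are hypothetical and are not part of the project; the source does not say how the benchmark above was actually measured. The final line only reproduces the arithmetic behind the reported speedup from the two timings listed.

```python
import time


def best_wall_time(fn, *args, repeats=3):
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds, candidate_seconds):
    """Ratio of baseline time to candidate time (higher means faster)."""
    return baseline_seconds / candidate_seconds


# Timings taken from the results above: 3.21 s (Pandas) vs 1.23 s (Axiom).
print(f"{speedup(3.21, 1.23):.2f}x")  # prints "2.61x"
```

Taking the best of several runs reduces noise from disk caching and background processes, though a warm page cache can also flatter whichever tool runs second.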
from axiom_pandas_accelerator import extract_columns_fast
# Extract column 0 and column 9 from the CSV
extract_columns_fast("huge_data.csv", 0, 9)