Virinderpal Singh Batth
Data Engineer CV
Professional Summary
Data engineering leader with 7+ years building enterprise data platforms in financial services and insurance. Track record of measurable efficiency gains: a 95% reduction in Snowflake compute, 90% faster SCD Type 2 queries, and a 2TB+ Hadoop-to-Snowflake migration. Currently leading a team architecting a unified operational data store and real-time API pipelines serving transactional business insights.
Experience
Lead Data Engineer | Insurance
evolv Consulting | Oct 2025 – Present
- Leading and growing a team of 3+ data engineers to architect an insurance client’s first unified operational data store in Snowflake using dbt
- Defining master data standards across 4+ legacy platforms, resolving data overlaps between sub-companies
- Architecting high-performance data pipelines serving a transactional Kong API layer for real-time business insights
- Built reusable dbt framework for automated data extracts to AWS S3, enabling self-service reporting
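A reusable extract framework like the one above often reduces to templated unload SQL. The sketch below is illustrative only (the table, stage, and prefix names are hypothetical, and the real framework is dbt-based): a small Python helper that renders a Snowflake `COPY INTO` statement unloading a table to an external S3 stage as Parquet.

```python
def render_s3_extract(table: str, stage: str, prefix: str) -> str:
    """Render a Snowflake COPY INTO statement that unloads a table to an
    external S3 stage as Parquet. All names here are illustrative."""
    return (
        f"COPY INTO @{stage}/{prefix}/\n"
        f"FROM {table}\n"
        "FILE_FORMAT = (TYPE = PARQUET)\n"
        "HEADER = TRUE\n"
        "OVERWRITE = TRUE"
    )

# Example: render an extract statement for a hypothetical summary table.
sql = render_s3_extract("analytics.claims_summary", "s3_extracts", "claims")
print(sql)
```

Centralizing the template this way is what makes the extracts self-service: analysts supply only a table name and destination prefix.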
Data Engineer | Financial Credit Risk
USAA | Apr 2021 – Oct 2025
- Resolved critical Snowflake performance bottleneck, reducing compute usage by 95% and significantly decreasing query costs
- Optimized SCD Type 2 queries achieving 90% reduction in running time (3 hours → 15 minutes) and 70% reduction in data scan volume
- Designed Kafka streaming pipeline for ML model training with staging tables, automated resiliency checks, and idempotent loading
- Led platform modernization migrating 2+ terabytes from Hadoop to Snowflake using PySpark and the Parquet format
- Drove the Secured Card to Credit Card transition, resulting in a 30% increase in member engagement
- Implemented decoupled-push architecture reducing on-call overhead by 90%
- Architected cross-organizational data lake POC with AWS S3, reducing transfer time by 50%
- Enhanced PII/PCI/PHI security with data masking, tokenization, and RBAC
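The idempotent-loading pattern behind the Kafka pipeline above can be sketched in a few lines: key each record on a business key plus event timestamp, so replaying a batch is a no-op and duplicates never double-count. This is a minimal in-memory illustration with invented field names, not the production implementation.

```python
def apply_batch(store: dict, batch: list[dict]) -> dict:
    """Idempotently apply a batch of events. Each record is keyed on
    (account_id, event_ts), so re-delivering the same batch leaves the
    store unchanged. Field names are hypothetical."""
    for rec in batch:
        key = (rec["account_id"], rec["event_ts"])
        store[key] = rec  # upsert: same key, same payload -> no change
    return store

events = [
    {"account_id": "A1", "event_ts": "2024-01-01T00:00:00", "score": 0.82},
    {"account_id": "A1", "event_ts": "2024-01-02T00:00:00", "score": 0.79},
]
state = apply_batch({}, events)
state = apply_batch(state, events)  # replay after a consumer restart
assert len(state) == 2              # duplicates were absorbed, not appended
```

In the real pipeline the same property would come from a keyed MERGE into the staging tables rather than an in-memory dict.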
Big Data Consultant | USAA
HCL America Inc. | Nov 2017 – Apr 2021
- Pioneered USAA’s first project to use AWS cloud and real-time streaming (NiFi/Kafka), moving batch workloads to real-time processing
- Established AWS S3 to HDFS file transfer routes, enabling cloud-to-on-premise data integration
- Developed 30+ critical decisioning data pipelines in Hadoop supporting credit card risk decisioning
- Built core ETL pipelines using IBM DataStage and DB2 with complex transformations and governance checks
- Created an automated schema-check process, eliminating hours of manual inspection
- Supervised Hadoop projects and mentored colleagues on CI/CD best practices
- Ensured PII/non-PII compliance with Information Governance standards
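An automated schema check like the one described above can be as simple as diffing an expected column contract against the columns actually observed in a landing file. The sketch below uses hypothetical column names and type labels; the real process ran against DataStage/Hadoop sources.

```python
def schema_drift(expected: dict[str, str], actual: dict[str, str]) -> dict:
    """Compare an expected {column: type} contract against observed columns,
    returning missing columns, unexpected columns, and type mismatches."""
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        "type_mismatch": sorted(
            col for col in expected.keys() & actual.keys()
            if expected[col] != actual[col]
        ),
    }

# Hypothetical contract vs. what actually landed:
expected = {"account_id": "string", "balance": "decimal", "open_dt": "date"}
actual = {"account_id": "string", "balance": "string", "close_dt": "date"}
print(schema_drift(expected, actual))
# {'missing': ['open_dt'], 'unexpected': ['close_dt'], 'type_mismatch': ['balance']}
```

Running a diff like this on every file arrival replaces eyeballing DDL, and a non-empty result can fail the pipeline before bad data propagates.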
Technical Skills
| Category | Technologies |
|---|---|
| Programming | Python (PySpark, Pandas, SQLAlchemy, FastAPI), SQL, Bash, Git |
| Data Engineering | dbt, Kafka, Flink, NiFi, IBM DataStage |
| Cloud & Platforms | AWS (S3, EC2, Redshift, Lambda, Athena), Snowflake, Hadoop, DB2 |
| Data Formats | JSON, Parquet, REST APIs |
| Visualization | Apache Superset |
| Practices | CI/CD, Data Governance, RBAC, Data Masking, Tokenization |
Education
B.S. in Computer Science | Cum Laude
Big Data & Analytics Concentration
New York Institute of Technology, New York