Neel Shah
Available for consulting engagements

Data Architect & Engineering Lead

10+ years building large-scale data systems for Government, Healthcare, and Financial Services. PySpark, Python, cloud infrastructure, and strict PII compliance. Based in Ottawa, Canada.

PySpark | Python | Azure · AWS · Databricks | Healthcare & Finance | PII / PIPEDA | Government & NPO
Expertise

What I build

End-to-end data systems for organisations that can't afford failures — government agencies, health institutions, and financial services firms with millions of sensitive records.

Large-Scale PySpark Systems
Distributed data processing pipelines handling billions of records — ETL, aggregation, and transformation at population scale on Databricks and Azure
🔒
PII & Compliance Engineering
PIPEDA-compliant health data pipelines, financial audit trails, privacy-by-design architecture for sensitive government and NPO data
☁️
Cloud Data Architecture
End-to-end system design on Azure, AWS, and Databricks — infrastructure automation, CI/CD, and 99.99% uptime for critical production workloads
🏥
Healthcare & Financial Systems
National health data platforms at CIHI, credit risk engines at Goldman Sachs — deep domain knowledge in regulated, high-stakes environments
🐍
Python Engineering
Production Python systems — REST APIs with FastAPI, microservices, test automation frameworks, real-time processing, and ML pipelines
🤖
AI-Accelerated Development
Using Claude, GPT, and local LLMs as development accelerators — faster code reviews, documentation, data validation, and pipeline generation
Who I work with

Industries & Clients

🏛️
Government
Federal & Provincial
🏥
Healthcare
Pharma & Health Data
🏦
Financial Services
Credit Risk & Fraud
🔬
NPO & Research
Non-profit & Academic
Services

Data engineering & consulting

I help organisations with sensitive, large-scale data — building the pipelines, cloud infrastructure, and compliance frameworks to turn raw data into reliable, auditable systems.

PySpark pipeline design
PII & PIPEDA compliance
Cloud architecture (Azure/AWS)
ETL & data warehouse
Real-time data systems
Python API development
Health data platforms
Credit risk & fraud systems
Open Source

Projects & Contributions

😊 emot Python

Emoji & emoticon detection library. An early open-source project that has grown to 3M+ downloads worldwide.

⭐ 196 · 🍴 79 · 3M+ downloads
📊 Arxiv Analysis Jupyter

Analysis of 24,000+ research papers mapping AI/ML trends over time. Full dataset included.

⭐ 4 · 24K+ papers
🔬 Scopus India Dataset

Data analysis of Indian researchers in Scopus-indexed journals (2000–2016), with the full dataset published.

📌 Pinned 2000–2016

Have a large, sensitive data problem to solve?

Whether it's a national health platform, a credit risk engine, or a government compliance system — I've built it. Let's talk.