Neel Shah
About

Large-scale data systems for high-stakes organisations

Senior Data Engineer and Technical Consultant specialising in PySpark, Python, and cloud infrastructure for Government, Healthcare, and Financial Services. Based in Ottawa, Canada.

I'm Neel Shah — a Senior Data Engineer and Technical Consultant with 10+ years building end-to-end data systems for organisations that handle sensitive, high-volume data. My core expertise is PySpark, Python, and cloud infrastructure — delivering reliable, compliant, and scalable pipelines across Government, Healthcare, and Financial Services.

As Tech Lead at CIHI (Canadian Institute for Health Information), I lead large-scale PySpark pipelines processing over 1 billion Canadian health data points — covering national registry, diagnosis, and pharma datasets — for federal government, provincial governments, and NPO clients. I manage client relationships, lead the engineering team end-to-end, and ensure strict compliance with PIPEDA and provincial health privacy legislation. This means PII governance, audit trails, and data security are part of every technical decision I make.

Before CIHI, at EXL Service embedded at Goldman Sachs, I built PySpark-based credit risk management platforms handling 1 million financial transactions per hour — powering Apple Card, Walmart Card, and GM Card risk decisioning with full regulatory compliance. I also architected cloud-based systems at Canopy Growth, Manulife, and SITA — building high-availability infrastructure across Azure and AWS for millions of users.

I use modern AI tools — Claude, GPT, and local LLMs — as productivity accelerators: faster code reviews, automated documentation, data validation pipelines, and intelligent development workflows. AI makes my engineering output faster and higher quality — it's a tool, not the product.

I also created emot — an early open source contribution that grew to 3M+ downloads. It's a reminder that the best tools solve one thing really well.

Originally from Vadodara, India, where I graduated 1st in my engineering class, I moved to Canada for graduate studies and have since contributed to both the tech community and volunteer AI initiatives.

Quick facts
  • 📍 Ottawa, Ontario, Canada
  • 🏢 CIHI (current)
  • 🎓 Lakehead University
  • 💻 10+ years experience
  • 📄 3 research papers · 89+ citations
  • 📦 3M+ open source downloads
  • 🌍 5 languages
Languages
  • English · Native
  • Hindi · Native
  • Gujarati · Native
  • French · Elementary
  • Sanskrit · Limited
Core Expertise

Technical Skills

PySpark & Big Data

PySpark · Apache Spark · Databricks · Large-scale ETL · Data lakes · Distributed processing · Streaming · Delta Lake · SQL at scale

PII & Compliance

PIPEDA · Health data privacy · PII governance · Audit trails · Data masking · Access controls · Provincial health legislation · Financial regulatory compliance

Cloud & Infrastructure

Azure · AWS · Databricks · Docker · CI/CD · Microservices · FastAPI · REST APIs · Power BI · Infrastructure automation

AI-Accelerated Development

Claude API · OpenAI / GPT-4 · Local LLMs (Ollama) · Prompt engineering · RAG pipelines · AI code review · Automated documentation · Data validation with LLMs

Additional Skills

Languages
Python · SQL · R · Kotlin · JavaScript · Bash · Rust · Go
Data & ML
scikit-learn · Random Forest · NLP · Pandas · Elasticsearch · Kibana · MongoDB
Domains
Government · Healthcare / Pharma · Financial Services · Credit Risk · NPO · Airport Systems
Methods
Agile / Scrum · SDLC leadership · Client management · System architecture · Code review · Mentoring

Experience

Canadian Institute for Health Information (CIHI) · Current
Tech Lead
Jul 2023 – Present · Ottawa, ON

Lead large-scale PySpark ETL pipeline processing up to 24M records with 200+ parameters in under 60 minutes — ingesting 1B+ Canadian health data points (registry, diagnosis, pharma) from every hospital in the country. Serve federal/provincial government and NPO clients, lead cross-functional engineering team end-to-end through SDLC, manage client relationships, and enforce PIPEDA and provincial health privacy compliance.

EXL Service (embedded at Goldman Sachs)
Senior Consultant — Financial Risk Systems
May 2022 – Jul 2023 · Ottawa, ON

Built PySpark-based credit risk management platform handling Apple Card, Walmart Card, and GM Card portfolios at 1M transactions/hour. Resolved P-0 production incidents saving $10M+ in risk exposure. Built Python test automation framework reducing end-to-end test time by 60%.

Canopy Growth Corporation
Web System Architect
Jul 2021 – Apr 2022 · Ottawa, ON

Architected and led a full waterfall-to-agile transformation across 42 company websites. Designed a Python/FastAPI/Docker/AWS microservice handling 100K requests/hour. Delivered $5M/year in cost savings through AEM virtualisation. Built WCAG accessibility compliance tooling across the full digital estate.

Manulife
Senior Python Developer
Aug 2020 – Jul 2021 · Waterloo, ON

Built and maintained Azure cloud infrastructure of 1,800+ servers (Windows & Linux) with 99.99% uptime SLA. Developed real-time Power BI dashboards for infrastructure monitoring. Automated CI/CD pipeline with Python and Docker, reducing debugging time by 45 minutes.

SITA
Python Developer
Sep 2019 – Jun 2020 · Montreal, QC

Designed a real-time airport analytics system integrating LiDAR and camera hardware using Python and reactive programming. Led the Python 2→3 migration of large-scale airport systems. Transformed a monolithic legacy architecture into cloud-based microservices on Azure.

Lakehead University
Research Assistant & Python Developer
Nov 2017 – May 2019 · Thunder Bay, ON

Published 3 peer-reviewed papers (89+ citations, NSERC-funded). Built 20-node Elasticsearch cluster searching 330M tweets/second for real-time public health analytics. Developed Random Forest NLP model achieving 93.4% accuracy for population-level health classification.

Datalog.ai
Python Developer
Jan 2017 – Aug 2017 · Remote

Built asynchronous chatbot analytics API handling thousands of requests/second. Developed 5+ real-time AWS dashboards for semantic analysis and topic extraction. Designed clustering algorithm for chat-based decision support.

Panchamrut Dairy
Python Developer & Data Analyst
Jul 2014 – Dec 2016 · Godhra, Gujarat, India

Built real-time data analysis system for product cost and logistics using SAP and Python. Developed time-series sales forecasting model achieving 71% prediction efficiency. Designed ETL and report generation pipeline for sales, cost, and inventory data.

Education

🎓
Lakehead University
Graduate Studies · Computer Science
2017–2019 · Thunder Bay, ON, Canada
🎓
Parul Institute of Engineering
Bachelor of Engineering
2010–2014 · Vadodara, India
🏆 1st Rank — Excellence Certificate
Life Outside the Terminal

Beyond Work

🚴

Cycling

Ottawa's cycling paths are underrated. Long rides clear the head after a week of debugging PySpark DAGs.

🚶

Walking

Walking is where problems get solved. Some of the best architecture decisions happen away from the screen.

🏋️

Gym

Consistency at the gym mirrors consistency in engineering. Show up, do the work, trust the compound effect.