Open to new opportunities

Vinh X. Nguyen

AI Data Engineering Leader
📧 nxv.can@gmail.com 📞 (+1) 548.994.4264 📍 Waterloo, Ontario, Canada

I build big data platforms — from infrastructure to semantic layers — and have spent the last few years exploring how LLM-powered multi-agent systems can make analytics faster and more honest. Mostly working with AWS, Snowflake, Databricks, and Python.

~9 yrs · 7 sectors · 200B+ events · 99.97% reconciliation accuracy
🏦 Banking 💰 Insurance 🚗 Automotive 📊 Analytics / KPMG 🔐 PKI · Security 🛍 Retail / PoS 🎓 Education
$1.5B: transactions audited by 6-agent LangGraph (MADA)
200B+: events / 30TB text processed on AWS · Snowflake · Spark
1M+: mobile-banking users protected by Cert Pinning
99.97%: financial reconciliation accuracy (from 60%)

About Me

Modeling: OLTP, OLAP, Star schema, Schema-on-read, Delta Lake, SCD, Data Vault 1.0 & 2.0, Medallion.
Engineering: Led Big Data batch/streaming/ETL at 200B+ events, 30TB text on AWS + Snowflake + Spark + Databricks.
Agentic LLM: Accelerated time-to-insight with LLM-powered multi-agent systems built on MCP, LangGraph, LangChain, and RAG, analyzing $1.5B in financial transactions.
Architecture: Architect Community Lead at TymeBank (GOTyme – 1M users); defined integration & data patterns across 5 engineering teams.
Cloud-First: Full-stack AWS (security, network, compute, messaging, analytics); Databricks; Snowflake.
Financial Impact: Detected multi-million-dollar revenue leakage. Improved financial reconciliation accuracy to 99.97%.
Mentorship: University lecturer & engineering mentor — ML, algorithms, blockchain, performance & cost optimization.
Sectors: Banking (GOTyme, UBS), Insurance (Manulife, Prudential), Automotive (Cox), Analytics (KPMG, Ryte).

Skills & Expertise

🤖 AI Agentic LLM

  • Multi-agent workflows (LangGraph, LangChain)
  • Model Context Protocol (MCP)
  • GPT-4, Claude, Qwen; RAG pipelines
  • Vector Databases, Dynamic Tool Calls
  • Financial Anomaly Detection Agents

🗄️ Data Modeling

  • OLTP / OLAP at 200B+ records
  • Star Schema, SCD Types
  • Data Vault 2.0
  • Medallion Architecture (500M+ events)
  • Schema-on-read, Delta Lake

⚙️ Data Engineering

  • Spark (EMR/Glue), Databricks, Delta Lake
  • Kafka, Kinesis, SQS/SNS, DynamoDB
  • Batch, micro-batch & streaming pipelines
  • S3 + Glue Catalog, Snowflake, MySQL
  • AWS Lambda, SageMaker, Jupyter

☁️ Cloud & Infrastructure

  • AWS: ECS, Kinesis, VPC, WAF, Route53
  • API Gateway, CloudFront, ELB, NAT
  • Snowflake: Snowpipe, Materialized Views
  • Databricks: Delta, Cost-optimization
  • IaC: Terraform, CloudFormation

📊 Analytics & Visualization

  • Tableau, PowerBI, QuickSight
  • Grafana, Datadog, CloudWatch, ELK
  • NLP: TF-IDF, Lemmatization, POS tagging
  • 30TB text processing, 30K req/day
  • Google Analytics (2M/mo)

🔐 Security & Systems

  • PKI, Certificate Pinning (1M+ users)
  • Blockchain, Cryptography
  • Event-driven, Backpressure, Async
  • QoS, MPLS, Load Balancing
  • WAF, VPC Peering, IAM

Projects at a Glance

Eleven flagship projects across five companies: two per company (three at Tyme / GOTyme Bank), each broken into numbered workstreams. The full project breakdown is in Work Experience.

mindmap
  root((11 Flagship Projects))
    FPT Canada
      P1 Click-stream Web Analytics
        200B+ Omni-Channel Ingestion
        10M-Entity MDS
        Self-Serve Data Platform
        Omni-Channel Dashboard Rebuild
      P2 Financial Auditor and Alerting
        MADA 6-Agent LangGraph
        Copilot-MCP VS Code Extension
        Revenue Leakage Audit
    Tyme GOTyme Bank
      P1 Banking Business Products
        Personal Lending
        ID Payment
        Bancassurance
        Architect Community Lead
      P2 Infrastructure
        Cert Pinning 1M users
        Integration Patterns
        500M Tx Pipeline
      P3 Analytics
        Snowflake DW 2B
        Databricks Delta 50M
    NFQ Asia
      P1 SEO Keyword Analytics
        NLP at 30TB
        Serverless 50K req day
      P2 PoS Serverless API
        Lambda PoS Device API
        Event-Driven Pipeline
        Auto-Scaling Cost Control
    NashTech Global
      P1 Banking Insurance KYC
      P2 Process and People
    FPT Software
      P1 DirecTV Delivery
      P2 Cebu Dev Center and Training
      

Work Experience

Data Engineering Lead | Data Product Lead | Data Integrity Lead
FPT Canada — Ontario, Canada
Apr 2021 – Present
Project 1 · Data Platform
📊 Click-stream Web Analytics
AWS · Snowflake · Spark · Kinesis · Delta Lake · Tableau
  1. Omni-Channel Ingestion 200B+ — Kinesis + Spark on EMR, Bronze→Silver→Gold medallion.
  2. 10M-Entity MDS — Master Data Service unifying customer / product across 7 source systems.
  3. Self-Serve Data Platform — used by 7+ DS, Analytics & Finance teams.
  4. Omni-Channel Dashboard Rebuild — Tableau / PowerBI / QuickSight unified views.
99.97% enrichment quality · query time 30s → 8s (−73%)
Project 2 · Agentic AI
🤖 Financial Auditor & Alerting
LangGraph · MCP · RAG · GPT-4 / Claude · Vector DB · DynamoDB
  1. MADA — 6-agent LangGraph pipeline (Rules / Stats / LLM / Reviewer / RAG / MCP) auditing $1.5B in transactions.
  2. Copilot-MCP VS Code Extension — wires DBs & internal tools into the dev workflow.
  3. Revenue Leakage Audit — detected multi-million-dollar leaks; resumed 2 products.
Mapping accuracy 60% → 99.97% · multi-million-dollar revenue leakage recovered through audit
LangGraph · AWS · Snowflake · MCP · DynamoDB · Spark
Data Product Owner | Data Solution Architect | Architect Community Lead
Tyme Global (GOTyme Digital Bank) — Vietnam
Jul 2019 – Apr 2021
Project 1 · Banking Products
🏦 GOTyme Banking Business Products
Java · AWS · Microservices · Kafka · PostgreSQL
  1. Personal Lending — underwriting + origination flow.
  2. ID Payment — instant transfer-by-ID rails.
  3. Bancassurance — partner insurance product line.
  4. Architect Community Lead — 100+ architecture reviews / 5 dev teams aligned.
3 products launched · 5 teams on shared integration patterns
Project 2 · Infrastructure & Security
🔐 Bank-Grade Infra & Cert Pinning
PKI · Mobile SDK · AWS EMR · Kinesis · SQS / SNS
  1. Cert Pinning 1M+ users — PKI + mobile SDK + rotation pipeline against MitM.
  2. Integration Patterns — standardized across 5 dev teams.
  3. 500M+ Tx Pipeline — streaming + batch reconciliation.
1M+ users protected · zero-downtime rotation
Project 3 · Analytics
📈 Bank-Wide Analytics Stack
Snowflake · Databricks · Delta Lake · Google Analytics
  1. Snowflake DW — 2B-record warehouse.
  2. Databricks Delta — 50M-event Delta Lake.
Self-serve analytics across product, risk, finance
Databricks · Snowflake · Google Analytics · AWS EMR · Certificate Pinning
Technical Architect | Java Recruitment Lead
NFQ Asia — Ho Chi Minh City, Vietnam
Apr 2017 – Jul 2019
Project 1 · NLP at Scale
🔍 SEO Keyword Analytics Platform
NLP · TF-IDF · AWS Lambda · Datadog · Serverless
  1. 30TB Text Pipeline — TF-IDF + Lemmatization + POS tagging.
  2. Serverless API — 20K–50K req/day on AWS Lambda.
  3. Latency Optimization — 60s → 3s (−95%).
  4. Datadog Cost Audit — resolved $10K/mo logging issue.
95% latency cut · $10K/mo saved
Project 2 · Edge × Cloud
🛍 PoS Serverless API Integration
AWS Lambda · API Gateway · SQS · DynamoDB · EventBridge
  1. PoS Device API Layer — Lambda endpoints for in-store PoS hardware.
  2. Event-Driven Pipeline — SQS → Lambda → DynamoDB / warehouse.
  3. Auto-Scaling & Cost Control — cold-start & concurrency tuned for retail bursts.
Pay-per-use retail backend · zero idle cost
NLP · AWS Lambda · TF-IDF · Serverless · Datadog
Principal Software Engineer | Technical Architect
NashTech Global — Vietnam
May 2016 – Apr 2017
Project 1 · Regulated Engineering
🏦 Banking / Insurance / KYC Engineering
Java · UBS · Manulife · Prudential · KYC
  1. Engineering Lead for a 20-member team across Banking, Insurance, and KYC streams.
  2. Customer Quality Gates — review & release ceremony for regulated clients.
Reduced production incidents over 10+ workflow iterations
Project 2 · People & Process
🎓 Engineering Excellence Program
Mentorship · Code Review · Process
  1. Daily 30-min Upskilling — 10 developers leveled up.
  2. Workflow Re-engineering — 10+ iterations cutting incident rate.
Repeatable mentorship model adopted team-wide
Java · Banking · Insurance · KYC
Technical Training Manager | Solution Architect
FPT Software — Philippines & Vietnam
Jun 2014 – May 2016
Project 1 · Customer Delivery
📡 DirecTV / Cox Automotive Delivery
Java · .NET · Solution Arch
  1. Solution Architecture for offshore-onshore delivery.
  2. Customer-Facing Architect — sprint cadence with US clients.
Stable offshore delivery for tier-1 telecom & auto clients
Project 2 · Greenfield Org Build
🏗️ Cebu Dev Center & Training Programs
Recruitment · Training · Java · .NET
  1. Cebu Dev Center — recruited 22 Java & .NET engineers from scratch.
  2. Training for 60+ devs — curriculum + assessment.
  3. New-Graduate Program — sourcing-to-billable pipeline.
  4. Hiring Pipeline — 90% recruitment success rate.
Greenfield offshore center stood up & profitable
Java · .NET · Team Building · Training
University Lecturer
Dong Nai Technology University & Nong Lam University — Vietnam
2009 – Present
  • Courses: Data Structures & Algorithms, ML, Blockchain, Cryptography, PKI, Discrete Maths.
  • Network Multiplex Communication at University of Bordeaux (Vietnam Branch).
Machine Learning · Blockchain · Cryptography · DSA

Architecture & Journey

Career Timeline

From training engineers in Vietnam & the Philippines to leading AI data engineering in Canada.

timeline
    title 17+ Years in Engineering Leadership
    2009 : University Lecturer (DNTU / NLU)
    2014 : FPT Software - Training Manager / Solution Architect (PH and VN)
    2016 : NashTech Global - Principal Engineer / Architect
    2017 : NFQ Asia - Technical Architect (NLP at 30TB)
    2019 : Tyme / GOTyme Bank - Architect Community Lead (1M+ users)
    2021 : FPT Canada - Data Engineering Lead (200B+ events, LangGraph agents)
      

Skills Mind-Map

Six pillars across AI, data, cloud, and security.

mindmap
  root((Vinh X. Nguyen))
    AI Agentic LLM
      LangGraph / LangChain
      MCP
      RAG and Vector DBs
      Anomaly Detection
    Data Modeling
      OLTP / OLAP 200B+
      Star / SCD
      Data Vault 2.0
      Medallion
    Data Engineering
      Spark / Databricks
      Kafka / Kinesis
      Snowflake / Delta Lake
      Lambda / SageMaker
    Cloud
      AWS Full-Stack
      Snowflake
      Databricks
      Terraform / CFN
    Analytics
      Tableau / PowerBI
      Datadog / Grafana
      NLP TF-IDF
    Security
      PKI / Cert Pinning
      Cryptography
      WAF / IAM
      Event-driven
      

Signature Architecture #1 — LangGraph Multi-Agent Anomaly Detection

FPT Canada — LLM agents auditing $1.5B in financial transactions.

flowchart LR
    Tx[(Transactions DW)] --> Orchestrator{LangGraph Orchestrator}
    Orchestrator --> Rules[Rules Agent]
    Orchestrator --> Stats[Statistical Agent]
    Orchestrator --> LLM[LLM Reasoning Agent]
    LLM --> RAG[(Policy RAG / Vector DB)]
    LLM --> MCP[MCP Tools to DBs]
    Rules --> Reviewer[Reviewer Agent]
    Stats --> Reviewer
    LLM --> Reviewer
    Reviewer --> Human[Human-in-the-Loop]
    Reviewer --> Cases[(Case Store)]
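
The fan-out / fan-in shape of the diagram above can be sketched in plain Python. This is an illustration only, not the MADA implementation and not LangGraph code: the rule, the z-score threshold, the sample amounts, and the routing policy (agreement between agents files a case, a lone flag goes to a human) are all assumptions, and the LLM, RAG, and MCP agents are left out.

```python
from collections import defaultdict
from statistics import mean, pstdev


def rules_agent(txs):
    """Hypothetical rule: a negative amount is always suspicious."""
    return [(t["id"], "negative amount") for t in txs if t["amount"] < 0]


def stats_agent(txs, z_threshold=2.0):
    """Flag amounts more than z_threshold population std-devs from the mean."""
    amounts = [t["amount"] for t in txs]
    mu, sigma = mean(amounts), pstdev(amounts)
    if sigma == 0:
        return []
    return [(t["id"], "amount outlier") for t in txs
            if abs(t["amount"] - mu) / sigma > z_threshold]


def review(txs, agents):
    """Fan out to every agent, then merge their findings per transaction."""
    cases = defaultdict(list)
    for name, agent in agents.items():
        for tx_id, reason in agent(txs):
            cases[tx_id].append(f"{name}: {reason}")
    return dict(cases)


def route(findings, min_agents=2):
    """Assumed policy: agreement files a case; a lone flag goes to a human."""
    return "case_store" if len(findings) >= min_agents else "human_review"


# Made-up sample data: seven ordinary amounts, one outlier, one negative.
amounts = [100, 105, 95, 110, 90, 98, 102, 5000, -20]
txs = [{"id": f"t{i}", "amount": a} for i, a in enumerate(amounts)]

cases = review(txs, {"rules": rules_agent, "stats": stats_agent})
for tx_id, findings in cases.items():
    print(tx_id, route(findings), findings)
```

Keeping each agent a plain function that returns `(tx_id, reason)` pairs means the reviewer step does not need to know how a finding was produced, which is the property the diagram relies on.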
      

Signature Architecture #2 — 200B+ Event ETL Platform

FPT Canada — 99.97% enrichment quality across AWS, Snowflake, Databricks.

flowchart LR
    Sources[(Sources)] --> Stream[Kinesis / Kafka]
    Stream --> Bronze[S3 Bronze]
    Bronze --> Spark[Spark on EMR / Databricks]
    Spark --> Silver[S3 Silver / Delta]
    Silver --> Gold[S3 Gold / Delta]
    Gold --> SF[(Snowflake DW)]
    Gold --> Glue[Glue Catalog]
    SF --> BI[Tableau / PowerBI / QuickSight]
    Spark --> QA[Quality and Reconciliation]
    QA --> Alerts[CloudWatch / Datadog]
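
A toy, in-memory version of the Bronze → Silver → Gold flow with a reconciliation check might look like the following. It is a sketch under stated assumptions, not the production Spark job: the field names (`event_id`, `channel`, `amount`), the tolerance, and the sample rows are hypothetical.

```python
from collections import defaultdict


def to_silver(bronze):
    """Silver: drop malformed rows and duplicates (keyed on event_id)."""
    seen, silver = set(), []
    for event in bronze:
        if event.get("event_id") is None or event.get("amount") is None:
            continue  # malformed row stays in Bronze only
        if event["event_id"] in seen:
            continue  # duplicate delivery from the stream
        seen.add(event["event_id"])
        silver.append(event)
    return silver


def to_gold(silver):
    """Gold: business-level aggregate, here total amount per channel."""
    totals = defaultdict(float)
    for event in silver:
        totals[event["channel"]] += event["amount"]
    return dict(totals)


def reconciliation_rate(gold, reference, tolerance=0.005):
    """Share of reference keys whose Gold total matches within tolerance."""
    matched = sum(1 for key, value in reference.items()
                  if abs(gold.get(key, 0.0) - value) < tolerance)
    return matched / len(reference)


bronze = [
    {"event_id": 1, "channel": "web", "amount": 10.0},
    {"event_id": 1, "channel": "web", "amount": 10.0},   # duplicate
    {"event_id": 2, "channel": "app", "amount": 5.5},
    {"event_id": 3, "channel": "web", "amount": None},   # malformed
    {"event_id": 4, "channel": "web", "amount": 2.5},
]
# Hypothetical totals from the system of record.
reference = {"web": 12.5, "app": 5.5, "store": 3.0}

gold = to_gold(to_silver(bronze))
rate = reconciliation_rate(gold, reference)
print(gold, f"{rate:.2%}")
```

A rate below the target would be what feeds the alerting branch of the diagram; here the missing `store` channel is what pulls the rate down.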
      

Signature Architecture #3 — Certificate Pinning for 1M+ Users

GOTyme Digital Bank — mitigating MitM at mobile-banking scale.

flowchart LR
    CA[Internal CA / PKI] --> Cert[Server Cert + Backup Pin]
    Cert --> SDK[Mobile SDK]
    SDK --> App[Banking App 1M+ users]
    App --> API[(Banking APIs)]
    Rotation[Rotation Pipeline] --> Cert
    App --> Telem[Telemetry / Crash Analytics]
    Telem --> Flags[Feature Flags / Staged Rollout]
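
The "Server Cert + Backup Pin" step can be illustrated with a small pin check. This is a sketch of the general technique (a SHA-256 pin over the certificate's SubjectPublicKeyInfo, with a second pin shipped ahead of rotation), not the GOTyme SDK. Extracting the SPKI bytes from a real certificate is assumed to happen elsewhere, and the byte strings below are placeholders rather than real keys.

```python
import base64
import hashlib
import hmac


def spki_pin(spki_der: bytes) -> str:
    """Base64-encoded SHA-256 digest of the DER-encoded SubjectPublicKeyInfo."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")


def is_pinned(spki_der: bytes, pins) -> bool:
    """Accept the connection only if the presented key matches a shipped pin."""
    candidate = spki_pin(spki_der)
    return any(hmac.compare_digest(candidate, pin) for pin in pins)


# Placeholder key material, for illustration only.
current_key = b"placeholder-spki-current"
backup_key = b"placeholder-spki-backup"
attacker_key = b"placeholder-spki-attacker"

# The app ships both pins, so the server can rotate to the backup key
# without an app release.
shipped_pins = {spki_pin(current_key), spki_pin(backup_key)}

print(is_pinned(current_key, shipped_pins))   # key in use today
print(is_pinned(backup_key, shipped_pins))    # key after rotation
print(is_pinned(attacker_key, shipped_pins))  # MitM-presented key
```

Pinning the public key rather than the whole certificate lets the certificate be reissued for the same key without invalidating the pin.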
      

Signature Architecture #4 — PoS Serverless API Integration

NFQ Asia — in-store PoS hardware integrated with auto-scaling AWS backend.

flowchart LR
    PoS[PoS Devices] --> APIGW[API Gateway]
    APIGW --> Lambda[AWS Lambda Edge API]
    Lambda --> SQS[SQS Queue]
    SQS --> Worker[Lambda Worker]
    Worker --> DDB[(DynamoDB)]
    Worker --> EB[EventBridge]
    EB --> DWH[(Warehouse / Analytics)]
    Lambda --> CW[CloudWatch / Alarms]
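
The SQS → Lambda Worker → DynamoDB leg can be sketched as a handler that writes idempotently and reports only the failed messages back to SQS. The payload fields (`device_id`, `sale_id`, `total`) are hypothetical, and an in-memory dict stands in for the DynamoDB table so the example runs without AWS. The `batchItemFailures` response shape is the one Lambda uses for SQS partial batch responses, which takes effect only when that option is enabled on the event source mapping.

```python
import json

TABLE = {}  # in-memory stand-in for the DynamoDB table


def handler(event, context=None):
    """Process an SQS batch; return only the messages that should be retried."""
    failures = []
    for record in event.get("Records", []):
        try:
            sale = json.loads(record["body"])
            key = (sale["device_id"], sale["sale_id"])
            # Idempotent write: a redelivered message does not overwrite.
            TABLE.setdefault(key, sale)
        except (KeyError, TypeError, json.JSONDecodeError):
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


sale = json.dumps({"device_id": "pos-1", "sale_id": "s-100", "total": 42.0})
event = {
    "Records": [
        {"messageId": "m1", "body": sale},
        {"messageId": "m2", "body": sale},        # redelivery of the same sale
        {"messageId": "m3", "body": "not json"},  # malformed payload
    ]
}

result = handler(event)
print(result, len(TABLE))
```

Because SQS standard queues deliver at least once, keying the write on the device and sale identifiers is what keeps a retry burst from double-counting sales.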
      

Education & Certifications

Master in Computer Science
Université Pierre et Marie Curie (Paris 6) — France
2011–2013
AWS Certified Data Analytics Specialty
AWS — Canada
2023
Statistics for Data Analysis
McMaster University — Ontario, Canada
2025
BSc in Computer Science
Nong Lam University — Vietnam
2025
Advanced Training for Banking Architect Leads
AWS Training Center — Vietnam
2020