Derek Albosta

I build data platforms and ML systems — engineered like production software.

Software & data engineer · 5 years across startups and Indeed-scale products

595M+

rows validated in a production-scale healthcare EDW

100GB+

file batches processed by a distributed Azure platform

15%

pricing-accuracy improvement, A/B-validated at 1.2M+ MAU

80–90%

manual validation effort eliminated for clients

Selected Work

Value-Based Care Enterprise Data Warehouse

Invene
Problem
Unify 19 payer-specific healthcare file layouts into an analytics-ready warehouse, with correctness guaranteed before go-live.
Architecture
Medallion (Bronze/Silver/Gold) on Microsoft Fabric; PySpark transforms extracted into unit-tested utility modules, with data quality gates at every pipeline stage.
Outcome
595M+ rows (~60GB) validated at production scale. Threshold-based anomaly detection eliminated a recurring six-figure financial-loss risk.
Microsoft Fabric · PySpark · Python
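
The threshold-based quality gates described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual code: the `row_count_gate` function, the `GateResult` type, the 25% default threshold, and the choice of row count as the monitored metric are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class GateResult:
    passed: bool
    reason: str


def row_count_gate(history, current, max_deviation=0.25):
    """Fail a pipeline stage when the current row count drifts too far
    from the median of previous loads (hypothetical gate, assumed threshold)."""
    if not history:
        # Nothing to compare against yet, so the gate cannot judge the batch.
        return GateResult(True, "no history; gate skipped")
    baseline = median(history)
    if baseline == 0:
        return GateResult(current == 0, "baseline is zero")
    deviation = abs(current - baseline) / baseline
    if deviation > max_deviation:
        return GateResult(
            False,
            f"row count deviates {deviation:.0%} from baseline {baseline:.0f}",
        )
    return GateResult(True, "within threshold")


# Toy numbers for illustration only.
print(row_count_gate([100, 110, 90], 105))
print(row_count_gate([100, 110, 90], 40))
```

A gate like this would typically run between layers (for example, before promoting Bronze data to Silver) and stop the pipeline when it fails, so that a truncated or duplicated file is caught before it reaches reporting.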

Distributed Health Plan Validation Platform

Invene
Problem
Clients manually validated massive health-plan file batches — slow, error-prone, and unscalable.
Architecture
Event-driven .NET 8 Azure Functions with Service Bus and Blob Storage, orchestrated by ADF. Database-driven rule configuration covers 500+ health-plan and subject-area combinations — new file types onboard with zero code changes.
Outcome
Handles 100GB+ batches; cut client manual validation effort by 80–90%.
.NET 8 · Azure Functions · Service Bus · ADF
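
To illustrate the idea of database-driven rule configuration, here is a small sketch. The platform itself is described as .NET 8; this example uses Python with an in-memory SQLite database purely for brevity. The `validation_rule` table, its columns, the `CHECKS` registry, and the sample plan and column names are all hypothetical.

```python
import sqlite3

# Hypothetical check registry: each check takes (value, argument)
# and returns True when the value is valid.
CHECKS = {
    "required": lambda value, _arg: value not in (None, ""),
    "max_length": lambda value, arg: value is None or len(str(value)) <= int(arg),
}


def load_rules(conn, health_plan, subject_area):
    """Fetch the rule rows configured for one plan / subject-area combination."""
    cursor = conn.execute(
        "SELECT column_name, check_name, argument FROM validation_rule "
        "WHERE health_plan = ? AND subject_area = ? "
        "ORDER BY column_name, check_name",
        (health_plan, subject_area),
    )
    return cursor.fetchall()


def validate(record, rules):
    """Return the (column, check) pairs that failed for this record."""
    failures = []
    for column, check_name, argument in rules:
        if not CHECKS[check_name](record.get(column), argument):
            failures.append((column, check_name))
    return failures


# Rules live in a table, so supporting a new file type means inserting rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE validation_rule ("
    "health_plan TEXT, subject_area TEXT, column_name TEXT, "
    "check_name TEXT, argument TEXT)"
)
conn.executemany(
    "INSERT INTO validation_rule VALUES (?, ?, ?, ?, ?)",
    [
        ("plan_a", "claims", "member_id", "required", None),
        ("plan_a", "claims", "member_id", "max_length", "10"),
    ],
)
rules = load_rules(conn, "plan_a", "claims")
print(validate({"member_id": "12345678901"}, rules))
```

Because the validator only interprets rows, onboarding another plan or subject area in this sketch requires new rows in `validation_rule` rather than new code, which is the property the architecture above describes.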

Employer Pricing Model, SimplyHired.com

Indeed
Problem
Improve pricing accuracy for a product with 1.2M+ monthly users without degrading API latency.
Architecture
Python pricing model behind REST APIs; outputs precomputed and cached in MongoDB; every model enhancement gated behind statistically rigorous A/B tests.
Outcome
15% lower mean absolute error (MAE) and 70%+ lower latency, with revenue impact validated before full rollout.
Python · MongoDB · A/B testing
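
For readers unfamiliar with the metric, the sketch below shows how a relative MAE improvement between a baseline model and a candidate model can be computed. The function names and the toy numbers are made up for illustration and are unrelated to the real pricing data.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_improvement(baseline_mae, candidate_mae):
    """Fraction by which the candidate lowers MAE relative to the baseline."""
    if baseline_mae <= 0:
        raise ValueError("baseline MAE must be positive")
    return (baseline_mae - candidate_mae) / baseline_mae


# Toy data: three observed prices and two sets of predictions.
actual = [10, 20, 30]
baseline_mae = mean_absolute_error(actual, [12, 18, 33])
candidate_mae = mean_absolute_error(actual, [11, 19, 32])
print(f"{relative_improvement(baseline_mae, candidate_mae):.1%}")
```

An offline comparison like this only says the candidate fits historical data better. Confirming the effect on real users is what the A/B tests mentioned above are for.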

ResumeTailor

Personal project
Problem
Tailoring a resume to each job posting is tedious and unguided.
Architecture
GPT-powered comparison of job descriptions against resumes — compatibility scoring, concrete rewrite recommendations, and role-specific interview prep.
Outcome
Open source on GitHub.
Python · LLMs

About

I started at Indeed as a software engineer and was promoted into data science after shipping ML models that measurably moved revenue. Data science taught me statistical rigor — A/B testing, knowing when a result is real. Leading greenfield architecture as a technical lead taught me ownership — schema to deployment. Today I build healthcare data platforms where both compound: pipelines that are unit-tested, quality-gated, and monitored like production software.

Most data systems fail on engineering, not math. I build the kind that don't.

Experience

  1. Nov 2025 – Present

    Software Engineer (Contract) · Invene

    Built a healthcare EDW on Microsoft Fabric and a distributed Azure validation platform processing 100GB+ batches.

  2. Dec 2024 – Nov 2025

    Technical Lead · Nice.Industries

    Owned greenfield full-stack architecture from inception to deployment; selected for Beta University accelerator.

  3. May 2021 – May 2024

    Software Engineer → Data Scientist · Indeed

    Productionized pricing ML for 1.2M+ MAU; reallocated ~$5M ad spend via clustering; promoted to Data Scientist.

  4. Dec 2020 – May 2021

    ML Intern · Juva Health

    HIPAA-aligned video de-identification pipeline (deep autoencoders, Keras/TensorFlow).

B.S. Computer Science, Minor in Mathematics — University of Puget Sound, 2020

Skills

Data platforms & pipelines

  • PySpark
  • Microsoft Fabric
  • Azure Data Factory
  • SQL
  • PostgreSQL
  • MongoDB
  • Medallion architecture
  • Data quality engineering

Backend & cloud

  • Python
  • C# / .NET 8
  • Azure (Functions, Service Bus, Blob Storage)
  • AWS
  • REST APIs
  • FastAPI
  • Flask
  • Django
  • Docker

ML & statistics

  • scikit-learn
  • TensorFlow
  • PyTorch
  • pandas / NumPy
  • A/B testing & experiment design
  • LangChain / LLM pipelines

Certifications

Anthropic

  • Claude Code in Action · Jun 2026
  • Building with the Claude API · Jun 2026
  • Introduction to Model Context Protocol · Jun 2026
  • Introduction to Agent Skills · May 2026

DeepLearning.AI

Deep Learning Specialization

  • Neural Networks and Deep Learning · Nov 2020
  • Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization · Nov 2020
  • Structuring Machine Learning Projects · Nov 2020
  • Convolutional Neural Networks · Dec 2020

Startup Universe

  • AI for Good Institute @ Stanford · Jun 2025