-
10+ years of software engineering experience, including AI/ML
-
Proven leadership of production AI/ML systems at scale
-
Deep expertise in LLM productionization, including RAG, fine-tuning, evaluation, guardrails, and model monitoring
-
Strong Python experience
-
Experience with modern AI frameworks, including PyTorch, TensorFlow, JAX, and scikit-learn
-
Hands-on AI/MLOps experience, including CI/CD for ML, deployment automation, experiment tracking, and monitoring
-
Strong experience with cloud platforms (AWS, GCP, or Azure)
-
Strong experience with Kubernetes and other distributed systems
-
Experience building evaluation pipelines and adding observability instrumentation
-
Technical leadership experience shaping architectural direction across multiple teams
-
Define and own architecture for scalable AI/ML systems, including inference pipelines, evaluation frameworks, model lifecycle workflows, and monitoring and observability systems
-
Translate business requirements into robust AI platform designs and delivery plans
-
Make strategic decisions on model integrations and gateways, retrieval-augmented generation (RAG) approaches, evaluation methodologies, and safety and guardrail systems
-
Establish standards for model readiness, evaluation gates, rollout/rollback mechanisms, and drift detection
-
Build and deploy production-grade LLM capabilities integrated into distributed systems with clear SLOs and telemetry
-
Design scalable AI/MLOps and AIOps practices across training, testing, deployment, and monitoring
-
Improve data pipelines, feature workflows, and lineage processes supporting model evaluation and inference
-
Instrument tracing and model observability using OpenTelemetry and modern telemetry standards
-
Own evaluation pipelines tracking latency, cost, accuracy, hallucination rates, and prompt/version drift
-
Provide clear trade-off analyses balancing model performance, cost efficiency, safety, and maintainability
-
Create clear, well-structured technical proposals that help guide executive decisions on investments and roadmap planning.
-
Guide engineers in productionizing AI, building good experimentation habits, and designing distributed systems.
-
Boost engineering quality with thoughtful reviews, clear documentation, and standards built on solid practices.
-
Shape the AI production architecture of a category-defining GenAI infrastructure company
-
Define how enterprise-grade AI systems are observed, evaluated, and remediated
-
Build mechanisms that scale beyond individual engineers
-
Influence roadmap and platform strategy at a formative stage