AI Inference Engineer
About Us
At Teton, we are redefining the role of healthcare workers through cutting-edge AI technology. Amid a global nursing shortage, our solutions offer vital support to overburdened health systems. We distinguish ourselves by focusing relentlessly on product excellence and user experience, and by rapidly deploying solutions that make a real difference.
At this stage of the company, we require a physical presence in our office in Copenhagen, Denmark. We believe this enables the fastest and most efficient iteration cycles for building an impactful product that users love.
The Job
We're looking for a highly specialized AI Inference Engineer who thrives on optimizing AI models for real-world deployment at scale. You'll be the technical force behind making our healthcare AI systems blazingly fast, efficient, and production-ready. This role demands deep technical expertise in model optimization, CUDA programming, and cutting-edge inference frameworks.
You will be responsible for:
Model Optimization & Quantization: Implementing advanced quantization, pruning, and distillation techniques to maximize inference speed while maintaining accuracy (a minimal sketch of the idea follows this list)
CUDA & Low-Level Optimization: Writing and optimizing CUDA kernels, leveraging TensorRT, and pushing the boundaries of GPU utilization
DeepStream Integration: Building robust inference pipelines using NVIDIA DeepStream, JetPack, and edge deployment frameworks
Transformer Optimization: Specializing in transformer model inference optimization, including attention mechanisms, KV-cache optimization, and memory management
Infrastructure Scaling: Designing and implementing scalable inference infrastructure that can handle healthcare's demanding real-time requirements
Performance Engineering: Profiling, benchmarking, and continuously improving model serving latency and throughput
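To give a concrete, if simplified, flavor of the quantization and benchmarking work above, here is a minimal sketch in PyTorch: post-training dynamic quantization of a toy two-layer model, timed against its fp32 original. The model, shapes, and helper name (mean_latency_ms) are hypothetical, and production work at this level typically involves TensorRT engines and far more rigorous profiling.

```python
# Illustrative sketch only: post-training dynamic quantization of a toy
# model, plus a crude latency benchmark. Model, shapes, and numbers are
# hypothetical; a production pipeline would use TensorRT/Triton and
# proper profiling tools.
import time

import torch
import torch.nn as nn

# A toy stand-in for a real model: two linear layers with an activation.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 256))
model.eval()

# Dynamic quantization: weights of nn.Linear layers are stored as int8
# and activations are quantized on the fly at inference time (CPU path).
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def mean_latency_ms(m: nn.Module, runs: int = 100) -> float:
    """Average forward-pass latency over `runs` iterations, after warmup."""
    x = torch.randn(8, 1024)
    with torch.no_grad():
        for _ in range(10):  # warmup iterations
            m(x)
        start = time.perf_counter()
        for _ in range(runs):
            m(x)
    return (time.perf_counter() - start) * 1000 / runs

print(f"fp32: {mean_latency_ms(model):.3f} ms")
print(f"int8: {mean_latency_ms(quantized):.3f} ms")
```

On CPU, int8 dynamic quantization of linear layers usually reduces latency and memory at a small accuracy cost, which is why quantization work is paired with accuracy validation in practice.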
What You Bring
Deep AI Optimization Expertise: 3+ years of hands-on experience optimizing deep learning models for production inference
CUDA Mastery: Strong proficiency in CUDA programming, kernel optimization, and GPU memory management
Inference Frameworks: Extensive experience with TensorRT, DeepStream, Triton Inference Server, or similar high-performance serving frameworks
Transformer Specialization: Deep understanding of transformer architectures and their optimization challenges, such as attention mechanisms, memory access patterns, and sequence handling (the KV-cache sketch after this list shows the flavor of this work)
Systems Programming: Proficiency in Python, C++, and PyTorch with a focus on performance-critical code
Edge Deployment: Experience with NVIDIA JetPack, edge computing, and resource-constrained environments
Performance Mindset: Obsessed with benchmarking, profiling, and squeezing every ounce of performance from hardware
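As a taste of the transformer side, here is a minimal, self-contained sketch of KV-cache reuse in single-head attention: each decode step appends one key/value row to a cache and attends over it, rather than reprojecting the entire prefix. All names and shapes (decode_step, the dict cache, d = 64) are illustrative, not any particular framework's API.

```python
# Illustrative sketch only: a single-head attention decode step that
# reuses a KV cache, so each new token attends over cached keys/values
# instead of recomputing them. Names and shapes are hypothetical.
import math

import torch

def decode_step(q, k_new, v_new, cache):
    """One autoregressive step.

    q, k_new, v_new: (batch, 1, d) projections for the newest token.
    cache: dict with "k" and "v" of shape (batch, t, d), or empty.
    """
    # Append the new key/value to the cache rather than recomputing
    # projections for the whole prefix (the core KV-cache saving).
    cache["k"] = k_new if "k" not in cache else torch.cat([cache["k"], k_new], dim=1)
    cache["v"] = v_new if "v" not in cache else torch.cat([cache["v"], v_new], dim=1)

    d = q.shape[-1]
    # Attention of the single new query over all cached positions.
    scores = q @ cache["k"].transpose(1, 2) / math.sqrt(d)  # (batch, 1, t)
    return torch.softmax(scores, dim=-1) @ cache["v"]       # (batch, 1, d)

# Usage: feed tokens one at a time; the cache grows by one row per step.
cache = {}
for _ in range(4):
    q = torch.randn(2, 1, 64)
    out = decode_step(q, torch.randn(2, 1, 64), torch.randn(2, 1, 64), cache)
print(out.shape, cache["k"].shape)  # (2, 1, 64) and (2, 4, 64)
```

Real inference engines extend this idea with preallocated or paged caches to avoid the repeated torch.cat allocations shown here.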
Bonus Points
Experience with custom CUDA kernel development
Knowledge of mixed-precision training and inference (a minimal sketch follows this list)
Familiarity with distributed inference and model parallelism
Experience with healthcare or safety-critical AI applications
Contributions to open-source inference optimization projects
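For the mixed-precision bonus point, here is a minimal sketch of reduced-precision inference via torch.autocast, using float16 on GPU and bfloat16 on CPU. The toy model and shapes are again hypothetical.

```python
# Illustrative sketch only: running inference under autocast so matmuls
# execute in reduced precision. Model and shapes are hypothetical.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
# float16 is the usual GPU choice; bfloat16 autocast is supported on CPU.
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 128)).to(device)
model.eval()

x = torch.randn(16, 512, device=device)
with torch.no_grad(), torch.autocast(device_type=device, dtype=amp_dtype):
    y = model(x)
print(y.dtype)  # reduced-precision output produced under autocast
```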
What We Offer
Participation in our warrant program (stock options)
Work with state-of-the-art AI optimization technology in a pioneering field
Access to cutting-edge hardware and compute resources
A vibrant, learning-focused work environment with fellow optimization enthusiasts
Direct impact on healthcare delivery through performance-critical AI systems
Join Our Team
We're looking for engineers who get excited about shaving milliseconds off inference time and making AI models run faster than anyone thought possible. If you're passionate about the intersection of AI, systems programming, and real-world impact, come help us transform healthcare through optimized AI inference.
Ready to push the boundaries of what's possible with AI optimization? Join us in Copenhagen and be part of our mission to revolutionize healthcare.