Climate insights: reliable weather predictions in the cloud
Services
Cloud Engineering, Infrastructure as Code, DevOps
Industry
Research, Meteorology
Technologies
Python, AWS (S3, ECR, Batch, Fargate, EventBridge, IAM), Docker, GitHub Actions, Terraform
We transformed a set of Python-based weather prediction models—originally running only on individual laptops—into a reliable, scalable cloud solution that serves both our primary client and its partner institutes.
Challenges & Solutions
Migrating sophisticated, compute-intensive forecasting models to the cloud surfaced several critical hurdles, from performance and portability to access control and scheduling, that we had to address head-on to deliver a seamless, reliable pipeline.
Challenge: The forecasting models needed substantial CPU and memory resources, which exceeded the capacity of researchers’ laptops.
Solution: We containerized each model in Docker and stored images in AWS ECR. We then leveraged AWS Batch on Fargate to provision right-sized compute on demand, ensuring each monthly run has the resources it needs without idle capacity.
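Once a model image lives in ECR, submitting a run to AWS Batch is a single API call. The sketch below, in Python with boto3, shows how such a submission might look; the queue name, job definition, and resource sizes are illustrative, not the project's actual values.

```python
def batch_job_request(model: str, run_date: str) -> dict:
    """Build the parameters for a boto3 batch.submit_job call for one
    forecasting model run. Names and sizes below are hypothetical."""
    return {
        "jobName": f"{model}-{run_date}",
        "jobQueue": "forecast-fargate-queue",   # hypothetical Fargate queue
        "jobDefinition": "forecast-model",      # hypothetical job definition
        "containerOverrides": {
            "environment": [{"name": "RUN_DATE", "value": run_date}],
            # Fargate provisions exactly this vCPU/memory per job,
            # so each model gets right-sized compute with no idle capacity.
            "resourceRequirements": [
                {"type": "VCPU", "value": "4"},
                {"type": "MEMORY", "value": "16384"},  # MiB
            ],
        },
    }

# Usage (requires AWS credentials):
#   import boto3
#   boto3.client("batch").submit_job(**batch_job_request("precip-model", "2024-06-01"))
```

Keeping the request builder as a pure function makes it easy to unit-test without touching AWS.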
Challenge: Our client’s domain experts excel at climate science, not at cloud infrastructure or containerization.
Solution: We built end-to-end CI/CD pipelines in GitHub Actions that handle Docker builds, tests, and deployments. Researchers now simply push code, with no need to learn Docker commands or the AWS console.
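A simplified sketch of what such a workflow can look like; the workflow name, branch, region, secrets, and image name are illustrative, not the project's actual configuration:

```yaml
# .github/workflows/deploy.yml (simplified; names are illustrative)
name: build-and-deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      ECR_REGISTRY: ${{ secrets.ECR_REGISTRY }}
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: pip install -r requirements.txt && pytest
      - name: Authenticate to AWS
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE }}
          aws-region: eu-west-1
      - name: Build and push image to ECR
        run: |
          aws ecr get-login-password | docker login --username AWS --password-stdin "$ECR_REGISTRY"
          docker build -t "$ECR_REGISTRY/forecast-model:${{ github.sha }}" .
          docker push "$ECR_REGISTRY/forecast-model:${{ github.sha }}"
```

Every push to the main branch is tested, containerized, and shipped to ECR automatically, which is what lets researchers stay inside their normal Git workflow.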
Challenge: The same codebase needed to run seamlessly on a developer’s laptop and in AWS.
Solution: We defined environment-agnostic configuration files and used IaC (Terraform) to spin up equivalent S3 buckets, IAM roles, and Batch job definitions locally (via tools like LocalStack) and in production, eliminating “it works on my machine” issues.
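One common way to keep the code itself environment-agnostic is to let an environment variable decide whether the AWS SDK talks to LocalStack or to real AWS. A minimal Python sketch of that idea, assuming a hypothetical helper name:

```python
import os

def s3_client_kwargs() -> dict:
    """Return boto3 client arguments that work both locally and in AWS.

    When AWS_ENDPOINT_URL is set (e.g. http://localhost:4566 for
    LocalStack), the same code targets the local stack; when it is
    unset, boto3 falls back to the real AWS endpoints.
    """
    kwargs = {"region_name": os.environ.get("AWS_REGION", "eu-west-1")}
    endpoint = os.environ.get("AWS_ENDPOINT_URL")  # set only for local runs
    if endpoint:
        kwargs["endpoint_url"] = endpoint
    return kwargs

# Usage:
#   import boto3
#   s3 = boto3.client("s3", **s3_client_kwargs())
```

Because the switch lives in configuration rather than code, the pipeline binary is identical on a laptop and in AWS Batch.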
Challenge: New weather data arrives monthly and must trigger the forecasting pipeline reliably.
Solution: We configured AWS EventBridge rules to invoke AWS Batch jobs on the first of each month. This orchestration ensures the pipeline kicks off on schedule, every time.
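EventBridge schedules are expressed in a six-field cron syntax. The sketch below builds the parameters for such a rule in Python; the rule name and trigger hour are illustrative, not the project's actual settings.

```python
def monthly_trigger_rule(rule_name: str, hour_utc: int = 6) -> dict:
    """Parameters for an EventBridge put_rule call that fires on the
    first day of every month. Name and hour are hypothetical.

    EventBridge cron fields: minute hour day-of-month month day-of-week year.
    day-of-month=1 pins the run to the 1st; '?' leaves day-of-week unspecified.
    """
    return {
        "Name": rule_name,
        "ScheduleExpression": f"cron(0 {hour_utc} 1 * ? *)",
        "State": "ENABLED",
    }

# Usage (requires AWS credentials):
#   import boto3
#   boto3.client("events").put_rule(**monthly_trigger_rule("monthly-forecast"))
```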
Challenge: Multiple partner institutes require segmented access to input data and prediction outputs.
Solution: We implemented IAM policies scoped per institute and used separate S3 prefixes for each collaborator. This enforces least-privilege access while keeping a unified infrastructure.
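The prefix-scoped pattern can be sketched as a policy generator: each institute's IAM policy allows listing and object access only under its own S3 prefix. Bucket name and prefix layout below are illustrative, not the project's actual structure.

```python
def institute_policy(bucket: str, institute: str) -> dict:
    """IAM policy document granting one institute least-privilege access
    to its own S3 prefix only. Bucket/prefix layout is hypothetical."""
    prefix = f"institutes/{institute}/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is bucket-level, so the prefix must be
                # constrained via the s3:prefix condition key.
                "Sid": "ListOwnPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # Object reads/writes are restricted to the same prefix.
                "Sid": "ReadWriteOwnObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }
```

Generating policies from one template keeps the infrastructure unified while guaranteeing that no institute can read another's data.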
“Wolk has consistently demonstrated exceptional professionalism, reliability, and proficiency, making them an outstanding partner.”
Patricia Trambauer
Researcher & Consultant @ Deltares
Stijn Meijers
Founder & Data Architect