Digital Endeavours was engaged by a consultancy working for a music industry client to design and deliver event-based ML infrastructure from scratch.
The greenfield project required establishing technical credibility early, with frontend infrastructure delivered within days to lay the foundation for a one-year engagement covering both architecture and engineering responsibilities. The scope encompassed the complete AWS account structure, networking, backend services, a sequential data processing pipeline, a graph database, and frontend portals.
Digital Endeavours delivered Infrastructure-as-Code through automated GitHub Actions pipelines, implementing architectural designs with built-in validation and testing. Because the consultancy had no dedicated architects available, the engagement covered both architecture and engineering roles; this dual responsibility enabled consistent design decisions across the complete system, from AWS Organizations setup through application deployment automation. High availability and cost optimisation were assessed throughout, ensuring the platform balanced functionality with operational efficiency.
The infrastructure foundation established AWS Organizations with multi-account capability, CloudWatch billing alarms, and GitHub OIDC for credential-less deployments. The VPC was designed for high availability with Multi-AZ capability whilst maintaining cost efficiency through a single NAT Gateway and selective VPC endpoint deployment. Security groups enforced strict access controls, with all compute resources deployed in private subnets. This approach ensured robust infrastructure whilst maintaining lean operational costs.
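The Multi-AZ subnet layout behind this design can be sketched with Python's standard ipaddress module. The CIDR range, /20 prefix size, and availability-zone names below are illustrative assumptions, not the engagement's actual values:

```python
import ipaddress

def plan_subnets(vpc_cidr: str, azs: list[str]) -> dict[str, dict[str, str]]:
    """Split a VPC CIDR into one public and one private subnet per AZ."""
    vpc = ipaddress.ip_network(vpc_cidr)
    # Carve equal /20 blocks out of the VPC range: public subnets first,
    # then private subnets (where all compute resources live).
    blocks = list(vpc.subnets(new_prefix=20))
    plan = {}
    for i, az in enumerate(azs):
        plan[az] = {
            "public": str(blocks[i]),
            "private": str(blocks[len(azs) + i]),
        }
    return plan

layout = plan_subnets("10.0.0.0/16", ["eu-west-2a", "eu-west-2b"])
print(layout["eu-west-2a"])
# {'public': '10.0.0.0/20', 'private': '10.0.32.0/20'}
```

With this layout, private subnets in every AZ route outbound traffic through the single NAT Gateway in one public subnet, trading a small cross-AZ dependency for a substantially lower baseline cost.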
The backend service was deployed as an ECS Fargate task behind an Application Load Balancer, with ACM certificates and CloudWatch logging. Aurora Serverless v2 PostgreSQL provided the database layer, with Secrets Manager handling credential rotation and strict IAM policies enforcing least-privilege access. The service handled client portal authentication and triggered SES for email notifications.
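A minimal sketch of how a service like this might turn a rotated Secrets Manager payload into a database connection string. The payload here is a hard-coded illustration; in production it would come from boto3's `get_secret_value`, and the key names follow the standard RDS rotation secret format:

```python
import json
from urllib.parse import quote

def dsn_from_secret(secret_string: str) -> str:
    """Build a PostgreSQL DSN from a Secrets Manager secret payload.

    In production, secret_string would come from
    boto3.client("secretsmanager").get_secret_value(...)["SecretString"].
    """
    s = json.loads(secret_string)
    user = quote(s["username"], safe="")
    pw = quote(s["password"], safe="")  # URL-encode special characters
    return f"postgresql://{user}:{pw}@{s['host']}:{s['port']}/{s['dbname']}"

# Illustrative payload only -- not real credentials.
secret = json.dumps({
    "username": "app_user", "password": "p@ss/word",
    "host": "db.cluster-abc.eu-west-2.rds.amazonaws.com",
    "port": 5432, "dbname": "portal",
})
print(dsn_from_secret(secret))
# postgresql://app_user:p%40ss%2Fword@db.cluster-abc.eu-west-2.rds.amazonaws.com:5432/portal
```

Resolving the DSN at task start-up, rather than baking credentials into the image, is what lets Secrets Manager rotation work without redeployment.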
The data processing pipeline required sequential execution due to rate-limited external API constraints. ECS Fargate tasks processed data in sequence, with each task writing results to S3 before loading them into the Neptune graph database in small batches to accommodate endpoint limitations. SQS queues decoupled the pipeline stages, with Lambda functions orchestrating the task lifecycle: stopping completed tasks and starting subsequent stages. Neptune was selected to meet the relationship-mapping requirements, with its managed-service capabilities ensuring the client could maintain the infrastructure following any future handover. Each ECS task and Lambda function used a dedicated container image stored in ECR, with individual GitHub repositories managing builds and deployments. EventBridge rules ensured Lambda functions used the latest images through automated configuration updates.
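The stage-transition logic can be sketched as a Lambda-style handler. The ECS client is stubbed so the logic runs locally without AWS; the stage names and cluster name are hypothetical, not the engagement's actual identifiers:

```python
# Hypothetical stage order for the sequential pipeline.
PIPELINE_STAGES = ["extract", "transform", "load-neptune"]

class StubEcs:
    """Minimal stand-in for boto3's ECS client (stop_task / run_task)."""
    def __init__(self):
        self.calls = []
    def stop_task(self, cluster, task):
        self.calls.append(("stop", task))
    def run_task(self, cluster, taskDefinition, launchType):
        self.calls.append(("run", taskDefinition))
        return {"tasks": [{"taskArn": f"arn:stub:{taskDefinition}"}]}

def handle_stage_complete(event, ecs, cluster="ml-pipeline"):
    """Stop the finished stage's task, then launch the next stage (if any)."""
    ecs.stop_task(cluster=cluster, task=event["taskArn"])
    idx = PIPELINE_STAGES.index(event["stage"])
    if idx + 1 < len(PIPELINE_STAGES):
        next_stage = PIPELINE_STAGES[idx + 1]
        ecs.run_task(cluster=cluster, taskDefinition=next_stage,
                     launchType="FARGATE")
        return next_stage
    return None  # final stage: nothing left to start

ecs = StubEcs()
print(handle_stage_complete({"stage": "extract", "taskArn": "task-1"}, ecs))
# transform
```

Driving this handler from SQS messages keeps each stage decoupled: a stage only needs to report completion, never to know what runs after it.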
Frontend delivery consisted of CloudFront distributions serving S3-hosted static content, with ACM certificates and Route53 DNS management. The frontend portals implemented authentication through Cognito and Django backend integration, appropriate to their respective access requirements. GitHub Actions automated deployments, publishing updated static content to S3. All infrastructure components were deployed using Terraform, with GitHub Actions pipelines providing automated validation and deployment across multiple repositories.
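When publishing static content to S3 behind CloudFront, a deployment step typically sets per-object metadata. A small sketch of that decision, assuming a common caching policy (the specific Cache-Control values are illustrative, not the engagement's actual settings):

```python
import mimetypes

def s3_upload_args(key: str) -> dict:
    """Choose Content-Type and Cache-Control for a static asset.

    The returned dict is shaped like the ExtraArgs parameter of
    boto3's S3 upload_file; cache values here are assumptions.
    """
    content_type = mimetypes.guess_type(key)[0] or "application/octet-stream"
    if key.endswith(".html"):
        # HTML must revalidate so CloudFront serves fresh releases.
        cache = "no-cache"
    else:
        # Fingerprinted assets (hashed filenames) can be cached long-term.
        cache = "public, max-age=31536000, immutable"
    return {"ContentType": content_type, "CacheControl": cache}

print(s3_upload_args("index.html"))
# {'ContentType': 'text/html', 'CacheControl': 'no-cache'}
```

Getting Content-Type right matters here: S3 defaults unknown uploads to `binary/octet-stream`, which makes browsers download pages instead of rendering them.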
Optimisation remained an ongoing focus throughout the engagement. Digital Endeavours assessed the Django container build process, which was taking approximately 60 minutes. Migrating the base image from Alpine Linux to Amazon Linux 2023 reduced build times to 3 minutes: Amazon Linux 2023 uses glibc rather than musl libc, so pip could download pre-compiled Python binary wheels instead of compiling packages from source. Cost optimisation focused on keeping the infrastructure lightweight, deploying only necessary components. Data transfer costs during intensive testing periods created billing spikes, prompting team education on cost impact awareness; this fostered a culture of cost consciousness and reduced unnecessary expenses for the client. The architectural split between Lambda for short-duration tasks and ECS Fargate for long-running processes optimised compute costs based on workload characteristics.
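The kind of back-of-envelope arithmetic used to make data transfer costs visible to the team can be sketched as follows. The rates below are illustrative assumptions only, not actual AWS prices or the client's bill; current regional pricing should always be checked:

```python
# Assumed illustrative rates -- NOT actual AWS pricing.
NAT_PER_GB = 0.05   # $/GB processed through the NAT Gateway
NAT_HOURLY = 0.05   # $/hour for one always-on NAT Gateway

def monthly_nat_cost(gb_processed: float, hours: float = 730) -> float:
    """Estimate a month of NAT Gateway cost: per-GB processing + hourly charge."""
    return round(gb_processed * NAT_PER_GB + hours * NAT_HOURLY, 2)

# A quiet month versus an intensive test run pushing 2 TB through the NAT:
print(monthly_nat_cost(50))    # 39.0
print(monthly_nat_cost(2000))  # 136.5
```

Even with notional rates, the shape of the result is the point: the hourly charge is a fixed floor, while data-heavy testing multiplies the variable component, which is exactly the spike pattern that prompted the team education.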
Results
The engagement delivered a complete event-based ML system spanning account structure, networking, backend services, sequential data processing pipeline, graph database, and frontend portals. The infrastructure successfully supported the client’s ML workflows and frontend requirements throughout the one-year engagement, with additional capabilities identified for future development phases.
Key Technologies
CI/CD & IaC: GitHub Actions, Terraform
Infrastructure: AWS (Organizations, VPC, ECS Fargate, Lambda, Aurora Serverless v2, Neptune, S3, CloudFront, ALB, SES, ECR, SQS, EventBridge, Secrets Manager, Cognito, Route53, ACM)
Application Stack: Python, Django, Docker
Monitoring: CloudWatch