luis.nagaki — senior lead devops / sre · iac / automation specialist
Lead DevOps / SRE.
I build the bridges between
systems that refuse to talk.
Twenty-six-plus years wiring production infrastructure across GCP, AWS, and OCI. My specialty is the seam between tools — turning Jira tickets into Terraform plans, and giving non-Terraform users a way to ship infrastructure with nothing but a YAML or JSON file.
Every org has the same gap: a platform team that lives in Terraform, and everyone else who lives in Jira, ServiceNow, or a Confluence page. The handoff between them is where projects die.
I close that gap. I build the glue — schema-validated YAML and JSON contracts that compile into HCL, gated by policy, executed through CI/CD, and tracked back to the ticket that asked for them. Developers never touch Terraform. Auditors get a paper trail. SRE keeps the keys.
The result: faster delivery, fewer 3 a.m. pages, and a platform that scales past the people who built it. Including the team of DevOps and SRE engineers I lead and mentor along the way.
# requested via: JIRA-4821 · approver: data-platform-leads
kind: BigQueryDataset
apiVersion: platform.io/v1
metadata:
  owner: team-analytics
  jira: JIRA-4821
  env: prod
spec:
  cloud: gcp
  project: acme-analytics-prod
  module: bigquery-dataset
  version: v3.1.2
  inputs:
    dataset_id: orders_curated
    location: EU
    description: "Curated orders for downstream reporting"
    labels:
      domain: sales
      tier: gold
    tables:
      - name: orders
        schema: schemas/orders.json
        partitioning: { field: order_date, type: DAY }
        clustering: [customer_id, region]
      - name: order_items
        schema: schemas/order_items.json
        partitioning: { field: order_date, type: DAY }
  governance:
    data_class: restricted
    pii: true
    dataplex_zone: sales-curated
    cost_center: CC-7741
# → compiled to terraform · planned · PR opened · auto-merged on approval
GCP
- GKE, Cloud Run, Anthos
- Vertex AI + Gemini (IaC + scoped IAM)
- BigQuery + Dataplex (IaC framework)
- Custom IAM roles & bindings (IaC)
- VPC-SC, Workload Identity, IAP
- Cloud Build · Artifact Registry
- Org policies, folder-scoped IaC
AWS
- EKS, ECS, Lambda, Step Functions
- Lambda / ECS / EC2 (IaC framework)
- Control Tower · Org & SCP design
- CodePipeline · GitHub OIDC
- Transit Gateway, PrivateLink mesh
OCI
- OKE, Functions, API Gateway
- Landing zone · compartments · tag defaults
- Resource Manager + custom Terraform
- GCP / AWS → OCI workload migration
Author & maintain Terraform modules.
A versioned, tested, documented module library across GCP, AWS, OCI. Semver pinning, terratest + OPA in CI, drift detection. Modules are the contract — everything else compiles to them.
YAML / JSON → Terraform.
One vocabulary across clouds and SaaS — GCP, AWS, OCI, Okta, Jira. Schema-validated, policy-gated, version-pinned. People who don't write HCL still ship infrastructure.
Jira → GitHub → cloud.
JSON webhooks, GitHub Actions, Workload Identity Federation. No long-lived keys, no manual hops.
Pipelines that don't lie.
Reusable GitHub Actions workflows and Concourse pipelines as a library — same pattern as my Terraform modules. Migrated off Jenkins to get there. SonarQube gating, signed artifacts, OIDC everywhere.
Run Kubernetes in production.
GKE, EKS, OKE. Cluster provisioning, CNI, ingress, network policy, service mesh. Kustomize for env-specific manifest templating. App rollouts, version upgrades, fleet maintenance — clusters as cattle.
Argo CD + Helm.
Desired state in Git, reconciled to Kubernetes. Blue/green, canary, progressive rollouts. Rollback in seconds.
Manager-approved access.
Okta groups, GCP custom IAM, SaaS entitlements — granted via approved Jira tickets. No shadow admins.
Inventory, drift, blast radius.
A queryable catalog of every cloud resource. A Forge app that renders deployment risk inside Jira tickets. The platform sees itself.
SLOs, not vibes.
Error budgets, burn-rate alerts on real user journeys, runbook automation. Datadog + Sumo Logic, incidents routed through PagerDuty with Slack + Jira context. Retros that change behavior, not just calendars.
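Burn-rate alerting reduces to a small calculation. The sketch below is illustrative only: the function names are hypothetical, and the 14.4 threshold is the commonly cited fast-burn value for a 30-day SLO (2% of the budget spent in one hour), not a number taken from this platform.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(long_window_ratio: float, short_window_ratio: float,
                slo_target: float, threshold: float = 14.4) -> bool:
    """Multi-window check: page only if both windows burn above the threshold.

    The short window keeps the alert from firing long after the problem
    has already stopped.
    """
    return (burn_rate(long_window_ratio, slo_target) >= threshold
            and burn_rate(short_window_ratio, slo_target) >= threshold)
```

With a 99.9% target, a sustained 2% error ratio burns budget at roughly 20x and would page; the same long-window ratio with a healthy short window would not.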
Cost shifts left.
Right-sized instances, committed-use discounts, spot / preemptible pools. Infracost previews on every PR, idle-resource sweeps from the inventory, cost attribution by team / cost-center baked into IaC tags. Spend that explains itself.
Design with the team. Automate from day one. Plan it all in Jira.
Embedded with product and engineering teams from kickoff — whiteboard sessions, design docs, architecture reviews. Every project ships with IaC, automation, and observability baked in by design, not bolted on later. The Jira backlog goes up before the first commit: epics, tickets, dependencies, blast-radius YAML. Built so the team can run it without me.
AI infrastructure: tokens, not keys.
Vertex AI + Gemini on GCP, provisioned as code. Engineers building agents and AI workflows authenticate through service-account tokens with role assumption, not API keys in .env files. Same security philosophy as the rest of the platform — manager-approved access, scoped IAM, fully audited. AI velocity without the leak surface.
Terraform Module Library
The foundation under every framework below. A curated, versioned Terraform module library covering networking, compute, data, security, and observability primitives across GCP, AWS, and OCI. Each module ships semver-pinned, terratest-covered, OPA-policied, tflint-clean, with auto-generated docs from variables and outputs. Deprecations follow a published window — never a surprise. Modules are the contract between the platform team and everyone who consumes the platform.
$ cloud coverage . . . . . . . gcp · aws · oci · cross-cloud primitives
$ ci enforced . . . . . . . . terratest · opa · tflint · trivy · checkov
$ versioning . . . . . . . . . semver + published deprecation windows
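One way a consumer pin could be checked against a published deprecation window is sketched here. The deprecation table, its date, and the status names are hypothetical, and pre-release tags are deliberately not handled.

```python
from datetime import date

# Hypothetical deprecation table: major version -> date it stops being supported.
DEPRECATIONS = {2: date(2025, 6, 30)}


def parse_semver(tag: str) -> tuple[int, int, int]:
    """Parse 'v3.1.2' or '3.1.2' into a comparable tuple."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def pin_status(pinned: str, latest: str, today: date) -> str:
    """Classify a consumer's pinned module version."""
    pin, newest = parse_semver(pinned), parse_semver(latest)
    if pin > newest:
        raise ValueError(f"{pinned} is newer than the latest release {latest}")
    removal = DEPRECATIONS.get(pin[0])
    if removal is not None:
        # Inside the window the pin still works but should be flagged.
        return "removed" if today > removal else "deprecated"
    return "current" if pin == newest else "upgrade-available"
```

A CI job could run a check like this over every module reference and warn on "deprecated" while failing on "removed".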
Kubernetes Platform
A multi-cloud Kubernetes substrate that every workload runs on. Cluster provisioning via the Terraform module library (GKE, EKS, OKE), with cluster-wide concerns standardized rather than re-invented per team: CNI (Cilium / native VPC CNI), the Kubernetes Gateway API for ingress and routing (deliberately moved off the older Ingress + NGINX pattern for cleaner platform / app-team separation), network policy and service mesh (Istio or Cilium) for mTLS and east-west traffic control. Autoscaling is tuned in two dimensions — Karpenter and cluster autoscaler for nodes, HPA + VPA for pods. External Secrets Operator pulls from cloud secret stores so no plaintext secrets live in Git. Velero handles backup and DR, with periodic restore drills (DR plans you haven't tested are wishes). Prometheus + Grafana for in-cluster metrics, recording rules for SLO math, remote-write into the broader observability stack. Pod Security Standards and namespace-scoped RBAC keep multi-tenancy honest. App delivery rides on top via Kustomize for env-specific manifests and Argo CD + Helm for GitOps. Cluster and node upgrades are scripted, zero-downtime, and routine — not events.
$ networking . . . . . . . . . cilium / vpc cni · gateway api · netpol · istio mesh
$ autoscaling . . . . . . . . karpenter · cluster autoscaler · hpa · vpa
$ platform services . . . . . external-secrets · velero · prometheus · grafana
$ security & tenancy . . . . . pod security standards · rbac · network policy
$ app lifecycle . . . . . . . kustomize + helm + argo cd · progressive rollouts
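For the pod dimension of autoscaling, the Horizontal Pod Autoscaler's core rule is documented by Kubernetes as desired = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance (0.1 by default). The sketch below only illustrates that arithmetic; the min/max bounds are example values, and the real controller also accounts for pod readiness and missing metrics.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Core HPA scaling rule, without readiness or stabilization handling."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: no scaling action.
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

Four pods at 90% CPU against a 60% target scale to six; at 63% the ratio falls inside the tolerance band and nothing changes.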
AI / ML Platform
The GCP AI substrate: Vertex AI workbenches, model endpoints, training jobs,
and Gemini access — provisioned through the Terraform module library. The
interesting work isn't the model infrastructure itself; it's the developer access
layer around it. Engineers building agents and AI workflows authenticate through
short-lived service-account tokens with role assumption (the same pattern
as Workload Identity Federation for CI), not long-lived API keys scattered across
.env files and laptops. IAM scoping by team and project tier; access
requests flow through the identity framework (PRJ-007), so manager approval is the
gate, not Vertex AI console access. The result: AI engineering teams move fast
without scattering keys, and security knows exactly who has access to what model.
$ developer auth . . . . . . . service-account tokens + role assumption (no api keys)
$ access model . . . . . . . . manager-approved jira ticket → scoped iam binding
$ leak surface . . . . . . . . zero long-lived ai keys in code or laptops
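As a rough illustration of the token-based access described above, a short-lived token can be requested by impersonating a service account through Google's IAM Credentials API (`generateAccessToken`). The sketch only builds the HTTP request and sends nothing; the helper name, the 15-minute default, and the example account are assumptions, and the platform's actual flow may differ.

```python
import json
import urllib.request

IAM_CREDENTIALS = "https://iamcredentials.googleapis.com/v1"


def build_token_request(service_account: str, caller_token: str,
                        lifetime_seconds: int = 900) -> urllib.request.Request:
    """Build (but do not send) a generateAccessToken call.

    caller_token is the engineer's own short-lived credential; the caller
    needs permission to impersonate the target service account.
    """
    if not 0 < lifetime_seconds <= 3600:
        # One hour is the default maximum lifetime for these tokens.
        raise ValueError("lifetime must be between 1 and 3600 seconds")
    url = (f"{IAM_CREDENTIALS}/projects/-/serviceAccounts/"
           f"{service_account}:generateAccessToken")
    body = {
        "scope": ["https://www.googleapis.com/auth/cloud-platform"],
        "lifetime": f"{lifetime_seconds}s",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {caller_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Nothing in this flow is worth committing to a repository: the returned token expires on its own, and the IAM binding that allows impersonation is what the approval process controls.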
Reusable CI/CD Workflow Library
The CI/CD parallel of the module library: a curated, versioned set of
reusable GitHub Actions workflows and Concourse pipeline templates
that every team calls into instead of copy-pasting YAML. Build, test, lint, scan,
sign, publish, deploy — each step lives once, gets tested, and is versioned
independently. Consumer repos reference the library with a single pinned line
(uses: org/.github/.github/workflows/build.yml@v3.2.1). Composite actions
handle the standard scaffolding: cloud auth via OIDC and Workload Identity
Federation, secret rotation, OPA policy gating, SonarQube quality checks. The
library is the contract for what "shipping software" looks like at the org —
owned centrally, consumed everywhere, deprecated on a published window.
$ versioning . . . . . . . . . semver · pinned by consumers · deprecation windows
$ auth model . . . . . . . . . oidc + wif · zero long-lived pipeline secrets
$ consistency . . . . . . . . one definition of "shipping" across every repo
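Pinning is only a contract if something enforces it. Below is a minimal sketch of a lint that flags `uses:` references not pinned to a semver tag; the repository name `org/platform-workflows` in the usage example is hypothetical, and a stricter policy might require commit SHAs instead of tags.

```python
import re

# Matches job-level and step-level "uses:" lines in a workflow file.
USES = re.compile(r"^[ \t]*(?:-[ \t]*)?uses:[ \t]*(?P<target>[^\s#]+)", re.MULTILINE)
SEMVER_REF = re.compile(r"^v\d+\.\d+\.\d+$")


def unpinned_references(workflow_text: str) -> list[str]:
    """Return every external 'uses:' target whose ref is not a vX.Y.Z tag."""
    offenders = []
    for match in USES.finditer(workflow_text):
        target = match.group("target")
        if target.startswith("./"):
            # Local actions and workflows are versioned with the repo itself.
            continue
        _, _, ref = target.partition("@")
        if not SEMVER_REF.match(ref):
            offenders.append(target)
    return offenders


example = """
jobs:
  build:
    uses: org/platform-workflows/.github/workflows/build.yml@v3.2.1
  deploy:
    uses: org/platform-workflows/.github/workflows/deploy.yml@main
"""
print(unpinned_references(example))
```

Run against the example, only the `@main` reference is reported.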
YAML / JSON Deployment Framework
A self-service abstraction: devs write a small YAML or JSON file describing what they need (a service, a bucket, a vault, a queue). The framework validates against JSON Schema, applies OPA policy, picks the right Terraform module + version, and produces a fully planned change. Same input file works across GCP, AWS, and OCI — the headline pattern that the next three frameworks specialize.
$ clouds supported . . . . . . gcp · aws · oci
$ learning curve for devs . . ~1 page of docs
$ policy violations at apply . caught at PR-time
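The validate, select, compile flow might look roughly like the sketch below. It is not the framework itself: the catalog, the module source URL, and the hand-rolled checks (standing in for JSON Schema and OPA) are all assumptions, and the output uses Terraform's JSON configuration syntax rather than HCL.

```python
# Hypothetical catalog: request kind -> module source and the clouds it supports.
MODULE_CATALOG = {
    "BigQueryDataset": {
        "source": "git::https://example.com/platform/terraform-modules.git//bigquery-dataset",
        "clouds": {"gcp"},
    },
}


def validate(request: dict) -> list[str]:
    """Return a list of problems; an empty list means the request is acceptable."""
    errors = [f"missing top-level key: {key}"
              for key in ("kind", "metadata", "spec") if key not in request]
    if errors:
        return errors
    entry = MODULE_CATALOG.get(request["kind"])
    spec = request["spec"]
    if entry is None:
        errors.append(f"unknown kind: {request['kind']}")
    elif spec.get("cloud") not in entry["clouds"]:
        errors.append(f"{request['kind']} is not offered on {spec.get('cloud')}")
    if not str(spec.get("version", "")).startswith("v"):
        errors.append("spec.version must be a pinned tag such as v3.1.2")
    return errors


def compile_request(request: dict) -> dict:
    """Turn a validated request into a Terraform JSON module block."""
    errors = validate(request)
    if errors:
        raise ValueError("; ".join(errors))
    spec, meta = request["spec"], request["metadata"]
    source = MODULE_CATALOG[request["kind"]]["source"]
    name = f"{request['kind'].lower()}_{meta.get('jira', 'request').lower()}"
    return {"module": {name: {"source": f"{source}?ref={spec['version']}",
                              **spec.get("inputs", {})}}}
```

Writing the result out as a `.tf.json` file gives `terraform plan` something to work with, while the requester only ever sees the YAML or JSON input.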
AWS Workloads IaC Framework
An application-team specialization of the YAML framework targeting AWS workloads. Developers
describe a service in a single file — runtime (Lambda, ECS task, EC2/ASG), networking, IAM,
event sources, queues, buckets, databases, secrets — and the framework compiles it into a
full Terraform stack with least-privilege IAM wired in by default. Same author writes
lambda.fn.yaml on Monday and ecs.svc.yaml on Tuesday; the abstraction
stays consistent.
$ service primitives . . . . . sqs · sns · s3 · rds · secrets mgr
$ iam policy . . . . . . . . . least-privilege, generated per service
$ time to ship a new svc . . . ~minutes, not sprints
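To make "least-privilege, generated per service" concrete, here is a hedged sketch that derives an IAM policy document from a service description. The input shape (`queues`, `buckets`, `access`) is hypothetical; the action names and ARN formats are standard AWS ones, and the action set for a real consumer may need to be wider.

```python
def least_privilege_policy(service: dict) -> dict:
    """Build an IAM policy that grants only what the declared resources need."""
    region, account = service["region"], service["account_id"]
    statements = []
    for queue in service.get("queues", []):
        # A queue consumer needs to read, delete, and inspect messages.
        statements.append({
            "Effect": "Allow",
            "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage",
                       "sqs:GetQueueAttributes"],
            "Resource": f"arn:aws:sqs:{region}:{account}:{queue}",
        })
    for bucket in service.get("buckets", []):
        actions = ["s3:GetObject"]
        if bucket.get("access") == "read-write":
            actions.append("s3:PutObject")
        statements.append({
            "Effect": "Allow",
            "Action": actions,
            "Resource": f"arn:aws:s3:::{bucket['name']}/*",
        })
    return {"Version": "2012-10-17", "Statement": statements}
```

Because the policy is derived from the same file that declares the resources, a service cannot end up with access to a queue or bucket it never declared.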
BigQuery & Dataplex IaC Framework
A data-team specialization of the YAML framework. Analytics engineers and data engineers describe datasets, tables, views, materialized views, scheduled queries, routines, and Dataplex lakes / zones / assets in a single declarative file. The schema bakes in data classification (PII, restricted, public), access patterns, and lifecycle rules — the framework compiles it all to Terraform underneath. Data people ship their own infrastructure without ever opening an HCL file.
$ dataplex coverage . . . . . lakes · zones · assets · tasks
$ data classification . . . . pii / restricted enforced pre-apply
$ self-serve adoption . . . . analysts onboarded · 0 hcl written
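A pre-apply classification check could be as small as the sketch below. The two rules are invented for illustration (the real policy set is not described here); the field names follow the governance block of the sample request shown earlier.

```python
ALLOWED_CLASSES = {"public", "restricted"}


def classification_violations(governance: dict) -> list[str]:
    """Return human-readable violations for a dataset's governance block."""
    violations = []
    data_class = governance.get("data_class")
    if data_class not in ALLOWED_CLASSES:
        violations.append(f"data_class must be one of {sorted(ALLOWED_CLASSES)}")
    # Hypothetical rule 1: PII may only live in restricted datasets.
    if governance.get("pii") and data_class != "restricted":
        violations.append("pii: true requires data_class: restricted")
    # Hypothetical rule 2: restricted data must be registered in a Dataplex zone.
    if data_class == "restricted" and not governance.get("dataplex_zone"):
        violations.append("restricted datasets must declare a dataplex_zone")
    return violations
```

Any non-empty result would fail the pull request before a plan is ever produced.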
Identity & SaaS Access Framework
The framework pattern extended past infrastructure into identity. Access requests — Okta group membership, application assignment, GCP custom IAM roles & bindings, and other SaaS entitlements — flow through Jira. The requester's manager approves the ticket; the webhook pipeline validates the request against policy, then Terraform applies the change against Okta, GCP IAM, GCP Identity-Aware Proxy (IAP) bindings, and the relevant SaaS APIs. The manager never needs Okta admin rights. The requester never needs GCP project ownership. Approvers approve; the system grants. Audit lives in Jira and Git, not in a spreadsheet.
$ gcp iam . . . . . . . . . . custom roles · bindings · org policies
$ app access . . . . . . . . . gcp iap · google identity · no vpn
$ approval model . . . . . . . manager-approves jira → tf applies
$ shadow admins . . . . . . . . eliminated
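The approval gate can be sketched as a pure function: given a ticket and a manager directory, either produce a GCP IAM binding or refuse. The ticket fields, the directory shape, and the grantable-role allowlist are hypothetical; only the `role` / `members` binding shape is GCP's own.

```python
# Hypothetical allowlist of roles that may be granted through tickets.
GRANTABLE_ROLES = {"roles/bigquery.dataViewer", "roles/aiplatform.user"}


def approved_binding(ticket: dict, managers: dict) -> dict:
    """Return an IAM binding only if the requester's manager approved the ticket."""
    requester = ticket["requester"]
    if ticket.get("status") != "Approved":
        raise PermissionError("ticket is not approved")
    if managers.get(requester) != ticket.get("approved_by"):
        raise PermissionError("approver is not the requester's manager")
    if ticket["role"] not in GRANTABLE_ROLES:
        raise PermissionError(f"{ticket['role']} cannot be granted via ticket")
    return {"role": ticket["role"], "members": [f"user:{requester}"]}
```

The approver never holds the permission being granted; the function only decides whether the pipeline's own identity should apply the change.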
Jira → Terraform Bridge
A serverless bridge between Jira and a Terraform monorepo. Jira fires a JSON webhook
on ticket transition; a GitHub Actions workflow receives it, validates the payload
against a JSON Schema, generates a YAML request from the ticket fields, opens a PR, runs
terraform plan, and posts the plan back into the ticket as a comment.
Ticket approval triggers the apply workflow, which authenticates to GCP, AWS, and OCI via
OIDC-based workload identity federation: no long-lived keys, no service-account JSON files,
no static cloud credentials anywhere in CI.
$ ticket-to-resource trace . . 100% auditable in jira + git
$ auth model . . . . . . . . . wif · zero long-lived cloud creds
$ manual provisioning . . . . . eliminated
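The step that turns ticket fields into a YAML request might be sketched as follows. The custom field ids in `FIELD_MAP` are placeholders (real ids differ per Jira site), and the tiny YAML emitter handles only nested mappings of scalars, which is all this example needs.

```python
import json

# Hypothetical mapping from Jira custom field ids to positions in the request.
FIELD_MAP = {
    "customfield_10101": ("spec", "cloud"),
    "customfield_10102": ("spec", "module"),
    "customfield_10103": ("spec", "version"),
}


def request_from_webhook(payload: dict) -> dict:
    """Extract a request dict from a Jira issue webhook payload."""
    issue = payload["issue"]
    request = {"metadata": {"jira": issue["key"]}, "spec": {}}
    for field_id, (section, key) in FIELD_MAP.items():
        value = issue["fields"].get(field_id)
        if value is None:
            raise ValueError(f"ticket {issue['key']} is missing {field_id}")
        request[section][key] = value
    return request


def to_yaml(data: dict, indent: int = 0) -> list[str]:
    """Render nested dicts of scalars as YAML lines (JSON scalars are valid YAML)."""
    lines = []
    for key, value in data.items():
        pad = "  " * indent
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {json.dumps(value)}")
    return lines
```

Rejecting a ticket with missing fields at this stage keeps half-specified requests from ever becoming pull requests.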
Blast Radius — Jira Forge App
An Atlassian Forge app that turns production deployment tickets in Jira
into risk briefings. Every application repository ships a blast-radius.yaml
declaring service information, owners, severity tier, and upstream / downstream
dependencies. When a production deploy ticket is opened, the Forge app reads the
YAML from the repo's main branch and renders a dynamic panel inside the issue
view: what's changing, what depends on it, who owns each dependency, and the
resulting blast radius. Approvers see the real risk profile before they click
approve — not a hunch, not a wiki page someone last edited two years ago.
$ yaml contract . . . . . . . per-repo, version-controlled, team-owned
$ approver context . . . . . . deps · owners · severity · downstream
$ change incidents . . . . . . caught at approval, not after deploy
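Computing the blast radius from those per-repo declarations is a graph walk. The sketch assumes the YAML files have already been parsed into a dict keyed by service name; the `owner` and `downstream` keys are assumed field names, not the app's actual schema.

```python
from collections import deque


def blast_radius(changed: str, manifests: dict) -> dict:
    """Return every service transitively downstream of `changed`, with its owner."""
    impacted = {}
    queue = deque([changed])
    while queue:
        service = queue.popleft()
        for dependent in manifests.get(service, {}).get("downstream", []):
            if dependent != changed and dependent not in impacted:
                # Unknown services are still reported so the gap is visible.
                impacted[dependent] = manifests.get(dependent, {}).get("owner", "unknown")
                queue.append(dependent)
    return impacted
```

Breadth-first traversal with a visited set means dependency cycles, which real service graphs tend to have, cannot loop forever.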
Multi-Cloud Resource Inventory
A queryable catalog of every resource running across GCP, AWS, and OCI. The
inventory ingests from three sources of truth: Terraform state (for what
the platform believes exists), provider APIs (for what actually exists —
and therefore drift), and the per-repo blast-radius.yaml files (for
ownership and dependency metadata). Resources are indexed by owner, cost center,
environment, tier, region, and dependency graph edges. The catalog feeds cost
attribution reports, drift detection, the Blast Radius Forge app, and the access
framework's policy decisions.
$ data sources . . . . . . . . tf state · provider apis · repo yaml
$ queryable by . . . . . . . . owner · cost center · tier · env · region
$ drives . . . . . . . . . . . blast radius · cost · drift · access policy
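Drift, in this model, is the difference between what Terraform state records and what the provider APIs report. A minimal sketch, assuming both sources have been normalized into dicts of resource id to attributes (that normalization is the hard part and is not shown):

```python
def detect_drift(state: dict, live: dict) -> dict:
    """Compare Terraform state against live resources, both keyed by resource id."""
    changed = {}
    for rid in state.keys() & live.keys():
        keys = state[rid].keys() | live[rid].keys()
        diff = {k: (state[rid].get(k), live[rid].get(k))
                for k in keys if state[rid].get(k) != live[rid].get(k)}
        if diff:
            changed[rid] = diff  # attribute -> (expected, actual)
    return {
        "unmanaged": sorted(live.keys() - state.keys()),  # exists, not in state
        "missing": sorted(state.keys() - live.keys()),    # in state, gone live
        "changed": changed,
    }
```

The "unmanaged" bucket is usually the most interesting one, since it surfaces resources created outside the pipeline.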
Cloud Cost Optimization
Cost work that runs continuously, not quarterly. Attribution is wired into every IaC tag (team, cost center, environment, tier), so the inventory produces showback / chargeback reports by ownership rather than one giant monthly bill. Infracost runs in CI on every Terraform PR and posts a dollar-delta comment before merge, so cost becomes a code-review concern instead of a billing-cycle surprise. Idle-resource sweeps from the inventory catch the long tail: orphan disks, unused public IPs, stale snapshots, dormant load balancers. Right-sizing recommendations from utilization data feed back into module defaults so the next deploy gets it right by default. For Kubernetes: spot / preemptible node pools and bin-packing through Karpenter (EKS) and cluster autoscaler (GKE / OKE). For data: BigQuery slot reservations vs. on-demand, chosen by workload pattern. Commitments (CUDs on GCP, Savings Plans on AWS, annual Universal Credits on OCI) are sized from real consumption, not vendor sales-deck projections.
$ pr-time visibility . . . . . infracost preview · dollar delta per change
$ commitments . . . . . . . . cud (gcp) · savings plans (aws) · universal credits (oci)
$ idle sweep . . . . . . . . . orphan disks · unused ips · stale snapshots
$ k8s spend . . . . . . . . . spot pools · karpenter bin-pack · right-sized requests
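Sizing a commitment from real consumption can be framed as a small optimization: committed capacity is billed at a discount whether or not it is used, and anything above it is billed on demand. The sketch below is a simplified model with made-up numbers, not any provider's actual pricing.

```python
def commitment_cost(usage: list[float], committed: float,
                    on_demand_rate: float, discount: float) -> float:
    """Total cost over the period for a given committed level."""
    committed_rate = on_demand_rate * (1 - discount)
    total = 0.0
    for used in usage:
        # Pay for the commitment regardless, plus on-demand for any overflow.
        total += committed * committed_rate
        total += max(0.0, used - committed) * on_demand_rate
    return total


def best_commitment(usage: list[float], on_demand_rate: float,
                    discount: float) -> float:
    """Pick the committed level that minimizes total cost.

    Cost is piecewise linear in the commitment with breakpoints at observed
    usage values, so only those values (and zero) need to be tried.
    """
    candidates = sorted(set(usage) | {0})
    return min(candidates,
               key=lambda c: commitment_cost(usage, c, on_demand_rate, discount))
```

With usage of 10, 10, 10, and 40 units and a 30% discount, committing to the steady baseline of 10 beats both no commitment and committing to the peak.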
OCI Landing Zone & Migration
Designed and rolled out an OCI landing zone from scratch — compartment hierarchy, tag defaults, network architecture, identity domains, and logging baseline — to support the migration of production workloads from GCP and AWS into OCI. Built the IaC modules from scratch and wrapped them in the same YAML / JSON framework used elsewhere, so the application teams running the migrations shipped OCI infrastructure using a vocabulary they already knew. No second platform to learn, no second framework to maintain.
$ tenancy guardrails . . . . . policy-as-code, drift-checked
$ migration scope . . . . . . . workloads from gcp + aws → oci
$ production cutover . . . . . zero customer impact
$ let's_talk --about infra
Open to staff/principal DevOps / SRE, platform engineering lead, or cloud architect roles. Remote-first; long-term contracts welcome.
# off-hours: ESP32 + Home Assistant, Node-RED, MQTT. Same patterns as the day job, smaller blast radius.