$ whoami
luis.nagaki — senior lead devops / sre · iac / automation specialist

Lead DevOps / SRE.
I build the bridges between
systems that refuse to talk.

Twenty-six-plus years wiring production infrastructure across GCP, AWS, and OCI. My specialty is the seam between tools — turning Jira tickets into Terraform plans, and giving non-Terraform users a way to ship infrastructure with nothing but a YAML or JSON file.

26+ yrs production · 3 clouds in anger · +50% deploy speed · 0 snowflakes left behind
iac · config mgmt · ci/cd · kubernetes · gitops · observability · identity · ai/ml


// 01 the_problem_i_solve

Every org has the same gap: a platform team that lives in Terraform, and everyone else who lives in Jira, ServiceNow, or a Confluence page. The handoff between them is where projects die.

I close that gap. I build the glue — schema-validated YAML and JSON contracts that compile into HCL, gated by policy, executed through CI/CD, and tracked back to the ticket that asked for them. Developers never touch Terraform. Auditors get a paper trail. SRE keeps the keys.

The result: faster delivery, fewer 3 a.m. pages, and a platform that scales past the people who built it, including the team of DevOps and SRE engineers I lead and mentor along the way.

dataset.bq.yaml — bigquery + dataplex request from JIRA-4821

# requested via: JIRA-4821  ·  approver: data-platform-leads
kind: BigQueryDataset
apiVersion: platform.io/v1
metadata:
  owner:    team-analytics
  jira:     JIRA-4821
  env:      prod
spec:
  cloud:    gcp
  project:  acme-analytics-prod
  module:   bigquery-dataset
  version:  v3.1.2
  inputs:
    dataset_id:  orders_curated
    location:    EU
    description: "Curated orders for downstream reporting"
    labels:
      domain:    sales
      tier:      gold
    tables:
      - name:        orders
        schema:      schemas/orders.json
        partitioning: { field: order_date, type: DAY }
        clustering:   [customer_id, region]
      - name:        order_items
        schema:      schemas/order_items.json
        partitioning: { field: order_date, type: DAY }
  governance:
    data_class:    restricted
    pii:           true
    dataplex_zone: sales-curated
    cost_center:   CC-7741
# → compiled to terraform · planned · PR opened · auto-merged on approval
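A minimal sketch of the compile step named in the last comment, assuming the request above has already been parsed into a Python dict (the standard library has no YAML parser, so the data is inlined and trimmed). The module source URL, the naming scheme, and passing `governance` and `owner` as module inputs are all hypothetical; the framework's real mapping is not shown on this page.

```python
import json

# Hypothetical module source; the real registry or repo location is an assumption.
MODULE_SOURCE = "git::https://example.com/tf-modules.git//{module}?ref={version}"


def hcl_value(value):
    """Render a Python value as an HCL literal."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        return json.dumps(value)  # JSON string escaping is also valid HCL
    if isinstance(value, list):
        return "[" + ", ".join(hcl_value(v) for v in value) + "]"
    if isinstance(value, dict):
        pairs = ", ".join(f"{json.dumps(k)} = {hcl_value(v)}" for k, v in value.items())
        return "{ " + pairs + " }"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def compile_request(request):
    """Compile a parsed request into one Terraform module block (hypothetical mapping)."""
    spec, meta = request["spec"], request["metadata"]
    name = f"{spec['module']}_{meta['jira']}".replace("-", "_").lower()
    source = MODULE_SOURCE.format(module=spec["module"], version=spec["version"])
    lines = [f'module "{name}" {{', f"  source = {hcl_value(source)}"]
    for key, value in spec["inputs"].items():
        lines.append(f"  {key} = {hcl_value(value)}")
    # Governance and ownership become module inputs so policy and cost tooling
    # downstream can read them from the plan.
    lines.append(f"  governance = {hcl_value(spec['governance'])}")
    lines.append(f"  owner = {hcl_value(meta['owner'])}")
    lines.append("}")
    return "\n".join(lines)


request = {
    "metadata": {"owner": "team-analytics", "jira": "JIRA-4821", "env": "prod"},
    "spec": {
        "cloud": "gcp",
        "module": "bigquery-dataset",
        "version": "v3.1.2",
        "inputs": {"dataset_id": "orders_curated", "location": "EU",
                   "labels": {"domain": "sales", "tier": "gold"}},
        "governance": {"data_class": "restricted", "pii": True},
    },
}
print(compile_request(request))
```

Pinning the module `ref` to the request's `version` is what keeps the generated plan reproducible: the same YAML always compiles against the same module release.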

// 02 cloud_footprint

cloud · 01

GCP

production workloads

GKE, Cloud Run, Anthos
Vertex AI + Gemini (IaC + scoped IAM)
BigQuery + Dataplex (IaC framework)
Custom IAM roles & bindings (IaC)
VPC-SC, Workload Identity, IAP
Cloud Build · Artifact Registry
Org policies, folder-scoped IaC

cloud · 02

AWS

production workloads

EKS, ECS, Lambda, Step Functions
Lambda / ECS / EC2 (IaC framework)
Control Tower · Org & SCP design
CodePipeline · GitHub OIDC
Transit Gateway, PrivateLink mesh

cloud · 03 — current

OCI

production workloads · current focus

OKE, Functions, API Gateway
Landing zone · compartments · tag defaults
Resource Manager + custom Terraform
GCP / AWS → OCI workload migration

// 03 what_i_actually_do

infrastructure as code

Author & maintain Terraform modules.

A versioned, tested, documented module library across GCP, AWS, and OCI. Semver pinning, Terratest + OPA in CI, drift detection. Modules are the contract; everything else compiles to them.

platform engineering

YAML / JSON → Terraform.

One vocabulary across clouds and SaaS — GCP, AWS, OCI, Okta, Jira. Schema-validated, policy-gated, version-pinned. People who don't write HCL still ship infrastructure.

workflow automation

Jira → GitHub → cloud.

JSON webhooks, GitHub Actions, Workload Identity Federation. No long-lived keys, no manual hops.

ci/cd

Pipelines that don't lie.

Reusable GitHub Actions workflows and Concourse pipelines as a library — same pattern as my Terraform modules. Migrated off Jenkins to get there. SonarQube gating, signed artifacts, OIDC everywhere.

kubernetes platform

Run Kubernetes in production.

GKE, EKS, OKE. Cluster provisioning, CNI, ingress, network policy, service mesh. Kustomize overlays for env-specific manifests. App rollouts, version upgrades, fleet maintenance: clusters as cattle.

gitops · deployments

Argo CD + Helm.

Desired state in Git, reconciled to Kubernetes. Blue/green, canary, progressive rollouts. Rollback in seconds.

identity & access

Manager-approved access.

Okta groups, GCP custom IAM, SaaS entitlements — granted via approved Jira tickets. No shadow admins.

visibility & change mgmt

Inventory, drift, blast radius.

A queryable catalog of every cloud resource. A Forge app that renders deployment risk inside Jira tickets. The platform sees itself.

reliability

SLOs, not vibes.

Error budgets, burn-rate alerts on real user journeys, runbook automation. Datadog + Sumo Logic, incidents routed through PagerDuty with Slack + Jira context. Retros that change behavior, not just calendars.
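Burn rate is the observed error ratio divided by the error budget (1 − SLO), so 1.0 means the budget runs out exactly at the end of the SLO period. The sketch below follows the multi-window, multi-burn-rate pattern from the Google SRE Workbook; the page does not say which windows or thresholds are actually in use, so the 30-day period and the "2% of budget in one hour" threshold (14.4) are that book's example values, not the author's.

```python
def burn_rate(bad_events, total_events, slo_target):
    """Error-budget burn rate: observed error ratio / allowed error ratio (1 - SLO)."""
    if total_events == 0:
        return 0.0
    return (bad_events / total_events) / (1.0 - slo_target)


def threshold_for(budget_fraction, window_hours, period_hours=30 * 24):
    """Burn rate that would consume `budget_fraction` of the budget in `window_hours`."""
    return budget_fraction * period_hours / window_hours


def should_page(long_window, short_window, slo_target, threshold):
    """Page only if both windows burn faster than the threshold.

    Each window is a (bad_events, total_events) pair, for example 1 h and 5 min.
    """
    return (burn_rate(*long_window, slo_target) > threshold
            and burn_rate(*short_window, slo_target) > threshold)


threshold = threshold_for(0.02, window_hours=1)  # about 14.4 for a 30-day period
print(should_page((2_000, 100_000), (200, 8_000), slo_target=0.999, threshold=threshold))  # → True
```

Requiring both windows is the design choice: the long window filters out brief blips, and the short window lets the alert clear soon after recovery.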

cost optimization

Cost shifts left.

Right-sized instances, committed-use discounts, spot / preemptible pools. Infracost previews on every PR, idle-resource sweeps from the inventory, cost attribution by team / cost-center baked into IaC tags. Spend that explains itself.

architecture & team leadership

Design with the team. Automate from day one. Plan it all in Jira.

Embedded with product and engineering teams from kickoff — whiteboard sessions, design docs, architecture reviews. Every project ships with IaC, automation, and observability baked in by design, not bolted on later. The Jira backlog goes up before the first commit: epics, tickets, dependencies, blast-radius YAML. Built so the team can run it without me.

ai / ml platform

AI infrastructure: tokens, not keys.

Vertex AI + Gemini on GCP, provisioned as code. Engineers building agents and AI workflows authenticate through service-account tokens with role assumption, not API keys in .env files. Same security philosophy as the rest of the platform — manager-approved access, scoped IAM, fully audited. AI velocity without the leak surface.

// 04 selected_work

// foundations the substrate [ 4 ]

PRJ-006

Terraform Module Library

terraform modules semver terratest opa

The foundation under every framework below. A curated, versioned Terraform module library covering networking, compute, data, security, and observability primitives across GCP, AWS, and OCI. Each module ships semver-pinned, Terratest-covered, OPA-policied, and tflint-clean, with docs auto-generated from variables and outputs. Deprecations follow a published window, never a surprise. Modules are the contract between the platform team and everyone who consumes the platform.

$ module catalog . . . . . . . network · compute · data · security · obs
$ cloud coverage . . . . . . . gcp · aws · oci · cross-cloud primitives
$ ci enforced . . . . . . . . terratest · opa · tflint · trivy · checkov
$ versioning . . . . . . . . . semver + published deprecation windows

PRJ-010

Kubernetes Platform

kubernetes gke eks oke cilium istio gateway api karpenter prometheus grafana velero external-secrets kustomize helm

A multi-cloud Kubernetes substrate that every workload runs on. Cluster provisioning via the Terraform module library (GKE, EKS, OKE), with cluster-wide concerns standardized rather than re-invented per team: CNI (Cilium / native VPC CNI), the Kubernetes Gateway API for ingress and routing (deliberately moved off the older Ingress + NGINX pattern for cleaner platform / app-team separation), network policy and service mesh (Istio or Cilium) for mTLS and east-west traffic control. Autoscaling is tuned in two dimensions — Karpenter and cluster autoscaler for nodes, HPA + VPA for pods. External Secrets Operator pulls from cloud secret stores so no plaintext secrets live in Git. Velero handles backup and DR, with periodic restore drills (DR plans you haven't tested are wishes). Prometheus + Grafana for in-cluster metrics, recording rules for SLO math, remote-write into the broader observability stack. Pod Security Standards and namespace-scoped RBAC keep multi-tenancy honest. App delivery rides on top via Kustomize for env-specific manifests and Argo CD + Helm for GitOps. Cluster and node upgrades are scripted, zero-downtime, and routine — not events.

$ cluster fleet . . . . . . . gke (gcp) · eks (aws) · oke (oci)
$ networking . . . . . . . . . cilium / vpc cni · gateway api · netpol · istio mesh
$ autoscaling . . . . . . . . karpenter · cluster autoscaler · hpa · vpa
$ platform services . . . . . external-secrets · velero · prometheus · grafana
$ security & tenancy . . . . . pod security standards · rbac · network policy
$ app lifecycle . . . . . . . kustomize + helm + argo cd · progressive rollouts

PRJ-012

AI / ML Platform

vertex ai gemini gcp gcp iam service accounts wif terraform

The GCP AI substrate: Vertex AI workbenches, model endpoints, training jobs, and Gemini access — provisioned through the Terraform module library. The interesting work isn't the model infrastructure itself; it's the developer access layer around it. Engineers building agents and AI workflows authenticate through short-lived service-account tokens with role assumption (the same pattern as Workload Identity Federation for CI), not long-lived API keys scattered across .env files and laptops. IAM scoping by team and project tier; access requests flow through the identity framework (PRJ-007), so manager approval is the gate, not Vertex AI console access. The result: AI engineering teams move fast without scattering keys, and security knows exactly who has access to what model.

$ infrastructure . . . . . . . vertex ai · model endpoints · workbenches · gemini
$ developer auth . . . . . . . service-account tokens + role assumption (no api keys)
$ access model . . . . . . . . manager-approved jira ticket → scoped iam binding
$ leak surface . . . . . . . . zero long-lived ai keys in code or laptops

PRJ-013

Reusable CI/CD Workflow Library

github actions concourse ci reusable workflows composite actions semver opa

The CI/CD parallel of the module library: a curated, versioned set of reusable GitHub Actions workflows and Concourse pipeline templates that every team calls into instead of copy-pasting YAML. Build, test, lint, scan, sign, publish, deploy: each step lives once, gets tested, and is versioned independently. Consumer repos reference the library with a single pinned line (uses: org/.github/.github/workflows/build.yml@v3.2.1). Composite actions handle the standard scaffolding: cloud auth via OIDC and Workload Identity Federation, secret rotation, OPA policy gating, SonarQube quality checks. The library is the contract for what "shipping software" looks like at the org: owned centrally, consumed everywhere, deprecated on a published window.

$ library scope . . . . . . . build · test · scan · sign · publish · deploy
$ versioning . . . . . . . . . semver · pinned by consumers · deprecation windows
$ auth model . . . . . . . . . oidc + wif · zero long-lived pipeline secrets
$ consistency . . . . . . . . one definition of "shipping" across every repo
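On the consumer side, "pinned by consumers" can be enforced with a small lint step that fails when a repo calls the library by branch or floating tag. This sketch assumes the library lives in the org's `.github` repository and that only exact `vX.Y.Z` tags count as pinned; the prefix and the regexes are illustrative, and a real check would more likely parse the workflow YAML than scan lines.

```python
import re

# Hypothetical: the shared workflow library lives in the org's ".github" repository.
LIBRARY_PREFIX = "org/.github/"
SEMVER_TAG = re.compile(r"^v\d+\.\d+\.\d+$")
USES_LINE = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^'\"\s#]+)")


def unpinned_library_refs(workflow_text):
    """Return (line number, target) for library refs not pinned to an exact vX.Y.Z tag."""
    problems = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES_LINE.match(line)
        if not match:
            continue
        target = match.group(1)
        if not target.startswith(LIBRARY_PREFIX):
            continue  # third-party actions are out of scope for this check
        _, _, ref = target.partition("@")
        if not SEMVER_TAG.match(ref):
            problems.append((lineno, target))
    return problems


sample = """
jobs:
  build:
    uses: org/.github/.github/workflows/build.yml@v3.2.1
  deploy:
    uses: org/.github/.github/workflows/deploy.yml@main
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
"""
print(unpinned_library_refs(sample))  # → [(6, 'org/.github/.github/workflows/deploy.yml@main')]
```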

// frameworks one pattern, many domains [ 4 ]

PRJ-002

YAML / JSON Deployment Framework

terraform json-schema opa go

A self-service abstraction: devs write a small YAML or JSON file describing what they need (a service, a bucket, a vault, a queue). The framework validates against JSON Schema, applies OPA policy, picks the right Terraform module + version, and produces a fully planned change. Same input file works across GCP, AWS, and OCI — the headline pattern that the next three frameworks specialize.

$ modules abstracted . . . . . 40+
$ clouds supported . . . . . . gcp · aws · oci
$ learning curve for devs . . ~1 page of docs
$ policy violations at apply . caught at PR-time
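The "picks the right Terraform module + version" step can be sketched as a catalog lookup keyed by request `kind` and `cloud`, which is also how one input vocabulary can land on three clouds. The catalog contents, module names, and the rule that a partial pin like `v3` means "newest v3.x.y" are assumptions; JSON Schema validation and OPA policy, which run before this step, are left out.

```python
# Hypothetical catalog: (kind, cloud) -> (module name, published release tags).
CATALOG = {
    ("BigQueryDataset", "gcp"): ("bigquery-dataset", ["v2.4.0", "v3.0.0", "v3.1.2"]),
    ("Bucket", "gcp"): ("gcs-bucket", ["v1.2.0", "v1.3.1"]),
    ("Bucket", "aws"): ("s3-bucket", ["v2.0.0", "v2.1.0"]),
    ("Bucket", "oci"): ("oci-object-storage-bucket", ["v1.0.3"]),
}


def parse_tag(tag):
    """'v3.1.2' -> (3, 1, 2); 'v3' -> (3,)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def resolve_module(kind, cloud, pin=None):
    """Return (module, version): newest release matching the pin, or newest overall."""
    if (kind, cloud) not in CATALOG:
        raise ValueError(f"no module for kind={kind!r} on cloud={cloud!r}")
    module, releases = CATALOG[(kind, cloud)]
    candidates = sorted(releases, key=parse_tag)  # numeric sort: v3.10.0 > v3.9.0
    if pin is not None:
        wanted = parse_tag(pin)
        candidates = [t for t in candidates if parse_tag(t)[: len(wanted)] == wanted]
    if not candidates:
        raise ValueError(f"no release of {module} matches {pin!r}")
    return module, candidates[-1]


print(resolve_module("Bucket", "aws"))                 # → ('s3-bucket', 'v2.1.0')
print(resolve_module("BigQueryDataset", "gcp", "v3"))  # → ('bigquery-dataset', 'v3.1.2')
```

Sorting on parsed integers rather than strings matters once any module reaches a two-digit minor or patch version.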

PRJ-005

AWS Workloads IaC Framework

terraform aws lambda ecs ec2 yaml

An application-team specialization of the YAML framework targeting AWS workloads. Developers describe a service in a single file — runtime (Lambda, ECS task, EC2/ASG), networking, IAM, event sources, queues, buckets, databases, secrets — and the framework compiles it into a full Terraform stack with least-privilege IAM wired in by default. Same author writes lambda.fn.yaml on Monday and ecs.svc.yaml on Tuesday; the abstraction stays consistent.

$ compute primitives . . . . . lambda · ecs · ec2 · asg · batch
$ service primitives . . . . . sqs · sns · s3 · rds · secrets mgr
$ iam policy . . . . . . . . . least-privilege, generated per service
$ time to ship a new svc . . . ~minutes, not sprints
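Least-privilege IAM "generated per service" can be sketched as a lookup from each declared dependency to the narrowest action set it needs, scoped to that one resource ARN. The dependency format and the action table are hypothetical; the SQS consume actions listed are the ones a Lambda SQS event source needs, and any real table should be checked against the AWS IAM service authorization reference.

```python
import json

# Hypothetical table: (service, access) -> narrowest IAM actions for that use.
ACCESS_ACTIONS = {
    ("sqs", "consume"): ["sqs:ReceiveMessage", "sqs:DeleteMessage", "sqs:GetQueueAttributes"],
    ("sqs", "produce"): ["sqs:SendMessage"],
    ("s3", "read"): ["s3:GetObject"],
    ("s3", "write"): ["s3:PutObject"],
    ("secretsmanager", "read"): ["secretsmanager:GetSecretValue"],
}


def policy_for(dependencies):
    """Build an IAM policy document: one Allow statement per dependency, one ARN each."""
    statements = []
    for dep in dependencies:
        key = (dep["service"], dep["access"])
        if key not in ACCESS_ACTIONS:
            raise ValueError(f"no least-privilege mapping for {key}")
        statements.append({
            "Effect": "Allow",
            "Action": sorted(ACCESS_ACTIONS[key]),
            "Resource": [dep["arn"]],  # never "*": scope to the declared resource
        })
    return {"Version": "2012-10-17", "Statement": statements}


deps = [
    {"service": "sqs", "access": "consume", "arn": "arn:aws:sqs:eu-west-1:111122223333:orders"},
    {"service": "s3", "access": "read", "arn": "arn:aws:s3:::orders-archive/*"},
]
print(json.dumps(policy_for(deps), indent=2))
```

Failing on an unknown `(service, access)` pair, instead of falling back to a broad action, is what keeps the default least-privilege.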

PRJ-004

BigQuery & Dataplex IaC Framework

terraform bigquery dataplex yaml gcp

A data-team specialization of the YAML framework. Analytics engineers and data engineers describe datasets, tables, views, materialized views, scheduled queries, routines, and Dataplex lakes / zones / assets in a single declarative file. The schema bakes in data classification (PII, restricted, public), access patterns, and lifecycle rules — the framework compiles it all to Terraform underneath. Data people ship their own infrastructure without ever opening an HCL file.

$ bq objects abstracted . . . datasets · tables · views · routines
$ dataplex coverage . . . . . lakes · zones · assets · tasks
$ data classification . . . . pii / restricted enforced pre-apply
$ self-serve adoption . . . . analysts onboarded · 0 hcl written
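The pre-apply enforcement can be sketched as a function that returns policy violations for a request's `governance:` block. In the framework this likely lives in OPA with the other policies; this Python stand-in and its four rules (for example, that `pii: true` forces `restricted`) are assumptions inferred from the example request at the top of the page, not the real policy.

```python
def governance_violations(governance):
    """Return human-readable violations for a request's governance block (hypothetical rules)."""
    errors = []
    data_class = governance.get("data_class")
    if data_class not in {"public", "restricted"}:
        errors.append(f"data_class must be 'public' or 'restricted', got {data_class!r}")
    if governance.get("pii") is True and data_class != "restricted":
        errors.append("pii datasets must be classified 'restricted'")
    if data_class == "restricted" and not governance.get("dataplex_zone"):
        errors.append("restricted datasets must be registered in a Dataplex zone")
    if not governance.get("cost_center"):
        errors.append("cost_center is required for attribution")
    return errors


# The governance block from the dataset.bq.yaml example passes every rule.
example = {"data_class": "restricted", "pii": True,
           "dataplex_zone": "sales-curated", "cost_center": "CC-7741"}
print(governance_violations(example))  # → []
```

Returning every violation at once, rather than stopping at the first, lets a PR comment list everything the requester needs to fix in one pass.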

PRJ-007

Identity & SaaS Access Framework

terraform okta gcp iam iap jira yaml

The framework pattern extended past infrastructure into identity. Access requests — Okta group membership, application assignment, GCP custom IAM roles & bindings, and other SaaS entitlements — flow through Jira. The requester's manager approves the ticket; the webhook pipeline validates the request against policy, then Terraform applies the change against Okta, GCP IAM, GCP Identity-Aware Proxy (IAP) bindings, and the relevant SaaS APIs. The manager never needs Okta admin rights. The requester never needs GCP project ownership. Approvers approve; the system grants. Audit lives in Jira and Git, not in a spreadsheet.

$ saas providers managed . . . okta · github · pagerduty · slack
$ gcp iam . . . . . . . . . . custom roles · bindings · org policies
$ app access . . . . . . . . . gcp iap · google identity · no vpn
$ approval model . . . . . . . manager-approves jira → tf applies
$ shadow admins . . . . . . . . eliminated
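The "approvers approve; the system grants" rule reduces to a check the pipeline can run before Terraform touches Okta or IAM. This is a sketch only: the manager lookup, the entitlement naming scheme, and the allow-list are hypothetical, and a real pipeline would read the manager from the identity provider rather than a dict.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical directory: requester -> manager. A real pipeline would query the IdP.
MANAGERS = {"ana@example.com": "raj@example.com", "raj@example.com": "li@example.com"}
# Hypothetical entitlements that may be granted through a ticket at all.
GRANTABLE = {"okta:group:data-analysts", "gcp:role:roles/bigquery.dataViewer"}


@dataclass(frozen=True)
class AccessRequest:
    ticket: str
    requester: str
    entitlement: str
    approver: Optional[str] = None  # set when someone approves the Jira ticket


def decide(req, managers=MANAGERS, grantable=GRANTABLE):
    """Return (granted, reason); only the requester's own manager can approve."""
    if req.entitlement not in grantable:
        return False, "entitlement is not self-service"
    if req.approver is None:
        return False, "ticket not approved"
    if req.approver == req.requester:
        return False, "self-approval is not allowed"
    if managers.get(req.requester) != req.approver:
        return False, "approver is not the requester's manager"
    return True, "approved by manager"


print(decide(AccessRequest("JIRA-5001", "ana@example.com",
                           "okta:group:data-analysts", approver="raj@example.com")))
# → (True, 'approved by manager')
```

The reason string exists so a denial can be explained on the ticket instead of failing silently.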

// platform tooling the systems that orchestrate, observe, govern [ 4 ]

PRJ-001

Jira → Terraform Bridge

jira webhooks github actions wif json schema terraform

A serverless bridge between Jira and a Terraform monorepo. Jira fires a JSON webhook on ticket transition; a GitHub Actions workflow receives it, validates the payload against a JSON Schema, generates a YAML request from the ticket fields, opens a PR, runs terraform plan, and posts the plan back into the ticket as a comment. Ticket approval triggers the apply workflow, which authenticates to GCP, AWS, and OCI via Workload Identity Federation — no long-lived keys, no service-account JSON files, no static cloud credentials anywhere in CI.

$ request lead-time . . . . . from 4 days → 38 min
$ ticket-to-resource trace . . 100% auditable in jira + git
$ auth model . . . . . . . . . wif · zero long-lived cloud creds
$ manual provisioning . . . . . eliminated
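The webhook's first job is deciding whether an event should start a plan at all. The sketch below assumes a Jira Cloud `jira:issue_updated` payload with a `changelog.items` list, a hypothetical "Ready for Plan" status as the trigger, and a hypothetical custom field `customfield_10050` holding the request as JSON; it swaps the JSON Schema validation for a required-key check to stay within the standard library.

```python
import json

# Hypothetical workflow status whose transition starts a plan.
TRIGGER_STATUS = "Ready for Plan"
# Hypothetical custom field that stores the request body as a JSON string.
REQUEST_FIELD = "customfield_10050"
REQUIRED_KEYS = ("kind", "apiVersion", "spec")


def new_status(payload):
    """Return the status an issue moved to, or None if this was not a status change."""
    if payload.get("webhookEvent") != "jira:issue_updated":
        return None
    for item in payload.get("changelog", {}).get("items", []):
        if item.get("field") == "status":
            return item.get("toString")
    return None


def request_from_ticket(payload):
    """Build a request dict from a triggering webhook, or return None to ignore it."""
    if new_status(payload) != TRIGGER_STATUS:
        return None
    issue = payload["issue"]
    raw = issue["fields"].get(REQUEST_FIELD)
    if not raw:
        raise ValueError(f"{issue['key']}: ticket has no request payload")
    request = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in request]
    if missing:
        raise ValueError(f"{issue['key']}: request missing {missing}")
    # Stamp the ticket key so every generated PR and plan traces back to it.
    request.setdefault("metadata", {})["jira"] = issue["key"]
    return request
```

Returning `None` for non-matching events, rather than raising, keeps the workflow green for the many ticket updates that are not infrastructure requests.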

PRJ-008

Blast Radius — Jira Forge App

atlassian forge typescript jira cloud yaml change mgmt

An Atlassian Forge app that turns production deployment tickets in Jira into risk briefings. Every application repository ships a blast-radius.yaml declaring service information, owners, severity tier, and upstream / downstream dependencies. When a production deploy ticket is opened, the Forge app reads the YAML from the repo's main branch and renders a dynamic panel inside the issue view: what's changing, what depends on it, who owns each dependency, and the resulting blast radius. Approvers see the real risk profile before they click approve — not a hunch, not a wiki page someone last edited two years ago.

$ platform . . . . . . . . . . atlassian forge · jira cloud · typescript
$ yaml contract . . . . . . . per-repo, version-controlled, team-owned
$ approver context . . . . . . deps · owners · severity · downstream
$ change incidents . . . . . . caught at approval, not after deploy
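Computing the blast radius is a graph walk: start at the changed service and follow downstream edges until nothing new turns up. This breadth-first sketch assumes the `blast-radius.yaml` files have already been loaded into one dict and that tier 1 means most critical; the field names, service names, and tier convention are hypothetical.

```python
from collections import deque

# Hypothetical in-memory view of several repos' blast-radius.yaml files.
SERVICES = {
    "orders-api":    {"owner": "team-orders",   "tier": 1, "downstream": ["billing", "notifications"]},
    "billing":       {"owner": "team-payments", "tier": 1, "downstream": ["ledger"]},
    "notifications": {"owner": "team-comms",    "tier": 3, "downstream": []},
    "ledger":        {"owner": "team-finance",  "tier": 1, "downstream": []},
}


def blast_radius(changed, services=SERVICES):
    """Breadth-first walk of downstream edges; returns {service: hops from the change}."""
    hops = {changed: 0}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dep in services.get(current, {}).get("downstream", []):
            if dep not in hops:  # the visited check also makes dependency cycles safe
                hops[dep] = hops[current] + 1
                queue.append(dep)
    del hops[changed]
    return hops


def briefing(changed, services=SERVICES):
    """Summarize affected services, their owners, and the most critical tier hit."""
    affected = blast_radius(changed, services)
    known = [s for s in affected if s in services]
    owners = sorted({services[s]["owner"] for s in known})
    worst_tier = min((services[s]["tier"] for s in known), default=None)
    return {"affected": affected, "owners": owners, "worst_tier": worst_tier}


print(briefing("orders-api"))
```

Breadth-first order gives each affected service its shortest hop distance, which is a reasonable proxy for how directly a change reaches it.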

PRJ-009

Multi-Cloud Resource Inventory

inventory gcp aws oci terraform state

A queryable catalog of every resource running across GCP, AWS, and OCI. The inventory ingests from three sources of truth: Terraform state (for what the platform believes exists), provider APIs (for what actually exists — and therefore drift), and the per-repo blast-radius.yaml files (for ownership and dependency metadata). Resources are indexed by owner, cost center, environment, tier, region, and dependency graph edges. The catalog feeds cost attribution reports, drift detection, the Blast Radius Forge app, and the access framework's policy decisions.

$ cloud coverage . . . . . . . gcp · aws · oci
$ data sources . . . . . . . . tf state · provider apis · repo yaml
$ queryable by . . . . . . . . owner · cost center · tier · env · region
$ drives . . . . . . . . . . . blast radius · cost · drift · access policy
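Drift falls out of comparing the first two sources. A sketch, assuming Terraform state and the provider APIs have both been normalized into `{resource_id: {attribute: value}}` maps; the ingestion itself is omitted and the resource IDs are hypothetical. Only attributes recorded in state are compared, so provider-only fields do not show up as noise.

```python
def drift_report(state, live):
    """Compare what Terraform state records against what provider APIs return.

    Both arguments map resource_id -> {attribute: value}.
    """
    report = {
        "unmanaged": sorted(live.keys() - state.keys()),  # in the cloud, in no state file
        "missing": sorted(state.keys() - live.keys()),    # in state, deleted out-of-band
        "changed": {},
    }
    for rid in sorted(state.keys() & live.keys()):
        # Compare only attributes Terraform manages; ignore provider-only fields.
        diffs = {
            attr: (want, live[rid].get(attr))
            for attr, want in state[rid].items()
            if live[rid].get(attr) != want
        }
        if diffs:
            report["changed"][rid] = diffs
    return report


state = {
    "bucket/orders-raw": {"location": "EU", "versioning": True},
    "bucket/orders-tmp": {"location": "EU", "versioning": False},
}
live = {
    "bucket/orders-raw": {"location": "EU", "versioning": False, "etag": "abc"},
    "bucket/legacy-exports": {"location": "US"},
}
print(drift_report(state, live))
```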

PRJ-011

Cloud Cost Optimization

finops infracost gcp aws oci karpenter

Cost work that runs continuously, not quarterly. Attribution is wired into every IaC tag (team, cost center, environment, tier), so the inventory produces showback / chargeback reports by ownership, not just a giant monthly bill. Infracost runs in CI on every Terraform PR and posts a dollar-delta comment before merge, so cost becomes a code-review concern instead of a billing-cycle surprise. Idle-resource sweeps from the inventory catch the long tail: orphan disks, unused public IPs, stale snapshots, dormant load balancers. Right-sizing recommendations from utilization data feed back into module defaults so the next deploy gets it right by default. For Kubernetes: spot / preemptible node pools and bin-packing through Karpenter (EKS) and cluster autoscaler (GKE / OKE). For data: BigQuery slot reservations vs. on-demand, chosen by workload pattern. Commitments (CUDs on GCP, Savings Plans on AWS, Universal Credits commitments on OCI) are sized from real consumption, not vendor sales-deck projections.

$ attribution model . . . . . team · cost center · env · tier (via iac tags)
$ pr-time visibility . . . . . infracost preview · dollar delta per change
$ commitments . . . . . . . . cud (gcp) · savings plans (aws) · universal credits (oci)
$ idle sweep . . . . . . . . . orphan disks · unused ips · stale snapshots
$ k8s spend . . . . . . . . . spot pools · karpenter bin-pack · right-sized requests
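Showback by ownership comes down to grouping billing line items by a tag value and keeping untagged spend visible instead of dropping it. The row shape below (`cost` plus a `tags` dict) is a hypothetical normalization; GCP, AWS, and OCI billing exports each use their own schema. `Decimal` avoids float rounding in money sums.

```python
from collections import defaultdict
from decimal import Decimal

UNTAGGED = "(untagged)"


def showback(line_items, key="cost_center"):
    """Sum cost per tag value, largest first; spend missing the tag is kept visible."""
    totals = defaultdict(Decimal)
    for item in line_items:
        totals[item.get("tags", {}).get(key, UNTAGGED)] += item["cost"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


items = [
    {"cost": Decimal("120.50"), "tags": {"cost_center": "CC-7741", "team": "team-analytics"}},
    {"cost": Decimal("80.25"), "tags": {"cost_center": "CC-7741"}},
    {"cost": Decimal("42.00"), "tags": {"cost_center": "CC-1002"}},
    {"cost": Decimal("15.75"), "tags": {}},
]
print(showback(items))              # by cost center
print(showback(items, key="team"))  # same rows, grouped by team
```

Grouping by `team` on the same rows shows why tag coverage matters: most of this sample's spend lands in the untagged bucket.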

// implementations proof at scale [ 1 ]

PRJ-003

OCI Landing Zone & Migration

oci terraform networking migration

Designed and rolled out an OCI landing zone from scratch (compartment hierarchy, tag defaults, network architecture, identity domains, and logging baseline) to support the migration of production workloads from GCP and AWS into OCI. Built the OCI IaC modules and wrapped them in the same YAML / JSON framework used elsewhere, so the application teams running the migrations shipped OCI infrastructure using a vocabulary they already knew. No second platform to learn, no second framework to maintain.

$ compartments provisioned . . ~80, fully tagged
$ tenancy guardrails . . . . . policy-as-code, drift-checked
$ migration scope . . . . . . . workloads from gcp + aws → oci
$ production cutover . . . . . zero customer impact

// 05 contact

$ let's_talk --about infra

Open to staff/principal DevOps / SRE, platform engineering lead, or cloud architect roles. Remote-first; long-term contracts welcome.

# off-hours: ESP32 + Home Assistant, Node-RED, MQTT. Same patterns as the day job, smaller blast radius.

email →

built in vim, deployed by a robot. last commit: main@a7f3c12 · passing
