Flux GitOps Migration Guide

A companion checklist for local rehearsal and a smooth cutover on the remote cluster is available in TEST_MIGRATION_PLAN.md.

Directory structure

flux/
├── clusters/
│   └── dev-cm/                          # Cluster-level orchestration
│       ├── kustomization.yaml           # Resource list
│       ├── sources.yaml                 # HelmRepository sources
│       ├── kube-system.yaml             # CoreDNS / NodeLocalDNS
│       ├── infra-devops.yaml            # cert-manager / reflector / velero
│       ├── infra-data.yaml              # CNPG / Valkey
│       ├── infra-monitor.yaml           # Loki / Prometheus
│       ├── infra-net.yaml               # Nginx / CrowdSec / Tailscale
│       ├── infra-gitops.yaml            # Gitea
│       └── apps.yaml                    # Halo / RustDesk / Fillcode / SinceAI
├── infrastructure/
│   ├── sources/                         # All HelmRepository definitions
│   ├── kube-system/                     # CoreDNS customization + NodeLocalDNS
│   ├── infra-devops/                    # cert-manager, webhook-dnspod, reflector, velero
│   ├── infra-data/                      # CNPG operator, Barman, PG cluster, Valkey
│   ├── infra-net/                       # ingress-nginx, CrowdSec, Tailscale DERP, certificates
│   ├── infra-monitor/                   # Loki, Promtail, Prometheus+Grafana
│   └── infra-gitops/                    # Gitea, Gitea Actions
└── apps/                                # Halo, RustDesk, Whoami, certificates, Ingress
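
As a rough illustration of the "resource list" role of clusters/dev-cm/kustomization.yaml, a minimal sketch might look like the following. The file names are taken from the tree above; the real file's contents are not shown in this guide, so the ordering and the absence of other fields are assumptions.

```yaml
# flux/clusters/dev-cm/kustomization.yaml (sketch; actual contents assumed)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - sources.yaml
  - kube-system.yaml
  - infra-devops.yaml
  - infra-data.yaml
  - infra-monitor.yaml
  - infra-net.yaml
  - infra-gitops.yaml
  - apps.yaml
```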

Dependency order

sources (HelmRepository)
    │
    ├── kube-system (no dependencies)
    │
    └── infra-devops (cert-manager → webhook-dnspod → ClusterIssuer, reflector, velero)
            │
            ├── infra-data (CNPG operator → Barman plugin → PG cluster + ObjectStore, Valkey)
            │       │
            │       ├── infra-monitor (Loki → Promtail, Prometheus+Grafana→PG)
            │       │       │
            │       │       ├── infra-net (Nginx, certificates, CrowdSec→Loki+PG, Tailscale)
            │       │       │
            │       │       └── infra-gitops (Gitea→PG+Valkey, Gitea Actions→Gitea)
            │       │
            │       └───────┴── apps (Halo→PG, RustDesk, Whoami, certificates, Ingress)
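
An ordering like this is typically expressed with dependsOn on the Flux Kustomization objects. The sketch below shows two links of the chain. The object names follow the diagram, but the paths, intervals, and the use of wait: true are assumptions rather than values copied from the repository.

```yaml
# Sketch only: paths, intervals, and `wait` are assumptions.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infra-devops
  namespace: infra-gitops
spec:
  interval: 10m
  path: ./flux/infrastructure/infra-devops
  prune: true
  wait: true                  # report Ready only after the applied resources are healthy
  sourceRef:
    kind: GitRepository
    name: flux
  dependsOn:
    - name: sources           # HelmRepository definitions must exist first
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infra-data
  namespace: infra-gitops
spec:
  interval: 10m
  path: ./flux/infrastructure/infra-data
  prune: true
  wait: true
  sourceRef:
    kind: GitRepository
    name: flux
  dependsOn:
    - name: infra-devops      # reconciled only after infra-devops is Ready
```

With wait: true on the upstream Kustomization, a downstream layer starts only once the upstream workloads are healthy, not merely applied.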

Resources that stay with K3s

The following resources continue to be managed by K3s (HelmChart) and are not migrated to Flux:

  • k3s/apps/infra/gitops/namespaces.yaml: the infra-gitops namespace
  • k3s/apps/infra/gitops/flux/helmchart.yaml: the flux-operator HelmChart
  • k3s/apps/infra/gitops/flux/flux-instance.yaml: the FluxInstance (includes the sync configuration)
  • k3s/apps/infra/gitops/flux/networkpolicy.yaml: the flux-operator NetworkPolicy
  • k3s/apps/infra/gitops/flux/clusterrolebinding.yaml: the flux-web RBAC

Migration steps

1. Create the Git authentication Secret

Flux needs HTTPS credentials to access the Gitea repository. Create the Secret in the cluster:

kubectl -n infra-gitops create secret generic flux-git-auth \
    --from-literal=username=<GITEA_USERNAME> \
    --from-literal=password=<GITEA_ACCESS_TOKEN>

2. Confirm the repository URL

Check the sync.url field in k3s/apps/infra/gitops/flux/flux-instance.yaml and make sure it points to the correct deploy repository. It is currently set to:

sync:
  url: https://git.dev.cm/devcm/deploy.git

If your organization name or repository name differs, change it accordingly.
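
For orientation, the sync block of a FluxInstance usually carries more than the URL. The fragment below is a sketch of how the fields might fit together. Only url is quoted from this guide; ref, path, and pullSecret are assumptions inferred from the other steps, so compare them against the actual file.

```yaml
# Fragment of k3s/apps/infra/gitops/flux/flux-instance.yaml (sketch).
# Only `url` is quoted from this guide; the other values are assumptions.
spec:
  sync:
    kind: GitRepository
    url: https://git.dev.cm/devcm/deploy.git
    ref: refs/heads/main          # assumed from the push to main in step 3
    path: flux/clusters/dev-cm    # assumed from the directory structure
    pullSecret: flux-git-auth     # the Secret created in step 1
```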

3. Commit and push the Flux manifests

git add flux/
git add k3s/apps/infra/gitops/flux/flux-instance.yaml
git commit -m "feat: migrate to Flux GitOps management"
git push origin main

4. Apply the updated FluxInstance

After the FluxInstance sync configuration is updated, K3s detects the change and re-applies it automatically. You can also trigger it manually:

kubectl apply -f k3s/apps/infra/gitops/flux/flux-instance.yaml

This causes flux-operator to create:

  • GitRepository/flux: watches the deploy repository
  • Kustomization/flux: applies all resources under the flux/clusters/dev-cm/ path
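
The two generated objects can be pictured roughly as follows. This is an approximation for reading kubectl output, not something to apply by hand. The names, namespace, URL, and path come from this guide, while the branch, intervals, and prune setting are assumptions.

```yaml
# Approximation of the objects flux-operator generates; do not apply manually.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux
  namespace: infra-gitops
spec:
  interval: 1m                    # assumed
  url: https://git.dev.cm/devcm/deploy.git
  ref:
    branch: main                  # assumed branch
  secretRef:
    name: flux-git-auth           # HTTPS credentials from step 1
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux
  namespace: infra-gitops
spec:
  interval: 10m                   # assumed
  path: flux/clusters/dev-cm
  prune: true                     # assumed
  sourceRef:
    kind: GitRepository
    name: flux
```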

5. Wait for Flux to finish syncing

# Check the GitRepository status
kubectl -n infra-gitops get gitrepository flux

# Check the status of all Kustomizations
kubectl -n infra-gitops get kustomization

# Check the status of all HelmReleases
kubectl get helmrelease -A

# Watch Flux events in real time
kubectl -n infra-gitops get events --watch

Wait until every Kustomization and HelmRelease reports Ready.

6. Verify that Flux has taken over the resources

For each existing Helm release, Flux detects the resources that are already deployed and takes them over (adopts them). To verify:

# Check whether all HelmReleases are ready
kubectl get helmrelease -A -o wide

# Inspect a specific release
kubectl -n infra-devops describe helmrelease cert-manager

7. Clean up the old K3s HelmChart resources

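Once you have confirmed that Flux has taken over all resources, delete the old K3s HelmChart CRs (per the note below, this is expected not to affect the deployed applications):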

# List all K3s HelmCharts
kubectl get helmchart -A

# Delete them one by one (keep flux-operator)
kubectl delete helmchart -n infra-devops cert-manager
kubectl delete helmchart -n infra-devops cert-manager-webhook-dnspod
kubectl delete helmchart -n infra-devops reflector
kubectl delete helmchart -n infra-devops velero
kubectl delete helmchart -n infra-data cloudnative-pg
kubectl delete helmchart -n infra-data cloudnative-pg-plugin-barman
kubectl delete helmchart -n infra-data valkey-cluster-sh
kubectl delete helmchart -n infra-net ingress-nginx
kubectl delete helmchart -n infra-net crowdsec
kubectl delete helmchart -n infra-net tailscale-derp-hk
kubectl delete helmchart -n infra-monitor loki
kubectl delete helmchart -n infra-monitor loki-promtail
kubectl delete helmchart -n infra-monitor prometheus
kubectl delete helmchart -n infra-gitops gitea
kubectl delete helmchart -n infra-gitops gitea-actions
kubectl delete helmchart -n apps fillcode-whoami
kubectl delete helmchart -n apps halo
kubectl delete helmchart -n apps rustdesk

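Note: K3s HelmChart uses the helm.cattle.io/v1 API. This step relies on deleting a HelmChart CR without uninstalling the deployed Helm release, so that the Flux HelmRelease can take over its ongoing management. The K3s helm-controller can run an uninstall job when a HelmChart is deleted, so confirm the behavior with a single low-risk chart (for example during the rehearsal in TEST_MIGRATION_PLAN.md) before deleting the rest.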

8. Clean up the old K3s manifest files

Once everything is confirmed to be working, you can remove the files under k3s/apps/ that have been migrated to Flux (keep the Flux-related ones):

# Keep the following files (still managed by K3s):
# k3s/apps/infra/gitops/namespaces.yaml
# k3s/apps/infra/gitops/flux/

# All other files can be deleted or archived

Resource mapping table

| Original K3s HelmChart | Flux HelmRelease | Namespace |
|---|---|---|
| cert-manager | cert-manager | infra-devops |
| cert-manager-webhook-dnspod | cert-manager-webhook-dnspod | infra-devops |
| reflector | reflector | infra-devops |
| velero | velero | infra-devops |
| cloudnative-pg | cloudnative-pg | infra-data |
| cloudnative-pg-plugin-barman | cloudnative-pg-plugin-barman | infra-data |
| valkey-cluster-sh | valkey-cluster-sh | infra-data |
| ingress-nginx | ingress-nginx | infra-net |
| crowdsec | crowdsec | infra-net |
| tailscale-derp-hk | tailscale-derp-hk | infra-net |
| loki | loki | infra-monitor |
| loki-promtail | loki-promtail | infra-monitor |
| prometheus | prometheus | infra-monitor |
| gitea | gitea | infra-gitops |
| gitea-actions | gitea-actions | infra-gitops |
| fillcode-whoami | fillcode-whoami | apps |
| halo | halo | apps |
| rustdesk | rustdesk | apps |

Dependencies between HelmReleases

| HelmRelease | dependsOn |
|---|---|
| cert-manager-webhook-dnspod | cert-manager |
| cloudnative-pg-plugin-barman | cloudnative-pg |
| loki-promtail | loki |
| crowdsec | ingress-nginx, loki (cross-ns) |
| gitea-actions | gitea |
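
The crowdsec row is the only one with a cross-namespace dependency, so it is worth sketching. In the example below the release name, namespaces, and dependency targets follow the tables above. The chart name, the HelmRepository name, and the interval are hypothetical placeholders.

```yaml
# Sketch: chart name, repository name, and interval are assumptions.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: crowdsec
  namespace: infra-net
spec:
  interval: 30m
  dependsOn:
    - name: ingress-nginx          # same namespace (infra-net)
    - name: loki
      namespace: infra-monitor     # cross-namespace dependency
  chart:
    spec:
      chart: crowdsec              # assumed chart name
      sourceRef:
        kind: HelmRepository
        name: crowdsec             # hypothetical repository name
        namespace: infra-gitops
```

A dependsOn entry without a namespace resolves within the HelmRelease's own namespace, which is why only the loki entry needs one.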

Notes

  1. Helm release takeover: By default, Flux detects an existing Helm release that has the same name as the HelmRelease. If the names do not match, set the original name in spec.releaseName.

  2. CRD management: The cert-manager and kube-prometheus-stack HelmReleases set install.crds: CreateReplace and upgrade.crds: CreateReplace so that their CRDs are managed correctly.

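  3. Cross-namespace references: All HelmRepository objects live in the infra-gitops namespace, and each HelmRelease references them across namespaces via sourceRef.namespace: infra-gitops. The FluxInstance is configured in single-tenant mode (multitenant: false), which permits this.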

  4. kube-system resources: The kube-system Kustomization sets prune: false to prevent Flux from accidentally deleting system resources.

  5. Velero CRDs: The Velero HelmRelease keeps upgradeCRDs: false, matching the original configuration.

  6. Sensitive data: The following Secrets must be maintained manually (they are not managed in Git):

    • flux-git-auth (Gitea access token)
    • dnspod-secret (DNSPod API credentials)
    • s3-devcm-hw (Huawei Cloud OBS credentials)
    • cnpg17-cluster-*-app (PostgreSQL passwords, managed automatically by CNPG)
    • valkey-cluster-sh (Valkey password)
    • gitea-actions (Gitea Actions runner token)
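
Notes 1 to 4 can be sketched together with cert-manager and kube-system as examples. The names cert-manager, infra-devops, infra-gitops, and kube-system and the settings called out in the notes come from this guide. The HelmRepository name, intervals, and path are assumptions, and the manifests in the repository may differ.

```yaml
# Sketch for notes 1-4; values marked "assumed" are not taken from the repository.
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: jetstack                   # assumed repository name
  namespace: infra-gitops          # note 3: all sources live here
spec:
  interval: 1h                     # assumed
  url: https://charts.jetstack.io
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cert-manager
  namespace: infra-devops
spec:
  interval: 30m                    # assumed
  releaseName: cert-manager        # note 1: must match the existing Helm release
  chart:
    spec:
      chart: cert-manager
      sourceRef:
        kind: HelmRepository
        name: jetstack
        namespace: infra-gitops    # note 3: cross-namespace reference
  install:
    crds: CreateReplace            # note 2
  upgrade:
    crds: CreateReplace            # note 2
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: kube-system
  namespace: infra-gitops
spec:
  interval: 10m                    # assumed
  path: ./flux/infrastructure/kube-system   # assumed path
  prune: false                     # note 4: never garbage-collect system resources
  sourceRef:
    kind: GitRepository
    name: flux
```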