aarch64 · NVIDIA GB10 · Production Ready · 30+ Containers · Apache 2.0

Private AI Stack. RAG & agents. One command.

AGmind deploys a full private AI platform in ~25 minutes — Dify and RAGFlow workflows, agentic stacks (Open Notebook, DB-GPT, plugin daemon), local vLLM / Ollama inference, Weaviate or Qdrant vectors, Grafana observability, TLS, firewall, backups — all on your hardware. Built for NVIDIA DGX Spark. Zero vendor lock-in, zero cloud subscriptions.

  • ~25 min to a working platform
  • 30+ containers orchestrated
  • 11 install phases
  • 100% data stays on-prem
root@agmind ~/AGmind bash
$ git clone https://github.com/botAGI/AGmind.git
Cloning into 'AGmind'… done.
$ cd AGmind && sudo bash install.sh
▸ Detecting platform…
aarch64 · Grace ARM 20-core · 128 GiB unified · NVIDIA GB10 (Blackwell)
▸ Running wizard… ~15 questions
Profile: VPS · LLM: vLLM / Gemma-4-26B · Vector: Weaviate
▸ Phase 5/11 · Pulling images
82%
36 / 37 containers healthy
TLS issued · UFW armed · Grafana live
Dify → https://ai.yourdomain.com
Grafana → https://ai.yourdomain.com/grafana
$
GPU UTIL: 84% · Gemma-4-26B
TOKENS/SEC: 23.8 tok/s · streaming
CONTAINERS: 36 / 37 healthy
DIFY · RAGFLOW · OLLAMA · vLLM · WEAVIATE · QDRANT · GRAFANA · PROMETHEUS · LOKI · ALLOY · DOCLING · LITELLM · TEI · SEARXNG · MINIO · AUTHELIA · OPEN WEBUI · POSTGRES · REDIS · NGINX
Capabilities

Everything a private AI platform needs — RAG, agents, ops — bundled.

Not a proof of concept. A real stack with agentic orchestration, vector search, TLS, monitoring, backups, 2FA, rate-limiting, and Day-2 operations built-in from phase 1.

RAG & Agent Platform

Dify orchestrator with plugin daemon for tool-using agents + RAGFlow knowledge bases + Open Notebook and DB-GPT for autonomous research and SQL agents + vLLM / Ollama inference + Weaviate or Qdrant + Docling OCR + Crawl4AI + SearXNG + MinIO. Open WebUI is an optional chat front-end. All wired as one Docker Compose project.

30+ containers · Single compose file · Zero YAML edits

GB10 Memory Tuning

Tuned for the 128 GB unified memory pool — AGmind allocates ~85 GiB to containers and calibrates vLLM context, KV cache, and TEI embeddings against the 121 GiB usable. VLLM_ATTENTION_BACKEND=TRITON_ATTN is set automatically (FP8 FlashInfer is broken on GB10).
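Concretely, the generated vLLM service ends up with flags along these lines — a sketch with illustrative numbers, not the exact values the installer computes from detected memory:

```shell
# Illustrative sketch of the GB10 tuning described above; the model-length,
# cache, and memory values are computed by the installer, not hardcoded.
export VLLM_ATTENTION_BACKEND=TRITON_ATTN   # FP8 FlashInfer path is broken on GB10
vllm serve google/gemma-4-26B-A4B-it \
  --kv-cache-dtype fp8 \
  --max-model-len 65536 \
  --gpu-memory-utilization 0.70             # leave headroom in the unified pool
```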

Production-Grade Security

30+ Linux capabilities dropped. UFW + fail2ban + Authelia 2FA. Secrets from /dev/urandom, chmod 600, auto-rotated. Nginx rate limiting. SSRF proxy for code sandbox.
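The secret-generation pattern is simple enough to show in full — a minimal sketch (the file name is illustrative):

```shell
# Minimal sketch of the secret pattern: 64 random bytes from /dev/urandom,
# written with owner-only permissions. The file name is illustrative.
umask 077                                    # new files start out owner-only
head -c 64 /dev/urandom | base64 -w0 > secret.key
chmod 600 secret.key
stat -c '%a' secret.key                      # prints 600
```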

Observability, Day 1

Prometheus + Grafana (5 pre-built dashboards) + Loki + Alertmanager. Telegram/webhook alerts. Node Exporter + cAdvisor. Portainer for visual ops.

Deployment Profiles

LAN — internal only, SSH-tunnel admin surfaces. VPS — public domain, automatic Let's Encrypt, Authelia 2FA, LiteLLM gateway on.

Backups & Disaster Recovery

Scheduled backups of PostgreSQL, Redis, volumes. agmind restore replays in minutes. DR-drill script validates the full recovery path on a schedule.
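The round trip the DR drill validates boils down to this shape — a toy sketch without the age encryption and S3 upload layers:

```shell
# Toy backup/restore round trip; the real system adds age encryption,
# optional S3 upload, and systemd timers. Paths are illustrative.
mkdir -p data && echo 'SELECT 1;' > data/db.sql
tar czf backup.tar.gz data                    # what a scheduled backup captures
mkdir -p restore && tar xzf backup.tar.gz -C restore
cmp data/db.sql restore/data/db.sql && echo 'restore verified'
```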

aarch64 · GB10 only · Dual-Spark cluster

x86_64 dropped 2026-04-25. AGmind targets NVIDIA DGX Spark (GB10 / Blackwell) and equivalent aarch64 hosts running DGX OS 7.5.0 (Ubuntu 24.04 arm64) with driver 580.x — do not upgrade past it. Optional dual-Spark master/worker over 200G QSFP.

One Command

From bare metal to RAG in ~25 minutes.

Clone. Run. Answer ~15 questions. The wizard generates configs, pulls images by digest, starts the stack, issues TLS, and prints your credentials. No manual YAML. No dangling state.

  • Auto-detects OS, CPU, GPU, memory, free disk, open ports
  • Installs Docker CE + NVIDIA Container Toolkit
  • Generates .env, nginx, Redis, secrets — all chmod 600
  • Creates admin users for Dify, Grafana, Portainer
  • Runs full healthcheck loop before returning
Interactive install
# Clone and run — the wizard does the rest
$ git clone https://github.com/botAGI/AGmind.git
$ cd AGmind
$ sudo bash install.sh
Non-interactive install (CI / IaC)
$ sudo DEPLOY_PROFILE=lan \
       LLM_PROVIDER=vllm \
       LLM_MODEL=google/gemma-4-26B-A4B-it \
       EMBED_PROVIDER=vllm \
       EMBEDDING_MODEL=deepvk/USER-bge-m3 \
       NON_INTERACTIVE=true \
       bash install.sh
Install Orchestrator

11 phases. ~15 questions. Zero guessing.

Every phase is idempotent and resumable. Fail mid-pull? Re-run the installer and it picks up exactly where it left off.
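The resume behaviour follows the usual marker-file pattern — a sketch; the marker layout here is an assumption, not AGmind's actual state format:

```shell
# Sketch of idempotent, resumable phases via marker files. The marker
# layout is an assumption, not AGmind's actual state format.
STATE_DIR=./state
mkdir -p "$STATE_DIR"
run_phase() {
  local n="$1" name="$2"
  if [ -f "$STATE_DIR/phase-$n.done" ]; then
    echo "phase $n ($name): done, skipping"
    return 0
  fi
  echo "phase $n ($name): running"
  # ... the phase's real work would happen here ...
  touch "$STATE_DIR/phase-$n.done"
}
run_phase 1 diagnostics      # runs
run_phase 1 diagnostics      # skipped on re-run
```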

01

Diagnostics

Detect OS, CPU, GPU, RAM, disk, free ports. Validate minimum requirements before touching the system.

OS · CPU · GPU probe · Port scan
02

Wizard

~15 questions: profile, vector DB, LLM provider, model, embeddings, TLS, monitoring, 2FA, backups, tunnel.

LAN / VPS · Weaviate / Qdrant · Ollama / vLLM
03

Docker

Docker CE + NVIDIA Container Toolkit (aarch64 repos). Verifies CDI device discovery for the GB10 GPU and pins driver to the 580.x line.

docker-ce arm64 · nvidia-ctk · CDI · GB10
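The CDI verification can be reproduced by hand with standard NVIDIA Container Toolkit commands — a sketch; exact output depends on driver and toolkit versions:

```shell
# Manual version of the phase-3 CDI check (standard nvidia-ctk commands).
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk cdi list                          # should list nvidia.com/gpu devices
docker run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi
```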
04

Config

Generates .env, nginx reverse proxy, Redis, secrets (/dev/urandom), LiteLLM routing, Authelia config.

.env (chmod 600) · nginx conf · secrets
05

Pull

Validates versions.env manifest against the registry. Pulls every image by digest — no :latest ever.

digest-pinned · mirror support · retry
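A stripped-down version of that validation — the NAME=image@sha256:… format of versions.env is an assumption for illustration:

```shell
# Stripped-down digest check: every entry must be pinned by sha256 digest,
# never a floating tag. The versions.env format here is an assumption.
cat > versions.env <<'EOF'
NGINX_IMAGE=nginx@sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
REDIS_IMAGE=redis@sha256:fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210
EOF
while IFS='=' read -r name image; do
  case "$image" in
    *@sha256:*) echo "ok   $name" ;;
    *)          echo "FAIL $name is not digest-pinned" >&2; exit 1 ;;
  esac
done < versions.env
```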
06

Start

docker compose up -d. Creates admin users for Dify and Open WebUI. Waits for initial bootstraps.

compose up · Dify init · admin user
07

Deploy Peer

Optional dual-Spark cluster mode. Pushes vLLM and dedicated GPU services to the worker node over the 200G QSFP link. Skipped on single-node installs.

cluster (optional) · 200G QSFP · peer vLLM
08

Health

Healthcheck loop per service with a timeout budget. Obtains Let's Encrypt certificate when profile is VPS.

healthchecks · TLS issue · DNS check
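The loop itself has this shape — the probe here is a stand-in for the real per-service check:

```shell
# Shape of a healthcheck loop with a timeout budget. The probe is a
# stand-in; the real loop checks each container's health endpoint.
probe() { [ -f /tmp/agmind-ready ]; }
wait_healthy() {
  local tries=0 budget=30
  until probe; do
    tries=$(( tries + 1 ))
    if [ "$tries" -ge "$budget" ]; then echo 'timeout'; return 1; fi
    sleep 1
  done
  echo 'healthy'
}
touch /tmp/agmind-ready       # simulate the service coming up
wait_healthy                  # prints: healthy
```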
09

Models

Downloads LLM + embeddings to the correct runtime (Ollama registry or HF cache). GPU-aware warm-up. Graceful timeout handling for large models.

Ollama pull · HF cache · warmup
10

Backups

Configures the backup system — age encryption, optional S3 upload, cron schedule. Wires agmind backup and agmind restore into systemd timers.

age encryption · cron · S3 (optional)
11

Complete

Installs CLI & systemd service. Initializes admin users. Writes credentials.txt (chmod 600). Prints the final endpoints table — including *.local mDNS hosts.

agmind CLI · systemd · mDNS endpoints
Day-2 Operations

The agmind CLI — operate without knowing Docker.

One command to see the whole stack. One command to back it up. One command to rotate every secret.

root@agmind ~# agmind status live
 
Built-In Observability

Grafana dashboards. Prometheus metrics. Loki logs. Out of the box.

5 dashboards pre-provisioned. Alerts routed to Telegram or webhook. Node Exporter + cAdvisor + custom GPU exporter.

Grafana home dashboard · last 12 h · auto-refresh 5 s
Panels: Service Status (35) · CPU Usage by Service · Memory Usage by Service · Network I/O · Disk Usage (10.7% used) · Disk Free (3.28 TiB / 3.27 TiB available) · Disk Total (2 × 3.67 TiB)
  • 5 dashboards
  • Loki logs
  • Alertmanager · Telegram + webhook
  • cAdvisor + Node Exporter
  • Custom GPU exporter
System Design

Three Docker networks. 40+ services. Mapped end-to-end.

Nginx is the only public-facing service. Code sandbox reaches the internet only through a Squid SSRF proxy with an allow-list. Everything else lives on the internal backend.

agmind-frontend: public · TLS terminates · only ingress
  • nginx: TLS · rate-limit · WAF rules
  • grafana: dashboards (also on backend)
  • portainer: container UI
agmind-backend: internal-only · 35+ services · non-routable from outside
Orchestration & agents
  • dify-api: workflows · RAG · agents
  • dify-worker: async tasks
  • plugin-daemon: tool plugins · MCP-style
  • ragflow: knowledge bases
  • open-notebook: research agent
  • dbgpt: SQL agent
  • open-webui: chat UI (optional)
Inference & extraction
  • vLLM: LLM · Gemma-4-26B
  • vLLM-embed: embeddings · bge-m3
  • vLLM-rerank: reranker
  • ollama: local model server
  • TEI: embed + rerank (CPU)
  • litellm: unified gateway
  • docling: OCR · doc → chunks
  • crawl4ai: web ingest
State & storage
  • postgres: metadata · users
  • redis: cache · queue · locks
  • weaviate: vector store
  • qdrant: vector store (alt)
  • minio: S3-compatible blobs
  • surrealdb: notebook graph store
  • elasticsearch: ragflow index
  • mysql: ragflow meta
  • searxng: private web search
Observability & auth
  • prometheus: metrics scrape
  • loki + alloy: log pipeline
  • alertmanager: routing · Telegram
  • cadvisor: container metrics
  • node-exporter: host metrics
  • exporters: redis · pg · nginx
  • authelia: SSO · 2FA · TOTP
  • certbot: Let's Encrypt
ssrf-network: isolated · no direct internet · sandbox only
  • code sandbox: dify-sandbox · runs user code
  • squid SSRF proxy: allow-list · forward proxy
  • approved hosts: no other egress

agmind-frontend

Nginx is the only thing exposed externally — it terminates TLS, applies rate-limit and security headers. Grafana and Portainer are dual-homed so they reach the host network without leaking onto the public side.

agmind-backend

Every data-plane service: Dify orchestration, agentic workers (RAGFlow, Open Notebook, DB-GPT), the GPU inference fleet (vLLM × 3 + Ollama + TEI), state stores, and the full observability pipeline. Non-routable from outside — services find each other by Compose DNS.

ssrf-network

Dify's code sandbox runs untrusted user code with zero direct internet. All outbound HTTP is funnelled through a Squid forward proxy with a strict allow-list — the only path off-host.

Measured on real hardware

DGX Spark · Gemma-4-26B · full numbers.

NVIDIA GB10, 128 GB unified memory. fp8 KV cache. 65K context window.

TTFT · streaming: 183 ms
Single-request throughput: 23.5 tok/s
3× concurrent aggregate: 50 tok/s
Context window: 65K tokens
Max concurrency @ 65K: 45 parallel
Total memory footprint: 95 GiB
Use cases

Built for engineers who ship AI in production.

ML Engineer
docs → Docling OCR → TEI embed → Weaviate → vLLM
Data Analyst
chat → Dify workflow → DB-GPT → SQL on prod DB
DevOps
Grafana · Prometheus · Loki · Telegram alerts
Team Lead
private ChatGPT · data never leaves the perimeter
Why AGmind

Cloud API vs. DIY stack vs. AGmind.

The same RAG platform. Three paths. One of them ships before lunch.

                              Cloud API              DIY stack               AGmind
                              OpenAI / Anthropic     compose from scratch    one command
Time to first chat            10 min                 2–4 weeks               ~25 minutes
Data stays on your hardware   ✗                      ✓                       ✓
Cost at scale                 Per-token · unbounded  Hardware + eng time     Hardware only
TLS, firewall, 2FA            Provider-managed       You write it            Phase 8 — automatic
Grafana + Prometheus + Loki   —                      Manual · days           Pre-provisioned
Backups · restore drills      —                      Bespoke scripts         Built in · cron-scheduled
Vendor lock-in                High                   None                    None · Apache 2.0
Secret rotation               Dashboard              Manual                  agmind rotate-secrets
GPU autodetect · VRAM split   —                      —                       ✓
Day-2 operations CLI          —                      —                       agmind status · doctor · backup
Security

Hardened from phase one. Not an afterthought.

Every service runs with least-privilege. Every secret is 64 random bytes from /dev/urandom. Every public surface is rate-limited.

30+ Linux caps dropped

Every container runs with cap_drop: ALL and only the specific capabilities it needs added back. No SYS_ADMIN anywhere.

cap_drop · no-new-privileges · read-only rootfs
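The same hardening expressed as docker run flags — the service name and the single re-added capability are illustrative, not AGmind's actual compose entries:

```shell
# docker run equivalent of the cap-drop hardening described above.
# Service name and the re-added capability are illustrative.
docker run -d --name agmind-nginx \
  --cap-drop ALL \
  --cap-add NET_BIND_SERVICE \
  --security-opt no-new-privileges \
  --read-only \
  nginx
```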

Authelia 2FA + SSO

Single-sign-on across Dify, Grafana, Portainer, Open WebUI. TOTP or WebAuthn. Session tokens rotated automatically.

TOTP · WebAuthn · SSO

Squid SSRF proxy

The code sandbox has zero direct egress. All outbound HTTP is funnelled through a Squid proxy with a strict allow-list.

allow-list · no egress · isolated netns

Nginx rate limiting

Per-IP limits on /v1/chat, /login, /api/*. Burst + sustained buckets. fail2ban bans on 5xx spikes.

limit_req · fail2ban · UFW

Secrets by design

64-byte secrets from /dev/urandom, written to credentials.txt with chmod 600. Rotation is a single command.

/dev/urandom · chmod 600 · zero-downtime rotation

Image pinning by digest

Every image pulled by sha256 digest — never :latest. The versions.env manifest is auditable and reproducible.

sha256 · reproducible · auditable
FAQ

Answers to what engineers actually ask.

What hardware do I actually need?

Hard requirement: aarch64 + NVIDIA GB10. AGmind targets NVIDIA DGX Spark — 20-core Grace ARM, 128 GB unified memory (AGmind allocates ~85 GiB for containers), Blackwell GB10. x86_64 is unsupported as of 2026-04-25 — the installer refuses to run. Equivalent aarch64 hosts with a Blackwell-class GPU should work but are not in the tested matrix. OS: DGX OS 7.5.0 (Ubuntu 24.04 arm64), driver 580.x — do not upgrade past it.

Does it really work offline / on-prem?

Yes. After phase 5 (image pull) the stack runs without outbound internet — service endpoints resolve via mDNS (agmind-dify.local, agmind-rag.local, …) so you don't even need a DNS server. LAN profile skips Let's Encrypt and uses self-signed TLS. SearXNG can be disabled for zero outbound traffic.

Can I bring my own models?

Absolutely. Any GGUF model for Ollama, any HuggingFace model for vLLM. Qwen, Llama, Gemma, Mistral, DeepSeek — all tested. LiteLLM lets you route to commercial APIs if you want a hybrid setup.

How do I upgrade without breaking things?

agmind update --check compares your pinned digests with upstream. agmind update --apply pulls new digests, restarts services in dependency order, and runs the healthcheck loop. If anything fails, rollback is one command.

Will it run on x86_64, RTX 40-series, AMD ROCm, or Apple Silicon?

No. AGmind dropped x86_64 on 2026-04-25 and is not built for ROCm or Apple Silicon. The installer is gated on aarch64 + NVIDIA GB10. Notable arm64 caveats handled automatically: Dify sandbox is amd64-only and runs through QEMU; RAGFlow uses a community-built arm64 image (ar2r223/ragflow-spark); TEI rerank stays on CPU where no arm64 GPU image exists.
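The amd64-via-QEMU caveat is easy to verify by hand, assuming binfmt/QEMU user emulation is registered on the host:

```shell
# Hypothetical sanity check (not part of the installer): run an amd64-only
# image on the aarch64 host through QEMU user emulation.
docker run --rm --platform linux/amd64 alpine uname -m
# Under emulation this reports x86_64 even though the host is aarch64.
```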

Can I use it commercially?

Yes. AGmind is Apache 2.0. Ship products on top of it, modify it, re-distribute it. Bundled components carry their own permissive licenses — Dify, vLLM, Ollama, Grafana stack all allow commercial use.

Is there commercial support?

Community support is on GitHub Discussions + Issues. For dedicated engineering, deployments, and SLAs, reach out through the repository.

Start now

Run one command. Own the entire stack.

Apache 2.0. No vendor lock-in. 30+ containers on your hardware, TLS on day one, Grafana on day one, backups on day one.

View on GitHub
$ git clone https://github.com/botAGI/AGmind.git && cd AGmind && sudo bash install.sh