aarch64 · NVIDIA GB10 · Production Ready · 30+ Containers · Apache 2.0

Private AI Stack. RAG & agents. One command.

AGmind deploys a full private AI platform in ~25 minutes — Dify and RAGFlow workflows, agentic stacks (Open Notebook, DB-GPT, plugin daemon), local vLLM / Ollama inference, Weaviate or Qdrant vectors, Grafana observability, TLS, firewall, backups — all on your hardware. Built for NVIDIA DGX Spark. Zero vendor lock-in, zero cloud subscriptions.

  • ~25 min to a working platform
  • 30+ containers orchestrated
  • 11 install phases
  • 100% data stays on-prem
root@agmind ~/AGmind bash
$ git clone https://github.com/botAGI/AGmind.git
Cloning into 'AGmind'… done.
$ cd AGmind && sudo bash install.sh
▸ Detecting platform…
aarch64 · Grace ARM 20-core · 128 GiB unified · NVIDIA GB10 (Blackwell)
▸ Running wizard… ~15 questions
Profile: VPS · LLM: vLLM / Gemma-4-26B · Vector: Weaviate
▸ Phase 5/11 · Pulling images
82%
36 / 37 containers healthy
TLS issued · UFW armed · Grafana live
Dify → https://ai.yourdomain.com
Grafana → https://ai.yourdomain.com/grafana
$
GPU UTIL: 84% · Gemma-4-26B
TOKENS/SEC: 23.8 tok/s · streaming
CONTAINERS: 36 / 37 healthy
DIFY · RAGFLOW · OLLAMA · vLLM · WEAVIATE · QDRANT · GRAFANA · PROMETHEUS · LOKI · ALLOY · DOCLING · LITELLM · TEI · SEARXNG · MINIO · AUTHELIA · OPEN WEBUI · POSTGRES · REDIS · NGINX
Capabilities

Everything a private AI platform needs — RAG, agents, ops — bundled.

Not a proof of concept. A real stack with agentic orchestration, vector search, TLS, monitoring, backups, 2FA, rate-limiting, and Day-2 operations built-in from phase 1.

RAG & Agent Platform

Dify orchestrator with plugin daemon for tool-using agents + RAGFlow knowledge bases + Open Notebook and DB-GPT for autonomous research and SQL agents + vLLM / Ollama inference + Weaviate or Qdrant + Docling OCR + Crawl4AI + SearXNG + MinIO. Open WebUI is an optional chat front-end. All wired as one Docker Compose project.

30+ containers · Single compose file · Zero YAML edits

GB10 Memory Tuning

Tuned for the 128 GB unified memory pool — AGmind allocates ~85 GiB to containers and calibrates vLLM context, KV cache, and TEI embeddings against the 121 GiB usable. VLLM_ATTENTION_BACKEND=TRITON_ATTN is set automatically (FP8 FlashInfer is broken on GB10).
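Concretely, the generated vLLM service ends up with flags along these lines — a sketch with illustrative numbers, not the exact values the installer computes from detected memory:

```shell
# Illustrative sketch of the GB10 tuning described above; the model-length,
# cache, and memory values are computed by the installer, not hardcoded.
export VLLM_ATTENTION_BACKEND=TRITON_ATTN   # FP8 FlashInfer path is broken on GB10
vllm serve google/gemma-4-26B-A4B-it \
  --kv-cache-dtype fp8 \
  --max-model-len 65536 \
  --gpu-memory-utilization 0.70             # leave headroom in the unified pool
```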

Production-Grade Security

30+ Linux capabilities dropped. UFW + fail2ban + Authelia 2FA. Secrets from /dev/urandom, chmod 600, auto-rotated. Nginx rate limiting. SSRF proxy for code sandbox.
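The secret-generation pattern is simple enough to show in full — a minimal sketch (the file name is illustrative):

```shell
# Minimal sketch of the secret pattern: 64 random bytes from /dev/urandom,
# written with owner-only permissions. The file name is illustrative.
umask 077                                    # new files start out owner-only
head -c 64 /dev/urandom | base64 -w0 > secret.key
chmod 600 secret.key
stat -c '%a' secret.key                      # prints 600
```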

Observability, Day 1

Prometheus + Grafana (5 pre-built dashboards) + Loki + Alertmanager. Telegram/webhook alerts. Node Exporter + cAdvisor. Portainer for visual ops.

Deployment Profiles

LAN — internal only, SSH-tunnel admin surfaces. VPS — public domain, automatic Let's Encrypt, Authelia 2FA, LiteLLM gateway on.

Backups & Disaster Recovery

Scheduled backups of PostgreSQL, Redis, volumes. agmind restore replays in minutes. DR-drill script validates the full recovery path on a schedule.
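The round trip the DR drill validates boils down to this shape — a toy sketch without the age encryption and S3 upload layers:

```shell
# Toy backup/restore round trip; the real system adds age encryption,
# optional S3 upload, and systemd timers. Paths are illustrative.
mkdir -p data && echo 'SELECT 1;' > data/db.sql
tar czf backup.tar.gz data                    # what a scheduled backup captures
mkdir -p restore && tar xzf backup.tar.gz -C restore
cmp data/db.sql restore/data/db.sql && echo 'restore verified'
```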

aarch64 · GB10 only · Dual-Spark cluster

x86_64 dropped 2026-04-25. AGmind targets NVIDIA DGX Spark (GB10 / Blackwell) and equivalent aarch64 hosts running DGX OS 7.5.0 (Ubuntu 24.04 arm64) with driver 580.x — do not upgrade past it. Optional dual-Spark master/worker over 200G QSFP.

One Command

From bare metal to RAG in ~25 minutes.

Clone. Run. Answer ~15 questions. The wizard generates configs, pulls images by digest, starts the stack, issues TLS, and prints your credentials. No manual YAML. No dangling state.

  • Auto-detects OS, CPU, GPU, memory, free disk, open ports
  • Installs Docker CE + NVIDIA Container Toolkit
  • Generates .env, nginx, Redis, secrets — all chmod 600
  • Creates admin users for Dify, Grafana, Portainer
  • Runs full healthcheck loop before returning
Interactive install
# Clone and run — the wizard does the rest
$ git clone https://github.com/botAGI/AGmind.git
$ cd AGmind
$ sudo bash install.sh
Non-interactive install (CI / IaC)
$ sudo DEPLOY_PROFILE=lan \
       LLM_PROVIDER=vllm \
       LLM_MODEL=google/gemma-4-26B-A4B-it \
       EMBED_PROVIDER=vllm \
       EMBEDDING_MODEL=deepvk/USER-bge-m3 \
       NON_INTERACTIVE=true \
       bash install.sh
Install Orchestrator

11 phases. ~15 questions. Zero guessing.

Every phase is idempotent and resumable. Fail mid-pull? Re-run the installer and it picks up exactly where it left off.
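The resume behaviour follows the usual marker-file pattern — a sketch; the marker layout here is an assumption, not AGmind's actual state format:

```shell
# Sketch of idempotent, resumable phases via marker files. The marker
# layout is an assumption, not AGmind's actual state format.
STATE_DIR=./state
mkdir -p "$STATE_DIR"
run_phase() {
  local n="$1" name="$2"
  if [ -f "$STATE_DIR/phase-$n.done" ]; then
    echo "phase $n ($name): done, skipping"
    return 0
  fi
  echo "phase $n ($name): running"
  # ... the phase's real work would happen here ...
  touch "$STATE_DIR/phase-$n.done"
}
run_phase 1 diagnostics      # runs
run_phase 1 diagnostics      # skipped on re-run
```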

01

Diagnostics

Detect OS, CPU, GPU, RAM, disk, free ports. Validate minimum requirements before touching the system.

OS · CPU · GPU probe · Port scan
02

Wizard

~15 questions: profile, vector DB, LLM provider, model, embeddings, TLS, monitoring, 2FA, backups, tunnel.

LAN / VPS · Weaviate / Qdrant · Ollama / vLLM
03

Docker

Docker CE + NVIDIA Container Toolkit (aarch64 repos). Verifies CDI device discovery for the GB10 GPU and pins driver to the 580.x line.

docker-ce arm64 · nvidia-ctk · CDI · GB10
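The CDI verification can be reproduced by hand with standard NVIDIA Container Toolkit commands — a sketch; exact output depends on driver and toolkit versions:

```shell
# Manual version of the phase-3 CDI check (standard nvidia-ctk commands).
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk cdi list                          # should list nvidia.com/gpu devices
docker run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi
```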
04

Config

Generates .env, nginx reverse proxy, Redis, secrets (/dev/urandom), LiteLLM routing, Authelia config.

.env (chmod 600) · nginx conf · secrets
05

Pull

Validates versions.env manifest against the registry. Pulls every image by digest — no :latest ever.

digest-pinned · mirror support · retry
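A stripped-down version of that validation — the NAME=image@sha256:… format of versions.env is an assumption for illustration:

```shell
# Stripped-down digest check: every entry must be pinned by sha256 digest,
# never a floating tag. The versions.env format here is an assumption.
cat > versions.env <<'EOF'
NGINX_IMAGE=nginx@sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
REDIS_IMAGE=redis@sha256:fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210
EOF
while IFS='=' read -r name image; do
  case "$image" in
    *@sha256:*) echo "ok   $name" ;;
    *)          echo "FAIL $name is not digest-pinned" >&2; exit 1 ;;
  esac
done < versions.env
```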
06

Start

docker compose up -d. Creates admin users for Dify and Open WebUI. Waits for initial bootstraps.

compose up · Dify init · admin user
07

Deploy Peer

Optional dual-Spark cluster mode. Pushes vLLM and dedicated GPU services to the worker node over the 200G QSFP link. Skipped on single-node installs.

cluster (optional) · 200G QSFP · peer vLLM
08

Health

Healthcheck loop per service with a timeout budget. Obtains Let's Encrypt certificate when profile is VPS.

healthchecks · TLS issue · DNS check
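The loop itself has this shape — the probe here is a stand-in for the real per-service check:

```shell
# Shape of a healthcheck loop with a timeout budget. The probe is a
# stand-in; the real loop checks each container's health endpoint.
probe() { [ -f /tmp/agmind-ready ]; }
wait_healthy() {
  local tries=0 budget=30
  until probe; do
    tries=$(( tries + 1 ))
    if [ "$tries" -ge "$budget" ]; then echo 'timeout'; return 1; fi
    sleep 1
  done
  echo 'healthy'
}
touch /tmp/agmind-ready       # simulate the service coming up
wait_healthy                  # prints: healthy
```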
09

Models

Downloads LLM + embeddings to the correct runtime (Ollama registry or HF cache). GPU-aware warm-up. Graceful timeout handling for large models.

Ollama pull · HF cache · warmup
10

Backups

Configures the backup system — age encryption, optional S3 upload, cron schedule. Wires agmind backup and agmind restore into systemd timers.

age encryption · cron · S3 (optional)
11

Complete

Installs CLI & systemd service. Initializes admin users. Writes credentials.txt (chmod 600). Prints the final endpoints table — including *.local mDNS hosts.

agmind CLI · systemd · mDNS endpoints
Day-2 Operations

The agmind CLI — operate without knowing Docker.

One command to see the whole stack. One command to back it up. One command to rotate every secret.

root@agmind ~# agmind status live
 
Built-In Observability

Grafana dashboards. Prometheus metrics. Loki logs. Out of the box.

5 dashboards pre-provisioned. Alerts routed to Telegram or webhook. Node Exporter + cAdvisor + custom GPU exporter.

Grafana home dashboard · last 12 h · auto-refresh 5 s
Panels: Service Status (35) · CPU Usage by Service · Memory Usage by Service · Network I/O · Disk Usage (10.7% used) · Disk Free (3.28 TiB / 3.27 TiB available) · Disk Total (2 × 3.67 TiB)
  • 5 dashboards
  • Loki logs
  • Alertmanager · Telegram + webhook
  • cAdvisor + Node Exporter
  • Custom GPU exporter
System Design

Three Docker networks. 40+ services. Mapped end-to-end.

Nginx is the only public-facing service. Code sandbox reaches the internet only through a Squid SSRF proxy with an allow-list. Everything else lives on the internal backend.

agmind-frontend: public · TLS terminates · only ingress
  • nginx: TLS · rate-limit · WAF rules
  • grafana: dashboards (also on backend)
  • portainer: container UI
agmind-backend: internal-only · 35+ services · non-routable from outside
Orchestration & agents
  • dify-api: workflows · RAG · agents
  • dify-worker: async tasks
  • plugin-daemon: tool plugins · MCP-style
  • ragflow: knowledge bases
  • open-notebook: research agent
  • dbgpt: SQL agent
  • open-webui: chat UI (optional)
Inference & extraction
  • vLLM: LLM · Gemma-4-26B
  • vLLM-embed: embeddings · bge-m3
  • vLLM-rerank: reranker
  • ollama: local model server
  • TEI: embed + rerank (CPU)
  • litellm: unified gateway
  • docling: OCR · doc → chunks
  • crawl4ai: web ingest
State & storage
  • postgres: metadata · users
  • redis: cache · queue · locks
  • weaviate: vector store
  • qdrant: vector store (alt)
  • minio: S3-compatible blobs
  • surrealdb: notebook graph store
  • elasticsearch: ragflow index
  • mysql: ragflow meta
  • searxng: private web search
Observability & auth
  • prometheus: metrics scrape
  • loki + alloy: log pipeline
  • alertmanager: routing · Telegram
  • cadvisor: container metrics
  • node-exporter: host metrics
  • exporters: redis · pg · nginx
  • authelia: SSO · 2FA · TOTP
  • certbot: Let's Encrypt
ssrf-network: isolated · no direct internet · sandbox only
  • code sandbox: dify-sandbox · runs user code
  • squid SSRF proxy: allow-list · forward proxy
  • approved hosts: no other egress

agmind-frontend

Nginx is the only thing exposed externally — it terminates TLS, applies rate-limit and security headers. Grafana and Portainer are dual-homed so they reach the host network without leaking onto the public side.

agmind-backend

Every data-plane service: Dify orchestration, agentic workers (RAGFlow, Open Notebook, DB-GPT), the GPU inference fleet (vLLM × 3 + Ollama + TEI), state stores, and the full observability pipeline. Non-routable from outside — services find each other by Compose DNS.

ssrf-network

Dify's code sandbox runs untrusted user code with zero direct internet. All outbound HTTP is funnelled through a Squid forward proxy with a strict allow-list — the only path off-host.

Measured on real hardware

DGX Spark · Gemma-4-26B · full numbers.

NVIDIA GB10, 128 GB unified memory. fp8 KV cache. 65K context window.

TTFT · streaming: 183 ms
Single-request throughput: 23.5 tok/s
3× concurrent aggregate: 50 tok/s
Context window: 65K tokens
Max concurrency @ 65K: 45 parallel
Total memory footprint: 95 GiB
Use cases

Built for engineers who ship AI in production.

ML Engineer
docs → Docling OCR → TEI embed → Weaviate → vLLM
Data Analyst
chat → Dify workflow → DB-GPT → SQL on prod DB
DevOps
Grafana · Prometheus · Loki · Telegram alerts
Team Lead
private ChatGPT · data never leaves the perimeter
Why AGmind

Cloud API vs. DIY stack vs. AGmind.

The same RAG platform. Three paths. One of them ships before lunch.

                              Cloud API              DIY stack               AGmind
                              OpenAI / Anthropic     compose from scratch    one command
Time to first chat            10 min                 2–4 weeks               ~25 minutes
Data stays on your hardware   ✗                      ✓                       ✓
Cost at scale                 Per-token · unbounded  Hardware + eng time     Hardware only
TLS, firewall, 2FA            Provider-managed       You write it            Phase 8 — automatic
Grafana + Prometheus + Loki   —                      Manual · days           Pre-provisioned
Backups · restore drills      —                      Bespoke scripts         Built in · cron-scheduled
Vendor lock-in                High                   None                    None · Apache 2.0
Secret rotation               Dashboard              Manual                  agmind rotate-secrets
GPU autodetect · VRAM split   —                      —                       ✓
Day-2 operations CLI          —                      —                       agmind status · doctor · backup
Security

Hardened from phase one. Not an afterthought.

Every service runs with least-privilege. Every secret is 64 random bytes from /dev/urandom. Every public surface is rate-limited.

30+ Linux caps dropped

Every container runs with cap_drop: ALL and only the specific capabilities it needs added back. No SYS_ADMIN anywhere.

cap_drop · no-new-privileges · read-only rootfs
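The same hardening expressed as docker run flags — the service name and the single re-added capability are illustrative, not AGmind's actual compose entries:

```shell
# docker run equivalent of the cap-drop hardening described above.
# Service name and the re-added capability are illustrative.
docker run -d --name agmind-nginx \
  --cap-drop ALL \
  --cap-add NET_BIND_SERVICE \
  --security-opt no-new-privileges \
  --read-only \
  nginx
```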

Authelia 2FA + SSO

Single-sign-on across Dify, Grafana, Portainer, Open WebUI. TOTP or WebAuthn. Session tokens rotated automatically.

TOTP · WebAuthn · SSO

Squid SSRF proxy

The code sandbox has zero direct egress. All outbound HTTP is funnelled through a Squid proxy with a strict allow-list.

allow-list · no egress · isolated netns

Nginx rate limiting

Per-IP limits on /v1/chat, /login, /api/*. Burst + sustained buckets. fail2ban bans on 5xx spikes.

limit_req · fail2ban · UFW

Secrets by design

64-byte secrets from /dev/urandom, written to credentials.txt with chmod 600. Rotation is a single command.

/dev/urandom · chmod 600 · zero-downtime rotation

Image pinning by digest

Every image pulled by sha256 digest — never :latest. The versions.env manifest is auditable and reproducible.

sha256 · reproducible · auditable
FAQ

Answers to what engineers actually ask.

What hardware do I actually need?

Hard requirement: aarch64 + NVIDIA GB10. AGmind targets NVIDIA DGX Spark — 20-core Grace ARM, 128 GB unified memory (AGmind allocates ~85 GiB for containers), Blackwell GB10. x86_64 is unsupported as of 2026-04-25 — the installer refuses to run. Equivalent aarch64 hosts with a Blackwell-class GPU should work but are not in the tested matrix. OS: DGX OS 7.5.0 (Ubuntu 24.04 arm64), driver 580.x — do not upgrade past it.

Does it really work offline / on-prem?

Yes. After phase 5 (image pull) the stack runs without outbound internet — service endpoints resolve via mDNS (agmind-dify.local, agmind-rag.local, …) so you don't even need a DNS server. LAN profile skips Let's Encrypt and uses self-signed TLS. SearXNG can be disabled for zero outbound traffic.

Can I bring my own models?

Absolutely. Any GGUF model for Ollama, any HuggingFace model for vLLM. Qwen, Llama, Gemma, Mistral, DeepSeek — all tested. LiteLLM lets you route to commercial APIs if you want a hybrid setup.

How do I upgrade without breaking things?

agmind update --check compares your pinned digests with upstream. agmind update --apply pulls new digests, restarts services in dependency order, and runs the healthcheck loop. If anything fails, rollback is one command.

Will it run on x86_64, RTX 40-series, AMD ROCm, or Apple Silicon?

No. AGmind dropped x86_64 on 2026-04-25 and is not built for ROCm or Apple Silicon. The installer is gated on aarch64 + NVIDIA GB10. Notable arm64 caveats handled automatically: Dify sandbox is amd64-only and runs through QEMU; RAGFlow uses a community-built arm64 image (ar2r223/ragflow-spark); TEI rerank stays on CPU where no arm64 GPU image exists.
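The amd64-via-QEMU caveat is easy to verify by hand, assuming binfmt/QEMU user emulation is registered on the host:

```shell
# Hypothetical sanity check (not part of the installer): run an amd64-only
# image on the aarch64 host through QEMU user emulation.
docker run --rm --platform linux/amd64 alpine uname -m
# Under emulation this reports x86_64 even though the host is aarch64.
```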

Can I use it commercially?

Yes. AGmind is Apache 2.0. Ship products on top of it, modify it, re-distribute it. Bundled components carry their own permissive licenses — Dify, vLLM, Ollama, Grafana stack all allow commercial use.

Is there commercial support?

Community support is on GitHub Discussions + Issues. For dedicated engineering, deployments, and SLAs, reach out through the repository.

Start now

Run one command. Own the entire stack.

Apache 2.0. No vendor lock-in. 30+ containers on your hardware, TLS on day one, Grafana on day one, backups on day one.

View on GitHub
$ git clone https://github.com/botAGI/AGmind.git && cd AGmind && sudo bash install.sh