📖 Introduction
Prometheus is an open-source monitoring and alerting system, originally built at SoundCloud in 2012; in 2016 it became the second project hosted by the CNCF, after Kubernetes. Grafana is the visualization platform that turns those metrics into interactive dashboards.
Together, Prometheus and Grafana form the go-to monitoring stack for DevOps teams in 2025, used by companies such as Google, Uber, Red Hat, and thousands of others.
🏗️ Prometheus Architecture
Prometheus uses a pull-based model: the central server "pulls" (scrapes) metrics from its targets (exporters) over HTTP at regular intervals. The data is stored in a local time-series database.
- Prometheus Server - collects and stores the metrics
- Exporters - expose metrics (Node Exporter, Blackbox Exporter, etc.)
- Alertmanager - handles alerts (email, Slack, PagerDuty)
- Pushgateway - for short-lived (batch) jobs
- Grafana - dashboard visualization
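The pull model can be sketched as a single scrape cycle: the server requests `/metrics` from every target and records whether the request succeeded, which is where the `up` series comes from. This is only an illustration, not Prometheus code; the function name `scrape_cycle` and the `FETCH` override are assumptions made for the example.

```shell
# Illustrative sketch of one scrape cycle (not how Prometheus is implemented).
# FETCH is a hypothetical override so the fetch command can be swapped out.
scrape_cycle() {
  for target in "$@"; do
    # A target counts as "up" when its /metrics endpoint answers successfully.
    if ${FETCH:-curl -fsS --max-time 5} "http://$target/metrics" >/dev/null 2>&1; then
      printf 'up{instance="%s"} 1\n' "$target"
    else
      printf 'up{instance="%s"} 0\n' "$target"
    fi
  done
}

# Usage: scrape_cycle server1:9100 server2:9100
```

The real server repeats this on every scrape interval and stores the returned samples as well, not just the success flag.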
📥 Installing Prometheus
# Download and install Prometheus
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.53.0/prometheus-2.53.0.linux-amd64.tar.gz
tar xvf prometheus-2.53.0.linux-amd64.tar.gz
sudo mv prometheus-2.53.0.linux-amd64 /opt/prometheus
# Create a dedicated user
sudo useradd --no-create-home --shell /bin/false prometheus
sudo chown -R prometheus:prometheus /opt/prometheus
# Create the systemd service file
sudo nano /etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus
After=network.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --storage.tsdb.path=/opt/prometheus/data
Restart=always

[Install]
WantedBy=multi-user.target

# Start Prometheus
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
# Check the status
sudo systemctl status prometheus
# Open http://IP_SERVER:9090 in a browser
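The download step does not verify the tarball. Prometheus releases ship a `sha256sums.txt` file next to the archives, so a check along the following lines could be run before unpacking. The helper name `verify_release` is hypothetical, and the sketch assumes GNU `sha256sum` is available.

```shell
# verify_release TARBALL SUMS_FILE
# Succeeds only if SUMS_FILE lists TARBALL and the SHA-256 hashes match.
verify_release() (
  name=$(basename "$1")
  # Lines in sha256sums.txt look like "<hash>  <filename>".
  expected=$(awk -v f="$name" '$2 == f { print $1 }' "$2")
  actual=$(sha256sum "$1" | awk '{ print $1 }')
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
)

# Usage (same release directory as the tarball):
#   wget https://github.com/prometheus/prometheus/releases/download/v2.53.0/sha256sums.txt
#   verify_release prometheus-2.53.0.linux-amd64.tar.gz sha256sums.txt && echo "checksum OK"
```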
📡 Node Exporter - System Metrics
# Install Node Exporter on every monitored server
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvf node_exporter-1.7.0.linux-amd64.tar.gz
sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
# Create the service
sudo nano /etc/systemd/system/node_exporter.service

[Unit]
Description=Node Exporter
After=network.target

[Service]
User=nobody
Group=nogroup
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
# Verify: http://IP_SERVER:9100/metrics
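The `/metrics` page is plain text with one sample per line, so a quick look at a single metric does not need a browser. The filter below is a minimal sketch; the name `metric_value` is hypothetical, and it assumes samples without trailing timestamps and without spaces inside label values (otherwise the printed series name is cut short).

```shell
# metric_value NAME  (reads exposition text on stdin)
# Prints "<series> <value>" for every sample of the metric NAME.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }              # skip HELP/TYPE comment lines
    {
      series = $1
      base = series
      sub(/\{.*/, "", base)    # drop the label set to get the bare metric name
      if (base == name) print series, $NF
    }'
}

# Usage:
#   curl -s http://IP_SERVER:9100/metrics | metric_value node_load1
```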
⚙️ Configuring Prometheus
# /opt/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
rule_files:
  - "alerts.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    static_configs:
      - targets:
          - 'server1:9100'
          - 'server2:9100'
          - 'server3:9100'
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://exemplu.ro
          - https://serviciilinux.bluemails.eu
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
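The three relabeling rules of the `blackbox` job are easy to misread: the listed URL becomes the `target` query parameter and the `instance` label, while the address actually scraped is the Blackbox Exporter itself. A rough sketch of the resulting request, handy for testing a probe by hand, is shown below. The function name `probe_url` is hypothetical, and the URL is left unencoded for readability, whereas Prometheus URL-encodes the `target` parameter.

```shell
# probe_url TARGET [MODULE] [EXPORTER_ADDR]
# Builds the request that the blackbox job effectively sends for TARGET.
probe_url() {
  printf 'http://%s/probe?module=%s&target=%s\n' "${3:-localhost:9115}" "${2:-http_2xx}" "$1"
}

# Manual check (assumes a Blackbox Exporter listening on localhost:9115):
#   curl -s "$(probe_url https://exemplu.ro)" | grep '^probe_success'
```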
🔔 Alertmanager - Notifications
# Install Alertmanager
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
tar xvf alertmanager-0.27.0.linux-amd64.tar.gz
sudo mv alertmanager-0.27.0.linux-amd64 /opt/alertmanager

# alertmanager.yml configuration
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
      - to: 'admin@exemplu.ro'
        from: 'alerts@exemplu.ro'
        smarthost: 'smtp.gmail.com:587'
        auth_username: 'alerts@exemplu.ro'
        auth_password: 'parola'  # placeholder: replace with the real SMTP password
        require_tls: true
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#alerts'
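Routing and receivers can be exercised without waiting for a real incident by posting a hand-made alert to Alertmanager's v2 HTTP API. The sketch below assumes Alertmanager has been started and listens on `localhost:9093`; the helper names `test_alert_payload` and `send_test_alert` and the alert name `TestAlert` are made up for the example.

```shell
# test_alert_payload NAME SEVERITY  -> JSON body for POST /api/v2/alerts
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"Manual test alert"}}]\n' "$1" "$2"
}

# send_test_alert [ALERTMANAGER_URL]
send_test_alert() {
  curl -fsS -X POST -H 'Content-Type: application/json' \
    -d "$(test_alert_payload TestAlert warning)" \
    "${1:-http://localhost:9093}/api/v2/alerts"
}

# Usage: send_test_alert    # then look for the notification in the mailbox
```

With the route shown above, the test alert should reach the `email` receiver after the 10-second `group_wait`.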
# alerts.yml file with the alerting rules
groups:
  - name: instance_down
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  - name: high_cpu
    rules:
      - alert: HighCPUUsage
        expr: (100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% for more than 10 minutes."
  - name: high_memory
    rules:
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
  - name: disk_space
    rules:
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
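A typo in the rules or in `prometheus.yml` can keep Prometheus from starting, so it is worth validating before a restart. The `promtool` binary from the release tarball has a `check config` subcommand that, to my knowledge, also loads the rule files listed under `rule_files`. The wrapper below is a sketch under these assumptions: `alerts.yml` sits next to `prometheus.yml` in `/opt/prometheus`, and the `PROMTOOL` and `RESTART` variables are hypothetical overrides added so the function can be dry-run.

```shell
# Both variables can be overridden, e.g. for a dry run.
PROMTOOL=${PROMTOOL:-/opt/prometheus/promtool}
RESTART=${RESTART:-sudo systemctl restart prometheus}

# apply_prometheus_config CONFIG_FILE
# Restarts Prometheus only when the configuration passes validation.
apply_prometheus_config() {
  if $PROMTOOL check config "$1"; then
    $RESTART
  else
    echo "config check failed; Prometheus left untouched" >&2
    return 1
  fi
}

# Usage: apply_prometheus_config /opt/prometheus/prometheus.yml
```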
📈 Installing Grafana
# Add the Grafana repository
sudo apt-get install -y software-properties-common wget
sudo wget -q -O /usr/share/keyrings/grafana.key https://apt.grafana.com/gpg.key
echo "deb [signed-by=/usr/share/keyrings/grafana.key] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
# Install Grafana
sudo apt-get update
sudo apt-get install -y grafana
# Start the service
sudo systemctl daemon-reload
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
# Open http://IP_SERVER:3000 (user: admin, password: admin)
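🔌 Connecting Grafana to Prometheus
- Open Grafana (http://IP_SERVER:3000)
- Log in: admin / admin (Grafana asks for a new password on first login)
- Configuration → Data Sources → Add data source → Prometheus (in recent Grafana releases the menu is Connections → Data sources)
- URL: http://localhost:9090
- Save & Test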
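The same data source can also be created without the UI, through Grafana's HTTP API (`POST /api/datasources`). The following is a sketch rather than a recommended setup: it assumes the default `admin:admin` credentials are still valid, and the names `datasource_payload`, `add_datasource`, and `GRAFANA_AUTH` are invented for the example.

```shell
# datasource_payload [PROMETHEUS_URL]  -> JSON body describing the data source
datasource_payload() {
  printf '{"name":"Prometheus","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "${1:-http://localhost:9090}"
}

# add_datasource [GRAFANA_URL] [PROMETHEUS_URL]
add_datasource() {
  curl -fsS -u "${GRAFANA_AUTH:-admin:admin}" -H 'Content-Type: application/json' \
    -X POST -d "$(datasource_payload "$2")" "${1:-http://localhost:3000}/api/datasources"
}

# Usage: GRAFANA_AUTH='admin:new-password' add_datasource
```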
📊 Predefined dashboards (Grafana Dashboards)
# Import ready-made dashboards by ID
# Node Exporter Full: 1860
# Prometheus 2.0 Stats: 3662
# Kubernetes Cluster: 6417
# PostgreSQL: 9628
# Nginx: 12708
🐳 Prometheus + Grafana with Docker Compose
# docker-compose.yml
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana_data:/var/lib/grafana
    depends_on:
      - prometheus
  node_exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'  # read filesystem metrics from the host root mounted at /rootfs
volumes:
  prometheus_data:
  grafana_data:
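Containers need a few seconds after startup before they accept requests, so scripts that bring the stack up usually poll a health endpoint before continuing. Below is a minimal retry helper; the name `wait_until` is hypothetical, and the usage lines assume the Compose v2 plugin (`docker compose`) and the published ports from the file above.

```shell
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Retries COMMAND until it succeeds or ATTEMPTS runs out.
wait_until() (
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # The function body is a subshell, so exit only ends this function.
    "$@" >/dev/null 2>&1 && exit 0
    i=$((i + 1))
    sleep "$delay"
  done
  exit 1
)

# Usage:
#   docker compose up -d
#   wait_until 30 2 curl -fsS http://localhost:9090/-/ready
#   wait_until 30 2 curl -fsS http://localhost:3000/api/health
```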
📈 Essential metrics to monitor
# CPU
(100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100))
# RAM
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
# Disk
(node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100
# Network Traffic
rate(node_network_receive_bytes_total[5m])
rate(node_network_transmit_bytes_total[5m])
# Load Average (three separate queries)
node_load1
node_load5
node_load15
# System Uptime
time() - node_boot_time_seconds
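The CPU expression is the least obvious of these: `node_cpu_seconds_total{mode="idle"}` is a counter of idle seconds, so its per-second rate is the idle fraction, and usage is whatever remains. The arithmetic for a single core can be reproduced as below. This is a simplification with made-up sample values; the real `rate()` also handles counter resets and extrapolation, and the function name `cpu_usage_pct` is hypothetical.

```shell
# cpu_usage_pct IDLE_START IDLE_END WINDOW_SECONDS
# Mirrors 100 - rate(idle counter) * 100 for one core over one window.
cpu_usage_pct() {
  awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", 100 - (b - a) / s * 100 }'
}

# Idle counter grew from 1000 s to 1240 s during a 300 s window:
# 240 / 300 = 0.8 idle, so usage is 20%.
cpu_usage_pct 1000 1240 300   # prints 20.0
```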
🎯 Conclusion
With Prometheus and Grafana you get an enterprise-grade solution for monitoring your entire infrastructure:
- ✅ Centralized metrics collection
- ✅ Professional dashboards
- ✅ Real-time alerting
- ✅ Scales to thousands of servers
- ✅ Active community and broad support