2024-11-17 Setting Up A Monitoring Suite
I installed Uptime Kuma and Beszel previously. They are running well. Here I am going to try a suite of tools for logs and metrics monitoring. For logging I'll use Promtail, Loki, and Grafana. For metrics, I'll try both the NodeExporter/Prometheus and Telegraf/InfluxDB suites with Grafana. Exploring them is my first step towards more detailed application monitoring.
I investigated this monitoring landscape a little. Of course there are a lot of competing tools and technologies, with overlaps that need to be sorted out. The particular tools here, Grafana, Loki, Promtail, Prometheus, Node Exporter, Telegraf, and InfluxDB, are widely used and well supported. They all take some time and effort to learn and experiment with, but it is a good opportunity to appreciate the work in this landscape.
Installation
I played with these tools, following their getting-started guides:
- To install Loki with Docker Compose, check the installation guide, which also contains starter configurations for Grafana and Promtail.
- To install InfluxDB with Docker Compose, visit the guide. There is also this guide from the company InfluxData, which covers Telegraf installation as well.
- To install Prometheus and Node Exporter, this guide from Grafana contains a Docker Compose file with all three services.
Of course there is a lot of information on the web, in different flavors. I like to start and stop services using a docker-compose.yaml as a unit, so here I just create a folder for each of the containers:
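For orientation, the end result looks roughly like the layout below, one folder per service; the conf and data subfolders are created in the steps that follow, and only the files discussed here are shown:

grafana/
    docker-compose.yaml
    data/
loki/
    docker-compose.yaml
    conf/local-config.yaml
promtail/
    docker-compose.yaml
    conf/config.yml
prometheus/
    docker-compose.yaml
    conf/prometheus.yml
    data/
nodeexporter/
    docker-compose.yaml
influxdb/
    docker-compose.yaml
    conf/
    data/
telegraf/
    docker-compose.yaml
    conf/telegraf.conf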
Grafana
To install Grafana, first create the folder:
mkdir grafana
cd grafana
Then I just check the Loki installation guide and get the reference docker-compose.yaml as suggested:
wget https://raw.githubusercontent.com/grafana/loki/v3.0.0/production/docker-compose.yaml -O docker-compose.yaml
and create docker-compose.yaml with the Grafana part only:
networks:
  monitoring:
    name: monitoring
services:
  grafana:
    restart: unless-stopped
    image: grafana/grafana:latest
    container_name: grafana
    user: '1000'
    environment:
      - GF_PATHS_PROVISIONING=/etc/grafana/provisioning
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    volumes:
      - ./data:/var/lib/grafana
    ports:
      - "3000:3000"
    networks:
      - monitoring
We will use a new network named monitoring for Grafana and the other tools discussed here. Trying to launch the container with docker compose up -d will show permission errors. Just change the owner of the created data folder to the current user, whose UID is 1000, as obtained from the command id -u:
sudo chown <user id>:<user id> data
Or later, if starting over, create the data folder with proper owner and permissions in advance.
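For example, a minimal way to prepare it before the first launch (assuming, as in this setup, that the current user's UID/GID should own the data):

mkdir -p data
sudo chown "$(id -u):$(id -g)" data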
After trying various container installations, we get a feel for how to adjust user and folder permissions for different containers. For real deployments, one should look at each container's documentation, find out which user and group it runs as, and map them to the host machine properly. It is encouraged not to run containers with superuser privileges.
Launch the container again and visit the site http://192.168.x.x:3000, assuming it is on a local LAN.
Loki
Create another folder (next to grafana)
mkdir loki
cd loki
and the docker-compose.yaml based on the same downloaded file:
networks:
  monitoring:
    name: monitoring
    external: true
services:
  loki:
    restart: unless-stopped
    image: grafana/loki:latest
    container_name: loki
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - monitoring
Launch the container and shell into it; we can look at the configuration file at /etc/loki/local-config.yaml:
auth_enabled: false
server:
  http_listen_port: 3100
common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
ruler:
  alertmanager_url: http://localhost:9093
We'll create a file ./conf/local-config.yaml in the current folder with the same content, as we want to make it persistent and configurable later. Modify docker-compose.yaml as:
networks:
  monitoring:
    name: monitoring
    external: true
services:
  loki:
    restart: unless-stopped
    image: grafana/loki:latest
    container_name: loki
    ports:
      - "3100:3100"
    volumes:
      - ./conf:/etc/loki
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - monitoring
Check if the service is ready by visiting http://192.168.x.x:3100/ready, assuming the host is at 192.168.x.x. You may have to wait a bit for it to reply ready.
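From a shell, the same check is simply (substituting your host's address):

curl http://192.168.x.x:3100/ready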
Promtail
Now turn to Promtail. Following the same practice, create a folder:
mkdir promtail
cd promtail
The docker-compose.yaml looks like:
networks:
  monitoring:
    name: monitoring
    external: true
services:
  promtail:
    restart: unless-stopped
    image: grafana/promtail:latest
    container_name: promtail
    volumes:
      - /var/log:/var/log
      - /var/run/docker.sock:/var/run/docker.sock
    command: -config.file=/etc/promtail/config.yml
    networks:
      - monitoring
Launch the container and shell into it (docker-compose exec -it promtail bash) to see what the configuration file looks like (in /etc/promtail):
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*log
It seems that by default Promtail already uses http://loki:3100 as one of its clients, so we don't have to change it. As with Loki, create a file at ./conf/config.yml with the same content as above, and change the docker-compose.yaml to mount the folder:
...
  promtail:
    ...
    volumes:
      - /var/log:/var/log
      - /var/run/docker.sock:/var/run/docker.sock
      - ./conf:/etc/promtail
    command: -config.file=/etc/promtail/config.yml
    ...
Launch the container again. If there are some errors about "Ingestion rate limit exceeded," you can increase the rate limit for Loki. In Loki's configuration file, add the following section and restart both:
limits_config:
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32
You may have to adjust the parameters or search the web for more help.
To verify that the setup is working, on Grafana's site add a new Loki data source (in the Connections page) with Loki's URL (here http://loki:3100), and click Save and Test to see if it connects. If so, go to the Explore page, select the Loki data source, and in the Label filters select job as the label and varlogs as the value, and see if results come out:
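The label filter above is equivalent to the simple LogQL query below, which can also be typed directly into the query editor (the job/varlogs label comes from the Promtail scrape config shown earlier):

{job="varlogs"}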
Prometheus and Node Exporter
To install Prometheus with Docker, the official page is here, but there is no ready-made Docker Compose file for reference (although adapting the docker command is simple). For Node Exporter, the information about installation with Docker is here, but running it as a container is not recommended. I'd still like to install Node Exporter as a Docker container for now; later, if needed, I can always install it on the host.
I just borrow the compose file from Grafana's guide. First create a folder:
mkdir prometheus
cd prometheus
Adapt the docker-compose.yaml there into:
networks:
  monitoring:
    name: monitoring
    external: true
services:
  prometheus:
    restart: unless-stopped
    image: 'prom/prometheus:latest'
    container_name: prometheus
    user: '1000'
    volumes:
      - type: bind
        source: ./conf/prometheus.yml
        target: /etc/prometheus/prometheus.yml
      - ./data:/prometheus
    ports:
      - 9090:9090
    networks:
      - monitoring
Also prepare the minimal configuration file at conf/prometheus.yml:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
Make sure the file conf/prometheus.yml and the folder data exist with proper owner and permissions. Launch the container and visit http://192.168.x.x:9090, which will redirect to http://192.168.x.x:9090/query:
Since Prometheus also exposes its own metrics, you can visit the page http://192.168.x.x:9090/metrics:
This is the prometheus job specified in the minimal configuration.
Therefore, we can add a new Prometheus data source on the Grafana site. Like adding the Loki data source, select the Prometheus type, enter the URL http://prometheus:9090, and click Save and Test to see if it connects. If so, explore the data by selecting the Prometheus data source, selecting prometheus_http_requests_total, and seeing if results come out:
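In the code editor this corresponds to a plain PromQL query on that metric; wrapping the counter in rate() is a common refinement to see requests per second instead of the raw total (shown here only as an illustration):

rate(prometheus_http_requests_total[5m])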
Node Exporter
On a separate folder:
mkdir nodeexporter
cd nodeexporter
Create the docker-compose.yaml:
networks:
  monitoring:
    name: monitoring
    external: true
services:
  nodeexporter:
    restart: unless-stopped
    image: prom/node-exporter:latest
    container_name: nodeexporter
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    ports:
      - 9100:9100
    networks:
      - monitoring
Visit http://192.168.x.x:9100 to see if the site is up. We can also check its metrics URL at http://192.168.x.x:9100/metrics:
Accordingly, we can modify Prometheus's configuration by adding a new entry to include the new scrape target:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node_self"
    scrape_interval: 10s
    static_configs:
      - targets:
          - "nodeexporter:9100"
Re-launch Prometheus. We can check if Node Exporter is connected to Prometheus by visiting the page http://192.168.x.x:9090/targets (or clicking Status / Target health).
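The same check can be done from a shell with Prometheus's targets API (jq is optional and used here only for readability):

curl -s http://192.168.x.x:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'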
On Grafana we can also explore the data. With the Prometheus data source, select the metric node_cpu_seconds_total, and also select job as the label and node_self as the value, to see if results show up:
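As a sketch, a common PromQL expression built on this metric estimates CPU utilization per instance (the node_self job label and the 5-minute window are just the values used in this setup):

100 - avg by (instance) (rate(node_cpu_seconds_total{job="node_self", mode="idle"}[5m])) * 100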
InfluxDB
As mentioned previously, this guide from InfluxData discusses the installation of InfluxDB and Telegraf. First create the folder:
mkdir influxdb
cd influxdb
Since there is no immediate official docker-compose.yaml, let's try the CLI approach indicated there. Create the initial docker-compose.yaml:
networks:
  monitoring:
    name: monitoring
    external: true
services:
  influxdb:
    restart: unless-stopped
    image: influxdb:2-alpine
    container_name: influxdb
    user: '1000'
    ports:
      - "8086:8086"
    volumes:
      - ./data:/var/lib/influxdb2:rw
      - ./conf:/etc/influxdb2:rw
    networks:
      - monitoring
For this to work, prepare the conf and data folders with proper owner and permissions like before.
Launch the container to see if it is up. If successful, run the setup command:
$ docker-compose up -d
$ docker compose exec -it influxdb influx setup
? Please type your primary username jy
? Please type your password ***********
? Please type your password again ***********
? Please type your primary organization name Pointegrity
? Please type your primary bucket name homelab
? Please type your retention period in hours, or 0 for infinite 168
? Setup with these parameters?
Username: jy
Organization: Pointegrity
Bucket: homelab
Retention Period: 168h0m0s
Yes
User Organization Bucket
jy Pointegrity homelab
Visit the site http://192.168.x.x:8086 (assuming it is on a local LAN):
Looking at the folder ./conf, we see a generated configuration file ./conf/local-configs.
Telegraf
To install Telegraf, create a folder:
mkdir telegraf
cd telegraf
And create a docker-compose.yaml:
networks:
  monitoring:
    name: monitoring
    external: true
services:
  telegraf:
    restart: unless-stopped
    image: telegraf:alpine
    container_name: telegraf
    user: "1000"
    volumes:
      - ./conf/telegraf.conf:/etc/telegraf/telegraf.conf:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - monitoring
According to the guide, to set up Telegraf we need to create a configuration file telegraf.conf inside the local folder ./conf:
[[outputs.influxdb_v2]]
## The URLs of the InfluxDB cluster nodes.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
## urls exp: http://127.0.0.1:8086
urls = ["http://influxdb:8086"]
## Token for authentication.
token = "..."
## Organization is the name of the organization you wish to write to; must exist.
organization = "Pointegrity"
## Destination bucket to write into.
bucket = "homelab"
where the token needs to be generated from InfluxDB.
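Besides the InfluxDB web UI, one way to generate such a token is the influx CLI inside the container; this is a sketch, and it assumes the influx setup run above already stored the admin credentials in the container's CLI config:

docker compose exec influxdb influx auth create --org Pointegrity --all-access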
Unfortunately, error messages show up when launching the container; the configuration file is not sufficient. So, based on this, we generate a more complete configuration file:
$ docker-compose exec telegraf telegraf config > conf/telegraf.conf
Also set the owner and permissions of the conf folder and its contents right, such as
sudo chown -R <user>:<user> conf
Protect the telegraf.conf file as well, since it contains tokens. Launch the container and see if it's working.
To see if Telegraf connects to InfluxDB, open InfluxDB's UI and, in the Data Explorer, select homelab (or the bucket name you chose), cpu, and usage_system, then Submit to see if data show up.
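That selection corresponds roughly to the Flux query below (a sketch; the bucket, measurement, and field names follow this setup, and the one-hour range is arbitrary):

from(bucket: "homelab")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> filter(fn: (r) => r._field == "usage_system")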
To connect Grafana with InfluxDB, also generate an API token on the Load Data page (with All access API token), then on Grafana add a new data source with:
- InfluxDB type,
- Flux query language,
- the organization, the generated API token from InfluxDB, and the bucket.
Click Save and Test to see if it connects correctly, then Explore Data by choosing the influxdb data source, and use the Sample query to query for some data.