Recording daily traffic per port on LibreNMS

Background

About a half year ago, I started using LibreNMS, a handy SNMP network device monitoring software. Since then, I use it to manage about 30 Cisco L2/L3 switches (of course, managed), all exist at a single manufacturing site.

Though I joined the site only one year ago, the site itself is decades old and network personnel wasn't always at the site. As a consequence, beyond those 30 Cisco managed switches, there are countless desktop unmanaged switches. Most of them are still 100Mbps. Note that "desktop" is just a category name, actual locations vary. Below desk, on floor, under floor, or over ceiling.

Situation in detail

Our installation of LibreNMS shows some error counts in ports like below.

Error ports sample

I ignored them for a few months, since users didn't report issues. These errors were annoying only for me when I watch LibreNMS. But over time, I gradually noticed a several ports are frequently listed as "Errored" among over 1,000 ports.

However, default 5 minute update cycle is too short to conclude which port to investigate further. Ports once appearing as "errored" often disappears when I reload the WebUI. To prioritize which of them to be searched and replaced, I want to know total traffic and frame error counts per port for longer interval like 1-day or 1-week.

Though there are some requests found as below, not yet implemented in LibreNMS core.

https://community.librenms.org/t/port-errors-page-port-errors-vs-errors-delta/3045

https://community.librenms.org/t/generating-network-monthly-report/1383

I decided to investigate possibility by myself.

LibreNMS architecture

LibreNMS basic architecture

In a sense, LibreNMS is a human friendly viewer of SNMP information. LibreNMS poller collects SNMP information from devices every 5 minute, and then parse them and load them into SQL database (specifically, MariaDB).

Basic architecture is shown on right (image can be enalrged by clicking.)

Information (more specifically, SNMP counters like ifOutErrors and ifOutOctets) about ports are written to ports table, but overwritten everytime poller runs.

This behavior is understandable since LibreNMS primarily uses SQL database for real-time, up-to-date SNMP information. Histories are recorded mostly in RRD.

But for my purpose, numbers are preferred over graphs. So, keeping history of ports table is necessary.

Options considered

Then the next problem is how to keep that history. I did some research and came up with following options.

  1. Writing completely separate script from LibreNMS to collect SNMP counters.
  2. Adding daily process to PHP code.
  3. Audit / history table using SQL Trigger.
  4. System Versioned Tables

Considering factors like below, I concluded to implement this as SQL side.

  1. Maintaining device list separately from LibreNMS would be cumbersome.
  2. LibreNMS PHP code is intended to be automatically updated, and my modification would not be suitable for all users.
  3. SQL trigger is executed with every update, which would be too often.
  4. System Versioned Tables seem not easily handle data handling across versions (my usecase requires tracking each port over days)

Actual implementation: recording part

My implementation is two-step. First, record ports table daily, anyway. Second, analyze that kept table later by Python script. Here's the first part.

Create ports_hist table based on ports as below.

CREATE TABLE ports_hist SELECT *, NOW() AS captured_at FROM ports;

Then, next thing is to record all rows of ports into ports_hist daily. I thought about cron, but discovered MySQL has very nice feature for this particular situation: Event Scheduler. This feature enables to keep my implementation completely in SQL.

Since event scheduler is OFF by default, don't forget to turn it on.

# vi  /etc/mysql/mariadb.conf.d/50-server.cnf
event_scheduler=ON

Then, add daily capturing configuration on MySQL.

CREATE EVENT capture_ports_hist
ON SCHEDULE EVERY 1 DAY STARTS '2024-08-02 00:31'
DO
    INSERT INTO ports_hist SELECT *, NOW() FROM ports;

In the next morning, I could see data like below.

SELECT port_id, `ifOutOctets`, `ifInOctets`,captured_at
FROM ports_hist
WHERE captured_at < '2024-08-03' AND captured_at > '2024-08-02';
+---------+--------------+---------------+---------------------+
| port_id | ifOutOctets  | ifInOctets    | captured_at         |
+---------+--------------+---------------+---------------------+
| 1       | 1243047387   | 1243047387    | 2024-08-02 00:31:00 |
| 2       | 2652452145   | 16019563426   | 2024-08-02 00:31:00 |
| 3       | 35039656033  | 132556547082  | 2024-08-02 00:31:00 |
| 4       | 839009919291 | 25483043737   | 2024-08-02 00:31:00 |
| 5       | 154018108298 | 832475169270  | 2024-08-02 00:31:00 |
+---------+-------------+---------------+---------------------+
...

Before writing Python script to analyze these data, I waited for some more days until some meaningful data piled up. So, analyzing part to be continued in the next article.


Appendix on addtional counters and DB schema (2024-10-15)

One day, I noticed some ports report ifOutDiscards in addition to ifOutErros observed above. Since ifOutDiscards and some other statistics are recorded not in ports table but in ports_statics table, I changed my capturing event scheduler as below.

CREATE EVENT capture_ports_combined_hist
ON SCHEDULE EVERY 1 DAY STARTS '2024-09-30 00:31'
DO
    INSERT INTO ports_combined_hist
        SELECT a.port_id
            ,ifInErrors
            ,ifOutErrors     
            ,ifInOctets
            ,ifOutOctets     
            ,ifInNUcastPkts
            ,ifOutNUcastPkts
            ,ifInDiscards
            ,ifOutDiscards
            ,ifInUnknownProtos
            ,ifInBroadcastPkts
            ,ifOutBroadcastPkts
            ,ifInMulticastPkts       
            ,ifOutMulticastPkts
            ,NOW() AS captured_at
        FROM ports a
            LEFT JOIN ports_statistics b ON a.port_id = b.port_id;

As a result, I have multiple custom tables in librenms schema. But that causes error in LibreNMS ./validate.php. I also think I would add a few more tables like these in future. So, I migrated custom tables to newly created additional schema netadmin_datamart.

Also, I created a MySQL user netadmin to do this kind of analysis with read-only privilege to librenms and read-write to netadmin_datamart.

I would not write details about these processes to keep my article simple.

social