Generating reports from Unimus job failures

“Why do our backups fail, Bruce? So that we can learn to fix them.”

Intro

With hundreds or even many thousands of jobs daily for some Unimus users, there are bound to be a few failed ones. Jobs such as Discovery or Backup fail for various reasons related to connection errors, refused credentials or unsupported devices. The results of failed jobs are all captured and error logs are viewable in the web GUI. From error logs one can learn the details of why a discovery or a backup for a device failed and use the information to remedy the situation.

In this brief article we will go over generating job report exports. These export files can then be processed externally by a monitoring system, reporting system, by an external parser, or by hand. But first, the error log data needs to be retrieved. The data is stored in the database, so to access it we need the credentials to the database and a few well constructed queries. We will be showing how to generate reports of failed jobs and review them by hand in a spreadsheets editor (Excel) for its advanced data processing functions.

Getting to the data

Unimus supports MySQL, PostgreSQL and MSSQL relational databases in addition to file based HSQL. The former ones store data in tables with columns and rows. Databases use structured query language (SQL) for data manipulation and querying. We'll use such queries to extract the data about failed jobs and export it to a text file.

Connecting to the DB

In our test scenario an arbitrary Unimus server is using MariaDB, a database system, which is a fork of MySQL. For accessing MariaDB we are going to simply run the mariadb client from shell. Need to specify the host IP, database name and credentials:

mariadb --host=127.0.0.1 --database=unimusmdb --user=will -password

Constructing the query

Let's say we are interested in failed Discovery jobs. So for each device we want to get some info about the device itself and error logs for last job that was a failed Discovery.

We will be using data from tables device, device_history_job and zone. The device table contains useful columns like id, address, description, model, type and vendor. The device_history_job table is populated by useful data in the create_time, error_log, info, job_type, device_id and successful columns and the zone table is used to describe device zone membership.

We SELECT columns we want displayed FROM the tables and LEFT JOIN table device_history_job ON id of 'last device job that was a Discovery' via a subquery and table zone ON id of the zone. Then we filter the results with WHERE by 'failed jobs'. And let's say we want to limit the results to recent ones, e.g. ones that took place in the last week. Our query then might look something like this:

SELECT
  d.id,
  dhj.info,
  DATE_FORMAT(FROM_UNIXTIME(dhj.create_time), '%H:%i:%s %d.%m.%Y'),
  z.name,
  dhj.job_type,
  REPLACE(dhj.error_log, '\r\n', ' ') AS error_log
FROM device d
LEFT JOIN device_history_job dhj ON dhj.id = (
  select id
  from device_history_job
  where d.id = device_id
    and job_type = 'DISCOVERY'
  order by create_time
  desc limit 1)
LEFT JOIN zone z ON z.id = d.zone_id
WHERE dhj.successful = 0
  AND dhj.create_time > UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -7 DAY));
MySQL query for fetching last Discovery jobs within last week if they failed

We have used REPLACE to output the error log on a single line for a more comprehensible way to display results.

Let's name the columns properly and put the query in quotation marks. Now we can feed the query into the mariadb command via --execute option and write the output into a local file:

mariadb --host=127.0.0.1 --user=will --database=unimusmdb --password --execute="SELECT \
  d.id AS \`ID\`, \
  dhj.info AS \`Device info\`, \
  DATE_FORMAT(FROM_UNIXTIME(dhj.create_time), '%H:%i:%s %d.%m.%Y') AS \`Time\`, \
  z.name AS Zone, \
  dhj.job_type AS \`Job type\`, \
  REPLACE(dhj.error_log, '\r\n', ' ') AS \`Error log\` \
FROM device d \
LEFT JOIN device_history_job dhj ON dhj.id = (\
  select id \
  from device_history_job \
  where d.id = device_id \
    and job_type = 'DISCOVERY' \
  order by create_time \
  desc limit 1) \
LEFT JOIN zone z ON z.id = d.zone_id \
WHERE dhj.successful = 0 \
  AND dhj.create_time > UNIX_TIMESTAMP(DATE_ADD(CURDATE(),INTERVAL -7 DAY))" > disco_local.csv
Shell command to query failed Discovery jobs

For failed Backup jobs we would just change the job_type in the WHERE section of the query to 'BACKUP'. Also, since a successful Discovery is a prerequisite for a Backup job, we can select additional columns, like vendor, type and model, to describe the discovered device in more detail. Queries and shell commands for failed backups, along with Postgres and MSSQL versions can be found on our GitHub.

Reviewing the data

We are using OnlyOffice Excel in our scenario because it is free and gets the job done. To view the data simply open the file in your favorite spreadsheets tool and choose 'Tab' as the delimiter:

Alternatively you can import the data using the Data Import feature:

Now select the data and format it as a table for advanced filtering capabilities. You can then filter the results by devices, zones or specific keywords in the error log messages. Here is an example of how to filter the logs based on multiple criteria:

0:00
/

Conclusion

And that's basically it! This short guide should provide a starting point for generating and viewing failed job reports from Unimus. There is also a thread going on our Forum for any feedback, questions or possible improvements.