Hello,
I just wanted to share our way to Monitor the Backup Solution "BackupExec" from Symantec:
We tried to Setup an Eventlog Sensor first but that was not very successful Then we tried the way with the Email Sensor. We just didnt like the Idea of that solution. In our opinion this would easily lead to mistakes
Then we just set up an SQL Sensor to query the Backup Database directly and it is working perfectly now.
Here is how we have Setup the Sensor:
Sensor Type: Microsoft SQL sensor
Of course you have to fill in the Instance and Database Name and the Login Type of the BackupExec Database (can differ from Installations)
SQL Expression: Thats the Point
SELECT FinalJobStatus FROM JobHistorySummary WHERE ActualStartTime > GETDATE() - 7 AND JobName LIKE 'JOBNAME' ORDER BY ActualStartTime DESC
Here you have to edit 2 Settings:
GETDATE() - 7 --> Here you can Setup how many days the Query will look back.
If you have an daily Job you can set it to 2
If you have a Job running only once per week you should set to 8 or so, just test it out
Jobname LIKE 'JOBNAME' --> Here you have to set what the Job is called directly in BackupExec - You can work with % and so
So basically this Expression gives you the Job Status of the last Job depending on the getdate and Jobname Function you set
To get good Alarms set the below Settings:
Postprocessing: Process numerical results
Condition warning equals 3
Condition Error equals 6
This way we are able to Monitor all of our different Backupjobs from Backupexec, good luck with that
Article Comments
We monitor backups a little bit more advanced - I thought I should share this knowledge as well...
Single Job Monitoring: Monitor a single job and it's results - allows you configure the Job-Name with a PRTG filter value in the SQL sensor. The results will include various values - most notable are:
- FinalJobStatus (as text)
- TotalDataSizeBytes
- TotalNumberOfDirectories
- TotalNumberOfFiles
- TotalRateMBMin
In order to implement this sensor, add a SQLv2 Sensor and configure like this:
- Database: BEDB
- Instance Name: BKUPEXEC (in most cases)
- Use input parameter: specify the exact job name
- DBNull = Error
Channels: Most channels are rather simple to configure, they are counters, SpeedDisk or BytesDisk - as PRTG has those channel types integrated already. The special channel is FINALJOBSTATUS - in order to have this working you will need the "backupexec.jobstatus.ovl" file in your %programfilesx86%\PRTG...\Lookups directory - see below for the file.
SQL Script for single job monitoring:
SELECT TOP 1 J.JobName, H.TotalDataSizeBytes, H.TotalRateMBMin, H.TotalNumberOfFiles, H.TotalNumberOfDirectories, H.FinalJobStatus, H.ActualStartTime, H.EndTime, H.ElapsedTimeSeconds FROM Jobs J LEFT JOIN TaskDefinition T ON T.JobDefinitionID = J.JobDefinitionID LEFT JOIN Schedule S ON S.ScheduleId = T.ScheduleID LEFT JOIN JobHistorySummary H ON H.JobID = J.JobID WHERE S.SystemId IS NOT NULL AND H.ActualStartTime > GETDATE()-8 --adjust this value if needed, 8 helps for the last 8 days.. weekly backups (full backups) AND J.JobName LIKE @prtg ORDER BY H.ActualStartTime DESC
The backupexec.jobstatus.ovl file:
<?xml version="1.0" encoding="UTF-8"?> <ValueLookup id="backupexec.jobstatus" desiredValue="1" undefinedState="error" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="PaeValueLookup.xsd"> <Lookups> <SingleInt state="warning" value="1"> Job Cancelled </SingleInt> <SingleInt state="Ok" value="2"> Job Completed </SingleInt> <SingleInt state="warning" value="3"> Job Exception </SingleInt> <SingleInt state="warning" value="4"> Job Dispatched </SingleInt> <SingleInt state="warning" value="5"> Job On Hold </SingleInt> <SingleInt state="error" value="6"> Job Error </SingleInt> <SingleInt state="error" value="7"> Job Invalid Schedule </SingleInt> <SingleInt state="error" value="8"> Job Invalid Time Window </SingleInt> <SingleInt state="error" value="9"> unknown status code (9) </SingleInt> <SingleInt state="error" value="10"> Job Missed </SingleInt> <SingleInt state="warning" value="11"> Job Paused </SingleInt> <SingleInt state="warning" value="12"> Job Pending </SingleInt> <SingleInt state="warning" value="13"> Job Recovered </SingleInt> <SingleInt state="error" value="14"> Job Resource Disabled </SingleInt> <SingleInt state="warning" value="15"> Job Resumed </SingleInt> <SingleInt state="Ok" value="16"> Job Running </SingleInt> <SingleInt state="Ok" value="17"> Job Ready </SingleInt> <SingleInt state="Ok" value="18"> Job Scheduled </SingleInt> <SingleInt state="Ok" value="19"> Job Success </SingleInt> <SingleInt state="warning" value="20"> Job Superceded </SingleInt> <SingleInt state="error" value="21"> Job Abort (Time) </SingleInt> <SingleInt state="warning" value="22"> Job To Be Scheduled </SingleInt> <SingleInt state="warning" value="23"> Job Linked to Another Job </SingleInt> <SingleInt state="error" value="24"> unknown status value (24) </SingleInt> <SingleInt state="error" value="25"> Job Blocked by Template Rule </SingleInt> </Lookups> </ValueLookup>
Another SQL script we use is the one below - this actually approaches the whole monitoring more in an overview - it still depends on the JobHistory table, meaning, the job must have been running.. in theory you could work around this and actually get information on the scheduler etc.. the script below is a pure example.
DECLARE @JobFilter AS VARCHAR(500) DECLARE @TimeWindow AS Integer SET @JobFilter = '%-FULL' -- we only want to see jobs that with "-FULL" at the end of the job name / use @PRGT if you want to control it in PRTG SET @TimeWindow = 7 --7 days in each direction for history and scheduled jobs SELECT COUNT(J.JobName) AS TotalJobs, SUM(CASE WHEN H.IsJobActive = 1 THEN 1 ELSE 0 END) AS ActiveJobs, -- SUM(CASE WHEN H.FinalJobStatus = 1 THEN 1 ELSE 0 END) AS FinalStatus_JobCancelled, -- SUM(CASE WHEN H.FinalJobStatus = 2 THEN 1 ELSE 0 END) AS FinalStatus_JobCompleted, -- SUM(CASE WHEN H.FinalJobStatus = 3 THEN 1 ELSE 0 END) AS FinalStatus_JobException, -- SUM(CASE WHEN H.FinalJobStatus = 4 THEN 1 ELSE 0 END) AS FinalStatus_JobDispatched, -- SUM(CASE WHEN H.FinalJobStatus = 5 THEN 1 ELSE 0 END) AS FinalStatus_JobOnHold, SUM(CASE WHEN H.FinalJobStatus = 6 THEN 1 ELSE 0 END) AS FinalStatus_JobError, -- SUM(CASE WHEN H.FinalJobStatus = 7 THEN 1 ELSE 0 END) AS FinalStatus_JobInvalidSchedule, -- SUM(CASE WHEN H.FinalJobStatus = 8 THEN 1 ELSE 0 END) AS FinalStatus_JobInvalidTimeWindow, -- SUM(CASE WHEN H.FinalJobStatus = 9 THEN 1 ELSE 0 END) AS FinalStatus_unknownStatusCode9, -- SUM(CASE WHEN H.FinalJobStatus = 10 THEN 1 ELSE 0 END) AS FinalStatus_JobMissed, -- SUM(CASE WHEN H.FinalJobStatus = 11 THEN 1 ELSE 0 END) AS FinalStatus_JobPaused, -- SUM(CASE WHEN H.FinalJobStatus = 12 THEN 1 ELSE 0 END) AS FinalStatus_JobPending, --SUM(CASE WHEN H.FinalJobStatus = 13 THEN 1 ELSE 0 END) AS FinalStatus_JobRecovered, -- SUM(CASE WHEN H.FinalJobStatus = 14 THEN 1 ELSE 0 END) AS FinalStatus_JobResourceDisabled, -- SUM(CASE WHEN H.FinalJobStatus = 15 THEN 1 ELSE 0 END) AS FinalStatus_JobResumed, SUM(CASE WHEN H.FinalJobStatus = 16 THEN 1 ELSE 0 END) AS FinalStatus_JobRunning, -- SUM(CASE WHEN H.FinalJobStatus = 17 THEN 1 ELSE 0 END) AS FinalStatus_JobReady, -- SUM(CASE WHEN H.FinalJobStatus = 18 THEN 1 ELSE 0 END) AS FinalStatus_JobScheduled, SUM(CASE WHEN H.FinalJobStatus = 19 THEN 1 ELSE 0 END) AS FinalStatus_JobSuccess, SUM(CASE WHEN H.FinalJobStatus = 20 THEN 1 ELSE 0 END) AS FinalStatus_JobSuperceded, -- SUM(CASE WHEN H.FinalJobStatus = 21 THEN 1 ELSE 0 END) AS FinalStatus_JobAbort, -- SUM(CASE WHEN H.FinalJobStatus = 22 THEN 1 ELSE 0 END) AS FinalStatus_JobToBeScheduled, -- SUM(CASE WHEN H.FinalJobStatus = 23 THEN 1 ELSE 0 END) AS FinalStatus_JobLinkedToAnotherJob, -- SUM(CASE WHEN H.FinalJobStatus = 24 THEN 1 ELSE 0 END) AS FinalStatus_unknownStatusCode24, -- SUM(CASE WHEN H.FinalJobStatus = 25 THEN 1 ELSE 0 END) AS FinalStatus_JobBlockedByTemplateTule, SUM(CASE WHEN H.FinalJobStatus = 3 THEN 1 ELSE 0 END) AS FinalStatus_Exceptions --3 = With Exceptions -- SUM(CASE WHEN H.FinalJobStatus = 19 THEN 1 ELSE 0 END) AS FinalStatus_SuccessFull --19 = Successful FROM Jobs J LEFT JOIN TaskDefinition T ON T.JobDefinitionID = J.JobDefinitionID LEFT JOIN Schedule S ON S.ScheduleId = T.ScheduleID LEFT JOIN JobHistorySummary H ON H.JobID = J.JobID WHERE S.SystemId IS NOT NULL AND H.ActualStartTime > GETDATE() - @TimeWindow AND J.NextDueDate < GETDATE() + @TimeWindow AND J.JobName LIKE @JobFilter
Finally I wanted to mention what are our real challenges are, and we don't yet have a really good solution: Our backup runs FULL starting Friday evening... during the week we run incremental backups. Now the incremental backups are not as critical... so let's focus on the weekends.
What happened every now and then was that e.g. only a few tapes where write able and other might still have been locked or one of our libraries jammed etc..
In the end, it means - we e.g. came in Monday morning and discovered that 50+ % of the backups did not run.
Now, the question is, how do you monitor this. There are about a 150 jobs - they are stacked on each other. In theory I expect let say 5 running jobs, 0 completed and a 145 pending - starting Friday night - over the weekend this number will change constantly.
What I did not yet find is that a good solution that when Backup Exec waits for user interaction like insert tapes, offline library, etc. does wait for user interaction.
As well as the fact that I can't tell PRTG on Friday I expect e.g. 150 jobs pending, on Saturday 1 PM the number should be more like 75 jobs pending and on Sunday 6 AM is should be down to 50 pending and Sunday 8 PM it should be 0 pending and 150 successful.
This is very granular, making it hard to find a solution. The jobs in our case will not finish - they are within their weekend time-window and will not be auto cancelled and therefor only manually looking in to Backup Exec will tell you if we are making progress or not.
It could be a solution to constantly see if the Total Bytes backed up goes up - but this again is challenging, we would need to compare values over time.. PRTG is as far as I know not directly able to do so and this would mean we would need to have a temp file with values form the last check in some kind of script or database that we would compare too...
So far I did not come up with the ultimate solution - every now and then I think about it a little more.. but well, I am not there yet.
Kind regards
Florian Rossmark
Nov, 2017 - Permalink
Hi there,
That's indeed awesome and probably useful for our other customers :) We'll make sure to share it accordingly when the time comes! Thanks again!
With kind regards,
Stephan Linke, Tech Support Team
Nov, 2017 - Permalink
Hi, i tried to setup but i have some problems with the config from Channel/Sensor. Thanks for help. Greetings
J.Vokoun
Jul, 2018 - Permalink
Could you elaborate the issues you're having with the script (error messages, etc.). Maybe Florian or I are able to help :)
Kind regards,
Stephan Linke, Tech Support Team
Jul, 2018 - Permalink
Hi,
thank you very much for sharing this. Would it be possible for you to post your proceeding as answer to a question? Like in this article?
this way it is possible to tag your answer as best answer and it will give a better overview over the topic.
Thank you!
P.S.: I edited the article due to formatting reasons, no content was changed.
Sep, 2013 - Permalink