
Delete phantom agent connection

Joe Fletcher
Level 1

Hi,

 

Got a bit of a problem with a 6.2.1 instance. We have (had) an agent registered. The server was decommissioned without first removing it from TIDAL, so I now have an agent connection that won't delete.

 

If I try to delete the agent via the Java Client I get the error:

Delete node record with ID [84] failed. delete from nodmst where nodmst_id=?.

If I try to delete it directly from the database I get a constraint error.

 

*
ERROR at line 1:
ORA-02292: integrity constraint (TIDAL.SYS_C0030961) violated - child record found

This is usually expected if there is a job or event definition or some such hanging around. I can't find anything in the job, event, queue, or action definitions, but I do see references to an orphan job in the logs.

03/06 06:17:23:735[39:MD-26]: (mem=6329344160/19351470080) AgentNodeMessageHandler: orphanJob (Node: 84 (mybadhost)): Total JobRuns 8697

On top of this, having this phantom agent connection around seems to be mucking up my licensing. I have another agent connection down and it won't recover because TIDAL seems to be confused about license status.

 

Anyone got any suggestions on how I might clear this connection?

 

TIA

 


Robert Gardner
Cisco Employee

You will need to reassign all jobs using that agent to another agent before trying to delete the agent.

 

If all jobs that were running on agent "A" will run on agent "B" then all you need to do is update the connection information from "mybadhost" to "newhostname".  If they are not all going to a new agent you will need to find all these jobs and update the agent for them.

 

Finally, any file events that use this agent will need to be updated.

Once all these are updated you will be able to delete the agent.

This is the problem. I can't find any jobs defined to run on that agent.

It appears the machine itself was retired over a year ago so any job data is long gone from retained history. I've got the reference to an orphaned job in the master runtime logs but it has no detail for the job itself outside the run count.

 

I've tried searching the database using the machine's nodmst_id, but so far everything has come back blank.

 

Any suggestions for SQL I might try to find the child records?
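
One way to approach this on Oracle, as a sketch rather than a confirmed TIDAL procedure: ORA-02292 reports the name of the foreign-key constraint defined on the child table, so looking up TIDAL.SYS_C0030961 in the data dictionary should show which table and column hold the blocking rows. The schema owner and constraint name below are taken from the error above; everything else is the standard Oracle dictionary views, and it assumes the account has visibility of the TIDAL schema.

```sql
-- ORA-02292 names the foreign-key constraint on the CHILD table,
-- so this lookup leads straight to the table holding the child rows.
SELECT c.owner,
       c.table_name        AS child_table,
       cc.column_name      AS child_column,
       c.r_constraint_name AS parent_key
FROM   all_constraints  c
JOIN   all_cons_columns cc
       ON  cc.owner           = c.owner
       AND cc.constraint_name = c.constraint_name
WHERE  c.owner           = 'TIDAL'
AND    c.constraint_name = 'SYS_C0030961'
AND    c.constraint_type = 'R'   -- 'R' = referential (foreign key)
ORDER  BY cc.position;
```

The child column may not be called nodmst_id, which would explain searches on that column name coming back empty.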

 

 

 

 

Hi Joe,

 

Is the connection part of any agent lists? If so, try removing it before removing the connection.

If not, it might work if you stop the scheduler and then remove the connection from the database.

Not part of any agent lists. Other than the connection itself I can't see anything referring to the offending server.

 

Can you suggest a direct database query that might isolate the reference?

Is there a relationship between nodmst_id and jobrun that might be searchable?

 

There is

Select * from jobrun where nodmst_id=x;



Where x is the nodmst ID



This will return any jobruns in the table (especially old ones) tied to the nodmst_id.


Tried that already.

 

SQL> select * from jobrun where nodmst_id=84;

no rows selected

This is what's weird. There don't seem to be any jobs related to this connection yet the master logs seem to think there's some orphan job running against it. Any other tables we can search?

Open a TAC case to look at it live. I would say identify the orphan jobs you see in the log and disable those jobs.


You can find where agents are being used with the following sql statements. This should get you started. If you do not find anything here then just open a case with TAC.



-- Find the id (nodmst_id) of the agent you are trying to delete

select * from nodmst;



-- Find any jobs that use the agent (replace x with the id from the first query)
-- The table aliases let these statements run on Oracle as well as SQL Server

select t.nodmst_id, t.* from jobdtl t where t.nodmst_id = x;



-- Find any agent list that uses the agent (replace x with the id from the first query)

select * from nodlstms where nodlstmst_id in (select nodlstmst_id from nodlstdt where nodmst_id = x);



-- Find the agent's entries in agent lists

select t.nodmst_id, t.* from nodlstdt t where t.nodmst_id = x;



-- Find any actions that use the agent (replace x with the id from the first query)

select t.nodmst_id, t.* from tskmst t where t.tskmst_id in (select r.tskmst_id from trgtskrun r where r.nodmst_id = x);



-- Find task runs on the agent

select t.nodmst_id, t.* from trgtskrun t where t.nodmst_id = x;



-- Find outages defined for the agent

select t.nodmst_id, t.* from nodout t where t.nodmst_id = x;



For your reference, a list of the tables with a nodmst_id column:

Table           Schema   Column      Type
dshdtl          dbo      nodmst_id   int
hostedservice   dbo      nodmst_id   int
jobdep          dbo      nodmst_id   int
jobdtl          dbo      nodmst_id   int
jobrun          dbo      nodmst_id   int
msglog          dbo      nodmst_id   int
nodlstdt        dbo      nodmst_id   int
nodmst          dbo      nodmst_id   int
nodmstexport    dbo      nodmst_id   varchar
nodout          dbo      nodmst_id   int
nodres          dbo      nodmst_id   int
owneragt        dbo      nodmst_id   int
resnod          dbo      nodmst_id   int
trgmst          dbo      nodmst_id   int
trgtskrun       dbo      nodmst_id   int
tskmst          dbo      nodmst_id   int
tsksch          dbo      nodmst_id   int
tskvar          dbo      nodmst_id   int
usreqv          dbo      nodmst_id   int
varmst2         dbo      nodmst_id   int
workrunusr      dbo      nodmst_id   int
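
The list above comes from a SQL Server schema (dbo). On an Oracle install a similar list can probably be produced from the data dictionary, and the same view can generate one count query per table so they don't have to be typed by hand. This is a sketch under assumptions: the schema owner TIDAL is taken from the ORA-02292 message earlier in the thread, 84 is the nodmst_id from the delete error, and the table names are assumed to be stored as default uppercase identifiers.

```sql
-- List every table in the TIDAL schema that has a NODMST_ID column.
SELECT table_name, data_type
FROM   all_tab_columns
WHERE  owner       = 'TIDAL'
AND    column_name = 'NODMST_ID'
ORDER  BY table_name;

-- Generate one COUNT(*) statement per table; run the generated output
-- to see which tables still hold rows for the agent.
-- The id is quoted ('84') so the comparison also works on tables where
-- nodmst_id is a character column (nodmstexport in the list above).
SELECT 'SELECT ''' || table_name || ''' AS tbl, COUNT(*) AS cnt FROM '
       || owner || '.' || table_name
       || ' WHERE nodmst_id = ''84'';' AS stmt
FROM   all_tab_columns
WHERE  owner       = 'TIDAL'
AND    column_name = 'NODMST_ID'
ORDER  BY table_name;
```

Any table reporting a non-zero count is a candidate for the child record that blocks the delete.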





Thanks Robert,

 

that's quite an extensive list.

 

Also check the msglog for any references to nodmst_id,

 

select * from msglog where nodmst_id=84

 

BR,

Derrick Au

Just by way of wrapping this one up, we appeared to have some duplicate but orphaned entries in the database. We had to manually delete these, after which we were able to clear the connection.

 

We don't have any explanation of exactly how these entries came to be present.

 
