Systems Management Forums

OME- Discovery/Inventory fails due to services stopping randomly

Systems Management
Dell Systems Management Solutions: Dell OpenManage, iDRAC, Repository Manager, Microsoft SCCM, Chassis Management Controller, and more

  • Hello, I recently installed OMSE on one of our 2008 R2 servers to try managing some of my servers with it. I am using the most recent version of OMSE, and of OMSA on the client servers. I am running into an issue scanning a couple of our subnets for server discovery with the latest version of OpenManage Essentials. I have had success discovering most of our servers, but on two particular subnets I cannot get the discovery/inventory scan to complete. It gets to roughly 50-60% done and then says "stopped by user". I then get an error at the top right of the OME console that says:

     

    DSM Essentials Network Monitor service is currently not running

    -The DSM essentials network monitor service controls the discovery, inventory, and statusing of all the discovery ranges. When this service goes down, no discovery, inventory or statusing is initiated.

     

    Well, that explains why my discoveries are failing. Can anyone comment on why the services would be hanging or stopping? I have rebooted the server and tried restarting just the specific services. I can scan single IP addresses, but when I try a subnet I only get partway through before the services crash.

    OME seems really cool, and I'd love to get all my servers listed. Any help or comments are appreciated!

  • Thanks for the post and details.  Let's get a bit more info:

    - Number of cores on OME server?

    - Amount of memory on OME server?

    - OME installed on physical or VM?

    - OME db installed locally or remote?

    - How many devices are discovered before you start to discover the misbehaving ones?

    - Is there anything different between the 'bad' servers and the ones that are discovered and acting ok?

    - What protocol is enabled on your discovery wizard for the 'bad' range?

    - Can you select a single server / ip addr from this 'bad' range and try to discover it?

    - On the OME tools screen, there is a disc log checkbox.  You might want to enable that and re-discover and snoop through that log file.  If we end up asking you to open a ticket, this log will probably be useful.

    Let's start with that...

    Thx,

    Rob

  • Thanks for the response Rob, let me do my best to answer those questions

    - Number of cores on OME server? 8

    - Amount of memory on OME server? 4

    - OME installed on physical or VM? Physical

    - OME db installed locally or remote? Remote on a 2008 R2 SQL Enterprise server

    - How many devices are discovered before you start to discover the misbehaving ones? 30-40 devices total, about 6-8 are identified properly

    - Is there anything different between the 'bad' servers and the ones that are discovered and acting ok? nothing different besides the purposes of the servers

    - What protocol is enabled on your discovery wizard for the 'bad' range? Since I am just getting started with this, I am only using SNMP. I didn't even have SNMP configured on my servers until I decided to try OMSE out. It's all fresh at this point.

    - Can you select a single server / ip addr from this 'bad' range and try to discover it?

  • Sorry, I didn't answer that last one.

    I guess it's not a bad "range", as the first half of the range can be scanned successfully, and so can the end of the subnet (where there are no servers). For some reason, I guess a couple of servers are causing the services to crash on my OMSE server.
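Since the first half and the tail of the range scan cleanly, one way to narrow down the addresses that trip the service is to cut the range into smaller pieces and discover them one at a time. The following is only a sketch of that idea: it computes sub-ranges to type into the discovery wizard by hand and does not talk to OME at all. The function names and the 192.168.10.x addresses are made up for illustration.

```python
import ipaddress


def split_range(first, last, chunk_size):
    """Split an inclusive IPv4 range into consecutive sub-ranges
    holding at most chunk_size addresses each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    if start > end:
        raise ValueError("first address must not be after last address")
    chunks = []
    while start <= end:
        stop = min(start + chunk_size - 1, end)
        chunks.append((str(ipaddress.IPv4Address(start)),
                       str(ipaddress.IPv4Address(stop))))
        start = stop + 1
    return chunks


def halve(first, last):
    """Cut an inclusive IPv4 range into two halves, for bisecting
    a range that is known to contain a problem host."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    if start >= end:
        return [(first, last)]  # a single address cannot be split further
    mid = (start + end) // 2
    return [(first, str(ipaddress.IPv4Address(mid))),
            (str(ipaddress.IPv4Address(mid + 1)), last)]


if __name__ == "__main__":
    # Hypothetical subnet; substitute the real discovery range.
    for lo, hi in split_range("192.168.10.1", "192.168.10.254", 32):
        print(f"{lo} - {hi}")
```

Discovering each printed sub-range separately, then applying `halve` to whichever one stops the service, should reduce the suspects to one or two addresses after a few rounds.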

  • Ok, good info.

    I wonder if there is any chance to take a guess at one of the 'bad' servers.  Then type that ip addr into the Dell Troubleshooting Tool (icon on your OME desktop).  The SNMP test should return a result set that includes the version of OMSA installed on your target ('bad') server.

    And...the logs as I mentioned in my first reply might be useful.

    Thx

    Rob

  • Good morning Rob, thanks for the help. It's sure nice to have a responsive support forum like this.

    I tried running the diagnostic tool on two of the servers that are in my "problem range". The first returned the MIB- System name only, and the second returned the system name as well as the OMSA version. So I have to assume that the first server is at least one of the culprits, and may be the only one. I installed OMSA on all of my servers on the same day, same version. I can log into OMSA on the problem server and everything looks fine, same version 6.50 as the others.

  • I also looked for a way to enable the discovery logs, and I do not see that particular checkbox. Under Tools in OMSE I have:

    Enabled, Log Asynchronous Calls, Informational, Warning, and Critical. They are all checked by default except Log Asynchronous Calls.

  • Sorry, the logs export is a funny little arrow on the icon bar on that screen.  I mis-remembered :)

  • Ok, that helps.  Still, a misbehaving server or two should not cause the service trouble.  But let's keep looking.

    Here is my discovery checklist (some you have done, so ignore)

    Here are the important items to check when you are chasing down OME Discovery and Inventory issues.

    1. Check FAQ 3.6 if inventory is not showing up.

    2. Be sure OMSA is on the managed node (target)  if you are using SNMP/WMI (#3.3)

    3. Be sure you have “accept packets” checked on your managed node (#3.4)

    4. If discovering a 2008 server, see item #3.5

    5. Be sure you check the community string, remember they are *case sensitive* (#3.4).  Did you type it in correctly in the discovery wizard?  Did you type it in correctly in the managed node SNMP properties?  Did you type it in correctly in the TS test screen?

    6. Run the troubleshooting test for SNMP or WMI.  For SNMP the results *must* include the version of OMSA in the table.  For WMI the results *must* contain the words “Dell Server Agent” for the namespace returned.

    And to emphasize: the snmp community string (mis-spelled, case sensitive) is *often* a trouble spot.

    Rob
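Because a community string that differs only in capitalization is easy to overlook, a small comparison helper can flag it across a list of servers. This is a sketch under stated assumptions: the host names and strings are made up, and the per-host values would have to be collected by hand from each managed node's SNMP properties, since nothing here queries a server or OME.

```python
def compare_community(configured, expected):
    """Classify how a configured community string relates to the expected one.

    Returns "match" for an exact (case-sensitive) match, "case" when the
    two differ only in letter case, and "different" otherwise.
    """
    if configured == expected:
        return "match"
    if configured.casefold() == expected.casefold():
        return "case"
    return "different"


def find_mismatches(configured_by_host, expected):
    """Return {host: status} for every host whose string is not an exact match."""
    problems = {}
    for host, configured in configured_by_host.items():
        status = compare_community(configured, expected)
        if status != "match":
            problems[host] = status
    return problems


if __name__ == "__main__":
    # Hypothetical values, typed in by hand from each server's SNMP settings.
    hosts = {"srv-a": "OpsRO", "srv-b": "opsro", "srv-c": "public"}
    for host, status in find_mismatches(hosts, "OpsRO").items():
        print(f"{host}: {status}")
```

A "case" result points at exactly the kind of slip described in the checklist: the string looks right at a glance but will still be rejected.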

  • Yeah, so I kinda feel like an idiot. I double-checked my community strings yesterday and somehow I managed to overlook this server: it wasn't capitalized. I updated the community string and restarted the SNMP service. I still get the same result from the SNMP test, no OMSA version. Everything else on the discovery checklist is good. I am rerunning the discovery now and will then check the logs. Thanks again, Rob.

  • Also, you said you are using only SNMP in your discovery wizard.  Just need to confirm...is IPMI selected?  If so, de-select and re-try.

  • Nope, IMPI is not selected.

  • Ha, yeah, easy to miss those.  Maybe also try starting up the OMSA gui on the box directly just to have a look around and see that it installed ok.  Good luck.

  • IPMI, it's still early :)

  • I have already tested OMSA on the problem server and I can't see anything wrong from the local OMSA GUI. Should I post my log file here? It seems empty, or nearly empty.