Lync Server 2013: Front-end Fails to Start

Received the message “Server startup delayed ... fabric pool manager has not finished initial placement of users”? Read our solution!

Hi All – quick writeup today. I’ve been doing a *lot* of work on Lync Server 2013 and Acano integration. As part of this effort I have a simple all-in-one Lync Front-End (FE) server running in my environment. I built it all on my OpenStack Icehouse deployment and things looked pretty good…but those of you in the know realize that I really didn’t follow best practices in doing so. No insult to OpenStack, of course, but really your Lync FE is best placed on a proper organizational DMZ instead of a private internal network…and I don’t think it’s possible to use OpenStack Neutron GRE networking as I’m doing without that side effect 😉

So…what did I do? I kept my working Lync FE on OpenStack but simply stopped it. Then I converted the qcow2 to VMware VMDK format, then converted the resulting raw “flat” file to have a proper VMDK disk descriptor, and created a custom VMware VM (attaching my converted disk). Started up my shiny-new converted image on my ESXi 5.5 host, and it All Came Up 🙂 Woohoo!

Except…it didn’t really start. In my case, the error code was Windows Event 32174 as shown below:

Log Name: Lync Server
Source: LS User Services
Event ID: 32174
Task Category: (1006)
Server startup is being delayed because fabric pool manager has not finished initial placement of users.

Currently waiting for routing group: {EF5151C7-B5E1-53B8-9F61-0CC90C82B9F6}.
Number of groups potentially not yet placed: 7.
Total number of groups: 7.
Cause: This is normal during cold-start of a Pool and during server startup.
If you continue to see this message many times, it indicates that insufficient number of Front-Ends are available in the Pool.
Resolution:
During a cold-start of a large Pool it can take upto an hour for the placement process to finish as it needs to populate all the Front-End databases with data from the Backup Store. If the Pool is running and the Front-End is just started, this is normal for some time. If this repeats for a long time, ensure that all the Front-Ends configured for this Pool are up and running. If multiple Front-Ends have been recently decommissioned, run Reset-CsPoolRegistrarState -ResetType QuorumLossRecovery to enable the Pool to recover from Quorum Loss and make progress.
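
The two counters in that message are a quick way to tell whether placement is making any progress between repeats. Here is a minimal sketch that pulls them out of the message text. `Get-PlacementProgress` is a made-up helper name, not a Lync cmdlet, and the regular expressions simply assume the message wording shown above.

```powershell
# Get-PlacementProgress is a hypothetical helper, not a Lync cmdlet.
# It extracts the routing group and the two counters from the text of event 32174.
function Get-PlacementProgress {
    param([Parameter(Mandatory = $true)][string]$Message)

    $group = $null; $notPlaced = $null; $total = $null
    if ($Message -match 'waiting for routing group:\s*(\{[0-9A-Fa-f-]+\})') { $group = $Matches[1] }
    if ($Message -match 'not yet placed:\s*(\d+)') { $notPlaced = [int]$Matches[1] }
    if ($Message -match 'Total number of groups:\s*(\d+)') { $total = [int]$Matches[1] }

    New-Object psobject -Property @{ RoutingGroup = $group; NotPlaced = $notPlaced; Total = $total }
}

# Sample text copied from the event above
$sample = @'
Currently waiting for routing group: {EF5151C7-B5E1-53B8-9F61-0CC90C82B9F6}.
Number of groups potentially not yet placed: 7.
Total number of groups: 7.
'@

$progress = Get-PlacementProgress -Message $sample
$progress
```

On the Front-End itself you could feed it real events, for example `Get-WinEvent -FilterHashtable @{ LogName = 'Lync Server'; Id = 32174 } -MaxEvents 10 | ForEach-Object { Get-PlacementProgress -Message $_.Message }`. If the not-yet-placed count never drops below the total (7 of 7 in the event above), that suggests placement is not moving at all.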

More specifically, the “Lync Server Front-End” Windows service was stuck in the Starting... state. Searching on that (as well as my specific Windows event ID 32174) turned up lots of matches, with at least four separate write-ups of the same symptom.

Wow. That’s a lot of failures!

Most of the reported problems point to a certificate issue. Under Windows Server 2012, if you have any intermediate certificates in your local machine’s Trusted Root Certification Authorities store then the Lync FE will fail to start. There’s even a handy Windows PowerShell script to check for this condition (as well as other misplaced / mismatched certificates). But in my case…nada.
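
A minimal sketch of that kind of check, assuming the condition is the commonly reported one: certificates in the local machine's Trusted Root Certification Authorities store (`Cert:\LocalMachine\Root`) that are not self-signed. This is not the script the articles link to, just the core idea.

```powershell
# List certificates in the local machine's Trusted Root store whose issuer
# differs from their subject, i.e. certificates that are not self-signed and
# (on this assumption) do not belong in that store.
Get-ChildItem -Path Cert:\LocalMachine\Root |
    Where-Object { $_.Issuer -ne $_.Subject } |
    Format-List Subject, Issuer, Thumbprint, NotAfter
```

No output means nothing was found, which is the "nada" case. It only reads the store; moving or deleting anything it reports is a separate, manual decision.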

Now – one thing I had to do was re-IP my Lync FE server, simply because the original IP was “owned” by OpenStack Neutron (and thus responses would not route correctly back to the migrated Lync FE). I don’t know if that was the problem; at least one TechNet article indicates that – as long as your DNS records are up-to-date – you can change the Lync FE IP address all you want.

But I still needed to solve my problem!

After much event log research I found the point which led to failures:

Log Name: Lync Server
Source: LS User Services
Event ID: 32131
Task Category: (1006)
Server is waiting for the first run of user replicator task to complete. This could take a while depending on the network connectivity with the Active Directory and the number of users in the deployment. The server will continue to be in the starting phase until this operation completes.

The next “LS User Services” message after that was the 32174 message above. That message kept repeating every 2 minutes. I put on my thinking cap and thought…it’s a user synchronization problem. So I dug around even more and found some messages that indicated database status:

  • Log Name: Lync Server
    Source: LS User Services
    Event ID: 32150
    Task Category: (1006)
    The active user store database server is set when the server first starts up and connects to the user store backend server. The user store backend server can also change when there is a fail over to the mirror user store database server if a mirror is configured. The failover can happen as part of regular maintenance or as part of disaster recovery. If this event is raised during steady state (instead of during startup of server), it indicates a failover.
    
    The active user store database server has been set to LVINLYNCX100\RTC.
  • Log Name: Lync Server
    Source: LS User Services
    Event ID: 32133
    Task Category: (1006)
    Successfully connected to database "rtcxds" using the connection string of 
    driver={SQL Server Native Client 11.0};Trusted_Connection=yes;AutoTranslate=no;server=(local)\rtc;database=rtcxds;
  • Log Name: Lync Server
    Source: LS User Services
    Event ID: 32133
    Task Category: (1006)
    Successfully connected to database "rtcab" using the connection string of 
    driver={SQL Server Native Client 11.0};Trusted_Connection=yes;AutoTranslate=no;server=(local)\rtc;database=rtcab;
  • Log Name: Lync Server
    Source: LS User Services
    Event ID: 30960
    Task Category: (1006)
    Successfully connected to database "rtc" using the connection string of 
    driver={SQL Server Native Client 11.0};Trusted_Connection=yes;AutoTranslate=no;server=(local)\rtclocal;database=rtc;
  • Log Name: Lync Server
    Source: LS User Services
    Event ID: 32135
    Task Category: (1006)
    Aggregation script registered with the server successfully.

From this it certainly appears that all database connectivity is just fine. Finally I stumbled upon the winning article that led me to a solution. The article walks you through different debugging steps for when your Lync FE fails to start. The fourth step (reset the Lync 2013 Front-End pool) was what worked for me. I simply used:

Reset-CsPoolRegistrarState -PoolFqdn [my Lync FE FQDN] -ResetType FullReset
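
For context, here is a rough sketch of how that command might sit in a before-and-after check. It assumes the Lync Server Management Shell on the Front-End, that the Front-End service is registered under the name `RTCSRV`, and a placeholder FQDN (`lyncfe.example.local`) that you would replace with your own.

```powershell
# Run from the Lync Server Management Shell on the Front-End.
# 'lyncfe.example.local' is a placeholder; substitute your own pool FQDN.
$pool = 'lyncfe.example.local'

# Before: is the Front-End service (assumed name RTCSRV) still stuck in Starting?
Get-CsWindowsService -Name RTCSRV

# The reset that worked in this post
Reset-CsPoolRegistrarState -PoolFqdn $pool -ResetType FullReset

# After: give it a few minutes, then check the service again
Get-CsWindowsService -Name RTCSRV
```

`FullReset` is simply what worked on this single-server pool. The event text above suggests `-ResetType QuorumLossRecovery` for the case where multiple Front-Ends were recently decommissioned, so that may be the better first try on a multi-server pool.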

All of a sudden…the failing Lync Front-End service started. And, as an added bonus, I gained a lot of insight into how Lync initializes itself.

Happy Computing!

