Category Archives: Troubleshooting
SQL Cluster Failover Issues
Today’s topic is ‘oh crap, I manually failed over my SQL Server cluster during a lunch-time scheduled maintenance window and SQL Server and SQL Server Agent did not come back online.’ The key words in their being OH CRAP!
Looking at the ‘oh so informative cluster events’ I see the following:
The Cluster service failed to bring clustered service or application ‘SQL Server (MSSQLSERVER)’ completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
That did not help, let’s move onto the Windows Event Viewer. It gave us: “The specified account’s password has expired.” Wait a minute! These are service accounts created by our active directory administrator that are supposed to never expire. Hmmm, I better investigate further. I look at a big group of my SQL Server service accounts and noticed about half of them are set this way. When I questioned the AD administrator, he indicated that he was training a new person and he must have done the half that was incorrect. Problem solved. Thus, it is a good idea to always check to make sure that your accounts are setup properly before you use them in SQL Server.
AD Accounts Are Great, But…
Using Windows Authentication for your database is a great best practice to follow. However, one caveat to remember as one of my administrators found out today. Do not use that account as a login if it is also the account you use for your database.
We setup an AD account for his database, that part is great. He thought that was also the account to use as his application administrative account. Not a good idea because when he fat fingered the password this morning a few times, as people tend to do on Monday mornings when their coffee hasn’t fully kicked in, he locked out his application as well as the database sending 150 alert messages all over the department. This set off a chain reaction where he rebooted his server after unlocked the account and proceeded to call me and tell me that SQL was down and sounded the sky is falling alarm. He overreacted essentially.
When I figured out what he did, I explained the situation and told him that he needed another AD account for his application or I could setup another one for his database. He said he installed his application under that account and cannot change either one. Not the best scenario, but every time you fat finger the account you will bring the world down upon you. Your choice. He chose the road well traveled unfortunately. Enjoy!
Weird DBMail Error in 2008
Found a weirdness this morning that I thought I would share with you. I have recently setup up two new SQL Server 2008 clusters and setup the database mail yesterday so that I can receive a nightly backup report, which is much easier than 50-100 individual emails per server. The database mail worked on the send test email, but was not sending out for jobs.
I talked to our exchange guy to make sure there was nothing on that end preventing the flow. I reviewed the settings and review the SQL agent job. Nothing jumped out at me. The job history showed SQLSTATE 42000, ERROR 14607.
Upon bingling the error code, I found the error to mean incorrect profile name. I named the profile, so now SQL Server is judging my naming convention?
Wait, I scripted out the job from another server, but I changed the names in the script. I rechecked the profile on the source server and it was named “<server name> Mail”. However my profile on the new server was simply named after the server name. Once I changed the name by deleting the profile and creating a new one, the job worked successfully. Attention to details! I hope this helps someone, enjoy!
SQL Nexus, Quirkiness Workaround
Yesterday, I wrote about the “Realtime – Server Status” report quirkiness where essentially the report does not work when you import SQL Server 2008 data and essentially it does not contain any information as to why it does not work, only a blank screen. Craig Purnell (Blog | Twitter) shortly thereafter sent me an updated report library file that works with 2008. That is exciting! I ran a diff on the two files and found that Craig updated two lines with the same change in order for this to work. The filename is “Realtime – Server Status.rdl” located in the Reports subfolder of your SQL Nexus installation folder. Below is the line that needs to be changed (remember to do it in both places):
Original line: SELECT @ts_now = cpu_ticks / CONVERT (float, cpu_ticks_in_ms) FROM sys.dm_os_sys_info
Updated Line: SELECT @ts_now = ms_ticks FROM sys.dm_os_sys_info
Thanks Craig, now I have results that look like the following:
SQL Nexus, Using RML Utilities
Continuing in our journey of SQL Nexus, today we will look at the RML Utilities that we installed prior to the installation of Nexus. While thinking about what to blog about this morning, I decided to search for some background information to integrate into our learning. Once a found an article from the SQL Customer Advisory Team (aka SQL CAT), I thought I would not reinvent the wheel and simply pass this information on to you. The article, Precision Performance for Microsoft SQL Server using RML Utilities 9.0, gives an excellent break down for the uses of the tool. Check it out and enjoy!
SQL Nexus, Read Trace Errors Resolved!
Last Monday we started our SQL Nexus journey together in order for me to refresh my knowledge of this awesome tool. Near the end of last week, the blog series was delayed due to an error with the ReadTrace and its corresponding report. Yesterday and again today, I am troubleshooting our attempt to resume our journey but with little hope for correcting the error. I have posted the error on CodePlex and am awaiting a response.
Yesterday, I referenced a “ReadTrace exit code: -2” error in the ReadTrace log file. I did find one page out there on SQL Server Central where Grant Fritchey (Twitter | Blog) had encountered the problem. After contacting, Grant he told me that the problem that he had was a simple one, he did not have the file and path name correct which corresponded with the file not found error higher up in the ReadTrace log. However, when cutting and pasting this file straight out of the log, the file opens. Strange indeed, but Grant got me thinking and thus reading through all of the files looking for a possible mistake in the scripts. I could not find any.
Keep in mind, this is with running the StartSQLDiagDetailed_Trace2008R2.cmd performance script that I outlined in one of the previous posts. This simply reference XML files that appear to only outline the performance indicators in which we want to monitor. It is in the MSDiagProcs.sql script that it identifies the trace files to load and run. In addition, none of the three scripts will load trace files into the import.
Here is a sample of what is called from the CMD scripts:
H:\PerfScripts\2008R2>”C:\Program Files\Microsoft SQL Server\100\Tools\Binn\SQLdiag.exe” /I “H:\PerfScripts\2008R2\SQLDiagPerfStats_Trace2008.XML” /O “H:\PerfScripts\2008R2\SQLDiagOutput” /P
However, attempts to run this manually through ReadTrace also failed with one of the trace files created by the performance script, such as:
H:\Program Files\Microsoft Corporation\RMLUtils>readtrace.exe -I”H:\PerfScripts\2008R2\SQLDiagOutput\SQLVMDEV10_SQLDIAG__sp_trace.trc” -o”C:\Users\crssqladmin\AppData\Local\Temp\2\RML” -SSQLVMDEV10 -dsqlnexus -E -T18 -f -H!PSSDIAG
However, playing with the file names yielded a different problem. When I renamed the file taking out the double underscore, ‘__’, thus resulting in SQLVMDEV10_SQLDIAGsp_trace.trc and SQLVMDEV10_SQLDIAGsp_trace_blk.trc, which seemed odd to me from the beginning, the ReadTrace load now indicated a permissions error. When I added my administrative account to these two trace files, the ReadTrace worked. Hmm, weird but interesting. I wonder now if the SQL Nexus import would load? See for yourself:
Enjoy and I hope this helps if you encountered the same issue!
SQL Nexus, Read Trace Errors
Last Monday we started our SQL Nexus journey together in order for me to refresh my knowledge of this awesome tool. Near the end of last week, the blog series was delayed due to an error with the ReadTrace and its corresponding report. Today, I am still attempting to resume our journey but with little hope for correcting the error. I have posted the error on CodePlex and am awaiting a response.
The first error I received when loading the trace files in was error 26 which many people translated into a permissions problem on the trace file. I resolved that issue by granting full control to my admin login and SQL Server when that did not resolve the issue. Now I am receiving a “ReadTrace exit code: -2” error in the ReadTrace log file.
At this point, I can find no helpful information on the interwebs. I do find it interesting the that log file also denotes that it cannot find the trace file that it is trying to load, however when I cut and paste the file in question right from the log file it opens in notepad so I am not sure that the error is entirely accurate. I checked the permissions on the trace file and it is consistent with the other files. I’m stumped.
Has anyone else seen this?
SQL Nexus, Delayed with Error
I was hoping to continue our series today but I am running into an error while loading the trace files. I have spent the day with other production issues and have been unable to track down this error with ReadTrace. I hope to be able to blog about it next week. Stay tuned and enjoy your weekend.
SQL Nexus, Data Collection
Monday we started our SQL Nexus journey together in order for me to refresh my knowledge of how to use this awesome tool. On Tuesday, we installed some essential pieces in order to get the SQL Nexus tool installed and ready to use. We detoured on Wednesday to attend the SQL Server 2012 Technology Days at Microsoft in Tampa. Today we return to the fun, so let’s jump right back in!
I realized this morning that I left out another download, the SQL Server 2005/2008 Performance Statistics collection scripts. This has the XML configurations needed to run SQLDiag in order to load performance data into SQL Nexus. Well let me qualify that, these XML files are pre-configured and they can be modified if necessary. These configurations will capture DMVs and also blocking information that can help you troubleshoot performance problems. It will also collect machine information from Perfmon, msinfo32, various event logs, and the profiler trace (unless you run the NoTrace command script). There is more detailed information on customization available on the CodePlex download page (make sure you read the fine print about the download location as it is not the big download button on the right).
Once you run one of the scripts, such as StartSQLDiagDetailed_Trace2008R2.cmd, you will see several lines of preparatory messages followed by “SQLDIAG Collection started. Press Ctrl+C to stop.” At this point you can reproduce your issue, if possible. If not possible, then collect data for a few minutes and then press Ctrl+C to finish the collection and generate the output necessary for SQL Nexus. Be vigilant of running this as the output can grow very large very quickly especially if you running this against a production server. Be careful! Once the dialog closes, then the output is ready to load into SQL Nexus.
Now open SQL Nexus and we can import the data that is output from SQLDiag from the output folder that is generated underneath the folder where you ran the script from, by default. From there we can run some of the reports located on the upper left corner of the interface. Not all of them will apply depending upon what data you collected and which script ran. Below is a sample of the Blocking and Wait Statistics report. This was just a sample ran for this blog, so unfortunately I did not having any blocking issues to show you. I hope this helps your understanding of SQL Nexus. Enjoy!
Breaking from the week-long geek fest, I felt I needed to rant somewhat about vendor provided databases. Working in my environment, the majority of SQL Server databases are vendor provided. We have a few in-house applications but most of those are on our Oracle infrastructure, for now (insert evil laugh).
More often than not it has been my experience that vendors do not understand security and common accepted best practices for SQL Server not to mention the provided installation documentation is usually horrid. It is not too much to ask for vendors to understand the database environment in which they deploy. I am thoroughly frustrated after trying to install a database today from a vendor who after several phone conversations I began to realize that they do not understand how certain aspects of SQL Server work and that scares me thoroughly! End of rant, enjoy!