Hit a fun error today throwing a Severity 020 error. Here is the error from the Error Log
SQL Server Assertion: File: <qxcntxt.cpp>, line=956 Failed Assertion = '!"No exceptions should be raised by this code"'. This error may be timing-related. If the error persists after rerunning the statement, use DBCC CHECKDB to check the database for structural integrity, or restart the server to ensure in-memory data structures are not corrupted.
The error was happening every hour at 34 minutes past the hour. The problem started at 4:34PM the day before and was consistently happening every hour. I figured I had a SQL Agent job throwing some error or something. However after some searching around this error seemed pretty rare.
The alert notification email had some more clues:
Process ID 67 attempted to unlock a resource it does not own: DATABASE: 30 . Retry the transaction, because this error may be caused by a timing condition. If the problem persists, contact the database administrator.
I figured that the DATABASE: 30 was a reference to a database with the DB_ID of 30. However there was no database with the ID of 30 in the instance. The highest DB_ID was 28. I fired up a SQL Profiler trace to see if I could capture the session and the statement. The Process ID was showing up in the alert email so that helped me track down the statement.
SELECT database_id, object_id, index_id, partition_number, page_io_latch_wait_count, page_io_latch_wait_in_ms FROM sys.dm_db_index_operational_stats(null,null,null,null) WHERE page_io_latch_wait_count >0
We have a Diagnostics tool that was running the SQL Statement above. I ran the statement in SSMS and was able to reproduce the error. So now what. Well I noticed we had some databases that were set to Offline. We are in the process of decommissioning this server. I detached a couple of the databases and the statement completed with no errors.
Above the statement that was erroring out was a query to grab all the databases on the server and throw them in a table. My guess is that the statement used to grab all the databases grabbed one of the Offline Databases by mistake. Then the sys.dm_db_index_operational_stats procedure was executed against the offline database thus throwing this wild error.
One problem with that theory is that the offline database was not being returned in the database list query. So maybe the diganostics tool was running off a cached copy of the database list or something. Not sure. Anyway, after the databases were detached the problem went away.