Just arrived at the office, without breakfast, I had complaint from boss that our website was very slow. Looking at the website, it was very slow indeed, only after a good few minutes, the website only started to load some texts barely while you still can see the rolling icon on web browser's tab.
After looking at application log files, I still couldn't find any issue. In fact, the last recorded data was 2 days ago. Tried to connect to database, it worked. Did a couple of SQL queries to see if I could retrieve data, it worked too. Next, let me tried to check worker process for application pool. It would take a snapshot of current requests associated with the application pool. To my surprise, some of the requests have taken more than 2 minutes and still running.
It left me no choice but to restart the application pool and website. It still having the same issue. This time, I created a dump file (.DMP file) and used an IIS Debug Diagnostics Tool which can be downloaded here for free to analyze hang issue. After a while, a HTML report popped up in Internet Explorer.
It gave me a nice easy to read report, and it seemed that there was thread not releasing a .NET lock. By examining .NET call stack, it was stuck trying to make a call to our web services.
Let's go to Web Service server.
After looking at Task Manager -> Details tab, w3wp.exe for web service was having very high CPU usage! Nearly 50% of CPU was detected for this web worker. Created a dump file and analyzed using the same diagnostics tool. The report showed that it was taking a long time to retrieve an object (BrandFeaturedItem, it's a hint!) from cache. At first, I thought it was due to our own cache engine which is running as a Window Service. The purpose of having cache engine is to enable us to have distributed cache. Restarted the service and waited to see what happened next. Apparently, it seemed to help a bit with speed, but shortly after that, it showed high CPU usage again.
In my mind, I thought perhaps this time I could use the diagnostics tool again to analyze .NET memory issue. In the report, it showed me the most memory consuming .NET object types, very nice. Out of those object types, I saw BrandFeaturedItem again! There were 544309 objects! It was not normal as I would expect to have no more than few hundreds. Without hesitation, I quickly ran a SQL query to retrieve rows from BrandFeaturedItem table in database.
More than 20 thousands rows in this table. Let me see if there were any duplicate rows by running SQL query below.
SELECT BrandId, ProductId, FeaturedItemType, COUNT(*) FROM BrandFeaturedItem GROUP BY BrandId, ProductId, FeaturedItemType HAVING COUNT(*) > 1
There were a lot of duplicate rows indeed. Somewhere in application was creating duplicate rows which I would sort it out later after fixing this urgent issue. Now, I needed to remove all duplicate rows leaving one copy of each. Using Common Table Expressions to create a temporary result set, I would be able to get all duplicate rows leaving only one copy of each.
WTIH BrandFeaturedItemCTE (BrandId, ProductId, FeaturedItemType) AS ( SELECT BrandId, ProductId, FeaturedItemType FROM BrandFeaturedItem GROUP BY BrandId, ProductId, FeaturedItemType HAVING COUNT(*) > 1 ) DELETE FROM BrandFeaturedItem WHERE Id IN ( SELECT Id FROM BrandFeaturedItemCTE cte LEFT JOIN BrandFeaturedItem bfi ON cte.BrandId = bfi.BrandId AND cte.ProductId = bfi.ProductId AND cte.FeaturedItemType = bfi.FeaturedItemType ) AND Id NOT IN ( SELECT MAX(Id) FROM BrandFeaturedItem GROUP BY BrandId, ProductId, FeaturedItemType HAVING COUNT(*) > 1 )
Now, all duplicate rows were gone. By fixing the urgent speed issue, the website now loaded quickly as before. Relieved, now it's time for me to look at the application which could be another post itself to explain how I finally fixed a bug.
Thanks to IIS Debug Diagnostics Tool, they made my life so much easier to search for hang issue. Unlike before, I would need to open Command Prompt to get me a dump file using ADPlus and run WindDbg to evaluate all the threads and stacks.
The process of debugging hang issue was roughly like below.
- Retrieve process id from IIS
- Create crash dump using process id
- Goto c:\program files\debugging tools for windows (x86)
- adplus -hang -p [process id] -o c:\applications\crashdump
- File -> Symbol File Path ->
- Run command in windbg
- .load c:\windows\microsoft.net\framework\v4.0.30319\sos.dll
- Evaluate all threads
- ~* e !clrstack
- Finding which thread is holding the log, look for first number at "Info"
- Goto the thread [number]
- Under the thread number context 0:[number]>
Now, I just need to use IIS Debug Diagnostics Tool and it will just get me a nice easy-to-read report which will show me where to look for. Timewise, it saves me lot of efforts. However, knowing how to use ADPlus and WinDbg is a plus, just in case the tool is not available to use.