CPU Cooler heartache


So, summer has finally decided to raise the temperature here in Australia, which has meant that my normally quiet AMD X2 3800+ has been slowly raising its CPU fan speed from a nice, quiet 3000 rpm to an annoying 5500 rpm screamer. It became unbearable, so I decided to do something about it.

Until I had time to research which heatsink/fan combo I needed (wanted), I thought I would try a little Arctic Silver on it to see if it would have any effect. So I whipped out the CPU at about 11pm, dabbed a little (a little too much) on it and put it all back together again. The end result = no difference.

Bummer, I have to spend some money on a Heatsink/Fan combo.

So I searched around and, me being me – Señor Scrooge when it comes to spending money on gadgetry (because of its high depreciation) – I came across the GlacialTech Igloo 7300 Light. While it didn’t offer massive cooling capabilities, its claim to fame was being quiet. Another restriction I had was height, because my power supply sits vertically rather than horizontally (ie: open the case and the PSU covers the CPU) and there’s only about a 50mm gap there, so I had to stick with something of a similar height to the standard AMD fan. In the end it came down to a Zalman (AUD $38) versus the GlacialTech ($22), so the GlacialTech won.

Got it home and when I pulled out the CPU, the Arctic Silver was all over the bloody thing (late night smear campaign). As I was cleaning it up I got a tiny bit from my finger on the base, between two CPU pins (this was discovered later). I put the CPU back in and the thing didn’t POST. Aagh... turned it off, back on again and it booted this time.

Oh it is (was) a beautiful thing to not hear the PC again. Temps are a little higher but not too bad – 48C. So I put it all back together and the thing ran fine for a day and a half until Sunday night: black screen. Try to reboot, no POST. Try again, it POSTs and begins to boot but gets to fsck’ing the ReiserFS partitions and some really weird stuff goes down. Try to boot to CD (PCLinuxOS); it begins to boot and resets itself during hardware detection. Start disconnecting everything to boot as bare as possible and it boots up on one drive (out of the RAID). Try the other drive and it gets caught in a big loop. I run memtest86 from the PCLinuxOS disk and it freezes partway through. Grab the PSU and RAM from another PC I have and try them one by one. Same kind of occurrences.

God, this is going awry. It seems like a motherboard or CPU by now, right? But the problem is, which? I neither want to go another day without my PC nor do I want to spend big money trying each solution. So I opt for a Socket 939 Athlon 64 3200+ and an Asus A8S-X, got home and realised that while looking at other motherboards with AGP, I totally missed that this one used PCI-E instead of AGP – cursed myself. But the 3200+ was there anyway – part of me wanting it to be the answer (problem solved), part of me not (forgoing a 3800+ dual core for a 3200+ single :( )

So, whack in the 3200+, “carefully” of course. The thing boots and runs fine. I start working away, leaving the whole case open. Everything’s great – a mixed emotion comes over me – I lost my dual core but the PC works.

Fit the PC back together and it runs for about 3 hours and then locks up. Reboot, another fsck (and of the Virtual Machine inside it too). It seems to run better (longer) if I leave the case open and the PSU on the floor next to it, but still, it runs for a few hours or up to 20 hours and then dies. Shit... yeah, so what’s the deal now? Motherboard, right? But the Asus only takes PCI-E and I really don’t want to go out and buy another motherboard without REALLY knowing whether that’s the issue.

Taking about 50 steps backward, which only ever seems to happen after you have tried every other avenue, I ask the question “What changed?” Let me just try the old AMD CPU cooler again. Put it under the most stressful conditions: closed the box up, watched a documentary on it, had MythTV index all my MP3s, and 40 hours later it’s still chugging along – so far, it has been flawless. Could it really be heatsink related yet NOT heat related?

My theory (actually hypothesis) is that the GlacialTech CPU cooler has a large taper from the CPU socket up to the 120mm fan, and this taper means it comes pretty close to some capacitors on the board. Could it be causing some kind of interference? The other possibility – strange as it may seem – is that the 120mm fan is drawing more power than the AMD’s 80mm fan, but seriously, this current draw is so tiny that it shouldn’t really matter. For example, I have tried running with only one hard drive, and a hard drive draws a lot more current than a CPU fan – albeit directly from the PSU rather than the motherboard.

The PC with the GlacialTech cooler was much more stable with the 3200+ single core than with the 3800+ dual core. With the 3800+ it would barely POST, whereas the 3200+ would at least run for a few hours.

Tonight, I will be testing the 3800+ X2 with the stock AMD fan if I get a chance, something I haven’t yet tried from the beginning. Good luck to me. Once I sort out this whole saga, I will try and get some photos posted.

======UPDATE 21.01.2007=========

Ok, I have this whole saga completely worked out now and have taken the pics. The CPU cooler is fine, the CPU is fine, the motherboard is unique but fine. The problem ended up being that the stock AMD CPU cooler pulls the CPU down quite hard and slightly bows the motherboard. The new GlacialTech didn’t put that bow in the board and, for whatever reason, my ASRock is addicted to its little bowing technique. I spliced the outer casing of an old PS2 mouse cable and used the rubber as a spacer between the heatsink and the bracket, which made the bracket sit higher and put more pressure on the motherboard, “emulating” the bow that the stock AMD cooler was creating.

I have seen a similar but opposite problem on an Abit motherboard, where tightening the CPU cooler would cause the centre of the memory modules to lose contact because of the bow in the motherboard. It’s just something you would never think of.


Raunchy Launchy & Katapult – Quicksilver for Windows & Linux


I’ve longed for the day when I could do away with my “Programs Menu” altogether. I had been searching for something to do this kind of thing for a while when I came across Quicksilver for Macintosh. Those Macintosh people sicken me. How dare they have something so cool?

If you’re a software whore like me whose Start Menu fills an entire 19″ monitor, or you hate the nested menus that are so common in Linux distributions, then you will fall in love with Launchy and Katapult.

What do they do? They allow fast access to your applications, bookmarks and other items by using a key combination (Ctrl+Space by default) to launch a transparent popup in the middle of the screen. They index your standard programs menu, which lets you simply begin typing the name of the application you want.

For example, if I want to Launch “OpenOffice Writer” and I have “OpenGL” as a menu item, I type the following:

[Ctrl]+[Space], op[spacebar]wr[enter] and my application launches; quick, easy and painless.

While neither is perfect (both are still young projects), they are both free and open source and are now based on a plugin architecture that lets you index more things. Katapult has a very cool built-in calculator, so I can type:

[Ctrl]+[Space], 45*12 and I will automatically see the result. It’s a very handy addition and I find myself using it often.

Both use Ctrl+Space to pop up the desktop widget by default. This can cause a conflict when I am using rdesktop from Linux -> Windows or NoMachine’s NX from Windows to Linux, so I changed the Launchy default to Alt+Space and left Katapult on Ctrl+Space.

I have never used Quicksilver, so I can’t give any real comparison to it. I am certain any Macintosh Geek will tell me it does 100 other cool things that we will never have but then I didn’t pay $3000 for my Dual Core PC, did I ;)

Launchy and Katapult both fall into the “can’t live without” category of applications. Check them out.

All that is not one..

“All that is not One must ever
Suffer with the wound of Absence
And whoever in Love’s city
Enters, finds but room for One
And but in One-ness, Union.”

Full name: Nur ad-Din Abd ar-Rahman Jami (1414-92). Jami was a Persian (Sufi) poet born near Herat, Afghanistan. Nearly 100 works are attributed to him, but he is best known for a collection of poems, “Haft Aurang” [The Seven Thrones]. He is considered the last great mystical poet of Islam.

My Linux Distributions of Choice

A long, long time ago (actually not THAT long ago) I started to play with Linux. My very first Linux distribution was Corel Linux 1.0. I think I did finally get it installed but, lo and behold, that was as far as I got. Knowing very little else to do from there, I wiped that little 2GB hard drive and kept using Windows.

Then came a retry with Redhat 7.3 with some success, but I soon came to despise ‘rpm dependency hell’. ie: this package requires this, which requires this, which requires... There came a point where I just gave up trying to install the dependencies and hoped and prayed for the best, or compiled from source wondering if it would ever upset anything else. Actually, I wondered back then why anyone would use RPMs at all, as compiling from tarballs seemed so much easier.

Anyway, I came across Mandrake 9.1 free on a CD somewhere and discovered urpmi... oh, sweet love at first sight it was. Now I could actually just find the software I wanted and 95% of the time it would install, as long as it was available as an MDK package. I considered myself a fairly religious Mandrake zealot for some time, right up to Mandriva 10.1, though I was always toying with the odd distro, specifically small distributions like Peanut Linux, Vector Linux and Damn Small Linux.
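For anyone who never lived through it, the difference in practice looked roughly like this (the package name is just an example):

# plain rpm: install one package and chase its dependencies yourself
rpm -ivh somepackage.rpm

# urpmi: ask for the package by name and it fetches it, plus any
# dependencies, from the configured media
urpmi somepackage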

Fast forward to today and Mandriva (as of fairly recently) is actually no longer in use anywhere:

On my desktop: PCLinuxOS 0.93

Why?
Flash, Java and the ATI drivers are all in Synaptic, plus Katapult and a very usable Wine package which happens to cover a number of my ex-Windows needs.

In a Virtual Machine on my desktop: Suse 10.1

Why? iFolder is responsible for my move from Mandriva 2006 to Suse 10.1. I couldn’t get it to compile on Mandriva, and if you HAVEN’T used iFolder, you are really missing out on something. The same VM also happens to run CommuniGate Pro as a mail server, Apache 2, Postgres, MySQL and about 40 development websites quite nicely.

At Work as a Server: Suse 10.0

Why? Novell being behind it gives me some confidence to sell it to the boss (and I just don’t like Redhat anymore). Additionally, YaST makes it easy for me to take holidays and for others to manage the machine.

On my Web Servers: Centos 4.x

Why? Compatibility – a few cPanel servers I inherited came with it. When applying updates, patches etc., a standard operating environment simplifies an admin’s life.

and Windows still has a small place in my heart but mostly just to keep others happy :)


VMWare, SANs and Replicating on the cheap

Yesterday, I attended the Virtualization Forum here in Sydney and, thanks to Raghu Raghuram, VMware’s vice president of datacenter and desktop platform, my eyes were opened not only to the current but also to the future possibilities of virtualization. One of my biggest areas of interest here is High Availability and Data Recovery. These two areas are easily covered if you throw enough money at them, as nearly anything involving the words “Cluster” or “High Availability” = SAN = $$$.

With two servers running VMware ESX 3.0 Enterprise Edition connected to a SAN with snapshot and replication abilities – ooh, one is in heaven... but what if you can’t hit your boss up for the $100-200K required to migrate your 25 servers into one big, beautiful HA VMware cluster?

I went on a hunt this morning after the delicious solutions I saw yesterday and, whilst nothing is downloaded, installed or tested yet, I did find some interesting starting points (some of which I had tried to find previously).

My starting points are:

======================================

Openfiler is a powerful, intuitive browser-based network storage distribution. Openfiler, combined with the underlying Linux-based operating system, delivers file-based Network Attached Storage and block-based Storage Area Networking in a single framework. Openfiler is powered by rPath Linux. The entire software stack interfaces with third-party software that is all open source.

Networking protocols supported by Openfiler include: NFS, SMB/CIFS, HTTP/WebDAV, FTP and iSCSI (target). Network directories supported by Openfiler include NIS, LDAP (with support for SMB/CIFS encrypted passwords), Active Directory (in native and mixed modes) and Hesiod. Authentication protocols include Kerberos 5. Openfiler includes support for volume-based partitioning, Ext3, JFS, XFS and Reiserfs as on-disk native filesystem, point-in-time snapshots with scheduling, quota-based resource allocation, and a single unified interface for share management which makes allocating shares for various network file-system protocols a breeze.

An Openfiler based storage implementation can be configured to provide IP-based online volume replication for high-availability and data redundancy. All the tools to do this are entirely available as open source software as a part of the Openfiler distribution.

And, as all good projects should, there’s a Virtual Appliance for it on SourceForge.

======================================

Openfiler uses DRBD for its one-to-one data replication – as they say below, “a network RAID-1”. (I’ve sketched a minimal config after the quoted description below.)

DRBD is a block device which is designed to build high availability clusters. This is done by mirroring a whole block device via (a dedicated) network. You could see it as a network raid-1.

What is the scope of drbd, what else do I need to build a HA cluster?

DRBD takes over the data, writes it to the local disk and sends it to the other host. On the other host, it writes the data to the disk there.

The other components needed are a cluster membership service, which is supposed to be heartbeat, and some kind of application that works on top of a block device.

Examples:

A filesystem & fsck.
A journaling FS.
A database with recovery capabilities.
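I haven’t installed DRBD yet, but to give a feel for the shape of it, here’s a minimal two-node drbd.conf resource sketch. The hostnames, partitions and addresses are placeholders, not a tested config:

resource r0 {
  protocol C;                  # synchronous: a write completes only once both nodes have it
  syncer { rate 10M; }         # cap resync traffic so it doesn't swamp the link
  on node1 {
    device    /dev/drbd0;      # the block device the filesystem sits on
    disk      /dev/sda7;       # the real local partition underneath
    address   10.0.0.1:7788;   # dedicated replication link
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}

Heartbeat (or whatever cluster membership service sits on top) then decides which node gets to mount /dev/drbd0 at any given time.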

======================================

And last on my list of goodies: Csync2.

Csync2 is a cluster synchronization tool. It can be used to keep files on multiple hosts in a cluster in sync. Csync2 can handle complex setups with much more than just 2 hosts, handle file deletions and can detect conflicts. It is expedient for HA-clusters, HPC-clusters, COWs and server farms.
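Again untested on my part, but from the documentation a csync2 setup is essentially one shared config file plus a pre-shared key (generated with csync2 -k). A minimal sketch, with hostnames and paths purely illustrative:

# /etc/csync2.cfg - identical on every host in the group
group webfarm
{
    host web1 web2 web3;       # the hosts taking part in the sync
    key  /etc/csync2.key;      # pre-shared key, copied to all hosts
    include /etc/apache2;      # directories to keep in sync
    include /var/www;
    exclude *.log;             # skip noisy files
    auto younger;              # on conflict, prefer the newer copy
}

A cron job running csync2 -x on each host then pushes any local changes out to the rest of the group.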

======================================

Can’t wait to test some of the above! Will keep you posted.

Essential Firefox Add-ons for Internet Addicts

Here’s a quick list of my favourite “can’t live without” Firefox extensions.

ColorZilla  – In-page Colorpicker / Advanced Eyedropper
Fireftp – A fully featured FTP Add-on
Tab Mix Plus – Better Tab control & Session Support
Web Developer – A brilliant array of tools for Web developers, can’t live without it.
Mouse Gestures – Can’t surf without mouse gestures
InfoRSS – Scrolling RSS feeds in your Firefox status bar
Performancing for Firefox – Blog fast, blog easy, just do it.

Now, that’s why I am hooked on Firefox – no other browser can do that for me – and it’s cross-platform, of course.

Putty -> CentOS / RHEL Termcap linewrap settings

I have a weird line-wrapping issue when using PuTTY with CentOS: when path names exceed 80 characters, the prompt wraps back onto the same line and overwrites the text showing my current path.

I am no Termcap guru, but the old-school Unix Termcap gurus here at work gave me this fix that they used when migrating from SCO to Suse.


Troubleshooting Slow Logins on Windows 2000 Terminal Services

Troubleshooting Slow Logins on Windows 2000 Terminal Services with roaming profiles / Outlook freezing when opening an attachment from Exchange Server (On Terminal Services and Fat Client).

How are these 2 problems related? First let me describe the environment.

Windows 2000 Active Directory

1x server (RAID 1 / RAID 5, 2x Pentium III, 1GB RAM) acting as the Windows 2000 Exchange AND file server, running SP3 (100Mb/s network)

4x Terminal Servers (RAID 1, 2x Pentium III, 1GB RAM each), load balanced with 3rd party software, RDP only / no Citrix. Each server averages approx 35-40 simultaneous users, mostly using terminal emulation to Unix and Outlook to the Exchange Server. All users have roaming profiles and reasonably extensive Group Policies are used to lock down desktops. Three have SP4 and one has SP3, but the problem was apparent across all four servers. Additionally:

Approx 30 x Fat Clients connecting to Exchange server
Approx 200 Fat Clients using POP3 for Exchange

The terminal servers sit on the same subnet as the Exchange/file server, connected by a 100Mb/s HP ProCurve switch. I will skip the initial testing stages, which were all standard stuff, eg: memory usage, CPU usage, network congestion etc. Needless to say, none of it pointed to any problems.

Then I got hold of Brian Madden’s “Terminal Server Performance Tuning” document, and the first leaf I took out of that book was to start logging the user login process by adding the following registry entry:

Key: HKLM\Software\Microsoft\Windows NT\CurrentVersion\Winlogon
Value: UserEnvDebugLevel
Type: REG_DWORD
Data: 10002 (Hex)

From there, you check for a “userenv.log” file in the %SystemRoot%\Debug\UserMode folder
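If you prefer, the same entry can be dropped in from a .reg file; a sketch of what that would look like:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon]
; 0x10002 = verbose per-user logging to %SystemRoot%\Debug\UserMode\userenv.log
"UserEnvDebugLevel"=dword:00010002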

**Remember to turn the logging off when you’re finished**

The log for each user login is approx 10 pages, so obviously I can’t provide the complete logs.

What I found interesting (or disturbing) in the logfile were the instances below:
1:
USERENV(cab8.d1b0) 14:16:21:015 RecurseDirectory: Adding \\fileserver1\profiles\user1\WINDOWS
USERENV(cab8.d1b0) 14:16:57:000 ReconcileFile: \\fileserver1\profiles\user1\ntuser.dat
2:
USERENV(d284.d278) 14:20:17:531 RecurseDirectory: Adding \\fileserver1\profiles\user1\WINDOWS
USERENV(d284.d278) 14:20:17:671 ReconcileFile: \\fileserver1\profiles\user1\ntuser.dat
3:
USERENV(9100.9154) 15:48:16:125 RecurseDirectory: Adding \\fileserver1\profiles\user1\WINDOWS
USERENV(9100.9154) 15:48:54:984 ReconcileFile: \\fileserver1\profiles\user1\ntuser.dat

Here we see three login instances. In sessions 1 and 3, there is a 36 and a 38 second delay respectively between the RecurseDirectory and the ReconcileFile, whereas in session 2 the delay is only about 0.14 of a second... a bit of a difference. But why?

So, I started testing file transfers (2 folders containing 60 files = 60mb) across our network.

Here, our problem reared its ugly head on occasion, but it was not always replicable and there were no apparent changes occurring between good and bad runs. Here’s a brief description of the tests and an outline of what was happening.

fs1->ts1 – ts4

When attempting to copy files from the fs1 share to the TS drives, Explorer would initially go blank as if I had not initiated the transfer. Then, after 5-10 seconds, it would show me “37 minutes” left to transfer. 37 minutes across a 100Mb/s network for 60MB?! Ok, we have a problem.

But check this out. Whilst that 37-minute transfer is in progress, I can copy the same set of files and folders from fs1 -> dc1 or to a client machine (same switch, same subnet) in less than 20 seconds, AND I can copy a similar sized folder/file set between any combination of Terminal Servers (ts1 -> ts2 etc.) in 20 seconds or less. All of this whilst the fs1 -> ts(x) transfer is still snail-pacing along.

Additionally, when running Perfmon to watch network activity, everything was idle except for a quick spike every 38-40 seconds (very close to the 36-38 second delay seen in the login log).

What made it more difficult is that the problem only occurred once you had a medium-high number of clients connected (across 15 branches, up to 1000kms away), and I couldn’t kick them off any time I needed to make a change or reboot (oh, how I love Windows reboots).

In the meantime, I was searching for articles on Google, Forums, Newsgroups and Microsoft KB looking for any information on this – you know the drill.

Now, Windows Server has a setting under Network Connection (Properties) -> File and Printer Sharing for Microsoft Networks (Properties) called “Server Optimization”, with 4 choices:

  1. Minimize Memory Used
  2. Balance
  3. Maximize Throughput for Sharing
  4. Maximize Throughput for Network Applications

Microsoft recommends number 3 if it is a file server, but number 4 if it is an Exchange or SQL server. When I approached the problem, fs1 (the Exchange/file server) was set to 4.

If you want to read more about it, check out these articles.

None of the above had any effect on my problem, so I decided, in our situation, to leave it on 2: Balance.
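As an aside, my understanding (worth double-checking before you script it across servers) is that this GUI choice maps to the Size value under LanmanServer\Parameters – 1 = Minimize, 2 = Balance, 3 = either Maximize option, with LargeSystemCache under Memory Management deciding between “file sharing” and “network applications”. Setting Balance by hand would then look roughly like:

Windows Registry Editor Version 5.00

; "Balance" - takes effect after the Server service restarts or a reboot
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters]
"Size"=dword:00000002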

After changing it to Balance, the following day myself and a number of users got the following error when logging off:

“The Network BIOS Command Limit Has Been Reached” ..Yay, at least we are getting an actual error now. :)

A quick search for that gave me Microsoft KB 810886

I describe the important bits below:

After you install Windows 2000 Service Pack 3 (SP3), you may receive the following error message:

The network BIOS command limit has been reached.

This issue may occur if both of the following conditions are true:

  1. Windows 2000-based clients submit multiple and simultaneous long-term requests against a file server that runs the Server Message Block (SMB) service. One example of a long-term request is a client using the FindFirstChangeNotification call to monitor a server share for changes.
  2. The MaxCmds registry value setting on the client is lower than 50, or the MaxMpxCt registry value setting on the server is lower than 50.

The maximum number of concurrent outstanding network requests between a SMB client and the server is determined when a client/server session is negotiated. The maximum value that a client supports is determined by the MaxCmds registry value. The maximum value that a server supports is determined by the MaxMpxCt registry value. For a particular client and server pair, the number of concurrent outstanding network requests is the lesser of these two values.

To track the number of concurrent outstanding SMB network requests on a client, add the Current Commands counter in the Redirector performance object to Performance Monitor.

OK, so let’s add the “Current Commands” counter from the Redirector object (*on ts1-4, NOT on fs1) and guess what? It would hit a ceiling of 49, with the occasional jump to 52/53 when trying to perform some operations. When I would try a 60MB file transfer from fs1 -> ts(x), Current Commands would jump but no bytes would get transferred. Every 40 seconds the counter would drop back to 49, I would get a spike and some data would transfer, then it would jump back above 50 and my data would stop transferring again.

So, following the directions outlined in Microsoft KB 810886, I ended up with the following:

My File/Exchange Server now has the following values:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters

“MaxWorkItems”=dword 8192 (decimal)
“MaxMpxCt”=dword 512 (decimal)

My Terminal Servers have the following values:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanworkstation\parameters]

“MaxCmds”=dword 512 (decimal)
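For convenience, here are the same values as an importable .reg sketch – note the dwords in a .reg file are hex (8192 decimal = 0x2000, 512 decimal = 0x200), and each block belongs on its respective machine, so split them into separate files before importing:

Windows Registry Editor Version 5.00

; On the file/Exchange server (fs1)
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters]
"MaxWorkItems"=dword:00002000
"MaxMpxCt"=dword:00000200

; On each Terminal Server (ts1-ts4)
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanworkstation\parameters]
"MaxCmds"=dword:00000200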

None of these DWORD values existed, so I had to add them manually, and they do not take effect until a reboot. My observation of the Terminal Servers since changing these values and rebooting is that I can no longer replicate the slow / pausing file transfer. The “Current Commands” counter in Perfmon now averages about 210 for 35 users.

The 6:1 ratio of Current Commands to users seems quite consistent in our environment (10 users ≈ 60 Current Commands), and it may help you gauge what to expect on your own Terminal Servers.

Additionally, the “Current Commands” Perfmon counter ALWAYS shows 0-1 on the file/Exchange server, so unless somebody knows otherwise, I can’t help you work out a reasonable figure for that side.

Basically, I just hope this information can help someone, as it has taken me an unquantifiable number of hours to research, debug and sort out this problem. If I can pull any more information out of these servers, I will keep this updated.

 


Firefox has finally won me over..

Last week, I finally bowed down to the power that is Firefox. So where have I been, you may ask? I have been hanging my Internet based washing on Opera (Linux) and Maxthon (Windows) for the last couple of years.

Let me explain why:

First, at work I (still) use Windows 2000 on my workstation. There’s a host of reasons for this, which I won’t go into, but the main one that keeps me using Maxthon is that our intranet uses integrated authentication, which means one less login for me each day. Maxthon is a front-end wrapper around the IE (or Gecko) engine, giving me a tabbed interface, mouse gestures etc. The other big advantage is its (mostly) low memory footprint compared to both Firefox and Opera.

On my Linux machines, or Windows machines with memory to spare, I chose Opera. It used to be fast, it used to have a low memory footprint, and once you got used to its mail client, it was good.

What’s changed? Maxthon slows down exponentially after a day’s worth of use. Opera is chewing more memory than any other application I run. And most of all, Firefox has XUL and add-ons which are finally useful, with compatibility better than Opera and security better than IE.

I still have IE running under Wine on my Linux machines and Opera installed wherever I go, but it looks like Firefox has finally taken the cake for me at the moment.

I am currently writing this post from the Firefox “Performancing” plugin, to a WordPress blog that I uploaded via the FireFTP plugin, while my wife plays her music (not necessarily a good thing) from Firefox on Windows, which is controlling the Amarok music player on my PCLinuxOS machine.