round one
Any user of Mac OS X is familiar with (if not a fan of) the spinning beach ball of occasional doom. A number of sites, including this one, discuss things that can summon / conjure The Ball.
As a daily user of a gen 1 G5 Dual Processor PowerMac (2GHz, 1.5GB RAM) who keeps many applications open at once (including MS apps and Safari), reboots as infrequently as possible (current uptime is 8 days, but the average is about two weeks), and consequently (is it consequence?) does increasing battle with the Ball/Wheel/Lollipop/Death-Bringer in the later days/hours of a boot cycle, I’ve decided to incrementally look into the root causes of my situation, and take pro-action as appropriate.
This thread will be the log.
Stay tuned, come along for the ride, post your experiences, etc.
rds
A performance-tuning specialist, I’m not. Just thought I’d get that out of the way.
What I do know about performance tuning, mostly from attempting to do it (and consequently doing a half-assed job, due to lack of experience) in the last ten years in I.T., is this: short of gathering a lot of data, from many sources, over a long period of time (preferably over multiple periods of both high and low activity), and then working hard to correlate the data to actual end-user activity, it’s really, really hard to figure out, other than in the most general sense, what is making your system go slow.
The various web pages in the main post of this thread indicate a variety of reasons why OS X can slow down / conjure up the spinning beach ball of death.
Of course, a basic premise (or hypothesis) can be that, simply, too much is running on my system at any given time. That’s a fair assertion. Without a doubt, I tend to be an “open it, and leave it running until it crashes” sort of guy.
Also, I know enough about UNIX (or other modern operating systems) and virtual memory to know that if there is not enough physical RAM for the running application set, swapping occurs, which leads to poor end-user performance.
I also know enough about disks to believe the assertions of those who indicate that running boot+swap disks too close to full capacity is generally a good recipe for poor performance, primarily due to fragmentation leading to lots of drive head seeks.
In reality, in the most basic sense, it’s probably the case that all of the above are happening on my machine:
1) Too little CPU for the apps currently loaded / operating
2) Too little RAM leading to too much swap/virtual memory activity
3) Hard disk (which is slow ATA) is almost full, heavily fragmented, slowing down. Horror.
So. That’s not much of an explanation. It’s not even a hypothesis.
We need a hypothesis to start, re: my recent system slowdowns and the Dreaded Lollipop. Cary Millsap (www.hotsos.com), an Oracle performance-tuning guru, is a big believer in going after the performance problems that are high-impact, meaning frequently or constantly occurring.
Why waste time looking for a marginal performance problem that occurs once every 2 days or so? Let’s go for the gusto.
Of course, since this is a practical thought experiment, I really don’t know where to start. So we’ll start with a basic hypothesis, and see if we can prove it.
Hold on – here we go:
Hypothesis 1 (Premise 2): I have too little physical RAM in this machine, so swapping / paging is occurring as I switch around between applications.
Ok. Not bad. It’s a start.
So, we should probably describe some basic characteristics regarding my system:
C1) 2 x 2.0GHz G5 Processors
C2) 1.5GB physical RAM
C3) 120GB SATA Boot Drive formatted with HFS+
C4) OS X 10.4.4 “Tiger”
Ok. That’s a good start, but we can certainly throw down some more data.
C5) Physical disk on the system:
# df -k | grep disk
/dev/disk1s3 292905072 281502276 11146796 96% /
(Internal SATA)
/dev/disk0s3 292926240 58060328 234865912 20% /Volumes/Backup0000-0000
(Internal SATA)
/dev/disk2s6 490193424 490157208 36216 100% /Volumes/Mobile DCF
(FireWire => ATA)
/dev/disk5 320171904 308122416 12049488 96% /Volumes/dcf0000-0005
(FireWire => ATA, 2 Disk x RAID 0 using OS X volume manager)
And oh, btw, I also have my iDisk from .Mac mounted.
Hmm. Wow. Three filesystems over 90% full. I certainly wouldn’t let my clients get away with that. Someone needs to buy more disk, or do some grooming, it seems. Starting with that boot drive, maybe.
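For anyone who wants to catch this sort of thing before it hurts, here is a minimal sketch that flags filesystems at or above a given fullness. It assumes the six-column `df -k` layout shown in the listing above (capacity in column 5, mount point from column 6 on, spaces and all). The function name `nearly_full` and the threshold of 90 are made up for illustration; they are not anything the OS provides.

```sh
# Hypothetical helper: print "capacity mountpoint" for every df line
# whose capacity is at or above the limit given as $1.
nearly_full() {
    awk -v limit="$1" '
        # Only look at lines whose 5th column looks like "96%"
        $5 ~ /^[0-9]+%$/ {
            pct = $5
            sub(/%$/, "", pct)
            if (pct + 0 >= limit + 0) {
                # Mount points may contain spaces, so rejoin columns 6..NF
                mount = $6
                for (i = 7; i <= NF; i++) mount = mount " " $i
                printf "%s%% %s\n", pct, mount
            }
        }
    '
}

# Demo on the four lines from the listing above
nearly_full 90 <<'EOF'
/dev/disk1s3 292905072 281502276 11146796 96% /
/dev/disk0s3 292926240 58060328 234865912 20% /Volumes/Backup0000-0000
/dev/disk2s6 490193424 490157208 36216 100% /Volumes/Mobile DCF
/dev/disk5 320171904 308122416 12049488 96% /Volumes/dcf0000-0005
EOF
```

Against a live system, something like `df -k | nearly_full 90` should do the same job, provided df prints the same columns.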
Also:
C6) Connection to the internet:
Airport Extreme (802.11g), via a Linksys (Cisco) 802.11g WAP/Cable Modem combo
Oh yeah, it might be useful to indicate how long my system has been up:
C7) 22:44 up 12 days, 18 mins, 5 users, load averages: 1.72 1.67 1.71
Interesting. Based on my historic “wisdom” regarding the definition of “load” in the UNIX OS (the rightmost three numbers), load is defined as the average length of the process run queue. Which is to say, the average number of processes on the system that have every resource they need (I/O, RAM, network, etc) except a processor willing to give them some runtime.
Traditional “wisdom” regarding this number says that when it tends to be below the number of physical procs in a system (2, in my case), you don’t have a system that is CPU-bound.
Over the last while, my average load is “1.72”, which sort of jibes with the fact that Activity Monitor’s definition of “% CPU idle” tends to hover between 50% (1 whole proc) and 70%.
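The rule of thumb above (load below the number of processors means probably not CPU-bound) is easy enough to sketch in shell. This is only an illustration of that rule, not a real diagnosis: `load_vs_cpus` is a made-up name, it looks only at the 1-minute load figure, and it takes the processor count as an argument rather than discovering it.

```sh
# Hypothetical helper: read one line of uptime output on stdin and
# compare the 1-minute load average against the CPU count given as $1.
load_vs_cpus() {
    awk -v ncpu="$1" '
        {
            # The first number after "load averages:" (or "load average:")
            # is the 1-minute figure
            for (i = 1; i < NF; i++) {
                if ($i ~ /^averages?:$/) {
                    load = $(i + 1)
                    sub(/,$/, "", load)   # some systems separate with commas
                }
            }
        }
        END {
            verdict = (load + 0 < ncpu + 0) ? "below" : "at-or-above"
            printf "%s %d %s\n", load, ncpu, verdict
        }
    '
}

# Demo on the uptime line from C7, with my 2 processors
echo '22:44 up 12 days, 18 mins, 5 users, load averages: 1.72 1.67 1.71' | load_vs_cpus 2
# prints: 1.72 2 below
```

On a live Mac, `uptime | load_vs_cpus "$(sysctl -n hw.ncpu)"` should feed it real numbers.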
So, at the moment, ignoring my earlier assertion re: the nature of performance tuning (many numbers to be crunched, over time, with massive correlation going on), it could be argued that, as of right now, I’m not looking at a CPU resource problem. So much… or something.
That’s good. Since my hypothesis is that I’m short of physical RAM, I kind of need to pursue that one.
One meaningful way to do that, perhaps, would be to look at the VM space required by all my active processes, at the moment. The set of active processes, for now, is fairly static – they’ve all been running for several hours, if not more than a day.
Active Processes (right now), sorted by virtual memory requirement
Oh yeah, we probably need a header for that
PID    Name                           User          % CPU  # Threads  Real Memory  VMem Size
0      kernel_task                    root          3.90   55         179.83 MB    1.36 GB
154    WindowServer                   windowserver  18.60  4          81.73 MB     811.88 MB
1092   Cyberduck                      rshangle      5.60   27         27.12 MB     583.29 MB
520    Desktop Manager                rshangle      1.60   4          13.18 MB     577.61 MB
8333   Mail                           rshangle      1.30   17         33.41 MB     557.52 MB
26158  PowerPoint                     rshangle      0.20   10         15.24 MB     540.46 MB
5491   org.gudy.azureus2.ui.swt.Main  rshangle      6.90   88         49.05 MB     539.88 MB
499    iTunes                         rshangle      20.90  19         146.81 MB    491.37 MB
29235  LaunchBar                      rshangle      0.10   5          114.25 MB    451.61 MB
5564   Safari                         rshangle      4.60   13         37.52 MB     424.95 MB
15289  Word                           rshangle      1.40   13         10.08 MB     417.54 MB
8487   Excel                          rshangle      0.20   11         7.32 MB      408.95 MB
278    VersionCueCS2                  root          0.20   26         12.34 MB     365.04 MB
950    Keynote                        rshangle      0.80   8          13.20 MB     356.38 MB
513    N067U_ButtonManager            rshangle      0.20   4          5.09 MB      300.20 MB
582    TextEdit                       rshangle      9.10   8          16.39 MB     262.53 MB
8710   Activity Monitor               rshangle      3.00   6          19.40 MB     249.80 MB
287    StuffIt Deluxe                 rshangle      0.20   5          7.96 MB      243.48 MB
524    Database Daemon                rshangle      0.50   5          5.66 MB      225.52 MB
4545   iTunes                         kaveilhe      2.40   8          36.63 MB     220.77 MB
5187   UniversalAccess                rshangle      0.30   2          1.94 MB      216.12 MB
108    coreaudiod                     root          1.60   2          3.30 MB      68.75 MB
8711   pmTool                         root          1.60   1          1.86 MB      37.39 MB
Keep in mind that, thanks to Activity Monitor (which is based on, I dunno… top?) refreshing every 5 seconds, this listing is constantly changing (well, if by “constantly” you can agree that “every five seconds” is somewhat “constant”). Perhaps I should use the CLI top, with some longer scanning interval. Well, maybe not. Let’s not make this harder than it needs to be, yet.
So I’m going to pull this into Excel so I can actually add it all up.
Actually, you know what, let’s NOT pull it all into Excel… because my system is so fucking slow / spinny-balled at the moment, I can’t even get the GD document to open in Excel… Jebus bless the command line. And our friend, awk…
Oh wait, Excel just came back from lunch. And, surprise, just consumed a large chunk of VM. So… the total VM space for active processes is, give or take:
Or 9.7GB, give or take. Just for active processes.
Ok. I have a VM space, for just active processes, that is roughly 6+ times the size of my physical RAM.
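For the record, the awk route would have been short. Here is a minimal sketch, assuming the process listing is saved as plain text with the VMem size as the last two fields of each row (a number, then MB or GB), and assuming 1 GB = 1024 MB. The function name `sum_vmem_mb` is made up for illustration.

```sh
# Hypothetical helper: sum the last column (e.g. "811.88 MB" or "1.36 GB")
# of an Activity Monitor style listing and print the total in MB.
sum_vmem_mb() {
    awk '
        # Skip the header and anything else that does not end in a size unit
        $NF ~ /^(KB|MB|GB)$/ {
            size = $(NF - 1)
            unit = $NF
            if (unit == "GB") size *= 1024       # assumption: binary units
            else if (unit == "KB") size /= 1024
            total += size
        }
        END { printf "%.2f\n", total }
    '
}

# Demo on the first few rows of the listing (process names may contain
# spaces, which is why the sizes are read from the end of the line)
sum_vmem_mb <<'EOF'
PID Name User % CPU # Threads Real Memory VMem Size
0 kernel_task root 3.90 55 179.83 MB 1.36 GB
154 WindowServer windowserver 18.60 4 81.73 MB 811.88 MB
520 Desktop Manager rshangle 1.60 4 13.18 MB 577.61 MB
EOF
# prints: 2782.13
```

Divide the result by 1024 for GB, and by the physical RAM size for the "how many times over" ratio.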
Ow. My total VM size (for all processes) is 25GB.
I clearly need more RAM… but since a) I don’t have the cash for more RAM and b) this system will only take 8GB max, anyway… I might as well try some non-RAM purchase-based optimization, first.
That would seem to imply that, yeah, I could be running into RAM / swapping issues when switching around between heavy-hitting applications.
Next step, logically (or maybe this should have been the first step? Who can say…) is to set up some monitoring to watch / log page in / page out activity as I do some unhealthy stuff, like switch focus around between some heavy-hitting apps. Ok.
Got vm_stat logging on a ten-second interval, so I don’t do any more damage to procs than is necessary.
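A logging loop along these lines is one way to do it. This is a sketch rather than a record of my exact command: `log_vm_stat`, the `=== timestamp` separator, and the default log file name are all made up for illustration. The only real tool involved is `vm_stat`, run with no arguments so that each snapshot is the usual "Pages free: / Pageins: / Pageouts:" listing.

```sh
# Hypothetical helper: append a timestamped vm_stat snapshot to a log
# file every N seconds until interrupted.
#   $1 = interval in seconds (default 10)
#   $2 = log file (default vm_stat.log in the current directory)
log_vm_stat() {
    interval="${1:-10}"
    logfile="${2:-vm_stat.log}"
    while true; do
        # Separator line so snapshots can be told apart later
        printf '=== %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$logfile"
        vm_stat >> "$logfile"
        sleep "$interval"
    done
}
```

Something like `log_vm_stat 10 ~/vm_stat.log &` should leave it running in the background of a Terminal session; a ten-second interval keeps the logger itself from adding much load.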
Let’s switch it up… and never let down, in the process.
Switched over to Safari, viewed some pr0n. Wrote a Mail in Mail.app, excruciatingly slow.
Let’s really mix it up… MS App city… GO GO GO.
Switching over to Excel… now Word… now PowerPoint. Just opened three marginally-sized Excel/Word/PPT documents. Apps were dogs switching over.
Let’s see what was happening in our vm_stat file while this was going on. CPU, btw, never really appeared pegged… just a slowdown + the spinny wheel.
Ok, not a mind-bogglingly surprising graph – a fair amount of pagein/pageout activity during the MS app switches.
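The numbers behind a graph like that are just differences between successive readings, since the Pageins and Pageouts counters in `vm_stat` are cumulative. Here is a minimal sketch, assuming the log is a series of plain `vm_stat` snapshots appended one after another, each with a "Pageins:" line followed by a "Pageouts:" line. The function name `paging_deltas` and the counter values in the demo are invented for illustration.

```sh
# Hypothetical helper: read concatenated vm_stat snapshots on stdin and
# print "pageins pageouts" for each interval between snapshots.
paging_deltas() {
    awk '
        /^Pageins:/  { gsub(/\./, "", $2); ins = $2 + 0 }
        /^Pageouts:/ {
            gsub(/\./, "", $2); outs = $2 + 0
            # From the second snapshot on, print the change since the last one
            if (seen) printf "%d %d\n", ins - prev_ins, outs - prev_outs
            prev_ins = ins; prev_outs = outs; seen = 1
        }
    '
}

# Demo with made-up counters from three snapshots
paging_deltas <<'EOF'
Pageins:                      1000.
Pageouts:                      100.
Pageins:                      1500.
Pageouts:                      350.
Pageins:                      1520.
Pageouts:                      350.
EOF
# prints:
# 500 250
# 20 0
```

A burst in both columns while switching apps is the pattern to look for; a quiet system shows small numbers in the first column and zeros in the second.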
So, if we stick with our hypothesis for now, I feel like, given the current load on the system, there continues to be a good probability that my current slowdowns (which have recently gotten a lot worse) are due to a) my boot disk (where swap lives) running out of space (or getting near to it), compounded by b) likely fragmentation on my boot drive which, did I mention, is almost out of space?
So, I think I’m going to do three things. I know in good troubleshooting methodology, I should only tackle one problem at a time, but I’m not a good troubleshooter. I am going to:
a) put swap on an 80% free 300GB internal SATA drive that is NOT my boot disk (I’ll let you know how I did that when I figure that out…)
b) do a clean fsck on my boot disk
c) clean up as much space on my boot disk as possible
d) Get back to you…
Hold steady, New York City!
GBallard data re: fscking, booting in single user mode with OS X, file permission repair, good and true insight re: the need to protect data, etc.
rds
ps. Micro-update:
a) Swap now relocated to second internal SATA drive with 80% free space. How did I do this:
NOTE: Make backups of /etc/rc before doing this, or die horribly.
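A minimal sketch of that backup step, for the cautious. `backup_file` is a made-up helper name and the `.bak.<timestamp>` suffix is just one convention; the real work is a plain `cp -p`, which keeps the original's permissions and timestamps.

```sh
# Hypothetical helper: copy a file to <name>.bak.<timestamp> next to the
# original and print the name of the copy.
backup_file() {
    src="$1"
    dest="$src.bak.$(date '+%Y%m%d%H%M%S')"
    cp -p "$src" "$dest" && printf '%s\n' "$dest"
}
```

Since /etc/rc is owned by root, the function would need to run in a root shell; the one-liner equivalent is `sudo cp -p /etc/rc /etc/rc.orig`.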
Go into the depths of /etc/rc and make some updates:
sudo vi /etc/rc
# comment out this next line, the original location of the swap file
#swapdir=/private/var/vm
# add the following lines to:
# make sure your alternate swap disk is mounted earlier-than-normal in the boot process
mkdir "/Volumes/Backup0000-0000"
mount_hfs /dev/disk0s3 "/Volumes/Backup0000-0000"
# redefine the location of swap
swapdir=/Volumes/Backup0000-0000/private/var/vm
It worked:
trogdor-5:~ rshangle$ ls -al /Volumes/Backup0000-0000/private/var/vm
total 131072
drwxr-xr-x 3 root wheel 102 Jan 22 23:46 .
drwxr-xr-x 3 root rshangle 102 Jan 22 23:46 ..
-rw------T 1 root wheel 67108864 Jan 22 23:46 swapfile0
b) Haven’t done said fsck yet, because I’m getting conflicting data from the web re: the safe-ness of bringing 10.4 to single-user and running “fsck” (although, given that fsck (filesystem check) is a rather fundamental bit of core code, I’d be shocked if it were a bad idea to follow this approach).
Also, I’ll need to hunt down a USB keyboard for the purpose of being able to hold down Command-S upon boot to get into single-user mode. Not going to cut it with my Apple Wireless keyboard.
c) Done. Currently 26GB free on my boot disk, up from 7GB.
Will report back within the next day or so, after the system has reached a level of operational equilibrium.
r
i just bought another GB of RAM, bringing the total to 2.5GB. for fun.
stay tuned.
rds
DATELINE
Add’l 1GB (2 x 512MB DIMM) installed.
Nice to see, in Activity Monitor, that even with the “standard” app load on a fresh boot, I still have over 0.75GB free. This is a change from the previous state, which had basically 0GB free after iPhoto, iTunes, MS Word, MS Excel, Terminal, Keynote, Safari, Activity Monitor, Cyberduck, Mail were loaded.
We’ll see how that works out over the next few days.
RAM was acquired from RamJet (http://www.ramjet.com), an organization I’ve had success with in the past (despite the relatively high cheese factor of their web page; this is my fourth purchase from RamJet), for $144 (incl shipping).
rds
DATELINE:
so far, anecdotally, the additional 1GB of RAM (total of 2.5GB) has made a subjective impact on performance. Although the system will tend to “settle”, once apps are loaded, with a near-zero green “free” memory space (see above image from Activity Monitor), pageouts generally are floating at 1/10th to 1/20th of page-ins, indicating a significant drop in the paging activity caused by needing to free infrequently used pages.
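That pageout-to-pagein ratio can be read straight off a `vm_stat` snapshot. A minimal sketch follows, with the caveat that both counters are cumulative since boot, so the ratio is an average over the whole uptime rather than a reading of the last few minutes. `pageout_ratio` is a made-up name and the demo counters are invented.

```sh
# Hypothetical helper: read one vm_stat snapshot on stdin and print
# pageouts divided by pageins (0.050 means 1/20th, 0.100 means 1/10th).
pageout_ratio() {
    awk '
        /^Pageins:/  { gsub(/\./, "", $2); ins = $2 + 0 }
        /^Pageouts:/ { gsub(/\./, "", $2); outs = $2 + 0 }
        END {
            if (ins > 0) printf "%.3f\n", outs / ins
            else print "n/a"
        }
    '
}

# Demo with made-up counters
pageout_ratio <<'EOF'
Pageins:                    200000.
Pageouts:                    10000.
EOF
# prints: 0.050
```

On a live system, `vm_stat | pageout_ratio` should give the since-boot figure.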
will keep watching and reporting. looking forward to the next performance problem, as for now, the verdict seems to be:
a) moving swap to a non-boot hard drive with hundreds of free GB and
b) adding additional 1GB of ram…
These two actions seem to have significantly improved system performance / reduced occurrence of spinny wheels.
stay tuned.
rds
Then this.
rds
Some neat behind-the-scenes OS X performance optimizing technologies.
rds