Linux question of the day

onesaint · August 2013

ChooseLife wrote: »

Same here, I always thought this cache could not be directly purged, only indirectly by starting processes, which in turn take memory away from the cache.
But onesaint's question sent me on the search quest - and it turns out there is a tunable kernel parameter /proc/sys/vm/drop_caches that does exactly that: Drop Caches - linux-mm.org Wiki

P.S. onesaint, would that constitute a right answer?

That's it! I usually clear inodes, page caches, and dentries to get an idea of how much memory is actually being used on the system. Generally I use this for baselining a VM's RAM usage.

# free -m
             total       used       free     shared    buffers     cached
Mem:          1999       1900         99          0         98       1315
-/+ buffers/cache:        486       1512
Swap:         4095          0       4095

# sync; echo 3 > /proc/sys/vm/drop_caches

# free -m
             total       used       free     shared    buffers     cached
Mem:          1999        418       1580          0          2         77
-/+ buffers/cache:        338       1660
Swap:         4095          0       4095
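
For a rough baseline without purging anything, the same "-/+ buffers/cache" figure can be estimated straight from /proc/meminfo. This is only a sketch: app_used_mib is a made-up helper name, and it assumes the MemTotal, MemFree, Buffers and Cached fields are present and reported in kB.

```shell
# Sketch: estimate "used minus buffers/cache" in MiB from a meminfo-style file,
# roughly what the "-/+ buffers/cache" row of free -m reports.
# app_used_mib is a hypothetical helper; the path defaults to /proc/meminfo.
app_used_mib() {
    awk '
        $1 == "MemTotal:" { total   = $2 }
        $1 == "MemFree:"  { free    = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached  = $2 }
        END { printf "%d\n", (total - free - buffers - cached) / 1024 }
    ' "${1:-/proc/meminfo}"
}
```

Calling app_used_mib with no argument reads the live system. Unlike drop_caches it needs no root and leaves the cache warm, but it will not match the post-purge number exactly, since not everything counted as cache can actually be dropped.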

onesaint · August 2013

fiftyo wrote: »

I believe this question comes down to message queues/shared memory segments, which you can view via ipcs -a. There would probably be some segment eating a lot of memory, which you could destroy via ipcrm -M <key value>.

Additionally, you can run:

# ps -e -o pid,vsz,comm= | sort -n -k 2 | less

and that will show you each process's virtual memory size (vsz, in KiB), sorted from smallest to largest.

UnixGuy · August 2013

Excellent explanations thanks guys. I didn't know that we can actually free up the memory.

But I want to ask: since performance isn't really affected by high memory usage, is there any advantage in freeing up the memory this way? From what I understand, the kernel tries to fill up the memory to enhance performance.

What's the recommended course of action in this case? Leave the memory full, or try to free it up in the ways you suggested above?

ChooseLife · August 2013

UnixGuy wrote: »

is there a certain advantage in freeing up the memory in this way? Because from what I understand, the kernel tries to fill up the memory to enhance the performance.

You're bringing up a good point. In general this is a performance enhancing feature, so there is no need to purge caches just to see "free memory" number go up.

However, there may be special cases... E.g. I remember reading something about MySQL query cache vs filesystem cache contention and balancing them... Another scenario that comes to mind is a multi-tenant virtual host where you overcommit virtual RAM, but then VMs' kernels start using up all memory that they think they have for caching, starving the host's physical RAM...

onesaint · August 2013

ChooseLife wrote: »

You're bringing up a good point. In general this is a performance enhancing feature, so there is no need to purge caches just to see "free memory" number go up.

However, there may be special cases... E.g. I remember reading something about MySQL query cache vs filesystem cache contention and balancing them... Another scenario that comes to mind is a multi-tenant virtual host where you overcommit virtual RAM, but then VMs' kernels start using up all memory that they think they have for caching, starving the host's physical RAM...

The latter was my reasoning for using drop_caches. Knowing actual ram usage needs vs. kernel enhanced memory gives me the baseline to decide how much ram a particular VM needs on a hypervisor.

ChooseLife · August 2013

Actually that's a good trick for right-sizing VMs :thumbs up:

UnixGuy · August 2013

@ChooseLife & @onesaint excellent tips!!

paul78 · August 2013

That was an interesting read guys... Good question.

onesaint · August 2013

Next question;

I have two legacy servers in a distant, remote location. The specs are reported to be the same on both servers, but the disk size and my alerting system tell me that the servers are different. One server reports having a software raid1, built from 2 disks reported by fdisk. The second server has 1 disk, reported by fdisk. The alert system is telling me the second system (with 1 disk) has a raid also. How do I determine if there is a raid controller on the second system and, if possible, what type of raid?

I'm editing this to make it a bit more general.

W Stewart · August 2013

Maybe cat /proc/mdstat or the mdadm command with some flag. Not sure which flag though.

onesaint · August 2013

W Stewart wrote: »

Maybe cat /proc/mdstat or the mdadm command with some flag. Not sure which flag though.

So, /proc/mdstat and mdadm will give you information on software raids (md devices), but if you have a hardware raid, it won't be seen by those utilities.
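
For the software side, a small sketch that pulls the array name and RAID level out of /proc/mdstat. md_arrays is a made-up helper name, and it assumes the usual "mdN : active <level> ..." line layout; on a box with no md driver loaded the file may not exist at all.

```shell
# Sketch: list active md (software RAID) arrays as "device level".
# md_arrays is a hypothetical helper; the path defaults to /proc/mdstat.
md_arrays() {
    awk '
        $2 == ":" && $3 == "active" {
            # Skip a state flag such as "(auto-read-only)" if one is present.
            level = ($4 ~ /^\(/) ? $5 : $4
            print $1, level
        }
    ' "${1:-/proc/mdstat}"
}
```

An empty result only says there is no active software RAID. It says nothing about a hardware controller, which is the point made above.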

W Stewart · August 2013

That's a tough one. I usually just use omreport at my job to get info on hardware RAID and the drive usually just shows up as /dev/sda.

onesaint · August 2013

W Stewart wrote: »

That's a tough one. I usually just use omreport at my job to get info on hardware RAID and the drive usually just shows up as /dev/sda.

I'm in the same boat. On our normal equipment (HP) I usually use their RAID utility. But, I came across this and needed to figure out the differences in configuration on RHEL4 machines with different hardware.

ChooseLife · August 2013

Same here. For hardware RAIDs I always use vendor-provided tools (hpasmcli for HP servers). And I know software RAID is managed by mdadm in Linux but have never used it.

But your question is actually asking

onesaint wrote: »

How do I determine if there is a raid controller on the second system and if possible what type of raid?

I would snoop around the system's hardware reports as shown by dmesg, lsscsi, lspci, lsmod...
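
A first pass over the PCI listing could look like the sketch below. raid_controllers is a made-up helper name; it simply filters lspci output for anything that mentions RAID, on the assumption that the controller reports itself under the "RAID bus controller" class or has RAID in its product name.

```shell
# Sketch: keep only the lspci lines that mention RAID.
# raid_controllers is a hypothetical helper; it reads lspci output on stdin.
raid_controllers() {
    grep -i 'raid'
}
```

Usage would be lspci | raid_controllers. A hit shows a controller is present, not that an array is configured on it, and a controller that identifies itself only as a SCSI or SAS device would be missed by this filter.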

UnixGuy · August 2013

For a proper hardware RAID, the OS is not supposed to know about the RAID; from the OS, utilities like fdisk should show one disk. Some hardware RAID controllers provide OS-based utilities (for example, some Sun (Solaris) servers have hardware RAID that can be accessed from the OS directly).

Otherwise we have to reboot and go into the RAID controller utility to check if there's a RAID array configured.

onesaint · August 2013

ChooseLife wrote: »

But your question is actually asking

I would snoop around the system's hardware reports as shown by dmesg, lsscsi, lspci, lsmod...

That's it, with some extra tidbits.

So, I found dmesg and system tools like dmidecode will show the controller, but it will usually show the hardware raid configured drive (e.g., a mirror will only show 1 disk), just as UnixGuy noted. So, lshw (included in some base and other add on repos, depending on distro) will show the controller and the physical disks attached as well. With some easy deduction the raid can be determined;

# fdisk -l

Disk /dev/sda: 250.9 GB, 250999144448 bytes

# lshw
output omitted...
      description: RAID bus controller
      product: 3ware Inc 3ware 7xxx/8xxx-series PATA/SATA-RAID
      vendor: 3ware Inc
output omitted...
  *-member:0
       description: ATA Disk
       product: WDC WD2500
       vendor: Western Digital
       physical id: 0
       bus info: raid@c0/p0
       logical name: c0/p0
      ...    
       capacity: 127GiB (137GB)
  *-member:1
       description: ATA Disk
       product: WDC WD2500
       vendor: Western Digital
       physical id: 1
       bus info: raid@c0/p1
       logical name: c0/p1
       version: 20.06C06
       capacity: 127GiB (137GB)

well you'd all get good rep, but I have to spread the love it says!

UnixGuy · August 2013

New Question (I'm cheating because I'm copying this question from a facebook page):

Multiple choice Question:
Q) One of your users had a large log file (say 5G in size) in use that she deleted to get more disk space. But after deleting the log file she opened a TT claiming that the space was not recovered. Which action do you take to resolve this issue on the server?
1. Login as root and delete the file.
2. Send a HUP signal to the process that was accessing the log file. If HUP fails, try to restart the process.
3. Use the logrotate command.
4. I have no idea.

onesaint · August 2013

I'd lsof for the file, determine the process, and then go with number 2.
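
If lsof isn't handy (lsof +L1 lists open files whose link count has dropped to zero), the same information is visible under /proc. A sketch, assuming a Linux-style /proc where the fd symlink of an unlinked file ends in " (deleted)"; deleted_open_files is a made-up helper name.

```shell
# Sketch: print "pid fd target" for every open descriptor whose file was unlinked.
# deleted_open_files is a hypothetical helper; the proc root defaults to /proc.
deleted_open_files() {
    proc_root="${1:-/proc}"
    for fd in "$proc_root"/[0-9]*/fd/*; do
        # Descriptors we may not read, or that vanished meanwhile, are skipped.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)")
                pid=${fd#"$proc_root"/}
                pid=${pid%%/*}
                printf '%s %s %s\n' "$pid" "${fd##*/}" "$target"
                ;;
        esac
    done
}
```

Run as root to see every process. The pid column is the process to HUP or restart; the space comes back once that descriptor is closed.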

Expect · August 2013

I too would go for the PID termination approach.

pram · August 2013

kill the proc and tell your user to just redirect into the file instead of deleting it

>huge.log
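
A sketch of why the redirect works: truncating keeps the same inode, so the writer's open descriptor stays valid and the blocks are released immediately. truncate_in_place is a made-up helper name wrapping the same redirect.

```shell
# Sketch: empty a file without unlinking it, so the inode (and any open
# descriptor pointing at it) stays the same.
# truncate_in_place is a hypothetical helper name.
truncate_in_place() {
    : > "$1"
}
```

One caveat: if the writing process did not open the log in append mode, it keeps writing at its old offset, and the file turns into a sparse file with a large apparent size.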

paul78 · August 2013

Yeah - I'll go with #2 as well. For some reason, it feels like a trick question so I wrote a small program that opens a file so I can play with different scenarios and #2 seems to do the job.

ChooseLife · August 2013

I have to point out that if someone does not know the answer, technically, #4 is a correct answer.

Expect · August 2013

Here's another question:

How can one process 'talk' to another process? Assume the two processes have the same file descriptor.

pram · August 2013

using ipc sockets, or posix mqueues

socket(2): create endpoint for communication - Linux man page
mq_overview(7) - Linux man page

onesaint · August 2013

Shared memory or sockets. I just had a great knowledge share about this the other day.

paul78 · August 2013

Also if you are creating a process and you want to use the same file descriptors, when you fork() the child processes, the child process inherits all open file descriptors.

pram · August 2013

forking children is still under the parent process, the question is about two separate processes (i guess)

there are a lot of ways to do this actually, i don't think there's one correct answer. the simplest way is to use a pipe '|' between the processes. like cat blah | grep | awk | sed | sort is 5 processes chained together, each one's stdout feeding the next one's stdin. you can make a fifo pipe for the same purpose. and ipc sockets etc etc
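
A minimal sketch of the fifo variant, where a background writer and a foreground reader that share nothing but a path exchange one line. fifo_demo and the file name chan are made up for the example.

```shell
# Sketch: two processes talking through a named pipe (FIFO).
# fifo_demo and the path name "chan" are hypothetical.
fifo_demo() {
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/chan" || return 1
    # Writer: a background process; its open() blocks until a reader shows up.
    printf 'hello from %s\n' "writer" > "$dir/chan" &
    writer_pid=$!
    # Reader: opens the same FIFO and reads the single line.
    read -r line < "$dir/chan"
    wait "$writer_pid"
    rm -rf "$dir"
    printf '%s\n' "$line"
}
```

Unlike an anonymous '|' pipe, the two ends here don't need a common parent; any two processes that can reach the path can use it.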

W Stewart · October 2013

Here's something I ran into at my job. A customer's server keeps running out of memory and killing off processes to the point where we can no longer remotely access it and have to run across the street 3 times a day to hard reboot it. The customer won't do anything to resolve this issue. What can I do to make all of our lives easier?

UnixGuy · October 2013

@Stewart: What's the size of the RAM & swap? Maybe the server needs more swap space?

onesaint · October 2013

W Stewart wrote: »

Here's something I ran into at my job. A customer's server keeps running out of memory and killing off processes to the point where we can no longer remotely access it and have to run across the street 3 times a day to hard reboot it. The customer won't do anything to resolve this issue. What can I do to make all of our lives easier?

First, I'd go with what UnixGuy mentioned and check out available RAM / swap and possibly increase the swap space.

Is sysstat used on the system at all and if so, have statistics been reviewed? That might be helpful.

If swap isn't the issue, maybe write a quick script to free cached memory and run that with cron, e.g.,

sync; echo 3 > /proc/sys/vm/drop_caches

Or, to try and determine what services are taking up all the memory and why it's happening, I'd run something like:

ps -e -o pid,vsz,comm= | sort -n -k 2 | less

I like ps as it takes less memory than top. Then you can run a cron job to deal with the trouble process.
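
Since vsz counts address space that may never be touched, summing resident memory (rss) per command name is often a more telling view when hunting the culprit. A sketch: rss_by_command is a made-up helper name, and it expects "rss command" pairs on stdin with rss in KiB.

```shell
# Sketch: total resident memory per command name in MiB, largest first.
# rss_by_command is a hypothetical helper; stdin is "rss command" per line,
# and the optional argument is how many rows to show (default 10).
rss_by_command() {
    awk '
        {
            kib = $1
            $1 = ""
            sub(/^ +/, "")          # what is left of the line is the command name
            rss[$0] += kib
        }
        END { for (c in rss) printf "%d %s\n", rss[c] / 1024, c }
    ' | sort -n -r -k 1 | head -n "${1:-10}"
}
```

It can be fed with ps -e -o rss= -o comm= | rss_by_command 5. Shared pages are counted once per process, so the totals overstate things for services that fork many workers.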
