Techbook - Come and Experience the Technical Reality. - Here is a great blog on technology review from my peer and friend Mahesh Pulipati. I'll have an RSS aggregation from his organization setup soon, and am excited to include him under the IT blogging realm!
Oftentimes, complicated environments have a large number of storage objects. Mapping them can be difficult, but it is often necessary to understand how data is being moved. Like many of my colleagues, I find documentation a chore: it is time consuming and prone to misinterpretation. Being a Linux engineer, my first thought is "How do I automate this so I can do more interesting things?" After dragging my feet and going in kicking and screaming, I decided to dive into PowerShell. The following is an example of a dynamically created Visio 2010 diagram of the physical storage configuration of my NetApp Filer simulator.
Tools Used
- NetApp PowerShell Toolkit
- NetApp Visio Stencils (Extracted to "My Shapes" under Libraries->Documents)
- Visio 2010
- At minimum, RemoteSigned permissions (Set-ExecutionPolicy RemoteSigned)
- At least Read-Only Access to a NetApp Filer

How this diagram is created
- Script logs into the Filer
- Data is collected
- Script opens Visio and Stencils
- Objects are placed on the workspace in logical order
- Next version will have data from each point in a table (Latency, I/O, Growth Rate, Overwrite Rate, Days to Full), exported to the SharePoint team portal
- Data points will be active and retrievable by everyone who has SharePoint team portal access
- Business Workflows will monitor the data points and take action when conditions are met (create a request for new storage, warn current users for remediation, etc.)
It won't turn any heads, but that is just a formatting process that is simple to design. Nothing about this process is particularly amazing (which is why I didn't include the code), but using a collection of tools like these can come in handy when trying to troubleshoot a complex system. Seeing the visual data paths and their associated metrics helps put a scenario into perspective and helps move a team's troubleshooting exercises toward a solution-based orientation rather than a problem-based one (think of doing a maze from start-to-finish or from finish-to-start...which one is easier??). It also has the added benefit of creating nice diagrams if you are "artistically" challenged like me.
I am often asked the question "But Sonny, how do you know?". Well, this is my secret sauce: I don't guess. I have my own "Matrix" and I simulate the entire global environment of my "sphere of influence" and use that as a test bed for implementations. Similar to a scale model train set, I add every possible switch, router, firewall, storage, server, any device that exists with any software that runs in the real world: all running on my personal lab. The screenshot below is a lab I am working on for graduate school; for non-disclosure reasons I cannot publish my "Matrix" world, but my homework will suffice for this post. Below you can see that this is a simple 2 core switch environment that is running a proprietary "Imaging Server" for a hospital in L.A., and the interconnect between the switches is set to 100 Mbps (to simulate the poor application performance).
In this scenario I have set up a typical doctor accessing data from the imaging server, plus email and web, with work groups performing normal daily operations from each node group, and I can change any aspect of the application and collect data to troubleshoot the situation. Agents running on the Linux and Windows application and database virtual servers feed input into IT Guru to report response time. Hooks into NetApp Storage access the VM array and gather statistics based on current conditions. All networking devices are reporting their statistics and contributing to the picture as a whole. Hooks into MySQL are analyzing data queries and performing correlations back to the feeds from the simulated Filer array or EMC frame. All forms of code (Java, Python, PHP, ColdFusion, Ruby) are being analyzed, with their performance reported on an object-by-object basis. All working in perfect concert to help find, troubleshoot, and eliminate any suspected bottlenecks that might exist within an environment.
From this vantage point, I can see through the entire OSI stack and report on potential performance bottlenecks:
1) Layers 1-3 provide information about how data flows through the physical, data link, and network layers based on existing configurations
2) Layers 4-7 are accessed through a variety of network taps, agents on the source and destination, and a correlation engine to connect the data into information. Objects are analyzed for code performance, and SQL query times are correlated back to hardware and OS performance metrics
3) Information collected is analyzed to create knowledge based on existing known workloads to establish a baseline
4) Conditions are set for potential business scenario planning with decision makers and executives based on known initiative/work loads
5) Scenarios are built and tested against proposed design
In the scenario here, the link between the first floor and second floor core switches is too small, causing network utilization to spike to 100%. This means that the Radiology application will have network queuing as a result of insufficient resources. Once the simulation completes, a new scenario can be cloned and created to remove the bottleneck and show how the application performs and what happens to queuing as a result.

The result of changing the interconnect can be seen to the left: point-to-point queuing delays are statistically zero, and point-to-point utilization on this same workload is less than 1%.

From the graph on the right, you can see that the peak application response time over the course of an hour dropped from 60 seconds in the previous scenario to 12.5 seconds in the 10Gb change scenario.
You tell me....where is the guess work?
Do reads and writes plague your applications? Do you wish you could get more "bang for the buck"? I know many companies that complain about I/O contention while balancing the tedious task of having redundancy built into their infrastructures in case of a "rainy day" failure (that is almost sure to occur). Many tier I storage vendors are moving to hybrid disk pools to add faster access, as well as mitigate the latency of RAID 1. Many small and medium-sized businesses cannot afford the big 3, but anyone can take advantage of hybrid pools for a minimal price.
The first question any I.T. manager/architect/guy-who-is-responsible-for-storage should ask is "Am I using the tools that are available to me that support enterprise features?" Without understanding the intricate details of how many open source tools work, we will dive a little deep into the one feature EVERYONE should have in their arsenal: Logical Volume Manager
With budgets being squeezed from the down economy, everyone is scrambling to demonstrate their return on investment for any purchases needed to achieve the kind of performance that screams "Bonus in my review". However, LVM is readily available with the equipment you have lying around; you just need to learn to use it. What is great about LVM is its ability to abstract the disk layer into something far more human compatible than managing the disks locally. Now, you still have to understand the physical disk layer, as LVM is built on top of disks, but the only step required is partitioning. So, if you understand how fdisk works at a basic level, you are good to go! Yes, you can build your volumes directly on the disk, but partitioning allows you to divide your pools up for different workloads.
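For anyone who has not built a volume group before, the basic flow from partitions to a usable volume can be sketched roughly as follows. This is a minimal sketch, not a recommended layout: the partition names, the volume group name vg_data, the 20G size, and the ext4 filesystem are all placeholders I picked for illustration, and the build_volume and run helpers are hypothetical. DRY_RUN defaults to 1 so the commands are only printed until you deliberately switch it off.

```bash
#!/bin/bash
# Sketch: pool partitions into a volume group and carve out one logical volume.
# All names and sizes passed in are placeholders; nothing runs for real
# unless DRY_RUN=0 is set (and you are root).
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: build_volume <vg_name> <lv_name> <size> <partition> [partition...]
build_volume() {
    local vg="$1" lv="$2" size="$3"
    shift 3
    local pv
    for pv in "$@"; do
        run pvcreate "${pv}"                      # label each partition as a physical volume
    done
    run vgcreate "${vg}" "$@"                     # pool the physical volumes into one volume group
    run lvcreate -n "${lv}" -L "${size}" "${vg}"  # carve a logical volume out of the pool
    run mkfs.ext4 "/dev/${vg}/${lv}"              # filesystem choice is an assumption
}
```

Calling build_volume vg_data lv_app 20G /dev/sdb1 /dev/sdc1 in dry-run mode prints the five commands in order, which is a cheap way to review the plan before touching any disks.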
Now I'm no C programmer, but the above makes me want to learn C. I can envision the following workload written into LVM:
1) All disk timings are calculated on initialization
2) All disks are prioritized by the level at which they perform: lower the latency, higher the priority
3) In a hybrid RAID design like mine, I could feasibly write the above algorithm into LVM for such a server design.
It would be a fairly easy process/feature to add to LVM, and would create a directly attached storage system, or storage server, for the mid-range to the enterprise.
(I so want to believe this is already a part of LVM, but don't know for sure yet as I am researching it.)
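To make steps 1 and 2 concrete, here is a small sketch of the ranking idea on its own. It is not LVM code and makes no claim about what LVM does internally; the rank_disks_by_latency function is hypothetical, and the latency figures in the usage note are made up. It simply takes one "device latency_ms" pair per line and assigns priority 1 to the lowest latency.

```bash
#!/bin/bash
# Sketch: rank disks so that the lowest measured latency gets priority 1.
# Input  (stdin):  "<device> <latency_ms>" per line
# Output (stdout): "<priority> <device> <latency_ms>" per line
rank_disks_by_latency() {
    # LC_ALL=C keeps "." as the decimal point regardless of locale.
    LC_ALL=C sort -k2,2n | awk '{ printf "%d %s %s\n", NR, $1, $2 }'
}
```

For example, printf 'sda 8.2\nsdb 0.3\nsdc 4.1\n' | rank_disks_by_latency prints "1 sdb 0.3", "2 sdc 4.1", and "3 sda 8.2" on separate lines, so the fastest device (an SSD in a hybrid pool, say) lands at the top.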
***This is a WIP****
This one makes me scratch my head: why wouldn't every VM practitioner know how to add memory and vCPUs without rebooting? I can't tell you how many times a server needs more resources, and teams take on the herculean task of planning and coordinating an application shutdown, just to add some memory or core power.

1. Go to the Options tab under your VM guest's settings
2. Select the "Memory/CPU Hotplug" setting
3. Enable memory hot add by selecting the "enable memory hot add" radio button
4. Enable CPU hot add by selecting the "enable CPU hot add" radio button
5. Select the OK button
The only time you will need to reboot to add either CPU or Memory will be this last time. Reboot the guest, add either memory or CPU, and then run the appropriate script for your desired function below:
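#!/bin/bash
# This script will hot add vCPUs inside VMware guest systems
# Sonny Stormes - 2009

# Writing to sysfs requires root.
if [ "$UID" -ne 0 ]; then
    echo -e "You must be root to run this script.\nYou can 'sudo' to get root access"
    exit 1
fi

# Match only cpuN directories; this skips cpufreq, cpuidle, and similar entries.
for CPU_DIR in /sys/devices/system/cpu/cpu[0-9]*
do
    [ -d "${CPU_DIR}" ] || continue
    CPU=$(basename "${CPU_DIR}")
    echo "Found cpu: \"${CPU_DIR}\" ..."
    CPU_STATE_FILE="${CPU_DIR}/online"
    if [ -f "${CPU_STATE_FILE}" ]; then
        STATE=$(cat "${CPU_STATE_FILE}")
        if [ "${STATE}" == "1" ]; then
            echo -e "\t${CPU} already online"
        else
            echo -e "\t${CPU} is new cpu, onlining cpu ..."
            echo 1 > "${CPU_STATE_FILE}"
        fi
    else
        # No "online" file means this CPU cannot be toggled (typically cpu0).
        echo -e "\t${CPU} already configured prior to hot-add"
    fi
done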
-------------------------------------------------------------------------------------------------
#!/bin/bash
# This script will hot add memory into a VMware guest
# Sonny Stormes - 2009

# Writing to sysfs requires root.
if [ "$UID" -ne 0 ]; then
    echo -e "You must be root to run this script.\nYou can 'sudo' to get root access"
    exit 1
fi

# Match only memoryN block directories.
for SPARSEMEM_DIR in /sys/devices/system/memory/memory[0-9]*
do
    [ -d "${SPARSEMEM_DIR}" ] || continue
    MEMORY=$(basename "${SPARSEMEM_DIR}")
    echo "Found sparsemem: \"${SPARSEMEM_DIR}\" ..."
    SPARSEMEM_STATE_FILE="${SPARSEMEM_DIR}/state"
    STATE=$(cat "${SPARSEMEM_STATE_FILE}")
    if [ "${STATE}" == "online" ]; then
        echo -e "\t${MEMORY} already online"
    else
        echo -e "\t${MEMORY} is new memory, onlining memory ..."
        echo online > "${SPARSEMEM_STATE_FILE}"
    fi
done
These scripts are only good for adding either component. Removal is a more complicated process, but if enough interest is shown from the above, I'll complete my single script for both attaching and detaching CPU or memory.
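If you want to confirm that nothing was left offline after a hot add, a check along these lines may help. It is only a sketch: the two function names are hypothetical, and the SYSFS_ROOT variable is something I introduced so the logic can be tried against a scratch directory instead of the live /sys tree.

```bash
#!/bin/bash
# Sketch: count CPUs and memory blocks that sysfs still reports as offline.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.

count_offline_cpus() {
    local sysfs="${SYSFS_ROOT:-/sys}" f count=0
    for f in "${sysfs}"/devices/system/cpu/cpu[0-9]*/online; do
        [ -f "${f}" ] || continue                  # CPUs without an "online" file are skipped
        [ "$(cat "${f}")" = "1" ] || count=$((count + 1))
    done
    echo "${count}"
}

count_offline_memory() {
    local sysfs="${SYSFS_ROOT:-/sys}" f count=0
    for f in "${sysfs}"/devices/system/memory/memory[0-9]*/state; do
        [ -f "${f}" ] || continue
        [ "$(cat "${f}")" = "online" ] || count=$((count + 1))
    done
    echo "${count}"
}
```

Both functions should print 0 once the hot-add scripts above have done their job; anything else means some component did not come online.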
Much of the beauty of dm-multipath in Red Hat is the chance to combine the redundancy and stability of traditional IP networking with the performance of fibre channel connectivity, culminating in a network connection that can withstand the failure of so many points it almost makes the head spin. This can be accomplished by the use of iSCSI and a TOE network interface, a dual port (minimum) FC-HBA, a properly configured dm-multipath, and a storage system that can export block storage across both mediums (IP network and FC..which is most systems, unless you are living in the year 2000).
Here is the basic premise:
1) Fibre Channel connections are your primary block transport. Set the initiator group on the storage unit as normal, export the blocks, and connect LUNs as normal.
2) Set up an iSCSI initiator to the storage unit and export the same LUN out the target of your storage system. Initiate the connection on both network ports out of the iSCSI TOE card.
3) Set up DM-Multipath. Out of the box, DM-Multipath will create a pseudo-device based on the SCSI_ID of each attached LUN and create a round-robin connection to each LUN (in count, you should have 6 total devices - 2 FC primary paths, 2 FC secondary paths, and 2 iSCSI IP network paths).
4) Set the 4 FC paths (primary and secondary) up accordingly with policies that let the primary paths be used during normal service, make the secondary paths a potential failover in the event that either of the 2 primary paths is down, and set the IP network connections to the iSCSI target as dual tertiary paths that will be the path of last resort.
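The tiering in step 4 can be modeled in a few lines of shell. To be clear, this is not dm-multipath configuration, just a toy model of the decision: the tier names, the priority numbers, and both functions are hypothetical. It mirrors how dm-multipath behaves when paths are grouped by priority (the group_by_prio path grouping policy), where the highest-priority group that still has a live path carries the I/O.

```bash
#!/bin/bash
# Toy model of the three-tier failover policy; higher number = preferred.
# Tier names and priority values are illustrative assumptions only.
path_priority() {
    case "$1" in
        fc-primary)   echo 50 ;;
        fc-secondary) echo 10 ;;
        iscsi)        echo 1  ;;
        *)            echo 0  ;;
    esac
}

# Given the tiers that still have at least one live path,
# print the tier that would carry I/O.
active_tier() {
    local best="" best_prio=-1 tier prio
    for tier in "$@"; do
        prio=$(path_priority "${tier}")
        if [ "${prio}" -gt "${best_prio}" ]; then
            best="${tier}"
            best_prio="${prio}"
        fi
    done
    echo "${best}"
}
```

With all tiers alive, active_tier fc-primary fc-secondary iscsi picks fc-primary; lose both FC tiers and active_tier iscsi falls through to the IP network as the path of last resort.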
Hang in there with me! I know that this sounds complicated, but over the next few posts, I'll be explaining each step in detail in a way that is easy for even a beginner to understand. I know that much of this sounds like a drunk man trying to explain why he is wearing a pink tutu in a park at 11am, but trust me when I say it is not that difficult.
At the end of this series, you can build a storage network that can withstand a charging rhino running from Godzilla (ok, maybe not a rhino, but certainly a koala bear wreaking havoc munching on FC cables or networks...if you have koala bears in your data center..after all..who doesn't?).
Until then, happy file and block exporting!
Are you like me and hate setting permissions with outside scripts on dm-multipath devices? The best way to avoid this mess (and udev rules to boot) is to include the following for your device in the multipaths section of multipath.conf:
multipath {
    wwid 360a98000486e58526c34515944703277
    alias devicename
    mode 660
    uid 501
    gid 502
}
What's great about this is when DM-multipath instantiates the new mapper devices, they will all be set with the permissions, user, and group assignments that you want. No muss, no fuss!
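If you have more than a handful of LUNs, typing these stanzas by hand gets old fast. Here is a rough sketch of a generator; the multipath_stanza function is hypothetical, and the alias in the usage note is a placeholder. The output still has to be pasted inside the multipaths section of multipath.conf, and it is worth checking that your version of the multipath tools honors mode, uid, and gid before relying on them.

```bash
#!/bin/bash
# Sketch: print one multipath stanza with ownership and permissions set.
# Usage: multipath_stanza <wwid> <alias> <mode> <uid> <gid>
multipath_stanza() {
    local wwid="$1" alias="$2" mode="$3" uid="$4" gid="$5"
    cat <<EOF
multipath {
    wwid ${wwid}
    alias ${alias}
    mode ${mode}
    uid ${uid}
    gid ${gid}
}
EOF
}
```

For example, multipath_stanza 360a98000486e58526c34515944703277 oradata1 660 501 502 prints a stanza matching the one above with the alias set to oradata1.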
Enjoy!
I hate it when I have a disk that needs to be removed while the system is still active. If you are sick of having to reboot to remove devices, look no further! Brainstormes U to the rescue!
From the CLI (where all Linux and Unix administration REALLY happens) execute the following:
echo 1 > /sys/block/<devicename>/device/delete
So if you had a disk /dev/sdb that was either a LUN you want to delete, or an attached device (SATA, SAS, IDE, yo mamma), by issuing the command
echo 1 > /sys/block/sdb/device/delete
you will have successfully removed it from the kernel and can unplug, detach, or throw out on its butt the device in question, all without a reboot.
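Since yanking a disk that is still in use is a great way to ruin your afternoon, a slightly more careful wrapper might look like the sketch below. The remove_scsi_disk function and the SYSFS_ROOT override are hypothetical names of mine, and the mount check is only a first line of defense: it will not notice a disk that is in use by LVM, multipath, or a raw application.

```bash
#!/bin/bash
# Sketch: remove a SCSI disk from the kernel, refusing if it looks mounted.
# Usage: remove_scsi_disk <devicename>   (e.g. sdb, not /dev/sdb)
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
remove_scsi_disk() {
    local dev="$1" sysfs="${SYSFS_ROOT:-/sys}"
    local delete_file="${sysfs}/block/${dev}/device/delete"

    if [ ! -e "${delete_file}" ]; then
        echo "No such SCSI disk: ${dev}" >&2
        return 1
    fi

    # Refuse if the disk or one of its partitions appears in the mount table.
    if [ -r /proc/mounts ] && grep -q "^/dev/${dev}[0-9]* " /proc/mounts; then
        echo "${dev} is still mounted; unmount it first" >&2
        return 1
    fi

    sync                          # flush pending writes before the device goes away
    echo 1 > "${delete_file}"
}
```

Run it as root with just the device name, for example remove_scsi_disk sdb.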
Ok, I'll take it to the next step and reverse the process:
To add a device on any bus (SATA,SAS,FC-HBA,iSCSI) the reverse is true...with some modifications:
echo "- - -" >/sys/class/scsi_host/host$NUMBER/scan
Where host$NUMBER is the number of the bus you want to scan. (The "- - -" means to look at every channel, every target, and every LUN on that host.) After you finish, check dmesg for your new device and BAM! You've successfully hotplugged a new LUN, SATA disk, SAS disk, or USB disk.
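If you don't know (or don't care) which host the new device landed on, you can simply rescan them all. A minimal sketch, where the rescan_all_hosts function and the SYSFS_ROOT override are hypothetical names I introduced so the loop can be tried against a scratch directory:

```bash
#!/bin/bash
# Sketch: rescan every SCSI host for new devices.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
rescan_all_hosts() {
    local sysfs="${SYSFS_ROOT:-/sys}" scan
    for scan in "${sysfs}"/class/scsi_host/host*/scan; do
        [ -e "${scan}" ] || continue       # no SCSI hosts present
        echo "- - -" > "${scan}"           # every channel, target, and LUN
        echo "Rescanned $(basename "$(dirname "${scan}")")"
    done
}
```

Run it as root, then check dmesg as described above to see what turned up.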
Peace!
Booting from SAN has so many benefits, it is sure to become the standard for server engineering. By attaching storage address space to an HBA, the disks can be removed from the internal system and controlled by a tier I storage unit providing many benefits local disk cannot. For instance, should hardware fail on an existing Boot-From-SAN system, moving the disks to a working server is just a matter of re-pointing the WWID to a standby system and booting the address space on the working hardware; no muss, no fuss. From this feature comes one of the kewlest features I employ as often as I can: address space cloning, and thus entire server cloning. By employing storage virtualization (which is a standard feature on pretty much ALL storage devices nowadays...kind of like a CD-ROM or twisty ties) system admins can simply create a clone of storage address space, assign it to an HBA, grip it and rip it, and you are off to the races; a full server provisioning in under 10 minutes...unless you have an hour to waste watching a status bar move from left to right...
The steps are pretty generic amongst all major (and pretty much all minor) vendor storage units:
1) Set your FC-HBA to boot from SAN with a Boot BIOS
2) Attach some storage address space to the FC-HBA's WWID
3) Set the BIOS to boot from the address space assigned
4) Install as normal
To perform the cloning process:
1) Create a snapshot (most vendors have some type of cloning process that does this by default)
2) Clone a new volume from the snapshot
3) Attach cloned volume to the HBA WWID
4) Boot that bad boy
These are pretty high-level steps, but really it is as simple as the steps listed above. Of course you have to concern yourself with spindle limitations, bandwidth, and all the other variables required in storage engineering, but it is certainly not a process that should be feared because of the stigma on storage virtualization. It is a skill that should be in every worthwhile systems/storage engineer's bag.


