Techbook - Come and Experience the Technical Reality

Techbook - Come and Experience the Technical Reality. Here is a great technology review blog from my peer and friend Mahesh Pulipati. I'll have an RSS aggregation from his organization set up soon, and I am excited to include him in the IT blogging realm!


Automation of Visio - Experiences mapping a NetApp Filer using Visio 2010 and PowerShell

Oftentimes, complicated environments have a large number of storage objects. Mapping them can be difficult, but it is often necessary to understand how data is being moved. Like many of my colleagues, I find that documentation can be a chore: it is time consuming and prone to misinterpretation. Being a Linux engineer, my first thought is "How do I automate this so I can do more interesting things?" After dragging my feet and going in kicking and screaming, I decided to dive into PowerShell. The following is an example of a dynamically created Visio 2010 diagram of the physical storage configuration of my NetApp Filer simulator.

Tools Used

  • NetApp PowerShell Toolkit
  • NetApp Visio Stencils (Extracted to "My Shapes" under Libraries->Documents)
  • Visio 2010
  • At minimum, RemoteSigned permissions (Set-ExecutionPolicy RemoteSigned)
  • At least Read-Only Access to a NetApp Filer

example1

How this diagram is created

  1. Script logs into the Filer
  2. Data is collected
  3. Script opens Visio and Stencils
  4. Objects are placed on the workspace in logical order
  5. Next version will have data from each point in a table (Latency, I/O, Growth Rate, Overwrite Rate, Days to Full), exported to the SharePoint team portal
  6. Data points will be active and retrievable by everyone who has SharePoint team portal access
  7. Business Workflows will monitor the data points and take action when conditions are met (create a request for new storage, warn the current user for remediation, etc.)

It won't turn any heads, but that is just a formatting process that is simple to design. Nothing about this process is particularly amazing (which is why I didn't include the code), but a collection of tools like these can come in handy when trying to troubleshoot a complex system. Seeing the visual data paths and their associated metrics helps put a scenario into perspective and helps move a team's troubleshooting exercises to a solution-based orientation rather than a problem-based orientation (think of doing a maze from start to finish or from finish to start... which one is easier?). It also has the added benefit of creating nice diagrams if you are "artistically" challenged like me.


IT is not a guessing game...

I am often asked the question "But Sonny, how do you know?" Well, this is my secret sauce: I don't guess. I have my own "Matrix": I simulate the entire global environment of my "sphere of influence" and use that as a test bed for implementations. Similar to a scale model train set, I add every possible switch, router, firewall, storage device, and server, any device that exists with any software that runs in the real world, all running in my personal lab. The screenshot below is a lab I am working on for graduate school; for non-disclosure reasons I cannot publish my "Matrix" world, but my homework will suffice for this post. Below you can see a simple 2 core switch environment that is running a proprietary "Imaging Server" for a hospital in L.A., with the interconnect between the switches set to 100 Mbps (to simulate the poor application performance).
 
In this scenario I have set up a typical doctor accessing data from the imaging server, plus email and web, with work groups performing normal daily operations from each node group. I can change any aspect of the application and collect data to troubleshoot the situation. Agents running on the Linux and Windows application and database virtual servers feed input into IT Guru to report response time. Hooks into NetApp Storage access the VM array and gather statistics based on current conditions. All networking devices report their statistics and contribute to the picture as a whole. Hooks into MySQL analyze data queries and correlate them back to the feeds from the simulated Filer array or EMC frame. All forms of code (Java, Python, PHP, ColdFusion, Ruby) are analyzed and their performance reported on an object-by-object basis. Everything works in concert to help find, troubleshoot, and eliminate any suspected bottlenecks that might exist within an environment.

IT-Lab
 
From this vantage point, I can see through the entire OSI stack and report on potential performance bottlenecks:
 

1) Layers 1-3 provide information about how data flows through the physical, data link, and network layers based on existing configurations

2) Layers 4-7 are accessed through a variety of network taps, agents on the source and destination, and a correlation engine to connect the data into information. Objects are analyzed for code performance, and SQL query times are correlated back to hardware and OS performance metrics

3) Information collected is analyzed to create knowledge based on existing known workloads to establish a baseline

4) Conditions are set for potential business scenario planning with decision makers and executives based on known initiative/work loads

5) Scenarios are built and tested against proposed design

 
In the scenario here, the link between the first floor and second floor core switches is too small, causing network utilization to spike to 100%. This means that the Radiology application will have network queuing as a result of insufficient resources. Once the simulation completes, a new scenario can be cloned and modified to remove the bottleneck and show how the application performs and what happens to queuing as a result.
 

IT-Lab-10g-change
The result of changing the interconnect can be seen to the left: point-to-point queuing delays are statistically zero, and point-to-point utilization on this same workload is less than 1%.

 

IT-Lab-custom-app-response-time
From the graph on the right, you can see that the application response time dropped from a previous peak of 60 seconds over the course of an hour to a peak of 12.5 seconds in the 10Gb change scenario.

You tell me... where is the guesswork?


Infiniband on RedHat/CentOS

On days I'm sick (like today) I get to think a lot and really put my thoughts to pen. One project I have worked on for a long time is InfiniBand on RedHat/CentOS. Using InfiniBand is great, but it can also swamp you in ways you never expected. The different ways to use InfiniBand are:

1) IPoIB. In this case the IP stack runs on top of IB. You don't need to rewrite your applications, and you still get high throughput. On the other hand, you lose IB's low latency and won't be able to use IB's full throughput.

2) Sockets Direct Protocol (SDP), which is designed to use IB RDMA capabilities and bypass the TCP/IP stack. SDP can be used transparently without recompiling your application. It is not as fast as the native IB API, but it is better than IPoIB.

3) IB Verbs, the lowest-level API; User Direct Access Programming Library (uDAPL), which is based on IB Verbs; and Message Passing Interface (MPI) or Unified Parallel C (UPC). Different versions of MPI and UPC can be based on either IB Verbs or uDAPL. I personally work with MPI and UPC so I will describe their installation over InfiniBand.

I have implemented all 3 types of technology to great success. The layer with the fastest speed and the lowest latencies is uDAPL, which, I must say, takes some getting used to. It requires an abstraction layer on top of the IB interface; the rest of the program, which needs access to storage or process threads on the fabric, uses that abstraction layer to send and receive data to targets available to the requesting server. One implementation used iSCSI on IPoIB. The upside is that storage on an IP stack is very fast, but latency is the same as on Ethernet because of the protocol stack. I would equate its speed to that of 10g over FC: high speeds with a hangnail of increased latency. (If you are not proficient in OO programming, threading is almost impossible to do structurally in the main languages that are the most expansive, Java and Python.)


Hybrid Storage Disks in a LVM Pool - How Solid State and SATA disks can increase performance.

Do reads and writes plague your applications? Do you wish you could get more "bang for the buck"? I know many companies that complain about I/O contention while balancing the tedious task of having redundancy built into their infrastructures in case of a "rainy day" failure (that is almost sure to occur). Many Tier 1 storage vendors are moving to hybrid disk pools to add faster access, as well as mitigate the latency of RAID 1. However, many small and medium-sized businesses cannot afford the big 3, but anyone can take advantage of hybrid pools for a minimal price.

The first question any I.T. manager/architect/guy-who-is-responsible-for-storage should ask is "Am I using the tools that are available to me that support enterprise features?" Without getting into the intricate details of how the many open source tools work, we will dive a little deeper into the one feature EVERYONE should have in their arsenal: Logical Volume Manager (LVM).

With budgets being squeezed by the down economy, everyone is scrambling to demonstrate the return on investment for any purchases needed to achieve the kind of performance that screams "Bonus in my review". However, LVM is readily available with the equipment you have lying around; you just need to learn to use it. What is great about LVM is its ability to abstract the disk layer into something far more human compatible than managing the disks locally. Now, you still have to understand the physical disk layer, as LVM is built on top of disks, but the only step required is partitioning. So, if you understand how fdisk works at a basic level, you are good to go! Yes, you can build your volumes directly on the disk, but partitioning allows you to divide your pools up for different workloads.
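
To make the partition-then-pool idea concrete, here is a minimal sketch of the LVM commands involved. It only prints the commands rather than running them, because they are destructive on real disks. The device names, volume group name, logical volume names, and sizes are all hypothetical placeholders of mine, not values from a real system.

```shell
#!/bin/bash
# Sketch only: prints the LVM commands for a hybrid pool instead of running them.
# All device names, the volume group name, and the sizes are hypothetical.

hybrid_lvm_plan() {
    local ssd_part="$1"    # partition on the solid state disk, e.g. /dev/sdb1
    local sata_part="$2"   # partition on the SATA disk, e.g. /dev/sdc1
    local vg="$3"          # volume group name

    # Register both partitions as LVM physical volumes.
    echo "pvcreate ${ssd_part} ${sata_part}"
    # Pool them into a single volume group.
    echo "vgcreate ${vg} ${ssd_part} ${sata_part}"
    # Naming a physical volume at the end of lvcreate restricts
    # allocation of that logical volume to that physical volume.
    echo "lvcreate -L 20G -n lv_fast ${vg} ${ssd_part}"
    echo "lvcreate -L 200G -n lv_bulk ${vg} ${sata_part}"
}

# Example: hybrid_lvm_plan /dev/sdb1 /dev/sdc1 vg_hybrid
```

Review the printed commands against your own disks before running any of them by hand as root.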

Now I'm no C programmer, but the above makes me want to learn C. I can envision the following workload written into LVM:

1) All disk timings are calculated on initialization
2) All disks are prioritized by the level at which they perform: lower the latency, higher the priority
3) In a hybrid RAID design like mine, I could feasibly write the above algorithm into LVM for such a server design.
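
The prioritization in step 2 can be sketched outside of LVM in a few lines of shell. This is not LVM code and not something LVM does today as far as I know; it is just the ordering step, assuming the disk timings have already been measured somehow and are handed in as "device latency" pairs. The function name and input format are my own invention for illustration.

```shell
#!/bin/bash
# Hypothetical helper: reads "device latency_ms" pairs on stdin and prints
# "priority device latency_ms", lowest latency first (priority 1 is the best).
rank_disks_by_latency() {
    # LC_ALL=C keeps "." as the decimal point for the numeric sort.
    LC_ALL=C sort -k2,2n | awk '{ printf "%d %s %s\n", NR, $1, $2 }'
}

# Example:
#   printf 'sdc 8.5\nsdb 0.2\nsdd 4.1\n' | rank_disks_by_latency
```

In a hybrid pool the solid state disks would naturally float to the top of that list, which is the behavior I would want an allocator to use.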

It would be a fairly easy process/feature to add to LVM, and it would create directly attached storage, or a storage server, for the mid-range to the enterprise.

(I so want to believe this is already a part of LVM, but don't know for sure yet as I am researching it.)

***This is a WIP****


Adding vCPU/Memory in a VMware Guest hot -- without rebooting

This one makes me scratch my head: why doesn't every VM practitioner know how to add memory and vCPUs without rebooting? I can't tell you how many times a server needs more resources, and teams take on the herculean task of planning and coordinating an application shutdown, just to add some memory or core power.

VMware Guest Settings for vCPU/Mem hot add

1. Go to the Options tab under your VM guest's settings
2. Select the "Memory/CPU Hotplug" setting
3. Enable memory hot add by selecting the enable memory hot add radio button
4. Enable CPU hot add by selecting the enable CPU hot add radio button
5. Select the OK button

The only time you will need to reboot to add either CPU or Memory will be this last time. Reboot the guest, add either memory or CPU, and then run the appropriate script for your desired function below:

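#!/bin/bash
# This script will hot add vCPUs inside VMware guest systems
# Sonny Stormes - 2009

# Writing to sysfs requires root
if [ "$UID" -ne 0 ]; then
    echo -e "You must be root to run this script.\nYou can 'sudo' to get root access"
    exit 1
fi

# Match only cpu0, cpu1, ... (skips cpufreq, cpuidle, and similar entries)
for CPU_DIR in /sys/devices/system/cpu/cpu[0-9]*
do
    CPU=$(basename "${CPU_DIR}")
    echo "Found cpu: \"${CPU_DIR}\" ..."
    CPU_STATE_FILE="${CPU_DIR}/online"
    if [ -f "${CPU_STATE_FILE}" ]; then
        # The online file holds 1 for an online cpu and 0 for an offline one
        STATE=$(cat "${CPU_STATE_FILE}")
        if [ "${STATE}" = "1" ]; then
            echo -e "\t${CPU} already online"
        else
            echo -e "\t${CPU} is new cpu, onlining cpu ..."
            echo 1 > "${CPU_STATE_FILE}"
        fi
    else
        # No online file: this cpu cannot be toggled from here
        echo -e "\t${CPU} already configured prior to hot-add"
    fi
done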
-------------------------------------------------------------------------------------------------
#!/bin/bash
# This script will hot add memory into a VMware guest
# Sonny Stormes - 2009

# Writing to sysfs requires root
if [ "$UID" -ne 0 ]; then
    echo -e "You must be root to run this script.\nYou can 'sudo' to get root access"
    exit 1
fi

# Match only the memory0, memory1, ... block directories
for SPARSEMEM_DIR in /sys/devices/system/memory/memory[0-9]*
do
    MEMORY=$(basename "${SPARSEMEM_DIR}")
    echo "Found sparsemem: \"${SPARSEMEM_DIR}\" ..."
    SPARSEMEM_STATE_FILE="${SPARSEMEM_DIR}/state"
    # The state file reads "online" once the block is in use
    STATE=$(cat "${SPARSEMEM_STATE_FILE}")
    if [ "${STATE}" = "online" ]; then
        echo -e "\t${MEMORY} already online"
    else
        echo -e "\t${MEMORY} is new memory, onlining memory ..."
        echo online > "${SPARSEMEM_STATE_FILE}"
    fi
done
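
If you want to confirm the result afterwards, a small read-only check like the sketch below can help. It counts memory blocks in a given state; the function name is hypothetical, and the directory is a parameter (defaulting to the same sysfs path the script above uses) so it can be pointed at a test tree.

```shell
#!/bin/bash
# Hypothetical helper: counts memory blocks whose state file matches
# the requested state ("online" or "offline"). Read-only, no root needed.
count_memory_blocks() {
    local state="$1"
    local mem_root="${2:-/sys/devices/system/memory}"
    local count=0 f
    for f in "${mem_root}"/memory*/state; do
        [ -f "$f" ] || continue
        if [ "$(cat "$f")" = "$state" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Example: count_memory_blocks offline
```

After the hot-add script has run, the offline count should be 0. For CPUs, reading /sys/devices/system/cpu/online shows the range of CPUs currently online.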

These scripts only add components. Removing them is a more complicated process, but if the scripts above draw enough interest, I'll complete my single script for both attaching and detaching CPU or memory.


Using both FC and IP network connectivity in Red Hat dm-multipath (Part 1)

Much of the beauty of dm-multipath in Red Hat is the chance to combine the redundancy and stability of traditional IP networking with the performance of Fibre Channel connectivity, culminating in a network connection that can withstand the failure of so many points it almost makes the head spin. This can be accomplished by the use of iSCSI and a TOE network interface, a dual port (minimum) FC-HBA, a properly configured dm-multipath, and a storage system that can export block storage across both mediums (IP network and FC, which is most systems, unless you are living in the year 2000).

Here is the basic premise:

1) Fibre Channel connections are your primary block transport. Set the initiator group on the storage unit as normal, export the blocks, and connect LUNs as normal.

2) Set up an iSCSI initiator to the storage unit and export the same LUN out the target of your storage system. Initiate the connection on both network ports out of the iSCSI TOE card.

3) Set up DM-Multipath. Out of the box, DM-Multipath will create a pseudo-device based on the SCSI_ID of each attached LUN and create a round-robin connection to each LUN (in count, you should have 6 total devices - 2 FC primary paths, 2 FC secondary paths, and 2 iSCSI IP network paths).

4) Set up the 4 FC paths (primary and secondary) with policies that let the primary paths be used during normal service and make the secondary paths a failover in the event that either of the 2 primary paths is down, plus a final policy that sets the IP network connections to the iSCSI target as dual tertiary paths of last resort.
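
As a sanity check on the path count from step 3, here is a rough sketch that counts path lines in multipath -ll output. It assumes each path line carries a host:channel:target:lun tuple followed by an sdX device name; the exact layout of multipath -ll varies between releases, so treat the pattern as an assumption to verify on your own system. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: counts path lines read on stdin, where a path line is
# assumed to contain "H:C:T:L sdX" (for example "1:0:0:1 sdb").
count_multipath_paths() {
    # grep -c exits non-zero when nothing matches; the count is still printed.
    grep -Ec '[0-9]+:[0-9]+:[0-9]+:[0-9]+ +sd[a-z]+' || true
}

# Example (as root): multipath -ll | count_multipath_paths
```

With a single LUN presented over the 4 FC paths and 2 iSCSI paths described above, the count should come out to 6.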

Hang in there with me! I know that this sounds complicated, but over the next few posts, I'll be explaining each step in detail in a way that is easy for even a beginner to understand. I know that much of this sounds like a drunk man trying to explain why he is wearing a pink tutu in a park at 11am, but trust me when I say it is not that difficult.

At the end of this series, you can build a storage network that can withstand a charging rhino running from Godzilla (ok, maybe not a rhino, but certainly a koala bear wreaking havoc munching on FC cables or networks...if you have koala bears in your data center..after all..who doesn't?).

Until then, happy file and block exporting!


Avoid udev rules

Are you like me and hate setting permissions with outside scripts on dm-multipath devices? The best way to avoid this mess (and udev rules to boot) is to include the following in the device's multipath entry, inside the multipaths section of your multipath.conf:
 
 
 

multipath {
    wwid 360a98000486e58526c34515944703277
    alias devicename
    mode 660
    uid 501
    gid 502
}

What's great about this is when DM-multipath instantiates the new mapper devices, they will all be set with the permissions, user, and group assignments that you want. No muss, no fuss!
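
To confirm that a mapper device actually came up the way you asked, a check along these lines can be run after multipath creates it. This is a sketch using GNU stat; the function name is hypothetical, and the path, mode, uid, and gid in the example simply mirror the sample entry above.

```shell
#!/bin/bash
# Hypothetical helper: compares a device node's mode, uid, and gid
# against the expected values. Returns 0 on match, 1 on mismatch.
check_device_perms() {
    local path="$1" want_mode="$2" want_uid="$3" want_gid="$4"
    local actual
    # -L follows symlinks, in case the mapper entry is a link to a dm-N node.
    actual="$(stat -L -c '%a %u %g' "$path")" || return 2
    if [ "$actual" = "${want_mode} ${want_uid} ${want_gid}" ]; then
        echo "OK ${path}: ${actual}"
    else
        echo "MISMATCH ${path}: got ${actual}, want ${want_mode} ${want_uid} ${want_gid}"
        return 1
    fi
}

# Example: check_device_perms /dev/mapper/devicename 660 501 502
```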

Enjoy!


Deployment of ALUA with Red Hat and NetApp

I love the idea of having failover on my primary database. So if you haven't done so, I would highly recommend it. With NetApp and Red Hat, here's the skinny:

1) Set up your /etc/multipath.conf to use mpath_prio_alua rather than the default mpath_prio_ontap. I have not verified whether mpath_prio_ontap reads the SCSI-3 ALUA commands, but I know mpath_prio_alua does, so it was easier to use it.

2) Enable ALUA on the igroup with:
igroup set <igroup_name> alua yes

3) Reboot your hosts

4) Enjoy ALUA goodness!

With this configuration, when cf failover is initiated, your Red Hat server will know when the primary path is dead and use the secondary path without intervention. It really is a beautiful thing, and I recommend it for any storage heads out there.


Removing active devices from Linux

I hate it when I have a disk that needs to be removed while the system is still active. If you are sick of having to reboot to remove devices, look no further! Brainstormes U to the rescue!

 

From the CLI (where all Linux and Unix administration REALLY happens) execute the following:

 

echo 1 > /sys/block/<devicename>/device/delete

 

So if you have a disk /dev/sdb that is either a LUN you want to delete or an attached device (SATA, SAS, IDE, yo mamma), issue the command

echo 1 > /sys/block/sdb/device/delete

and you will have successfully removed it from the kernel. You can then unplug, detach, or throw out on its butt the device in question, all without a reboot.
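
Before yanking a disk out from under a live system, it is worth a quick guard. The sketch below wraps the delete in a function that refuses to act if the disk or one of its partitions shows up in the mount table. The function name and its extra parameters are hypothetical (the sysfs root and mounts file are overridable only so the function can be dry-run against a fake tree), and it does not detect use by LVM, multipath, or swap.

```shell
#!/bin/bash
# Hypothetical helper: removes a SCSI disk from the kernel only if it is not mounted.
remove_scsi_disk() {
    local dev="$1"                       # kernel name, e.g. sdb
    local sysfs="${2:-/sys}"             # overridable for dry runs
    local mounts="${3:-/proc/mounts}"    # overridable for dry runs
    local delete_file="${sysfs}/block/${dev}/device/delete"

    if [ ! -e "$delete_file" ]; then
        echo "no such SCSI disk: ${dev}" >&2
        return 1
    fi
    # Refuse if the whole disk or any numbered partition is mounted.
    if grep -Eq "^/dev/${dev}[0-9]* " "$mounts"; then
        echo "${dev} is still mounted; unmount it first" >&2
        return 1
    fi
    # Flush pending writes, then ask the kernel to drop the device.
    sync
    echo 1 > "$delete_file"
}

# Example (as root): remove_scsi_disk sdb
```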

 

Ok, I'll take it to the next step and reverse the process:

To add a device on any bus (SATA,SAS,FC-HBA,iSCSI) the reverse is true...with some modifications:

 

echo "- - -" >/sys/class/scsi_host/host$NUMBER/scan

 

Where host$NUMBER is the SCSI host (bus) you want to scan. The "- - -" means to look at every channel, every target, and every LUN on that host. After you finish, check dmesg for your new device and BAM! You've successfully hotplugged a new LUN, SATA disk, SAS disk, or USB disk.
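
If you don't know which host the new device landed on, you can simply rescan all of them. Here is a minimal sketch that loops over every host entry; the function name is hypothetical, and the directory is a parameter only so the loop can be tried against a fake tree first.

```shell
#!/bin/bash
# Hypothetical helper: issues the "- - -" wildcard scan on every SCSI host.
rescan_scsi_hosts() {
    local host_root="${1:-/sys/class/scsi_host}"
    local scan
    for scan in "${host_root}"/host*/scan; do
        # Skip the unexpanded glob when no hosts exist.
        [ -e "$scan" ] || continue
        echo "- - -" > "$scan"
        echo "rescanned ${scan%/scan}"
    done
}

# Example (as root): rescan_scsi_hosts
```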

 

Peace!
