bill’s blog


July 4th is one of my favorite holidays… hitting the beach… barbecuing… cold beers… and fireworks! BUT working in IT brings with it the possibility of having your holiday plans interrupted by server/network outages. I can’t remember a 4th of July where I didn’t get a call that something was up with one of my servers, and 2010 would be no different.

It started at 6:30AM… The main website for our Canadian office was unreachable. So I booted my laptop and checked the site. Hmmm… It came up fine for me. Perhaps the server’s admin got to it before me. I called corporate and fixed myself a cup of coffee. 45 minutes later the phone rang again. “The site is down again!” I walked back to the computer, coffee in hand, and indeed my browser timed out. Hmmm… OK, something’s not right. I VPN’ed into the box and pointed the server’s browser to the website, and the site loaded, BUT not as fast as I would have expected. The load on the server looked a bit high to me, but this wasn’t my box and I didn’t know what its normal numbers were! I started to pore through the Apache server error logs looking for answers. Nothing there! Back to the browser… The page loaded fine, the speed having returned. I turned to my wife’s computer (making sure I took the local network out of the equation) and pointed her browser to the site. This time I got a strange error message in the browser window…

“Can’t connect to the database too many connections open.”

Hmmm. Strange! Let me refresh my browser… the site pops back up. OK… let’s jump on the box and have a look at what’s going on… CPU utilization looks normal… MySQL looks OK… Refresh the browser… the site is still up. OK, let’s have a look at the MySQL logs… Still nothing. So I called the developer to confirm that no moves were made into Production on Friday. I rebooted the box and everything seemed to return to normal. 2 hours later the phone rings once more… “The site is down again, Bill.” Man, this isn’t even my server… This is really going to be bad if I have to reboot the server every 2 hours this weekend. I opened a browser window and got the database connection error message again. OK, let’s take a look at the system logs… WOW, that’s funny: the kernel is erroring out and throttling back the network stack. OK… let’s see what netstat turns up… ouch! There were hundreds of connections in a FIN_WAIT or SYN_RECEIVED state. What’s going on? Did someone patch the OS on this box? Nope… Let’s check the throughput of this box… 75,000 requests per second… Now one can dream, but I’d think that was a pretty rare occurrence for this domain! OK… let’s see if I can get at the firewall logs… Sure enough, there were thousands of connections open. WOW, we were in the middle of a DDoS (Distributed Denial of Service) attack. I couldn’t believe it.
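The netstat step can be scripted using only what's already on a box. This is a minimal Python sketch, not what I actually ran that morning: it tallies the state column of `netstat -ant`-style output so a pile-up of half-open (SYN_RECV) or closing (FIN_WAIT*) connections stands out. The sample text and addresses are made up, and column layouts and state names vary by OS, so treat the parsing as an assumption to adjust for your own machine.

```python
from collections import Counter

# Made-up sample of `netstat -ant` style output (hypothetical addresses).
# Real output differs by OS: extra header lines, tcp4/tcp6 prefixes, and
# state names such as SYN_RECV (Linux) versus SYN_RCVD (BSD/macOS).
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:80             203.0.113.7:51234       SYN_RECV
tcp        0      0 10.0.0.5:80             198.51.100.9:40022      SYN_RECV
tcp        0      0 10.0.0.5:80             192.0.2.44:33100        FIN_WAIT1
tcp        0      0 10.0.0.5:80             192.0.2.45:33101        ESTABLISHED
"""


def count_tcp_states(netstat_text: str) -> Counter:
    """Tally the last column (the TCP state) of every line that starts with 'tcp'."""
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Header lines ("Active ...", "Proto ...") are skipped by this check.
        if fields and fields[0].startswith("tcp"):
            states[fields[-1]] += 1
    return states


# Print states from most to least common; a spike in SYN_RECV or FIN_WAIT*
# entries is the kind of pattern worth a closer look.
for state, count in count_tcp_states(SAMPLE).most_common():
    print(f"{state:<12} {count}")
```

Run against a saved copy of real netstat output instead of `SAMPLE`, the same tally gives a quick before/after comparison without installing anything.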

The point of this is that it doesn’t take fancy network tools to figure out what’s going wrong with a machine. I didn’t use a network sniffer. The box was not one of mine, so I didn’t know the normal state of the server. I used what was on the machine and started by eliminating variables. But the really big lesson learned is that it doesn’t matter how small you think your site is; it could always be the target of something like a DDoS.

My Roman Catholic upbringing taught me that while Jesus Christ took on human form, he was still God. This belief is fundamental to Catholicism. I’ve often heard my Asian friends say they go to the temple to pay reverence to their ancestors. I always understood this to mean that they were paying their respects to immortal beings or gods. It wasn’t until today that I realized that in Taoism, mortals could be deified and worshipped as gods. I guess this makes sense, as a big part of Taoism is the harmony between humans and the universe!

Evidently this deification process was happening as late as the 12th century. Che Kung, who was a great general during the Southern Song Dynasty (1127-1279), was deified for his devotion to the people of Sha Tin.

It is thought that he had the ability to suppress plagues, and many believe that Che Kung was responsible for keeping the Song Dynasty alive by providing safe passage for Emperor Bing and his brother during the rebellions in Southern China. It is because of this that many now consider him a god.

There are two temples dedicated to Che Kung in Hong Kong… the most famous being the Che Kung Miu near Tai Wai, in Sha Tin District, New Territories. The temple complex is once again undergoing renovations.

Throughout the temple are pinwheels. It is believed that good luck will come to those who spin them.

MTBF


I work in IT, and one of my job functions is to warehouse the image files of a corporate creative department. Translated… that means I buy a lot of storage. One of the things storage admins look at is the failure rate of the disk drives that make up their SAN environments. The higher the failure rate of a particular drive, the better your chances of having a catastrophic loss… or in other words, you’re restoring from tape if you lose a lot of drives at one time!

MTBF (or mean time between failures) is a standard measurement (in hours) we use to estimate how long a disk drive will run before it fails. The other measurement we use is AFR (or annualized failure rate), which is expressed as a percentage based on the MTBF versus the amount of time the device is powered on and running. A couple of things to note… MTBF is not necessarily a device’s useful life. And AFR is not meant to be applied to a single drive; rather, it is the expected failure rate of any given drive within a particular production run (population).

So what does this all mean?

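Well, most vendors spec consumer-geared disk drives at about 300,000 hours MTBF. That being said, the key word in MTBF is M (or mean). So what we’re looking at is an average over a given population of drives, not a promise about any one drive. And a mean is not a median: if you assume drives fail at a constant rate, roughly 63% of a population (not half) would have failed by the 300,000-hour mark.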
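A quick sketch of the mean-versus-median point. It assumes a constant failure rate (exponential lifetimes), which is a simplifying textbook model and not necessarily how any vendor derives its MTBF figure. Under that assumption, the share of a population failed after t hours is 1 - e^(-t/MTBF), and half the population is gone by MTBF × ln 2 hours. The function names are mine, for illustration only.

```python
import math


def fraction_failed_by(hours, mtbf_hours):
    """Share of a drive population expected to have failed after `hours` of
    power-on time, ASSUMING a constant failure rate (exponential lifetimes)."""
    return 1 - math.exp(-hours / mtbf_hours)


def median_life(mtbf_hours):
    """Hours by which half the population has failed, under the same assumption."""
    return mtbf_hours * math.log(2)


MTBF_HOURS = 300_000  # the consumer-drive figure quoted above
print(f"{fraction_failed_by(MTBF_HOURS, MTBF_HOURS):.1%} failed by the MTBF mark")  # 63.2% ...
print(f"half failed by about {median_life(MTBF_HOURS):,.0f} hours")  # ... 207,944 hours
```

In other words, under this model the "mean" sits well past the point where half the drives are already gone.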

Translated again… and I got help on this one ;-)

If you had 600,000 drives with 600,000-hour MTBFs, you’d expect to see one drive failure per hour. In a year you’d expect to see 8,760 (the number of hours in a year) drive failures, or a 1.46% annualized failure rate (AFR) (Harris, 2007). For the 300,000-hour consumer drives above, the same math doubles the AFR to about 2.92%.
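The arithmetic behind those numbers as a minimal Python sketch. It assumes drives are powered on 24x7 and uses the simple linear approximation: failures per hour = drives / MTBF, and AFR ≈ hours per year / MTBF. So 600,000 drives at a 600,000-hour MTBF give one failure per hour and an AFR of 8,760 / 600,000 = 1.46%, while a 300,000-hour drive doubles both. The function names are mine, for illustration only.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 power-on hours, assuming the drive never sleeps


def expected_failures_per_hour(drive_count, mtbf_hours):
    """Population-wide failure rate: each drive contributes 1/MTBF failures per hour."""
    return drive_count / mtbf_hours


def annualized_failure_rate(mtbf_hours):
    """AFR as a fraction, using the linear approximation (hours per year) / MTBF."""
    return HOURS_PER_YEAR / mtbf_hours


print(expected_failures_per_hour(600_000, 600_000))  # 1.0 failure per hour
print(f"{annualized_failure_rate(600_000):.2%}")     # 1.46%
print(f"{annualized_failure_rate(300_000):.2%}")     # 2.92%
```

The linear approximation is fine for small AFRs like these; it only starts to overstate things when the MTBF gets close to a year of power-on time.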

Realizing that this is what a manufacturer quotes as the expected life, one has to ask how that holds up in reality. Well, Google did a bit of research on this and found that its failure rates were much higher than the manufacturers’ numbers would suggest. Why? Because there is a gap between what a manufacturer considers a failure and what the real world expects of these devices.

In reality, many factors determine whether a drive should remain in production. Call it an IT admin’s intuition… call it that odd clicking sound… call it taking forever to save a file… Oftentimes we (IT professionals) will replace a drive before it is completely unusable (the point where we can no longer retrieve data from the device). Did the drive fail? Technically no… Practically yes! If we can’t rely on the drive to save and retrieve data, then it has failed for our purposes… I guess some manufacturers don’t see it the same way!

Resources:

Harris, R. (2007, February 19). Google’s Disk Failure Experience. Retrieved June 3, 2010, from http://storagemojo.com/2007/02/19/googles-disk-failure-experience/

The Art of War is governed by five constant factors, to be taken into account in one’s deliberations, when seeking to determine the conditions obtaining in the field.

The Moral Law
Heaven
Earth
The Commander
Method and Discipline

- Sun Tzu, The Art of War

Wow, what a week! It was a stroll down math’s hit parade… number line theory… adding fractions… primes… substituting variables… and the rules for the order of mathematical operations. The fact is we use math every day, but we rarely think about the fact that we are using it! So let’s see how we take our math skills for granted!

The other day I was in NYC. I had $7.50 in my pocket for lunch! It was the end of the week and my wife had snagged my wallet, so going to the ATM was out of the question! For anyone who’s never been to New York, filling your belly on $7.50 is not an easy task!

I was in the mood for pizza. I ran into the nearest pizza place and saw that a slice cost $3.50 and a Coke would run me an additional $1.50. Now I know this is going to be a stretch, but bear with me… Let’s put some number line theory to work! Let’s look at 0 on the number line as the dividing mark between contentment and starvation! If I drop onto the negative side of the number line, I go hungry. If I stay on the positive side, I walk away with a full belly!

Let’s begin…

Starting at +7.50 on the number line… let’s do some math. Two slices of pizza, because one slice wasn’t going to cut it, could be represented by the following equation:

(2 * -3.50)

Let’s apply that to our number line.

7.50 (our starting point) + (2 * -3.50) = 7.50 - 7.00 = 0.50

So we’re still positive…  still good! BUT then I need to add the coke in.

0.50 + (-1.50) = -1.00

As you can see I’ve fallen into the negative side of the number line at -1.00. Bill goes hungry.

I know, one could say do without the Coke… but I just can’t eat a slice without an icy cold soda!

Let’s look at the menu again!

Ohhh… that calzone looks good at $6.50 for a plain one (I’d have to sacrifice palate for hunger)!

Back to the number line…

7.50 + (1 * -6.50) = 7.50 - 6.50 = 1.00

Now we’re talking… still on the positive side. BUT I still need to add in that icy cold Coke (no matter what I eat, I need one to wash it down)!

1.00 + (-1.50) = -0.50

Poof… that just got blown out of the water by $0.50. I’m running out of options! Let’s see what else is on the menu!

Ahhh… garlic knots at $2.25. So maybe I can do a bag of knots, a slice of pizza and that icy cold Coke!

7.50 + (-3.50) + (-2.25) + (-1.50) = 7.50 - 7.25 = +0.25

Now we’re talking! Still on the positive side of zero… SO I guess I’ve got my lunch! Contentment!
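For the curious, here's the same number line walk as a small Python sketch. The prices come from the menu above; the function name and the use of `Decimal` (so dollars and cents don't pick up floating-point rounding) are just my illustration.

```python
from decimal import Decimal

BUDGET = Decimal("7.50")  # the cash in my pocket
MENU = {
    "slice": Decimal("3.50"),
    "coke": Decimal("1.50"),
    "calzone": Decimal("6.50"),
    "garlic knots": Decimal("2.25"),
}


def walk_number_line(order):
    """Start at +BUDGET and take one step left (a negative number) per item.
    Returns every position visited, starting point included."""
    position = BUDGET
    positions = [position]
    for item in order:
        position += -MENU[item]  # each purchase moves us toward (or past) zero
        positions.append(position)
    return positions


orders = [
    ["slice", "slice", "coke"],
    ["calzone", "coke"],
    ["slice", "garlic knots", "coke"],
]
for order in orders:
    final = walk_number_line(order)[-1]
    # Treating a landing exactly on 0 as still fed is my own assumption.
    verdict = "contentment" if final >= 0 else "Bill goes hungry"
    print(order, final, verdict)
```

Only the third order stays on the positive side, ending at 0.25.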

Is my example simple? Yes, BUT this is the kind of math we perform automatically every day without really putting any effort into it!

Stay tuned for primes and encryption next week!



During the course of our day, we are often asked to provide answers. What’s wrong with the server? Who’s going to take the late shift? How can we implement this? We rely on the skills we have developed over years of work and provide quick answers! Most times we’re right, but every so often, because we haven’t put the time into thinking things through, we are wrong! And the thing is, the powers that be don’t always consider the things that have gone well, BUT rather focus on those few times things went wrong!

So what are we to do? Look forward and anticipate next steps. Think logically about the problem. Come up with a game plan! Sure, following your gut will help, but only to a certain extent. While your initial response may be right most of the time, it is not always on the money! Part of preparation is mitigating risk. If we are studying for a test, we concentrate on what we feel are the important concepts… studying the longest on those concepts! You may get a few answers wrong, BUT if you do your homework you may still walk away with an A. You have to balance getting every question right versus getting most questions right. Acceptable losses!

Putting together a game plan often requires more time and effort than actually implementing a solution. Understanding the why is of the utmost importance! Knowing your audience and the reasons they need the technology is the key to a successful rollout. We are not here to simply implement the next coolest technology. We are here as enablers! We are here to match our understanding of technology to a clear business need!

Once you have the business need figured out, it’s time to start thinking about a rollout plan. What is the return on investment? How many man-hours will it take to get this done? Will this work in our environment? How will we run a proper pilot program? The list goes on and on. It is up to us to try to discover as many of the potential problems as we can BEFORE we put our solution into production! Take copious notes! Remember, you’ll need to reproduce this and introduce it into production. Focus on the important points. Understanding the scope of the project will help you determine whether or not an obstacle is truly an impediment to a successful rollout! Remember… there will always be problems; it’s just a matter of whether they are showstoppers!

This leads to how you are going to mitigate the problems you can’t get around. I’m in the middle of a fairly large migration from one directory service to another. It requires me to make changes to servers throughout the world. I can’t update all the servers globally in one weekend. SO… I’m going to have some users who will need to remember two passwords when they log in on Monday morning: one for their login, the other to access resources not yet migrated.

So how are we going to break down this problem?

THINKING THINGS THROUGH – In an optimal situation I’d be able to coordinate the migration globally. That would assume I had fully qualified people at each location with a server that needed to be upgraded. Unfortunately, this is not the case. Because we will have two directories running at the same time, users will have to remember two passwords and which one to use at the appropriate time.

UNDERSTANDING THE RISKS – For power users this shouldn’t really be a problem, but for novices this could lead to multiple failed logins that would have the helpdesk resetting passwords the first day after the implementation. So how do we avoid all the phone calls about failed logins on Monday morning? We could have the two directories trust each other, BUT that would require a lot of extra work.

MITIGATING THE RISKS – We can use Keychain Access (yes, we’re talking about an Open Directory migration) to securely store the passwords from the old directory. This gives the impression of single sign-on, alleviating the need for users to remember multiple passwords.

Yes, this is really simple, and it doesn’t truly depict the actual planning of the directory migration… if it did, I’d be out of a job! BUT the point is you need to think about what you’re trying to do. There really isn’t one right solution. The solution depends on how your organization is able to handle the situation at hand. Thinking through the issues ahead of time is most important. Create troubleshooting checklists to pass out to everyone involved (both end users and helpdesk staff). Test, test and test again. The more you plan, the better your chances of a successful rollout!

There is suffering.
Suffering has a cause.
Suffering has an end.
There is a path that leads to the end of suffering.

– Gautama Buddha

Security surrounding PDAs and other “smart-phones” is a complicated issue. I for one own an iPhone (but hopefully for not much longer)! I know… I know! Here comes the classic iPhone / BlackBerry debate. It’s been a hotly contested acquisition! IT would prefer I use a BlackBerry. They feel they have more control over the device, and in many respects they do… BUT they don’t want to pay my expenses, and I’d much rather have a richer Internet experience. Fortunately for me, many senior VPs in the organization wanted an iPhone as well.

Why give all the background?

Because sometimes technology is driven by the business and thus needs to be supported by IT. We need to find the best way to make these devices secure even though they may not have all the security bells and whistles IT is looking for.

These devices have allowed us to spend a little less time in the office and a little more time doing the things we want… But there is a cost. Sometimes in the course of our work we have to deal with data that is sensitive… whether it is of a military nature or mere intellectual property! The reality is these devices are now capable of holding a lot more information. In fact, some of these devices now let you extend their storage through SD cards! So how do we protect the company and the data we all work so hard to create? Corporate policy! We need clear guidelines as to what data we will allow on any device… and that includes USB thumb drives!

Most of us use these so-called smart-phones as glorified email and calendaring clients. BlackBerry and the iPhone offer differing levels of security over these devices… and both allow for remote wipe! BlackBerry does this through its proprietary server product… the iPhone relies on its implementation of Microsoft’s ActiveSync. Certainly RIM’s offering is a lot more feature-rich… but one needs to keep in mind the type of data we are protecting.

Email in many ways has become the ultimate corporate communication tool. I’ve recently rolled out a BPM solution where I work, and as I’ve been demo’ing the application, I’m constantly asked if the tool will email everyone involved in the project. And while it is possible, I stress that the tool is not a replacement for picking up the phone and speaking… collaborating… understanding! Another example… the people I support in Asia have tens of GBs worth of email, dating back 10 or more years. Why? To cover their bottoms! I think the need to cover one’s bottom is pervasive in many corporate cultures… and email is the perfect tool for it. Not only does one have it in writing; one can request delivery and read receipts too!

Just picture it… “There’s no denying you read my emails!” as I slap down a stack of printed copies like Perry Mason.

I bring up Perry Mason because, like it or not, we are a very litigious society! We sue over the smallest thing! Some rightfully so, other suits… ahh, not so much! E-discovery has become a big thing. In American law, discovery is the pre-trial phase in a lawsuit in which each party, through the law of civil procedure, can request documents and other evidence from other parties and can compel the production of evidence by using a subpoena (wikipedia.org, 2010). E-discovery, then, is the production of electronic evidence, which can include IM chat transcripts, Excel/Word documents, PDFs, web pages, source code, databases, graphic files or, in our case, emails. Not only does the defendant have to produce these documents; they need to provide complete records in a timely fashion. If the defendant does not comply, many jurors perceive this as… “They have something to hide.” These documents are required to be preserved. Additionally, the company disclosing these documents needs to provide a document detailing the extent of the search it conducted.

E-discovery is no small matter and requires a great deal of attention to adequately produce relevant documents. Systems need to be put into place to ensure e-discovery compliance. These include a stated policy on the retention of email distributed within a company. Centralizing data is another way to minimize the effort required to comply with discovery demands, as is organizing the data and providing mechanisms to rapidly search documents for specific keywords across the entire enterprise. Maintaining strong access controls over your data is essential to providing strong evidence! If a lawyer can prove that you didn’t have full control over your data, they can then argue that the data could have been tampered with, reducing its credibility in court.
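To make the keyword-search idea concrete, here's a toy inverted index in Python: each word maps to the set of documents containing it, so finding every email that mentions all of a set of terms is a quick set intersection instead of a scan of every mailbox. This is a hypothetical sketch with made-up document IDs, not any particular e-discovery product; a real platform would typically add things like stemming, phrase search, custodians and audit trails.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple stand-in for a real text analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class KeywordIndex:
    """Toy in-memory inverted index mapping each term to the IDs of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, *keywords):
        """Return the sorted IDs of documents containing every keyword (an AND query)."""
        if not keywords:
            return []
        # .get() avoids inserting empty entries for terms we've never seen
        matches = [self.postings.get(k.lower(), set()) for k in keywords]
        return sorted(set.intersection(*matches))


# Hypothetical documents, for illustration only.
index = KeywordIndex()
index.add("email-001", "Q3 pricing for the Acme contract")
index.add("email-002", "Lunch on Friday?")
index.add("memo-017", "Acme contract renewal notes")
print(index.search("acme", "contract"))  # ['email-001', 'memo-017']
```

The index is built once as data comes in, which is what makes the later search fast; that trade-off is the same reason centralizing data helps.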

Ultimately, being able to produce evidence in a timely fashion helps your credibility in court. Noncompliance can be costly as well! Fines and other legal sanctions can be placed upon an organization that fails to “protect” its data!

Resources:

Various (2010, February 9). Discovery (law). Retrieved February 23, 2010, from http://en.wikipedia.org/wiki/Discovery_(law)

Snow Day!


A quick message from Izzy to her Grandmoms!
