Showing posts with label systems architecture.

Wednesday, April 23, 2014

These are a Few of My Favorite Views (SQL with Fries and a Pepsi)

If you use System Center Configuration Manager 2007 or 2012, and you like to drink beer, play with firearms and flammable substances, and poke around in things like SQL Server Management Studio, well, I have no clue what you're thinking.  Just kidding.  There are three places where I can promise you I can spend a lot of time, calmly perusing the landscape:

  • A Lowe's or Home Depot store (preferably on a slow weeknight)
  • The beer section of a store like Total Wine
  • SQL Server Management Studio and a System Center Configuration Manager site database
The first two are not at all surprising.  Almost any guy who lives near a big hardware store or beer store is in the same boat (apologies to any recovering alcoholics out there).  But the third is my personal weakness.  In days long past, its place would have been taken by a good Lego kit, a stack of Marvel comic books, or a pile of Revell model kits or Estes rocket kits.  These days, having crossed the 50-year milestone, it's become more geeky.  Sure, I still love me some good woodworking projects (I just finished a badass bench swing in my backyard, all done by hand, no power tools. p-shaaaa!).

My favorite database views?
  • v_Advertisement
  • v_Collection
  • v_Package
  • v_GS_Computer_System
  • v_R_System
  • v_GS_Installed_Software_Categorized
  • v_GS_System_Enclosure
  • Any of the v_CM_COLL_XXXXXXXX collection views
Of course, I really dig poking around the site-related views (boundaries, discoveries, metering, etc.), as well as the operating system views, x86 memory stuff, collected files, software products and so on.

The saddest part of this?  I rattled those off from my pointy head, without being anywhere near the tables or views.  That's how deeply they've become ingrained in my squishy skull.

If you're as twisted as I am, open SQL Server Management Studio, right-click on the Views node and create a new view.  Drag in some of the ones named above, link them on ResourceID (or MachineID for some), and start exploring what kinds of cool things you can build on your own.  Data is like Legos to me now.
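
If you'd rather start from a script than the view designer, here's a minimal sketch of the kind of join I mean.  It assumes Invoke-Sqlcmd (from the SQLPS/SqlServer module), and the server name and site database name are made up, so swap in your own:

# Minimal sketch: join a few of my favorite views on ResourceID (server and database are placeholders).
$query = @"
SELECT  sys.Name0         AS ComputerName,
        cs.Manufacturer0  AS Manufacturer,
        cs.Model0         AS Model,
        se.SerialNumber0  AS SerialNumber
FROM    v_R_System sys
JOIN    v_GS_Computer_System  cs ON cs.ResourceID = sys.ResourceID
JOIN    v_GS_System_Enclosure se ON se.ResourceID = sys.ResourceID
ORDER BY sys.Name0
"@
Invoke-Sqlcmd -ServerInstance 'CMSQL01' -Database 'CM_ABC' -Query $query

Once the joins on ResourceID click, the rest of the views fall into place pretty quickly.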

Sunday, April 20, 2014

IT Catastrophes: Triage and Compression with Fries and Coke

Triage (noun)
The sorting of patients (as in an emergency room) according to the urgency of their need for care.
Compression (noun)
The state of being compressed (i.e., reduced in size or volume, as by pressure).
(source: Merriam-Webster online dictionary)
This is another one of my silly-brained IT monologues about subjects which are rarely discussed.

What I'm talking about is a loose comparison and contrast of these two terms as they relate to the medical and technology fields.  It is, however, a very real subject for those of us who occasionally deal with critical outages, especially those which involve things like:

  • Highly-Available Hosting Services (think: Google, Microsoft, Facebook, etc.)
  • Mission Critical Systems (think: Defense, Lifesaving, etc.)
  • Service Level Agreements (the dreaded SLAs)
It's kind of funny how most businesses feel their operations are "mission critical" or "highly available", but that's a subjective call.  From an objective view, it's not always that "critical" when things go "down" for a few minutes, or even a few hours.

By the way: Compression, as it pertains to this article, refers to the compression of the time in which you have to operate.  The time between a failure and sufficient restoration of services.

When dealing with a system outage in a truly "critical" environment, the first steps are pretty much the same as what an Emergency Medical Technician (EMT) would have to consider:
  1. What exactly is not working?
  2. How serious is the impact?
  3. What is known about what led to this outage?
  4. How long has it been down?
  5. How much time is left?
You were probably thinking of the Who, What, Where, When, Why and How sequence.  I kind of tripped you up with two What's and three How's.  (Technically, #4 could be a "when", and #2 could be a "who" or "where", but whatever).  Let's move along.

With regards to a human, the general rule of thumb is 4-6 minutes, total.  That's about how long the brain can go without oxygen and still recover.  Chest-compression CPR is usually the first course of action to sustain blood flow, keeping the remaining oxygen-rich blood reserves moving through the brain.  Enough pseudo-medical blabbering.  The main point is that there is a "first course of action" to resort to in most cases.

What aspects are shared between a medical outage and an IT system outage?
  • There are measurable limits to assessing what can be saved and how
  • There are identifiable considerations with regards to impact on various courses of action
  • Techniques can be developed and stored for more efficient use when needed
  • Steps can be taken to identify probable risks and apply risk mitigation
With regards to a system-wide outage, the general rule of thumb is not as clear-cut as the 4-6 minute rule.  It truly varies by what the system does and who (or what) it supports.  Consider the following two scenarios:

Scenario 1

The interplanetary asteroid-tracking system you maintain is monitoring a projectile traveling at extremely high velocity toward planet Earth.  The system "goes down" during a window of time in which it would have been able to assess a specific degree of variation in the projectile's trajectory.  The possible margin of error from the last known projected path could have it hit the Earth, or miss by a few hundred miles.  The sooner the system is back online, the sooner a more precise forecast can be derived.

Every hour the system is offline, the margin of error could potentially be re-factored (and reduced) by a considerable amount, possibly ruling out a direct hit.  The best estimate of a direct impact places the date and time somewhere around one year from right now.  Your advisers state that it would require at least six months to prepare and launch an interceptor vehicle in time to deflect or divert the projectile away from a direct Earth impact.

Scenario 2

Your order-tracking system for MySuckyShoes.com is down and customers are unable to place orders for new sucky shoes.  Your financial manager estimates that during this particular period of the year, based on past projections combined with figures collected up until the outage, every hour the system is offline you are losing $500,000 in potential sales revenue.  The system has reportedly been offline for two hours.  So far, that's a million bucks.

Which of these scenarios is more critical? 

Answer: It depends

What are the takeaways from each scenario?
  • How long do you have to restore operations before things get really bad?
  • Having the time window defined, what options can you consider to diagnose and restore services?
  • How prepared are you with regards to the outage at hand?
  • What resources are at your disposal, for how long, and how soon?
In the first scenario, you have roughly six months to get things going.  Odds are generally good that you can restore services much sooner than that, but what if the outage was caused by an earthquake that decimated your entire main data center?  Ouch.

In the second scenario, the margin would depend on the objective scale of revenue your business could withstand losing.  If you're Google, a million dollar outage might be bad, but not catastrophic.  If you're a much smaller business, it could wipe you out entirely.

What's really most important (besides the questions about what systems are down, why, when and how) is knowing what the "limits" are.  Remember the 4-6 minute rule?  SLAs are obviously important, but an SLA is like a life insurance policy, not like the conversation between the EMT in the ambulance and the attending physician back at the hospital ER.  One is prescriptive and didactic.  The other is matter-of-fact, holy shit, no time to f*** around.

QUESTION:  When was the last time you or your organization sat down and clearly defined what losses it can absorb, and where the line is past which you would have to consider filing for bankruptcy?

Is your IT infrastructure REALLY critical to the business, or just really important?  In other words: could your business continue to operate at ANY level without the system in operation?

Forget all the confidence you have in your DR capabilities for just a minute.  Imagine if ALL of your incredibly awesome risk avoidance preparation were to fail.  How long could you last as a business?  At what point would you lose your job?  At what point would your department, division or unit fail?  At what point would the organization fail?  Or do you think it's fail-proof?


Thursday, March 20, 2014

Stick Shift or Automatic: Software Deployment by the Numbers

No.  I'm not trying to promote sales of any books/ebooks, not even my own (cough-cough).  I am about to dive into a murky subject that many Windows-environment Systems Administrators and Systems Engineers have a very tough time understanding.  And, as if that wasn't enough, I would venture to bet that very, very, veerrrrrrrry few Business Analysts, Project Managers and executive management folks have even a basic grasp of this subject.

The irony is that this is about as fundamental to any medium-to-large-scale IT environment as bricks are to building a big house.  But even a small business can benefit from this, so don't scoff and walk away just yet.

What am I talking about?...

Manual versus Automated Software Product installation, and Repackaging.

Two things which fit together like peas and carrots.  Or hookers and politicians.  Or Florida and hurricanes.

Yes.  It's 2014, and from what I can tell there is a frightening number of so-called "technology professionals" who sincerely believe that there is little or no difference in terms of cost or quality between these two approaches.  These two diametrically-opposed approaches, that is.  In fact, many think that manual installations are cheaper, even when dozens of installations are involved per product.  I am not joking, nor am I exaggerating.  Please read on.

Most of them, if fed enough beer, or Xanax, would bet their retirement interest on the assumption that the differences between these two are as close as driving a vehicle with stick-shift versus an automatic transmission.  That this is a monolithic, linear-scale, pound-for-pound comparison.

It's a good thing for them that the retirement interest on their fortunes is almost invisible to their overall balance sheet.  A few zeros can get lost in all that green I'm sure.  When you dig into the real numbers, the comparison is about as close as a boxing match between Mike Tyson on his "best day ever" and Richard Simmons after getting a root canal.

All kidding aside, let's do some math. mmkay?

Quasi-Scientific Analysis

Let's say product "Fubar 2014", from a well-known vendor, is required by 500 of your employees in order to "do their assigned job duties".  You have a minimum-wage minion whip out a stopwatch and begin timing the installation on the first five computers.  The minion tallies up the results and hands them to you.  It goes something like this:

  1. Technician walks over to computer "123" in building 42, up on the 3rd floor, in room 112 and sits down. Time spent getting there by foot/bicycle/car/private jet/yacht or teleporter is, on average, 5 minutes.
  2. Technician then logs on and opens Windows Explorer. 2 minutes (waits for initial profile setup)
  3. Navigates to central file server share on the network (Active Directory domain environment) to locate the folder containing Fubar 2014 setup files and related files.  1 minute.
  4. Navigates to "prereqs" subfolder to install individual products which are required before installing Fubar 2014:  Java  Runtime 1.5 (vendor says it can't work with 1.6 or later), then Apple Quicktime 7.1, Adobe Flash Player 11.5 (Fubar 2014 won't work with version 12), and a few other items.  10 minutes.
  5. Double-clicks on Fubar 2014 "setup.exe" file to launch main setup. Clicks Next on the Welcome page.
  6. Accepts default for installation target folder path, clicks Next
  7. Checks the EULA terms and enters a license product key.  Clicks Next.
  8. Waits for installation to complete. 8 minutes.
  9. Goes back into Windows Explorer and right-clicks on a folder under C:\Program Files\Fubar to open the Properties form.  Modifies the NTFS permissions to allow members of the local "Users" group to have Modify/Change permissions on that folder and all sub-folders.  This is required since the users do not have local Administrator rights, so UAC has been a problem.  This has been known to resolve the problem, so your tech goes ahead with the routine modification.  5 minutes.
  10. Tech goes into Windows Services and disables a service that Fubar 2014 uses to check for periodic updates, which users cannot install without elevated permissions, so this is a standard practice at your shop to disable it.  2 minutes
  11. Tech opens REGEDIT, navigates down to HKEY_LOCAL_MACHINE\Software\Fubar\Fubar 2014\Settings\ and changes the value of "autoupdate" from 1 to 0.  1 minute.
  12. Tech reboots computer, and waits for login screen to log back on.  2 minutes.
  13. Tech logs back on (1 minute or less) and launches Fubar 2014 to confirm it works.  While still opened, Tech navigates into settings to change the option (Tools / Options / Data) to set the "Default library location" to a central UNC server path where all the users share templates and common items to maintain standards.  2 minutes.
  14. Tech closes Fubar 2014 and logs off.
  15. Tech goes on to next location and repeats this process.
Paying the Bill

If you kept track of the time spent above, that's 5+2+1+10+8+5+2+1+2+2 or 38 minutes.  That's without ANY interruptions or unexpected problems.  And that's assuming the computers are relatively new and performing well.

In reality, from tests I have witnessed over the past 5 years alone, in various enterprise environments from 5,000 to 50,000 computers, the average time to perform an installation of this magnitude is roughly between 35 and 50 minutes.  

When performed during business hours with people around in close proximity, the times averaged 45 minutes to 1 hour.

When additional problems had to be resolved, such as missing updates, recovering disk space, removing conflicting components, that range increased to around 1 hour 20 minutes to 1 hour 50 minutes.  

I haven't even mentioned:
  • Time spent deactivating old licenses
  • Time spent activating new licenses
  • Time spent dealing with device drivers
  • Time spent dealing with custom network interface settings
  • Time spent on the phone dealing with vendor support:
    • Large vendor: waiting on line, listening to 70's pop music, interlaced with endless repeats of ads for their other products, like their "new cloud services".  awesome.
    • Small vendor: waiting for guy (company owner/programmer/tester/web admin/support rep) to move his cat off the desk so he can flip through his paper stack to find your purchase order.
  • Impact on end-users while they wait for the tech to do their work
  • Impact on production from unexpected conflicts with other line-of-business products which are only discovered after the installation because there was no lead-time testing afforded.
In situations where a previous version had to first be uninstalled before performing a new install of the later version (usually because the vendor didn't want to take the time to handle this within their new installation package) the time ranges increase to around 2 hours to 2 hours 30 minutes.

Simple:  35 - 50 minutes
Complex:  120 - 150 minutes

In beancounter English: that's a range of roughly 1 hour to 2-1/2 hours.

Repeat this times 500 and you get anywhere from 316 hours (simple) to 1125 hours (pain in the ass).

Multiply that by technician labor of, say, $9/hour (you're a cheap bastard, after all), and that equates to roughly $2,850 to $10,125 of labor.  For ONE software product.
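
If you want to sanity-check my beancounter math, or plug in your own numbers, it fits in a few throwaway lines of PowerShell (the figures below are the ones from this post):

# Back-of-the-napkin labor math from the manual-install scenario above.
$seats       = 500
$hourlyRate  = 9       # the cheap-bastard tech rate, in dollars
$simpleMins  = 38      # best case per seat
$complexMins = 135     # midpoint of the 120-150 minute range
$simpleHours  = ($seats * $simpleMins)  / 60
$complexHours = ($seats * $complexMins) / 60
"Simple : {0:N0} hours, about {1:C0} in labor" -f $simpleHours,  ($simpleHours  * $hourlyRate)
"Complex: {0:N0} hours, about {1:C0} in labor" -f $complexHours, ($complexHours * $hourlyRate)
# Spits out roughly 317 hours / $2,850 and 1,125 hours / $10,125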

I'd guess you probably have more than a few products that would be handled this same way across your organization.

Are you starting to see where this is going yet?

Sanity Check Time

Now, let's crank this puppy through ONE cycle of repackaging effort and see how this spews out the other end of the meat grinder.
  1. Software Package Engineer (hereinafter SPE) opens a CMD console within a guest virtual machine (VM) running inside of VMware Workstation or Microsoft Hyper-V (take your pick).
  2. Navigates to folder where Fubar files are stored.
  3. Launches setup.exe -r and completes a normal setup process.
  4. SPE grabs the resulting setup.iss file from C:\Windows and copies it into a new folder along with the original setup files.  5 minutes total by now.
  5. SPE opens a text/code editor and loads a template script to enter some lines to handle checks for prerequisites like JRE, Silverlight, Quicktime and so forth.
  6. SPE enters code to invoke the setup.exe with setup.iss and redirect the output to a new log file (see the sketch just after this list).  Total of 15 minutes by now.
  7. SPE saves script and puts all the files into the new deployment source folder.  SPE launches a separate VM, which is configured to match the configuration of the computers in use by employees who will be getting the installation.  SPE runs the install using a command shell to ensure it runs "silent" and requires no user interaction whatsoever.  Total runtime, including launching the VM and logging on is now around 30 minutes.
  8. SPE emails or IM's the designated customer SME (that's subject-matter-expert) who was nominated to be the "test user" and asks them to log into the VM using Remote Desktop and kick the tires.  Time spent contacting the customer about 1 minute.
  9. SPE moves on to work on other packages or tasks while waiting for customer to respond and attempt the testing (parlance: User Acceptance Testing, or "UAT")  No time expended by SPE during this period by the way.
  10. Customer gives the package "two thumbs-up!" and the SPE moves it into staging for production deployment.  SPE creates a new "Application" in System Center Configuration Manager 2012, creates a resource Collection with the computers to be targeted, and assigns the Application to the Collection with a Deployment (the 2012 successor to the old Advertisement).  10 minutes (he's drinking decaf this morning)
  11. The deployment is scheduled to run after hours, avoiding impact on production time for the customer staff who will receive the new installation.  The SPE does not have to wait for the installation because it is scheduled to run on its own, so he/she checks on it the next morning.
Total time spent:  5+15+30+1+10 = 61 minutes.
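
For the curious, the wrapper mentioned in steps 5 and 6 doesn't need to be fancy.  Here's a minimal PowerShell sketch, assuming an InstallShield-based setup that replays the recorded setup.iss response file (the /s, /f1 and /f2 switches are standard InstallShield fare; the paths, registry key and prerequisite check are made up for illustration):

# Silent-install wrapper sketch for "Fubar 2014" (every path here is a placeholder).
$source  = '\\FS01\Packages\Fubar2014'
$logFile = 'C:\Windows\Temp\Fubar2014_install.log'

# Prerequisite check example: bail out if the required JRE 1.5 isn't there.
if (-not (Test-Path 'HKLM:\SOFTWARE\JavaSoft\Java Runtime Environment\1.5')) {
    Write-Output 'JRE 1.5 missing - install prerequisites first.'
    exit 1
}

# Replay the recorded InstallShield response file silently and capture a log.
$setupArgs = "/s /f1`"$source\setup.iss`" /f2`"$logFile`""
$proc = Start-Process -FilePath "$source\setup.exe" -ArgumentList $setupArgs -Wait -PassThru
exit $proc.ExitCode

Point the SCCM program (or 2012 deployment type) at that script and the per-seat labor drops to roughly whatever the electricity costs.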

I realize I said "he" a lot, but "she" could do just as well obviously, so that's irrelevant.

Things I didn't include:
  • UAT problems resulting in having to go back and make adjustments and retesting
  • Pre-flight deployments to verify conflicts in production on a limited subset of computers.
  • Next-day support calls for incidental one-offs: machines that were offline, a service that was stopped, or an application that was open and holding a lock on a file, preventing the installation from completing successfully.
  • Cats walking across keyboards and causing a BSOD.
  • Who knows what else.
Taking those things into account, the range can jump from 60-80 minutes to around 2 hours, even for a simple repackaging effort like the Fubar 2014 example.

In the "real world" some products can be much more difficult to repackage and may consume days or weeks of development and testing in order to get a rock-solid package into production.  Those are rare, but even then, EVEN THEN, the savings when calculated across hundreds or thousands of computers, spread across multiple locations, states or continents, can be well worth the effort.  

Think of "mission critical" applications, where the time window to get them into production, with 99.999% success rate, is only an hour or two, end to end, over 20,000 or 30,000 computers.  That's not fiction.  There are industries where this is not uncommon, and they rely heavily on this methodology to ensure:
  • Highest probability of success
  • Consistent and predictable results
  • Minimized impact on production operations
  • Optimum transparency of all moving parts (think reporting and monitoring)
Steak Dinner Anyone?

So, this SPE makes $75,000 a year in USD, roughly $36/hour, and spent an hour building this simple package.  That's $36 to deploy one product to 500 computers over one evening without asking any users to step away from their computers during work hours.  

The cheapest scenario in the first example was $2,850.
The most expensive scenario in the latter example was $1,440.

Even if the SPE had to devote an entire week to one product, or roughly 40 x $36 = $1,440, that's a LOT CHEAPER than a $9/hour tech running around to 500 computers, or 10 x $9/hour techs running around to 50 computers each.

That's Not All

Billy Mays homage:  Now, if you go with the repackaging and automated deployment scenario, you have a mechanism in place that does the following for you without ANY additional cost:
  • Provides automatic self-healing of deployments, to cover situations where a targeted computer is replaced or reimaged with a fresh Windows configuration.
  • Provides simple, effortless scalability for future growth.
  • Provides a robust auditing and reporting trail for accountability and compliance.
  • Provides fault tolerance
  • Provides coverage during non-production hours.
Still think that thumb drive is the way to go?  Hmmmm?


Thursday, January 9, 2014

The Angry Admin: Time for a Rant about Naming Conventions

If you work with network administration of ANY kind (file and print services, directory administration, web applications, software deployments, etc.), pay attention:

Take a moment to review how you name things.

Devices: Servers, Desktops, Laptops, Tablets, Mobile Devices, Printers, Printer Pools/Groups.

Resources:  Software Packages, Log Files, Folders, Shares.

Active Directory: User Accounts, Security Groups, Contacts, Group Policy Objects, WIM Files, Forests, Domains, Sites and Subnets, Site Links.

SCCM: Packages, Collections, Advertisements, Task Sequences, Boundaries, Catalog Descriptions.

All of it.  EVERY F-ING THING.

Then, ask yourself this:  If I had to manage 50,000 of EACH of these things, what would make it easier to filter out and find things by logical grouping?


  • By type or category (desktop, laptop, high-end vs cheap-crap, servers, physical vs. virtual)
  • By organizational entity (department, division, group, sector, team, project, etc.)
  • By security role (admins, users, special roles, etc.)
You might laugh this off or smirk, but om-f-ing-g: this comes back to bite the living shit out of more people than I can count.  If I had a dollar for every time I've heard an SE, SA or whatever say "damn, I wish I had named this stuff like ____", I would be retired and living on a tropical island.  My own island.  With my own castle and airplane and an armada of jet skis to go with it.

Just do it.

Tuesday, October 9, 2012

Configuration Manager: Database Exploration, Part 2 - Notes

Before I continue on with this "theme" that I've started, there are some very important issues I need to discuss.  Rather than boring you with a long introduction, I will just dive in and hit each one as I go.  I'll warn you that this article is taking a sharp turn into a dark tunnel of seriousness.  No joking around here.  Very unlike my usual goofy stuff, but it's important to cover this before I continue.

SMS Provider vs. SQL Server

From a "purely technical" aspect, you can interact with, and manage, a Configuration Manager site data store through the SMS Provider interface, or through SQL Server (ADO or ADO.NET, etc.), however, you absolutely NEED to be careful to avoid some easy mistakes.  This is all within the context of building custom applications which interface with your Configuration Manager infrastructure.  This is also regardless of whether you are working with Configuration Manager 2007 or 2012.
  • While you can query (retrieve) information from either interface, the SQL interface is usually much faster to execute. I'm obviously talking about using the ADO or ADO.NET pipeline.  However...
  • NEVER attempt to update anything directly through the SQL Server interface!  All operations that involve modifying site resources, collections, or settings (and so on) should be performed through the SMS Provider only.  Some examples include adding a Package to a Distribution Point, or adding a Resource to a direct-membership Collection.  Going around the SMS Provider can cause serious problems for your Configuration Manager site.  I'll spare you the lengthy explanation of how the inboxes and outboxes are spooled and de-spooled in the background, and how it all weaves in and out of the database.  (See the sketch just after this list.)
  • Executing intensive queries (or updates, for that matter) against the SMS Provider interface can impact Configuration Manager processing, especially if performed at peak processing times (discovery cycles, software deployments, etc.).  The net result may cause a backlog in data processing and show up in your component status logs as well.  Try to limit such activity to off-peak times or days to avoid impacting Configuration Manager itself.
  • Executing intensive queries directly against the site SQL Server database may also impact performance, and should be carefully monitored by using SQL profiling and performance logs to determine the level and duration of such impact.
  • Use the most efficient tool to handle a specific task:  If you are post-processing query results and spending a lot of code cycles calculating date differences, cost values, or mapping integers to string values - do that instead within the query!!!  SQL is so much faster and more efficient at many common data manipulation tasks than standard 3GL, 4GL programming languages or scripts.
  • Minimize Connections!  If you have code firing off multiple queries, be sure to pay close attention to how you open and close your data connections.  If you can use one connection for all of your queries, do it.  It will save time and reduce the overhead impact on the data store host itself. This is true for using SQL Server or the SMS Provider.
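
To make that read-versus-write split concrete, here's a minimal sketch: reporting-style reads go straight at SQL, while anything that changes the site goes through the SMS Provider's WMI namespace.  The server names, site code (ABC), collection ID and database name are all placeholders, and it assumes Invoke-Sqlcmd is available:

# READ: fast, read-only T-SQL against the site database (placeholders throughout).
Invoke-Sqlcmd -ServerInstance 'CMSQL01' -Database 'CM_ABC' -Query @"
SELECT TOP 20 Netbios_Name0, Operating_System_Name_and0
FROM v_R_System
ORDER BY Netbios_Name0
"@

# WRITE (anything that changes the site): go through the SMS Provider instead.
# Example: grab a collection object; membership changes would be made with its provider
# methods (AddMembershipRule and friends), never with an UPDATE or INSERT against SQL.
$coll = Get-WmiObject -ComputerName 'CMPRI01' -Namespace 'root\sms\site_ABC' `
                      -Class SMS_Collection -Filter "CollectionID='ABC00012'"
$coll | Select-Object CollectionID, Name

The point isn't the specific query; it's that when you're writing, the arrow only ever points at the provider.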

Database Separation and Isolation

Most any DBA with a fair amount of experience will advise you to avoid direct interaction with "mission critical" data stores if you can instead use a replica.  It really boils down to how time-sensitive the information is that you rely upon to accomplish the required task.  If you need to generate inventory reports, and your inventory is only updated every day or week, you probably could do just fine by pointing your queries at a replica database and avoid adding more overhead on your production database.  It's just one more thing to consider if you are worried about performance impact.

The Right Tools

If you haven't used SQL Server Management Studio, or haven't used it much, give it a try.  In fact, if you're testing your queries through your code debugger, STOP.  That's a bad habit and can yield some very skewed results.  As the old saying goes: "Just because you CAN, doesn't mean you SHOULD".  I can't count the number of times I've asked a programmer to minimize their code debugger and run the same queries in the SSMS console, and seen their reaction to how different the performance can be.  It can really highlight where program code is slowing down a conversion or calculation step that could be more efficiently executed within the SQL statement.  

It's not really about SSMS.  Any tool that lets you model and execute T-SQL statements directly against the data store will work fine.  It's when you run the SQL expressions from within the program code that things can get twisted.  Eliminating secondary and tertiary processing layers ensures you get an accurate, honest and clear picture of what's going on.
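
As a hypothetical example of what I mean by pushing the work into the query: instead of looping over rows in script to figure out how stale each client's heartbeat is, let SQL do the date math and hand you the finished answer.  The view and column names here are the standard ConfigMgr ones (v_R_System, v_AgentDiscoveries), but verify against your own site database; the server and database names are placeholders:

# Hypothetical example: let SQL compute "days since last heartbeat" instead of post-processing rows in script.
$staleDays = 30
Invoke-Sqlcmd -ServerInstance 'CMSQL01' -Database 'CM_ABC' -Query @"
SELECT sys.Netbios_Name0                      AS ComputerName,
       DATEDIFF(DAY, ad.AgentTime, GETDATE()) AS DaysSinceHeartbeat
FROM   v_R_System sys
JOIN   v_AgentDiscoveries ad ON ad.ResourceId = sys.ResourceID
WHERE  ad.AgentName = 'Heartbeat Discovery'
  AND  DATEDIFF(DAY, ad.AgentTime, GETDATE()) > $staleDays
ORDER BY DaysSinceHeartbeat DESC
"@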

Safety

Living on the edge is cool, if you get paid to do commercials for Red Bull.  For the rest of us, it helps if we take certain precautions to avoid letting simple mistakes explode into disastrous calamities.  If you have the option of a test environment, use it.  If not, employ test-environment methods to mitigate unintentional impact on production systems.  It's really that simple.

Thursday, June 28, 2012

Jobs. Jobs. Jobs. IT Jobs

My employer, Endurance IT Services, located in Virginia Beach, Virginia, is hiring.  Yes!


(2) Engineer (Tier II) - Windows/VMware/Exchange (Hampton Roads)


Looking for a Systems Engineer with a minimum of 7 years of hands-on experience with diverse network environments.  This position is for a full-time opening on our engineering team.  This team is responsible for the design, installation and support of networks for numerous clients in Hampton Roads.  The typical environments include Microsoft Windows 2003/2008 Servers, Exchange 2003/2007/2010, Cisco routers/switches/firewalls and other 3rd-party applications.  Bachelor's degree preferred.  Ability to assess and formally document client environments is a must.


Key Skills include all the items below:

  • Microsoft Exchange 2007 / 2010
  • Windows Server 2008 / 2008 R2
  • Active Directory
  • Networking with various products


(2) Senior Engineer (Tier III) - Windows/VMware/Exchange (Hampton Roads)


Senior Systems Engineer with a minimum of 15 years of hands-on experience with diverse network environments.  This position is for a full-time opening on our engineering team.  This team is responsible for the design, installation and support of networks for numerous clients in Hampton Roads.  The typical environments include Microsoft Windows 2003/2008 Servers, Exchange 2003/2007/2010, VMware vSphere, SANs, Cisco routers/switches/firewalls and other 3rd-party applications.  Bachelor's degree preferred.  Ability to assess and formally document client environments is a must.


Key Skills include all the items below:

  • VMware in HA environments
  • Microsoft Exchange 2007 / 2010
  • Windows Server 2008 / 2008 R2
  • Networking with various products

We are looking for strong candidates with technical support skills and great interpersonal skills.



We are a fast growing company that is strongly focused on customer service and satisfaction. We are building a corporate culture which supports learning, growth and advancement in the Network Services career field. Our focus is solely network services.  We offer competitive salaries, a comprehensive benefits package, and a great place to work.

(A LOT of employers say the above mumbo-jumbo, but I will concur 110% that it is indeed a great place to work.)



If you live within the Hampton Roads area, and are both qualified and interested in one of these available positions - contact me for more information at ds0934 (at) gmail (dot) com.

Tuesday, January 24, 2012

Mixing Oil and Water: Systems Engineering and Software Development

In all my years working this field we call "IT" I have never seen an environment where these two groups coexist cohesively and with a common goal.  Systems Engineering and Software Development.  Sure, they might get along well, and share working spaces, but as functional entities they typically exist in separate worlds, chasing their own beasts to slay.  And why not?  After all, they serve different purposes, for different masters (usually). They use different tools and techniques.  They often have very different reasons for doing things as well.


The Developers want to create and update things.
The Engineers want to implement and upgrade things.

Hmmm.  Those aren't that different after all.  Are they?  I know some folks would argue that engineers want to implement and "maintain" things, but maintenance is really an operations role, which in most cases falls into the hands of Systems Administrators (or "sysadmins" to use contemporary parlance).  Engineers want to engineer things.  Simple enough.  However, maintaining isn't really engineering; it's administering: keeping things moving along without interruption.

Developers want to create new things.  Write new code, new features, new widgets.  Add new capabilities or make the interface more "cool" and intuitive.  They don't want to maintain things any more than engineers do, even though they have to deal with bug fixes, patches, service packs and regression testing.  Life sucks.  Nobody said the gravy doesn't have lumps every now and then.  Engineers have to deal with those lumps too, especially when transitioning new things into the hands of the SysAdmins.  It's life (and it happens to pay pretty darn well, thank you).

So, if these two camps are so similar, and they often possess such awesome potential, why then are they not more often exploited toward a common goal?

Prologue

I'm writing this because this quasi-mythical world is precisely where I've found myself repeatedly over the past twenty-odd years.  You see, I started out as a draftsman (on the board, with sepia, paper and Mylar, thank you), and was part of the wave of folks in the 1980's who got corralled into the emerging world of CAD, or Computer Aided Design.  That led to customizing the CAD toolset, which sucked me into the world of software development and programming like a drunk sailor in front of a Thai brothel (no, I was never in the Navy, nor have I ever been to Thailand, the phrase just seems to fit).

After years of that, I drifted into "mainstream" IT by way of helping an old friend implement SMS 2.0 in a large corporate environment (ok, he did 99% of the work, I just handed him some wrenches), but it was enough to turn on a light bulb in my head about the incredible potential of a networked environment, the tools, frameworks and protocols that make it possible to leverage a whole new level of capability and productivity on a much larger scale.  LDAP, ADSI, WMI, TCP/IP, SNMP, the Windows API stacks, Registry, Event logs, alert mechanisms, and so on.  Then came ever-increasing power and simplicity with enterprise databases (Oracle, SQL Server), and the arrival of web technologies.

Oh man, I was so there.  SO THERE. It was like a labor camp escapee wandering into a Ruth's Chris restaurant on "free dinner" night.

Even now, even when I get involved in another project to inject Development into Engineering, it's the exception, never the rule.  In most cases it creates such an oddity that neither camp gets offended, just more or less confused about the value.  That is, until the project starts to bear fruit.  Then I get Engineers asking about the development aspects, and Developers asking about the infrastructure engineering aspects.  It's kind of like the old Reese's Peanut Butter Cup ads from the 1980's.

Continuing On...

As Yogi Berra might have said, I've been doing the same thing over and over again, only differently each time.  I am handed some problems to solve, which span multiple systems, multiple departments, multiple environments, multiple business cultures, multiple security realms, multiple personalities, and asked to somehow build something of a bridge across all of it to get the traffic (information) moving.  I don't really have a name for it.  Some call it BPA, or Business Process Automation, or Systems Automation, or Data Mining (not a really great term).  Some don't know what to call it, so they just give a description like "that stuff Dave is working on".  But whatever it might be called, it's where these two worlds intersect.  And the result is quite a lot like the Super Hero Action League touching their power rings together (only without the "shazamm!!!" sound effect).

A lot of vendors would argue you can solve almost every problem with an out-of-the-box solution, but anyone who's worked in this realm for more than a decade knows that's as realistic as pocket-sized nuclear fusion power.  Maybe someday.  Maybe someday.

Putting this a slightly different way: any time you implement something to abbreviate the steps required to perform a repetitive task, it most likely involves writing a script, a program, or configuring a bunch of options in a widget to make it happen.  This borders on Software Development.  With the advent of PowerShell, especially the concerted effort behind its proliferation, we're seeing a surge in the popularity and prevalence of SysAdmins writing scripts to handle chores that were once considered off limits.  Maybe it was because of a bad perception about VBScript or batch scripting?  Maybe it's because PowerShell is becoming so intrinsically a part of more and more Microsoft (and VMware) products?  Maybe it's because of its Spartan syntax and brevity (at the command line, at least; not always so within cmdlets)?
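
To illustrate the brevity I'm talking about, here's a made-up one-liner (placeholder computer name) that does what used to take a page of VBScript, WMI moniker strings and error handling:

# One line of WMI via PowerShell: name, OS version and last boot time of a remote box.
Get-WmiObject -Class Win32_OperatingSystem -ComputerName 'SERVER01' |
    Select-Object CSName, Version, @{ n='LastBoot'; e={ $_.ConvertToDateTime($_.LastBootUpTime) } }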

I've met so many people in this line of work who have strong feelings about "best practices".  Many are of the opinion that unless there's a ready-made product to do something, it shouldn't be done by other means.  This is not only short-sighted and foolish, it's downright disingenuous towards the employer.  If there exists a means to solve a problem now, even if it requires a little elbow grease, you owe it to yourself, your employer, and your customers to consider it.  Especially if this involves existing means which are "free" (provided with the products already purchased: Windows, etc.)

Microsoft didn't provide this insanely broad inventory of API's, protocols, command tools, and utilities for talking about at dinner parties.  They were provided for your benefit, and the benefit of their customers to build things to do more than they do "out of the box".   Don't be afraid.  Don't fear change.  The "Soft" in Software was Charles Babbage's gift to all of us that freed us of the Iron shackles to hardware.  It was intended, from the very start, to be "changeable".  To allow us to adapt processes and capabilities to meet newer challenges.

If you ever, and I mean EVER, enjoyed building things as a child, whether it be Legos, model kits, model rockets, Lincoln Logs, or even stacking wooden blocks, you should find a lot to enjoy building things on the Windows platform.  Heck, any platform to be honest, but for this discussion I'm thinking along the lines of Windows.  It seems that either a lot of grown-ups forgot how much fun that experience was, or they never got into it in the first place.  If you consider yourself an Engineer, and ever wondered what it was that Developers found appealing in their work, it's the fascination with putting things together from scratch and seeing them perform when finished.  Engineers get that sensation as well, but at a higher level in the logical technology stack.

In most cases, Engineers are working with components which were provided by Developers, but this is where it often stands and never changes.  There's still room for development during the Engineering phase.  When the out-of-box components don't hit every note in the song, it can do wonders to get a Developer involved.  In most cases, if the right people are involved, the song comes out even better than expected and ideas start popping out for new potential and new directions.

Conclusion

IT is all about change.  It's an intrinsic part of the fabric of technology.  To fear it is to reject its very purpose.  Nothing demonstrates, or proves, the value in embracing this more than seeking a convergence between Development and Engineering.  Nothing.  The teaming of the "thinkers" with the "do-ers" is the ultimate way to push it as far as it can go.  The longer these two worlds are kept isolated, or at the very least un-involved, the less seriously we take this profession and the more handicapped we leave our employers as a result.

Saturday, December 17, 2011

A Circle of Imaginary Links

Anyone who has worked with Configuration Manager for a while knows about Packages, Programs, Advertisements, and Collections.  They also know the difference between "Software Products" and "Add Remove Programs" as it pertains to inventory reporting.   Seasoned software packagers (or more appropriately termed "repackagers") know about creating .MSI and bootstrap .EXE InstallShield packages as well.  But there's a sinister problem lurking in all of this...

The names assigned to Packages, Programs, Advertisements and Collections are arbitrary.  Sure, many places have adopted standards for naming things, but still, it's a human dependency.

The names assigned to an application are controlled by the vendors, but only if they created the application.  And then there's the issue of how well they apply consistent naming standards to their individual components. 

Just yesterday I was poking around to find how many installations of a product at version "7.11.3" were in our environment.  As I watched the report start to build, it was showing "7.10" and even "7.9.1" installations as well.  But the engineers assured me that the package they deployed had uninstalled previous versions before initializing the new installation.  After some investigation, it turns out the vendor left components in their 7.11.3 installer that still identified themselves as "Product Name 7.9.1" so Configuration Manager dutifully picked it up and reported it.  Switching over to the ARP (that's Add or Remove Programs list) report, it correctly showed 7.11.3 was the only installation.

I won't even get into the products that dump garbage components on computers that identify themselves with product names like "Update", "O&%$__#" and "Company Name".  So far, free markets and consumer-driven competition are not doing much to fix that mess, but neither would overly restrictive government regulation.  It's just typical stupid human behavior (e.g. laziness).

Back to the topic.

So, when pulling a report of what software is installed on computers, it can be helpful to also include some attributes for each product such as "Is-Packaged", "Is-Windows7-Ready", "Is-Current-Supported", and so on.  You can beat some of that out of Asset Intelligence, but some you cannot.  For example, to really know if an installed product has a corresponding Configuration Manager Package and Advertisement, you need to somehow relate the installed Product Name and Version to the Configuration Manager Package and Advertisement.  Sounds easy enough, right?
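
To see why "sounds easy" isn't, here's a rough sketch of the join you end up attempting: match what inventory says is installed (the ARP views) against your Package names with a fuzzy LIKE, because there is no real key tying the two together.  Server and database names are placeholders:

# Rough sketch: try to relate installed products (ARP inventory) to Packages by name alone.
# There's no true key between the two, which is exactly the problem described here.
Invoke-Sqlcmd -ServerInstance 'CMSQL01' -Database 'CM_ABC' -Query @"
SELECT DISTINCT arp.DisplayName0, arp.Version0, pkg.PackageID, pkg.Name AS PackageName
FROM v_GS_ADD_REMOVE_PROGRAMS arp
LEFT JOIN v_Package pkg
       ON pkg.Name LIKE '%' + arp.DisplayName0 + '%'
WHERE arp.DisplayName0 LIKE 'AutoCAD%'
"@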

Ok, you've got an environment where you have deployed 500 installations of AutoCAD 2012 using a network license client, and 25 of AutoCAD 2012 with a standalone license.  From ARP reports it will show them all as "AutoCAD 2012 English" or something generic like that.  But you can't use the same package to deploy both (well, you could, but now we're talking about some twisted branched logic in the package using a script or some other intermediary program logic), so how would you know that you've got "A Package" for "AutoCAD 2012" and which one it belongs to?

Even if you switch to linking to the Installed Programs list (pulled from a query of .EXE files on the computer) it would show "AutoCAD 2012" or "AutoCAD R18.x" or something like that, not "AutoCAD 2012 Network Client", because they will both read from "acad.exe" and the only difference you could pull might be the size of the file itself.  What about products that are only installed as part of another product advertisement?  What about all those Autodesk Design Review and DWG True View installations that were placed by your various AutoCAD and Revit and Inventor deployment packages?  How do you automate the referential integrity of those applications to a source package and advertisement?

In case you haven't already surmised, this is all analogous to Reverse Lookup DNS.  Forward lookup is easy:  This package and advertisement installs this application.  Fine.  Now, what package and advertisement installed this application on these 100,000 computers?  How do I automate that workflow?

What happens when you have in-house developers and packagers putting together things to deploy?  How about those not uncommon cases where the package doesn't really install a product, but registers components and files that support another product, but which are given their own name by virtue of how business operational minds like to give names to things that don't really exist?  You know, when you install and register three DLL files, open a firewall port and now the user can access a special web application in their browser, so that deployed bundle of crap is given the name of whatever it is that they connect to via the browser.  It's not really an installed application, is it?  If the users rely on this to connect to a web application named "Fubar", good luck convincing them to call it "installed components to support Fubar".  They're going to call it "Fubar" and ask "when are you going to get my Fubar installed?"  Oh yes.  You can bet on that.

I'm digressing.  It's much easier to convey this verbally than in writing.  I have lots of stories to illustrate this weird delusional mess.

This is the mind-bending thought process that swims around in my head as I've been building a web-based asset management process for a customer.  The process allows them to overtly control these arbitrary relationships for more than just reporting.  It also allows them to manage distribution that comes into play when executing computer replacements versus computer refreshes (refresh in this case means upgrade the OS and map in required application upgrades at the same time). 

For example, the computers in a particular department-based collection all have XP and Office 2007.  They will be reimaged via SCCM OSD with Windows 7 and Office 2010, but they each have unique LOB (line of business) applications installed.  Many require an upgrade to work with Windows 7, or the customer budgeted for new versions based on feature enhancements and decided to tie them into the same upgrade window.  Regardless, they needed a quick and easy way to select an upgrade mapping, individually and in batch, to say "these computers get the newer version", "these don't get an upgrade", and "remove it from these computers entirely".  It's much more complex than this, but that's a simplified example.

In short, all I can say is that I LOVE working with this stuff.  It falls squarely in the field of work I crave: "Windows Platform Business Process Automation" or WPBPA.  I believe I invented this term, so neener neener neeeeeeener.  I need some breakfast and coffee as I've been up all night working on my next book project. Holy cow - what sleep can do for you.  Cheers!


Thursday, September 29, 2011

The Zen of Systems Automation

Level 1 - Writing scripts, hacking code, building your own solutions

Level 2 - Finding existing scripts, code, solutions and leveraging them with a little effort

Level 3 - Leveraging built-in functionality without having to write any custom additions

Level 4 - Systems run themselves.  You drink beer and focus on other things

Tuesday, September 27, 2011

Autodesk Network Deployment Strategies

This is going to be sort of an extension of yesterday's post, and of some of the topics covered in my book "The AutoCAD Network Administrator's Bible".  Mainly: how to unleash an Autodesk network deployment installation on your network with some logical and strategic efficiency regarding traffic isolation.

If you've ever taken a certification exam, this may all seem very familiar.

Background

Let's start with a model:  Fictional Corporation

New York, NY - is the main data center for the company.  The data center is state of the art with blade servers, SAN device arrays, virtualized servers and virtual data center switches.  While it is the largest office in the company, there are no AutoCAD users in this office at present.  However, the company IT department creates and maintains all software distribution resources for the company.  They build the AutoCAD network deployment and host it (initially) in the NY data center.

Chicago, IL - is the second largest office in the company, but has the largest concentration of AutoCAD users in the company.  The connection between NY and Chicago uses multiple/redundant T-1 connections.

Washington, DC - is the third largest office, with the fewest AutoCAD users.  The connection link between NY and DC is fractional T-1.  Not bad, and not unreliable, but not as fast as the NY-Chicago link

Virginia Beach, VA - is the smallest office with the second largest group of AutoCAD users.  The link between NY-VB is fractional T-1, about the same performance characteristics as NY-DC.

The IT department uses Microsoft System Center Configuration Manager 2007 to deploy software and updates, collect inventory data, and provision new and refreshed computers.  There are primary site servers in Chicago and Washington DC.  Virginia Beach has a secondary site server.  All four site servers are also distribution points.

The file, print, and Configuration Manager site servers in the remote offices are physical machines.  The servers in NY are virtual.

Situation

All computers in the company run Windows 7 Enterprise Edition, Service Pack 1.  All servers are running Windows Server 2008 or 2008 R2.  The IT department has installed a FlexLM(R) license server in New York and obtained a valid license file from Autodesk.  They have configured the license server and verified it is operating normally.

The IT department creates the first AutoCAD network deployment share on a server in the NY data center.  They are aware of the deployment caveats for .NET Framework 4.0 and have already packaged the AutoCAD DirectX(R) component installer as an .MSI.

Using Configuration Manager (aka "SCCM"), they deploy .NET Framework 4.0 to all computers in the company successfully.  They also deploy the DirectX(R) custom installer successfully. They then deploy a few test clients in the NY office using SCCM successfully.  Everything so far looks good.

Within SCCM they assign additional distribution point servers for the AutoCAD deployment package, one for each remote office.  They create the necessary collections and add direct memberships for clients in each remote office to a corresponding office-related collection and assign the advertisement.

The IT department runs server data backups over the WAN links to the NY data center for archival between midnight and 2AM ET (1AM Chicago time).  Client computers are scheduled to run disk defrag and anti-virus scans between 2AM and 4AM local time.  Tests show that the AutoCAD deployment takes roughly 40 minutes to install on a full T-1 connection, and 60 minutes on a fractional T-1 connection.

Few, if any, of the remote office clients successfully install.  Most return an error that the package timed out.

Question 1: What Happened?

What might have caused the remote office clients to fail the installation attempt when the clients in the New York office completed the installation just fine?  Was it...

  1. The Package did not finish replicating to the remote office distribution point servers.
  2. The network links might have been saturated with concurrent traffic during the deployment.
  3. The replicated package files contained identical deployment .INI content, so the clients attempted to install from the New York server share.
  4. Answers 1 and 2
  5. Answers 2 and 3
  6. All of the Above
  7. None of the Above

The answer is (definitely) 3 but could also be 2, so the best answer is 5.

Question 2: How to Fix This:

When you run an installation from the network deployment share, the process refers to the DEPLOYMENT_LOCATION key in the .INI file.  So, what's the best way to address this?

A. Open the deployment .INI on each SCCM package share and edit the DEPLOYMENT_LOCATION value to refer to the local share UNC path.

B. Build each deployment "on" a server in each remote office, then create a SCCM package and program that refers to the UNC as a distribution share.

C. Build each deployment "on" a server in each remote office, in a separate folder create a .bat or .cmd script that references the setup command for that server.  Create a SCCM package and program that points to that script.

Best Answer?  ____

FlexLM License Servers

After sorting out their deployment issues, all clients are working fine and obtaining licenses from the license server as expected.  The IT department decides they want to add a little redundancy by implementing two more FlexLM(R) license servers in a Distributed configuration.  They provision a license server in Chicago and another in Washington DC. 

The clients were originally installed using a system environment variable to assign the FlexLM(R) server setting.  Now they want to reconfigure the Chicago users to point to their own license server first, then the New York server, followed by the DC server.  The DC users are to be configured so they point to their license server first, then New York, followed by Chicago.  Lastly, the Virginia Beach users should point to DC, then NY, and then Chicago.

What's the Easiest way to accomplish this change?

A. Modify each deployment using the deployment utility "Modify Deployment" link, and enter new FlexLM server information, then re-deploy the installations to all clients for each site.

B. Create a SCCM package and program that executes a script to configure the system environment variable to suit each location.  Target the clients using collections based on site assignment.

C. Create four Group Policy Objects with a Group Policy Preference setting to replace the system environment variable value, link the GPO to the Active Directory OU for each site.

Best Answer?  _____
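
Whichever answer you like, the moving part underneath is the same: a machine-level environment variable listing the license servers in preference order.  Here's a minimal sketch for the Chicago clients; it assumes the Autodesk variable name ADSKFLEX_LICENSE_FILE, and the server names are made up, so adjust to taste:

# Minimal sketch: set the FlexLM license server search order for the Chicago office.
# Assumes the Autodesk variable name ADSKFLEX_LICENSE_FILE; server names are placeholders.
$order = '@CHI-LIC01;@NY-LIC01;@DC-LIC01'
[Environment]::SetEnvironmentVariable('ADSKFLEX_LICENSE_FILE', $order, 'Machine')
# Wrapped in an SCCM package/program per site, or pushed as a GPO preference item,
# this re-points clients without touching the installed product.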

Slow Autodesk Network Deployments?

One advantage of being a "consultant" is getting to peek into a variety of environments, cultures, methodologies, and whatnot.  One thing I see fairly often that seems to warrant some mention is slow performance over network pathways.  A cursory search on Google reveals that there are plenty of articles, blog posts and tweets regarding slow "deployment creation", but it's tough to find anything really focusing on the execution side: deploying the installation from the share to the clients.


The potential points of trouble or failure in a typical LAN or WAN environment are numerous: client hardware and software, client configuration settings, physical wiring, switches, hubs, routers, wireless equipment, wireless configuration settings, server hardware, server software, server configuration settings, scheduled backups, scheduled anti-virus scans, unpredicted end-user activity, power faults, wireless interference, and, layered on top of all this, the issues incurred in virtual data center environments.  Any one of these (or a combination of several) can cause performance drag or outright failure.  The more you dig into the world of network engineering, the more you can appreciate what a network engineer deals with.

So, if your Autodesk product network deployments are taking longer than they should, here are some things to look at:

Network Site Link Speeds

If your employer is large enough to have a dedicated "network admin", consult him/her for link speeds between points of deployment (server shares) and end users (desktop and laptop computers).  Find out how many hops are involved.  What the switches and routers are like.  What limitations they're aware of.  I recommend doing this BEFORE you even attempt to build network deployment shares (referenced in my book; shameless plug, I know).  Basically, you always want to keep as little time, distance and latency as possible between the installation source and the installation target.  This is irrespective of doing it manually, via scripts, group policy or management products like Configuration Manager.

NIC Configuration Settings

Verify the Network Adapter configuration settings with your network administrator.  I've seen plenty of situations where everyone assumed it was automatically set to the correct configuration, when it was anything but.  Sometimes atypical network equipment configurations require atypical client configurations to suit.  Even if you are absolutely, positively convinced that all of your clients are using identical and correct network settings, check them again, just to be sure.

Concurrent Utilization: Server

If your deployment share resides on a file server, a print server, or an application server (web, database, etc.), I would STRONGLY recommend you consult your server administrators to set up some performance monitoring to measure and verify the loads placed on each server.  If one role or function is hogging resources (CPU time, memory, disk I/O, NIC throughput, etc.), consider moving it to another server.
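As one low-effort way to start that conversation (a sketch only; the counter names are the stock English Performance Monitor names and the output path is a placeholder), the built-in typeperf tool can grab a quick load snapshot:

```python
# Capture a handful of standard Performance Monitor counters on the server
# hosting the deployment share. Minimal sketch: shells out to the built-in
# Windows typeperf tool; counter names are the stock English ones and the
# output path is a placeholder.
import subprocess

COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\Memory\Available MBytes",
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
    r"\Network Interface(*)\Bytes Total/sec",
]

def sample_counters(output_csv: str = r"C:\Temp\server_load.csv") -> None:
    # 60 samples at 10-second intervals = roughly a ten-minute snapshot.
    subprocess.run(
        ["typeperf", *COUNTERS, "-si", "10", "-sc", "60",
         "-f", "CSV", "-o", output_csv, "-y"],
        check=False,
    )

if __name__ == "__main__":
    sample_counters()
```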

If concurrent roles or functions are not an issue, check when system and file backups are scheduled to run.  Do your best to avoid scheduling your network deployments during system backups, anti-virus scans, or scheduled maintenance.  If your servers are virtualized and your server admins occasionally move them from one physical host to another (or between SAN storage attachments, etc.), it's important to coordinate your activities to avoid problems.

Talk to your department coordinators or "power users" as well.  Make sure they don't have tasks they run during certain hours to back up their project files, or other operations that directly tax the server or its resources.  Copying a ton of large files can greatly impact network performance if the server is not configured to handle that usage ahead of time.

Always always always check the event logs for any sign of potential issues.

Concurrent Utilization: Network

If the server is not being taxed, take a careful look at what's traveling over the network segments: between the clients and the switch or gateway, between the switch or gateway and other routers, and between those routers and the servers.  There are lots of ways to do this, and it's important to consult and coordinate with your network admins to do it effectively.  In many cases they can help identify bottlenecks caused by overlapping heavy-traffic activities that could be separated to avoid contention.

Client Integrity

Check for available/free disk space on the target hard drive volume.  Check for errors and warnings in the client event logs.  Is the client getting frequent anti-virus quarantine events?  Are installed applications causing problems?  Be careful of applications that want to continue running in the background, especially when the client is tight on physical memory.  If the computer is being used heavily, consider upgrading the hardware or installing a separate physical hard drive to isolate I/O tasking.  The list here goes on and on, of course.
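The free-space check, at least, is cheap to script as a pre-flight step.  A minimal sketch (the drive letter and the 30 GB threshold are arbitrary assumptions, not figures from any Autodesk requirement):

```python
# Quick pre-flight check: is there enough free space on the target volume
# before pushing a multi-gigabyte deployment? Minimal sketch; the drive
# letter and threshold are arbitrary assumptions.
import shutil

def has_enough_space(drive: str = "C:\\", required_gb: float = 30.0) -> bool:
    usage = shutil.disk_usage(drive)
    free_gb = usage.free / (1024 ** 3)
    print(f"{drive} free space: {free_gb:.1f} GB (need {required_gb} GB)")
    return free_gb >= required_gb

if __name__ == "__main__":
    if not has_enough_space():
        raise SystemExit("Not enough free disk space; skipping deployment.")
```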

Client Activity

Are users performing CPU or disk-intensive activities?  Rendering large models, video processing, audio editing, etc.

More

Oh yes. There's more.  MUCH more.  I haven't gone into (and won't go into) the impacts of other key networking services like DNS, Kerberos, WINS, DHCP, encryption overhead, certificates, dual network interfaces, teaming, virtual switches, virtual NICs, or any of that stuff.  It's more than I feel up to blabbering about, but it's out there, waiting for your brain to absorb it all.

And Finally...

Before you jump into building your network deployment, especially if you're aiming for multiple servers across a WAN, gather as much information as possible about your network so you can plan your strategy correctly.  If you spent hours convincing your coworkers, management and customers/users that network deployments are the "way to go", don't you want to make sure it works as well as it possibly can?

Consult your network and server administrators.  Run tests to verify file copy speeds across all links you plan on using.  Run your tests over a month or at least a few weeks, at all times of day, all days of the week.  Record details of performance and identify patterns in best and worst results.  Pointing a finger at one of your network admins and blaming them for slow speeds is only going to piss them off and get you moved to the bottom of their list of things to take care of.  Having measurement data and engaging in a cooperative discussion will get you where you want to go with the least amount of pain and effort.  Be sure to bring donuts and tell lots of jokes too.  It never hurts.
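If you want a head start on those tests, here's a rough sketch of the idea (the share paths and test file are placeholders): copy a known payload to each share, time it, and append the result to a CSV so the patterns show up later.  Schedule it with Task Scheduler and let it run for a few weeks.

```python
# Timed file-copy test against each deployment share, appended to a CSV so
# you can spot patterns by time of day and day of week. Minimal sketch:
# the share paths and the pre-created test payload are placeholders.
import csv
import shutil
import time
from datetime import datetime
from pathlib import Path

TEST_FILE = Path(r"C:\Temp\copytest_500MB.bin")   # pre-created test payload
SHARES = [r"\\deployserver01\Deploy$", r"\\deployserver02\Deploy$"]
LOG_FILE = Path(r"C:\Temp\copy_speed_log.csv")

def time_copy(share: str) -> float:
    """Copy the test file to the share, return elapsed seconds, then clean up."""
    dest = Path(share) / TEST_FILE.name
    start = time.perf_counter()
    shutil.copy2(TEST_FILE, dest)
    elapsed = time.perf_counter() - start
    dest.unlink(missing_ok=True)
    return elapsed

if __name__ == "__main__":
    size_mb = TEST_FILE.stat().st_size / (1024 ** 2)
    with LOG_FILE.open("a", newline="") as f:
        writer = csv.writer(f)
        for share in SHARES:
            seconds = time_copy(share)
            # timestamp, share, payload size (MB), seconds, throughput (MB/s)
            writer.writerow([datetime.now().isoformat(), share,
                             f"{size_mb:.0f}", f"{seconds:.1f}",
                             f"{size_mb / seconds:.1f}"])
```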

Tools and Links

WindowsNetworking.com - Troubleshooting Network Issues

Advanced Network Adapter Troubleshooting for Windows Workstations

WireShark (the de facto diagnostic tool)

Network Monitor 3.4 for Windows 7, Vista, Server 2003, 2008, 2008 R2

Sysinternals Networking Utilities

Sysinternals Process Utilities

Books by Mark Minasi at Amazon

Network Diagnostics and Tracing in Windows 7

NETSH commands for Windows 7 and Windows Server 2008 R2

Saturday, September 17, 2011

Organic Architecture. Part 1

Organic - "4a. Forming an integral element of a whole" / "4b. Having systematic coordination of parts" - Merriam-Webster Dictionary

Most of you may assume I'm referring to the works of people like Frank Lloyd Wright, or some artist or sculptor.  Close, but no cigar, at least not yet.

I'm going to try to dive into a rather nebulous and obscure topic within the microcosm of software/systems architecture, built around a term one of my professors once used: "organic architecture".  He never articulated it in class, but he did during a private discussion, and I've kept it in the back of my head ever since when concocting any "solution" to a systematic challenge.  Basically, it denotes the pursuit of designing and building something that just feels "natural".

Trying to quantify that definition would be about as simple as quantifying what makes you feel love or sadness (or what my wife wants for dinner).  But I'm going to try to smoke out some constraints by example to help give this concept a shape.  It's going to take multiple parts to properly dissect this invisible beast, so hopefully you can (and will want to) bear with me on this journey into nerdville.

Organic Architecture plays a major part in many aspects of our lives and our environment.  From things we participate in, to the tools we use, and the machines we operate and travel within.  It's an inherent part of bio-medicine and chemical engineering, as well as traffic management systems and air traffic control.  It's everywhere.  Rather than try to tie a rope around something this broad in scale, I'm going to break off the tiny chunk that pertains to computer software architecture.

Why?

I really don't know.  It's raining.  My wife is on travel and the house is quiet.  I've been cleaning, cooking, eating, and hanging out with my kids, the cat and the dog.  It's one of those brain-incubator environments where the mind wanders and tends to develop ideas that beg to be externalized.  Whoa! I think I just said something logical?  Maybe not.  In any case, this is one small area of software technology that doesn't get much discussion or illumination anywhere: not in schools, at conferences, or in the workplace.  There are a few books that touch on it, but most are obscure and way outside the mainstream for even the geekiest of geeks.  Typically, it is hinted at rather than outright bagged and tagged.  Some examples I may touch on:
  • Interface Design
  • Systems Management Architecture
  • Configuration Management Architecture
  • Systems State Management
  • Workflow Design
  • Self-Healing Processes

Example 1 - Agent vs Agentless

You've probably heard this term before.  It's used by quite a few products and technologies.  It refers to a basic concept of where nodal processing is performed within a distributed system.  But this concept is not confined to software or technology at all.  It's used in the intelligence community, as it pertains to HUMINT versus SIGINT (geeks, you may commence to beating off over the acronyms... now!).

An "agent" in this context would refer to a component or process which runs on a remote node within a distributed system.  The agent performs the majority of processing locally. In most scenarios, the agent receives general "instructions" or control parameters from a higher "authority" or higher node in the system.  It also typically transmits the results of its processing to a higher authority or node in the system.  A biological example of this would be nerve receptors within the nervous system in the human body.  A computer example would be monitoring agents deployed as part of a systems management technology (e.g. Microsoft System Center Operations Manager®).

An "agentless" scenario would be one where nodes are not configured with any distinct autonomous processing components, but instead are acted upon directly from external/remote nodes in the system.  A social reference example of this would be cash registers in most modern shopping centers (when the power goes out, they cannot function on their own).  A computer example would be a web application, or remote systems data interrogation (think file systems, CIM (WMI), registry, event logs, etc.).

A "system", according to Merriam Webster's Dictionary, can be many things, but in simplest terms: it is a collective body of elements or components that participate in some common goal or outcome.  A football team is a system.  A McDonalds hamburger is a system.  A government is a system.  A farm is a system.

A "distributed system" is not clearly defined by any official dictionary (at least, none that I've encountered), but it basically builds atop the "system" concept to include a qualification of having elements or components which are not physically collocated or in close "proximity".

Putting all of these things together you get one of two distinct concepts for monitoring and managing a distributed collection of components:
  • Agent - components are deployed to each member of the system to perform most of the querying and maintenance of the node locally.
  • Agentless - members of the system are managed directly from a central node (or hierarchical layer node).
This general concept is used for all sorts of functions within a computer system (i.e. a network environment).  It's used by software licensing services (e.g. Flexera FlexLM® and FlexNet®), distributed processing (e.g. 3DS Max® "net rendering"), and so on.
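To put the contrast in code terms (purely a conceptual sketch; every function name here is invented for illustration and isn't tied to any particular product):

```python
# Conceptual contrast only: where does the work happen?
# All function names are invented for illustration.

# Agentless: a central node reaches out and interrogates each member directly.
def agentless_inventory(nodes, query_remote_node, store_result):
    for node in nodes:
        data = query_remote_node(node)   # remote query (e.g. CIM/WMI, remote registry)
        store_result(node, data)         # the central node does the processing and storage

# Agent: each member runs this locally, does its own processing,
# and only sends the finished result upstream.
def agent_report(collect_local_data, summarize, send_to_management_point):
    raw = collect_local_data()           # the work happens on the node itself
    summary = summarize(raw)             # local processing reduces what crosses the wire
    send_to_management_point(summary)    # only results travel to the higher authority
```

The point isn't the code; it's where the cycles get burned.  In the agentless version the central node does the heavy lifting, while in the agent version each node chews on its own data and only ships results upstream.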

Summary

Why is it important, or even worth knowing about this stuff?  Because the more you see the basic concepts laid out, the more familiar you become with the "50,000 foot view".  The 50,000 foot view is important because it provides the most general, basic understanding of how something works.  It's how you explain something to someone who has absolutely no insight into anything remotely related to the thing you are explaining.  It's also how you keep things in context when a slick vendor rep is pumping your hand and waiting for your signature on the P.O. after trying to bullshit you into believing they've invented something "radically new".

In most cases, the "radical" stuff has been around since the 1950s and 1960s (the last era when people actually used their brains to invent new things).  Since then, most of what gets labeled "new" is really a refinement or repackaging of something old.  Knowing what snake oil is helps you spot new snake oil.

Next Part - Nodal and Agent Autonomy