Getting Started with SQL Azure Development

Microsoft Windows Azure offers several choices for data storage. These include Windows Azure storage and SQL Azure. You may choose to use one or both in your particular project. Windows Azure storage currently contains three types of storage structures: tables, queues and blobs.

SQL Azure is a relational data storage service in the cloud. Some of the benefits of this offering are the ability to use a familiar relational development model that includes much of the standard SQL Server language (T-SQL), tools and utilities. Of course, working with well-understood relational structures in the cloud, such as tables, views and stored procedures, also results in increased developer productivity when working in this new platform. Other benefits include a reduced need for physical database-administration tasks to perform server setup, maintenance and security, as well as built-in support for reliability, high availability and scalability.

I won’t cover Windows Azure storage or make a comparison between the two storage modes here. You can read more about these storage options in Julie Lerman’s July 2010 Data Points column (msdn.microsoft.com/magazine/ff796231). It’s important to note that Windows Azure tables are not relational tables. The focus of this article is on understanding the capabilities included in SQL Azure.

This article will explain the differences between SQL Server and SQL Azure. You need to understand the differences in detail so that you can appropriately leverage your current knowledge of SQL Server as you work on projects that use SQL Azure as a data source.

If you’re new to cloud computing, you’ll want to do some background reading on Windows Azure before continuing with this article. A good place to start is the MSDN Developer Cloud Center at msdn.microsoft.com/ff380142.

Getting Started with SQL Azure

To start working with SQL Azure, you’ll first need to set up an account. If you’re an MSDN subscriber, then you can use up to three SQL Azure databases (maximum size 1GB each) for up to 16 months (details at msdn.microsoft.com/subscriptions/ee461076) as a developer sandbox. To sign up for a regular SQL Azure account (storage and data transfer fees apply) go to microsoft.com/windowsazure/offers/.

After you’ve signed up for your SQL Azure account, the simplest way to initially access it is via the Web portal at sql.azure.com. You must sign in with the Windows Live ID that you’ve associated to your Windows Azure account. After you sign in, you can create your server installation and get started developing your application.

An example of the SQL Azure Web management portal is shown in Figure 1. Here you can see a server and its associated databases. You’ll notice that there’s also a tab on the Web portal for managing the Firewall Settings for your particular SQL Azure installation.


Figure 1 Summary Information for a SQL Azure Database

As you initially create your SQL Azure server installation, it will be assigned a random string for the server name. You’ll generally also set the administrator username, password, geographic server location and firewall rules at the time of server creation. You’ll be presented with a list of locations (datacenters) from which to choose. If your application front end is built in Windows Azure, you have the option to locate both that installation and your SQL Azure installation in the same geographic location by associating the two installations.

By default there’s no access to your server, so you’ll have to create firewall rules for all client IPs. SQL Azure uses port 1433, so make sure that port is open for your client application as well. When connecting to SQL Azure you’ll use the username@servername format for your username. SQL Azure supports SQL Server Authentication only; Windows Authentication is not supported. Multiple Active Result Set (MARS) connections are supported.
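
Firewall rules can also be scripted with T-SQL against the master database rather than set in the portal. The following is a minimal sketch; the rule name and IP addresses are placeholders, and you should confirm the sp_set_firewall_rule syntax against the current SQL Azure documentation:

-- Run against the master database of your SQL Azure server.
-- Add (or update) a rule that admits a single client IP address.
EXEC sp_set_firewall_rule
  @name = N'DevWorkstation',
  @start_ip_address = '203.0.113.42',
  @end_ip_address = '203.0.113.42';

-- List the rules currently in effect.
SELECT name, start_ip_address, end_ip_address FROM sys.firewall_rules;

-- Remove the rule when it is no longer needed.
EXEC sp_delete_firewall_rule @name = N'DevWorkstation';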

Open connections will time out after 30 minutes of inactivity. Also, connections can be dropped for long-running queries and transactions or excessive resource usage. Because of these behaviors, the best practices for connections in your applications are to open, use and then close connections explicitly, to include retry logic for dropped connections, and to avoid caching connections. For more details about supported client protocols for SQL Azure, see Steve Hale’s blog post at blogs.msdn.com/b/sqlnativeclient/archive/2010/02/12/using-sql-server-client-apis-with-sql-azure-vversion-1-0.aspx.

Another best practice is to encrypt your connections (note the Encrypt=true setting in the connection string below) to help prevent man-in-the-middle attacks.

You’ll be connected to the master database by default if you don’t specify a database name in the connection string. In SQL Azure the T-SQL statement USE is not supported for changing databases, so you’ll generally specify the database you want to connect to in the connection string (assuming you want to connect to a database other than master). Here’s an example of an ADO.NET connection string:

Server=tcp:server.ctp.database.windows.net;
Database=<databasename>;
User ID=user@server;
Password=password;
Trusted_Connection=False;
Encrypt=true;

Setting up Databases

After you’ve successfully connected to your installation you’ll want to create one or more databases. Although you can create databases using the SQL Azure portal, you may prefer to do so using some of the other tools, such as SQL Server Management Studio 2008 R2. By default, you can create up to 149 databases for each SQL Azure server installation. If you need more databases than that, you must call the Windows Azure business desk to have this limit increased.

When creating a database you must select the maximum size. The current options for sizing (and billing) are Web or Business Edition. Web Edition, the default, supports databases of 1GB or 5GB total. Business Edition supports databases of up to 50GB, sized in increments of 10GB—in other words, 10GB, 20GB, 30GB, 40GB and 50GB.

You set the size limit for your database when you create it by using the MAXSIZE keyword. You can change the size limit or the edition (Web or Business) after the initial creation using the ALTER DATABASE statement. If you reach your size or capacity limit for the edition you’ve selected, then you’ll see the error code 40544. The database size measurement doesn’t include the master database, or any database logs. For more details about sizing and pricing, see microsoft.com/windowsazure/pricing/#sql.
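
To make the sizing syntax concrete, here’s a minimal sketch of creating a Web Edition database and later moving it to Business Edition; the database name is hypothetical and the exact option syntax should be checked against the SQL Azure reference:

-- Create a 1GB Web Edition database (run from the master database).
CREATE DATABASE SalesDb (EDITION = 'web', MAXSIZE = 1 GB);

-- Later, raise the size cap and switch editions.
ALTER DATABASE SalesDb MODIFY (EDITION = 'business', MAXSIZE = 50 GB);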

It’s important to realize that when you create a new SQL Azure database, you’re actually creating three replicas of that database. This is done to ensure high availability. These replicas are completely transparent to you. The new database appears as a single unit for your purposes.

Once you’ve created a database, you can quickly get the connection string information for it by selecting the database in the list on the portal and then clicking the Connection Strings button. You can also quickly test connectivity via the portal by clicking the Test Connectivity button for the selected database. For this test to succeed you must enable the Allow Microsoft Services to Connect to this Server option on the Firewall Rules tab of the SQL Azure portal.

Creating Your Application

After you’ve set up your account, created your server, created at least one database and set a firewall rule so that you can connect to the database, you can start developing your application using this data source.

Unlike Windows Azure data storage options such as tables, queues or blobs, when you’re using SQL Azure as a data source for your project, there’s nothing to install in your development environment. If you’re using Visual Studio 2010, you can just get started—no additional SDKs, tools or anything else are needed.

Although many developers will choose to use a Windows Azure front end with a SQL Azure back end, this configuration is not required. You can use any front-end client with a supported connection library such as ADO.NET or ODBC. This could include, for example, an application written in Java or PHP. Connecting to SQL Azure via OLE DB is currently not supported.

If you’re using Visual Studio 2010 to develop your application, you can take advantage of the included ability to view or create many types of objects in your selected SQL Azure database installation directly from the Visual Studio Server Explorer. These objects are Tables, Views, Stored Procedures, Functions and Synonyms. You can also see the data associated with these objects using this viewer. For many developers, using Visual Studio 2010 as the primary tool to view and manage SQL Azure data will be sufficient. The Server Explorer View window is shown in Figure 2. Both a local installation of a database and a cloud-based instance are shown. You’ll see that the tree nodes differ slightly in the two views. For example, there’s no Assemblies node in the cloud installation because custom assemblies are not supported in SQL Azure.


Figure 2 Viewing Data Connections in Visual Studio Server Explorer

As I mentioned earlier, another tool you may want to use to work with SQL Azure is SQL Server Management Studio (SSMS) 2008 R2. With SSMS 2008 R2, you actually have access to a fuller set of operations for SQL Azure databases than in Visual Studio 2010. I find that I use both tools, depending on which operation I’m trying to complete. An example of an operation available in SSMS 2008 R2 (and not in Visual Studio 2010) is creating a new database using a T-SQL script. Another example is the ability to easily perform index operations (create, maintain, delete and so on). An example is shown in Figure 3.


Figure 3 Using SQL Server Management Studio 2008 R2 to Manage SQL Azure

Newly released in SQL Server 2008 R2 is a data-tier application, or DAC. DAC pacs are objects that combine SQL Server or SQL Azure database schemas and objects into a single entity. You can use either Visual Studio 2010 (to build) or SQL Server 2008 R2 SSMS (to extract) to create a DAC from an existing database.

If you wish to use Visual Studio 2010 to work with a DAC, then you’d start by selecting the SQL Server Data-Tier Application project type in Visual Studio 2010. Then, in Solution Explorer, right-click your project name and click Import Data-Tier Application. A wizard opens to guide you through the import process. If you’re using SSMS, start by right-clicking the database you want to use in Object Explorer, click Tasks, then click Extract Data-Tier Application to create the DAC.

The generated DAC is a compressed file that contains multiple T-SQL and XML files. You can work with the contents by right-clicking the .dacpac file and then clicking Unpack. SQL Azure supports deleting, deploying, extracting and registering DAC pacs, but does not support upgrading them.

Another tool you can use to connect to SQL Azure is the latest community technology preview (CTP) release of the tool code-named “Houston.” Houston is a zero-install, Silverlight-based management tool for SQL Azure installations. When you connect to a SQL Azure installation using Houston, you specify the datacenter location (as of this writing North Central U.S., South Central U.S., North Europe, Central Europe, Asia Pacific or Southeast Asia).

Houston is in early beta and the current release (shown in Figure 4) looks somewhat like SSMS. Houston supports working with Tables, Views, Queries and Stored Procedures in a SQL Azure database installation. You can access Houston from the SQL Azure Labs site at sqlazurelabs.com/houston.aspx.


Figure 4 Using Houston to Manage SQL Azure

Another tool you can use to connect to a SQL Azure database is SQLCMD (msdn.microsoft.com/library/ee336280). Even though SQLCMD is supported, the OSQL command-line tool is not supported by SQL Azure.

Using SQL Azure

So now you’ve connected to your SQL Azure installation and have created a new, empty database. What exactly can you do with SQL Azure? Specifically, you may be wondering what the limits are on creating objects. And after those objects have been created, how do you populate those objects with data?

As I mentioned at the beginning of this article, SQL Azure provides relational cloud data storage, but it does have some subtle feature differences from an on-premises SQL Server installation. Starting with object creation, let’s look at some of the key differences between the two.

You can create the most commonly used objects in your SQL Azure database using familiar methods. The most commonly used relational objects (which include tables, views, stored procedures, indices and functions) are all available. There are some differences around object creation, though. Here’s a summary of those differences:

  • SQL Azure tables must contain a clustered index (a brief example follows this list). Non-clustered indices can subsequently be created on selected tables. You can create spatial indices, but you cannot create XML indices.
  • Heap tables are not supported.
  • CLR geo-spatial types (such as Geography and Geometry) are supported, as is the HierarchyId data type. Other CLR types are not supported.
  • View creation must be the first statement in a batch. Also, view (or stored procedure) creation with encryption is not supported.
  • Functions can be scalar, inline or multi-statement table-valued functions, but cannot be any type of CLR function.
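
As a concrete illustration of the first point above, here’s a minimal sketch of a table definition that satisfies the clustered-index requirement; the table and column names are hypothetical:

-- The clustered primary key gives the table its required clustered index
-- (heap tables are not supported in SQL Azure).
CREATE TABLE dbo.Customers
(
  CustomerID int NOT NULL,
  CustomerName nvarchar(100) NOT NULL,
  Region nvarchar(50) NULL,
  CONSTRAINT PK_Customers PRIMARY KEY CLUSTERED (CustomerID)
);

-- Non-clustered indexes can be added afterward as usual.
CREATE NONCLUSTERED INDEX IX_Customers_Region ON dbo.Customers (Region);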

    There’s a complete reference of partially supported T-SQL statements for SQL Azure on MSDN at msdn.microsoft.com/library/ee336267.

    Before you get started creating your objects, remember that you’ll connect to the master database if you don’t specify a different one in your connection string. In SQL Azure, the USE (database) statement is not supported for changing databases, so if you need to connect to a database other than the master database, then you must explicitly specify that database in your connection string, as shown earlier.

    Data Migration and Loading

    If you plan to create SQL Azure objects using an existing, on-premises database as your source data and structures, then you can simply use SSMS to script appropriate DDL to create those objects on SQL Azure. Use the Generate Scripts Wizard and set the “Script for the database engine type” option to “for SQL Azure.”

    An even easier way to generate a script is to use the SQL Azure Migration Wizard, available as a download from CodePlex at sqlazuremw.codeplex.com. With this handy tool you can generate a script to create the objects and can also load the data via bulk copy using bcp.exe.

    You could also design a SQL Server Integration Services (SSIS) package to extract and run a DML or DDL script. If you’re using SSIS, you’d most commonly design a package that extracts the DDL from the source database, scripts that DDL for SQL Azure and then executes that script on one or more SQL Azure installations. You might also choose to load the associated data as part of the package’s execution path. For more information about working with SSIS, see msdn.microsoft.com/library/ms141026.

    Also of note regarding DDL creation and data migration is the CTP release of SQL Azure Data Sync Service (sqlazurelabs.com). You can see this service in action in a Channel 9 video, “Using SQL Azure Data Sync Service to provide Geo-Replication of SQL Azure Databases,” at tinyurl.com/2we4d6q. Currently, the SQL Azure Data Sync Service works via Synchronization Groups (HUB and MEMBER servers) and then via scheduled synchronization at the level of individual tables in the databases selected for synchronization.

    You can use the Microsoft Sync Framework Power Pack for SQL Azure to synchronize data between a data source and a SQL Azure installation. As of this writing, this tool is in CTP release and is available from tinyurl.com/2ecjwku. If you use this framework to perform subsequent or ongoing data synchronization for your application, you may also wish to download the associated SDK.

    What if your source database is larger than the maximum size for a SQL Azure database? That maximum could be the absolute limit of 50GB for the Business Edition, or a smaller limit depending on the edition and size option you’ve selected.

    Currently, customers must partition (or shard) their data manually if their database size exceeds the program limits. Microsoft has announced that it will be providing an auto-partitioning utility for SQL Azure in the future. In the meantime, it’s important to note that T-SQL table partitioning is not supported in SQL Azure. There’s a free utility called Enzo SQL Shard (enzosqlshard.codeplex.com) that you can use for partitioning your data source.

    You’ll want to take note of some other differences between SQL Server and SQL Azure regarding data loading and data access.

    Added recently is the ability to copy a SQL Azure database via the Database copy command. The syntax for a cross-server copy is as follows:

    CREATE DATABASE DB2A AS COPY OF Server1.DB1A
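
    The copy runs asynchronously, so the new database is visible before it’s usable. One way to watch for completion is to poll the catalog on the destination server; this sketch assumes the database name used in the copy command above:

    -- While the copy is in progress the new database reports a COPYING state;
    -- it switches to ONLINE when the copy completes.
    SELECT name, state_desc FROM sys.databases WHERE name = 'DB2A';

    -- SQL Azure also exposes sys.dm_database_copies for progress details;
    -- check the documentation for its exact columns before relying on them.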

    The T-SQL INSERT statement is supported (with the exceptions of updating through views and of providing a locking hint inside an INSERT statement).

    Related further to data migration, T-SQL DROP DATABASE and other DDL commands have additional limits when executed against a SQL Azure installation. In addition, the T-SQL RESTORE and ATTACH DATABASE commands are not supported. Finally, the T-SQL statement EXECUTE AS (login) is not supported.

    Data Access and Programmability

    Now let’s take a look at common programming concerns when working with cloud data. First, you’ll want to consider where to set up your development environment. If you’re an MSDN subscriber and can work with a database that’s less than 1GB, then it may well make sense to develop using only a cloud installation (sandbox). In this way there will be no issue with migration from local to cloud. Using a regular (non-MSDN subscriber) SQL Azure account, you could develop directly against your cloud instance (most probably using a cloud-located copy of your production database). Of course, developing directly from the cloud is not practical for all situations.

    If you choose to work with an on-premises SQL Server database as your development data source, then you must develop a mechanism for synchronizing your local installation with the cloud installation. You could do that using any of the methods discussed earlier, and tools like Data Sync Services and Sync Framework are being developed with this scenario in mind.

    As long as you use only the supported features, the method for having your application switch from an on-premises SQL Server installation to a SQL Azure database is simple—you need only to change the connection string in your application.

    Regardless of whether you set up your development installation locally or in the cloud, you’ll need to understand some programmability differences between SQL Server and SQL Azure. I’ve already covered the T-SQL and connection string differences. In addition, all tables must have a clustered index at minimum (heap tables are not supported).

    As previously mentioned, the USE statement for changing databases isn’t supported. This also means that there’s no support for distributed (cross-database) transactions or queries, and linked servers are not supported.

    Other options not available when working with a SQL Azure database include:

  • Full-text indexing
  • CLR custom types (however, the built-in Geometry and Geography CLR types are supported)
  • RowGUIDs (use the uniqueidentifier type with the NEWID function instead)
  • XML column indices
  • Filestream datatype
  • Sparse columns

    Default collation is always used for the database. To make collation adjustments, set the column-level collation to the desired value using the T-SQL COLLATE statement.
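
    For example, a column-level collation can be declared when a table is created or adjusted afterward. This is a minimal sketch with hypothetical object names:

    CREATE TABLE dbo.Products
    (
      ProductID int NOT NULL,
      -- Override the database default collation for this one column.
      ProductName nvarchar(100) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
      CONSTRAINT PK_Products PRIMARY KEY CLUSTERED (ProductID)
    );

    -- Or change the collation of an existing column.
    ALTER TABLE dbo.Products
    ALTER COLUMN ProductName nvarchar(100) COLLATE Latin1_General_CI_AS NOT NULL;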

    And finally, you cannot currently use SQL Server Profiler or the Database Engine Tuning Advisor on your SQL Azure database.

    Some important tools that you can use with SQL Azure for tuning and monitoring include:

  • SSMS Query Optimizer to view estimated or actual query execution plan details and client statistics
  • Select Dynamic Management views to monitor health and status
  • Entity Framework to connect to SQL Azure after the initial model and mapping files have been created by connecting to a local copy of your SQL Azure database.

    Depending on what type of application you’re developing, you may be using SSAS, SSRS, SSIS or PowerPivot. You can also use any of these products as consumers of SQL Azure database data. Simply connect to your SQL Azure server and selected database using the methods already described in this article.

    It’s also important to fully understand the behavior of transactions in SQL Azure. As mentioned, only local (within the same database) transactions are supported. In addition, the only transaction-isolation level available for a database hosted on SQL Azure is READ COMMITTED SNAPSHOT. Using this isolation level, readers get the latest consistent version of data that was available when the statement started.

    SQL Azure doesn’t detect update conflicts. This is also called an optimistic concurrency model, because lost updates, non-repeatable reads and phantoms can occur. Of course, dirty reads cannot occur.
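
    If you want to confirm these settings from your own connection, the database-level flags are visible in the catalog. A small sketch (standard catalog view; no SQL Azure-specific objects assumed):

    SELECT name, snapshot_isolation_state_desc, is_read_committed_snapshot_on
    FROM sys.databases
    WHERE name = DB_NAME();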

    Database Administration

    Generally, when using SQL Azure, the administrator role becomes one of logical installation management. Physical management is handled by the platform. From a practical standpoint this means there are no physical servers to buy, install, patch, maintain or secure. There’s no ability to physically place files, logs, tempdb and so on in specific physical locations. Because of this, there’s no support for the T-SQL commands USE <database>, FILEGROUP, BACKUP, RESTORE or SNAPSHOT.

    There’s no support for the SQL Agent on SQL Azure. Also, there is no ability (or need) to configure replication, log shipping, database mirroring or clustering. If you need to maintain a local, synchronized copy of SQL Azure schemas and data, then you can use any of the tools discussed earlier for data migration and synchronization—they work both ways. You can also use the DATABASE COPY command.

    Other than keeping data synchronized, what are some other tasks that administrators may need to perform on a SQL Azure installation? 

    Most commonly, there will still be a need to perform logical administration. This includes tasks related to security and performance management. Additionally, you may be involved in monitoring for capacity usage and associated costs. To help you with these tasks, SQL Azure provides a public Status History dashboard that shows current service status and recent history (an example of history is shown in Figure 5) at microsoft.com/windowsazure/support/status/servicedashboard.aspx.


    Figure 5 SQL Azure Status History

    SQL Azure provides a high-security bar by default. It forces SSL encryption with all permitted (via firewall rules) client connections. Server-level logins and database-level users and roles are also secured. There are no server-level roles in SQL Azure. Encrypting the connection string is a best practice. Also, you may want to use Windows Azure certificates for additional security. For more details, see blogs.msdn.com/b/sqlazure/archive/2010/09/07/10058942.aspx.

    In the area of performance, SQL Azure includes features such as automatically killing long-running transactions and idle connections (more than 30 minutes). Although you can’t use SQL Profiler or trace flags for performance tuning, you can use SQL Query Optimizer to view query execution plans and client statistics. You can also perform statistics management and index tuning using the standard T-SQL methods.

    There’s a select list of dynamic management views (covering database, execution or transaction information) available for database administration as well. These include sys.dm_exec_connections, _requests, _sessions, _tran_database_transactions, _active_transactions and _partition_stats. For a complete list of supported dynamic management views for SQL Azure, see msdn.microsoft.com/library/ee336238.aspx#dmv.
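
    As a quick example of putting these views to work, the following sketch lists the current connections along with the login that owns each session; the column choice is illustrative:

    SELECT s.session_id, s.login_name, s.status, c.client_net_address
    FROM sys.dm_exec_sessions AS s
    JOIN sys.dm_exec_connections AS c ON s.session_id = c.session_id;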

    There are also some new views such as sys.database_usage and sys.bandwidth_usage. These show the number, type and size of the databases and the bandwidth usage for each database so that administrators can understand SQL Azure billing. A sample is shown in Figure 6. In this view, quantity is listed in KB. You can monitor space used via this command:

    SELECT SUM(reserved_page_count) * 8192
    FROM sys.dm_db_partition_stats
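
    To break that total down by table, the same view can be joined to sys.objects and grouped; this is a sketch using only standard catalog views:

    SELECT o.name AS table_name,
           SUM(p.reserved_page_count) * 8192 AS reserved_bytes
    FROM sys.dm_db_partition_stats AS p
    JOIN sys.objects AS o ON o.object_id = p.object_id
    WHERE o.is_ms_shipped = 0
    GROUP BY o.name
    ORDER BY reserved_bytes DESC;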


    Figure 6 Bandwidth Usage in SQL Query

    You can also access the current charges for the SQL Azure installation via the SQL Azure portal by clicking on the Billing link at the top-right corner of the screen.

    Learn More

    To learn more about SQL Azure, I suggest you download the Windows Azure Training Kit. This includes SQL Azure hands-on learning, white papers, videos and more. The training kit is available from microsoft.com/downloads/details.aspx?FamilyID=413E88F8-5966-4A83-B309-53B7B77EDF78.

    Also, you’ll want to read the SQL Azure Team Blog at blogs.msdn.com/b/sqlazure/ and check out the MSDN SQL Azure Developer Center at msdn.microsoft.com/windowsazure/sqlazure.

    If you want to continue to preview upcoming features for SQL Azure, be sure to visit SQL Azure Labs at sqlazurelabs.com.                                                                                


Office 365 plan comparison – Enterprise and Kiosk subscriptions (E1, E2, E3, E4, K1, K2)

By: mike | March 13, 2012
Microsoft Office 365 Plans Compared

There is a lot of high-level information about what features are included with each Office 365 plan. However, the details for the Office 365 plans are scattered among numerous Microsoft documents that aren’t easily found, let alone understood. Here I’ve attempted to describe in detail the differences between the Office 365 Kiosk and Enterprise plans. This includes the Office 365 E1, E2, E3, and E4 enterprise plans and the Office 365 K1 and K2 kiosk worker plans.

In addition, the Sharepoint Online component can be accessed by external users, who can collaborate on documents and projects within Office 365 Sharepoint sites. These access licenses are called Partner Access Licenses (PALs), and each Office 365 installation is granted 50 Partner Access Licenses by default. Currently Microsoft doesn’t enforce this limit and allows up to 1,000 external users per Office 365 installation.

This article doesn’t address the Office 365 P1 plan. This plan is targeted at professionals and small businesses and is roughly equivalent to the Enterprise E3 plan; however, there are many differences within each service. These are addressed in a separate article comparing the Office 365 P1, E1 and E3 plans.

This article covers the three major services included with Office 365:

Office 365 Enterprise and Kiosk plans – Sharepoint Online – (E1, E2, E3, E4, K1, K2 Plans)
Office 365 Enterprise and Kiosk plans – Exchange Online – (E1, E2, E3, E4, K1, K2 Plans)
Office 365 Enterprise and Kiosk plans – Lync Online – (E1, E2, E3, E4, K1, K2 Plans)
Note: Microsoft lowered pricing on their enterprise plans on March 14th, 2012. Those changes are:

 

Office 365 New Pricing – Office 365 Plans E1, E2, E3, E4, K1, and K2

SKU | Previous Cost | New Cost | Reduction
Office 365 K2 | $10.00 | $8.00 | 20%
Office 365 E1 | $10.00 | $8.00 | 20%
Office 365 E2 | $16.00 | $14.00 | 13%
Office 365 E3 | $24.00 | $20.00 | 17%
Office 365 E4 | $27.00 | $22.00 | 19%
SharePoint Storage (GB) | $2.50 | $0.20 | 92%
Exchange Advanced Archiving | $3.50 | $3.00 | 14%
Notes:
This pricing applies to new customers only. Customers that are under contract will still pay the rate that their contract states; rates will be updated when the contract is renewed.
If a current customer purchases additional seats, the new seats will be subject to the new pricing.

These Office 365 subscription plans are targeted at businesses of all sizes that require maximum flexibility in their online e-mail and collaboration services.

Office 365 Sharepoint Online Features for the Enterprise and Kiosk Plans

 

Office 365 Plan Comparison – Office 365 Plans E1, E2, E3, E4, K1, and K2
Sharepoint Online Features

Feature | K1/K2 Plans (SharePoint Online Kiosk 1 and Kiosk 2) | E1/E2 Plans (SharePoint Online 1) | E3/E4 Plans (SharePoint Online 2) | Partner Access License (PAL, external partners)
Can access all team sites by default? | Yes | Yes | Yes | No (1)
My Site | No | Yes | Yes | No
Enterprise features (Access, Business Connectivity Services (BCS), InfoPath Forms, Excel and Visio Services) | Yes (3) | No | Yes (2) | Yes (2)
Office Web Apps | K1 – view only; K2 – view and edit | E1 – view only; E2 – view and edit | View and edit | View only
Adds storage to the company’s overall pooled quota? | No | Yes, 500MB per user subscription license | Yes, 500MB per user subscription license | No
Can be an administrator of tenant, site or site collection? | No | Yes | Yes | No

1 – External partners can only access the sites they have been invited to by delegated site collection owners.
2 – Can view and upload Visio diagrams, view and build external lists, build and visit Access-based webpages, build and view embedded Excel graphs, and create/publish, fill in and submit InfoPath forms.
3 – Kiosk workers have read-only rights, except that they can edit web-based and InfoPath forms.

Office 365 Partner Access License (PAL)

The Office 365 Partner Access License grants external users access to a defined set of SharePoint Online features, much as a regular Office 365 plan does for licensed users. Below is an excerpt from the “Microsoft SharePoint Online for Enterprises Service Description”:

External sharing: The external sharing capabilities in SharePoint Online enable a company to simply invite external users in to view, share, and collaborate on their sites. Once a SharePoint Online Administrator enables external sharing, a site collection administrator can activate external sharing for the site they manage, and then invite external users to collaborate on sites, lists, and document libraries. An external user has access rights to only the site collection they are invited into. Please also note the external user use rights as explained above.

Note
Every Office 365 SharePoint Online customer (at the tenant level, not per subscription) includes 50 Partner Access Licenses (PALs) that can be leveraged for external sharing. Customers are not currently required to obtain additional PALs for external sharing beyond 50 users with a limit of 1000 until the next major update of the Office 365 service at which time Microsoft may choose to make it available as a paid add-on.
Microsoft supports invited external users signing in to the service using a Microsoft Online Services ID.
External sharing also supports Windows Live ID, including @Live.com, @Hotmail.com and @MSN.com user names, plus regional derivations of LiveID user names.
EasiID, the portion of LiveID that allows external users to associate their business email address (ex: user@contoso.com) to the LiveID system, is not supported at this time.

Office 365 Sharepoint Online “Server Resources” quota

From a development perspective SharePoint Online offers a flexible, robust framework for customizing and developing solutions in Office 365. The development features and patterns used to develop for SharePoint Online are a subset of those available in SharePoint 2010 on-premises.

Note
While SharePoint Online offers many opportunities to building customized solutions, the service does not yet support Full Trust Coded (FTC) solutions or what is sometimes referred to as farm-level solutions. The SharePoint Online development patterns and practices are currently targeted at site collection level solutions.
The “Server Resources” quota, which determines the amount of processing power available to sandboxed solutions, is based on the number of licensed user seats in a company’s tenancy. To calculate the server resource quota in Office 365, use the following formula: (number of seats × 200) + 300. For example, with a typical 25-seat license, the available server resource quota would be (25 × 200) + 300 = 5,300.
Neither Kiosk 1 (K1) nor Kiosk 2 (K2) seats add to the overall server resources quota
Companies cannot purchase server resources as a standalone add-on

SharePoint Online key features and specifications

These are common for all Office 365 plans that include Sharepoint Online.

Feature | Description
Storage (pooled) | 10 gigabytes (GB) base customer storage plus 500 megabytes (MB) per enterprise user
Storage per Kiosk Worker | Zero (0). Licensed Kiosk Workers do not bring additional storage allocation.
Storage per external user | Zero (0). Licensed external users do not bring additional storage allocation.
Additional storage (per GB per month; no minimum purchase) | $2.50 USD/GB/month
Site collection storage quotas | Up to 100 gigabytes (GB) per site collection
My Site storage allocation (does not count against tenant’s overall storage pool) | 500 megabytes (MB) of personal storage per My Site (once provisioned). Note: the storage amount on an individual’s My Site cannot be adjusted.
Site collections (#) per tenant | Up to 300 (non-My Site site collections)
Total storage per tenant | Up to 5 terabytes (TB) per tenant
File upload limit | 250 megabytes (MB) per file
External Users (PALs) | 50 PALs are included per tenant. The current “Feature Preview” allows usage rights of up to 1,000 external users without requiring additional PALs. Microsoft reserves the right to charge for additional PALs beyond 50 at the time of the next major Office 365 update.
Microsoft Office support | Microsoft Access 2010; Microsoft Excel® 2007 and 2010; Microsoft InfoPath® 2010; Outlook 2007 and 2010; Microsoft OneNote 2010; PowerPoint 2007 and 2010; Microsoft SharePoint Designer 2010; Word 2007 and 2010; SharePoint Workspace 2010; Project Professional 2010
Browser support | Internet Explorer 7, 8 and 9; Firefox 3 and higher; Safari 3.1.2 on Macintosh OS X 10.5; Chrome
Mobile device support | Windows Phone 7.5 (codenamed “Mango”) or later; Windows Mobile® 6.1 or later; Nokia S60 3.0 or later; Apple iPhone 3.0 or later; Blackberry 4.2 or later; Android 1.5 or later

Office 365 Exchange Online Features for the Enterprise and Kiosk Plans

There are significant differences between the Office 365 Kiosk and Enterprise plans for Exchange Online. The Kiosk version has limited connectivity options compared to the enterprise plans. The primary benefits of the Office 365 E3 and E4 plans for Exchange Online are unlimited storage, the legal hold feature, and voicemail integration.

Office 365 Plan Comparison – Office 365 Plans E1, E2, E3, E4, K1, and K2
Exchange Online Features

Feature | K1/K2 Plans (Exchange Online Kiosk) | E1/E2 Plans (Exchange Online Plan 1) | E3/E4 Plans (Exchange Online Plan 2)
Mailbox size | 500 megabytes (MB) | 25 gigabytes (GB)* | Unlimited**
Outlook Web App (regular and light versions) | Yes | Yes | Yes
POP | Yes | Yes | Yes
IMAP | No | Yes | Yes
Outlook Anywhere (MAPI) | No | Yes | Yes
Microsoft Exchange ActiveSync® | No | Yes | Yes
Exchange Web Services | No*** | Yes | Yes
Inbox rules | No | Yes | Yes
Delegate access | No (cannot access other users’ mailboxes, shared mailboxes, or resource mailboxes) | Yes | Yes
Instant messaging interoperability in OWA | No | Yes (requires Lync Online or Microsoft Lync Server 2010) | Yes (requires Lync Online or Microsoft Lync Server 2010)
SMS notifications | No | Yes | Yes
Custom retention policies | Yes | Yes | Yes
Multi-mailbox search | Yes | Yes | Yes
Personal archive | No | Yes | Yes
Voicemail | No | No | Yes
Legal hold | No | No | Yes

Note: Office 365 Partner Access License (PAL) users (external partners) have no access to Exchange Online features.

* 25 GB of storage apportioned across the user’s primary mailbox and personal archive.
** 25 GB of storage in the user’s primary mailbox, plus unlimited storage in the user’s personal archive. Refer to the personal archive section of this document for further information regarding unlimited storage in the archive.
*** Direct access to Kiosk user mailboxes via Exchange Web Services is not permitted. However, line-of-business applications can use Exchange Web Services impersonation to access Kiosk user mailboxes.

Office 365 E3 and E4 Plans – Unlimited Personal Archive

In Office 365, a personal archive can only be used to store one user’s messaging data. In the Office 365 E1 and E2 plans, each user receives 25 gigabytes (GB) of total storage, which includes both the user’s primary mailbox and personal archive. This effectively limits the personal archive for an Office 365 E1 or E2 plan user to less than 25 GB.

An Office 365 E3 plan and Office 365 E4 plan user has 25 GB for their primary mailbox, plus unlimited storage in the personal archive. For Office 365 E3 and E4 plan users, the personal archive has a default quota of 100 GB. This is generally large enough for reasonable use, including importing a user’s historical email. In the unlikely event that a user reaches this quota, Office 365 support can increase the quota.

To change the Single Item Recovery period for a mailbox, an administrator must contact Office 365 support. The Office 365 E1 and E2 plans support a Single Item Recovery period of up to 30 days. The Office 365 E3 and E4 plans both support a Single Item Recovery period of any length.

Office 365 Lync Online Features for the Enterprise and Kiosk Plans

This table comparing Office 365 plans for Lync Online is a little bit of overkill, because the Kiosk plans don’t include Lync and external users don’t have Lync rights either. On top of that, all Office 365 E plans include Lync Online Plan 2.

Feature | E1/E2 Plans (Lync Online Plan 2) | E3/E4 Plans (Lync Online Plan 2)
Instant messaging (IM) and presence | Yes | Yes
Lync-to-Lync audio/video calling (1-to-1) | Yes | Yes
Lync federation (IM/presence/audio/video) | Yes | Yes
Click-to-communicate in Office | Yes | Yes
Authenticated attendee in Lync meetings* | Yes | Yes
Microsoft Exchange ActiveSync® | Yes | Yes
Online meetings | Yes (up to 250 attendees) | Yes (up to 250 attendees)
Initiate ad-hoc and scheduled online meetings | Yes | Yes
Initiate multiparty (3 or more users) Lync audio/video sessions | Yes | Yes
Initiate interactive data sharing (screen/application/whiteboard) | Yes | Yes
Interop with third-party dial-in audio conferencing services | Yes | Yes

Note: The Office 365 Kiosk K1 and K2 plans have no access to the Lync Online features of Office 365, and Office 365 Partner Access License (PAL) users (external partners) have no access to Lync Online features either.

* Unauthenticated attendees who join scheduled Lync meetings do not require a Lync Online license.

 

References:

The information in this article comparing Office 365 subscription plans was gleaned from the following Microsoft documents:

Office 365 Microsoft Sharepoint Online for Enterprises Service Description (Updated 12/6/2011)
Office 365 Microsoft Exchange Online for Enterprises Service Description (Updated 11/18/2011)
Office 365 Microsoft Lync Online for Enterprises Service Description (Updated 7/29/2011)

Microsoft Office 365 for professionals and small businesses (Plan P1) Service Description

    

Microsoft Office 365
for professionals and small businesses
(Plan P1)

Service Description

Note: This document is provided for informational purposes only, and Microsoft makes no warranties, express or implied, with respect to this document or the information contained in it.

Published: February 2012
Updated: May 29, 2012

For the latest information, please see http://www.microsoft.com/online.


The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication and is subject to change at any time without notice to you. This document is provided “as-is.” Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.

This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes. This document is confidential and proprietary to Microsoft. It is disclosed and can be used only pursuant to a non-disclosure agreement.

The descriptions of other companies’ products in this document, if any, are provided only as a convenience to you. Any such references should not be considered an endorsement or support by Microsoft. Microsoft cannot guarantee their accuracy, and the products may change over time. Also, the descriptions are intended as brief highlights to aid understanding, rather than as thorough coverage. For authoritative descriptions of these products, please consult their respective manufacturers.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

All trademarks are the property of their respective companies.

©2012 Microsoft Corporation. All rights reserved.

Microsoft, ActiveSync, Backstage, Entourage, Excel, Forefront, Hotmail, InfoPath, Internet Explorer, Lync, MSN Messenger, OneNote, Outlook, PowerPoint, RoundTable, SharePoint, Silverlight, SkyDrive, SQL Server, Visual Studio, Windows, Windows Live, Windows Mobile, Windows Phone, Windows Server, and Windows Vista are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Contents

Introduction
1. Why Office 365 for Your Organization
1.1 Virtually Anytime, Anywhere Access
1.2 Easy to Use
1.3 Improved Collaboration
1.4 Security and Reliability
2. Overview of Services Provided by Office 365
2.1 Email, Calendar, and Contacts
2.2 Team Sites and Public Website
2.3 Office Web Apps
2.4 Instant Messaging and Online Meetings
3. Requirements for Using Office 365
3.1 System Requirements
3.2 Using Office Desktop Applications
3.3 Using Mobile Devices
4. Office 365 Security
5. Email, Calendar, and Contacts
5.1 Access Your Email, Calendar, and Contacts
5.2 Functionality of Your Outlook Email, Calendar, and Contacts
5.3 Large, Easy-to-Use Mailboxes
5.4 Professional Email Addresses
5.5 Automatically Update Your Email, Calendar, and Contacts across Devices
5.6 See Colleagues’ Availability from Your Outlook Calendar
5.7 Antivirus and Anti-Spam Filtering
5.8 Reduce Inbox Overload with Conversation View
5.9 Set Out-of-Office Replies
5.10 Recover Deleted Items
5.11 Access Other Email Accounts through Office 365
5.12 Personal Archive
5.13 Additional Features
6. Team Sites and Public Websites
6.1 Public-Facing Website
6.2 Manage Important Documents
6.3 Plenty of Space for Your Documents and Sites
6.4 External Sharing
6.5 Microsoft Office Integration
6.7 Familiar Look and Feel
6.8 Data Is Highly Secure
7. Office Web Apps
7.1 Never Be without the Tools You Need
7.2 Ensure Consistent Document Views
7.3 Edit Content with Confidence
7.4 Work Easily with Others
8. Instant Messaging and Online Meetings
8.1 Find and Connect with Colleagues and Customers
8.2 Easily Conduct Professional Online Presentations or Spontaneous Online Meetings
8.3 Interoperability with 3rd Party Dial-in Audio Conferencing Services
8.4 View Presence Status and Click-to-Communicate In Microsoft Office Applications
8.5 Communicate with Other Office 365 and Windows Live Users
8.6 Presence with Microsoft Outlook and Other Office Applications
8.7 Presence with Exchange Online
9. Additional Office 365 Service Details
9.1 Administering Office 365
9.2 Getting Help
9.3 Additional Self-help Resources
9.4 Countries Where Office 365 (Plan P1) Is Available
9.5 Languages
9.6 Licensing
9.7 Buying your Office 365 Subscription
9.8 Microsoft Office 365 Marketplace
9.9 Service Level Agreement
9.10 Data Center Locations
Appendix A: Exchange Online Detailed Feature Summary
Appendix B: SharePoint Online Detailed Feature Summary

Introduction

 

Office 365 for professionals and small businesses (Plan P1) is a set of web-enabled tools that lets you access your email, important documents, contacts, and calendars from virtually anywhere and on almost any device. Designed for organizations with one to 25 employees (with a technical limit of 50 users maximum), the service brings together online versions of the best business-grade communications and collaboration tools from Microsoft, plus Microsoft Office Web Apps, at a price that small businesses can afford. Office 365 works seamlessly with the programs you already know and use — Microsoft Outlook, Microsoft Word, Microsoft Excel, and Microsoft PowerPoint. This is the much-anticipated cloud service that gives small businesses the capabilities and efficiencies to grow and succeed more rapidly.

 

Powerful security features from Microsoft help protect your data, and the service is backed with a 99.9 percent uptime, financially backed guarantee. Office 365 was designed to be easy enough for small businesses to run without specialized IT knowledge.

1. Why Office 365 for Your Organization

1.1 Virtually Anytime, Anywhere Access

Office 365 helps you access your email, important documents, contacts, and calendar on nearly any device from almost anywhere. It frees you to work where and when you choose, allowing you to respond to important requests right away, no matter where you are. Because you can use your mobile device to access email and documents, you won’t have to hurry back to the office (or look for a WIFI hot spot if you are using your computer). When traveling, you can access your email and even edit online documents from most popular web browsers.

1.2 Easy to Use

Office 365 is easy to try, simple to learn, and straightforward to use. It works seamlessly with the programs you know and use most, including Outlook, Word, Excel, OneNote and PowerPoint. With Office 365, you can choose which tools to use.

1.3 Improved Collaboration

With Office 365, you can create a password-protected portal to share large, hard-to-email files both inside and outside your organization, giving you a single location to find the very latest versions of files or documents, no matter how many people are working on them.

1.4 Security and Reliability

Powerful security features from Microsoft help protect your data. Office 365 is backed with a 99.9-percent uptime, financially backed guarantee. Office 365 helps safeguard your data with enterprise-grade reliability, disaster recovery capabilities, data centers in multiple locations, and a strict privacy policy. It also helps protect your email environment with up-to-date antivirus and anti-spam solutions.

2. Overview of Services Provided by Office 365

2.1 Email, Calendar, and Contacts

Powered by Microsoft Exchange Online

Office 365 provides you access to email, calendar, and contacts from virtually anywhere at any time on desktops, laptops, and mobile devices—while helping to protect against malicious software and spam.

  • Easily manage your email with 25-gigabyte (GB) mailboxes and send emails up to 25 megabytes (MB) in size
  • Work from almost anywhere with automatically updated email, calendar, and contacts across devices you use most, including PCs, Macintosh computers, iPhone, Android phones, Blackberry smartphones, Microsoft Windows Mobile®, and Windows® Phones
  • Connect with Microsoft Outlook 2010 or Office Outlook 2007 and use all of the rich Outlook functionality you already know and use, whether you are connected to the Internet at home, or in the office, or you are working offline
  • Access your email, calendar, and contacts from nearly any web browser while enjoying a rich, familiar Outlook experience with Outlook Web App
  • Use your existing domain name to create professional email addresses powered by Exchange Online (for example, mark@contoso.com)
  • Easily schedule meetings by sharing calendars and viewing them side by side, seeing your colleagues’ availability, and suggested meeting times from your calendar
  • Help protect your organization from spam and viruses with Microsoft Forefront® Online Protection for Exchange, which includes multiple filters and virus-scanning engines

2.2 Team Sites and Public Website

Powered by Microsoft SharePoint® Online

SharePoint Online helps you create sites to share documents and information with colleagues and customers. It lets you:

  • Work together effectively by sharing team documents and tracking project milestones to keep everyone in sync
  • Keep your team’s important documents online so the latest versions are always at hand
  • Provide all team members with online access to critical business information whenever and wherever they need it
  • Easily protect critical business information by controlling who can access, read, and share documents and information
  • Market your small business using a simple public-facing website with a custom domain name (for example, www.contoso.com)
  • Publish, share and edit Access database applications on your Team Site

2.3 Office Web Apps

Hosted on Microsoft SharePoint Online

Office Web Apps are convenient online companions to Word, Excel, PowerPoint, and OneNote® that offer you an easy way to access, view, and edit documents directly from your web browser.

  • Work with others simultaneously in Excel spreadsheets and in OneNote notebooks while seeing who is editing what parts of the document
  • Access and view Office documents from your mobile device
  • Ensure that viewers experience great fidelity between documents viewed with the Office Web Apps and those viewed in the desktop Office applications

2.4 Instant Messaging and Online Meetings

Powered by Microsoft Lync Online

Microsoft Lync™ Online helps you find and quickly connect with the right person from within the Office applications you already use.

  • Find and connect with colleagues and customers from virtually anywhere via rich presence, instant messaging (IM), audio/video calls, and online meetings
  • Use the Presence indicator to see when coworkers and partners are online and available
  • Make PC-to-PC audio and video calls with colleagues and customers
  • Conduct rich online meetings—including audio, video, and web conferencing—with people both inside and outside your organization
  • Share your desktop, online whiteboards, and presentations with colleagues and partners inside and outside of your organization
  • Click-to-Communicate with other users of Office 365 and Windows Live™ Messenger

3. Requirements for Using Office 365

3.1 System Requirements

Office 365 works effectively with many combinations of browsers, operating systems, and supporting software. Please refer to System Requirements for Office 365 to view the latest software requirements.

3.2 Using Office Desktop Applications

For the best experience with Office 365, a set of software updates must be applied to each PC. These updates are required for all workstations that use rich clients (such as Microsoft Office 2010) and connect to Office 365 services. To apply these updates, each user should run the Office desktop set-up program, which can be found on the Office 365 home page.

3.3 Using Mobile Devices

You can access the Email, Team Sites, and Instant Messaging capabilities of Office 365 from a variety of phones and mobile devices.

Exchange ActiveSync technology synchronizes mailbox data between mobile devices and Exchange Online, so users can access their email, calendar, contacts, and tasks on the go. Exchange Online also provides better data security features on mobile devices with password enforcement and remote data wiping capabilities.

Team Sites (powered by SharePoint Online) give you a central place to share documents and information with colleagues and customers. Team Sites can render on many devices (including Web-enabled mobile phones) using a simplified text-only format.

The Lync Mobile client lets you send and receive instant messages from your mobile device. Lync Mobile clients are available for the leading smart phone platforms, including Windows Phone, iPhone, Android, and Nokia Symbian.

4. Office 365 Security

Powerful security features from Microsoft help protect your data with security standards that exceed what many businesses can provide for themselves. With high reliability, disaster recovery capabilities, data centers in multiple locations, and a strict privacy policy, your data is more secure. Availability of the services is backed with a 99.9-percent uptime, financially backed Service Level Agreement (SLA). The service includes:

  • Access secure features: Exchange Online is accessed through 128-bit Secure Sockets Layer (SSL) or TLS encryption
  • Intrusion monitoring: Microsoft continuously monitors the Office 365 systems for any unusual or suspicious activity. If Microsoft detects such activity, it investigates and responds appropriately
  • Security audits: Microsoft regularly assesses the Office 365 Services infrastructure to ensure that the latest compliance policies and antivirus signatures are installed, along with high-level configuration settings and required security updates. The Office 365 services have:
    • Achieved ISO 27001 certification
    • Completed SAS70 Type I and II audits
    • Added controls that assist customers in complying with certain regulatory requirements
  • High availability: Office 365 has a 99.9-percent scheduled uptime. If a customer’s service is affected, Office 365 offers a service credit subject to the terms and conditions of the SLA.
  • Business continuity: Redundant network architecture is hosted at geographically dispersed Microsoft data centers to handle unscheduled service outages. Data centers act as backups for each other: If one fails, the affected customers are transferred to another data center with limited interruption of service.

5. Email, Calendar, and Contacts

Powered by Microsoft Exchange Online

Key Features and Benefits

Office 365 messaging services, powered by Exchange Online, provide you with a 25 GB mailbox, contacts, and calendar that is available almost any time and from almost anywhere. Read and reply to your email directly from almost any major smartphone, including iPhone, Android, Nokia, Blackberry, and Windows Phone, or use almost any Macintosh computer or PC.

The following details provide a look at some of the key benefits and capabilities of the messaging services provided by Office 365.

5.1 Access Your Email, Calendar, and Contacts

Microsoft Outlook Web App is a web-based version of Outlook that provides the familiar, rich functionality and experience you are accustomed to from the desktop version of Microsoft Outlook. If you are limited by low bandwidth, Outlook Web App is optimized so it minimizes data and bandwidth use. Cross-browser support for Safari, Firefox, Chrome, and Internet Explorer ensures that wherever you are connected to the Internet—at home, at the office, or on the road—you can access your email through Outlook Web App.

Users can access Outlook Web App from a link on the Office 365 Portal.

 

Figure 1: Access your email from a broad range of browsers with Outlook Web App

5.2 Functionality of Your Outlook Email, Calendar, and Contacts

Office 365 is the only set of services designed to be fully compatible with Microsoft Outlook. Exchange Online works with Outlook 2010 or Office Outlook 2007, making it easier to use the familiar desktop application.

5.3 Large, Easy-to-Use Mailboxes

Exchange Online provides you with 25 GB of mailbox storage. This removes the need to archive email locally with PST files and allows real-time access to your messages from Outlook, a browser or a mobile device. Emails have a size limit of 25 MB, allowing you to send large files, including videos and PowerPoint slides.

5.4 Professional Email Addresses

Use your existing domain name to create professional email addresses powered by Exchange Online (for example, mark@contoso.com). Adding your domain to Office 365 means that you can have your own domain name on email addresses, Lync Online accounts, distribution lists and your public website. When adding a domain, you can also choose to continue to host your public website with another provider.

5.5 Automatically Update Your Email, Calendar, and Contacts across Devices

You can access your email, contacts, and calendar from mobile devices that incorporate Exchange ActiveSync® technology. These devices maintain a connection with the service, receiving any new or updated email messages, calendar items, contacts, or tasks as soon as they arrive on the service. Mobile access is available from a wide range of devices, including iPhone, Android, Nokia, Blackberry, and Windows Phone.

5.6 See Colleagues’ Availability from Your Outlook Calendar

Exchange Online lets you access a consistent calendar from your multiple devices, share your calendar with people inside and outside your company, view multiple calendars side by side, and use the scheduling assistant to view availability and schedule meetings with people inside and outside your company.

5.7 Antivirus and Anti-Spam Filtering

All messages sent through the Exchange Online service are automatically scanned for viruses and malware to help safeguard your data. Exchange Online uses Forefront Online Protection for Exchange—an enterprise-level email filtering technology—to help protect your incoming and outgoing messages. The service uses proprietary anti-spam technology to help achieve high accuracy rates and uses multiple, complementary antivirus engines. Additionally, internal messages are scanned to protect you from viruses that may be sent through email messages within your organization. Antivirus and anti-spam protections are preconfigured and automatically updated, so there are no steps necessary for setting up, configuring, or maintaining the filtering technology.

5.8 Reduce Inbox Overload with Conversation View

By grouping conversations together, you can view messages in context and narrow the number of relevant messages in your inbox. Messages within the conversation are grouped, no matter where the message exists within the mailbox, which helps you and your employees be more productive.

5.9 Set Out-of-Office Replies

With the Exchange Out-of-office feature, you can see if someone is out of office before sending an email message or scheduling an appointment. You can schedule out-of-office messages in advance with specific start and end times. You can configure separate out-of-office messages for users in your company and for external users such as your customers or partners. Junk email and mailing list awareness prevents external out-of-office messages from being sent to extended mailing lists and spammers. You can also format out-of-office messages as rich HTML messages with hyperlinks rather than as plain text. Exchange Online also gives you the ability to set out-of-office messages from mobile devices that support this Exchange ActiveSync feature.

5.10 Recover Deleted Items

Exchange Online enables you to restore items that have been deleted from any email folder—including the Deleted Items folder—in case you accidentally delete an important item. These items are kept in a Recoverable Items folder for 14 days before being permanently removed. You can recover these items yourself using the Recover Deleted Items feature in Outlook Web App or Outlook.

5.11 Access Other Email Accounts through Office 365

You can connect to as many as five email accounts from Outlook Web App, letting you send, receive, and read email messages from those connected accounts in one place.

  • Windows Live Hotmail: You don’t need to turn on POP or IMAP access for a Windows Live Hotmail® account. If you have folders in your Hotmail account, these folders are copied to your account in Outlook Web App along with the email messages downloaded from your Hotmail account.
  • Gmail: Allow POP access from your Gmail account to download mail from the Gmail account to Outlook Web App.
  • Yahoo Mail Plus, Comcast, AOL: These services give you POP access automatically and don’t support IMAP access.
  • IMAP Access: Outlook Web App supports IMAP access for most services, except Gmail. With IMAP access, your folders and mail items within those folders are downloaded to Outlook Web App the same way you see them in your other account. If your other account allows IMAP access, ensure IMAP access is turned on before you connect to the account.

5.12 Personal Archive

Exchange Online offers archiving through the personal archive capabilities of Exchange 2010 to help you store historical data that you rarely access. A personal archive is a specialized mailbox that appears alongside your primary mailbox folders in Outlook or Outlook Web App similar to a personal folder. You can access the archive in the same way you access your normal mailbox. In addition, you can search both your personal archive and primary mailbox.

Outlook 2010 and Outlook Web App provide you with the full features of the personal archive, as well as related features such as retention policies, which can help you organize and clean up your mailbox.

Outlook 2007 provides basic support for the personal archive, but not all features are available in Outlook 2007. For example, with Outlook 2007, you cannot apply retention policies to items in your mailbox.

Administrators can use the Exchange Control Panel to enable the personal archive feature for specific users in your company.

Size of the Personal Archive

Each personal archive can only be used to store one person’s messaging data. You receive 25 GB of storage, which can be used across both your primary mailbox and personal archive.

Importing Data to the Personal Archive

You can import historical data to personal archives in the following four ways:    

  • Import data from a .pst file using Outlook’s Import and Export wizard
  • Drag email messages from .pst files into the archive
  • Drag email messages from your primary mailbox into the archive
  • Set retention policies to automatically move certain email messages from your primary mailbox, based on the age of the messages

5.13 Additional Features

  • Global Address List: A Global Address List gives companies a common directory of all email-enabled users, distribution groups, and external contacts, helping to ensure that users can access the most recent contact information.
  • Resource Mailboxes: Use Outlook or Outlook Web App to schedule use of shared resources, such as a conference room. After setting up the room alias (ex. ConfRm1@contoso.com), users can reserve the room by adding the conference room email alias to meeting requests.
  • Distribution Groups: Distribution groups make it easy to send messages to multiple people. Unlike personal distribution groups that individuals create in Outlook, these distribution groups are available to all users through their Global Address List in Outlook.
  • Integrated Instant Messaging and Presence: Outlook Web App has instant messaging capabilities integrated into the web client, connected to Lync Online. Using the colorful status indicator of another person, users can see who is online and quickly decide if they should send an e-mail or just fire off a quick IM to get a fast response.
  • Message Delivery Status Reports: Flexible message tracking lets you search for the delivery status of email sent to or from users in Exchange Online. A web-based user interface allows administrators to search delivery reports by subject for messages sent within the last two weeks.

 

For a detailed feature summary of Exchange Online, see Appendix A.

6. Team Sites and Public Websites

Powered by Microsoft SharePoint® Online

Key Features and Benefits

Office 365 makes it easy for you to share documents with colleagues, customers, and even trusted business partners. SharePoint Online is designed to work with familiar Office applications. It’s easy to create Office documents and save directly to SharePoint Online or co-author documents with Office Web Apps. Information workers can access important documents offline or from familiar mobile devices and set document-level permissions to protect sensitive content. With one click, it’s possible to communicate in real-time with colleagues and customers from within SharePoint sites.

The following sections provide information about some of the key benefits and capabilities of Team Sites and the public-facing website in Office 365.

6.1 Public-Facing Website

You can easily create a well-designed, public-facing website and apply a custom domain name (for example, www.contoso.com) using the built-in Site Designer tool, which provides many out-of-the-box templates you can use to personalize your site. Public sites built using SharePoint Online are excellent for small businesses that need a simple and attractive site.

6.2 Manage Important Documents

When a single document has multiple contributors, versioning and control issues can quickly become problematic. SharePoint Online provides your Team Sites with built-in document check-in and check-out capabilities that work directly in Microsoft Office 2007, Microsoft Office 2010, and Office Professional Plus. In addition, two or more people can co-author a document using Microsoft Office 2010 and Office Professional Plus or Office Web Apps.

SharePoint Online document libraries can be configured so that revision numbers for documents are automatically updated every time a user checks in a document. You can also easily return to any previous version. Document collaboration in SharePoint Online is a well-developed, flexible feature that you can adjust to meet your specific requirements.

6.3 Plenty of Space for Your Documents and Sites

Each subscription to Office 365 comes with a SharePoint Online site collection that can host multiple sub-sites starting with 10 GB of storage plus 500 MB for each subscriber. For example, if you have 10 users, you would have 15 GB total of storage. This is in addition to the 25 GB each user gets for his or her email.

6.4 External Sharing

You can share documents and information easily with trusted business partners. A team site gives your organization a single location to find the latest versions of files or documents. You can access your team sites and the documents they contain from your web browser and your mobile device and work directly with documents from your Office desktop applications. SharePoint Online allows you to share documents and information more securely with colleagues and customers inside or outside your company. Major benefits of SharePoint Online team sites include:

  • Manage and share important documents to help teams work together
  • Track key project milestones and schedules with shared calendars
  • Create, edit, and review documents and proposals in real-time
  • Share documents and information easily with trusted business partners
  • Manage important meeting notes and project delivery schedules
  • Enable real-time communication with colleagues right from within SharePoint
  • Apply your own unique look and feel to team sites with custom theming and branding

6.5 Microsoft Office Integration

Microsoft Office and SharePoint Online now work better together. In addition to document collaboration and management, new capabilities now enable co-authoring—two or more users can simultaneously work on the same document. With Outlook 2010 or Office Outlook 2007, you can view or edit calendars and contact lists that are stored on SharePoint Online sites and create and manage sites for organizing meetings.

Some highlights of the new functionality in Microsoft Office 2010 and Microsoft Office Professional Plus that interoperate with SharePoint Online include:

  • Backstage View: The Microsoft Office Backstage™ view allows you to manage your documents and related data—you can create, save and send documents, inspect documents for hidden metadata or personal information, and set options such as turning on or off AutoComplete suggestions.
  • Document Co-Authoring: With new co-authoring capabilities, multiple users can edit the same document at the same time, even if they are in different locations. They can even communicate as they work directly from within the desktop application.
  • Outlook: Gain read/write access to SharePoint Online items such as calendars, tasks, contacts, and documents. See complete views of calendars and tasks across multiple lists and sites.
  • Outlook Alerts: You can stay updated on changes to documents and list items on SharePoint sites by receiving notifications of changes as alerts and Really Simple Syndication (RSS).
  • Hosted Access Databases: You can easily publish Access 2010 databases from your desktop up to SharePoint Online using Access Services. You now have a way to create Web-based Access databases that are as easily accessible to your broader peer group as any other site.

6.7 Familiar Look and Feel

Microsoft understands the value of keeping a consistent look and feel to its menus across different applications. When using SharePoint Online, you will find the familiar Ribbon featured in Office 2007 and Office 2010. The Ribbon has the features and the functionality you expect, saving you the time and frustration you may experience working with different online services.

Figure 2: Familiar look and feel with the SharePoint Online Ribbon

6.8 Data Is Highly Secure

All documents that you or your colleagues add to SharePoint Online are scanned for malware using multiple scanning engines. You can control who can access your documents stored in your password-protected sites, and you can further control access within SharePoint Online to designate who can view and edit documents and information.

 

For a detailed feature summary of SharePoint Online, see Appendix B.

7. Office Web Apps

Hosted on Microsoft SharePoint Online

Key Features and Benefits

Office Web Apps help you work with Office documents directly in a browser when you are away from the office or at a shared PC.
Office Web Apps are convenient online companions to Word, Excel, PowerPoint, and OneNote that give you the freedom to view and edit your Office documents from virtually anywhere with a supported browser and to view your documents on a supported mobile device.

The following sections provide information about some of the key benefits and capabilities of Office Web Apps provided by Office 365.

7.1 Never Be without the Tools You Need

If you are away from your office or home, and you find yourself using a computer that doesn’t have Microsoft Office installed, you can use the Office Web Apps to view and edit documents in Word, Excel, PowerPoint, and OneNote. Microsoft SharePoint Online team sites use the Office Web Apps to allow you to access, view, edit, save, and share your stored files from almost any computer with an Internet connection. You can even access and view PowerPoint, Word, and Excel content from a browser on mobile devices.

7.2 Ensure Consistent Document Views

You spend a lot of time making your content look its best, and you want to know that those who view your content are seeing what you intended. Office Web Apps provide professional, high-fidelity viewing of Word, Excel, PowerPoint, and OneNote files. You can take advantage of the rich features in Microsoft Office on your desktop to create content and then share those files online with great document fidelity and consistent formatting.

7.3 Edit Content with Confidence

When you create documents with Microsoft Office on your desktop, you might use rich content and advanced features such as graphics, images, tables of content, and cross-references to add impact to important information. Keep document formatting intact as you edit between the Office Web Apps and the corresponding desktop application.

7.4 Work Easily with Others

Office Web Apps makes it simple to collaborate on documents with people who use different platforms or different versions of Microsoft Office or simply don’t have Office installed on their computer. When you give someone access to your Office documents on SharePoint Online, they can view Microsoft Office documents through a supported Web browser using the Office Web Apps.

 

8. Instant Messaging and Online Meetings

Powered by Microsoft Lync Online

Key Features and Benefits

Microsoft Lync Online is a next-generation online communications service that connects people in new ways anytime from virtually anywhere. Lync Online provides rich and intuitive communications capabilities including presence, IM, audio/video calling, and an online meeting experience that supports audio, video, and web conferencing.

Lync Online transforms interactions with colleagues, customers, and partners from today’s hit-and-miss communications to a more collaborative, engaging, and effective experience that can help your business function more efficiently and cost effectively.

The following sections provide information about some of the key benefits and capabilities of Lync Online provided by Office 365.

8.1 Find and Connect with Colleagues and Customers

Businesses often face communications problems because people must repeatedly attempt to reach each other by phone or email. The problem gets worse when people communicate across geographies and time zones. Lync Online enables you to know when a colleague or partner is available to communicate and to choose the proper communications method (IM, audio/video call, and/or data sharing) to resolve critical business discussions or make time-sensitive decisions.

Figure 4: Lync Online meeting with PC audio, video conferencing, and screen sharing

8.2 Easily Conduct Professional Online Presentations or Spontaneous Online Meetings

With Lync Online, you can have more effective interactions with colleagues and partners by escalating IM sessions into spontaneous online meetings including audio, video, and screen sharing in just a few clicks. You can also conduct professional online presentations with external customers, partners, and colleagues that include data, video, and audio with the ability to control content, annotate, and use a virtual whiteboard.

External attendees can join online meetings to view or share a screen and IM through a web browser. Alternatively, attendees can download and install the free Lync attendee software, which provides full fidelity PC-audio, video, and content sharing capabilities.

8.3 Interoperability with 3rd Party Dial-in Audio Conferencing Services

Dial-in audio conferencing is the ability to dial into a scheduled Lync meeting/conference from fixed-lines or mobile phones. This capability is not provided natively in Lync Online, but can be achieved through leading third-party audio conferencing services. See the Office 365 marketplace listings for more information about this optional interoperability.

8.4 View Presence Status and Click-to-Communicate In Microsoft Office Applications

Collaborating with others can be challenging if your job requires constant use of business productivity applications. Lync Online connects presence and real-time collaboration capabilities with the Microsoft Outlook messaging and collaboration client. This enables higher productivity by allowing you to collaborate using the familiar programs you and your colleagues already use.

8.5 Communicate with Other Office 365 and Windows Live Users

The federation feature of Lync Online establishes trusted relationships between your organization and one or more external organizations. This allows your people to see user presence and communicate across organizational boundaries. Public IM connectivity (PIC) allows your organization to more securely connect its existing base of enterprise-enabled IM users to trusted contacts using public IM services that can be provided by Windows Live Messenger.

All federated communications are encrypted between the IM systems using access proxy servers. Microsoft does not control encryption after messages are passed to the federated partner’s network (if the partner is federated with an on-premises Lync Server or third-party network).

IM federation requires the consent and proper configuration of both parties of the federation relationship. Once the federation is set up by both sides, users in each company can start seeing presence and communicating with users in the other company. Table 2 shows how federation affects IM, presence, and PC-to-PC audio and video.

Table 2: Federation features by link type

Link type | IM and Presence | PC-to-PC Audio and Video
Other companies using Office 365/Lync Online | Yes | Yes
Lync Server 2010 or Office Communications Server on-premises (any version) | Yes | Yes
Windows Live Messenger | Yes | Yes

Works with Office

8.6 Presence with Microsoft Outlook and Other Office Applications

Lync Online can connect presence with Microsoft Office 2007 or Office 2010. You can instantly find and communicate with people from within Office Outlook. This connection occurs wherever you see a colored presence indicator that represents a person’s presence status. You can then click the presence icon and initiate a communication using Lync (this feature is called “click-to-communicate”).

8.7 Presence with Exchange Online

Lync Online connects presence with Exchange Online. This includes presence status in Outlook, presence status changes based on Exchange calendar information, IM and presence in Outlook Web App, out-of-office messages in the Lync client, and click-to-communicate via Lync from Outlook.

 

9. Additional Office 365 Service Details

9.1 Administering Office 365

Office 365 is easy to set up and use. Because it was designed for organizations without IT staff, you can focus on your business rather than on navigating menus or deciphering unfamiliar technical terms. Administration is performed using an intuitive, web-based portal accessible only to those you designate.

As an owner of your organization’s account, you are considered a Global Administrator. Global Administrators can create new user accounts, assign administrative roles to others, and configure the different services included in Office 365. You do not need any special technical expertise to be an administrator for Office 365 for professionals and small businesses.

Figure 5: Office 365 Administration Website


9.2 Getting Help

Customers who purchase Microsoft Office 365 for professionals and small businesses have the Microsoft Office 365 Community (www.community.office365.com) available as the primary way to resolve technical and billing issues. Telephone support for technical questions is not included in the cost of the subscription.

The Office 365 Community

The Microsoft Office 365 Community is a single destination for self-help support information and community discussion. The Microsoft Office 365 Community has the latest information to help customers find answers to a variety of technical, billing and service questions via support forums, wikis, and blogs.

The Office 365 Community is a public website (community.office365.com) and is available 24 hours a day, 7 days a week. The support forums are staffed and moderated by Microsoft Support Agents. Anyone can view and read  the support forums, wikis, and blogs related to Microsoft Office 365. We encourage customers, Microsoft Partners and Microsoft Most Valuable Professionals (MVPs) to engage with the community and contribute to the ongoing discussions. To actively post and reply to discussions within the Community, an individual must register and sign in with a Microsoft Office 365 ID or with a Windows Live™ ID (Hotmail, MSN, Windows Live).

Community Resources

From the Community home page you can access the following resources:

  • Forums are intended to provide Community participants with an online destination where they can post technical support questions and discuss topics related to the Office 365 service. Forums include categories dedicated to each of the individual online services as well as individual topics that our customers find valuable.
  • Wikis include wiki pages created by Microsoft employees and authenticated Community members. This collaborative site encompasses the latest collective content about specific Microsoft Office 365 technical scenarios. Each individual wiki page typically includes links to websites, webcasts, troubleshooting videos, frequently asked questions (FAQ) pages, documents, and downloads about that specific technical scenario. Historical tracking of every revision date and author alias is provided along with the ability to compare versions.
  • Blogs are a good resource for obtaining current information about Microsoft Office 365 online services and for learning about the benefits of Microsoft Office 365 features and functions. Within the Community portal for Microsoft Office 365 are two basic types of blogs: the Microsoft Office 365 Blog and  the Microsoft Office 365 Technical Blog.
  • Microsoft Office 365 Blog focuses on the latest news and product information about Microsoft Office 365. The target audience is people interested in Microsoft Office 365. Sample topics include product insights, new product announcements, customer interviews, and a guest blog series.
  • Microsoft Office 365 Technical Blog helps existing customers with technical tasks or in troubleshooting common issues. The target audience consists of people using, selling, supporting, and developing applications for Microsoft Office 365. Sample topics include troubleshooting videos, technical webcasts, announcements about product feature updates, and showcasing of Microsoft partner technical solutions.

 

Help with Your Bill

Although the community is the primary support vehicle for Office 365 for professionals and small businesses, customers can get help with billing issues by submitting a ticket from the Support Overview page in the Office 365 portal. Customer billing support will respond as appropriate to the severity of the issue by calling the customer, emailing FAQs, or pointing to community support.

9.3 Additional Self-help Resources

Virtual Support Agent

The Virtual Support Agent is an automated support agent that provides online support around the clock, interacting in a natural, conversational style. It is located on the Microsoft Office 365 Support Overview page. Customers use a text-chat interface to type questions in their own words and receive immediate responses. The automated agent has access to a variety of databases built on current content about Microsoft Office 365.

Technical Support Videos

The growing library of English-language-only instructional troubleshooting videos has been developed based on the most commonly asked questions from customers.

To view these videos, go to the Community site and search for videos. Customers are encouraged to submit a request for a video through the Community portal. Customers can also navigate to the Microsoft Office 365 YouTube and Showcase channels.

Learn Through Social Media

Following Microsoft Office 365 on Facebook, Twitter, and LinkedIn provides a way for customers and partners to become more educated about Microsoft Office 365. This fast and easy way of learning about Microsoft Office 365 allows customers to listen to what others are saying and be able to add their own comments and tweets. Microsoft support professionals monitor the Microsoft-related Facebook and Twitter activity to assist with any support-related inquiries.

To find the most current Facebook feeds along with the most recent Tweets, go to the bottom of the Community home page to hear the daily discussions among customers and partners.

9.4 Countries Where Office 365 (Plan P1) Is Available

Office 365 is available in 38 countries: Australia, Austria, Belgium, Canada, Colombia, Costa Rica, Cyprus, Czech Republic, Denmark, Finland, France, Germany, Greece, Hong Kong, Hungary, India, Ireland, Israel, Italy, Japan, Luxembourg, Malaysia, Mexico, Netherlands, New Zealand, Norway, Peru, Poland, Portugal, Puerto Rico, Romania, Singapore, Spain, Sweden, Switzerland, Trinidad & Tobago, United States, and UK.

9.5 Languages

Table 3 summarizes the languages supported by the Office 365 platform and related components.

Table 3: Supported languages for components related to Office 365

Component | Supported languages
Office 365 Portal | English, Japanese, German, French, Italian, traditional Chinese, simplified Chinese, Danish, Dutch, Finnish, Norwegian (Bokmal), Spanish, Swedish, Brazilian, Portuguese, Czech, Greek, Hungarian, Polish, Romanian
Help content | English, Japanese, German, French, Italian, traditional Chinese, simplified Chinese, Danish, Dutch, Finnish, Norwegian (Bokmal), Spanish, Swedish, Brazilian, Portuguese, Czech, Greek, Hungarian, Polish, Romanian
Community | English, Japanese, German, French, Italian, Spanish, traditional Chinese, Korean, Russian
Office desktop setup | English, Japanese, German, French, Italian, traditional Chinese, simplified Chinese, Danish, Dutch, Finnish, Norwegian (Bokmal), Spanish, Swedish, Brazilian, Portuguese, Czech, Greek, Hungarian, Polish, Romanian

 

9.6 Licensing

Office 365 for professionals and small businesses (Plan P1) is designed for 1 to 25 users, but supports up to 50. You can add or remove users at any time, up to that 50-user maximum.

Office 365 for professionals and small businesses is not available under Microsoft Volume Licensing. Subscriptions are available on a month-to-month basis and automatically renew each month. You can cancel at any time with no early termination fee.

Office Professional Plus can be licensed separately from Office 365 for professionals and small businesses.

9.7 Buying your Office 365 Subscription

Office 365 for professionals and small businesses gives you the option to sign up for a 30-day trial period or for a paid subscription. Before signing up, you will be required to sign the Microsoft Online Subscription Agreement (MOSA).

The trial is a free period during which you can experience Office 365 without purchasing a subscription. The trial provides the full functionality of Plan P1, except that it is limited to 10 users. Customers have two options during or at the end of the trial period:

  • Convert an existing trial to paid subscription: If you choose to convert your trial subscription to a paid subscription of the same plan, end users on your trial subscription are automatically transferred (with their data) to the paid subscription.

    Figure 6: Click from within Office 365 to purchase during the 30-day trial period

  • Purchase a new paid subscription: If you choose to purchase a new paid subscription for a different plan, unrelated to your trial subscription, you will need to manually assign users to the paid subscription. Purchasing a new paid subscription does not automatically move their data.

Your subscription term will begin on the day you convert to or purchase the paid plan subscription. Your first bill will occur on the first day of your subscription and subsequent bills will occur on the same day of each subsequent month. Your subscription will auto-renew each month unless you cancel.

 

Canceling your Office 365 Subscription

You can cancel your Office 365 subscription at any time without a penalty. Cancelation is available through the portal under the manage subscriptions tab.

After cancelation, the subscription remains in an active state until the end of the month in which it was canceled. The account then enters a 7-day grace period, during which a warning message is displayed in the portal but end users can continue to access the service. After the 7-day grace period, the service goes into a 90-day disabled state; end users cannot access the service, but the administrator can still access it and retrieve data.

Visit www.Office365.com for the latest pricing information.

9.8 Microsoft Office 365 Marketplace

The Microsoft Office 365 Marketplace is specifically designed to help customers find trusted Microsoft Office 365 experts as well as applications and services that enhance and easily integrate with the Microsoft Office 365 suite of products. For example, customers can find a partner to purchase a custom domain to associate with their Office 365 website and email or audio conferencing providers to add dial-in phone numbers to Lync online meetings. Partners can also help migrate data and set up Office 365 services, so customers can get up and running more quickly.

Visit the Microsoft Office 365 Marketplace at http://office365.pinpoint.microsoft.com.

9.9 Service Level Agreement

Microsoft Online Services guarantees 99.9 percent uptime for all paid Office 365 subscriptions. These service levels are financially backed. That means, if Microsoft does not meet the terms of the Service Level Agreement (SLA), you are eligible to receive service credits equal to a percentage off your total monthly bill.

The following are the service credit tiers for SLA violation:

Monthly Uptime Percentage | Service Credit
< 99.9% | 25%
< 99% | 50%
< 95% | 100%
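
To make the service credit tiers concrete, here is a minimal, purely illustrative C# sketch of the tier lookup. The method name and types are assumptions made for this example; they are not part of the Office 365 service or its SLA documents.

C#

// Illustrative only: map a monthly uptime percentage to the service-credit
// tier shown in the table above (the credit is a percentage off the monthly bill).
static int ServiceCreditPercent(double monthlyUptimePercent)
{
    if (monthlyUptimePercent < 95.0) return 100; // below 95%: full credit
    if (monthlyUptimePercent < 99.0) return 50;  // below 99%: half credit
    if (monthlyUptimePercent < 99.9) return 25;  // below 99.9%: quarter credit
    return 0;                                    // SLA met: no credit
}

For example, a month with 98.5 percent uptime falls into the “< 99%” tier and yields a 50 percent service credit.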

 

 

9.10 Data Center Locations

Microsoft data centers are strategically located throughout the world to provide more secure and seamless, around-the-clock access to your data. Data is replicated to a secondary backup data center within the region to help protect against failures across entire data centers. When your company signs up for Office 365, its hosted environment is automatically provisioned in the appropriate data center based on your company’s address. All users for the company are hosted from the same region.

Appendix A: Exchange Online Detailed Feature Summary

This section presents overviews of Exchange Online features and specifications.

Feature | Description
Mailbox size | 25 GB
Message size limits (max email size) | 25 MB
Recipient limits | 1,500 recipients/day for each cloud-based mailbox
Deleted item recovery | 14 days
Deleted mailbox recovery | 30 days

CLIENT ACCESS
Outlook 2010 support | Yes
Office Outlook 2007 support | Yes
Outlook Anywhere (RPC over HTTPS) | Yes
Outlook Web App Premium experience | Internet Explorer 7+, Safari 3+, Firefox, Chrome
Outlook Web App Light experience | Most other browsers not supported in the Outlook Web App Premium experience
Outlook Web App: session time-out | 6 hours
WebReady document viewing | Yes
Instant messaging and presence integrated into web email client | Yes, with Lync Online
Macintosh support (rich client) | Outlook 2011 for Mac
IMAP | Yes
POP | Yes

MOBILITY
Mobile devices | Windows Mobile and Windows Phone, Nokia E and N series devices, Palm devices, Apple iPhone and iPad, and Blackberry (using Blackberry Internet Service)
Remote device wipe (implementation varies by mobile device manufacturer) | Yes
Disable Exchange ActiveSync access | Yes
Mobile device allow/block/quarantine | Yes
Mobile SMS sync (through Exchange ActiveSync) | Yes
SMS (text messaging) notifications | Yes

EMAIL/INBOX
“Send on behalf of” and “send as” | Yes
Shared mailboxes | Yes
Inbox rules | Yes
Tasks | Yes
Conversation view and actions (such as ignore conversation) | Yes
Connected accounts (aggregate mail from multiple external email accounts) | Yes

CONTACTS/DIRECTORY
Personal contacts | Yes
Personal distribution groups | Yes
Offline Address Book | No
Global Address List (GAL) photos | Yes
External contacts (in GAL) | Yes

CALENDAR
Out-of-office auto replies | Yes
Federated calendar sharing | Yes
Side-by-side calendar view in web client | Yes

SECURITY
Anti-spam (AS) | Forefront Online Protection for Exchange
Antivirus (AV) | Forefront Online Protection for Exchange AV for inbound/outbound, Forefront AV internal

COMPLIANCE/ARCHIVING
Disclaimers | No
Personal archive | Yes

ADMINISTRATION
Administration through a web-based interface (Exchange Control Panel) | Yes

APPLICATION ACCESS/CUSTOMIZATION
Application connectivity through web services | Yes
SMTP relay | Yes
Outlook Web App Web Parts | Yes
Outlook add-ins and Outlook MAPI | Yes

 

 

 

Appendix B: SharePoint Online Detailed Feature Summary

This section presents overviews of SharePoint Online features and specifications.

SharePoint Online feature overview

Feature | Description
Storage | 10 GB with additional 500 MB per user
Buy additional storage | No
Max Org Users | 50
Partner Access Licenses (External Sharing) | Yes – up to 500 external users/month
File upload limit | 250 MB
Works with Microsoft Office 2010 | Access 2010, Excel 2010, Outlook 2010, OneNote 2010, PowerPoint 2010, Microsoft SharePoint Designer 2010, Word 2010, SharePoint Workspace 2010
Browser support | Internet Explorer 7, Internet Explorer 8, Firefox 3, Safari 3.1.2 on Macintosh OS X 10.5
Mobile device support | Windows Mobile 6.5.x, Nokia E series and N series devices, Apple iPhone 2.0
Team Sites | Yes
Simple Public-Facing Website | Basic public site included; vanity URLs (custom domains) are supported
Site Designer | Yes
Sandbox Solutions (Partially Trusted Code) | Yes
Access Services | Yes
Office Web Apps | Available (both read and write access); view only for invited external users

 

Features of Microsoft SharePoint

The following SharePoint features are included: Access Services, Accessibility, Audience Targeting, Basic Sorting, Best Bets, Blogs, Browser-Based Customizations, Client Object Model (OM), Cross-Browser Support, Discussions, Duplicate Detection, Enterprise Scale Search, External Sharing – Partner Access, Improved Governance, Integration with Lync Online and Exchange Online, Language Integrated Query (LINQ) for SharePoint, Large List Scalability and Management, Microsoft Visual Studio® 2010 SharePoint Developer Tools (to build and package Sandbox Solutions), Mobile Connectivity & Search, Multilingual User Interface, Multiple Team Site Templates, Office and Office Web Apps integration, Out-of-the-Box Web Parts, Permissions Management, Phonetics and Nickname Search, Photos and Presence, Recently Authored Content, Ribbon and Dialog Framework, Sandboxed Solutions, Search Scopes, Service Application Platform, SharePoint Designer, SharePoint Lists, SharePoint Ribbon and Fluent UI, SharePoint Service Architecture, SharePoint Workspace, Silverlight Web Part, Simple Public-Facing Website, Single Site Collection Search, Support for Accessibility Standards, View in Browser within Search Results, Wikis, Windows 7 Support, Workflow, and Workflow Models.

Enterprise-oriented SharePoint capabilities such as My Sites, Custom-code Workflows, InfoPath Forms Services, Excel Services, Visio Services, Business Connectivity Services, Advanced Web Analytics, and full-trust code are not included in Office 365 Plan P1.

 


 

PATTERNS OF PARALLEL PROGRAMMING


 

 

UNDERSTANDING AND APPLYING PARALLEL PATTERNS

WITH THE .NET FRAMEWORK 4 AND VISUAL C#

 

 

 

Stephen Toub

Parallel Computing Platform

Microsoft Corporation

 

Abstract

 

This document provides an in-depth tour of support in the Microsoft® .NET Framework 4 for parallel programming. This includes an examination of common parallel patterns and how they’re implemented without and with this new support, as well as best practices for developing parallel components utilizing parallel patterns.

 

Last Updated: July 1, 2010

 

This material is provided for informational purposes only. Microsoft makes no warranties, express or implied.

©2010 Microsoft Corporation.

 

 

TABLE OF CONTENTS

 

 

Introduction
Delightfully Parallel Loops
Fork/Join
Passing Data
Producer/Consumer
Aggregations
MapReduce
Dependencies
Data Sets of Unknown Size
Speculative Processing
Laziness
Shared State
Conclusion

 

 


 

 

INTRODUCTION

Patterns are everywhere, yielding software development best practices and helping to seed new generations of developers with immediate knowledge of established directions on a wide array of problem spaces. Patterns represent successful (or in the case of anti-patterns, unsuccessful) repeated and common solutions developers have applied time and again in particular architectural and programming domains. Over time, these tried and true practices find themselves with names, stature, and variations, helping further to proliferate their application and to jumpstart many a project.

 

Patterns don’t just manifest at the macro level. Whereas design patterns typically cover architectural structure or methodologies, coding patterns and building blocks also emerge, representing typical ways of implementing a specific mechanism. Such patterns typically become ingrained in our psyche, and we code with them on a daily basis without even thinking about it. These patterns represent solutions to common tasks we encounter repeatedly.

 

Of course, finding good patterns can happen only after many successful and failed attempts at solutions. Thus for new problem spaces, it can take some time for them to gain a reputation. Such is where our industry lies today with regards to patterns for parallel programming. While developers in high-performance computing have had to develop solutions for supercomputers and clusters for decades, the need for such experiences has only recently found its way to personal computing, as multi-core machines have become the norm for everyday users. As we move forward with multi-core into the manycore era, ensuring that all software is written with as much parallelism and scalability in mind is crucial to the future of the computing industry. This makes patterns in the parallel computing space critical to that same future.

 

“In general, a ‘multi-core’ chip refers to eight or fewer homogeneous cores in one microprocessor package, whereas a ‘manycore’ chip has more than eight possibly heterogeneous cores in one microprocessor package. In a manycore system, all cores share the resources and services, including memory and disk access, provided by the operating system.” –The Manycore Shift (Microsoft Corp., 2007)

 

In the .NET Framework 4, a slew of new support has been added to handle common needs in parallel programming, to help developers tackle the difficult problem that is programming for multi-core and manycore. Parallel programming is difficult for many reasons and is fraught with perils most developers haven’t had to experience. Issues of races, deadlocks, livelocks, priority inversions, two-step dances, and lock convoys typically have no place in a sequential world, and avoiding such issues makes quality patterns all the more important. This new support in the .NET Framework 4 provides support for key parallel patterns along with building blocks to help enable implementations of new ones that arise.

 

To that end, this document provides an in-depth tour of support in the .NET Framework 4 for parallel programming, common parallel patterns and how they’re implemented without and with this new support, and best practices for developing parallel components in this brave new world.

 

This document only minimally covers the subject of asynchrony for scalable, I/O-bound applications: instead, it focuses predominantly on applications of CPU-bound workloads and of workloads with a balance of both CPU and I/O activity. This document also does not cover Visual F# in Visual Studio 2010, which includes language-based support for several key parallel patterns.

 


 

 

DELIGHTFULLY PARALLEL LOOPS

Arguably the most well-known parallel pattern is that befitting “Embarrassingly Parallel” algorithms. Programs that fit this pattern are able to run well in parallel because the many individual operations being performed may operate in relative independence, with few or no dependencies between operations such that they can be carried out in parallel efficiently. It’s unfortunate that the “embarrassing” moniker has been applied to such programs, as there’s nothing at all embarrassing about them. In fact, if more algorithms and problem domains mapped to the embarrassingly parallel domain, the software industry would be in a much better state of affairs. For this reason, many folks have started using alternative names for this pattern, such as “conveniently parallel,” “pleasantly parallel,” and “delightfully parallel,” in order to exemplify the true nature of these problems. If you find yourself trying to parallelize a problem that fits this pattern, consider yourself fortunate, and expect that your parallelization job will be much easier than it otherwise could have been, potentially even a “delightful” activity.

 

A significant majority of the work in many applications and algorithms is done through loop control constructs. Loops, after all, often enable the application to execute a set of instructions over and over, applying logic to discrete entities, whether those entities are integral values, such as in the case of a for loop, or sets of data, such as in the case of a for each loop. Many languages have built-in control constructs for these kinds of loops, Microsoft Visual C#® and Microsoft Visual Basic® being among them, the former with for and foreach keywords, and the latter with For and For Each keywords. For problems that may be considered delightfully parallel, the entities to be processed by individual iterations of the loops may execute concurrently: thus, we need a mechanism to enable such parallel processing.

 

IMPLEMENTING A PARALLEL LOOPING CONSTRUCT

 

 

As delightfully parallel loops are such a predominant pattern, it’s really important to understand the ins and outs of how they work, and all of the tradeoffs implicit to the pattern. To understand these concepts further, we’ll build a simple parallelized loop using support in the .NET Framework 3.5, prior to the inclusion of the more comprehensive parallelization support introduced in the .NET Framework 4.

 

First, we need a signature. To parallelize a for loop, we’ll implement a method that takes three parameters: a lower-bound, an upper-bound, and a delegate for the loop body that accepts as a parameter an integral value to represent the current iteration index (that delegate will be invoked once for each iteration). Note that we have several options for the behavior of these parameters. With C# and Visual Basic, the vast majority of for loops are written in a manner similar to the following:

 

C#

for (int i = 0; i < upperBound; i++)
{
    // ... loop body here
}

Visual Basic

For i As Integer = 0 To upperBound
    ' ... loop body here
Next

 

Contrary to what a cursory read may tell you, these two loops are not identical: the Visual Basic loop will execute one more iteration than will the C# loop. This is because Visual Basic treats the supplied upper-bound as inclusive, whereas we explicitly specified it in C# to be exclusive through our use of the less-than operator. For example, with an upperBound of 3, the C# loop executes iterations 0, 1, and 2, while the Visual Basic loop executes 0, 1, 2, and 3. For our purposes here, we’ll follow suit to the C# implementation, and we’ll have the upper-bound parameter to our parallelized loop method represent an exclusive upper-bound:

 

C#

public static void MyParallelFor(
    int inclusiveLowerBound, int exclusiveUpperBound, Action<int> body);

 

Our implementation of this method will invoke the body of the loop once per element in the range [inclusiveLowerBound, exclusiveUpperBound), and will do so with as much parallelization as it can muster. To accomplish that, we first need to understand how much parallelization is possible.

 

Wisdom in parallel circles often suggests that a good parallel implementation will use one thread per core. After all, with one thread per core, we can keep all cores fully utilized. Any more threads, and the operating system will need to context switch between them, resulting in wasted overhead spent on such activities; any fewer threads, and there’s no chance we can take advantage of all that the machine has to offer, as at least one core will be guaranteed to go unutilized. This logic has some validity, at least for certain classes of problems. But the logic is also predicated on an idealized and theoretical concept of the machine. As an example of where this notion may break down, to do anything useful threads involved in the parallel processing need to access data, and accessing data requires trips to caches or main memory or disk or the network or other stores that can cost considerably in terms of access times; while such activities are in flight, a CPU may be idle. As such, while a good parallel implementation may assume a default of one-thread-per-core, an open mindedness to other mappings can be beneficial. For our initial purposes here, however, we’ll stick with the one-thread-per-core notion.

 

With the .NET Framework, retrieving the number of logical processors is achieved using the System.Environment class, and in particular its ProcessorCount property. Under the covers, .NET retrieves the corresponding value by delegating to the GetSystemInfo native function exposed from kernel32.dll.

This value doesn’t necessarily correlate to the number of physical processors or even to the number of physical cores in the machine. Rather, it takes into account the number of hardware threads available. As an example, on a machine with two sockets, each with four cores, each with two hardware threads (sometimes referred to as hyperthreads), Environment.ProcessorCount would return 16.

Starting with Windows 7 and Windows Server 2008 R2, the Windows operating system supports greater than 64 logical processors, and by default (largely for legacy application reasons), access to these cores is exposed to applications through a new concept known as “processor groups.” The .NET Framework does not provide managed access to the processor group APIs, and thus Environment.ProcessorCount will return a value capped at 64 (the maximum size of a processor group), even if the machine has a larger number of processors. Additionally, in a 32-bit process, ProcessorCount will be capped further to 32, in order to map well to the 32-bit mask used to represent processor affinity (a requirement that a particular thread be scheduled for execution on only a specific subset of processors).
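
The following is a minimal sketch (not from the original text) of reading this property; the caps described above apply to the value it returns.

C#

// Query the number of logical processors (hardware threads) visible to .NET.
// As noted above, the value is capped at 64 (one processor group), and at 32
// in a 32-bit process.
int numProcs = Environment.ProcessorCount;
Console.WriteLine("Defaulting to {0} worker threads, one per logical processor.", numProcs);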

 


 

 

Once we know the number of processors we want to target, and hence the number of threads, we can proceed to create one thread per core. Each of those threads will process a portion of the input range, invoking the supplied Action<int> delegate for each iteration in that range. Such processing requires another fundamental operation of parallel programming, that of data partitioning. This topic will be discussed in greater depth later in this document; suffice it to say, however, that partitioning is a distinguishing concept in parallel implementations, one that separates it from the larger, containing paradigm of concurrent programming. In concurrent programming, a set of independent operations may all be carried out at the same time. In parallel programming, an operation must first be divided up into individual sub-operations so that each sub-operation may be processed concurrently with the rest; that division and assignment is known as partitioning. For the purposes of this initial implementation, we’ll use a simple partitioning scheme: statically dividing the input range into one range per thread.

 

Here is our initial implementation:

 

C#

public static void MyParallelFor(
    int inclusiveLowerBound, int exclusiveUpperBound, Action<int> body)
{
    // Determine the number of iterations to be processed, the number of
    // cores to use, and the approximate number of iterations to process
    // in each thread.
    int size = exclusiveUpperBound - inclusiveLowerBound;
    int numProcs = Environment.ProcessorCount;
    int range = size / numProcs;

    // Use a thread for each partition. Create them all,
    // start them all, wait on them all.
    var threads = new List<Thread>(numProcs);
    for (int p = 0; p < numProcs; p++)
    {
        int start = p * range + inclusiveLowerBound;
        int end = (p == numProcs - 1) ?
            exclusiveUpperBound : start + range;
        threads.Add(new Thread(() => {
            for (int i = start; i < end; i++) body(i);
        }));
    }
    foreach (var thread in threads) thread.Start();
    foreach (var thread in threads) thread.Join();
}
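
As a quick usage sketch (not part of the original example), the following call squares the integers 0 through 9 in parallel. Because each iteration writes to a distinct array element, the loop body is safe to run concurrently without additional synchronization.

C#

// Hypothetical usage of the MyParallelFor method defined above.
int[] results = new int[10];
MyParallelFor(0, results.Length, i =>
{
    // Each iteration touches only its own element, so no locking is required.
    results[i] = i * i;
});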

 

There are several interesting things to note about this implementation. One is that for each range, a new thread is utilized. That thread exists purely to process the specified partition, and then it terminates. This has several positive and negative implications. The primary positive to this approach is that we have dedicated threading resources for this loop, and it is up to the operating system to provide fair scheduling for these threads across the system. This positive, however, is typically outweighed by several significant negatives. One such negative is the cost of a thread. By default in the .NET Framework 4, a thread consumes a megabyte of stack space, whether or not that space is used for currently executing functions. In addition, spinning up a new thread and tearing one down are relatively costly actions, especially if compared to the cost of a small loop doing relatively few iterations and little work per iteration. Every time we invoke our loop implementation, new threads will be spun up and torn down.

 


 

 

There’s another, potentially more damaging impact: oversubscription. As we move forward in the world of multi-core and into the world of manycore, parallelized components will become more and more common, and it’s quite likely that such components will themselves be used concurrently. If such components each used a loop like the above, and in doing so each spun up one thread per core, we’d have two components each fighting for the machine’s resources, forcing the operating system to spend more time context switching between components. Context switching is expensive for a variety of reasons, including the need to persist details of a thread’s execution prior to the operating system context switching out the thread and replacing it with another. Potentially more importantly, such context switches can have very negative effects on the caching subsystems of the machine. When threads need data, that data needs to be fetched, often from main memory. On modern architectures, the cost of accessing data from main memory is relatively high compared to the cost of running a few instructions over that data. To compensate, hardware designers have introduced layers of caching, which serve to keep small amounts of frequently-used data in hardware significantly less expensive to access than main memory. As a thread executes, the caches for the core on which it’s executing tend to fill with data appropriate to that thread’s execution, improving its performance. When a thread gets context switched out, the caches will shift to containing data appropriate to that new thread. Filling the caches requires more expensive trips to main memory. As a result, the more context switches there are between threads, the more expensive trips to main memory will be required, as the caches thrash on the differing needs of the threads using them. Given these costs, oversubscription can be a serious cause of performance issues. Luckily, the new concurrency profiler views in Visual Studio 2010 can help to identify these issues, as shown here:

 

 

In this screenshot, each horizontal band represents a thread, with time on the x-axis. Green is execution time, red is time spent blocked, and yellow is time where the thread could have run but was preempted by another thread. The more yellow there is, the more oversubscription there is hurting performance.

 

To compensate for these costs associated with using dedicated threads for each loop, we can resort to pools of

threads. The system can manage the threads in these pools, dispatching the threads to access work items queued

for their processing, and then allowing the threads to return to the pool rather than being torn down. This

addresses many of the negatives outlined previously. As threads aren’t constantly being created and torn down,

the cost of their life cycle is amortized over all the work items they process. Moreover, the manager of the thread

pool can enforce an upper-limit on the number of threads associated with the pool at any one time, placing a limit

on the amount of memory consumed by the threads, as well as on how much oversubscription is allowed.

 

Ever since the .NET Framework 1.0, the System.Threading.ThreadPool class has provided just such a thread pool,

and while the implementation has changed from release to release (and significantly so for the .NET Framework 4),

the core concept has remained constant: the .NET Framework maintains a pool of threads that service work items

provided to it. The main method for doing this is the static QueueUserWorkItem. We can use that support in a

revised implementation of our parallel for loop:

 


 

 

C#

public static void MyParallelFor(
    int inclusiveLowerBound, int exclusiveUpperBound, Action<int> body)
{
    // Determine the number of iterations to be processed, the number of
    // cores to use, and the approximate number of iterations to process in
    // each thread.
    int size = exclusiveUpperBound - inclusiveLowerBound;
    int numProcs = Environment.ProcessorCount;
    int range = size / numProcs;

    // Keep track of the number of threads remaining to complete.
    int remaining = numProcs;
    using (ManualResetEvent mre = new ManualResetEvent(false))
    {
        // Create each of the threads.
        for (int p = 0; p < numProcs; p++)
        {
            int start = p * range + inclusiveLowerBound;
            int end = (p == numProcs - 1) ?
                exclusiveUpperBound : start + range;
            ThreadPool.QueueUserWorkItem(delegate {
                for (int i = start; i < end; i++) body(i);
                if (Interlocked.Decrement(ref remaining) == 0) mre.Set();
            });
        }

        // Wait for all threads to complete.
        mre.WaitOne();
    }
}

 

 

This removes the inefficiencies in our application related to excessive thread creation and tear down, and it minimizes the possibility of oversubscription. However, this inefficiency was just one problem with the implementation: another potential problem has to do with the static partitioning we employed. For workloads that entail approximately the same amount of work per iteration, and when running on a relatively "quiet" machine (meaning a machine doing little else besides the target workload), static partitioning represents an effective and efficient way to partition our data set. However, if the workload is not equivalent for each iteration, either due to the nature of the problem or due to certain partitions completing more slowly because they were preempted by other significant work on the system, we can quickly find ourselves with a load imbalance. The pattern of a load imbalance is very visible in the following visualization as rendered by the concurrency profiler in Visual Studio 2010.

 


 

 

In this output from the profiler, the x-axis is time and the y-axis is the number of cores utilized at that time in the application's execution. Green is utilization by our application, yellow is utilization by another application, red is utilization by a system process, and grey is idle time. This trace resulted from the unfortunate assignment of different amounts of work to each of the partitions; thus, some of those partitions completed processing sooner than the others. Remember back to our earlier assertions about using fewer threads than there are cores to do work? We've now degraded to that situation: for a portion of this loop's execution, we were executing with fewer cores than were available.

 

By way of example, let’s consider a parallel loop from 1 to 12 (inclusive on both ends), where each iteration does N

seconds of work with N defined as the loop iteration value (that is, iteration #1 will require 1 second of

computation, iteration #2 will require two seconds, and so forth). All in all, this loop will require ((12*13)/2) == 78

seconds of sequential processing time. In an ideal loop implementation on a dual core system, we could finish this

loop’s processing in 39 seconds. This could be accomplished by having one core process iterations 6, 10, 11, and

12, with the other core processing the rest of the iterations.

 

[Figure: ideal assignment of iterations 1–12 across two cores]

However, with the static partitioning scheme we’ve employed up until this point, one core will be assigned the

range [1,6] and the other the range [7,12].

 

[Figure: static range partitioning of iterations 1–12 into [1,6] and [7,12]]


 

 

As such, the first core will have 21 seconds worth of work, leaving the latter core 57 seconds worth of work. Since the loop isn't finished until all iterations have been processed, our loop's processing time is limited by the maximum processing time of each of the two partitions, and thus our loop completes in 57 seconds instead of the aforementioned possible 39 seconds. This represents an approximate 50 percent decrease in potential performance, due solely to an inefficient partitioning. Now you can see why partitioning has such a fundamental place in parallel programming.

 

Different variations on static partitioning are possible. For example, rather than assigning ranges, we could use a form of round-robin, where each thread has a unique identifier in the range [0, # of threads), and where each thread processes the indices from the loop whose index mod the number of threads matches the thread's identifier. For example, with the iteration space [0, 12) and with four threads, thread #0 would process iteration values 0, 3, 6, and 9; thread #1 would process iteration values 1, 4, 7, and 10; and so on. If we were to apply this kind of round-robin partitioning to the previous example, instead of one thread taking 21 seconds and the other taking 57 seconds, one thread would require 36 seconds and the other 42 seconds, resulting in a much smaller discrepancy from the optimal runtime of 39 seconds.

 

[Figure: round-robin assignment of iterations 1–12 across two threads]
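To make the round-robin scheme concrete, here's a minimal, hedged sketch (not from the original text) of such a variant, reusing the thread-per-partition approach shown earlier; the name MyParallelForRoundRobin is purely illustrative:

C#

// Illustrative sketch: round-robin static partitioning, where thread p
// processes every numProcs-th iteration starting at its own identifier.
public static void MyParallelForRoundRobin(
    int inclusiveLowerBound, int exclusiveUpperBound, Action<int> body)
{
    int numProcs = Environment.ProcessorCount;
    var threads = new List<Thread>(numProcs);
    for (int p = 0; p < numProcs; p++)
    {
        int id = p; // capture this thread's identifier
        threads.Add(new Thread(() => {
            // Thread 'id' handles iterations id, id + numProcs, id + 2*numProcs, ...
            for (int i = inclusiveLowerBound + id; i < exclusiveUpperBound; i += numProcs)
                body(i);
        }));
    }
    foreach (var thread in threads) thread.Start();
    foreach (var thread in threads) thread.Join();
}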

To do the best static partitioning possible, you need to be able to accurately predict ahead of time how long all the

iterations will take. That’s rarely feasible, resulting in a need for a more dynamic partitioning, where the system

can adapt to changing workloads quickly. We can address this by shifting to the other end of the partitioning

tradeoffs spectrum, with as much load-balancing as possible.

 

[Figure: Spectrum of Partitioning Tradeoffs — from fully static (less synchronization) to fully dynamic (more load balancing)]

To do that, rather than pushing to each of the threads a given set of indices to process, we can have the threads

compete for iterations. We employ a pool of the remaining iterations to be processed, which initially starts filled

with all iterations. Until all of the iterations have been processed, each thread goes to the iteration pool, removes

an iteration value, processes it, and then repeats. In this manner, we can achieve in a greedy fashion an

approximation for the optimal level of load-balancing possible (the true optimum could only be achieved with a

priori knowledge of exactly how long each iteration would take). If a thread gets stuck processing a particular long

iteration, the other threads will compensate by processing work from the pool in the meantime. Of course, even

with this scheme you can still find yourself with a far from optimal partitioning (which could occur if one thread

happened to get stuck with several pieces of work significantly larger than the rest), but without knowledge of how

much processing time a given piece of work will require, there’s little more that can be done.

 


 

 

Here’s an example implementation that takes load-balancing to this extreme. The pool of iteration values is

maintained as a single integer representing the next iteration available, and the threads involved in the processing

“remove items” by atomically incrementing this integer:

 

C#

public static void MyParallelFor(
    int inclusiveLowerBound, int exclusiveUpperBound, Action<int> body)
{
    // Get the number of processors, initialize the number of remaining
    // threads, and set the starting point for the iteration.
    int numProcs = Environment.ProcessorCount;
    int remainingWorkItems = numProcs;
    int nextIteration = inclusiveLowerBound;

    using (ManualResetEvent mre = new ManualResetEvent(false))
    {
        // Create each of the work items.
        for (int p = 0; p < numProcs; p++)
        {
            ThreadPool.QueueUserWorkItem(delegate
            {
                int index;
                while ((index = Interlocked.Increment(
                           ref nextIteration) - 1) < exclusiveUpperBound)
                {
                    body(index);
                }
                if (Interlocked.Decrement(ref remainingWorkItems) == 0)
                    mre.Set();
            });
        }

        // Wait for all threads to complete.
        mre.WaitOne();
    }
}

 

 

This is not a panacea, unfortunately. We've gone to the other end of the spectrum, gaining quality load-balancing at the cost of additional overheads. In our previous static partitioning implementations, threads were assigned ranges and were then able to process those ranges completely independently from the other threads. There was no need to synchronize with other threads in order to determine what to do next, because every thread could determine independently what work it needed to get done. For workloads that have a lot of work per iteration, the cost of synchronizing between threads so that each can determine what to do next is negligible. But for workloads that do very little work per iteration, that synchronization cost can be so expensive (relatively) as to overshadow the actual work being performed by the loop. This can make it more expensive to execute in parallel than to execute serially.

 

Consider an analogy: shopping with some friends at a grocery store. You come into

the store with a grocery list, and you rip the list into one piece per friend, such that

every friend is responsible for retrieving the elements on his or her list. If the amount

of time required to retrieve the elements on each list is approximately the same as on

every other list, you’ve done a good job of partitioning the work amongst your team,

and will likely find that your time at the store is significantly less than if you had done

 


 

 

all of the shopping yourself. But now suppose that each list is not well balanced, with

all of the items on one friend’s list spread out over the entire store, while all of the

items on another friend’s list are concentrated in the same aisle. You could address

this inequity by assigning out one element at a time. Every time a friend retrieves a

food item, he or she brings it back to you at the front of the store and determines in

conjunction with you which food item to retrieve next. If a particular food item takes

a particularly long time to retrieve, such as ordering a custom cut piece of meat at

the deli counter, the overhead of having to go back and forth between you and the

merchandise may be negligible. For simply retrieving a can from a shelf, however, the

overhead of those trips can be dominant, especially if multiple items to be retrieved

from a shelf were near each other and could have all been retrieved in the same trip

with minimal additional time. You could spend so much time (relatively) parceling out

work to your friends and determining what each should buy next that it would be

faster for you to just grab all of the food items in your list yourself.

 

Of course, we don’t need to pick one extreme or the other. As with most patterns, there are variations on themes.

For example, in the grocery store analogy, you could have each of your friends grab several items at a time, rather

than grabbing one at a time. This amortizes the overhead across the size of a batch, while still having some amount

of dynamism:

 

C#

public static void MyParallelFor(
    int inclusiveLowerBound, int exclusiveUpperBound, Action<int> body)
{
    // Get the number of processors, initialize the number of remaining
    // threads, and set the starting point for the iteration.
    int numProcs = Environment.ProcessorCount;
    int remainingWorkItems = numProcs;
    int nextIteration = inclusiveLowerBound;
    const int batchSize = 3;

    using (ManualResetEvent mre = new ManualResetEvent(false))
    {
        // Create each of the work items.
        for (int p = 0; p < numProcs; p++)
        {
            ThreadPool.QueueUserWorkItem(delegate
            {
                int index;
                while ((index = Interlocked.Add(
                           ref nextIteration, batchSize) - batchSize)
                       < exclusiveUpperBound)
                {
                    // In a real implementation, we'd need to handle
                    // overflow on this arithmetic.
                    int end = index + batchSize;
                    if (end >= exclusiveUpperBound) end = exclusiveUpperBound;
                    for (int i = index; i < end; i++) body(i);
                }
                if (Interlocked.Decrement(ref remainingWorkItems) == 0)
                    mre.Set();
            });
        }

        // Wait for all threads to complete.
        mre.WaitOne();
    }
}

 

 

No matter what tradeoffs you make between overheads and load-balancing, they are tradeoffs. For a particular

problem, you might be able to code up a custom parallel loop algorithm mapping to this pattern that suits your

particular problem best. That could result in quite a bit of custom code, however. In general, a good solution is one

that provides quality results for most problems, minimizing overheads while providing sufficient load-balancing,

and the .NET Framework 4 includes just such an implementation in the new System.Threading.Tasks.Parallel class.

 

PARALLEL.FOR

 

 

As delightfully parallel problems represent one of the most common patterns in parallel programming, it’s natural

that when support for parallel programming is added to a mainstream library, support for delightfully parallel

loops is included. The .NET Framework 4 provides this in the form of the static Parallel class in the new

System.Threading.Tasks namespace in mscorlib.dll. The Parallel class provides just three methods, albeit each

with several overloads. One of these methods is For, providing multiple signatures, one of which is almost identical

to the signature for MyParallelFor shown previously:

 

C#

 

public static ParallelLoopResult For(

int fromInclusive, int toExclusive, Action<int> body);

 

As with our previous implementations, the For method accepts three parameters: an inclusive lower-bound, an

exclusive upper-bound, and a delegate to be invoked for each iteration. Unlike our implementations, it also returns

a ParallelLoopResult value type, which contains details on the completed loop; more on that later.

 

Internally, the For method performs in a manner similar to our previous implementations. By default, it uses work

queued to the .NET Framework ThreadPool to execute the loop, and with as much parallelism as it can muster, it

invokes the provided delegate once for each iteration. However, Parallel.For and its overload set provide a whole

lot more than this:

 

• Exception handling. If one iteration of the loop throws an exception, all of the threads participating in the loop attempt to stop processing as soon as possible (by default, iterations currently executing will not be interrupted, but the loop control logic tries to prevent additional iterations from starting). Once all processing has ceased, all unhandled exceptions are gathered and thrown in aggregate in an AggregateException instance. This exception type provides support for multiple "inner exceptions," whereas most .NET Framework exception types support only a single inner exception. For more information about AggregateException, see http://msdn.microsoft.com/magazine/ee321571.aspx. (A brief usage sketch follows this list.)

• Breaking out of a loop early. This is supported in a manner similar to the break keyword in C# and the Exit For construct in Visual Basic. Support is also provided for understanding whether the current iteration should abandon its work because of occurrences in other iterations that will cause the loop to end early. This is the primary reason for the ParallelLoopResult return value, shown in the Parallel.For signature, which helps a caller to understand if a loop ended prematurely, and if so, why.

• Long ranges. In addition to overloads that support working with Int32-based ranges, overloads are provided for working with Int64-based ranges.

• Thread-local state. Several overloads provide support for thread-local state. More information on this support will be provided later in this document in the section on aggregation patterns.

• Configuration options. Multiple aspects of a loop's execution may be controlled, including limiting the number of threads used to process the loop.

• Nested parallelism. If you use a Parallel.For loop within another Parallel.For loop, they coordinate with each other to share threading resources. Similarly, it's OK to use two Parallel.For loops concurrently, as they'll work together to share threading resources in the underlying pool rather than both assuming they own all cores on the machine.

• Dynamic thread counts. Parallel.For was designed to accommodate workloads that change in complexity over time, such that some portions of the workload may be more compute-bound than others. As such, it may be advantageous for the number of threads involved in the processing to change over time, rather than being statically set, as was done in all of our implementations shown earlier.

• Efficient load balancing. Parallel.For supports load balancing in a very sophisticated manner, much more so than the simple mechanisms shown earlier. It takes into account a large variety of potential workloads and tries to maximize efficiency while minimizing overheads. The partitioning implementation is based on a chunking mechanism where the chunk size increases over time. This helps to ensure quality load balancing when there are only a few iterations, while minimizing overhead when there are many. In addition, it tries to ensure that most of a thread's iterations are focused in the same region of the iteration space in order to provide high cache locality.
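As a brief, hedged illustration of the exception-handling and configuration points above (this sketch is not from the original text), a caller might catch the AggregateException thrown by Parallel.For and cap the thread count through ParallelOptions:

C#

// Illustrative sketch: handling AggregateException from Parallel.For and
// limiting parallelism via ParallelOptions.MaxDegreeOfParallelism.
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
try
{
    Parallel.For(0, N, options, i =>
    {
        // … Loop body; may throw.
    });
}
catch (AggregateException ae)
{
    foreach (var exc in ae.InnerExceptions)
    {
        // … Examine or log each unhandled exception from the iterations.
    }
}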

 

Parallel.For is applicable to a wide range of delightfully parallel problems, serving as an implementation of this quintessential pattern. As an example of its application, the parallel programming samples for the .NET Framework 4 (available at http://code.msdn.microsoft.com/ParExtSamples) include a ray tracer. Here's a screenshot:

 

 

Ray tracing is fundamentally a delightfully parallel problem. Each individual pixel in the image is generated by firing

an imaginary ray of light, examining the color of that ray as it bounces off of and through objects in the scene, and

storing the resulting color. Every pixel is thus independent of every other pixel, allowing them all to be processed

in parallel. Here are the relevant code snippets from that sample:

 

C#

void RenderSequential(Scene scene, Int32[] rgb)
{
    Camera camera = scene.Camera;
    for (int y = 0; y < screenHeight; y++)
    {
        int stride = y * screenWidth;
        for (int x = 0; x < screenWidth; x++)
        {
            Color color = TraceRay(
                new Ray(camera.Pos, GetPoint(x, y, camera)), scene, 0);
            rgb[x + stride] = color.ToInt32();
        }
    }
}

void RenderParallel(Scene scene, Int32[] rgb)
{
    Camera camera = scene.Camera;
    Parallel.For(0, screenHeight, y =>
    {
        int stride = y * screenWidth;
        for (int x = 0; x < screenWidth; x++)
        {
            Color color = TraceRay(
                new Ray(camera.Pos, GetPoint(x, y, camera)), scene, 0);
            rgb[x + stride] = color.ToInt32();
        }
    });
}

 

Notice that there are very few differences between the sequential and parallel implementations; the change is limited to replacing the C# for (or Visual Basic For) language construct with the Parallel.For method call.

 

PARALLEL.FOREACH

 

 

A for loop is a very specialized loop. Its purpose is to iterate through a specific kind of data set, a data set made up

of numbers that represent a range. The more generalized concept is iterating through any data set, and constructs

for such a pattern exist in C# with the foreach keyword and in Visual Basic with the For Each construct.

 

Consider the following for loop:

 

C#

for (int i = 0; i < 10; i++)
{
    // … Process i.
}

 

Using the Enumerable class from LINQ, we can generate an IEnumerable<int> that represents the same range, and

iterate through that range using a foreach:

 

C#

foreach (int i in Enumerable.Range(0, 10))
{
    // … Process i.
}

 

We can accomplish much more complicated iteration patterns by changing the data returned in the enumerable.

Of course, as it is a generalized looping construct, we can use a foreach to iterate through any enumerable data

set. This makes it very powerful, and a parallelized implementation is similarly quite powerful in the parallel realm.

As with a parallel for, a parallel for each represents a fundamental pattern in parallel programming.

 


 

 

Implementing a parallel for each is similar in concept to implementing a parallel for. You need multiple threads to

process data in parallel, and you need to partition the data, assigning the partitions to the threads doing the

processing. In our dynamically partitioned MyParallelFor implementation, the data set remaining was represented

by a single integer that stored the next iteration. In a for each implementation, we can store it as an

IEnumerator<T> for the data set. This enumerator must be protected by a critical section so that only one thread

at a time may mutate it. Here is an example implementation:

 

C#

public static void MyParallelForEach<T>(
    IEnumerable<T> source, Action<T> body)
{
    int numProcs = Environment.ProcessorCount;
    int remainingWorkItems = numProcs;

    using (var enumerator = source.GetEnumerator())
    {
        using (ManualResetEvent mre = new ManualResetEvent(false))
        {
            // Create each of the work items.
            for (int p = 0; p < numProcs; p++)
            {
                ThreadPool.QueueUserWorkItem(delegate
                {
                    // Iterate until there's no more work.
                    while (true)
                    {
                        // Get the next item under a lock,
                        // then process that item.
                        T nextItem;
                        lock (enumerator)
                        {
                            if (!enumerator.MoveNext()) break;
                            nextItem = enumerator.Current;
                        }
                        body(nextItem);
                    }
                    if (Interlocked.Decrement(ref remainingWorkItems) == 0)
                        mre.Set();
                });
            }

            // Wait for all threads to complete.
            mre.WaitOne();
        }
    }
}

 

 

As with the MyParallelFor implementations shown earlier, there are lots of implicit tradeoffs being made in this implementation, and as with MyParallelFor, they all come down to tradeoffs between simplicity, overheads, and load balancing. Taking locks is expensive, and this implementation is taking and releasing a lock for each element in the enumerable; while costly, this does enable the utmost in load balancing, as every thread only grabs one item at a time, allowing other threads to assist should one thread run into an unexpectedly expensive element. We could trade off some cost for some load balancing by retrieving multiple items (rather than just one) while holding the lock. By acquiring the lock, obtaining multiple items from the enumerator, and then releasing the lock, we amortize the cost of acquisition and release over multiple elements, rather than paying the cost for each element. This benefit comes at the expense of less load balancing, since once a thread has grabbed several items, it is responsible for processing all of those items, even if some of them happen to be more expensive than the bulk of the others.
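To make that batching idea concrete, here's a minimal, hedged sketch (not from the original text) of what the body of each work item in the implementation above might look like if it pulled up to a few items per lock acquisition; the batch size of 3 is arbitrary:

C#

// Illustrative sketch: each worker takes up to 'batchSize' items per lock
// acquisition, amortizing the lock cost across several elements. This
// fragment replaces the while(true) loop inside the queued delegate above.
const int batchSize = 3;
var batch = new List<T>(batchSize);
while (true)
{
    batch.Clear();
    lock (enumerator)
    {
        // Pull up to batchSize items while holding the lock once.
        while (batch.Count < batchSize && enumerator.MoveNext())
            batch.Add(enumerator.Current);
    }
    if (batch.Count == 0) break; // No more work remains.
    foreach (T item in batch) body(item);
}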

 

We can decrease costs in other ways, as well. For example, the implementation shown previously always uses the

enumerator’s MoveNext/Current support, but it might be the case that the source input IEnumerable<T> also

implements the IList<T> interface, in which case the implementation could use less costly partitioning, such as that

employed earlier by MyParallelFor:

 

C#

public static void MyParallelForEach<T>(IEnumerable<T> source, Action<T> body)
{
    IList<T> sourceList = source as IList<T>;
    if (sourceList != null)
    {
        // This assumes the IList<T> implementation's indexer is safe
        // for concurrent get access.
        MyParallelFor(0, sourceList.Count, i => body(sourceList[i]));
    }
    else
    {
        // …
    }
}

 

 

As with Parallel.For, the .NET Framework 4’s Parallel class provides support for this pattern, in the form of the

ForEach method. Overloads of ForEach provide support for many of the same things for which overloads of For

provide support, including breaking out of loops early, sophisticated partitioning, and thread count dynamism. The

simplest overload of ForEach provides a signature almost identical to the signature shown above:

 

C#

 

public static ParallelLoopResult ForEach<TSource>(

IEnumerable<TSource> source, Action<TSource> body);

 

 

As an example application, consider a Student record that contains a settable GradePointAverage property as well

as a readable collection of Test records, each of which has a grade and a weight. We have a set of such student

records, and we want to iterate through each, calculating each student’s grades based on the associated tests.

Sequentially, the code looks as follows:

 

C#

 

foreach (var student in students)

{

student.GradePointAverage =

student.Tests.Select(test => test.Grade * test.Weight).Sum();

}

 

To parallelize this, we take advantage of Parallel.ForEach:

 

C#

Parallel.ForEach(students, student =>
{
    student.GradePointAverage =
        student.Tests.Select(test => test.Grade * test.Weight).Sum();
});

 

PROCESSING NON-INTEGRAL RANGES

 

The Parallel class in the .NET Framework 4 provides overloads for working with ranges of Int32 and Int64 values.

However, for loops in languages like C# and Visual Basic can be used to iterate through non-integral ranges.

 

Consider a type Node<T> that represents a linked list:

 

C#

class Node<T>
{
    public Node<T> Prev, Next;
    public T Data;
}

 

Given an instance head of such a Node<T>, we can use a for loop to iterate through the list:

 

C#

for (Node<T> i = head; i != null; i = i.Next)
{
    // … Process node i.
}

 

Parallel.For does not contain overloads for working with Node<T>, and Node<T> does not implement

IEnumerable<T>, preventing its direct usage with Parallel.ForEach. To compensate, we can use C# iterators to

create an Iterate method which will yield an IEnumerable<T> to iterate through the Node<T>:

 

C#

public static IEnumerable<Node<T>> Iterate(Node<T> head)
{
    for (Node<T> i = head; i != null; i = i.Next)
    {
        yield return i;
    }
}

 

 

With such a method in hand, we can now use a combination of Parallel.ForEach and Iterate to approximate a

Parallel.For implementation that does work with Node<T>:

 

C#

Parallel.ForEach(Iterate(head), i =>
{
    // … Process node i.
});

 

This same technique can be applied to a wide variety of scenarios. Keep in mind, however, that the IEnumerator<T> interface isn't thread-safe, which means that Parallel.ForEach needs to take locks when accessing the data source. While ForEach internally uses some smarts to try to amortize the cost of such locks over the processing, this is still overhead that needs to be overcome by more work in the body of the ForEach in order for good speedups to be achieved.

 

Parallel.ForEach has optimizations used when working on data sources that can be indexed into, such as lists and

arrays, and in those cases the need for locking is decreased (this is similar to the example implementation shown

previously, where MyParallelForEach was able to use MyParallelFor in processing an IList<T>). Thus, even though

there is both time and memory cost associated with creating an array from an enumerable, performance may

actually be improved in some cases by transforming the iteration space into a list or an array, which can be done

using LINQ. For example:

 

C#

Parallel.ForEach(Iterate(head).ToArray(), i =>
{
    // … Process node i.
});

 

The format of a for construct in C# and a For in Visual Basic may also be generalized into a generic Iterate method:

 

C#

public static IEnumerable<T> Iterate<T>(
    Func<T> initialization, Func<T, bool> condition, Func<T, T> update)
{
    for (T i = initialization(); condition(i); i = update(i))
    {
        yield return i;
    }
}

 

While incurring extra overheads for all of the delegate invocations, this now also provides a generalized

mechanism for iterating. The Node<T> example can be re-implemented as follows:

 

C#

Parallel.ForEach(Iterate(() => head, i => i != null, i => i.Next), i =>
{
    // … Process node i.
});

 

BREAKING OUT OF LOOPS EARLY

 

 

Exiting out of loops early is a fairly common pattern, one that doesn’t go away when parallelism is introduced. To

help simplify these use cases, the Parallel.For and Parallel.ForEach methods support several mechanisms for

breaking out of loops early, each of which has different behaviors and targets different requirements.

 

PLANNED EXIT

 

 


 

 

Several overloads of Parallel.For and Parallel.ForEach pass a ParallelLoopState instance to the body delegate.

Included in this type’s surface area are four members relevant to this discussion: methods Stop and Break, and

properties IsStopped and LowestBreakIteration.

 

When an iteration calls Stop, the loop control logic will attempt to prevent additional iterations of the loop from

starting. Once there are no more iterations executing, the loop method will return successfully (that is, without an

exception). The return type of Parallel.For and Parallel.ForEach is a ParallelLoopResult value type: if Stop caused

the loop to exit early, the result’s IsCompleted property will return false.

 

C#

ParallelLoopResult loopResult =
    Parallel.For(0, N, (int i, ParallelLoopState loop) =>
    {
        // …
        if (someCondition)
        {
            loop.Stop();
            return;
        }
        // …
    });
Console.WriteLine("Ran to completion: " + loopResult.IsCompleted);

 

 

For long running iterations, the IsStopped property enables one iteration to detect when another iteration has

called Stop in order to bail earlier than it otherwise would:

 

C#

ParallelLoopResult loopResult =
    Parallel.For(0, N, (int i, ParallelLoopState loop) =>
    {
        // …
        if (someCondition)
        {
            loop.Stop();
            return;
        }
        // …
        while (true)
        {
            if (loop.IsStopped) return;
            // …
        }
    });

 

 

Break is very similar to Stop, except Break provides additional guarantees. Whereas Stop informs the loop control logic that no more iterations need be run, Break informs the control logic that no iterations after the current one need be run (for example, where the iteration number is higher or where the data comes after the current element in the data source), but that iterations prior to the current one still need to be run. It doesn't guarantee that iterations after the current one haven't already run or started running, though it will try to avoid more starting after the current one. Break may be called from multiple iterations, and the lowest iteration from which Break was called is the one that takes effect; this iteration number can be retrieved from the ParallelLoopState's LowestBreakIteration property, a nullable value. ParallelLoopResult offers a similar LowestBreakIteration property.

 


 

 

This leads to a decision matrix that can be used to interpret a ParallelLoopResult:

 

• IsCompleted == true
  o All iterations were processed.
  o If IsCompleted == true, LowestBreakIteration.HasValue will be false.
• IsCompleted == false && LowestBreakIteration.HasValue == false
  o Stop was used to exit the loop early.
• IsCompleted == false && LowestBreakIteration.HasValue == true
  o Break was used to exit the loop early, and LowestBreakIteration.Value contains the lowest iteration from which Break was called.

Here is an example of using Break with a loop:

 

C#

var output = new TResult[N];
var loopResult = Parallel.For(0, N, (int i, ParallelLoopState loop) =>
{
    if (someCondition)
    {
        loop.Break();
        return;
    }
    output[i] = Compute(i);
});
long completedUpTo = N;
if (!loopResult.IsCompleted && loopResult.LowestBreakIteration.HasValue)
{
    completedUpTo = loopResult.LowestBreakIteration.Value;
}

 

 

Stop is typically useful for unordered search scenarios, where the loop is looking for something and can bail as

soon as it finds it. Break is typically useful for ordered search scenarios, where all of the data up until some point in

the source needs to be processed, with that point based on some search criteria.

 

UNPLANNED EXIT

 

The previously mentioned mechanisms for exiting a loop early are based on the body of the loop performing an

action to bail out. Sometimes, however, we want an entity external to the loop to be able to request that the loop

terminate; this is known as cancellation.

 

Cancellation is supported in parallel loops through the new System.Threading.CancellationToken type introduced in the .NET Framework 4. Overloads of all of the methods on Parallel accept a ParallelOptions instance, and one of the properties on ParallelOptions is a CancellationToken. Simply set this CancellationToken property to the CancellationToken that should be monitored for cancellation, and provide that options instance to the loop's invocation. The loop will monitor the token, and if it finds that cancellation has been requested, it will again stop launching more iterations, wait for all existing iterations to complete, and then throw an OperationCanceledException.

 

C#

private CancellationTokenSource _cts = new CancellationTokenSource();

// …

var options = new ParallelOptions { CancellationToken = _cts.Token };
try
{
    Parallel.For(0, N, options, i =>
    {
        // …
    });
}
catch (OperationCanceledException oce)
{
    // … Handle loop cancellation.
}

 

Stop and Break allow a loop itself to proactively exit early and successfully, and cancellation allows an entity external to the loop to request its early termination. It's also possible for something in the loop's body to go wrong, resulting in an early termination of the loop that was not expected.

 

In a sequential loop, an unhandled exception thrown out of a loop causes the looping construct to immediately

cease. The parallel loops in the .NET Framework 4 get as close to this behavior as is possible while still being

reliable and predictable. This means that when an exception is thrown out of an iteration, the Parallel methods

attempt to prevent additional iterations from starting, though already started iterations are not forcibly

terminated. Once all iterations have ceased, the loop gathers up any exceptions that have been thrown, wraps

them in a System.AggregateException, and throws that aggregate out of the loop.

 

As with Stop and Break, for cases where individual operations may run for a long time (and thus may delay the

loop’s exit), it may be advantageous for iterations of a loop to be able to check whether other iterations have

faulted. To accommodate that, ParallelLoopState exposes an IsExceptional property (in addition to the

aforementioned IsStopped and LowestBreakIteration properties), which indicates whether another iteration has

thrown an unhandled exception. Iterations may cooperatively check this property, allowing a long-running

iteration to cooperatively exit early when it detects that another iteration failed.
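For instance (a hedged sketch, not from the original text), a long-running iteration might poll IsExceptional and bail out early if another iteration has faulted; stepsPerIteration is a hypothetical name standing in for however the body is sliced:

C#

// Illustrative sketch: a long-running iteration periodically checks whether
// another iteration has thrown and, if so, exits early.
Parallel.For(0, N, (int i, ParallelLoopState loop) =>
{
    for (int step = 0; step < stepsPerIteration; step++) // stepsPerIteration is hypothetical
    {
        if (loop.IsExceptional) return; // Another iteration faulted; stop this one.
        // … Perform the next slice of this iteration's work.
    }
});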

 

While this exception logic does support exiting out of a loop early, it is not the recommended mechanism for doing

so. Rather, it exists to assist in exceptional cases, cases where breaking out early wasn’t an intentional part of the

algorithm. As is the case with sequential constructs, exceptions should not be relied upon for control flow.

 

Note, too, that this exception behavior isn't optional. In the face of unhandled exceptions, there's no way to tell the looping construct to allow the entire loop to complete execution, just as there's no built-in way to do that with a serial for loop. If you wanted that behavior with a serial for loop, you'd likely end up writing code like the following:

 

C#

var exceptions = new Queue<Exception>();
for (int i = 0; i < N; i++)
{
    try
    {
        // … Loop body goes here.
    }
    catch (Exception exc) { exceptions.Enqueue(exc); }
}
if (exceptions.Count > 0) throw new AggregateException(exceptions);

 


 

 

If this is the behavior you desire, that same manual handling is also possible using Parallel.For:

 

C#

var exceptions = new ConcurrentQueue<Exception>();
Parallel.For(0, N, i =>
{
    try
    {
        // … Loop body goes here.
    }
    catch (Exception exc) { exceptions.Enqueue(exc); }
});
if (!exceptions.IsEmpty) throw new AggregateException(exceptions);

 

EMPLOYING MULTIPLE EXIT STRATEGIES

 

It’s possible that multiple exit strategies could all be employed together, concurrently; we’re dealing with

parallelism, after all. In such cases, exceptions always win: if unhandled exceptions have occurred, the loop will

always propagate those exceptions, regardless of whether Stop or Break was called or whether cancellation was

requested.

 

If no exceptions occurred but the CancellationToken was signaled and either Stop or Break was used, there’s a

potential race as to whether the loop will notice the cancellation prior to exiting. If it does, the loop will exit with

an OperationCanceledException. If it doesn’t, it will exit due to the Stop/Break as explained previously.

 

However, Stop and Break may not be used together. If the loop detects that one iteration called Stop while

another called Break, the invocation of whichever method ended up being invoked second will result in an

exception being thrown. This is enforced due to the conflicting guarantees provided by Stop and Break.

 

For long running iterations, there are multiple properties an iteration might want to check to see whether it should

bail early: IsStopped, LowestBreakIteration, IsExceptional, and so on. To simplify this, ParallelLoopState also

provides a ShouldExitCurrentIteration property, which consolidates all of those checks in an efficient manner. The

loop itself checks this value prior to invoking additional iterations.
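As a hedged illustration (not from the original text), a long-running body can consolidate those checks into a single test; MoreWorkRemainsForIteration is a hypothetical helper standing in for the body's own progress condition:

C#

// Illustrative sketch: rely on ShouldExitCurrentIteration rather than
// checking IsStopped, LowestBreakIteration, and IsExceptional individually.
Parallel.For(0, N, (int i, ParallelLoopState loop) =>
{
    while (MoreWorkRemainsForIteration(i)) // hypothetical helper
    {
        if (loop.ShouldExitCurrentIteration) return;
        // … Do the next chunk of work for iteration i.
    }
});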

 

PARALLELENUMERABLE.FORALL

 

 

Parallel LINQ (PLINQ), exposed from System.Core.dll in the .NET Framework 4, provides a parallelized

implementation of all of the .NET Framework standard query operators. This includes Select (projections), Where

(filters), OrderBy (sorting), and a host of others. PLINQ also provides several additional operators not present in its

serial counterpart. One such operator is AsParallel, which enables parallel processing of a LINQ-to-Objects query.

Another such operator is ForAll.

 

Partitioning of data has already been discussed to some extent when discussing Parallel.For and Parallel.ForEach, and merging will be discussed in greater depth later in this document. Suffice it to say, however, that to process an input data set in parallel, portions of that data set must be distributed to each thread partaking in the processing, and when all of the processing is complete, those partitions typically need to be merged back together to form the single output stream expected by the caller:

 

C#

List<InputData> inputData = …;
foreach (var o in inputData.AsParallel().Select(i => new OutputData(i)))
{
    ProcessOutput(o);
}

 

 

Both partitioning and merging incur costs, and in parallel programming, we strive to avoid such costs as they’re

pure overhead when compared to a serial implementation. Partitioning can’t be avoided if data must be processed

in parallel, but in some cases we can avoid merging, such as if the work to be done for each resulting item can be

processed in parallel with the work for every other resulting item. To accomplish this, PLINQ provides the ForAll

operator, which avoids the merge and executes a delegate for each output element:

 

C#

List<InputData> inputData = …;
inputData.AsParallel().Select(i => new OutputData(i)).ForAll(o =>
{
    ProcessOutput(o);
});

 

ANTI-PATTERNS

 

 

Superman has his kryptonite. Matter has its anti-matter. And patterns have their anti-patterns. Patterns prescribe good ways to solve certain problems, but that doesn't mean they're without potential pitfalls. There are several potential problems to look out for when using Parallel.For, Parallel.ForEach, and ParallelEnumerable.ForAll.

 

SHARED DATA

 

The new parallelism constructs in the .NET Framework 4 help to alleviate most of the boilerplate code you’d

otherwise have to write to parallelize delightfully parallel problems. As you saw earlier, the amount of code

necessary just to implement a simple and naïve MyParallelFor implementation is vexing, and the amount of code

required to do it well is reams more. These constructs do not, however, automatically ensure that your code is

thread-safe. Iterations within a parallel loop must be independent, and if they’re not independent, you must

ensure that the iterations are safe to execute concurrently with each other by doing the appropriate

synchronization.

 

ITERATION VARIANTS

 

In managed applications, one of the most common patterns used with a for/For loop is iterating from 0 inclusive to

some upper bound (typically exclusive in C# and inclusive in Visual Basic). However, there are several variations on

this pattern that, while not nearly as common, are still not rare.

 

DOWNWARD ITERATION

 


 

 

It’s not uncommon to see loops iterating down from an upper-bound exclusive to 0 inclusive:

 

C#

for (int i = upperBound - 1; i >= 0; --i) { /*…*/ }

 

Such a loop is typically (though not always) constructed due to dependencies between the iterations; after all, if all

of the iterations are independent, why write a more complex form of the loop if both the upward and downward

iteration have the same results?

 

Parallelizing such a loop is often fraught with peril, due to these likely dependencies between iterations. If there

are no dependencies between iterations, the Parallel.For method may be used to iterate from an inclusive lower

bound to an exclusive upper bound, as directionality shouldn’t matter: in the extreme case of parallelism, on a

machine with upperBound number of cores, all iterations of the loop may execute concurrently, and direction is

irrelevant.
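In other words (a minimal sketch, not from the original text), an independent downward loop over [0, upperBound) can simply be rewritten as an upward Parallel.For:

C#

// Illustrative sketch: when iterations are independent, the downward loop
// shown above can be expressed as an ordinary Parallel.For over the same
// range; the direction of iteration no longer matters.
Parallel.For(0, upperBound, i => { /*…*/ });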

 

When parallelizing downward-iterating loops, proceed with caution. Downward iteration is often a sign of a less

than delightfully parallel problem.

 

STEPPED ITERATION

 

Another pattern of a for loop that is less common than the previous cases, but still is not rare, is one involving a

step value other than one. A typical for loop may look like this:

 

C#

 

for (int i = 0; i < upperBound; i++) { /*…*/ }

 

But it’s also possible for the update statement to increase the iteration value by a different amount: for example to

iterate through only the even values between the bounds:

 

C#

 

for (int i = 0; i < upperBound; i += 2) { /*…*/ }

 

Parallel.For does not provide direct support for such patterns. However, Parallel can still be used to implement

such patterns. One mechanism for doing so is through an iterator approach like that shown earlier for iterating

through linked lists:

 

C#

private static IEnumerable<int> Iterate(
    int fromInclusive, int toExclusive, int step)
{
    for (int i = fromInclusive; i < toExclusive; i += step) yield return i;
}

 

A Parallel.ForEach loop can now be used to perform the iteration. For example, the previous code snippet for

iterating the even values between 0 and upperBound can be coded as:

 

C#

Parallel.ForEach(Iterate(0, upperBound, 2), i => { /*…*/ });

 


 

 

As discussed earlier, such an implementation, while straightforward, also incurs the additional cost of forcing Parallel.ForEach to take locks while accessing the iterator. This drives up the per-element overhead of parallelization, demanding that more work be performed per element to make up for the increased overhead in order to still achieve parallelization speedups.

 

Another approach is to do the relevant math manually. Here is an implementation of a ParallelForWithStep loop

that accepts a step parameter and is built on top of Parallel.For:

 

C#

public static void ParallelForWithStep(
    int fromInclusive, int toExclusive, int step, Action<int> body)
{
    if (step < 1)
    {
        throw new ArgumentOutOfRangeException("step");
    }
    else if (step == 1)
    {
        Parallel.For(fromInclusive, toExclusive, body);
    }
    else // step > 1
    {
        int len = (int)Math.Ceiling((toExclusive - fromInclusive) / (double)step);
        Parallel.For(0, len, i => body(fromInclusive + (i * step)));
    }
}

 

 

This approach is less flexible than the iterator approach, but it also involves significantly less overhead. Threads are

not bottlenecked serializing on an enumerator; instead, they need only pay the cost of a small amount of math

plus an extra delegate invocation per iteration.
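For example (a small usage sketch, not from the original text), the earlier even-values loop could then be expressed using the ParallelForWithStep helper defined above:

C#

// Illustrative usage of ParallelForWithStep: process the even values
// in [0, upperBound).
ParallelForWithStep(0, upperBound, 2, i => { /*…*/ });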

 

VERY SMALL LOOP BODIES

 

As previously mentioned, the Parallel class is implemented in a manner so as to provide for quality load balancing

while incurring as little overhead as possible. There is still overhead, though. The overhead incurred by Parallel.For

is largely centered around two costs:

 

1) Delegate invocations. If you squint at previous examples of Parallel.For, a call to Parallel.For looks a lot like a C# for loop or a Visual Basic For loop. Don't be fooled: it's still a method call. One consequence of this is that the "body" of the Parallel.For "loop" is supplied to the method call as a delegate. Invoking a delegate incurs approximately the same amount of cost as a virtual method call.

2) Synchronization between threads for load balancing. While these costs are minimized as much as possible, any amount of load balancing will incur some cost, and the more load balancing employed, the more synchronization is necessary.

 

For medium to large loop bodies, these costs are largely negligible. But as the size of the loop’s body decreases,

the overheads become more noticeable. And for very small bodies, the loop can be completely dominated by this

overhead’s cost. To support parallelization of very small loop bodies requires addressing both #1 and #2 above.

One pattern for this involves chunking the input into ranges, and then instead of replacing a sequential loop with a

parallel loop, wrapping the sequential loop with a parallel loop.

 


 

 

The System.Collections.Concurrent.Partitioner class provides a Create method overload that accepts an integral range and returns an OrderablePartitioner<Tuple<Int32,Int32>> (a variant for Int64 instead of Int32 is also available):

 

C#

 

public static OrderablePartitioner<Tuple<long, long>> Create(

long fromInclusive, long toExclusive);

 

 

Overloads of Parallel.ForEach accept instances of Partitioner<T> and OrderablePartitioner<T> as sources, allowing

you to pass the result of a call to Partitioner.Create into a call to Parallel.ForEach. For now, think of both

Partitioner<T> and OrderablePartitioner<T> as an IEnumerable<T>.

 

The Tuple<Int32,Int32> represents a range from an inclusive value to an exclusive value. Consider the following

sequential loop:

 

C#

for (int i = from; i < to; i++)
{
    // … Process i.
}

 

We could use a Parallel.For to parallelize it as follows:

 

C#

Parallel.For(from, to, i =>
{
    // … Process i.
});

 

Or, we could use Parallel.ForEach with a call to Partitioner.Create, wrapping a sequential loop over the range

provided in the Tuple<Int32, Int32>, where the inclusiveLowerBound is represented by the tuple’s Item1 and

where the exclusiveUpperBound is represented by the tuple’s Item2:

 

C#

Parallel.ForEach(Partitioner.Create(from, to), range =>
{
    for (int i = range.Item1; i < range.Item2; i++)
    {
        // … Process i.
    }
});

 

While more complex, this affords us the ability to process very small loop bodies by eschewing some of the

aforementioned costs. Rather than invoking a delegate for each body invocation, we’re now amortizing the cost of

the delegate invocation across all elements in the chunked range. Additionally, as far as the parallel loop is

concerned, there are only a few elements to be processed: each range, rather than each index. This implicitly

decreases the cost of synchronization because there are fewer elements to load-balance.

 

While Parallel.For should be considered the best option for parallelizing for loops, if performance measurements

show that speedups are not being achieved or that they’re smaller than expected, you can try an approach like the

one shown using Parallel.ForEach in conjunction with Partitioner.Create.

 


 

 

TOO FINE-GRAINED, TOO COARSE-GRAINED

 

The previous anti-pattern outlined the difficulties that arise from having loop bodies that are too small. In addition to problems that implicitly result in such small bodies, it's also possible to end up in this situation by decomposing the problem to the wrong granularity. Earlier in this section, we demonstrated a simple parallelized ray tracer:

 

 

C#

void RenderParallel(Scene scene, Int32[] rgb)
{
    Camera camera = scene.Camera;
    Parallel.For(0, screenHeight, y =>
    {
        int stride = y * screenWidth;
        for (int x = 0; x < screenWidth; x++)
        {
            Color color = TraceRay(
                new Ray(camera.Pos, GetPoint(x, y, camera)), scene, 0);
            rgb[x + stride] = color.ToInt32();
        }
    });
}

 

Note that there are two loops here, both of which are actually safe to parallelize:

 

C#

void RenderParallel(Scene scene, Int32[] rgb)
{
    Camera camera = scene.Camera;
    Parallel.For(0, screenHeight, y =>
    {
        int stride = y * screenWidth;
        Parallel.For(0, screenWidth, x =>
        {
            Color color = TraceRay(
                new Ray(camera.Pos, GetPoint(x, y, camera)), scene, 0);
            rgb[x + stride] = color.ToInt32();
        });
    });
}

 

The question then arises: why and when would someone choose to parallelize one or both of these loops? There are multiple, competing principles. On the one hand, the idea of writing parallelized software that scales to any number of cores you throw at it implies that you should decompose as much as possible, so that regardless of the number of cores available, there will always be enough work to go around. This principle suggests both loops should be parallelized. On the other hand, we've already seen the performance implications that can result if there's not enough work inside of a parallel loop to warrant its parallelization, implying that only the outer loop should be parallelized in order to maintain a meaty body.

 


 

 

The answer is that the best balance is found through performance testing. If the overheads of parallelization are minimal as compared to the work being done, parallelize as much as possible: in this case, that would mean parallelizing both loops. If the overheads of parallelizing the inner loop would degrade performance on most systems, think twice before doing so, as it'll likely be best only to parallelize the outer loop.

 

 

There are of course some caveats to this (in parallel programming, there are caveats to everything; there are

caveats to the caveats). Parallelization of only the outer loop demands that the outer loop has enough work to

saturate enough processors. In our ray tracer example, what if the image being ray traced was very wide and short,

such that it had a small height? In such a case, there may only be a few iterations for the outer loop to parallelize,

resulting in too coarse-grained parallelization, in which case parallelizing the inner loop could actually be

beneficial, even if the overheads of parallelizing the inner loop would otherwise not warrant its parallelization.

 

 

Another option to consider in such cases is flattening the loops, such that you end up with one loop instead of two. This eliminates the cost of the extra partitions and merges that would be incurred by the inner loop's parallelization:

 

C#

void RenderParallel(Scene scene, Int32[] rgb)
{
    int totalPixels = screenHeight * screenWidth;
    Camera camera = scene.Camera;
    Parallel.For(0, totalPixels, i =>
    {
        int y = i / screenWidth, x = i % screenWidth;
        Color color = TraceRay(
            new Ray(camera.Pos, GetPoint(x, y, camera)), scene, 0);
        rgb[i] = color.ToInt32();
    });
}

 

If in doing such flattening the body of the loop becomes too small (which given the cost of TraceRay in this

example is unlikely), the pattern presented earlier for very small loop bodies may also be employed:

 

C#

void RenderParallel(Scene scene, Int32[] rgb)
{
    int totalPixels = screenHeight * screenWidth;
    Camera camera = scene.Camera;
    Parallel.ForEach(Partitioner.Create(0, totalPixels), range =>
    {
        for (int i = range.Item1; i < range.Item2; i++)
        {
            int y = i / screenWidth, x = i % screenWidth;
            Color color = TraceRay(
                new Ray(camera.Pos, GetPoint(x, y, camera)), scene, 0);
            rgb[i] = color.ToInt32();
        }
    });
}

 

NON-THREAD-SAFE ILIST<T> IMPLEMENTATIONS

 


 

 

Both PLINQ and Parallel.ForEach query their data sources for several interface implementations. Accessing an IEnumerable<T> incurs significant cost, due to needing to lock on the enumerator and make virtual method calls to MoveNext and Current for each element. In contrast, getting an element from an IList<T> can be done without locks, as elements of an IList<T> are independent. Thus, both PLINQ and Parallel.ForEach automatically use a source's IList<T> implementation if one is available.

 

In most cases, this is the right decision. However, in very rare cases, an implementation of IList<T> may not be

thread-safe for reading due to the get accessor for the list’s indexer mutating shared state. There are two

predominant reasons why an implementation might do this:

 

1.

The data structures stores data in a non-indexible manner, such that it must traverse the data structure

to find the requested index. In such a case, the data structure may try to amortize the cost of access by

keeping track of the last element accessed, assuming that accesses will occur in a largely sequential

manner, making it cheaper to start a search from the previously accessed element than starting from

scratch. Consider a theoretical linked list implementation as an example. A linked list does not typically

support direct indexing; rather, if you want to access the 42nd element of the list, you need to start at the

beginning, prior to the head, and move to the next element 42 times. As an optimization, the list could

maintain a reference to the most recently accessed element. If you accessed element 42 and then

element 43, upon accessing 42 the list would cache a reference to the 42nd element, thus making access

to 43 a single move next rather than 43 of them from the beginning. If the implementation doesn’t take

thread-safety into account, these mutations are likely not thread-safe.

2. Loading the data structure is expensive. In such cases, the data can be lazy-loaded (loaded on first

access) to defer or avoid some of the initialization costs. If getting data from the list forces initialization,

then mutations could occur due to indexing into the list.

There are only a few, obscure occurrences of this in the .NET Framework. One

example is System.Data.Linq.EntitySet<TEntity>. This type implements

IList<TEntity> with support for lazy loading, such that the first thing its indexer’s get

accessor does is load the data into the EntitySet<TEntity> if loading hasn’t already

occurred.

 

To work around such cases if you do come across them, you can force both PLINQ and Parallel.ForEach to use the

IEnumerable<T> implementation rather than the IList<T> implementation. This can be achieved in two ways:

 

1) Use System.Collections.Concurrent.Partitioner’s Create method. There is an overload specific to IEnumerable&lt;T&gt; (and no corresponding one for IList&lt;T&gt;), which ensures that the IEnumerable&lt;T&gt; implementation is used. Partitioner.Create returns an instance of a Partitioner&lt;T&gt;, for which there are overloads on Parallel.ForEach and in PLINQ.

 

C#

 

// Will use IList<T> implementation if source implements it.
IEnumerable<T> source = …;
Parallel.ForEach(source, item => { /*…*/ });

// Will use source's IEnumerable<T> implementation.
IEnumerable<T> source = …;
Parallel.ForEach(Partitioner.Create(source), item => { /*…*/ });

 

 


 

 

2) Append onto the data source a call to Enumerable.Select. The Select simply serves to prevent PLINQ and Parallel.ForEach from finding the original source’s IList&lt;T&gt; implementation.

 

C#

 

// Will use IList<T> implementation if source implements it.
IEnumerable<T> source = …;
Parallel.ForEach(source, item => { /*…*/ });

// Will only provide an IEnumerable<T> implementation.
IEnumerable<T> source = …;
Parallel.ForEach(source.Select(t => t), item => { /*…*/ });

 

 

PARALLEL.FOREACH OVER A PARALLELQUERY&lt;T&gt;

 

PLINQ’s ParallelEnumerable type operates in terms of ParallelQuery<T> objects. Such objects are returned from

the AsParallel extension method, and all of PLINQ’s operators consume and generate instances of

ParallelQuery<T>. ParallelQuery<T> is itself an IEnumerable<T>, which means it can be iterated over and may be

consumed by anything that understands how to work with an IEnumerable<T>.

 

Parallel.ForEach is one such construct that works with IEnumerable<T>. As such, it may be tempting to write code

that follows a pattern similar to the following:

 

C#

 

var q = from d in data.AsParallel() … select d;

Parallel.ForEach(q, item => { /* Process item. */ });

 

 

While this works correctly, it incurs unnecessary costs. In order for PLINQ to stream its output data into an

IEnumerable<T>, PLINQ must merge the data being generated by all of the threads involved in query processing so

that the multiple sets of data can be consumed by code expecting only one. Conversely, when accepting an input

IEnumerable<T>, Parallel.ForEach must consume the single data stream and partition it into multiple data streams

for processing in parallel. Thus, by passing a ParallelQuery<T> to a Parallel.ForEach, in the .NET Framework 4 the

data from the PLINQ query will be merged and will then be repartitioned by the Parallel.ForEach. This can be

costly.

 

[Diagram: the PLINQ query’s partitions are merged into a single IEnumerable&lt;T&gt;, which Parallel.ForEach then repartitions.]

Instead, PLINQ’s ParallelEnumerable.ForAll method should be used. Rewriting the previous code as follows will

avoid the spurious merge and repartition:

 


 

 

C#

 

 

var q = (from d in data.AsParallel() … select d);

q.ForAll(item => { /* Process item. */ });

 

 

This allows the output of all partitions to be processed in parallel, as discussed in the previous section on

ParallelEnumerable.ForAll.

 

[Diagram: each partition of the PLINQ query feeds an Action&lt;TSource&gt; directly, with no merge or repartition step.]

THREAD AFFINITY IN ACCESSING SOURCE DATA

 

Both Parallel.ForEach and ParallelEnumerable.ForAll rely on each of the threads participating in the loop to pull

data from the source enumerator. While both ForEach and ForAll ensure that the enumerator is accessed in a

thread-safe manner (only one thread at a time will use MoveNext and Current, and will do so atomically with

respect to other threads in the loop), it’s still the case that multiple threads may use MoveNext over time. In

general, this shouldn’t be a problem. However, in some rare cases the implementation of MoveNext may have

thread affinity, meaning that for correctness purposes it should always be accessed from the same thread, and

perhaps even from a specific thread. An example of this could be if MoveNext were accessing a user interface (UI)

control in Windows Forms or Windows Presentation Foundation in order to retrieve its data, or if the control were

pulling data from the object model of one of the Microsoft Office applications. While such thread affinity is not

recommended, avoiding it may not be possible.

 

In such cases, the consuming implementation needs to change to ensure that the data source is only accessed by

the thread making the call to the loop. That can be achieved with a producer/consumer pattern (many more

details on that pattern are provided later in this document), using code similar in style to the following:

 

C#

 

static void ForEachWithEnumerationOnMainThread<T>(
    IEnumerable<T> source, Action<T> body)
{
    var collectedData = new BlockingCollection<T>();
    var loop = Task.Factory.StartNew(() =>
        Parallel.ForEach(collectedData.GetConsumingEnumerable(), body));
    try
    {
        foreach (var item in source) collectedData.Add(item);
    }
    finally { collectedData.CompleteAdding(); }
    loop.Wait();
}

 

The Parallel.ForEach executes in the background by pulling the data from a shared collection that is populated by

the main thread enumerating the data source and copying its contents into the shared collection. This solves the

issue of thread affinity with the data source by ensuring that the data source is only accessed on the main thread.

If, however, all access to the individual elements must also be done only on the main thread, parallelization is

infeasible.

 

PARALLEL LOOPS FOR I/O-BOUND WORKLOADS IN SCALABLE APPLICATIONS

 

It can be extremely tempting to utilize the delightfully parallel looping constructs in the .NET Framework 4 for I/O-bound

workloads. And in many cases, it’s quite reasonable to do so as a quick-and-easy approach to getting up and

running with better performance.

 

Consider the need to ping a set of machines. We can do this quite easily using the

System.Net.NetworkInformation.Ping class, along with LINQ:

 

C#

 

var addrs = new[] { addr1, addr2, …, addrN };
var pings = from addr in addrs
            select new Ping().Send(addr);
foreach (var ping in pings)
    Console.WriteLine("{0}: {1}", ping.Status, ping.Address);

 

By adding just a few characters, we can easily parallelize this operation using PLINQ:

 

C#

 

var pings = from addr in addrs.AsParallel()
            select new Ping().Send(addr);
foreach (var ping in pings)
    Console.WriteLine("{0}: {1}", ping.Status, ping.Address);

 

 

Rather than using a single thread to ping these machines one after the other, this code uses multiple threads to do

so, typically greatly decreasing the time it takes to complete the operation. Of course, in this case, the work I’m

doing is not at all CPU-bound, and yet by default PLINQ uses a number of threads equal to the number of logical

processors, an appropriate heuristic for CPU-bound workloads but not for I/O-bound. As such, we can utilize

PLINQ’s WithDegreeOfParallelism method to get the work done even faster by using more threads (assuming

there are enough addresses being pinged to make good use of all of these threads):

 

C#

 

var pings = from addr in addrs.AsParallel().WithDegreeOfParallelism(16)
            select new Ping().Send(addr);
foreach (var ping in pings)
    Console.WriteLine("{0}: {1}", ping.Status, ping.Address);

 

For a client application on a desktop machine doing just this one operation, using threads in this manner typically

does not lead to any significant problems. However, if this code were running in an ASP.NET application, it could be

 


 

 

deadly to the system. Threads have a non-negligible cost, a cost measurable in both the memory required for their

associated data structures and stack space, and in the extra load they place on the operating system and its

scheduler. When threads are doing real work, this cost is justified. But when threads are simply sitting around

blocked waiting for an I/O operation to complete, they’re dead weight. Especially in Web applications, where

thousands of users may be bombarding the system with requests, that extra and unnecessary weight can bring a

server to a crawl. For applications where scalability in terms of concurrent users is at a premium, it’s imperative

not to write code like that shown above, even though it’s really simple to write. There are other solutions,

however.

 

WithDegreeOfParallelism changes the number of threads required to execute and

complete the PLINQ query, but it does not force that number of threads into

existence. If the number is larger than the number of threads available in the

ThreadPool, it may take some time for the ThreadPool thread-injection logic to inject

enough threads to complete the processing of the query. To force it to get there

faster, you can employ the ThreadPool.SetMinThreads method.
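As a rough sketch of that suggestion (not from the paper; the value 16 simply matches the degree of parallelism used above), the minimum worker-thread count can be raised before the query runs so the ThreadPool doesn’t ramp up one thread at a time:

C#

int workerThreads, completionPortThreads;
ThreadPool.GetMinThreads(out workerThreads, out completionPortThreads);
ThreadPool.SetMinThreads(Math.Max(workerThreads, 16), completionPortThreads);

// The I/O-bound query from above can now acquire its threads promptly.
var pings = from addr in addrs.AsParallel().WithDegreeOfParallelism(16)
            select new Ping().Send(addr);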

 

The System.Threading.Tasks.Task class will be discussed later in this document. In short, however, note that a

Task instance represents an asynchronous operation. Typically these are computationally-intensive operations, but

the Task abstraction can also be used to represent I/O-bound operations without tying up a thread in the

process. As an example of this, the samples available at http://code.msdn.microsoft.com/ParExtSamples include

extension methods for the Ping class that provide asynchronous versions of the Send method to return a

Task<PingReply>. Using such methods, we can rewrite our previous method as follows:

 

C#

 

var pings = (from addr in addrs
             select new Ping().SendTask(addr, null)).ToArray();
Task.WaitAll(pings);
foreach (Task<PingReply> ping in pings)
    Console.WriteLine("{0}: {1}", ping.Result.Status, ping.Result.Address);

 

This new solution will asynchronously send a ping to all of the addresses, but no threads (other than the main

thread waiting on the results) will be blocked in the process; only when the pings complete will threads be utilized

briefly to process the results, the actual computational work. This results in a much more scalable solution, one

that may be used in applications that demand scalability. Note, too, that by taking advantage of

Task.Factory.ContinueWhenAll (to be discussed later), the code can even avoid blocking the main iteration thread,

as illustrated in the following example:

 

C#

 

var pings = (from addr in addrs
             select new Ping().SendTask(addr, null)).ToArray();

Task.Factory.ContinueWhenAll(pings, _ =>
{
    Task.WaitAll(pings);
    foreach (var ping in pings)
        Console.WriteLine("{0}: {1}", ping.Result.Status, ping.Result.Address);
});

 


 

 

The example here was shown utilizing the Ping class, which implements the Event-based Asynchronous Pattern

(EAP). This pattern for asynchronous operation was introduced in the .NET Framework 2.0, and is based on .NET

events that are raised asynchronously when an operation completes.
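The SendTask extension method used in the examples above comes from those samples; as a minimal sketch (an illustration, not the samples’ actual implementation), such an EAP operation can be wrapped as a Task&lt;PingReply&gt; by completing a TaskCompletionSource&lt;TResult&gt; from the completion event:

C#

// An extension method (it must be declared in a static class).
static Task<PingReply> SendTask(this Ping ping, string address, object userToken)
{
    var tcs = new TaskCompletionSource<PingReply>(userToken);
    PingCompletedEventHandler handler = null;
    handler = (sender, e) =>
    {
        ping.PingCompleted -= handler;
        if (e.Cancelled) tcs.TrySetCanceled();
        else if (e.Error != null) tcs.TrySetException(e.Error);
        else tcs.TrySetResult(e.Reply);
    };
    ping.PingCompleted += handler;
    ping.SendAsync(address, userToken);
    return tcs.Task;
}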

 

A more prevalent pattern throughout the .NET Framework is the Asynchronous Programming Model (APM)

pattern, which has existed in the .NET Framework since its inception. Sometimes referred to as the “begin/end”

pattern, this pattern is based on a pair of methods: a “begin” method that starts the asynchronous operation, and

an “end” method that joins with it, retrieving any results of the invocation or the exception from the operation.

 

To help integrate with this pattern, the aforementioned Task class can also be used to wrap an APM invocation,

which can again help with the scalability, utilizing the Task.Factory.FromAsync method. This support can then be

used to build an approximation of asynchronous methods, as is done in the Task.Factory.Iterate extension method

available in the samples at http://code.msdn.microsoft.com/ParExtSamples. For more

information, see http://blogs.msdn.com/pfxteam/9809774.aspx. Through its asynchronous workflow functionality,

F# in Visual Studio 2010 also provides first-class language support for writing asynchronous methods. For more

information, see http://msdn.microsoft.com/en-us/library/dd233182(VS.100).aspx. The incubation language

Axum, available for download at http://msdn.microsoft.com/en-us/devlabs/dd795202.aspx, also includes first-class

language support for writing asynchronous methods.
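As a concrete illustration of the FromAsync support (a minimal sketch, not from the paper; the file name is hypothetical), a FileStream’s BeginRead/EndRead pair can be wrapped in a Task&lt;int&gt; so that no thread is blocked while the read is in flight:

C#

byte[] buffer = new byte[0x1000];
using (var stream = new FileStream("data.bin", FileMode.Open, FileAccess.Read,
    FileShare.Read, 0x1000, useAsync: true))
{
    // Starts the asynchronous read; the returned Task completes when EndRead would.
    Task<int> read = Task<int>.Factory.FromAsync(
        stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);

    // Accessing Result waits for completion (and propagates any exception).
    Console.WriteLine("Read {0} bytes", read.Result);
}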

 


 

 

FORK/JOIN

The patterns employed for delightfully parallel loops are really a subset of a larger set of patterns centered around

“fork/join.” In fork/join patterns, work is “forked” such that several pieces of work are launched asynchronously.

That forked work is later joined with in order to ensure that all of the processing has completed, and potentially to

retrieve the results of that processing if it wasn’t utilized entirely for side-effecting behavior. Loops are a prime

example of this: we fork the processing of loop iterations, and we join such that the parallel loop invocation only

completes when all concurrent processing is done.

 

The new System.Threading.Tasks namespace in the .NET Framework 4 contains a significant wealth of support for

fork/join patterns. In addition to the Parallel.For, Parallel.ForEach, and PLINQ constructs already discussed, the

.NET Framework provides the Parallel.Invoke method, as well as the new Task and Task<TResult> types. The new

System.Threading.CountdownEvent type also helps with fork/join patterns, in particular when dealing with

concurrent programming models that don’t provide built-in support for joins.

 

COUNTING DOWN

 

 

A primary component of fork/join pattern implementations is keeping track of how much still remains to be

completed. We saw this in our earlier MyParallelFor and MyParallelForEach implementations, with the loop

storing a count for the number of work items that still remained to be completed, and a ManualResetEvent that

would be signaled when this count reached 0. Support for this pattern is codified into the new

System.Threading.CountdownEvent type in the .NET Framework 4. Below is a code snippet from earlier for

implementing the sample MyParallelFor, now modified to use CountdownEvent.

 

C#

 

static void MyParallelFor(
    int fromInclusive, int toExclusive, Action<int> body)
{
    int numProcs = Environment.ProcessorCount;
    int nextIteration = fromInclusive;
    using (CountdownEvent ce = new CountdownEvent(numProcs))
    {
        for (int p = 0; p < numProcs; p++)
        {
            ThreadPool.QueueUserWorkItem(delegate
            {
                int index;
                while ((index = Interlocked.Increment(
                    ref nextIteration) - 1) < toExclusive)
                {
                    body(index);
                }
                ce.Signal();
            });
        }
        ce.Wait();
    }
}

 


 

 

Using CountdownEvent frees us from having to manage a count manually. Instead, the event is initialized with the

expected number of signals, each thread signals the event when the thread completes its processing, and the main

thread waits on the event for all signals to be received.

 

COUNTING UP AND DOWN

 

 

Counting down is often employed in parallel patterns, but so is incorporating some amount of counting up. If the

remaining count represents the number of work items to be completed, and we end up adding more work items

after setting the initial count, the count will need to be increased.

 

Here is an example of implementing a MyParallelForEach that launches one asynchronous work item per element

to be processed. Since we don’t know ahead of time how many elements there will be, we add a count of 1 for

each element before launching it, and when the work item completes we signal the event.

 

C#

 

static void MyParallelForEach<T>(IEnumerable<T> source, Action<T> body)
{
    using (CountdownEvent ce = new CountdownEvent(1))
    {
        foreach (var item in source)
        {
            ce.AddCount(1);
            ThreadPool.QueueUserWorkItem(state =>
            {
                try { body((T)state); }
                finally { ce.Signal(); }
            }, item);
        }
        ce.Signal();
        ce.Wait();
    }
}

 

 

Note that the event is initialized with a count of 1. This is a common pattern in these scenarios, as we need to

ensure that the event isn’t set prior to all work items completing. If the count instead started at 0, and the first

work item started and completed prior to our adding count for additional elements, the CountdownEvent would

transition to a set state prematurely. By initializing the count to 1, we ensure that the event has no chance of

reaching 0 until we remove that initial count, which is done in the above example by calling Signal after all

elements have been queued.

 

PARALLEL.INVOKE

 

 

As shown previously, the Parallel class provides support for delightfully parallel loops through the Parallel.For and

Parallel.ForEach methods. Parallel also provides support for patterns based on parallelized regions of code, where

every statement in a region may be executed concurrently. This support, provided through the Parallel.Invoke

method, enables a developer to easily specify multiple statements that should execute in parallel, and as with

Parallel.For and Parallel.ForEach, Parallel.Invoke takes care of issues such as exception handling, synchronous

invocation, scheduling, and the like:

 


 

 

C#

 

 

Parallel.Invoke(
    () => ComputeMean(),
    () => ComputeMedian(),
    () => ComputeMode());

 

Invoke itself follows patterns internally meant to help alleviate overhead. As an example, if you specify only a few

delegates to be executed in parallel, Invoke will likely spin up one Task per element. However, if you specify many

delegates, or if you specify ParallelOptions for how those delegates should be invoked, Invoke will likely instead

choose to execute its work in a different manner. Looking at the signature for Invoke, we can see how this might

happen:

 

C#

 

static void Invoke(params Action[] actions);

 

Invoke is supplied with an array of delegates, and it needs to perform an action for each one, potentially in

parallel. That sounds like a pattern to which ForEach can be applied, doesn’t it? In fact, we could implement a

MyParallelInvoke using the MyParallelForEach we previously coded:

 

C#

 

static void MyParallelInvoke(params Action[] actions)
{
    MyParallelForEach(actions, action => action());
}

 

We could even use MyParallelFor:

 

C#

 

static void MyParallelInvoke(params Action[] actions)
{
    MyParallelFor(0, actions.Length, i => actions[i]());
}

 

This is very similar to the type of operation Parallel.Invoke will perform when provided with enough delegates.

The overhead of a parallel loop is more than that of a few tasks, and thus when running only a few delegates, it

makes sense for Invoke to simply use one task per element. But after a certain threshold, it’s more efficient to use

a parallel loop to execute all of the actions, as the cost of the loop is amortized across all of the delegate

invocations.

 

ONE TASK PER ELEMENT

 

 

Parallel.Invoke represents a prototypical example of the fork/join pattern. Multiple operations are launched in

 

parallel and then joined with such that only when they’re all complete will the entire operation be considered

 

complete. If we think of each individual delegate invocation from Invoke as being its own asynchronous operation,

we can use a pattern of applying one task per element, where in this case the element is the delegate:

 

C#

 

static void MyParallelInvoke(params Action[] actions)
{
    var tasks = new Task[actions.Length];
    for (int i = 0; i < actions.Length; i++)
    {
        tasks[i] = Task.Factory.StartNew(actions[i]);
    }
    Task.WaitAll(tasks);
}

 

This same pattern can be applied for variations, such as wanting to invoke in parallel a set of functions that return

values, with the MyParallelInvoke method returning an array of all of the results. Here are several different ways

that could be implemented, based on the patterns shown thus far (do note these implementations each have

subtle differences in semantics, particularly with regards to what happens when an individual function fails with an

exception):

 

C#

 

// Approach #1: One Task per element
static T[] MyParallelInvoke<T>(params Func<T>[] functions)
{
    var tasks = (from function in functions
                 select Task.Factory.StartNew(function)).ToArray();
    Task.WaitAll(tasks);
    return tasks.Select(t => t.Result).ToArray();
}

// Approach #2: One Task per element, using parent/child relationships
static T[] MyParallelInvoke<T>(params Func<T>[] functions)
{
    var results = new T[functions.Length];
    Task.Factory.StartNew(() =>
    {
        for (int i = 0; i < functions.Length; i++)
        {
            int cur = i;
            Task.Factory.StartNew(
                () => results[cur] = functions[cur](),
                TaskCreationOptions.AttachedToParent);
        }
    }).Wait();
    return results;
}

// Approach #3: Using Parallel.For
static T[] MyParallelInvoke<T>(params Func<T>[] functions)
{
    T[] results = new T[functions.Length];
    Parallel.For(0, functions.Length, i =>
    {
        results[i] = functions[i]();
    });
    return results;
}

// Approach #4: Using PLINQ
static T[] MyParallelInvoke<T>(params Func<T>[] functions)
{
    return functions.AsParallel().Select(f => f()).ToArray();
}

 


 

 

As with the Action-based MyParallelInvoke, for just a handful of delegates the first approach is likely the most

efficient. Once the number of delegates increases to a plentiful amount, however, the latter approaches of using

Parallel.For or PLINQ are likely more efficient. They also allow you to easily take advantage of additional

functionality built into the Parallel and PLINQ APIs. For example, placing a limit on the degree of parallelism

employed with tasks directly requires a fair amount of additional code. Doing the same with either Parallel or

PLINQ requires only minimal additions. For example, if I want to use at most two threads to run the operations, I

can do the following:

 

C#

 

static T[] MyParallelInvoke<T>(params Func<T>[] functions)
{
    T[] results = new T[functions.Length];
    var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
    Parallel.For(0, functions.Length, options, i =>
    {
        results[i] = functions[i]();
    });
    return results;
}

 

For fork/join operations, the pattern of creating one task per element may be particularly useful in the following

situations:

 

1) Additional work may be started only when specific subsets of the original elements have completed processing. As an example, in Strassen’s matrix multiplication algorithm, two matrices are multiplied by splitting each of the matrices into four quadrants. Seven intermediary matrices are generated based on operations on the eight input submatrices. Four output submatrices that make up the larger output matrix are computed from the intermediary seven. These four output matrices each only require a subset of the previous seven, so while it’s correct to wait for all seven prior to computing the following four, some potential for parallelization is lost as a result. (A sketch of starting work when only a subset has completed follows this list.)

 

2) All elements should be given the chance to run even if one invocation fails. With solutions based on Parallel and PLINQ, the looping and query constructs will attempt to stop executing as soon as an exception is encountered; this can be addressed with manual exception handling inside the loop, as demonstrated earlier. By using Tasks, however, each operation is treated independently, and such custom code isn’t needed.
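As a rough sketch of the first situation (the Compute/Combine method names are placeholders, not from the paper), Task.Factory.ContinueWhenAll can be used to start dependent work as soon as the specific subset of forked tasks it needs has completed, rather than waiting on every forked task:

C#

Task t1 = Task.Factory.StartNew(() => ComputePartA());
Task t2 = Task.Factory.StartNew(() => ComputePartB());
Task t3 = Task.Factory.StartNew(() => ComputePartC());

// Work that depends only on parts A and B starts as soon as those two tasks
// complete, even if part C is still running.
Task dependent = Task.Factory.ContinueWhenAll(
    new[] { t1, t2 }, completed => CombineAandB());

Task.WaitAll(dependent, t3);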

 

RECURSIVE DECOMPOSITION

 

 

One of the more common fork/join patterns deals with forks that themselves fork and join. This recursive nature is

known as recursive decomposition, and it applies to parallelism just as it applies to serial recursive

implementations.

 

Consider a Tree<T> binary tree data structure:

 

C#

 

class Tree<T>
{
    public T Data;
    public Tree<T> Left, Right;
}

 


 

 

A tree walk function that executes an action for each node in the tree might look like the following:

 

C#

 

static void Walk<T>(Tree<T> root, Action<T> action)
{
    if (root == null) return;
    action(root.Data);
    Walk(root.Left, action);
    Walk(root.Right, action);
}

 

Parallelizing this may be accomplished by fork/join’ing on at least the two recursive calls, if not also on the action

invocation:

 

C#

 

static void Walk<T>(Tree<T> root, Action<T> action)
{
    if (root == null) return;
    Parallel.Invoke(
        () => action(root.Data),
        () => Walk(root.Left, action),
        () => Walk(root.Right, action));
}

 

The recursive calls to Walk themselves fork/join as well, leading to a logical tree of parallel invocations. This can of

course also be done using Task objects directly:

 

C#

 

static void Walk<T>(Tree<T> root, Action<T> action)
{
    if (root == null) return;
    var t1 = Task.Factory.StartNew(() => action(root.Data));
    var t2 = Task.Factory.StartNew(() => Walk(root.Left, action));
    var t3 = Task.Factory.StartNew(() => Walk(root.Right, action));
    Task.WaitAll(t1, t2, t3);
}

 

We can see all of these Tasks in Visual Studio using the Parallel Tasks debugger window, as shown in the following

screenshot:

 

 


 

 

We can further take advantage of parent/child relationships in order to see the associations between these Tasks

in the debugger. First, we can modify our code by forcing all tasks to be attached to a parent, which will be the

Task currently executing when the child is created. This is done with the TaskCreationOptions.AttachedToParent

option:

 

C#

 

static void Walk<T>(Tree<T> root, Action<T> action)
{
    if (root == null) return;
    var t1 = Task.Factory.StartNew(() => action(root.Data),
        TaskCreationOptions.AttachedToParent);
    var t2 = Task.Factory.StartNew(() => Walk(root.Left, action),
        TaskCreationOptions.AttachedToParent);
    var t3 = Task.Factory.StartNew(() => Walk(root.Right, action),
        TaskCreationOptions.AttachedToParent);
    Task.WaitAll(t1, t2, t3);
}

 

Re-running the application, we can now see the following parent/child hierarchy in the debugger:

 

 

CONTINUATION CHAINING

 

The previous example of walking a tree utilizes blocking semantics, meaning that a particular level won’t complete

until its children have completed. Parallel.Invoke, and the Task Wait functionality on which it’s based, attempt

what’s known as inlining, where rather than simply blocking waiting for another thread to execute a Task, the

waiter may be able to run the waitee on the current thread, thereby improving resource reuse, and improving

performance as a result. Still, there may be some cases where tasks are not inlinable, or where the style of

development is better suited towards a more asynchronous model. In such cases, task completions can be chained.

 

As an example of this, we’ll revisit the Walk method. Rather than returning void, the Walk method can return a

Task. That Task can represent the completion of all child tasks. There are two primary ways to accomplish this. One

way is to take advantage of Task parent/child relationships briefly mentioned previously. With parent/child

relationships, a parent task won’t be considered completed until all of its children have completed.

 


 

 

C#

 

 

static Task Walk<T>(Tree<T> root, Action<T> action)
{
    return Task.Factory.StartNew(() =>
    {
        if (root == null) return;
        Walk(root.Left, action);
        Walk(root.Right, action);
        action(root.Data);
    }, TaskCreationOptions.AttachedToParent);
}

 

Every call to Walk creates a new Task that’s attached to its parent and immediately returns that Task. That Task,

when executed, recursively calls Walk (thus creating Tasks for the children) and executes the relevant action. At

the root level, the initial call to Walk will return a Task that represents the entire tree of processing and that won’t

complete until the entire tree has completed.

 

Another approach is to take advantage of continuations:

 

C#

 

static Task Walk<T>(Tree<T> root, Action<T> action)
{
    if (root == null) return _completedTask;
    Task t1 = Task.Factory.StartNew(() => action(root.Data));
    Task<Task> t2 = Task.Factory.StartNew(() => Walk(root.Left, action));
    Task<Task> t3 = Task.Factory.StartNew(() => Walk(root.Right, action));
    return Task.Factory.ContinueWhenAll(
        new Task[] { t1, t2.Unwrap(), t3.Unwrap() },
        tasks => Task.WaitAll(tasks));
}

 

As we’ve previously seen, this code uses a task to represent each of the three operations to be performed at each

 

node: invoking the action for the node, walking the left side of the tree, and walking the right side of the tree.

However, we now have a predicament, in that the Task returned for walking each side of the tree is actually a

Task<Task> rather than simply a Task. This means that the result will be signaled as completed when the Walk call

has returned, but not necessarily when the Task it returned has completed. To handle this, we can take advantage

of the Unwrap method, which converts a Task&lt;Task&gt; into a Task, by “unwrapping” the internal Task into a top-level

Task that represents it (another overload of Unwrap handles unwrapping a Task<Task<TResult>> into a

Task<TResult>). Now with our three tasks, we can employ the ContinueWhenAll method to create and return a

Task that represents the total completion of this node and all of its descendants. In order to ensure exceptions are

 

propagated correctly, the body of that continuation explicitly waits on all of the tasks; it knows they’re completed

 

by this point, so this is simply to utilize the exception propagation logic in WaitAll.

 

The parent-based approach has several advantages, including that the Visual Studio

2010 Parallel Tasks toolwindow can highlight the parent/child relationship involved,

showing the task hierarchy graphically during a debugging session, and exception

handling is simplified, as all exceptions will bubble up to the root parent. However,

the continuation approach may have a memory benefit for deep hierarchies or long chains

of tasks, since with the parent/child relationships, running children prevent

the parent nodes from being garbage collected.

 


 

 

To simplify this, you can consider codifying this into an extension method for easier implementation:

 

C#

 

static Task ContinueWhenAll(
    this TaskFactory factory, params Task[] tasks)
{
    return factory.ContinueWhenAll(
        tasks, completed => Task.WaitAll(completed));
}

 

With that extension method in place, the previous snippet may be rewritten as:

 

C#

 

static Task Walk<T>(Tree<T> root, Action<T> action)
{
    if (root == null) return _completedTask;
    var t1 = Task.Factory.StartNew(() => action(root.Data));
    var t2 = Task.Factory.StartNew(() => Walk(root.Left, action));
    var t3 = Task.Factory.StartNew(() => Walk(root.Right, action));
    return Task.Factory.ContinueWhenAll(t1, t2.Unwrap(), t3.Unwrap());
}

 

One additional thing to notice is the _completedTask returned if the root node is null. Both WaitAll and

ContinueWhenAll will throw an exception if the array of tasks passed to them contains a null element. There are

several ways to work around this, one of which is to ensure that a null element is never provided. To do that, we

can return a valid Task from Walk even if there is no node to be processed. Such a Task should be already

completed so that little additional overhead is incurred. To accomplish this, we can create a single Task using a

TaskCompletionSource<TResult>, resolve the Task into a completed state, and cache it for all code that needs a

completed Task to use:

 

C#

 

private static Task _completedTask = ((Func<Task>)(() =>
{
    var tcs = new TaskCompletionSource<object>();
    tcs.SetResult(null);
    return tcs.Task;
}))();

 

ANTI-PATTERNS

 

 

FALSE SHARING

 

Data access patterns are important for serial applications, and they’re even more important for parallel

 

applications. One serious performance issue that can arise in parallel applications occurs when unexpected

sharing happens at the hardware level.

 

For performance reasons, memory systems use groups called cache lines, typically of 64 bytes or 128 bytes. A

cache line, rather than an individual byte, is moved around the system as a unit, a classic example of chunky

instead of chatty communication. If multiple cores attempt to access two different bytes on the same cache line,

 

there’s no correctness sharing conflict, but only one will be able to have exclusive access to the cache line at the

 


 

 

hardware level, thus introducing the equivalent of a lock at the hardware level that wasn’t otherwise present in

the code. This can lead to unforeseen and serious performance problems.

 

As an example, consider the following method, which uses a Parallel.Invoke to initialize two arrays to random

values:

 

C#

 

void WithFalseSharing()
{
    Random rand1 = new Random(), rand2 = new Random();
    int[] results1 = new int[20000000], results2 = new int[20000000];

    Parallel.Invoke(
        () => {
            for (int i = 0; i < results1.Length; i++)
                results1[i] = rand1.Next();
        },
        () => {
            for (int i = 0; i < results2.Length; i++)
                results2[i] = rand2.Next();
        });
}

 

 

The code initializes two distinct System.Random instances and two distinct arrays, such that each thread involved

in the parallelization touches its own non-shared state. However, due to the way these two Random instances

were allocated, they’re likely on the same cache line in memory. Since every call to Next modifies the Random

instance’s internal state, multiple threads will now be contending for the same cache line, leading to seriously

impacted performance. Here’s a version that addresses the issue:

 

C#

 

void WithoutFalseSharing()
{
    int[] results1, results2;
    Parallel.Invoke(
        () => {
            Random rand1 = new Random();
            results1 = new int[20000000];
            for (int i = 0; i < results1.Length; i++)
                results1[i] = rand1.Next();
        },
        () => {
            Random rand2 = new Random();
            results2 = new int[20000000];
            for (int i = 0; i < results2.Length; i++)
                results2[i] = rand2.Next();
        });
}

 

 

On my dual-core system, when comparing the performance of these two methods, the version with false sharing

typically ends up running slower than the serial equivalent, whereas the version without false sharing typically

ends up running almost twice as fast as the serial equivalent.

 

False sharing is a likely source for investigation if you find that parallelized code operating with minimal

synchronization isn’t obtaining the parallelized performance improvements you expected. More information is

available in the MSDN Magazine article .NET Matters: False Sharing.

 


 

 

RECURSION WITHOUT THRESHOLDS

 

In a typical introductory algorithms course, computer science students learn about various algorithms for sorting,

often culminating in quicksort. Quicksort is a recursive divide-and-conquer algorithm, where the input array to be

sorted is partitioned into two contiguous chunks, one with values less than a chosen pivot and one with values

greater than or equal to a chosen pivot. Once the array has been partitioned, the quicksort routine may be used

recursively to sort each chunk. The recursion ends when the size of a chunk is one element, since one element is

implicitly sorted.

 

Students learn that quicksort has an average algorithmic complexity of O(N log N), which for large values of N is

much faster than other algorithms like insertion sort which have a complexity of O(N²). They also learn, however,

that big-O notation focuses on the limiting behavior of functions and ignores constants, because as the value of N

grows, the constants aren’t relevant. Yet when N is small, those constants can actually make a difference.

 

It turns out that constants involved in quicksort are larger than those involved in insertion sort, and as such, for

small values of N, insertion sort is often faster than quicksort. Due to quicksort’s recursive nature, even if the

operation starts out operating on a large N, at some point in the recursion the value of N for that particular call is

small enough that it’s actually better to use insertion sort. Thus, many quality implementations of quicksort won’t

stop the recursion when a chunk size is one, but rather will choose a higher value, and when that threshold is

reached, the algorithm will switch over to a call to insertion sort to sort the chunk, rather than continuing with the

recursive quicksort routine.

 

As has been shown previously, quicksort is a great example for recursive decomposition with task-based

parallelism, as it’s easy to recursively sort the left and right partitioned chunks in parallel, as shown in the following

example:

 

C#

 

static void QuickSort<T>(T[] data, int fromInclusive, int toExclusive)
    where T : IComparable<T>
{
    if (toExclusive - fromInclusive <= THRESHOLD)
        InsertionSort(data, fromInclusive, toExclusive);
    else
    {
        int pivotPos = Partition(data, fromInclusive, toExclusive);
        Parallel.Invoke(
            () => QuickSort(data, fromInclusive, pivotPos),
            () => QuickSort(data, pivotPos, toExclusive));
    }
}

 

 

You’ll note, however, that in addition to the costs associated with the quicksort algorithm itself, we now have

additional overheads involved with creating tasks for each half of the sort. If the computation is completely

balanced, at some depth into the recursion we will have saturated all processors. For example, on a dual-core

 

machine, the first level of recursion will create two tasks, and thus theoretically from that point forward we’re

saturating the machine and there’s no need to continue to bear the overhead of additional tasks. This implies that

we now may benefit from a second threshold: in addition to switching from quicksort to insertion sort at some

threshold, we now also want to switch from parallel to serial at some threshold. That threshold may be defined in

a variety of ways.

 


 

 

As with the insertion sort threshold, a simple parallel threshold could be based on the amount of data left to be

processed:

 

C#

 

static void QuickSort<T>(T[] data, int fromInclusive, int toExclusive)
    where T : IComparable<T>
{
    if (toExclusive - fromInclusive <= THRESHOLD)
        InsertionSort(data, fromInclusive, toExclusive);
    else
    {
        int pivotPos = Partition(data, fromInclusive, toExclusive);
        if (toExclusive - fromInclusive <= PARALLEL_THRESHOLD)
        {
            // NOTE: PARALLEL_THRESHOLD is chosen to be greater than THRESHOLD.
            QuickSort(data, fromInclusive, pivotPos);
            QuickSort(data, pivotPos, toExclusive);
        }
        else Parallel.Invoke(
            () => QuickSort(data, fromInclusive, pivotPos),
            () => QuickSort(data, pivotPos, toExclusive));
    }
}

 

 

Another simple threshold may be based on depth. We can initialize the depth to the max depth we want to recur

to in parallel, and decrement the depth each time we recur; when it reaches 0, we fall back to serial.

 

C#

 

static void QuickSort<T>(T[] data, int fromInclusive, int toExclusive, int depth)
    where T : IComparable<T>
{
    if (toExclusive - fromInclusive <= THRESHOLD)
        InsertionSort(data, fromInclusive, toExclusive);
    else
    {
        int pivotPos = Partition(data, fromInclusive, toExclusive);
        if (depth > 0)
        {
            Parallel.Invoke(
                () => QuickSort(data, fromInclusive, pivotPos, depth - 1),
                () => QuickSort(data, pivotPos, toExclusive, depth - 1));
        }
        else
        {
            QuickSort(data, fromInclusive, pivotPos, 0);
            QuickSort(data, pivotPos, toExclusive, 0);
        }
    }
}

 

If you assume that the parallelism will be completely balanced due to equal work resulting from all partition

operations, you might then base the initial depth on the number of cores in the machine:

 

C#

 

QuickSort(data, 0, data.Length, (int)Math.Log(Environment.ProcessorCount, 2));

 


 

 

Alternatively, you might provide a bit of extra breathing room in case the problem space isn’t perfectly balanced:

 

C#

 

QuickSort(data, 0, data.Length, (int)Math.Log(Environment.ProcessorCount, 2) + 1);

 

Of course, the partitioning may result in very unbalanced workloads. And quicksort is just one example of an

algorithm; many other algorithms that are recursive in this manner will frequently result in very unbalanced

workloads.

 

Another approach is to keep track of the number of outstanding work items, and only “go parallel” when the

 

number of outstanding items is below a threshold. An example of this follows:

 

C#

 

class Utilities
{
    static int CONC_LIMIT = Environment.ProcessorCount * 2;
    volatile int _invokeCalls = 0;

    public void QuickSort<T>(T[] data, int fromInclusive, int toExclusive)
        where T : IComparable<T>
    {
        if (toExclusive - fromInclusive <= THRESHOLD)
            InsertionSort(data, fromInclusive, toExclusive);
        else
        {
            int pivotPos = Partition(data, fromInclusive, toExclusive);
            if (_invokeCalls < CONC_LIMIT)
            {
                Interlocked.Increment(ref _invokeCalls);
                Parallel.Invoke(
                    () => QuickSort(data, fromInclusive, pivotPos),
                    () => QuickSort(data, pivotPos, toExclusive));
                Interlocked.Decrement(ref _invokeCalls);
            }
            else
            {
                QuickSort(data, fromInclusive, pivotPos);
                QuickSort(data, pivotPos, toExclusive);
            }
        }
    }
}

 

 

Here, we’re keeping track of the number of Parallel.Invoke calls active at any one time. When the number is below

a predetermined limit, we recur using Parallel.Invoke; otherwise, we recur serially. This adds the additional

expense of two interlocked operations per recursive call (and is only an approximation, as the _invokeCalls field is

compared to the threshold outside of any synchronization), forcing synchronization where it otherwise wasn’t

needed, but it also allows for more load-balancing. Previously, once a recursive path was serial, it would remain

serial. With this modification, a serial path through QuickSort may recur and result in a parallel path.

 


 

 

PASSING DATA

There are several common patterns in the .NET Framework for passing data to asynchronous work.

 

CLOSURES

 

 

Since support for them was added to C# and Visual Basic, closures represent the easiest way to pass data into

background operations. By creating delegates that refer to state outside of their scope, the compiler transforms

the accessed variables in a way that makes them accessible to the delegates, “closing over” those variables. This

makes it easy to pass varying amounts of data into background work:

 

C#

 

int data1 = 42;
string data2 = "The Answer to the Ultimate Question of " +
    "Life, the Universe, and Everything";
Task.Factory.StartNew(() =>
{
    Console.WriteLine(data2 + ": " + data1);
});

 

 

For applications in need of the utmost in performance and scalability, it’s important to keep in mind that under the

covers the compiler may actually be allocating an object in which to store the variables (in the above example,

data1 and data2) that are accessed by the delegate.

 

STATE OBJECTS

 

 

Dating back to the beginning of the .NET Framework, many APIs that spawn asynchronous work accept a state

parameter and pass that state object into the delegate that represents the body of work. The

ThreadPool.QueueUserWorkItem method is a quintessential example of this:

 

C#

 

public static bool QueueUserWorkItem(WaitCallback callBack, object state);

 

 

public delegate void WaitCallback(object state);

 

We can take advantage of this state parameter to pass a single object of data into the WaitCallback:

 

C#

 

ThreadPool.QueueUserWorkItem(state => {

Console.WriteLine((string)state);

}, data2);

 

The Task class in the .NET Framework 4 also supports this pattern:

 

C#

 

Task.Factory.StartNew(state => {

Console.WriteLine((string)state);

}, data2);

 

 


 

 

Note that in contrast to the closures approach, this typically does not cause an extra object allocation to handle

the state, unless the state being supplied is a value type (value types must be boxed to supply them as the object

state parameter).

 

To pass in multiple pieces of data with this approach, those pieces of data must be wrapped into a single object. In

the past, this was typically a custom class to store specific pieces of information. With the .NET Framework 4, the

new Tuple<> classes may be used instead:

 

C#

 

Tuple<int, string> data = Tuple.Create(data1, data2);
Task.Factory.StartNew(state =>
{
    Tuple<int, string> d = (Tuple<int, string>)state;
    Console.WriteLine(d.Item2 + ": " + d.Item1);
}, data);

 

As with both closures and working with value types, this requires an object allocation to support the creation of

the tuple to wrap the data items. The built-in tuple types in the .NET Framework 4 also support a limited number

of contained pieces of data.

 

STATE OBJECTS WITH MEMBER METHODS

 

 

Another approach, similar to the former, is to pass data into asynchronous operations by representing the work to

be done asynchronously as an instance method on a class. This allows data to be passed in to that method

 

implicitly through the “this” reference.

 

C#

 

class Work
{
    public int Data1;
    public string Data2;
    public void Run()
    {
        Console.WriteLine(Data1 + ": " + Data2);
    }
}

// …
Work w = new Work();
w.Data1 = 42;
w.Data2 = "The Answer to the Ultimate Question of " +
    "Life, the Universe, and Everything";
Task.Factory.StartNew(w.Run);

 

 

As with the previous approaches, this approach requires an object allocation for an object (in this case, of class

Work) to store the state. Such an allocation is still required if Work is a struct instead of a class; this is because the

creation of a delegate referring to Work must reference the object on which to invoke the instance method Run,

and that reference is stored as an object, thus boxing the struct.

 

As such, which of these approaches you choose is largely a matter of preference. The closures approach typically

leads to the most readable code, and it allows the compiler to optimize the creation of the state objects. For

example, if the anonymous delegate passed to StartNew doesn’t access any local state, the compiler may be able

to avoid the object allocation to store the state, as it will already be stored as accessible instance or static fields.

 


 

 

ANTI-PATTERNS

 

 

CLOSING OVER INAPPROPRIATELY SHARED DATA

 

Consider the following code, and hazard a guess for what it outputs:

 

C#

 

static void Main()
{
    for (int i = 0; i < 10; i++)
    {
        ThreadPool.QueueUserWorkItem(delegate { Console.WriteLine(i); });
    }
}

 

If you guessed that this outputs the numbers 0 through 9 inclusive, you’d likely be wrong. While that might be the

output, more than likely this will actually output ten “10”s. The reason for this has to do with the language’s rules

for scoping and how it captures variables into anonymous methods, which here were used to represent the work

provided to QueueUserWorkItem. The variable i is shared by both the main thread queuing the work items and

the ThreadPool threads printing out the value of i. The main thread is continually updating the value of i as it

iterates from 0 through 9, and thus each output line will contain the value of i at whatever moment the

Console.WriteLine call occurs on the background thread. (Note that unlike the C# compiler, the Visual Basic

compiler kindly warns about this issue: “warning BC42324: Using the iteration variable in a lambda expression may

have unexpected results. Instead, create a local variable within the loop and assign it the value of the iteration

variable.”)

 

This phenomenon isn’t limited to parallel programming, though the prominence of anonymous methods and

lambda expressions in the .NET Framework parallel programming model does exacerbate the issue. For a serial

example, consider the following code:

 

C#

 

static void Main()
{
    var actions = new List<Action>();
    for (int i = 0; i < 10; i++)
        actions.Add(() => Console.WriteLine(i));
    actions.ForEach(action => action());
}

 

 

This code will reliably output ten “10”s, as by the time the Action delegates are invoked, the value of i is already

10, and all of the delegates are referring to the same captured i variable.

 

To address this issue, we can create a local copy of the iteration variable in scope inside the loop (as was

recommended by the Visual Basic compiler). This will cause each anonymous method to gain its own variable,

rather than sharing them with other delegates. The sequential code shown earlier can be fixed with a small

alteration:

 

C#

 

static void Main()
{
    var actions = new List<Action>();
    for (int i = 0; i < 10; i++)
    {
        int tmp = i;
        actions.Add(() => Console.WriteLine(tmp));
    }
    actions.ForEach(action => action());
}

 

This will reliably print out the sequence “0” through “9” as expected. The parallel code can be fixed in a similar

manner:

 

C#

 

static void Main()
{
    for (int i = 0; i < 10; i++)
    {
        int tmp = i;
        ThreadPool.QueueUserWorkItem(delegate { Console.WriteLine(tmp); });
    }
}

 

This will also reliably print out the values “0” through “9”, although the order in which they’re printed is not

 

guaranteed.

 

Another similar case where closure semantics can lead you astray is if you’re in the habit of declaring your

 

variables at the top of your function, and then using them later on. For example:

 

C#

 

static void Main(string[] args)
{
    int j;
    Parallel.For(0, 10000, i =>
    {
        int total = 0;
        for (j = 1; j <= 10000; j++) total += j;
    });
}

 

Due to closure semantics, the j variable will be shared by all iterations of the parallel loop, thus wreaking havoc on

the inner serial loop. To address this, the variable declarations should be moved as close to their usage as possible:

 

C#

 

static void Main(string[] args)
{
    Parallel.For(0, 10000, i =>
    {
        int total = 0;
        for (int j = 1; j <= 10000; j++) total += j;
    });
}

 


 

 

PRODUCER/CONSUMER

The real world revolves around the “producer/consumer” pattern. Individual entities are responsible for certain

functions, where some entities generate material that ends up being consumed by others. In some cases, those

consumers are also producers for even further consumers. Sometimes there are multiple producers per consumer,

sometimes there are multiple consumers per producer, and sometimes there’s a many-to-many relationship. We

live and breathe producer/consumer, and the pattern similarly has a very high value in parallel computing.

 

Often, producer/consumer relationships are applied to parallelization when there’s no ability to parallelize an

individual operation, but when multiple operations may be carried out concurrently, with one having a

dependency on the other. For example, consider the need to both compress and encrypt a particular file. This can

be done sequentially, with a single thread reading in a chunk of data, compressing it, encrypting the compressed

data, writing out the encrypted data, and then repeating the process for more chunks until the input file has been

completely processed. Depending on the compression and encryption algorithms utilized, there may not be the

 

ability to parallelize an individual compression or encryption, and the same data certainly can’t be compressed

 

concurrently with it being encrypted, as the encryption algorithm must run over the compressed data rather than

over the uncompressed input. Instead, multiple threads may be employed to form a pipeline. One thread can read

in the data. That thread can hand the read data off to another thread that compresses it, and in turn hands the

compressed data off to a third thread. The third thread can then encrypt it, and pass it off to a fourth thread,

which writes the encrypted data to the output file. Each processing “agent”, or “actor”, in this scheme is serial in

nature, churning its input into output, and as long as the hand-offs between agents don’t introduce any reordering

operations, the output data from the entire process will emerge in the same order the associated data was input.

 

Those hand-offs can be managed with the new BlockingCollection<> type, which provides key support for this

pattern in the .NET Framework 4.

 

PIPELINES

 

 

Hand-offs between threads in a parallelized system require shared state: the producer needs to put the output

data somewhere, and the consumer needs to know where to look to get its input data. More than just having

access to a storage location, however, there is additional communication that’s necessary. A consumer is often

prevented from making forward progress until there’s some data to be consumed. Additionally, in some systems, a

producer needs to be throttled so as to avoid producing data much faster than consumers can consume it. In both

of these cases, a notification mechanism must also be incorporated. Additionally, with multiple producers and

multiple consumers, participants must not trample on each other as they access the storage location.

 

We can build a simple version of such a hand-off mechanism using a Queue<T> and a SemaphoreSlim:

 

C#

 

class BlockingQueue<T>
{
    private Queue<T> _queue = new Queue<T>();
    private SemaphoreSlim _semaphore = new SemaphoreSlim(0, int.MaxValue);

    public void Enqueue(T data)
    {
        if (data == null) throw new ArgumentNullException("data");
        lock (_queue) _queue.Enqueue(data);
        _semaphore.Release();
    }

    public T Dequeue()
    {
        _semaphore.Wait();
        lock (_queue) return _queue.Dequeue();
    }
}

 

Here we have a very simple “blocking queue” data structure. Producers call Enqueue to add data into the queue,

which adds the data to an internal Queue<T> and notifies consumers using a semaphore that another element of

data is available. Similarly, consumers use Dequeue to wait for an element of data to be available and then remove

that data from the underlying Queue<T>. Note that because multiple threads could be accessing the data structure

concurrently, a lock is used to protect the non-thread-safe Queue<T> instance.
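To make the hand-off concrete, here is a small, hypothetical usage sketch (the loop counts and item strings are illustrative, not from the text above): one thread enqueues items while another blocks in Dequeue until each item arrives.

C#

var queue = new BlockingQueue<string>();

// Producer: hands off items to the consumer
var producer = new Thread(() =>
{
    for (int i = 0; i < 100; i++) queue.Enqueue("item " + i);
});

// Consumer: blocks in Dequeue until an item is available
var consumer = new Thread(() =>
{
    for (int i = 0; i < 100; i++) Console.WriteLine(queue.Dequeue());
});

producer.Start();
consumer.Start();
producer.Join();
consumer.Join();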

 

Another similar implementation makes use of Monitor’s notification capabilities instead of using a semaphore:

 

C#

 

class BlockingQueue<T>
{
    private Queue<T> _queue = new Queue<T>();

    public void Enqueue(T data)
    {
        if (data == null) throw new ArgumentNullException("data");
        lock (_queue)
        {
            _queue.Enqueue(data);
            Monitor.Pulse(_queue);
        }
    }

    public T Dequeue()
    {
        lock (_queue)
        {
            while (_queue.Count == 0) Monitor.Wait(_queue);
            return _queue.Dequeue();
        }
    }
}

 

 

Such implementations provide basic support for data hand-offs between threads, but they also lack several

important things. How do producers communicate that there will be no more elements produced? With this

blocking behavior, what if a consumer only wants to block for a limited amount of time before doing something

else? What if producers need to be throttled, such that if the underlying Queue<T> is full they’re blocked from

adding to it? What if you want to pull from one of several blocking queues rather than from a single one? What if

semantics other than first-in-first-out (FIFO) are required of the underlying storage? What if producers and

consumers need to be canceled? And so forth.

 

All of these questions have answers in the new .NET Framework 4

System.Collections.Concurrent.BlockingCollection<T> type in System.dll. It provides the same basic behavior as

shown in the naïve implementation above, sporting methods to add to and take from the collection. But it also

 


 

 

supports throttling both consumers and producers, timeouts on waits, support for arbitrary underlying data

structures, and more. It also provides built-in implementations of typical coding patterns related to

producer/consumer in order to make such patterns simple to utilize.
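For illustration, here is a brief, hypothetical sketch of how several of those questions are addressed by BlockingCollection<T> (the variable names and the capacity and timeout values are made up for this example): a bounded capacity throttles producers, CompleteAdding signals that no more data will arrive, TryTake supports waiting with a timeout, and GetConsumingEnumerable ends once the collection is drained and marked complete.

C#

// A bounded collection throttles producers: Add blocks once 100 items are buffered
var bc = new BlockingCollection<int>(boundedCapacity: 100);

// Producer adds data and then signals that no more items will arrive
bc.Add(42);
bc.CompleteAdding();

// Consumer waits up to 100ms for an item rather than blocking indefinitely
int item;
if (bc.TryTake(out item, TimeSpan.FromMilliseconds(100)))
{
    Console.WriteLine(item);
}

// GetConsumingEnumerable ends once the collection is empty and complete
foreach (var remaining in bc.GetConsumingEnumerable())
{
    Console.WriteLine(remaining);
}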

 

As an example of a standard producer/consumer pattern, consider the need to read in a file, transform each line

using a regular expression, and write out the transformed line to a new file. We can implement that using a Task to

run each step of the pipeline asynchronously, and BlockingCollection<string> as the hand-off point between each

stage.

 

C#

 

static void ProcessFile(string inputPath, string outputPath)
{
    var inputLines = new BlockingCollection<string>();
    var processedLines = new BlockingCollection<string>();

    // Stage #1
    var readLines = Task.Factory.StartNew(() =>
    {
        try
        {
            foreach (var line in File.ReadLines(inputPath)) inputLines.Add(line);
        }
        finally { inputLines.CompleteAdding(); }
    });

    // Stage #2
    var processLines = Task.Factory.StartNew(() =>
    {
        try
        {
            foreach (var line in inputLines.GetConsumingEnumerable()
                     .Select(line => Regex.Replace(line, @"\s+", ", ")))
            {
                processedLines.Add(line);
            }
        }
        finally { processedLines.CompleteAdding(); }
    });

    // Stage #3
    var writeLines = Task.Factory.StartNew(() =>
    {
        File.WriteAllLines(outputPath, processedLines.GetConsumingEnumerable());
    });

    Task.WaitAll(readLines, processLines, writeLines);
}

 

 

With this basic structure coded up, we have a lot of flexibility and room for modification. For example, what if we

discover from performance testing that we’re reading from the input file much faster than the processing and

outputting can handle it? One option is to limit the speed at which the input file is read, which can be done by

modifying how the inputLines collection is created:

 


 

 

C#

 

 

var inputLines = new BlockingCollection<string>(boundedCapacity:20);

 

By adding the boundedCapacity parameter (shown here for clarity using named parameter functionality, which is

now supported by both C# and Visual Basic in Visual Studio 2010), a producer attempting to add to the collection

will block until there are fewer than 20 elements in the collection, thus slowing down the file reader. Alternatively,

we could further parallelize the solution. For example, let’s assume that through testing you found the real

problem to be that the processLines Task was heavily compute bound. To address that, you could parallelize it

using PLINQ in order to utilize more cores:

 

C#

 

foreach (var line in inputLines.GetConsumingEnumerable()
                               .AsParallel().AsOrdered()
                               .Select(line => Regex.Replace(line, @"\s+", ", ")))

 

Note that by specifying “.AsOrdered()” after the “.AsParallel()”, we’re ensuring that PLINQ maintains the same

ordering as in the sequential solution.

 

D E C O R A T O R T O P I P E L I N E

 

The decorator pattern is one of the original Gang of Four design patterns. A decorator is an object that has the

same interface as another object it contains. In object-oriented terms, it is an object that has an “is-a” and a “has-a”

relationship with a specific type. Consider the CryptoStream class in the System.Security.Cryptography

namespace. CryptoStream derives from Stream (it “is-a” Stream), but it also accepts a Stream to its constructor

and stores that Stream internally (it “has-a” stream); that underlying stream is where the encrypted data is stored.

CryptoStream is a decorator.

 

With decorators, we typically chain them together. For example, as alluded to in the introduction to this section on

producer/consumer, a common need in software is to both compress and encrypt data. The .NET Framework

contains two decorator stream types to make this feasible: the CryptoStream class already mentioned, and the

GZipStream class. We can compress and encrypt an input file into an output file with code like the following:

 

C#

 

static void CompressAndEncrypt(string inputFile, string outputFile)
{
    using (var input = File.OpenRead(inputFile))
    using (var output = File.OpenWrite(outputFile))
    using (var rijndael = new RijndaelManaged())
    using (var transform = rijndael.CreateEncryptor())
    using (var encryptor =
        new CryptoStream(output, transform, CryptoStreamMode.Write))
    using (var compressor =
        new GZipStream(encryptor, CompressionMode.Compress, true))
        input.CopyTo(compressor);
}

 

The input file stream is copied to a GZipStream, which wraps a CryptoStream, which wraps the output stream. The

data flows from one stream to the other, with its data modified along the way.

 


 

 

Both compression and encryption are computationally intense operations, and as such it can be beneficial to

parallelize this operation. However, given the nature of the problem, it’s not just as simple as running both the

compression and encryption in parallel on the input stream, since the encryption operates on the output of the

compression. Instead, we can form a pipeline, with the output of the compression being fed as the input to the

encryption, such that while the encryption is processing data block N, the compression routine can have already

moved on to be processing N+1 or greater. To make this simple, we’ll implement it with another decorator, a

TransferStream. The idea behind this stream is that writes are offloaded to another thread, which sequentially

writes to the underlying stream all of the writes to the transfer stream. That way, when code calls Write on the

transfer stream, it’s not blocked waiting for the whole chain of decorators to complete their processing: Write

 

returns immediately after queuing the work, and the caller can go on to do additional work. A simple

implementation of TransferStream is shown below (relying on a custom Stream base type, which simply

implements the abstract Stream class with default implementations of all abstract members, in order to keep the

code shown here small), taking advantage of both Task and BlockingCollection:

 

C#

 

public sealed class TransferStream : AbstractStreamBase
{
    private Stream _writeableStream;
    private BlockingCollection<byte[]> _chunks;
    private Task _processingTask;

    public TransferStream(Stream writeableStream)
    {
        // … Would validate arguments here
        _writeableStream = writeableStream;
        _chunks = new BlockingCollection<byte[]>();
        _processingTask = Task.Factory.StartNew(() =>
        {
            foreach (var chunk in _chunks.GetConsumingEnumerable())
                _writeableStream.Write(chunk, 0, chunk.Length);
        }, TaskCreationOptions.LongRunning);
    }

    public override bool CanWrite { get { return true; } }

    public override void Write(byte[] buffer, int offset, int count)
    {
        // … Would validate arguments here
        var chunk = new byte[count];
        Buffer.BlockCopy(buffer, offset, chunk, 0, count);
        _chunks.Add(chunk);
    }

    public override void Close()
    {
        _chunks.CompleteAdding();
        try { _processingTask.Wait(); }
        finally { base.Close(); }
    }
}

 

 

The constructor stores the underlying stream to be written to. It then sets up the necessary components of the

parallel pipeline. First, it creates a BlockingCollection<byte[]> to store all of the data chunks to be written. Then, it

 


 

 

launches a long-running Task that continually pulls from the collection and writes each chunk out to the underlying

stream. The Write method copies the provided input data into a new array which it enqueues to the

BlockingCollection; by default, BlockingCollection uses a queue data structure under the covers, maintaining first-

in-first-out (FIFO) semantics, so the data will be written to the underlying stream in the same order it’s added to

the collection, a property important for dealing with streams, which have an implicit ordering. Finally, closing the

stream marks the BlockingCollection as complete for adding (which causes the consuming loop in the Task

launched in the constructor to cease as soon as the collection is empty) and then waits for the Task to complete;

this ensures that all data is written to the underlying stream before the underlying stream is closed, and it

propagates any exceptions that may have occurred during processing.

 

With our TransferStream in place, we can now use it to parallelize our compression/encryption snippet shown

earlier:

 

C#

 

static void CompressAndEncrypt(string inputFile, string outputFile)
{
    using (var input = File.OpenRead(inputFile))
    using (var output = File.OpenWrite(outputFile))
    using (var rijndael = new RijndaelManaged())
    using (var transform = rijndael.CreateEncryptor())
    using (var encryptor =
        new CryptoStream(output, transform, CryptoStreamMode.Write))
    using (var threadTransfer = new TransferStream(encryptor))
    using (var compressor =
        new GZipStream(threadTransfer, CompressionMode.Compress, true))
        input.CopyTo(compressor);
}

 

With those simple changes, we’ve now modified the operation so that both the compression and the encryption

may run in parallel. Of course, it’s important to note here that there are implicit limits on how much speedup we can

achieve from this kind of parallelization. At best the code is doing only two elements of work concurrently,

overlapping the compression with encryption, and thus even on a machine with more than two cores, the best

speedup we can hope to achieve is 2x. Note, too, that we could use additional transfer streams in order to read

concurrently with compressing and to write concurrently with encrypting, as such:

 

C#

 

static void CompressAndEncrypt(string inputFile, string outputFile)
{
    using (var input = File.OpenRead(inputFile))
    using (var output = File.OpenWrite(outputFile))
    using (var t2 = new TransferStream(output))
    using (var rijndael = new RijndaelManaged())
    using (var transform = rijndael.CreateEncryptor())
    using (var encryptor =
        new CryptoStream(t2, transform, CryptoStreamMode.Write))
    using (var t1 = new TransferStream(encryptor))
    using (var compressor =
        new GZipStream(t1, CompressionMode.Compress, true))
    using (var t0 = new TransferStream(compressor))
        input.CopyTo(t0);
}

 

Benefits of doing this might manifest if I/O is a bottleneck.

 


 

 

I P R O D U C E R C O N S U M E R C O L L E C T I O N <T>

 

 

As mentioned, BlockingCollection<T> defaults to using a queue as its storage mechanism, but arbitrary storage

mechanisms are supported. This is done utilizing a new interface in the .NET Framework 4, passing an instance of

an implementing type to the BlockingCollection’s constructor:

 

C#

 

public interface IProducerConsumerCollection<T> :
    IEnumerable<T>, ICollection, IEnumerable
{
    bool TryAdd(T item);
    bool TryTake(out T item);
    T[] ToArray();
    void CopyTo(T[] array, int index);
}

public class BlockingCollection<T> : //…
{
    //…
    public BlockingCollection(
        IProducerConsumerCollection<T> collection);
    public BlockingCollection(
        IProducerConsumerCollection<T> collection, int boundedCapacity);
    //…
}

 

Aptly named to contain the name of this pattern, IProducerConsumerCollection<T> represents a collection used in

producer/consumer implementations, where data will be added to the collection by producers and taken from it

by consumers. Hence, the primary two methods on the interface are TryAdd and TryTake, both of which must be

implemented in a thread-safe and atomic manner.

 

The .NET Framework 4 provides three concrete implementations of this interface:

ConcurrentQueue<T>, ConcurrentStack<T>, and ConcurrentBag<T>.

ConcurrentQueue<T> is the implementation of the interface used by default by

BlockingCollection<T>, providing first-in-first-out (FIFO) semantics.

ConcurrentStack<T> provides last-in-first-out (LIFO) behavior, and

ConcurrentBag<T> eschews ordering guarantees in favor of improved performance

in various use cases, in particular those in which the same thread will be acting as

both a producer and a consumer.
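As a quick illustration of that constructor in use, the following hypothetical snippet creates a blocking collection with last-in-first-out semantics simply by supplying a ConcurrentStack<T> as the backing store; this is a sketch of the mechanism, not code from the text above.

C#

// LIFO hand-off: the most recently added item is the first one taken
var lifo = new BlockingCollection<int>(new ConcurrentStack<int>());
lifo.Add(1);
lifo.Add(2);
Console.WriteLine(lifo.Take()); // prints 2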

 

In addition to BlockingCollection<T>, other data structures may be built around IProducerConsumerCollection<T>.

For example, an object pool is a simple data structure that’s meant to allow object reuse. We could build a

concurrent object pool by tying it to a particular storage type, or we can implement one in terms of

IProducerConsumerCollection<T>.

 

C#

 

public sealed class ObjectPool<T>
{
    private Func<T> _generator;
    private IProducerConsumerCollection<T> _objects;

    public ObjectPool(Func<T> generator)
        : this(generator, new ConcurrentQueue<T>()) { }

    public ObjectPool(
        Func<T> generator, IProducerConsumerCollection<T> storage)
    {
        if (generator == null) throw new ArgumentNullException("generator");
        if (storage == null) throw new ArgumentNullException("storage");
        _generator = generator;
        _objects = storage;
    }

    public T Get()
    {
        T item;
        if (!_objects.TryTake(out item)) item = _generator();
        return item;
    }

    public void Put(T item) { _objects.TryAdd(item); }
}

 

 

By parameterizing the storage in this manner, we can adapt our ObjectPool<T> based on use cases and the

associated strengths of the collection implementation. For example, in a graphics-intensive UI application,

we may want to render to buffers on background threads and then “blit” those buffers onto the UI on the UI

thread. Given the likely size of these buffers, rather than continually allocating large objects and forcing the

garbage collector to clean up after us, we can pool them. In this case, a ConcurrentQueue<T> is a likely choice for

the underlying storage. Conversely, if the pool were being used in a concurrent memory allocator to cache objects

of varying sizes, we don’t need the FIFO-ness of ConcurrentQueue<T>, and we would be better off with a data

structure that minimizes synchronization between threads; for this purpose, ConcurrentBag<T> might be ideal.

 

Under the covers, ConcurrentBag<T> utilizes a list of instances of T per thread. Each

thread that accesses the bag is able to add and remove data in relative isolation from

other threads accessing the bag. Only when a thread tries to take data out and its

local list is empty will it go in search of items from other threads (the implementation

makes the thread-local lists visible to other threads for only this purpose). This might

sound familiar: ConcurrentBag<T> implements a pattern very similar to the work-stealing

algorithm employed by the .NET Framework 4 ThreadPool.

 

While accessing the local list is relatively inexpensive, stealing from another thread’s

list is quite expensive. As a result, ConcurrentBag<T> is best for situations

where each thread only needs its own local list the majority of the time. In the object

pool example, to assist with this it could be worthwhile for every thread to initially

populate the pool with some objects, such that when it later gets and puts objects, it

will be dealing predominantly with its own queue.
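As a usage sketch (the buffer size and the number of seeded buffers here are illustrative assumptions, not values from the text), a pool of reusable rendering buffers backed by a ConcurrentBag<T> might be set up as follows, with each worker thread putting a few buffers in up front so that its later Get and Put calls mostly touch its own thread-local list.

C#

// Pool of 1MB buffers backed by a ConcurrentBag for mostly thread-local access
var pool = new ObjectPool<byte[]>(() => new byte[1024 * 1024],
                                  new ConcurrentBag<byte[]>());

// Each worker seeds the pool before use (illustrative)
for (int i = 0; i < 4; i++) pool.Put(new byte[1024 * 1024]);

byte[] buffer = pool.Get();   // reuses a pooled buffer if one is available
// ... render into buffer ...
pool.Put(buffer);             // return it for reuse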

 


 

 

P R O D U C E R / C O N S U M E R E V E R Y W H E R E

 

 

If you’ve written a Windows-based application, it’s extremely likely you’ve used the producer/consumer pattern,

potentially without even realizing it. Producer/consumer has many prominent implementations.

 

T H R E A D P O O L S

 

If you’ve used a thread pool, you’ve used a quintessential implementation of the producer/consumer pattern. A

thread pool is typically engineered around a data structure containing work to be performed. Every thread in the

pool monitors this data structure, waiting for work to arrive. When work does arrive, a thread retrieves the work,

processes it, and goes back to wait for more. In this capacity, the work that’s being produced is consumed by the

threads in the pool and executed. Utilizing the BlockingCollection<T> type we’ve already seen, it’s straightforward

to build a simple, no-frills thread pool:

 

C#

 

public static class SimpleThreadPool
{
    private static BlockingCollection<Action> _work =
        new BlockingCollection<Action>();

    static SimpleThreadPool()
    {
        for (int i = 0; i < Environment.ProcessorCount; i++)
        {
            new Thread(() =>
            {
                foreach (var action in _work.GetConsumingEnumerable())
                {
                    action();
                }
            }) { IsBackground = true }.Start();
        }
    }

    public static void QueueWorkItem(Action workItem) { _work.Add(workItem); }
}

 

In concept, this is very similar to how the ThreadPool type in the .NET Framework 3.5 and earlier operated. In the

.NET Framework 4, the data structure used to store the work to be executed is more distributed. Rather than

maintaining a single global queue, as is done in the above example, the ThreadPool in .NET Framework 4 maintains

not only a global queue but also a queue per thread. Work generated outside of the pool goes into the global

queue as it always did, but threads in the pool can put their generated work into the thread-local queues rather

than into the global queues. When threads go in search of work to be executed, they first examine their local

queue, and only if they don’t find anything there, they then check the global queue. If the global queue is found to

be empty, the threads are then also able to check the queues of their peers, “stealing” work from other threads in

 

order to stay busy. This work-stealing approach can provide significant benefits in the form of both minimized

contention and synchronization between threads (in an ideal workload, threads can spend most of their time

working on their own local queues) as well as cache utilization. (You can approximate this behavior with the

SimpleThreadPool by instantiating the BlockingCollection<Action> with an underlying ConcurrentBag<Action>

rather than utilizing the default ConcurrentQueue<Action>.)
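That approximation would amount to a one-line change to the _work field in the SimpleThreadPool shown above, sketched here; note that a bag gives up the FIFO ordering of queued work items.

C#

// Use thread-local bags instead of a single FIFO queue for work storage
private static BlockingCollection<Action> _work =
    new BlockingCollection<Action>(new ConcurrentBag<Action>());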

 


 

 

In the previous paragraph, we said that “threads in the pool can put their generated

work into the thread-local queues,” not that they necessarily do. In fact, the

ThreadPool.QueueUserWorkItem method is unable to take advantage of this work-stealing

support. The functionality is only available through Tasks, for which it is

turned on by default. This behavior can be disabled on a per-Task basis using

TaskCreationOptions.PreferFairness.
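For example, a Task can opt back into the single global FIFO queue as sketched below; TaskCreationOptions.PreferFairness is the real option involved, while the DoWork method is just a placeholder for whatever work the task performs.

C#

// Ask the scheduler to favor fairness (global queue) over work stealing
Task.Factory.StartNew(() => DoWork(), TaskCreationOptions.PreferFairness);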

 

By default, Tasks execute in the ThreadPool using these internal work-stealing queues. This functionality isn’t

hardwired into Tasks, however. Rather, the functionality is abstracted through the TaskScheduler type. Tasks

execute on TaskSchedulers, and the .NET Framework 4 comes with a built-in TaskScheduler that targets this

functionality in the ThreadPool; this implementation is what’s returned from the TaskScheduler.Default property,

and as this property’s name implies, this is the default scheduler used by Tasks. As with anything where someone

talks about a “default,” there’s usually a mechanism to override the default, and that does in fact exist for Task

execution. It’s possible to write custom TaskScheduler implementations to execute Tasks in whatever manner is

needed by the application.

 

TaskScheduler itself embodies the concept of producer/consumer. As an abstract class, it provides several abstract

methods that must be overridden and a few virtual methods that may be. The primary abstract method is called

QueueTask, and is used by the rest of the .NET Framework infrastructure, acting as the producer, to queue tasks

into the scheduler. The scheduler implementation then acts as the consumer, executing those tasks in whatever

manner it sees fit. We can build a very simple, no frills TaskScheduler, based on the previously shown

SimpleThreadPool, simply by delegating from QueueTask to QueueWorkItem, using a delegate that executes the

task:

 

C#

 

public sealed class SimpleThreadPoolTaskScheduler : TaskScheduler
{
    protected override void QueueTask(Task task)
    {
        SimpleThreadPool.QueueWorkItem(() => base.TryExecuteTask(task));
    }

    protected override bool TryExecuteTaskInline(
        Task task, bool taskWasPreviouslyQueued)
    {
        return base.TryExecuteTask(task);
    }

    protected override IEnumerable<Task> GetScheduledTasks()
    {
        throw new NotSupportedException();
    }
}

 

We can then produce tasks to be run on an instance of this scheduler:

 

C#

 

var myScheduler = new SimpleThreadPoolTaskScheduler();
var t = new Task(() => Console.WriteLine("hello, world"));
t.Start(myScheduler);

 

 


 

 

The TaskFactory class, a default instance of which is returned from the static Task.Factory property, may also be

instantiated with a TaskScheduler instance. This then allows us to easily utilize all of the factory methods while

targeting a custom scheduler:

 

C#

 

var factory = new TaskFactory(new SimpleThreadPoolTaskScheduler());
factory.StartNew(() => Console.WriteLine("hello, world"));

 

U I M A R S H A L I N G

 

If you’ve written a responsive Windows-based application, you’ve already taken advantage of the

producer/consumer pattern. With both Windows Forms and Windows Presentation Foundation (WPF), UI controls

must only be accessed from the same thread that created them, a form of thread affinity. This is problematic for

several reasons, one of the most evident having to do with UI responsiveness. To write a responsive application, it’s

typically necessary to offload work from the UI thread to a background thread, in order to allow that UI thread to

continue processing Windows messages that cause the UI to repaint, to respond to mouse input, and so on. That

processing occurs with code referred to as a Windows message loop. While the work is executing in the

background, it may need to update visual progress indication in the UI, and when it completes, it may need to

refresh the UI in some manner. Those interactions often require the manipulation of controls that were created on

the UI thread, and as a result, the background thread must marshal calls to those controls to the UI thread.

 

Both Windows Forms and WPF provide mechanisms for doing this. Windows Forms provides the instance Invoke

method on the Control class. This method accepts a delegate, and marshals the execution of that delegate to the

right thread for that Control, as demonstrated in the following Windows-based application that updates a label on

the UI thread every second:

 

C#

 

using System;
using System.Drawing;
using System.Threading;
using System.Windows.Forms;

static class Program
{
    [STAThread]
    static void Main(string[] args)
    {
        var form = new Form();
        var lbl = new Label()
        {
            Dock = DockStyle.Fill,
            TextAlign = ContentAlignment.MiddleCenter
        };
        form.Controls.Add(lbl);
        var handle = form.Handle;

        ThreadPool.QueueUserWorkItem(_ =>
        {
            while (true)
            {
                lbl.Invoke((Action)delegate
                {
                    lbl.Text = DateTime.Now.ToString();
                });
                Thread.Sleep(1000);
            }
        });

        form.ShowDialog();
    }
}

 

 

The Invoke call is synchronous, in that it won’t return until the delegate has completed execution. There is also a

BeginInvoke method, which runs the delegate asynchronously.

 

This mechanism is itself a producer/consumer implementation. Windows Forms maintains a queue of delegates to

be processed by the UI thread. When Invoke or BeginInvoke is called, it puts the delegate into this queue, and

sends a Windows message to the UI thread. The UI thread’s message loop eventually processes this message,

which tells it to dequeue a delegate from the queue and execute it. In this manner, the thread calling Invoke or

BeginInvoke is the producer, the UI thread is the consumer, and the data being produced and consumed is the

delegate.

 

The particular pattern of producer/consumer employed by Invoke has a special

name, “rendezvous,” which is typically used to signify multiple threads that meet to

exchange data bidirectionally. The caller of Invoke is providing a delegate and is

potentially getting back the result of that delegate’s invocation. The UI thread is

receiving a delegate and is potentially handing over the delegate’s result. Neither

thread may progress past the rendezvous point until the data has been fully

exchanged.

 

This producer/consumer mechanism is available for WPF as well, through the Dispatcher class, which similarly

provides Invoke and BeginInvoke methods. To abstract away this functionality and to make it easier to write

components that need to marshal to the UI and that must be usable in multiple UI environments, the .NET

Framework provides the SynchronizationContext class. SynchronizationContext provides Send and Post methods,

which map to Invoke and BeginInvoke, respectively. Windows Forms provides an internal

SynchronizationContext-derived type called WindowsFormsSynchronizationContext, which overrides Send to call

Control.Invoke and which overrides Post to call Control.BeginInvoke. WPF provides a similar type. With this in

hand, a library can be written in terms of SynchronizationContext, and can then be supplied with the right

SynchronizationContext at runtime to ensure it’s able to marshal appropriately to the UI in the current

environment.
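As a small sketch of what such library code might look like (the Worker class, its ReportProgress method, and its arguments are hypothetical), the component captures the current SynchronizationContext and later uses Post to marshal a callback to whatever UI framework is in use:

C#

public class Worker
{
    // Capture the context of the creating thread; fall back to the default
    // context (which posts to the ThreadPool) if none is installed.
    private readonly SynchronizationContext _context =
        SynchronizationContext.Current ?? new SynchronizationContext();

    public void ReportProgress(int percent, Action<int> onProgress)
    {
        // Post marshals the callback asynchronously to the captured context
        _context.Post(_ => onProgress(percent), null);
    }
}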

 

SynchronizationContext may also be used for other purposes, and in fact there are

other implementations of it provided in the .NET Framework for non-UI related

purposes. For this discussion, however, we’ll continue to refer to

SynchronizationContext pertaining only to UI marshaling.

 

To facilitate this, the static SynchronizationContext.Current property exists to help code grab a reference to a

SynchronizationContext that may be used to marshal to the current thread. Both Windows Forms and WPF set

this property on the UI thread to the relevant SynchronizationContext instance. Code may then get the value of

 


 

 

this property and use it to marshal work back to the UI. As an example, we can rewrite the previous example by using

SynchronizationContext.Send rather than explicitly using Control.Invoke:

 

C#

 

[STAThread]
static void Main(string[] args)
{
    var form = new Form();
    var lbl = new Label()
    {
        Dock = DockStyle.Fill,
        TextAlign = ContentAlignment.MiddleCenter
    };
    form.Controls.Add(lbl);
    var handle = form.Handle;

    var sc = SynchronizationContext.Current;

    ThreadPool.QueueUserWorkItem(_ =>
    {
        while (true)
        {
            sc.Send(delegate
            {
                lbl.Text = DateTime.Now.ToString();
            }, null);
            Thread.Sleep(1000);
        }
    });

    form.ShowDialog();
}

 

 

As mentioned in the previous section, custom TaskScheduler types may be implemented to supply custom

consumer implementations for Tasks being produced. In addition to the default implementation of TaskScheduler

that targets the .NET Framework ThreadPool’s internal work-stealing queues, the .NET Framework 4 also includes

the TaskScheduler.FromCurrentSynchronizationContext method, which generates a TaskScheduler that targets

the current synchronization context. We can then take advantage of that functionality to further abstract the

previous example:

 

C#

 

[STAThread]
static void Main(string[] args)
{
    var form = new Form();
    var lbl = new Label()
    {
        Dock = DockStyle.Fill,
        TextAlign = ContentAlignment.MiddleCenter
    };
    form.Controls.Add(lbl);
    var handle = form.Handle;

    var ui = new TaskFactory(
        TaskScheduler.FromCurrentSynchronizationContext());

    ThreadPool.QueueUserWorkItem(_ =>
    {
        while (true)
        {
            ui.StartNew(() => lbl.Text = DateTime.Now.ToString());
            Thread.Sleep(1000);
        }
    });

    form.ShowDialog();
}

 

 

This ability to execute Tasks in various contexts also integrates very nicely with continuations and dataflow, for

example:

 

C#

 

Task.Factory.StartNew(() =>
{
    // Run in the background a long computation which generates a result
    return DoLongComputation();
}).ContinueWith(t =>
{
    // Render the result on the UI
    RenderResult(t.Result);
}, TaskScheduler.FromCurrentSynchronizationContext());

 

S Y S T E M E V E N T S

 

The Microsoft.Win32.SystemEvents class exposes a plethora of static events for being notified about happenings

in the system, for example:

 

C#

 

public static event EventHandler DisplaySettingsChanged;

public static event EventHandler DisplaySettingsChanging;

public static event EventHandler EventsThreadShutdown;

public static event EventHandler InstalledFontsChanged;

public static event EventHandler PaletteChanged;

public static event PowerModeChangedEventHandler PowerModeChanged;

public static event SessionEndedEventHandler SessionEnded;

public static event SessionEndingEventHandler SessionEnding;

public static event SessionSwitchEventHandler SessionSwitch;

public static event EventHandler TimeChanged;

public static event TimerElapsedEventHandler TimerElapsed;

public static event UserPreferenceChangedEventHandler UserPreferenceChanged;

public static event UserPreferenceChangingEventHandler UserPreferenceChanging;

 

 

The Windows operating system notifies applications of the conditions that lead to most of these events through

Windows messages, as discussed in the previous section. To receive these messages, the application must make

sure it has a window to which the relevant messages can be broadcast, and a message loop running to process

them. Thus, if you subscribe to one of these events, even in an application without UI, SystemEvents ensures that

a broadcast window has been created and that a thread has been created to run a message loop for it. That thread

then waits for messages to arrive and consumes them by translating them into the proper .NET Framework objects

and invoking the relevant event. When you register an event handler with an event on SystemEvents, in a strong

sense you’re then implementing the consumer side of this multithreaded, producer/consumer implementation.
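Subscribing is as simple as attaching a handler, as in this minimal sketch (the handler body is illustrative only); behind the scenes, the first subscription is what triggers creation of the broadcast window and its message-loop thread.

C#

// Consume system notifications produced by the SystemEvents message loop
Microsoft.Win32.SystemEvents.TimeChanged += (sender, e) =>
{
    Console.WriteLine("The system time was changed.");
};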

 


 

 

A G G R E G A T I O N S

Combining data in one way or another is very common in applications, and aggregation is an extremely common

need in parallel applications. In parallel systems, work is divided up, processed in parallel, and the results of these

intermediate computations are then combined in some manner to achieve a final output.

 

In some cases, no special work is required for the last step. For example, if a parallel for loop iterates from 0 to N,

and the ith result is stored into the resulting array’s ith slot, the aggregation of results into the output array can be

done in parallel with no additional work: the locations in the output array may all be written to independently, and

no two parallel iterations will attempt to store into the same index.

 

In many cases, however, special work is required to ensure that the results are aggregated safely. There are several

common patterns for achieving such aggregations.

 

O U T P U T T I N G A S E T O F R E S U L T S

 

 

A common coding pattern in sequential code is of the following form, where some input data is processed, and the

results are stored into an output collection:

 

C#

 

var output = new List<TOutput>();
foreach (var item in input)
{
    var result = Compute(item);
    output.Add(result);
}

 

 

If the size of the input collection is known in advance, this can be converted into an instance of the

aforementioned example, where the results are stored directly into the corresponding slots in the output:

 

C#

 

var output = new TOutput[input.Count];
for (int i = 0; i < input.Count; i++)
{
    var result = Compute(input[i]);
    output[i] = result;
}

 

This then makes parallelization straightforward, at least as it pertains to aggregation of the results:

 

C#

 

var output = new TOutput[input.Count];
Parallel.For(0, input.Count, i =>
{
    var result = Compute(input[i]);
    output[i] = result;
});

 

 

However, this kind of transformation is not always possible. In cases where the input size is not known or where

the input collection may not be indexed into, an output collection is needed that may be modified from multiple

 


 

 

threads. This may be done using explicit synchronization to ensure the output collection is only modified by a

single thread at a time:

 

C#

 

var output = new List<TOutput>();
Parallel.ForEach(input, item =>
{
    var result = Compute(item);
    lock (output) output.Add(result);
});

 

If the amount of computation done per item is significant, the cost of this locking is likely to be negligible.

However, as the amount of computation per item decreases, the overhead of taking and releasing a lock becomes

more relevant, and contention on the lock increases as more threads are blocked waiting to acquire it

concurrently. To decrease these overheads and to minimize contention, the new thread-safe collections in the

.NET Framework 4 may be used. These collections reside in the System.Collections.Concurrent namespace, and

are engineered to be scalable, minimizing the impact of contention. Some of these collections are implemented

with lock-free techniques, while others are implemented using fine-grained locking.

 

Amongst these new collections, there’s no direct corollary to the List<T> type. However, there are several

collections that address many of the most common usage patterns for List<T>. If you reexamine the previous code

 

snippet, you’ll notice that the output ordering from the serial code is not necessarily maintained in the parallel

 

version. This is because the order in which the data is stored into the output list is no longer based solely on the

order of the data in the input, but also on the order in which the parallel loop chooses to process the elements,

how partitioning occurs, and how long each element takes to process. Once we’ve accepted this issue and have

coded the rest of the application to not rely on the output ordering, our choices expand for what collection to use

to replace the list. Here we’ll use the new ConcurrentBag<T> type:

 

C#

 

var output = new ConcurrentBag<TOutput>();
Parallel.ForEach(input, item =>
{
    var result = Compute(item);
    output.Add(result);
});

 

 

All of the synchronization necessary to ensure the consistency of the output data structure is handled internally by

the ConcurrentBag.

 

O U T P U T T I N G A S I N G L E R E S U L T

 

 

Many algorithms output a single result, rather than a single collection. For example, consider the following serial

routine to estimate the value of Pi:

 

C#

 

const int NUM_STEPS = 100000000;

static double SerialPi()
{
    double sum = 0.0;
    double step = 1.0 / (double)NUM_STEPS;
    for (int i = 0; i < NUM_STEPS; i++)
    {
        double x = (i + 0.5) * step;
        double partial = 4.0 / (1.0 + x * x);
        sum += partial;
    }
    return step * sum;
}

 

 

The output of this operation is a single double value. This value is the sum of millions of independent operations,

and thus should be parallelizable. Here is a naïve parallelization:

 

C#

 

static double NaiveParallelPi()
{
    double sum = 0.0;
    double step = 1.0 / (double)NUM_STEPS;
    object obj = new object();
    Parallel.For(0, NUM_STEPS, i =>
    {
        double x = (i + 0.5) * step;
        double partial = 4.0 / (1.0 + x * x);
        lock (obj) sum += partial;
    });
    return step * sum;
}

 

 

We say “naïve” here, because while this solution is correct, it will also be extremely slow. Every iteration of the

parallel loop does only a few real cycles worth of work, made up of a few additions, multiplications, and divisions,

and then takes a lock to accumulate that iteration’s result into the overall result. The cost of that lock will

dominate all of the other work happening in the parallel loop, largely serializing it, such that the parallel version will

likely run significantly slower than the sequential version.

 

To fix this, we need to minimize the amount of synchronization necessary. That can be achieved by maintaining

local sums. We know that certain iterations will never be in conflict with each other, namely those running on the

same underlying thread (since a thread can only do one thing at a time), and thus we can maintain a local sum per

thread or task being used under the covers in Parallel.For. Given the prevalence of this pattern, Parallel.For

actually bakes in support for it. In addition to passing to Parallel.For a delegate for the body, you can also pass in a

delegate that represents an initialization routine to be run on each task used by the loop, and a delegate that

represents a finalization routine that will be run at the end of the task when no more iterations will be executed in

it.

 

C#

 

public static ParallelLoopResult For<TLocal>(

int fromInclusive, int toExclusive,

Func<TLocal> localInit,

Func<int, ParallelLoopState, TLocal, TLocal> body,

Action<TLocal> localFinally);

 

 

The result of the initialization routine is passed to the first iteration run by that task, the output of that iteration is

passed to the next iteration, the output of that iteration is passed to the next, and so on, until finally the last

iteration passes its result to the localFinally delegate.

 


 

 

Figure: Parallel.For with per-task local state. Each task (1 through N) runs localInit once, then a series of iterations (A, B, …, N), and finally localFinally.

In this manner, a partial result can be built up on each task, and only combined with the partials from other tasks

at the end. Our Pi example can thus be implemented as follows:

 

C#

 

static double ParallelPi()
{
    double sum = 0.0;
    double step = 1.0 / (double)NUM_STEPS;
    object obj = new object();
    Parallel.For(0, NUM_STEPS,
        () => 0.0,
        (i, state, partial) =>
        {
            double x = (i + 0.5) * step;
            return partial + 4.0 / (1.0 + x * x);
        },
        partial => { lock (obj) sum += partial; });
    return step * sum;
}

 

 

The localInit delegate returns an initialized value of 0.0. The body delegate calculates its iteration’s result, adds it

to the partial result it was passed in (which comes either directly from the result of localInit or from the previous iteration

on the same task), and returns the updated partial. The localFinally delegate takes the completed partial, and only

then synchronizes with other threads to combine the partial sum into the total sum.

 

Earlier in this document we saw the performance ramifications of having a very small delegate body. This Pi

calculation is an example of that case, and thus we can likely achieve better performance using the batching

pattern described previously.

 

C#

 

static double ParallelPartitionerPi()
{
    double sum = 0.0;
    double step = 1.0 / (double)NUM_STEPS;
    object obj = new object();
    Parallel.ForEach(Partitioner.Create(0, NUM_STEPS),
        () => 0.0,
        (range, state, partial) =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                double x = (i + 0.5) * step;
                partial += 4.0 / (1.0 + x * x);
            }
            return partial;
        },
        partial => { lock (obj) sum += partial; });
    return step * sum;
}

 

P L I N Q A G G R E G A T I O N S

 

 

Any time you find yourself needing to aggregate, think PLINQ. For many problems, aggregation is one of several

areas in which PLINQ excels, with a plethora of aggregation support built-in.

 

T O A R R A Y / T O L I S T / T O D I C T I O N A R Y / T O L O O K U P

 

As does LINQ to Objects, PLINQ provides four “To*” methods that may be used to aggregate all of the output from

a query into a single data structure. PLINQ internally handles all of the relevant synchronization. For example, here

is the previous example of storing all results into a List<T>:

 

C#

 

var output = new List<TOutput>();
foreach (var item in input)
{
    var result = Compute(item);
    output.Add(result);
}

 

This may be converted to a LINQ implementation as follows:

 

C#

 

var output = input

.Select(item => Compute(item))

.ToList();

 

And then it can be parallelized with PLINQ:

 

C#

 

var output = input.AsParallel()

.Select(item => Compute(item))

.ToList();

 

 

In fact, not only does PLINQ handle all of the synchronization necessary to do this aggregation safely, it can also be

used to automatically regain the ordering we lost in our parallelized version when using Parallel.ForEach:

 


 

 

C#

 

 

var output = input.AsParallel().AsOrdered()

.Select(item => Compute(item))

.ToList();

 

S I N G L E - V A L U E A G G R E G A T I O N S

 

Just as LINQ and PLINQ are useful for aggregating sets of output, they are also quite useful for aggregating down to

a single value, with operators including but not limited to Average, Sum, Min, Max, and Aggregate. As an example,

the same Pi calculation can be done using LINQ:

 

C#

 

static double SerialLinqPi()
{
    double step = 1.0 / (double)NUM_STEPS;
    return Enumerable.Range(0, NUM_STEPS).Select(i =>
    {
        double x = (i + 0.5) * step;
        return 4.0 / (1.0 + x * x);
    }).Sum() * step;
}

 

 

With a minimal modification, PLINQ can be used to parallelize this:

 

C#

 

static double ParallelLinqPi()
{
    double step = 1.0 / (double)NUM_STEPS;
    return ParallelEnumerable.Range(0, NUM_STEPS).Select(i =>
    {
        double x = (i + 0.5) * step;
        return 4.0 / (1.0 + x * x);
    }).Sum() * step;
}

 

 

This parallel implementation does scale nicely as compared to the serial LINQ version. However, if you test the

serial LINQ version and compare its performance against the previously shown serial for loop version, you’ll find

that the serial LINQ version is significantly more expensive; this is largely due to all of the extra delegate

invocations involved in its execution. We can create a hybrid solution that utilizes PLINQ to create partitions and

sum partial results but creates the individual partial results on each partition using a for loop:

 

C#

 

static double ParallelPartitionLinqPi()
{
    double step = 1.0 / (double)NUM_STEPS;
    return Partitioner.Create(0, NUM_STEPS).AsParallel().Select(range =>
    {
        double partial = 0.0;
        for (int i = range.Item1; i < range.Item2; i++)
        {
            double x = (i + 0.5) * step;
            partial += 4.0 / (1.0 + x * x);
        }
        return partial;
    }).Sum() * step;
}

 

A G G R E G A T E

 

Both LINQ and PLINQ may be used for arbitrary aggregations using the Aggregate method. Aggregate has several

overloads, including several unique to PLINQ that provide more support for parallelization. PLINQ assumes that the

aggregation delegates are both associative and commutative; this limits the kinds of operations that may be

performed, but also allows PLINQ to optimize its operation in ways that wouldn’t otherwise be possible if it

couldn’t make these assumptions.

 

The most advanced PLINQ overload of Aggregate is very similar in nature and purpose to the Parallel.ForEach

overload that supports localInit and localFinally delegates:

 

C#

 

public static TResult Aggregate<TSource, TAccumulate, TResult>(
    this ParallelQuery<TSource> source,
    Func<TAccumulate> seedFactory,
    Func<TAccumulate, TSource, TAccumulate> updateAccumulatorFunc,
    Func<TAccumulate, TAccumulate, TAccumulate> combineAccumulatorsFunc,
    Func<TAccumulate, TResult> resultSelector);

 

The seedFactory delegate is the logical equivalent of localInit, executed once per partition to provide a seed for

the aggregation accumulator on that partition. The updateAccumulatorFunc is akin to the body delegate, provided

with the current value of the accumulator and the current element, and returning the updated accumulator value

based on incorporating the current element. The combineAccumulatorsFunc is logically equivalent to the

localFinally delegate, combining the results from multiple partitions (unlike localFinally, which is given the current

task’s final value and may do with it what it chooses, this delegate accepts two accumulator values and returns the

aggregation of the two). And finally, the resultSelector takes the total accumulation and processes it into a result

value. In many scenarios, TAccumulate will be TResult, and this resultSelector will simply return its input.
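As a minimal illustration of how the four delegates fit together (this trivial sum is just a sketch over an assumed range of integers; PLINQ’s built-in Sum would normally be used instead), each partition seeds a local total, accumulates into it, partition totals are then combined pairwise, and the final selector returns the result unchanged:

C#

int total = Enumerable.Range(0, 1000).AsParallel().Aggregate(
    () => 0,                      // seedFactory: one accumulator per partition
    (acc, item) => acc + item,    // updateAccumulatorFunc: fold in an element
    (acc1, acc2) => acc1 + acc2,  // combineAccumulatorsFunc: merge partitions
    acc => acc);                  // resultSelector: return the total as-is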

 

As a concrete case for where this aggregation operator is useful, consider a common pattern: the need to take the

best N elements output from a query. An example of this might be in a spell checker. Given an input word list,

compare the input text against each word in the dictionary and compute a distance metric between the two. We

then want to select out the best results to be displayed to the user as options. One approach to implementing this

with PLINQ would be as follows:

 

C#

 

var bestResults = dictionaryWordList
    .Select(word => new { Word = word, Distance = GetDistance(word, text) })
    .TakeTop(p => -p.Distance, NUM_RESULTS_TO_RETURN)
    .Select(p => p.Word)
    .ToList();

 

 

In the previous example, TakeTop is implemented as:

 

C#

 

public static IEnumerable<TSource> TakeTop<TSource, TKey>(
    this ParallelQuery<TSource> source,
    Func<TSource, TKey> keySelector,
    int count)
{
    return source.OrderBy(keySelector).Take(count);
}

 

The concept of “take the top N” here is implemented by first sorting all of the results using OrderBy and then taking

the first N results. This may be overly expensive, however. For a large word list of several hundred thousand

words, we’re forced to sort the entire result set, and sorting has relatively high computational complexity. If we’re

only selecting out a handful of results, we can do better. For example, in a sequential implementation we could

simply walk the result set, keeping track of the top N along the way. We can implement this in parallel by walking

each partition in a similar manner, keeping track of the best N from each partition. An example implementation of

this approach is included in the Parallel Extensions samples at http://code.msdn.microsoft.com/ParExtSamples,

and the relevant portion is shown here:

 

C#

 

public static IEnumerable<TSource> TakeTop<TSource, TKey>(
    this ParallelQuery<TSource> source,
    Func<TSource, TKey> keySelector,
    int count)
{
    return source.Aggregate(
        // seedFactory
        () => new SortedTopN<TKey, TSource>(count),

        // updateAccumulatorFunc
        (accum, item) =>
        {
            accum.Add(keySelector(item), item);
            return accum;
        },

        // combineAccumulatorsFunc
        (accum1, accum2) =>
        {
            foreach (var item in accum2) accum1.Add(item);
            return accum1;
        },

        // resultSelector
        (accum) => accum.Values);
}

 

 

The seedFactory delegate, called once for each partition, generates a new data structure to keep track of the top

count items added to it. Up until count items, all items added to the collection get stored. Beyond that, every time

a new item is added, it’s compared against the least item currently stored, and if it’s greater than it, the least item

is bumped out and the new item is stored in its place. The updateAccumulatorFunc simply adds the current item

to the data structure accumulator (according to the rules of only maintaining the top N). The

combineAccumulatorsFunc combines two of these data structures by adding all of the elements from one to the

other and then returning that end result. And the resultSelector simply returns the set of values from the ultimate

resulting accumulator.

 


 

 

M A P R E D U C E

The “MapReduce” pattern was introduced to handle large-scale computations across a cluster of servers, often

involving massive amounts of data. The pattern is relevant even for a single multi-core machine, however. Here is a

description of the pattern’s core algorithm:

 

“The computation takes a set of input key/value pairs, and produces a set of output key/value pairs. The

 

user of the MapReduce library expresses the computation as two functions: Map and Reduce.

 

Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The

MapReduce library groups together all intermediate values associated with the same intermediate key I

and passes them to the Reduce function.

 

The Reduce function, also written by the user, accepts an intermediate key I and a set of values for that

key. It merges together these values to form a possibly smaller set of values. Typically just zero or one

output value is produced per Reduce invocation. The intermediate values are supplied to the user’s

Reduce function via an iterator.”

 

Dean, J. and Ghemawat, S. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM

51, 1 (Jan. 2008), 107-113. DOI= http://doi.acm.org/10.1145/1327452.1327492

 

I M P L E M E N T I N G M A P R E D U C E W I T H P L I N Q

 

 

The core MapReduce pattern (and many variations on it) is easily implemented with LINQ, and thus with PLINQ. To

see how, we’ll break apart the description of the problem as shown previously.

 

The description of the Map function is that it takes a single input value and returns a set of mapped values: this is

the purpose of LINQ’s SelectMany operator, which is defined as follows:

 

C#

 

public static IEnumerable<TResult> SelectMany<TSource, TResult>(

this IEnumerable<TSource> source,

Func<TSource, IEnumerable<TResult>> selector);

 

Moving on, the MapReduce problem description highlights that results are then grouped according to an

intermediate key. That grouping operation is the purpose of the LINQ GroupBy operator:

 

C#

 

public static IEnumerable<IGrouping<TKey, TSource>> GroupBy<TSource, TKey>(

this IEnumerable<TSource> source,

Func<TSource, TKey> keySelector);

 

Finally, a reduction is performed by a function that takes each intermediate key and a set of values for that key,

and produces any number of outputs per key. Again, that’s the purpose of SelectMany.

 

We can put all of this together to implement MapReduce in LINQ:

 

C#

 

public static IEnumerable<TResult> MapReduce<TSource, TMapped, TKey, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, IEnumerable<TMapped>> map,
    Func<TMapped, TKey> keySelector,
    Func<IGrouping<TKey, TMapped>, IEnumerable<TResult>> reduce)
{
    return source.SelectMany(map)
                 .GroupBy(keySelector)
                 .SelectMany(reduce);
}

 

Parallelizing this new combined operator with PLINQ is as simple as changing the input and output types to work

with PLINQ’s ParallelQuery<> type instead of with LINQ’s IEnumerable<>:

 

C#

 

public static ParallelQuery<TResult> MapReduce<TSource, TMapped, TKey, TResult>(
    this ParallelQuery<TSource> source,
    Func<TSource, IEnumerable<TMapped>> map,
    Func<TMapped, TKey> keySelector,
    Func<IGrouping<TKey, TMapped>, IEnumerable<TResult>> reduce)
{
    return source.SelectMany(map)
                 .GroupBy(keySelector)
                 .SelectMany(reduce);
}

 

U S I N G M A P R E D U C E

 

 

The typical example used to demonstrate a MapReduce implementation is a word counting routine, where a

bunch of documents are parsed, and the frequency of all of the words across all of the documents is summarized.

For this example, the map function takes in an input document and outputs all of the words in that document. The

grouping phase groups all of the identical words together, such that the reduce phase can then count the words in

each group and output a word/count pair for each grouping:

 

C#

 

var files = Directory.EnumerateFiles(dirPath, "*.txt").AsParallel();

var counts = files.MapReduce(
    path => File.ReadLines(path).SelectMany(line => line.Split(delimiters)),
    word => word,
    group => new[] { new KeyValuePair<string, int>(group.Key, group.Count()) });

 

The tokenization here is done in a naïve fashion using the String.Split function, which accepts the list of characters

to use as delimiters. For this example, that list was generated using another LINQ query that generates an array of

all of the ASCII white space and punctuation characters:

 

C#

 

static char[] delimiters =

Enumerable.Range(0, 256).Select(i => (char)i)

.Where(c => Char.IsWhiteSpace(c) || Char.IsPunctuation(c))

.ToArray();

 


 

 

D E P E N D E N C I E S

A dependency is the Achilles heel of parallelism. A dependency between two operations implies that one operation

can’t run until the other operation has completed, inhibiting parallelism. Many real-world problems have implicit

dependencies, and thus it’s important to be able to accommodate them and extract as much parallelism as is

possible. With the producer/consumer pattern, we’ve already explored one key solution to specific kinds of

dependencies. Here we’ll examine others.

 

D I R E C T E D A C Y C L I C G R A P H S

 

 

It’s very common in real-world problems to see patterns where dependencies between components form a

directed acyclic graph (DAG). As an example of this, consider compiling a solution of eight code projects. Some

projects have references to other projects, and thus depend on those projects being built first. The dependencies

are as follows:

 

• Components 1, 2, and 3 depend on nothing else in the solution.
• Component 4 depends on 1.
• Component 5 depends on 1, 2, and 3.
• Component 6 depends on 3 and 4.
• Component 7 depends on 5 and 6 and has no dependencies on it.
• Component 8 depends on 5 and has no dependencies on it.

 

This set of dependencies forms the following DAG (as rendered by the new Architecture tools in Visual Studio

2010):

 

 

If building each component is represented as a Task, we can take advantage of continuations to express as much

parallelism as is possible:

 

C#

 

var f = Task.Factory;
var build1 = f.StartNew(() => Build(project1));
var build2 = f.StartNew(() => Build(project2));
var build3 = f.StartNew(() => Build(project3));
var build4 = f.ContinueWhenAll(new[] { build1 },
    _ => Build(project4));
var build5 = f.ContinueWhenAll(new[] { build1, build2, build3 },
    _ => Build(project5));
var build6 = f.ContinueWhenAll(new[] { build3, build4 },
    _ => Build(project6));
var build7 = f.ContinueWhenAll(new[] { build5, build6 },
    _ => Build(project7));
var build8 = f.ContinueWhenAll(new[] { build5 },
    _ => Build(project8));
Task.WaitAll(build1, build2, build3, build4, build5, build6, build7, build8);

 

With this code, we immediately queue up work items to build the first three projects. As those projects complete,

projects with dependencies on them will be queued to build as soon as all of their dependencies are satisfied.

 

ITERATING IN LOCKSTEP

 

 

A common pattern in many algorithms is to have a series of operations that need to be done, from 0 to N, where

step i+1 can’t realistically be processed until step i has completed. This often occurs in image processing

algorithms, where processing one scan line of the image depends on the previous scan line having already been

processed. This also frequently occurs in analysis of a system over time, where each iteration represents another

step forward in time, and the world at iteration i+1 depends on the state of the world after iteration i.

 

An example of the latter is in simple modeling of the dissipation of heat across a metal plate, exemplified by the

following sequential code:

 

C#

 

float[,] SequentialSimulation(int plateSize, int timeSteps)
{
    // Initial plates for previous and current time steps, with
    // heat starting on one side.
    var prevIter = new float[plateSize, plateSize];
    var currIter = new float[plateSize, plateSize];
    for (int y = 0; y < plateSize; y++) prevIter[y, 0] = 255.0f;

    // Run simulation
    for (int step = 0; step < timeSteps; step++)
    {
        for (int y = 1; y < plateSize - 1; y++)
        {
            for (int x = 1; x < plateSize - 1; x++)
            {
                currIter[y, x] =
                    ((prevIter[y, x - 1] +
                      prevIter[y, x + 1] +
                      prevIter[y - 1, x] +
                      prevIter[y + 1, x]) * 0.25f);
            }
        }
        Swap(ref prevIter, ref currIter);
    }

    // Return results
    return prevIter;
}

private static void Swap<T>(ref T one, ref T two)
{
    T tmp = one; one = two; two = tmp;
}

 

On close examination, you'll see that this can actually be expressed as a DAG, since the cell [y,x] for time step i+1 can be computed as soon as the cells [y,x-1], [y,x+1], [y-1,x], and [y+1,x] from time step i are completed. However, attempting this kind of parallelization can lead to significant complications. For one, the amount of computation required per cell is very small, just a few array accesses, additions, and multiplications; creating a new Task for such an operation is comparatively a lot of overhead. Another significant complication is around memory management. In the serial scheme shown, we only need to maintain two plate arrays, one storing the previous iteration and one storing the current. Once we start expressing the problem as a DAG, we run into issues of potentially needing plates (or at least portions of plates) for many generations.

An easier solution is simply to parallelize one or more of the inner loops, but not the outer loop. In effect, we can parallelize each step of the simulation, just not all time steps of the simulation concurrently:

 

C#

 

// Run simulation
for (int step = 0; step < timeSteps; step++)
{
    Parallel.For(1, plateSize - 1, y =>
    {
        for (int x = 1; x < plateSize - 1; x++)
        {
            currIter[y, x] =
                ((prevIter[y, x - 1] +
                  prevIter[y, x + 1] +
                  prevIter[y - 1, x] +
                  prevIter[y + 1, x]) * 0.25f);
        }
    });
    Swap(ref prevIter, ref currIter);
}

 

Typically, this approach will be sufficient. For some kinds of problems, however, it can be more efficient (largely for

reasons of cache locality) to ensure that the same thread processes the same sections of iteration space on each

time step. We can accomplish that by using Tasks directly, rather than by using Parallel.For. For this heated plate

example, we spin up one Task per processor and assign each a portion of the plate’s size; each Task is responsible

for processing that portion at each time step. Now, we need some way of ensuring that each Task does not go on

to process its portion of the plate at iteration i+1 until all tasks have completed processing iteration i. For that

purpose, we can use the System.Threading.Barrier class that’s new to the .NET Framework 4:

 

C#

 

// Run simulation
int numTasks = Environment.ProcessorCount;
var tasks = new Task[numTasks];
var stepBarrier = new Barrier(numTasks, _ => Swap(ref prevIter, ref currIter));

int chunkSize = (plateSize - 2) / numTasks;
for (int i = 0; i < numTasks; i++)
{
    int yStart = 1 + (chunkSize * i);
    int yEnd = (i == numTasks - 1) ? plateSize - 1 : yStart + chunkSize;
    tasks[i] = Task.Factory.StartNew(() =>
    {
        for (int step = 0; step < timeSteps; step++)
        {
            for (int y = yStart; y < yEnd; y++)
            {
                for (int x = 1; x < plateSize - 1; x++)
                {
                    currIter[y, x] =
                        ((prevIter[y, x - 1] +
                          prevIter[y, x + 1] +
                          prevIter[y - 1, x] +
                          prevIter[y + 1, x]) * 0.25f);
                }
            }
            stepBarrier.SignalAndWait();
        }
    });
}
Task.WaitAll(tasks);

 

Each Task calls the Barrier's SignalAndWait method at the end of each time step, and the Barrier ensures that no tasks progress beyond this point in a given iteration until all tasks have reached this point for that iteration. Further, because we need to swap the previous and current plates at the end of every time step, we register that swap code with the Barrier as a post-phase action delegate; the Barrier will run that code on one thread once all Tasks have reached the Barrier in a given iteration and before it releases any Tasks to the next iteration.

 

DYNAMIC PROGRAMMING

 

 

Not to be confused with dynamic languages or with Visual Basic's and C#'s support for dynamic invocation, "dynamic programming" in computer science is a classification for optimization algorithms that break down problems recursively into smaller problems, caching (or "memoizing") the results of those subproblems for future use, rather than recomputing them every time they're needed. Common dynamic programming problems include longest common subsequence, matrix-chain multiplication, string edit distance, and sequence alignment. Dynamic programming problems are ripe with dependencies, but these dependencies can be bested and typically don't prevent parallelization.

 

To demonstrate parallelization of a dynamic programming program, consider a simple implementation of the

Levenshtein edit distance algorithm:

 

C#

 

static int EditDistance(string s1, string s2)
{
    int[,] dist = new int[s1.Length + 1, s2.Length + 1];
    for (int i = 0; i <= s1.Length; i++) dist[i, 0] = i;
    for (int j = 0; j <= s2.Length; j++) dist[0, j] = j;

    for (int i = 1; i <= s1.Length; i++)
    {
        for (int j = 1; j <= s2.Length; j++)
        {
            dist[i, j] = (s1[i - 1] == s2[j - 1]) ?
                dist[i - 1, j - 1] :
                1 + Math.Min(dist[i - 1, j],
                    Math.Min(dist[i, j - 1],
                             dist[i - 1, j - 1]));
        }
    }
    return dist[s1.Length, s2.Length];
}

 

This algorithm builds up a distance matrix, where the [i,j] entry represents the number of operations it would take to transform the first i characters of string s1 into the first j characters of s2; an operation is defined as a single character substitution, insertion, or deletion. To see how this works in action, consider computing the distance between two strings, going from "PARALLEL" to "STEPHEN". We start by initializing the first row to the values 0 through 8; these represent deletions (going from "P" to "" requires 1 deletion, going from "PA" to "" requires 2 deletions, going from "PAR" to "" requires 3 deletions, and so on). We also initialize the first column to the values 0 through 7; these represent additions (going from "" to "STEP" requires 4 additions, going from "" to "STEPHEN" requires 7 additions, and so on).

 

        P   A   R   A   L   L   E   L
    0   1   2   3   4   5   6   7   8
S   1
T   2
E   3
P   4
H   5
E   6
N   7

 

Now starting from cell [1,1] we walk down each column, calculating each cell's value in order. Let's call the two strings s1 and s2. A cell's value is based on two potential options:

1. The two characters corresponding with that cell are the same. The value for this cell is the same as the value for the diagonally previous cell, which represents comparing each of the two strings without the current letter (for example, if we already know the value for comparing "STEPH" and "PARALL", the value for "STEPHE" and "PARALLE" is the same, as we added the same letter to the end of both strings, and thus the distance doesn't change).

2. The two characters corresponding with that cell are different. The value for this cell is the minimum of three potential operations: a deletion, a substitution, or an insertion. These are represented by adding 1 to the value retrieved from the cells immediately above, diagonally to the upper-left, and to the left.

As an exercise, try filling in the table. The completed table for "PARALLEL" and "STEPHEN" is as follows:

 


 

 

        P   A   R   A   L   L   E   L
    0   1   2   3   4   5   6   7   8
S   1   1   2   3   4   5   6   7   8
T   2   2   2   3   4   5   6   7   8
E   3   3   3   3   4   5   6   6   7
P   4   3   4   4   4   5   6   7   7
H   5   4   4   5   5   5   6   7   8
E   6   5   5   5   6   6   6   6   7
N   7   6   6   6   6   7   7   7   7

 

As you filled it in, you should have noticed that the numbers were filled in almost as if a wavefront were moving through the table, since a cell [i,j] can be filled in as soon as the three cells [i-1,j-1], [i-1,j], and [i,j-1] are completed (and in fact, the completion of the cell above and to the left implies that the diagonal cell was also completed). From a parallel perspective, this should sound familiar, harkening back to our discussion of DAGs. We could, in fact, parallelize this problem using one Task per cell and multi-task continuations, but as with previous examples on dependencies, there's very little work being done per cell, and the overhead of creating a task for each cell would significantly outweigh the value of doing so.

 

You'll notice, however, that there are macro versions of these micro problems: take any rectangular subset of the cells in the grid, and that rectangular subset can be completed when the rectangular block above it and to its left have completed. This presents a solution: we can block the entire matrix up into rectangular regions, run the algorithm over each block, and use continuations for dependencies between blocks. This amortizes the cost of the parallelization with tasks across all of the cells in each block, making a Task worthwhile as long as the block is big enough.

Since the macro problem is the same as the micro, we can write one routine to work with this general pattern, dubbed the "wavefront" pattern; we can then write a small routine on top of it to deal with blocking as needed.

Here's an implementation based on Tasks and continuations:

 

C#

 

static void Wavefront(
    int numRows, int numColumns, Action<int, int> processRowColumnCell)
{
    // ... Would validate arguments here

    // Store the previous row of tasks as well as the previous task
    // in the current row.
    Task[] prevTaskRow = new Task[numColumns];
    Task prevTaskInCurrentRow = null;
    var dependencies = new Task[2];

    // Create a task for each cell.
    for (int row = 0; row < numRows; row++)
    {
        prevTaskInCurrentRow = null;
        for (int column = 0; column < numColumns; column++)
        {
            // In-scope locals for being captured in the task closures.
            int j = row, i = column;

            // Create a task with the appropriate dependencies.
            Task curTask;
            if (row == 0 && column == 0)
            {
                // Upper-left task kicks everything off, having no dependencies.
                curTask = Task.Factory.StartNew(() =>
                    processRowColumnCell(j, i));
            }
            else if (row == 0 || column == 0)
            {
                // Tasks in the left-most column depend only on the task
                // above them, and tasks in the top row depend only on
                // the task to their left.
                var antecedent = column == 0 ?
                    prevTaskRow[0] : prevTaskInCurrentRow;
                curTask = antecedent.ContinueWith(p =>
                {
                    p.Wait(); // Necessary only to propagate exceptions.
                    processRowColumnCell(j, i);
                });
            }
            else // row > 0 && column > 0
            {
                // All other tasks depend on both the tasks above
                // and to the left.
                dependencies[0] = prevTaskInCurrentRow;
                dependencies[1] = prevTaskRow[column];
                curTask = Task.Factory.ContinueWhenAll(dependencies, ps =>
                {
                    Task.WaitAll(ps); // Necessary to propagate exceptions.
                    processRowColumnCell(j, i);
                });
            }

            // Keep track of the task just created for future iterations.
            prevTaskRow[column] = prevTaskInCurrentRow = curTask;
        }
    }

    // Wait for the last task to be done.
    prevTaskInCurrentRow.Wait();
}

 

 

While a non-trivial amount of code, it's actually quite straightforward. We maintain an array of Tasks representing the previous row, and a Task representing the previous Task in the current row. We start by launching a Task to process the initial Task in the [0,0] slot, since it has no dependencies. We then walk each cell in each row, creating a continuation Task for each cell. In the first row or the first column, there is just one dependency, the previous cell in that row or the previous cell in that column, respectively. For all other cells, the continuation is based on the previous cell in both the current row and the current column. At the end, we just wait for the last Task to complete.

 

With that code in place, we now need to support blocks, and we can layer another Wavefront function on top to

support that:

 

C#

 

static void Wavefront(
    int numRows, int numColumns,
    int numBlocksPerRow, int numBlocksPerColumn,
    Action<int, int, int, int> processBlock)
{
    // ... Would validate arguments here

    // Compute the size of each block.
    int rowBlockSize = numRows / numBlocksPerRow;
    int columnBlockSize = numColumns / numBlocksPerColumn;

    Wavefront(numBlocksPerRow, numBlocksPerColumn, (row, column) =>
    {
        int start_i = row * rowBlockSize;
        int end_i = row < numBlocksPerRow - 1 ?
            start_i + rowBlockSize : numRows;

        int start_j = column * columnBlockSize;
        int end_j = column < numBlocksPerColumn - 1 ?
            start_j + columnBlockSize : numColumns;

        processBlock(start_i, end_i, start_j, end_j);
    });
}

 

 

This code is much simpler. The function accepts the number of rows and number of columns, but also the number

of blocks to use. The delegate now accepts four values, the starting and ending position of the block for both row

and column. The function validates parameters, and then computes the size of each block. From there, it delegates

to the Wavefront overload we previously implemented. Inside the delegate, it uses the provided row and column

number along with the block size to compute the starting and ending row and column positions, and then passes

those values down to the user-supplied delegate.

 

With this Wavefront pattern implementation in place, we can now parallelize our EditDistance function with very

little additional code:

 

C#

 

static int ParallelEditDistance(string s1, string s2)
{
    int[,] dist = new int[s1.Length + 1, s2.Length + 1];
    for (int i = 0; i <= s1.Length; i++) dist[i, 0] = i;
    for (int j = 0; j <= s2.Length; j++) dist[0, j] = j;

    int numBlocks = Environment.ProcessorCount * 2;

    Wavefront(s1.Length, s2.Length, numBlocks, numBlocks,
        (start_i, end_i, start_j, end_j) =>
    {
        for (int i = start_i + 1; i <= end_i; i++)
        {
            for (int j = start_j + 1; j <= end_j; j++)
            {
                dist[i, j] = (s1[i - 1] == s2[j - 1]) ?
                    dist[i - 1, j - 1] :
                    1 + Math.Min(dist[i - 1, j],
                        Math.Min(dist[i, j - 1],
                                 dist[i - 1, j - 1]));
            }
        }
    });

    return dist[s1.Length, s2.Length];
}

 

 

For small strings, the parallelization overheads will outweigh any benefits. But for large strings, this parallelization

approach can yield significant benefits.

 

FOLD AND SCAN

 

 

Sometimes a dependency is so significant, there is seemingly no way around it. One such example of this is a “fold”

operation. A fold is typically of the following form:

 

C#

 

b[0] = a[0];
for (int i = 1; i < N; i++)
{
    b[i] = f(b[i - 1], a[i]);
}

 

As an example, if the function f is addition and the input array is 1,2,3,4,5, the result of the fold will be 1,3,6,10,15.

Each iteration of the fold operation is entirely dependent on the previous iteration, leaving little room for

parallelism. However, as with aggregations, we can make an accommodation: if we guarantee that the f function is

associative, that enables enough wiggle room to introduce some parallelism (many operations are associative,

including the addition operation used as an example). With this restriction on the operation, it’s typically called a

“scan,” or sometimes “prefix scan.”
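To make that concrete, here's the fold written out with addition over the array { 1, 2, 3, 4, 5 }, producing the 1, 3, 6, 10, 15 sequence mentioned above:

C#

int[] a = { 1, 2, 3, 4, 5 };
int[] b = new int[a.Length];
b[0] = a[0];
for (int i = 1; i < a.Length; i++)
{
    b[i] = b[i - 1] + a[i]; // here, f is addition
}
// b now contains { 1, 3, 6, 10, 15 }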

 

There are several ways a scan may be parallelized. An approach we'll show here is based on blocking. Consider wanting to parallel scan the input sequence of the numbers 1 through 20 using the addition operator on a quad-core machine. We can split the input into four blocks, and then in parallel, scan each block individually. Once that step has completed, we can pick out the top element from each block, and do a sequential, exclusive scan on just those four entries; in an exclusive scan, element b[i] is what element b[i-1] would have been in a regular (inclusive) scan, with b[0] initialized to 0. The result of this exclusive scan is that, for each block, we now have the accumulated value for the entry just before the block, and thus we can fold that value in to each element in the block. For that latter fold, again each block may be processed in parallel.

 


 

 

[Figure: the numbers 1 through 20 are logically partitioned into four blocks; each block is scanned in parallel (1 3 6 10 15, 6 13 21 30 40, 11 23 36 50 65, 16 33 51 70 90); the upper entries 15, 40, 65, 90 are gathered and exclusively scanned to 0, 15, 55, 120; each of those values is then scanned into its block in parallel, yielding the final result 1 3 6 10 15 21 28 36 45 55 66 78 91 105 120 136 153 171 190 210.]

Here is an implementation of this algorithm. As with the heated plate example shown previously, we’re using one

Task per block with a Barrier to synchronize all tasks across the three stages:

 

1. Scan each block in parallel.

2. Do the exclusive scan of the upper value from each block serially.

3. Scan the exclusive scan results into the blocks in parallel.

One important thing to note about this parallelization is that it incurs significant overhead. In the sequential scan implementation, we're executing the combiner function f approximately N times, where N is the number of entries. In the parallel implementation, we're executing f approximately 2N times. As a result, while the operation may be parallelized, at least two cores are necessary just to break even (and, for example, four cores can yield at best roughly a 2x speedup over the sequential scan).

 

While there are several ways to enforce the serial nature of the second step, here we're utilizing the Barrier's post-phase action delegate (the complete implementation is available at http://code.msdn.microsoft.com/ParExtSamples):

 

C#

 

public static void InclusiveScanInPlaceParallel<T>(
    T[] arr, Func<T, T, T> function)
{
    int procCount = Environment.ProcessorCount;
    T[] intermediatePartials = new T[procCount];
    using (var phaseBarrier = new Barrier(procCount,
        _ => ExclusiveScanInPlaceSerial(
            intermediatePartials, function, 0, intermediatePartials.Length)))
    {
        // Compute the size of each range.
        int rangeSize = arr.Length / procCount, nextRangeStart = 0;

        // Create, store, and wait on all of the tasks.
        var tasks = new Task[procCount];
        for (int i = 0; i < procCount; i++, nextRangeStart += rangeSize)
        {
            // Get the range for each task, then start it.
            int rangeNum = i;
            int lowerRangeInclusive = nextRangeStart;
            int upperRangeExclusive = i < procCount - 1 ?
                nextRangeStart + rangeSize : arr.Length;
            tasks[rangeNum] = Task.Factory.StartNew(() =>
            {
                // Phase 1: Prefix scan assigned range.
                InclusiveScanInPlaceSerial(arr, function,
                    lowerRangeInclusive, upperRangeExclusive, 1);
                intermediatePartials[rangeNum] = arr[upperRangeExclusive - 1];

                // Phase 2: One thread should prefix scan intermediaries.
                phaseBarrier.SignalAndWait();

                // Phase 3: Incorporate partials.
                if (rangeNum != 0)
                {
                    for (int j = lowerRangeInclusive;
                         j < upperRangeExclusive;
                         j++)
                    {
                        arr[j] = function(
                            intermediatePartials[rangeNum], arr[j]);
                    }
                }
            });
        }
        Task.WaitAll(tasks);
    }
}
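The serial scan helpers referenced above (InclusiveScanInPlaceSerial and ExclusiveScanInPlaceSerial) aren't shown here; the complete versions ship with the samples at http://code.msdn.microsoft.com/ParExtSamples. As a rough sketch of what they might look like — the parameter meanings (treating the inclusive scan's final argument as a stride, and seeding the exclusive scan with default(T)) are assumptions inferred from the call sites above, not the samples' actual code:

C#

static void InclusiveScanInPlaceSerial<T>(
    T[] arr, Func<T, T, T> function,
    int lowerBoundInclusive, int upperBoundExclusive, int stride)
{
    // Each element becomes the combination of its predecessor and itself.
    for (int i = lowerBoundInclusive + stride; i < upperBoundExclusive; i += stride)
    {
        arr[i] = function(arr[i - stride], arr[i]);
    }
}

static void ExclusiveScanInPlaceSerial<T>(
    T[] arr, Func<T, T, T> function,
    int lowerBoundInclusive, int upperBoundExclusive)
{
    // Shift the running total right by one position, seeding the first slot
    // with default(T) (0 for the addition example), so that arr[i] ends up
    // holding the combination of everything before it.
    T total = arr[lowerBoundInclusive];
    arr[lowerBoundInclusive] = default(T);
    for (int i = lowerBoundInclusive + 1; i < upperBoundExclusive; i++)
    {
        T current = arr[i];
        arr[i] = total;
        total = function(total, current);
    }
}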

 

 

This demonstrates that parallelization may be achieved where dependencies would otherwise appear to be an obstacle that can't be mitigated.

 


 

 

DATA SETS OF UNKNOWN SIZE

Most of the examples described in this document thus far center around data sets of known sizes: input arrays, input lists, and so forth. In many real-world problems, however, the size of the data set to be processed isn't known in advance. This may be because the data is coming in from an external source and hasn't all arrived yet, or it may be because the data structure storing the data doesn't keep track of the size or doesn't store the data in a manner amenable to the size being relevant. Regardless of the reason, it's important to be able to parallelize such problems.

 

STREAMING DATA

 

 

Data feeds are becoming more and more important in all areas of computing. Whether it’s a feed of ticker data

from a stock exchange, a sequence of network packets arriving at a machine, or a series of mouse clicks being

entered by a user, such data can be an important input to parallel implementations.

 

Parallel.ForEach and PLINQ are the two constructs discussed thus far that work on data streams, in the form of

enumerables. Enumerables, however, are based on a pull-model, such that both Parallel.ForEach and PLINQ are

handed an enumerable from which they continually “move next” to get the next element. This is seemingly

contrary to the nature of streaming data, where it hasn’t all arrived yet, and comes in more of a “push” fashion

rather than “pull”. However, if we think of this pattern as a producer/consumer pattern, where the streaming data

is the producer and the Parallel.ForEach or PLINQ query is the consumer, a solution from the .NET Framework 4

becomes clear: we can use BlockingCollection. BlockingCollection’s GetConsumingEnumerable method provides

an enumerable that can be supplied to either Parallel.ForEach or PLINQ. ForEach and PLINQ will both pull data

from this enumerable, which will block the consumers until data is available to be processed. Conversely, as

streaming data arrives in, that data may be added to the collection so that it may be picked up by the consumers.

 

C#

 

private BlockingCollection<T> _streamingData = new BlockingCollection<T>();

// Parallel.ForEach
Parallel.ForEach(_streamingData.GetConsumingEnumerable(),
    item => Process(item));

// PLINQ
var q = from item in _streamingData.GetConsumingEnumerable().AsParallel()
        select item;
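The producer side of this pattern isn't shown above; a minimal sketch might look like the following, where OnDataArrived and OnStreamEnded are hypothetical callbacks invoked by whatever source is generating the streaming data:

C#

// Hypothetical producer callbacks. Arriving items are added to the collection;
// when the stream ends, the collection is marked complete for adding so that
// the consuming loop or query can drain the remaining items and finish.
void OnDataArrived(T item)
{
    _streamingData.Add(item);
}

void OnStreamEnded()
{
    _streamingData.CompleteAdding();
}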

 

There are several caveats to be aware of here, both for Parallel.ForEach and for PLINQ. Parallel.ForEach and PLINQ work on slightly different threading models in the .NET Framework 4. PLINQ uses a fixed number of threads to execute a query; by default, it uses the number of logical cores in the machine, or it uses the value passed to WithDegreeOfParallelism if one was specified. Conversely, Parallel.ForEach may use a variable number of threads, based on the ThreadPool's support for injecting and retiring threads over time to best accommodate current workloads. For Parallel.ForEach, this means that it's continually monitoring for new threads to be available to it, taking advantage of them when they arrive, and the ThreadPool is continually trying out injecting new threads into the pool and retiring threads from the pool to see whether more or fewer threads is beneficial. However, when passing the result of calling GetConsumingEnumerable as the data source to Parallel.ForEach, the threads used by the loop have the potential to block when the collection becomes empty. And a blocked thread may not be released by Parallel.ForEach back to the ThreadPool for retirement or other uses. As such, with the code as shown above, if there are any periods of time where the collection is empty, the thread count in the process may steadily grow; this can lead to problematic memory usage and other negative performance implications. To address this, when using Parallel.ForEach in a streaming scenario, it's best to place an explicit limit on the number of threads the loop may utilize: this can be done using the ParallelOptions type, and specifically its MaxDegreeOfParallelism property:

 

C#

 

var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(_streamingData.GetConsumingEnumerable(), options,
    item => Process(item));

 

By supplying the ParallelOptions value shown above, the loop is now limited to at most four threads, avoiding the potential for significant thread consumption. Even if the collection is empty for a long period of time, the loop can block only four threads at most.

 

PLINQ has a different set of caveats. It already uses a fixed number of threads, so thread injection isn't a concern. Rather, in the .NET Framework 4, PLINQ has an internally hardwired limit on the number of data elements in an input data source that are supported: 2^31, or 2,147,483,648. This means that PLINQ should only be used for streaming scenarios where fewer than this number of elements will be processed. In most scenarios, this limit should not be problematic. Consider a scenario where each element takes one millisecond to process. It would take at least 24 days at that rate of processing to exhaust this element space. If this limit does prove troublesome, however, in many cases there is a valid mitigation. The limit of 2^31 elements is per execution of a query, so a potential solution is to simply restart the query after a certain number of items has been fed into the query. Consider a query of the form:

 

C#

 

_streamingData.GetConsumingEnumerable().AsParallel()
    .OtherOperators()
    .ForAll(x => Process(x));

 

We need two things: a loop around the query so that when one query ends, we start it over again, and an operator that only yields the first N elements from the source, where N is chosen to be less than the 2^31 limit. LINQ already provides us with the latter, in the form of the Take operator. Thus, a workaround would be to rewrite the query as follows:

 

C#

 

while (true)
{
    _streamingData.GetConsumingEnumerable().Take(2000000000).AsParallel()
        .OtherOperators()
        .ForAll(x => Process(x));
}

 

An additional caveat for PLINQ is that not all operators may be used in a streaming query, due to how those operators behave. For example, OrderBy performs a sort and releases items in sorted order. OrderBy has no way of knowing whether the items it has yet to consume from the source are less than the smallest item seen thus far, and thus it can't release any elements until it's seen all elements from the source. With an "infinite" source, as is the case with a streaming input, that will never happen.

 


 

 

PARALLEL WHILE NOT EMPTY

 

 

There’s a fairly common pattern that emerges when processing some data structures: the processing of an

 

element yields additional work to be processed. We can see this with the tree-walk example shown earlier in this

 

document: processing one node of the tree may yield additional work to be processed in the form of that node’s

 

children. Similarly in processing a graph data structure, processing a node may yield additional work to be

processed in the form of that node’s neighbors.

 

Several parallel frameworks include a construct focused on processing these kinds of workloads. No such construct is included in the .NET Framework 4; however, it's straightforward to build one. There are a variety of ways such a solution may be coded. Here's one:

 

C#

 

public static void ParallelWhileNotEmpty<T>(
    IEnumerable<T> initialValues, Action<T, Action<T>> body)
{
    var from = new ConcurrentQueue<T>(initialValues);
    var to = new ConcurrentQueue<T>();

    while (!from.IsEmpty)
    {
        Action<T> addMethod = v => to.Enqueue(v);
        Parallel.ForEach(from, v => body(v, addMethod));
        from = to;
        to = new ConcurrentQueue<T>();
    }
}

 

 

This solution is based on maintaining two lists of data: the data currently being processed (the "from" queue), and the data generated by the processing of the current data (the "to" queue). The initial values to be processed are stored into the first list. All those values are processed, and any new values they create are added to the second list. Then the second list is processed, and any new values that are produced go into a new list (or, alternatively, into the first list after it has been cleared). Then that list is processed, and so on. This continues until the next list to be processed has no values available.

 

With this in place, we can rewrite our tree walk implementation shown previously:

 

C#

 

static void Walk<T>(Tree<T> root, Action<T> action)
{
    if (root == null) return;
    ParallelWhileNotEmpty(new[] { root }, (item, adder) =>
    {
        if (item.Left != null) adder(item.Left);
        if (item.Right != null) adder(item.Right);
        action(item.Data);
    });
}
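As a usage note, and assuming the same Tree<T> shape used above (Left, Right, and Data members), the parallel walk is invoked just as the sequential version was; for example, to print every node's data:

C#

// tree is an existing Tree<int>; the action receives each node's Data value.
Walk(tree, data => Console.WriteLine(data));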

 


 

 

ANTI-PATTERNS

 

 

BLOCKING DEPENDENCIES BETWEEN PARTITIONED CHUNKS

 

As mentioned, there are several ways ParallelWhileNotEmpty could be implemented. Another approach to

implementing ParallelWhileNotEmpty combines two parallel patterns we’ve previously seen in this document:

counting up and down, and streaming. Simplistically, we can use a Parallel.ForEach over a BlockingCollection’s

GetConsumingEnumerable, and allow the body of the ForEach to add more items into the BlockingCollection. The

only thing missing, then, is the ability to mark the collection as complete for adding, which we only want to do

after the last element has been processed (since the last element may result in more elements being added). To

accomplish that, we keep track of the number of elements remaining to complete processing; every time the

adder operation is invoked, we increase this count, and every time we complete the processing of an item we

decrease it. If the act of decreasing it causes it to reach 0, we’re done, and we can mark the collection as complete

for adding so that all threads involved in the ForEach will wake up.

 

C#

 

// WARNING: THIS METHOD HAS A BUG
static void ParallelWhileNotEmpty<T>(
    IEnumerable<T> source, Action<T, Action<T>> body)
{
    var queue = new ConcurrentQueue<T>(source);
    if (queue.IsEmpty) return;

    var remaining = new CountdownEvent(queue.Count);
    var bc = new BlockingCollection<T>(queue);
    Action<T> adder = item => {
        remaining.AddCount();
        bc.Add(item);
    };
    var options = new ParallelOptions {
        MaxDegreeOfParallelism = Environment.ProcessorCount
    };
    Parallel.ForEach(bc.GetConsumingEnumerable(), options, item =>
    {
        try { body(item, adder); }
        finally {
            if (remaining.Signal()) bc.CompleteAdding();
        }
    });
}

 

 

Unfortunately, this implementation has a devious bug in it, one that will likely result in deadlock close to the end of

its execution such that ParallelWhileNotEmpty will never return. The issue has to do with partitioning.

Parallel.ForEach uses multiple threads to process the supplied data source (in this case, the result of calling

bc.GetConsumingEnumerable), and as such the data from that source needs to be dispensed to those threads. By

default, Parallel.ForEach does this by having its threads take a lock, pull some number of elements from the

source, release the lock, and then process those items. This is a performance optimization for the general case,

where the number of trips back to the data source and the number of times the lock must be acquired and

released is minimized. However, it’s then also very important that the processing of elements not have

dependencies between them.

 


 

 

Consider a very simple example:

 

C#

 

var mres = new ManualResetEventSlim();
Parallel.ForEach(Enumerable.Range(0, 10), i =>
{
    if (i == 7) mres.Set();
    else mres.Wait();
});

 

 

Theoretically, this code could deadlock. All iterations have a dependency on iteration #7 executing, and yet the

same thread that executed one of the other iterations may be the one destined to execute #7. To see this, consider

a potential partitioning of the input data [0,10), where every thread grabs two elements at a time:

 

[Figure: the elements 0 through 9 partitioned into chunks of two per thread: {0,1}, {2,3}, {4,5}, {6,7}, {8,9}.]

Here, the same thread grabbed both elements 6 and 7. It then processes 6, which immediately blocks waiting for

an event that will only be set when 7 is processed, but 7 won’t ever be processed, because the thread that would

process it is blocked processing 6.

 

Back to our ParallelWhileNotEmpty example, a similar issue exists there but is less obvious. The last element to be

processed marks the BlockingCollection as complete for adding, which will cause any threads waiting on the

empty collection to wake up, aware that no more data will be coming. However, threads are pulling multiple data

elements from the source on each go around, and are not processing the elements from that chunk until the chunk

contains a certain number of elements. Thus, a thread may grab what turns out to be the last element, but then

continues to wait for more elements to arrive before processing it; however, only when that last element is

 

processed will the collection signal to all waiting threads that there won’t be any more data, and we have a

 

deadlock.

 

We can fix this by modifying the partitioning such that every thread only goes for one element at a time. That has

the downside of resulting in more overhead per element, since each element will result in a lock being taken, but it

has the serious upside of not resulting in deadlock. To control that, we can supply a custom partitioner that

provides this functionality. The parallel programming samples for the .NET Framework 4, available for download at

http://code.msdn.microsoft.com/ParExtSamples includes a ChunkPartitioner capable of yielding a single element

at a time. Taking advantage of that, we get the following fixed solution:

 

C#

 

static void ParallelWhileNotEmpty<T>(
    IEnumerable<T> source, Action<T, Action<T>> body)
{
    var queue = new ConcurrentQueue<T>(source);
    if (queue.IsEmpty) return;

    var remaining = new CountdownEvent(queue.Count);
    var bc = new BlockingCollection<T>(queue);
    Action<T> adder = t => {
        remaining.AddCount();
        bc.Add(t);
    };
    var options = new ParallelOptions {
        MaxDegreeOfParallelism = Environment.ProcessorCount
    };
    Parallel.ForEach(ChunkPartitioner.Create(bc.GetConsumingEnumerable(), 1),
        options, item =>
    {
        try { body(item, adder); }
        finally {
            if (remaining.Signal()) bc.CompleteAdding();
        }
    });
}

 


 

 

SPECULATIVE PROCESSING

Speculation is the pattern of doing something that may not be needed in case it actually is needed. This is increasingly relevant to parallel computing, where we can take advantage of multiple cores to do more things in advance of their actually being needed. Speculation trades off throughput for reduced latency, by utilizing resources to do more work in case that extra work could pay dividends.

 

THERE CAN BE ONLY ONE

 

 

There are many scenarios where multiple mechanisms may be used to compute a result, but how long each

mechanism will take can’t be predicted in advance. With serial computing, you’re forced to pick one and hope that

it’s the fastest. With parallel computing, we can theoretically run them all in parallel: once we have a winner, we

can stop running the rest of the operations.

 

We can encapsulate this functionality into a SpeculativeInvoke operation. SpeculativeInvoke will take a set of

functions to be executed, and will start executing them in parallel until at least one returns.

 

C#

 

public static T SpeculativeInvoke<T>(params Func<T>[] functions);

 

As mentioned earlier in the section on parallel loops, it’s possible to implement Invoke in terms of ForEach… we

can do the same here for SpeculativeInvoke:

 

C#

 

public static T SpeculativeInvoke<T>(params Func<T>[] functions)
{
    return SpeculativeForEach(functions, function => function());
}
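As a hypothetical usage example (the ComputeUsingAlgorithm* methods below are placeholders for any set of interchangeable functions), whichever function produces a result first wins:

C#

int result = SpeculativeInvoke(
    () => ComputeUsingAlgorithmA(),
    () => ComputeUsingAlgorithmB(),
    () => ComputeUsingAlgorithmC());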

 

Now all we need is a SpeculativeForEach.

 

SPECULATIVE FOREACH USING PARALLEL.FOREACH

 

With ForEach, the goal is to process every item. With SpeculativeForEach, the goal is to get just one result,

executing as many items as we can in parallel in order to get just one to return.

 

C#

 

public static TResult SpeculativeForEach<TSource, TResult>(
    IEnumerable<TSource> source,
    Func<TSource, TResult> body)
{
    object result = null;
    Parallel.ForEach(source, (item, loopState) =>
    {
        result = body(item);
        loopState.Stop();
    });
    return (TResult)result;
}

 


 

 

We take advantage of Parallel.ForEach’s support for breaking out of a loop early, using ParallelLoopState.Stop.

This tells the loop to try not to start any additional iterations. When we get a result from an iteration, we store it,

request that the loop stop as soon as possible, and when the loop is over, return the result. A

SpeculativeParallelFor could be implemented in a very similar manner.

 

Note that we store the result as an object, rather than as a TResult. This is to

accommodate value types. With multiple iterations executing in parallel, it’s possible

that multiple iterations may try to write out a result concurrently. With reference

types, this isn’t a problem, as the CLR ensures that all of the data in a reference is

written atomically. But with value types, we could potentially experience “torn

writes,” where portions of the results from multiple iterations get written, resulting in

an incorrect result.

 

As noted, when an iteration completes it does not terminate other currently running iterations, it only works to

prevent additional iterations from starting. If we want to update the implementation to also make it possible to

cancel currently running iterations, we can take advantage of the .NET Framework 4 CancellationToken type. The

idea is that we’ll pass a CancellationToken into all functions, and the functions themselves may monitor for

cancellation, breaking out early if cancellation was experienced.

 

C#

 

public static TResult SpeculativeForEach<TSource, TResult>(
    IEnumerable<TSource> source,
    Func<TSource, CancellationToken, TResult> body)
{
    var cts = new CancellationTokenSource();
    object result = null;
    Parallel.ForEach(source, (item, loopState) =>
    {
        try
        {
            result = body(item, cts.Token);
            loopState.Stop();
            cts.Cancel();
        }
        catch (OperationCanceledException) { }
    });
    return (TResult)result;
}

 

SPECULATIVE FOREACH USING PLINQ

 

We can also achieve this kind of speculative processing utilizing PLINQ. The goal of SpeculativeForEach is to select

the result of the first function to complete, an operation which maps very nicely to PLINQ’s Select and First

operators. We can thus re-implement SpeculativeForEach with very little PLINQ-based code:

 

C#

 

public static TResult SpeculativeForEach<TSource, TResult>(
    IEnumerable<TSource> source, Func<TSource, TResult> body)
{
    if (body == null) throw new ArgumentNullException("body");
    return source.AsParallel().Select(i => body(i)).First();
}

 

FOR THE FUTURE

 

 

The other large classification of speculative processing is around anticipation: an application can anticipate a need,

and do some computation based on that guess. Prefetching, common in hardware and operating systems, is an

example of this. Based on past experience and heuristics, the system anticipates that the program is going to need

a particular resource and thus preloads that resource so that it’s available by the time it’s needed. If the system

guessed correctly, the end result is improved perceived performance.

 

Task<TResult> in the .NET Framework 4 makes it very straightforward to implement this kind of logic. When the

system anticipates a particular computation’s result may be needed, it launches a Task<TResult> to compute the

result.

 

C#

 

var cts = new CancellationTokenSource();

Task<int> dataForTheFuture = Task.Factory.StartNew(
    () => ComputeSomeResult(), cts.Token);

 

If it turns out that result is not needed, the task may be canceled.

 

C#

 

// Cancel it and make sure we are made aware of any exceptions
// that occurred.
cts.Cancel();
dataForTheFuture.ContinueWith(t => LogException(dataForTheFuture),
    TaskContinuationOptions.OnlyOnFaulted);

 

If it turns out it is needed, its Result may be retrieved.

 

C#

 

// This will return the value immediately if the Task has already
// completed, or will wait for the result to be available if it's
// not yet completed.
int result = dataForTheFuture.Result;

 


 

 

LAZINESS

Programming is one of few professional areas where laziness is heralded. As we write software, we look for ways

to improve performance, or at least to improve perceived performance, and laziness helps in both of these

regards.

 

Lazy evaluation is all about delaying a computation such that it's not evaluated until it's needed. In doing so, we may actually get away with never evaluating it at all, since it may never be needed. And other times, we can make the cost of evaluating lots of computations "pay-for-play" by only doing those computations when they're needed and not before. (In a sense, this is the opposite of speculative computing, where we may start computations asynchronously as soon as we think they may be needed, in order to ensure the results are available if they're needed.)

 

Lazy evaluation is not something at all specific to parallel computing. LINQ is heavily based on a lazy evaluation model, where queries aren't executed until MoveNext is called on an enumerator for the query. Many types lazily load data, or lazily initialize properties. Where parallelization comes into play is in making it possible for multiple threads to access lazily-evaluated data in a thread-safe manner.

 

ENCAPSULATING LAZINESS

 

 

Consider the extremely common pattern for lazily-initializing some property on a type:

 

C#

 

public class MyLazy<T> where T : class
{
    private T _value;

    public T Value
    {
        get
        {
            if (_value == null) _value = Compute();
            return _value;
        }
    }

    private static T Compute() { /*...*/ }
}

 

 

Here, the _value field needs to be initialized to the result of some function Compute. _value could have been

initialized in the constructor of MyLazy<T>, but that would have forced the user to incur the cost of computing

_value, even if the Value property is never accessed. Instead, the Value property’s get accessor checks to see

whether _value has been initialized, and if it hasn’t, initializes it before returning _value. The initialization check

happens by comparing _value to null, hence the class restriction on T, since a struct may never be null.

 

Unfortunately, this pattern breaks down if the Value property may be accessed from multiple threads

concurrently. There are several common patterns for dealing with this predicament. The first is through locking:

 


 

 

C#

 

 

public class MyLazy<T> where T : class
{
    private object _syncObj = new object();
    private T _value;

    public T Value
    {
        get
        {
            lock (_syncObj)
            {
                if (_value == null) _value = Compute();
                return _value;
            }
        }
    }

    private static T Compute() { /*...*/ }
}

 

 

Now, the Value property is thread-safe, such that only one thread at a time will execute the body of the get

accessor. Unfortunately, we also now force every caller of Value to accept the cost of taking a lock, even if Value

has already previously been initialized. To work around that, there’s the classic double-checked locking pattern:

 

C#

 

public class MyLazy<T> where T : class
{
    private object _syncObj = new object();
    private volatile T _value;

    public T Value
    {
        get
        {
            if (_value == null)
            {
                lock (_syncObj)
                {
                    if (_value == null) _value = Compute();
                }
            }
            return _value;
        }
    }

    private static T Compute() { /*...*/ }
}

 

 

This is starting to get complicated, with much more code having to be written than was necessary for the initial

non-thread-safe version. Moreover, we haven’t factored in the complications of exception handling, supporting

value types in addition to reference types (and having to deal with potential “torn reads” and “torn writes”), cases

where null is a valid value, and more. To simplify this, all aspects of the pattern, including the synchronization to

ensure thread-safety, have been codified into the new .NET Framework 4 System.Lazy<T> type. We can re-write

the code using Lazy<T> as follows:

 


 

 

C#

 

 

public class MyLazy<T>
{
    private Lazy<T> _value = new Lazy<T>(Compute);
    public T Value { get { return _value.Value; } }
    private static T Compute() { /*...*/ }
}

 

Lazy<T> supports the most common form of thread-safe initialization through a simple-to-use interface. If more

control is needed, the static methods on System.Threading.LazyInitializer may be employed.

 

The double-checked locking pattern supported by Lazy<T> is also supported by LazyInitializer, but through a single

static method:

 

C#

 

public static T EnsureInitialized<T>(

ref T target, ref bool initialized,

ref object syncLock,

Func<T> valueFactory);

 

 

This overload allows the developer to specify the target reference to be initialized as well as a Boolean value that

signifies whether initialization has been completed. It also allows the developer to explicitly specify the monitor

object to be used for synchronization.

 

Being able to explicitly specify the synchronization object allows multiple

initialization routines and fields to be protected by the same lock.

 

We can use this method to re-implement our previous examples as follows:

 

C#

 

public class MyLazy<T> where T : class
{
    private object _syncObj = new object();
    private bool _initialized;
    private T _value;

    public T Value
    {
        get
        {
            return LazyInitializer.EnsureInitialized(
                ref _value, ref _initialized, ref _syncObj, Compute);
        }
    }

    private static T Compute() { /*...*/ }
}

 

 

This is not the only pattern supported by LazyInitializer, however. Another less-common thread-safe initialization

pattern is based on the principle that the initialization function is itself thread-safe, and thus it’s okay for it to be

executed concurrently with itself. Given that property, we no longer need to use a lock to ensure that only one

 


 

 

thread at a time executes the initialization function. However, we still need to maintain the invariant that the value

being initialized is only initialized once. As such, while the initialization function may be run multiple times

concurrently in the case of multiple threads racing to initialize the value, one and only one of the resulting values

must be published for all threads to see. If we were writing such code manually, it might look as follows:

 

C#

 

public class MyLazy<T> where T : class
{
    private volatile T _value;

    public T Value
    {
        get
        {
            if (_value == null)
            {
                T temp = Compute();
                Interlocked.CompareExchange(ref _value, temp, null);
            }
            return _value;
        }
    }

    private static T Compute() { /*...*/ }
}

 

 

LazyInitializer provides an overload to support this pattern as well:

 

C#

 

public static T EnsureInitialized<T>(

ref T target, Func<T> valueFactory) where T : class;

 

With this method, we can re-implement the same example as follows:

 

C#

 

public class MyLazy<T> where T : class
{
    private T _value;

    public T Value
    {
        get
        {
            return LazyInitializer.EnsureInitialized(ref _value, Compute);
        }
    }

    private static T Compute() { /*...*/ }
}

 

 

It’s worth noting that in these cases, if the Compute function returns null, _value will be set to null, which is

indistinguishable from Compute never having been run, and as a result the next time Value’s get accessor is

invoked, Compute will be executed again.

 


 

 

ASYNCHRONOUS LAZINESS

 

 

Another common pattern centers around a need to lazily-initialize data asynchronously and to receive notification

when the initialization has completed. This can be accomplished by marrying two types we’ve already seen:

Lazy<T> and Task<TResult>.

 

Consider an initialization routine:

 

C#

 

T Compute();

 

We can create a Lazy<T> to provide the result of this function:

 

C#

 

Lazy<T> data = new Lazy<T>(Compute);

 

However, now when we access data.Value, we’re blocked waiting for the Compute operation to complete.

Instead, for asynchronous lazy initialization, we’d like to delay the computation until we know we’ll need it, but

once we do we also don’t want to block waiting for it to complete. That latter portion should hint at using a

Task<TResult>:

 

C#

 

Task<T> data = Task<T>.Factory.StartNew(Compute);

 

Combining the two, we can use a Lazy<Task<T>> to get both the delayed behavior and the asynchronous behavior:

 

C#

 

var data = new Lazy<Task<T>>(() => Task<T>.Factory.StartNew(Compute));

 

Now when we access data.Value, we get back a Task<T> that represents the running of Compute. No matter how

many times we access data.Value, we’ll always get back the same Task, even if accessed from multiple threads

concurrently, thanks to support for the thread-safety patterns built into Lazy<T>. This means that only one

Task<T> will be launched for Compute. Moreover, we can now use this result as we would any other Task<T>,

including registering continuations with it (using ContinueWith) in order to be notified when the computation is

complete:

 

C#

 

data.Value.ContinueWith(t => UseResult(t.Result));

 

This approach can also be combined with multi-task continuations to lazily-initialize multiple items, and to only do

 

work with those items when they’ve all completed initialization:

 

C#

 

private Lazy<Task<T>> _data1 = new Lazy<Task<T>>(() =>
    Task<T>.Factory.StartNew(Compute1));
private Lazy<Task<T>> _data2 = new Lazy<Task<T>>(() =>
    Task<T>.Factory.StartNew(Compute2));
private Lazy<Task<T>> _data3 = new Lazy<Task<T>>(() =>
    Task<T>.Factory.StartNew(Compute3));

//...

Task.Factory.ContinueWhenAll(
    new[] { _data1.Value, _data2.Value, _data3.Value },
    tasks => UseResults(_data1.Value.Result, _data2.Value.Result,
        _data3.Value.Result));

 

Such laziness is also useful for certain patterns of caching, where we want to maintain a cache of these lazily-initialized values. Consider a non-thread-safe cache like the following:

 

C#

 

public class Cache<TKey, TValue>
{
    private readonly Func<TKey, TValue> _valueFactory;
    private readonly Dictionary<TKey, TValue> _map;

    public Cache(Func<TKey, TValue> valueFactory)
    {
        if (valueFactory == null) throw new ArgumentNullException("valueFactory");
        _valueFactory = valueFactory;
        _map = new Dictionary<TKey, TValue>();
    }

    public TValue GetValue(TKey key)
    {
        if (key == null) throw new ArgumentNullException("key");
        TValue val;
        if (!_map.TryGetValue(key, out val))
        {
            val = _valueFactory(key);
            _map.Add(key, val);
        }
        return val;
    }
}

 

 

The cache is initialized with a function that produces a value based on a key supplied to it. Whenever the value of a

key is requested from the cache, the cache returns the cached value for the key if one is available in the internal

dictionary, or it generates a new value using the cache’s _valueFactory function, stores that value for later, and

returns it.

 

We now want an asynchronous version of this cache. Just like with our asynchronous laziness functionality, we can

represent this as a Task<TValue> rather than simply as a TValue. Multiple threads will be accessing the cache

concurrently, so we want to use a ConcurrentDictionary<TKey,TValue> instead of a Dictionary<TKey,TValue>

(ConcurrentDictionary<> is a new map type available in the .NET Framework 4, supporting multiple readers and

writers concurrently without corrupting the data structure).

 

C#

 

public class AsyncCache<TKey, TValue>
{
    private readonly Func<TKey, Task<TValue>> _valueFactory;
    private readonly ConcurrentDictionary<TKey, Lazy<Task<TValue>>> _map;

    public AsyncCache(Func<TKey, Task<TValue>> valueFactory)
    {
        if (valueFactory == null) throw new ArgumentNullException("valueFactory");
        _valueFactory = valueFactory;
        _map = new ConcurrentDictionary<TKey, Lazy<Task<TValue>>>();
    }

    public Task<TValue> GetValue(TKey key)
    {
        if (key == null) throw new ArgumentNullException("key");
        return _map.GetOrAdd(key,
            k => new Lazy<Task<TValue>>(() => _valueFactory(k))).Value;
    }
}

 

The function now returns a Task<TValue> instead of just TValue, and the dictionary stores Lazy<Task<TValue>>

rather than just TValue. The latter is done so that if multiple threads request the value for the same key

concurrently, only one task for that value will be generated.
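As a hypothetical usage example, such a cache could memoize asynchronous downloads keyed by URL; WebClient is used here purely for illustration:

C#

var pageCache = new AsyncCache<string, string>(url =>
    Task<string>.Factory.StartNew(() =>
    {
        // The first request for a given URL launches the download; subsequent
        // requests for the same URL get back the same Task<string>.
        using (var client = new WebClient()) return client.DownloadString(url);
    }));

pageCache.GetValue("http://example.com/")
    .ContinueWith(t => Console.WriteLine(t.Result.Length));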

 

Note the GetOrAdd method on ConcurrentDictionary. This method was added in recognition of a very common

coding pattern with dictionaries, exemplified in the earlier synchronous cache example. It’s quite common to want

to check a dictionary for a value, returning that value if it could be found, otherwise creating a new value, adding

it, and returning it, as exemplified in the following example:

 

C#

 

public static TValue GetOrAdd<TKey, TValue>(
    this Dictionary<TKey, TValue> dictionary,
    TKey key, Func<TKey, TValue> valueFactory)
{
    TValue value;
    if (!dictionary.TryGetValue(key, out value))
    {
        value = valueFactory(key);
        dictionary.Add(key, value);
    }
    return value;
}

 

This pattern has been codified into ConcurrentDictionary in a thread-safe manner in the form of the GetOrAdd method. Similarly, another coding pattern that's quite common with dictionaries is around checking for an existing value in the dictionary, updating that value if it could be found or adding a new one if it couldn't.

 

C#

 

public static TValue AddOrUpdate<TKey, TValue>(
    this Dictionary<TKey, TValue> dictionary,
    TKey key,
    Func<TKey, TValue> addValueFactory,
    Func<TKey, TValue, TValue> updateValueFactory)
{
    TValue value;
    value = dictionary.TryGetValue(key, out value) ?
        updateValueFactory(key, value) : addValueFactory(key);
    dictionary[key] = value;
    return value;
}

 


 

 

This pattern has been codified into ConcurrentDictionary in a thread-safe manner in the form of the AddOrUpdate

method.
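For example, a thread-safe word count can be maintained with a single call per word:

C#

var wordCounts = new ConcurrentDictionary<string, int>();
// Adds the word with a count of 1 if it's not yet present; otherwise invokes
// the update delegate to compute the incremented count.
wordCounts.AddOrUpdate("parallel", 1, (word, count) => count + 1);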

 


 

 

SHARED STATE

Dealing with shared state is arguably the most difficult aspect of building parallel applications and is one of the

main sources of both correctness and performance problems. There are several ways of dealing with shared state,

including synchronization, immutability, and isolation. With synchronization, the shared state is protected by

mechanisms of mutual exclusion to ensure that the data remains consistent in the face of multiple threads

accessing and modifying it. With immutability, shared data is read-only, and without being modified, there’s no

danger in sharing it. With isolation, sharing is avoided, with threads utilizing their own isolated state that’s not

 

available to other threads.

 

ISOLATION & THREAD-LOCAL STATE

 

 

Thread-local state is a very common mechanism for supporting isolation, and there are several reasons why you

might want to use thread-local state. One is to pass information out-of-band between stack frames. For example,

the System.Transactions.TransactionScope class is used to register some ambient information for the current

thread, such that operations (for example, commands against a database) can automatically enlist in the ambient

transaction. Another use of thread-local state is to maintain a cache of data per thread rather than having to

synchronize on a shared data source. For example, if multiple threads need random numbers, each thread can

maintain its own Random instance, accessing it freely and without concern for another thread accessing it

concurrently; an alternative would be to share a single Random instance, locking on access to it.

 

Thread-local state is exposed in the .NET Framework 4 in three different ways. The first way, and the most

efficient, is through the ThreadStaticAttribute. By applying [ThreadStatic] to a static field of a type, that field

becomes a thread-local static, meaning that rather than having one field of storage per AppDomain (as you would

with a traditional static), there’s one field of storage per thread per AppDomain.

 

Hearkening back to our randomness example, you could imagine trying to initialize a ThreadStatic Random field as

follows:

 

C#

 

[ThreadStatic]
static Random _rand = new Random(); // WARNING: buggy

static int GetRandomNumber()
{
    return _rand.Next();
}

 

Unfortunately, this won’t work as expected. The C# and Visual Basic compilers extract initialization for

static/Shared members into a static/Shared constructor for the containing type, and a static constructor is only run

once. As such, this initialization code will only be executed for one thread in the system, leaving the rest of the

threads with _rand initialized to null. To account for this, we need to check prior to accessing _rand to ensure it’s

been initialized, invoking the initialization code on each access if it hasn’t been:

 

C#

 

[ThreadStatic]
static Random _rand;

static int GetRandomNumber()
{
    if (_rand == null) _rand = new Random();
    return _rand.Next();
}

 

Any thread may now call GetRandomNumber, and any number of threads may do so concurrently; each will end

up utilizing its own instance of Random. Another issue with this approach is that, unfortunately, [ThreadStatic]

may only be used with statics. Applying this attribute to an instance member is a no-op, leaving us in search of

another mechanism for supporting per-thread, per-instance state.

 

Since the original release of the .NET Framework, thread-local storage has been supported in a more general form

through the Thread.GetData and Thread.SetData static methods. The Thread.AllocateDataSlot and

Thread.AllocateNamedDataSlot static methods may be used to create a new LocalDataStoreSlot, representing a

single object of storage. The GetData and SetData methods can then be used to get and set that object for the

current thread. Re-implementing our previous Random example could be done as follows:

 

C#

 

static LocalDataStoreSlot _randSlot = Thread.AllocateDataSlot();

static int GetRandomNumber()
{
    Random rand = (Random)Thread.GetData(_randSlot);
    if (rand == null)
    {
        rand = new Random();
        Thread.SetData(_randSlot, rand);
    }
    return rand.Next();
}

 

However, since our thread-local storage is now represented as an object (LocalDataStoreSlot) rather than as a

static field, we can use this mechanism to achieve the desired per-thread, per-instance data:

 

C#

 

public class MyType
{
    private LocalDataStoreSlot _rand = Thread.AllocateDataSlot();

    public int GetRandomNumber()
    {
        Random r = (Random)Thread.GetData(_rand);
        if (r == null)
        {
            r = new Random();
            Thread.SetData(_rand, r);
        }
        return r.Next();
    }
}

 

 

While flexible, this approach also has downsides. First, Thread.GetData and Thread.SetData work with type Object

rather than with a generic type parameter. In the best case, the data being stored is a reference type, and we only

need to cast to retrieve data from a slot, knowing in advance what kind of data is stored in that slot. In the worst

case, the data being stored is a value type, forcing an object allocation every time the data is modified, as the value

 


 

 

type gets boxed when passed into the Thread.SetData method. Another issue is around performance. The

ThreadStaticAttribute approach has always been significantly faster than the Thread.GetData/SetData approach,

and while both mechanisms have been improved for the .NET Framework 4, the ThreadStaticAttribute approach is

still an order of magnitude faster. Finally, with Thread.GetData/SetData, the reference to the storage and the

capability for accessing that storage are separated out into individual APIs, rather than being exposed in a

convenient manner that combines them in an object-oriented manner.

 

To address these shortcomings, the .NET Framework 4 introduces a third thread-local storage mechanism:

ThreadLocal<T>. ThreadLocal<T> addresses the shortcomings outlined:

 

• ThreadLocal<T> is generic. Its Value property is typed as T, and the data is stored in a generic manner. This eliminates the need to cast when accessing the value, and it eliminates the boxing that would otherwise occur if T were a value type.

• The constructor for ThreadLocal<T> optionally accepts a Func<T> delegate. This delegate can be used to initialize the thread-local value on every accessing thread. This alleviates the need to explicitly check on every access to ThreadLocal<T>.Value whether it's been initialized yet.

• ThreadLocal<T> encapsulates both the data storage and the mechanism for accessing that storage. This simplifies the pattern of accessing the storage, as all that's required is to utilize the Value property.

• ThreadLocal<T>.Value is fast. ThreadLocal<T> has a sophisticated implementation based on ThreadStaticAttribute that makes the Value property more efficient than Thread.GetData/SetData.

 

ThreadLocal<T> is still not as fast as ThreadStaticAttribute, so if ThreadStaticAttribute fits your needs well and if

access to thread-local storage is a bottleneck on your fast path, it should still be your first choice. Additionally, a

single instance of ThreadLocal<T> consumes a few hundred bytes, so you need to consider how many of these you

want active at any one time.

 

Regardless of what mechanism for thread-local storage you use, if you need thread-local storage for several

successive operations, it’s best to work on a local copy so as to avoid accessing thread-local storage as much as

possible. For example, consider adding two vectors stored in thread-local storage:

 

C#

 

const int VECTOR_LENGTH = 1000000;
private ThreadLocal<int[]> _vector1 =
    new ThreadLocal<int[]>(() => new int[VECTOR_LENGTH]);
private ThreadLocal<int[]> _vector2 =
    new ThreadLocal<int[]>(() => new int[VECTOR_LENGTH]);

// …

private void DoWork()
{
    for (int i = 0; i < VECTOR_LENGTH; i++)
    {
        _vector2.Value[i] += _vector1.Value[i];
    }
}

 

 

While the cost of accessing ThreadLocal<T>.Value has been minimized as best as possible in the implementation, it

still has a non-negligible cost (the same is true for accessing ThreadStaticAttribute). As such, it’s much better to

rewrite this code as follows:

 


 

 

C#

 

 

private void DoWork()
{
    int[] vector1 = _vector1.Value;
    int[] vector2 = _vector2.Value;
    for (int i = 0; i < VECTOR_LENGTH; i++)
    {
        vector2[i] += vector1[i];
    }
    _vector2.Value = vector2;
}

 

Returning now to our previous example of using a thread-local Random, we can take advantage of ThreadLocal<T>

to implement this support in a much more concise manner:

 

C#

 

public class MyType
{
    private ThreadLocal<Random> _rand =
        new ThreadLocal<Random>(() => new Random());

    public int GetRandomNumber() { return _rand.Value.Next(); }
}

 

Earlier in this document, it was mentioned that the ConcurrentBag<T> data structure

maintains a list of instances of T per thread. This is achieved internally using

ThreadLocal<>.

 

SYNCHRONIZATION

 

 

In most explicitly-threaded parallel applications, no matter how much we try, we end up with some amount of

shared state. Accessing shared state from multiple threads concurrently requires either that the shared state be

immutable or that the application utilize synchronization to ensure the consistency of the data.

 

RELIABLE LOCK ACQUISITION

 

By far, the most prevalent pattern for synchronization in the .NET Framework is in usage of the lock keyword in C#

and the SyncLock keyword in Visual Basic. Compiling down to usage of Monitor under the covers, this pattern

manifests as follows:

 

C#

 

lock (someObject)
{
    // … critical region of code
}

 


 

 

This code ensures that the work inside the critical region is executed by at most one thread at a time. In C# 3.0 and

earlier and Visual Basic 9.0 and earlier, the above code was compiled down to approximately the equivalent of the

following:

 

C#

 

var lockObj = someObject;
Monitor.Enter(lockObj);
try
{
    // … critical region of code
}
finally
{
    Monitor.Exit(lockObj);
}

 

This code ensures that even in the case of exception, the lock is released when the critical region is done. Or at

least it’s meant to. A problem emerges due to asynchronous exceptions: external influences may cause exceptions

to occur on a block of code even if that exception is not explicitly stated in the code. In the extreme case, a thread

abort may be injected into a thread between any two instructions, though not within a finally block except in

extreme conditions. If such an abort occurred after the call to Monitor.Enter but prior to entering the try block,

the monitor would never be exited, and the lock would be “leaked.” To help guard against this, the just-in-time

(JIT) compiler ensures that, as long as the call to Monitor.Enter is the instruction immediately before the try block,

no asynchronous exception will be able to sneak in between the two. Unfortunately, it’s not always the case that

these instructions are immediate neighbors. For example, in debug builds, the compiler uses nop instructions to

support setting breakpoints in places that breakpoints would not otherwise be feasible. Worse, it’s often the case

that developers want to enter a lock conditionally, such as with a timeout, and in such cases there are typically

branching instructions between the call and entering the try block:

 

C#

 

if (Monitor.TryEnter(someObject, 1000))
{
    try
    {
        // … critical region of code
    }
    finally
    {
        Monitor.Exit(someObject);
    }
}
else { /*…*/ }

 

 

To address this, in the .NET Framework 4 new overloads of Monitor.Enter (and Monitor.TryEnter) have been

added, supporting a new pattern of reliable lock acquisition and release:

 

C#

 

public static void Enter(object obj, ref bool lockTaken);

 

This overload guarantees that the lockTaken parameter is initialized by the time Enter returns, even in the face of

asynchronous exceptions. This leads to the following new, reliable pattern for entering a lock:

 


 

 

C#

 

 

bool lockTaken = false;
try
{
    Monitor.Enter(someObject, ref lockTaken);
    // … critical region of code
}
finally
{
    if (lockTaken) Monitor.Exit(someObject);
}

 

In fact, code similar to this is what the C# and Visual Basic compilers output in the .NET Framework 4 for the lock

and SyncLock construct. This pattern applies equally to TryEnter, with only a slight modification:

 

C#

 

bool lockTaken = false;
try
{
    Monitor.TryEnter(someObject, 1000, ref lockTaken);
    if (lockTaken)
    {
        // … critical region of code
    }
    else { /*…*/ }
}
finally
{
    if (lockTaken) Monitor.Exit(someObject);
}

 

 

Note that the new System.Threading.SpinLock type also follows this new pattern, and in fact provides only the

reliable overloads:

 

C#

 

public struct SpinLock
{
    public void Enter(ref bool lockTaken);
    public void TryEnter(ref bool lockTaken);
    public void TryEnter(TimeSpan timeout, ref bool lockTaken);
    public void TryEnter(int millisecondsTimeout, ref bool lockTaken);
    // …
}

 

With these methods, SpinLock is then typically used as follows:

 

C#

 

private static SpinLock _lock = new SpinLock(enableThreadOwnerTracking: false);

// …

bool lockTaken = false;
try
{
    _lock.Enter(ref lockTaken);
    // … very small critical region here
}
finally
{
    if (lockTaken) _lock.Exit(useMemoryBarrier: false);
}

 

Alternatively, SpinLock may be used with TryEnter as follows:

 

C#

 

bool lockTaken = false;
try
{
    _lock.TryEnter(ref lockTaken);
    if (lockTaken)
    {
        // … very small critical region here
    }
    else { /*…*/ }
}
finally
{
    if (lockTaken) _lock.Exit(useMemoryBarrier: false);
}

 

The concept of a spin lock is that rather than blocking, it continually iterates through a loop (“spinning”), until the

lock is available. This can lead to benefits in some cases, where contention on the lock is very infrequent, and

where if there is contention, the lock will be available in very short order. This then allows the application to avoid

costly kernel transitions and context switches, instead iterating through a loop a few times. When used at incorrect

times, however, spin locks can lead to significant performance degradation in an application.

 

The constructor to SpinLock accepts an enableThreadOwnerTracking parameter, which defaults to true. This causes the SpinLock to keep track of which thread currently owns the lock, and can be useful for debugging purposes. This does, however, have an effect on the lock's behavior when the lock is misused. SpinLock is not reentrant, meaning that a thread may only acquire the lock once. If the thread holding the lock tries to enter it again, and if enableThreadOwnerTracking is true, the call to Enter will throw an exception. If enableThreadOwnerTracking is false, however, the call will deadlock, spinning forever.

 

In general, if you need a lock, start with Monitor. Only if, after performance testing, you find that Monitor isn't fitting the bill should SpinLock be considered. If you do end up using a SpinLock, inside the protected region you

should avoid blocking or calling anything that may block, trying to acquire another lock, calling into unknown code

(including calling virtual methods, interface methods, or delegates), and allocating memory. You should be able to

count the number of instructions executed under a spin lock on two hands, with the total amount of CPU

utilization in the protected region amounting to only tens of cycles.

 

MIXING EXCEPTIONS WITH LOCKS

 

As described, a lot of work has gone into ensuring that locks are properly released, even if exceptions occur within

the protected region. This, however, isn’t always the best behavior.

 


 

 

Locks are used to make non-atomic sets of actions appear atomic, and that’s often needed due to multiple

statements making discrete changes to shared state. If an exception occurs inside of a critical region, that

exception may leave shared data in an inconsistent state. All of the work we’ve done to ensure reliable lock release

in the face of exceptions now leads to a problem: another thread may acquire the lock and expect state to be

 

consistent, but find that it’s not.

 

In these cases, we have a decision to make: is it better to allow threads to access potentially inconsistent state, or

is it better to deadlock (which would be achievable by not releasing the lock, but by “leaking” it instead)? The

answer really depends on the case in question.

 

If you decide that leaking a lock is the best solution, instead of using the aforementioned patterns the following

may be employed:

 

C#

 

Monitor.Enter(someObject);
// … critical region
Monitor.Exit(someObject);

 

Now if an exception occurs in the critical region, the lock will not be exited, and any other threads that attempt to

acquire this lock will deadlock. Of course, due to the reentrancy supported by Monitor in the .NET Framework, if

this same thread later attempts to enter the lock, it will succeed in doing so.
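A tiny sketch of that reentrancy (illustrative only, using the same someObject from the earlier examples):

C#

lock (someObject)
{
    // Entering the same monitor again on the same thread succeeds,
    // because Monitor locks are reentrant.
    lock (someObject)
    {
        // … critical region of code
    }
}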

 

AVOIDING DEADLOCKS

 

Of all of the problems that may result from incorrect synchronization, deadlocks are one of the most well-known.

There are four conditions required for a deadlock to be possible:

 

1. Mutual exclusion. Only a limited number of threads may utilize a resource concurrently.
2. Hold and wait. A thread holding a resource may request access to other resources and wait until it gets them.
3. No preemption. Resources are released only voluntarily by the thread holding the resource.
4. Circular wait. There is a set of {T1, …, TN} threads, where T1 is waiting for a resource held by T2, T2 is waiting for a resource held by T3, and so forth, up through TN waiting for a resource held by T1.

If any one of these conditions doesn’t hold, deadlock isn’t possible. Thus, in order to avoid deadlock, we need to

ensure that we avoid at least one of these. The most common and actionable condition to avoid in real-world code

is #4, circular waits, and we can attack this condition in a variety of ways. One approach involves detecting that a

cycle is about to occur. We can maintain a store of what threads hold what locks, and if a thread makes an attempt

to acquire a lock that would lead to a cycle, we can prevent it from doing so; an example of this graph analysis is

codified in the ".NET Matters: Deadlock Monitor" article at http://msdn.microsoft.com/en-us/magazine/cc163352.aspx. There is another example in the article "No More Hangs: Advanced Techniques To Avoid And Detect Deadlocks In .NET Apps" by Joe Duffy at http://msdn.microsoft.com/en-us/magazine/cc163618.aspx. That same article by Joe Duffy also includes an example of another approach: lock

leveling. In lock leveling, locks are assigned numerical values, and the system tracks the smallest value lock held by

 


 

 

a thread, only allowing the thread to acquire locks with smaller values than the smallest value it already holds; this

prevents the potential for a cycle.
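As a rough sketch of the lock-leveling idea (a hypothetical helper, not the implementation from the referenced article; it assumes locks are released in last-acquired, first-released order):

C#

public sealed class LeveledLock
{
    // Levels currently held by this thread, most recently acquired on top.
    [ThreadStatic] private static Stack<int> _heldLevels;

    private readonly object _monitor = new object();
    private readonly int _level;

    public LeveledLock(int level) { _level = level; }

    public void Enter()
    {
        if (_heldLevels == null) _heldLevels = new Stack<int>();
        // Only allow acquiring a lock whose level is smaller than the smallest level already held.
        if (_heldLevels.Count > 0 && _level >= _heldLevels.Peek())
            throw new InvalidOperationException("Lock leveling violation");
        Monitor.Enter(_monitor);
        _heldLevels.Push(_level);
    }

    public void Exit()
    {
        _heldLevels.Pop();
        Monitor.Exit(_monitor);
    }
}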

 

In some cases, we can avoid cycles simply by sorting the locks utilized in some consistent way, and ensuring that if

multiple locks need to be taken, they’re taken in sorted order (this is, in effect, a lock leveling scheme). We can see

a simple example of this in an implementation of the classic “dining philosophers” problem.

 

The dining philosophers problem was posited by Tony Hoare, based on previous examples from Edsger Dijkstra in

the 1960s. The basic idea is that five philosophers sit around a table. Every philosopher has a plate of pasta, and

between every pair of philosophers is a fork. To eat the pasta, a philosopher must pick up and use the forks on

both sides of him; thus, if a philosopher’s neighbor is eating, the philosopher can’t. Philosophers alternate

between thinking and eating, typically for random periods of time.

 

 

We can represent each fork as a lock, and a philosopher must acquire both locks in order to eat. This would result

in a solution like the following:

 

C#

 

// WARNING: THIS METHOD HAS A BUG

const int NUM_PHILOSOPHERS = 5;
object[] forks = Enumerable.Range(0, NUM_PHILOSOPHERS)
    .Select(i => new object())
    .ToArray();
var philosophers = new Task[NUM_PHILOSOPHERS];
for (int i = 0; i < NUM_PHILOSOPHERS; i++)
{
    int id = i;
    philosophers[i] = Task.Factory.StartNew(() =>
    {
        var rand = new Random(id);
        while (true)
        {
            // Think
            Thread.Sleep(rand.Next(100, 1000));

            // Get forks
            object leftFork = forks[id];
            object rightFork = forks[(id + 1) % NUM_PHILOSOPHERS];
            Monitor.Enter(leftFork);
            Monitor.Enter(rightFork);

            // Eat
            Thread.Sleep(rand.Next(100, 1000));

            // Put down forks
            Monitor.Exit(rightFork);
            Monitor.Exit(leftFork);
        }
    }, TaskCreationOptions.LongRunning);
}
Task.WaitAll(philosophers);

 

 

Unfortunately, this implementation is problematic. If every philosopher were to pick up his left fork at the same

time, all of the forks would be off the table. Each philosopher would then attempt to pick up the right fork and

would need to wait indefinitely. This is a classic deadlock, following the exact circular wait condition previously

described.

 

To fix this, we can eliminate the cycle by ensuring that a philosopher first picks up the lower numbered fork and

then the higher numbered fork, even if that means picking up the right fork first:

 

C#

 

while (true)
{
    // Think
    Thread.Sleep(rand.Next(100, 1000));

    // Get forks in sorted order to avoid deadlock
    int firstForkId = id, secondForkId = (id + 1) % NUM_PHILOSOPHERS;
    if (secondForkId < firstForkId) Swap(ref firstForkId, ref secondForkId);
    object firstFork = forks[firstForkId];
    object secondFork = forks[secondForkId];
    Monitor.Enter(firstFork);
    Monitor.Enter(secondFork);

    // Eat
    Thread.Sleep(rand.Next(100, 1000));

    // Put down forks
    Monitor.Exit(secondFork);
    Monitor.Exit(firstFork);
}
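The Swap helper referenced above isn't shown in the excerpt; a minimal implementation simply exchanges the two values by reference:

C#

static void Swap(ref int a, ref int b)
{
    int tmp = a;
    a = b;
    b = tmp;
}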

 

 

Another solution is to circumvent the second deadlock requirement, hold and wait, by utilizing the operating

system kernel’s ability to acquire multiple locks atomically. To accomplish that, we need to forego usage of

 


 

 

Monitor, and instead utilize one of the .NET Framework synchronization primitives derived from WaitHandle, such

as Mutex. When we want to acquire both forks, we can then utilize WaitHandle.WaitAll to acquire both forks

atomically. Using WaitAll, we block until we’ve acquired both locks, and no other thread will see us holding one

lock but not the other.

 

C#

 

const int NUM_PHILOSOPHERS = 5;
Mutex[] forks = Enumerable.Range(0, NUM_PHILOSOPHERS)
    .Select(i => new Mutex())
    .ToArray();

var philosophers = new Task[NUM_PHILOSOPHERS];
for (int i = 0; i < NUM_PHILOSOPHERS; i++)
{
    int id = i;
    philosophers[i] = Task.Factory.StartNew(() =>
    {
        var rand = new Random(id);
        while (true)
        {
            // Think
            Thread.Sleep(rand.Next(100, 1000));

            // Get forks together atomically
            var leftFork = forks[id];
            var rightFork = forks[(id + 1) % NUM_PHILOSOPHERS];
            WaitHandle.WaitAll(new[] { leftFork, rightFork });

            // Eat
            Thread.Sleep(rand.Next(100, 1000));

            // Put down forks; order of release doesn't matter
            leftFork.ReleaseMutex();
            rightFork.ReleaseMutex();
        }
    }, TaskCreationOptions.LongRunning);
}
Task.WaitAll(philosophers);

 

 

The .NET Framework 4 parallel programming samples at http://code.msdn.microsoft.com/ParExtSamples contain

several example implementations of the dining philosophers problem.

 

ANTI-PATTERNS

 

 

LOCK(THIS) AND LOCK(TYPEOF(SOMETYPE))

 

Especially in code written early in the .NET Framework’s lifetime, it was common to see synchronization done in

 

instance members with code such as:

 

C#

 

void SomeMethod()
{
    lock (this)
    {
        // … critical region here
    }
}

 

It was also common to see synchronization done in static members with code such as:

 

C#

 

static void SomeMethod()
{
    lock (typeof(MyType))
    {
        // … critical region here
    }
}

 

In general, this pattern should be avoided. Good object-oriented design results in implementation details

remaining private through non-public state, and yet here, the locks used to protect that state are exposed. With

these lock objects then public, it becomes possible for an external entity to accidentally or maliciously interfere

with the internal workings of the implementation, as well as make common multithreading problems such as

deadlocks more likely. (Additionally, Type instances can be domain agile, and a lock on a type in one AppDomain

may seep into another AppDomain, even if the state being protected is isolated within the AppDomain.) Instead

and in general, non-public (and non-AppDomain-agile) objects should be used for locking purposes.
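A minimal sketch of that guidance (illustrative only) is to dedicate a private object to locking:

C#

private readonly object _syncObj = new object(); // not visible outside the type

void SomeMethod()
{
    lock (_syncObj)
    {
        // … critical region here
    }
}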

 

The same guidance applies to MethodImplAttribute. The MethodImplAttribute accepts a MethodImplOptions

enumeration value, one of which is Synchronized. When applied to a method, this ensures that only one thread at

a time may access the attributed member:

 

C#

 

[MethodImpl(MethodImplOptions.Synchronized)]
void SomeMethod()
{
    // … critical region here
}

 

However, it does so using the equivalent of the explicit locking code shown previously, with a lock on the instance

for instance members and with a lock on the type for static members. As such, this option should be avoided.

 

READONLY SPINLOCK FIELDS

 

The readonly keyword informs the compiler that a field should only be updated by the constructor; any attempts

to modify the field from elsewhere results in a compiler error. As such, you might be tempted to write code like

the following:

 

C#

 

private readonly SpinLock _lock; // WARNING!

 

Don’t do this. Due to the nature of structs and how they interact with the readonly keyword, every access to this

_lock field will return a copy of the SpinLock, rather than the original. As a result, every call to _lock.Enter will

succeed in acquiring the lock, even if another thread thinks it owns the lock.

 


 

 

For the same reason, don't try to pass SpinLocks around. In most cases, when you do so, you'll be making a

copy of the SpinLock. As an example, consider the desire to write an extension method for SpinLock that executes

a user-provided delegate while holding the lock:

 

C#

 

// WARNING! DON'T DO THIS.
public static void Execute(this SpinLock sl, Action runWhileHoldingLock)
{
    bool lockWasTaken = false;
    try
    {
        sl.Enter(ref lockWasTaken);
        runWhileHoldingLock();
    }
    finally
    {
        if (lockWasTaken) sl.Exit();
    }
}

 

 

Theoretically, this code should allow you to write code like:

 

C#

 

_lock.Execute(() =>
{
    … // will be run while holding the lock
});

 

 

However, the code is very problematic. The SpinLock being targeted by the method will be passed by value, such

that the method will execute on a copy of the SpinLock rather than the original. To write such a method correctly,

you’d need to pass the SpinLock into the Execute method by reference, and C# doesn’t permit an extension

method to target a value passed by reference. Fortunately, Visual Basic does, and we could write this extension

method correctly as follows:

 

C#

 

(This extension method cannot be written in C#.)

 

Visual Basic

 

<Extension()>
Public Sub Execute(ByRef sl As SpinLock, ByVal runWhileHoldingLock As Action)
    Dim lockWasTaken As Boolean
    Try
        sl.Enter(lockWasTaken)
        runWhileHoldingLock()
    Finally
        If lockWasTaken Then sl.Exit()
    End Try
End Sub

 

 

See the blog post at http://blogs.msdn.com/pfxteam/archive/2009/05/07/9592359.aspx for more information

about this dangerous phenomenon.

 


 

 

CONCLUSION

Understanding design and coding patterns as they relate to parallelism will help you to find more areas of your

application that may be parallelized and will help you to do so efficiently. Knowing and understanding patterns of

parallelization will also help you to significantly reduce the number of bugs that manifest in your code. Finally,

using the new parallelization support in the .NET Framework 4, which encapsulates these patterns, will not only help

to reduce the bug count further, but it should help you to dramatically decrease the amount of time and code it

takes to get up and running quickly and efficiently.

 

Now, go forth and parallelize.

 

Enjoy!

 

ACKNOWLEDGEMENTS

 

 

The author would like to thank the following people for their feedback on drafts of this paper: Donny Amalo, John

Bristowe, Tina Burden, David Callahan, Chris Dern, Joe Duffy, Ed Essey, Lisa Feigenbaum, Boby George, Scott

Hanselman, Jerry Higgins, Joe Hoag, Luke Hoban, Mike Liddell, Daniela Cristina Manu, Ade Miller, Pooja Nagpal,

Jason Olson, Emad Omara, Igor Ostrovsky, Josh Phillips, Danny Shih, Cindy Song, Mike Stall, Herb Sutter, Don Syme,

Roy Tan, Ling Wo, and Huseyin Yildiz.

 

ABOUT THE AUTHOR

 

 

Stephen Toub is a Program Manager Lead on the Parallel Computing Platform team at Microsoft, where he spends

his days focusing on the next generation of programming models and runtimes for concurrency. Stephen is also a

Contributing Editor for MSDN® Magazine, for which he writes the .NET Matters column, and is an avid speaker at

conferences such as PDC, TechEd, and DevConnections. Prior to working on the Parallel Computing Platform,

Stephen designed and built enterprise applications for companies such as GE, JetBlue, and BankOne. Stephen

holds degrees in computer science from Harvard University and New York University.

 

This material is provided for informational purposes only. Microsoft makes no warranties, express or implied.

©2010 Microsoft Corporation.

 


 

 

Deploying highly available and secure cloud solutions

 

 

 


 

 

December 2012

 

 

 

 

 

 

 

 

 

 

 

 

 


 

This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

This document is provided “as-is.” Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it.

Copyright © 2012 Microsoft Corporation. All rights reserved.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Authors and contributors

 

DAVID BILLS – Microsoft Trustworthy Computing

CHRIS HALLUM – Microsoft Windows

YALE LI – Microsoft IT

MARC LAURICELLA – Microsoft Trustworthy Computing

ALAN MEEUS – Windows Phone

DARYL PECELJ – Microsoft IT

TIM RAINS – Microsoft Trustworthy Computing

FRANK SIMORJAY – Microsoft Trustworthy Computing

SIAN SUTHERS – Microsoft Trustworthy Computing

TONY URECHE – Microsoft Windows

Table of contents

Executive summary
Introduction
Measuring reliability and user expectations
Service-oriented architecture
Separation of function
Automatic failover
Fault tolerance
Disaster planning
Test and measure
Cloud provider
Cloud provider expectation and responsibility
Cloud availability
Design for availability
Organizational customer of the cloud
The organization’s responsibility
Availability of sensitive information stored in the cloud
The user and the device used to access the cloud
User expectation and feedback
Design for test
User device availability
Conclusions
Additional reading

Executive summary

Many organizations today are focused on improving the flexibility and performance of cloud applications. Although flexibility and performance are important, cloud applications must also be available to users whenever they want to connect. This paper focuses on key methodologies that technical decision makers can use to ensure that your cloud services, whether public or private, remain available to your users.

At a high level, each cloud session consists of a customer using a computing device to connect to an organization’s cloud-based service that is hosted by an internal or external entity. When planning for a highly available cloud service, it’s important to consider the expectations and responsibilities of each of these parties. Your plan needs to acknowledge the real-world limitations of technology, and that failures can occur. You must then identify how good design can isolate and repair failures with minimal impact on the service’s availability to users.

This paper showcases examples for deploying robust cloud solutions to maintain highly available and secure client connections. In addition, it uses real-world examples to discuss scalability issues. The goal of this paper is to demonstrate techniques that mitigate the impact of failures, provide highly available services, and create an optimal overall user experience.

Introduction

Customers have high expectations for the reliability of computing infrastructure, and the same expectations apply to cloud services. Uptime, for example, is a commonly used reliability metric. Today, users expect service uptimes from 99.9% (often referred to as three nines) to 99.999% (five nines), which translates to nine hours of downtime per year (at 99.9%) to five minutes of downtime per year (at 99.999%). Service providers frequently distinguish between planned and unplanned outages, but, as IT managers well know, even a planned change can result in unexpected problems. A single unexpected problem can put even a 99.9% service commitment at risk.

Reliability is ultimately about customer satisfaction, which means that managing reliability is a more nuanced challenge than simply measuring uptime. For example, you can imagine a service that never goes down but that is really slow or that is difficult to use. Although maintaining high levels of customer satisfaction is a multifaceted challenge, reliability is the foundation upon which other aspects of customer satisfaction are built. Cloud-based services must be designed from the beginning with reliability in mind. The following principles of cloud service reliability are discussed in this paper:

• Use a service-oriented architecture
• Implement separation of function
• Design for failure
• Automate testing and measurement
• Understand service level agreements

Measuring reliability and user expectations

In addition to uptime, which was discussed earlier, other reliability metrics exist that should be considered. A common measurement of computer hardware reliability is mean time to failure (MTTF). If a component fails, the service it provides is unavailable for use until the component is repaired. However, MTTF only tells half of the story. To track the time between failure and repair, the industry created the mean time to repair (MTTR) measurement. To calculate an important metric of service reliability, we can use the equation of MTTF/MTTR. This equation shows that reducing the repair time by half will result in a doubling of the measured availability. For example, consider the situation of an online service that has historically demonstrated an MTTF of one year and an MTTR of one hour. In terms of measured availability, halving the MTTR to a half hour is equivalent to doubling the MTTF to two years.
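To make the arithmetic concrete, here is a small worked sketch using the figures from the example above and the MTTF/MTTR ratio described in this section (illustrative only, not part of the original text):

C#

double mttf = 365.0 * 24;                     // MTTF of one year ≈ 8,760 hours
double mttr = 1.0;                            // MTTR of one hour

double baseline    = mttf / mttr;             // 8,760
double halvedMttr  = mttf / (mttr / 2);       // 17,520
double doubledMttf = (2 * mttf) / mttr;       // 17,520 (the same improvement)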

By focusing on MTTR, you can mitigate the potential impact of failure incidents and seek to improve reliability by creating a set of standby servers with a sufficiently redundant design to hasten recovery from such incidents. You should always document these types of mitigations in a service level agreement (SLA) from the cloud provider. By documenting them you are implicitly acknowledging that some amount of failure is expected to occur, and that the best way to minimize the impact from failure is to increase the MTTF and reduce the MTTR.

The following sections detail some of the key architectural requirements of designing highly available cloud-based services.

Service-oriented architecture

Effective cloud technology adoption requires appropriate design patterns. In a service-oriented architecture, each component should have a well-designed

interface so that its implementation is independent of every other component and able to be used by new components as they are deployed. Designing architecture in this way helps reduce overall system downtime, because components that call into a failing component can properly handle such an event.

Separation of function

Separation of function, also known as separation of concerns, is a design pattern that states that each component will implement only one or a small set of closely related functions with no overlap and loose coupling to other components. The three-tier architecture shown in Figure 1 later in this paper is a classic example of separation of function. This approach allows functionality to be spread across different geographies and networks so that each function has the best chance to survive failures of specific servers. The figure depicts redundant front-end web servers, message queues, and storage, each of which could be separated geographically.

Automatic failover

If the component interfaces are registered with a uniform resource identifier (URI), failover to alternate service providers can be as simple as a DNS lookup. Using URIs instead of locations for services increases the likelihood that a functioning service can be located.

Fault tolerance

Also known as graceful degradation, fault tolerance depends on aggregating the building blocks of a service without creating unnecessary dependencies. If the user web interface is as simple as possible and decoupled from the business logic or back-end, the communications channel can survive failures of other components and maintain the organization’s link to the user. In addition, the web interface can be used to inform the user of the current status of each piece of the organization’s cloud-based service. This approach not only helps the user understand when to expect full restoration of services, but also improves user satisfaction.

Disaster planning

You should expect that services will fail from time to time. Hardware failure, software imperfections, and man-made or natural disasters can cause service failure. You should complete planning for routine problems before deployment to help troubleshooters know what to look for and how to respond. But even huge environmental disruptions, sometimes known as black swan events, will occur periodically; therefore, you need to consider such events during the planning process. The black swan theory posits that these unlikely events collectively play vastly larger roles than regular outages.

Test and measure

Two types of test and measurement of a running service are appropriate. The automated polling of the service by a test server can result in early detection and reporting of failure and thereby reduce the MTTR.

User research should be conducted either immediately before or after a deployment to understand how users react and identify unmet expectations. An easy way to obtain user feedback is to simply ask them for it, not every time you see them but occasionally. You should be able to get useful data, even with a very low response rate, and you will also obtain key performance indicators (KPIs) from users for monthly status reports.

Cloud provider

Two concepts from the 1970s have been realized by new technologies in today’s cloud offerings.

• Virtualization of computer hardware is a reality, with virtual computer images and virtual hard drives that can be remotely managed.
• Fast scaling and agility are realities, with management tools that can control the power of physical and virtual hardware.

It’s important that IT professionals understand these concepts and their powerful capabilities. As stated in the “Executive summary” section, the concepts in this paper apply to both public and private clouds, each of which has a place in the toolbox of forward-looking IT departments. The focus from the start of any project should be on cloud design and management services that provide minimally disruptive service delivery to users.

Cloud provider expectation and responsibility

A natural shared responsibility exists between any organization and its chosen cloud provider. For custom applications, the cloud provider designs its service for reliability based on the use of certain features, such as failover and monitoring, by the developer. The developer must understand and use these features for reliability to be an achievable goal.

An organization that implements a solution on top of cloud-based infrastructure must ensure that the service is available as much as possible. Users expect these types of services to be as reliable as a telephone. Outages may occur, but they are rare, localized events. The organization’s ability to provide such assurance requires transparent communication with the provider about what to expect from the service and what must be supplied by the service consumer. Without such transparency, finger-pointing about responsibility can occur instead of automated recovery when service failures lead to outages for users.

The same consideration applies to private clouds. An IT organization can partition infrastructure responsibility to in-house experts who then create a private cloud. The cloud service is expected to provide a reliable platform on which the rest of the IT team can create innovative solutions to address the business needs of the organization. But just as for an outsourced cloud service, fully transparent operations and documentation of expectations will help avoid failures that result from vague or poorly defined areas of responsibility.

Cloud availability

Typically, public cloud services provide high availability by using a geographically distributed and professionally managed collection of server farms and network devices. Even very large enterprises that have private clouds for specific high-value content can profitably use public cloud services to host their application solutions. And large public clouds have effectively infinite capacity, because they can respond with more servers when demand is greater than anticipated. Offloading the excess server capacity has multiple benefits, the most prominent of which is that, in most public cloud models, excess capacity is not billed until it is used.

Many cloud providers offer built-in capabilities for increased availability and responsiveness, including:

• Round-robin DNS
• Content distribution networks
• Automated failover
• Geographic availability zones

Design for availability

As stated earlier, the most effective way to increase availability is to shorten the MTTR. If geographic and network diversity is available from the cloud, ensure that load balancing automatically routes users away from failed components to working components. Even a relatively simple capability such as network load balancing can be affected by unexpected interaction between the organization

and the cloud provider or DNS. Such interactions have the potential to introduce instabilities in the service offering that have not been anticipated.

For example, distributed denial of service (DDoS) attacks are external attacks against availability that all cloud services need to mitigate. However, unless mitigation is carefully implemented, organizations with little security experience can unintentionally cause an application to become unavailable, which can result in as much downtime damage as a DDoS attack. DDoS mitigation is an example of a capability that is most effectively provided at scale—that is, by the cloud provider or the ISP.

Most major cloud vendors are certified for reliability and security, which they report in documents such as those contained in the Cloud Security Alliance (CSA) Security, Trust, and Assurance Registry (STAR).1 Although STAR itself is relatively new, it’s important for IT managers to consider that most in-house systems have not received such third-party vetting. Obtaining such assurance at a shared cost is another potential benefit of public cloud services.

1 Security, Trust and Assurance Registry (STAR), at https://cloudsecurityalliance.org/star/

When evaluating the benefits and challenges of creating a highly available cloud-based solution, it is important to ensure that your design includes a threat analysis of well-known problems such as those defined earlier, as well as any business-disrupting failures that are unique to the solution you want to deploy. Typically, only security attacks are considered in a threat analysis, but a well-designed cloud solution will consider other types of loss of availability and plans for mitigations as well.

The following figure shows a generic three-tier design with redundancy capability. Each request has more than one path to a component that can respond to it. Starting from the left is a user device, such as a laptop computer, from which the request is routed (again, consider the use of round-robin DNS and network load balancing features, if available). In tandem, an automated availability test service is exercising as much of the system as possible to assure that any failure is quickly reported so that necessary repairs can begin quickly.

The cloud service in the figure shows separation of function. This separation helps ensure that the network path to the user has no common components that result

in failure of both paths. Within each cloud service site, additional component redundancies are possible. In this scenario, the queue may be provided by the cloud service provider. If a common database needs to be shared between both sites, it is up to the organization’s application architecture to route traffic to the data in a way that will fail over to some other site running a mirrored copy of the database; however, many public cloud offerings have built-in data redundancy capabilities that should also be explored.

In this example solution, load balancing is split between the Internet at the front end, the cloud service in the middle tier, and the organization’s application at the back-end. Tests of disabling each of these components should be undertaken with live loads to assure that fail-over options work as planned.

Figure 1. Designing for availability

 

If the website front-end component is sufficiently simple and straightforward, users should always be able to see the enterprise brand image and status information, which will help them have confidence in the reliability of the enterprise itself. Simplicity in both components and connections is the key to the reliability and availability of the system as a whole. Complex and tightly interconnected systems are difficult to maintain and debug when something fails.

Organizational customer of the cloud

An organization that acquires cloud services from either private or public cloud providers needs to fully understand the responsibilities of the cloud provider as well as the limitations of those responsibilities. Similarly, the cloud provider must understand the availability and security requirements of the solution that it provides to users. A complete cloud solution requires a thoroughly reliable implementation and an ability by the cloud service provider to create a service that integrates its own capabilities with the organization’s requirements. The good news is that this integration is where the most innovation and value-add for the organization is generated.

The organization’s responsibility

When the responsibilities of the cloud provider are specified, well understood, and documented in a service level agreement, any unmitigated threats become the responsibility of the customer—the organization. A best practice for identifying potential unmitigated threats is to conduct a brainstorming session to identify all possible threats and then filter out those that are known to be the responsibility of the cloud provider. Threats that remain are the organization’s responsibility to mitigate. The following list of risks and responsibilities can be used as a guide for types of threats to consider:

• Apply access control locally and in the cloud. Although data loss incidents can occur for a variety of reasons, they most commonly occur when an attacker either spoofs the identity of a valid user or elevates their own privileges to acquire access that has not been authorized. Most organizations compile a directory of employees and partners that can be federated, or have their accounts mirrored, into a cloud environment; however, other users may need to be authenticated using different methods. For future flexibility, adopt cloud services that support federation and accept identities from on-premises directories as well as from external identity providers. The trend is for providers to include access control services as a part of their service offering. A best practice is to avoid duplicating your account database, because doing so increases the attack surface of the information it contains, such as password data.

• Protect data in transit. Data loss can occur if the data is not protected in storage or in transit. Protecting data in transit can be accomplished by using Transport Layer Security (TLS) to provide encryption between endpoints. Protecting data in storage is more of a challenge. Encryption can be provided in the cloud, but if the data is to be decrypted by apps that also run in the cloud, the encryption key needs special protections. It’s important to note that providing the cloud with access to the encryption keys as well as to the encrypted data is equivalent to storing the data unencrypted.

• Protect trusted roles. Authorization to perform administrative functions or to access high-value data will likely be based on users’ roles within their organizations. Because roles will vary for each user while their identity remains constant, some mapping must exist between each user ID and a list of the ID’s authorized roles. If the cloud provider is trusted to control access, this list must be made available to the cloud and managed accordingly. A best practice is to use claims-based authorization technologies such as Security Assertion Markup Language (SAML).

• Protect data on mobile devices. Protection of user credentials and other sensitive data on mobile devices is only feasible if security policy can be enforced. A best practice is to configure Microsoft Exchange ActiveSync mailbox policies. Although not all devices implement all of the ActiveSync policies, the market is responding to this need and organizations should seek to deploy and enforce endpoint security solutions on all mobile devices.

• Develop all code in accordance with SDL. Application code is likely to come from a combination of the cloud provider (for example, in the form of sample code), the cloud tenant organization, and third parties. A threat modeling process such as the one used as part of the Security Development Lifecycle (SDL), the software development security assurance process created by Microsoft, needs to consider this factor. One area in particular that needs to be analyzed is the potential for conflict if more than one component controls related functionality, such as authorization.

• Optimize for low MTTR. The threat modeling process also needs to consider and specify different types of expected failures to help ensure low MTTR. For each potential failure, specify the tools and technologies that are available for recovering functionality quickly.

Availability of sensitive information stored in the cloud

The cloud offers some interesting options for securing data within a highly available architecture. Consider the following example cloud solution, in which the functionality for both security and availability is split between the organization and the cloud. Sensitive organizational information is stored in the cloud, but the decryption keys are maintained within the organization so that no attack on the cloud can reveal the sensitive information. One option is to encrypt all data that is stored in the cloud to prevent data leakage. However, it is also possible to differentiate between types of data so that high-value data is protected by encryption and low-value data is protected only by access control. The design principles of separation of function and geographical distribution that were suggested earlier are used here to increase resiliency.

The following figure illustrates how the data protection scenario works. Data is entered and retrieved at a workstation that is attached to an organization’s network. The network uses a firewall to protect it from intrusions. The workstation connects to a service in the network that uses a key to encrypt and decrypt data. The key is obtained from the organization’s central directory, so all distinct local protected networks can access the data that is stored in the cloud.

The user is authenticated by the organization’s directory. Federation and SAML are used to provide centralized control of authentication and authorization, which takes advantage of the available existing account repository (such as Active Directory) in a distributed environment.

Figure 2. Data encrypted in the cloud

 

This example of encrypted data in the cloud can be used as a pattern for a variety of implementations. The protection of the data decryption key removes the threat of data leaks in the cloud and puts control of the plaintext data under the organization’s full control. The distributed architecture helps to ensure the availability of the service.
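The following sketch illustrates one way to realize this pattern in code: encrypt on-premises with a key retrieved from the organization’s directory and store only ciphertext in the cloud. It is illustrative only; GetKeyFromOrganizationDirectory, UploadToCloudStorage, and sensitiveData are hypothetical placeholders, not part of the scenario described above.

C#

byte[] key = GetKeyFromOrganizationDirectory();        // hypothetical helper
using (var aes = System.Security.Cryptography.Aes.Create())
{
    aes.Key = key;
    using (var encryptor = aes.CreateEncryptor())
    {
        byte[] plaintext = System.Text.Encoding.UTF8.GetBytes(sensitiveData);
        byte[] ciphertext = encryptor.TransformFinalBlock(plaintext, 0, plaintext.Length);
        UploadToCloudStorage(ciphertext, aes.IV);       // hypothetical helper; the IV is not secret
    }
}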

The user and the device used to access the cloud

Users will measure the availability of a cloud service solely in terms of their ability to complete their current task, which means that the cloud service as well as the device that they use to access the cloud must be functional.

User expectation and feedback

Users measure service availability based on their success at achieving their objectives. The following figure shows the typical steps in the process of a user obtaining access to a cloud resource. First, the user’s device must connect to the local network and be authenticated. Next, the device’s security disposition, or health, is checked and some sort of role-based authorization process is used to establish appropriate access for the user. Finally, the cloud resource itself must be available. The failure of any of these components will block the user’s ability to complete their task. Sometimes the blockage is desired for security purposes, but the user will always perceive it to be an impediment.

Figure 3. Availability blockers

 

When any one of the links shown in the figure fails, it is important to let the user know the nature of the problem and also what needs to be done to restore availability of the solution. Whenever user action is required, instructions need to be clear and concise.
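As a simple illustration of this guidance, the following sketch (not part of the solution described above; the check functions and messages are placeholders) walks the chain of availability blockers in order and returns a clear, actionable message for the first link that fails.

def check_network():
    # Placeholder: is the device connected to the network and authenticated?
    return True

def check_device_health():
    # Placeholder: did the device pass its security/health check?
    return True

def check_authorization():
    # Placeholder: does the user's role grant access to the resource?
    return True

def check_cloud_resource():
    # Placeholder: is the cloud resource itself reachable?
    return False

CHAIN = [
    (check_network, "Connect to the corporate network and sign in again."),
    (check_device_health, "Your device failed its health check; run the compliance tool."),
    (check_authorization, "You are not authorized for this resource; contact your administrator."),
    (check_cloud_resource, "The cloud service is unavailable; try again in a few minutes."),
]

def diagnose():
    """Return a clear, actionable message for the first failing link, if any."""
    for check, remedy in CHAIN:
        if not check():
            return "Blocked at '{}': {}".format(check.__name__, remedy)
    return "All links are healthy; the resource should be available."

if __name__ == "__main__":
    print(diagnose())   # reports the cloud-resource link as the blocker in this sketch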

Design for test

An automated test program is helpful for detecting solution failures. A best practice is to design the solution for online testing. All customer-facing services and webpages need to enable automated query programs that use near real-time reporting with automated escalation when significant failures occur. Online testing can provide valuable performance indicators of the availability of the solution services.
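A minimal example of such an automated query program is sketched below. It is illustrative only: the endpoint URL, timeout, failure threshold, and escalation hook are assumptions made for the example rather than details of any particular service.

import time
import urllib.request

ENDPOINT = "https://example.com/health"   # hypothetical customer-facing page
TIMEOUT_SECONDS = 5

def probe():
    """Issue one synthetic request; return (success, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(ENDPOINT, timeout=TIMEOUT_SECONDS) as response:
            ok = 200 <= response.status < 300
    except OSError:
        ok = False
    return ok, time.monotonic() - start

def escalate(message):
    # Placeholder: in practice this would page an operator or open an incident.
    print("ESCALATION:", message)

if __name__ == "__main__":
    consecutive_failures = 0
    for _ in range(3):                       # a few near-real-time samples
        ok, elapsed = probe()
        consecutive_failures = 0 if ok else consecutive_failures + 1
        print("ok={} elapsed={:.2f}s".format(ok, elapsed))
        if consecutive_failures >= 2:        # treat repeated failure as significant
            escalate("{} failed {} consecutive probes".format(ENDPOINT, consecutive_failures))
        time.sleep(1)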

Some method of communicating users’ perception of availability should be implemented as well. For example:

. If an application provides users with access to the cloud, use it to generate statistics, such as time from login to acquisition of cloud data.
. Periodically ask users when their session ends if they would take a short survey.
. Send user experience researchers into the field to get user feedback.

User device availability

Devices that are used to access cloud solutions must be trusted not to leak high-value information. Because mobile devices are increasingly being used to access such information, some way to evaluate device security, or health, is required. A good cloud design is able to evaluate device health and verify user identity. This section describes how to provision device health assessment in such a way that the cloud solution can be available to users from anywhere.

This example solution addresses the need to establish secure access to an organization’s network from a user-owned device. This type of scenario is often referred to as bring your own device to work, or BYOD. For many years, IT departments were able to protect enterprise assets by quarantining all resources, including the client computers that accessed those resources, inside a protected perimeter. In BYOD scenarios users have commercially available devices that can access all of their personal data from anywhere, and they want to use those same devices to access the organization’s resources as well. This phenomenon is known as the consumerization of IT.

In such a scenario, the cloud service needs to establish the user’s identity, learn their preferences about how their personal information can be used, and obtain the user’s permission if the service (or the organization) wants to store the user’s personal information for future use or share it with others. In addition, the cloud service may have content that should only be released to user devices that are determined to be secure. Many users want to know that their privacy, identity, and assets are protected from malware, although most users are unwilling to be inconvenienced by security mechanisms.

The following figure shows a solution built to assess mobile device health from the cloud. The mobile device authenticates the user through a connection to an identity provider in the cloud. If the web service has highly confidential information, or is trying to obtain a provable indication of the user’s intent, it may elect to verify the security of the mobile device before proceeding. The user’s device will then receive a health attestation that can be sent with the user ID.

Figure 4. Secure mobile clients

 

 

Windows 8 devices can be protected from low-level rootkits and bootkits by using low-level hardware technologies such as secure boot and trusted boot.

Secure boot is a firmware validation process that helps prevent rootkit attacks; it is part of the Unified Extensible Firmware Interface (UEFI) specification. The intent of UEFI is to define a standard way for the operating system to communicate with modern hardware, which can perform faster, more efficient input/output (I/O) functions than older, software interrupt-driven BIOS systems.

Trusted boot creates a condition in which malware—even if it is able to tamper with the boot process, which is unlikely—can be detected, which prevents a health attestation from being granted. Secure boot also protects the antimalware software itself.

A Remote Attestation Service (RAS) agent can communicate measured boot data that is protected by a Trusted Platform Module (TPM). After the device successfully boots, boot process measurement (for example, measured boot in Windows 8) data is sent to a RAS agent that compares the measurements and conveys the health state of the device—a positive, negative or unknown state—by sending a health claim back to the device.

If the device is healthy, it passes that information to the web service so the organization’s access control policy can be invoked to grant access.

Depending on the requirements of the content provider, device health data can be combined with user identity information in the form of Security Assertion Markup Language (SAML) or open standard for authorization (OAuth) claims. The identity provider, for example Active Directory, may belong to the user’s employer, to the content provider, or to a social network such as Facebook. The data is evaluated by fraud detection services that are already in use at most commercial websites. Access to content is then authorized to the appropriate level of trust for what the health assertions, or claims, merit. These claims protocols are structured to allow additional requests from the content provider to the user’s device as needed by the user’s transaction requests of the provider. For example, if high-value data or funds transfers are requested, additional security state may need to be established by querying the user’s device before the transaction can be completed.
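The following sketch illustrates, in a deliberately simplified form, how an access decision can combine identity claims with a device health claim. The claim names, trust levels, and policy table are invented for the example; a real service would validate signed SAML or OAuth tokens and consult the organization’s access control policy rather than a hard-coded dictionary.

from dataclasses import dataclass

@dataclass
class AccessRequest:
    identity_claims: dict   # e.g. {"user": "alice", "role": "finance"}
    device_health: str      # "positive", "negative", or "unknown" (from attestation)
    operation: str          # e.g. "read_report", "transfer_funds"

# Higher-value operations demand a stronger combination of role and device health.
POLICY = {
    "read_report":    {"roles": {"finance", "auditor"}, "health": {"positive", "unknown"}},
    "transfer_funds": {"roles": {"finance"},            "health": {"positive"}},
}

def authorize(request):
    rule = POLICY.get(request.operation)
    if rule is None:
        return False                                    # default deny
    role_ok = request.identity_claims.get("role") in rule["roles"]
    health_ok = request.device_health in rule["health"]
    return role_ok and health_ok

if __name__ == "__main__":
    read = AccessRequest({"user": "alice", "role": "finance"}, "unknown", "read_report")
    transfer = AccessRequest({"user": "alice", "role": "finance"}, "unknown", "transfer_funds")
    print(authorize(read))      # True: lower-value operation tolerates unknown device health
    print(authorize(transfer))  # False: funds transfer requires a positive health claim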

 

Conclusions

The preceding examples illustrate solutions that emphasize a secure, service-oriented architecture with separation of functions. The demonstrated architectural patterns divide the solution into components with loose coupling. This approach allows each component to fail over gracefully, even if other components fail catastrophically. The service as a whole may continue with some or all functionality no matter which individual component fails. The design can use hybrid solutions that include some on-premises functionality while providing other functionality, such as solution scaling, through an off-premises public cloud.

As a best practice, availability needs to be monitored by a service that operates in the same realm as the user. In addition, services should be designed and located in a way that makes them accessible and operable from geographically diverse locations to provide availability when calamities and natural disasters occur.

Cloud service developers and cloud service customers alike need to communicate and cooperate to anticipate, design, and test for failures at every point. Such communication and cooperation will have a direct impact on the success of the solution and the satisfaction of its users.

Additional reading

For more information about the scenarios and solutions detailed in this paper, see the following resources. These documents provide additional information to help you make the right design decisions for the availability of your cloud-based solutions.

. The Windows Azure Application Model https://www.windowsazure.com/en-us/develop/nodejs/fundamentals/application-model/
. Microsoft System Center http://microsoft.com/systemcenter (this website is now focused on Release Candidate 2012)
. Cloud Computing: Achieving Control in the Hybrid Cloud http://technet.microsoft.com/en-us/magazine/hh389788.aspx
. Cloud Security Alliance – Security, Trust & Assurance Registry (STAR) https://cloudsecurityalliance.org/star/
. Security Guidelines for SQL Azure http://social.technet.microsoft.com/wiki/contents/articles/1069.security-guidelines-for-sql-azure.aspx
. Service-Oriented Architecture – Design Patterns www.soapatterns.org/masterlist_c.php
. Cloud Insecurity: Not Enough Tools, Experience or Transparency www.technewsworld.com/story/74890.html
. How the Cloud Looks from the Top: Achieving Competitive Advantage In the Age of Cloud Computing (PDF) http://download.microsoft.com/download/1/4/4/1442E796-00D2-4740-AC2D-782D47EA3808/16700%20HBR%20Microsoft%20Report%20LONG%20webview.pdf

 


 

One Microsoft Way

Redmond, WA 98052-6399

microsoft.com/twcnext

Office 365 Enterprise Preview

The efficiency you want, the controls you need

Create professional content more easily than ever, securely connect with customers and partners, and take advantage of powerful tools for more effective management, control and compliance with enterprise-grade security, data loss prevention, and rights management.

Top 10 reasons to try Office 365 Enterprise Preview


1. Have Office when you need it

When you’re away from your PC, stream a full-featured version of Office on any Internet-connected PC (Windows 7 or later required) with Office on Demand.

2. Work together with Site Mailboxes

Store documents from your PC and project-related email in a Site Mailbox so that your team can access the content no matter where you are. The Site Mailbox syncs with SharePoint, ensuring content is up-to-date.

3. Protect sensitive data

Keep your organization safe with data loss prevention (DLP) capabilities that prevent users from mistakenly sending sensitive information to unauthorized people. The DLP features in Exchange identify, monitor, and protect sensitive data through deep content analysis and provide built-in and extensible DLP policies that are based on regulatory standards such as PII, HIPAA, and PCI.
Data Loss Prevention capabilities.

4. Stay compliant with archiving

The ability to retain and discover data across your organization is essential to ensuring internal and regulatory compliance. Compliance officers can use the new eDiscovery Center to identify, hold, and analyze your organization’s data from Exchange, SharePoint, and Lync. The data always remains in place so you don’t need to manage a separate data store.

5. Keep people connected

You can follow documents, sites, and people to track what others are working on. SharePoint even recommends people or documents to follow. With ratings and reputation tools, you can acknowledge co-workers and inspire them to work better.

Stay connected with SharePoint.

6. Gain business insights

Combine large volumes of data from various sources with PowerPivot in Excel and explore data, visualize, and tell a compelling story with Power View. Make better decisions with teammates by sharing Business Intelligence-enriched reports and dashboards on SharePoint.

7. Work across time zones and geographies

Discuss and meet, coauthor documents, find experts, and chat in real-time with improved capabilities that keep everyone connected no matter where they are. Detailed contact cards and presence are integrated across all Office applications, making it easier than ever to connect.

8. Collaborate more securely

New capabilities and data governance/protection policy features in SharePoint let you work confidently with partners and customers. Use Lync to connect with others, while getting the authentication, encryption, and media controls your enterprise needs.

9. Get more out of your meetings

The new Lync Meeting tools let you interact with people through video, audio, and instant messaging, and to share content while meeting. Join a Lync Meeting from an HTML5-based browser using the Lync Web App, and enjoy HD video, voice over IP, instant messaging, and sharing of desktops, applications, and PowerPoint presentations.

Lync Meeting.

10. Manage and control your business more easily

Office 365 continues to improve the ease of management from a web-based portal, provides powerful scripting access via PowerShell, and continues to invest in the infrastructure with data backup, disaster recovery, and globally redundant data centers. In addition, the new service health dashboard provides customizable reports that help you get insights into your service.

Windows 8 and Windows RT Product guide

windows.microsoft.com

 

 

© 2012 Microsoft Corporation. All rights reserved.

 

 

Contents

 

 

Meet Windows 8 and Windows RT 04

 

The Start screen is all about you 06

Your password in a picture 07

Ready to roll: The apps you need 08

The Windows Store 11

Your Windows, wherever you go 11

 

In touch and up to date 12

 

Connect your stuff and your people 14

Work and play on the go 14

Internet Explorer 10: A more beautiful web 15

 

Natural and intuitive 16

 

Discover fast and fun ways to get around 21

Search, share, change settings, and more 24

Apps work together 26

Snapping apps 27

Keyboard shortcuts 28

 

Your PC, your apps, your choice 30

 

Truly innovative hardware 32

Windows Store: All the apps you want 34

Family Safety 40

Xbox 360 and Windows 8 and Windows RT 40

 

Reimagined, but familiar 42

 

A new take on Task Manager 44

More secure 44

File Explorer revamped 44

Restore and reset your PC 45

Multi-monitor support 45

 

Windows 8: Powerful for work and play 46

 

Windows and devices 48

Windows 8 Pro: Ready for business 48

Access and protect your data 48

 

Windows RT: Fast and connected for your life on the go 50

 

Do more with apps 52

Get one to go 52

Get more done 52

Stay more secure 52

Have it both ways 52

 

Windows 8 and Windows RT are here 54

 

 

Meet Windows 8

and Windows RT

 

 

 

Windows is beautiful, fast, and fluid, bringing together your sites, people, apps, and more—

 

so everything you care about is right on your Start screen. Windows 8 and Windows RT

provide the platform for great hardware innovation, inspiring a new generation of tablets,

 

laptops, and all-in-one PCs that take advantage of touch, mouse, and keyboard—all working

 

together on top of the fastest and most stable foundation to date.

 

Use Windows to work and play with ease. Sleek and lightweight, with a focus on beautiful

design and innovative materials, Windows-based PCs have reached new heights in

performance and battery life. Entertainment and apps of every kind take center stage. Tap

into your creative side, lose yourself in a game with friends, discover and download video

and songs, and play them on your TV.

 

The Windows Store is your place for getting apps for Windows 8 and Windows RT. Discover

a variety of great apps, check out the featured apps, or tap or click a category name to

browse all of the apps in the category.

 

Windows is connected. Built-in mobile broadband features support 3G and 4G. As you

 

move, your PC automatically finds and uses available Wi-Fi hotspots. You can be effortlessly

 

connected, ready to work, and able to stay in touch from virtually anywhere.

 


 

 

All about you

 

The Start screen is all about you. Vibrant and beautiful, the Start screen is the first thing

 

you’ll see. Each tile on the Start screen is connected to a person, app, website, playlist, and

everything else that’s important to you. This isn’t the usual wall of static icons. Instead, you

 

see: status updates, weather forecasts, Tweets, and more—you see live updates before you

 

ever open a single app. Pin as many tiles to Start as you want, and then group, arrange, and

name them so it’s just the way you want it.

 

 

Your password in a picture

 

Forget having to remember a bunch of letters and numbers to sign in to your PC. Now you

can use a picture for your password. You choose the picture and the way you want to

 

draw on it, so the possibilities are infinite—draw a circle around your favorite landscape,

trace a pattern over your dog’s face—it’s easy. You can either draw a picture password

directly on a touchscreen with your finger, or you can use a mouse to draw your shapes.

 


 

 

Ready to roll: The apps you need

 

Windows 8 and Windows RT come with apps for both the basics and a lot more. Apps like

People, Mail, Photos, and Messaging power you through essential tasks and work together

to make everything easier.

 

 

People

 

See the latest info and start conversations

with contacts from your email accounts,

Facebook, LinkedIn, Messenger, Twitter,

and more.

 

Photos

 

See your photos and videos in one place,

whether they’re on Facebook, Flickr,

SkyDrive, or another PC.

 

 

Messaging

 

Send messages to a unified list of your

 

Facebook and Messenger friends and

choose from hundreds of emoticons.

 

Music

 

Browse your music collection, create and

edit playlists, and keep up with the hottest

new releases. See artist details with pictures,

bios, and album lists. Sign up for Xbox Music

Pass and get unlimited listening for millions

of songs.

 


 

 

Maps

 

View an interactive map with the Bing Maps

app, and get turn-by-turn driving directions.

 

Find traffic details, road conditions, street

 

maps, Multimap, satellite photos, and

aerial maps.

 

SkyDrive

 

Seamlessly access and work on your files

 

from your Windows apps, as well as from

your other devices.

 

Mail

 

Get email from your accounts—including

Outlook.com, Gmail, and Yahoo!—all in

 

one place.

 

Video

 

Browse and watch movies and shows.

Watch on your PC or play to your TV.1

 

Weather

 

See a beautiful preview of current weather

conditions when using the Weather app.

 

You’ll find the latest conditions and hourly,

 

daily, and 10-day forecasts.

 

1 To use Play To, you’ll need a TV that’s certified to be compatible with Windows 8, Windows 7, or DLNA.

Some features aren’t available on Windows 7 and DLNA-certified TVs.

 


 

 

Games

 

Discover the hottest new games and

download them to your PC. Depending on

where you live and travel, you can edit your

avatar, see what your friends are up to, and

share your achievements with them.

 

News

Stay informed. The beautiful, photo-rich

News app, powered by Bing, makes it

easy for you to stay up to date on what’s

happening in the world.

 

Calendar

 

Keep track of your schedule in month,

 

week, or two-day view. Get notified about

 

appointments at the right time so you’re

always on time.

 

 

Finance

 

Use this app to check key market indices and

stay on top of fast-changing market conditions

right from the Start screen.

 

 

Travel

 

Explore over 2,000 destinations all over the world through beautiful

photos. The Travel app powered by Bing makes it possible: travel

guides, booking tools, real-time currency conversion, and weather

info provide everything you need to turn your next trip into an

inspired adventure.

 

The Windows Store

 

Windows 8 and Windows RT include the Windows Store, where you can discover a new

world of apps for your PC. Just tap or click the Store tile and start exploring. Filter apps

by price, rating, and more. Many apps are free, and others let you try before you buy. So

 

explore and load up on apps—the more you have, the more you can do.

 

 

Your Windows, wherever you go

 

Sign in with your Microsoft account to any of your PCs running Windows 8 or Windows RT

and immediately see everything that makes it yours: your background, your display, your

settings. And when you get a new Windows-based PC, sign in with your Microsoft account

and watch the People app come to life with info from your social networks. Windows: your

 

stuff no matter where you are.

 


 

 

In touch and

up to date

 

Windows keeps you in touch. When you sign in, all the latest status updates and info you

care about are right on your Start screen tiles: the photo you were just tagged in, today’s

 

weather, news headlines, and messages from your friends—everything you need to stay up

 

to date at a glance.

 


 

 

Connect your stuff and your people

 

It’s easy to share between your Windows apps and the services that connect them. You can

 

quickly send pictures or files right from an app. If you want to send a link to a site or share

 

an app you love, you can do that right from your browser or the Windows Store. Instantly.

No more having to interrupt what you’re doing to copy what you want to share into an email

message. Just swipe in from the right and tap Share (or point to the upper-right corner with

a mouse, and then click Share).

 

Work and play on the go

 

Start a project on one PC and finish it on another. Collaborate freely and always have the

latest version of your files. You—and the people you work with—can use free Office Web

 

Apps to edit shared documents on SkyDrive and immediately see each other’s changes. You

 

don’t even need to have Office installed. Share the files you choose with the people you

 

choose, and make the rest private.

 

 


 

Internet Explorer 10: A more beautiful web

 

Internet Explorer 10 is the entirely new browser built to take advantage of the full power of

Windows 8 and Windows RT.

 

 

Fast and fluid

 

Internet Explorer starts and loads sites almost instantly. It brings a fluid responsiveness to

 

the web that feels totally new. Everything you want to do on the web is a swipe, tap, or

click away.

 

Perfect for touch

 

Internet Explorer 10 provides a touch-first and truly full-screen browsing experience.

 

Navigation controls appear only when you need them and quietly get out of the way when

 

you don’t. You can flip ahead or back through your websites with the flick of a finger. Tiles

 

and tabs for frequently visited sites are oversized for easy tapping.

 

Easy to use

 

Smooth, intuitive controls work just as you’d expect. One box is both the address and

search bar for speedier navigation. Pin your favorite sites to your Start screen and get to

them as quickly as you can open your apps.

 

Safer and more private

 

The industry-leading SmartScreen technology helps keep your PC and your info safer on

the web and helps protect against malware. Privacy tools like Do Not Track are built in and

easily turned on in just one tap or click, so you have more control.

 


 

 

Natural and intuitive

Windows 8 and Windows RT are perfect for PCs with touchscreens, those that have a mouse

and keyboard, and those with both. No matter which kind of PC you choose, you’ll discover

 

faster and more fluid ways to switch between apps, move things around, and move smoothly

 

from one place to another.

 


 


 

 

Here’s what you need to know about using touch or mouse:

 

Swipe from the right edge for system commands

 

Swiping from the right side of the screen reveals the charms with

system commands.

 

Mouse equivalent: Place the mouse pointer in the lower-right or

upper-right corner of the screen.

 

 

Swipe in from the left to switch apps

 

Swiping in from the left reveals thumbnails of your open apps so

you can switch to them quickly.

 

Mouse equivalent: Place the mouse pointer in the upper-left and

click to cycle through apps or lower-left corner of the screen to see

the Start screen.

 

Swipe in and out on the left brings up previously used apps

 

Swiping in and back out on the left brings up the most recently

used apps and you can select an app from that list.

 

Mouse equivalent: Place the mouse in the upper left and slide

down the left side of the screen to see the most recently used apps.

 

 

Swipe from the bottom or top edge for app commands

 

App commands are revealed by swiping from the bottom or top

edge. You can swipe from the top to the bottom of the screen to

dock or close the current app.

 

Mouse equivalent: Right-click the app to see the app’s commands.

 

 


 

Drag an app to close it

 

You don’t have to close apps. They won’t slow down your PC

and they’ll close on their own if you don’t use them for a while.

If you still want to close an app, drag the app to the bottom of

the screen.

 

Mouse equivalent: Click the top of the app and drag it to the

bottom of the screen.

 

Press and hold to learn

 

You can see details when you press and hold. In some cases,

pressing and holding opens a menu with more options.

 

Mouse equivalent: Point to an item to see more options.

 

 

Tap to perform an action

 

Tapping something causes an action, such as launching an app or

following a link.

 

Mouse equivalent: Click an item to perform an action.

 

 

Slide to drag

 

This is mostly used to pan or scroll through lists and pages, but

you can use it for other interactions, too, such as moving an

object or for drawing and writing.

 

Mouse equivalent: Click, hold, and drag to pan or scroll. Also,

when you use a mouse and keyboard, a scroll bar appears at the

bottom of the screen so you can scroll horizontally.

 

 


 

 

Pinch or stretch to zoom

 

Zooming provides a way to jump to the beginning, end, or a

 

specific location within a list. You can start zooming by pinching

or stretching two fingers on the screen.

 

Mouse and keyboard equivalent: Hold down the control key

on the keyboard while using the mouse wheel to expand or

shrink an item or tiles on the screen.

 

Rotate to turn

 

Rotating two or more fingers turns an object. You can turn the

 

whole screen 90 degrees when you rotate your device.

 

Mouse equivalent: Support for rotating an object depends on

 

whether the specific app supports it.

 

 

Discover fast and fun ways to get around

 

Pinch and stretch to zoom in and out. Get a global view of everything that’s on your screen,

 

and slide back and forth to find what you’re looking for. You can easily do all these things

 

with a mouse, too. Swipe in from the left to switch between recently used apps.

 

 


 

 

Swipe in from the right to get back to Start and to other things you

do often, like searching, sharing, and changing your settings. Swipe

in from the bottom or top to see navigation controls for any app

you’re in.

 

 

Search, share, change settings, and more

 

No matter where you are in Windows 8 or Windows RT—an app, website, or your Start

screen—it’s easy to do the things you do all the time, like search, share links and photos,

 

connect devices, and change settings.

 

What you can do changes depending on where you are, so start exploring. Search, Share,

Start, Devices, and Settings are always available on the right side of your screen. With touch,

swipe in from the right edge, and then tap what you want. With a mouse, move your pointer

into the upper-right or lower-right corner, and then move it up or down.

 


 

 

Apps work together

Things that used to take lots of different programs to do now flow together in one simple experience. Use the Photos app to see all your photos from Facebook and Flickr, and then upload them to your blog or send them to friends via email—all without ever leaving the app. Get to the controls the same way in every app. Swipe in from the top or bottom edge, or right click with a mouse, and the app commands pop right up. To search, swipe in from the right and tap Search (or point to the upper-right corner with a mouse, and then click Search) to find things within the app, on your PC, or on the web.

Search

Search for anything anywhere. You can search just the app you’re in, for example to find a message in Mail or an app in the Store, search another app, or search your entire PC for an app, setting, or file.

Share

Share files and info with people you know or send info to another app, all without leaving the app you’re in. You can email photos to your mom, update your Facebook status, or send a link to your note-taking app.

 

Start

 

Get to your Start screen. Or if you’re already on Start, go back to the

last app you were in.

 

Devices

 

Use all of the devices that are connected to your PC, both wired and

wireless. You can print from an app, sync with your phone, or stream

your latest home movie to your TV.

 

 

Settings

 

Change settings for apps and your PC. You’ll find settings, help,

 

and info for the app you’re in, plus common PC settings, such as

 

network connection, volume, brightness, notifications, power,

 

and keyboard.

 


 

 

Snapping apps

Snap apps side by side2 so you can do two things at once. Chat with a friend while arranging a date for coffee, or watch a video while getting some work done. It’s easy to do two things at once with Windows 8 and Windows RT.

The Windows keyboard speaks your language. On touch-enabled PCs, the keyboard layout automatically adjusts to the language you use on your PC. Whether you want to type and click, or swipe and tap, you’ll be able to do it in the language of your choice.

The touch keyboard has two modes, so you can switch between a full-sized keyboard with large buttons, and a handy thumb keyboard that splits the keys on either side of the screen. Each touch keyboard makes typing easier, more comfortable, and more natural whether you’re sitting down or walking around.

2 A 1366 x 768 minimum screen resolution is needed to snap apps side by side.

 


 

 

Keyboard shortcuts

With Windows 8 and Windows RT, you can use all the keyboard shortcuts you already know,

as well as some new ones for even greater efficiency. For example, the easiest way to search

on the Start screen is to simply start typing. Not on the Start screen? Press the Windows

logo key or button and you can quickly switch between the Start screen and the app

you’re in. Here are some of the new keyboard shortcuts for Windows 8 and Windows RT.

Windows logo key + start typing: Search your PC
Ctrl+plus (+) or Ctrl+minus (-): Zoom in or out of many items, like apps pinned to the Start screen or in the Store
Ctrl+scroll wheel: Zoom in or out of many items, like apps pinned to the Start screen or in the Store
Windows logo key + C: Open the charms
Windows logo key + F: Open the Search charm
Windows logo key + H: Open the Share charm
Windows logo key + I: Open the Settings charm
Windows logo key + K: Open the Devices charm
Windows logo key + O: Lock the screen orientation (portrait or landscape)
Windows logo key + Z: Open commands for the app
Windows logo key + PgUp: Move the Start screen and apps to the monitor on the left (apps in the desktop won’t change monitors)
Windows logo key + PgDn: Move the Start screen and apps to the monitor on the right (apps in the desktop won’t change monitors)
Windows logo key + Shift+period (.): Snap an app to the left
Windows logo key + period (.): Snap an app to the right

 


 

 

Your PC, your apps,

your choice

 

Touch-enabled HD screens and ultra-responsive performance give you the best Windows 8

and Windows RT experience, and powerful graphics.

 


 

 

Truly innovative hardware

 

Windows 8 and Windows RT are available on a wide range of devices that showcase the

latest innovations. It’s an exciting time to buy a new PC because of the choices you have and

the way hardware has evolved to truly keep pace with your life. Versatile convertibles give

you the style and mobility of a tablet, but can be quickly turned into a laptop so you can use

the keyboard when you need it.

 

Tablets and convertibles are designed to work as hard as they play. Lightweight tablets are

blazing fast with longer battery life to go where you go, do what you do, and never slow

you down. Windows 8 is designed for faster, more secure startup. Many Windows 8 PCs will

resume and connect to your networks before you even notice.

 

To keep you connected while on the go, Windows has added support for mobile data

 

networks, so you’ll be able to find tablets and laptops with built-in support for your favorite

 

mobile carrier.

 

Windows provides support for a full range of environmental sensors that make it possible for

Windows-based PCs and apps to react to what you’re doing. You can buy a new Windows-based

PC with these sensors built in, helping apps tell you where you are, react when you tilt

or rotate the screen, do fun and interesting things when you shake the screen, automatically

change screen brightness based on changes to lighting, and even let you touch two devices

together to share a photo or a webpage.

 


 

Windows 8 and Windows RT also introduce new ways to connect a new generation of

devices, such as USB 3.0, which supports up to 10 times the speed of current USB, and

 

Bluetooth Low Energy (BLE)—a great eco-friendly innovation with much improved

power efficiency.

 


 

 

Windows Store: All the apps you want

 

Discover all the great apps in the Windows Store for your Windows 8 or Windows RT

PC. You can browse through games and social media apps, download your favorite

entertainment app, compare photo, music, and video apps, and get apps that other people

 

have already rated and reviewed. Lifestyle, shopping, travel, finance, productivity, and more.

 

Our categories cover all the bases.

 

When you click the Top free and New releases tiles, you’ll see the newest and highest-rated

apps. And if you know what you want, just start typing or use the on-screen keyboard when

you’re in the Store,3 and you’ll instantly see results for apps that match your search.

 

 

3 Requires an active Internet connection and 1024 x 768 minimum screen resolution to access the Windows Store

and to download and run apps.

 


 

 

Check out the featured apps we’ve highlighted for you in the Spotlight section. In many

 

countries and regions, we regularly showcase a different set of apps that truly shine on

 

Windows 8 and Windows RT. We’re always on the lookout for fabulous new apps and

showcase those we think you’ll love.

 

Here’s a short list of great apps you might find in our Spotlight.4

 

Larousse

 

Find exceptional content in French exclusively designed for Windows 8! You get a French

language dictionary, an encyclopedia, a thesaurus, an atlas of 200 maps, chronologies,

idioms, image galleries, and learning games. Larousse is an excellent source of knowledge to

answer all of your questions, so you can learn, fact-check, and have fun!

 

 

Cut the Rope

 

A mysterious package has arrived, and the little monster inside has only one request…

CANDY! Help get the candy to Om Nom, the lovable star of the game, in this highly

innovative and addictive puzzle game. Combining realistic physics with simple, yet accurate

 

and precise touch control, Cut the Rope is an original and fun-filled game.

 

 

eBay

 

The eBay app for Windows 8 and Windows

RT lets you tap into the world’s largest

marketplace anywhere you are. It’s a free

app built with eBay users in mind that will

help you get more out of your buying and

selling activity.

 

 

Condé Nast Collection

 

The Condé Nast Collection opens the door

 

to a treasure trove of artwork that defines

 

culture, style, and generations that you

 

won’t find from any other source.

 

4 Not all apps are available in all markets.

 


 

 

Skyscanner

 

Need a cheap flight, fast? Search millions of routes on over 1,000 airlines and find the

lowest-priced flights in seconds with the free Skyscanner Windows app; save money,

 

save time. Skyscanner sources the best deals and then connects you to the airline or

travel agent to make your booking directly, so you always get the best deals. It’s simple,

 

independent, and finds the lowest fares fast.

 

 

iCookbook

 

Award-winning iCookbook is the first

 

recipe and cooking app for Windows 8 and

Windows RT! iCookbook comes loaded with

more than 2,000 hand-selected and kitchen-tested

recipes with an easy-to-read Prepare

feature! iCookbook includes the same

number of recipes as more than 20

 

full-size cookbooks.

 


 

 

Family Safety

 

Family Safety is now an integrated part of Windows 8 and Windows RT, so it’s easier than

ever to keep track of when and how your kids use your PC, and to set limits on exactly

which websites, apps, and games they’re allowed to use. Family Safety monitors your kids’

activities and lets you know what they’re doing. By connecting your account, you’ll even be

able to get email reports right in your inbox.

 

If you’re worried about your kids downloading apps, you can use Family Safety to control

 

what they can download and see in the Windows Store. Set up a rating level to filter the

 

apps according to your preferences.

 

Xbox 360 and Windows 8 and Windows RT

 

Xbox Music and Video are the new way to get great entertainment on Windows 8 and Windows

RT. Enjoy the latest movies, TV shows, and music on your tablet or PC. Get access to great

 

games made especially for Windows—from the latest hits to your favorite classics. Plus,

 

use Xbox SmartGlass5 on your tablet or PC as a second screen with your Xbox 360 for an

 

enhanced experience. Even flick your photos, home movies, and music right from your

 

tablet or PC to your TV.

 

You can use touch, or mouse and keyboard, to browse and control what’s playing with the

app. And the app shows you detailed info about the movies, TV shows, games, and music

that you’re enjoying.

 

Entertainment is more amazing with Windows 8 or Windows RT and Xbox. On your PC, when

you sign in with the Microsoft account associated with your gamertag, you’ll be automatically

signed in to any Xbox LIVE app that you open in Windows 8 or Windows RT.

 

5 Coming end of 2012. Available content and features vary by device. Second screen control is available

with select games and Xbox LIVE content. Additional fees and requirements apply for some content.

See xbox.com/live.

 


 

 

Reimagined,

but familiar

 

 

Windows 8 and Windows RT are built on the rock-solid foundation of Windows 7, but

they’ve been improved on all fronts and designed to work with great new hardware

and devices. With radically updated designs, touch support, and instant resume, it’s a

new era for PCs.

 


 

 

Strong on fundamentals

 

Blazing fast with incredibly fast start-up times, longer battery life, and responsiveness you

 

can feel in your fingers, Windows 8 PCs are humming with a new power.

 

More secure

 

Bad things can happen to good PCs. From hackers, viruses, worms, spyware, and other types

 

of malware, your PC has a lot to fight against. Windows 8 and Windows RT help to ensure

 

that you’re protected at all times.

 

Windows 8 and Windows RT transform the way you use your PC. They’re beautiful, fast,

 

and fluid, with Windows productivity when you need it. Go ahead and push the limits. The

features used most often by power users are now even more flexible and efficient.

 

 

Restore and reset your PC

 

Windows 8 and Windows RT come with a number of options to restore your PC. You can

even refresh your PC, which keeps all of your documents, accounts, personal settings, and

Windows Store apps, but returns Windows to its original state.

 

 

Multi-monitor support

 

If you want to get more things done at once, consider using multiple monitors. Read a

 

report on one while creating a presentation with another, or find apps in the Windows Store

 

while you skim the latest reviews online. And some apps, like coding and video editing apps,

are faster to work with when they’re spread across two or more monitors. Windows 8 and

Windows RT have been designed to give you new, powerful, multi-monitor options. Use the

four corners of each monitor to open the Start screen, charms, and your recent apps. Open

desktop apps on all monitors, or open apps from the Windows Store on one and desktop

 

apps on another. The flexibility of the new multi-monitor support will give you an edge

 

in productivity.

 

A new take on Task Manager

 

You can use the updated Task Manager to

quickly see which apps and services are

using resources on your PC. And if you want

to see the nitty-gritty details of what your

network connections and hardware are up

to, or control the apps that run at startup,

 

it’s all easy to find and control. Color-coded

 

tiles help you to quickly see what’s going on.

 

You can keep track of resources efficiently

 

and immediately with graphs and details on

processes, apps, and history.

 

File Explorer revamped

 

Whether you’re a person who puts all your

 

files into a single folder or has dozens of

 

folders and never throws anything away,

you can use File Explorer (previously called

Windows Explorer) to get a handle on your

 

files. The new ribbon in File Explorer makes

 

it a lot easier to do what you do most often,

 

such as copy and paste files, show file

extensions, and search for files based on

 

date, type, or other properties. We’ve even

brought back a long-lost favorite:

the Up button.

 


 

 

Windows 8: Powerful for work and play

Windows 8 has been reimagined to be all about you. Put what matters most right on your

Start screen, and get instant access to your people, apps, sites, and more, so you can spend

less time searching and more time doing. Windows 8 is smooth, intuitive, and designed to

let you do what you want, the way you want.

 


 


 

 

Windows and devices

 

Windows 8 supports a wide range of devices, including printers, cameras, media players,

and displays. They’re designed to just work when you plug in your device. A Windows Store

device app is an app that lets you work with your hardware.

 

For example, you can plug in a scanner and Windows will automatically download the

associated Windows Store device app, when available. This app would allow you to preview,

 

scan, and configure the scanner’s settings. Windows 8 generally works with the same

 

peripheral devices and apps that work with Windows 7.

 

Windows 8 Pro: Ready for business

 

With Windows 8 Pro, you get everything in Windows 8, plus enhanced features that help

 

you easily connect to company networks, access files on the go, and more.

 

Access and protect your data

 

Encrypt your data with BitLocker on Windows 8 PCs to help keep your work safe and

 

confidential, even if your PC is lost or stolen.

 

Windows 8 Pro also lets you connect to your company networks using domain join.

 

For ultimate flexibility and data access, you can set up your work PC to allow remote

 

connections and access it with Remote Desktop when you’re on the go.

 


 

 

Windows RT: Fast

and connected for

your life on the go

 

 


 

 

Do more with apps

 

Great built-in apps like People, Mail, Photos, and Messaging work together to make things

easier and power you through essential tasks. And you can do more with a world of apps

at the Windows Store. Windows RT works exclusively with apps from the Windows Store, so

you know your software is always compatible.

 

Get one to go

 

Extraordinary battery life means you can go longer. And connected standby in Windows RT

keeps your apps in sync, even in sleep mode, so your PC turns on and is ready to go when

you are.

 

Get more done

 

Windows RT comes with Office Home & Student 2013 RT Preview6 so you can do more

right out of the box with touch-optimized, new versions of Microsoft Word, Excel, and

PowerPoint.

 

Stay more secure

 

Windows Defender, Windows Firewall, and Windows Update are always on in Windows

RT so you’re always more secure with the latest protection7. Windows RT also comes with

Device Encryption so your information is safer, too.

 

Have it both ways

 

Windows RT is built for touch so you can work, play, and customize in whole new ways, but

it also works great with mouse and keyboard so you can get to work when you need to.

 

6 Preview edition installed. Final Office version will be installed via Windows Update when available (free

download; ISP fees apply). Some features and programs unsupported. See http://office.com/officeRT

 

7 Internet access (with any fees that apply) is required to update. Windows Update and Defender can’t

be disabled.

 


 

 

Windows 8

and Windows RT

are here

 

Whether you’re working or playing, at home, in the office, or on the go, Windows 8 and

 

Windows RT are designed for you. You decide how you want to sign in, which PCs you want

your apps on, who else can use your PC, and the colors, pictures, and organization style that

works best for you. Get a great set of apps out of the gate, and build on it from the myriad

of apps available in the Store. Get the protection, speed, and reliability you’re used to from

Windows and then some.

 

With Windows, there are endless possibilities.

 


 

 

Five ways Microsoft Office 365 will make your workday more productive

If you’re burning the candle at both ends and juggling endless priorities, Office 365 can probably help. And from as little as £3.90 per month – less than a fifth of other services like business phone contracts – it won’t burn a hole in your pocket. Oh, and there’s a 90-day free trial, too. Here are our top five ways Office 365 can buy you some time.

1. Work Anywhere

The principle of Office 365 is to give you access to any information (emails, documents, calendars, files) on any device (PC, laptop, tablet, phone), anywhere with an internet connection. You can even work on other people’s machines, because all your services are available through nothing more than a web browser – ideal for freelancers! Change emails, contacts or calendar entries on one device and they will automatically be updated on all your other devices, so you can move seamlessly from one location to the next.

2. Get a website up in minutes

Just starting out in business? You’ll want a website, then. With SharePoint Online in Office 365, you can create simple template-driven websites from scratch in a matter of minutes. You can have your own domain name (www.mycompany.com) too. Customise it with the Site Designer for something more glamorous, or get a basic site online and congratulate yourself on another marketing job done and dusted.

3. Get the gift of Presence

With Presence, you can see in real-time who amongst your contacts (both in your organisation and beyond) is available to talk. In Outlook, coloured markers denote “Available”, “Away”, “Busy” and other statuses. You’ll know straight away whether to phone, send an Instant Message or email – all of which you can do with one click. Plus, you can of course set your own status to open the door when you’re free or keep people out when you’re concentrating.

4. Share with anyone you want to, safely and securely

With Team Sites, you can create document repositories for teams, departments, projects or any other group of people, including external stakeholders. Team Sites are secure yet easy to create and use, and include many of the usual functions of Office 365 (like Calendars and collaborative document editing), which means you can effectively extend your Office tools to third parties working with you on specific projects. This makes collaboration simple and natural, plus, of course, it presents a very professional image.

5. Let’s have the meeting right now

Office 365 includes Lync Online, a unified communications toolkit in the truest sense of the word: you can pick and swap between whichever tool meets your immediate needs. Nudge a contact with an Instant Message (because with Presence you can see that they’re free); then escalate into an audio or video call. Perhaps you want to share a document too? No problem: screen sharing is included, too. You can even conduct more formal online presentations, in which attendees can sign up using a web form, and then join in a virtual auditorium run directly from your desktop.

Try New Office for free now.

We, at Microsoft, are happy to announce ‘Talking Business’ to help entrepreneurs and business owners succeed. We believe in your ability to innovate and in the ability of technology to help, so to support ‘Talking Business’ we have many resources available for businesses like yours. We have an informative newsletter, blog posts by people in the industry, and a community looking to help small businesses succeed.

Microsoft Security

Privacy settings in Microsoft products

According to a Microsoft-commissioned survey for Data Privacy Day, many consumers feel they have little to no control over their data online. Microsoft urges you to use privacy tools and settings for Internet Explorer, Windows Phone 8, Xbox, and various Microsoft services.

       

See how we commemorated Data Privacy Day and read a message from Brendon Lynch, Chief Privacy Officer at Microsoft.

Security updates for February 12, 2013

Download 12 security updates for Microsoft Windows, Microsoft Exchange, Microsoft Office, the Microsoft .NET Framework, Microsoft Server Software, and Internet Explorer.

Microsoft security news

Clean up Bamital botnet malware

Have you seen a page that says you might have malware on your computer? It could mean you’ve been infected by a botnet that was recently taken down by Microsoft and Symantec.

Safer Internet Day: 10 years of fostering digital citizenship

In conjunction with Safer Internet Day, Microsoft released results of the second annual Microsoft Computing Safety Index, a survey of online safety behaviors among computer and mobile-device users around the world.

Update Internet Explorer now

On January 14, Microsoft released a security update for Internet Explorer. Visit Windows Update to download and install the update, and make sure that you have automatic updating turned on.

Protect your computer

Should I use more than one antivirus program?

Find out how more can mean less when it comes to antivirus protection. Learn which versions of Windows include antivirus software and what to do if you want to use a different program.

Watch out for fake Java updates

You may have heard reports about security alerts for Java with urgent warnings to update your software. Do you know how to get real updates and avoid the fake ones that could contain a virus?

Take the Real vs. Rogue quiz

Download a free Facebook app from Microsoft that features an interactive quiz to help you tell if a security warning is from your real antivirus software or from rogue security software.

Protect yourself and your family

Recover a hacked Microsoft account

Learn how to reset your password, and take other security measures if you think someone has hijacked your Outlook.com or Hotmail email account.

Watch out for prize scams

If you’ve received an email that says you’ve won a prize or a lottery, it could be fraudulent. Find out how to spot scams like this and what to do if you’ve already responded to one.

Secure your smartphone

Whether you use a Windows Phone or another brand, these tips can help you protect your personal, work, and financial information if your phone is ever lost or stolen.

Security resources
HOW SOCIAL MEDIA MESSAGING BUILDS YOUR BRAND

When it comes to posting on your brand’s social pages, you have a wide choice of content
types, including photos, updates, video, and links. Each has its own strengths—and a
proper place in your arsenal. Regardless of what type of content you post, there are certain
messaging strategies that consistently prove to increase fan engagement across social
networks. We’ve repeatedly seen specific content strategies work over many different brand
pages—both to trigger instant engagement and to inspire fans to keep coming back for
more. In this paper, we take a look at these seven strategies, using real life examples from
companies that do a fantastic job of engaging their fan communities.
Tap into fan passions
You already know what your fans are passionate about, so make your social pages the perfect
platform for them to express their passions. If you market for a fashion brand, talk about
design, style, and haute couture. If you’re a food brand, ask for favorite recipes and opinions
on food trends. Focus on the unique personality of your fans to determine what type of
messaging or content they’ll respond to best. Here are some examples of brand pages that
excel at engaging their followers on a personal and emotional level.
Benefit Cosmetics
Benefit Cosmetics shares regular updates with
its community about its makeup products:
news about upcoming product lines, pictures of
makeup, and tips and tricks. In addition to being
product centric, however, Benefit injects a healthy
dose of personality-driven appeals to its fan
community— for example,
in recognition of the
fact that everyone has
“down days,” Benefit
prompts users to let
them know when
that feeling happens,
so that the brand
can respond with
an instant “Beauty
Boost” compliment to
brighten users’ days.
H&M
H&M understands its fans’ love for fashion. Instead of trying to replicate the experience of
shopping in the store or on the website, its page supplies a steady stream of interactive
content tailored specifically to its social audiences. On Google+, H&M publishes a range of
follower-only exclusive
collection previews,
contests, and behind-the-scenes footage, like
that of a Vogue editor
preparing for an H&M
photo shoot. Their
most popular posts
are the ones for their
top collections with
Beckham, Versace
and Marni.
Ask Simple, Closed Questions
Would you rather do a task that’s quick and easy, or one that takes time and effort? It depends
on the reward, right? Day-to-day interactions on social networks don’t really offer fans much
reward other than taking part in a community, so make sure your messaging is easy to interact
with. One strategy to ensure engagement is to ask followers questions that are a breeze to
answer. Asking open-ended questions requires fans to consider and write out their answers.
However, nothing is easier than answering a “Yes” or “No” question. The barrier to typing a one-word response, or simply clicking “Like,” is very low, so more users respond. Let’s
look at some examples of brands that make it easy for users to engage.
Tip
Photos or video media are incredibly engaging when attached to messaging updates. In just the first 100 days of the network, over 3.4 billion photos had already been shared on Google+ [1]. On Facebook, posts including a photo album, a picture or a video generate about 180%, 120%, and 100% more engagement than the average text post, respectively [2].
[1] Google, Q3 2011 earnings call
[2] Facebook, “Best Practices for your Page and Media Strategy,” 2012
VEVO
VEVO, a top music video website, shares a lot of videos and artist news with its followers. Every once in a while, however, to shake things up and keep the messaging light and fun, it balances the dialogue with a simple question. In the example below, VEVO welcomes the weekend by sharing a funny e-card image with the prompt, “TGIF! What’s your weekend jam?”
Timberland
Timberland, the outdoor clothing, hiking boots, and active sports product company, incorporates images of its product line into the majority of posts. To freshen the mix of messaging tied to the product images, Timberland regularly includes style- or image-related “closed” questions. These messages pack a double punch: the products still get featured visually in a way that is relevant to the messaging, and the question prompts a response.
Tell Users What You Want from Them
Use instructive language in your posts to make it crystal clear what you want fans to do. Believe it or not, ending a post with the instruction to “Like” or “+1” this post usually results in a markedly higher number of those actions! We found a perfect example on escape business solutions’s own fan page. The two screenshots below show content that we posted on escape business solutions’s Facebook Timeline. The posts were very similar: they were published within one month of each other, both showcased interesting pictures, and both had a similar number of impressions. But the post with the instruction to “Click LIKE if you’re as thrilled as we are” got nearly four times as many Likes as the post without the instruction. This result is consistent with the results our clients get on their pages as well. The lesson: never leave the next step up to interpretation. Tell fans exactly what you want them to do.
Tip
Question posts and fill-in-the-blank posts generate about 90% more engagement than the average text post [3].
[3] Facebook, “Best Practices for your Page and Media Strategy,” 2012
Steve Madden
Steve Madden tweets its followers a variety of fashion-fabulous content, but when it has a specific action it wants readers to take, the instruction to “watch” couldn’t be clearer.
Treat Your Fans Like They’re VIPs
Do you have exclusive information that you haven’t shared or posted to your website yet?
Do you have internal photographs of your team, or videos of a company event that you won’t
be sharing any other way?
According to the research firm Razorfish, the #1 reason fans “Like” or “Follow” a brand on
social networks is to get access to exclusive content, promotions, and deals. So, give your fans
privileged content that makes them feel special. Coupons, giveaways, and sweepstakes get the highest number of entries on average. If it suits your brand, you can even give your fan
community a special name like Lady Gaga does with her “Little Monsters.”
The Washington Redskins
The Washington Redskins regularly give fans opportunities to take part in exclusive VIP events. For example, the Redskins host a monthly live Google+ Hangout on Air with a rotating selection of players. Every month the Redskins give fans the opportunity to be one of the nine live attendees of the Hangout (while the rest of the community watches the video stream live, and later via recording). The Redskins get their social communities excited about upcoming Hangouts via an active messaging schedule, alerting fans to the next athletes scheduled to appear in Hangouts and how fans can sign up to be one of the exclusive nine who get to chat with them live.
Dunkin’ Donuts
Dunkin’ Donuts does a great job using the Facebook VIP strategy. Every week, the company encourages its fans to submit photos of themselves with Dunkin’ Donuts products. And each week, one lucky fan is chosen as the “Fan of the Week,” an honor which includes having their picture featured on the Dunkin’ Donuts Facebook page. For giving fans a chance at Facebook fame, Dunkin’ receives week after week of quality engagement.
Invite One-on-One Interactions
Even if you have a great connection with your brand’s social follower base, you can take the
relationship to a new level when you respond to them personally. Address your fans by name
whenever possible, and respond to their comments one-on-one. Many followers express
pleasant surprise when they receive this personal touch, because it proves that you’re listening
and are receptive to their comments and feedback. And they’ll be more likely to keep posting.
Annie’s Homegrown
Annie’s Homegrown takes the time to respond to every customer post with helpful information, gratitude for the fan’s loyalty or, at the very least, a “like”. Annie’s often uses the fan’s name in replies to personalize the conversation and make the fan feel heard and appreciated, as in the tweet and Facebook post depicted here.
Tip
Be seasonal and timely with the content you’re posting. Fans are more likely to engage with topics that are already top of mind, such as current events, holidays or news. For example, posts mentioning Independence Day on July 4th generated about 90% more engagement than all posts published on that day [4].
[4] Facebook, “Best Practices for your Page and Media Strategy,” 2012
Timberland
Timberland again demonstrates a high attention to fan engagement by celebrating individual fan photo uploads. Each photo uploaded by a fan is a visual endorsement, not to mention valuable earned media, so there is high value in encouraging fan photo uploads. For example, Timberland posted an update informing the community that, in celebration of its page reaching over one million fans, it would collect and re-post favorite fan photos, giving each chosen fan the enticement of boosted exposure.
Take Your Relationship to the Next Level
It’s important not just to sit back and wait for a chance to engage. Actively invite personal
conversation with your community by soliciting their opinions on relevant topics or asking
them what types of content they want to see. Then act on their suggestions. The escape business solutions
Facebook fan page is one example of this approach working very successfully. Our fan page is
an ongoing initiative to build engagement with a community of marketers, business owners,
and social media managers interested in sharing knowledge about social media in general
(and escape business solutions in particular).
We’ve found that our most engaging posts—those that get the most feedback from our fans—
are consistently the ones that invite people to post any question they have about social media,
or to have their pages reviewed by social media professionals. The key to success is that we
actually answer all the questions that are raised. Because of our reliable, personal follow-up, our brand is trusted as one that responds. Our “Social Media Hour” has become a
popular fixture, and commenters frequently leave feedback about how helpful they find it.
Here is an example of fan feedback we received after posting that we would review any fan page that left its link in the comments of a message. We promised just a simple 10-second test, but we delivered for every single fan page that participated. Many fans were pleasantly surprised and grateful, and they let us know about it.
Humanize Your Brand
We don’t know what it is, but people get undeniably excited about a glimpse behind the scenes.
It works for DVD and Blu-ray sales, and it works on social profile pages too. Followers respond to
VIP content, and what’s more exclusive than a look inside the workings of your company?
Any messaging that humanizes your brand, adds depth to its personality, or colors its character will go over well with fans. This strategy can even add a positive new dimension to your brand, changing its image from untouchable to relatable with a few thoughtful posts. Our own social profile pages regularly display posts that share the spirit of escape business solutions with our fans. In the sample post below, we let Facebook fans in on the fun tradition of ringing a cowbell to signify the launch of a new customer campaign. This post received 50 Likes within an hour of its publication.
The same is true of the post to our Google+ community on the right, which showed an amusing effort by some women at escape business solutions to join in on Men’s Health Awareness Month activities by donning mustaches for “Movember.” The follow-up commentary from readers demonstrates their appreciation for the levity and entertainment value of the post.
Upping the Ante: Promoting Your Messages as Paid Media
Many marketers aren’t aware that an active social presence, no matter how consistently engaging, doesn’t guarantee them visibility in their fans’ and followers’ news feeds. In fact, most of a brand’s social audience never sees its page posts. According to one study, a brand’s posts on Facebook typically reach just 16% of its fans [5]. So while publishing a steady stream of content is a necessary part of an engagement strategy, it is not sufficient on its own. Organic reach alone will not get the message out to your communities at the broadest scale.
Brands can reach and engage a larger percentage of users by running paid social ads. Social ads drive engagement by tapping into the viral characteristics of social sites and broadcasting brand messages: ads generate Likes and shares, which increase visibility in the news feed. And since more than a quarter of social network users are likely to pay attention to a social ad posted by a friend [6], that news feed visibility can be quite impactful.
This extended visibility doesn’t just benefit your brand within the walls of the social network where it originated, either: paid social advertising on Google+ extends the influence of social signals into search and the rest of the web as well. In fact, advertisers that activate the social extensions feature within their AdWords account see a 5-10% uplift in click-through rates on average [7].
Stay tuned for an upcoming escape business solutions report that offers tips and best practices for engaging
your audience and amplifying your brand message.