It Looks Like You’re Building a Large Library. Would You Like Help?

SharePoint 2010 is more than just SharePoint 2007 plus a bunch of new bullet points on the box. We didn’t just haphazardly build a bunch of new features, look back at the fertile seeds we planted, and muse about how “everything should work pretty well as libraries get large.” We built, and more importantly, tested all the features you’re reading about with scale in mind. We are setting new scale targets for 2010 that go above and beyond what we set in 2007. These numbers are not final yet, but we’re shooting for tens of millions of documents in a single library, depending on some specific parameters of your scenario.

When I throw out numbers like that, I’m not talking about just big, static libraries with content that just sits there. We want you to do crazy things with SharePoint 2010 like stuff a million document sets in a single document library with workflows running every which way, a hundred different retention policies firing off actions when you least expect them, and users uploading, tagging, and searching day in and day out. All the goodness of the SharePoint platform will be available to you whether you’re building a team site, a collaborative repository, a knowledge base, or a super large archive.

Like a plump, juicy sausage, SharePoint 2010 gets much of its delicious scalability from ingredients that most people don’t need (or want) to know about. For the most part, scale just works. However, the chef (or information architect) is still a super important player. A well-planned repository will have your users coming back for seconds and writing rave reviews; a poorly planned one will have them chugging Pepto-Bismol the next morning. Just because you can stuff a bunch of documents in a SharePoint 2010 library without your server going up in flames the next day doesn’t mean that you should, at least not without first thinking through how to best use the tools available to deliver an excellent experience to your end users.

So, even though scale in SharePoint 2010 just works, you’re not going to install the bits on day 1 and have a massive, searchable, beautiful content storefront on day 2. Guidance still matters, and believe me, we know it; this blog entry is just the beginning of the content we’re planning on delivering to help you on this front. I wouldn’t even call this blog entry guidance; it’s just a primer on the features and capabilities of SharePoint 2010 that you will grow to love if you’re passionate about scale at the library level – if you want to shove a whole bunch of documents in one place and have it be a great experience for both IT and your end users.

So what are these features and capabilities? Here are a few of the most important ones that I’m going to blog about now and in the near future:

  1. We protect your database backend from dangerous queries. If you run a query against any database that requires it to scan through millions of items to find the ones you’re asking for, you’re going to balk at how long it ties up the server’s CPU. Quite frankly, SharePoint is not an exception. Even in SharePoint 2010, there is a class of user operations in certain scenarios that make unreasonable demands on the backend. For these operations, our strategy is to nip them in the bud before they’re executed, which keeps your high-demand servers healthy and responsive. Knowing when this throttling will kick in and planning for it is an important part of large list planning.
  2. We give end-users tools to find content. When you have a sea of documents, the specific one you’re looking for can seem like a needle in a haystack. Structured metadata, easy tagging, metadata navigation, and built-in search refinement make this a less daunting task in SharePoint 2010 out of the box. This is an area we are particularly passionate about; after all, what good is a hugely scalable library if your end users hate it and can’t find what they’re looking for?
  3. We help developers write excellent code. In SharePoint 2007, we didn’t give developers the right tools to write code that scaled well as the amount of content in your site grew. Even worse, it was pretty hard to tell why and when code was bad, and if your site was running slowly, which one of your ten custom web parts was bringing things down. You had to “build around” SharePoint and do things “just so” to keep this from happening. In SharePoint 2010, we give you a bunch of tools to make this story better.

Dangerous Queries

One challenge we’ve consistently seen customers run into when building large repositories on SharePoint 2007 is trouble with large containers. As the number of documents in any single container grows – either at the root of a library, or in a folder – bad things start to happen. For one, as your document to container ratio increases, it becomes harder and harder to find exactly what you’re looking for. More serious are the performance implications of large containers. Any of the out of the box ways of retrieving content from containers in SharePoint 2007 – like the All Documents view, the Explorer view, or a Content Query web part – would work, but they don’t scale very well. Loading All Documents in a library with a million items at the root would take a long time to finish. The big problem here is that you wouldn’t be the only one affected; all your friends running SharePoint sites on that same database server would experience things slowing to a crawl as well, as the database server dutifully iterated over those million documents to find the right ones.

Why does this happen? Any time you ask for content from SharePoint, you have to specify how it’s sorted – for example, the All Documents view in SharePoint 2007 asks for the top 100 results, sorted by filename. But items aren’t sorted by filename in the SharePoint content database – so, to bring you this view, SharePoint has to gather up all these million items, sort them, and finally display the 100 ones at the top of the sorted list. Imagine this as being like flipping through the residential section of a phonebook to find the first 100 addresses, sorted in alphabetical order. This would be a miserable task, because the telephone book isn’t sorted in this way – so in order to ensure your sorted list was accurate in the end, you’d have to look through the entire residential section, from start to finish, because after all, the last person listed in the phone book might live at 1000 Aardvark Lane.
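To make the phonebook analogy concrete, here is a minimal Python sketch. It is not SharePoint's code: the function name `top_n_by_filename`, the in-memory list, and the scaled-down library of 100,000 items are all assumptions made for illustration. It counts how many items have to be examined to return the first 100 filenames when the data is stored in some other order.

```python
import heapq


def top_n_by_filename(items, n=100):
    """Return the first n filenames in sorted order and the number of items examined.

    `items` stands in for an unindexed store: entries arrive in arbitrary
    order, so each one has to be looked at before the answer is known.
    """
    examined = 0

    def counted(iterable):
        nonlocal examined
        for item in iterable:
            examined += 1
            yield item

    top = heapq.nsmallest(n, counted(items))
    return top, examined


# Hypothetical library, scaled down from the million items in the post.
# It is stored in reverse alphabetical order, like the phonebook whose
# last entry lives at 1000 Aardvark Lane.
library = [f"doc{i:07d}.docx" for i in range(99_999, -1, -1)]
top, examined = top_n_by_filename(library)
print(f"returned {len(top)} rows after examining {examined} items")
```

Only 100 rows come back, but every item is examined first, which is the cost the database server pays on behalf of everyone sharing it.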

Large Lists and Fallback Queries

The laws of physics are the same in SharePoint 2010 as they were in SharePoint 2007; if you run a query that needs to touch a very large number of items, you’re going to have to wait a long time, and so will everybody else. One prominent thing we did in SharePoint 2010 is to nip these queries in the bud before they get executed. To make a long story short (you can read the long story here), a farm administrator can set a threshold which defines the maximum number of items a single SharePoint query can touch. By default this threshold is 5,000. Any library with more items than this threshold is a large list.

Let’s go back to our example of the library with one million items at the root. Say you had that library in SharePoint 2007, and you upgraded to SharePoint Server 2010. First thing you’ll see upon navigating to this library will look something like this:

clip_image001[8]

See the yellow bar above the list view? That’s a sign you have the Metadata Navigation and Filtering site feature turned on and it’s causing something magical to happen! When you load this view, SharePoint 2010 knows that you’re being greedy and asking it to scan through those million items. Since this query exceeds the maximum number of items a single query is allowed to scan (5,000) it doesn’t run the query. But who wants to stare at an empty list view? Instead of running this query as-is, SharePoint finagles it a bit and transforms it into a query that’s almost as good as the one you were asking for, but won’t make the database buckle under the pressure. In this case, we assume that it’s fairly likely that the document you’re looking for is one of the most recently created items in the library – so instead of scanning all one million items, we only scan the top 1,000 or so recently created documents, sort those by filename, and show them to you in the list view. This is what we call a simple fallback query: a query that doesn’t specify an index and asks for too many items in return, so instead of considering the entire list as being eligible for the query, SharePoint considers only the thousand or so most recently added items.
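As a rough illustration of the simple fallback described above, the following Python sketch models the decision. It is an assumption-laden toy, not how SharePoint implements it: `Doc`, `list_view`, and `FALLBACK_WINDOW` are hypothetical names, the window of exactly 1,000 items stands in for "1,000 or so", and the list is assumed to be stored in creation order so that the newest items are a cheap slice.

```python
from dataclasses import dataclass

LIST_VIEW_THRESHOLD = 5_000  # default threshold named in the post
FALLBACK_WINDOW = 1_000      # assumption: stands in for "the top 1,000 or so"


@dataclass(frozen=True)
class Doc:
    filename: str
    created: int  # hypothetical creation sequence number; larger means newer


def list_view(docs, page_size=100):
    """Model a filename-sorted view with a simple fallback.

    `docs` is assumed to be in creation order (oldest first).
    Returns (rows, is_partial); is_partial corresponds to the yellow bar.
    """
    if len(docs) <= LIST_VIEW_THRESHOLD:
        candidates, is_partial = docs, False
    else:
        # Too many items to scan: consider only the newest ones.
        candidates, is_partial = docs[-FALLBACK_WINDOW:], True
    rows = sorted(candidates, key=lambda d: d.filename)[:page_size]
    return rows, is_partial


big_library = [Doc(f"file{i:06d}.docx", i) for i in range(6_000)]
rows, is_partial = list_view(big_library)
print(rows[0].filename, is_partial)
```

In this model the first row is the alphabetically first file among the newest 1,000, not the first file in the whole library, which is why the view is flagged as partial.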

“Wait a second. You’re telling me that SharePoint throttles queries without asking me first? How on earth am I supposed to find anything in this crazy world of fallback queries and partial results?”

Let me assure you; this throttling business is a good thing. It’s a core ingredient in what makes SharePoint 2010 a resource for addressing your scale challenges. Gone are the sleepless nights where you toss and turn and worry about page faults on your database cluster resulting from Mack in Accounting stuffing 6,000 beer pong tournament photos in the root of a library in a forgotten team site in the dusty corners of your SharePoint deployment. The SharePoint 2010 feature set replaces this overarching concern with a set of well-scoped challenges; instead of worrying about every library that might get big, you get to plan for and craft experiences for the set of libraries that need to get big for business reasons.

I should mention really quickly that throttling is about more than just list views. There is a whole class of operations that involve iterating through all the documents in a list, or all the documents in a folder, that will get throttled (in other words, they will not execute) when the list or container is large. These operations include things like:

  • Adding a column to a library
  • Creating an index on a library
  • Deleting a large folder in a library

Metadata Navigation – finding and working on content in large lists

clip_image003[8]

Above is another screenshot from my million item library. This time, we’ve put a couple of SharePoint 2010 features to work. See that I have “demonstration scripts” selected in the left hand side in the tree view, and my list view is rendering without the yellow bar that’s telling me I’m only seeing newest results. That hierarchy of tags you see there represents a taxonomy, Item Type. I am browsing the documents in this library according to their Item Type; in the screenshot, I am filtering to show all documents with the value “demonstration scripts”. Here are the steps that I took to make this happen:

  1. I created a taxonomy that describes my content. You can look forward to some posts from our very own Dan Kogan on this very topic in this very blog in the near future. There’s a lot to learn here. Not just any taxonomy will do here; it needs to be one that broadly divides my content up into evenly-sized buckets. For example, if I had 990,000 demonstration scripts, the query you see above would not get me anywhere. In that case, it wouldn’t make much sense to use Item Type as a piece of metadata and a navigation hierarchy for this library; I would need to find another, more divisive way to pivot the data.
  2. I bound that taxonomy to a field in my library called Item Type. Think of a taxonomy field as a choice field on steroids. Instead of picking values out of a flat list, you pick them out of a tree.
  3. I configured that field as a navigation hierarchy. Every library now has a Metadata Navigation and Filtering settings page where you can configure navigation hierarchies (the filters you see arranged in the tree view) and key filters (the additional filters that show up beneath the tree view)

In these three easy steps, I made “Item Type” a first class navigational pivot over the data. Instead of just staring at a partial list of content at the root, I can now browse with impunity by this virtual folder structure.
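The point in step 1 about evenly sized buckets can be checked with simple arithmetic before you commit to a navigation hierarchy. The sketch below is a hypothetical planning aid, not a SharePoint API: `oversized_terms` and the sample tag counts are made up for illustration, and the 5,000 figure is the default threshold mentioned earlier.

```python
from collections import Counter

LIST_VIEW_THRESHOLD = 5_000  # default threshold named in the post


def oversized_terms(item_types, threshold=LIST_VIEW_THRESHOLD):
    """Return {term: count} for every term that tags more items than the threshold."""
    counts = Counter(item_types)
    return {term: n for term, n in counts.items() if n > threshold}


# Hypothetical, badly skewed tagging of a million-item library.
tags = (
    ["demonstration scripts"] * 990_000
    + ["white papers"] * 4_000
    + ["case studies"] * 3_000
    + ["datasheets"] * 3_000
)
print(oversized_terms(tags))
```

Any term that shows up in the result is a bucket too large to browse in full, which suggests either splitting it into child terms or pivoting on a different field.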

Here are a couple of cool aspects of this feature that aren’t apparent from a single nifty screenshot:

  1. Metadata navigation lets you slice and dice multiple ways. I might have a bunch of taxonomies on my library that classify content in different ways; for example, I might have a Products field, a Region field and a Competitors field, all bound to domain-specific taxonomies that classify the content along those dimensions. Depending on my current task, it might make more sense to filter by the Region field (for example, if I’m looking for the latest sales figures for the North America region). I get more filters than just my virtual folder; I can combine this filter with any number of key filters or list view column filters to drill down to just the content I want (for example, I want to see all demonstration scripts by the ECM team created after 2007).
  2. Metadata navigation thinks about indices and large lists so you don’t have to. Hey, remember just a few minutes ago when we were talking about large lists, indices, and being throttled? Well, metadata navigation thinks a lot about indices and how to run queries the “right way” to make them perform well and prevent throttling from happening. For starters, all the fields you configure as navigation hierarchies and key filters get indexed, and the resulting queries are written in a way that ensures the best index is used to make the query succeed.

You aren’t immune from the laws of physics; if you ask for documents tagged with demonstration scripts and there are 10,000 demonstration scripts, we’re not going to be able to show you all of them. In this case, though, you get something better than a simple fallback; you get an indexed fallback, which means that instead of considering the entire list, the query considers only the items that match the indexed portion of your query.
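Here is one way to picture an indexed fallback in Python. Treat it as a sketch under stated assumptions rather than a description of SharePoint internals: `build_index` and `query_by_term` are hypothetical helpers, documents are plain dictionaries, and capping an oversized result at the newest matches is my assumption, since the post only says that the query considers the items matching the indexed portion.

```python
LIST_VIEW_THRESHOLD = 5_000  # default threshold named in the post


def build_index(docs, field):
    """Map each value of `field` to the positions of the documents that have it."""
    index = {}
    for pos, doc in enumerate(docs):
        index.setdefault(doc[field], []).append(pos)
    return index


def query_by_term(docs, index, term, threshold=LIST_VIEW_THRESHOLD):
    """Return (rows, is_partial) using the index instead of scanning the list."""
    positions = index.get(term, [])
    is_partial = len(positions) > threshold
    if is_partial:
        # Assumption: keep only the most recently added matches.
        positions = positions[-threshold:]
    rows = sorted((docs[p] for p in positions), key=lambda d: d["filename"])
    return rows, is_partial


# Hypothetical library in which half of the documents are demonstration scripts.
docs = [
    {"filename": f"file{i:05d}.docx",
     "item_type": "demonstration scripts" if i % 2 else "white papers"}
    for i in range(12_000)
]
index = build_index(docs, "item_type")
rows, is_partial = query_by_term(docs, index, "demonstration scripts")
print(len(rows), is_partial)
```

Even when the result is partial, every row returned matches the term you asked for, which is what makes this better than the simple fallback.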

Wrap-up

This article was just the first in my series of posts about architecting and building large lists filled with discoverable content. Here’s what you can expect over the next few weeks:

  • A deep dive on metadata navigation, how it works, and some tips to getting the most out of it
  • A discussion on how other features, like Search and the Content Query Web Part, fit into the equation, and how to configure their metadata filtering capabilities
  • Some geeky developer tips on writing code that plays nicely with large lists

After that, I’ll be widening my scope a bit to talk about the overall knowledge management story in SharePoint 2010 – which is about more than just browsing for content in a library!

Lincoln DeMaris, Program Manager, ECM

Introducing Web Analytics in SharePoint 2010

As part of SharePoint Server 2010, we have created a new set of features to help you collect, report, and analyze the usage and effectiveness of your SharePoint 2010 deployment – whether it’s used as an internal or external web portal, a collaboration tool or a document and records management repository.  These features are part of the Web Analytics capabilities of SharePoint 2010.

This blog post is the first of several that will give you more insight into the enhanced Web Analytics features that we have built into SharePoint 2010. This first post will provide an overview of the new Web Analytics features, and we’ll take a deep dive into specific scenarios in future posts.

Overview

Web Analytics Reports

In SharePoint 2010, we have improved the set of Web Analytics reports that are available out-of-the-box, which will provide insights into the behavior of users of your SharePoint sites.  There are three categories of reports that you will find:

  • Traffic reports: These reports provide metrics such as:
    1. How much traffic your site gets (Number of Page Views);
    2. Who visits your sites (Top Visitors);
    3. How visitors arrive at your site (Top Referrers);
    4. Daily Unique Visitors, Top Destinations, Top Browsers, etc;
  • Search reports: These reports give you insight into what users are searching for, for example:
    1. How many times users searched (Number of Queries);
    2. What were the most used search terms (Top Queries);
    3. What queries have high failure rates (Failed Queries);
    4. Best Bet Usage, Search keywords, etc;
  • Inventory reports: These reports display key metrics regarding the inventory of your sites:
    1. What is the total disk drive space used (Storage Usage);
    2. How many sites exist (Number of Sites);
    3. Top Site Product Versions, Top Site Languages, etc;

These reports are aggregated at the following levels:

  1. Per web application in the farm
  2. Per site collection
  3. Per site
  4. Per search service application

Out-of-the-box, these reports are visible to Administrators at each level.  For example, site-level reports are available to Site Administrators of those sites.  We have also added a new permission level, “View Web Analytics Data,” that will allow users to access these reports without having to give them Administrator privileges.

You can access Web Analytics reports by going to Site Actions -> Site Settings.  Under the Site Actions heading you will see two links: Site Web Analytics Reports and Site Collection Web Analytics Reports.

When you click on either link, you are taken to an overview page that shows you key metrics for your site.  You can then drill down to other reports by clicking on them on the left navigation. You can also change the date range for the reports by clicking on the Analyze tab on the Ribbon.

clip_image001

Custom Web Analytics Reports

The out-of-the-box reports are useful for getting a general understanding of what is happening on your sites.  However, we have made it easy for you to get a deeper level of analysis, or to simply create your own reports.  To get started, click on the Customize Report button under the Analyze tab in the Ribbon.  Clicking this button will export the data contained in this report to Excel.  Excel is a powerful analytics tool that makes it easy for non-technical users to add their own charts, set specific filters, and combine data from multiple reports.  In addition, the data within Excel is refreshable, which means that, once you customize the report, you can refresh it to bring in the latest data.  To get more details on the great new features in Excel 2010 for building charts, reports and pivot tables, take a look at the Excel Team blog.

clip_image002

Web Analytics Workflows

Web Analytics Workflows is a powerful new feature set that enables you to get reports sent out either on a schedule or when specific conditions are met.  For example, you can set one up to send you an email every time the total number of page views drops by 80% week over week.
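The 80% condition is plain arithmetic, shown here as a short Python sketch. The function names are hypothetical and this is not how Web Analytics Workflows evaluate their conditions internally; it only spells out what a week-over-week drop of 80% means.

```python
def week_over_week_drop(previous_views, current_views):
    """Fractional drop in page views from last week to this week (0.8 means 80%)."""
    if previous_views <= 0:
        # Assumption: with no baseline there is nothing to compare, so report no drop.
        return 0.0
    return max(0.0, (previous_views - current_views) / previous_views)


def should_alert(previous_views, current_views, drop_threshold=0.80):
    """True when the drop meets or exceeds the configured threshold."""
    return week_over_week_drop(previous_views, current_views) >= drop_threshold


# Hypothetical numbers: 10,000 page views last week, 2,000 this week.
print(should_alert(10_000, 2_000))
```

With these numbers the drop is (10,000 - 2,000) / 10,000 = 0.8, so the condition is met; at 2,001 views it would not be.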

To set up a Web Analytics Workflow, go to the Web Analytics report that you are interested in and click on Schedule Alerts or Reports on the Analyze tab in the Ribbon.

Clicking this button will guide you through a series of steps to create your Workflow.

clip_image003

Best Bets Suggestions

Best Bets allow Search Administrators to determine what the most relevant search result is for a given keyword. Up until now, Search Administrators had to look at different reports and data to determine which best bets needed to be added. That process is no longer necessary as SharePoint 2010 periodically sends out suggestions for new Best Bets using all the search metrics it has collected. Now, Search Administrators can simply look through each of the Best Bet suggestions and easily accept or reject them.

To access the Best Bet Suggestions, go to Site Actions, click on Site Collection Web Analytics Reports, and then click on Best Bets Suggestions on the left navigation.

Web Analytics Web Part

We have created a new web part, the Web Analytics Web Part, targeted at Site Managers. This new Web Part is an end-user facing Web Part that can be easily inserted into any page on your site.  It can be configured to display the ‘most viewed content’ or the ‘most frequent search queries’ in the site. The data in the Web Part is continuously refreshed as new content or new search queries become more popular.

To use this Web Part, go into the Edit mode of one of your Site Pages and click on any place you can add a Web Part.  Then, from the Insert tab on the Ribbon, click on Web Part.  Finally, click on the Content Rollup category and select the Web Analytics Web Part.

clip_image004

After you have inserted the Web Analytics Web Part, you can then configure it to display the data you are interested in.

Conclusion

Using the new Web Analytics features in SharePoint 2010, you will be able to get a deeper understanding of what users are doing, what they want from your site and how you can tailor the SharePoint experience to best meet their needs.  Keep an eye out for future posts where we will delve deeper into each of the features mentioned above.

EDiscovery in SharePoint Server 2010

Hi everyone, I am Quentin Christensen and I work on document and records management functionality for SharePoint. Electronic discovery (commonly referred to as eDiscovery) is an area we are supporting with a new set of capabilities in SharePoint Server 2010. In case you are not familiar with eDiscovery, it is the process of finding, preserving, analyzing and producing content in electronic formats as required by litigation or investigations. eDiscovery is an important concern for all of our customers, and given that SharePoint has grown to be an integral part of collaboration, document, and records management for many organizations, we recognize the need to support the eDiscovery process for SharePoint content.

Microsoft Office SharePoint Server 2007 included a hold feature that could be used for eDiscovery, but it was scoped to the Records Center site template. With SharePoint Server 2010 the eDiscovery capabilities have been greatly expanded to provide more functionality and the power to use these features across your entire SharePoint deployment.

In this post, I want to highlight three major improvements in SharePoint that support eDiscovery. You can:

  • Manage holds and conduct eDiscovery searches on any site collection
  • Use SharePoint Server Search or FAST Search for SharePoint out of box to search and process content
  • Automatically copy eDiscovery search results to a separate repository for further analysis

Read on to learn how SharePoint Server 2010 can support your eDiscovery initiatives and provide you with the tools you need to manage holds and to identify and collect SharePoint content.

The eDiscovery Process

The Electronic Discovery Reference Model from EDRM (edrm.net) provides an overview of the different parts of the eDiscovery process:

image

SharePoint Server 2010 addresses the Information Management, Identification, Preservation and Collection stages. While this blog post will focus mostly on the identification, preservation and collection components, SharePoint provides a rich Information Management platform for Collaboration, Social Computing, Document Management and Records Management.  This means that you can take a proactive approach to eDiscovery by putting a governance framework in place and using appropriate disposition policies to expire content. Managing content and deleting it when it is no longer needed will reduce the amount of content that must be indexed, searched, and collected for eDiscovery.  The result is that eDiscovery costs can be dramatically reduced, changing the problem from finding a needle in a haystack to finding a needle in a hay bale. Ultimately, the key to achieving legal compliance for eDiscovery obligations is built upon a foundation of robust Information Management.

When an eDiscovery event occurs, such as a receipt of complaint, discovery, or notice of potential legal claim, the identification stage begins. Content that may be subject to eDiscovery must be identified and searches are conducted to find that content. That content needs to be preserved and at some point, the content will be collected.

 

The eDiscovery Features

Hold and eDiscovery

Hold and eDiscovery is a site level feature that can be activated on any site.

image

Activating this feature creates a new category in Site Settings that provides links to the Holds and Hold Reports lists. There is also a Discover and hold content page that allows you to search for content and add it to a hold. Once the Hold and eDiscovery feature is activated, you can create holds and add any content in the site collection to a hold. By default, only Site Collection administrators have access to the Hold and eDiscovery pages. To give other users permission, add them to the permissions list for the Hold Reports and Holds lists. This will also give them access to the Discover and hold content page.

clip_image005

You can manually locate content in SharePoint and add it to a hold, or you can search for content and add the search results to a hold. With the Hold and eDiscovery feature you can create holds in the hold list and then manually add content to the relevant hold by clicking on Compliance Details from the drop down menu for individual items.

image

Then click on the link to Add/Remove from hold.

image

And you can select the relevant hold to add to or remove from.

image

By manually adding an item to hold you will block editing and deletion of that item until it is released from hold. You will notice that the document now has a lock icon showing that it cannot be edited or deleted.

image

Each night a report for each hold is generated by a timer job. If you need a hold report faster you can manually run the Hold Processing and Reporting timer job in Central Administration.

Search and Process

You can manually add items to a hold on any site collection, which is great. But that doesn’t help you find the content you don’t already know about. What if you have a large number of items you want to find and add to a hold? For that you can use the features on the Discover and hold content page, which is a settings page in Site Settings. From this page you can specify a search query and then preview the results. The configured search service (SharePoint Server Search or FAST Search for SharePoint) will automatically be used. You can then select the option to keep items on hold in place so they cannot be edited or deleted, or, if you have configured a Content Organizer Send to location in Central Administration, you can have content copied to another site and placed on hold. You may want to create a separate records center site for a particular hold to store all content related to that hold. The Content Organizer is a new SharePoint Server 2010 feature based on the Microsoft Office SharePoint Server 2007 Document Router, with richer functionality to automatically classify content based on Content Type or metadata properties. Look for a future blog post covering the Content Organizer.

Holding content in place is recommended if you want to leave content in the location where it was created, with all the rich context that SharePoint provides, while blocking deletion and editing of that content. Be aware that this will prevent users from modifying items. If you prefer that users continue editing documents, then use the copy to another location approach.

When searching and processing, the search will by default be scoped to the entire Site Collection and run with elevated permissions so all content can be discovered. The search can be scoped to specific sites and you can also preview search results before adding the results to a hold. Items can be placed on multiple holds and compliance details will show all of the holds that are applied to an item.
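The hold behavior described in this post can be pictured with a small Python model. This is purely illustrative and not a SharePoint API: the `HeldItem` class and the hold names are hypothetical, and the rule that an item stays locked until it has been released from every hold is my assumption about how multiple holds combine.

```python
class HeldItem:
    """A document that tracks which holds it is on (illustrative model only)."""

    def __init__(self, name, content=""):
        self.name = name
        self.content = content
        self.holds = set()

    def add_to_hold(self, hold_name):
        self.holds.add(hold_name)

    def remove_from_hold(self, hold_name):
        self.holds.discard(hold_name)

    @property
    def locked(self):
        # Assumption: any remaining hold keeps the item locked.
        return bool(self.holds)

    def edit(self, new_content):
        if self.locked:
            raise PermissionError(f"{self.name} is on hold: {sorted(self.holds)}")
        self.content = new_content


doc = HeldItem("contract.docx", "v1")
doc.add_to_hold("Case A")
doc.add_to_hold("Case B")
doc.remove_from_hold("Case A")
print(doc.locked, sorted(doc.holds))
```

The set of hold names plays the role of the compliance details view, which lists every hold applied to an item.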

image

In summary, SharePoint Server 2010 contains key features that make it an essential aspect of your eDiscovery strategy. With the new SharePoint Server 2010 capabilities you can easily apply proper retention policies for all content and make it easier to discover content if an eDiscovery event occurs. eDiscovery often prescribes tight deadlines for production. SharePoint 2010 helps you find the right content and deliver it faster.

Quentin Christensen
Program Manager – Document and Records Management
Microsoft

SharePoint ECM in Force at the AIIM Expo in Philadelphia

As you can see from the last few posts, we are incredibly proud of the evolution of our ECM capabilities in SharePoint 2010 and in April, we are heading to the AIIM Expo in Philadelphia to give attendees the chance to try out SharePoint 2010 and hear directly from the people who built the product.  Starting on April 20th, we’ll open the doors on the SharePoint Experience Lab where you can learn about Office and SharePoint 2010, assisted by the ECM team from Redmond and some of our top field specialists.  The SharePoint Experience Lab will be in the Expo Hall where we will be joined by a number of our leading partners and best of all, if you register before the event, entry to the Expo Hall is ABSOLUTELY FREE!  That’s right, register now and you will get access to a wide range of SharePoint labs, supported by the team from Redmond.

SharePointExperienceLab

In addition to the SharePoint Experience Lab, we are proud to support the SharePoint 2010 Summit @ AIIM Expo.  The SharePoint 2010 Summit @ AIIM Expo consists of almost 30 sessions delivered by the SharePoint ECM Team, customers and leading industry analysts.  Entry to the SharePoint 2010 Summit @ AIIM Expo is included with a conference pass that you can pick up for just $599 (UPDATE – Advanced registration has been extended.  Enter code A525G to receive a $50 discount).  Not a bad price to ask all the questions you ever wanted answered about SharePoint and get the inside scoop from senior product and program managers as well as Eric Swift, the General Manager of the SharePoint Marketing Group.

Here is an overview of the content being delivered by Microsoft speakers at the SharePoint 2010 Summit @ AIIM Expo:

  • Introducing SharePoint 2010
  • ECM for the Masses: How SharePoint 2010 Delivers on the Promise
  • SharePoint and Office: What’s New in 2010
  • Overview of Social Computing in SharePoint 2010
  • Web Content Management in SharePoint 2010
  • Growing SharePoint from Small Libraries to Large Scale Repositories & Massive Archives
  • Visual Customization Overview: Theming & Branding For Any Site
  • Using Enterprise Content Types & Managed Taxonomies in SharePoint 2010
  • Using SharePoint Analytics and End User Feedback to Optimize the Content and Organization of your SharePoint Sites
  • Document Management in SharePoint 2010
  • Building Rich, Immersive Sites with Microsoft Tools &  Technologies
  • Enterprise Search Overview
  • Delivering BI to the Masses at Microsoft
  • Building an Enterprise Knowledge Management Solution on SharePoint 2010
  • Records Management Strategies in SharePoint 2010
  • Better Together Collaboration with SharePoint 2010, Office 2010 & More!
  • Managing and Sharing Digital Assets in SharePoint 2010
  • If You Build It, They Will Come: Driving End User Adoption

With the launch of Office and SharePoint 2010 set for May 12th and our intent to RTM (Release to Manufacturing) this April 2010, there has never been a better time to hear from the team that built the product and get the knowledge you need to make SharePoint successful within your business.  Spring is coming to Philadelphia and with it comes SharePoint 2010 and the SharePoint ECM team.  We look forward to seeing you at the AIIM Expo.

Ryan Duguid
Senior Product Manager – ECM and Compliance
Microsoft
