One of the most talked-about capabilities since the launch of SharePoint 2010 is the Managed Metadata Service. For those of you who aren’t already familiar with this service and the support it provides for modeling and deploying a rich corporate taxonomy, I’d recommend reading Pat’s post Introducing Enterprise Metadata Management. For those of you who are familiar with the great taxonomy capabilities in SharePoint 2010, I’m sure many of you have spent time looking at an empty term store wondering where to start. If you’re lucky, you already have a well-defined corporate taxonomy and should by now have used our import capabilities to preload SharePoint with the vocabulary you want your users to apply when tagging and finding content. On the other hand, you could be like many customers I talk to who either don’t know where to start when it comes to developing a taxonomy, or have spent years in conference rooms debating what the right taxonomy should be. You’ve probably even heard someone say “I’m sure someone has already solved this problem”, and if so, that someone was the smartest person in the room, for two key reasons. The first is that professional taxonomists have already modeled most business domains; the second is that the people responsible for creating content in your company have already developed a community vocabulary, or folksonomy, that they use extensively.
If you happen to be one of those customers stuck looking at an empty term store, then I’ve got great news for you. The SharePoint team has partnered with WAND, a leading provider of enterprise taxonomies, to make WAND’s General Business Taxonomy available as a free download. The General Business Taxonomy consists of around 500 terms describing common functional areas that exist in most businesses. It can be imported into the SharePoint 2010 term store within minutes and provides a great starting point for customers looking to build a corporate vocabulary and take advantage of the Managed Metadata Service. In addition to this free download, WAND provides a range of taxonomies covering a variety of domains, including Products and Services, Local Search, Enterprise, Jobs, Travel, Medical, Lifecycle, Finance and Records Retention.
Download the General Business Taxonomy today and start to explore the benefits that taxonomy can bring to your business and your people.
If you’re new to taxonomy and the benefits it can bring to your business, take a look at the following sites:
Ryan Duguid
Senior Product Manager
Microsoft Corporation
Hi there, my name is Pat Miller, and I am the development lead for the Enterprise Metadata / Taxonomy features in SharePoint 2010. I’ve been working on the ECM team and its forebears for the better part of 11 years now: first with NCompass Labs, which was acquired by Microsoft in 2001, then on the Content Management Server team, then with the CMS team as part of MOSS 2007. This is the first of many blog posts on the Enterprise Metadata Management (EMM) system in the 2010 release. This one is an overview of the system; future posts will drill into specific areas like event receivers, field editing and search refinements.
First, some background. At one point during the development of Content Management Server 2002, we spent some time with the folks who run the Microsoft.com set of websites. One of the things they were very keen on was a taxonomy system they had built. It seemed fairly useful, and we considered implementing something like it, but we didn’t have the time, and there was a general concern that no one would actually do the work of tagging data. During the development of MOSS 2007, we spent most of our time rewriting our feature set to run on top of SharePoint, and once again taxonomy fell off the list of things we were willing to tackle (and people still consistently told us that users just don’t tag).
Around this time, people started tagging things in their own world. The rise of digital cameras and MP3 players brought a huge amount of data that, for the most part, had to be marked up with metadata in order to be searchable. Some metadata was added to the files automatically (things like date, size, camera model, etc.), but user-specific information wasn’t there. You quickly learned that if you categorized your files (either through folder location or tags), you could navigate your way through tens of thousands of them (images, music, etc.) in the way that works for you personally, rather than relying on default information like the date a picture was taken. People became more familiar with the concept of navigating their content via metadata: "Let’s listen to all my Pearl Jam albums", "I feel like listening to Electronica", "Find me photos of Dad". It’s only a small step from there to wanting to impose some sort of hierarchy: find me photos of my whole family, or my extended family; I want to listen to all classical music, or perhaps just music from the Baroque period. Tagging all that data really unlocked a lot of potential.
Perhaps the landscape had changed…
We decided to run with it in the 2010 release. There were a few main tenets that we tried to let guide us:
To that end, we set out to enable a bunch of new user scenarios for SharePoint 2010.
We started out the release with a blank sheet of paper and some very knowledgeable people in the information management space. We also found that most people started twitching uncontrollably when the word "ontology" was mentioned. "Tagging" was fine, "metadata" was OK, but at "taxonomy" they started looking for an exit. Telling people that a taxonomy was just a hierarchy calmed them down, but the whole ontology thing was too much of a stretch. Ontologies also complicated things considerably, and we could still get a huge amount of value out of a plain taxonomy, so that became our starting point.
Some features were very obvious – filtering list views based on hierarchy inclusion, search refinement, etc. Some were a small step from this – if you have a consistent vocabulary across an enterprise, you can start to do some interesting things. You can match areas of expertise to specific content or workflows. You can start to relate content in totally different systems based on something with more context than a simple string. What if you could relate your analytics content to your taxonomy system and get a real-time view of what topics people are viewing, instead of simply guessing from the content’s position in a URL namespace? How about overlaying your security model with your metadata, so that certain people had rights to view content based on the metadata applied to it? Before chasing all of that, though, we had to get down to business, focus our resources, and ship a compelling collection of features.
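The music example earlier in this post gives a compact way to see what "filtering based on hierarchy inclusion" means: a filter on a parent term also matches content tagged with any of its descendants. The Python sketch below models that with a hypothetical in-memory Term class; it is an illustration of the idea only, not the SharePoint 2010 taxonomy API or how the product stores terms.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """A node in a hypothetical term hierarchy (illustration only)."""
    name: str
    parent: Optional["Term"] = field(default=None, repr=False, compare=False)
    children: list["Term"] = field(default_factory=list)

    def add_child(self, name: str) -> "Term":
        child = Term(name, parent=self)
        self.children.append(child)
        return child

    def descendants_and_self(self) -> set[str]:
        # Walk the subtree so that filtering on a parent term also
        # matches content tagged with any of its child terms.
        names = {self.name}
        for child in self.children:
            names |= child.descendants_and_self()
        return names


def filter_by_term(items: dict[str, str], term: Term) -> list[str]:
    """Return item names whose tag falls anywhere under `term`."""
    allowed = term.descendants_and_self()
    return sorted(name for name, tag in items.items() if tag in allowed)


# Hierarchy borrowed from the music example above.
music = Term("Music")
classical = music.add_child("Classical")
baroque = classical.add_child("Baroque")
romantic = classical.add_child("Romantic")
rock = music.add_child("Rock")

library = {
    "brandenburg.mp3": "Baroque",
    "chopin_nocturne.mp3": "Romantic",
    "ten.mp3": "Rock",
}

print(filter_by_term(library, classical))  # ['brandenburg.mp3', 'chopin_nocturne.mp3']
print(filter_by_term(library, baroque))    # ['brandenburg.mp3']
```

Nothing is tagged "Classical" directly, yet both classical pieces come back, which is the behavior a plain string match on the tag value would miss.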
To that end, we came up with the following components in the system:
The taxonomy repository itself, which we call the Term Store. Some companies have very strict, top-down taxonomies, so some term stores might allow only a handful of people to edit them. We also need to support multiple term stores.
The taxonomy system needs to be able to support a complex enterprise. A simple flat list of strings isn’t going to be sufficient. To that end, we support the following concepts and behaviors:
OK, that’s a nice set of features in the taxonomy system. What do we want to do with all those terms and term sets?
The next set of features involves integrating the taxonomy system with SharePoint. The primary place this happens is in the new managed metadata field type. Think of it as a choice field that went to the gym: it’s much more powerful. The managed metadata field type is a normal field that can be applied to any content type (list or document library). However, it has a few nice things associated with it:
Once data is in SharePoint, other SharePoint features can deliver additional goodness:
Now that we have all that nice consistent metadata on our content, we can do a few more things:
And since we know that we can’t possibly implement every feature that everyone would want, everything is accessible through our API. In future blog posts, we’ll go over how to use this API to deliver some compelling features.
Hopefully this is a nice introduction to the work we did around taxonomies and enterprise metadata. We had a lot of fun coming up with the design and implementation, and hope that it resonates with you.
Thanks for reading.
Pat.Miller at Microsoft.com
SharePoint 2010 is more than just SharePoint 2007 plus a bunch of new bullet points on the box. We didn’t just haphazardly build a bunch of new features, look back at the fertile seeds we planted, and muse about how “everything should work pretty well as libraries get large.” We built, and more importantly, tested all the features you’re reading about with scale in mind. We are setting new scale targets for 2010 that go above and beyond what we set in 2007. These numbers are not final yet, but we’re shooting for tens of millions of documents in a single library, depending on some specific parameters of your scenario.
When I throw out numbers like that, I’m not talking about just big, static libraries with content that just sits there. We want you to do crazy things with SharePoint 2010 like stuff a million document sets in a single document library with workflows running every which way, a hundred different retention policies firing off actions when you least expect them, and users uploading, tagging, and searching day in and day out. All the goodness of the SharePoint platform will be available to you whether you’re building a team site, a collaborative repository, a knowledge base, or a super large archive.
Like a plump, juicy sausage, much of the good stuff that gives SharePoint 2010 its delicious scalability is stuff that most people don’t need (or want) to know about. For the most part, scale just works. However, the chef (or information architect) is still a super important player. A well-planned repository is one that will have your users coming back for seconds and writing rave reviews; a poorly planned one will have them chugging Pepto-Bismol the next morning. Just because you can stuff a bunch of documents in a SharePoint 2010 library without your server bursting into flames the next day doesn’t mean that you should, at least not without first thinking through how best to use the tools available to deliver an excellent experience to your end users.
So, even though scale in SharePoint 2010 just works, you’re not going to install the bits on day 1 and have a massive, searchable, beautiful content storefront on day 2. Guidance still matters, and believe me, we know it; this blog entry is just the beginning of the content we’re planning on delivering to help you on this front. I wouldn’t even call this blog entry guidance; it’s just a primer on the features and capabilities of SharePoint 2010 that you will grow to love if you’re passionate about scale at the library level – if you want to shove a whole bunch of documents in one place and have it be a great experience for both IT and your end users.
So what are these features and capabilities? Here are a few of the most important ones that I’m going to blog about now and in the near future:
One challenge we’ve consistently seen customers run into when building large repositories on SharePoint 2007 is trouble with large containers. As the number of documents in any single container grows, either at the root of a library or in a folder, bad things start to happen. For one, as your document-to-container ratio increases, it becomes harder and harder to find exactly what you’re looking for. More serious are the performance implications of large containers. Any of the out-of-the-box ways of retrieving content from containers in SharePoint 2007, like the All Documents view, the Explorer view, or a Content Query web part, would work, but they don’t scale very well. Loading All Documents in a library with a million items at the root would take a long time to finish. The big problem here is that you wouldn’t be the only one affected; all your friends running SharePoint sites on that same database server would see things slow to a crawl as well, as the database server dutifully iterated over those million documents to find the right ones.
Why does this happen? Any time you ask for content from SharePoint, you have to specify how it’s sorted; for example, the All Documents view in SharePoint 2007 asks for the top 100 results, sorted by filename. But items aren’t stored sorted by filename in the SharePoint content database, so to bring you this view, SharePoint has to gather up all million items, sort them, and finally display the 100 at the top of the sorted list. Imagine flipping through the residential section of a phone book to find the first 100 addresses in alphabetical order. This would be a miserable task, because the phone book isn’t sorted that way; to be sure your sorted list was accurate, you’d have to look through the entire residential section from start to finish, because after all, the last person listed might live at 1000 Aardvark Lane.
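The phone book problem can be shown in miniature. In the Python sketch below, the rows are deliberately stored in an order unrelated to the sort key, and a counter records how many rows are read to produce just the first 100 by filename. The function name and data are hypothetical; this illustrates why an unindexed sort must touch every row, not how SharePoint or SQL Server actually executes the query.

```python
import heapq


def top_n_unindexed(filenames: list[str], n: int = 100) -> tuple[list[str], int]:
    """Return the n alphabetically-first filenames and how many rows were read.

    `filenames` stands in for rows stored in arbitrary order with no index
    on the sort column, so there is no way to stop early: the very last row
    read could still belong at the top of the sorted result.
    """
    examined = 0

    def scan():
        nonlocal examined
        for name in filenames:
            examined += 1
            yield name

    # heapq.nsmallest keeps only n candidates in memory, but it still has
    # to consume every row before it can return. The tuple is evaluated
    # left to right, so `examined` is read after the scan has finished.
    return heapq.nsmallest(n, scan()), examined


# 100,000 rows stored in reverse order: the alphabetically first filename
# is the last row read (our "1000 Aardvark Lane").
rows = [f"doc{i:06d}.docx" for i in range(100_000, 0, -1)]

top, examined = top_n_unindexed(rows, 100)
print(top[0], examined)  # doc000001.docx 100000
```

Only 100 results come back, but all 100,000 rows were read to get them; growing the list grows the work, no matter how small the page is.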
The laws of physics are the same in SharePoint 2010 as they were in SharePoint 2007; if you run a query that needs to touch a very large number of items, you’re going to have to wait a long time, and so will everybody else. One prominent thing we did in SharePoint 2010 is to nip these queries in the bud before they get executed. To make a long story short (you can read the long story here), a farm administrator can set a threshold which defines the maximum number of items a single SharePoint query can touch. By default this threshold is 5,000. Any library with more items than this threshold is a large list.
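A toy model of this check, in Python: the constant mirrors the 5,000 default mentioned above, while the function and exception names are hypothetical and not part of SharePoint’s object model. The point it illustrates is that the decision is made before any rows are read.

```python
LIST_VIEW_THRESHOLD = 5_000  # the default from the text; a farm admin can change it


class QueryThrottledError(Exception):
    """Hypothetical error raised instead of running an over-threshold query."""


def is_large_list(item_count: int, threshold: int = LIST_VIEW_THRESHOLD) -> bool:
    """A list with more items than the threshold counts as a large list."""
    return item_count > threshold


def run_query(items_to_touch: int, threshold: int = LIST_VIEW_THRESHOLD) -> str:
    """Check the cost first, so nothing is read if the query is too expensive."""
    if items_to_touch > threshold:
        raise QueryThrottledError(
            f"query would touch {items_to_touch:,} items; limit is {threshold:,}")
    return f"ran query over {items_to_touch:,} items"


print(is_large_list(1_000_000))  # True
print(run_query(4_000))          # ran query over 4,000 items
try:
    run_query(1_000_000)
except QueryThrottledError as err:
    print("throttled:", err)     # throttled: query would touch 1,000,000 items; limit is 5,000
```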
Let’s go back to our example of the library with one million items at the root. Say you had that library in SharePoint 2007, and you upgraded to SharePoint Server 2010. The first thing you’ll see when you navigate to this library will look something like this:
See the yellow bar above the list view? That’s a sign that you have the Metadata Navigation and Filtering site feature turned on and it’s causing something magical to happen! When you load this view, SharePoint 2010 knows that you’re being greedy and asking it to scan through those million items. Since this query exceeds the maximum number of items a single query is allowed to scan (5,000), it doesn’t run the query. But who wants to stare at an empty list view? Instead of running this query as-is, SharePoint finagles it a bit and transforms it into a query that’s almost as good as the one you asked for, but won’t make the database buckle under the pressure. In this case, we assume that the document you’re looking for is fairly likely to be one of the most recently created items in the library, so instead of scanning all one million items, we scan only the 1,000 or so most recently created documents, sort those by filename, and show them to you in the list view. This is what we call a simple fallback query: a query that doesn’t specify an index and asks for too many items, so instead of considering the entire list as eligible for the query, SharePoint considers only the thousand or so most recently added items.
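The simple fallback can be sketched in a few lines of Python. The sketch assumes items are kept in creation order, so the newest ones are just the tail of the list; that slice stands in for the database being able to find recent rows without a full scan. The Item class and the exact 1,000-item window are hypothetical illustrations ("a thousand or so" in the text), not SharePoint internals.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: int   # hypothetical: grows with creation time
    filename: str


def simple_fallback(items: list[Item], page_size: int = 100,
                    window: int = 1_000) -> list[Item]:
    """Approximate "first page sorted by filename" for a large list.

    Instead of sorting every item, consider only the `window` most recently
    created ones (the tail of `items`, which is kept in creation order),
    sort those by filename, and return the first page.
    """
    recent = items[-window:]
    return sorted(recent, key=lambda it: it.filename)[:page_size]


# 20,000 items with scrambled filenames, stored in creation order.
items = [Item(i, f"doc{(i * 7919) % 100_003:06d}.docx")
         for i in range(1, 20_001)]

page = simple_fallback(items)
print(len(page), min(it.item_id for it in page) > 19_000)  # 100 True
```

The trade-off is the one described above: the page is sorted correctly, but only within the recent window, so an older document that would sort first is simply not a candidate.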
“Wait a second. You’re telling me that SharePoint throttles queries without asking me first? How on earth am I supposed to find anything in this crazy world of fallback queries and partial results?”
Let me assure you; this throttling business is a good thing. It’s a core ingredient in what makes SharePoint 2010 a resource for addressing your scale challenges. Gone are the sleepless nights where you toss and turn and worry about page faults on your database cluster resulting from Mack in Accounting stuffing 6,000 beer pong tournament photos in the root of a library in a forgotten team site in the dusty corners of your SharePoint deployment. The SharePoint 2010 feature set replaces this overarching concern with a set of well-scoped challenges; instead of worrying about every library that might get big, you get to plan for and craft experiences for the set of libraries that need to get big for business reasons.
I should mention really quickly that throttling is about more than just list views. There is a whole class of operations that involve iterating through all the documents in a list, or all the documents in a folder, that will get throttled (in other words, they will not execute) when the list or container is large. These operations include things like:
Above is another screenshot from my million-item library. This time, we’ve put a couple of SharePoint 2010 features to work. Notice that I have “demonstration scripts” selected in the tree view on the left-hand side, and my list view is rendering without the yellow bar telling me I’m only seeing the newest results. The hierarchy of tags you see there represents a taxonomy, Item Type. I am browsing the documents in this library according to their Item Type; in the screenshot, I am filtering to show all documents with the value “demonstration scripts”. Here are the steps that I took to make this happen:
In these three easy steps, I made “Item Type” a first class navigational pivot over the data. Instead of just staring at a partial list of content at the root, I can now browse with impunity by this virtual folder structure.
Here are a couple of cool aspects of this feature that aren’t apparent from a single nifty screenshot:
You aren’t immune from the laws of physics; if you ask for documents tagged with demonstration scripts and there are 10,000 demonstration scripts, we’re not going to be able to show you all of them. In this case, though, you get something better than a simple fallback; you get an indexed fallback, which means that instead of considering the entire list, the query considers only the items that match the indexed portion of your query.
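For contrast with the simple fallback, here is a Python sketch of an indexed fallback. A dictionary keyed by Item Type stands in for the column index, so only matching items are ever considered. The names, and the choice to keep the newest 5,000 matches when a bucket is too large, are assumptions made for illustration; they do not describe how SharePoint actually selects the subset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    item_id: int      # hypothetical: grows with creation time
    filename: str
    item_type: str    # the indexed, taxonomy-backed column


def build_index(items: list[Item]) -> dict[str, list[Item]]:
    """Group items by Item Type; each bucket stays in creation order."""
    index: dict[str, list[Item]] = defaultdict(list)
    for it in items:
        index[it.item_type].append(it)
    return index


def indexed_fallback(index: dict[str, list[Item]], item_type: str,
                     page_size: int = 100,
                     cap: int = 5_000) -> tuple[list[Item], int]:
    """Return a first page of matches sorted by filename, plus the pool size.

    Only items in the matching index bucket are considered, never the whole
    list. If the bucket is too big, the newest `cap` matches are used.
    """
    matches = index.get(item_type, [])
    pool = matches[-cap:]
    page = sorted(pool, key=lambda it: it.filename)[:page_size]
    return page, len(pool)


# 30,000 items; every third one is a demonstration script (10,000 in all).
types = ["demonstration scripts", "specs", "memos"]
items = [Item(i, f"doc{(i * 7919) % 100_003:06d}.docx", types[i % 3])
         for i in range(1, 30_001)]

page, considered = indexed_fallback(build_index(items), "demonstration scripts")
print(len(page), considered)  # 100 5000
```

Every result matches the filter, and the candidates come only from the 10,000 demonstration scripts rather than from the 1,000 newest items in the whole list.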
This article was just the first in my series of posts about architecting and building large lists filled with discoverable content. Here’s what you can expect over the next few weeks:
After that, I’ll be widening my scope a bit to talk about the overall knowledge management story in SharePoint 2010 – which is about more than just browsing for content in a library!
Lincoln DeMaris, Program Manager, ECM