Hi there, my name is Pat Miller, and I am the development lead for the Enterprise Metadata / Taxonomy features in SharePoint 2010. I’ve been working on the ECM team and its forebears for the better part of 11 years now, first with NCompass Labs, which was acquired by Microsoft in 2001, then on the Content Management Server team, then with the CMS team as part of MOSS 2007. This is the first of many blog posts on the Enterprise Metadata Management (EMM) system in the 2010 release. It gives an overview of the system; future posts will drill into specific areas like event receivers, field editing and search refinements.
First, some background. At one point during the development of Content Management Server 2002, we spent some time with the folks that run the Microsoft.com set of websites. One of the things they were very keen on was this taxonomy system that they had built. It seemed fairly useful, and we considered implementing something like it, but didn’t have the time, and there was a general concern that no one would actually do the work of tagging data. During the development of MOSS 2007, we were spending most of our time rewriting our feature set to run on top of SharePoint, and once again, taxonomy fell off the list of things we were willing to tackle (and still, people would consistently say that people just don’t tag).
Around this time, people started tagging things in their own world. The rise of digital cameras and mp3 players brought a huge amount of data that, for the most part, had to be marked up with metadata in order to be searchable. Some metadata was added to the files automatically (things like date, size, camera model, etc.), but specific user information wasn’t there. You quickly learned that if you categorized the images (either through folder location or tags), you could navigate your way through tens of thousands of files (images, music, etc.) in the way that works for you personally, rather than relying on default information like the date the picture was taken. People became more familiar with the concept of navigating their content via metadata – "Let’s listen to all my Pearl Jam albums, I feel like listening to Electronica, find me photos of Dad". It’s only a small step from that to wanting to impose some sort of hierarchy – find me photos of my whole family, my extended family, I want to listen to all classical music, or perhaps just from the Baroque period. Tagging all that data really unlocked a lot of potential.
Perhaps the landscape had changed…
We decided to run with it in the 2010 release. There were a few main tenets that we tried to let guide us:
To that end, we set out to enable a bunch of new user scenarios for SharePoint 2010.
We started out the release with a blank sheet of paper and some very knowledgeable people in the information management space. We also found that most people started twitching uncontrollably when the word "ontology" was mentioned. ‘Tagging’ was fine, ‘metadata’ was OK, at ‘taxonomy’ they started looking for an exit. Telling people that a taxonomy was just a hierarchy calmed them down, but the whole ontology thing was too much of a stretch. It also complicated things considerably, and we could still get a huge amount of value out of a taxonomy, so this was our starting point.
Some features were very obvious – filtering list views based on hierarchy inclusion, search refinement, etc. Some were a small step from this – if you have a consistent vocabulary across an enterprise, you can start to do some interesting things. You can match areas of expertise to specific content or workflows. You can start to relate content in totally different systems based on something with more context than a simple string. What if you could relate your analytics content to your taxonomy system and get a real-time view of what topics people are viewing instead of simply guessing based on their position in a URL namespace? How about overlaying your security model with your metadata so that certain people had rights to view content based on the metadata applied to it? How about we get down to business, focus our resources and ship a compelling collection of features?
To that end, we came up with the following components in the system:
We call the taxonomy repository itself the Term Store. Some companies have strict, top-down taxonomies, so some term stores might have only a few people allowed to edit them. We’ll have to support having multiple term stores.
The taxonomy system needs to be able to support a complex enterprise. A simple flat list of strings isn’t going to be sufficient. To that end, we support the following concepts and behaviors:
OK, that’s a nice set of features in the taxonomy system. What do we want to do with all those terms and termsets?
The next set of features involves integrating the taxonomy system with SharePoint. The primary place this happens is in the new managed metadata field type. Think of it as a choice field that went to the gym: it’s much more powerful. The metadata field type is a normal field that can be applied to any content type (list or document library). However, it has a few nice things associated with it:
Once data is in SharePoint, other SharePoint features can deliver additional goodness:
Now that we have all that nice consistent metadata on our content, we can do a few more things:
And since we know that we can’t possibly implement every feature that everyone would want, everything is accessible through our API. In future blog posts, we’ll go over how to use this API to deliver some compelling features.
Hopefully this is a nice introduction to the work we did around taxonomies and enterprise metadata. We had a lot of fun coming up with the design and implementation, and hope that it resonates with you.
Thanks for reading.
Pat.Miller at Microsoft.com
Hi everyone,
I’d like to answer a common question about how to modify the behavior the Variations feature in SharePoint 2010 uses when propagating pages. That is, how pages in the source variation site are copied and appear on target variation sites as minor draft versions.
Page propagation is triggered by publishing a page on the source variation site by default. Each time you publish a source page, the Variations Event Receiver adds a work item to the Variations Propagate Pages timer job queue. When the timer job runs, it will begin executing the first 100 page propagation work items. For each work item, Variations will copy the source page to all target sites, creating the page if it does not yet exist, or appending a draft minor version if the target page does already exist.
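To make the flow above concrete, here is a small conceptual model in plain PowerShell. It does not call any SharePoint API; the function name Invoke-PropagationRun, the -BatchSize parameter and the shape of the work items and target sites are all hypothetical, chosen only to mirror the description (a queue of work items, a batch of 100 per run, and a draft copy created or appended on every target).

```powershell
# Conceptual model only: plain PowerShell objects, no SharePoint API calls.
# Invoke-PropagationRun and the data shapes below are hypothetical.
function Invoke-PropagationRun {
    param(
        [System.Collections.Queue]$Queue,  # pending work items, one per published source page
        [hashtable]$Targets,               # target site name -> (page name -> list of versions)
        [int]$BatchSize = 100              # the post describes a run starting with the first 100 items
    )
    $processed = 0
    while ($Queue.Count -gt 0 -and $processed -lt $BatchSize) {
        $item = $Queue.Dequeue()
        foreach ($siteName in @($Targets.Keys)) {
            $pages = $Targets[$siteName]
            if (-not $pages.ContainsKey($item.Page)) {
                # The target page does not exist yet, so create it.
                $pages[$item.Page] = New-Object System.Collections.ArrayList
            }
            # In both cases the copy arrives on the target as a draft (minor) version.
            [void]$pages[$item.Page].Add("draft copy of source v$($item.SourceVersion)")
        }
        $processed++
    }
    return $processed
}

# Usage: publish the same source page three times, then run the job once.
$queue = New-Object System.Collections.Queue
1..3 | ForEach-Object { $queue.Enqueue(@{ Page = "news.aspx"; SourceVersion = "$_.0" }) }
$targets = @{ "fr" = @{}; "de" = @{} }
$done = Invoke-PropagationRun -Queue $queue -Targets $targets
"Processed $done work items"
```

In this model each target ends up with one draft entry per publish, which is the behavior that prompts the "stop overwriting my target pages" question discussed next.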
In some cases, users might not want changes to a page on the source to necessarily propagate to all targets. That is, users might want to make source-local changes and have the option to make changes globally applicable when they want. This often takes the form of a question like “How can I stop variations from overwriting my target pages every time I publish a source page?” Variations in SharePoint 2010 helps you do this.
In SharePoint 2010, we’ve worked to improve the Variations feature’s server citizenship by moving all Variations operations to the timer service. This way, server administrators can control the frequency with which operations run and better manage server load. The “Update Variations” button now adds a work item to the same Variations Propagate Page timer job queue as does publishing a page when “Automatic Creation” is enabled. What differentiates “Update Variations” is that you can also use this button to propagate source draft versions without publishing them on the source variation site.
When you run this PowerShell script to enable “On-Demand Page Propagation,” all Variations Propagate Pages timer job work items are filtered and discarded except those added to the queue by the Update Variations button:
Enable On-Demand Page Propagation:
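# Load the SharePoint object model (run this on a SharePoint server).
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint")
# Replace the URL with the site collection that hosts your variations.
$site = New-Object Microsoft.SharePoint.SPSite("http://yourserver/sites/abc")
$folder = $site.RootWeb.Lists["Relationships List"].RootFolder
# Adding this property turns on on-demand page propagation.
$folder.Properties.Add("DisableAutomaticPropagation", "True")
$folder.Update()
$site.Dispose()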
Disable On-Demand Page Propagation:
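# Load the SharePoint object model (run this on a SharePoint server).
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint")
# Replace the URL with the site collection that hosts your variations.
$site = New-Object Microsoft.SharePoint.SPSite("http://yourserver/sites/abc")
$folder = $site.RootWeb.Lists["Relationships List"].RootFolder
# Removing this property restores automatic propagation on publish.
$folder.Properties.Remove("DisableAutomaticPropagation")
$folder.Update()
$site.Dispose()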
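Before switching modes, it can help to check which one is currently in effect. The helper below is a sketch, not a SharePoint cmdlet: Get-PropagationMode is a hypothetical name, and it assumes that only the value "True" means on-demand mode (a missing key or any other value is treated as automatic). It accepts any hashtable, so you can try it without a farm; on a server you would pass $folder.Properties from the scripts above.

```powershell
# Sketch only. Get-PropagationMode is a hypothetical helper, not part of SharePoint.
function Get-PropagationMode {
    param([System.Collections.Hashtable]$Properties)
    # Assumption: a missing key, or any value other than "True", means automatic propagation.
    if ($Properties.ContainsKey("DisableAutomaticPropagation") -and
        "$($Properties['DisableAutomaticPropagation'])" -eq "True") {
        return "On-Demand"
    }
    return "Automatic"
}

# Try it with plain hashtables standing in for $folder.Properties.
Get-PropagationMode -Properties @{ DisableAutomaticPropagation = "True" }  # On-Demand
Get-PropagationMode -Properties @{}                                        # Automatic
```

A check like this also guards against running the enable script twice, since adding a key that already exists to a hashtable raises an error.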
“Update Variations” in SharePoint 2010 can be used to propagate the current version of a page on-demand provided that the Variations Propagate Page Timer Job is enabled
In MOSS 2007, some users disabled the “Variations Propagate Pages” timer job in Central Admin as a workaround. With the timer job disabled, publishing a source page would not cause SharePoint to copy the source page to any target page. Authors on the source variation site could then use the “Update Variations” button to propagate the current version of the source page on-demand.
Clicking “Update Variations” in MOSS 2007 immediately propagated the current version of the source page to all target pages, “skipping the line” and bypassing the Variations Propagate Pages timer job. So when the timer job was disabled, “Update Variations” could still be used to propagate pages.
However, when the timer job is disabled, publishing pages on the source continues to add work items to the timer job queue if the “Automatic Creation” option is enabled, as it is by default. Over time, the queue can grow and contain hundreds or thousands of work items, all of which would begin to execute if the Variations timer job were re-enabled in the future. If you upgrade to SharePoint 2010 with a backlog of work items, SharePoint will discard these.
With the new “On-Demand Page Propagation” functionality, you can achieve this content distribution model out of the box.
“Update Variations” in MOSS 2007 works differently under the hood from its counterpart in SharePoint 2010
Things to keep in mind:
Thanks for reading. Happy propagating!
Josh Stickler
Program Manager
My colleagues on the Exchange team have introduced a wealth of new capabilities in Exchange 2010 to support email archiving, retention and discovery, but I’m often asked how an organization should think about managing emails in SharePoint as part of an overall collaboration and content management strategy. While there are no hard and fast rules, it pays to think about four distinct scenarios: personal email management, project and case management, email archiving, and records management.
Each of these scenarios has a set of desired outcomes and a set of capabilities that best meet those outcomes, so let’s take each one in turn.
Personal email management is all about empowering end users to take control of their inbox, making it easier to organize, find and take action on email. Users want a mail client that makes it easy to manage email on a day-to-day basis and expect their IT department to take care of backup and restore.
Project and case management is all about sharing information and managing a group of related artifacts in a single location with a common security model, metadata model and information management policy. Users are looking for a solution that makes it easy to collaborate and find information while leveraging workflow to drive common business processes.
Email archiving is all about taking control of the proliferation of email within an organization, driving down the cost of provisioning ever-increasing inbox requirements and applying broad-brush, time-based disposition. Email archiving is typically driven by IT, which implements rules and retention policies that are largely transparent to end users.
Records management is all about identifying business critical content, driving appropriate classification and then applying relevant retention management policies. Accurate classification of content and applying appropriate metadata ensures that information is easy to find and use throughout the enterprise. At the same time, appropriate use of retention policies ensures that businesses can gracefully age content that is no longer of value while adhering to relevant government and industry regulations. Email is a critical part of any modern records management strategy, so businesses need to make it easy for end users to identify and classify email that is considered to be business critical content.
Outlook 2010 and Exchange 2010 provide great capabilities to deal with personal email management and email archiving, while SharePoint 2010 provides an ideal platform for storing email that is part of project and case management or an effective and encompassing records management strategy.
Of course, there is a natural flow or continuum. An email may start out well managed in a user’s inbox and have an email archiving policy attached to it; a user may then decide to manage it as part of a project and finally declare it a record upon project completion. As I said at the start, there are no hard and fast rules, but hopefully I’ve given you a better frame of reference for working out what systems are required to support email from creation to disposition, depending on the required business outcomes.
If you want to hear more about this topic, I’ll be presenting a webinar with Colligo, one of our partners, which provides an add-in for Outlook that makes it easy for users to drag and drop email into SharePoint, applying the appropriate Content Type and metadata attributes as part of the process. The webinar is on June 17th, so sign up now.
Ryan Duguid
Senior Product Manager – ECM and Compliance
Microsoft Corporation