Category: Metadata Driven Navigation

Taxonomy–The Challenge of Starting from Scratch

One of the most talked-about capabilities since the launch of SharePoint 2010 is the Managed Metadata Service.  For those of you who aren’t already familiar with this service and the support it provides for modeling and deploying a rich corporate taxonomy, I’d recommend reading Pat’s post Introducing Enterprise Metadata Management.  For those of you who are familiar with the taxonomy capabilities in SharePoint 2010, I’m sure many of you have spent time looking at an empty term store wondering where to start.  If you’re lucky, you already have a well-defined corporate taxonomy and should by now have used our import capabilities to preload SharePoint with the vocabulary you want your users to rely on for tagging and finding content.  On the other hand, you could be like many customers I talk to who don’t even know where to start when it comes to developing a taxonomy, or who have spent years in conference rooms debating what the right taxonomy should be.  You’ve probably even heard someone say “I’m sure someone has already solved this problem”, and if that’s the case, that someone was the smartest person in the room for two key reasons.  The first is that there are professional taxonomists who have already modeled most business domains, and the second is that the people responsible for creating content in your company have already developed a community vocabulary, or folksonomy, that they use extensively.

If you happen to be one of those customers who is stuck looking at an empty term store, then I’ve got great news for you.  The SharePoint team has teamed up with WAND, a leading provider of enterprise taxonomies, to make its General Business Taxonomy available as a free download.  The General Business Taxonomy consists of around 500 terms describing common functional areas that exist in most businesses.  It can be imported into the SharePoint 2010 term store within minutes and provides a great starting point for customers looking to build a corporate vocabulary and take advantage of the Managed Metadata Service.  In addition to this free download, WAND provides a range of taxonomies covering a variety of domains including Products and Services, Local Search, Enterprise, Jobs, Travel, Medical, Lifecycle, Finance and Records Retention.

Download the General Business Taxonomy today and start to explore the benefits that taxonomy can bring to your business and your people. 

If you’re new to taxonomy and the benefits it can bring to your business, take a look at the following sites:

Ryan Duguid
Senior Product Manager
Microsoft Corporation

Introducing Enterprise Metadata Management

Hi there, my name is Pat Miller, and I am the development lead for the Enterprise Metadata / Taxonomy features in SharePoint 2010.  I’ve been working on the ECM team and its forebears for the better part of 11 years now, first with NCompass Labs, which was acquired by Microsoft in 2001, then on the Content Management Server team, then with the CMS team as part of MOSS 2007.  This is the first of many blog posts on the Enterprise Metadata Management (EMM) system in the 2010 release.  This will be the overview of the system, and future posts will drill into specific areas like event receivers, field editing and search refinements.

First, some background.  At one point during the development of Content Management Server 2002, we spent some time with the folks that run the Microsoft.com set of websites.  One of the things they were very keen on was this taxonomy system that they had built.  It seemed fairly useful, and we considered implementing something like it, but didn’t have the time, and there was a general concern that no one would actually do the work of tagging data.  During the development of MOSS 2007, we were spending most of our time rewriting our feature set to run on top of SharePoint, and once again, taxonomy fell off the list of things we were willing to tackle (and still, people would consistently say that people just don’t tag).

Around this time people started tagging things in their own world.  The rise of digital cameras and mp3 players brought a huge amount of data that, for the most part, had to be marked up with metadata in order to be searchable.  Some metadata was added to the files automatically (things like date, size, camera model, etc.), but specific user information wasn’t there.  You quickly learned that if you categorized the images (either through folder location or tags) you could navigate your way through tens of thousands of files (images, music, etc.) in the way that works for you personally, rather than relying on default information like the date the picture was taken.  People became more familiar with the concept of navigating their content via metadata: "Let’s listen to all my Pearl Jam albums", "I feel like listening to Electronica", "find me photos of Dad".  It’s only a small step from that to wanting to impose some sort of hierarchy: find me photos of my whole family, or my extended family; I want to listen to all classical music, or perhaps just music from the Baroque period.  Tagging all that data really unlocked a lot of potential.

Perhaps the landscape had changed…

We decided to run with it in the 2010 release.  There were a few main tenets that we tried to let guide us:

  1. No one (well, almost no one) applies metadata for the sheer joy of it.  It’s always for a purpose.
  2. #1 means that the reason for the system has to be for the end user benefit.  What can you do if you have this rich metadata applied?
  3. In order for #2 to come to realization, the metadata has to be present, which means that applying consistent metadata needs to be as easy and ubiquitous as possible.

To that end, we set out to enable a bunch of new user scenarios for SharePoint 2010.

We started out the release with a blank sheet of paper and some very knowledgeable people in the information management space.  We also found that most people started twitching uncontrollably when the word "ontology" was mentioned.  ‘Tagging’ was fine, ‘metadata’ was OK, at ‘taxonomy’ they started looking for an exit.  Telling people that a taxonomy was just a hierarchy calmed them down, but the whole ontology thing was too much of a stretch.  It also complicated things considerably, and we could still get a huge amount of value out of a taxonomy, so this was our starting point.

Some features were very obvious – filtering list views based on hierarchy inclusion, search refinement, etc.  Some were a small step from this – if you have a consistent vocabulary across an enterprise, you can start to do some interesting things.  You can match areas of expertise to specific content or workflows.  You can start to relate content in totally different systems based on something with more context than a simple string.  What if you could relate your analytics content to your taxonomy system and get a real-time view of what topics people are viewing instead of simply guessing based on their position in a URL namespace?  How about overlaying your security model with your metadata so that certain people had rights to view content based on the metadata applied to it?  How about we get down to business, focus our resources and ship a compelling collection of features?

To that end, we came up with the following components in the system:

The taxonomy repository itself is what we call the Term Store.  Some companies have strict, top-down taxonomies, so some term stores might have only a few people allowed to edit them.  We’ll have to support having multiple term stores.

The taxonomy system needs to be able to support a complex enterprise.  A simple flat list of strings isn’t going to be sufficient.  To that end, we support the following concepts and behaviors:

  • Terms - A term is the central object in the taxonomy system.  It’s the concept itself.  It’s very hard to come up with a name for a concept and have it be sufficiently descriptive and not too vague.  Term is what we came up with.
  • Labels - Terms have to be known by a bunch of different names.  When someone types "check" it should be the same thing as someone that types "cheque".  "USA" and "United States" and "United States of America" are all referring to the same term.  We call these names labels.
  • Default Label - It’s a whole lot easier if one label is the default.  You can find it through any of its synonyms, but we’ll display the default label in most circumstances.
  • Termset - A collection of related terms in a hierarchy is a termset.  Things like "locations" and "products".
  • Term Reuse - This is a key point of the system.  If you have two termsets "Capital Cities" and "Locations", the term "London" and all of its synonyms, etc. should be the same in both.  We don’t allow a term to have two parents in the same termset, but it can have two parents in different termsets.
  • Homographs – A homograph is a word that is spelt the same, but has a different meaning.  You should be able to have a hierarchy that has "Paris" existing in both France and Texas.  To keep things a bit more sane for the user, we don’t allow homographs to have the same parent.
  • Multiple language support - A given term has a bunch of meaning associated with it.  The translations belong to the term in the same way that synonyms do.  If a term doesn’t have a translation, we use the default language.
  • Groups - Groups in the taxonomy system are simply collections of termsets that share a common security assignment.  Termsets and terms aren’t ACL’d, groups are.
  • Deprecated terms - if a term shouldn’t be used any more, it can be deprecated.  This doesn’t remove it from the system, you just can’t apply it to new content moving forward.
  • Terms that are unavailable for tagging - this is slightly different from deprecated terms.  A deprecated term is deprecated in all occurrences in the taxonomy and isn’t shown to the user when tagging.  Unavailable terms are only unavailable in a specific termset, and are still displayed when browsing the hierarchy at tagging time.  The purpose of this is to allow things to be hierarchical without allowing people to tag with the wrong term.  For example, the Capital Cities termset might include continents so that people can find a particular city, but the continents would be marked as unavailable for tagging (with respect to Capital Cities) because they should not be selectable at tagging time.
  • Merging terms - at times, you might get multiple terms in the system that really are the same thing.  They might be in the same termset, or they might be in different termsets.  When you merge them, you get a single term with all of the properties, and this new term will be reused in all termsets in which the original terms existed.
  • Open Termsets - There are times when a highly managed taxonomy makes sense.  You shouldn’t be able to add random countries to the list of known countries.  However, you probably don’t want to give taxonomy editing permissions to everyone that is creating a new codeword.  Open termsets allow content editors to add new terms to a hierarchy at content authoring time.  It’s a bit of a meeting point between bottom up folksonomies and top down taxonomies.
  • Keywords - The degenerate case of a folksonomy is simply a flat list of strings.  They have no extra semantic meaning.  This is the enterprise keywords termset.  Terms here don’t have a hierarchy, definitions, synonyms or translations.  However, it is possible to move a keyword into a managed termset and add this additional data.
  • Local termsets - The taxonomy field type gives you all sorts of useful features, but you probably don’t want "places to order food from" to wind up in your enterprise taxonomy.  Local termsets are only visible within a single site collection.
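
To make the relationships between terms, labels, termsets, term reuse and homographs a little more concrete, here is a minimal Python sketch of the rules listed above.  It is not the SharePoint object model: the `Term` and `TermSet` classes and their method names are invented for illustration, and groups, translations, deprecation and merging are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # identity-based equality: two terms labelled "Paris" stay distinct
class Term:
    """A concept with one default label and any number of synonyms (hypothetical model)."""
    default_label: str
    synonyms: List[str] = field(default_factory=list)

    def labels(self) -> List[str]:
        return [self.default_label, *self.synonyms]


class TermSet:
    """A hierarchy of related terms. A term may appear only once per termset."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._parent: Dict[Term, Optional[Term]] = {}

    def add(self, term: Term, parent: Optional[Term] = None) -> Term:
        # A term cannot have two parents in the same termset.
        if term in self._parent:
            raise ValueError(f"{term.default_label!r} is already in {self.name!r}")
        if parent is not None and parent not in self._parent:
            raise ValueError("parent must belong to this termset")
        # Homographs are allowed, but not under the same parent.
        for sibling in self.children(parent):
            if sibling.default_label.lower() == term.default_label.lower():
                raise ValueError("homographs cannot share a parent")
        self._parent[term] = parent
        return term

    def children(self, parent: Optional[Term]) -> List[Term]:
        return [t for t, p in self._parent.items() if p is parent]

    def find(self, text: str) -> List[Term]:
        """Find terms by any label, default or synonym, ignoring case."""
        wanted = text.lower()
        return [t for t in self._parent
                if wanted in (label.lower() for label in t.labels())]

    def path(self, term: Term) -> str:
        """Render the term with its ancestors, root first."""
        parts: List[str] = []
        node: Optional[Term] = term
        while node is not None:
            parts.append(node.default_label)
            node = self._parent[node]
        return " > ".join(reversed(parts))


locations = TermSet("Locations")
europe = locations.add(Term("Europe"))
france = locations.add(Term("France"), parent=europe)
uk = locations.add(Term("United Kingdom"), parent=europe)
usa = locations.add(Term("United States", ["USA", "United States of America"]))
texas = locations.add(Term("Texas"), parent=usa)
paris_fr = locations.add(Term("Paris"), parent=france)
paris_tx = locations.add(Term("Paris"), parent=texas)  # homograph, different parent
london = locations.add(Term("London"), parent=uk)

capitals = TermSet("Capital Cities")
capitals.add(london)  # term reuse: the same term object lives in both termsets
```

In this sketch, looking up "USA" and "United States of America" lands on the same term, the two Paris terms remain separate concepts, and London is a single term shared by both termsets rather than a copy.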

OK, that’s a nice set of features in the taxonomy system.  What do we want to do with all those terms and termsets?

The next set of features involve integrating the taxonomy system with SharePoint.  The primary place this happens is in the new managed metadata field type.  Think of it as a choice field that went to the gym.  It’s much more powerful.  The metadata field type is a normal field that can be applied to any content type (list or document library).  However it has a few nice things associated with it:

  • Termset binding - You can specify what termset a field should be bound to.  You can have lots of fields bound to the same termset.  When you update the termset, all of the bound fields use the changes immediately.
  • Path or node display - You can choose to display the default label of the term by itself "Paris" or its path "Europe > France > Paris".
  • Multi-lingual rendering -   If a given term has been translated to a given language, when your UI is set to that language, the term translations are displayed.
  • Content type syndication - This isn’t a taxonomy feature per se, but it’s part of the enterprise metadata feature set.  We allow a term store to have a site collection defined as its "hub".  On that hub you can publish content types, and these content types will be pushed out to all consuming site collections.  This means that in addition to having a consistent vocabulary across your enterprise, you can have a consistent set of content types using all that goodness.
  • Rich editing - when you are applying a term to an item, you can search across the entire termset (including synonyms) or view the tree itself.  It makes it possible to choose from thousands of choices, which would normally break lookup and choice fields.
  • Editing support in the rich client applications - the document information panel in the Office client applications allows for applying terms.
  • Offline editing in the rich client applications - when you edit in the rich client applications, a copy of the bound termsets is cached locally.  You can tag on the plane.

Once data is in SharePoint, other SharePoint features can deliver additional goodness:

  • Better listview filtering - not only can you filter in the normal "everything with value X" but you can also do inclusive filtering, displaying everything tagged with X or a child of X.
  • Better metadata navigation behavior - The metadata navigation feature allows you to navigate through libraries using hierarchies other than the folder hierarchy.  The termset is one of the allowed hierarchy types, meaning that you can browse your libraries along multiple axes.  You can now free your data from the tyranny of the URL or folder namespace.
  • Routing and policy - The document routing feature can direct your content based on the metadata applied to it.  Taxonomy fields can even be used to create folder hierarchies at the routing destination.  Retention policies can be driven off of taxonomy fields as well.
  • File open / save - Can’t remember exactly where your document is stored in a large library?  You can use the taxonomy field to filter the open dialog display.
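
The difference between normal and inclusive filtering can be sketched in a few lines of Python.  This is only an illustration of the idea, not how SharePoint list views are implemented: the hierarchy, the document names and the helper functions are all made up, and labels are used as keys purely for brevity (a real system would key on a term identifier, since homographs share a label).

```python
from typing import Dict, List, Optional, Set

# Hypothetical hierarchy: term -> parent (None marks a root term).
PARENT_OF: Dict[str, Optional[str]] = {
    "Europe": None,
    "France": "Europe",
    "Paris": "France",
    "United Kingdom": "Europe",
    "London": "United Kingdom",
}

# Hypothetical library: document name -> the term it is tagged with.
DOCUMENTS: Dict[str, str] = {
    "strategy.docx": "Europe",
    "q3-report.docx": "France",
    "louvre-visit.docx": "Paris",
    "office-move.docx": "London",
}


def with_descendants(term: str, parent_of: Dict[str, Optional[str]]) -> Set[str]:
    """Return the term plus every term below it in the hierarchy."""
    selected = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in parent_of.items():
            if parent in selected and child not in selected:
                selected.add(child)
                changed = True
    return selected


def filter_items(items: Dict[str, str], term: str,
                 parent_of: Dict[str, Optional[str]],
                 inclusive: bool = False) -> List[str]:
    """Exact filtering matches only the term; inclusive also matches its children."""
    wanted = with_descendants(term, parent_of) if inclusive else {term}
    return [name for name, tag in items.items() if tag in wanted]


exact = filter_items(DOCUMENTS, "France", PARENT_OF)
inclusive = filter_items(DOCUMENTS, "France", PARENT_OF, inclusive=True)
```

Here `exact` contains only the document tagged France, while `inclusive` also picks up the document tagged Paris because Paris sits below France in the hierarchy.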

Now that we have all that nice consistent metadata on our content, we can do a few more things:

  • Content by query Web Part enhancements - You can configure the CBQ to filter based on taxonomy fields, including descendant inclusion.
  • Automatic search refinement - The search system is aware of all taxonomy fields, and if a result set has a sufficient amount of data with the same taxonomy fields, a search refinement will appear, allowing users to filter their data.
  • Power user profile and social tagging - it doesn’t make much sense to have a corporate taxonomy and then do your social tagging using just string matching.  All of the social properties are actually sourced from the taxonomy system, meaning that you won’t get people asking you where a good place to stay in Paris, France is when you are an expert on Paris, Texas.

And since we know that we can’t possibly implement every feature that everyone would want, everything is accessible through our API.  In future blog posts, we’ll go over how to use this API to deliver some compelling features.

Hopefully this is a nice introduction to the work we did around taxonomies and enterprise metadata.  We had a lot of fun coming up with the design and implementation, and hope that it resonates with you.

Thanks for reading.

Pat.Miller at Microsoft.com

Introducing Records Management in SharePoint 2010

Hi everyone.  My name is Adam Harmetz and I work on the engineering team responsible for the SharePoint document and records management vision and features.  Many of you might remember me from the SharePoint 2007 recman blog.  The recman blog was a great way for the team to connect with records managers, IT professionals, and information architects and we’ll be continuing that discussion for the SharePoint 2010 compliance features via the Enterprise Content Management (ECM) Team Blog.


I think it makes sense to combine records management with other facets of ECM into one central blog.  After all, as Jim discussed, records management is a key component of our ECM strategy.  The notion that everyone should participate in ECM processes really served as a guiding principle to help expand the scope of records management in SharePoint 2010.  And for all you records managers out there, I think you’ll benefit greatly from learning about the other facets of ECM along the way.


To kick off the discussion, here are three key things you need to know about records management in SharePoint 2010.


The Records Center – A Place for Hierarchy, Driven By Metadata 


The Records Center was introduced in 2007 as a SharePoint site that served as a conventional records archive.   Content from all over the enterprise can be submitted to a Records Center and then routed to the appropriate place where it picks up the right permissions and policies, such as expiration and auditing.
For SharePoint 2010, we know it’s important to continue to invest here and add even more “traditional” archive features.   When looking at the broad swath of features we had to choose from, our goals here really focused on providing features that allow you to extract the most value out of an archive and find the data you need.  For instance, here are a few of the new features in a SharePoint 2010 Records Center:



  • Document ID: Every document can be assigned a unique identifier, which stays with the document even when it’s archived.  This allows records to be easily referenced by an ID no matter where the document moves.

  • Multi-Stage Retention: Retention policies can have multiple stages, allowing you to specify the entire document lifecycle as one policy (e.g. review Contracts every year, and delete after 7 years)

  • Per-Item Audit Reports: You can generate a customized audit report about an individual record.

  • Hierarchical File Plans: You can create deep, hierarchical folder structures and manage retention at each folder in the hierarchy (or inherit from parent folders).

  • File Plan Report: You can generate status reports showing the number of items in each stage of the file plan, along with a rollup of the retention policies on each node in the plan.
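
As an illustration of what a multi-stage policy means in practice, here is a small Python sketch that expands the example above (review Contracts every year, delete after 7 years) into a schedule of dated actions.  The `Stage` class, the `schedule` function and the dates are assumptions made for this example; they are not the SharePoint retention engine or its API.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Stage:
    """One stage of a hypothetical retention policy."""
    action: str                         # e.g. "review" or "delete"
    after_years: int                    # offset from the record's creation date
    repeat_years: Optional[int] = None  # recurrence interval, if the stage repeats


def add_years(start: date, years: int) -> date:
    try:
        return start.replace(year=start.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return start.replace(year=start.year + years, day=28)


def schedule(created: date, stages: List[Stage]) -> List[Tuple[date, str]]:
    """Expand the stages into (due date, action) pairs in chronological order.

    The last stage ends the lifecycle; recurring stages repeat until then.
    """
    end = add_years(created, stages[-1].after_years)
    events: List[Tuple[date, str]] = []
    for stage in stages:
        if stage.repeat_years is None:
            events.append((add_years(created, stage.after_years), stage.action))
            continue
        years = stage.after_years
        while add_years(created, years) < end:
            events.append((add_years(created, years), stage.action))
            years += stage.repeat_years
    return sorted(events)


contract_policy = [
    Stage("review", after_years=1, repeat_years=1),
    Stage("delete", after_years=7),
]
events = schedule(date(2010, 5, 12), contract_policy)
```

The point of the sketch is that the whole lifecycle lives in one policy definition: the yearly reviews and the final deletion are derived from the same list of stages instead of being managed as separate policies.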

 Figure 1 - Records Center


Here’s the home page of the Records Center in SharePoint 2010 for a fictional government agency, the Joint Task Force.  Notice that the home page is a place for records managers to educate the organization on compliance policy, as well as a place to look up a record by its document identifier.


In addition to adding these traditional records management features to our archive, as product designers we made a big bet on the power of metadata to drive 21st century electronic records management.  This manifests itself in several ways in the SharePoint archive:



  • Taxonomy and Centralized Content Types:  The archive will be a consumer of enterprise-wide taxonomies and content types, ensuring consistency and context transfer between the collaborative spaces and the archive.  We’ll be talking a lot more about our 2010 taxonomy investments in future posts.

  • Content Organizer: The records router can use metadata to route incoming documents to the right place in the hierarchical file plan.  For instance, it enables you to automatically enforce rules on content that is submitted, like “If a Purchase Agreement is tagged with Project Alpha, send it to the Alpha Contracts subfolder and apply that folder’s retention policy to the item.”

  • Virtual Folders: The file plan is a great way to manage a repository, but oftentimes it isn’t what you want to use to navigate and find the content you are looking for.  The SharePoint 2010 Records Center makes use of a new feature called metadata based navigation, which allows you to expose key metadata as virtual folders:

Figure 2 - Metadata Driven Navigation 


Notice that end users discover content in this Records Center by navigating virtual folders based upon metadata properties on the records.


This bet on metadata is all about empowering the end user, thus increasing the chance of successful adoption of the RM system.  Instead of choosing a complicated node in a file plan, submitters just fill out a few pieces of useful metadata and they’ll use that metadata when they need to find the content again.
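
The Content Organizer rule quoted earlier ("If a Purchase Agreement is tagged with Project Alpha...") can be sketched as a small rule table in Python.  Everything here is hypothetical and chosen for illustration only: the `Rule` class, the folder names, the policy strings and the "Unsorted" fallback are not SharePoint names or APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Rule:
    """A hypothetical routing rule: content type plus metadata conditions."""
    content_type: str
    destination: str
    conditions: Dict[str, str] = field(default_factory=dict)

    def matches(self, metadata: Dict[str, str]) -> bool:
        if metadata.get("Content Type") != self.content_type:
            return False
        return all(metadata.get(key) == value
                   for key, value in self.conditions.items())


def route(metadata: Dict[str, str], rules: List[Rule],
          fallback: str = "Unsorted") -> str:
    """Return the destination folder of the first matching rule."""
    for rule in rules:
        if rule.matches(metadata):
            return rule.destination
    return fallback


def retention_for(folder: str, policies: Dict[str, str]) -> Optional[str]:
    """Use the folder's own policy, or inherit from the nearest parent folder."""
    path = folder
    while path:
        if path in policies:
            return policies[path]
        path = path.rpartition("/")[0]
    return None


RULES = [
    Rule("Purchase Agreement", "Contracts/Alpha Contracts",
         {"Project": "Project Alpha"}),
    Rule("Purchase Agreement", "Contracts/General"),
]
POLICIES = {
    "Contracts": "review every year, delete after 7 years",
    "Contracts/Alpha Contracts": "delete after 10 years",
}

alpha_doc = {"Content Type": "Purchase Agreement", "Project": "Project Alpha"}
other_doc = {"Content Type": "Purchase Agreement", "Project": "Project Beta"}

alpha_folder = route(alpha_doc, RULES)
other_folder = route(other_doc, RULES)
```

The submitter only supplies the metadata; the rule order decides the folder, and the folder (or its nearest parent with a policy) decides the retention, so `retention_for(other_folder, POLICIES)` falls back to the policy set on "Contracts".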


In Place Records Management – Injecting Records Management in the Content Creation Experience


With just about every customer engagement my team is involved in, we hear the same message again and again: records management doesn’t start (or stop!) in the archive.  Content isn’t created there and it sure doesn’t live there for the most interesting parts of its life.


We’ve made a huge effort in 2010 to enable you to do effective records management in collaborative spaces.  Auditing, Retention, Expiration, Reporting, Records Workflows, eDiscovery, Legal Hold and Recordization are all features you can use in collaborative space as you are striking a balance between SharePoint’s value to end users and the need for information governance.


Holding all of this together is a new feature in SharePoint 2010 called In Place Records Management.  This allows certain SharePoint documents (or blogs, wikis, web pages, and list items) to be declared records.  The system can prevent such records from being deleted or edited, if that is required by your organization’s definition of what a record is:


 Figure 3 - In Place Records Management


Note that some of the documents have locks, implying to the user that they are dealing with records.  When selecting a record, the UI for editing and deleting the item is disabled.


This recordization process can be done either manually, as part of a larger process in a workflow, or as a scheduled part of a document’s retention (e.g. after 2 years).  The key here is that, when declared a record, the content doesn’t move to an archive – it stays where it is so the end users can still find and interact with the content.


Once declared, the system knows about an item’s record status, so you can do things such as create different retention policies for records or use record state when defining workflows in SharePoint Designer.  We also enable a programmability model so you can perform custom processes and policies upon recordization to meet specialized compliance needs.
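
To show the core idea in code, here is a minimal Python sketch of in-place declaration: the item keeps its location, gains a record status, and the library's restrictions decide whether edits and deletes are still allowed.  The `Item` and `Library` classes, the site path and the file names are invented for this example and do not correspond to the SharePoint programmability model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass
class Item:
    name: str
    location: str
    body: str = ""
    declared_on: Optional[date] = None

    @property
    def is_record(self) -> bool:
        return self.declared_on is not None


class Library:
    """A hypothetical collaborative library with configurable record restrictions."""

    def __init__(self, block_edit: bool = True, block_delete: bool = True) -> None:
        self.block_edit = block_edit
        self.block_delete = block_delete
        self.items: Dict[str, Item] = {}

    def add(self, item: Item) -> Item:
        self.items[item.name] = item
        return item

    def declare_record(self, name: str, on: date) -> None:
        # Declaring a record changes its status only; the item does not move.
        self.items[name].declared_on = on

    def edit(self, name: str, body: str) -> None:
        item = self.items[name]
        if item.is_record and self.block_edit:
            raise PermissionError(f"{name} is a record and cannot be edited")
        item.body = body

    def delete(self, name: str) -> None:
        item = self.items[name]
        if item.is_record and self.block_delete:
            raise PermissionError(f"{name} is a record and cannot be deleted")
        del self.items[name]


library = Library()
contract = library.add(Item("contract.docx", "/sites/alpha/Shared Documents"))
notes = library.add(Item("notes.docx", "/sites/alpha/Shared Documents"))

library.declare_record("contract.docx", on=date(2010, 5, 12))
library.edit("notes.docx", "second draft")  # non-records stay fully editable
```

Because record status is just a property of the item, other logic (a different retention policy for records, a workflow condition) can branch on `is_record` without caring where the item lives.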


Is In Place Records a replacement for a traditional archive?  The answer is, of course, sometimes – we’ll find some customers who want to use an in place approach exclusively, some who will want the traditional hierarchy and centralization that an archive brings, and many who will want both.  It’ll be something we’ll talk about a lot on this blog, and our documentation has already started discussing the pros and cons of both approaches.


Scale: We’re Talking Big


With electronic information growing at a crazy pace and businesses spending billions on eDiscovery every year, records managers have enough to keep them up at night.  The scale of their records/content management system shouldn’t be another worry.


As the records management engineering team, we take this burden very seriously and a large part of our effort this release has been spent adding features to make it easier to scale to massive archives.  Features such as Remote Blob Storage, database query optimizations, internal timer job processing improvements, new database indexing strategies and other engineering initiatives enable us to make a great leap forward this release and allow our customers to have:



  • Tens of millions of records in a single Records Center

  • Hundreds of millions of records in a distributed archive: We’ll talk in more detail in future posts, but many of the features mentioned above light up to allow many Records Centers to bind together to act as one logical repository.

With our partners on the SharePoint blog, we are looking forward to showing more details on the new scale targets and performance profiles for deployments at this scale over the coming months.


Wrapping Up


It’s been a lot of hard work for the team around here to deliver on this vision for 21st century records management.  When combined with the integrated e-mail archiving, retention, and discovery capabilities of Exchange 2010, I think you’ll see the 2010 wave as a breakout release for Microsoft’s records management strategy.


The team here is proud of the work here and eager to talk about it and hear from everyone – feel free to leave suggestions on future blog post ideas in the comments!


Thanks for reading,
Adam Harmetz
Lead Program Manager


P.S. If you are hungry for even more information on SharePoint 2010 records management, check out an interview I did on Don Lueder’s blog.
