Task #611

Test using list of terms or vocab for intentional term extraction

Added by Lippe Lippe about 1 year ago. Updated about 1 year ago.

Status: New
Start date: 04/25/2012
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: curation
Spent time: 1.50 hour
Target version: -

Description

Automated term extraction was creating a crufty vocab (see the CSV attached to #575).
But we can go through the collected taxonomy, create a cleaner set of terms, and use the MN vocab/term term extraction feature (see attached "vocab_term_extraciton.jpg") to improve tagging of content as it comes in.

I'm not sure whether it would work to create a 2nd vocab just for term extraction. Would that 2nd vocab have the same relationship to the data taxonomy, and thus feed into the channel matching, or not?

The 1st step would be to test this using a common word.
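As a reference point for that first test, here is a minimal sketch of what vocab-based term extraction amounts to in general: scan incoming text for whole-word, case-insensitive matches against a curated term list. This is an assumption about the general technique, not MN's actual implementation; the function name `extract_terms` and the sample terms are hypothetical.

```python
import re

def extract_terms(text, vocab):
    """Return the vocab terms found in text as whole words, ignoring case.

    Hypothetical helper for illustration; terms are returned in vocab order.
    """
    found = []
    for term in vocab:
        # \b word boundaries keep a term like "ban" from matching inside "urban"
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found

vocab = ["sanitation", "water", "housing"]
print(extract_terms("New report on water and sanitation access", vocab))
# prints ['sanitation', 'water']
```

Testing with a single common word makes it easy to see whether matching is too loose (substring hits) or too strict (case or plural mismatches).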

vocab_term_extraciton.jpg - MN vocab/term term extraction feature (21.9 kB) Lippe Lippe, 04/25/2012 02:44 pm

newswiretaxonomy_2012-04-15_pruned.csv - Pruned version of exported taxonomy from devcloud site. (20.2 kB) Lippe Lippe, 05/16/2012 05:23 pm

History

Updated by Lippe Lippe about 1 year ago

Went through and pruned out terms from the CSV in #575.
While doing this, I thought I might be able to develop a vocab for term extraction. I started one, but also noticed a lot of vocabs that would be useful for term extraction, though not necessarily on a global basis, e.g.:
  • Working group names
  • House bills
  • Senate bills
  • Media figures
  • Media outlets
  • Post types from GA sites
  • Days of action & campaigns
  • Locations

So while this pruned list could be further whittled down into a global term extraction vocab, I think it would also be useful to create feed categories, so that each specific term extraction vocab is used only for its own feed category. That way, GA site posts could be tagged with terms like "sanitation" that would be ambiguous in a global vocab.
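A rough sketch of the feed-category idea, assuming each incoming item carries a feed category and that the category's vocab is applied on top of a small global vocab. The category keys, terms, and the `tag_item` helper are hypothetical illustrations, not MN configuration.

```python
import re

# Hypothetical vocabs: a short global list plus one list per feed category.
GLOBAL_VOCAB = ["climate", "immigration"]
CATEGORY_VOCABS = {
    "ga_sites": ["sanitation", "working group"],
    "media": ["op-ed"],
}

def _matches(text, term):
    # Whole-word, case-insensitive match of a single term
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def tag_item(text, feed_category):
    """Tag text with global terms plus terms from its feed category's vocab."""
    # Unknown categories fall back to the global vocab only
    vocab = GLOBAL_VOCAB + CATEGORY_VOCABS.get(feed_category, [])
    return [term for term in vocab if _matches(text, term)]

print(tag_item("Sanitation working group meets Tuesday", "ga_sites"))
# prints ['sanitation', 'working group']
print(tag_item("Sanitation workers strike", "media"))
# prints []
```

Here "sanitation" only tags items from the hypothetical `ga_sites` category, which keeps an ambiguous term out of the global vocab while still using it where it is meaningful.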
