Task #611
Test using list of terms or vocab for intentional term extraction
| Status: | New | Start date: | 04/25/2012 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | - | % Done: | 0% |
| Category: | curation | Spent time: | 1.50 hour |
| Target version: | - | | |
Description
Automated term extraction was creating crufty vocab (see csv attached to #575).
But we can go through the collected taxonomy, create a cleaner set of terms, and use the MN vocab/term term extraction feature (see attached "vocab_term_extraciton.jpg") to improve tagging of content as it comes in.
I'm not quite sure whether it would work to create a 2nd vocab to use for term extraction: would the 2nd vocab have the relationship to the data taxonomy, and thus feed into the channel matching, or not?
1st step would be to test, using a common word.
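A rough sketch of what such a test could look like outside of MN itself: match a small curated term list against incoming text on whole-word boundaries. The CSV layout (one term per row, first column), the sample terms, and the `load_terms` / `extract_terms` helpers are assumptions for illustration only, not MN's actual term extraction feature.

```python
import csv
import io
import re

def load_terms(csv_text):
    """Read one term per row from the first CSV column (assumed layout)."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row[0].strip() for row in reader if row and row[0].strip()]

def extract_terms(text, terms):
    """Return the vocab terms found in text as whole words, case-insensitively."""
    found = []
    for term in terms:
        # Lookarounds keep "media" from matching inside "immediate".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found

# Hypothetical curated vocab and a test post containing a common word.
sample_csv = "sanitation\nmedia\nworking group\n"
vocab = load_terms(sample_csv)
post = "The Sanitation working group meets tonight."
matches = extract_terms(post, vocab)
print(matches)
```

If the common word gets tagged on the test post and substrings of other words do not, the curated list behaves as intended at this level; whether the resulting terms feed into channel matching still has to be checked in MN.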
History
Updated by Lippe Lippe about 1 year ago
- File newswiretaxonomy_2012-04-15_pruned.csv added
Starting this, I thought I might be able to develop a vocab for term extraction. I started on that, but also noticed a lot of vocabs that would be useful for term extraction, though not necessarily on a global basis, e.g.:
- Working group names
- House bills
- Senate bills
- Media figures
- Media outlets
- Post types from GA sites
- Days of action & campaigns
- Locations
So while this pruned list could be further whittled down into a global term extraction vocab, I think it would also be useful to create feed categories, so that specific term extraction vocabs are used only for a given feed category. That way, GA site posts could be tagged with otherwise ambiguous terms like "sanitation."
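The feed-category idea could be sketched roughly as follows. The category names, the vocab contents, and the `terms_for_feed` / `tag_post` helpers are all hypothetical; this only illustrates scoping a vocab to a feed category, not how MN would implement it.

```python
import re

# Hypothetical vocabs: one global, plus vocabs applied only to one feed category.
GLOBAL_VOCAB = {"campaign", "march"}
CATEGORY_VOCABS = {
    "ga_sites": {"sanitation", "facilitation"},
    "legislation": {"house bill", "senate bill"},
}

def terms_for_feed(feed_category):
    """Global terms plus any terms scoped to this feed category."""
    return GLOBAL_VOCAB | CATEGORY_VOCABS.get(feed_category, set())

def tag_post(text, feed_category):
    """Return sorted terms from the applicable vocabs found in text as whole words."""
    tags = []
    for term in terms_for_feed(feed_category):
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            tags.append(term)
    return sorted(tags)

post = "Sanitation needs volunteers before the march."
ga_tags = tag_post(post, "ga_sites")       # category vocab applies
news_tags = tag_post(post, "newswire")     # only the global vocab applies
print(ga_tags)
print(news_tags)
```

With this arrangement an ambiguous term like "sanitation" is only ever applied to posts from feeds in the category where it is meaningful, while the global vocab stays small.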