[Syndication] revisiting categorization: "controlled vocabulary"

Quinten Steenhuis quinten at andrew.cmu.edu
Mon Dec 6 06:35:11 PST 2004



Chris wrote:
> Hi
> 
> On Fri 03-Dec-2004 at 12:00:08PM -0500, Quinten Steenhuis
> wrote:
> 
>>I propose we make our own scheme -- maybe something like
>>IMC Controlled Vocabulary, or ICV.
> 
> 
> I used to think that this would make sense also... but I
> don't any more... basically each IMC comes up with its
> own dc:subjects and this is where we need to start...

Well...most IMCs don't have their own dc:subjects right now. Only a 
handful generate them. But they do have "categories" that can be chosen 
from to assign to an article. That is the exact definition of a 
controlled vocabulary.

I'm suggesting not that we make everyone use the same words, but that we 
find out what the basic terms are, come up with a system to code them 
-- either numerically or with your previous idea of an alphanumeric code 
with semantic meaning, and then develop a list of which subjects in 
which languages correspond to which codes. It's still a controlled 
vocabulary, since we'd only have a finite number of codes, but each code 
should be broad enough to still indicate some semantic connection for 
multiple words that are associated with it. The IPTC newscodes are in 4 
languages; we can consider the ICV to be in multiple languages and 
regionalizations, one for each IMC if necessary.

If, for example, some IMCs have a category called "guerra," some have 
one called "peace and justice," some have one called "militarization," 
some have one called "anti-war," some have one called just "war": they 
should all map to at least one subject code in common, but the ones that 
have multiple meanings should map to multiple subject codes.
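
As a rough sketch of that mapping in Python (the "ICV-..." codes below are 
made up purely for illustration, since no ICV codes exist yet, and which 
codes each category gets is only a guess):

```python
# Hypothetical mapping from each IMC's frontend category name to one or
# more subject codes. The "ICV-..." codes are invented for this example.
CATEGORY_TO_CODES = {
    "guerra": {"ICV-WAR"},
    "war": {"ICV-WAR"},
    "anti-war": {"ICV-WAR", "ICV-PROTEST"},
    "militarization": {"ICV-WAR", "ICV-MILITARY"},
    "peace and justice": {"ICV-WAR", "ICV-JUSTICE"},
}


def shared_codes(categories):
    """Return the subject codes that every listed category maps to."""
    code_sets = [CATEGORY_TO_CODES[name] for name in categories]
    return set.intersection(*code_sets)


# All five categories share one code; the broader ones carry extras.
print(sorted(shared_codes(CATEGORY_TO_CODES)))
```

Categories with a single meaning carry one code, while broader ones such 
as "peace and justice" carry several, so a search on any one code still 
finds them.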

These subject codes would only be used on the backend. The connection 
between syndication and the frontend category names would be that in 
each IMC's database, they have a list of categories, plus a table that 
connects each category to one or more subject codes.
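
One possible shape for those two tables, sketched with Python's built-in 
sqlite3 module. The table names, column names, and "ICV-..." codes are 
all assumptions for illustration; each IMC's codebase would have its own 
schema:

```python
import sqlite3

# In-memory database standing in for one IMC's backend.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE        -- frontend name shown to users
);
CREATE TABLE category_subject (
    category_id  INTEGER NOT NULL REFERENCES category(id),
    subject_code TEXT NOT NULL,      -- backend-only code (hypothetical)
    PRIMARY KEY (category_id, subject_code)
);
""")

conn.executemany(
    "INSERT INTO category (id, name) VALUES (?, ?)",
    [(1, "anti-war"), (2, "war")],
)
conn.executemany(
    "INSERT INTO category_subject (category_id, subject_code) VALUES (?, ?)",
    [(1, "ICV-WAR"), (1, "ICV-PROTEST"), (2, "ICV-WAR")],
)


def codes_for(conn, category_name):
    """Look up the subject codes attached to a frontend category name."""
    rows = conn.execute(
        "SELECT cs.subject_code FROM category c "
        "JOIN category_subject cs ON cs.category_id = c.id "
        "WHERE c.name = ? ORDER BY cs.subject_code",
        (category_name,),
    )
    return [row[0] for row in rows]


print(codes_for(conn, "anti-war"))
```

A syndication feed would emit the codes from the second table, while the 
site itself keeps showing only the names from the first.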

> 
>>IMCs already use a controlled vocabulary, it's just that
>>each one has its own. We could work on making a standard
>>"controlled vocabulary" to cover broad categories. We
>>could choose just a subset of a larger controlled
>>vocabulary using its standardized reference system and
>>map them onto multiple languages and different IMCs'
>>preferences about how to describe the term.
> 
> 
> I'd suggest doing this slightly differently, rather than
> coming up with a definitive list to start with I'd
> suggest starting with what is being used.

I agree this is the way to go.

> Perhaps on wiki page we could start with a list of
> dc:subject's that different sites use. Then we could look
> at how they relate to each other. From this it might be
> possible to come up with some mapping, like "foo" on uk
> sites equals "bar" on nyc site...

That's a good idea -- I started on the ImcStandardCategorization wiki 
page with a list of the broad categories for IPTC News Codes, Dewey 
Decimal Classification, and the Library of Congress Classification 
Numbers. We should put up a list of the categories for every IMC 
somewhere -- I'm not sure that the wiki is the friendliest interface for 
finding duplicates, etc., though.

http://docs.indymedia.org/view/Devel/ImcStandardCategorization
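
If the per-IMC category lists were collected in some machine-readable 
form, finding exact duplicates could be scripted rather than done by eye 
on the wiki. A minimal sketch, assuming made-up site names and category 
lists (not real data from any IMC):

```python
from collections import defaultdict

# Hypothetical sample data; real lists would come from each site.
SITE_CATEGORIES = {
    "uk": ["Anti-War", "Ecology", "Free Spaces"],
    "nyc": ["anti-war", "Environment", "Media"],
    "barcelona": ["guerra", "ecology"],
}


def find_duplicates(site_categories):
    """Group category names (case-insensitively) used by more than one site."""
    sites_by_name = defaultdict(set)
    for site, names in site_categories.items():
        for name in names:
            sites_by_name[name.strip().lower()].add(site)
    return {
        name: sorted(sites)
        for name, sites in sites_by_name.items()
        if len(sites) > 1
    }


print(find_duplicates(SITE_CATEGORIES))
```

This only catches names that match after lowercasing; pairs like 
"guerra" and "war", or "Ecology" and "Environment", would still need a 
person to map them to a common code.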

> 
> Chris
> _______________________________________________
> syndication mailing list
> syndication at lists.indymedia.org
> http://lists.indymedia.org/mailman/listinfo/syndication
> 

