[Syndication] revisiting categorization: "controlled vocabulary"
Quinten Steenhuis
quinten at andrew.cmu.edu
Mon Dec 6 06:35:11 PST 2004
Chris wrote:
> Hi
>
> On Fri 03-Dec-2004 at 12:00:08PM -0500, Quinten Steenhuis
> wrote:
>
>>I propose we make our own scheme -- maybe something like
>>IMC Controlled Vocabulary, or ICV.
>
>
> I used to think that this would make sense also... but I
> don't any more... basically each IMC comes up with its
> own dc:subjects and this is where we need to start...
Well...most IMCs don't have their own dc:subjects right now. Only a
handful generate them. But they do have "categories" that can be chosen
from to assign to an article. That is the exact definition of a
controlled vocabulary.
I'm not suggesting that we make everyone use the same words, but rather that we
find out what the basic terms are, come up with a system to code them
-- either numerically or with your previous idea of an alphanumeric code
with semantic meaning -- and then develop a list of which subjects in
which languages correspond to which codes. It's still a controlled
vocabulary, since we'd only have a finite number of codes, but each code
should be broad enough to indicate a semantic connection among the
multiple words associated with it. The IPTC NewsCodes are in 4
languages; we can consider the ICV to be in multiple languages and
regionalizations, one for each IMC if necessary.
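A minimal sketch of what "one code, labels in many languages and regionalizations" could look like. The codes (`ICV-WAR` etc.), the label table and the `label_for` helper are all hypothetical placeholders, not an agreed ICV list:

```python
# Hypothetical ICV code table: each code carries labels keyed by language
# or by an IMC-specific regionalization. Codes and labels are illustrative.
ICV_LABELS = {
    "ICV-WAR": {"en": "war", "es": "guerra"},
    "ICV-PEACE": {"en": "peace", "es": "paz"},
    "ICV-MIL": {"en": "militarization", "es": "militarización"},
}


def label_for(code, prefs):
    """Return the first label found for `code`, trying the keys in `prefs`
    in order (e.g. an IMC's regional variant, then its language).
    Falls back to the bare code if no label matches."""
    labels = ICV_LABELS.get(code, {})
    for key in prefs:
        if key in labels:
            return labels[key]
    return code
```

For example, `label_for("ICV-WAR", ["es-ar", "es"])` falls through the (absent) regional key and returns `"guerra"`, while a language with no label yet just shows the code itself.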
If, for example, some IMCs have a category called "guerra," some have
one called "peace and justice," some have one called "militarization,"
some have one called "anti-war," some have one called just "war": they
should all map to at least one subject code in common, but the ones that
have multiple meanings should map to multiple subject codes.
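The "guerra" example could be written down as a many-to-many mapping. This is only a sketch: the code names are hypothetical, and which codes a name like "peace and justice" should carry is exactly what we'd have to agree on:

```python
# Hypothetical mapping from local category names (as different IMCs label
# them) to ICV subject codes. A name with several meanings gets several codes.
CATEGORY_CODES = {
    "guerra": {"ICV-WAR"},
    "war": {"ICV-WAR"},
    "anti-war": {"ICV-WAR", "ICV-PEACE"},
    "militarization": {"ICV-WAR", "ICV-MIL"},
    "peace and justice": {"ICV-WAR", "ICV-PEACE", "ICV-JUSTICE"},
}


def shared_codes(names):
    """Return the codes that every listed category maps to
    (empty set if `names` is empty)."""
    code_sets = [CATEGORY_CODES[n] for n in names]
    return set.intersection(*code_sets) if code_sets else set()
```

With this table all five names share `ICV-WAR`, while "anti-war" and "peace and justice" additionally share `ICV-PEACE`.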
These subject codes would only be used on the backend. The connection
between syndication and the frontend category names would be that
each IMC's database has a list of categories, plus a table that
connects each category to one or more subject codes.
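Roughly, the two tables might look like this (using SQLite only so the sketch runs standalone). Table names, column names and codes are hypothetical, not taken from any existing IMC codebase:

```python
import sqlite3

# Hypothetical schema: one table of frontend categories, one join table
# linking each category to one or more backend-only subject codes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE          -- name shown on the site
);
CREATE TABLE category_subject (
    category_id  INTEGER NOT NULL REFERENCES category(id),
    subject_code TEXT NOT NULL,        -- ICV code, never shown to users
    PRIMARY KEY (category_id, subject_code)
);
""")

conn.executemany("INSERT INTO category (id, name) VALUES (?, ?)",
                 [(1, "anti-war"), (2, "militarization")])
conn.executemany("INSERT INTO category_subject VALUES (?, ?)",
                 [(1, "ICV-WAR"), (1, "ICV-PEACE"),
                  (2, "ICV-WAR"), (2, "ICV-MIL")])


def subject_codes(category_name):
    """Subject codes to put in syndicated output for an article filed
    under `category_name`, sorted; empty list if the category is unknown."""
    rows = conn.execute(
        """SELECT cs.subject_code
           FROM category c
           JOIN category_subject cs ON cs.category_id = c.id
           WHERE c.name = ?
           ORDER BY cs.subject_code""",
        (category_name,))
    return [row[0] for row in rows]
```

The point of the join table is that renaming or translating a category on the frontend never touches the codes that go out in the feeds.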
>
>>IMCs already use a controlled vocabulary, it's just that
>>each one has its own. We could work on making a standard
>>"controlled vocabulary" to cover broad categories. We
>>could choose just a subset of a larger controlled
>>vocabulary using its standardized reference system and
>>map them onto multiple languages and different IMCs'
>>preferences about how to describe the term.
>
>
> I'd suggest doing this slightly differently: rather than
> coming up with a definitive list to start with, I'd
> suggest starting with what is being used.
I agree this is the way to go.
> Perhaps on a wiki page we could start with a list of
> dc:subjects that different sites use. Then we could look
> at how they relate to each other. From this it might be
> possible to come up with some mapping, like "foo" on uk
> sites equals "bar" on nyc site...
That's a good idea -- I started on the IMCStandardCategorization wiki
page with a list of the broad categories for IPTC News Codes, Dewey
Decimal Classification, and the Library of Congress Classification
Numbers. We should put up a list of the categories for every IMC
somewhere -- I'm not sure that the wiki is the friendliest interface for
finding duplicates, etc., though.
http://docs.indymedia.org/view/Devel/ImcStandardCategorization
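If the per-IMC category lists were collected somewhere machine-readable, a small script could do the first pass at finding duplicates instead of eyeballing the wiki. A sketch, assuming a hypothetical `{imc: [category, ...]}` input; it only catches names that differ in case or accents, not synonyms like "guerra"/"war":

```python
import unicodedata
from collections import defaultdict


def normalize(name):
    """Fold case, strip accents and surrounding whitespace so that e.g.
    "Militarización" and "militarizacion" compare equal."""
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    return "".join(ch for ch in decomposed
                   if not unicodedata.combining(ch)).strip()


def find_duplicates(site_categories):
    """Group category names from different IMCs by normalized form and
    keep only the groups used by more than one IMC.
    Returns {normalized_name: [(imc, original_name), ...]}."""
    groups = defaultdict(list)
    for imc, categories in site_categories.items():
        for cat in categories:
            groups[normalize(cat)].append((imc, cat))
    return {key: members for key, members in groups.items()
            if len({imc for imc, _ in members}) > 1}
```

Anything it reports is only a candidate; deciding which codes each name maps to would still be done by hand.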
>
> Chris