[imc-sf-active] [imc-sfa-servers] maintaining popelin - the server hosting your imc

Daniel P dannyp at indypgh.org
Thu Mar 3 03:31:48 PST 2011


On Wed, Mar 2, 2011 at 5:11 PM, mark burdett <mark at indymedia.org> wrote:
> Having previously done a sf-active to drupal migration I'm not sure it's
> worth the extra steps of this intermediate conversion (yet another
> schema to have to know, and more insert into select from queries to
> write).  Maybe you just need me to document sf-active as I've been doing
> on IRC today w/ occam?  But if we do want or need this step then I'd ask
> what's the schema for this intermediate representation?

Detailed documentation of the schema would no doubt be valuable, but I
think we can save some work here.  I wasn't thinking of another SQL
schema; I was thinking of XML.  There are decent XML schemas already
defined, with existing tools that support them.  I was looking at
NITF (www.nitf.org) because I had a passing familiarity with it, and
then a bit of digging revealed BlogML
(http://blogml.org/)[1].  People could then leverage existing BlogML
import tools to get to whatever CMS they want.  If there is data that
we can't massage into BlogML, we can export it separately, and those
so inclined can write tools to iterate over our extra metadata and
modify their imported documents to reflect what they want from it.
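
As a rough sketch of what the importing side might look like in a
language other than PHP, the snippet below reads a simplified
BlogML-like export and merges in a separate metadata file.  The
element names, the sample posts, the EXTRA sidecar and the "display"
values are all made up for illustration; real BlogML uses an XML
namespace and more attributes, so check the actual schema first.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified BlogML-like export. Real BlogML declares an XML
# namespace and carries more attributes than are shown here.
EXPORT = """\
<blog>
  <posts>
    <post id="101" date-created="2011-03-01T12:00:00">
      <title>Port workers rally</title>
      <content>Full article text.</content>
    </post>
    <post id="102" date-created="2011-03-02T09:30:00">
      <title>City council report</title>
      <content>Another article.</content>
    </post>
  </posts>
</blog>
"""

# Hypothetical sidecar: data that did not fit the XML schema, keyed by post id.
EXTRA = {"101": {"display": "local"}, "102": {"display": "hidden"}}


def load_posts(xml_text, extra):
    """Return one dict per <post>, merged with any sidecar metadata."""
    root = ET.fromstring(xml_text)
    posts = []
    for post in root.iter("post"):
        post_id = post.get("id")
        record = {
            "id": post_id,
            "created": post.get("date-created"),
            "title": post.findtext("title", default=""),
            "content": post.findtext("content", default=""),
        }
        # Overlay whatever extra metadata exists for this post, if any.
        record.update(extra.get(post_id, {}))
        posts.append(record)
    return posts


posts = load_posts(EXPORT, EXTRA)
```

An importer for any particular CMS would then loop over posts and
create whatever native objects it needs.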

The other advantage of this approach is that it lets us use PHP to
generate the XML.  I had the most luck using sf-active's own code to
give me an article as a PHP object, and then programmatically
creating drupal nodes from it.  People importing into systems that
aren't written in PHP can't do that, so I figured that writing a PHP
tool that leverages existing sf-active code to create a more portable
serialized representation of the object (in XML) would let others get
the benefit of using sf-active's code to parse sf-active's schema.
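
To show the shape of such an export step, here is a minimal sketch.
It is in Python only for illustration; the tool described above would
be PHP calling sf-active's own classes.  The Article class, its field
names and the output element names are assumptions, not sf-active's
or BlogML's actual definitions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical stand-in for the article object sf-active's code returns."""
    id: int
    heading: str
    body: str
    created: str


def article_to_xml(article):
    """Serialize one article to a BlogML-like <post> element (names assumed)."""
    post = ET.Element(
        "post", {"id": str(article.id), "date-created": article.created}
    )
    ET.SubElement(post, "title").text = article.heading
    # ElementTree escapes markup characters in the body when serializing.
    ET.SubElement(post, "content").text = article.body
    return ET.tostring(post, encoding="unicode")


xml_text = article_to_xml(
    Article(101, "Port workers rally", "Text with <b>markup</b> & more.",
            "2011-03-01T12:00:00")
)
```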

Specifically, I found it easier to use PHP to parse the schema
because getting a representation of a single article is not easy in
SQL against sf-active's database.  It's been a while since I looked,
but I believe you need to join webcast onto itself (and in doing so
you get photos as well as comments), and figuring out what is hidden
or not is also a bit tricky (or maybe I just couldn't keep track of
the several possible single-character values for the relevant
field).
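
For a sense of what that self-join looks like, here is a small sketch
against an in-memory SQLite table.  Only the table name webcast comes
from the description above; the columns (id, parent_id, heading,
display) and the 't'/'f' visibility codes are guesses for
illustration, not sf-active's documented schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout: articles have parent_id 0; comments and attachments
# point at their article through parent_id. 't' = shown, 'f' = hidden.
conn.executescript("""
CREATE TABLE webcast (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER,
    heading   TEXT,
    display   TEXT
);
INSERT INTO webcast VALUES
    (1, 0, 'Article A',      't'),
    (2, 1, 'Comment on A',   't'),
    (3, 1, 'Photo for A',    't'),
    (4, 1, 'Hidden comment', 'f'),
    (5, 0, 'Hidden article', 'f');
""")

# Join webcast onto itself: each visible article with its visible children.
# Comments and photos come back mixed together, as noted above.
rows = conn.execute("""
SELECT parent.id, parent.heading, child.id, child.heading
FROM webcast AS parent
LEFT JOIN webcast AS child
       ON child.parent_id = parent.id AND child.display = 't'
WHERE parent.parent_id = 0 AND parent.display = 't'
ORDER BY parent.id, child.id
""").fetchall()
```

Even in this toy version the article fields repeat on every row, and
the result still has to be regrouped into one object per article.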

I think that aiming for our tools to create an interchange format that
is a standard, or at least heavily based on a standard, is the right
thing to do because inevitably not everyone is going to want to import
their site to the same target, and a lot of people would find it
easier to iterate over a set of objects than over a set of tables.
One thing I've been stuck on in local conversations with others at
Pittsburgh IMC is that there's a lot of contention over which CMS
should be used (i.e. drupal or wordpress), so writing tools that are
useful for importing into anything appeals to me.

Daniel

[1] the name BlogML is a bit unfortunate, because I doubt any/many of
us want people to think of IMC sites as "blogs" (given that term
largely refers to "opinion" and not "news" sites) but hey,
schema-wise, I think they're pretty close.

