Storage and documentation of the data
Since 1997, over 140 hours of digital (DAT) recordings have been collected from
members of the four target communities. They include spontaneous texts of wide
stylistic variation (monologues, dialogues, interviews and the like). Recordings
are made in an informal atmosphere, with one or two speakers taking part in each
session. Themes are not restricted; the only criterion is that the subject should
be interesting to the speaker. In most cases the talk centres on everyday life,
the speaker's biography, and historical events that had an impact on the life of
the interviewed person (collectivization, the Second World War, the political
reforms of the 1990s). The data may therefore be used not only for linguistic
investigations but for ethnographic, historical and social studies as well.
To date, a large part of the raw audio data has been processed, segmented and
stored on CDs. Long recordings are presented as a series of short tracks. Each
track lasts about 2-4 minutes and contains a relatively self-contained context.
A CD of this type allows extracts of any length to be played back without pauses
between tracks, and provides quick and easy reference to a particular position
in the text and in the corpus.
Another possibility is an electronic presentation of the corpora based on the
commercial software Pinnacle Liquid Edition, which we are currently adapting to
the needs of corpus linguistics together with Professor Christian Sappok and in
collaboration with the Fachhochschule Bochum. The software provides excellent
facilities for combining audio files, video data (if any), written texts,
graphics (for example pitch contours, spectrograms, tables, maps) and
accompanying materials on a DVD with a user-friendly interface. General
guidelines are currently being worked out, and a number of experimental DVDs
have been burned to test procedures and standards of data presentation. The goal
of this work is the publication of a large body of data with accompanying
materials of various types.