Haute Ecole Léonard De Vinci
Pôle Louvain
M.E.T.S.
Doctorat en Traductologie

Call for papers

Genre- and Register-related Text and Discourse Features
in Multilingual Corpora


International conference organized by
The Linguistic Society of Belgium and Institut Libre Marie Haps - Brussels
11-12 January 2013

The international conference on Genre- and Register-related Text and Discourse Features in Multilingual Corpora aims to answer the call for the compilation of genre- and register-controlled multilingual corpora as well as the cross-linguistic annotation and analysis of genre- and register-related text and discourse features.

In his seminal book Seeing through Multilingual Corpora, Johansson (2007: 304) has convincingly argued that “it is desirable to extend contrastive studies by taking into account the variation across registers within languages”. Contrastive linguists are now increasingly aware of the need to revisit and refine the early-day corpus-based studies which were chiefly based on literary and/or news corpora and often tended to consider the languages under investigation as monolithic entities. New register- or genre-controlled multilingual corpora have recently been compiled for a number of language pairs (e.g. Hansen-Schirra et al. 2007).

This new interest in variation in contrastive linguistics has opened up research paths that had remained relatively under-explored in the field. One of them is the cross-linguistic analysis of genre- or register-sensitive text and discourse features. Biber et al.’s (1998: 106) statement that “we (…) know surprisingly little about discourse similarities or differences across texts and registers” still holds today, especially in the fields of contrastive and translation studies. Genre- and register-related discourse conventions may vary across languages: the same rhetorical purpose (e.g. arguing, persuading or prescribing) and the same discourse function (e.g. politeness strategies) can be realized by various forms in different languages. Such variations can be examined in comparable corpora, viz. corpora of original texts in two or more languages. Translation strategies may also have an impact on rhetorical discourse structures, as shown by studies based on translation corpora (see e.g. Da Cunha & Iruskieta 2010).

The analysis of text and discourse features often requires using annotated corpora. It is an established fact that the grammatical and syntactic annotation of multilingual corpora (e.g. lemmatization, part-of-speech tagging, parsing) is a difficult task. The pragmatic annotation of genre- and register-related text and discourse features is even more challenging as “discourse characteristics are more difficult to identify and analyze than lower-level lexical or grammatical features” (Biber, Conrad & Reppen 1998: 106). The difficulty at this higher level is compounded by many issues, such as the automation of the data collection and retrieval phases. Encouragingly, however, the field has witnessed major advances in recent years (see e.g. Taboada et al. 2011 on the automatic extraction of sentiment from text).

The conference aims to bring together (a) researchers working on the compilation and the linguistic, especially pragmatic, annotation of genre- and register-controlled multilingual corpora as well as (b) specialists in corpus-based contrastive linguistics and translation studies investigating genre- and/or register-related text and discourse features from a bilingual or multilingual perspective. Proposals dealing with the benefits of corpus-based studies for translator training are also welcome, especially those that examine discourse features across languages.

The conference will mainly explore four areas. Topics to be discussed include, but are not restricted to, the following:

  • Lexical variation across genres and registers;
  • Text- and discourse-structuring features;
  • Discourse markers;
  • Genre- and register-controlled multilingual corpus compilation projects.

 

1.    Lexical variation across genres and registers


We invite proposals establishing a connection between lexico-grammatical choices and genres/registers. Within this area, the following questions are of particular interest:

  • How do lexemes, morphemes and multiword expressions (collocations, lexical bundles, formulaic sequences and other prefabs) which belong to general lexis contribute to the (automatic) identification of genres and registers? To what extent is genre-/register-specific lexis (word uses) similar in languages A and B in comparable and translation corpora (or in non-translated and translated language)? Does the lexical level provide any evidence for (or against) the claim that translated language is “characterized by specific, identifiable features” (Olohan 2004: 90)?
  • How does the degree of fixedness of multiword expressions vary across genres/registers in the languages under investigation? Are the data extracted from corpora of non-translated language significantly different from those yielded by corpora of translated language (e.g. more or less fixedness, more or fewer formulaic sequences)?
  • How are keywords and key clusters (see e.g. Baker 2006) related to genres and registers across languages? Lexical preferences may vary across languages, not only at the level of the general lexis but also at the level of subject-matter-specific keywords. For example, an authority typically labelled as judge in some legal genres in language A may be labelled as court in comparable texts in language B. Assuming that such lexical shifts are possible, should the notion of keyness be defined at an abstract semantic level (e.g. ‘authority enabled to pass judgments’) to be applicable to multilingual corpora?
  • What are the challenges and the benefits of colligational analysis (Hoey 2005) in establishing a relationship between a lexical item and its preferred/dispreferred grammatical function for genre and register identification across languages?

 

2.    Text- and discourse-structuring features


There are two major text segmentation methods at the level of macrostructures (Biber, Connor & Upton 2007: 13): bottom-up approaches, i.e. approaches starting from the identification of segments (macrostructures) in all texts in the corpus, and top-down approaches, which start from a predetermined set of discourse units, generally defined in terms of rhetorical moves, a move being defined as a discourse-level segment performing a specific communicative function (Swales 1990, 2002). The conference is open to both approaches. Proposals dealing with the question of how to combine bottom-up and top-down methods are especially welcome. Special attention will be paid to the following issues:

Bottom-up approaches

 

  • How is information packaging (topic-focus structure) at the sentence level related to text segmentation into higher-level macrostructures? What types of information packaging are relevant to genre and register identification? How relevant is information packaging to rhetorical moves analysis?
  • What lexico-syntactic features should be taken into account in text segmentation (e.g. anaphoric chains, sequence-framing adverbials, sequence-specific keywords)? How do these features vary across genres and registers, and across languages?
  • Multidimensional analysis (Biber 1995) based on the statistical analysis of co-occurring linguistic patterns (verb tenses, first-person pronouns, voices, that-clauses) has proved to be highly promising in the investigation of variation across registers (see e.g. Gozdz-Roszkowski 2011 for a recent application to legal discourse). It also offers an interesting perspective on rhetorical moves (see below). However, most of the studies within that framework are concerned with monolingual corpora. What possible linguistic shifts are found to occur in the languages under investigation when the method is applied to multilingual corpora?

 

Top-down approaches


The analysis in terms of rhetorical moves, e.g. claim-justification, problem-solution, goal-means, requires developing an exhaustive repertory/nomenclature and operational definitions of the discourse moves (discourse units) which are typical of a certain genre and register. This repertory is then applied to a corpus in order to segment it into discourse units (Biber et al. 2007: 17). The method raises the following questions:

  • Which rhetorical moves are specific to given genres and registers? What moves are common to some (or all) of them across languages?
  • Which moves are language-specific within the same genre and register?
  • Which linguistic features are move-specific across languages? What linguistic features are common to some (or all) of them?
  • How is rhetorical-moves analysis related to text summarization?

 

3.    Pragmatic features in terms of discourse markers


The analysis in terms of pragmatic (discourse) markers is closely related to the preceding issue as it contributes to refining the detection of document structure. The question of what is and what is not a pragmatic (discourse) marker is less obvious than it seems, the reason being that some of the pragmatic categories listed below are products of pragmatic inferencing. Contributions that look into the use of explicit or implicit markers across languages and across genres/registers are particularly welcome. Specifically, the markers should express:

  • deixis and subjectivity;
  • epistemic modality (the speaker’s commitment to the truth);
  • evidentiality;
  • the speaker’s (or other discourse participant’s) attitude (value judgments, attenuation, intensification);
  • argumentative charge of statements;
  • speech acts.

 

4.    New corpus compilation and annotation projects (and preliminary findings)


We also welcome corpus compilation reports, with special emphasis on:

  • Issues relating to the compilation and linguistic (mainly pragmatic) annotation of multilingual genre- and/or register-controlled corpora of written texts;
  • Issues relating to the compilation and linguistic (mainly pragmatic) annotation of audiovisual translation corpora (e.g. subtitling, dubbing, voice-over).


References

Adel, A. & Reppen, R. (eds) (2008). Corpora and Discourse: The Challenges of Different Settings. Amsterdam: Benjamins.
Baker, P. (2006). Using Corpora in Discourse Analysis. London: Continuum.
Biber, D. (1995). Variation across Speech and Writing. Cambridge: CUP.
Biber, D., Connor, U. & Upton, T.A. (2007). Discourse on the Move: Using Corpus Analysis to Describe Discourse Structure. Amsterdam: Benjamins.
Biber, D., Conrad, S. & Reppen, R. (1998). Corpus Linguistics: Investigating Language Structure and Use. Cambridge: CUP.
Da Cunha, I. & Iruskieta, M. (2010). Comparing rhetorical structures in different languages: The influence of translation strategies. Discourse Studies 12: 563-598.
Gozdz-Roszkowski, S. (2011). Patterns of Linguistic Variation in American Legal English. Bern: Peter Lang.
Hansen-Schirra, S., Neumann, S. & Steiner, E. (2007). Cohesive explicitness and explicitation in an English-German translation corpus. Languages in Contrast 7(2): 241-265.
Hoey, M. (2005). Lexical Priming: A New Theory of Words and Language. London: Routledge.
Johansson, S. (2007). Seeing through Multilingual Corpora. On the Use of Corpora in Contrastive Studies. Amsterdam: Benjamins.
Olohan, M. (2004). Introducing Corpora in Translation Studies. London: Routledge.
Swales, J. (1990). Genre Analysis. Cambridge: CUP.
Swales, J. (2002). Research Genres: Exploration and Applications. Cambridge: CUP.
Taboada, M., Brooke, J., Tofiloski, M., Voll, K. & Stede, M. (2011). Lexicon-based methods for sentiment analysis. Computational Linguistics 37(2): 267-307.