# StyloMetrix Tutorial

This notebook presents full instructions for using StyloMetrix with examples.

J. Busse, 2025-11-09: This is a modified version of
* https://github.com/ZILiAT-NASK/StyloMetrix/blob/main/examples/Tutorial.ipynb

Our corpus is in German (DE), whereas the original tutorial uses English (EN). We have of course also installed the English `lg` model:

```
python -m spacy download en_core_web_lg --direct
```

The problem: when we start StyloMetrix with `en` in this installation:

```
stylo = sm.StyloMetrix('en') # define language, one of ('de','en', 'pl', 'ru', 'ukr')
```

we get the error message

> [E050] Can't find model 'en_core_web_trf'. It doesn't seem to be a Python package or a valid path to a data directory.

Cause: we installed the `lg` model, but not `trf`. Installing `trf` would fix this, but that installation is *huge* and not wanted for the dsci-lab.

However, we do not need to solve this problem at all, because we are mainly interested in German. This tutorial has therefore been switched to German examples.


## 1. Quick start

StyloMetrix is a tool for stylometric analysis of texts. It is based on spaCy and supports five languages. For the tool to work properly, the corresponding language model is required. Below is the list of supported languages and their models:
- English [en_core_web_trf](https://spacy.io/models/en)
- German [de_core_news_lg](https://spacy.io/models/de)
- Polish [pl_nask-0.0.7](http://mozart.ipipan.waw.pl/~rtuora/spacy/)
- Russian [ru_core_news_lg](https://spacy.io/models/ru)
- Ukrainian [uk_core_web_trf](https://spacy.io/models/uk)


The model must be downloaded and installed in the environment where SM will be used. StyloMetrix is installed using `pip install stylo_metrix`.
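
Because spaCy models are installed as ordinary Python packages, a quick check before constructing `sm.StyloMetrix` can help to avoid a "can't find model" error. The following is only a minimal sketch; the helper name `model_installed` is hypothetical and not part of StyloMetrix or spaCy.

```python
import importlib.util


def model_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None


# Model names taken from the list above; the result depends on the local environment.
for model in ("de_core_news_lg", "en_core_web_trf"):
    status = "installed" if model_installed(model) else "missing"
    print(f"{model}: {status}")
```

If a model is reported as missing, it can be installed with `python -m spacy download <model name>`.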

The following shows how to quickly calculate metrics for several texts. Everything presented in this tutorial applies to all supported languages.

In [1]:
# import library
import stylo_metrix as sm

In [2]:
# example texts

texts = [ 
    # jbusse
    """Im Rahmen von Forschungs und Lehre müssen Leistungen individuell zuschreibbar sein. 
Aber auch moderne generative KI kann inzwischen Texte generieren. 
Problem: Bei einem einem “gut formulierten” Text ist es sehr schwer zu unterscheiden,
*   ob der Text ein von der KI generierter Text ist, den ein Mensch in Auftrag gegeben hat, oder
*   ob der Text ein vom Menschen selbst formulierter Rohtext ist, der von der KI lektoriert wurde.""",
    """Unser Prüfungsrecht verlangt, dass eine Leistung individuell zugeschrieben werden kann. 
Die KI-Leitlinie Bayern verlangt, dass KI-generierte Inhalte eindeutig gekennzeichnet werden. 
Es stellt sich die Frage, wie man in Studienarbeiten und Bachelorabeiten, 
aber auch Präsentationen etc. die Beiträge von KI auch typografisch kenntlich machen kann. """,
    # ChatGPT
    """In der Forschung und Lehre muss man wissen, wer etwas geschrieben hat. 
Aber heutzutage kann auch KI Texte erstellen.
Das Problem: Wenn ein Text gut geschrieben ist, kann man kaum erkennen, ob
* der Text von einer KI stammt, die jemand beauftragt hat, oder
* der Text zuerst von einem Menschen geschrieben und dann von der KI verbessert wurde.""",
    """Das Prüfungsrecht verlangt, dass man genau erkennen kann, wer eine Leistung erbracht hat.
Die KI-Leitlinie Bayern fordert, dass KI-generierte Inhalte klar gekennzeichnet werden.
Die Frage ist nun, wie man in Studienarbeiten, Bachelorarbeiten oder auch in Präsentationen zeigen kann, 
welche Teile von der KI stammen – auch durch die Gestaltung des Textes."""
    ]


In [3]:
# count metrics
stylo = sm.StyloMetrix('de') # define language, one of ('de','en', 'pl', 'ru', 'ukr')
metrics = stylo.transform(texts)
metrics

100%|████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 35.68it/s]


Unnamed: 0,text,G_N,G_ADJ,G_ADV,G_V,G_VMOD,G_NUM,G_PART,G_ADP,G_CONJ,...,L_STOP,L_TCCT1,L_TCCT5,DESC_PRON_VOC,GR_UPPER,GR_EMOT,GR_LENNY,GR_MENTION,GR_HASH,GR_LINK
0,Im Rahmen von Forschungs und Lehre müssen Leis...,0.2,0.058824,0.094118,0.141176,0.023529,0.0,0.011765,0.082353,0.058824,...,0.435294,0.070588,0.129412,0.0,0.035294,0.0,0.0,0.0,0.0,0.0
1,"Unser Prüfungsrecht verlangt, dass eine Leistu...",0.203704,0.018519,0.12963,0.185185,0.037037,0.0,0.0,0.037037,0.092593,...,0.407407,0.055556,0.111111,0.0,0.018519,0.0,0.0,0.0,0.0,0.0
2,"In der Forschung und Lehre muss man wissen, we...",0.166667,0.0,0.083333,0.222222,0.041667,0.0,0.0,0.055556,0.083333,...,0.5,0.083333,0.138889,0.0,0.041667,0.0,0.0,0.0,0.0,0.0
3,"Das Prüfungsrecht verlangt, dass man genau erk...",0.206349,0.015873,0.079365,0.190476,0.031746,0.0,0.0,0.063492,0.063492,...,0.428571,0.095238,0.142857,0.0,0.015873,0.0,0.0,0.0,0.0,0.0


You can also calculate metrics for a single string.

In [4]:
# You can provide string or list of strings to transform method
metrics_for_one = stylo.transform(texts[0])
metrics_for_one

100%|████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 30.32it/s]


Unnamed: 0,text,G_N,G_ADJ,G_ADV,G_V,G_VMOD,G_NUM,G_PART,G_ADP,G_CONJ,...,L_STOP,L_TCCT1,L_TCCT5,DESC_PRON_VOC,GR_UPPER,GR_EMOT,GR_LENNY,GR_MENTION,GR_HASH,GR_LINK
0,Im Rahmen von Forschungs und Lehre müssen Leis...,0.2,0.058824,0.094118,0.141176,0.023529,0.0,0.011765,0.082353,0.058824,...,0.435294,0.070588,0.129412,0.0,0.035294,0.0,0.0,0.0,0.0,0.0


## 2. Create StyloMetrix instance

This chapter describes in detail the parameters of the `sm.StyloMetrix` class.

- The basis for building an SM object is the language of the texts to be processed. It is passed as the parameter **`lang`** of type `string`, which can be one of:
    -  `['english', 'angielski', 'en', 'eng']` for English,
    -  `['polish', 'polski', 'pl', 'pol']` for Polish,
    -  `['russian', 'rosyjski', 'ru']` for Russian,
    -  `['ukrainian', 'ukraiński', 'ukr']` for Ukrainian,
    -  `'de'` for German, as used throughout this tutorial.

- An important parameter is **`debug`**, which takes boolean values. When set to `True`, `transform` returns two DataFrame objects: the first holds the calculated metric values, the second lists the tokens that each metric counted.

In [5]:
stylo = sm.StyloMetrix('de', debug=True) 
metrics, debug = stylo.transform(texts)
debug

100%|████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 19.69it/s]


Unnamed: 0,text,G_N,G_ADJ,G_ADV,G_V,G_VMOD,G_NUM,G_PART,G_ADP,G_CONJ,...,L_STOP,L_TCCT1,L_TCCT5,DESC_PRON_VOC,GR_UPPER,GR_EMOT,GR_LENNY,GR_MENTION,GR_HASH,GR_LINK
0,Im Rahmen von Forschungs und Lehre müssen Leis...,"[Rahmen, Forschungs, Lehre, Leistungen, KI, Te...","[moderne, generative, formulierten, generierte...","[individuell, zuschreibbar, auch, inzwischen, ...","[müssen, sein, kann, generieren, ist, untersch...","[müssen, kann]",[],[zu],"[Im, von, Bei, von, in, vom, von]","[und, Aber, ob, oder, ob]",...,"[Im, von, und, müssen, sein, Aber, auch, kann,...",[der],"[der, Text]",[],"[KI, KI, KI]",[],[],[],[],[]
1,"Unser Prüfungsrecht verlangt, dass eine Leistu...","[Prüfungsrecht, Leistung, KI-Leitlinie, Bayern...",[KI-generierte],"[individuell, eindeutig, auch, etc., auch, typ...","[verlangt, zugeschrieben, werden, kann, verlan...","[kann, kann]",[],[],"[in, von]","[dass, dass, wie, und, aber]",...,"[Unser, dass, eine, werden, kann, Die, dass, w...",[\n],"[\n, der]",[],[KI],[],[],[],[],[]
2,"In der Forschung und Lehre muss man wissen, we...","[Forschung, Lehre, KI, Texte, Problem, Text, T...",[],"[heutzutage, auch, gut, kaum, zuerst, dann]","[muss, wissen, geschrieben, hat, kann, erstell...","[muss, kann, kann]",[],[],"[In, von, von, von]","[und, Aber, Wenn, ob, oder, und]",...,"[In, der, und, muss, man, wer, etwas, hat, Abe...",[der],"[der, \n]",[],"[KI, KI, KI]",[],[],[],[],[]
3,"Das Prüfungsrecht verlangt, dass man genau erk...","[Prüfungsrecht, Leistung, KI-Leitlinie, Bayern...",[KI-generierte],"[genau, klar, nun, auch, auch]","[verlangt, erkennen, kann, erbracht, hat, ford...","[kann, kann]",[],[],"[in, in, von, durch]","[dass, dass, wie, oder]",...,"[Das, dass, man, kann, wer, eine, hat, Die, da...",[der],"[der, \n]",[],[KI],[],[],[],[],[]
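
Comparing the debug output with the metric values from the quick start suggests that, for these metrics, the value is the number of matched tokens divided by the total number of tokens. The sketch below checks this for text 0. The token count of 85 is an assumption inferred from the ratios and is not printed anywhere in the output, and other metrics may be normalised differently.

```python
# Token lists copied from the debug output for text 0.
debug_row = {
    "G_VMOD": ["müssen", "kann"],
    "G_PART": ["zu"],
    "G_ADP": ["Im", "von", "Bei", "von", "in", "vom", "von"],
    "GR_UPPER": ["KI", "KI", "KI"],
}

# Metric values copied from the metrics output for the same text.
reported = {
    "G_VMOD": 0.023529,
    "G_PART": 0.011765,
    "G_ADP": 0.082353,
    "GR_UPPER": 0.035294,
}

# Assumption: text 0 has 85 tokens (inferred, not taken from the library).
N_TOKENS = 85


def share(tokens, n_tokens):
    """Fraction of all tokens that a metric matched."""
    return len(tokens) / n_tokens


for code, tokens in debug_row.items():
    print(code, round(share(tokens, N_TOKENS), 6), reported[code])
```

For each of the four codes, the computed share and the reported value agree to six decimal places.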


- To save the results automatically, set the **`save_path`** parameter. It takes a `string` with the path to an existing directory, in which the DataFrames are saved as CSV files.

In [7]:
path = '.'
stylo = sm.StyloMetrix('de', debug=True, save_path=path)

stylo.transform(texts[:2])

100%|████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 31.60it/s]

File saved in location: ./sm_output4.csv
File saved in location: ./sm_debug4.csv





(                                                text       G_N     G_ADJ  \
 0  Im Rahmen von Forschungs und Lehre müssen Leis...  0.200000  0.058824   
 1  Unser Prüfungsrecht verlangt, dass eine Leistu...  0.203704  0.018519   
 
       G_ADV       G_V    G_VMOD  G_NUM    G_PART     G_ADP    G_CONJ  ...  \
 0  0.094118  0.141176  0.023529    0.0  0.011765  0.082353  0.058824  ...   
 1  0.129630  0.185185  0.037037    0.0  0.000000  0.037037  0.092593  ...   
 
      L_STOP   L_TCCT1   L_TCCT5  DESC_PRON_VOC  GR_UPPER  GR_EMOT  GR_LENNY  \
 0  0.435294  0.070588  0.129412            0.0  0.035294      0.0       0.0   
 1  0.407407  0.055556  0.111111            0.0  0.018519      0.0       0.0   
 
    GR_MENTION  GR_HASH  GR_LINK  
 0         0.0      0.0      0.0  
 1         0.0      0.0      0.0  
 
 [2 rows x 169 columns],
                                                 text  \
 0  Im Rahmen von Forschungs und Lehre müssen Leis...   
 1  Unser Prüfungsrecht verlangt, dass eine
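
The saved files can be read back later with the standard library. The sketch below assumes the naming scheme seen in the log lines (`sm_output<N>.csv`, with `<N>` growing from run to run) and UTF-8 encoding; both are inferred from this notebook rather than documented behaviour. The helper names `latest_output` and `read_rows` are hypothetical.

```python
import csv
import re
from pathlib import Path


def latest_output(directory, prefix="sm_output"):
    """Return the path of the highest-numbered '<prefix><N>.csv' file, or None."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)\.csv$")
    numbered = []
    for path in Path(directory).iterdir():
        match = pattern.match(path.name)
        if match:  # skips '<prefix><N>_temp.csv' and unrelated files
            numbered.append((int(match.group(1)), path))
    if not numbered:
        return None
    return max(numbered, key=lambda item: item[0])[1]


def read_rows(csv_path):
    """Load a saved result file as a list of dicts, one per text."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


newest = latest_output(".")
if newest is not None:
    rows = read_rows(newest)
    print(newest.name, len(rows))
```

Passing `prefix="sm_debug"` would select the matching debug file instead.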

- If you are analyzing many long documents and are concerned about processing time or interruptions, you can define intermediate saving points, so that temporary files are written as the computation progresses. If the analysis completes without issues, these temporary files are deleted automatically. If it is interrupted, the temporary files still contain the results for at least part of your documents. Set the `save_step` parameter (integer) to save results after every `save_step` processed documents.

In [8]:
path = '.'
stylo = sm.StyloMetrix('de', debug=True, save_path=path, save_step=2)

stylo.transform(texts)

100%|████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 28.16it/s]

File saved in location: ./sm_output5_temp.csv
File saved in location: ./sm_debug5_temp.csv
File saved in location: ./sm_output5_temp.csv
File saved in location: ./sm_debug5_temp.csv
File saved in location: ./sm_output5.csv
File saved in location: ./sm_debug5.csv





(                                                text       G_N     G_ADJ  \
 0  Im Rahmen von Forschungs und Lehre müssen Leis...  0.200000  0.058824   
 1  Unser Prüfungsrecht verlangt, dass eine Leistu...  0.203704  0.018519   
 2  In der Forschung und Lehre muss man wissen, we...  0.166667  0.000000   
 3  Das Prüfungsrecht verlangt, dass man genau erk...  0.206349  0.015873   
 
       G_ADV       G_V    G_VMOD  G_NUM    G_PART     G_ADP    G_CONJ  ...  \
 0  0.094118  0.141176  0.023529    0.0  0.011765  0.082353  0.058824  ...   
 1  0.129630  0.185185  0.037037    0.0  0.000000  0.037037  0.092593  ...   
 2  0.083333  0.222222  0.041667    0.0  0.000000  0.055556  0.083333  ...   
 3  0.079365  0.190476  0.031746    0.0  0.000000  0.063492  0.063492  ...   
 
      L_STOP   L_TCCT1   L_TCCT5  DESC_PRON_VOC  GR_UPPER  GR_EMOT  GR_LENNY  \
 0  0.435294  0.070588  0.129412            0.0  0.035294      0.0       0.0   
 1  0.407407  0.055556  0.111111            0.0  0.018519    
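
The idea behind `save_step` can be illustrated independently of StyloMetrix. The following sketch is not the library's implementation; the function `process_with_checkpoints` and the `.temp` suffix are hypothetical. It writes a temporary file every `save_step` items and removes it once the final file has been written.

```python
import csv
import os
import tempfile


def process_with_checkpoints(items, compute, out_path, save_step):
    """Apply `compute` to every item and checkpoint every `save_step` items."""
    temp_path = out_path + ".temp"  # hypothetical naming, not the library's
    rows = []

    def write(path):
        with open(path, "w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(rows)

    for position, item in enumerate(items, start=1):
        rows.append(compute(item))
        if position % save_step == 0:
            write(temp_path)  # partial results survive an interruption

    write(out_path)  # final file with all rows
    if os.path.exists(temp_path):
        os.remove(temp_path)  # clean up after a successful run
    return rows


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "word_counts.csv")
    process_with_checkpoints(["ein Text", "noch ein Text"],
                             lambda text: [text, len(text.split())],
                             target, save_step=1)
    print(sorted(os.listdir(folder)))
```

If `compute` raises an exception partway through, the temporary file with the rows written so far stays on disk, which mirrors the behaviour described above.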

- It is also possible to set the parameter **`nlp`**, which denotes a custom spaCy model.
- By default, all metrics available for the given language are calculated. This can be changed with the parameters **`metrics`** and **`exceptions`**: assign the set of metrics you want to calculate to `metrics`, or assign the set of metrics you want to leave out to `exceptions`. The next section shows how to select metrics.

## 3. Selecting metrics

To select metrics, we first need to see what is available. **Please keep in mind that metrics might differ between languages.** The available metrics can be listed as follows:

In [9]:
metrics = sm.get_all_metrics('de')
print(metrics)

0  |  GrammaticalForms  |  G_N  |  Nouns  |  Substantive
1  |  GrammaticalForms  |  G_ADJ  |  Adjectives  |  Adjektive
2  |  GrammaticalForms  |  G_ADV  |  Adverbs  |  Adverbien
3  |  GrammaticalForms  |  G_V  |  Verbs  |  Verben
4  |  GrammaticalForms  |  G_VMOD  |  Modal verbs  |  Modalverben
5  |  GrammaticalForms  |  G_NUM  |  Numerals  |  Numerale
6  |  GrammaticalForms  |  G_PART  |  Particles  |  Partikeln
7  |  GrammaticalForms  |  G_ADP  |  Adpositions  |  Präpositionen
8  |  GrammaticalForms  |  G_CONJ  |  Conjunctions  |  Konjunktionen
9  |  GrammaticalForms  |  G_CCONJ  |  Coordinating conjunctions  |  Koordinierende Konjunktionen
10  |  GrammaticalForms  |  G_SCONJ  |  Conjunctions  |  Subordinierende Konjunktionen
11  |  GrammaticalForms  |  G_PRO  |  Pronouns  |  Pronomen
12  |  GrammaticalForms  |  G_PRO_PRS  |  Personal pronouns  |  Personalpronomen
13  |  GrammaticalForms  |  G_PRO_DEM  |  Demonstrative pronouns  |  Demonstrativpronomen
14  |  GrammaticalForms  |  G_P

The columns above are, from left to right:
- **order number** - `metrics` is a `MetricGroup` object, from which we can select individual metrics or slices, e.g. `metrics[0]` or `metrics[10:20]`.
- **category** - each metric is assigned to a subject category.
- **metric code** - a unique string for each metric; it is the column name in the DataFrame.
- **name** - the extended English name of the metric.
- **local name** - the name of the metric in the target language, here German.
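
The printed listing is easy to take apart if the fields are needed elsewhere, for example in a report. The following sketch assumes that every line has exactly five fields separated by `|`, as in the output above. `MetricInfo` and `parse_listing_line` are hypothetical helpers that work on the printed text only, not on the `MetricGroup` object itself.

```python
from typing import NamedTuple


class MetricInfo(NamedTuple):
    index: int
    category: str
    code: str
    name_en: str
    name_local: str


def parse_listing_line(line: str) -> MetricInfo:
    """Split one printed line such as '0  |  GrammaticalForms  |  G_N  |  ...'."""
    parts = [part.strip() for part in line.split("|")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    index, category, code, name_en, name_local = parts
    return MetricInfo(int(index), category, code, name_en, name_local)


info = parse_listing_line("4  |  GrammaticalForms  |  G_VMOD  |  Modal verbs  |  Modalverben")
print(info.code, "->", info.name_local)
```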

Metrics can also be accessed in other ways, e.g. we can choose all metrics from a given category:

In [10]:
# check available categories
categories = sm.get_all_categories('de')
print(categories)

[GrammaticalForms, Inflection, Syntactic, Punctuation, Lexical, Descriptive, Graphical]


In [11]:
# choose category
category = categories[2]

# preview what metrics are available within this category
# this is a MetricGroup like `metrics` above, so you can perform the same operations on it
category_metrics = category.get_metrics()
print(category_metrics)

0  |  Syntactic  |  SY_ADJD  |  Adjectives  |  Adjektive und Adverbien im Prädikativ
1  |  Syntactic  |  SY_PTKA  |  Particles with adjective or adverb  |  Partikeln mit Adjektiv oder Adverb
2  |  Syntactic  |  SY_APPR  |  Adpositions, left of the noun  |  Präpositionen, links des Nomens
3  |  Syntactic  |  SY_APPRART  |  Adpositions with fused articles  |  Präpositionen mit verschmolzenen Artikeln
4  |  Syntactic  |  SY_APPO  |  Postpositions  |  Postpositionen
5  |  Syntactic  |  SY_DO  |  Dative object  |  Dativobjekt
6  |  Syntactic  |  SY_OA  |  Accusative object  |  Akkusativobjekt
7  |  Syntactic  |  SY_S_DE  |  Words in declarative sentences  |  Wörter in Aussagesätzen
8  |  Syntactic  |  SY_S_EX  |  Words in exclamatory sentences  |  Wörter in Ausrufesätzen
9  |  Syntactic  |  SY_S_INT  |  Words in interrogative sentences  |  Wörter in Fragesätzen
10  |  Syntactic  |  SY_S_NEG  |  Words in negative sentences  |  Wörter in negativen Sätzen
11  |  Syntactic  |  SY_S_INF  |  Word

In [12]:
# example subset of metrics for analysis
metrics_to_analyse = metrics[60:100]
print(metrics_to_analyse)

0  |  Inflection  |  IN_POSS_3SG  |  Possessive pronouns in 3rd person singular  |  Possesivpronomen in der dritten Person Singular
1  |  Inflection  |  IN_POSS_1PL  |  Possessive pronouns in 1st person plural  |  Possesivpronomen in der ersten Person Plural
2  |  Inflection  |  IN_POSS_2PL  |  Possessive pronouns in 2nd person plural  |  Possesivpronomen in der zweiten Person Plural
3  |  Inflection  |  IN_POSS_3PL  |  Possessive pronouns in 3rd person plural  |  Possesivpronomen in der dritten Person Plural
4  |  Inflection  |  IN_ART_SG  |  Singular determiners  |  Artikel im Singular
5  |  Inflection  |  IN_ART_PL  |  Plural determiners  |  Artikel im Plural
6  |  Inflection  |  IN_ART_DEF_SG  |  Singular definite articles  |  Bestimmte Artikel im Singular
7  |  Inflection  |  IN_ART_DEF_PL  |  Plural definite articles  |  Bestimmte Artikel im Plural
8  |  Inflection  |  IN_ART_IND_SG  |  Singular indefinite articles  |  Unbestimmte Artikel im Singular
9  |  Inflection  |  IN_ART_M

The metrics or categories to use can be passed directly as a `MetricGroup` or as a list of strings. Each string is either the name of a category, to include or exclude the whole category, or the code of a metric, to include or exclude that single metric.

In [13]:
# choose metrics_to_analyse excluding syntactic using MetricGroup
stylo = sm.StyloMetrix('de', metrics=metrics_to_analyse, exceptions=category_metrics)
metrics = stylo.transform(texts)
metrics

100%|████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 73.24it/s]


Unnamed: 0,text,IN_POSS_3SG,IN_POSS_1PL,IN_POSS_2PL,IN_POSS_3PL,IN_ART_SG,IN_ART_PL,IN_ART_DEF_SG,IN_ART_DEF_PL,IN_ART_IND_SG,...,IN_V_PRES,IN_V_1SG,IN_V_2SG,IN_V_3SG,IN_V_1PL,IN_V_2PL,IN_V_3PL,IN_V_PP,IN_V_PAST_IMP,IN_V_PAST_IMP2
0,Im Rahmen von Forschungs und Lehre müssen Leis...,0.0,0.0,0.0,0.0,0.105882,0.0,0.047059,0.0,0.058824,...,0.070588,0.0,0.0,0.070588,0.0,0.0,0.011765,0.023529,0.0,0.0
1,"Unser Prüfungsrecht verlangt, dass eine Leistu...",0.0,0.018519,0.0,0.0,0.055556,0.018519,0.037037,0.018519,0.018519,...,0.055556,0.0,0.0,0.055556,0.0,0.0,0.0,0.074074,0.0,0.0
2,"In der Forschung und Lehre muss man wissen, we...",0.0,0.0,0.0,0.0,0.111111,0.0,0.069444,0.0,0.041667,...,0.097222,0.0,0.0,0.111111,0.0,0.0,0.0,0.069444,0.0,0.0
3,"Das Prüfungsrecht verlangt, dass man genau erk...",0.0,0.0,0.0,0.0,0.111111,0.0,0.095238,0.0,0.015873,...,0.095238,0.0,0.0,0.079365,0.0,0.0,0.015873,0.047619,0.0,0.0


In [14]:
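# select whole categories and exclude a single metric, using lists of strings
# remember to provide names as a list even if you include/exclude only one name
# note: "VerbTenses" and "VT_SHOULD_PROGRESSIVE" appear to be left over from the
# English version; "VerbTenses" is not among the German categories listed above,
# and the output below shows only SY_* columns
stylo = sm.StyloMetrix('de', metrics=["Syntactic", "VerbTenses"], 
                       exceptions=['VT_SHOULD_PROGRESSIVE'])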
metrics = stylo.transform(texts)
metrics

100%|████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 76.76it/s]


Unnamed: 0,text,SY_ADJD,SY_PTKA,SY_APPR,SY_APPRART,SY_APPO,SY_DO,SY_OA,SY_S_DE,SY_S_EX,...,SY_S_NEG,SY_S_INF,SY_S_MAN,SY_S_SUB,SY_S_SUB_ZU,SY_S_KOKOM,SY_S_COND1,SY_S_COND2,SY_S_COND3,SY_QUOT
0,Im Rahmen von Forschungs und Lehre müssen Leis...,0.047059,0.0,0.058824,0.023529,0.0,0.0,0.023529,0.976471,0.0,...,0.0,0.0,0.0,0.694118,0.0,0.0,0.0,0.0,0.0,0.0
1,"Unser Prüfungsrecht verlangt, dass eine Leistu...",0.074074,0.0,0.037037,0.0,0.0,0.0,0.018519,1.0,0.0,...,0.0,0.240741,0.518519,0.481481,0.0,0.0,0.0,0.0,0.0,0.0
2,"In der Forschung und Lehre muss man wissen, we...",0.013889,0.0,0.055556,0.0,0.0,0.0,0.041667,1.0,0.0,...,0.0,0.0,0.875,0.666667,0.0,0.0,0.0,0.0,0.0,0.0
3,"Das Prüfungsrecht verlangt, dass man genau erk...",0.031746,0.0,0.063492,0.0,0.0,0.0,0.015873,1.0,0.0,...,0.0,0.0,0.793651,0.47619,0.0,0.0,0.0,0.0,0.0,0.0
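
The interplay of `metrics` and `exceptions` can be pictured as set operations on metric codes. The sketch below is a toy model with a hand-written registry of a few codes from this notebook; `REGISTRY`, `expand` and `select` are hypothetical and do not reflect how StyloMetrix resolves names internally. In this sketch unknown names simply select nothing, which is an assumption.

```python
# Toy registry: metric code -> category, using a few codes from the listings above.
REGISTRY = {
    "G_N": "GrammaticalForms",
    "G_ADJ": "GrammaticalForms",
    "SY_ADJD": "Syntactic",
    "SY_PTKA": "Syntactic",
    "IN_POSS_1PL": "Inflection",
}


def expand(names):
    """Turn a list of category names and metric codes into a set of metric codes."""
    codes = set()
    for name in names:
        if name in REGISTRY:  # a single metric code
            codes.add(name)
        else:  # otherwise treat the string as a category name
            codes.update(code for code, cat in REGISTRY.items() if cat == name)
    return codes


def select(metrics=None, exceptions=None):
    """Metrics to compute: everything (or `metrics`) minus `exceptions`."""
    chosen = set(REGISTRY) if metrics is None else expand(metrics)
    chosen -= expand(exceptions or [])
    return [code for code in REGISTRY if code in chosen]  # keep registry order


print(select(metrics=["Syntactic", "VerbTenses"], exceptions=["VT_SHOULD_PROGRESSIVE"]))
print(select(exceptions=["Syntactic"]))
```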


- You can calculate the value of a single metric (provided as a one-element list),
- or you can provide a list of any available metrics (metric codes may also be given as strings) as well as groups of metrics.

In [15]:
stylo = sm.StyloMetrix('de', metrics=[metrics_to_analyse[1]])
metrics = stylo.transform(texts)
metrics

100%|████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 36.90it/s]


Unnamed: 0,text,IN_POSS_1PL
0,Im Rahmen von Forschungs und Lehre müssen Leis...,0.0
1,"Unser Prüfungsrecht verlangt, dass eine Leistu...",0.018519
2,"In der Forschung und Lehre muss man wissen, we...",0.0
3,"Das Prüfungsrecht verlangt, dass man genau erk...",0.0


In [16]:
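# note: the output below has no metric column, which suggests that 'SY_SIMILE'
# is not available for 'de'
stylo = sm.StyloMetrix('de', metrics=['SY_SIMILE'])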
metrics = stylo.transform(texts)
metrics

100%|████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 90.50it/s]


Unnamed: 0,text
0,Im Rahmen von Forschungs und Lehre müssen Leis...
1,"Unser Prüfungsrecht verlangt, dass eine Leistu..."
2,"In der Forschung und Lehre muss man wissen, we..."
3,"Das Prüfungsrecht verlangt, dass man genau erk..."


Groups of metrics can be added (concatenated) as well as subtracted (to remove the metrics of another group):

In [17]:
metrics = sm.get_all_metrics('de')
group1 = metrics[20:30]
group2 = metrics[50:70]
group3 = metrics[25:55]
final_group = group1 + group2 - group3
print(final_group)

0  |  GrammaticalForms  |  G_PRO_REFL  |  Reflexive pronouns  |  Reflexivpronomen
1  |  GrammaticalForms  |  G_PRO_REZ  |  Reciprocal pronouns  |  Reziprokpronomen
2  |  GrammaticalForms  |  G_PRO_UNPERS  |  Impersonal pronouns  |  Unpersönliche Pronomen
3  |  GrammaticalForms  |  G_PRO_ADV  |  Pronominal adverbs  |  Pronominaladverbien
4  |  GrammaticalForms  |  G_ART  |  Determiners  |  Artikel
5  |  Inflection  |  IN_PRO_1PL  |  Personal pronouns in 1st person plural  |  Personalpronomen in der ersten Person Plural
6  |  Inflection  |  IN_PRO_2PL  |  Personal pronouns in 2nd person plural  |  Personalpronomen in der zweiten Person Plural
7  |  Inflection  |  IN_PRO_3PL  |  Personal pronouns in 3rd person plural  |  Personalpronomen in der dritten Person Plural
8  |  Inflection  |  IN_POSS_1SG  |  Possessive pronouns in 1st person singular  |  Possesivpronomen in der ersten Person Singular
9  |  Inflection  |  IN_POSS_2SG  |  Possessive pronouns in 2nd person singular  |  Possesivpro
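
The result of the cell above can be reproduced with plain position lists. This is only a sketch of the arithmetic: it assumes that `+` concatenates in order and that `-` removes every metric that also occurs in the subtracted group, which is consistent with the printed listing but not taken from the library source.

```python
# Positions selected by each slice in the cell above.
group1 = list(range(20, 30))
group2 = list(range(50, 70))
group3 = set(range(25, 55))

# "+" concatenates, "-" drops every position that also occurs in group3.
final_positions = [i for i in group1 + group2 if i not in group3]
print(final_positions)
print(len(final_positions))
```

Positions 20 to 24 remain from `group1` and positions 55 to 69 from `group2`, 20 metrics in total. This matches the listing, where five `GrammaticalForms` entries are followed by `Inflection` entries.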

## 4. Creating new metrics
You can create a new metric by subclassing the **`Metric`** class. The `category`, `name_en` and `name_local` fields are required for proper operation. The category object must either be retrieved from the list returned by `sm.get_all_categories()` or be newly created. The computation itself is implemented in the `count(doc)` method.

In [18]:
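categories = sm.get_all_categories('de')
category = categories[5]  # "Descriptive" in the list printed above

class SAMPL2(sm.Metric):
    category = category
    name_en = "abc"      # English name
    name_local = "abc"   # local (German) name
    
    def count(doc):
        # placeholder logic: a constant value plus three tokens as debug info
        result = 0.1
        debug = [doc[2], doc[3], doc[4]]  # assumes the text has at least five tokens
        return result, debug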
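
The `count` method above returns a constant. A more realistic metric would usually return a share of tokens together with the tokens that were counted. The sketch below shows such a counting function on its own; `count_long_words` and `min_length` are hypothetical, and a plain list of strings stands in for the parsed document. In a real `Metric` subclass, `doc` would be a spaCy `Doc`, so the length check would use `token.text`.

```python
def count_long_words(doc, min_length=12):
    """Return (share of long tokens, list of those tokens)."""
    debug = [token for token in doc if len(token) >= min_length]
    result = len(debug) / len(doc) if doc else 0.0  # avoid division by zero
    return result, debug


# Stand-in for a parsed document: a plain list of token strings.
tokens = ["Unser", "Prüfungsrecht", "verlangt", "individuell", "zugeschrieben"]
value, matched = count_long_words(tokens)
print(value, matched)
```

Returning the matched tokens as the second element keeps the metric inspectable in the same way as the `debug` DataFrame shown in section 2.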

In [19]:
# create a new category - indicate the language to which the category belongs.
# (same as with get_all_metrics(), etc.).

class C1(sm.Category):
    lang = 'de'        # define language
    name_en = "C1"     # name in English
    name_local = "C1"  # local name

    
class SAMPL3(sm.Metric):
    category = C1
    name_en = "abc"
    name_local = "abc"
    
    def count(doc):
        result = 0.99
        debug = [doc[9], doc[0], doc[1]]
        return result, debug

Newly created metrics are registered automatically as soon as the code defining them has run. After executing the cells above, they can therefore be used right away. This can be seen by looking at all available metrics.

In [20]:
print(sm.get_all_metrics('de'))

0  |  GrammaticalForms  |  G_N  |  Nouns  |  Substantive
1  |  GrammaticalForms  |  G_ADJ  |  Adjectives  |  Adjektive
2  |  GrammaticalForms  |  G_ADV  |  Adverbs  |  Adverbien
3  |  GrammaticalForms  |  G_V  |  Verbs  |  Verben
4  |  GrammaticalForms  |  G_VMOD  |  Modal verbs  |  Modalverben
5  |  GrammaticalForms  |  G_NUM  |  Numerals  |  Numerale
6  |  GrammaticalForms  |  G_PART  |  Particles  |  Partikeln
7  |  GrammaticalForms  |  G_ADP  |  Adpositions  |  Präpositionen
8  |  GrammaticalForms  |  G_CONJ  |  Conjunctions  |  Konjunktionen
9  |  GrammaticalForms  |  G_CCONJ  |  Coordinating conjunctions  |  Koordinierende Konjunktionen
10  |  GrammaticalForms  |  G_SCONJ  |  Conjunctions  |  Subordinierende Konjunktionen
11  |  GrammaticalForms  |  G_PRO  |  Pronouns  |  Pronomen
12  |  GrammaticalForms  |  G_PRO_PRS  |  Personal pronouns  |  Personalpronomen
13  |  GrammaticalForms  |  G_PRO_DEM  |  Demonstrative pronouns  |  Demonstrativpronomen
14  |  GrammaticalForms  |  G_P