Priority screening
Documentation incomplete
This is just a placeholder page with some notes. Please get in contact if you would be interested in properly documenting this section.
TODO: Background
- Configure setup
- Verify inclusion rule and data table
- Train model and wait
- Once predictions are in, use in assignment config
Setup config
- incl_field / incl_pred_field: when generating the data table, the result of the inclusion rule is written to the column incl_field and the prediction data to incl_pred_field (keep unchanged unless this collides with other column names that might exist)
- train_split: proportion of the dataset to use for training (the rest is used for some basic evaluation; usually keep this as high as possible if you value potentially higher quality over higher certainty about that quality)
- n_predictions: remembering all predictions adds unnecessary burden on our database, so specify the smallest possible number (e.g. the number of assignments you are planning to make with the predictions)
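As a rough illustration of what train_split and n_predictions control, here is a minimal Python sketch. The function names split_dataset and keep_top_predictions are hypothetical and not part of the platform. The sketch also assumes that the predictions kept are the highest-scoring ones, which these notes do not state.

```python
import random
from typing import Dict, List, Tuple


def split_dataset(
    item_ids: List[str], train_split: float, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle labelled items and split them into a training and an evaluation part."""
    shuffled = list(item_ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_split)
    return shuffled[:n_train], shuffled[n_train:]


def keep_top_predictions(
    scores: Dict[str, float], n_predictions: int
) -> List[Tuple[str, float]]:
    """Keep only the n_predictions highest-scoring items (assumed behaviour)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n_predictions]


items = [f"item-{i}" for i in range(10)]
train, test = split_dataset(items, train_split=0.8)
top = keep_top_predictions({"a": 0.9, "b": 0.2, "c": 0.7}, n_predictions=2)
```

With a higher train_split, fewer items remain for the basic evaluation, so the reported quality becomes less certain.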
Training
- Head over to gitlab (or follow the button)
- Create new pipeline
- Configure the variables (most importantly the ID, unless you know what you are doing, leave other variables untouched)
- Run pipeline (prioritise and cleanup; only run the other steps if you know what you are doing)
- Wait
Data table
- basis for training data and predictions
- NQL defines total dataset (training and what you want to predict on)
- ignores hierarchy (if unclear, probably totally fine for you)
- when item appears in multiple selected scopes, all labels kept and assigned to first scope; some labels might be overridden (edge case where resolved multiple times)
- column name template: {res or username}|{label key}:{value} (where res stands for resolved)
- table is only filled with 1s for selected label values and then populated with 0s where it makes sense (e.g. assume method:0, method:1, method:2 are all part of the "Methods" label and 2 was selected, then 0 and 1 should implicitly be 0/False)
- empty cells in table mean no label available
- missing info propagates to inclusion rule
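The column naming and the implicit 0s described above can be sketched in Python as follows. This is only an illustration: column_name and fill_row are hypothetical helpers, None stands in for an empty cell, and the sketch assumes single-choice labels whose possible values are known in advance.

```python
from typing import Dict, List, Optional


def column_name(source: str, key: str, value: int) -> str:
    """Build a column name following {res or username}|{label key}:{value}."""
    return f"{source}|{key}:{value}"


def fill_row(
    source: str,
    selected: Dict[str, int],
    label_values: Dict[str, List[int]],
) -> Dict[str, Optional[int]]:
    """Write 1 for the selected value, 0 for its sibling values, None if unlabelled."""
    row: Dict[str, Optional[int]] = {}
    for key, values in label_values.items():
        for value in values:
            name = column_name(source, key, value)
            if key not in selected:
                row[name] = None  # no label available: cell stays empty
            else:
                row[name] = 1 if selected[key] == value else 0
    return row


label_values = {"method": [0, 1, 2], "relevant": [0, 1]}
row = fill_row("res", {"method": 2}, label_values)
# row["res|method:2"] is 1, the other method columns are 0,
# and both relevant columns stay empty (None).
```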
Training artifacts
- data.arrow: Full dataset with all labels and predictions.
- buscar_est.png: BUSCAR score with number of relevant articles over time and estimation based on predictions on the rest.
- buscar_est.json: ... and the matching data
- inclusion_statistics.png: Number of items and included/excluded items per assignment scope.
- inclusion_statistics.json: ... and the matching data
- report_test.json
- predictions.csv
- buscar_p.json
- buscar_recall.json
- inclusion_curve.png
- buscar.png
- workload_estimation.txt
- report_self.json
- score_distribution.png
- roc_auc.png
- roc_auc.json
Models
BERT et al
Inclusion rule
- based on columns
- note: NA OR 1 is 1 and NA OR 0 is NA (makes sense, but confusing at first)
- note: NA AND 1 is NA and NA AND 0 is 0 (makes sense, but confusing at first)
- general idea:
- list of column names means all should be True (if value available)
- add ! to column name to indicate value has to exist; without it, NA or 1 is implicitly accepted, which can be made explicit by adding ?
- prepend ~ to column name to indicate negation of value (no effect on NA)
- you might not care where a label comes from as long as at least one user chose it; in that case drop the first part of the column name (ANYSRC)
- add * to ANYSRC to indicate that all should be 1 (e.g. all users agree)
- note, this also always includes resolution, but that should implicitly be agreeing anyway
- lists can be enclosed in [ .. ]
- lists can be space- or comma-separated
- prepend bracketed list with AND or OR to indicate how columns shall be combined
- you can combine clauses with AND or OR
- use ( .. ) to combine multiple nested clauses
- play with the grammar at https://www.lark-parser.org/ide/ (add start: clause to the beginning)
?clause: cols
| clause _and clause -> and
| clause _or clause -> or
| "(" clause ")"
| _neg clause -> not
cols: col [(("," | " ") col)*] -> anded
| "OR" "[" col [(("," | " ") col)*] "]" -> ored
| "AND" "[" col [(("," | " ") col)*] "]" -> anded
col: SRC -> maybeyes
| SRC "!" -> forceyes
| SRC "?" -> maybeyes
| _neg SRC -> maybeno
| _neg SRC "!" -> forceno
| _neg SRC "?" -> maybeno
| ANYSRC -> anyyes
| ANYSRC "*" -> allyes
| ANYSRC "!*" -> forceallyes
| _neg ANYSRC -> anyno
| _neg ANYSRC "*" -> allno
| _neg ANYSRC "!*" -> forceallno
SRC: LAB "|" LAB ":" DIGIT+
ANYSRC: LAB ":" DIGIT+
LAB: (LETTER|DIGIT|"-"|"_")+
_neg: "-" | "~"
_and: "AND"i | "&"
_or: "OR"i | "|"
%import common.DIGIT
%import common.LETTER
%import common.WS
%ignore WS
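The NA handling noted for the inclusion rule matches three-valued (Kleene) logic. The following Python sketch reproduces the four cases from the notes, with None standing in for NA. The helpers k_or, k_and, k_not, any_source and all_sources are hypothetical names. How the real implementation resolves ANYSRC columns is an assumption here: the sketch simply combines every column whose name ends in the given {label key}:{value}.

```python
from functools import reduce
from typing import Dict, List, Optional

Tri = Optional[bool]  # True, False, or None for NA


def k_or(a: Tri, b: Tri) -> Tri:
    """NA OR 1 is 1, NA OR 0 is NA."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def k_and(a: Tri, b: Tri) -> Tri:
    """NA AND 1 is NA, NA AND 0 is 0."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def k_not(a: Tri) -> Tri:
    """Negation has no effect on NA."""
    return None if a is None else not a


def _values(row: Dict[str, Optional[int]], anysrc: str) -> List[Tri]:
    # Collect all columns "{source}|{key}:{value}" whose part after "|" matches.
    return [
        None if v is None else bool(v)
        for col, v in row.items()
        if col.partition("|")[2] == anysrc
    ]


def any_source(row: Dict[str, Optional[int]], anysrc: str) -> Tri:
    """At least one source chose the value (ANYSRC without *)."""
    vals = _values(row, anysrc)
    return reduce(k_or, vals) if vals else None


def all_sources(row: Dict[str, Optional[int]], anysrc: str) -> Tri:
    """All sources chose the value (ANYSRC with *)."""
    vals = _values(row, anysrc)
    return reduce(k_and, vals) if vals else None


row = {"res|method:2": 1, "alice|method:2": 1, "bob|method:2": None}
any_result = any_source(row, "method:2")   # True: one 1 is enough
all_result = all_sources(row, "method:2")  # None: bob's missing label keeps it NA
```

In this sketch a missing label never turns an otherwise undecided AND into 1, which is one way to read "missing info propagates to inclusion rule".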





