Designing good annotation schemes
Screening philosophy
Option 1) While screening, you annotate all the labels you can think of. This way, you only go through the data once and have everything. Downside: it can take a lot of time, and abstracts may not contain enough information to decide.
Option 2) While screening, you only annotate inclusion/exclusion. Once you are done, you go back to only the included articles and code the additional labels.
A hybrid of both is usually best. Decide at the start which labels you will really need (ideally with few, clearly distinguishable choices) and which information the abstracts can actually provide. Teams often set up annotation schemes that are too complex or nuanced, so that annotators rarely agree on the correct label (which renders the annotations useless, since they are inconsistent) or the abstracts frequently lack the information needed to make a decision.
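To make this concrete, here is a minimal sketch of pinning a small scheme down in code and measuring agreement on a pilot batch of abstracts. The label names and choices are hypothetical, and Cohen's kappa (via scikit-learn) is just one common agreement measure:

```python
# Minimal sketch: a small annotation scheme plus an agreement check.
# The labels and choices below are hypothetical examples.
from sklearn.metrics import cohen_kappa_score

# Few labels, each with few, clearly distinguishable choices
SCHEMA = {
    "include": ["yes", "no"],
    "study_type": ["experimental", "observational", "review"],
}

def validate(record: dict) -> None:
    """Reject label values outside the agreed scheme before they enter the data."""
    for label, value in record.items():
        if value not in SCHEMA.get(label, []):
            raise ValueError(f"{value!r} is not a valid choice for {label!r}")

# Pilot batch: two annotators label the same five abstracts
annotator_a = ["yes", "no", "yes", "yes", "no"]
annotator_b = ["yes", "no", "no", "yes", "no"]
print(f"Cohen's kappa: {cohen_kappa_score(annotator_a, annotator_b):.2f}")
# A low kappa is a warning sign that the scheme is too nuanced.
```

If agreement on a pilot batch is low, simplify the scheme before annotating at scale.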
Annotate consistently
Most important of all, you have to annotate consistently throughout; good documentation and an annotation guide will go a long way. Rule of thumb: keep it as simple as possible, but think ahead and make it as useful as possible. For any label you add, think about how you are going to use it. A label that is coded inconsistently cannot be used later to count or filter your data faithfully and will confuse machine learning models.
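As an illustration (a sketch using pandas; the column and label values are made up), inconsistent spellings of the same label quietly break both counting and filtering:

```python
import pandas as pd

# Three records where the same concept was coded three different ways
df = pd.DataFrame({
    "title": ["Study A", "Study B", "Study C"],
    "study_type": ["RCT", "rct", "randomized trial"],
})

# Counting treats each spelling as its own category
print(df["study_type"].value_counts())  # three categories with count 1 each

# Filtering silently misses two of the three relevant records
print(len(df[df["study_type"] == "RCT"]))  # 1, not 3
```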
Learnings
Here are some tips we learned along the way:
- "Maybe" and "other" may seem like good labels, but they are often not useful in practice
- When in doubt while annotating, rather include than exclude; once you go through the full texts, you can still make a final decision