2.1.7 Blinding<br />
<br />
== A. Background & Definitions ==<br />
<br />
<br />
'''Blinding''' refers to masking the treatment allocation from the person(s) who perform the experiment, collect data and assess outcomes. Blinding aims to ensure that no one holds knowledge of the treatment allocation that could systematically influence their performance. The intended result is that all experimental units (animals, subjects or samples) in the experiment are treated as equally as possible.<br />
<br />
In the discussion below, experimental groups refer to '''all''' groups involved in an experiment, for example: control, sham, treated with drug A, treated with drug B, etc.<br />
<br />
Group allocation describes which experimental unit (animal, subject or sample) has been allocated to which experimental group.<br />
<br />
The group allocation, actions and outcome assessments are ‘'''blinded'''’. People are ‘'''blind'''’ to particular information.<br />
<br />
Blinding requires at least two people: a blinded person (unaware of the experimental condition) and an unblinded person (who knows the experimental condition and the blinding code). The unblinded person is the keeper of the blinding code, which must remain concealed until all processes under blinding are concluded.<br />
<br />
The most effective blinding covers every step of an experiment - from allocation to treatment conditions and application of treatment, through data collection and analysis - and is often referred to as '''full blinding'''. <br />
<br />
Blinding should '''not''' be seen as "all or none". There are several situations when partial blinding may be applied (i.e. blinding of the most risk-prone step(s) in the experimental process). For example, partial blinding can be considered when:<br />
* a research unit with no prior experience with blinding is introducing a blinding procedure and, for organizational or other reasons, follows a stepwise implementation<br />
* a research unit has significantly constrained human resources and does not intend to conduct knowledge-claiming research<br />
<br />
In any case, full and transparent reporting of how blinding was applied is expected.<br />
<br />
== B. Guidance & Expectations ==<br />
<br />
<br />
EQIPD expects that the method(s) used to implement blinding are described in as much detail as possible:<br />
* either as a dedicated protocol (please see below for a template that may serve as an example on how to build such a protocol),<br />
* or as a separate section of a study plan.<br />
<br />
Dependent on the breadth of research methods in use, a given research unit may have one or more blinding protocols that can support blinding for specific types of experiments.<br />
<br />
When preparing a blinding protocol, the main objective is to have a description that is understandable for the actual users, i.e. bench scientists (especially those who are new to the unit). Therefore, it should be written in simple language with as many examples (specific to the research) as appropriate.<br />
<br />
A blinding protocol may describe the following:<br />
<br />
* Training and competence<br />
** is there any training needed? <br />
** are there any additional supporting tools or materials available?<br />
<br />
* Feasibility assessment (to avoid applying blinding when it makes no sense or would actually do harm) <br />
** how high is the risk of unintentional unblinding?<br />
** are the required resources available?<br />
** are emergency scenarios considered?<br />
<br />
* "Who does what?"<br />
** clearly define the roles for those involved in the experiment and the blinding procedure (e.g., see section 4 in the [https://paasp.sharepoint.com/:w:/s/EQIPD/EZbRvZmZGoRGtsjc1Wk_XOsBLsnNAbg-FBFVj9h199oYMA?e=vIXtkO blinding protocol]);<br />
** the blinding protocol should make clear who is aware of the group allocation at the different stages of the experiment (during the allocation, the conduct of the experiment, the outcome assessment, and the data analysis);<br />
** to effectively blind a study, list all experimental steps of the study in sequence and, for each step, name every person involved in the conduct and analysis and document whether that person is blinded or not blinded to the condition. Such an overview systematically creates a transparent workflow of blinded and unblinded personnel and shows where unintended unblinding might occur. The overview (e.g. as a table) can be made part of the experimental documentation and reporting. <br />
** it is generally expected and strongly recommended that any process using humans as perceptors, raters or interpreters is blinded until the decision-making is concluded.<br />
<br />
* Blinding code<br />
** describe how the blinding code is developed and which specific steps are taken to apply it in practice;<br />
** one simple blinding strategy is to assign each subject / sample a separate number or letter (or a combination thereof). This approach may create compliance issues when there is a large number of subjects / samples and treatment has to be applied repeatedly over extended periods of time.<br />
** another blinding strategy is to assign each experimental group a separate number or letter (or a combination thereof). This approach may be problematic when human processing and rating is involved in outcome assessment: the assessor may not know the condition behind the code, but knowledge of a sample's group affiliation can influence the rating.<br />
** the decision which strategy to follow is made by the researchers, taking into account the details of the specific experiment and the associated risks (the two strategies are contrasted in the sketch below).<br />
<br />
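A minimal sketch contrasting the two coding strategies is shown below (for illustration only, not an EQIPD-prescribed procedure); it is written in Python, and the group names, sample counts and seed are hypothetical. In either case, the resulting key is held only by the unblinded person and kept concealed until unblinding.<br />
<pre>
import random

rng = random.Random(2021)  # fixed seed so the unblinded person can reproduce the key

subjects = [f"animal_{i:02d}" for i in range(1, 13)]   # pre-assigned animal IDs
groups = ["vehicle", "drug_A", "drug_B"]

# Strategy 1: a separate code per subject / sample.
# Raters cannot tell which samples belong together.
subject_codes = rng.sample(range(100, 1000), len(subjects))
per_subject_key = dict(zip(subjects, subject_codes))    # concealed by the unblinded person

# Strategy 2: a separate code per experimental group.
# Simpler to administer, but raters can see which samples share a code,
# so knowledge of group affiliation may still influence ratings.
per_group_key = dict(zip(groups, rng.sample("ABCDEFGH", len(groups))))
</pre>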
<br />
'''PRACTICAL TIPS'''<br />
<br />
* Generation of alphanumeric codes for blinding<br />
** if possible, use a blinding scheme without repeating codes. This can easily be done with an alphanumeric code consisting of four letter/number characters, such as T7Z4. Such codes can be generated in Excel using the following formula (shown with ";" as argument separator; some locales use ","):<br />
*** =CHAR(RANDBETWEEN(65;90))&RANDBETWEEN(0;9)&CHAR(RANDBETWEEN(65;90))&RANDBETWEEN(0;9)<br />
*** enter this formula in a row of cells, one for each sample that needs a code, check for accidental duplicates, and copy the outcome to another worksheet with Paste Special --> Paste Values (so that the codes do not change on recalculation).<br />
<br />
* Allocation concealment in animal experiments<br />
** to prevent selection bias, the investigator should neither know nor be able to choose the treatment group to which an animal is allocated;<br />
** therefore, the assignment to a specific group needs to be concealed and every animal should have the same chance of being assigned to each of the groups;<br />
** this can be achieved by separating the assignment of animal_IDs to each animal (e.g. individual ear mark or subcutaneous chip) and the randomization of treatments (see randomization) into two independent processes and then merging the two (see the sketch below).<br />
<br />
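The sketch below illustrates both tips in Python (for illustration only; the code pattern mirrors the Excel formula above, and the treatment names, group sizes and ID format are hypothetical): it draws non-repeating letter/digit codes and merges pre-assigned animal IDs with an independently shuffled treatment list, producing the concealed blinding key.<br />
<pre>
import itertools
import random
import string

rng = random.Random()  # the unblinded person may fix a seed to document the procedure


def unique_codes(n, rng):
    """Draw n non-repeating codes of the pattern letter-digit-letter-digit (e.g. T7Z4)."""
    pool = ["".join(c) for c in itertools.product(string.ascii_uppercase, string.digits,
                                                  string.ascii_uppercase, string.digits)]
    return rng.sample(pool, n)


# Hypothetical example: 12 animals whose IDs were assigned independently of treatment.
animal_ids = [f"chip_{i:03d}" for i in range(1, 13)]
treatments = ["vehicle", "drug_A", "drug_B", "drug_C"] * 3   # balanced groups of 3

rng.shuffle(treatments)              # randomization done independently of the animal IDs
codes = unique_codes(len(animal_ids), rng)

# The merged table is the concealed blinding key, kept only by the unblinded person.
for animal_id, code, treatment in zip(animal_ids, codes, treatments):
    print(animal_id, code, treatment)
</pre>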
<br />
'''RISK ASSESSMENT'''<br />
* Under some circumstances, unintentional unblinding (e.g. due to a different appearance of a positive control in solution or suspension) may be a risk to be assessed and/or controlled.<br />
* Experimental treatments may produce adverse effects; attending veterinarians and animal care staff may need to be informed in advance about the possibility of such adverse effects occurring and, if necessary, have emergency access to the blinding code.<br />
* If a blinding code is added to another code such as an animal_ID, measurement_ID or file name, watch out for hidden cues in such IDs (e.g. temporal or sequential information) that could increase rater bias. Metadata, such as the creation date and time of a file containing measurements, can also give away experimental conditions. <br />
<br />
<br />
'''PLEASE DO NOT FORGET'''<br />
<br />
* Blinding is sometimes not possible, especially when certain cues cannot be masked, such as the skin color of transgenic mice or the color of a solution in a well. It is important to document this and to state in reports where blinding could and could not be achieved.<br />
* Unblinding of the experimental conditions should be done only when all blinded processes for the entire study are concluded. Early or partial unblinding for "checking" should be avoided and, if considered necessary, should be pre-specified in the study protocol.<br />
* Control group(s) (e.g., positive control group) should not be excluded from the blinding procedure.<br />
* Provide training on how to apply the blinding procedure.<br />
<br />
<br />
== C. Resources ==<br />
<br />
<br />
Guidelines on reporting of blinding (in vivo research):<br />
* [[ARRIVE 2.0]] <br />
<br />
Template to develop a written description of the method used to implement blinding:<br />
* [https://paasp.sharepoint.com/:w:/s/EQIPD/EZbRvZmZGoRGtsjc1Wk_XOsBLsnNAbg-FBFVj9h199oYMA?e=vIXtkO blinding protocol]<br />
<br />
Reading material:<br />
<br />
* [https://link.springer.com/chapter/10.1007/164_2019_279 Handbook of Experimental Pharmacology chapter on randomization and blinding]<br />
<br />
<br />
----------------<br />
<br />
back to [[Toolbox]]<br />
<br />
Next item: [[2.1.8 Randomisation]]<br />
<br />
2.1.8 Randomisation<br />
<br />
== A. Background & Definitions ==<br />
<br />
Randomisation is a process of random assignment of experimental units to treatment conditions:<br />
<br />
* occurrence of one event should have no influence on the next event (independence principle);<br />
* the randomisation sequence must not be based on a pattern that can easily be memorized or reproduced by a person (randomness principle).<br />
<br />
Randomization serves three main purposes:<br />
<br />
* enables the application of statistical tests based on the central limit theorem;<br />
* prevents a potential impact of the selection bias due to differing baseline or confounding characteristics of the subjects;<br />
* supports the implementation of other means to reduce the risks of bias (such as blinding).<br />
<br />
== B. Guidance & Expectations ==<br />
Randomisation protocol should describe the following:<br />
* Type of randomisation (simple / unrestricted, block, stratified, etc.)<br />
* Block size (if applicable)<br />
* Stratification variables (if applicable)<br />
* Tools used for randomisation (including a copy of the script if R, SAS or another similar script-based software is used; see the sketch after this list)<br />
* Information needed to reproduce the randomisation, such as the seed of the random number generator (if applicable)<br />
* Reference to the protocol followed (if applicable)<br />
* Methods to monitor / detect deviations from the protocol (if any)<br />
* If a decision is made not to introduce a proper randomisation protocol, the reasons should be documented in a declaration justifying the use of pseudo-randomisation or simple interspersion methods.<br />
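<br />
Purely as an illustration (and not a substitute for professional statistical advice or the dedicated tools listed under C. Resources), a simple block-randomisation script is sketched below in Python; the treatment names, block count and seed are hypothetical, and an equivalent R or SAS script documenting the same information would serve the same purpose.<br />
<pre>
import random


def block_randomise(treatments, n_blocks, seed):
    """Return a randomised allocation list built from shuffled blocks.

    Each block contains every treatment once, so group sizes remain balanced
    even if the study stops early; the seed makes the list reproducible.
    """
    rng = random.Random(seed)
    allocation = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)
        allocation.extend(block)
    return allocation


# Hypothetical example: 4 treatments x 6 blocks = 24 experimental units
allocation = block_randomise(["vehicle", "low_dose", "mid_dose", "high_dose"],
                             n_blocks=6, seed=20210111)
for unit, treatment in enumerate(allocation, start=1):
    print(f"unit_{unit:02d}", treatment)
</pre>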
<br />
'''RISK ASSESSMENT'''<br />
<br />
* Is pseudo-randomisation used instead of the strongly recommended true randomisation?<br />
* Is there a risk that randomisation is introduced when subjects are allocated to experimental groups but is not maintained throughout the study conduct, outcome assessment and data analysis?<br />
<br />
<br />
'''PLEASE DO NOT FORGET'''<br />
<br />
* To consider adding this subject to a training program for new employees or refresher training (if appropriate)<br />
* To assess the risks of cross-contamination when animals housed in the same cage are exposed to different pharmacological treatments<br />
* To check whether there are feedback channels installed so that your colleagues can identify, record and report errors and critical incidents related to this subject (if appropriate)<br />
<br />
== C. Resources ==<br />
<br />
<br />
Guidelines on reporting of randomization (in vivo research):<br />
<br />
[[ARRIVE 2.0]] <br />
<br />
Online tools to support randomisation:<br />
<br />
* NC3Rs’ Experimental Design Assistant - [https://www.eda.nc3rs.org.uk www.eda.nc3rs.org.uk]<br />
<br />
* QuickCalcs - [https://www.graphpad.com/quickcalcs/randMenu/ www.graphpad.com/quickcalcs/randMenu/]<br />
<br />
* Sealed Envelope - [https://www.sealedenvelope.com/simple-randomiser/v1/lists]<br />
<br />
* RandoMice software - [https://doi.org/10.1371/journal.pone.0237096 read] - [https://github.com/Rve54/RandoMice/releases/ download and install]<br />
<br />
<br />
Reading material:<br />
<br />
Handbook of Experimental Pharmacology chapter on randomization and blinding [https://link.springer.com/chapter/10.1007/164_2019_279]<br />
<br />
<br />
<br />
----------------<br />
back to [[Toolbox]]<br />
<br />
Next item: [[2.1.9 Inclusion and exclusion criteria]]<br />
<br />
2.1.6 Sample size and power analysis<br />
<br />
<br />
UNDER CONSTRUCTION<br />
<br />
<br />
<br />
<br />
<br />
== A. Background & Definitions ==<br />
<br />
Statistical power is defined as the probability of detecting a statistically significant effect of a pre-specified size. Formally, power is equal to 1 minus the Type II error rate (beta or β). The Type II error rate is the probability of obtaining a non-significant result when the null hypothesis is false — in other words, failing to find a difference or relationship when one exists.<br />
<br />
Balancing sample size, effect size and power is critical to good study design. When the power is low, only large effects can be detected, and negative results cannot be reliably interpreted. The consequences of low power are particularly dire in the search for high-impact results, when the researcher may be willing to pursue low-likelihood hypotheses for a groundbreaking discovery (see Fig. 1 in [https://www.nature.com/articles/nmeth.2738 Krzywinski & Altman 2013]). Ensuring that sample sizes are large enough to detect the effects of interest is an essential part of study design.<br />
<br />
Studies with inadequate power are a waste of research resources and arguably unethical when subjects are exposed to potentially harmful or inferior experimental conditions.<br />
<br />
Statistical power analysis exploits the relationships among the four variables involved in statistical inference: sample size (N), significance criterion (α), effect size (ES), and statistical power.<br />
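<br />
To make the relationship among these four quantities concrete, here is a small worked example (an illustrative sketch only: it uses the common normal-approximation formula and assumes SciPy is available, so exact t-based tools such as G*Power will return slightly larger numbers) that solves for the per-group sample size of a two-sided, two-sample comparison of means.<br />
<pre>
from math import ceil
from scipy.stats import norm


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_(1-alpha/2) + z_power) / d)^2.
    Exact t-based calculations give slightly larger values.
    """
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Example: standardized effect size d = 0.5, alpha = 0.05, power = 0.80
print(n_per_group(0.5))   # 63 per group; the exact t-test answer (e.g. from G*Power) is 64
</pre>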
<br />
== B. Guidance & Expectations ==<br />
<br />
General advice - DO:<br />
<br />
* Whenever possible, seek professional biostatistician support to estimate sample size.<br />
* Use power prospectively for planning future studies.<br />
* Put science before statistics. It is easy to get caught up in statistical significance and such; but studies should be designed to meet scientific goals, and you need to keep those in sight at all times (in planning and analysis). The appropriate inputs to power/sample-size calculations are effect sizes that are deemed scientifically important, based on careful considerations of the underlying scientific (not statistical) goals of the study. Statistical considerations are used to identify a plan that is effective in meeting scientific goals – not the other way around.<br />
* Do pilot studies. Investigators tend to try to answer all the world’s questions with one study. However, you usually cannot do a definitive study in one step. It is far better to work incrementally. A pilot study helps you establish procedures, understand and protect against things that can go wrong, and obtain variance estimates needed in determining sample size. A pilot study with 20-30 degrees of freedom for error is generally adequate for obtaining reasonably reliable sample-size estimates.<br />
* Generate sample size estimates for a range of power and effect size values to explore the gains and losses in power or detectable effect size due to increasing or decreasing n. This is why the term ‘sample size estimation’ is often preferred over ‘sample size calculation’. Although the arrival at a number for the required sample size is invariably based on (often complex) formulae, the term ‘calculation’ implies an unwarranted degree of precision. The purpose of sample size estimation is not to give an exact number but rather to subject the study design to scrutiny, including an assessment of the validity and reliability of data collection ([https://www.sciencedirect.com/science/article/pii/S1466853X05000714 Batterham & Atkinson 2005]).<br />
* Remember to consider the attrition rate (i.e. the possibility that some subjects or samples are lost during the conduct of the study or during follow-up, for technical or other reasons unrelated to the data analysis)<br />
<br />
<br />
General advice - DO NOT:<br />
<br />
* Avoid using the definition of “small,” “medium,” or “large” effect size based on Cohen's d of .20, .50, or .80, respectively. Cohen's assessments are based on an extensive survey of statistics reported in the literature in the social sciences and may not apply to other fields of science. Further, this method uses a standardized effect size as the goal. Think about it: for a “medium” effect size, you’ll choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your subjects. Clearly, important considerations are being ignored here. “Medium” is definitely not the message!<br />
<br />
* Retrospective power calculations should be avoided, because they add no new information to an analysis (i.e. avoid using observed power to interpret the results of the statistical test). You’ve got the data, did the analysis, and did not achieve “significance.” So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn’t powerful enough – that’s why the result isn’t significant. Power calculations are useful for design, not analysis. <br />
<br />
<br />
Guidance on sample size estimation:<br />
<br />
--- to be added / revised (please do not edit - placeholder) ---<br />
<br />
.. getting a solid grip on the existing literature in one's topic, drilling down to what effects were identified and obtaining the corresponding ES values either directly from the publication or from appropriate calculations based on the printed documentation.<br />
<br />
.. being sure that the estimate you obtain is the one that fits the study design correctly; one cannot necessarily generalize across disparate research designs.<br />
<br />
.. and citing the algorithm or software used to generate the estimates. A power calculation result given without this detail can be viewed with suspicion.<br />
<br />
--- to be added / revised (please do not edit - placeholder) ---<br />
<br />
<br />
<br />
What to do if you have no choice about sample size:<br />
<br />
Limited budget, a limited supply of research materials, or difficult-to-overcome guidance from a collaborator, a funder or a senior colleague may leave no choice but to consider running a study with a given, potentially small, sample size. What can be done in such situations?<br />
<br />
* consider study designs involving correlated data (e.g. repeated measures, crossover or matched-pairs designs) that are usually associated with greater statistical power than those involving separate samples allocated to different treatment groups ([https://www.sciencedirect.com/science/article/pii/S1466853X05000714 see section 2.1 here]).<br />
* consider intervening variables or pre-intervention measurements for stratification; if this is not possible, statistical power can still be improved by entering these variables as covariates in the analysis (this approach has its limitations and should therefore be discussed with a statistician)<br />
* make sure that the most suited randomization schedule is used to control for random influences<br />
* explore and engage all other means to minimize variation (including using properly maintained and calibrated research instruments, adequate and well controlled environmental conditions, making sure that experiments are performed by competent and adequately trained scientists)<br />
* if a study has low power because of the given sample size, reflect this limitation in the study plan and indicate to all stakeholders that the study cannot be run as knowledge-claiming (decision-enabling, confirmatory).<br />
* evaluate power not only for the given sample size but also for values around it, and discuss the impact of the sample size on power with the stakeholders - in some cases, this may help to lift or revise the original sample size restrictions. These discussions make sense and are justifiable only if they take place prior to the conduct of the study (i.e. not post hoc). A minimal sketch of such a power curve is shown below.<br />
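<br />
As an illustration of the last point (again a rough sketch using the normal approximation and assuming SciPy is available; the effect size and the range of sample sizes are hypothetical), the following computes approximate power over a range of per-group sample sizes so that the consequences of an imposed n can be shown to stakeholders before the study starts.<br />
<pre>
from math import sqrt
from scipy.stats import norm


def approx_power(n_per_group, effect_size, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Normal approximation; exact t-based values are slightly lower.
    """
    z_alpha = norm.ppf(1 - alpha / 2)
    return norm.cdf(effect_size * sqrt(n_per_group / 2) - z_alpha)


# Hypothetical scenario: standardized effect size d = 0.5, imposed n = 10 per group
for n in (10, 15, 20, 30, 45, 64):
    print(f"n = {n:3d} per group -> power ~ {approx_power(n, 0.5):.2f}")
</pre>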
<br />
== C. Resources ==<br />
<br />
Tools to sample size estimation:<br />
<br />
* [http://www.gpower.hhu.de/ G*Power]<br />
* [https://wise1.cgu.edu/power/index.asp WISE power tutorial]<br />
* [http://davidmlane.com/hyperstat/power.html JAVA applets for power and sample size]<br />
* [https://www.psychometrica.de/effect_size.html Computation of sample sizes @Psychometrica]<br />
* [http://powerandsamplesize.com/Calculators/ Overview of sample size and power calculators]<br />
<br />
<br />
Educational instruments and resources:<br />
<br />
* Mayo clinical online simulator - Size matters [https://rtools.mayo.edu/size_matters/]<br />
* Scientists talking to biostatisticians [https://www.youtube.com/watch?v=PbODigCZqL8&feature=youtu.be]<br />
<br />
<br />
Useful literature (for non-statisticians):<br />
<br />
* [https://stat.uiowa.edu/sites/stat.uiowa.edu/files/techrep/tr303.pdf Practical advice on sample size estimation by Russell Lenth]<br />
* [https://www.sciencedirect.com/science/article/pii/S1466853X05000714 A primer on murky world of sample size estimation by Alan Batterham & Greg Atkinson]<br />
<br />
Useful literature:<br />
<br />
* [https://pdfs.semanticscholar.org/1325/24bdfe70504fcd67016b17305ccddb4bcd14.pdf Power in various ANOVA designs by Joel Levin]<br />
<br />
* [https://www.ncbi.nlm.nih.gov/books/NBK43321/ https://www.ncbi.nlm.nih.gov/books/NBK43321/]<br />
* [http://davidmlane.com/hyperstat/power.html http://davidmlane.com/hyperstat/power.html]<br />
* [http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality http://powerandsamplesize.com/Calculators]<br />
<br />
Guidelines on reporting of sample size (in vivo research):<br />
* [[ARRIVE 2.0]] <br />
<br />
<br />
----------------<br />
back to [[Toolbox]]<br />
<br />
Next item: [[2.1.7 Blinding]]<br />
<br />
2.1.11 Preregistration<br />
<br />
== A. Background & Definitions ==<br />
<br />
Preregistration refers to the registration of the study plan and the data analysis plan before a study is conducted. <br />
<br />
A Registered Report is the maximal form of preregistration, in which a study manuscript that includes the study hypothesis, rationale, methods, experimental plan and a detailed analysis plan is peer reviewed (Stage 1) before study data are collected. After data collection, the final manuscript, which includes the results and discussion sections, undergoes a conventional peer review (Stage 2) where adherence to the original (Stage 1) procedures is confirmed. Following favorable reviews and regardless of the study results, the manuscript is accepted for publication ([https://www.nature.com/articles/s41562-016-0021 Munafo et al. 2017], [https://openscience.bmj.com/pages/registered-reports-guidelines/ BMJ Open Science, Registered Reports Guidelines]).<br />
<br />
Benefits of preregistration:<br />
<br />
* Increases transparency<br />
* Serves to reduce:<br />
** The risk of publication bias<br />
** "HARKing" (hypothesizing after the results are known)<br />
** P-hacking (analytical decisions after the results are known)<br />
** In the case of a Registered Report, registration also helps against "CARKing" (unjustified critique of the article by reviewers after the results are known; Munafo et al. 2017).<br />
<br />
<br />
== B. Guidance & Expectations ==<br />
* If study is done to inform a knowledge claim ([[2.1.4 Purpose of research]]), it is strongly recommended to preregister the study protocol before data are collected.<br />
* It is strongly recommended to register systematic review protocols.<br />
* Training on planning and benefits of preregistration is highly recommended.<br />
<br />
'''PLEASE DO NOT FORGET'''<br />
* To consider adding this subject to a training program for new employees or refresher training (if appropriate)<br />
* To check for risks of disclosing confidential or otherwise sensitive or proprietary information (e.g. in the context of existing or emerging intellectual property)<br />
<br />
<br />
== C. Resources ==<br />
Online registry platforms: <br />
* Animal Study registries<br />
** Preclinicaltrials.eu - [https://www.preclinicaltrials.eu/]<br />
** Animal Study Registry - [https://www.animalstudyregistry.org] <br />
* Center for Open Science - [https://cos.io/prereg/]<br />
* AsPredicted - [https://aspredicted.org/]<br />
* PROSPERO, a platform for registering a systematic review of animal studies - [https://www.crd.york.ac.uk/prospero/#guidancenotes_animals]<br />
* OSF for Registered Reports - [https://osf.io/rr/]<br />
<br />
Publications:<br />
* Nosek BA, Ebersole CR, DeHaven AC et al. (2018) The preregistration revolution. Proc Natl Acad Sci U S A. 115:2600-2606. PubMed [https://www.ncbi.nlm.nih.gov/pubmed/?term=the+preregistr%E2%80%8Bation+revolution+and+Nosek]<br />
* De Vries RBM, Hooijmans CR, Langendam MW, et al. (2015) A protocol format for the preparation, registration and publication of systematic reviews of animal intervention studies. Evid Based Preclin Med. 2:1–9. [https://onlinelibrary.wiley.com/doi/full/10.1002/ebm2.7]<br />
* Chambers C (2014) Registered Reports: A change in scientific publishing<br />
<br />
<br />
----------------<br />
back to [[Toolbox]]<br />
<br />
Next item: [[2.2.1 Use of SOPs for standard experiments]]</div>85.216.81.116http://eqipd-toolbox.paasp.net/index.php?title=2.1.6_Sample_size_and_power_analysis&diff=6382.1.6 Sample size and power analysis2020-10-29T08:40:42Z<p>85.216.81.116: /* C. Resources */</p>
<hr />
<div><br />
<br />
UNDER CONSTRUCTION<br />
<br />
<br />
<br />
<br />
<br />
== A. Background & Definitions ==<br />
<br />
Statistical power is defined as the probability of detecting a statistically significant effect of a pre-specified size. Formally, power is equal to 1 minus the Type II error rate (beta or ß). The Type II error rate is the probability of obtaining a non-significant result when the null hypothesis is false — in other words, failing to find a difference or relationship when one exists.<br />
<br />
Balancing sample size, effect size and power is critical to good study design. When the power is low, only large effects can be detected, and negative results cannot be reliably interpreted. The consequences of low power are particularly dire in the search for high-impact results, when the researcher may be willing to pursue low-likelihood hypotheses for a groundbreaking discovery (see Fig. 1 in [https://www.nature.com/articles/nmeth.2738 Krzywinski & Altman 2013]). Ensuring that sample sizes are large enough to detect the effects of interest is an essential part of study design.<br />
<br />
Studies with inadequate power are a waste of research resources and arguably unethical when subjects are exposed to potentially harmful or inferior experimental conditions.<br />
<br />
Statistical power analysis exploits the relationships among the four variables involved in statistical inference: sample size (N), significance criterion (α), effect size (ES), and statistical power.<br />
<br />
== B. Guidance & Expectations ==<br />
<br />
General advice - DO:<br />
<br />
* Whenever possible, seek professional biostatistician support to estimate sample size.<br />
* Use power prospectively for planning future studies.<br />
* Put science before statistics. It is easy to get caught up in statistical significance and such; but studies should be designed to meet scientific goals, and you need to keep those in sight at all times (in planning and analysis). The appropriate inputs to power/sample-size calculations are effect sizes that are deemed scientifically important, based on careful considerations of the underlying scientific (not statistical) goals of the study. Statistical considerations are used to identify a plan that is effective in meeting scientific goals – not the other way around.<br />
* Do pilot studies. Investigators tend to try to answer all the world’s questions with one study. However, you usually cannot do a definitive study in one step. It is far better to work incrementally. A pilot study helps you establish procedures, understand and protect against things that can go wrong, and obtain variance estimates needed in determining sample size. A pilot study with 20-30 degrees of freedom for error is generally adequate for obtaining reasonably reliable sample-size estimates.<br />
* Generate sample size estimates for a range of power and effect size values to explore the gains and losses in power or detectable effect size due to increasing or decreasing n (see the sketch after this list). This is why the term ‘sample size estimation’ is often preferred over ‘sample size calculation’. Although the arrival at a number for the required sample size is invariably based on (often complex) formulae, the term ‘calculation’ implies an unwarranted degree of precision. The purpose of sample size estimation is not to give an exact number but rather to subject the study design to scrutiny, including an assessment of the validity and reliability of data collection ([https://www.sciencedirect.com/science/article/pii/S1466853X05000714 Batterham & Atkinson 2005]).<br />
* Remember to consider the attrition rate (i.e. the possibility that some subjects or samples are lost during the conduct of the study or during follow-up for technical or other reasons unrelated to the data analysis); the sketch after this list shows a simple adjustment for it.<br />
<br />
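The two bullets above on exploring a range of values and on attrition can be combined in a short script. The sketch below is a minimal illustration only: it assumes Python with the statsmodels package, and all numeric values (effect sizes, power targets, the 10% attrition rate) are placeholders rather than recommendations.<br />
<pre>
# Minimal sketch: how the required n per group changes across plausible effect
# sizes and power targets, with inflation for expected attrition. Illustrative only.
from math import ceil
from statsmodels.stats.power import TTestIndPower

solver = TTestIndPower()
attrition = 0.10   # e.g. 10% of subjects/samples expected to be lost for technical reasons

print("d      power   n/group   n/group incl. attrition")
for d in (0.4, 0.6, 0.8, 1.0):
    for pw in (0.80, 0.90):
        n = ceil(solver.solve_power(effect_size=d, alpha=0.05, power=pw,
                                    ratio=1.0, alternative='two-sided'))
        n_recruit = ceil(n / (1 - attrition))   # inflate for expected losses
        print(f"{d:<6.1f} {pw:<7.2f} {n:<9d} {n_recruit}")
</pre>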
<br />
General advice - DO NOT:<br />
<br />
* Avoid defining “small,” “medium,” or “large” effect sizes via Cohen's d values of 0.20, 0.50, or 0.80, respectively. Cohen's benchmarks are based on an extensive survey of statistics reported in the social-science literature and may not apply to other fields of science. Further, this approach uses a standardized effect size as the goal. Think about it: for a “medium” effect size, you would choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your subjects. Clearly, important considerations are being ignored here. “Medium” is definitely not the message! (A sketch after this list shows how to derive the effect size from quantities on the actual measurement scale.)<br />
<br />
* Avoid retrospective power calculations: they add no new information to an analysis (i.e. do not use observed power to interpret the results of a statistical test). You have the data, did the analysis, and did not achieve “significance”; computing power retrospectively to see whether the test was powerful enough answers an empty question. Of course it was not powerful enough: that is why the result is not significant. Power calculations are useful for design, not analysis.<br />
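<br />
As a minimal, hypothetical illustration of the point about standardized effect sizes (Python with statsmodels assumed; the raw difference and SD below are placeholders, e.g. taken from a pilot study), the effect size can be specified on the measurement scale first and converted to Cohen's d only because the software requires it:<br />
<pre>
# Minimal sketch: specify the effect on the actual measurement scale, then derive
# the standardized value only because the software needs it. Values are illustrative.
from statsmodels.stats.power import TTestIndPower

raw_difference = 12.0   # smallest difference considered scientifically important (assay units)
sd_estimate = 15.0      # variability estimate, e.g. from a pilot study
d = raw_difference / sd_estimate   # standardized effect size derived from raw quantities

n = TTestIndPower().solve_power(effect_size=d, alpha=0.05, power=0.80,
                                ratio=1.0, alternative='two-sided')
print(f"d = {d:.2f}, n per group = {n:.1f}")   # d = 0.80, about 26 per group
</pre>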
<br />
<br />
Guidance on sample size estimation:<br />
<br />
--- to be added / revised (please do not edit - placeholder) ---<br />
<br />
.. getting a solid grip on the existing literature in one's topic, drilling down to what effects were identified and obtaining the corresponding ES values either directly from the publication or from appropriate calculations based on the printed documentation.<br />
<br />
.. being sure that the estimate you obtain is the one that fits the study design correctly; one cannot necessarily generalize across disparate research designs.<br />
<br />
.. and citing the algorithm or software used to generate the estimates. A power calculation result given without this detail can be viewed with suspicion.<br />
<br />
--- to be added / revised (please do not edit - placeholder) ---<br />
<br />
<br />
<br />
What to do if you have no choice about sample size:<br />
<br />
Limited budget, limited supply of research materials, or difficult-to-overcome guidance from a collaborator, a funder or a senior colleague may leave no choice but to run a study with a fixed and potentially small sample size. What can be done in such situations?<br />
<br />
* consider study designs involving correlated data (e.g. repeated-measures, crossover or matched-pairs designs), which are usually associated with greater statistical power than designs allocating separate samples to different treatment groups ([https://www.sciencedirect.com/science/article/pii/S1466853X05000714 see section 2.1 here]); a sketch comparing the two types of design follows this list<br />
* consider intervening variables or pre-intervention measurements for stratification; if that is not possible, statistical power can still be improved by entering these variables as covariates in the analysis (this approach has its limitations and should therefore be discussed with a statistician)<br />
* make sure that the most suitable randomization schedule is used to control for random influences<br />
* explore and engage all other means of minimizing variation (including properly maintained and calibrated research instruments, adequate and well-controlled environmental conditions, and making sure that experiments are performed by competent and adequately trained scientists)<br />
* if a study has low power because of the given sample size, reflect this limitation in the study plan and indicate to all stakeholders that the study cannot be run as a knowledge-claiming (decision-enabling, confirmatory) study<br />
* evaluate power not only for the given sample size but also for values around it (see the sketch below) and discuss the impact of the sample size on power with the stakeholders; in some cases, this may help to lift or revise the original sample size restrictions. These discussions make sense and are justifiable only if they take place prior to the conduct of the study (i.e. not post hoc).<br />
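<br />
As a minimal illustration of the first and last bullets above (not a prescription), the following sketch computes the power of an independent-groups versus a paired/crossover analysis for a hypothetical fixed sample size of 10 per group, across several candidate effect sizes. Python with statsmodels is assumed; the sample size, effect sizes and within-subject correlation (r = 0.5) are placeholders.<br />
<pre>
# Minimal sketch: power at a fixed n (10 per group, illustrative) for an
# independent-groups design vs a paired/crossover design (10 subjects in total).
from statsmodels.stats.power import TTestIndPower, TTestPower

n_per_group = 10
alpha = 0.05
r = 0.5   # assumed within-subject correlation (placeholder)

for d in (0.6, 0.8, 1.0, 1.2):
    p_indep = TTestIndPower().power(effect_size=d, nobs1=n_per_group,
                                    alpha=alpha, ratio=1.0,
                                    alternative='two-sided')
    # For a paired design, the relevant effect size is the mean difference divided
    # by the SD of the within-subject differences: d_paired = d / sqrt(2 * (1 - r)).
    d_paired = d / (2 * (1 - r)) ** 0.5
    p_paired = TTestPower().power(effect_size=d_paired, nobs=n_per_group,
                                  alpha=alpha, alternative='two-sided')
    print(f"d = {d:.1f}: power (independent) = {p_indep:.2f}, "
          f"power (paired, r = 0.5) = {p_paired:.2f}")
</pre>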
<br />
== C. Resources ==<br />
<br />
Tools for sample size estimation:<br />
<br />
* [http://www.gpower.hhu.de/ G*Power]<br />
* [https://wise1.cgu.edu/power/index.asp WISE power tutorial]<br />
* [http://davidmlane.com/hyperstat/power.html Java applets for power and sample size]<br />
* [https://www.psychometrica.de/effect_size.html Computation of sample sizes @Psychometrica]<br />
* [http://powerandsamplesize.com/Calculators/ Overview of sample size and power calculators]<br />
<br />
<br />
Educational instruments and resources:<br />
<br />
* Mayo Clinic online simulator - Size matters [https://rtools.mayo.edu/size_matters/]<br />
* Scientists talking to biostatisticians [https://www.youtube.com/watch?v=PbODigCZqL8&feature=youtu.be]<br />
<br />
<br />
Useful literature (for non-statisticians):<br />
<br />
* [https://stat.uiowa.edu/sites/stat.uiowa.edu/files/techrep/tr303.pdf Practical advice on sample size estimation by Russell Lenth]<br />
* [https://www.sciencedirect.com/science/article/pii/S1466853X05000714 A primer on the murky world of sample size estimation by Alan Batterham & Greg Atkinson]<br />
<br />
Useful literature:<br />
<br />
* [https://pdfs.semanticscholar.org/1325/24bdfe70504fcd67016b17305ccddb4bcd14.pdf Power in various ANOVA designs by Joel Levin]<br />
<br />
* [https://www.ncbi.nlm.nih.gov/books/NBK43321/ https://www.ncbi.nlm.nih.gov/books/NBK43321/]<br />
* [http://davidmlane.com/hyperstat/power.html http://davidmlane.com/hyperstat/power.html]<br />
* [http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality]<br />
<br />
Guidelines on reporting of sample size (in vivo research):<br />
* [[ARRIVE 2.0]] <br />
<br />
<br />
----------------<br />
back to [[Toolbox]]<br />
<br />
Next item: [[2.1.7 Blinding]]</div>85.216.81.116http://eqipd-toolbox.paasp.net/index.php?title=2.1.6_Sample_size_and_power_analysis&diff=6332.1.6 Sample size and power analysis2020-10-22T06:51:17Z<p>85.216.81.116: /* B. Guidance & Expectations */</p>
<hr />
<div><br />
<br />
UNDER CONSTRUCTION<br />
<br />
<br />
<br />
<br />
<br />
== A. Background & Definitions ==<br />
<br />
Statistical power is defined as the probability of detecting a statistically significant effect of a pre-specified size. Formally, power is equal to 1 minus the Type II error rate (beta or ß). The Type II error rate is the probability of obtaining a non-significant result when the null hypothesis is false — in other words, failing to find a difference or relationship when one exists.<br />
<br />
Balancing sample size, effect size and power is critical to good study design. When the power is low, only large effects can be detected, and negative results cannot be reliably interpreted. The consequences of low power are particularly dire in the search for high-impact results, when the researcher may be willing to pursue low-likelihood hypotheses for a groundbreaking discovery (see Fig. 1 in [https://www.nature.com/articles/nmeth.2738 Krzywinski & Altman 2013]). Ensuring that sample sizes are large enough to detect the effects of interest is an essential part of study design.<br />
<br />
Studies with inadequate power are a waste of research resources and arguably unethical when subjects are exposed to potentially harmful or inferior experimental conditions.<br />
<br />
Statistical power analysis exploits the relationships among the four variables involved in statistical inference: sample size (N), significance criterion (α), effect size (ES), and statistical power.<br />
<br />
== B. Guidance & Expectations ==<br />
<br />
General advice - DO:<br />
<br />
* Whenever possible, seek professional biostatistician support to estimate sample size.<br />
* Use power prospectively for planning future studies.<br />
* Put science before statistics. It is easy to get caught up in statistical significance and such; but studies should be designed to meet scientific goals, and you need to keep those in sight at all times (in planning and analysis). The appropriate inputs to power/sample-size calculations are effect sizes that are deemed scientifically important, based on careful considerations of the underlying scientific (not statistical) goals of the study. Statistical considerations are used to identify a plan that is effective in meeting scientific goals – not the other way around.<br />
* Do pilot studies. Investigators tend to try to answer all the world’s questions with one study. However, you usually cannot do a definitive study in one step. It is far better to work incrementally. A pilot study helps you establish procedures, understand and protect against things that can go wrong, and obtain variance estimates needed in determining sample size. A pilot study with 20-30 degrees of freedom for error is generally adequate for obtaining reasonably reliable sample-size estimates.<br />
* Generate sample size estimates for a range of power and effect size values to explore the gains and losses in power or detectable effect size due to increasing or decreasing n. This is why the term ‘sample size estimation’ is often preferred over ‘sample size calculation’. Although the arrival at a number for the required sample size is invariably based on (often complex) formulae, the term ‘calculation’ implies an unwarranted degree of precision. The purpose of sample size estimation is not to give an exact number but rather to subject the study design to scrutiny, including an assessment of the validity and reliability of data collection ([https://www.sciencedirect.com/science/article/pii/S1466853X05000714 Batterham & Atkinson 2005]).<br />
* Remember to consider attrition rate (i.e. possibility that some subjects or samples are lost during the conduct of the study or follow-up for technical and other data analysis-unrelated reasons)<br />
<br />
<br />
General advice - DO NOT:<br />
<br />
* Avoid using the definition of “small,” “medium,” or “large” effect size based on Cohen's d of .20, .50, or .80, respectively. Cohen's assessments are based on an extensive survey of statistics reported in the literature in the social sciences and may not apply to other fields of science. Further, this method uses a standardized effect size as the goal. Think about it: for a “medium” effect size, you’ll choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your subjects. Clearly, important considerations are being ignored here. “Medium” is definitely not the message!<br />
<br />
* Retrospective power calculations should be avoided, because they add no new information to an analysis (i.e. avoid using observed power to interpret the results of the statistical test). You’ve got the data, did the analysis, and did not achieve “significance.” So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn’t powerful enough – that’s why the result isn’t significant. Power calculations are useful for design, not analysis. <br />
<br />
<br />
Guidance on sample size estimation:<br />
<br />
--- to be added / revised (please do not edit - placeholder) ---<br />
<br />
.. getting a solid grip on the existing literature in one's topic, drilling down to what effects were identified and obtaining the corresponding ES values either directly from the publication or from appropriate calculations based on the printed documentation.<br />
<br />
.. being sure that the estimate you obtain is the one that fits the study design correctly; one cannot necessarily generalize across disparate research designs.<br />
<br />
.. and citing the algorithm or software used to generate the estimates. A power calculation result given without this detail can be viewed with suspicion.<br />
<br />
--- to be added / revised (please do not edit - placeholder) ---<br />
<br />
<br />
<br />
What to do if you have no choice about sample size:<br />
<br />
Limited budget, limited supply of research materials, or a difficult-to-overcome guidance from a collaborator, a funder or a senior colleagues may leave no choice but to consider running a study with a certain potentially small sample size. What can be done in such situations?<br />
<br />
* consider study designs involving correlated data (e.g. repeated measures, crossover or matched-pairs designs) that are usually associated with greater statistical power than those involving separate samples allocated to different treatment groups ([[https://www.sciencedirect.com/science/article/pii/S1466853X05000714 see section 2.1 here]).<br />
* consider intervening variables or pre-intervention measurements for stratification; if not possible, one can still improve statistical power by entering these variables as covariates in the analysis (this approach has its limitations and therefore should be consulted with the statisticians)<br />
* make sure that the most suited randomization schedule is used to control for random influences<br />
* explore and engage all other means to minimize variation (including using properly maintained and calibrated research instruments, adequate and well controlled environmental conditions, making sure that experiments are performed by competent and adequately trained scientists)<br />
* if a study has low power because of the given sample size, reflect this limitation in the study plan and indicate to all stakeholders that the study cannot be run as knowledge-claiming (decision-enabling, confirmatory).<br />
* evaluate power not only for the given sample size for also for the values around and discuss the impact of the sample size on power with the stakeholders - in some cases, it may help to lift or revise the original sample size restrictions. These discussions make sense and are justifiable only if they take place prior to the conduct of the study (i.e. not post hoc).<br />
<br />
== C. Resources ==<br />
<br />
Tools to sample size estimation:<br />
<br />
* [http://www.gpower.hhu.de/ G*Power]<br />
* [https://wise1.cgu.edu/power/index.asp WISE power tutorial]<br />
* [http://davidmlane.com/hyperstat/power.html JAVA applets for power and sample size]<br />
* [https://www.psychometrica.de/effect_size.html Computation of sample sizes @Psychometrica]<br />
* [http://powerandsamplesize.com/Calculators/ Overview of sample size and power calculators]<br />
<br />
<br />
Educational instruments and resources:<br />
<br />
* Mayo clinical online simulator - Size matters [https://rtools.mayo.edu/size_matters/]<br />
* Scientists talking to biostatisticians [https://www.youtube.com/watch?v=PbODigCZqL8&feature=youtu.be]<br />
<br />
<br />
Useful literature (for non-statisticians):<br />
<br />
* [https://stat.uiowa.edu/sites/stat.uiowa.edu/files/techrep/tr303.pdf Practical advice on sample size estimation by Russell Lenth]<br />
* [https://www.sciencedirect.com/science/article/pii/S1466853X05000714 A primer on murky world of sample size estimation by Alan Batterham & Greg Atkinson]<br />
<br />
Useful literature:<br />
<br />
[https://pdfs.semanticscholar.org/1325/24bdfe70504fcd67016b17305ccddb4bcd14.pdf Power in various ANOVA designs by Joel Levin]<br />
<br />
* [https://www.ncbi.nlm.nih.gov/books/NBK43321/ https://www.ncbi.nlm.nih.gov/books/NBK43321/]<br />
* [http://davidmlane.com/hyperstat/power.html http://davidmlane.com/hyperstat/power.html]<br />
* [http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality]<br />
<br />
Guidelines on reporting of sample size (in vivo research):<br />
* [[ARRIVE 2.0]] <br />
<br />
<br />
----------------<br />
back to [[Toolbox]]<br />
<br />
Next item: [[2.1.7 Blinding]]</div>85.216.81.116http://eqipd-toolbox.paasp.net/index.php?title=2.1.6_Sample_size_and_power_analysis&diff=6322.1.6 Sample size and power analysis2020-10-22T06:48:20Z<p>85.216.81.116: /* B. Guidance & Expectations */</p>
<hr />
<div><br />
<br />
UNDER CONSTRUCTION<br />
<br />
<br />
<br />
<br />
<br />
== A. Background & Definitions ==<br />
<br />
Statistical power is defined as the probability of detecting a statistically significant effect of a pre-specified size. Formally, power is equal to 1 minus the Type II error rate (beta or ß). The Type II error rate is the probability of obtaining a non-significant result when the null hypothesis is false — in other words, failing to find a difference or relationship when one exists.<br />
<br />
Balancing sample size, effect size and power is critical to good study design. When the power is low, only large effects can be detected, and negative results cannot be reliably interpreted. The consequences of low power are particularly dire in the search for high-impact results, when the researcher may be willing to pursue low-likelihood hypotheses for a groundbreaking discovery (see Fig. 1 in [https://www.nature.com/articles/nmeth.2738 Krzywinski & Altman 2013]). Ensuring that sample sizes are large enough to detect the effects of interest is an essential part of study design.<br />
<br />
Studies with inadequate power are a waste of research resources and arguably unethical when subjects are exposed to potentially harmful or inferior experimental conditions.<br />
<br />
Statistical power analysis exploits the relationships among the four variables involved in statistical inference: sample size (N), significance criterion (α), effect size (ES), and statistical power.<br />
<br />
== B. Guidance & Expectations ==<br />
<br />
General advice - DO:<br />
<br />
* Whenever possible, seek professional biostatistician support to estimate sample size.<br />
* Use power prospectively for planning future studies.<br />
* Put science before statistics. It is easy to get caught up in statistical significance and such; but studies should be designed to meet scientific goals, and you need to keep those in sight at all times (in planning and analysis). The appropriate inputs to power/sample-size calculations are effect sizes that are deemed scientifically important, based on careful considerations of the underlying scientific (not statistical) goals of the study. Statistical considerations are used to identify a plan that is effective in meeting scientific goals – not the other way around.<br />
* Do pilot studies. Investigators tend to try to answer all the world’s questions with one study. However, you usually cannot do a definitive study in one step. It is far better to work incrementally. A pilot study helps you establish procedures, understand and protect against things that can go wrong, and obtain variance estimates needed in determining sample size. A pilot study with 20-30 degrees of freedom for error is generally adequate for obtaining reasonably reliable sample-size estimates.<br />
* Generate sample size estimates for a range of power and effect size values to explore the gains and losses in power or detectable effect size due to increasing or decreasing n. This is why the term ‘sample size estimation’ is often preferred over ‘sample size calculation’. Although the arrival at a number for the required sample size is invariably based on (often complex) formulae, the term ‘calculation’ implies an unwarranted degree of precision. The purpose of sample size estimation is not to give an exact number but rather to subject the study design to scrutiny, including an assessment of the validity and reliability of data collection ([https://www.sciencedirect.com/science/article/pii/S1466853X05000714 Batterham & Atkinson 2005]).<br />
* Remember to consider attrition rate (i.e. possibility that some subjects or samples are lost during the conduct of the study or follow-up for technical and other data analysis-unrelated reasons)<br />
<br />
<br />
General advice - DO NOT:<br />
<br />
* Avoid using the definition of “small,” “medium,” or “large” effect size based on Cohen's d of .20, .50, or .80, respectively. Cohen's assessments are based on an extensive survey of statistics reported in the literature in the social sciences and may not apply to other fields of science. Further, this method uses a standardized effect size as the goal. Think about it: for a “medium” effect size, you’ll choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your subjects. Clearly, important considerations are being ignored here. “Medium” is definitely not the message!<br />
<br />
* Retrospective power calculations should be avoided, because they add no new information to an analysis (i.e. avoid using observed power to interpret the results of the statistical test). You’ve got the data, did the analysis, and did not achieve “significance.” So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn’t powerful enough – that’s why the result isn’t significant. Power calculations are useful for design, not analysis. <br />
<br />
<br />
Guidance on sample size estimation:<br />
<br />
--- to be added / revised (please do not edit - placeholder) ---<br />
<br />
.. getting a solid grip on the existing literature in one's topic, drilling down to what effects were identified and obtaining the corresponding ES values either directly from the publication or from appropriate calculations based on the printed documentation.<br />
<br />
.. being sure that the estimate you obtain is the one that fits the study design correctly; one cannot necessarily generalize across disparate research designs.<br />
<br />
.. and citing the algorithm or software used to generate the estimates. A power calculation result given without this detail can be viewed with suspicion.<br />
<br />
--- to be added / revised (please do not edit - placeholder) ---<br />
<br />
<br />
<br />
What to do if you have no choice about sample size:<br />
<br />
Limited budget, limited supply of research materials, or a difficult-to-overcome guidance from a collaborator, a funder or a senior colleagues may leave no choice but to consider running a study with a certain potentially small sample size. What can be done in such situations?<br />
<br />
* consider study designs involving correlated data (e.g. repeated measures, crossover or matched-pairs designs) that are usually associated with greater statistical power than those involving separate samples allocated to different treatment groups ([[https://www.sciencedirect.com/science/article/pii/S1466853X05000714 see section 2.1 here]).<br />
* consider intervening variables or pre-intervention measurements for stratification; if not possible, one can still improve statistical power by entering these variables as covariates in the analysis (this approach has its limitations and therefore should be consulted with the statisticians)<br />
* make sure that the most suited randomization schedule is used to control for random influences<br />
* explore and engage all other means to minimize variation (including using properly maintained and calibrated research instruments, adequate and well controlled environmental conditions, making sure that experiments are performed by competent and adequately trained scientists)<br />
* if a study has low power because of the given sample size, reflect this limitation in the study plan and indicate to all stakeholders that the study cannot be run as knowledge-claiming (decision-enabling, confirmatory).<br />
* evaluate power not only for the given sample size for also for the values around and discuss the impact of the sample size on power with the stakeholders - in some cases, it may help reconsider the original sample size restrictions.<br />
<br />
== C. Resources ==<br />
<br />
Tools to sample size estimation:<br />
<br />
* [http://www.gpower.hhu.de/ G*Power]<br />
* [https://wise1.cgu.edu/power/index.asp WISE power tutorial]<br />
* [http://davidmlane.com/hyperstat/power.html JAVA applets for power and sample size]<br />
* [https://www.psychometrica.de/effect_size.html Computation of sample sizes @Psychometrica]<br />
* [http://powerandsamplesize.com/Calculators/ Overview of sample size and power calculators]<br />
<br />
<br />
Educational instruments and resources:<br />
<br />
* Mayo clinical online simulator - Size matters [https://rtools.mayo.edu/size_matters/]<br />
* Scientists talking to biostatisticians [https://www.youtube.com/watch?v=PbODigCZqL8&feature=youtu.be]<br />
<br />
<br />
Useful literature (for non-statisticians):<br />
<br />
* [https://stat.uiowa.edu/sites/stat.uiowa.edu/files/techrep/tr303.pdf Practical advice on sample size estimation by Russell Lenth]<br />
* [https://www.sciencedirect.com/science/article/pii/S1466853X05000714 A primer on murky world of sample size estimation by Alan Batterham & Greg Atkinson]<br />
<br />
Useful literature:<br />
<br />
[https://pdfs.semanticscholar.org/1325/24bdfe70504fcd67016b17305ccddb4bcd14.pdf Power in various ANOVA designs by Joel Levin]<br />
<br />
* [https://www.ncbi.nlm.nih.gov/books/NBK43321/ https://www.ncbi.nlm.nih.gov/books/NBK43321/]<br />
* [http://davidmlane.com/hyperstat/power.html http://davidmlane.com/hyperstat/power.html]<br />
* [http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality]<br />
<br />
Guidelines on reporting of sample size (in vivo research):<br />
* [[ARRIVE 2.0]] <br />
<br />
<br />
----------------<br />
back to [[Toolbox]]<br />
<br />
Next item: [[2.1.7 Blinding]]</div>85.216.81.116http://eqipd-toolbox.paasp.net/index.php?title=2.1.6_Sample_size_and_power_analysis&diff=6292.1.6 Sample size and power analysis2020-10-21T06:24:36Z<p>85.216.81.116: /* B. Guidance & Expectations */</p>
<hr />
<div><br />
<br />
UNDER CONSTRUCTION<br />
<br />
<br />
<br />
<br />
<br />
== A. Background & Definitions ==<br />
<br />
Statistical power is defined as the probability of detecting as statistically significant an effect of a pre-specified size. Formally, power is equal to 1 minus the Type II error rate (beta or ß). The Type II error rate is the probability of obtaining a non-significant result when the null hypothesis is false — in other words, failing to find a difference or relationship when one exists.<br />
<br />
Balancing sample size, effect size and power is critical to good study design. When the power is low, only large effects can be detected, and negative results cannot be reliably interpreted. The consequences of low power are particularly dire in the search for high-impact results, when the researcher may be willing to pursue low-likelihood hypotheses for a groundbreaking discovery (see Fig. 1 in [https://www.nature.com/articles/nmeth.2738 Krzywinski & Altman 2013]). Ensuring that sample sizes are large enough to detect the effects of interest is an essential part of study design.<br />
<br />
Studies with inadequate power are a waste of research resources and arguably unethical when subjects are exposed to potentially harmful or inferior experimental conditions.<br />
<br />
Statistical power analysis exploits the relationships among the four variables involved in statistical inference: sample size (N), significance criterion (α), effect size (ES), and statistical power.<br />
<br />
== B. Guidance & Expectations ==<br />
<br />
General advice - DO:<br />
<br />
* Whenever possible, seek professional biostatistician support to estimate sample size.<br />
* Use power prospectively for planning future studies.<br />
* Put science before statistics. It is easy to get caught up in statistical significance and such; but studies should be designed to meet scientific goals, and you need to keep those in sight at all times (in planning and analysis). The appropriate inputs to power/sample-size calculations are effect sizes that are deemed scientifically important, based on careful considerations of the underlying scientific (not statistical) goals of the study. Statistical considerations are used to identify a plan that is effective in meeting scientific goals – not the other way around.<br />
* Do pilot studies. Investigators tend to try to answer all the world’s questions with one study. However, you usually cannot do a definitive study in one step. It is far better to work incrementally. A pilot study helps you establish procedures, understand and protect against things that can go wrong, and obtain variance estimates needed in determining sample size. A pilot study with 20-30 degrees of freedom for error is generally adequate for obtaining reasonably reliable sample-size estimates.<br />
* Effect size should be specified on the actual scale of measurement, not on a standardized scale.<br />
* Generate sample size estimates for a range of power and effect size values to explore the gains and and losses in power or detectable effect size due to increasing or decreasing n. This is why the term ‘sample size estimation’ is often preferred over ‘sample size calculation’. Although the arrival at a number for the required sample size is invariably based on (often complex) formulae, the term ‘calculation’ implies an unwarranted degree of precision. The purpose of sample size estimation is not to give an exact number but rather to subject the study design to scrutiny, including an assessment of the validity and reliability of data collection ([https://www.sciencedirect.com/science/article/pii/S1466853X05000714 Batterham & Atkinson 2005]).<br />
* Remember to consider attrition rate (i.e. possibility that some subjects or samples are lost during the conduct of the study or follow-up for technical and other data analysis-unrelated reasons)<br />
<br />
<br />
General advice - DO NOT:<br />
<br />
* Avoid using the definition of “small,” “medium,” or “large” effect size based on Cohen's d of .20, .50, or .80, respectively. Cohen's assessments are based on an extensive survey of statistics reported in the literature in the social sciences and may not apply to other fields of science. Further, this method uses a standardized effect size as the goal. Think about it: for a “medium” effect size, you’ll choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your subjects. Clearly, important considerations are being ignored here. “Medium” is definitely not the message!<br />
<br />
* Retrospective power calculations should be avoided, because they add no new information to an analysis (i.e. avoid using observed power to interpret the results of the statistical test). You’ve got the data, did the analysis, and did not achieve “significance.” So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn’t powerful enough – that’s why the result isn’t significant. Power calculations are useful for design, not analysis. <br />
<br />
<br />
Guidance on sample size estimation:<br />
<br />
--- to be added / revised (please do not edit - placeholder) ---<br />
<br />
.. getting a solid grip on the existing literature in one's topic, drilling down to what effects were identified and obtaining the corresponding ES values either directly from the publication or from appropriate calculations based on the printed documentation.<br />
<br />
.. being sure that the estimate you obtain is the one that fits the study design correctly; one cannot necessarily generalize across disparate research designs.<br />
<br />
.. and citing the algorithm or software used to generate the estimates. A power calculation result given without this detail can be viewed with suspicion.<br />
<br />
--- to be added / revised (please do not edit - placeholder) ---<br />
<br />
<br />
<br />
What to do if you have no choice about sample size:<br />
<br />
Limited budget, limited supply of research materials, or a difficult-to-overcome guidance from a collaborator, a funder or a senior colleagues may leave no choice but to consider running a study with a certain potentially small sample size. What can be done in such situations?<br />
<br />
* consider study designs involving correlated data (e.g. repeated measures, crossover or matched-pairs designs) that are usually associated with greater statistical power than those involving separate samples allocated to different treatment groups ([[https://www.sciencedirect.com/science/article/pii/S1466853X05000714 see section 2.1 here]).<br />
* consider intervening variables or pre-intervention measurements for stratification; if not possible, one can still improve statistical power by entering these variables as covariates in the analysis (this approach has its limitations and therefore should be consulted with the statisticians)<br />
* make sure that the most suited randomization schedule is used to control for random influences<br />
* explore and engage all other means to minimize variation (including using properly maintained and calibrated research instruments, adequate and well controlled environmental conditions, making sure that experiments are performed by competent and adequately trained scientists)<br />
* if a study has low power because of the given sample size, reflect this limitation in the study plan and indicate to all stakehodlers that the study cannot be run as knowledge-claiming (decision-enabling, confirmatory).<br />
* evaluate power not only for the given sample size for also for the values around and discuss the impact of the sample size on power with the stakeholders - in some cases, it may help reconsider the original sample size restrictions.<br />
<br />
== C. Resources ==<br />
<br />
Tools to sample size estimation:<br />
<br />
* [http://www.gpower.hhu.de/ G*Power]<br />
* [https://wise1.cgu.edu/power/index.asp WISE power tutorial]<br />
* [http://davidmlane.com/hyperstat/power.html JAVA applets for power and sample size]<br />
* [https://www.psychometrica.de/effect_size.html Computation of sample sizes @Psychometrica]<br />
* [http://powerandsamplesize.com/Calculators/ Overview of sample size and power calculators]<br />
<br />
<br />
Educational instruments and resources:<br />
<br />
* Mayo clinical online simulator - Size matters [https://rtools.mayo.edu/size_matters/]<br />
* Scientists talking to biostatisticians [https://www.youtube.com/watch?v=PbODigCZqL8&feature=youtu.be]<br />
<br />
<br />
Useful literature (for non-statisticians):<br />
<br />
* [https://stat.uiowa.edu/sites/stat.uiowa.edu/files/techrep/tr303.pdf Practical advice on sample size estimation by Russell Lenth]<br />
* [https://www.sciencedirect.com/science/article/pii/S1466853X05000714 A primer on murky world of sample size estimation by Alan Batterham & Greg Atkinson]<br />
<br />
Useful literature:<br />
<br />
[https://pdfs.semanticscholar.org/1325/24bdfe70504fcd67016b17305ccddb4bcd14.pdf Power in various ANOVA designs by Joel Levin]<br />
<br />
* [https://www.ncbi.nlm.nih.gov/books/NBK43321/ https://www.ncbi.nlm.nih.gov/books/NBK43321/]<br />
* [http://davidmlane.com/hyperstat/power.html http://davidmlane.com/hyperstat/power.html]<br />
* [http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality]<br />
<br />
Guidelines on reporting of sample size (in vivo research):<br />
* [[ARRIVE 2.0]] <br />
<br />
<br />
----------------<br />
back to [[Toolbox]]<br />
<br />
Next item: [[2.1.7 Blinding]]</div>85.216.81.116http://eqipd-toolbox.paasp.net/index.php?title=2.1.6_Sample_size_and_power_analysis&diff=6282.1.6 Sample size and power analysis2020-10-19T15:45:53Z<p>85.216.81.116: /* B. Guidance & Expectations */</p>
<hr />
<div><br />
<br />
UNDER CONSTRUCTION<br />
<br />
<br />
<br />
<br />
<br />
== A. Background & Definitions ==<br />
<br />
Statistical power is defined as the probability of detecting as statistically significant an effect of a pre-specified size. Formally, power is equal to 1 minus the Type II error rate (beta or ß). The Type II error rate is the probability of obtaining a non-significant result when the null hypothesis is false — in other words, failing to find a difference or relationship when one exists.<br />
<br />
Balancing sample size, effect size and power is critical to good study design. When the power is low, only large effects can be detected, and negative results cannot be reliably interpreted. The consequences of low power are particularly dire in the search for high-impact results, when the researcher may be willing to pursue low-likelihood hypotheses for a groundbreaking discovery (see Fig. 1 in [https://www.nature.com/articles/nmeth.2738 Krzywinski & Altman 2013]). Ensuring that sample sizes are large enough to detect the effects of interest is an essential part of study design.<br />
<br />
Studies with inadequate power are a waste of research resources and arguably unethical when subjects are exposed to potentially harmful or inferior experimental conditions.<br />
<br />
Statistical power analysis exploits the relationships among the four variables involved in statistical inference: sample size (N), significance criterion (α), effect size (ES), and statistical power.<br />
<br />
== B. Guidance & Expectations ==<br />
<br />
General advice - DO:<br />
<br />
* Whenever possible, seek professional biostatistician support to estimate sample size.<br />
* Use power prospectively for planning future studies.<br />
* Put science before statistics. It is easy to get caught up in statistical significance and such; but studies should be designed to meet scientific goals, and you need to keep those in sight at all times (in planning and analysis). The appropriate inputs to power/sample-size calculations are effect sizes that are deemed scientifically important, based on careful considerations of the underlying scientific (not statistical) goals of the study. Statistical considerations are used to identify a plan that is effective in meeting scientific goals – not the other way around.<br />
* Do pilot studies. Investigators tend to try to answer all the world’s questions with one study. However, you usually cannot do a definitive study in one step. It is far better to work incrementally. A pilot study helps you establish procedures, understand and protect against things that can go wrong, and obtain variance estimates needed in determining sample size. A pilot study with 20-30 degrees of freedom for error is generally adequate for obtaining reasonably reliable sample-size estimates.<br />
* Effect size should be specified on the actual scale of measurement, not on a standardized scale.<br />
* Generate sample size estimates for a range of power and effect size values to explore the gains and and losses in power or detectable effect size due to increasing or decreasing n. This is why the term ‘sample size estimation’ is often preferred over ‘sample size calculation’. Although the arrival at a number for the required sample size is invariably based on (often complex) formulae, the term ‘calculation’ implies an unwarranted degree of precision. The purpose of sample size estimation is not to give an exact number but rather to subject the study design to scrutiny, including an assessment of the validity and reliability of data collection ([https://www.sciencedirect.com/science/article/pii/S1466853X05000714 Batterham & Atkinson 2005]).<br />
<br />
<br />
General advice - DO NOT:<br />
<br />
* Avoid using the definition of “small,” “medium,” or “large” effect size based on Cohen's d of .20, .50, or .80, respectively. Cohen's assessments are based on an extensive survey of statistics reported in the literature in the social sciences and may not apply to other fields of science. Further, this method uses a standardized effect size as the goal. Think about it: for a “medium” effect size, you’ll choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your subjects. Clearly, important considerations are being ignored here. “Medium” is definitely not the message!<br />
<br />
* Retrospective power calculations should be avoided, because they add no new information to an analysis (i.e. avoid using observed power to interpret the results of the statistical test). You’ve got the data, did the analysis, and did not achieve “significance.” So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn’t powerful enough – that’s why the result isn’t significant. Power calculations are useful for design, not analysis. <br />
<br />
<br />
Guidance on sample size estimation:<br />
<br />
--- to be added / revised (please do not edit - placeholder) ---<br />
<br />
.. getting a solid grip on the existing literature in one's topic, drilling down to what effects were identified and obtaining the corresponding ES values either directly from the publication or from appropriate calculations based on the printed documentation.<br />
<br />
.. being sure that the estimate you obtain is the one that fits the study design correctly; one cannot necessarily generalize across disparate research designs.<br />
<br />
.. and citing the algorithm or software used to generate the estimates. A power calculation result given without this detail can be viewed with suspicion.<br />
<br />
--- to be added / revised (please do not edit - placeholder) ---<br />
<br />
<br />
<br />
What to do if you have no choice about sample size:<br />
<br />
Limited budget, limited supply of research materials, or a difficult-to-overcome guidance from a collaborator, a funder or a senior colleagues may leave no choice but to consider running a study with a certain potentially small sample size. What can be done in such situations?<br />
<br />
* consider study designs involving correlated data (e.g. repeated measures, crossover or matched-pairs designs) that are usually associated with greater statistical power than those involving separate samples allocated to different treatment groups ([[https://www.sciencedirect.com/science/article/pii/S1466853X05000714 see section 2.1 here]).<br />
* consider intervening variables or pre-intervention measurements for stratification; if not possible, one can still improve statistical power by entering these variables as covariates in the analysis (this approach has its limitations and therefore should be consulted with the statisticians)<br />
* make sure that the most suited randomization schedule is used to control for random influences<br />
* explore and engage all other means to minimize variation (including using properly maintained and calibrated research instruments, adequate and well controlled environmental conditions, making sure that experiments are performed by competent and adequately trained scientists)<br />
* if a study has low power because of the given sample size, reflect this limitation in the study plan and indicate to all stakehodlers that the study cannot be run as knowledge-claiming (decision-enabling, confirmatory).<br />
* evaluate power not only for the given sample size for also for the values around and discuss the impact of the sample size on power with the stakeholders - in some cases, it may help reconsider the original sample size restrictions.<br />
<br />
== C. Resources ==<br />
<br />
Tools to sample size estimation:<br />
<br />
* [http://www.gpower.hhu.de/ G*Power]<br />
* [https://wise1.cgu.edu/power/index.asp WISE power tutorial]<br />
* [http://davidmlane.com/hyperstat/power.html JAVA applets for power and sample size]<br />
* [https://www.psychometrica.de/effect_size.html Computation of sample sizes @Psychometrica]<br />
* [http://powerandsamplesize.com/Calculators/ Overview of sample size and power calculators]<br />
<br />
<br />
Educational instruments and resources:<br />
<br />
* Mayo clinical online simulator - Size matters [https://rtools.mayo.edu/size_matters/]<br />
* Scientists talking to biostatisticians [https://www.youtube.com/watch?v=PbODigCZqL8&feature=youtu.be]<br />
<br />
<br />
Useful literature (for non-statisticians):<br />
<br />
* [https://stat.uiowa.edu/sites/stat.uiowa.edu/files/techrep/tr303.pdf Practical advice on sample size estimation by Russell Lenth]<br />
* [https://www.sciencedirect.com/science/article/pii/S1466853X05000714 A primer on murky world of sample size estimation by Alan Batterham & Greg Atkinson]<br />
<br />
Useful literature:<br />
<br />
[https://pdfs.semanticscholar.org/1325/24bdfe70504fcd67016b17305ccddb4bcd14.pdf Power in various ANOVA designs by Joel Levin]<br />
<br />
* [https://www.ncbi.nlm.nih.gov/books/NBK43321/ https://www.ncbi.nlm.nih.gov/books/NBK43321/]<br />
* [http://davidmlane.com/hyperstat/power.html http://davidmlane.com/hyperstat/power.html]<br />
* [http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality]<br />
<br />
Guidelines on reporting of sample size (in vivo research):<br />
* [[ARRIVE 2.0]] <br />
<br />
<br />
----------------<br />
back to [[Toolbox]]<br />
<br />
Next item: [[2.1.7 Blinding]]</div>85.216.81.116http://eqipd-toolbox.paasp.net/index.php?title=2.1.6_Sample_size_and_power_analysis&diff=6272.1.6 Sample size and power analysis2020-10-19T15:38:49Z<p>85.216.81.116: /* B. Guidance & Expectations */</p>
<hr />
<div><br />
<br />
UNDER CONSTRUCTION<br />
<br />
<br />
<br />
<br />
<br />
== A. Background & Definitions ==<br />
<br />
Statistical power is defined as the probability of detecting as statistically significant an effect of a pre-specified size. Formally, power is equal to 1 minus the Type II error rate (beta or ß). The Type II error rate is the probability of obtaining a non-significant result when the null hypothesis is false — in other words, failing to find a difference or relationship when one exists.<br />
<br />
Balancing sample size, effect size and power is critical to good study design. When the power is low, only large effects can be detected, and negative results cannot be reliably interpreted. The consequences of low power are particularly dire in the search for high-impact results, when the researcher may be willing to pursue low-likelihood hypotheses for a groundbreaking discovery (see Fig. 1 in [https://www.nature.com/articles/nmeth.2738 Krzywinski & Altman 2013]). Ensuring that sample sizes are large enough to detect the effects of interest is an essential part of study design.<br />
<br />
Studies with inadequate power are a waste of research resources and arguably unethical when subjects are exposed to potentially harmful or inferior experimental conditions.<br />
<br />
Statistical power analysis exploits the relationships among the four variables involved in statistical inference: sample size (N), significance criterion (α), effect size (ES), and statistical power.<br />
<br />
== B. Guidance & Expectations ==<br />
<br />
General advice - DO:<br />
<br />
* Whenever possible, seek professional biostatistician support to estimate sample size.<br />
* Use power prospectively for planning future studies.<br />
* Put science before statistics. It is easy to get caught up in statistical significance and such; but studies should be designed to meet scientific goals, and you need to keep those in sight at all times (in planning and analysis). The appropriate inputs to power/sample-size calculations are effect sizes that are deemed scientifically important, based on careful considerations of the underlying scientific (not statistical) goals of the study. Statistical considerations are used to identify a plan that is effective in meeting scientific goals – not the other way around.<br />
* Do pilot studies. Investigators tend to try to answer all the world’s questions with one study. However, you usually cannot do a definitive study in one step. It is far better to work incrementally. A pilot study helps you establish procedures, understand and protect against things that can go wrong, and obtain variance estimates needed in determining sample size. A pilot study with 20-30 degrees of freedom for error is generally adequate for obtaining reasonably reliable sample-size estimates.<br />
* Effect size should be specified on the actual scale of measurement, not on a standardized scale.<br />
* Generate sample size estimates for a range of power and effect size values to explore the gains and and losses in power or detectable effect size due to increasing or decreasing n. This is why the term ‘sample size estimation’ is often preferred over ‘sample size calculation’. Although the arrival at a number for the required sample size is invariably based on (often complex) formulae, the term ‘calculation’ implies an unwarranted degree of precision. The purpose of sample size estimation is not to give an exact number but rather to subject the study design to scrutiny, including an assessment of the validity and reliability of data collection ([https://www.sciencedirect.com/science/article/pii/S1466853X05000714 Batterham & Atkinson 2005]).<br />
<br />
<br />
General advice - DO NOT:<br />
<br />
* Avoid using the definition of “small,” “medium,” or “large” effect size based on Cohen's d of .20, .50, or .80, respectively. Cohen's assessments are based on an extensive survey of statistics reported in the literature in the social sciences and may not apply to other fields of science. Further, this method uses a standardized effect size as the goal. Think about it: for a “medium” effect size, you’ll choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your subjects. Clearly, important considerations are being ignored here. “Medium” is definitely not the message!<br />
<br />
* Retrospective power calculations should be avoided, because they add no new information to an analysis (i.e. avoid using observed power to interpret the results of the statistical test). You’ve got the data, did the analysis, and did not achieve “significance.” So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn’t powerful enough – that’s why the result isn’t significant. Power calculations are useful for design, not analysis. <br />
<br />
<br />
Guidance on sample size estimation:<br />
<br />
--- to be added ---<br />
<br />
<br />
<br />
What to do if you have no choice about sample size:<br />
<br />
Limited budget, limited supply of research materials, or a difficult-to-overcome guidance from a collaborator, a funder or a senior colleagues may leave no choice but to consider running a study with a certain potentially small sample size. What can be done in such situations?<br />
<br />
* consider study designs involving correlated data (e.g. repeated measures, crossover or matched-pairs designs) that are usually associated with greater statistical power than those involving separate samples allocated to different treatment groups ([[https://www.sciencedirect.com/science/article/pii/S1466853X05000714 see section 2.1 here]).<br />
* consider intervening variables or pre-intervention measurements for stratification; if not possible, one can still improve statistical power by entering these variables as covariates in the analysis (this approach has its limitations and therefore should be consulted with the statisticians)<br />
* make sure that the most suited randomization schedule is used to control for random influences<br />
* explore and engage all other means to minimize variation (including using properly maintained and calibrated research instruments, adequate and well controlled environmental conditions, making sure that experiments are performed by competent and adequately trained scientists)<br />
* if a study has low power because of the given sample size, reflect this limitation in the study plan and indicate to all stakehodlers that the study cannot be run as knowledge-claiming (decision-enabling, confirmatory).<br />
* evaluate power not only for the given sample size for also for the values around and discuss the impact of the sample size on power with the stakeholders - in some cases, it may help reconsider the original sample size restrictions.<br />
<br />
== C. Resources ==<br />
<br />
Tools to sample size estimation:<br />
<br />
* [http://www.gpower.hhu.de/ G*Power]<br />
* [https://wise1.cgu.edu/power/index.asp WISE power tutorial]<br />
* [http://davidmlane.com/hyperstat/power.html JAVA applets for power and sample size]<br />
* [https://www.psychometrica.de/effect_size.html Computation of sample sizes @Psychometrica]<br />
* [http://powerandsamplesize.com/Calculators/ Overview of sample size and power calculators]<br />
<br />
<br />
Educational instruments and resources:<br />
<br />
* Mayo clinical online simulator - Size matters [https://rtools.mayo.edu/size_matters/]<br />
* Scientists talking to biostatisticians [https://www.youtube.com/watch?v=PbODigCZqL8&feature=youtu.be]<br />
<br />
<br />
Useful literature (for non-statisticians):<br />
<br />
* [https://stat.uiowa.edu/sites/stat.uiowa.edu/files/techrep/tr303.pdf Practical advice on sample size estimation by Russell Lenth]<br />
* [https://www.sciencedirect.com/science/article/pii/S1466853X05000714 A primer on murky world of sample size estimation by Alan Batterham & Greg Atkinson]<br />
<br />
Useful literature:<br />
<br />
[https://pdfs.semanticscholar.org/1325/24bdfe70504fcd67016b17305ccddb4bcd14.pdf Power in various ANOVA designs by Joel Levin]<br />
<br />
* [https://www.ncbi.nlm.nih.gov/books/NBK43321/ https://www.ncbi.nlm.nih.gov/books/NBK43321/]<br />
* [http://davidmlane.com/hyperstat/power.html http://davidmlane.com/hyperstat/power.html]<br />
* [http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality]<br />
<br />
Guidelines on reporting of sample size (in vivo research):<br />
* [[ARRIVE 2.0]] <br />
<br />
<br />
----------------<br />
back to [[Toolbox]]<br />
<br />
Next item: [[2.1.7 Blinding]]</div>85.216.81.116http://eqipd-toolbox.paasp.net/index.php?title=2.1.6_Sample_size_and_power_analysis&diff=6232.1.6 Sample size and power analysis2020-10-18T15:14:51Z<p>85.216.81.116: /* A. Background & Definitions */</p>
<hr />
<div><br />
<br />
UNDER CONSTRUCTION<br />
<br />
<br />
<br />
<br />
<br />
== A. Background & Definitions ==<br />
<br />
Statistical power is defined as the probability of detecting as statistically significant an effect of a pre-specified size. Formally, power is equal to 1 minus the Type II error rate (beta or ß). The Type II error rate is the probability of obtaining a non-significant result when the null hypothesis is false — in other words, failing to find a difference or relationship when one exists.<br />
<br />
Balancing sample size, effect size and power is critical to good study design. When the power is low, only large effects can be detected, and negative results cannot be reliably interpreted. The consequences of low power are particularly dire in the search for high-impact results, when the researcher may be willing to pursue low-likelihood hypotheses for a groundbreaking discovery (see Fig. 1 in [https://www.nature.com/articles/nmeth.2738 Krzywinski & Altman 2013]). Ensuring that sample sizes are large enough to detect the effects of interest is an essential part of study design.<br />
<br />
Studies with inadequate power are a waste of research resources and arguably unethical when subjects are exposed to potentially harmful or inferior experimental conditions.<br />
<br />
Statistical power analysis exploits the relationships among the four variables involved in statistical inference: sample size (N), significance criterion (α), effect size (ES), and statistical power.<br />
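<br />
As a rough illustration of how these four quantities constrain each other (a sketch only, not an EQIPD-endorsed procedure), the code below expresses power for a two-sided, two-sample comparison as a function of the per-group sample size, a standardized effect size and α, and then numerically solves for whichever quantity is unknown at a chosen target power. It relies on a normal approximation, and the numbers used (d = 0.8, n = 12, 80% power) are placeholders.<br />
<pre>
# Sketch: power as a function of (n per group, standardized effect size d, alpha);
# any one of them can be recovered from the others by solving power(...) = target.
from scipy.stats import norm
from scipy.optimize import brentq

def power(n, d, alpha=0.05):
    z = norm.ppf(1 - alpha / 2)
    shift = d * (n / 2.0) ** 0.5              # standardized shift for a two-sample comparison
    return norm.cdf(shift - z) + norm.cdf(-shift - z)

target = 0.80
n_needed = brentq(lambda n: power(n, d=0.8) - target, 2, 1e4)       # n per group for d = 0.8
d_detectable = brentq(lambda d: power(12, d) - target, 1e-3, 10)    # smallest d at n = 12
print(round(n_needed, 1), round(d_detectable, 2))
</pre>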
<br />
== B. Guidance & Expectations ==<br />
<br />
General advice - DO:<br />
<br />
* Whenever possible, seek professional biostatistician support to estimate sample size.<br />
* Use power prospectively for planning future studies.<br />
* Put science before statistics. It is easy to get caught up in statistical significance and such; but studies should be designed to meet scientific goals, and you need to keep those in sight at all times (in planning and analysis). The appropriate inputs to power/sample-size calculations are effect sizes that are deemed scientifically important, based on careful considerations of the underlying scientific (not statistical) goals of the study. Statistical considerations are used to identify a plan that is effective in meeting scientific goals – not the other way around.<br />
* Do pilot studies. Investigators tend to try to answer all the world’s questions with one study. However, you usually cannot do a definitive study in one step. It is far better to work incrementally. A pilot study helps you establish procedures, understand and protect against things that can go wrong, and obtain variance estimates needed in determining sample size. A pilot study with 20-30 degrees of freedom for error is generally quite adequate for obtaining reasonably reliable sample-size estimates.<br />
* Effect size should be specified on the actual scale of measurement, not on a standardized scale.<br />
* Generate sample size estimates for a range of power and effect size values to explore the gains and losses in power or detectable effect size due to increasing or decreasing n. This is why the term ‘sample size estimation’ is often preferred over ‘sample size calculation’. Although the arrival at a number for the required sample size is invariably based on (often complex) formulae, the term ‘calculation’ implies an unwarranted degree of precision. The purpose of sample size estimation is not to give an exact number but rather to subject the study design to scrutiny, including an assessment of the validity and reliability of data collection ([https://www.sciencedirect.com/science/article/pii/S1466853X05000714 Batterham & Atkinson 2005]). A short sketch after this list tabulates such estimates for a grid of effect sizes and target power values.<br />
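<br />
To make the idea of exploring a range concrete, here is a minimal sketch (with invented numbers, not a toolbox template) that tabulates approximate per-group sample sizes for several raw effect sizes and target power values, using the common normal-approximation formula for a two-sided, two-sample comparison of means. The standard deviation and the differences deemed scientifically important are placeholders; the actual estimate should be made with a biostatistician or a dedicated tool such as G*Power.<br />
<pre>
# Sketch: approximate per-group n for a two-sample comparison of means,
# tabulated over a range of raw effect sizes and target power values.
from math import ceil
from scipy.stats import norm

def n_per_group(diff, sd, power, alpha=0.05):
    # normal-approximation formula: n = 2 * ((z_{1-alpha/2} + z_power) * sd / diff)^2
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil(2 * (z * sd / diff) ** 2)

sd = 1.5                                       # placeholder SD on the actual measurement scale
for diff in (0.5, 1.0, 1.5):                   # smallest differences considered important
    row = [n_per_group(diff, sd, p) for p in (0.80, 0.90, 0.95)]
    print(f"diff={diff}: n per group for 80/90/95% power -> {row}")
</pre>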
<br />
<br />
General advice - DO NOT:<br />
<br />
* Avoid using the definition of “small,” “medium,” or “large” effect size based on Cohen's d of .20, .50, or .80, respectively. Cohen's assessments are based on an extensive survey of statistics reported in the literature in the social sciences and may not apply to other fields of science. Further, this method uses a standardized effect size as the goal. Think about it: for a “medium” effect size, you’ll choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your subjects. Clearly, important considerations are being ignored here. “Medium” is definitely not the message! A short sketch after this list illustrates how the same standardized d can correspond to very different raw differences.<br />
<br />
* Retrospective power calculations should generally be avoided, because they add no new information to an analysis (i.e. avoid using observed power to interpret the results of the statistical test). You’ve got the data, did the analysis, and did not achieve “significance.” So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn’t powerful enough – that’s why the result isn’t significant. Power calculations are useful for design, not analysis. <br />
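<br />
The toy sketch below (values invented purely for illustration) converts a raw difference and a standard deviation into Cohen's d and back, showing that a fixed “medium” d of 0.5 implies very different raw differences depending on the variability of the measurement - which is why the effect size entered into a power calculation should be chosen on the actual scale of measurement.<br />
<pre>
# Sketch: the same standardized effect size (Cohen's d) maps to very different raw
# differences, depending on the standard deviation of the outcome on its actual scale.
def cohens_d(raw_difference, sd):
    return raw_difference / sd

def raw_difference(d, sd):
    return d * sd

for sd in (0.5, 2.0, 8.0):                     # instruments / populations with different variability
    print(f"sd={sd}: a 'medium' d of 0.5 corresponds to a raw difference of {raw_difference(0.5, sd)}")
</pre>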
<br />
<br />
Guidance on sample size estimation:<br />
<br />
--- to be added ---<br />
<br />
<br />
<br />
What to do if you have no choice about sample size:<br />
<br />
A limited budget, a limited supply of research materials, or difficult-to-overcome guidance from a collaborator, a funder or a senior colleague may leave no choice but to run a study with a given, potentially small, sample size. What can be done in such situations?<br />
<br />
* consider study designs involving correlated data (e.g. repeated measures, crossover or matched-pairs designs), which are usually associated with greater statistical power than designs involving separate samples allocated to different treatment groups ([https://www.sciencedirect.com/science/article/pii/S1466853X05000714 see section 2.1 here]); a brief sketch after this list compares a paired with an unpaired design.<br />
* consider intervening variables or pre-intervention measurements for stratification; if not possible, one can still improve statistical power by entering these variables as covariates in the analysis<br />
* make sure that the most suited randomization schedule is used to control for random influences<br />
* explore and engage all other means to minimize variation (including using properly maintained and calibrated research instruments, adequate and well controlled environmental conditions, making sure that experiments are performed by competent and adequately trained scientists)<br />
* if a study has low power because of the given sample size, reflect this limitation in the study plan and indicate to all stakeholders that the study cannot be run as knowledge-claiming (decision-enabling, confirmatory).<br />
* evaluate power not only for the given sample size but also for values around it, and discuss the impact of the sample size on power with the stakeholders - in some cases, this may help reconsider the original sample size restrictions.<br />
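<br />
The brief sketch below (correlation, effect, variability and sample size are all invented placeholders) illustrates why correlated-data designs can rescue power when the number of subjects is fixed: it compares the approximate power of an unpaired two-group comparison with that of a paired (within-subject) comparison using the same subjects, where a positive correlation between the two measurements on the same subject shrinks the standard error. As elsewhere on this page, this is a normal-approximation sketch, not a substitute for a proper design consultation.<br />
<pre>
# Sketch: approximate power of an unpaired vs a paired design with the same number of
# subjects, assuming a correlation rho between the two measurements on the same subject.
from scipy.stats import norm

def power_from_se(diff, se, alpha=0.05):
    z = norm.ppf(1 - alpha / 2)
    return norm.cdf(diff / se - z) + norm.cdf(-diff / se - z)

diff, sd, rho = 1.0, 1.5, 0.6
n_total = 16                                            # total number of subjects available

se_unpaired = sd * (2.0 / (n_total / 2)) ** 0.5         # two independent groups of n_total / 2
se_paired = sd * (2.0 * (1 - rho) / n_total) ** 0.5     # n_total subjects, each measured twice

print("unpaired:", round(power_from_se(diff, se_unpaired), 2))
print("paired:  ", round(power_from_se(diff, se_paired), 2))
</pre>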
<br />
== C. Resources ==<br />
<br />
Tools for sample size estimation:<br />
<br />
* [http://www.gpower.hhu.de/ G*Power]<br />
* [https://wise1.cgu.edu/power/index.asp WISE power tutorial]<br />
* [http://davidmlane.com/hyperstat/power.html JAVA applets for power and sample size]<br />
* [https://www.psychometrica.de/effect_size.html Computation of sample sizes @Psychometrica]<br />
* [http://powerandsamplesize.com/Calculators/ Overview of sample size and power calculators]<br />
<br />
<br />
Educational instruments and resources:<br />
<br />
* Mayo clinical online simulator - Size matters [https://rtools.mayo.edu/size_matters/]<br />
* Scientists talking to biostatisticians [https://www.youtube.com/watch?v=PbODigCZqL8&feature=youtu.be]<br />
<br />
<br />
Useful literature (for non-statisticians):<br />
<br />
* [https://stat.uiowa.edu/sites/stat.uiowa.edu/files/techrep/tr303.pdf Practical advice on sample size estimation by Russell Lenth]<br />
* [https://www.sciencedirect.com/science/article/pii/S1466853X05000714 A primer on the murky world of sample size estimation by Alan Batterham & Greg Atkinson]<br />
<br />
Useful literature:<br />
<br />
* [https://pdfs.semanticscholar.org/1325/24bdfe70504fcd67016b17305ccddb4bcd14.pdf Power in various ANOVA designs by Joel Levin]<br />
<br />
* [https://www.ncbi.nlm.nih.gov/books/NBK43321/ https://www.ncbi.nlm.nih.gov/books/NBK43321/]<br />
* [http://davidmlane.com/hyperstat/power.html http://davidmlane.com/hyperstat/power.html]<br />
* [http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality]<br />
<br />
Guidelines on reporting of sample size (in vivo research):<br />
* [[ARRIVE 2.0]] <br />
<br />
<br />
----------------<br />
back to [[Toolbox]]<br />
<br />
Next item: [[2.1.7 Blinding]]</div>85.216.81.116http://eqipd-toolbox.paasp.net/index.php?title=1.4.3_Quality_in_collaborative_research&diff=6211.4.3 Quality in collaborative research2020-10-16T11:25:17Z<p>85.216.81.116: /* C. Resources */</p>
<hr />
<div>== A. Background & Definitions ==<br />
Research collaboration, in the context of this Toolbox article, refers to any mode of collaboration between two or more researchers or research organizations where one collaborating party depends on the quality of results generated by another collaborating party.<br />
The collaboration modes range from fee-for-service relationships to research projects executed jointly by members of a consortium where each member contributes towards shared goals.<br />
<br />
<br />
== B. Guidance & Expectations ==<br />
It is strongly recommended that:<br />
* each collaborating party defines research quality expectations prior to entering any formal collaboration agreements and certainly before initiating any experimental work;<br />
* if a collaboration is supported by a formal collaboration agreement, research quality expectations are specified as an attachment to the agreement;<br />
* all factors that can bias the research conduct (e.g. time pressure) are defined and discussed between parties;<br />
* if appropriate, individuals responsible for specific aspects of research quality are explicitly identified.<br />
<br />
Research quality expectations may also include measures to ensure data integrity, traceability and security:<br />
* Data generation and documentation practices<br />
** Will raw data be properly handled and stored?<br />
** Do collaborators have laboratory notebooks? <br />
* Data management practices <br />
** Are practices compliant with FAIR principles? <br />
* Platform for data sharing with collaborating parties <br />
** Does it support transparent data sharing? <br />
** Is it secure?<br />
* Reporting of results (presentation of research data between collaborating parties)<br />
** Are there any measures necessary to ensure complete reporting including all replicates?<br />
<br />
<br />
'''RISK ASSESSMENT'''<br />
<br />
* Is there any risk that inadequate quality of research practices (e.g. documentation) will endanger intellectual property rights?<br />
<br />
<br />
'''PLEASE DO NOT FORGET'''<br />
<br />
* To check whether research at the collaborating party meets the required ethical standards<br />
<br />
<br />
== C. Resources ==<br />
'''Examples and templates for external collaborators:'''<br />
* [https://paasp.sharepoint.com/:w:/s/EQIPD/Edznay8k9dZAuNwmwZeuXz8B3D9lgHGyHsKna8nNfLXLeg?e=zc9VBz 1.4.3.1 Expectations for external collaborators.docx]<br />
* [https://paasp.sharepoint.com/:w:/s/EQIPD/EQ5criyu0xtDu4DJOmOTutMBuIoFK-jUxXpVSfixe_OOFw?e=DT4naO 1.4.3.1 Expectations for external collaborators one pager.docx]<br />
<br />
'''NEEDs set up by EQIPD for academia-industry collaborations:'''<br />
The pharma industry partners of EQIPD prepared a specific NEED for academic collaboration partners. This NEED can be downloaded here:<br />
* [https://paasp.sharepoint.com/:x:/s/EQIPD/EUBTQDvb0J5HnUXpKdpyZUsB041zFCMf-0MHbNoG6w2t5w?e=syBDiY EQIPD external NEED Collaboration with Pharma Industry]<br />
Information on how to use such NEEDs can be found in section [[4.3.2 Using the EQIPD applications]]<br />
<br />
The FAIR Guiding Principles for scientific data management and stewardship [https://www.nature.com/articles/sdata201618]<br />
<br />
<br />
----------------<br />
back to [[Toolbox]]<br />
<br />
Next item: [[1.5.1 Quality policy]]</div>85.216.81.116http://eqipd-toolbox.paasp.net/index.php?title=2.1.6_Sample_size_and_power_analysis&diff=6162.1.6 Sample size and power analysis2020-10-15T16:17:38Z<p>85.216.81.116: /* B. Guidance & Expectations */</p>
<hr />
<div><br />
<br />
UNDER CONSTRUCTION<br />
<br />
<br />
<br />
<br />
<br />
== A. Background & Definitions ==<br />
<br />
Statistical power is defined as the probability of detecting as statistically significant a clinically or practically important difference of a pre-specified size, if such a difference truly exists. Formally, power is equal to 1 minus the Type II error rate (beta or ß). The Type II error rate is the probability of obtaining a non-significant result when the null hypothesis is false — in other words, failing to find a difference or relationship when one exists.<br />
<br />
Balancing sample size, effect size and power is critical to good study design. When the power is low, only large effects can be detected, and negative results cannot be reliably interpreted. The consequences of low power are particularly dire in the search for high-impact results, when the researcher may be willing to pursue low-likelihood hypotheses for a groundbreaking discovery (see Fig. 1 in [https://www.nature.com/articles/nmeth.2738 Krzywinski & Altman 2013]). Ensuring that sample sizes are large enough to detect the effects of interest is an essential part of study design.<br />
<br />
Studies with inadequate power are a waste of research resources and arguably unethical when subjects are exposed to potentially harmful or inferior experimental conditions.<br />
<br />
Statistical power analysis exploits the relationships among the four variables involved in statistical inference: sample size (N), significance criterion (α), effect size (ES), and statistical power. For any statistical model, these relationships are such that each is a function of the other three.<br />
<br />
== B. Guidance & Expectations ==<br />
<br />
General advice - DO:<br />
<br />
* Whenever possible, seek professional biostatistician support to estimate sample size.<br />
* Use power prospectively for planning future studies.<br />
* Put science before statistics. It is easy to get caught up in statistical significance and such; but studies should be designed to meet scientific goals, and you need to keep those in sight at all times (in planning and analysis). The appropriate inputs to power/sample-size calculations are effect sizes that are deemed scientifically important, based on careful considerations of the underlying scientific (not statistical) goals of the study. Statistical considerations are used to identify a plan that is effective in meeting scientific goals – not the other way around.<br />
* Do pilot studies. Investigators tend to try to answer all the world’s questions with one study. However, you usually cannot do a definitive study in one step. It is far better to work incrementally. A pilot study helps you establish procedures, understand and protect against things that can go wrong, and obtain variance estimates needed in determining sample size. A pilot study with 20-30 degrees of freedom for error is generally quite adequate for obtaining reasonably reliable sample-size estimates.<br />
* Effect size should be specified on the actual scale of measurement, not on a standardized scale.<br />
* Generate sample size estimates for a range of power and effect size values to explore the gains and losses in power or detectable effect size due to increasing or decreasing n. This is why the term ‘sample size estimation’ is often preferred over ‘sample size calculation’. Although the arrival at a number for the required sample size is invariably based on (often complex) formulae, the term ‘calculation’ implies an unwarranted degree of precision. The purpose of sample size estimation is not to give an exact number but rather to subject the study design to scrutiny, including an assessment of the validity and reliability of data collection ([https://www.sciencedirect.com/science/article/pii/S1466853X05000714 Batterham & Atkinson 2005]).<br />
<br />
<br />
General advice - DO NOT:<br />
<br />
* Avoid using the definition of “small,” “medium,” or “large” effect size based on Cohen's d of .20, .50, or .80, respectively. Cohen's assessments are based on an extensive survey of statistics reported in the literature in the social sciences and may not apply to other fields of science. Further, this method uses a standardized effect size as the goal. Think about it: for a “medium” effect size, you’ll choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your subjects. Clearly, important considerations are being ignored here. “Medium” is definitely not the message!<br />
<br />
* Retrospective power calculations should generally be avoided, because they add no new information to an analysis (i.e. avoid using observed power to interpret the results of the statistical test). You’ve got the data, did the analysis, and did not achieve “significance.” So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn’t powerful enough – that’s why the result isn’t significant. Power calculations are useful for design, not analysis. <br />
<br />
<br />
Guidance on sample size estimation:<br />
<br />
--- to be added ---<br />
<br />
<br />
<br />
What to do if you have no choice about sample size:<br />
<br />
A limited budget, a limited supply of research materials, or difficult-to-overcome guidance from a collaborator, a funder or a senior colleague may leave no choice but to run a study with a given, potentially small, sample size. What can be done in such situations?<br />
* consider study designs involving correlated data (e.g. repeated measures, crossover or matched-pairs designs), which are usually associated with greater statistical power than designs involving separate samples allocated to different treatment groups ([https://www.sciencedirect.com/science/article/pii/S1466853X05000714 see section 2.1 here]).<br />
* consider intervening variables or pre-intervention measurements for stratification; if not possible, one can still improve statistical power by entering these variables as covariates in the analysis<br />
* explore and engage all other means to minimize variation (including using properly maintained and calibrated research instruments, adequate and well controlled environmental conditions, making sure that experiments are performed by competent and adequately trained scientists)<br />
* if a study has low power because of the given sample size, reflect this limitation in the study plan and avoid running the study as knowledge-claiming (decision-enabling, confirmatory).<br />
<br />
It is hard to argue with budgets, journal editors, and superiors, but this does not mean that there is no sample-size problem. Sample size is only one of several quality characteristics of a statistical study; if n is held fixed, the focus simply shifts to other aspects of study quality. For instance, given the budgeted (or imposed) sample size, one can determine the effect size at which the study reaches the target power. That detectable effect size can then be discussed and evaluated relative to the scientific goals. If it is too large, the study is under-powered, and the recommendation depends on the situation: perhaps this finding can be used to argue for a bigger budget; perhaps a better instrument can be found that brings the study up to a reasonable standard; and, not least, possible improvements to the study design that reduce the variance of the estimator (e.g. judicious stratification or blocking) should be reconsidered.<br />
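<br />
A minimal sketch of this idea, under a two-sided, two-sample normal approximation and with placeholder numbers, is to solve the power relation for the effect size at the imposed n and then judge whether that detectable effect is scientifically acceptable:<br />
<pre>
# Sketch: smallest raw difference detectable with an imposed per-group sample size,
# at a given alpha and target power (two-sided, two-sample, normal approximation).
from scipy.stats import norm

def detectable_difference(n_per_group, sd, power=0.80, alpha=0.05):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z * sd * (2.0 / n_per_group) ** 0.5

imposed_n, sd = 6, 1.5                          # placeholders imposed by budget / material supply
print("smallest detectable difference with", imposed_n, "per group:",
      round(detectable_difference(imposed_n, sd), 2))
</pre>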
<br />
== C. Resources ==<br />
<br />
Tools for sample size estimation:<br />
<br />
* [http://www.gpower.hhu.de/ G*Power]<br />
* [https://wise1.cgu.edu/power/index.asp WISE power tutorial]<br />
* [http://davidmlane.com/hyperstat/power.html JAVA applets for power and sample size]<br />
* [https://www.psychometrica.de/effect_size.html Computation of sample sizes @Psychometrica]<br />
<br />
<br />
Educational instruments and resources:<br />
<br />
* Mayo clinical online simulator - Size matters [https://rtools.mayo.edu/size_matters/]<br />
* Scientists talking to biostatisticians [https://www.youtube.com/watch?v=PbODigCZqL8&feature=youtu.be]<br />
<br />
<br />
Useful literature (for non-statisticians):<br />
<br />
* [https://stat.uiowa.edu/sites/stat.uiowa.edu/files/techrep/tr303.pdf Practical advice on sample size estimation by Russell Lenth]<br />
* [https://www.sciencedirect.com/science/article/pii/S1466853X05000714 A primer on the murky world of sample size estimation by Alan Batterham & Greg Atkinson]<br />
<br />
Useful literature:<br />
<br />
* [https://pdfs.semanticscholar.org/1325/24bdfe70504fcd67016b17305ccddb4bcd14.pdf Power in various ANOVA designs by Joel Levin]<br />
<br />
* [https://www.ncbi.nlm.nih.gov/books/NBK43321/ https://www.ncbi.nlm.nih.gov/books/NBK43321/]<br />
* [http://davidmlane.com/hyperstat/power.html http://davidmlane.com/hyperstat/power.html]<br />
* [http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality]<br />
<br />
Guidelines on reporting of sample size (in vivo research):<br />
* [[ARRIVE 2.0]] <br />
<br />
<br />
----------------<br />
back to [[Toolbox]]<br />
<br />
Next item: [[2.1.7 Blinding]]</div>85.216.81.116http://eqipd-toolbox.paasp.net/index.php?title=2.1.6_Sample_size_and_power_analysis&diff=6152.1.6 Sample size and power analysis2020-10-15T16:08:02Z<p>85.216.81.116: /* C. Resources */</p>
<hr />
<div><br />
<br />
UNDER CONSTRUCTION<br />
<br />
<br />
<br />
<br />
<br />
== A. Background & Definitions ==<br />
<br />
Statistical power is defined as the probability of detecting as statistically significant a clinically or practically important difference of a pre-specified size, if such a difference truly exists. Formally, power is equal to 1 minus the Type II error rate (beta or ß). The Type II error rate is the probability of obtaining a non-significant result when the null hypothesis is false — in other words, failing to find a difference or relationship when one exists.<br />
<br />
Balancing sample size, effect size and power is critical to good study design. When the power is low, only large effects can be detected, and negative results cannot be reliably interpreted. The consequences of low power are particularly dire in the search for high-impact results, when the researcher may be willing to pursue low-likelihood hypotheses for a groundbreaking discovery (see Fig. 1 in [https://www.nature.com/articles/nmeth.2738 Krzywinski & Altman 2013]). Ensuring that sample sizes are large enough to detect the effects of interest is an essential part of study design.<br />
<br />
Studies with inadequate power are a waste of research resources and arguably unethical when subjects are exposed to potentially harmful or inferior experimental conditions.<br />
<br />
Statistical power analysis exploits the relationships among the four variables involved in statistical inference: sample size (N), significance criterion (α), effect size (ES), and statistical power. For any statistical model, these relationships are such that each is a function of the other three.<br />
<br />
== B. Guidance & Expectations ==<br />
<br />
General advice - DO:<br />
<br />
* Whenever possible, seek professional biostatistician support to estimate sample size.<br />
* Use power prospectively for planning future studies.<br />
* Put science before statistics. It is easy to get caught up in statistical significance and such; but studies should be designed to meet scientific goals, and you need to keep those in sight at all times (in planning and analysis). The appropriate inputs to power/sample-size calculations are effect sizes that are deemed scientifically important, based on careful considerations of the underlying scientific (not statistical) goals of the study. Statistical considerations are used to identify a plan that is effective in meeting scientific goals – not the other way around.<br />
* Do pilot studies. Investigators tend to try to answer all the world’s questions with one study. However, you usually cannot do a definitive study in one step. It is far better to work incrementally. A pilot study helps you establish procedures, understand and protect against things that can go wrong, and obtain variance estimates needed in determining sample size. A pilot study with 20-30 degrees of freedom for error is generally quite adequate for obtaining reasonably reliable sample-size estimates.<br />
* Effect size should be specified on the actual scale of measurement, not on a standardized scale.<br />
* Generate sample size estimates for a range of power and effect size values to explore the gains and losses in power or detectable effect size due to increasing or decreasing n. This is why the term ‘sample size estimation’ is often preferred over ‘sample size calculation’. Although the arrival at a number for the required sample size is invariably based on (often complex) formulae, the term ‘calculation’ implies an unwarranted degree of precision. The purpose of sample size estimation is not to give an exact number but rather to subject the study design to scrutiny, including an assessment of the validity and reliability of data collection ([https://www.sciencedirect.com/science/article/pii/S1466853X05000714 Batterham & Atkinson 2005]).<br />
<br />
<br />
General advice - DO NOT:<br />
<br />
* Avoid using the definition of “small,” “medium,” or “large” effect size based on Cohen's d of .20, .50, or .80, respectively. Cohen's assessments are based on an extensive survey of statistics reported in the literature in the social sciences and may not apply to other fields of science. Further, this method uses a standardized effect size as the goal. Think about it: for a “medium” effect size, you’ll choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your subjects. Clearly, important considerations are being ignored here. “Medium” is definitely not the message!<br />
<br />
* Retrospective power calculations should generally be avoided, because they add no new information to an analysis (i.e. avoid using observed power to interpret the results of the statistical test). You’ve got the data, did the analysis, and did not achieve “significance.” So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn’t powerful enough – that’s why the result isn’t significant. Power calculations are useful for design, not analysis. <br />
<br />
<br />
Guidance on sample size estimation:<br />
<br />
--- to be added ---<br />
<br />
<br />
<br />
What to do if you have no choice about sample size:<br />
<br />
A limited budget, a limited supply of research materials, or difficult-to-overcome guidance from a collaborator, a funder or a senior colleague may leave no choice but to run a study with a given, potentially small, sample size. What can be done in such situations?<br />
* consider study designs involving correlated data (e.g. repeated measures, crossover or matched-pairs designs), which are usually associated with greater statistical power than designs involving separate samples allocated to different treatment groups.<br />
* explore and engage all other means to minimize variation (including using properly maintained and calibrated research instruments, adequate and well controlled environmental conditions, making sure that experiments are performed by competent and adequately trained scientists)<br />
It is hard to argue with budgets, journal editors, and superiors, but this does not mean that there is no sample-size problem. Sample size is only one of several quality characteristics of a statistical study; if n is held fixed, the focus simply shifts to other aspects of study quality. For instance, given the budgeted (or imposed) sample size, one can determine the effect size at which the study reaches the target power. That detectable effect size can then be discussed and evaluated relative to the scientific goals. If it is too large, the study is under-powered, and the recommendation depends on the situation: perhaps this finding can be used to argue for a bigger budget; perhaps a better instrument can be found that brings the study up to a reasonable standard; and, not least, possible improvements to the study design that reduce the variance of the estimator (e.g. judicious stratification or blocking) should be reconsidered.<br />
<br />
== C. Resources ==<br />
<br />
Tools for sample size estimation:<br />
<br />
* [http://www.gpower.hhu.de/ G*Power]<br />
* [https://wise1.cgu.edu/power/index.asp WISE power tutorial]<br />
* [http://davidmlane.com/hyperstat/power.html JAVA applets for power and sample size]<br />
* [https://www.psychometrica.de/effect_size.html Computation of sample sizes @Psychometrica]<br />
<br />
<br />
Educational instruments and resources:<br />
<br />
* Mayo clinical online simulator - Size matters [https://rtools.mayo.edu/size_matters/]<br />
* Scientists talking to biostatisticians [https://www.youtube.com/watch?v=PbODigCZqL8&feature=youtu.be]<br />
<br />
<br />
Useful literature (for non-statisticians):<br />
<br />
* [https://stat.uiowa.edu/sites/stat.uiowa.edu/files/techrep/tr303.pdf Practical advice on sample size estimation by Russell Lenth]<br />
* [https://www.sciencedirect.com/science/article/pii/S1466853X05000714 A primer on the murky world of sample size estimation by Alan Batterham & Greg Atkinson]<br />
<br />
Useful literature:<br />
<br />
* [https://pdfs.semanticscholar.org/1325/24bdfe70504fcd67016b17305ccddb4bcd14.pdf Power in various ANOVA designs by Joel Levin]<br />
<br />
* [https://www.ncbi.nlm.nih.gov/books/NBK43321/ https://www.ncbi.nlm.nih.gov/books/NBK43321/]<br />
* [http://davidmlane.com/hyperstat/power.html http://davidmlane.com/hyperstat/power.html]<br />
* [http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality]<br />
<br />
Guidelines on reporting of sample size (in vivo research):<br />
* [[ARRIVE 2.0]] <br />
<br />
<br />
----------------<br />
back to [[Toolbox]]<br />
<br />
Next item: [[2.1.7 Blinding]]</div>85.216.81.116http://eqipd-toolbox.paasp.net/index.php?title=2.1.6_Sample_size_and_power_analysis&diff=6142.1.6 Sample size and power analysis2020-10-15T16:07:31Z<p>85.216.81.116: /* B. Guidance & Expectations */</p>
<hr />
<div><br />
<br />
UNDER CONSTRUCTION<br />
<br />
<br />
<br />
<br />
<br />
== A. Background & Definitions ==<br />
<br />
Statistical power is defined as the probability of detecting as statistically significant a clinically or practically important difference of a pre-specified size, if such a difference truly exists. Formally, power is equal to 1 minus the Type II error rate (beta or ß). The Type II error rate is the probability of obtaining a non-significant result when the null hypothesis is false — in other words, failing to find a difference or relationship when one exists.<br />
<br />
Balancing sample size, effect size and power is critical to good study design. When the power is low, only large effects can be detected, and negative results cannot be reliably interpreted. The consequences of low power are particularly dire in the search for high-impact results, when the researcher may be willing to pursue low-likelihood hypotheses for a groundbreaking discovery (see Fig. 1 in [https://www.nature.com/articles/nmeth.2738 Krzywinski & Altman 2013]). Ensuring that sample sizes are large enough to detect the effects of interest is an essential part of study design.<br />
<br />
Studies with inadequate power are a waste of research resources and arguably unethical when subjects are exposed to potentially harmful or inferior experimental conditions.<br />
<br />
Statistical power analysis exploits the relationships among the four variables involved in statistical inference: sample size (N), significance criterion (α), effect size (ES), and statistical power. For any statistical model, these relationships are such that each is a function of the other three.<br />
<br />
== B. Guidance & Expectations ==<br />
<br />
General advice - DO:<br />
<br />
* Whenever possible, seek professional biostatistician support to estimate sample size.<br />
* Use power prospectively for planning future studies.<br />
* Put science before statistics. It is easy to get caught up in statistical significance and such; but studies should be designed to meet scientific goals, and you need to keep those in sight at all times (in planning and analysis). The appropriate inputs to power/sample-size calculations are effect sizes that are deemed scientifically important, based on careful considerations of the underlying scientific (not statistical) goals of the study. Statistical considerations are used to identify a plan that is effective in meeting scientific goals – not the other way around.<br />
* Do pilot studies. Investigators tend to try to answer all the world’s questions with one study. However, you usually cannot do a definitive study in one step. It is far better to work incrementally. A pilot study helps you establish procedures, understand and protect against things that can go wrong, and obtain variance estimates needed in determining sample size. A pilot study with 20-30 degrees of freedom for error is generally quite adequate for obtaining reasonably reliable sample-size estimates.<br />
* Effect size should be specified on the actual scale of measurement, not on a standardized scale.<br />
* Generate sample size estimates for a range of power and effect size values to explore the gains and losses in power or detectable effect size due to increasing or decreasing n. This is why the term ‘sample size estimation’ is often preferred over ‘sample size calculation’. Although the arrival at a number for the required sample size is invariably based on (often complex) formulae, the term ‘calculation’ implies an unwarranted degree of precision. The purpose of sample size estimation is not to give an exact number but rather to subject the study design to scrutiny, including an assessment of the validity and reliability of data collection ([https://www.sciencedirect.com/science/article/pii/S1466853X05000714 Batterham & Atkinson 2005]).<br />
<br />
<br />
General advice - DO NOT:<br />
<br />
* Avoid using the definition of “small,” “medium,” or “large” effect size based on Cohen's d of .20, .50, or .80, respectively. Cohen's assessments are based on an extensive survey of statistics reported in the literature in the social sciences and may not apply to other fields of science. Further, this method uses a standardized effect size as the goal. Think about it: for a “medium” effect size, you’ll choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your subjects. Clearly, important considerations are being ignored here. “Medium” is definitely not the message!<br />
<br />
* Retrospective power calculations should generally be avoided, because they add no new information to an analysis (i.e. avoid using observed power to interpret the results of the statistical test). You’ve got the data, did the analysis, and did not achieve “significance.” So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn’t powerful enough – that’s why the result isn’t significant. Power calculations are useful for design, not analysis. <br />
<br />
<br />
Guidance on sample size estimation:<br />
<br />
--- to be added ---<br />
<br />
<br />
<br />
What to do if you have no choice about sample size:<br />
<br />
A limited budget, a limited supply of research materials, or difficult-to-overcome guidance from a collaborator, a funder or a senior colleague may leave no choice but to run a study with a given, potentially small, sample size. What can be done in such situations?<br />
* consider study designs involving correlated data (e.g. repeated measures, crossover or matched-pairs designs), which are usually associated with greater statistical power than designs involving separate samples allocated to different treatment groups.<br />
* explore and engage all other means to minimize variation (including using properly maintained and calibrated research instruments, adequate and well controlled environmental conditions, making sure that experiments are performed by competent and adequately trained scientists)<br />
It is hard to argue with budgets, journal editors, and superiors, but this does not mean that there is no sample-size problem. Sample size is only one of several quality characteristics of a statistical study; if n is held fixed, the focus simply shifts to other aspects of study quality. For instance, given the budgeted (or imposed) sample size, one can determine the effect size at which the study reaches the target power. That detectable effect size can then be discussed and evaluated relative to the scientific goals. If it is too large, the study is under-powered, and the recommendation depends on the situation: perhaps this finding can be used to argue for a bigger budget; perhaps a better instrument can be found that brings the study up to a reasonable standard; and, not least, possible improvements to the study design that reduce the variance of the estimator (e.g. judicious stratification or blocking) should be reconsidered.<br />
<br />
== C. Resources ==<br />
<br />
Tools for sample size estimation:<br />
<br />
* [http://www.gpower.hhu.de/ G*Power]<br />
* [https://wise1.cgu.edu/power/index.asp WISE power tutorial]<br />
* [http://davidmlane.com/hyperstat/power.html JAVA applets for power and sample size]<br />
* [https://www.psychometrica.de/effect_size.html Computation of sample sizes @Psychometrica]<br />
<br />
<br />
Educational instruments and resources:<br />
<br />
* Mayo clinical online simulator - Size matters [https://rtools.mayo.edu/size_matters/]<br />
* Scientists talking to biostatisticians [https://www.youtube.com/watch?v=PbODigCZqL8&feature=youtu.be]<br />
<br />
<br />
Useful literature (for non-statisticians):<br />
<br />
* [https://stat.uiowa.edu/sites/stat.uiowa.edu/files/techrep/tr303.pdf Practical advice on sample size estimation by Russell Lenth]<br />
* [https://www.sciencedirect.com/science/article/pii/S1466853X05000714 A primer on the murky world of sample size estimation by Alan Batterham & Greg Atkinson]<br />
<br />
Useful literature:<br />
<br />
* [https://pdfs.semanticscholar.org/1325/24bdfe70504fcd67016b17305ccddb4bcd14.pdf Power in various ANOVA designs by Joel Levin]<br />
<br />
* [https://www.ncbi.nlm.nih.gov/books/NBK43321/ https://www.ncbi.nlm.nih.gov/books/NBK43321/]<br />
* [http://davidmlane.com/hyperstat/power.html http://davidmlane.com/hyperstat/power.html]<br />
* [http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality]<br />
<br />
Guidelines on reporting of sample size (in vivo research):<br />
* [[ARRIVE 2.0]] <br />
<br />
<br />
----------------<br />
back to [[Toolbox]]<br />
<br />
Next item: [[2.1.7 Blinding]]</div>85.216.81.116http://eqipd-toolbox.paasp.net/index.php?title=2.1.6_Sample_size_and_power_analysis&diff=6132.1.6 Sample size and power analysis2020-10-15T10:52:18Z<p>85.216.81.116: /* A. Background & Definitions */</p>
<hr />
<div><br />
<br />
UNDER CONSTRUCTION<br />
<br />
<br />
<br />
<br />
<br />
== A. Background & Definitions ==<br />
<br />
Statistical power is defined as the probability of detecting as statistically significant a clinically or practically important difference of a pre-specified size, if such a difference truly exists. Formally, power is equal to 1 minus the Type II error rate (beta or ß). The Type II error rate is the probability of obtaining a non-significant result when the null hypothesis is false — in other words, failing to find a difference or relationship when one exists.<br />
<br />
Balancing sample size, effect size and power is critical to good study design. When the power is low, only large effects can be detected, and negative results cannot be reliably interpreted. The consequences of low power are particularly dire in the search for high-impact results, when the researcher may be willing to pursue low-likelihood hypotheses for a groundbreaking discovery (see Fig. 1 in [https://www.nature.com/articles/nmeth.2738 Krzywinski & Altman 2013]). Ensuring that sample sizes are large enough to detect the effects of interest is an essential part of study design.<br />
<br />
Studies with inadequate power are a waste of research resources and arguably unethical when subjects are exposed to potentially harmful or inferior experimental conditions.<br />
<br />
Statistical power analysis exploits the relationships among the four variables involved in statistical inference: sample size (N), significance criterion (α), effect size (ES), and statistical power. For any statistical model, these relationships are such that each is a function of the other three.<br />
<br />
== B. Guidance & Expectations ==<br />
<br />
General advice - DO:<br />
<br />
* Whenever possible, seek professional biostatistician support to estimate sample size.<br />
* Use power prospectively for planning future studies.<br />
* Put science before statistics. It is easy to get caught up in statistical significance and such; but studies should be designed to meet scientific goals, and you need to keep those in sight at all times (in planning and analysis). The appropriate inputs to power/sample-size calculations are effect sizes that are deemed scientifically important, based on careful considerations of the underlying scientific (not statistical) goals of the study. Statistical considerations are used to identify a plan that is effective in meeting scientific goals – not the other way around.<br />
* Do pilot studies. Investigators tend to try to answer all the world’s questions with one study. However, you usually cannot do a definitive study in one step. It is far better to work incrementally. A pilot study helps you establish procedures, understand and protect against things that can go wrong, and obtain variance estimates needed in determining sample size. A pilot study with 20-30 degrees of freedom for error is generally quite adequate for obtaining reasonably reliable sample-size estimates.<br />
* Effect size should be specified on the actual scale of measurement, not on a standardized scale.<br />
* Generate sample size estimates for a range of power and effect size values to explore the gains and losses in power or detectable effect size due to increasing or decreasing n. This is why the term ‘sample size estimation’ is often preferred over ‘sample size calculation’. Although the arrival at a number for the required sample size is invariably based on (often complex) formulae, the term ‘calculation’ implies an unwarranted degree of precision. The purpose of sample size estimation is not to give an exact number but rather to subject the study design to scrutiny, including an assessment of the validity and reliability of data collection ([https://www.sciencedirect.com/science/article/pii/S1466853X05000714 Batterham & Atkinson 2005]).<br />
<br />
<br />
General advice - DO NOT:<br />
<br />
* Avoid using the definition of “small,” “medium,” or “large” effect size based on Cohen's d of .20, .50, or .80, respectively. Cohen's assessments are based on an extensive survey of statistics reported in the literature in the social sciences and may not apply to other fields of science. Further, this method uses a standardized effect size as the goal. Think about it: for a “medium” effect size, you’ll choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your subjects. Clearly, important considerations are being ignored here. “Medium” is definitely not the message!<br />
<br />
* Retrospective power calculations should generally be avoided, because they add no new information to an analysis (i.e. avoid using observed power to interpret the results of the statistical test). You’ve got the data, did the analysis, and did not achieve “significance.” So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn’t powerful enough – that’s why the result isn’t significant. Power calculations are useful for design, not analysis. <br />
<br />
<br />
We begin by setting the values of type I error (α) and power (1 – β) to be statistically adequate: traditionally 0.05 and 0.80, respectively. We then determine n on the basis of the smallest effect we wish to measure. If the required sample size is too large, we may need to reassess our objectives or more tightly control the experimental conditions to reduce the variance.<br />
<br />
== C. Resources ==<br />
<br />
Tools for sample size estimation:<br />
<br />
* [http://www.gpower.hhu.de/ G*Power]<br />
* [https://wise1.cgu.edu/power/index.asp WISE power tutorial]<br />
* [http://davidmlane.com/hyperstat/power.html JAVA applets for power and sample size]<br />
* [https://www.psychometrica.de/effect_size.html Computation of sample sizes @Psychometrica]<br />
<br />
<br />
Educational instruments and resources:<br />
<br />
* Mayo clinical online simulator - Size matters [https://rtools.mayo.edu/size_matters/]<br />
* Scientists talking to biostatisticians [https://www.youtube.com/watch?v=PbODigCZqL8&feature=youtu.be]<br />
<br />
<br />
Useful literature (for non-statisticians):<br />
<br />
* [https://stat.uiowa.edu/sites/stat.uiowa.edu/files/techrep/tr303.pdf Practical advice on sample size estimation by Russell Lenth]<br />
* [https://www.sciencedirect.com/science/article/pii/S1466853X05000714 A primer on the murky world of sample size estimation by Alan Batterham & Greg Atkinson]<br />
<br />
Useful literature:<br />
<br />
* [https://pdfs.semanticscholar.org/1325/24bdfe70504fcd67016b17305ccddb4bcd14.pdf Power in various ANOVA designs by Joel Levin]<br />
<br />
* [https://www.ncbi.nlm.nih.gov/books/NBK43321/ https://www.ncbi.nlm.nih.gov/books/NBK43321/]<br />
* [http://davidmlane.com/hyperstat/power.html http://davidmlane.com/hyperstat/power.html]<br />
* [http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality]<br />
<br />
Guidelines on reporting of sample size (in vivo research):<br />
* [[ARRIVE 2.0]] <br />
<br />
<br />
----------------<br />
back to [[Toolbox]]<br />
<br />
Next item: [[2.1.7 Blinding]]</div>85.216.81.116http://eqipd-toolbox.paasp.net/index.php?title=2.1.6_Sample_size_and_power_analysis&diff=6122.1.6 Sample size and power analysis2020-10-15T07:00:38Z<p>85.216.81.116: /* C. Resources */</p>
<hr />
<div><br />
<br />
UNDER CONSTRUCTION<br />
<br />
<br />
<br />
<br />
<br />
== A. Background & Definitions ==<br />
<br />
Statistical power is defined as the probability of detecting as statistically significant a clinically or practically important difference of a pre-specified size, if such a difference truly exists. Formally, power is equal to 1 minus the Type II error rate (beta or ß). The Type II error rate is the probability of obtaining a non-significant result when the null hypothesis is false — in other words, failing to find a difference or relationship when one exists.<br />
<br />
Balancing sample size, effect size and power is critical to good study design. When the power is low, only large effects can be detected, and negative results cannot be reliably interpreted. The consequences of low power are particularly dire in the search for high-impact results, when the researcher may be willing to pursue low-likelihood hypotheses for a groundbreaking discovery (see Fig. 1 in [https://www.nature.com/articles/nmeth.2738 Krzywinski & Altman 2013]). Ensuring that sample sizes are large enough to detect the effects of interest is an essential part of study design.<br />
<br />
Studies with inadequate power are a waste of research resources and arguably unethical when subjects are exposed to potentially harmful or inferior experimental conditions. Addressing this shortcoming is a priority — [https://www.nature.com/documents/nr-reporting-summary.pdf the Nature Publishing Group's reporting checklist for life sciences] includes as the first question: “How was the sample size chosen to ensure adequate power to detect a pre-specified effect size?” <br />
<br />
Statistical power analysis exploits the relationships among the four variables involved in statistical inference: sample size (N), significance criterion (α), effect size (ES), and statistical power. For any statistical model, these relationships are such that each is a function of the other three.<br />
<br />
== B. Guidance & Expectations ==<br />
<br />
General advice - DO:<br />
<br />
* Whenever possible, seek professional biostatistician support to estimate sample size.<br />
* Use power prospectively for planning future studies.<br />
* Put science before statistics. It is easy to get caught up in statistical significance and such; but studies should be designed to meet scientific goals, and you need to keep those in sight at all times (in planning and analysis). The appropriate inputs to power/sample-size calculations are effect sizes that are deemed scientifically important, based on careful considerations of the underlying scientific (not statistical) goals of the study. Statistical considerations are used to identify a plan that is effective in meeting scientific goals – not the other way around.<br />
* Do pilot studies. Investigators tend to try to answer all the world’s questions with one study. However, you usually cannot do a definitive study in one step. It is far better to work incrementally. A pilot study helps you establish procedures, understand and protect against things that can go wrong, and obtain variance estimates needed in determining sample size. A pilot study with 20-30 degrees of freedom for error is generally quite adequate for obtaining reasonably reliable sample-size estimates.<br />
* Effect size should be specified on the actual scale of measurement, not on a standardized scale.<br />
* Generate sample size estimates for a range of power and effect size values to explore the gains and losses in power or detectable effect size due to increasing or decreasing n. This is why the term ‘sample size estimation’ is often preferred over ‘sample size calculation’. Although the arrival at a number for the required sample size is invariably based on (often complex) formulae, the term ‘calculation’ implies an unwarranted degree of precision. The purpose of sample size estimation is not to give an exact number but rather to subject the study design to scrutiny, including an assessment of the validity and reliability of data collection ([https://www.sciencedirect.com/science/article/pii/S1466853X05000714 Batterham & Atkinson 2005]).<br />
<br />
<br />
General advice - DO NOT:<br />
<br />
* Avoid using the definition of “small,” “medium,” or “large” effect size based on Cohen's d of .20, .50, or .80, respectively. Cohen's assessments are based on an extensive survey of statistics reported in the literature in the social sciences and may not apply to other fields of science. Further, this method uses a standardized effect size as the goal. Think about it: for a “medium” effect size, you’ll choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your subjects. Clearly, important considerations are being ignored here. “Medium” is definitely not the message!<br />
<br />
* Retrospective power calculations should generally be avoided, because they add no new information to an analysis (i.e. avoid using observed power to interpret the results of the statistical test). You’ve got the data, did the analysis, and did not achieve “significance.” So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn’t powerful enough – that’s why the result isn’t significant. Power calculations are useful for design, not analysis. <br />
<br />
<br />
We begin by setting the values of type I error (α) and power (1 – β) to be statistically adequate: traditionally 0.05 and 0.80, respectively. We then determine n on the basis of the smallest effect we wish to measure. If the required sample size is too large, we may need to reassess our objectives or more tightly control the experimental conditions to reduce the variance.<br />
<br />
== C. Resources ==<br />
<br />
Tools for sample size estimation:<br />
<br />
* [http://www.gpower.hhu.de/ G*Power]<br />
* [https://wise1.cgu.edu/power/index.asp WISE power tutorial]<br />
* [http://davidmlane.com/hyperstat/power.html JAVA applets for power and sample size]<br />
* [https://www.psychometrica.de/effect_size.html Computation of sample sizes @Psychometrica]<br />
<br />
<br />
Educational instruments and resources:<br />
<br />
* Mayo clinical online simulator - Size matters [https://rtools.mayo.edu/size_matters/]<br />
* Scientists talking to biostatisticians [https://www.youtube.com/watch?v=PbODigCZqL8&feature=youtu.be]<br />
<br />
<br />
Useful literature (for non-statisticians):<br />
<br />
* [https://stat.uiowa.edu/sites/stat.uiowa.edu/files/techrep/tr303.pdf Practical advice on sample size estimation by Russell Lenth]<br />
* [https://www.sciencedirect.com/science/article/pii/S1466853X05000714 A primer on the murky world of sample size estimation by Alan Batterham & Greg Atkinson]<br />
<br />
Useful literature:<br />
<br />
* [https://pdfs.semanticscholar.org/1325/24bdfe70504fcd67016b17305ccddb4bcd14.pdf Power in various ANOVA designs by Joel Levin]<br />
<br />
* [https://www.ncbi.nlm.nih.gov/books/NBK43321/ https://www.ncbi.nlm.nih.gov/books/NBK43321/]<br />
* [http://davidmlane.com/hyperstat/power.html http://davidmlane.com/hyperstat/power.html]<br />
* [http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality]<br />
<br />
Guidelines on reporting of sample size (in vivo research):<br />
* [[ARRIVE 2.0]] <br />
<br />
<br />
----------------<br />
back to [[Toolbox]]<br />
<br />
Next item: [[2.1.7 Blinding]]</div>85.216.81.116http://eqipd-toolbox.paasp.net/index.php?title=2.1.6_Sample_size_and_power_analysis&diff=6112.1.6 Sample size and power analysis2020-10-15T07:00:12Z<p>85.216.81.116: /* C. Resources */</p>
<hr />
<div><br />
<br />
UNDER CONSTRUCTION<br />
<br />
<br />
<br />
<br />
<br />
== A. Background & Definitions ==<br />
<br />
Statistical power is defined as the probability of detecting as statistically significant a clinically or practically important difference of a pre-specified size, if such a difference truly exists. Formally, power is equal to 1 minus the Type II error rate (beta or ß). The Type II error rate is the probability of obtaining a non-significant result when the null hypothesis is false — in other words, failing to find a difference or relationship when one exists.<br />
<br />
Balancing sample size, effect size and power is critical to good study design. When the power is low, only large effects can be detected, and negative results cannot be reliably interpreted. The consequences of low power are particularly dire in the search for high-impact results, when the researcher may be willing to pursue low-likelihood hypotheses for a groundbreaking discovery (see Fig. 1 in [https://www.nature.com/articles/nmeth.2738 Krzywinski & Altman 2013]). Ensuring that sample sizes are large enough to detect the effects of interest is an essential part of study design.<br />
<br />
Studies with inadequate power are a waste of research resources and arguably unethical when subjects are exposed to potentially harmful or inferior experimental conditions. Addressing this shortcoming is a priority — [https://www.nature.com/documents/nr-reporting-summary.pdf the Nature Publishing Group's reporting checklist for life sciences] includes as the first question: “How was the sample size chosen to ensure adequate power to detect a pre-specified effect size?” <br />
<br />
Statistical power analysis exploits the relationships among the four variables involved in statistical inference: sample size (N), significance criterion (α), effect size (ES), and statistical power. For any statistical model, these relationships are such that each is a function of the other three.<br />
<br />
== B. Guidance & Expectations ==<br />
<br />
General advice - DO:<br />
<br />
* Whenever possible, seek professional biostatistician support to estimate sample size.<br />
* Use power prospectively for planning future studies.<br />
* Put science before statistics. It is easy to get caught up in statistical significance and such; but studies should be designed to meet scientific goals, and you need to keep those in sight at all times (in planning and analysis). The appropriate inputs to power/sample-size calculations are effect sizes that are deemed scientifically important, based on careful considerations of the underlying scientific (not statistical) goals of the study. Statistical considerations are used to identify a plan that is effective in meeting scientific goals – not the other way around.<br />
* Do pilot studies. Investigators tend to try to answer all the world’s questions with one study. However, you usually cannot do a definitive study in one step. It is far better to work incrementally. A pilot study helps you establish procedures, understand and protect against things that can go wrong, and obtain variance estimates needed in determining sample size. A pilot study with 20-30 degrees of freedom for error is generally quite adequate for obtaining reasonably reliable sample-size estimates.<br />
* Effect size should be specified on the actual scale of measurement, not on a standardized scale.<br />
* Generate sample size estimates for a range of power and effect size values to explore the gains and losses in power or detectable effect size due to increasing or decreasing n (see the sketch after this list). This is why the term ‘sample size estimation’ is often preferred over ‘sample size calculation’. Although the arrival at a number for the required sample size is invariably based on (often complex) formulae, the term ‘calculation’ implies an unwarranted degree of precision. The purpose of sample size estimation is not to give an exact number but rather to subject the study design to scrutiny, including an assessment of the validity and reliability of data collection ([https://www.sciencedirect.com/science/article/pii/S1466853X05000714 Batterham & Atkinson 2005]).<br />
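<br />
A minimal sketch of such an exploration (all numeric values are assumptions for illustration, not EQIPD recommendations):<br />
<pre>
# Tabulate the estimated n per group over a grid of raw effect sizes and target
# power values, using an assumed pilot SD, to see how sensitive n is to each.
from statsmodels.stats.power import TTestIndPower

pilot_sd = 10.0                       # e.g. taken from a pilot study (assumed)
raw_effects = [2.5, 5.0, 7.5, 10.0]   # differences on the actual measurement scale
target_powers = [0.80, 0.90]

solver = TTestIndPower()
for delta in raw_effects:
    for power in target_powers:
        n = solver.solve_power(effect_size=delta / pilot_sd, alpha=0.05, power=power)
        print(f"difference={delta:5.1f}  power={power:.2f}  ->  n per group = {n:6.1f}")
</pre>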
<br />
<br />
General advice - DO NOT:<br />
<br />
* Avoid using the definition of “small,” “medium,” or “large” effect size based on Cohen's d of .20, .50, or .80, respectively. Cohen's assessments are based on an extensive survey of statistics reported in the literature in the social sciences and may not apply to other fields of science. Further, this method uses a standardized effect size as the goal. Think about it: for a “medium” effect size, you’ll choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your subjects. Clearly, important considerations are being ignored here. “Medium” is definitely not the message!<br />
<br />
* Retrospective power calculations should generally be avoided, because they add no new information to an analysis (i.e. avoid using observed power to interpret the results of the statistical test). You’ve got the data, did the analysis, and did not achieve “significance.” So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn’t powerful enough – that’s why the result isn’t significant. Power calculations are useful for design, not analysis. <br />
<br />
<br />
We begin by setting the values of the type I error (α) and power (1 − β) to be statistically adequate: traditionally 0.05 and 0.80, respectively. We then determine n on the basis of the smallest effect we wish to measure. If the required sample size is too large, we may need to reassess our objectives or more tightly control the experimental conditions to reduce the variance.<br />
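<br />
As a worked illustration of this workflow (the formula is a standard textbook normal approximation for comparing two means and the numbers are assumptions, not taken from the EQIPD text), the per-group sample size is approximately<br />
<math>n \approx \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^{2}\,\sigma^{2}}{\delta^{2}}</math><br />
where δ is the smallest difference of interest and σ the expected standard deviation. With α = 0.05 (z = 1.96), power = 0.80 (z = 0.84), σ = 10 and δ = 5, this gives n ≈ 2 × (1.96 + 0.84)² × (10/5)² ≈ 63 per group; exact t-based software gives a similar figure (about 64).<br />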
<br />
== C. Resources ==<br />
<br />
Tools for sample size estimation:<br />
<br />
* [http://www.gpower.hhu.de/ G*Power]<br />
* [https://wise1.cgu.edu/power/index.asp WISE power tutorial]<br />
* [http://davidmlane.com/hyperstat/power.html Java applets for power and sample size]<br />
* [https://www.psychometrica.de/effect_size.html Computation of sample sizes @Psychometrica]<br />
<br />
<br />
Educational instruments and resources:<br />
<br />
* Mayo Clinic online simulator - Size matters [https://rtools.mayo.edu/size_matters/]<br />
* Scientists talking to biostatisticians [https://www.youtube.com/watch?v=PbODigCZqL8&feature=youtu.be]<br />
<br />
<br />
Useful literature (for non-statisticians):<br />
<br />
* [https://stat.uiowa.edu/sites/stat.uiowa.edu/files/techrep/tr303.pdf Practical advice on sample size estimation by Russell Lenth]<br />
* [https://www.sciencedirect.com/science/article/pii/S1466853X05000714 A primer on the murky world of sample size estimation by Alan Batterham & Greg Atkinson]<br />
<br />
Useful literature:<br />
<br />
* [https://pdfs.semanticscholar.org/1325/24bdfe70504fcd67016b17305ccddb4bcd14.pdf Power in various ANOVA designs by Joel Levin]<br />
<br />
* [https://www.ncbi.nlm.nih.gov/books/NBK43321/ https://www.ncbi.nlm.nih.gov/books/NBK43321/]<br />
* [http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality http://powerandsamplesize.com/Calculators/Test-1-Mean/1-Sample-Equality]<br />
<br />
Guidelines on reporting of sample size (in vivo research):<br />
* [[ARRIVE 2.0]] <br />
<br />
<br />
----------------<br />
back to [[Toolbox]]<br />
<br />
Next item: [[2.1.7 Blinding]]</div>85.216.81.116http://eqipd-toolbox.paasp.net/index.php?title=2.1.8_Randomisation&diff=6042.1.8 Randomisation2020-10-15T05:42:47Z<p>85.216.81.116: /* C. Resources */</p>
<hr />
<div>== A. Background & Definitions ==<br />
<br />
Randomisation is a process of random assignment of experimental units to treatment conditions:<br />
<br />
* the occurrence of one event should have no influence on the next event (independence principle)<br />
* the randomisation sequence cannot be based on an easily memorizable and reproducible sequence (randomness principle)<br />
<br />
Randomisation serves three main purposes:<br />
<br />
* it enables the application of statistical tests based on the central limit theorem<br />
* it prevents a potential impact of selection bias due to differing baseline or confounding characteristics of the subjects<br />
* it supports the implementation of other means to reduce the risks of bias (such as blinding)<br />
<br />
== B. Guidance & Expectations ==<br />
Randomisation protocol should describe the following:<br />
* Type of randomisation (simple / unrestricted, block, stratified, etc.)<br />
* Block size (if applicable)<br />
* Stratification variables (if applicable)<br />
* Tools used for randomisation (including copy of a script if R, SAS or another similar script-based software is used)<br />
* Reproducibility of the randomisation protocol, such as the seed of the random number generator (if applicable; a minimal script sketch is shown after this list)<br />
* Reference to the protocol followed (if applicable)<br />
* Methods to monitor / detect deviations from the protocol (if any)<br />
* If a decision is made not to introduce a proper randomisation protocol, the reasons should be documented in a declaration justifying the use of pseudo-randomisation or simple interspersion methods.<br />
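<br />
A minimal block-randomisation script sketch (group names, block size and seed are illustrative assumptions, not an EQIPD-prescribed protocol); recording the seed keeps the allocation sequence reproducible, as requested above:<br />
<pre>
# Hypothetical example: blocked allocation of experimental units to groups.
import random

groups = ["control", "drug A", "drug B"]   # assumed experimental groups
repeats_per_block = 2                      # each group appears twice per block
n_blocks = 4
seed = 20201015                            # record this value with the study documentation

rng = random.Random(seed)                  # seeded generator -> reproducible sequence
allocation = []
for _ in range(n_blocks):
    block = groups * repeats_per_block
    rng.shuffle(block)                     # random order within each block
    allocation.extend(block)

for unit_id, group in enumerate(allocation, start=1):
    print(f"unit {unit_id:02d} -> {group}")
</pre>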
<br />
'''RISK ASSESSMENT'''<br />
* Is pseudo-randomisation used instead of the strongly recommended true randomisation?<br />
* Is there a risk that randomisation is introduced when subjects are allocated to experimental groups but is not maintained throughout study conduct, outcome assessment and data analysis?<br />
<br />
'''PLEASE DO NOT FORGET'''<br />
* To consider adding this subject to a training program for new employees or refresher training (if appropriate)<br />
* To check whether there are feedback channels installed so that your colleagues can identify, record and report errors and critical incidents related to this subject (if appropriate)<br />
<br />
<br />
== C. Resources ==<br />
<br />
<br />
Guidelines on reporting of randomisation (in vivo research):<br />
<br />
[[ARRIVE 2.0]] <br />
<br />
Online tools to support randomisation:<br />
<br />
* NC3Rs’ Experimental Design Assistant - [https://www.eda.nc3rs.org.uk www.eda.nc3rs.org.uk]<br />
<br />
* QuickCalcs - [https://www.graphpad.com/quickcalcs/randMenu/ www.graphpad.com/quickcalcs/randMenu/]<br />
<br />
* Sealed Envelope - [https://www.sealedenvelope.com/simple-randomiser/v1/lists]<br />
<br />
* RandoMice software - [https://doi.org/10.1371/journal.pone.0237096 read] - [https://github.com/Rve54/RandoMice/releases/ download and install]<br />
<br />
<br />
Reading material:<br />
<br />
Handbook of Experimental Pharmacology chapter on randomization and blinding: [https://link.springer.com/chapter/10.1007/164_2019_279]<br />
<br />
<br />
<br />
----------------<br />
back to [[Toolbox]]<br />
<br />
Next item: [[2.1.9 Inclusion and exclusion criteria]]</div>85.216.81.116