The last time we each learned something new was probably outside of a classroom: looking up the ideal watering frequency for balcony tomatoes, listening to a podcast, or talking to a friend who knew more about a news item than we did. Informal educators, such as workshop facilitators in a science museum, provide a huge benefit to society. Yet much informal education happens without clear mechanisms for feedback or opportunities to reflect on how programs can better serve learners. Evaluation can help improve an informal education program and demonstrate its value to funders and clients. However, evaluating the impact of informal education can be a significant challenge, especially for instructors or program administrators who lack the resources to engage an external evaluator.

The Information and Communications Technology Council (ICTC) is working with Science North to evaluate CanCode programming – informal STEM education for youth – through an observation tool. In this article, we share our experience building an observation tool, with advice for other informal educators interested in observation as a method. Much of this advice draws on interviews with evaluation practitioners, a literature review, and early reflections from our observation tool pilot.

Background: What is informal education, and how does evaluation fit in?

Informal education includes nearly all learning outside of a structured, curriculum-focused setting, such as museums, podcasts, or field day workshops.

Facilitated informal education (with an instructor) offers “learner choice, low consequence assessment, and structures that build on learners’ motivations, culture, and competence” and creates a “safe, non-threatening, open-ended environment for engaging with [the topic]” (NRC, 2009, p. 47).

At ICTC, our focus is on science, technology, engineering, and math (STEM) and the “soft” or transferable/human skills like critical thinking, teamwork, and problem-solving that help everyone excel. We hope, however, that this article contains helpful information for educators from other informal learning settings.

Evaluating an informal learning program begins by articulating what you aim to achieve with your program, and then identifying which outputs or outcomes you want to try to assess.

Most informal education programs have numerous intended outcomes that could be measured. Are you interested in understanding how satisfied parents are with your program, how faithfully your instructors deliver curriculum, or what socio-emotional learning occurs for students during the program? Determining the parts of your program that you want to measure is an essential first step in designing an evaluation and figuring out whether an observation tool is right for you. To help you get started, there are many great resources out there on articulating program goals through logic models and theories of change.

Observation has its limitations! In program evaluation, we often try to identify cause and effect: this is a key challenge in informal education, and observation tools are no exception.

It would be very helpful to say that a specific STEM initiative taught its participants new things or produced intended learning outcomes. However, evaluating learning outcomes in informal education creates a serious problem of attribution. For example, think through your last visit to a museum: can you pinpoint something new you learned, and are you sure that it was new, rather than a reminder of something you’d learned in school or online? As Staus et al. (2021) ask, “to what extent is it reasonable to assume that measured outcomes and effects for a specific informal learning experience can be directly attributed to participant engagement in that particular experience? Did these experiences actually ‘cause’ observed outcomes or did they merely contribute to those outcomes?” (p. 2). Furthermore, measuring knowledge retention in informal education can be a challenge if evaluators don’t have access to participants in the long term (Diamond et al., 2016). Staus et al. and other researchers have suggested that measuring “contribution” to outcomes is more appropriate than trying to assert that informal education has generated an entirely new skill or insight.

An evaluation need not be limited to pinpointing contributions to learning: it can also investigate engagement, experience, attitude, and equity. These measures may be better candidates for use in an observation tool.

Museums frequently evaluate their exhibitions by degree and quality of visitor engagement, and by equity and diversity in access to and impact of their services. Furthermore, informal learning outcomes can include awareness or knowledge; interest; attitudes; and behaviour and skills (Diamond et al., 2016). A common tool in museum visitor evaluation is observation, often taking the form of “timing and tracking” rubrics that examine which installation features garner the most interest and what kind of interactions they attract, not dissimilar to user experience research. Museums may have a researcher physically present, observing one visitor at a time to see how long they spend doing what, or they may (typically with signs up to alert visitors) film an exhibition space, time sample, and code audience behaviour after the fact.

Observation tools can also be used in evaluations of informal or formal education, where a researcher sits in on a workshop or learning event. Observation offers several benefits: primarily, it is low-impact and doesn’t require educators to collect personally identifying information or chase learners for survey participation. Of course, observation can be combined with other evaluation tools like instructor and student interviews, or pre- and post-surveys, in formative evaluation (informing improved program delivery) or summative evaluation (assessing the degree to which a completed program was successful).

This project is developing an observation tool (or protocol) for youth in an in-person, field day setting. In what follows, we share some practical advice for developing an evaluation tool, garnered through experience and benchmarking interviews with other evaluation professionals. These tips should speak to contexts other than youth workshops, though they do assume that the observer is physically present in the same space as learners.

10 things we’ve learned while designing an observation tool for evaluating a youth informal science education program

1. Make a clear list of the constructs you’d like to try to measure (e.g., equity of engagement) and think through what indicators will provide a valid way of assessing them.

A central challenge in observation is being sure that what you’re seeing is actually a good measurement for what you’d like to understand. For example, is counting the number of students who raise their hands adequate for assessing engagement, or should a tool build in a way to incorporate students who focus quietly on their work? Going to other observation tools for examples of indicators they have validated (e.g., comparing self-reported engagement in a survey with observation tool engagement rating to be sure the two correlate) is a helpful way of thinking through indicators with better rigour.
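One way to check an indicator, as described above, is to compare observation-tool ratings against an independent measure of the same construct and confirm the two correlate. The sketch below illustrates this with a hand-rolled Pearson correlation; all learner scores are hypothetical, invented for the example.

```python
# Illustrative sketch: checking whether an observation-tool engagement
# rating tracks self-reported engagement from a survey. All paired
# scores below are hypothetical.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical paired scores for eight learners (1-5 scales):
observed = [4, 3, 5, 2, 4, 3, 5, 2]   # observer's engagement rating
surveyed = [5, 3, 4, 2, 4, 2, 5, 3]   # learner's self-reported engagement

r = pearson_r(observed, surveyed)
print(f"r = {r:.2f}")  # prints r = 0.80
```

A strong positive correlation between the two measures is evidence (though not proof) that the observed behaviour is a valid indicator of engagement; a weak one suggests the indicator needs rethinking.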

2. Use clear and well-defined language in the tool’s instructions.

Avoid ambiguous language, or language that builds in assumptions. For example, rather than asking what proportion of students in a class appear engaged, ask observers to record behaviours and utterances without attributing interpretations to them until after the observation is complete. This is especially important in the informal education setting when you may have new or inexperienced observers, such as volunteers.

3. Be deliberate in your design to make sure every question on the tool is there for a reason.

Asking observers to do too much (especially in informal settings, where external evaluators aren’t always available or brought in) can lead to an unreliable tool. Be mindful of how much an individual can actually observe and take note of. For example, observation forms that are too long or overcomplicated, or that ask observers to live-code information, may go beyond what an individual can accurately keep tabs on.

4. Consider who your observers are; usability is key.

Observers may be parents, practitioners, or other staff or volunteers who may not have extensive experience with observation. Consider how best to break down observation dimensions to make sure they are interpretable and meaningful to those who are observing. A widely usable tool may not ask observers to measure counts or complete rating scales, but may instead provide standardized frames that ask, “Did X happen? Yes or no.”

5. Consider using qualitative note-taking and assigning ratings after the fact.

Many validated observation tools for informal science education have observers take rich descriptive notes (focusing on verbal utterances and observable behaviours), using a rubric after the fact to assign scores across constructs of interest.

6. Integrate multi-point evidence/triangulation wherever you can.

Consider using “think alouds,” student and instructor surveys or interviews, or assignment scores to compare with your observation findings and see if different tools agree on program assessment. Finding things that students already do to use as data (e.g., workshop activities demanding some level of observable engagement) can be helpful for streamlining your design.

7. Try your tool out before you use it.

There are a number of open access samples of filmed in-class observations. While they will likely not be a perfect match with your use case, they will help shine a light on immediate usability issues and easy fixes before you bring your tool into its first real-world pilot.

Piloting and Training

8. Select an intentional sampling technique.

Some observation tools follow an individual learner throughout their experience, others use time sampling (e.g., recording everything that is going on every five minutes), and still others recommend a continuous scan. Each comes with different pros and cons and merits some research into what works best for your program. Another way to think about sampling is giving observers clear instructions for who they should focus on and when: in particular, be clear on whether you are observing learners or instructors. Science educators, particularly those working with youth, are skilled at drawing attention to themselves: it is important to train observers to maintain attention on students if that is the main focus of the tool.
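As a concrete illustration of time sampling, the sketch below generates a simple rotating scan schedule: at a fixed interval, the observer scans a different table of learners. The table names, session length, and interval are hypothetical placeholders, not part of any particular tool.

```python
# A minimal time-sampling sketch (hypothetical parameters): every
# `interval` minutes, the observer scans the next table in rotation.
from datetime import timedelta

def sampling_schedule(tables, session_minutes=60, interval=5):
    """Return (time offset, table to observe) pairs for one session."""
    schedule = []
    for i, minute in enumerate(range(0, session_minutes, interval)):
        schedule.append((timedelta(minutes=minute), tables[i % len(tables)]))
    return schedule

for offset, table in sampling_schedule(["Table A", "Table B", "Table C"]):
    print(f"{offset}  scan {table}")
```

Printing the schedule in advance, rather than leaving timing to the observer’s judgment in the moment, is one way to keep sampling consistent across observers and sessions.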

9. Use piloting to refine a tool’s usability and train observers.

It is important to account for the fact that if your observers are not a third party, they won’t be as objective. In informal settings, it is sometimes hard for observers to just “see” and not “interpret.” Training can help shift people’s minds from interpretation to description. If your observers are practitioners, remind them not to go into “educator mode” and start teaching or helping. Training in this case should cover staying neutral and observing without affecting what’s happening in the program.

10. Use interrater agreement and reliability as training tools.

Interrater agreement is the extent to which two different observers assign similar scores to the same observation. Interrater reliability is the correlation between raters’ scores over several observations. Interrater agreement and reliability are helpful tools for making sure that observers are measuring the same quantifiable indicators. In practice, many evaluators report that interrater reliability is hard to achieve; while some subjectivity in observation is inevitable, these measurements remain important touchstones for ensuring that all observers are trying to use your tool in the same way.
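To make the agreement idea concrete, the sketch below computes two common training metrics for a pair of observers who rated the same eight items: raw percent agreement, and Cohen’s kappa, which corrects for the agreement expected by chance. The paired yes/no ratings are hypothetical.

```python
# Illustrative sketch of two agreement measures used when training
# observers. The paired yes/no ratings below are hypothetical.
from collections import Counter

def percent_agreement(a, b):
    """Fraction of items where both raters gave the same answer."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa for two raters over the same items."""
    po = percent_agreement(a, b)            # observed agreement
    ca, cb = Counter(a), Counter(b)
    n = len(a)
    # Chance agreement: probability both raters independently pick
    # the same category, summed over categories.
    pe = sum((ca[k] / n) * (cb[k] / n) for k in set(a) | set(b))
    return (po - pe) / (1 - pe)

rater1 = ["yes", "yes", "no", "yes", "no", "no",  "yes", "no"]
rater2 = ["yes", "no",  "no", "yes", "no", "yes", "yes", "no"]

print(f"agreement = {percent_agreement(rater1, rater2):.2f}")  # 0.75
print(f"kappa     = {cohens_kappa(rater1, rater2):.2f}")       # 0.50
```

Reviewing disagreements item by item during training, rather than just reporting the summary number, is often where the real calibration happens: it surfaces which questions observers interpret differently.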

Open Access Observation Resources for Informal Science Education Evaluation

  1. The Activation Lab Observation Tool: see also this discussion of tool design and the purpose of the Activation Lab project.
  2. LEAP into Science Observation Protocol: see Appendix B
  3. iCode Observation Protocol: see Appendix I
  4. Assessment Tool for an Environmental Field Day Project: While this tool is not available online, see a discussion of how the tool was validated through comparison with a student survey and how the authors assessed interrater reliability.
  5. Building Informal Science Education (BISE) project and database of evaluation reports.
  6. : a searchable database of ISE evaluations and evaluation tools.


Diamond, J., Horn, M., & Uttal, D. (2016). Practical Evaluation Guide: Tools for Museums and Other Informal Educational Settings (3rd ed.). Rowman & Littlefield.

National Research Council. (2009). Learning science in informal environments. National Academies Press.

Staus, N. L., Falk, J. H., Price, A., et al. (2021). Measuring the long-term effects of informal science education experiences: Challenges and potential solutions. Disciplinary and Interdisciplinary Science Education Research, 3, 3.