Assessing Science Learning Outcomes Using Assessment Instruments Based on Higher Order Thinking Skills

Many teachers still had difficulty constructing instruments for assessing student learning. Learning remained oriented toward lower-order thinking (remembering, memorizing, and understanding), so students' knowledge was low. This study aimed to develop an instrument for assessing science learning outcomes on Theme 8 based on Higher Order Thinking Skills. This was development research using the 4D model, which is divided into four stages: Define, Design, Develop, and Disseminate. Data were collected through observation, interviews, document recording, rating scales, and tests. The instrument used in collecting data was a questionnaire. The data were analyzed for validity, reliability, discriminating power, level of difficulty, quality of distractors, practitioner responses, and student responses. Instruments designed to collect data were first tested for validity. The analysis of content validity and item validity placed the instrument in the very high criteria. The reliability test showed that the instrument had very high consistency. The distractor analysis showed that the distractors functioned well. Practitioner responses and student responses were in the very good category. It can be concluded that the instrument that assesses the science learning outcomes of Theme 8 based on Higher Order Thinking Skills was valid and feasible to be used as an assessment instrument for students.


INTRODUCTION
Learning is an activity that aims to convey information or knowledge to students (Dwipayana et al., 2018; Sari et al., 2016). Learning can be said to be successful if the learning objectives are achieved optimally. In the learning process, all students must be involved so that they learn actively (Hartini et al., 2014; Hartuti, 2015). Learning becomes effective if the teacher can create a conducive learning environment. Learning will be fun if students are motivated to learn (Dewi, 2018; Suprihatin, 2015). Teachers must create effective learning in all subjects, especially in science. Science is one of the compulsory subjects. Science learning equips students with scientific knowledge, skills, and attitudes (Agustina, 2015; Halim, 2017). Through learning science, students can also improve their ability to adapt to change. When teaching science, the teacher is obliged to create a learning atmosphere that is fun, interesting, and not boring so that students can understand the science material well. The development of science learning involves three activities: planning, implementation, and assessment (Anwar, 2018; Zulmaharni, 2016). The problem today is that many teachers still have difficulty constructing instruments for assessing student learning (Arif, 2016; Pratiwi, 2017). This problem also occurs in one elementary school. Based on the results of observations, interviews, and document recording conducted in class V of an elementary school in the Tangkuban Perahu Cluster (Gugus), Melaya District, various problems were found in the assessment of science learning.

Ni Made Mita Puspita Dewi 1, I Made Suarjana 2, Luh Putu Putrini Mahadewi 3 / Assessing Science Learning Outcomes Using Assessment Instruments Based on Higher Order Thinking Skills
The observations showed that students were less active during the learning process and that student learning outcomes were low. The interviews revealed three problems: (1) the teacher has difficulty determining the right instrument to assess student learning outcomes; (2) learning is still oriented toward lower-order thinking (remembering, memorizing, and understanding), as evidenced by the assessment instruments used by teachers, which are still at levels C1 to C3; and (3) there is no HOTS-based assessment instrument built on cases of everyday life phenomena with a high level of validity and reliability, so teachers only use assessment instruments obtained from core cluster schools. Previous research based on PAP analysis also found that elementary school students tended to have sufficient higher-order thinking ability but were still weak in solving C6 cognitive problems (Saraswati & Agustika, 2020).
Based on these problems, a solution is proposed: developing a HOTS assessment instrument. Assessment is a process of collecting and processing information to measure the achievement of student learning outcomes (Chng & Lund, 2018; Edriati et al., 2015; Wicaksono et al., 2016; Zuliani et al., 2017). The learning outcomes to be achieved comprise knowledge, skills, and attitudes. Teachers play an important role in training students to acquire higher-order thinking skills, which are demanded by the 2013 curriculum (Dewi, 2018; Mega et al., 2015; Mulyadin, 2016). To help students acquire higher-order thinking skills (HOTS), teachers can provide HOTS-based test questions as practice. HOTS-based test questions can help students develop higher-order thinking skills (Andoko, 2020; Ineson et al., 2013; Pratiwi, 2017; Saraswati & Agustika, 2020). The abilities in question are critical, reflective, metacognitive, and creative thinking skills. HOTS questions are generally built on a stimulus, which serves as the basis for the questions. Higher Order Thinking Skills stimulate students to interpret, analyze, or even manipulate previous information so that it is not monotonous (Budiarti et al., 2020; Fadzam & Rokhimawan, 2020; Khotimah & Sari, 2020; Suratno et al., 2020). HOTS are the skills to explain, connect ideas and facts, formulate hypotheses, analyze, and draw conclusions. This ability is demanded by the 2013 curriculum: students are required not only to know, understand, and apply but also to analyze, evaluate, and even create (Nasihin, 2016; Sanjiwana et al., 2015; Sofyan, 2016).
In the context of HOTS, the stimulus presented is contextual and interesting. The stimulus can be drawn from global issues such as information technology, social, economic, health, and education issues (Ardiansyah, 2018; Nurmala & Mucti, 2019). It can also be drawn from problems in the environment around the education unit, such as culture and cases in certain areas. Teacher creativity affects the quality and variety of stimuli used in writing HOTS questions (Ani Rahmawati, Nur Lailatin Nisfah, 2019). In preparing HOTS questions, context is the most important thing to consider. Context is an utterance, in the form of a sentence, intended to convey the meaning of a sentence or utterance in a situation related to the event being discussed (Afrita & Darussyamsu, 2020; Ibrahim et al., 2020). Previous studies found that HOTS assessment was feasible for elementary school students (Herawati et al., 2014). Other research findings also state that HOTS-based questions are very important for measuring learning achievement (Prastikawati et al., 2021; Umami et al., 2021). However, there has been no in-depth study of Higher Order Thinking Skills-based instruments for assessing science learning outcomes. This study aims to develop an instrument for assessing science learning outcomes on Theme 8 based on Higher Order Thinking Skills. The instrument is an objective (multiple-choice) test. Its distinguishing characteristic is that the questions contain phenomena of everyday life so that students can analyze and think at a higher level.

METHOD
This type of research is development research. The development procedure follows the 4D model, which is divided into four stages: Define, Design, Develop, and Disseminate (Agung, 2014). This model is used because it is systematic and easy to understand. The subject of this research is the HOTS-based science learning outcome assessment instrument on Theme 8, Neighborhoods of Our Friends, covering the water cycle topic for the fifth grade of elementary school. As the research subject, the instrument is tested by experts in the field of science content. The techniques used in collecting data are observation, interviews, document recording, rating scales, and tests. Observations and interviews were conducted to gather information about the obstacles children face in participating in science learning. Document studies are used to provide stable and useful sources of evidence in a test. The documents studied were the mid-semester science exam (UTS IPA) scores of class V students. Questionnaires were used to determine practitioner responses and student responses. The test is used to determine students' understanding. The instrument used to collect data is a questionnaire. The questionnaire grid is presented in Table 1.
Summarizing factors affecting water quality
3.8.5 Analyzing the function of water for living things
3.8.6 Analyzing the relationship between human and human environment
3.8.7 Discovering the process of the water cycle
3.8.8 Proving the function of water as an important element of the environment
3.8.9 Summarizing the impact of the water cycle on life
3.8.10 Determining the factors that affect the water cycle
3.8.11 Analyzing the availability of clean water
3.8.12 Summarizing the factors that affect water quality
3.8.13 Choosing ways to maintain the availability of clean water
3.8.14 Analyzing the occurrence of the water cycle

The data are analyzed for validity, reliability, discriminating power, level of difficulty, quality of distractors, practitioner responses, and student responses. Instruments designed to collect data are first tested for validity. The research instrument was analyzed using the content validity test and the test item validity test. The content validity test was carried out using the Gregory formula. The empirical item validity was determined with the point-biserial correlation technique. A reliability test was also conducted to measure the consistency of the test.
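The item analyses named above can be illustrated with a minimal Python sketch. It is not the authors' actual computation: it assumes dichotomously scored items (1 = correct, 0 = incorrect), uses the population variance of total scores, and splits students into upper and lower halves for discriminating power. The function names and these conventions are illustrative assumptions, since the study does not state the exact formulas used for each step.

```python
from math import sqrt

# `scores` is a list of rows, one per student; each row holds 0/1 per item.

def difficulty_index(scores, j):
    """Proportion of students answering item j correctly (higher = easier)."""
    return sum(row[j] for row in scores) / len(scores)

def point_biserial(scores, j):
    """Point-biserial correlation between item j and the total score.

    The total score here includes item j itself (uncorrected version).
    """
    n = len(scores)
    totals = [sum(row) for row in scores]
    correct = [t for row, t in zip(scores, totals) if row[j] == 1]
    p = len(correct) / n
    mean_all = sum(totals) / n
    sd = sqrt(sum((t - mean_all) ** 2 for t in totals) / n)  # population SD
    if p in (0.0, 1.0) or sd == 0:
        return 0.0  # correlation undefined; this sketch reports 0
    mean_correct = sum(correct) / len(correct)
    return (mean_correct - mean_all) / sd * sqrt(p / (1 - p))

def kr20(scores):
    """Kuder-Richardson 20 reliability coefficient (assumes score variance > 0)."""
    n = len(scores)
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n  # population variance
    sum_pq = 0.0
    for j in range(k):
        p = difficulty_index(scores, j)
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / variance)

def discrimination_index(scores, j):
    """Difference in proportion correct between upper and lower halves.

    Assumption: students are ranked by total score and split into halves;
    other conventions (for example 27% groups) are also common.
    """
    ranked = sorted(scores, key=sum, reverse=True)
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]
    return (sum(r[j] for r in upper) - sum(r[j] for r in lower)) / half
```

Each function takes the same student-by-item score matrix, so a trial data set can be passed through all four analyses in turn. An item would typically be retained when its point-biserial value exceeds the critical value chosen by the researcher.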

RESULT AND DISCUSSION
The development procedure followed the 4D model, which is divided into four stages: Define, Design, Develop, and Disseminate. The procedure is presented as follows. The first stage is Define. At this stage, the activities carried out are observation, interviews, document studies, needs analysis, analysis of student characteristics, and task analysis; these activities are carried out to identify the difficulties that occur at school. The second stage is Design. At this stage, the grid and the initial design are prepared. The resulting grid is a science question grid. The multiple-choice (objective) science instrument is also prepared at this stage, along with the validation sheets and the printed instruments. The third stage is Develop. At this stage, the assessment instrument for the science learning outcomes of Theme 8 based on higher-order thinking skills is developed. After the instrument is developed, it is tested by experts. This stage is divided into expert appraisal and developmental testing. Expert appraisal (expert validation) is a stage that must be completed before the developed instrument is used or given to students. It uses a content validation analysis in the form of a validation sheet given to the experts, who provide assessments and input for improving and refining the resulting product. The results of the expert test are presented in Table 4.

Table 4
Expert I: relevant items 1-30; irrelevant items: -
Expert II: relevant items 2 and 4-30; irrelevant items 1 and 3
Based on Table 4, 28 of the statements developed are stated to be relevant and 2 statements are irrelevant. Based on the calculation results, the content validity index of the instrument is 0.93. When converted according to Arikanto's criteria, the science learning outcomes instrument tested is in the "Very High" criteria. The next stage, developmental testing, is carried out in several steps: practitioner response testing, student response testing involving individual trials, and finally small group trials. Practitioner response testing was carried out to obtain scores and feedback on the instrument that had been made. The instrument was then tried out in two stages. The first was the student response test, conducted as individual trials, which aimed to obtain students' initial responses to the contents of the instrument and to find and remove its most conspicuous errors. Based on the practitioner response analysis, the average practitioner response score was 3.69. This means that the HOTS-based science learning outcome assessment instrument is valid and falls in the "Very good" category. Based on the analysis of student responses, the overall average student response score is 3.58. This means that the HOTS-based science learning outcome assessment instrument is valid and falls in the "Very good" category.
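The content validity index of 0.93 reported from the expert test can be reproduced with a short sketch. The code below assumes the commonly cited two-rater form of the Gregory formula, D / (A + B + C + D), where D is the number of items both experts rate as relevant; the study names the formula but does not spell it out, so this form, the function name, and the variable names are illustrative assumptions. The expert ratings are taken from Table 4.

```python
def gregory_content_validity(expert1_relevant, expert2_relevant, n_items):
    """Content validity index from two experts' sets of relevant item numbers."""
    items = set(range(1, n_items + 1))
    a = len(items - expert1_relevant - expert2_relevant)  # both rate irrelevant
    b = len(expert1_relevant - expert2_relevant)          # only expert I rates relevant
    c = len(expert2_relevant - expert1_relevant)          # only expert II rates relevant
    d = len(expert1_relevant & expert2_relevant)          # both rate relevant
    return d / (a + b + c + d)

# Ratings as reported in Table 4: expert I rates all 30 items relevant,
# expert II rates items 1 and 3 irrelevant.
expert1 = set(range(1, 31))
expert2 = set(range(1, 31)) - {1, 3}

index = gregory_content_validity(expert1, expert2, 30)
print(round(index, 2))  # 0.93
```

With 28 items rated relevant by both experts out of 30, the index is 28/30, which rounds to the reported 0.93.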
After being assessed by practitioners and students, the instrument underwent a small group trial. In the trial of this instrument product, the test subjects were grade VI elementary school students in Cluster V Tangkuban Perahu, Melaya District, for the 2020/2021 academic year, who played an important role in each stage of the trial, both individual trials and small group trials. In the small group trial stage, the test subjects consist of three groups of 9 students each, with three students with high learning achievement, three students with moderate learning achievement, and three students with low learning achievement in each group, so the total number of small group trial subjects is 27 students. The results of the calculation of the validity of the instrument items from the small group trials are presented in Table 5.

Table 5
Valid: items 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, and 27 (count: 25)
Invalid: items 5, 23, and 28 (count: 3)

Based on Table 5, 27 of the questions developed are stated to be valid and 3 questions are invalid. Based on the calculation results, the item validity index is 0.39. Invalid questions indicate that the quality of the questions is not good. Thus, the 3 invalid questions are dropped and will not be used in the assessment instrument. Furthermore, the reliability test was carried out using the items that were declared valid. Based on the calculation results, the reliability coefficient of the 25 items is KR-20 = 0.933292. When converted according to Koyan's normative criteria for test reliability, it can be concluded that the instrument being tested has consistent answers, that is, "Very High" reliability. After the reliability test, a discriminating power test is carried out. The results of the discriminating power test are presented in Table 6.
Table 6
(category label not recovered): items 2, 3, 5, 6, 11, 12, and 20 (count: 8)
Not good: items 7, 9, 14, 15, 17, 21, 23, 24, and 25 (count: 9)

According to the criteria for discriminating power, 8 items meet the good criteria, 8 items meet the fairly good criteria, and 9 items fall into the poor category, for a total of 25 items. After the discriminating power test, the level of difficulty is analyzed. Of the 25 items, 1 question (4%) is in the difficult category, 19 questions (76%) are in the easy category, and 5 questions (20%) are in the moderate category. After the difficulty level analysis, the distractor quality test is carried out. The results show that the distractors function properly.

Based on the results of data analysis, it can be concluded that the instrument that assesses the science learning outcomes of Theme 8 based on Higher Order Thinking Skills is valid and feasible to be used as an assessment instrument for students. Several factors account for this. First, the instrument is valid and feasible to use because it has met the requirements of a good assessment instrument. Instruments will be of high quality if they follow the correct instrument development procedure (Gaol et al., 2017; Zuliani et al., 2017). A good assessment instrument must be valid, reliable, and practical. The general principles that must be met in assessment are that it is valid, educational, sustainable, meaningful, comprehensive, and competency-oriented (Hulukati & Rahmi, 2020; Novitasari & Wardani, 2020).
A good and appropriate instrument must have validity, reliability, and practicality. The instrument developed here has been tested and revised according to suggestions and input from experts, so that the resulting assessment instrument has been refined accordingly.
Second, the instrument that assesses the science learning outcomes of Theme 8 based on Higher Order Thinking Skills is valid and feasible to use because it can be used to obtain the desired information. An instrument is a measuring tool used to assess something when collecting data to obtain the desired information (Candra et al., 2018; Gaol et al., 2017). Assessment in learning must be carried out regularly to obtain information about student development during the learning process. Assessment in learning can be done through tests. A test aims to determine the level of understanding each student has achieved (Adjii, 2019; Yusup, 2018). To achieve this goal, each test item must be structured according to the learning objectives and require a high level of reasoning (HOTS). Teachers need instruments to obtain information about student development in learning (Arifin, 2017; Zuliani et al., 2017). In this research on the development of a HOTS-based science learning outcomes instrument, the validity and reliability of the instrument are in the very high criteria. Therefore, this instrument is appropriate and suitable for use as an evaluation tool or an accurate learning assessment of the cognitive aspects of students' science learning outcomes. These findings support previous research which states that a good test instrument can measure higher-order thinking skills in students (Umami et al., 2021). Other research findings also state that an assessment instrument that is valid and reliable is feasible to use and can be used to measure students' abilities (Arif, 2016; Solihah et al., 2020; Yusup, 2018).
It can be concluded that a valid instrument can be used to measure students' abilities. This research implies that the developed HOTS-based assessment instrument can be used by teachers to assess student performance on daily tests or mid-semester assessments. In addition, this instrument can also be used as a guide for developing other assessment instruments. An assessment instrument for learning outcomes can help teachers understand how to write good questions so that learning objectives can be achieved properly.

CONCLUSION
Based on the results of data analysis obtained from material experts, practitioner response tests, student response tests, and instrument trials, the instrument met the very good criteria. The instrument for science learning outcomes based on Higher Order Thinking Skills has met the criteria of being valid, reliable, and of good quality. It can be concluded that the instrument that assesses the science learning outcomes of Theme 8 based on Higher Order Thinking Skills is valid and feasible to be used as an assessment instrument for students.