Developing Classroom Assessment Tool using Learning Management System-based Computerized Adaptive Test in Vocational High Schools

Computers have taken on a large role in education, including testing and evaluation. Traditional tests that are not comprehensive and do not account for differences in students' initial abilities yield measurements that do not represent their true abilities. This study aims to develop a classroom assessment tool delivered as an LMS-based computerized adaptive test and to test its feasibility. This is a research and development study. The respondents were experts who validated the instruments and vocational high school (SMK) students in the Electrical Power Installation Engineering expertise competency. The data were analyzed using item response theory, classical test theory, and descriptive statistics. Item analysis using the Rasch model showed that 10 items did not fit and the remaining 50 items fit. Under classical test theory, 0 test items have less validity, 55 have moderate validity, and 5 have high validity, with an Alpha reliability of 0.934. The attitude questionnaire consists of 8 items: 0 with less validity, 6 moderate, and 2 high, with an Alpha reliability of 0.731. The observation guide contains 16 items: 0 with less validity, 15 moderate, and 1 high. The use of a CAT-assisted learning management system for the research tools greatly helps teachers conduct research accurately and practically.


INTRODUCTION
The fourth industrial revolution (Industry 4.0) affects all aspects of human life and has become a critical issue in education (Devi et al., 2020). In the modern view, education is a means for students to face the challenges of the modern era (Hadi et al., 2018). Meeting the challenges of the fourth industrial revolution requires appropriate education in order to build a creative, innovative, and competitive generation (Lase, 2019). In Industry 4.0, five central technologies support the education process and represent a significant development that is the main characteristic of Education 4.0 (Hussin, 2018). These five central technologies are computing hardware, production hardware, software, interfaces, and connections (Abidin et al., 2019). The continuous development of this revolutionary paradigm was initiated by advances in science and technology as a form of innovation (Liao et al., 2018). Moreover, every level of education uses technology in the learning process, including Vocational High Schools (SMK) (Yunis & Telaumbanua, 2017). A Vocational High School that does not use technology in teaching and learning will not be able to compete with other Vocational High Schools or other high schools that have adapted to Industry 4.0 technology (Batubara, 2017). In the fourth industrial revolution era, education and technology cannot be separated, especially in Vocational Schools (Hardyanto & Surjono, 2016). In addition, developments in this era are carried out consciously and systematically, in accordance with Indonesia's future education goals, and have good impacts (Abbasi et al., 2018).
Developing learning programs in Vocational Schools requires at least four of the five central technologies of the fourth industrial revolution era. The fifth, production hardware, cannot be applied in every Vocational High School because it depends on the expertise program (Putra et al., 2020). The four central technologies that can be used in Vocational High Schools are computing hardware, interfaces, software, and connections, and using them helps students achieve the learning goals. These four technologies can be accessed using gadgets, laptops, and Wi-Fi provided as school facilities. The change can be seen in how conventional technology turns into smart technology (Haryanto, 2013). One characteristic of the fourth industrial revolution is the implementation of artificial intelligence, and Industry 4.0 can be applied in vocational high schools by implementing a Learning Management System (LMS) (Codish et al., 2019). An LMS is internet-based software designed to manage an e-learning program (Hu, X., Ng et al., 2020). An LMS covers the management of learning content, the management of the learning process, evaluation through an online examination system, and administration supported by online chat and discussion forums (Jana & Khatun, 2021). One outstanding LMS is Moodle (Modular Object-Oriented Dynamic Learning Environment).
Moodle is an open source Course Management System (CMS), also known as a Learning Management System (LMS) or Virtual Learning Environment (VLE), that supports the development of technology-based learning in the form of a website (Chaw & Tang, 2018). Because Moodle is open source software that can be used for free, its number of users keeps growing: Moodle is reported to be used on 104,361 sites, with 21,000,000 courses or subjects and 179,000,000 users in 232 countries (Copriady et al., 2020). The number of Moodle users tends to increase along with improvements to the software's features and unlimited learning demands. Moodle has several advantages: complete communication features (chat, messaging, and discussion forums), creation and administration of learning materials, tracking of students' learning outcomes (data tracking), and functionality that can be extended through plugins (extensibility). Moodle is also easy to use, since the application can be arranged flexibly according to an institution's needs and learning policies. Currently, many in Indonesia still use classical test theory to analyze tests and estimate examinees' abilities (Fajrianthi et al., 2016), but with classical test theory the measurement results can be far from the actual ability (Abidin et al., 2019; Asriadi & Istiyono, 2020). Classical test theory has many shortcomings, one of which is that the characteristics of the questions depend on the examinees (Istiyono et al., 2018; Wise et al., 2015). For example, a question appears easy when it is answered by a more able participant and difficult when it is answered by a less able participant. Likewise, the score that reflects a participant's ability depends on the questions administered. If this happens, we will never know the true ability level of the examinees.
The solution to this problem is to apply item response theory (IRT) (Van der Linden, 2018), because it estimates both the participants' ability level and the item parameters accurately.
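To illustrate how IRT separates item difficulty from examinee ability, the following minimal sketch evaluates the Rasch (one-parameter logistic) model, in which the probability of a correct answer depends only on the difference between ability and difficulty on a common logit scale. The function name and all numeric values are illustrative assumptions, not estimates from this study.

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch (1PL) model.

    theta: examinee ability in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical item difficulty and abilities, chosen only for illustration.
item_difficulty = 0.5
for ability in (-1.0, 0.5, 2.0):
    p = rasch_probability(ability, item_difficulty)
    print(f"ability={ability:+.1f}  P(correct)={p:.3f}")
```

The item difficulty stays fixed while the probability of success changes with the examinee's ability, which is the property that classical test theory lacks.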
The Industry 4.0 era offers a way to change assessment strategy by using the internet (Febliza & Okatariani, 2020). One advantage of an internet-based assessment application is its ability to present a test whose difficulty level matches students' ability. Adaptive testing can therefore address a weakness of educational assessment; such a test is known as an adaptive test or tailored test, that is, a test designed around the students' ability (Aybek & Demirtasli, 2017). An adaptive test selects questions (items) based on the participant's earlier answers (responses) (Kezer, 2021). It can determine participants' ability accurately even though each participant answers different questions and takes a different amount of time (A. Sahin et al., 2018). As a computer-based test, an adaptive test can show the participant's final score immediately after the test is completed, and as a network-based test, it can present all results to the teacher or system administrator (Şahin & Gelbal, 2020). Moreover, the Adaptive Quiz module installed in Moodle can be used to perform adaptive assessments (D. I. Sahin & Shelley, 2020). However, an LMS has not yet been used in Vocational High Schools. Therefore, it is necessary to use a Learning Management System (LMS)-based Computerized Adaptive Test (CAT) in Vocational High Schools.
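The adaptive selection described above can be sketched as a short loop. This is an illustrative sketch under stated assumptions, not the algorithm of the Moodle Adaptive Quiz module or of the tool developed in this study: it assumes items calibrated with the Rasch model, picks the unused item whose difficulty is closest to the current ability estimate, and re-estimates ability with an expected a posteriori (EAP) estimate under a standard normal prior. All function names and the five difficulty values are hypothetical.

```python
import math

def rasch_probability(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Ability grid from -4 to +4 logits in steps of 0.1 (an assumption).
GRID = [-4.0 + 0.1 * k for k in range(81)]

def estimate_ability(responses, difficulties):
    """EAP ability estimate with a standard normal prior.

    responses: list of (item_index, score) pairs, score is 0 or 1.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for item, score in responses:
            p = rasch_probability(theta, difficulties[item])
            w *= p if score == 1 else 1.0 - p
        weights.append(w)
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)

def next_item(theta, difficulties, administered):
    """Choose the unused item whose difficulty is closest to theta."""
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    return min(candidates, key=lambda i: abs(difficulties[i] - theta))

def run_cat(difficulties, answer, max_items=5):
    """Administer up to max_items items; answer(item) returns 0 or 1."""
    responses, administered, theta = [], set(), 0.0
    for _ in range(min(max_items, len(difficulties))):
        item = next_item(theta, difficulties, administered)
        administered.add(item)
        responses.append((item, answer(item)))
        theta = estimate_ability(responses, difficulties)
    return theta, [item for item, _ in responses]

# Hypothetical item bank and an examinee who answers everything correctly.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta, order = run_cat(difficulties, answer=lambda item: 1)
print(order, round(theta, 2))
```

The EAP estimate is used here instead of maximum likelihood because it stays finite when every answer so far is correct or incorrect, which is common early in an adaptive test.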
Relevant research includes the work of Abidin et al. (2019), who developed a CAT to measure the higher-order thinking skills (HOTS) of 10th-grade high school students in physics. Further research developed a CAT to measure critical physics skills. Other researchers developed a CAT that uses fuzzy logic algorithms to describe students' abilities and inference system models to choose the right test items for students (Haryanto, 2013). Building on this previous research, this study aims to develop a computer-based adaptive classroom assessment tool, a Computerized Adaptive Test (CAT) based on a Learning Management System, for the Electrical Engineering Skills Program in Vocational High Schools, and to test the feasibility of this tool.

METHODS
This study used the Research and Development (R&D) method with a scale construction process as the development model (Gustiani, 2019). First, the LMS application was trialed in Vocational High Schools, and the data were analyzed and reported. Second, the complete LMS application was implemented, and the data were again analyzed and reported. The whole research process is presented in Figure 1. The research involved six populations. The first was the expertise program of the Vocational High Schools; Electrical Engineering was selected by purposive sampling in order to focus the development of the assessment tool. The second was the grade level in Vocational High Schools, from the tenth to the twelfth grade; the level was selected by purposive sampling on the condition that the students already had adequate competence in the expertise program. The third was the course for which the assessment was developed, which in this research is Electrical Installation. The fourth was the vocational high schools where the research was conducted; the schools were determined randomly, and the selected schools were SMK Muhammadiyah 3 Yogyakarta and SMK Negeri 3 Yogyakarta. The fifth was the students, who were also selected randomly: 17 students from SMK Muhammadiyah 3 Yogyakarta and 26 students from SMK Negeri 3 Yogyakarta. The sixth was the teachers, who were selected purposively, 3 from each school.
The instruments used to collect the data were: 1. a multiple-choice test to assess students' knowledge of Electrical Installation in class XI 2 of the Electrical Installation Engineering Expertise Program; 2. a rating scale for assessing students' attitude; 3. observation guidelines for assessing students' skills; and 4. a Likert scale for assessing the appropriateness of the instruments in the LMS-based Computerized Adaptive Test. The blueprint (lattice) of the multiple-choice test is presented in Attachment 1. The assessment of students' attitude is designed based on IP-21CSS (Indonesian Partnership for 21 Century Skill Standard), which consists of the 4Cs: Creativity, Critical Thinking, Communication, and Collaboration.

Ensuring the Construction and Context of the Instrument
The classroom assessment tools developed in this study are: a) an Electrical Installation test for twelfth-grade students of the Electrical Installation Engineering Expertise Program; b) a questionnaire for assessing students' attitude; and c) observation guidelines for assessing students' practical skills. The assessment tools and the practical assessment tools are presented in the following section. The construct of the classroom assessment tool was chosen based on how easily it can be implemented in a computer program. The test questions are multiple choice with 5 alternative answers. The questionnaire uses case assessments with 4 alternative answers to assess students' attitude, while the observation guideline consists of short entries scored from 1 to 4.

Selecting the Format of the Instrument
The test on Electrical Installation for twelfth-grade students of the Electrical Installation Engineering Expertise Program uses a multiple-choice format. This format was chosen because it is easier to implement in a CAT and each item has a definite answer. The participant chooses, or clicks, the alternative considered the best or most correct answer. The computer scores the participant's answer directly by comparing it with the answer key. The questionnaire for assessing students' attitude uses a rating scale format based on case studies. The context relates to the participants' daily life at school and in the community. Participants answer by choosing one of the available alternatives, each of which carries a different weight from 1 to 4. This model was chosen because it has been implemented in CAT or CBT, where the participant's score depends on the alternatives the student clicks. The observation guideline for assessing students' practical skills uses a format in which the teacher assesses the subcomponents of students' skills. The observation guideline is complemented by an assessment grid, so it contains skill subcomponents that the teacher can assess based on students' abilities, as presented in the rubric. The observation guideline is also easy to implement in CAT or CBT because each item is already in numerical form.

Data Collection
The data in this study were obtained from expert validation and students' responses. Before being tested on real users, the instruments were validated by 6 experts. The validators rated each item of the instruments on a scale from 1 to 4, where 1 = very invalid, 2 = invalid, 3 = valid, and 4 = very valid. After validation, the instruments were given to the users. The questionnaire for assessing students' attitude was administered online, while the observation guidelines were not tested because the research was conducted during the Covid-19 pandemic.
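Expert ratings of this kind are summarized in the results with Aiken's formula, V = sum(r - lo) / (n(c - 1)), where r is a rating, lo is the lowest category, c is the number of categories, and n is the number of raters. The sketch below applies the formula to six raters on the 1 to 4 scale. The ratings are invented for illustration and are not the study's data. With six raters and four categories, V moves in steps of 1/18, which is consistent with the reported extremes of 0.444 (8/18) and 0.833 (15/18).

```python
def aiken_v(ratings, lowest=1, categories=4):
    """Aiken's V for one item rated by several experts.

    ratings: one rating per expert, each between `lowest` and
    `lowest + categories - 1`.
    """
    n = len(ratings)
    s = sum(r - lowest for r in ratings)
    return s / (n * (categories - 1))

# Hypothetical ratings from six validators for two items.
print(round(aiken_v([3, 3, 3, 4, 4, 4]), 3))
print(round(aiken_v([2, 2, 2, 3, 3, 2]), 3))
```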

The Test on Characteristics and Psychometric Quality

Validity and Reliability

Validity and Reliability of the Test
The validity analysis of the 60 test items using Aiken's formula gave a V index with minimum = 0.444, maximum = 0.833, mean = 0.661, and standard deviation = 0.096. When validity is classified as less, moderate, or high, 0 items (0%) have less validity, 55 items (91.67%) have moderate validity, and 5 items (8.33%) have high validity. The expert validation therefore indicates that all items are valid. The reliability calculated with Cronbach's Alpha on 43 test participants is 0.934. Thus, it can be concluded that the questions are valid and reliable, as presented in Figure 2.
Item quality was judged from the item-total biserial correlation: a coefficient less than 0.3 is categorized as bad, and a coefficient of 0.3 or greater is categorized as good. Of the 60 items, 9 items (15%) are bad and 51 items (85%) are good, as presented in Figure 3. Of the bad items, 8 have a small positive correlation and can still be used. An item with a small negative correlation needs to be corrected or discarded.
For the questionnaire, item quality was judged from the item-total product moment correlation with the same criteria. Of the 8 items, 1 item (12.50%) is bad and 7 items (87.50%) are good, as presented in Figure 6. The bad item has a small positive correlation, so it can still be used because it does not cause negative judgments.
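The Cronbach's Alpha reliability reported above follows the standard formula alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below computes it for a small response matrix. The matrix is hypothetical (5 respondents, 4 dichotomous items), and the function name is an assumption made for illustration.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's Alpha for a respondents-by-items score matrix.

    scores: list of rows, one row per respondent, one column per item.
    Sample variances are used for both items and totals.
    """
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical 0/1 responses, not data from this study.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(responses), 3))
```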

Validity of the Observation Guideline
Aiken's analysis of the validity of the 16 items in the observation guideline gave a V index with minimum = 0.5, maximum = 0.833, mean = 0.694, and standard deviation = 0.097. When validity is grouped into less, moderate, and high, 0 items (0%) have less validity, 15 items (93.75%) have moderate validity, and 1 item (6.25%) has high validity, as presented in Figure 7. Because the observation guideline has not been tested to assess students' practical skills, its reliability has not been established. Given the number of valid and good items in the test and the questionnaire and the valid items in the observation guideline, it can be concluded that the items of the test, questionnaire, and observation guideline can be used in the Learning Management System-based Computerized Adaptive Test (CAT).

The Item-Total Correlation
This analysis aims to identify good and bad question items or statements used in the data collection process. At this stage, the item-total correlation is analyzed for two instruments, namely the test and the questionnaire. The item-total biserial correlation of the test is shown in Table 1. The biserial correlation of the test items has minimum = -0.0004, maximum = 0.936, mean = 0.542, and standard deviation = 0.218. With the criteria that a coefficient below 0.300 is bad and a coefficient of 0.300 or above is good, 9 of the 60 items (15%) are bad and 51 items (85%) are good. The item-total correlation of the questionnaire is shown in Table 2. The product moment item-total correlation of the questionnaire items has minimum = 0.282, maximum = 0.649, mean = 0.512, and standard deviation = 0.224. With the same criteria, 1 of the 8 items (12.50%) is bad and 7 items (87.50%) are good. The difficulty level of the test items is shown in Table 3. The IRT analysis using the Rasch model on the participants' responses to the multiple-choice test gave difficulty values with minimum = -2.837, maximum = 1.858, mean = 0.283, and standard deviation = 0.766. Of the 60 items, 10 do not fit the Rasch model, while 50 fit.
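The good/bad classification used above can be sketched as follows. The study does not state the exact formula or software it used, so this sketch makes an assumption: it uses the Pearson product moment correlation between each item and the total score (which for 0/1 items is the point-biserial coefficient) as a stand-in for the reported coefficients, with the 0.3 cut-off from the text. The response matrix and function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson product moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def classify_items(scores, threshold=0.3):
    """Label each item 'good' or 'bad' by its item-total correlation.

    scores: respondents-by-items matrix; every item and the total
    score must vary across respondents.
    """
    totals = [sum(row) for row in scores]
    labels = []
    for j in range(len(scores[0])):
        r = pearson([row[j] for row in scores], totals)
        labels.append((j, r, "good" if r >= threshold else "bad"))
    return labels

# Hypothetical 0/1 responses (5 respondents, 3 items), not study data.
responses = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
]
for item, r, label in classify_items(responses):
    print(f"item {item}: r = {r:.3f} ({label})")
```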

Discussion
The adaptive classroom assessment tool based on a learning management system is a set of three types of assessment: assessment of the knowledge aspect in the form of a multiple-choice test; assessment of the attitude aspect in the form of a graded scale (rating scale) with case assessment; and assessment of the skill (psychomotor) aspect in the form of an observation guide. The assessment tool is used to assess learning activities in the electrical engineering expertise program in SMK. It is built on a learning management system by applying the CAT system. Assessment no longer has to be done manually but can be carried out in a digital system (Boonprasom & Sintanakul, 2020). A learning management system is a platform that offers a variety of integrated tools for delivering and managing online instruction via computer devices (Fearnley & Amora, 2020). The implementation of an LMS makes it easy to provide the right digital learning environment for a large number of distance lessons (Noreen, 2020). Assessment with a learning management system serves as the right platform for assessing students' self-learning skills, so that these skills become more useful and relevant in today's world of work (Biney, 2020). Adaptive classroom assessment tools based on a learning management system with the CAT system give teachers the opportunity to assess student competencies comprehensively and easily (Rizal et al., 2020). A standardized learning management system supports an inclusive learning environment for academic progress with an intermediary structure that promotes online collaborative grouping, professional training, discussion, and communication (Bradley, 2020). A learning management system-based assessment approach can produce objective scores in a practical way (Ngafeeson & Gautam, 2021).
Thus, this assessment tool can be used by teachers or schools to map students' abilities more honestly and transparently. The main emphasis of this assessment tool is the 4C components, namely Creativity, Critical Thinking, Communication, and Collaboration. These components are the focus of the competencies that must be achieved in 21st century learning (Asriadi & Istiyono, 2020).
Based on the feasibility analysis, all developed instruments (test, questionnaire, and observation sheet) met the valid and reliable criteria. An item or instrument can be categorized by its Aiken V index: an index less than or equal to 0.4 is less valid, an index from 0.4 to 0.8 is moderately valid, and an index greater than 0.8 is very valid (Retnawati, 2017). Based on the data analysis, the test instrument has 55 items in the valid category and 5 items in the very valid category. The questionnaire has 6 statement items in the valid category and 2 in the very valid category. The observation sheet has 15 items in the valid category and 1 item in the very valid category. Cronbach's Alpha reliability is 0.934 for the test, which is in the very reliable category, and 0.731 for the questionnaire, which is in the reliable category. This is supported by a previous study that used validity and reliability to ensure that the measurement made by the developed assessment tool is appropriate and that its implementation in the learning program runs well (Rudibyani et al., 2020). Thus, the concepts of validity and reliability are needed so that the required information can be obtained accurately and precisely.
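The Aiken V categories cited above can be written as a small classification rule. The function name is an assumption made for illustration; the cut-offs (0.4 and 0.8) are the ones given in the text, and the values passed in are the minimum, mean, and maximum V reported for the 60 test items.

```python
def validity_category(v):
    """Map an Aiken V index to the category used in the text."""
    if v <= 0.4:
        return "less valid"
    if v <= 0.8:
        return "moderately valid"
    return "very valid"

# Minimum, mean, and maximum V reported for the test items.
for v in (0.444, 0.661, 0.833):
    print(v, validity_category(v))
```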
In addition to meeting the valid and reliable criteria, a good instrument must also show agreement between the function of each item and the overall function of the instrument, so a biserial correlation analysis is necessary. Instrument items meet the good criteria if they have a biserial correlation coefficient greater than 0.3 (Dewanti et al., 2020). Based on the biserial correlation analysis, 51 test items were in the good category and 9 were in the bad category. The questionnaire had 7 good statements and 1 bad statement. For the test instrument in particular, it is necessary to know the difficulty index of each item, because the difficulty index of an item does not change even if the questions are answered by different respondents. Therefore, an IRT analysis with the Rasch model was carried out on the responses to the multiple-choice questions to determine the difficulty level of the items. A test item is said to be fit/good if it has a difficulty level greater than 0.5 (Dewanti et al., 2021). Based on the Rasch model analysis, 50 items fit and 10 items did not fit.
In addition, the use of a CAT-assisted learning management system for the research tools greatly helps teachers conduct research accurately and practically. This agrees with other research finding that learning with a learning management system leads to a better and more professional understanding (Liu & Geertshuis, 2021). In general, therefore, this assessment tool is suitable for use with different respondents at the same level. The novelty of this research lies in its output, namely an assessment tool oriented to the knowledge, attitude, and skill aspects. This study focuses on the development of an assessment system based on a learning management system with the CAT system. Given current technological development, digital devices should be used in assessing the learning process.

CONCLUSION
The construct of the class assessment tool in this study was chosen because it is easy to implement in a computer program. The questions are multiple choice with five alternative answers. The attitude questionnaire takes the form of a case assessment with 4 alternative answers, while the observation guidelines take the form of short entries scored from 1 to 4. All assessment tools met the valid and reliable criteria based on the validity analysis and reliability tests, so the class assessment tool is suitable for teachers to use as a reference when evaluating the achievement of indicators at the end of learning against predetermined learning objectives. This suitability is seen in the aspects of readability, construction, and content. The limitation of this study is that the material in the test focuses on electrical system installation engineering. Future researchers are recommended to develop assessment tools in various formats, for example test instruments in essay format with more diverse material. Such tools can also be applied to different materials, which is expected to increase knowledge and variety and their impact on the achievement of learning objectives.

REFERENCES
Abbasi, S., Moeini, M., Shahriari, M., Ebrahimi, M., & Khoozani, E. K. (2018). Designing and manufacturing of educational multimedia software for preventing coronary artery disease and its effects on modifying the risk factors in patients with coronary artery disease.