ID | Where | Summary | Action | Status |
---|---|---|---|---|
NCR 167 | DPU software | magnifier tracking | investigation | OPEN |
NCR 171 | DPU software | eng4 16-chunk | reversed | closed |
NCR 174 | DEM hardware | KAL not working | reload code | effectively closed |
NCR 183 | DPU software | track not turned off | fixed | closed |
NCR 184 | DPU software | guide star selection | investigation | OPEN |
NCR 185 | ICU software | 60h in offset field | fixed | closed |
NCR 186 | DPU software | fast mode | investigation | OPEN |
NCR 187 | ICU software | memory dumps | fixed | closed |
NCR 188 | ICU software | memory dumps | fixed | closed |
NCR 189 | DPU/ground | single ev upsets | MOC problem | effectively closed |
NCR 190 | OM optics | stray light | investigated | effectively closed |
NCR 191 | ICU software | uninitialized var | fixed | closed |
NCR 192 | OM optics | broad PSF | focus heaters | closed |
NCR 193 | DPU software | memory check | fixed | closed |
NCR 194 | ICU software | safe mode | fixed | closed |
NCR 195 | ICU software | unsuccessful exec | fixed | closed |
NCR 196 | ICU software | hv ramp-up | fixed | closed |
NCR 197 | DEM software | dpu not get cgs | nothing | closed |
NCR 198 | DPU software | scrubbing reported late | fixed | closed |
NCR 199 | ICU software | ICU hang | investigation | OPEN |
NCR 200 | DPU software | Spont DPU reset | investigation | OPEN |
NCR 201 | DPU software | 16 bit wraparound | investigation | OPEN |
NCR 202 | DPU software | Missing alerts | fixed | closed |
NCR 203 | DPU software | FAQ word order | fixed | closed |
NCR 204 | DPU software | Unexpected A5AD alerts | investigated | closed |
NCR 205 | DPU software | Alerts from CGS | investigation | closed |
NCR 206 | DPU software | FAQ failed | investigation | to be closed soon |
NCR 207 | OM hardware? | HV failure | investigation | closed |
NCR 208 | DPU software | fast mode pointer | investigation | closed |
NCR 209 | DPU software | offsets remembered | investigation | closed |
NCR 210 | DPU software | a580 seen in cgs | investigation | closed |
NCR 211 | ICU software | DPU package exception | investigation | OPEN |
NCR 212 | DPU software | DP_FAQ ordering | investigation | to be closed soon |
NCR 213 | OM hardware | Cathode anomaly | investigation | closed |
NCR 214 | DPU software | DPU spontaneous reset | investigation | OPEN |
NCR 215 | DPU software | Eng 3 corruption | investigation | OPEN |
ECR 86 | ICU software | mode changes | fixed | closed |
ECR 87 | ICU software | exceptions->anomalies | fixed | closed |
ECR 88 | ICU software | safe on filter loss | fixed | closed |
ECR 89 | ICU software | tm timeout change | fixed | closed |
ECR 90 | DPU software | Eng BEG/ENDOF_EXP | investigation | closed |
ECR 91 | DPU software | Whole frame | investigation | closed |
ECR 92 | ICU software | fw offset of UV grism | investigation | OPEN |
Pre-launch NCRs and ECRs are available at:
http://xmmom.mssl.ucl.ac.uk/docs/xmm-om-ncrs/ncr.ps
http://xmmom.mssl.ucl.ac.uk/docs/xmm-om-ecrs/ecr.ps

NCR 187
-------
XMM-NC-ESO-0103

A dump from OM MID 37 was commanded at 2000.019.13.12.24. Based on the contents of the dump packets 2037.OM6, the following has been found:

Packet # | Start address | Delta address from last packet (dec) |
---|---|---|
1 | 00E5 0000 | --- |
2 | 00E5 00A6 | 166 |
3 | 00E5 014C | 166 |
4 | 00E5 01F2 | 166 |
5 | 00E5 0299 | 167 |
6 | 00E5 033F | 166 |
7 | 00E5 03E5 | 166 |
8 | 00E5 048A | 165 |
9 | 00E5 0530 | 166 |
10 | 00E5 05D6 | 166 |
11 | 00E5 067D | 167 |
12 | 00E5 0723 | 166 |
13 | 00E5 07C9 | 166 |
14 | 00E5 086E | 165 |
15 | 00E5 0914 | 166 |

As can be seen, the start address for the 5th packet is one larger than expected. It then corrects itself in the 8th packet. This behaviour continues through the rest of the dump.

In our comparison task we saw the dump starting off OK, then slipping one word, and then synching back up again.
Kate

Telecommand:
H4125 memory dump 2000.019.13.12.21.324
mid = MID 37 hex
H0500 = e50000 hex
H0510 = 2990 hex

VxWorks command:
tc_dump_mem(0x25, 0xe50000, 0x2990)

Reproduced at MSSL. Fix known. Same as NCR 188. Edit icu/fm/oper/memdpu.adb. Will be implemented for OM flight code release 10.
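
The address slip described in NCR 187 can be checked mechanically. The following is a minimal sketch, assuming each start address is one value written as two 16-bit hex words and that 166 words is the nominal step between packets. The names `parse_address` and `find_slips` are illustrative and are not part of any OM ground tool.

```python
# Start addresses of the first 15 dump packets, as listed in NCR 187.
START_ADDRESSES = [
    "00E5 0000", "00E5 00A6", "00E5 014C", "00E5 01F2", "00E5 0299",
    "00E5 033F", "00E5 03E5", "00E5 048A", "00E5 0530", "00E5 05D6",
    "00E5 067D", "00E5 0723", "00E5 07C9", "00E5 086E", "00E5 0914",
]

NOMINAL_STEP = 166  # assumed nominal number of words per packet


def parse_address(text):
    """Turn 'HHHH LLLL' (two 16-bit hex words) into one integer."""
    return int(text.replace(" ", ""), 16)


def find_slips(addresses, nominal=NOMINAL_STEP):
    """Return (packet number, delta, running offset) for each off-nominal step."""
    values = [parse_address(a) for a in addresses]
    slips = []
    offset = 0
    for i in range(1, len(values)):
        delta = values[i] - values[i - 1]
        offset += delta - nominal
        if delta != nominal:
            slips.append((i + 1, delta, offset))
    return slips


slips = find_slips(START_ADDRESSES)
for packet, delta, offset in slips:
    print(f"packet {packet}: delta {delta}, cumulative offset {offset:+d}")
```

With the listed addresses this reports off-nominal steps at packets 5, 8, 11 and 14, with the cumulative offset alternating between +1 and 0, which matches the "slipping one word, then synching back up" description.
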

NCR 188
-------
XMM-NC-ESO-0103 "no dump TM for fix DPU status"

Command to dump DPU status not responding. Telecommand HL113 Memd White LOCAL did not produce any packet dump. 1M packet 94213 Memd White LOCAL (OM6 2019) is expected.

Telecommand:
H4113 memory dump 2000.005.17.32.46.853
mid=13 hex
h0500=23278 dec
h0510=1

VxWorks command:
tc_dump_mem(0x13, 0x5aee, 0x1)

Reproduced at MSSL. Fix known. Same as NCR 187. Edit icu/fm/oper/memdpu.adb. Will be implemented for OM flight code release 10.

NCR 189
-------
XMM-NC-ESO-0105 "OM DPU Code corrupted by non-recoverable doublebit error"

DPU crashes occur due to higher than expected Single Event Upsets on board and require a manual reload of DPU code from the MOC.

If this (multi-bit errors) turns out to be a problem that occurs too frequently I can make some software modifications to implement a voting scheme based on multiple copies of the executable stored in the DPU. It could be done in such a way that it would *NOT* affect upload times (loading the code from the ground into the DPU).
Jim

The architecture of the RAMs used in the DPU has eight 8bit X 32K RAM chips on a single ceramic substrate (for a total of 8bits X 256K). The dies themselves are organized into multiple pages of 8bit words. So it is possible that an energetic particle could impact a single die, let's say between two adjacent RAM cells, and cause both of them to flip. This could be seen as like bits (ex. 2^3) in sequential addresses (ex. e01662 and e01663) being flipped. Another possible scenario is the 2^0 and 2^7 bits on widely separated addresses.
Jim

We will wait and see how many of these we get.
Phil
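
To illustrate the kind of voting scheme mentioned in NCR 189, here is a minimal sketch of a bitwise 2-of-3 majority vote. The choice of three copies, the byte values and the function name `majority_vote` are assumptions made for illustration; the NCR only speaks of "multiple copies" and does not describe an implementation.

```python
def majority_vote(copy_a, copy_b, copy_c):
    """Bitwise 2-of-3 majority vote over three equally long byte strings."""
    if not (len(copy_a) == len(copy_b) == len(copy_c)):
        raise ValueError("all copies must have the same length")
    # A result bit is 1 when at least two of the three input bits are 1.
    return bytes(
        (a & b) | (a & c) | (b & c)
        for a, b, c in zip(copy_a, copy_b, copy_c)
    )


good = bytes([0x12, 0x34, 0x56, 0x78])  # made-up "executable" contents

# Copy 1: bit 2^3 flipped in two sequential bytes (first scenario in the NCR).
hit_1 = bytearray(good)
hit_1[1] ^= 0x08
hit_1[2] ^= 0x08

# Copy 3: bits 2^0 and 2^7 flipped at a different address.
hit_3 = bytearray(good)
hit_3[3] ^= 0x81

repaired = majority_vote(bytes(hit_1), good, bytes(hit_3))
print(repaired == good)  # True
```

The vote recovers the original bytes as long as no bit position is corrupted in more than one copy at the same address.
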

NCR 190
-------
ICU stray light

Several OM images acquired showed a low emission structure roughly three times the background level. The increased background has the shape of loops or elongated streaks. Initial analysis suggests that the increased background is caused by a chamfer in the detector holding structure.

NCR 191
-------
Helpdesk Ref E351 Vega ID 480

The variable SYNCHRONISING in time_man.adb is not initialized. This could cause an unpredictable initial value in the housekeeping though it does not on the real hardware. This is seen in (and causes a problem with) the simulator at VILSPA.

Fix known. Edit icu/fm/oper/time_man.adb. Will be implemented for OM flight code release 10.

NCR 192
-------
Instrument PSF

The instrument PSF is broader than expected. The magnifier PSF exhibits a donut shape, which suggests that the images are out of focus. Besides the defocus seen in the lenticular filters, an additional 'defocus' component may be contained in the magnifier PSF.

Focus heaters will be set to the following when the filter is chosen.

1200 -- Blocked - Filter 0 +100%
1400 -- V - Filter 1 +100%
1600 -- Magnifier - Filter 2 -100%
1800 -- U (no bar) - Filter 3 +100%
2000 -- B - Filter 4 +100%
0000 -- White - Filter 5 +100%
0200 -- Grism 2 (Visible) - Filter 6 +100%
0400 -- UVW1 - Filter 7 +100%
0600 -- UVM2 - Filter 8 +100%
0800 -- UVW2 - Filter 9 +100%
1000 -- Grism 1 (UV) - Filter 10 +100%
2100 -- Bar - Filter 11 +100%

NCR 193
-------
Problem: The check of memory access for writing a single small word integer would have generated its error message with a hard-coded number, 0xa555. The code should report with the symbolically defined name DA_GADE_WSI. Since this logic path is rarely passed through, the problem was found only accidentally, through inspection.

Action: Implement the correction.

Test:
1. Write special code to generate this logic path.
2. Load and execute the special DPU codes.
3. Confirm the correct generation of the error message.

Results: Exception message is correctly generated.

Affected codes: include/global_access.c: v. 1.16.

NCR 194
-------
Goto Safe Mode can lose filter wheel position.

If the DPU is sending alerts or data during the filter wheel movement of a goto safe command, this traffic on the SSI can interrupt the filter wheel on its way to blocked.

Fix known. Edit icu/fm/oper/modeman.adb. Will be implemented for OM flight code release 10.

NCR 195
-------
The expected TM packet, Unsuccessful Command Execution Type 3.4 TPN 91404 Error Code 134, which should have arrived when the FW was commanded and not Datumed, was generated by OM but arrived in a TM packet of Type 3.2, Unsuccessful Command Acceptance. These TM packets are set up in the database, as defined in the TC and TM Specification - User Manual XMM-OM/MSSL/ML/0010.4 section 3.3.3, as type 3.4s. The XMCS was not able to recognise the packet when it arrived as a type 3.2, as the packet is not defined in the database. As a result, no automatic action was taken by the XMCS and commanding to OM was not stopped.

Fix known: Missing if statement at the end of icu/fm/oper/tc_verify.adb. Will be implemented for OM flight code release 10.

NCR 196
-------
HV ramp-up failed

1. First we send a hv ramp param tc and it works.
2. Then we send hv ramp tc
3. and it fails with an unknown tm packet.
4. The same tc is repeated and it works.

1. ****TELEMETRY**** 2000 53d 16:49:18.265
   detector event mcp23 OK
2. ****TELECOMMAND**** 2000 53d 16:52:09.562
   H7140 set hv ramp para mcp1 500 19 0 0 OFF
3. ****TELECOMMAND**** 2000 53d 16:52:49.785
   H5140 start hv ramp
4. ****TELEMETRY**** 2000 53d 16:52:51.651
   XREF.XXX <------------------bad telemetry packet
   8c00 edb4 000d 0332 0061 ff0e 2f93 0a17 8400 a115
5. ****TELECOMMAND**** 2000 53d 16:56:45.441
   H7140 set hv ramp param mcp1 500 19 0 0 OFF
6. ****TELEMETRY**** 2000 53d 16:58:12.892
   MCP1 at correct voltage tm packet

This was caused by a command too soon from the ground. The bad telemetry packet was because of NCR 195. This NCR should make sure that the HV code sends a command too soon packet rather than an invalid parameters packet when the command is too soon.

NCR 197
-------
Problem: It has been observed that ICU operation appears abnormal when setting up the exposure if the DPU continues to transmit a significant amount of data. The key symptom is that the DPU does not receive the IC_CHOOSE_GS command. The DPU continues to operate properly. This condition does not appear to cause any permanent damage to the continuing operation of OM, other than the improper set up of the exposure configuration, which leads to loss of science data.

Resolution: This condition was alleviated during the OM commissioning phase, when the constraint was imposed that all data downlink from exposure n must be completed prior to the termination of exposure n+1. Thus this condition should not occur under the current operation scenario. If the operation scenario is revised, then this error/issue needs to be recreated/revisited.

NCR 198
-------
Scrubbing reports errors too often (when not busy) and sometimes too late (when busy).

NCR 199
-------
The ICU got stuck on 2000.129.16.34.30.155 whilst loading the DPU.

NCR 200
-------
Spontaneous DPU reset

The first causes a spontaneous reset of the DPU. At the moment it is not clear what the cause of this is, though it happens relatively infrequently. In these tests this was only seen once in approximately 160 hr of testing. Previous testing on earlier versions of the code indicates that this NCR occurs randomly with no obvious period, with occurrence times ranging from 2 hr to over 100 hr.

NCR 201
-------
16 bit wraparound when no stars

Having no star in the fast mode window causes the 16 bit fast mode memory to overflow. Since operationally you would expect to have a star in the fast mode window, this is unlikely to happen often, but there are plans to implement fast mode using 24 bit memory, which would solve this problem.
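
As a generic illustration of 16 bit wraparound, and of why a 24 bit store gives more headroom, the sketch below sums values into a register of fixed width. It is not the DPU fast mode code: the function name `accumulate` and the input numbers are made up, and the NCR does not say which quantity overflows.

```python
def accumulate(counts, width_bits):
    """Sum counts into a register of the given width, wrapping on overflow."""
    mask = (1 << width_bits) - 1
    total = 0
    for c in counts:
        total = (total + c) & mask  # bits above the register width are lost
    return total


counts = [40000, 30000]  # made-up increments; true sum is 70000

print(accumulate(counts, 16))  # 4464, i.e. 70000 - 65536 (wrapped)
print(accumulate(counts, 24))  # 70000 (fits in 24 bits)
```
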

NCR 202
-------
Missing alerts before Dave. 12.06.2000

All alerts from the DPU except for heartbeats are missing before the first IC_INIT_DPU 0XA430 is sent.

From Cheng: This symptom is resolved by initializing the inhibit_ssi in cwhite.c, instead of in su_initialize.c. This is the same as the missing Jim.

Releases 10 and 10b will inhibit all alerts except for HB between Jim and Dave. Release 11 should fix this problem.

NCR 203
-------
FAQ failure: DPU assumes incorrect word ordering of input reference stars. 08.10.2000.
Opened by: Jamie Kennea (8th June 2000)

Explanation: When loading reference stars for field acquisition, the DPU code assumes that the word ordering of the coordinates in the IC_LOAD_REF_STARS (A428) command is least significant word first. The DPU-ICU Protocol Definitions document (XMM-OM/MSSL/ML/0011.4) states that "In all cases, the most significant bit is transmitted first." This error causes the input reference star positions to be corrupted and therefore field acquisition fails.
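
The effect of a word-order mismatch like the one in NCR 203 can be shown with a short sketch. It assumes, purely for illustration, that a coordinate is a 32-bit value carried as two 16-bit words; the actual field layout of IC_LOAD_REF_STARS is not given here, and the function names and the sample value are hypothetical.

```python
def to_words_msw_first(value):
    """Split a 32-bit value into two 16-bit words, most significant first."""
    return [(value >> 16) & 0xFFFF, value & 0xFFFF]


def from_words(words, msw_first):
    """Rebuild a 32-bit value from two 16-bit words in the stated order."""
    first, second = words
    if msw_first:
        return (first << 16) | second
    return (second << 16) | first


coordinate = 0x00012345  # hypothetical coordinate value
words = to_words_msw_first(coordinate)

print(hex(from_words(words, msw_first=True)))   # 0x12345 (sender's value)
print(hex(from_words(words, msw_first=False)))  # 0x23450001 (corrupted)
```

A receiver that assumes the opposite order from the sender reads a completely different number, which is consistent with the corrupted reference star positions reported above.
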

NCR 204
-------
Unexpected A5AD alerts. 13.06.2000.

From Jamie: On several occasions during long exposures, the DPU has entered a mode where it issues many A5AD alerts, indicating that the DPU is attempting to access illegal memory locations. The alerts are random, infrequent and not reproducible on simulator tests, indicating a possible cause to be corruption of the PROC memory area by an SEU, causing the DPU code to run unpredictably.

From Mat: I've been through the REPEX alerts and exceptions looking for the a5ad alerts. I found the following groups of a5ad "WHITE BOUND I. EXCEPT" warnings:

ERT first warning | no. of warnings | ERT last warning |
---|---|---|
2000.073.19.31.42.293 | lots of screenfuls | 2000.074.11.19.18.102 |
2000.118.09.54.08.258 | lots of screenfuls | 2000.118.10.14.51.050 |
2000.119.03.53.13.409 | lots of screenfuls | 2000.119.03.53.54.193 |
2000.120.10.30.33.586 | lots of screenfuls | 2000.120.10.48.38.029 |
2000.121.01.59.19.190 | lots of screenfuls | 2000.121.01.59.54.051 |
2000.134.06.23.36.409 | lots of screenfuls | 2000.134.07.34.22.601 |
2000.135.01.34.15.016 | lots of screenfuls | 2000.135.01.38.50.322 |
2000.139.17.10.10.842 | 4 | 2000.139.17.11.30.877 |
2000.139.22.52.17.807 | 4 | 2000.139.22.52.18.637 |
2000.140.13.01.56.247 | lots of screenfuls | 2000.140.23.31.48.258 |
2000.142.11.03.07.877 | lots of screenfuls | 2000.142.12.15.09.510 |
2000.142.14.25.56.025 | lots of screenfuls | 2000.142.15.01.18.828 |
2000.143.00.45.52.465 | lots of screenfuls | 2000.143.00.49.35.290 |

From Cheng: This e-mail concerns the so-called A5AD problem seen on XMM-Newton-OM. We have seen a series of A5AD (GADE_RSIA_L) errors after frame 48 in a 100 frame exposure. One suspicion is that it is related to the long exposures. But Jamie reports that there are many incidents of long exposures where no data corruption has occurred on OM proper. Also, we cannot reproduce this problem on the ground. Rudi, Kate, how many times have we seen an error like this? Once or twice? Jamie has set up the data archive on eridanus. I haven't had the chance to look at them carefully. But let me just raise this food for thought. We have seen data corruption, presumably due to radiation, both in PROG and in real data products. It is thus not inconceivable that data corruption will occur in the DPU operation parameter area (PROC). When that happens, the behavior will be all over the map. One way to reduce this problem is to have short exposures. This will force a reference frame acquisition and processing, which can serve as a mini-reset where many variables are refreshed. Thus, if there is a 20 sec x 100 exposure, I recommend we break it up into two 20 sec x 50 exposures, as long as the data volume and TM bandwidth can be accommodated. Even though we might incur some overhead, I think the resultant data resilience/robustness is worth it.

NCR 205
-------
Alerts from CGS.

The DPU produces lots of alerts during choose guide stars for the magnifier.

NCR 206
-------
FAQ failed 03.07.2000

Field acquisition gave a5b2 alerts at 22:16 03.07.2000.

NCR 207
-------
HV failed. (Vilspa NCRs 60 and 66) 05.07.2000.

The OM high voltages spontaneously turned off and disabled themselves on day 186 at 05:15. The ICU software knew nothing of this (it didn't seem to be responsible), though it was seen by the ICU software in the housekeeping. The ICU software correctly disallowed a further manual attempt to change the voltage, as it correctly thought there was a problem because the measured and expected voltages were different. An RBI reset had to be performed to reset the software. The high voltages ramped up correctly afterwards.

It looks like the electronics did this spontaneously. The software is very simple and was running correctly before and after this problem. No errors were reported from the software. The only suggestion is that this was radiation-induced.

The out-of-limits should be changed so that in science mode (mode 3) the high voltages should be enabled. Currently there is a limit that the high voltages should be in limits when they are enabled, but this doesn't catch the spontaneous disabling of the high voltages.

If this is radiation-induced, we can do nothing.

NCR 208
-------
Toggling of the fast mode data buffer pointer. 17.07.2000.

Problem Statement: Toggling of the fast mode data buffer pointer between exposures in whitedsp.c is incorrectly coupled to the toggling of the image mode accumulated image pointer. The correction is straightforward to implement. (It is actually implemented and tested.)

NCR 209
-------
Old tracking offsets remembered by shift and add algorithm. 26.07.2000

In the case where tracking is off due to a failure of the choose guide star algorithm, the Red DSP will apply the last calculated shift and add offset (from a previous exposure). Usually this value is no more than +/- a few pixels. However, in extreme cases (explicitly where tracking has gone badly wrong, followed by an exposure where CGS fails) this offset can be large (>100 pixels), which can lead to corruption/loss of data. This behaviour has so far been seen twice in OM data, once in a series of observations in UV, and once using the magnifier.

NCR 210
-------
a595 (DA_BLUE1_S_ALERT) and a580 (DA_CLK_SYNCH_ERROR) seen in Choose Guide Stars.

The observed a580 is understood to some extent. It is correlated with exposures with two fast mode windows. If you have only one fast window, then we should not have any problem. The reason we haven't seen this is that EOB2 has no facility to do clock sync. It shouldn't have any detrimental effect. It happens when we tell both blue1 and blue2 to load BFAST simultaneously. The fix is to split up the command to load BFAST.

NCR 211
-------
Ada exception in DPU data manager.
Opened by: Fabio Giannini 25.09.2000

At 11:24.53 today the ICU code crashed. Exception packet 92800 was sent indicating an Ada error with parameters:

H8080=32hex (Ada exception DPU data manager)
H8085=E010hex (out of range as defined in TM-TC doc)

TM was lost and the command to go to safe was not accepted. A cold reset was then issued and an RBI dump of the memory was performed.

NCR 212
-------
NCR 212: Ordering issue with DP_FAQ packet
Opened by: Jamie Kennea, Rudi Much 13.10.2000

The DP_FAQ alert contains both a list of the uplinked guide stars used for field acquisition and the positions of the stars identified by the DPU to be associated with these reference stars. Currently, if a reference star is not found, then it has no entry in the downlinked star list. Therefore, if 2 stars out of 16 are not found, the reference star list is an array of 16 and the found star list is an array of 14 with padding at the end. This makes direct comparison of the reference star list and the found star list difficult in the case where not all uplinked stars are found.
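
The comparison difficulty described in NCR 212 can be made concrete with a small sketch. It contrasts a compacted found-star list (padding at the end, as reported) with an index-aligned list that keeps one slot per reference star. The data model, identifiers and positions are hypothetical, and the aligned layout is only one possible arrangement, not a description of the DP_FAQ packet or of any agreed fix.

```python
def align_found_stars(reference_ids, found):
    """Return one entry per reference star, None where no star was found.

    reference_ids: reference star identifiers, in uplink order
    found: dict mapping reference star identifier -> (x, y) position
    """
    return [found.get(ref_id) for ref_id in reference_ids]


reference_ids = [0, 1, 2, 3]
found = {0: (100, 200), 1: (310, 45), 3: (12, 900)}  # star 2 not found

# Compacted layout: found stars packed together, padding at the end.
compacted = [found[r] for r in reference_ids if r in found]
compacted += [None] * (len(reference_ids) - len(compacted))

# Aligned layout: entry i always belongs to reference star i.
aligned = align_found_stars(reference_ids, found)

print(compacted)  # [(100, 200), (310, 45), (12, 900), None]
print(aligned)    # [(100, 200), (310, 45), None, (12, 900)]
```

In the compacted layout the third entry belongs to reference star 3, not star 2, so an entry-by-entry comparison against the reference list goes wrong as soon as one star is missing.
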

NCR 213
-------
NCR 213: Cathode anomaly
Opened by: Jorge Fauste (Vilspa NCR 70) 04.10.2000

During Optical Monitor Activation on day 272, revolution 148, around 20 hours 45 minutes, parameter H5165 HVM VCATH had several spikes. The problem appeared some minutes before the cathode High voltage had been set up by Telecommand during the High Voltages ramp up. These are the values found:

before 2000.272.20.45.16 H5165=0.50 volts
at 2000.272.20.45.16 H5165=3.91 volts
at 2000.272.20.45.25 H5165=7.32 volts
at 2000.272.20.45.35 H5165=10.25 volts
at 2000.272.20.45.46 H5165=98.5 volts
at 2000.272.20.45.56 H5165=0.50 volts
after that time some small spikes.

At 2000.272.20.49.36 the command to set up the high voltages for the cathode was sent. After the command was sent everything was O.K.

The cathode is seen to rise on the QM hardware when the other high voltages are ramped up. This is a known feature of the hardware.

NCR 214
-------
NCR 214: DPU spontaneous reset
Opened by: Jorge Fauste (Vilspa NCR 76) 16.01.2001

On day 2001-011 at 19:07 the following Repex message appeared: "92301 SSI Exception". After this message another two messages appeared: "92802 Heartbeat lost", "92803 DPU Reset exception".

Executed procedure CRP_OPM_004

NCR 215
-------
NCR 215: Eng 3 corruption
Opened by: Jorge Fauste (Vilspa NCR 78) 09.02.2001

The first OM exposures of observation 0109870201 on revolution 215 showed corrupted images. After some investigation it was discovered that Engineering 3 data was corrupted as well. Science observations were stopped for OM, and Engineering 3 and 6 were executed again. No telecommand or telemetry problems were detected.

ECR 86
------
Two mode change commands to the same mode should not produce an error. For example,

tc_mode 2
tc_mode 2

should not generate an error.

Change known. Edit icu/fm/oper/modeman.adb. Will be implemented for OM flight code release 10.

ECR 87
------
Change all critical exceptions to major anomalies.

ECR 88
------
Loss of filter wheel position should cause an automatic goto safe and prevent mode changes until recovered. As an additional safety measure, HV ramp-up is not allowed unless filter wheel in blocked position.

ECR 89
------
Change the loss of telemetry timeout to goto safe approx 1 min after the CDMU dies.

ECR 90
------
Implement BEGOF_EXP ENDOF_EXP in engineering exposures.

ECR 91
------
Implement a whole field of view exposure mode.

ECR 92
------
Change fw offset of UV grism.
Opened by: Rudi Much 11.01.2001

The UV grism spectra are disturbed by 0-order images and by straylight features. After the analysis of the UV grism data acquired in several test observations of BPM16274, the OM team came up with a new filter wheel position for the UV grism. The new position cleans up parts of the OM FOV of both straylight and 0-order images. Cleaner grism images are obtained.

The new grism position will replace the current one. The new position is defined as (old position - 60) = 940.

The FW position is normally commanded by filter element identifier, and the translation from FW identifier to FW counter position is made by the ICU. Therefore a change to the ICU s/w is required (OM team).

There is also an absolute filter wheel commanding capability. Here the reference document "TC and TM specification" is consulted. An update of this document is required (OM team).

However, changes to the ground segment are required as well, e.g. the translation from FW position counter into FW position for the mimic display of the filter wheel position.
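
The identifier-to-position translation that ECR 92 asks to change can be sketched as a lookup table. The table below reuses the numbers from the NCR 192 filter list, assuming its leading values are filter wheel counter positions; that reading is an assumption, although it agrees with the arithmetic above (1000 - 60 = 940 for the UV grism). The names are illustrative and do not come from the ICU source.

```python
# Filter wheel counter position per filter element identifier.
# Values taken from the NCR 192 list, ASSUMING its leading numbers are
# filter wheel counter positions (not stated explicitly in the source).
FW_POSITION = {
    0: 1200,   # Blocked
    1: 1400,   # V
    2: 1600,   # Magnifier
    3: 1800,   # U (no bar)
    4: 2000,   # B
    5: 0,      # White
    6: 200,    # Grism 2 (Visible)
    7: 400,    # UVW1
    8: 600,    # UVM2
    9: 800,    # UVW2
    10: 1000,  # Grism 1 (UV)
    11: 2100,  # Bar
}

UV_GRISM_ID = 10
UV_GRISM_OFFSET = -60  # ECR 92: new position = old position - 60


def with_uv_grism_offset(table):
    """Return a copy of the table with the UV grism position shifted."""
    updated = dict(table)
    updated[UV_GRISM_ID] = table[UV_GRISM_ID] + UV_GRISM_OFFSET
    return updated


new_table = with_uv_grism_offset(FW_POSITION)
print(new_table[UV_GRISM_ID])  # 940
```

Any ground-side table that maps counter positions back to filter names, such as the mimic display mentioned above, would need the same single-entry change to stay consistent.
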