Quickstart guide Millipede cluster
1. Accounts and logging in
               
               
For the new cluster we have tried to bring the accounts in line with the general university accounts. This means that from now on p- or s-numbers are used as login names. For users who do not have an account in the university system, we will arrange so-called functional accounts, which use an f-number. Unfortunately, we have not yet been able to synchronise the passwords between the two systems. To get an account on the cluster you will therefore have to contact the CIT Servicedesk Centraal to obtain a password. The easiest way to do this is by filling in the online form.
When you have obtained a password you can log in to the system using the hostname millipede.service.rug.nl.
For example, from a Unix shell you can use:
ssh -X user@millipede.service.rug.nl
2. Support & Documentation
In the past the system administrator Kees Visser was the main contact point for questions about the HPC cluster. To avoid relying on a single person, we have decided to make the CIT Servicedesk the contact point for support for the Millipede system. This means that multiple people are notified of your question or problem, so there should always be someone available to solve it.
The documentation for the new system has been made available at:
http://www.rug.nl/cit/hpcv/publications/docs/millipede
Note that the documentation will still need to be updated frequently.
3. System description
                  
                  
The new cluster consists of 4 parts:
- 1 frontend node
- 235 batch nodes with 12 cores and 24 GB of memory (nodes)
- 16 batch nodes with 24 cores and 128 GB of memory (quads)
- 1 SMP node with 64 cores and 576 GB of memory (smp). Using this machine requires efficient placement of the processes on the cores. 
                  
The whole system is connected with a 20 Gbit/s Infiniband network.
                  
The storage is divided into four parts:
/home: 7 TB of storage with a quota of 10 GB per user. This space is backed up daily.
/data: 110 TB of fast storage for large data sets. We currently limit each user to 200 GB. Because of the large amount of space available here, no backup is made of this data.
/data/scratch: On the same 110 TB as /data, but for temporary storage. 
/local: The local disk in each of the nodes. For each job a temporary directory is created here, which is available as $TMPDIR. This directory is removed after the job has finished. For the rest of the space a cleanup policy will be applied.
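
The way $TMPDIR is meant to be used inside a job can be sketched as follows. This is only an illustration, not a prescribed workflow: the directory names JOB_SCRATCH and RESULTS_DIR and the file names are hypothetical, and the fallback to /tmp is there only so the sketch can be tried outside a job, where $TMPDIR may not be set.

```shell
#!/bin/bash
# Inside a job, TMPDIR points at the per-job directory on the node's local disk.
# Outside a job it may be unset, so fall back to /tmp to keep the sketch runnable.
JOB_SCRATCH=$(mktemp -d "${TMPDIR:-/tmp}/stage.XXXXXX")

# Hypothetical stand-in for a permanent location, such as a directory under /data.
RESULTS_DIR=$(mktemp -d "${TMPDIR:-/tmp}/results.XXXXXX")

# 1. Stage the input onto the local disk (here: a small generated file).
printf 'line one\nline two\n' > "$JOB_SCRATCH/input.txt"

# 2. Do the work on the local disk (here: count the lines of the input).
wc -l < "$JOB_SCRATCH/input.txt" | tr -d ' ' > "$JOB_SCRATCH/output.txt"

# 3. Copy the results back before the job ends, because the per-job
#    directory is removed once the job has finished.
cp "$JOB_SCRATCH/output.txt" "$RESULTS_DIR/"
```

The important step is the last one: anything left only in the per-job directory is lost when the job finishes.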
                  
Note that the given limits are the defaults. If you really need more space, contact the Servicedesk and explain why, so that we can see whether it is possible to fulfil your requirements.
                  
We have made the /home and /data areas available on the University Windows desktop environment. Details on this can be found on the Millipede documentation page:
                  
http://www.rug.nl/cit/hpcv/publications/docs/millipede
                  
               
4. Scheduling system
                  
The new cluster uses the same scheduling system as the old one. This means that the commands have not changed. The queue names have changed, however. There are now 10 queues for the general users:
- short: Queue for the 12 core nodes for short jobs, lasting at most 30 minutes. There is some capacity reserved for this queue;
- nodes: Queue for the 12 core nodes with a limit of 24 hours;
- nodesmedium: Queue for the 12 core nodes with a limit of 72 hours;
- nodeslong: Queue for the 12 core nodes with a limit of 10 days;
- quads: Queue for the 24 core nodes with a limit of 24 hours;
- quadsmedium: Queue for the 24 core nodes with a limit of 72 hours;
- quadslong: Queue for the 24 core nodes with a limit of 10 days;
- smp: Queue for the 64 core node with a limit of 24 hours;
- smpmedium: Queue for the 64 core node with a limit of 72 hours;
- smplong: Queue for the 64 core node with a limit of 10 days.
                  
When you submit a job with no specific queue requirement the job will be routed to one of the 12 core node queues (nodes, nodesmedium, nodeslong) based on your wallclock time requirement.
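
The routing between the three 12 core node queues can be illustrated with a small shell function. This is only a sketch of the time limits listed above. The scheduler does this routing itself, and pick_queue is a hypothetical helper, not a command that exists on the cluster.

```shell
#!/bin/bash
# Illustration only: map a wallclock requirement in hours to the 12 core
# node queue whose limit it fits (24 hours, 72 hours, 10 days = 240 hours).
pick_queue() {
    local hours=$1
    if [ "$hours" -le 24 ]; then
        echo nodes
    elif [ "$hours" -le 72 ]; then
        echo nodesmedium
    elif [ "$hours" -le 240 ]; then
        echo nodeslong
    else
        echo "no 12 core node queue allows ${hours} hours" >&2
        return 1
    fi
}

# A job that asks for 48 hours fits the 72 hour limit.
pick_queue 48
```

To target one of the other queues, the standard PBS queue option can be put in the job script, for example "#PBS -q quads". That this option behaves here as in other PBS installations is an assumption; check the Millipede documentation if in doubt.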
                  
Note that a big difference between the old and new cluster is that on the new system multiple jobs are allowed on a single node. This is because of the much higher number of cores available per node. If you really want full nodes to be allocated to your job, you can request this in your job requirements using the nodes/ppn option. For example, adding the following line to the job script you submit will request 4 nodes with 12 cores each.
#PBS -lnodes=4:ppn=12
Memory requirements
For all jobs a default memory requirement of 1900MB per core has been set. This setting can be changed using the -lpmem option, by putting for example the following line in your job script:
#PBS -lpmem=3800M
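
A complete job script combining these options might look like the sketch below. The job name, the wallclock time and the final echo line are placeholders, and the -N and -lwalltime options are the standard PBS ones, assumed here rather than taken from this guide. The request of 6 cores at 3800 MB each adds up to 22800 MB, the same total as 12 cores at the default of 1900 MB, so it still fits on one 12 core node.

```shell
#!/bin/bash
#PBS -N example_job
#PBS -lwalltime=02:00:00
#PBS -lnodes=1:ppn=6
#PBS -lpmem=3800M

# Memory implied by the request above: 6 cores x 3800 MB per core.
PPN=6
PMEM_MB=3800
NODE_MEM_MB=$((PPN * PMEM_MB))
echo "Requested ${NODE_MEM_MB} MB on the node"

# PBS_O_WORKDIR is set by the scheduler to the directory the job was
# submitted from. The fallback keeps the script runnable outside a job.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Replace this line with your own program.
echo "Job running in $(pwd)"
```

Such a script is submitted with the usual qsub command, for example "qsub jobscript.sh", where jobscript.sh is whatever file name you saved it under.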
5. Modules environment
               
               
One feature of the system worth noting is the modules environment. Several software packages have been organised into modules that can be loaded into your environment. You can use the command "module" to work with this environment. Useful subcommands are avail, list, add and rm. For example, the following command loads the Intel compilers into your environment:
$ module add intel/compiler/64
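
The four subcommands mentioned above can be tried together as in the following sketch. The wrapper function demo_modules is hypothetical and exists only to group the commands. The check for the module command is there so the snippet does nothing harmful on a machine without the modules environment.

```shell
#!/bin/bash
# demo_modules is a hypothetical wrapper that groups the four subcommands.
demo_modules() {
    module avail                    # list the modules that can be loaded
    module add intel/compiler/64    # load the Intel compilers (name from this guide)
    module list                     # show what is currently loaded
    module rm intel/compiler/64     # unload the module again
}

# Only run the demo where the module command exists, i.e. on the cluster.
if command -v module >/dev/null 2>&1; then
    demo_modules
else
    echo "module command not available on this machine"
fi
```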
6. Final remarks
This is only a very short overview of the new system. More documentation is available online, and will be extended in the near future. Please be aware that the system is rather new, which means that we may still need to adjust some settings after having gained some more experience with it. If you have questions or suggestions for us, or if you find that the current settings don't match your requirements, don't hesitate to contact us.