I guess somehow there is no communication between the scheduler and the server.

Reason 1: Filesystem that has logs is full

A problem I had was the following: there were some jobs running on the system, but newer jobs weren't running.

Index: torque.setup
===================================================================
--- torque.setup (revision 1544)
+++ torque.setup (revision 1543)
@@ -30,18 +30,12 @@
 pbs_server -t create
-echo set server operators += $USER | qmgr -a 2> /dev/null
+echo set

http://linuxtoolkit.blogspot.com/2014/06/resolution-for-error-cannot-set-torque.html
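For Reason 1 above (the filesystem holding the logs is full), a quick check could look like the sketch below. The function names fs_usage_pct and check_log_fs are made up for illustration, and the 95% default threshold is an arbitrary assumption; point it at whatever directory holds your TORQUE logs.

```shell
#!/bin/sh
# Hypothetical helpers (not part of TORQUE): report how full the
# filesystem holding a given directory is.

# Print the usage percentage (0-100) of the filesystem containing $1.
fs_usage_pct() {
    # df -P prints one line per filesystem; field 5 is the capacity, e.g. "42%".
    df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 1 and print a warning if usage is at or above the limit
# (second argument, default 95; the default is an arbitrary choice).
check_log_fs() {
    clf_dir=$1
    clf_limit=${2:-95}
    clf_pct=$(fs_usage_pct "$clf_dir")
    # No usable number means df failed or the directory does not exist.
    case "$clf_pct" in
        ''|*[!0-9]*) echo "cannot read usage for $clf_dir" >&2; return 2 ;;
    esac
    if [ "$clf_pct" -ge "$clf_limit" ]; then
        echo "WARNING: filesystem for $clf_dir is ${clf_pct}% full"
        return 1
    fi
    echo "OK: filesystem for $clf_dir is ${clf_pct}% full"
}

# Usage (the path is an assumption; use your own TORQUE log directory):
#   check_log_fs /var/spool/torque 95
```

If the check reports a nearly full filesystem, freeing space there is worth trying before looking at the scheduler.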
I first downloaded torque-2.3.1.tar.gz with wget and extracted it.

Installation on a supercomputer

At the time of this writing, TORQUE 4.2.6 was the newest version. One possible cause is that your scheduler is probably not running or cannot communicate with pbs_server. I have been looking all over for this to no avail.
If the directory /home/user or /home/user/.ssh has bad permissions, this problem will appear. You just need to run: chmod 755 /home/user/.ssh
-- MinghuiLiu - 07-Feb-2012

We will refer to that directory as $TORQUE_HOME. In mom_logs we find the line:

11/05/2014 18:30:47;0008;PBS_Server.23876;Job;3080.bachianas.ufabc.edu.br;unable to run job, send to MOM '3364214663' failed

And a call to the qrun command to force the execution of the job returns:

qrun: job failing into the wrong queue

A job failing in the wrong queue can have several reasons.
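For the permission problem on /home/user and /home/user/.ssh mentioned above, the following sketch flags directories that are writable by group or others, which is the usual kind of "bad permission" that makes ssh refuse keys. The function name perm_ok is hypothetical, and treating group/other write access as the problem is an assumption about what the original poster hit.

```shell
#!/bin/sh
# Hypothetical helper: succeed (0) if directory $1 is NOT writable by
# group or others, fail (1) if it is, and return 2 if it is not a directory.
perm_ok() {
    [ -d "$1" ] || return 2
    # First field of "ls -ld" is the mode string, e.g. "drwxr-xr-x".
    po_mode=$(ls -ld "$1" | cut -c1-10)
    case "$po_mode" in
        ?????w*)    return 1 ;;   # 6th character: group write bit
        ????????w*) return 1 ;;   # 9th character: other write bit
    esac
    return 0
}

# Usage (paths taken from the text above):
#   for d in /home/user /home/user/.ssh; do
#       perm_ok "$d" || echo "check permissions on $d"
#   done
```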
If we just try to release it with releasehold and wait for the next scheduler cycle, we see that the hold is applied again. By default the local host is found anyway.
An observation - MAUI MAXPROC initially not working...

In this very specific case, the machine supported the NUMA architecture, so we can compile TORQUE with NUMA support to divide the CPUs into logical units. In our case:

hostname np=134

where "hostname" is the output of the hostname command.

I always start pbs/torque with

Code:
qterm
pbs_server
pbs_sched

The computers 'talk' to each other successfully:

Code:
[email protected]:/var/spool/torque/server_priv# pbsnodes
gordon.che.wisc.edu
     state = free
     np = 1
     ntype = cluster
     status =
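The "hostname np=134" entry described above can be generated rather than typed by hand. This is only a sketch: nodes_line is a made-up helper name, and the nodes file location in the usage comment assumes $TORQUE_HOME is /var/spool/torque, as in the prompt shown above.

```shell
#!/bin/sh
# Hypothetical helper: print one nodes-file entry "<host> np=<count>".
# $1 = host name (defaults to the output of hostname), $2 = processor count.
nodes_line() {
    nl_host=${1:-$(hostname)}
    nl_np=$2
    # Refuse anything that is not a plain non-negative integer.
    case "$nl_np" in
        ''|*[!0-9]*) echo "np must be an integer" >&2; return 1 ;;
    esac
    printf '%s np=%s\n' "$nl_host" "$nl_np"
}

# Usage (as root; the path is an assumption about your TORQUE_HOME):
#   nodes_line "$(hostname)" 134 >> /var/spool/torque/server_priv/nodes
```

pbs_server typically needs a restart before it picks up a changed nodes file.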
http://osdir.com/ml/clustering.torque.user/2007-11/msg00163.html

make 6.

If not, "force-reload" is
# just the same as "restart".
#
echo -n "Restarting $DESC: $NAME"
d_stop
# One second might not be time enough for a daemon to stop,
#

After running qrun, it changed to the R state in qstat(1B), but the column Time Use didn't change.
Normally it is not used for processing jobs. "Slave nodes" are the nodes used to run processing jobs. Take bl-3-1.local as an example:

11.1 ./torque-package-clients-linux-x86_64.sh --install
11.2 ./torque-package-mom-linux-x86_64.sh --install
11.3 libtool --finish /opt/pbs/lib
11.4 edit /etc/rc.local, add the following lines:
     PATH=$PATH:/opt/pbs/bin:/opt/pbs/sbin
     export PATH
     MANPATH=$MANPATH:/opt/pbs/man
     export MANPATH
11.5 perform the
./configure --prefix=/usr/local/torque
make
make install

then the next commands:

hill:/usr/local/torque/bin# export PATH=$PATH:/usr/local/torque/bin:/usr/local/torque/sbin
hill:/usr/local/torque/bin# ~salnikov/src/torque-2.2.1/torque.setup root
initializing TORQUE (admin: [email protected])
Max open servers: 4
Max open servers: 4
qmgr obj= svr=default: Unauthorized Request

Since the output of jobs will be returned back to the server with ssh, we need to configure ssh on all the work nodes:

13.1 ssh-keygen -P "" -t rsa

I was not aware that it was hard to read before.
>>
>> This problem has already been reported; however, trying the suggested solutions, I could not resolve the problem.
>>

A sample script for qsub using lam/mpi would be:

Code:
#!/bin/bash
#PBS -l ncpus=4
echo $PBS_JOBID
echo "Start time :"
date
lamboot
mpirun -np 4 your_mpi_command
echo "End Time :"
Now we need to tell TORQUE to use the computer we installed it on.

silas.net.br

[torquedev] reverting 'hostname -f' change in torque.setup from September
Garrick Staples garrick at usc.edu
Thu Nov 29 10:37:41 MST 2007

The following directive:

#PBS nodes=2:ppn=4

is wrong.
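The directive above is missing the -l option: #PBS lines take the same options as the qsub command line, so a resource request is written as "#PBS -l nodes=2:ppn=4". As a rough sketch, the hypothetical helper below (find_bad_pbs_directives is not a TORQUE tool) lists #PBS lines in a job script whose first word is not an option.

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) every "#PBS" line in the
# job script $1 whose first word does not start with "-".
# Exit status follows grep: 0 if offending lines were found, 1 if none.
find_bad_pbs_directives() {
    grep -nE '^#PBS[[:space:]]+[^-[:space:]]' "$1"
}

# Usage (job.sh is a placeholder name):
#   if find_bad_pbs_directives job.sh; then
#       echo "fix the lines above, e.g. '#PBS -l nodes=2:ppn=4'"
#   fi
```

This only catches a missing option flag; it does not validate the resource names themselves.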
After that, I had to free jobs from hold with the releasehold command.

rest of server configuration)

We now have a more complex setup, with different queues that have different attributes (number of processors, walltime, memory available etc.).
pbs_sched
Basic scheduler that is called by pbs_server periodically; see the scheduler_iteration setting in the pbs_server_attributes(7) man page. More information about the attributes can be found in the TORQUE Administration Guide. The jobs must be scheduled so that no user gets more priority than others.
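The scheduler_iteration setting mentioned above is a server attribute, so it can be changed through qmgr. The sketch below only builds the qmgr directive text; sched_iteration_directive is a made-up helper name and 600 seconds is just an example value.

```shell
#!/bin/sh
# Hypothetical helper: print the qmgr directive that sets the interval
# (in seconds) between scheduler cycles. It does not call qmgr itself.
sched_iteration_directive() {
    # Accept only a plain non-negative integer number of seconds.
    case "$1" in
        ''|*[!0-9]*) echo "seconds must be an integer" >&2; return 1 ;;
    esac
    printf 'set server scheduler_iteration = %s\n' "$1"
}

# Usage (as a TORQUE manager; 600 is an example value):
#   sched_iteration_directive 600 | qmgr
```

Piping the directive into qmgr is the same pattern torque.setup uses for its "echo set server ... | qmgr" lines.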
When we execute the pbsnodes command we will see one single node with lots of processors:

# pbsnodes
localhost
     state = down
     np = 134
     ntype = cluster
     mom_service_port = 15002
     mom_manager_port
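With many nodes it is easier to filter this output than to read it. The sketch below assumes the layout shown above, with the node name at the start of a line and indented "key = value" attributes under it. The function name down_nodes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read pbsnodes-style output on stdin and print the
# names of nodes whose state contains "down" or "offline".
down_nodes() {
    awk '
        # A non-empty line that does not start with whitespace is a node name.
        NF > 0 && $0 !~ /^[ \t]/ { node = $1 }
        # Attribute lines look like "     state = down".
        $1 == "state" && $2 == "=" {
            if ($3 ~ /down|offline/) print node
        }
    '
}

# Usage:
#   pbsnodes | down_nodes
```

pbsnodes also has a -l option that lists nodes in a down or offline state, which may be enough when no further filtering is needed.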