Daemonizing Linux processes

If you use Linux on servers and connect remotely via SSH, you will often want to run a process that keeps doing its work quietly in the background while you work on something else. Sometimes you'd like the job to run in such a way that even if you close your session or log out, it keeps doing its work until it finishes; for instance, a resizing script that creates thumbnails of all images on your machine. In some cases you'd also want to control the running process, so that it keeps running in the background but you can stop or restart it conveniently at any time.

This blog post covers these scenarios and looks at different methods of running processes in background.

On the Linux shell, you can launch any task in the background by appending "&" to the command. The task keeps running in the background while you get control back and can issue further shell commands. However, the background process's standard output will still be shown on your shell. More importantly, if you log out while the task is running in the background, the process will exit as well. In many cases this is not desirable.

To truly run a process in the background, you need a "daemon". In computer science, a daemon is a program which runs in the background and has no direct interaction with the user; nor does the user have direct control over it. This means other programs are required to control it. Normally daemons are system processes which provide some service. For instance, sshd is a daemon which provides SSH access to users. Daemons work by detaching themselves from the terminal and its standard streams and running in the background.

It is possible to daemonize a normal process. There is a command called "nohup" which runs a program so that it ignores the hangup (SIGHUP) signal sent when your session disconnects. If standard output is a terminal, it also redirects the output to a file (nohup.out by default). Combined with "&", you can run a program in the background which does not exit upon disconnection. For instance:

 $ nohup ./analyzelogmessages.py *.log & 
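
To make the mechanics concrete, here is a minimal Python sketch of roughly what "nohup command &" arranges: the child ignores SIGHUP, reads nothing from the terminal, and has its output appended to a file. The function name run_nohup_style and the default log file name are assumptions for illustration; this is not how nohup itself is implemented, and it only works on Unix-like systems.

```python
import signal
import subprocess


def run_nohup_style(argv, log_path="nohup.out"):
    """Start argv in the background, ignoring SIGHUP, with output appended to log_path."""
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,   # no terminal input
            stdout=log,                 # output goes to the log file
            stderr=subprocess.STDOUT,   # errors go to the same file
            # Runs in the child just before exec; an ignored signal stays ignored after exec.
            preexec_fn=lambda: signal.signal(signal.SIGHUP, signal.SIG_IGN),
        )
    finally:
        log.close()  # the child keeps its own copy of the file descriptor


# Hypothetical usage, mirroring the shell example above:
#   proc = run_nohup_style(["./analyzelogmessages.py", "app.log"])
#   print(proc.pid)  # keep the PID if you want to stop the process later
```

The call returns immediately, just as "&" does in the shell; the returned object carries the PID you would otherwise look up with ps.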

However, once you launch this process, you cannot stop it except with the help of other programs. For instance, you can figure out the process ID with the ps command and then use the kill command. Also, in some cases you'd want your daemon to start automatically when your server restarts.

To control your daemon conveniently, Linux offers some tools and services. A program called start-stop-daemon (available on Debian-based distributions such as Ubuntu) is used to start and stop daemon programs. It provides various options, such as running the daemon as another user, chrooting into a different directory, changing the priority, stopping an already running daemon, and force-killing a daemon.

 $ start-stop-daemon --background --start --quiet --exec sms_server -- /etc/sms_server.conf # Starts the SMS Server
 $ start-stop-daemon --stop --quiet --exec sms_server -- /etc/sms_server.conf # Stops the SMS Server

This works well, except that the system doesn't know you have a daemon and won't start it automatically. To fix that, you can put it in an init.d script. An init.d script usually resides in /etc/init.d/ (depending on the Linux distribution) and is used in much the same manner as services on Windows. Creating one is simple; just create a script file and set its execute bit. On Ubuntu there is a skeleton file which acts as a template for reference.

Here is an example init.d script I wrote:



#! /bin/sh
### BEGIN INIT INFO
# Provides:          streamerd
# Required-Start:    $all
# Required-Stop:     $all
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Streaming Service
# Description:       This is a streaming server
### END INIT INFO

# Author: Sharjeel Ahmed Qureshi

PATH=/usr/sbin:/usr/bin:/sbin:/bin:/home/sharjeel/streamingserver
DESC="Streaming Server"
NAME=streamerd
DAEMON=/home/sharjeel/streamingserver/$NAME
DAEMON_OPTS="-d /tmp/ -p 8080 -f flv "
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/streamerd
NOHUP=/usr/bin/nohup
USER=sharjeel
GROUP=sharjeel

case "$1" in
 start)
 echo -n "Starting $DESC: "
 start-stop-daemon --start -c $USER -g $GROUP \
 --background --exec $DAEMON -- $DAEMON_OPTS
 echo "$NAME."
 ;;
 stop)
 echo -n "Stopping $DESC: "
 start-stop-daemon --stop -c $USER -g $GROUP \
 --exec $DAEMON -- $DAEMON_OPTS
 echo "$NAME."
 ;;
 restart|force-reload)
 echo -n "Restarting $DESC: "
 # "start" above never passes --pidfile, so match the process by --exec as start/stop do
 start-stop-daemon --stop --quiet -c $USER -g $GROUP \
 --exec $DAEMON -- $DAEMON_OPTS
 sleep 1
 start-stop-daemon --start --quiet -c $USER -g $GROUP \
 --background --exec $DAEMON -- $DAEMON_OPTS
 echo "$NAME."
 ;;
 reload)
 echo -n "Reloading $DESC configuration: "
 start-stop-daemon --stop --signal HUP --quiet --user $USER \
 --exec $DAEMON
 echo "$NAME."
 ;;
 *)
 N=/etc/init.d/$NAME
 echo "Usage: $N {start|stop|restart|reload|force-reload}" >&2
 exit 1
 ;;
esac


Put your init.d file in your /etc/init.d and make it executable:

# chmod +x /etc/init.d/streamerd

You have to register your script with the system so that it starts automatically when the system boots. Use the update-rc.d command, which registers your script with the system's init:





# update-rc.d streamerd defaults


 

The "defaults" parameter installs it with default runlevels of your system.


Filter crappy posted videos from Facebook feed

Facebook's feed once used to be very useful, informing you about your friends' updates through their statuses. Now I feel my Facebook feed has been hijacked by redundant, time-wasting posted videos. So I wrote a small Greasemonkey script to filter out posted videos. Here it is:

// ==UserScript==
// @name           Facebook Remove Vids
// @namespace
// @description    Removes videos from your Facebook Feed
// @include        http://www.facebook.com/*home.php*
// @include        http://www.new.facebook.com/*home.php*
// ==/UserScript==
 
function cleanUpPage() {
    var stories = document.getElementsByClassName("UIIntentionalStory");
 
    for ( var i = 0; i < stories.length; i++ ) {
        var sHTML = stories[i].innerHTML;
        if ( sHTML.match("class=\"UIMediaItem_video") || sHTML.match("class=\"swfvideo") ) {
            stories[i].style.display = "none";
        }
    }
}
 
window.addEventListener("load",
    function() {
        var t = setInterval(cleanUpPage, 1000);
    }
    , false);

I must say that after removing those videos, very little is left in my feed :)

Attempt Retries decorator in Python

In certain situations, particularly when dealing with networks and distributed systems, you'd want your program to retry a particular operation a certain number of times before giving up. For instance, your program is trying to connect to a mail server which is temporarily giving a socket error due to network connectivity.

The simple way is to put the operation in a loop, try a certain number of times, and give up if it still fails. I found this pattern common, so I wrote a decorator in Python which can be used generically.

from time import sleep

def attempt_retries(func, retries=3, delay=3, IgnoreException=Exception):
    """
    Wrap func so that IgnoreException is ignored up to retries-1 times,
    waiting delay seconds before each retry; the last attempt raises normally.
    e.g.   connect = attempt_retries(connect_imap, 10, 5, SocketError) # Tries ten times
    e.g.2. attempt_retries(connect_imap)(username, password)
    """
    def dec(*args, **kwargs):
        for i in range(retries-1):
            try:
                return func(*args, **kwargs)
            except IgnoreException:
                sleep(delay)
        # Final attempt: return its result and let any exception propagate
        return func(*args, **kwargs)
    return dec

Usage:

attempt_retries( server_connect, 10, 20, SocketError ) ( host, user, password)

or

r_server_connect = attempt_retries( server_connect, 10, 50, SocketError )
r_server_connect( host, user, password)
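
If you prefer the @ syntax, the same idea can be sketched as a decorator factory. This is a variation on the function above, not a replacement for it: the names retrying, flaky_connect and calls are made up for illustration, and ConnectionError stands in for whatever exception your network code raises.

```python
import functools
import time


def retrying(retries=3, delay=3, ignore=Exception):
    """Decorator factory: try the call up to `retries` times, sleeping `delay` seconds after each failure."""
    def decorate(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            for _ in range(retries - 1):
                try:
                    return func(*args, **kwargs)
                except ignore:
                    time.sleep(delay)
            # Last attempt: let any exception propagate to the caller.
            return func(*args, **kwargs)
        return wrapper
    return decorate


calls = {"count": 0}  # hypothetical counter so the example can fail a couple of times


@retrying(retries=3, delay=0, ignore=ConnectionError)
def flaky_connect():
    # Stand-in for a network call: fails twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "connected after %d calls" % calls["count"]
```

Calling flaky_connect() here swallows the first two failures and returns the result of the third attempt.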

OpenLaszlo: First Impression

OpenLaszlo is a framework which, officially, is for creating Rich Internet Applications. It takes a higher-level markup language, Laszlo, and converts it into SWF Flash or DHTML. Like many other tools, you are not bound to create only Internet apps using OpenLaszlo; go ahead if you want to create your restaurant's digital menu card, an interface to your car's sensors, or a skinned menu for your phone.

I just wrote my first piece of code in OpenLaszlo and the experience has turned out to be pretty ambivalent. On one hand, I feel I have a lot of power and control to generate a Flash application. On the other hand, I feel very restricted due to poor documentation, a lack of supporting editors, and a language that I feel isn't powerful enough and hence inappropriate.

Let me share my experience:

I saw OpenLaszlo a couple of years back and thought it was pretty cool, but kept it on my stack of to-learn technologies until yesterday, when I encountered a simple problem: my friend had to prepare his PhD proposal defense presentation and wanted me to create a three-minute countdown timer animation for a particular slide. I thought I'd generate an SWF using Flash and embed it in PowerPoint. Then I thought maybe it was a good opportunity to learn OpenLaszlo. Alas, this simple problem, which I estimated to be a one-hour exercise, turned out to be an eight-hour all-night marathon; even at the end I couldn't produce something really impressive.

The main reason it took so long was the lack of good documentation and examples. The basic tutorial is pretty neat, but afterwards everything is messed up. I thought that, just as when learning any other language or tool, I should take a look at examples on the net, but it turns out there are far fewer examples than one would expect. The reference is pretty complex and the guide is not only hard to understand, but seems outdated as well. I couldn't get some things running which were stated in the documentation (I'm pretty sure I was doing what the documentation says).

But that is OK; as with any open source project, documentation and support mature over time. However, there are other things that deter you from using OpenLaszlo, namely the lack of tools. The process of running an application is horrible for the first-time user. There are no editors out there. The best I could do was use Notepad++ with HTML or XML as the language. The most annoying thing is that there is no debugger. There IS a debug console, but not a debugger.

To me, the biggest limitation was the language, the markup language. I felt that I had to type five times as much as I would in a scripting language like Python to achieve the same task. For simple UI element level stuff it seems OK, but whenever I needed to introduce some logic, I felt as if I were writing code in machine language: so much stuff to do.

For instance, take a look at this piece of code:

 <class name="box" bgcolor="red"
 height="100" width="100" />
 
 <class name="borderedbox" extends="box"
 width="${size}" height="${size}"
 onmouseover="this.changeSize(50)"
 onmouseout="this.changeSize(-50)">
 <attribute name="size" value="100"/>
 <attribute name="bordersize" value="3"/>
 <view bgcolor="yellow" x="${parent.bordersize}" y="${parent.bordersize}" width="${parent.width - parent.bordersize*2}" height="${parent.height - parent.bordersize*2}"/>
 
 <method name="changeSize" args="pixels">
         this.animate('size', pixels, 500, true);
 </method>
 </class>
 

Had it been modeled in an object-oriented scripting language instead of a markup language, it would have been much more succinct and readable. Let's see how almost the same thing could be modeled in Python:



# Hypothetical sketch: view() and somehow_embed_js() are imagined helpers, not a real API

class Box:
    height = 100
    width = 100
    bgcolor = "red"

class BorderedBox(Box):
    size = 100
    bordersize = 3

    def __init__(self):
        self.width = self.size
        self.height = self.size
        self.unnamed_view = view(bgcolor="yellow", x=self.bordersize, y=self.bordersize,
                                 width=self.width - self.bordersize * 2,
                                 height=self.height - self.bordersize * 2)
        self.onMouseOver = lambda: self.changeSize(50)
        self.onMouseOut = lambda: self.changeSize(-50)

    def changeSize(self, pixels):
        somehow_embed_js(""" this.animate('size', pixels, 500, true); """)

Of course, this approach has its own limitations and cannot achieve everything the markup can. Still, I think the trade-off would deter people from using it.

Simple and Effective Uploading with Python Script and PSCP

Sometimes you want your file transfer tool to map certain folders on your dev machine to remote folders on your different remote machines. Wouldn't it be great if you could choose a particular file in your project, click it, and have it automatically upload to the appropriate folder on the desired server?

For instance I have an XP dev machine with a project in folder "D:\workspace\saima" and another in "D:\workspace\ismspk". I would like all files in any sub-directory in "saima" to upload on my server "saima" in the appropriate sub-directory in "/home/saima/workspace/". Same goes for rest of my projects and servers. Of course I'd like to do it with a simple click rather than choosing the sessions and folders in my FTP client every time I make a change in a file.

I looked for such an option in FileZilla and WinSCP but couldn't find one (maybe one exists). So I thought about writing my own. Since I'm running XP with RSA keys set up with Pageant, pscp (the PuTTY SCP client) was a good choice, though other command-line utilities such as rsync may do the job just as well.

I wrote a small Python script which I linked to my "Send To" menu. To add something to your Send To menu, go to Run, type "sendto" and create a new link there. I created a link to my Python script and named it "Upload".

UPDATE: For Windows 7, type "shell:sendto" in your explorer bar to make a Send-To shortcut

Here is the Python script I wrote

"""
Author: Sharjeel Ahmed Qureshi
Description: Script for uploading files via pscp
You need to setup putty sessions and your RSA keys first.
Make sure that pageant is running and pscp is in your PATH variable.
Works for windows only
You are free to use this script anyway you like
"""
 
from os import path, popen
import sys
 
import logging
logger = logging
logger.basicConfig(level=logging.DEBUG)
 
FILES_TO_UPLOAD = sys.argv[1:]
 
# PATH_RULES is a config variable which is list of path rules
# Each path rule is a list of three members:
#    ['Local Drive + Directory', 'putty session name', 'remote directory']
# e.g.
# PATH_RULES = [
#    ['d:/workspace/saima/', 'saima', '/home/saima/workspace/'],
#    ['d:/workspace/smsfriends/', 'facebooksms', '/home/fbsms/public_html/www/smsfriends/'], ]
 
try:
    PATH_RULES = [
        # ['localdir', 'puttysession', 'remote_dir'],
        ]
 
    for i in PATH_RULES: i[0] = path.abspath(i[0]).lower()
    if not PATH_RULES: raise Exception("No rules defined!")
except Exception as e:
    logging.exception("There was an error loading the config. Check your rules: %s" % e)
 
def err(msg):
    logging.error(msg)
    sys.stderr.write(msg + '\n')
 
def match_server(filename):
    """ Gets matching server for a given filename including fullpath """
    global PATH_RULES
    filename = path.abspath(filename).lower()
    for r in PATH_RULES:
        if path.dirname( filename ).startswith( r[0] ):
            psession = r[1]
            r_path = r[2] + filename.rsplit(r[0],1)[1].replace('\\', '/')
            return (psession, r_path)
    return None
 
def upload(filename):
 
    if path.isdir(filename):
        err("Directory Upload is currently not supported")
        return
 
    server = match_server(filename)
    if not server:
        err("No rule available about uploading this file")
        return
    psession, r_path = server
    cmd = "pscp %s %s:%s" % (filename, psession, r_path)
    # logger.debug("Executing command %s" % cmd)
    print cmd
    print popen(cmd).read()
 
for f in FILES_TO_UPLOAD:
    try:
        upload(f)
    except:
        logging.exception("Error while uploading %s" % f)
 
raw_input("Press ENTER to Continue")


Python: Copy to clipboard

I use this small utility function for debugging certain Python scripts, especially those for processing data, e.g. moving some bits of information to Excel. The idea is pretty simple. Just pass it some information and it'll be available in your clipboard.

import sys
import win32clipboard as wc
import win32con
 
def copy_to_clipboard(msg):
   if sys.platform == 'win32':
      wc.OpenClipboard()
      wc.EmptyClipboard()
      wc.SetClipboardData(win32con.CF_TEXT, msg)
      wc.CloseClipboard()

It works only on Windows (win32), and you need the pywin32 package installed, which provides the win32clipboard and win32con modules.

Recursion, Misunderstood

My experience with Electrical Engineers in academia is that they usually view Computer Science in terms of circuits and the details of how things work at the most tangible level. One hazard of using this approach in teaching is that you often end up with some wrong core concepts of Computer Science, concepts that shape your abstract thinking at large. The worst part is that you don't even know your concepts are incorrect, yet you are very confident about them.

My undergrad CS program at FAST-NUCES was heavily dominated by dedicated and competent Electrical Engineers. That is why my classmates and I have really good C/C++ programming skills, a solid grasp of pointers, an in-depth understanding of how microprocessors work and how implementations of Operating Systems take advantage of them at the nitty-gritty level, and strong knowledge of other implementation-specific things. On the downside, I feel such students are left with some gaps on the mathematical side of Computer Science.

Perhaps the most commonly misunderstood such topic is recursion. It was a shock for me to learn, about two years after my graduation, that what I knew about recursion was quite wrong. I, along with many others, was taught (and others are still being taught) that recursion is about a function calling itself. When a function calls itself, a separate call is made, the parameters and some other info are placed on the stack, and a new 'instance' of the function takes over until it returns. If this process goes on infinitely, the program ends with an overflowed stack, so there must be a base or terminating condition. The instructors would give us assignments in which recursion had to be "removed" by explicitly storing some state information on a stack so that a new call to the same function was not made. The emphasis of these assignments was that although "recursive" code is simple, recursion has a huge overhead which should be removed in most cases.

This is not recursion! This way of looking at recursion might be OK for a low-level C/C++ programmer who wants to build his career coding micro-controllers for the rest of his life. But these concepts of recursion are certainly disastrous for a Computer Science Major who wants to truly appreciate the beauty of algorithms.

I was lucky to learn Lisp after my graduation which gave enough abstraction from low level nitty gritty controls such as memory management and pointers as well as complex syntax, to let me focus on the problem itself rather than going into syntax.

What I learnt then is that recursion is solving a problem by deferring it until its subproblem is solved, and then reconciling the subproblem's solution with a simpler computation to get the solution of the problem. The subproblem itself is solved in the same manner, until the subproblem becomes so simple that its solution is trivial. In mathematics, this is called the inductive step.

This seems much like the former view, but its implication is quite different. The second definition of recursion implies that a function calling itself may not necessarily form a recursive solution. For example, the following two implementations of factorial both call themselves, but the first one is recursive while the second one is iterative:



; Recursive
(define fact
  (lambda (n)
    (if (= n 0) 1
        (* n (fact (- n 1))))))

; Iterative
(define facti
  (lambda (n c)
    (if (= n 0) c
        (facti (- n 1) (* c n)))))

The reason facti is not recursive is that it is not deferring any computation.

The equivalent implementations of these in Python would be:

# Recursive
def fact(n):
 if n == 0: return 1
 return n * fact(n - 1)
 
# Iterative
def facti(n, c=1):
 if n == 0: return c
 else: return facti(n - 1, c * n)

Also, a function not calling itself may still be a recursive function. For example, an explicit stack storing different states to avoid function calling itself is still recursive!
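
To illustrate that last point, here is a small sketch of factorial that never calls itself but still defers work on an explicit stack, which is exactly what makes the process recursive. The name fact_stack is mine, chosen for this example.

```python
def fact_stack(n):
    """Factorial without a self-call; the deferred multiplications live on an explicit stack."""
    pending = []               # one deferred "n * ..." per subproblem
    while n > 0:
        pending.append(n)      # defer the multiplication until the subproblem is solved
        n -= 1
    result = 1                 # trivial subproblem: factorial of 0
    while pending:
        result *= pending.pop()  # reconcile each deferred step with the sub-result
    return result
```

Compare this with facti above: facti carries its whole state in its arguments and defers nothing, while fact_stack grows a list of pending work in proportion to n, just as the call stack of fact does.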

 

PS: If you still don't get what recursion is, see "Recursion, Misunderstood" by Sharjeel Ahmed Qureshi.

django standalone scripts

Standalone scripts and cron jobs are an integral part of any significantly complex web-based application. We use them in our applications for health monitoring, sending mail, computing scores and ratings, and deferred tasks and job queues such as video processing.

To make a Python script that uses Django's functionality work, you need to have the required environment variables set: your application's path in PYTHONPATH and the name of the settings module in DJANGO_SETTINGS_MODULE. If you are using Unix, you can set these in your ~/.bash_profile so that each time you log in, the environment variables are already set.

For instance, if your project's name is myproj, then add these lines (with appropriate changes) to your ~/.bash_profile file:


export PYTHONPATH="/home/path/to/myproj/"
export DJANGO_SETTINGS_MODULE="myproj.settings"

Now you can write and execute a python script which imports models and views of myproj.
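
If you would rather not depend on the shell profile, the same two settings can be applied at the top of the script itself. The following is only a sketch under that assumption: the helper name configure_django_env is made up, and the path and module name are the placeholders used above.

```python
import os
import sys


def configure_django_env(project_path, settings_module):
    """Do from inside the script what the two export lines do in the shell."""
    if project_path not in sys.path:
        sys.path.insert(0, project_path)  # same effect as adding it to PYTHONPATH
    # setdefault keeps a value that was already exported in the environment
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", settings_module)


configure_django_env("/home/path/to/myproj/", "myproj.settings")

# Imports of your models and views go below this point.
# Note: Django 1.7 and later also require calling django.setup() before importing models.
```

This keeps the script runnable from any account or cron entry without relying on what the login shell happened to export.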

CronJobs using the django models or views also need these environment variables which can be set at the top of the crontab file. Open the crontab file using this command on shell:


$ crontab -e # Edit the crontab file

Put these lines at the top to pass the required environment variables to any subsequent cron jobs:


PYTHONPATH=/home/path/to/myproj/
DJANGO_SETTINGS_MODULE=myproj.settings

If you use Webmin for scheduling cron jobs, you can create environment variables from the menu.

Django Subdomains

We have added a new feature at See'n'Report. It now provides each user with a personal sub-domain URL. The URL maintains user's profile which lists all the photo reports submitted by the user. For example my See'n'Report profile URL is http://sharjeel.seenreport.com/ .

Adding support for sub-domains in Django is simple but has a few catches. It took me quite some time to get everything working.

The following links provide a quick way to make sub-domains:

rossp.org - Using Subdomains with Django
Django Ticket #5022 - Proposed middleware: SubdomainURLsMiddleware

The summary of the above links is that you have to put a wildcard entry in your domain's DNS so that all your subdomains resolve to the IP on which your site is hosted. Then make sure Apache handles all the requests to subdomains, again using a wildcard in Apache's configuration. Then write a middleware which checks the subdomain in the request's HTTP Host header and processes it accordingly, by either loading a customized URL pattern or handling the request with a view containing the appropriate logic.
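
The core of such a middleware is pulling the subdomain out of the Host header. Here is a minimal, framework-independent sketch of that step; the function name extract_subdomain and the choice to treat "www" as no subdomain are my assumptions, not something taken from the linked articles.

```python
def extract_subdomain(host, base_domain):
    """Return the subdomain part of an HTTP Host header value, or None if there is none."""
    hostname = host.split(":", 1)[0].lower().rstrip(".")  # drop any port and trailing dot
    suffix = "." + base_domain.lower().lstrip(".")
    if not hostname.endswith(suffix):
        return None                     # bare domain or a different domain altogether
    sub = hostname[: -len(suffix)]
    if sub in ("", "www"):
        return None                     # main site, not a user subdomain
    return sub


# Example: a request for a user's profile subdomain
# extract_subdomain("sharjeel.seenreport.com", "seenreport.com") gives "sharjeel"
```

A middleware would call this with the request's Host header and then either switch URL patterns or dispatch to the profile view when a subdomain is returned.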

This worked well for me and I was quickly able to make the changes so that http://www.seenreport.com/user/sharjeel was available at http://sharjeel.seenreport.com/.

There were a few little problems:

Firstly, sessions did not work across the subdomains. If I were logged in at www.seenreport.com, I wouldn't be logged in at sharjeel.seenreport.com. This was solved by setting the SESSION_COOKIE_DOMAIN variable in settings.py:

 SESSION_COOKIE_DOMAIN = '.seenreport.com' if not DEBUG else '.localhost'

Secondly, all the main navigation links (login, register etc.) which were written as relative links were now pointing to sharjeel.seenreport.com instead of www.seenreport.com. For instance, the login link on my profile page became sharjeel.seenreport.com/login instead of www.seenreport.com/login.

I could have hardcoded "www.seenreport.com" with these links to make them absolute URLs but that would have been bad in terms of maintainability. It requires an if-else logic for each link so that it renders accordingly on dev/test machines and the production machine.

I used a <base> tag with href="http://www.seenreport.com/" (for production, and localhost for dev) to make the navigation links relative to the main domain. This also changed the profile links. However, in our case profile links are very few, so adding hardcoded absolute URLs along with some {% if %} {% else %} logic is viable.

With base href, everything worked well except one thing: Ajax calls stopped working. When using the <base> tag, the URLs of your Ajax calls become relative to its href. In this case, when an Ajax call is made, a domain other than your current one is contacted, which is disallowed by the browser, and the following exception is raised:

"Access to restricted URI denied" code: "1012"

To overcome this, I prepended the following to each Ajax call's URL:

location.protocol + '//' + location.host + original_url ...
 

This made all the ajax calls relative to the current subdomain and made everything work perfectly. Took me quite some time to figure out :)

What Lisp & Assembly instill

My Assembly Programming Language teacher, Belal Hashmi Sahib, said in the first lecture of the course, “Assembly is extremely simple. It is so simple that students don’t expect such simplicity and hence it starts appearing complex to them”. Then he wrote MOV AX, BX on the board, explained it and asked if everyone understood it. Everybody nodded. He asked again to make sure there was not even the slightest bit of confusion. It seemed very simple and trivial to everyone. He proceeded by saying “This is the end of the assembly language programming course. The rest of the course is just the revision of MOV AX, BX”. Everybody thought it was a joke, but later it turned out that assembly indeed ended at MOV AX, BX. The rest of the course was about learning other concepts to fully utilize the potential of MOV AX, BX.

Last year I learned Lisp on my own and then TAed a course on Lisp at LUMS with Dr. Umar Saif. In the first lecture the first thing Dr. Umar taught was (lambda (x) (+ x x)). At this moment I realized that just like Assembly, Lisp had finished! Because if students had understood it, there was nothing much about the language left. All they needed now was to understand some important concepts such as recursion, closures etc. to use Lisp.

What is common between Assembly and Lisp? Both languages are very simple to learn, as they have little syntax and few fancy language features. However, the process of learning them compels one to grasp some important concepts. For instance, by learning assembly you are bound to understand some flavor of computer architecture, interrupts, timers and other hardware-related stuff. Similarly, by learning Lisp you automatically get to know functional programming, interpretation, recursion, closures etc.

A programmer familiar with assembly can easily understand the use of unions, options such as __decl and other such features of C because he knows the underlying magic. Similarly if one wants to understand the philosophy of the features of Javascript, Python & Ruby I strongly recommend learning Lisp.