In Part 1, we talked about the problems in trying to hot reload a CH module.

Before talking about our progress, I would like to mention a few problems in the plugins package. The plugins package is currently not well maintained and does not work correctly with all recent versions of GHC. In particular, module loading does not work on 64-bit architectures when a third-party module is given as input. load requires dependency information in order to load all the dependent modules in the correct order. This information is parsed from the .hi files created when a Haskell module is compiled. I am not sure why it is unable to resolve objects on 64-bit architectures; maybe the .hi parsing code is broken there. So, for now, it will probably only work on 32-bit architectures. I am testing the code on a 32-bit Vagrant VM. Let me know if you want the VM image for testing.

Also, in ghc-7.6, unload does not actually unload the object code. It has been fixed in ghc-7.8 by Simon Marlow for static builds.

In my previous post, I talked about the problem of having to do an unload before a reload, because otherwise the reload won't really load anything. In ghc-7.6, an unload does not unload the object code anyway. It simply removes the module from the list of currently loaded modules. load checks whether a module is already loaded by looking at that list and does not reload it if it is already there. So, one way to avoid changing the type of DynamicT to include a Module argument (which was a problem) is to simply call unload m just after we load it. The next call to load will then work, the current load is still valid because unload didn't really unload the object code, and we don't need to explicitly pass the module to reboot.
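To make the bookkeeping argument concrete, here is a toy model of it using only base. It is not the plugins implementation: toyLoad, toyUnload and the Loaded list are hypothetical stand-ins that only mimic the behaviour described above (load skips modules already in the list, unload only removes the list entry).

```haskell
import Data.IORef

-- Hypothetical stand-in for the loader's list of loaded modules.
type Loaded = IORef [FilePath]

-- Returns True if the module was really loaded, False if the call was
-- skipped because the module is already in the list.
toyLoad :: Loaded -> FilePath -> IO Bool
toyLoad ref m = do
  ms <- readIORef ref
  if m `elem` ms
    then return False
    else writeIORef ref (m : ms) >> return True

-- Only removes the list entry, like unload in ghc-7.6 as described above.
toyUnload :: Loaded -> FilePath -> IO ()
toyUnload ref m = modifyIORef ref (filter (/= m))

demo :: IO [Bool]
demo = do
  ref <- newIORef []
  a <- toyLoad ref "PingPong.o"   -- first load: really loads
  b <- toyLoad ref "PingPong.o"   -- second load: skipped
  toyUnload ref "PingPong.o"      -- drop the entry right away
  c <- toyLoad ref "PingPong.o"   -- now the load happens again
  return [a, b, c]
```

Running demo gives [True,False,True]: the second load is a no-op, and removing the entry is enough to make the next load go through.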

Now it works, except that it still executes the old version. The reboot function loads the new version of the code and simply returns. The return goes back to the old code (still in memory). This means that load can handle multiple versions of a module in memory. But instead of returning to the old version of the code, we want to start running the new version.

If you look at the type of load,

load :: FilePath->[FilePath]->[PackageConf]->Symbol->IO (LoadStatus a)

you can pass a symbol and when you evaluate the value associated with that symbol, you evaluate the code of that symbol (for example main).

One way to approach the problem would be the following. Instead of reboot just loading a new version of code and returning, we don't let reboot return. Instead we evaluate the new symbol.

If the symbol is main, we basically start from the beginning of the new version (main), and we can short-circuit all initialization if it is a reboot instead of a boot. But we would need to pass along all the state of the processes, including hidden state such as the mailbox (CQueue), which would require lots of changes to the existing CH infrastructure.

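If the symbol is instead "server", the process that we are upgrading, it should not type check, because the type of reboot is IO () and the type of server is (a -> Process ()). But it does not give any type error, and it runs perfectly with the old version of the code, as if the call to server at the end of reboot had failed silently. The missing type error is probably because load returns a LoadStatus a for whatever type a the caller asks for, so the compiler has nothing to check the loaded symbol against. I am not sure what is happening at run time.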

If instead of evaluating server inside reboot, we could return the value to server and let the server call the new server, that would be ideal just like in erlang. But this does not look possible.

The type of reboot would be something like

reboot :: IO (Maybe a)

where a = DynamicT -> Process ()

The type of DynamicT is the same as the type of reboot. So, the type of DynamicT becomes

type DynamicT = IO (Maybe (DynamicT -> Process ()) )

This gives a "Cycle in type synonym declarations" type error. Even if it type checked, calling the new server value from the old server code might not magically lead to sharing of process state between the old and new "server".
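The cycle error itself can be avoided, because a newtype (unlike a type synonym) is allowed to refer to itself. Below is a minimal sketch of that idea using only base. It makes some assumptions: Process is replaced by plain IO so that the example runs without distributed-process, and oldServer, newServer and runReboot are hypothetical names. It only shows that the types work out; it says nothing about sharing the mailbox or other process state.

```haskell
import Data.IORef

-- Assumption: stand-in for Control.Distributed.Process.Process, so that
-- the sketch runs with base only.
type Process = IO

-- A newtype may be recursive, which a type synonym may not.
newtype DynamicT = DynamicT
  { runReboot :: IO (Maybe (DynamicT -> Process ())) }

-- Hypothetical "old" server: asks for a new version and, if one is
-- returned, hands control to it instead of looping in the old code.
oldServer :: IORef [String] -> DynamicT -> Process ()
oldServer logRef st = do
  modifyIORef logRef ("old server running" :)
  next <- runReboot st
  case next of
    Just newSrv -> newSrv st
    Nothing     -> modifyIORef logRef ("no upgrade available" :)

-- Hypothetical "new" server, standing in for freshly loaded code.
newServer :: IORef [String] -> DynamicT -> Process ()
newServer logRef _ = modifyIORef logRef ("new server running" :)

-- Runs the old server with a reboot action that always yields newServer.
demo :: IO [String]
demo = do
  logRef <- newIORef []
  let st = DynamicT (return (Just (newServer logRef)))
  oldServer logRef st
  reverse <$> readIORef logRef
```

Here demo returns ["old server running","new server running"], so the old code can receive the new server as a value and call it, at least as far as the type checker is concerned.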

Is there any hope to achieve erlang style ?MODULE:server call to simply switch to the new version in memory flawlessly (with access to old mailbox which is decoupled from the process in erlang) in Cloud Haskell? For now, it does not look like it would be possible without fundamentally ripping apart the Process state and radically changing the CH code.

We want to achieve the same message passing reliability as Erlang during a code upgrade. In an earlier post, we showed a simple test to check if messages get lost in the ether while an upgrade is taking place.

Ideally, Cloud Haskell should also not lose messages while it upgrades our code from one version to the next. In the worst case, we could make do without this requirement, since messages can get lost anyway due to physical disconnects, and the supervisor mechanism along with fault tolerance techniques like link and monitor can be used to make sure that faults get isolated. In a way, we can defer the problem to the unreliability of the physical world while an upgrade is in progress.

One important issue in Haskell is that we cannot easily have multiple versions of the same module in memory using the plugins package. It might require GHC runtime support, but not much research has been done in that direction. Even without this support, we could still have multiple processes running on a single node, and all of them might get upgraded together since they all refer to the same module in memory. Or some processes might crash. The best way to know is by trying it out.

Below is a simple implementation of Ping Pong system in cloud haskell. Instead of a special message type for upgrade, we simply trigger an upgrade when we receive a particular integer.

{-# LANGUAGE DeriveDataTypeable,ScopedTypeVariables #-}
module PingPong where
import Control.Concurrent ( threadDelay )
import Control.Distributed.Process
import Control.Distributed.Process.Node
import Network.Transport ( closeTransport )
import Network.Transport.TCP

server :: DynamicT -> Process ()
server st = do
    (cid,x) :: (ProcessId,Int) <- expect
    liftIO $ putStrLn $ "Got  a Ping with value : " ++ (show x)
    case x of
      4 -> do
        liftIO $ putStrLn $ "UPGRADE"
        liftIO $ st
      _ -> do
        send cid x
        liftIO $ putStrLn $ "Sent a Pong with value : " ++ (show x)
        server st

client :: DynamicT -> Int -> ProcessId -> Process ()
client st 10 sid = do
  liftIO $ putStrLn "DONE"
client st c sid = do
  me <- getSelfPid
  send sid (me,c)
  liftIO $ putStrLn $ "Sent a Ping with value : " ++ (show c)
  (v :: Int) <- expect
  liftIO $ putStrLn $ "Got  a Pong with value : " ++ (show v)
  client st (c+1) sid

ignition :: DynamicT -> Process ()
ignition st = do
    -- start the server
    sid <- spawnLocal $ server st
    -- start the client (its ProcessId is not needed here)
    _ <- spawnLocal $ client st 0 sid
    -- wait a while so the spawned processes get a chance to run
    liftIO $ threadDelay 100000

type DynamicT = IO ()

main :: DynamicT -> IO ()
main st = do
    Right transport <- createTransport "127.0.0.1" "8080"
                            defaultTCPParameters
    node <- newLocalNode transport initRemoteTable
    runProcess node $ ignition st
    closeTransport transport

Here is the static core, which is responsible for running and reloading our dynamic application. The code in the static core is pretty trivial boilerplate. It simply compiles and loads the PingPong module and evaluates main of PingPong, passing it an argument of type DynamicT which contains everything that is required for dynamic reloading. In this example, we are not concerned with state preservation or state migration, so we only pass the reboot function to the dynamic core.

import Control.Monad ( forever )
import System.Plugins

-- Recompile and reload PingPong, then run its main again.
-- This is the value that is handed to the dynamic module.
reboot :: IO ()
reboot = forever $ do
  putStrLn "Loading"
  r <- makeAll "PingPong.hs" []
  case r of
    MakeSuccess mc fp -> do
      mv <- load fp [] [] "main"
      putStrLn $ show $ mc
      putStrLn "Loaded"
      case mv of
        LoadFailure msgs -> putStrLn "fail" >> print msgs
        LoadSuccess m v -> do
          putStrLn "success"
          -- v is PingPong.main, used at type DynamicT -> IO ()
          v reboot
          unloadAll m
    MakeFailure msgs -> putStrLn "failed to make" >> print msgs
  putStrLn "Press y to reload"
  getChar

-- Must match the DynamicT declared in PingPong.hs.
type DynamicT = IO ()

main :: IO ()
main = do
  putStrLn "Loading"
  r <- makeAll "PingPong.hs" []
  case r of
    MakeFailure msgs -> putStrLn "failed to make" >> print msgs
    MakeSuccess mc fp -> do
      mv <- load fp [] [] "main"
      putStrLn $ show $ mc
      putStrLn "Loaded"
      case mv of
        LoadFailure msgs -> putStrLn "fail" >> print msgs
        LoadSuccess m v -> do
          putStrLn "success"
          _ <- v reboot
          _ <- getChar
          return ()

Why did we define reboot in the static module? Why can't we simply define a reboot function in the dynamic module?

The reboot function requires load, unload and make from the System.Plugins modules, which would need to be imported in the dynamic module. This does not work in practice, which means we can't import plugins in the dynamic module. Why not? The plugins module contains the code for dynamic reloading. If it is also imported in the dynamic module, the static core will try to recursively load and unload the plugins module as well. If the code for the plugins module is unloaded, then dynamic reloading cannot work. So, it does not make sense to try to load/unload plugins as a module in the dynamic module.

We cannot write our reboot function in the dynamic module. So, the only other place we can write it is in the static module. But this reboot function must be executed inside the process which wants to upgrade. So, what do we do? We pass this reboot function as a value to all processes which might need to upgrade themselves.

So, all is well and good and the code above should work right? Not so fast bro! Can you spot the problem in the code? Take your time!

Well, the first subtle bug is that the reboot function is incorrect. The dynamic module must be unloaded before a load, otherwise the old version stays in memory. Basically, load doesn't really load if the module already exists in memory, even though the version in memory is old. We must unload m (which basically removes the entry saying that m is loaded) before the load. To do that, we need to know which module m to unload. And that information is only available after a load has occurred! So, we tried doing a useless load just to get hold of the module m, then unloading m, and then doing the final real load, which should work perfectly. It does not work in practice! The hack is also very ugly, since we don't want to load twice. The ideal solution would be to pass the module that we need to unload as an argument to reboot. So the type of reboot then becomes

reboot :: Module -> IO ()

But who knows which module needs to be reloaded? Only the static core knows it (from booting up the dynamic module). But reboot needs to be called by the dynamic module. So, the static core should communicate this value of m to the dynamic module. How? It can simply send this value while calling the main of the dynamic module.

The type of the argument passed to main of the dynamic module becomes:

type DynamicT = (Module, Module -> IO ())

But the type of DynamicT also needs to be declared in the dynamic module. Declaring the new type of DynamicT in the dynamic module is not possible, since that would require importing the plugins module (since it exports the Module constructor). We cannot import the plugins module in the dynamic module without doing some hackery, as discussed above. One possible way to go about it might be to change the unload function into a no-op in case it is asked to unload the plugins package. That would allow importing the plugins package in the dynamic module.

What other reasons might be against loading the plugins module in dynamic module? Is there some other way to do an unload before a load in the reboot function?
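One possible direction, which I have not tried against the real plugins package, is to let the static core partially apply reboot to the module it just loaded. The dynamic module then still only sees an IO () and never needs the Module type. The sketch below uses toy stand-ins only: Module, toyLoad, toyUnload, World and rebootWith are all hypothetical names, and the "dynamic main" asks for exactly one upgrade so that the example terminates.

```haskell
import Data.IORef

-- Toy stand-in (assumption) for the plugins Module type: a generation number.
newtype Module = Module Int

-- What the dynamic module sees: just an action, no Module in the type.
type DynamicT = IO ()

-- Hypothetical world state: next generation to "load" and an event log.
data World = World { nextGen :: IORef Int, events :: IORef [String] }

-- Toy loader: every call yields a fresh Module and a fresh "main".
toyLoad :: World -> IO (Module, DynamicT -> IO ())
toyLoad w = do
  g <- readIORef (nextGen w)
  writeIORef (nextGen w) (g + 1)
  let dynMain st = do
        modifyIORef (events w) (("main v" ++ show g) :)
        if g < 2 then st else return ()   -- version 1 asks for one upgrade
  return (Module g, dynMain)

-- Toy unload: only records which generation was unloaded.
toyUnload :: World -> Module -> IO ()
toyUnload w (Module g) = modifyIORef (events w) (("unload v" ++ show g) :)

-- reboot closed over the module to unload before loading the next one.
rebootWith :: World -> Module -> DynamicT
rebootWith w old = do
  toyUnload w old
  (m, dynMain) <- toyLoad w
  dynMain (rebootWith w m)

-- The static core: first load, then hand over a reboot that knows m.
boot :: World -> IO ()
boot w = do
  (m, dynMain) <- toyLoad w
  dynMain (rebootWith w m)

demo :: IO [String]
demo = do
  w <- World <$> newIORef 1 <*> newIORef []
  boot w
  reverse <$> readIORef (events w)
```

In this toy model demo returns ["main v1","unload v1","main v2"], so the unload does happen before the second load. Whether unloading a module while its code is still on the call stack is safe with the real plugins package is a separate question that this sketch does not answer.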

Once the reboot function is fixed, we have a whole host of other problems, such as:

  • properly returning to the correct location in the new version of code
  • making sure that message queues don't get wiped out when the upgrade occurs
  • making sure that threads responsible for managing the queue don't get killed and continue to receive messages from the network buffers
  • making sure that upgrading one process does not crash other processes running on the same node
  • managing the state problem in case the state of processes needs to be preserved across upgrades

Currently, we need answers to the following questions before we can look at fixing the above problems:

  • What happens to message queues and other state created by processes during an upgrade?
  • What happens to the threads which are running the processes themselves during an upgrade? Are the processes (threads) killed? If not, will they continue to work without problems after a reboot?

Thoughts and comments please!

UPDATE : The next post in the series is up here.

We can spawn multiple processes on a single node. Both erlang and cloud-haskell support this feature.

Can we force a process to upgrade at any arbitrary point in time?

Think about a scenario where a process is writing to a file/socket or holding a lock on a shared resource (for example, printing a file). We would not like the process to upgrade abruptly.

So, how do we upgrade a process?

We send an upgrade message to the process. When the process is ready, it can pick the message from its mailbox and upgrade safely. Processes can now upgrade at different points in time. Imagine multiple processes running on a single node. Each process can be running a different version of a module. We cannot keep an infinite number of versions of a module in memory. Erlang only supports 2 simultaneous versions per module. When a third version is loaded into memory, the oldest version is removed and processes running the oldest version simply crash. This improves reliability in the sense that a single upgrade does not crash other processes on the node which are still running the previous version.
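The "upgrade when ready" idea can be sketched with base only. This is not Cloud Haskell code: the mailbox is modelled by a Chan, the two "versions" are just two handler functions, and Msg, serve and demo are hypothetical names. The point is only that the upgrade is an ordinary message, so it is handled between two pings, and pings queued behind it are still served.

```haskell
import Control.Concurrent.Chan
import Control.Monad (replicateM)

-- Hypothetical message type: the upgrade request is just another message.
data Msg = Ping Int | Upgrade | Stop

-- Server loop parameterised by the current handler "version".
serve :: (Int -> Int) -> Chan Msg -> Chan Int -> IO ()
serve handler inbox outbox = do
  msg <- readChan inbox
  case msg of
    Ping x  -> writeChan outbox (handler x) >> serve handler inbox outbox
    -- Safe point: no ping is half-processed here, so switch versions.
    Upgrade -> serve (* 10) inbox outbox
    Stop    -> return ()

demo :: IO [Int]
demo = do
  inbox  <- newChan
  outbox <- newChan
  -- Two pings before the upgrade request and one queued behind it.
  mapM_ (writeChan inbox) [Ping 1, Ping 2, Upgrade, Ping 3, Stop]
  serve id inbox outbox
  replicateM 3 (readChan outbox)
```

demo returns [1,2,30]: the first two pings are answered by the old handler, and the ping that was already waiting in the mailbox behind the upgrade message is answered by the new one.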

Keeping 2 versions of a module in memory is therefore a matter of enhancing reliability. Even if we have only one version in memory, it is still not too bad. The only caveat is that other local processes might crash.

This blog will contain my thoughts on how to achieve hot code reloading in Cloud Haskell. Nothing too rigorous! Just my thoughts on paper.

One of the problems that I am currently dealing with in my thesis is the following:

How to make sure that the messages arriving during an upgrade don't get dropped?

First, we have to find out what happens in Erlang. According to the Erlang manual, Erlang keeps 2 versions of a module in memory, i.e. old and current. When a module is upgraded,

  • the old version is discarded
  • processes still running the old version crash (or maybe are killed)
  • the current version is marked as old
  • the new version becomes current

Fully qualified function calls of the form ?MODULE:foo() always refer to the current version. Non-qualified function calls such as foo() refer to the version in which they were invoked.

So, code reloading boils down to simply using a fully qualified function call!
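As a rough analogy in Haskell (and only an analogy, not how Erlang or GHC implements anything), the difference between the two kinds of call is like the difference between a function value that was captured once and a function that is looked up through a mutable reference at every call. Current, callCaptured and callCurrent below are hypothetical names.

```haskell
import Data.IORef

-- Hypothetical registry holding the "current" version of a function.
type Current = IORef (Int -> String)

-- Like a non-qualified call: uses the version that was captured earlier.
callCaptured :: (Int -> String) -> Int -> IO String
callCaptured f x = return (f x)

-- Like a fully qualified call: looks up the current version at call time.
callCurrent :: Current -> Int -> IO String
callCurrent ref x = do
  f <- readIORef ref
  return (f x)

demo :: IO [String]
demo = do
  ref <- newIORef (\x -> "v1:" ++ show x)
  captured <- readIORef ref                 -- grab version 1
  writeIORef ref (\x -> "v2:" ++ show x)    -- "load" version 2
  a <- callCaptured captured 1
  b <- callCurrent ref 1
  return [a, b]
```

demo returns ["v1:1","v2:1"]: the captured call keeps running version 1, while the looked-up call switches to version 2 as soon as it is installed.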

Coming back to the original question: what happens to the messages in transit during an upgrade? Below is my attempt at figuring out the answer.

-module('pingpong').
-compile(export_all).
-import(timer,[sleep/1]).


start(N) ->
    Server = spawn(?MODULE,server,[]),
    _ = spawn(?MODULE,client,[Server,N,0]).

server() ->
    receive
        upgrade ->
            compile:file(?MODULE),
%%            sys:suspend(?MODULE),
            code:purge(?MODULE),
            sleep(1000),
            code:load_file(?MODULE),
%%            sys:resume(?MODULE),
            ?MODULE:server();
        {ping,Cid} ->
            sleep(1000),
%%            io:format("New version Running!~n"),
%%            io:format("Received a PING!~n"),
            Cid ! pong,
%%            io:format("Sent        a PONG!~n"),
            server()
    end.

client(_,0,C) ->
    io:format("DONE!~n"),
    io:format("Received ~p PONGS!~n",[C]);
client(Server,5,C) ->
    io:format("Sending an upgrade message!~n"),
    Server ! upgrade,
    From = self(),
    Server ! {ping,From},
    io:format("Sent          a PING!~n"),
    receive
        pong ->
            io:format("Received a PONG!~n"),
            client(Server,5-1,C+1)
    end;
client(Server,N,C) ->
    sleep(1000),
    From = self(),
    Server ! {ping,From},
    io:format("Sent          a PING!~n"),
    receive
        pong ->
            io:format("Received a PONG!~n"),
            client(Server,N-1,C+1)
    end.

You can try changing the code before an upgrade message is received and see for real that hot-code reloading indeed works! Also, since the client is sending 10 pings, it must receive 10 pongs. Does this guarantee that no pings are lost?

Here is a demo in my terminal showing the execution of the above program. During the execution, before the 5th ping, I uncomment the io:format("New version Running!~n") line in the {ping,Cid} branch of server, so that the new version running message is displayed after an upgrade has occurred.

Compared to Haskell, Erlang has a VM, which is a huge advantage since it controls the execution and keeps multiple versions of modules in memory. Besides, it can make sure that some calls go to the current module and some go to the old module. Also, the mailbox of each process is preserved during upgrades, as it is managed separately by the VM.

How to achieve all this in Cloud Haskell? Why do we need to have multiple versions of modules in memory? Think about it! I will answer this in my next post.