Setting up an AWS EC2 node as a Jenkins slave to a master hosted locally

Set up passwordless SSH authentication for <<user>> on the EC2 node

  1. Log in to the AWS EC2 node
  2. su to the user that will run the Jenkins slave
  3. Create a key pair, e.g. ssh-keygen -t rsa
  4. Append the public part of the key pair to ~/.ssh/authorized_keys, e.g. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  5. Copy the private part of the key pair into a local file named <<user>>.pem
  6. chmod 400 <<user>>.pem
  7. Test SSH to the EC2 node as <<user>> from the local machine
    1. ssh -i <<user>>.pem <<user>>@ec2-node

Add the EC2 node as a slave node in Jenkins

  1. scp the pem file to the Jenkins master server
  2. Log in to the Jenkins UI as admin
  3. Go to Jenkins → Manage Jenkins → Manage Nodes → New Node
  4. Provide /home/<<user>> as the remote FS root
  5. Click the Advanced button
  6. Provide the hostname and the user name
  7. Leave the password blank
  8. Specify the absolute path to the pem file for <<user>>
  9. Save and launch the slave

Configure Jenkins jobs to run on the EC2 slave node

  1. Go to the Jenkins job you want to run remotely
  2. Check “Restrict where this project can be run”
  3. Specify the slave name as the “Label Expression”
  4. Save the job and build.

Measuring performance and understanding user navigation through JavaScript

Most browsers now natively support measuring performance and understanding user navigation through their implementations of the W3C “Navigation Timing” spec. No additional library is needed.

The code snippets below work on both IE (tested on 11.x) and Chrome (tested on 40.x).

Measure page load time.

// Run after the load event has fired so loadEventEnd is populated.
var perfData = window.performance.timing;
var pageLoadTime = perfData.loadEventEnd - perfData.navigationStart;

Measure request response time.

// Time from the start of the request to the last byte of the response.
var connectTime = perfData.responseEnd - perfData.requestStart;

Several other metrics can be captured directly from the window.performance object. For example, window.performance.memory (a non-standard Chrome extension) returns information about the memory consumption of the page, and window.performance.navigation.type tells whether a page load was triggered by a redirect, the back/forward button, or a normal URL load.
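For example, a quick check of how the user reached the current page (a minimal sketch; the console logging is illustrative only):

// PerformanceNavigation exposes constants describing the navigation type.
var nav = window.performance.navigation;
if (nav.type === nav.TYPE_RELOAD) {
  console.log('Page was reloaded');
} else if (nav.type === nav.TYPE_BACK_FORWARD) {
  console.log('Arrived via the back/forward button');
} else if (nav.type === nav.TYPE_NAVIGATE) {
  console.log('Normal navigation (link, bookmark or typed URL)');
}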

Measure Ajax Requests

// Render the response and measure the full span from XHR start to render end.
app.render = function (content) {
  myEl.innerHTML = content;
  window.performance.mark('end_render');
  window.performance.measure('measure_render', 'start_xhr', 'end_render');
};

var req = new XMLHttpRequest();
req.open('GET', url, true);
req.onload = function () {
  // Measure just the XHR round trip, then hand off to rendering.
  window.performance.mark('end_xhr');
  window.performance.measure('measure_xhr', 'start_xhr', 'end_xhr');
  app.render(req.responseText);
};
window.performance.mark('start_xhr');
req.send();

Measure Custom events

This is very similar to measuring Ajax requests: wrap the event with mark() calls and compute its duration with measure().
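For example, to time a hypothetical renderChart() call (the function and event names here are illustrative):

// Mark the boundaries of the custom event...
window.performance.mark('start_render_chart');
renderChart();
window.performance.mark('end_render_chart');
// ...and record the duration between the two marks.
window.performance.measure('measure_render_chart', 'start_render_chart', 'end_render_chart');

// The measurement can be read back from the performance entry buffer.
var entry = window.performance.getEntriesByName('measure_render_chart')[0];
console.log(entry.duration);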


Graph processing in query languages using UDAFs – Part II

In the previous post I attempted to make the case that it is feasible to perform some graph processing (specifically, finding the connected components of a graph given its edges) in query languages using user-defined aggregation functions (UDAFs).

The approach relied on going over the edges one by one and continuously building up components. The components were built as <Key, Value> pairs, where the Key was a node and the Value was the array of nodes constituting that node’s component.

For example, for a dataset like this:

N1,N3
N1,N5
N2,N1
N2,N6
N7,N8
N10,N11
N20,N4
N9,N4

Below would be the result:

{"N9":["N20","N4","N9"],"N5":["N1","N3","N5","N2","N6"],"N6":["N1","N3","N5","N2","N6"],"N7":["N7","N8"],"N8":["N7","N8"],"N1":["N1","N3","N5","N2","N6"],"N2":["N1","N3","N5","N2","N6"],"N3":["N1","N3","N5","N2","N6"],"N20":["N20","N4","N9"],"N4":["N20","N4","N9"],"N10":["N10","N11"],"N11":["N10","N11"]}

The above approach gives results in a map structure that is easy to join with other tables.
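A minimal sketch of this edge-by-edge merge (shown in JavaScript for brevity; the helper name addEdge is illustrative, and the original implementation was a query-language UDAF):

// Merge the components of nodes a and b, pointing every member at the
// merged array. Re-pointing these ever-growing arrays is what makes the
// merge step expensive as components get large.
function addEdge(components, a, b) {
  var compA = components[a] || [a];
  var compB = components[b] || [b];
  if (compA === compB) return; // already in the same component
  var merged = compA.concat(compB.filter(function (n) {
    return compA.indexOf(n) < 0;
  }));
  merged.forEach(function (n) { components[n] = merged; });
}

var components = {};
[['N1','N3'],['N1','N5'],['N2','N1'],['N2','N6'],
 ['N7','N8'],['N10','N11'],['N20','N4'],['N9','N4']]
  .forEach(function (e) { addEdge(components, e[0], e[1]); });
console.log(JSON.stringify(components['N2'])); // ["N2","N1","N3","N5","N6"]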

As we noted in the earlier post, this approach can need considerable compute power as the number of edges increases: more edges mean larger components, and re-merging these ever-growing arrays gives the merge method O(n^3) complexity.

We also noted in the earlier post that a better approach would be to implement the Weighted Quick Union with Path Compression (WQUPC) algorithm, which has several optimizations on top of the map approach. Some of them include:

    1. Building trees to represent clusters.
    2. Reducing the depth of individual trees as the algorithm progresses through the edges.
    3. Using integer arrays instead of heavier data structures.
    4. Not repeating clusters.

NOTE: this approach needs at least one array as large as the number of nodes in the graph.

There is one challenge in implementing the above approach in a horizontally scaled cluster setup (e.g., the Hadoop/Hive ecosystem): sets of edges are processed in parallel, and the partial results need to be merged.

Below is an attempt at implementing WQUPC in Hive as a UDAF.
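The core logic such a UDAF implements looks roughly like this (a minimal sketch in JavaScript for brevity; the UnionFind name and the label-to-index mapping are illustrative, and an actual Hive UDAF would be written in Java, with its merge step unioning the partial parent arrays produced in parallel):

// Weighted quick-union with path compression over node ids 0..n-1.
function UnionFind(n) {
  this.parent = new Array(n);
  this.size = new Array(n);
  for (var i = 0; i < n; i++) { this.parent[i] = i; this.size[i] = 1; }
}

// Find the root of p, halving the path along the way (path compression).
UnionFind.prototype.find = function (p) {
  while (p !== this.parent[p]) {
    this.parent[p] = this.parent[this.parent[p]];
    p = this.parent[p];
  }
  return p;
};

// Attach the smaller tree under the larger one (weighting).
UnionFind.prototype.union = function (p, q) {
  var rootP = this.find(p), rootQ = this.find(q);
  if (rootP === rootQ) return;
  if (this.size[rootP] < this.size[rootQ]) {
    this.parent[rootP] = rootQ; this.size[rootQ] += this.size[rootP];
  } else {
    this.parent[rootQ] = rootP; this.size[rootP] += this.size[rootQ];
  }
};

// The edges above, with each label Nk mapped to the integer k.
var uf = new UnionFind(21);
[[1,3],[1,5],[2,1],[2,6],[7,8],[10,11],[20,4],[9,4]]
  .forEach(function (e) { uf.union(e[0], e[1]); });
console.log(uf.find(9) === uf.find(20)); // true: N9 and N20 share a cluster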

Below is the result, in the form <Node, Root of the cluster tree>:

N9 N4
N8 N8
N7 N8
N6 N5
N5 N5
N4 N4
N3 N5
N20 N4
N2 N5
N11 N11
N10 N11
N1 N5

This approach is very close to O(m log n) complexity, where m is the number of edges and n is the number of nodes.

It easily scales to tens of millions of edges.

Gitlab – Bitnami – An Awesome combination

GitLab Community Edition is fantastic software for code hosting, and Bitnami offers a very nice self-contained GitLab distribution.



VoltDB – Cheat Sheet

  1. Sequences in VoltDB – https://up-the-voltage.runkite.com/generating-ids-in-voltdb/
  2. Dual and Connection keep alive query – select 1 from dual – https://issues.voltdb.com/browse/ENG-5687
  3. Variable length encoding schemes – define varchar using character length instead of byte size – https://issues.voltdb.com/browse/SUP-158
  4. Install VoltDB client jar into local maven repository – http://dheerajvoltdb.wordpress.com/2013/09/03/installing-voltdb-in-local-maven-repository/

NULL in SQL operations in Netezza

The equivalent of Oracle’s Dual table in Netezza is a view named _v_dual. As the transcript below shows, arithmetic involving NULL (including division) yields NULL, while division by zero raises an error:

--select 1/null from _v_dual;
null
--select 1*null from _v_dual;
null
--select 1 + null from _v_dual;
null
--select 1 - null from _v_dual;
null
--select DECODE(0, 0, 1, 2/DECODE(0,0, null, 0)) from _v_dual;
1
--select DECODE(3, 0, 1, 2/DECODE(0,0, null, 0)) from _v_dual;
null
--select DECODE(4, 0, 1, 2/DECODE(5,0, null, 0)) from _v_dual;
Error: ERROR: Divide by 0

SQLState: HY000
ErrorCode: 1100

--select DECODE(4, 0, 1, 2/DECODE(5,0, null, 1)) from _v_dual;
2

Building a Personal Website

This site is built using:

  • Ubuntu
  • AWS: EC2, Elastic IP
  • Apache: 3 sites, virtual hosts, a2ensite, a2enmod, mod_proxy_http, customized DocumentRoot
  • JRuby on Rails
  • jQuery
  • Twitter Bootstrap: JS and CSS
  • Google Drive: Docs and Presentations
  • Atlassian: Jira and GreenHopper
  • Git
  • Google Analytics
  • WordPress

Below is the sequence of tasks I have completed thus far:

  • Set up an EC2 node
    • Purchase a reserved instance of the m1.large type
    • The choice of m1.large was based on three factors:
      • Considerable memory may be required for the large number of applications to be set up
      • High CPU is unlikely to be needed, since the load will not be high
      • micro, small, and medium would be too small for the memory requirements, and m2.xlarge would be too large
    • Purchase a reserved instance with the shortest term possible (in this case, a 1-month term from a third party)
    • Choose a heavy-utilization instance, since the node has to be up and running all the time and the load cannot be predicted
    • Use an Ubuntu 13.04 AMI
  • Set up Apache
  • Set up MySQL
  • Purchase and configure a domain name
  • Purchase a Jira Starter license and set up Jira with MySQL as the backend
  • Set up WordPress with MySQL as the backend
  • Set up a private repository on bitbucket.org to host the website project
  • Configure the website with the Twitter Bootstrap starter template
  • Configure the website to centralize all content