Introduction

Q is a successor to k, which is a successor to A+, itself an APL
successor.
Arthur Whitney, the designer and implementor, believes
in concise, expressive, and efficient languages.
In fact, q is implemented on top of k.
The executable is under 400 kilobytes.

Some people like to develop directly in the interpreter.
I tend to develop programs in files and then execute them.

The basic data types are atoms (single entities, e.g. a single number),
lists (sequences of atoms, which double as arrays), dictionaries, and tables.
The language is very powerful: you can do bulk operations on entire
arrays and tables and do interprocess communication.
If treated well, it can also execute fast.

Operators (sometimes called "verbs") can operate on any data type
but in different ways. 
A good way to get familiar with the language is to type
the following in a q window:
\l help.q
help `
help `verbs

Let's start off simple:

x: 4 

y: 5

x+y

x * y

x: 4 40 400 

y: 5 55 80

x + y

x * y
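
As a quick check of what to expect, here is a small sketch of the same expressions with their results written as comments. The last line is my own addition to show an atom combined with a list.

```q
x: 4 40 400
y: 5 55 80
x + y      / 9 95 480
x * y      / 20 2200 32000
x + 1      / 5 41 401, the atom is paired with every item
```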

Note however that processing goes from right to left rather than
through operator precedence.
So, for example, what do you anticipate you would get from:

5 - 4 * 3

5 * 4 - 3

(5 * 4) - 3
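
Once you have made your guess, compare it with this sketch. The comments show the value of each expression and the implied grouping.

```q
5 - 4 * 3      / -7, that is 5 - (4 * 3)
5 * 4 - 3      / 5, that is 5 * (4 - 3)
(5 * 4) - 3    / 17
```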




sum x * y / ability to "compose" functions

x[1]

y[2]

z: ((1 2 3);(4 5 6);(7 8 9 10))

z[1;2]

z[1 2] 
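
Here is a sketch of what these expressions return, assuming x and y are still the three-item lists defined earlier (they are repeated here so the lines can be run on their own).

```q
x: 4 40 400
y: 5 55 80
sum x * y      / 34220
x[1]           / 40, indexing starts at 0
y[2]           / 80
z: ((1 2 3);(4 5 6);(7 8 9 10))
z[1;2]         / 6, item 2 of row 1
z[1 2]         / (4 5 6;7 8 9 10), rows 1 and 2
```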

Some useful monadic (single argument) verbs:
https://code.kx.com/trac/wiki/Reference

til 10
1 + til 10
x: 1 + til 10
y: 3 + til 10
reverse x

\c 25 200
til 100


v: 3 2 6 4 9 8 6 5 2 3 4
distinct v
first v
min v
max v
sum v
sums v
prd v
prds v
count v
iasc v / permutation that will give sorted array
v[iasc v] / sorted array itself
avg v
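
To make the iasc line concrete, here is a sketch of a few of the results for this particular v, with the values I expect written as comments.

```q
v: 3 2 6 4 9 8 6 5 2 3 4
iasc v        / 1 8 0 9 3 10 7 2 6 5 4
v[iasc v]     / 2 2 3 3 4 4 5 6 6 8 9
sums v        / 3 5 11 15 24 32 38 43 45 48 52
distinct v    / 3 2 6 4 9 8 5
count v       / 11
```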

Please take time tonight to review all the commands
in basicmoves.q


===============

Now let's put some commands inside a file and execute them 
based on their command-line arguments.

Call the file findcount.q 
The "I"$ means that the arguments should be integers.
The arguments are on the command line and are copied to .z.x

findcount.q:
  count "I"$ .z.x

q findcount.q 2 3 42 11 1
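
To see what "I"$ does without leaving the interpreter, here is a sketch that stands in for .z.x with a hand-made list of strings. The name args is my own and is only for illustration.

```q
/ the command-line arguments arrive as a list of strings
args: ("2";"3";"42";"11";"1")
"I"$ args          / the integers 2 3 42 11 1
count "I"$ args    / 5
```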

Exercise 1: Create a file that produces the arithmetic mean of 
the arguments on the command line.
	q findmean.q 2 1 4 0 9


A few dyadic (dual argument) verbs.
x + y
x * y
x % y  / notice that division is %, because the / is used for comments and more

Now, we can start writing procedures to do things.
For example, let us write a procedure to project the Medicare budget
into the future.
We have to know the current level, the percent raise, and 
the number of years.
Then we can write a procedure that will give us the budget
after so many years.

project:{[current; interest; numyears]
 x: (1 + interest) xexp numyears;
 x * current}

/ 2010 medicare budget is about $453
/ So projected budget in  2030 will be:
project[453; 0.03; 20]

See budget.q.

Note a few things: you need a semicolon after every line of the
function body but the last one. (Forgetting this can lead to strange bugs.)
Every line after the first line of the function definition should be indented.


Now compute into the future and see what the budget will be.
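
One way to look several years ahead at once, sketched under the assumption that the 3% rate stays fixed: since xexp works item by item, numyears can be a list. The horizons 5 10 15 20 are just ones I picked.

```q
project:{[current; interest; numyears]
 x: (1 + interest) xexp numyears;
 x * current}

/ budget after 5, 10, 15 and 20 years
project[453; 0.03; 5 10 15 20]    / roughly 525.2 608.8 705.8 818.2
```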

The present value of an income stream given an annual interest rate
is the sum of the contribution from year 1, year 2, ...
To compute the contribution from year k, you have to take the amount
you receive in year k and find the amount you would have to deposit
now in order to get that amount k years from now.
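
As a small worked case, here is a sketch of the contribution of a single payment, assuming a 3% annual rate. The name rate is mine and is only for illustration.

```q
rate: 0.03
/ value today of 100 received two years from now
100 % (1 + rate) xexp 2          / about 94.26
/ discount factors for years 1 through 4
1 % (1 + rate) xexp 1 + til 4
```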

Exercise 2: Create a file that produces the present value
of a sequence of floating point numbers given as payments
for year 1, year 2, ... up to the length of the vector.
Hint: prds and # may help.
Hint: try 10 # 0.3
	q presval.q 100 200 100 400

Suppose you want to compute certain summary statistics of a
collection of numbers.
For example, if the original sample is 
x: 10 ? 5

x

You might get a mean of:

avg x

but the numbers jump around much more than they would if you had
1.9 + 10 ? 0.2

Under the normal distribution assumption, we would compute a 
standard deviation. 
sqrt var x
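
To compare the two samples side by side, here is a sketch. The names a and b are my own, and the results differ on every run because the samples are random.

```q
a: 10 ? 5            / ten integers drawn from 0 1 2 3 4
b: 1.9 + 10 ? 0.2    / ten floats between 1.9 and 2.1
(avg a; avg b)               / the two means
(sqrt var a; sqrt var b)     / the two standard deviations
```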

But what if we don't care to assume
a normal distribution (because we think the data could have a bias)?
In resampling statistics, you don't assume a distribution.
Instead you take the data as it is and take repeated samples
of the same size "with uniform probability at random using replacement".
What this means is that we might create samples like this:

x[(count x) ? count x]

and again

x[(count x) ? count x]

and again

x[(count x) ? count x]


That is, you might create such samples 10,000 times and then
store all the means.
If you then sort all the means and take the 250th and the 9,750th
you have the limits of the 95% confidence interval.

To do this, you'll need to understand two constructs.
To append to a list, you can do something like:

mylist: x[10 ? count x]
mylist,: x[10 ? count x]

To iterate, you can use a loop such as:

mylist: ();
/ do[10000; mylist,: avg x[(count x) ? count x]]
do[10000; mylist,: avg (count x) ? x]
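
For comparison, here is a sketch of the same resampling written without an explicit loop, followed by the step that picks out the two limits. The names means and sorted are my own, and this is only one possible arrangement, not necessarily the intended one.

```q
x: 10 ? 5
/ one resampled mean per copy of x
means: {avg (count x) ? x} each 10000 # enlist x
sorted: asc means
/ 250th and 9,750th smallest means (indices start at 0)
sorted 249 9749
```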

Exercise 3: Given a list of integers at the command line,
compute the mean, 2.5% value and 97.5% value using resampling.
	q findconf.q 1 2 1 2 3 1 4

	<Students may find a deliberately placed error. They should find
	it.>

Ok, now continuing our discussion of resampling statistics,
let us say that we have two lists and we want to compute the 
significance of the difference of two means.
A classical application is drug testing.
We test the drug vs. the placebo.
The way we do this is again by resampling.
We take the two lists, compute the mean difference
and then see whether permuting the labels would give us the same
difference or more. 
We do this many times and determine what fraction of the time
random permutation gives a bigger difference.
Whenever random permutation gives a bigger difference,
we ascribe the difference to chance.

For example, if we have

drug: 50 + 20 ? 15
placebo: 30 + 30 ? 30

then we get an initial difference of means of

(avg drug) - (avg placebo)

To see how often this would happen if we permuted the labels,
we take the whole vector
v: drug,placebo

and permute it 10,000 times.
After each permutation,
we take the average of the first 20 elements (20 being the size of drug
in this case) and the average of the remaining 30, and compute their difference.
We count how often that permuted difference is greater than
or equal to the measured difference.
That count divided by 10,000 is the p-value.
(p-value = probability that a difference in means at least as large as
the observed one would arise by chance.)
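
Here is a sketch of a single permutation step, leaving the 10,000 repetitions and the counting to you. It assumes that dealing all the indices of v with a negative left argument to ? is an acceptable way to shuffle. The names observed, p, nd and permdiff are my own.

```q
drug: 50 + 20 ? 15
placebo: 30 + 30 ? 30
observed: (avg drug) - avg placebo
v: drug, placebo
/ deal every index exactly once, giving one random relabeling
p: v[neg[count v] ? count v]
nd: count drug
/ first nd items play the role of drug, the rest of placebo
permdiff: (avg nd # p) - avg nd _ p
permdiff >= observed
```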

Exercise 4a: Given two lists drug and placebo, find the p-value
of the difference in their means. 
  q evalsig.q 4 5 2 3 5 1 _ 1 2 3 1 2 1 

Exercise 4b (easier than 4a):
The present interest rate is 3%.
Over each of the next five years (including the first year),
the interest rate can go up
by 0.5%, down by 0.5%, or stay the same, all with the same
probability.
Find the median net 5 year interest rate, as well as the 75th
and 25th percentiles.

	q montecarlo.q 3 0.5

Exercise 4c (harder than 4b)
Allow the number of years, the present interest rate and the 
increment of the interest rate to be variable.
Find the median net n year interest rate, as well as the 75th
and 25th percentiles.
	q montecarlogen.q 5 3 0.5

Switching gears a little and recognizing that finance is a lot about time,
let's learn a little about date arithmetic.
First, there is a date data type:

mystart: "D"$(string 2030),(".01.01")
myend: "D"$(string 2030),(".12.31")


`week$ mystart / finds  the date of the monday just prior or equal to mystart
`week$ myend

To find the day of the week of mystart:

finddayofweek:{[x] 
  numdays: x - `week$ x;
  vec:`mon`tues`wed`thurs`fri`sat`sun;
  vec[numdays]}

finddayofweek[mystart]
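
A few more pieces of date arithmetic, sketched here because they come in handy later: dates can be subtracted from each other and can have integers added to them. The name week1 is my own.

```q
mystart: "D"$(string 2030),(".01.01")
myend: "D"$(string 2030),(".12.31")
/ subtracting two dates gives a number of days
myend - mystart              / 364
/ adding integers to a date gives later dates
week1: mystart + til 7
/ offset of each date from the Monday of its week (0 means Monday)
week1 - `week$ week1         / 1 2 3 4 5 6 0
```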


Exercise 5:
Find all the weekend dates of a given year where the year is given 
as a command line argument.
	q findweekend.q 2020

Exercise 5b: Find all third Saturdays in every month
for a given year.
	q findthirdsat.q 2025


Q can also handle strings.
For example, suppose that we wanted to 
take a bunch of command line arguments and find out how many times
each argument occurs.
Overall, this means we want to collect the command line arguments,
then group them, and count them.

Let's build our way up to this because there is a lot we can learn from this.

x: "S"$ .z.x

Now we can do something called "grouping":

g: group x

Grouping creates a "dictionary" consisting (in this case)
of argument, position pairs.
For example, 

q countargs.q dave was not here was not was

yields the following g:

dave| ,0
was | 1 4 6
not | 2 5
here| ,3

Now type:
key g

Then

value g

Do you see what you are getting?

Now it is time to put the keys and the counts together.

(key g),'(count each value g)

What is going on here?
count just counts the number in a list.
The modifier (or "adverb") each means that each list should
be counted separately.
The ,' means that we are taking each key and combining it with
the corresponding count.
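
To try this in the interpreter without command line arguments, here is a sketch that types the same words in directly as symbols.

```q
x: `dave`was`not`here`was`not`was
g: group x
key g                    / `dave`was`not`here
count each value g       / 1 3 2 1
/ each key joined to its count: (`dave;1), (`was;3), (`not;2), (`here;1)
(key g),'(count each value g)
```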


Exercise 6: Find each argument with its count.
	q countargs.q

Exercise 7: Find the words tied for having the highest counts.
	q findmaxarg.q

Here are some hints:

y: (key g),'(count each value g)

y[;1]

x: 50 + 10 ? 6
max x
x[where x = max x]


Set Operations:

Is every member of x also in y?
subset:{[x;y] min x in y}


Find the elements of the x vector that are in a y vector:
 see intersect in set.q


Find those elements of x that are not in y:
 see difference in set.q
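
I have not reproduced set.q here. As a sketch of the same two ideas, q's built-in inter and except keywords can be compared with versions written using in and where.

```q
x: 1 2 3 4 5
y: 4 5 6 7
x inter y              / 4 5
x except y             / 1 2 3
/ the same results written with in and where
x[where x in y]        / 4 5
x[where not x in y]    / 1 2 3
```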


Exercise: Write a multiintersect function

String operations
Delete all blanks in a line

/ deletes all blanks
delblanks:{[x] 
 ii: where not x = " "; 
 x[ii] }

Take a string and separate it by blanks

/ separate into words
sepwords:{[x]
  ii: where x = " ";
  ii: distinct 0, ii;
  y: delblanks each ii _ x;
  c: count each y;
  ii: where not c = 0;
  y[ii]}
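
To check the two functions, here is a sketch that repeats them and applies them to test strings of my own, including one with a doubled blank.

```q
delblanks:{[x]
 ii: where not x = " ";
 x[ii] }

sepwords:{[x]
  ii: where x = " ";
  ii: distinct 0, ii;
  y: delblanks each ii _ x;
  c: count each y;
  ii: where not c = 0;
  y[ii]}

delblanks["a b  c"]          / "abc"
sepwords["hey  there you"]   / ("hey";"there";"you")
```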


lower takes a string and puts it in lower case
ltrim removes leading blanks; rtrim trailing blanks
"|" sv("hey";"there") yields "hey|there"
` sv `foo`bar yields `foo.bar
` sv `:/q`tutorial`draft1 yields `:/q/tutorial/draft1 
2 sv 1000b yields 8
" " vs "hey there" yields ("hey"; "there")

Exercise 8: Write a function that takes its command line arguments,
splits them based on underbars into k lists and finds the intersection
among those lists. All words should be reduced to lower case.

q findinter.q A b c x _ D x a e b _ b b d X d

yields b x
 
 
Other string ops


Translations from k to q:

cross[x;y] == ,/ x ,/:\: y
til 10  == !10
distinct x  == ? x
where x = 1 == & x = 1
first x == *x
min x == &/ x
max x == |/ x
reverse x == |x
x like y == x _sm y  note that we can do things
	like fl[ao]p to mean either a or o
	like [0-9] to mean 0 through 9
	like "*[^0-9]" for strings that do not end with a digit
x ss y == x _ss y
enlist x == ,x
raze x  == ,/x
list1 cross list2 == ,/list1 ,/:\: list2

value group x == =x 
group x == dictionary consisting of ?x and =x
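
A sketch of the q side of a few of these entries, with test strings and lists of my own. The patterns use a range such as 0-9 inside the brackets.

```q
"flap" like "fl[ao]p"      / 1b
"flop" like "fl[ao]p"      / 1b
"flip" like "fl[ao]p"      / 0b
"abc7" like "*[^0-9]"      / 0b, it ends with a digit
"abc" like "*[^0-9]"       / 1b
raze (1 2;3 4)             / 1 2 3 4
1 2 cross 3 4              / (1 3;1 4;2 3;2 4)
```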



Some things to note
1. In functions, every line but the last must end with a semicolon.
A function with a missing semicolon still executes, but when it does it
doesn't do the right thing.
2. If I use find, I have to know the type of the entity I'm finding.
For example, if I search among bits then I must say x ? 1b, not x ? 1

