Here is a good rule of thumb: If you are trying to solve a problem, and there are multi-billion dollar firms whose entire business model depends on solving the same problem, and there are whole courses at your university devoted to how to solve that problem, you might want to figure out what the experts do and see if you can’t learn something from it.
Rcpp
or Cython
etc is a bit like Stargate. You loose control the moment you pass the barrier to C++
for a little bit. (Even though those are great solutions.) If the C++
part of your code becomes large, testing this code becomes increasingly difficult.C++
. But that has it's own drawbacks.C
, e.g., as is the case in R, matlab or python). It's easy to look and understand at how things work.fortran
fortran
is not faster than C++
matlab
, given the many good options that are available for free.julia
is a very good tool for economists with non-trivial computational tasks.In Donald Knuth's paper "Structured Programming With GoTo Statements", he wrote:
"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."
v0.6.2
include("develop.jl")
in the terminalJulia
¶eps()
.eps()
. x = typemax(Int64)
x + 1
for T in [Int8,Int16,Int32,Int64,Int128,UInt8,UInt16,UInt32,
UInt64,UInt128,Float32,Float64]
println("$(lpad(T,7)): [$(typemin(T)),$(typemax(T))]")
end
Julia
Primer: Types¶Julia
Primer: Custom Types¶Jaguar
and Cat
as being subtypes of Feline
:struct Feline
weight::Float64
sound::String
end
struct Cat <: Feline
weight::Float64
sound::String
end
abstract type Feline end
struct Jaguar <: Feline
weight::Float64
sound::String
end
struct Cat <: Feline
weight::Float64
sound::String
end
# is a cat a Feline?
Cat <: Feline
# create a cat and a jaguar
c = Cat(15.2,"miauu")
j = Jaguar(95.1,"ROARRRRRR!!!!!")
# is c an instance of type Cat?
isa(c,Cat)
# methods
function behave(c::Cat)
println(c.sound)
println("my weight is $(c.weight) kg! should go on a diet")
end
function behave(j::Jaguar)
println(j.sound)
println("Step back! I'm a $(j.weight) kg jaguar.")
end
# make a cat behave:
behave(c)
# and a jaguar
behave(j)
multiple dispatch
. The same function name dispatches to different functions, depending on the input argument type.Int
s and two Float
s. The difference matters.function t1(n)
s = 0 # typeof(s) = Int
for i in 1:n
s += s/i
end
end
function t2(n)
s = 0.0 # typeof(s) = Float64
for i in 1:n
s += s/i
end
end
@time t1(10000000)
@time t2(10000000)
module MyModule
# which other modules to use: imports
using Lib
using BigLib: thing1, thing2
import Base.show
importall OtherLib
# what to export from this module
export MyType, foo
# type defs
struct MyType
x
end
# methods
bar(x) = 2x
foo(a::MyType) = bar(a.x) + 1
show(io::IO, a::MyType) = print(io, "MyType $(a.x)")
end
include
other files like thismodule Foo
include("file1.jl")
include("file2.jl")
end
~/.julia/v0.6
(system-dependent)~/.juliarc.jl
on each startup. Modify the LOAD_PATH
variable:# add this to ~/.juliarc.jl
push!(LOAD_PATH, "/Path/To/My/Module/")
Even with the best validation, it’s very hard to achieve perfect quality in software. Here are some typical residual defect rates (bugs left over after the software has shipped) per kloc (one thousand lines of source code):
- 1 - 10 defects/kloc: Typical industry software.
- 0.1 - 1 defects/kloc: High-quality validation. The Java libraries might achieve this level of correctness.
- 0.01 - 0.1 defects/kloc: The very best, safety-critical validation. NASA and companies like Praxis can achieve this level. This can be discouraging for large systems. For example, if you have shipped a million lines of typical industry source code (1 defect/kloc), it means you missed 1000 bugs!
It took the European Space Agency 10 years and $$7 billion to produce Ariane 5, a giant rocket capable of hurling a pair of three-ton satellites into orbit with each launch and intended to give Europe overwhelming supremacy in the commercial space business. All it took to explode that rocket less than a minute into its maiden voyage last June, scattering fiery rubble across the mangrove swamps of French Guiana, was a small computer program trying to stuff a 64-bit number into a 16-bit space. This shutdown occurred 36.7 seconds after launch, when the guidance system's own computer tried to convert one piece of data -- the sideways velocity of the rocket -- from a 64-bit format to a 16-bit format. The number was too big, and an overflow error resulted. When the guidance system shut down, it passed control to an identical, redundant unit, which was there to provide backup in case of just such a failure. But the second unit had failed in the identical manner a few milliseconds before. And why not? It was running the same software.
For nine months, the Mars Climate Orbiter was speeding through space and speaking to NASA in metric. But the engineers on the ground were replying in non-metric English. It was a mathematical mismatch that was not caught until after the $$125-million spacecraft, a key part of NASA's Mars exploration program, was sent crashing too low and too fast into the Martian atmosphere. The craft has not been heard from since. Noel Henners of Lockheed Martin Astronautics, the prime contractor for the Mars craft, said at a news conference it was up to his company's engineers to assure the metric systems used in one computer program were compatible with the English system used in another program. The simple conversion check was not done, he said.
(IEEE Spectrum) -- It was an air traffic controller's worst nightmare. Without warning, on Tuesday, 14 September, at about 5 p.m. Pacific daylight time, air traffic controllers lost voice contact with 400 airplanes they were tracking over the southwestern United States. Planes started to head toward one another, something that occurs routinely under careful control of the air traffic controllers, who keep airplanes safely apart. But now the controllers had no way to redirect the planes' courses. The controllers lost contact with the planes when the main voice communications system (VCS) shut down unexpectedly. To make matters worse, a backup system that was supposed to take over in such an event crashed within a minute after it was turned on. The outage disrupted about 800 flights across the country. Inside the control system unit (VCSU) is a countdown timer that ticks off time in milliseconds. The VCSU uses the timer as a pulse to send out periodic queries to the VSCS. It starts out at the highest possible number that the system's server and its software can handle — 232. It's a number just over 4 billion milliseconds. When the counter reaches zero, the system runs out of ticks and can no longer time itself. So it shuts down. Counting down from 232 to zero in milliseconds takes just under 50 days. The FAA procedure of having a technician reboot the VSCS every 30 days resets the timer to 232 almost three weeks before it runs out of digits.
Pkg.test("Package_name")
Base.runtests()
# let's do some simple testing
using Base.Test
@test 1==1
@test pi ≈ 3.14159 atol=1e-4
@test 2>3
There are at least 2 ways to chase down a bug.
println
statements at various points in your code.DataFrame
. A tabular dataset with column names.Missing
data type, provided in Missings.jl
using DataFrames # DataFrames re-exports Missings.jl
m = missings(Float64,3) # you choose a datatype, and the dims for an array
m[1:2] = ones(2)
sum(m) # => missing. because Float64 + missing = missing
prod(m) # => missing. because Float64 * missing = missing
sum(skipmissing(m)) # => 2
# you can replace missing values
println(typeof(Missings.replace(m,3)))
collect(Missings.replace(m,3))
df = DataFrame(nums = rand(3),words=["little","brown","dog"])
# there is alot of functionality. please consult the manual.
df[:nums]
df[:words][1] # "little"
df[:nums][2:3] = [1,1]
df
using RDatasets # popular Datasets from R
iris = dataset("datasets", "iris")
head(iris) # get the first 6 rows
println(iris[:SepalLength][1:6]) # get a column
show(iris[2,:]) # get a row
describe(iris); # get a description
d = DataFrame(A = rand(10),B=rand([1,2],10),C=["word $i" for i in 1:10])
by(d,:B,x->mean(x[:A]))
# subsetting a dataframe can become cumbersome
d = DataFrame(A = rand(10),B=rand([1,2],10),C=["word $i" for i in 1:10])
d[(d[:A].>0.1) .& (d[:B].==2),:] # approach 1
using DataFramesMeta
@where(d,(:A .>0.1) .& (:B.==2)) # approach 2
enter
using DataFramesMeta
@where(d,(:A .>0.1) & (:B.==2))
# the previous operation would become
@by(d,:B,m=mean(:A))
@linq
macro together with a pipe operator.LINK
(language integrated query) from microsoft .NETdf = DataFrame(a = 1:20,b = rand([2,5,10],20), x = rand(20))
x_thread = @linq df |>
transform(y = 10 * :x) |>
where(:a .> 2) |>
by(:b, meanX = mean(:x), meanY = mean(:y)) |>
orderby(:meanX) |>
select(meanx=:meanX,meany= :meanY, var = :b)
# or
# using Lazy
# x_thread = @> begin
# df
# @transform(y = 10 * :x)
# @where(:a .> 2)
# @by(:b, meanX = mean(:x), meanY = mean(:y))
# @orderby(:meanX)
# @select(:meanX, :meanY, var = :b)
# end
an alternative
q = @from <range variable> in <source> begin
<query statements>
end
using Query
df = DataFrame(name=["John", "Sally", "Kirk"], age=[23., 42., 59.], children=[3,5,2])
x = @from i in df begin
@where i.age>50
@select {i.name, i.children}
@collect DataFrame
end
println(df)
println(x)
you can do several things in a query:
df = DataFrame(name=repeat(["John", "Sally", "Kirk"],inner=[1],outer=[2]),
age=vcat([10., 20., 30.],[10., 20., 30.].+3),
children=repeat([3,2,2],inner=[1],outer=[2]),state=[:a,:a,:a,:b,:b,:b])
x = @from i in df begin
@group i by i.state into g
@select {group=g.key,mage=mean(g..age), oldest=maximum(g..age), youngest=minimum(g..age)}
@collect DataFrame
end
println(x)