Writing code in an unfamiliar language feels a bit weird. Adjusting one’s mindset and expectations reduces the weirdness.
Here are some expectations I use to have a good time while coding:
- Sometimes you can code with self-control; other times you can only code by instinct.
- Human-generated code will always have defects. Compilers, tests, code reviews, and users can catch those defects.
- The “users” of code are anybody who interacts with it. The most common “users” are customers (of the software product), developers, QA, and SREs.
- Typing code is not always the bottleneck; thinking often is. In some cases, waiting on compilation or test runs is the bottleneck.
- Code quality is not straightforward. When writing personal code, “quality” is highly related to the rate at which I learn. Even if the code does not work, if it helps me learn more quickly, then it is useful.
Rust provides some “safe defaults”. Variables are immutable unless you write let mut, so you have to opt in to experiment with mutability. You need the keyword unsafe to dereference a raw pointer or to offset one with add.
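As a small illustration of those two defaults, here is a minimal sketch. The function name sum_with_raw_pointer is made up for this example, and the raw-pointer loop is only there to show where unsafe becomes necessary; ordinary code would just iterate over the slice.

```rust
/// Sums a slice by walking a raw pointer.
/// Hypothetical helper for illustration; `values.iter().sum()` is the idiomatic way.
fn sum_with_raw_pointer(values: &[i32]) -> i32 {
    // `mut` is required because `total` is reassigned inside the loop.
    let mut total = 0;
    // Immutable binding: writing `base = ...` later would not compile.
    let base = values.as_ptr();
    for i in 0..values.len() {
        // SAFETY: `i` is always smaller than `values.len()`, so the pointer
        // stays inside the slice. Both `add` and the dereference need `unsafe`.
        total += unsafe { *base.add(i) };
    }
    total
}
```

Calling sum_with_raw_pointer(&[1, 2, 3]) returns 6. Removing either the mut or the unsafe block turns this into a compile error, which is the point of the defaults.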
Building a language like Rust is impressive. I look forward to being able to build my own one day. There are many ideas that haven’t been tried yet.
use std::collections::HashMap;
use simd_json::derived::{ValueTryAsScalar, ValueTryIntoContainer};
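// Column store for JSON rows, with a log of pending deltas in front of it.
struct JsonColumnStore {
    // Pending operations; `compact` folds them into the columns.
    // TODO:
    deltas: Vec<(DeltaOperation, i64, String)>,
    // Column-oriented
    keys: Vec<i64>, // One key per row; same length as every column in `values`
    // Sparse or dense? (columns[i])
    // Let's do dense for now and figure out how sparse works later
    // TODO:
    values: Vec<Vec<Option<Variant>>>, // Indexed as values[column][row]. TODO:
    columns: HashMap<String, usize>, // Column name -> index into `values`
}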
#[derive(Clone, PartialEq, Debug)]
enum DeltaOperation {
    Insert,
    Update,
    Delete,
}
impl JsonColumnStore {
    fn new() -> Self {
        JsonColumnStore {
            deltas: Vec::new(),
            keys: Vec::new(),
            values: Vec::new(),
            columns: HashMap::new(),
        }
    }
    // The three write operations only append to the delta log;
    // the columns are rebuilt later by `compact`.
    fn insert(&mut self, k: i64, value: String) {
        self.deltas.push((DeltaOperation::Insert, k, value));
    }

    fn update(&mut self, k: i64, value: String) {
        self.deltas.push((DeltaOperation::Update, k, value));
    }

    fn delete(&mut self, k: i64) {
        self.deltas.push((DeltaOperation::Delete, k, String::new()));
    }
    fn compact(&mut self) {
        // Move data from deltas to columns.
        // Keep only the last operation per key; draining also empties `self.deltas`.
        let mut delta_map: HashMap<i64, (DeltaOperation, String)> = HashMap::new();
        for (op, k, v) in self.deltas.drain(..) {
            // TODO: what about partial updates?
            delta_map.insert(k, (op, v));
        }
        // Scan all data to update it
        // TODO: can we do better?
        let mut new_keys = Vec::new();
        // Start with one empty output column per existing column so that the
        // indices stored in `new_columns` stay valid.
        let mut new_values: Vec<Vec<Option<Variant>>> = vec![Vec::new(); self.columns.len()];
        let mut new_columns = self.columns.clone();
        for (row, k) in self.keys.iter().enumerate() {
            match delta_map.remove(k) {
                Some((DeltaOperation::Insert | DeltaOperation::Update, new_v)) => {
                    self.insert_delta(
                        *k,
                        &new_v,
                        &mut new_keys,
                        &mut new_values,
                        &mut new_columns,
                    );
                }
                Some((DeltaOperation::Delete, _)) => {
                    // Ignore the row
                }
                None => {
                    // Unchanged row: copy it over. `values` is indexed [column][row].
                    new_keys.push(*k);
                    for (j, c) in new_values.iter_mut().enumerate() {
                        // Columns created during this compaction have no old data.
                        c.push(self.values.get(j).and_then(|col| col[row].clone()));
                    }
                }
            }
        }
        // New values: keys that were not in the store before.
        // HashMap iteration order is unspecified, so new rows land in arbitrary order.
        for (k, (op, v)) in delta_map {
            match op {
                DeltaOperation::Insert | DeltaOperation::Update => {
                    self.insert_delta(
                        k,
                        v.as_str(),
                        &mut new_keys,
                        &mut new_values,
                        &mut new_columns,
                    );
                }
                DeltaOperation::Delete => {}
            }
        }
        self.keys = new_keys;
        self.values = new_values;
        self.columns = new_columns;
    }
    fn insert_delta(
        &self,
        k: i64,
        v: &str,
        new_keys: &mut Vec<i64>,
        new_values: &mut Vec<Vec<Option<Variant>>>,
        new_columns: &mut HashMap<String, usize>,
    ) {
        // simd_json parses in place, so give it a private mutable copy of the
        // bytes. This avoids the unsafe call to `String::as_bytes_mut`.
        // TODO: json parse functionify
        let mut bytes = v.as_bytes().to_vec();
        let parsed = simd_json::to_owned_value(&mut bytes).unwrap(); // TODO: unwrap
        let ov = parsed.try_into_object().unwrap();
        // TODO: turn into function, find duplicate code
        // TODO: what to do if schema changes?
        // TODO: recursive key support, array support
        for name in ov.keys() {
            if !new_columns.contains_key(name) {
                let index = new_columns.len();
                new_columns.insert(name.into(), index);
                // Backfill the new column with None for every row written so far.
                new_values.push(vec![None; new_keys.len()]);
            }
        }
        let mut row: Vec<Option<Variant>> = vec![None; new_values.len()];
        for (c, i) in new_columns.iter() {
            let kv = match ov.get(c) {
                Some(kv) => kv,
                None => continue, // Column absent from this object: leave it as None
            };
            if let Ok(o) = kv.try_as_bool() {
                row[*i] = Some(Variant {
                    bool: o,
                    i64: 0,
                    f64: 0.0,
                    str: String::new(),
                    arr: Vec::new(),
                    obj: HashMap::new(),
                });
            } else if let Ok(o) = kv.try_as_i64() {
                row[*i] = Some(Variant {
                    bool: false,
                    i64: o,
                    f64: 0.0,
                    str: String::new(),
                    arr: Vec::new(),
                    obj: HashMap::new(),
                });
            } else {
                unimplemented!("only bool and i64 values are handled so far")
            }
        }
        new_keys.push(k);
        // Append one cell to every column so all columns keep the same length.
        for (column, cell) in new_values.iter_mut().zip(row) {
            column.push(cell);
        }
    }
    fn scan(&mut self, columns: Vec<String>) -> Vec<Vec<Option<Variant>>> {
        self.compact(); // TODO: optimize
        let mut result = Vec::with_capacity(columns.len());
        for name in columns {
            // Panics if the requested column does not exist.
            let i = *self.columns.get(&name).unwrap();
            result.push(self.values[i].clone()); // TODO: avoid clone possible?
        }
        result
    }
}
// TODO: is it possible to make this better in memory?
// Cache locality.
#[derive(Debug, Clone)]
struct Variant {
    bool: bool,
    i64: i64,
    f64: f64,
    str: String,
    arr: Vec<Variant>,
    // Hashmap or Vector is better here?
    obj: HashMap<String, Variant>,
}
fn main() {
    println!("Hello, world!");
    // let mut j = JsonColumnStore::new();
    // let v = r#"{"b": true, "z": 1234}"#;
    // let v2 = r#"{"b": true, "z": 5677}"#;
    // j.insert(1, v.to_string());
    // j.insert(2, v.to_string());
    // j.update(1, v2.to_string());
    // j.delete(2);
    // let result = j.scan(vec!["b".into(), "z".into()]);
    // println!("result: {:?}", result);

    // NOTE: `delta::ScalarColumn` comes from a `delta` module that is not shown here.
    let mut sc = delta::ScalarColumn::new();
    let v = vec![1, 2, 3, 4, 5, 6, 9, 10];
    sc.load(v.clone().into_iter());
    let v2 = sc.scan();
    println!("{:?}", (v, v2.collect::<Vec<i32>>()));
}
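
Stripped of the JSON parsing, the compaction step in the listing is a last-write-wins merge of a delta log into column arrays. Here is a minimal sketch of just that idea, using only the standard library. TinyStore, Delta, and the plain integer payload are hypothetical stand-ins made up for this example, not part of the code above, and new keys are sorted before being appended only to keep the output deterministic.

```rust
use std::collections::HashMap;

/// One pending change. Insert and update are folded into a single upsert here.
#[derive(Debug, Clone, PartialEq)]
enum Delta {
    Upsert(i64, i32),
    Delete(i64),
}

/// Hypothetical single-column store: `keys[i]` belongs to `values[i]`.
#[derive(Debug, Default)]
struct TinyStore {
    deltas: Vec<Delta>,
    keys: Vec<i64>,
    values: Vec<i32>,
}

impl TinyStore {
    /// Folds the delta log into the column arrays; the last delta per key wins.
    fn compact(&mut self) {
        // Collapse the log: `Some(v)` means "write v", `None` means "delete".
        let mut latest: HashMap<i64, Option<i32>> = HashMap::new();
        for delta in self.deltas.drain(..) {
            match delta {
                Delta::Upsert(k, v) => latest.insert(k, Some(v)),
                Delta::Delete(k) => latest.insert(k, None),
            };
        }

        let mut keys = Vec::new();
        let mut values = Vec::new();

        // Pass 1: rewrite existing rows, applying any pending change.
        for (k, v) in self.keys.iter().zip(self.values.iter()) {
            match latest.remove(k) {
                Some(Some(new_v)) => {
                    keys.push(*k);
                    values.push(new_v);
                }
                Some(None) => {} // Deleted: drop the row
                None => {
                    keys.push(*k);
                    values.push(*v);
                }
            }
        }

        // Pass 2: append rows whose keys were not in the store before.
        let mut fresh: Vec<(i64, i32)> = latest
            .into_iter()
            .filter_map(|(k, v)| v.map(|v| (k, v)))
            .collect();
        fresh.sort_by_key(|(k, _)| *k);
        for (k, v) in fresh {
            keys.push(k);
            values.push(v);
        }

        self.keys = keys;
        self.values = values;
    }
}
```

Writes stay cheap because they only append to the log, and the cost of rewriting the columns is paid once per compaction rather than once per write.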