Skip to main content

Datasets

Mochi has dataset operations built into the language. A program loads a CSV, JSON, JSONL, or YAML file into a typed list, queries it with from / where / select, joins it against another list, sorts and slices the result, and saves it back to disk. No imports.

Loading data

load <path> as <Type> reads a file and decodes each record into the given type. The format is inferred from the extension.

type Person { name: string, age: int }

let people = load "people.json" as Person

Supported formats:

ExtensionFormat
.csvComma-separated. Header row required.
.jsonJSON array of objects.
.jsonlOne JSON object per line.
.yaml, .ymlYAML sequence of mappings.

An explicit format clause overrides inference:

let people = load "raw.txt" as Person format = "jsonl"

If the file cannot be read or a record fails to decode, load raises an error. Wrap in try / catch (see errors) to recover.

A first query

The minimal query is from <var> in <list> select <expression>:

let names = from p in people select p.name

from introduces a binding (p) that ranges over the list. select shapes each output. The result is a list<string>.

A where clause filters:

let adults = from p in people
where p.age >= 18
select p

Each clause is optional except from and select. Clause order matters: where runs before sort by, which runs before skip / take.

Sorting, skipping, taking

let top = from p in people
where p.age >= 18
sort by -p.age
skip 1
take 5
select p
ClauseMeaning
sort by <expr>Ascending sort. Prefix with - for descending.
skip nDrop the first n rows.
take nKeep only the first n rows after skip.

Multiple sort keys are written as a comma-separated list:

sort by p.country, -p.age

Shaping the output

select produces any value, not only the original record. Maps and struct literals are common.

let summary = from p in people
where p.age >= 18
select {
name: p.name,
is_senior: p.age >= 60
}

A struct literal works when the shape is also a known type:

type Adult { name: string, age: int }

let adults: list<Adult> = from p in people
where p.age >= 18
select Adult { name: p.name, age: p.age }

Joins

join combines two lists. The on clause is the join condition.

type Order { id: int, customer_id: int, total: float }
type Customer { id: int, name: string }

let orders = load "orders.json" as Order
let customers = load "customers.json" as Customer

let report = from o in orders
join c in customers on o.customer_id == c.id
select {
order_id: o.id,
customer: c.name,
total: o.total
}

join is an inner join. Orders without a matching customer are discarded. For an outer join, use left join:

let report = from o in orders
left join c in customers on o.customer_id == c.id
select {
order_id: o.id,
customer: c?.name ?? "(unknown)",
total: o.total
}

A left join introduces a Customer | nil binding because some orders have no match. Use ?. to access fields safely.

Grouping

group by collects rows into buckets. The select clause runs once per group.

let by_country = from p in people
group by p.country
select {
country: p.country,
total: len(p)
}

Inside the select of a grouped query, the binding (p here) refers to the list of rows in the group, not a single row. Aggregate over it with prelude functions like len, sum, max, avg:

let stats = from p in people
group by p.country
select {
country: p.country,
avg_age: avg(p | map(fun(x: Person): int => x.age))
}

Saving

save <list> to <path> writes a list back to disk. Format is inferred from the extension; format = "..." overrides.

save adults to "adults.json"
save adults to "adults.jsonl" format = "jsonl"

save understands the same four formats as load. Unknown extensions raise a compile-time error.

Combining with the rest of the language

from / where / select returns an ordinary list. The result passes into for, into map / filter / reduce, or into another query.

let summary = from p in people where p.age >= 18 select p

for p in summary {
print(p.name)
}

let names = summary | map(fun(p: Person): string => p.name)

A query is a value. Store it in a let, return it from a function, or pass it as an argument like any other list.

Common patterns

Top-K

let top3 = from item in scores
sort by -item.value
take 3
select item

Filtered count

let unread_count = len(from b in books where !b.read select b)

Distinct

let distinct_countries = from p in people
select p.country
| unique

unique is a prelude helper that removes duplicates from a list while preserving order.

Reading and rewriting

let users = load "users.json" as User
let updated = from u in users select User { ...u, active: u.last_seen > now() - 30 }
save updated to "users.json"

The struct spread (...u) copies all fields of u, then overrides what follows.

Performance notes

  • Queries evaluate eagerly. The intermediate result of every clause is a real list in memory.
  • For very large datasets, prefer streaming via .jsonl and process row by row with a for loop.
  • Joins build a hash table on the smaller side before scanning the larger side. Both inputs need to fit in memory.

Common errors

MessageCauseFix
cannot decode <field> as <type>Type mismatch in the fileCheck the source data or the type declaration.
unknown extensionload from a file with no recognized formatAdd format = "...".
group by must precede selectClause orderReorder the query.
aggregator over a non-list valueCalling len(p) outside a groupUse the binding only inside a grouped select.

See also

  • Errors for catching load failures.
  • Generative AI for feeding query results into a prompt.
  • Tutorial for a worked example with load, query, and save.